Re: quorum connection manager shutdown takes long time

Powell Molleti Tue, 01 Sep 2015 14:50:29 -0700

Apologies for not posting the link to the old thread; here it is:
http://bit.ly/1JAaJaJ


Thanks
Powell.

On 8/31/15, 2:34 PM, "Powell Molleti" <[email protected]> wrote:

>In reference to:
>https://issues.apache.org/jira/browse/ZOOKEEPER-2246
>
>Simply removing sock.setSoTimeout(0) from
>http://s.apache.org/TfI
>has the unintended consequence of shutting down both the RecvWorker and
>SendWorker threads in all cases. It seems the current code is designed to
>keep the socket alive (and the threads running) so that this channel can
>be reused to communicate again with a peer node that is still alive but
>needs to redo leader election.
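To make the effect concrete, here is a minimal standalone sketch. It is not ZooKeeper's RecvWorker; the class and method names (`IdleTimeoutDemo`, `runStrictWorker`) and the 4-byte length-prefixed framing are hypothetical. With a non-zero SO_TIMEOUT, a blocking read on an idle socket throws `SocketTimeoutException` once the timeout expires, and a receive loop that treats every `IOException` as fatal then exits. `setSoTimeout(0)` means "block indefinitely", which is why the threads survive an idle period today.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical illustration only; not ZooKeeper's RecvWorker. */
public class IdleTimeoutDemo {
    /** A receive loop that treats every IOException, including a read timeout, as fatal. */
    static String runStrictWorker(Socket sock) {
        try {
            DataInputStream in = new DataInputStream(sock.getInputStream());
            while (true) {
                int length = in.readInt();      // blocks until data, EOF, or SO_TIMEOUT expiry
                if (length < 0) throw new IOException("invalid length " + length);
                in.readFully(new byte[length]); // message body discarded in this sketch
            }
        } catch (SocketTimeoutException e) {
            return "timed out";                 // idle peer: the worker gives up and exits
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    /** Loopback pair where the client never sends anything. */
    static String demo(int soTimeoutMillis) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(soTimeoutMillis); // 0 would block here forever
            return runStrictWorker(accepted);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo(200));
    }
}
```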
>
>I could not reproduce any issue when the threads shut down after the
>timeout, since new threads are created for the next iteration of leader
>election. However, I would rather reuse the threads and the channel, so I
>propose the following approach.
>
>The alternative I suggest is to still remove setSoTimeout(0) from
>http://s.apache.org/TfI
>but also to enable SO_KEEPALIVE via setKeepAlive() on this socket, and
>to not treat a timeout as an error when it occurs here:
>http://bit.ly/1JHIdVY
>while still treating it as an error when it occurs here:
>http://bit.ly/1NTjQ9R
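A minimal sketch of the proposal, under stated assumptions: the code behind the two linked locations is not reproduced here, the framing is a hypothetical 4-byte big-endian length prefix, and `TolerantReceiver` is a hypothetical name. The socket keeps a finite SO_TIMEOUT (no `setSoTimeout(0)`) and enables SO_KEEPALIVE. A timeout while idly waiting for the start of the next message is not an error: the loop re-checks its running flag and keeps the channel, which also makes shutdown prompt. Once a message has started, the rest must arrive within the timeout, so a timeout there is treated as an error and the socket is reset. Splitting "tolerated" versus "fatal" timeouts at the message boundary is this sketch's own choice; it avoids losing bytes of a partially read header.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical receiver illustrating the proposal; not ZooKeeper's RecvWorker. */
public class TolerantReceiver implements Runnable {
    private final Socket sock;
    private final DataInputStream in;
    private final BlockingQueue<byte[]> out;
    private final AtomicBoolean running = new AtomicBoolean(true);

    public TolerantReceiver(Socket sock, int soTimeoutMillis, BlockingQueue<byte[]> out)
            throws IOException {
        this.sock = sock;
        this.out = out;
        sock.setKeepAlive(true);            // SO_KEEPALIVE: OS-level dead-peer detection
        sock.setSoTimeout(soTimeoutMillis); // finite, instead of setSoTimeout(0)
        this.in = new DataInputStream(sock.getInputStream());
    }

    public boolean isRunning() { return running.get(); }

    /** Asks the loop to stop; it notices within one soTimeout period. */
    public void shutdown() { running.set(false); }

    @Override
    public void run() {
        try {
            while (running.get()) {
                int first;
                try {
                    first = in.read();      // wait for the start of the next message
                } catch (SocketTimeoutException idle) {
                    continue;               // idle: NOT an error, keep the channel open
                }
                if (first < 0) break;       // orderly close by the peer
                // Mid-message: the rest must arrive within soTimeout, so a
                // SocketTimeoutException from here on propagates as an error.
                int length = (first << 24) | (in.readUnsignedByte() << 16)
                        | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
                if (length < 0) throw new IOException("invalid length " + length);
                byte[] body = new byte[length];
                in.readFully(body);
                out.add(body);
            }
        } catch (IOException e) {
            // Real failure: reset, keepalive expiry, truncated or stalled message.
        } finally {
            running.set(false);
            try { sock.close(); } catch (IOException ignored) { }
        }
    }
}
```

A socket timeout leaves the Java `Socket` usable, which is what allows the idle branch to simply retry the read.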
>
>This means that users can tune the TCP keepalive timeouts so that socket
>failures propagate to user space more quickly, while ZooKeeper still
>resets the socket if it detects that the other side is not responding
>when it knows it needs a response within some bounded time.
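As one way the keepalive timeouts could be tuned per socket, here is a hedged sketch using `jdk.net.ExtendedSocketOptions` (`TCP_KEEPIDLE`, `TCP_KEEPINTERVAL`, `TCP_KEEPCOUNT`). These options exist only in JDK 11 and later, which postdates this thread, and not on every platform; otherwise only system-wide settings apply (on Linux, the `net.ipv4.tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` sysctls). `KeepAliveTuning` is a hypothetical helper, and the example values are arbitrary.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;
import jdk.net.ExtendedSocketOptions;

/**
 * Hypothetical helper: enable SO_KEEPALIVE and, where the JDK/OS support it,
 * shorten the keepalive timers for this socket only (JDK 11+ options).
 */
public class KeepAliveTuning {
    /** Returns true if at least one per-socket keepalive timer was applied. */
    static boolean enableFastKeepAlive(Socket sock, int idleSecs, int intervalSecs, int probes)
            throws IOException {
        sock.setKeepAlive(true); // probe the peer once the connection has been idle
        boolean tuned = false;
        // Seconds of idleness before the first probe.
        tuned |= setIfSupported(sock, ExtendedSocketOptions.TCP_KEEPIDLE, idleSecs);
        // Seconds between unanswered probes.
        tuned |= setIfSupported(sock, ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSecs);
        // Unanswered probes before the OS reports the connection as dead.
        tuned |= setIfSupported(sock, ExtendedSocketOptions.TCP_KEEPCOUNT, probes);
        return tuned;
    }

    private static <T> boolean setIfSupported(Socket sock, SocketOption<T> option, T value)
            throws IOException {
        if (!sock.supportedOptions().contains(option)) {
            return false; // platform without per-socket keepalive timers
        }
        sock.setOption(option, value);
        return true;
    }
}
```

With, say, `enableFastKeepAlive(sock, 30, 5, 3)`, a silently dead peer would surface as an `IOException` on a blocked read after roughly 30 + 5 × 3 = 45 seconds of silence, instead of the much longer OS defaults.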
>
>Ideally, there would be some user-space ping on every socket channel
>between ZooKeeper nodes to detect dead channels quickly. It seems one
>exists for the sockets used for Follow/Lead after leader election is
>done, but not for this one? Such a feature could be added, taking care to
>keep it backward compatible.
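A user-space ping of the kind wished for above could be driven by a small per-channel liveness tracker. This is a hypothetical sketch, not an existing ZooKeeper feature: `PeerLiveness`, the ping interval, and the "dead after" threshold are assumptions. The sender emits a PING whenever it has written nothing for `pingIntervalMillis`, and the channel is declared dead when nothing (data or PING) has been read for more than `deadAfterMillis`. The clock is injected so the policy can be tested deterministically.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical per-channel liveness tracker for an application-level ping.
 * Not part of ZooKeeper; names and policy are assumptions for illustration.
 */
public class PeerLiveness {
    private final long pingIntervalMillis;
    private final long deadAfterMillis;
    private final LongSupplier clock; // milliseconds; injectable for tests
    private long lastSent;
    private long lastHeard;

    public PeerLiveness(long pingIntervalMillis, long deadAfterMillis, LongSupplier clock) {
        if (deadAfterMillis <= pingIntervalMillis) {
            throw new IllegalArgumentException("deadAfter must exceed the ping interval");
        }
        this.pingIntervalMillis = pingIntervalMillis;
        this.deadAfterMillis = deadAfterMillis;
        this.clock = clock;
        long now = clock.getAsLong();
        this.lastSent = now;
        this.lastHeard = now;
    }

    /** Call whenever any frame (data or PING) is written to the channel. */
    public synchronized void onSent() { lastSent = clock.getAsLong(); }

    /** Call whenever any frame (data or PING) is read from the channel. */
    public synchronized void onReceived() { lastHeard = clock.getAsLong(); }

    /** True if the channel has been quiet long enough that a PING should be sent. */
    public synchronized boolean pingDue() {
        return clock.getAsLong() - lastSent >= pingIntervalMillis;
    }

    /** True if the peer has been silent too long; the caller should reset the socket. */
    public synchronized boolean isDead() {
        return clock.getAsLong() - lastHeard > deadAfterMillis;
    }
}
```

On backward compatibility: a peer running an older version would not understand a new PING frame, so such a scheme would presumably have to be negotiated (for example, only enabled once both sides advertise support) rather than sent unconditionally.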
>
>I have also posted the above text to Jira. Please point out any wrong
>assumptions I have made, and share any comments and suggestions.
>
>Thanks
>Powell.
>
>
>> From Raúl Gutiérrez Segalés <[email protected]>
>> Subject Re: quorum connection manager shutdown takes long time
>> Date Thu, 10 Jul 2014 18:02:37 GMT
>> On 9 July 2014 08:28, Michi Mutsuzaki <[email protected]> wrote:
>
>>> I don't know how I missed that :) QA said this is reproducible, so
>>> I'll try commenting this line out. Thanks Flavio!
>>>
>
>> I am curious, was it that?
>> -rgs
>