Re: question about curator - retry policy

Jordan Zimmerman Fri, 20 May 2016 15:24:44 -0700

The retry policy is only used for individual operations. Any client-server system 
needs retries to ride out temporary network events. The curator-client and 
curator-framework modules as a whole are written to handle ZooKeeper client 
connection maintenance, so there isn’t one thing I can point to.
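A minimal Java sketch of the per-operation point follows. It is not Curator code. `SimpleRetryPolicy`, `SimpleRetryNTimes`, `SimpleRetryForever` and `OperationRunner` are hypothetical, simplified stand-ins: Curator's real `RetryPolicy.allowRetry` also receives the elapsed time and a sleeper, and its policies sleep between attempts. The sketch shows that the policy is consulted only between failed attempts of one call. "Max retries 1" therefore means at most two attempts of that call, and reconnecting the underlying ZooKeeper connection happens somewhere else. The same applies to a retry-forever policy (recent Curator versions include a `RetryForever` class). It keeps one operation retrying, but it does not drive reconnection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical, simplified stand-in for Curator's RetryPolicy.
// The real interface also receives elapsed time and a RetrySleeper.
interface SimpleRetryPolicy {
    // retryCount is 0 for the first retry after the initial attempt fails.
    boolean allowRetry(int retryCount);
}

// Allows at most n retries, i.e. at most n + 1 attempts of one operation.
final class SimpleRetryNTimes implements SimpleRetryPolicy {
    private final int n;

    SimpleRetryNTimes(int n) {
        this.n = n;
    }

    @Override
    public boolean allowRetry(int retryCount) {
        return retryCount < n;
    }
}

// Always allows another retry of the same operation.
final class SimpleRetryForever implements SimpleRetryPolicy {
    @Override
    public boolean allowRetry(int retryCount) {
        return true;
    }
}

final class OperationRunner {
    // Runs ONE operation. The policy is consulted only after a failed attempt
    // of this operation; nothing here reconnects the ZooKeeper session.
    static <T> T callWithRetry(SimpleRetryPolicy policy, Callable<T> op) throws Exception {
        int retryCount = 0;
        while (true) {
            try {
                return op.call();
            } catch (IOException e) { // stand-in for a retry-able error such as connection loss
                if (!policy.allowRetry(retryCount++)) {
                    throw e; // policy says stop: this operation fails
                }
            }
        }
    }
}
```

With `new SimpleRetryNTimes(1)`, an operation that always fails is attempted twice, and then its `IOException` propagates to the caller.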


Internally, the ServiceDiscovery code uses a PathChildrenCache instance. If all 
you are using is Service Discovery, there is almost no need for you to monitor 
the connection state. What are you trying to accomplish?

-Jordan

> On May 20, 2016, at 5:19 PM, Moshiko Kasirer <[email protected]> wrote:
> 
> The thing is, we have many negative tests in which we stop and start the ZK 
> quorum, and the issue I raised only happens from time to time, so it's hard 
> to reproduce. But you just wrote that when the quorum is up the connection 
> should be re-established... How? Who does that: ZkClient or Curator? Is that 
> not related to the retry policy?
> 
> On 21 May 2016 at 01:12, "Jordan Zimmerman" <[email protected]> wrote:
> If the ZK cluster’s quorum is restored, then the connection state should 
> change to RECONNECTED. There are copious tests in Curator itself that show 
> this. If you’re seeing that Curator does not restore a broken connection then 
> there is a deeper bug. Can you create a test that shows the problem?
> 
> -Jordan
> 
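For applications that do watch connection state themselves, a minimal sketch of the bookkeeping is below. `ConnState` is a hypothetical mirror of the CONNECTED, SUSPENDED, LOST and RECONNECTED values of Curator's `ConnectionState` enum. The real enum also has READ_ONLY, and the real callback is `ConnectionStateListener.stateChanged(CuratorFramework, ConnectionState)`. `RegistrationListener` is hypothetical. The design choice it shows is treating LOST as "not currently registered" rather than as a terminal state, so a later RECONNECTED triggers re-registration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of four values of Curator's ConnectionState enum
// (org.apache.curator.framework.state.ConnectionState also has READ_ONLY).
enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

// Hypothetical listener: (re)announces the service whenever a connection is
// (re)established and stops trusting the registration while it is down.
final class RegistrationListener {
    private boolean registered = false;
    private final List<String> events = new ArrayList<>();

    void stateChanged(ConnState newState) {
        switch (newState) {
            case CONNECTED:
            case RECONNECTED:
                // A real app would (re)register its service instance here.
                registered = true;
                events.add("register");
                break;
            case SUSPENDED:
            case LOST:
                // LOST is not treated as terminal: keep listening for RECONNECTED.
                registered = false;
                events.add("unregistered");
                break;
        }
    }

    boolean isRegistered() {
        return registered;
    }

    List<String> events() {
        return events;
    }
}
```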
>> On May 20, 2016, at 5:07 PM, Moshiko Kasirer <[email protected]> wrote:
>> 
>> I mean that while the ZK cluster is up, the Curator connection state stays 
>> LOST, which in our case means the app node where it happens doesn't register 
>> itself as available... I just don't understand when Curator gives up trying 
>> to connect to ZK and when it doesn't give up. 
>> Thanks for the help!
>> 
>> On 21 May 2016 at 00:58, "Jordan Zimmerman" <[email protected]> wrote:
>> You must have a retry policy so that you don’t overwhelm your network and 
>> ZooKeeper cluster. The example code shows how to create a reasonable one.
>>> sometimes although zk cluster is up the curator service discovery 
>>> connection isn't
>>> 
>> Service Discovery’s internal instances might be waiting based on the retry 
>> policy. But what do you mean by "curator service discovery connection 
>> isn’t"? There isn’t such a thing as a service discovery connection. 
>> 
>> -Jordan
>> 
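Curator's getting-started examples build such a policy with something like `new ExponentialBackoffRetry(1000, 3)`: a 1000 ms base sleep and at most 3 retries. The sketch below is a hypothetical `Backoff` helper modeled on the jittered exponential backoff that, to our understanding, `ExponentialBackoffRetry` uses. Each retry sleeps the base time multiplied by a random factor whose upper bound doubles per retry, capped at a maximum. It only computes sleep times; it does not call Curator.

```java
import java.util.Random;

// Hypothetical helper modeled on the jittered exponential backoff behind
// Curator's ExponentialBackoffRetry; it is not the library class.
final class Backoff {
    // Keeps 1 << (retryCount + 1) within int range.
    private static final int MAX_RETRY_COUNT_FOR_SHIFT = 29;

    private final long baseSleepMs;
    private final long maxSleepMs;
    private final Random random;

    Backoff(long baseSleepMs, long maxSleepMs, Random random) {
        this.baseSleepMs = baseSleepMs;
        this.maxSleepMs = maxSleepMs;
        this.random = random;
    }

    // retryCount is 0 for the first retry. The random multiplier is drawn from
    // [1, 2^(retryCount + 1) - 1], so its ceiling doubles on each retry and
    // clients that lost the same server do not retry in lockstep.
    long sleepMs(int retryCount) {
        int shift = Math.min(retryCount, MAX_RETRY_COUNT_FOR_SHIFT) + 1;
        long multiplier = Math.max(1, random.nextInt(1 << shift));
        return Math.min(baseSleepMs * multiplier, maxSleepMs);
    }
}
```

The jitter is the part that protects the ensemble: many clients that lost the same server spread their reconnect attempts out instead of arriving together.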
>>> On May 20, 2016, at 4:53 PM, Moshiko Kasirer <[email protected]> wrote:
>>> 
>>> We are using your service discovery. So you are saying I should not care 
>>> about the retry policy...? Then the only thing left to explain is how come, 
>>> sometimes, although the ZK cluster is up, the Curator service discovery 
>>> connection isn't...
>>> 
>>> On 21 May 2016 at 00:43, "Jordan Zimmerman" <[email protected]> wrote:
>>> If you are using Curator’s Service Discovery code, it will be continuously 
>>> re-trying the connections. This is not because of the retry policy; it’s 
>>> because the Service Discovery code manages connection interruptions 
>>> internally.
>>> 
>>> -Jordan
>>> 
>>>> On May 20, 2016, at 4:40 PM, Moshiko Kasirer <[email protected]> wrote:
>>>> 
>>>> Thanks for the reply, I will send those logs ASAP.
>>>> It's difficult to understand the connection mechanism of ZK...
>>>> We are using Curator 2.10 for our service discovery, so we have to make 
>>>> sure that when ZK is alive we connect and announce that our server is up. 
>>>> We do that by listening to Curator's connection listener, which I think 
>>>> also has to do with the retry policy... But what I can't understand is why 
>>>> we sometimes see in the log that Curator gave up (LOST), yet a second 
>>>> later the Curator connection is restored. How? Is it because the ZK 
>>>> session heartbeat restored the connection? Does that cause Curator to 
>>>> change its connection state? On the other hand, we sometimes get to a 
>>>> point where ZK is up but the Curator connection stays LOST...
>>>> That is why I thought of using the new always-retry policy you added. Do 
>>>> you think it can help? That way I hope there will be no case where ZK is 
>>>> up but the Curator status is LOST, since once it retries it will reconnect 
>>>> to ZK... Is that correct?
>>>> 
>>>> On 21 May 2016 at 00:10, "Jordan Zimmerman" <[email protected]> wrote:
>>>> Curator’s retry policies are used within each CuratorFramework operation. 
>>>> For example, when you call client.setData().forPath(p, b) the retry policy 
>>>> will be invoked if there is a retry-able exception during the operation. 
>>>> In addition to the retryPolicy, there are connection timeouts. The 
>>>> behavior of how this is handled changed between Curator 2.x and Curator 
>>>> 3.x. In Curator 2.x, for every iteration of the retry, the operation will 
>>>> wait until connection timeout when there’s no connection. In Curator 3.x, 
>>>> the connection timeout wait only occurs once (if the default 
>>>> ConnectionHandlingPolicy is used).
>>>> 
>>>> In any event, ZooKeeper itself tries to maintain the connection. Also, 
>>>> Curator will re-create the internally managed connection depending on 
>>>> various network interruptions, etc. I’d need to see the logs to give you 
>>>> more input. 
>>>> 
>>>> -Jordan
>>>> 
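A rough worked example of the 2.x vs 3.x difference described above follows, under stated assumptions: the server never becomes reachable, each connection-timeout wait runs to completion, and retry sleeps and request latency are ignored. `CuratorLine` and `TimeoutModel` are hypothetical names for this back-of-the-envelope model, not Curator API.

```java
// Hypothetical back-of-the-envelope model of the behavior described above.
// Assumes the server never comes back, each connection-timeout wait runs to
// completion, and ignores retry sleeps and request latency.
enum CuratorLine { V2, V3 }

final class TimeoutModel {
    static long worstCaseConnectionWaitMs(CuratorLine line, int maxRetries, long connectionTimeoutMs) {
        int attempts = maxRetries + 1; // the first attempt plus each retry
        switch (line) {
            case V2:
                // 2.x: every iteration of the retry loop waits for the connection timeout.
                return attempts * connectionTimeoutMs;
            case V3:
                // 3.x with the default ConnectionHandlingPolicy: the wait happens once.
                return connectionTimeoutMs;
            default:
                throw new IllegalArgumentException("unknown line: " + line);
        }
    }
}
```

With a 15 s connection timeout and 3 retries, the model gives 4 × 15 s = 60 s of waiting in 2.x versus a single 15 s wait in 3.x.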
>>>>> On May 19, 2016, at 10:12 AM, Moshiko Kasirer <[email protected]> wrote:
>>>>> 
>>>>> First, I would like to thank you for Curator. We are using it as part of 
>>>>> our service discovery solution and it helps a lot!
>>>>> 
>>>>> I have a question I hope you will be able to help me with.
>>>>> 
>>>>> It's regarding the Curator retry policy. It seems we don't really 
>>>>> understand when this policy is invoked: although I configured it with max 
>>>>> retries 1, I see many ZK reconnection attempts in our logs (and many 
>>>>> "Curator gave up" messages, but later I see a reconnected status...). Is 
>>>>> it possible that the policy is only relevant to manually invoked 
>>>>> operations against the ZK cluster done via Curator? And that the 
>>>>> reconnections I see in the logs happen because ZK was available during 
>>>>> startup, so sessions were created, and then, when ZK was down, the ZK 
>>>>> clients (not Curator) kept sending heartbeats as part of the ZK 
>>>>> architecture? That is the part I am failing to understand, and I hope you 
>>>>> can help me with it.
>>>>> 
>>>>> You recently added a "retry always" policy, and I wanted to know whether 
>>>>> it is safe to use.
>>>>> 
>>>>> The thing is, we always want to retry reconnecting to ZK when it is 
>>>>> available, but that is something the ZK client does as long as it has 
>>>>> open sessions, right? I am not sure it has to do with the retry policy...
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Moshiko
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Moshiko Kasirer
>>>>> Software Engineer
>>>>> T: +972-74-700-4357
>>>>> 
>>>>> This message may contain confidential and/or privileged information. 
>>>>> If you are not the addressee or authorized to receive this on behalf of 
>>>>> the addressee you must not use, copy, disclose or take action based on 
>>>>> this message or any information herein. 
>>>>> If you have received this message in error, please advise the sender 
>>>>> immediately by reply email and delete this message. Thank you.
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
