You don’t need to maintain your own cache. Service Discovery already handles that.

-Jordan
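For reference, a minimal sketch of what relying on the built-in caching could look like instead of a hand-rolled cache file: the app node registers itself and the web tier reads from a ServiceCache (which is backed by a PathChildrenCache internally, as noted below). The connection string, base path, service name, address, and port are hypothetical placeholders:

    import java.util.List;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.curator.x.discovery.ServiceCache;
    import org.apache.curator.x.discovery.ServiceDiscovery;
    import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
    import org.apache.curator.x.discovery.ServiceInstance;

    public class DiscoverySketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string and base path
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
                    .client(client)
                    .basePath("/services")
                    .build();
            discovery.start();

            // App-node side: register this instance; ServiceDiscovery manages
            // connection interruptions and re-registration internally
            ServiceInstance<Void> instance = ServiceInstance.<Void>builder()
                    .name("app-node")
                    .address("10.0.0.5")   // hypothetical
                    .port(8080)            // hypothetical
                    .build();
            discovery.registerService(instance);

            // Web-tier side: ServiceCache keeps an in-memory view of the
            // registered app nodes, updated as instances come and go
            ServiceCache<Void> cache = discovery.serviceCacheBuilder().name("app-node").build();
            cache.start();
            List<ServiceInstance<Void>> appNodes = cache.getInstances();
            System.out.println("Known app nodes: " + appNodes);
        }
    }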
> On May 20, 2016, at 5:36 PM, Moshiko Kasirer <[email protected]> wrote:
>
> We are using nginx as our web tier, which delegates requests to one of the
> registered app nodes using consistent hashing. Since we have many web and app
> nodes, we have to make sure all available app nodes are known to the web tier
> and that at any given time they all see the same picture of the app nodes. So
> we built an app on top of your service discovery: when an app node comes up it
> registers itself, and the web tier listens to that cluster and updates its
> view of the available app nodes. In addition, we handle situations where there
> is no connection to ZK by using a cache file with the latest available view
> until the connection is restored. For some reason, sometimes, although ZK is
> up and running, the Curator connection listener we rely on to know whether we
> should re-register isn't invoked, meaning the state stays LOST...
>
> On May 21, 2016 at 01:23, "Jordan Zimmerman" <[email protected]> wrote:
> Retry policy is only used for individual operations. Any client-server system
> needs to have retries to avoid temporary network events. The entire
> curator-client and curator-framework modules are written to handle ZooKeeper
> client connection maintenance. So, there isn’t one thing I can point to.
>
> Internally, the ServiceDiscovery code uses a PathChildrenCache instance. If
> all you are using is Service Discovery there is almost no need for you to
> monitor the connection state. What are you trying to accomplish?
>
> -Jordan
>
>> On May 20, 2016, at 5:19 PM, Moshiko Kasirer <[email protected]> wrote:
>>
>> The thing is, we have many negative tests in which we stop and start the ZK
>> quorum, and the issue I raised only happens from time to time, so it's hard
>> to reproduce. But you just wrote that when the quorum is up the connection
>> should be reconnected... How? Who does that, the ZK client or Curator? Is
>> that not related to the retry policy?
>>
>> On May 21, 2016 at 01:12, "Jordan Zimmerman" <[email protected]> wrote:
>> If the ZK cluster’s quorum is restored, then the connection state should
>> change to RECONNECTED. There are copious tests in Curator itself that show
>> this. If you’re seeing that Curator does not restore a broken connection
>> then there is a deeper bug. Can you create a test that shows the problem?
>>
>> -Jordan
>>
>>> On May 20, 2016, at 5:07 PM, Moshiko Kasirer <[email protected]> wrote:
>>>
>>> I mean that while the ZK cluster is up, the Curator connection state stays
>>> LOST, which in our case means the app node on which it happens doesn't
>>> register itself as available... I just don't seem to understand when
>>> Curator gives up on trying to connect to ZK and when it doesn't give up.
>>> Thanks for the help!
>>>
>>> On May 21, 2016 at 00:58, "Jordan Zimmerman" <[email protected]> wrote:
>>> You must have a retry policy so that you don’t overwhelm your network and
>>> ZooKeeper cluster. The example code shows how to create a reasonable one.
>>>> sometimes although zk cluster is up the curator service discovery
>>>> connection isn't
>>> Service Discovery’s internal instances might be waiting based on the retry
>>> policy. But, what do you mean by "curator service discovery connection
>>> isn’t”? There isn’t such a thing as a service discovery connection.
>>>
>>> -Jordan
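The RECONNECTED transition described above can be observed with a plain ConnectionStateListener on the CuratorFramework. A minimal sketch, assuming a hypothetical connection string and leaving the application's actual re-registration logic as a comment:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ReconnectListenerSketch {
        public static void main(String[] args) {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",          // hypothetical connection string
                    new ExponentialBackoffRetry(1000, 3));

            // ConnectionStateListener has a single method, so a lambda works (Java 8+)
            client.getConnectionStateListenable().addListener((c, newState) -> {
                if (newState == ConnectionState.LOST) {
                    // Curator considers the connection lost
                    System.out.println("Connection LOST");
                } else if (newState == ConnectionState.RECONNECTED) {
                    // Connection re-established after SUSPENDED/LOST; re-register here
                    // only if you manage registration yourself (ServiceDiscovery
                    // already handles this internally)
                    System.out.println("RECONNECTED");
                }
            });

            client.start();
        }
    }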
>>>
>>>> On May 20, 2016, at 4:53 PM, Moshiko Kasirer <[email protected]> wrote:
>>>>
>>>> We are using your service discovery. So you are saying I should not care
>>>> about the retry policy...? So the only thing left to explain is how come,
>>>> sometimes, although the ZK cluster is up, the Curator service discovery
>>>> connection isn't...
>>>>
>>>> On May 21, 2016 at 00:43, "Jordan Zimmerman" <[email protected]> wrote:
>>>> If you are using Curator’s Service Discovery code, it will be continuously
>>>> re-trying the connections. This is not because of the retry policy; it’s
>>>> because the Service Discovery code manages connection interruptions
>>>> internally.
>>>>
>>>> -Jordan
>>>>
>>>>> On May 20, 2016, at 4:40 PM, Moshiko Kasirer <[email protected]> wrote:
>>>>>
>>>>> Thanks for the reply, I will send those logs ASAP.
>>>>> It's difficult to understand the connection mechanism of ZK...
>>>>> We are using Curator 2.10 for our service discovery, so we have to make
>>>>> sure that when ZK is alive we connect and announce that our server is up.
>>>>> We do that by listening to the Curator connection listener, which I think
>>>>> also has to do with the retry policy... But what I can't understand is
>>>>> why we sometimes see in the log that Curator gave up (LOST), yet a second
>>>>> later the Curator connection is restored. How? Is it because the ZK
>>>>> session heartbeat restored the connection? Does that cause Curator to
>>>>> change its connection state? And on the other hand, we sometimes get to a
>>>>> point where ZK is up but the Curator connection stays LOST...
>>>>> That is why I thought of using the new always-retry policy you added. Do
>>>>> you think it can help? That way I hope there will be no way that ZK is up
>>>>> while the Curator status is LOST, since once it retries it will reconnect
>>>>> to ZK... Is that correct?
>>>>>
>>>>> On May 21, 2016 at 00:10, "Jordan Zimmerman" <[email protected]> wrote:
>>>>> Curator’s retry policies are used within each CuratorFramework operation.
>>>>> For example, when you call client.setData().forPath(p, b) the retry
>>>>> policy will be invoked if there is a retry-able exception during the
>>>>> operation. In addition to the retryPolicy, there are connection timeouts.
>>>>> The behavior of how this is handled changed between Curator 2.x and
>>>>> Curator 3.x. In Curator 2.x, for every iteration of the retry, the
>>>>> operation will wait until connection timeout when there’s no connection.
>>>>> In Curator 3.x, the connection timeout wait only occurs once (if the
>>>>> default ConnectionHandlingPolicy is used).
>>>>>
>>>>> In any event, ZooKeeper itself tries to maintain the connection. Also,
>>>>> Curator will re-create the internally managed connection depending on
>>>>> various network interruptions, etc. I’d need to see the logs to give you
>>>>> more input.
>>>>>
>>>>> -Jordan
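To illustrate the point above about retry policies applying per operation: a minimal sketch with a max-retry-1 policy, where the policy only governs whether the individual create/setData calls are retried after a retry-able error; re-establishing the ZooKeeper session itself is handled by the ZK client and Curator, not by this policy. The connection string and paths are hypothetical:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.RetryNTimes;

    public class RetryPolicySketch {
        public static void main(String[] args) throws Exception {
            // The retry policy passed here governs individual operations, not the
            // lifetime of the ZooKeeper connection itself.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",   // hypothetical connection string
                    new RetryNTimes(1, 1000));       // "max retry 1", as in the question
            client.start();
            client.blockUntilConnected();

            // If a retry-able error (e.g. a connection loss) occurs during these
            // calls, the policy above decides whether the *operation* is retried.
            client.create().creatingParentsIfNeeded()
                    .forPath("/example/path", "payload".getBytes());
            client.setData().forPath("/example/path", "updated".getBytes());

            client.close();
        }
    }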
>>>>>
>>>>>> On May 19, 2016, at 10:12 AM, Moshiko Kasirer <[email protected]> wrote:
>>>>>>
>>>>>> First, I would like to thank you for Curator. We are using it as part of
>>>>>> our service discovery solution and it helps a lot!!
>>>>>>
>>>>>> I have a question I hope you will be able to help me with.
>>>>>>
>>>>>> It's regarding the Curator retry policy. It seems we don't really
>>>>>> understand when this policy is invoked: although I configured it with
>>>>>> max retry 1, in the logs I see many ZK reconnection attempts (and many
>>>>>> "Curator gave up" messages, but later I see a reconnected status...). Is
>>>>>> it possible that the policy is only relevant to manually invoked
>>>>>> operations against the ZK cluster done via Curator? And that the
>>>>>> reconnections I see in the logs are caused by the fact that ZK was
>>>>>> available during startup, so sessions were created, and then when ZK
>>>>>> went down the ZK clients (not Curator) kept sending heartbeats as part
>>>>>> of the ZK architecture? That is the part I am failing to understand and
>>>>>> I hope you can help me with it.
>>>>>>
>>>>>> You have recently added a RetryAlways policy and I wanted to know if it
>>>>>> is safe to use. The thing is, we always want to retry connecting to ZK
>>>>>> when it is available, but that is something the ZK client does as long
>>>>>> as it has open sessions, right? I am not sure it has to do with the
>>>>>> retry policy...
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Moshiko
>>>>>>
>>>>>> --
>>>>>> Moshiko Kasirer
>>>>>> Software Engineer
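If the "always retry" policy asked about above is Curator's RetryForever (an assumption; the class name in the original message is garbled), configuring it would look roughly like the sketch below. Note that, per the explanation earlier in the thread, it still only applies to individual operations:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.RetryForever;

    public class RetryForeverSketch {
        public static void main(String[] args) {
            // RetryForever retries each failed operation indefinitely, sleeping the
            // given interval between attempts. Session reconnection is still handled
            // by the ZooKeeper client and Curator, not by this policy.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",   // hypothetical connection string
                    new RetryForever(1000));
            client.start();
        }
    }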
