You don’t need to maintain your own cache. Service Discovery already handles that.

-Jordan
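For reference, a minimal sketch of what relying on the built-in caching could look like instead of a hand-rolled cache file: the app node registers itself and the web tier reads from a ServiceCache (which is backed by a PathChildrenCache internally, as noted below). The connection string, base path, service name, address, and port are hypothetical placeholders:

    import java.util.List;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.curator.x.discovery.ServiceCache;
    import org.apache.curator.x.discovery.ServiceDiscovery;
    import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
    import org.apache.curator.x.discovery.ServiceInstance;

    public class DiscoverySketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string and base path
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
                    .client(client)
                    .basePath("/services")
                    .build();
            discovery.start();

            // App-node side: register this instance; ServiceDiscovery manages
            // connection interruptions and re-registration internally
            ServiceInstance<Void> instance = ServiceInstance.<Void>builder()
                    .name("app-node")
                    .address("10.0.0.5")   // hypothetical
                    .port(8080)            // hypothetical
                    .build();
            discovery.registerService(instance);

            // Web-tier side: ServiceCache keeps an in-memory view of the
            // registered app nodes, updated as instances come and go
            ServiceCache<Void> cache = discovery.serviceCacheBuilder().name("app-node").build();
            cache.start();
            List<ServiceInstance<Void>> appNodes = cache.getInstances();
            System.out.println("Known app nodes: " + appNodes);
        }
    }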
> On May 20, 2016, at 5:36 PM, Moshiko Kasirer <[email protected]> wrote:
>
> We are using nginx as our web tier, which delegates requests to one of the
> registered app nodes using consistent hashing. Since we have many web and app
> nodes, we have to make sure all available app nodes are known to the web tier
> and that at any given time they all see the same picture of the app nodes. So
> we built an app on top of your service discovery: when an app node comes up it
> registers itself, and the web tier listens to that cluster and updates its
> view of the available app nodes. In addition, we handle situations where there
> is no connection to ZK by using a cache file with the latest available view
> until the connection is restored. For some reason, sometimes, although ZK is
> up and running, the Curator connection listener we rely on to know whether we
> should re-register isn't invoked, meaning the state stays LOST...
>
> On May 21, 2016 at 01:23, "Jordan Zimmerman" <[email protected]> wrote:
> Retry policy is only used for individual operations. Any client-server system
> needs to have retries to avoid temporary network events. The entire
> curator-client and curator-framework modules are written to handle ZooKeeper
> client connection maintenance. So, there isn’t one thing I can point to.
>
> Internally, the ServiceDiscovery code uses a PathChildrenCache instance. If
> all you are using is Service Discovery there is almost no need for you to
> monitor the connection state. What are you trying to accomplish?
>
> -Jordan
>
>> On May 20, 2016, at 5:19 PM, Moshiko Kasirer <[email protected]> wrote:
>>
>> The thing is, we have many negative tests in which we stop and start the ZK
>> quorum, and the issue I raised only happens from time to time, so it's hard
>> to reproduce. But you just wrote that when the quorum is up the connection
>> should be reconnected... How? Who does that, the ZK client or Curator? Is
>> that not related to the retry policy?
>>
>> On May 21, 2016 at 01:12, "Jordan Zimmerman" <[email protected]> wrote:
>> If the ZK cluster’s quorum is restored, then the connection state should
>> change to RECONNECTED. There are copious tests in Curator itself that show
>> this. If you’re seeing that Curator does not restore a broken connection
>> then there is a deeper bug. Can you create a test that shows the problem?
>>
>> -Jordan
>>
>>> On May 20, 2016, at 5:07 PM, Moshiko Kasirer <[email protected]> wrote:
>>>
>>> I mean that while the ZK cluster is up, the Curator connection state stays
>>> LOST, which in our case means the app node on which it happens doesn't
>>> register itself as available... I just don't seem to understand when
>>> Curator gives up on trying to connect to ZK and when it doesn't give up.
>>> Thanks for the help!
>>>
>>> On May 21, 2016 at 00:58, "Jordan Zimmerman" <[email protected]> wrote:
>>> You must have a retry policy so that you don’t overwhelm your network and
>>> ZooKeeper cluster. The example code shows how to create a reasonable one.
>>>> sometimes although zk cluster is up the curator service discovery
>>>> connection isn't
>>> Service Discovery’s internal instances might be waiting based on the retry
>>> policy. But, what do you mean by "curator service discovery connection
>>> isn’t”? There isn’t such a thing as a service discovery connection.
>>>
>>> -Jordan
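The RECONNECTED transition described above can be observed with a plain ConnectionStateListener on the CuratorFramework. A minimal sketch, assuming a hypothetical connection string and leaving the application's actual re-registration logic as a comment:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ReconnectListenerSketch {
        public static void main(String[] args) {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",          // hypothetical connection string
                    new ExponentialBackoffRetry(1000, 3));

            // ConnectionStateListener has a single method, so a lambda works (Java 8+)
            client.getConnectionStateListenable().addListener((c, newState) -> {
                if (newState == ConnectionState.LOST) {
                    // Curator considers the connection lost
                    System.out.println("Connection LOST");
                } else if (newState == ConnectionState.RECONNECTED) {
                    // Connection re-established after SUSPENDED/LOST; re-register here
                    // only if you manage registration yourself (ServiceDiscovery
                    // already handles this internally)
                    System.out.println("RECONNECTED");
                }
            });

            client.start();
        }
    }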
>>>
>>>> On May 20, 2016, at 4:53 PM, Moshiko Kasirer <[email protected]> wrote:
>>>>
>>>> We are using your service discovery. So you are saying I should not care
>>>> about the retry policy...? So the only thing left to explain is how come,
>>>> sometimes, although the ZK cluster is up, the Curator service discovery
>>>> connection isn't...
>>>>
>>>> On May 21, 2016 at 00:43, "Jordan Zimmerman" <[email protected]> wrote:
>>>> If you are using Curator’s Service Discovery code, it will be continuously
>>>> re-trying the connections. This is not because of the retry policy; it’s
>>>> because the Service Discovery code manages connection interruptions
>>>> internally.
>>>>
>>>> -Jordan
>>>>
>>>>> On May 20, 2016, at 4:40 PM, Moshiko Kasirer <[email protected]> wrote:
>>>>>
>>>>> Thanks for the reply, I will send those logs ASAP.
>>>>> It's difficult to understand the connection mechanism of ZK...
>>>>> We are using Curator 2.10 for our service discovery, so we have to make
>>>>> sure that when ZK is alive we connect and announce that our server is up.
>>>>> We do that by listening to the Curator connection listener, which I think
>>>>> also has to do with the retry policy... But what I can't understand is
>>>>> why we sometimes see in the log that Curator gave up (LOST), yet a second
>>>>> later the Curator connection is restored. How? Is it because the ZK
>>>>> session heartbeat restored the connection? Does that cause Curator to
>>>>> change its connection state? And on the other hand, we sometimes get to a
>>>>> point where ZK is up but the Curator connection stays LOST...
>>>>> That is why I thought of using the new always-retry policy you added. Do
>>>>> you think it can help? That way I hope there will be no way that ZK is up
>>>>> while the Curator status is LOST, since once it retries it will reconnect
>>>>> to ZK... Is that correct?
>>>>>
>>>>> On May 21, 2016 at 00:10, "Jordan Zimmerman" <[email protected]> wrote:
>>>>> Curator’s retry policies are used within each CuratorFramework operation.
>>>>> For example, when you call client.setData().forPath(p, b) the retry
>>>>> policy will be invoked if there is a retry-able exception during the
>>>>> operation. In addition to the retryPolicy, there are connection timeouts.
>>>>> The behavior of how this is handled changed between Curator 2.x and
>>>>> Curator 3.x. In Curator 2.x, for every iteration of the retry, the
>>>>> operation will wait until connection timeout when there’s no connection.
>>>>> In Curator 3.x, the connection timeout wait only occurs once (if the
>>>>> default ConnectionHandlingPolicy is used).
>>>>>
>>>>> In any event, ZooKeeper itself tries to maintain the connection. Also,
>>>>> Curator will re-create the internally managed connection depending on
>>>>> various network interruptions, etc. I’d need to see the logs to give you
>>>>> more input.
>>>>>
>>>>> -Jordan
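To illustrate the point above about retry policies applying per operation: a minimal sketch with a max-retry-1 policy, where the policy only governs whether the individual create/setData calls are retried after a retry-able error; re-establishing the ZooKeeper session itself is handled by the ZK client and Curator, not by this policy. The connection string and paths are hypothetical:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.RetryNTimes;

    public class RetryPolicySketch {
        public static void main(String[] args) throws Exception {
            // The retry policy passed here governs individual operations, not the
            // lifetime of the ZooKeeper connection itself.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",   // hypothetical connection string
                    new RetryNTimes(1, 1000));       // "max retry 1", as in the question
            client.start();
            client.blockUntilConnected();

            // If a retry-able error (e.g. a connection loss) occurs during these
            // calls, the policy above decides whether the *operation* is retried.
            client.create().creatingParentsIfNeeded()
                    .forPath("/example/path", "payload".getBytes());
            client.setData().forPath("/example/path", "updated".getBytes());

            client.close();
        }
    }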
>>>>>
>>>>>> On May 19, 2016, at 10:12 AM, Moshiko Kasirer <[email protected]> wrote:
>>>>>>
>>>>>> First, I would like to thank you for Curator. We are using it as part of
>>>>>> our service discovery solution and it helps a lot!!
>>>>>>
>>>>>> I have a question I hope you will be able to help me with.
>>>>>>
>>>>>> It's regarding the Curator retry policy. It seems we don't really
>>>>>> understand when this policy is invoked: although I configured it with
>>>>>> max retry 1, in the logs I see many ZK reconnection attempts (and many
>>>>>> "Curator gave up" messages, but later I see a reconnected status...). Is
>>>>>> it possible that the policy is only relevant to manually invoked
>>>>>> operations against the ZK cluster done via Curator? And that the
>>>>>> reconnections I see in the logs are caused by the fact that ZK was
>>>>>> available during startup, so sessions were created, and then when ZK
>>>>>> went down the ZK clients (not Curator) kept sending heartbeats as part
>>>>>> of the ZK architecture? That is the part I am failing to understand and
>>>>>> I hope you can help me with it.
>>>>>>
>>>>>> You have recently added a RetryAlways policy and I wanted to know if it
>>>>>> is safe to use. The thing is, we always want to retry connecting to ZK
>>>>>> when it is available, but that is something the ZK client does as long
>>>>>> as it has open sessions, right? I am not sure it has to do with the
>>>>>> retry policy...
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Moshiko
>>>>>>
>>>>>> --
>>>>>> Moshiko Kasirer
>>>>>> Software Engineer
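If the "always retry" policy asked about above is Curator's RetryForever (an assumption; the class name in the original message is garbled), configuring it would look roughly like the sketch below. Note that, per the explanation earlier in the thread, it still only applies to individual operations:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.RetryForever;

    public class RetryForeverSketch {
        public static void main(String[] args) {
            // RetryForever retries each failed operation indefinitely, sleeping the
            // given interval between attempts. Session reconnection is still handled
            // by the ZooKeeper client and Curator, not by this policy.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",   // hypothetical connection string
                    new RetryForever(1000));
            client.start();
        }
    }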
