perhaps it would fit into the common client that stefan is proposing. we
don't have such a timer currently in the client code that we just need
to expose, so it will be something we need to add. one thing to be
careful of is trying to be too tricky. you don't want to trigger right
after the session timeout because things can be in flight: a session
renewal response might actually be on the way, or the service may have
bounced due to a leader failure. that is why i was recommending something
like twice the session timeout.
to be honest i think most of our applications just sit there trying to
reconnect forever. after all if you do close the session and try to move
on, the ZooKeeper service is still down, so trying with a new ZooKeeper
handle isn't going to help anything.
Jean-Daniel Cryans wrote:
Thank you, I now see the rationale in not telling the client its session is
over because you can't be sure it actually is. But would it make sense to
add a new state in KeeperState representing that corner case? Something like
AfterSessionTimeout. I'm pretty sure others would find that useful for the
same reason as us.
If anyone +1s that, I'll open a jira and give it a try.
On Tue, Jun 23, 2009 at 6:04 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
ZooKeeper only tells you about states that it is sure about, so you will
not get the Expired event until you reconnect to ZooKeeper. if you never
connect again to ZooKeeper, you will not get the Expired event. if you want
to time out using some sanity value, 2 times the session timeout for example,
you can implement that yourself by setting a timer when you get the
disconnected event and then close the session explicitly when the timer goes
off.
there is a caveat in doing this: if your whole cluster goes down for 20
mins and then comes back up, your session timeout will get reset and the
session will still be alive even though you have closed it. it will then
have to time out before it actually goes away. closing the session when the
client is disconnected just stops the client from trying to reconnect.
does this make sense?
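
The timer described above could be sketched roughly as follows. This is only an illustration under stated assumptions: DisconnectWatchdog and its method names are hypothetical and not part of the ZooKeeper client. The application would call onDisconnected() from its Watcher when it sees KeeperState.Disconnected, call onReconnected() on KeeperState.SyncConnected, and pass a callback that closes the ZooKeeper handle. The factor of 2 is the sanity value suggested in this thread, not a library default.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: gives up after being disconnected for 2x the session timeout. */
class DisconnectWatchdog {
    // Daemon thread so a pending timer never keeps the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "disconnect-watchdog");
                t.setDaemon(true);
                return t;
            });
    private final long giveUpAfterMs;
    private final Runnable onGiveUp;
    private ScheduledFuture<?> pending; // guarded by "this"

    /**
     * @param sessionTimeoutMs the session timeout the application uses
     * @param onGiveUp         assumed to close the session, e.g. by calling close() on the handle
     */
    DisconnectWatchdog(long sessionTimeoutMs, Runnable onGiveUp) {
        this.giveUpAfterMs = 2 * sessionTimeoutMs; // sanity value from the thread
        this.onGiveUp = onGiveUp;
    }

    /** Call when the watcher reports the disconnected state. */
    synchronized void onDisconnected() {
        // Start the timer only if one is not already running.
        if (pending == null || pending.isDone()) {
            pending = scheduler.schedule(onGiveUp, giveUpAfterMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Call when the watcher reports that the client is connected again. */
    synchronized void onReconnected() {
        if (pending != null) {
            // Does not interrupt a callback that has already started running.
            pending.cancel(false);
            pending = null;
        }
    }

    /** Stops the timer thread; call when the application shuts down. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Note that the caveat above still applies: when the callback closes the handle while disconnected, it only stops the client from retrying, and the session on the servers goes away only once it times out there.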
Jean-Daniel Cryans wrote:
Working on integrating HBase with ZK, we came across an issue that we
are unable to resolve. I was trying to see how we handle network
partitions and session expirations, so I started a single ZK instance
with a very simple HBase setup, then killed the ZK server. The only
thing I got from ZooKeeper was a KeeperState.Disconnected, then...
nothing (for like 20+ minutes).
Normally if I had a quorum I would still get that message but then I
would get another one telling me it's connected to another ZK quorum
server. So how do I know if I'm really partitioned from the ZK quorum?
Shouldn't we get a session expired at some point? From what I
understand you can only get a KeeperState.Expired when you connect
back to the quorum after x time, but what if you can "never" connect
back to it?
BTW this is r785019.
Thx a lot!