What Jordan said, plus: time is used only in the relative sense, not the absolute. Session tracking (expiration) is relative to the start of leadership.
Patrick

On Mon, Dec 4, 2017 at 12:21 PM, Jordan Zimmerman <[email protected]> wrote:
> ZooKeeper, indeed, does not use wall clock time. It uses System.nanoTime()
> for most operations. Further, all operations go through the Leader node so
> only the Leader's notion of time matters. The Leader manages the session
> via a "SessionTracker" instance. The code is in SessionTrackerImpl.java.
> There is a sessionExpiryQueue which is a kind of priority queue that
> returns expired sessions based on System.nanoTime().
>
> -JZ
>
>
> On Dec 4, 2017, at 12:09 PM, Abraham Fine <[email protected]> wrote:
> >
> > Hello Anthony and Shawn-
> >
> > To the best of my knowledge ZooKeeper does not use the "wall clock" time
> > anywhere. So that should not be the problem.
> >
> > Please consider enabling debug logging, which should allow you to track
> > the "pings".
> >
> > Thanks,
> > Abe
> >
> > On Mon, Dec 4, 2017, at 11:51, Anthony Shaya wrote:
> >> Thanks Shawn, should I message the developer mailing list for a more
> >> definitive answer?
> >>
> >> Thanks again for the reply.
> >>
> >> -----Original Message-----
> >> From: Shawn Heisey [mailto:[email protected]]
> >> Sent: Monday, December 4, 2017 2:49 PM
> >> To: [email protected]
> >> Subject: Re: Zookeeper session expiration
> >>
> >> On 12/4/2017 8:22 AM, Anthony Shaya wrote:
> >>> My question is related to how session expiration works. I noticed on
> >>> many of the client machines the times across these machines were all
> >>> off (by anywhere from 1 minute to 20 minutes - which was resolved after
> >>> discovery - haven't verified this completely yet). Can this directly
> >>> affect session expiration within the zookeeper cluster?
> >>>
> >>> * I read the following in
> >>> https://wiki.apache.org/hadoop/ZooKeeper/FAQ , "Expirations happens
> >>> when the cluster does not hear from the client within the specified
> >>> session timeout period (i.e. no heartbeat).". So in some cases it seems
> >>> that if the times were wrong across the machines, it's possible one of
> >>> the clients could have effectively sent a heartbeat in the past (not
> >>> sure about this tbh) and then the cluster expires the session?
> >>
> >> I make these comments without any knowledge of what ZK code actually
> >> does. I am a member of this list because I'm a representative of the
> >> Apache Solr project, which uses the ZK client in order to maintain a
> >> cluster.
> >>
> >> IMHO, any software which makes actual decisions based on the timestamps
> >> in messages from another system is badly designed. I would hope that the
> >> ZK designers know this, and always make any decisions related to time
> >> using the clock in the local system only.
> >>
> >> If ZK's designers did the right thing, then a session timeout would
> >> indicate that quite literally no heartbeats were received in X seconds,
> >> as measured by the local clock, and the local clock ONLY ... NOT from
> >> timestamp information received from another system.
> >>
> >> Although such a lack of communication could be caused by any number of
> >> things, including network hardware failure, one of the most common
> >> reasons I have seen for problems like this is extreme Java garbage
> >> collection pauses in the client software.
> >>
> >> Situations where the heap is a little bit too small can cause a Java
> >> program to basically be doing garbage collection constantly, so it
> >> doesn't have much time to do anything else, like send heartbeats to ZK
> >> servers.
> >>
> >> Situations where the heap is HUGE and garbage collection is not well
> >> tuned can lead to pauses of a minute or longer while Java does a massive
> >> full GC.
> >>
> >>> * I don't have the zookeeper node log for the above time to see
> >>> what was going on in zookeeper when the cluster determined the session
> >>> expired.
> >>>
> >>> * Is there any additional logging I can turn on to troubleshoot zk
> >>> session expiration issues?
> >>
> >> Hopefully your ZK clients also have logging. Failing that, you could
> >> turn on GC logging for the software with the ZK client (assuming it's a
> >> Java client) and find a program or website that can examine the log and
> >> give you statistics or a graph of GC pauses.
> >>
> >> If there is a problem in software using the client and whatever logging
> >> is available doesn't help you figure out what's wrong, you're generally
> >> going to need to talk to whoever wrote that software for help
> >> troubleshooting it.
> >>
> >> Thanks,
> >> Shawn
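
For anyone reading this thread later, the point made in the replies above (expiry is measured as elapsed time on the server's own monotonic clock, never from timestamps supplied by clients) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not ZooKeeper's SessionTrackerImpl: the class name MonotonicSessionTracker, the methods touch and expireStale, and the injected clock are all hypothetical names made up for the example. The only real API relied on is System.nanoTime(), whose values are meaningful only as differences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Illustrative sketch only; hypothetical class, not ZooKeeper code. */
class MonotonicSessionTracker {
    private final LongSupplier nanoClock;   // monotonic clock, e.g. System::nanoTime
    private final long timeoutNanos;        // session timeout in nanoseconds
    // sessionId -> local clock reading when the last heartbeat arrived
    private final Map<Long, Long> lastHeard = new HashMap<>();

    MonotonicSessionTracker(LongSupplier nanoClock, long timeoutMillis) {
        this.nanoClock = nanoClock;
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    /** Record a heartbeat. Only the local clock is read; the client sends no timestamp. */
    void touch(long sessionId) {
        lastHeard.put(sessionId, nanoClock.getAsLong());
    }

    /** Remove and return the sessions that have been silent longer than the timeout. */
    List<Long> expireStale() {
        long now = nanoClock.getAsLong();
        List<Long> expired = new ArrayList<>();
        lastHeard.entrySet().removeIf(e -> {
            // Compare a difference of readings, never an absolute value.
            if (now - e.getValue() > timeoutNanos) {
                expired.add(e.getKey());
                return true;
            }
            return false;
        });
        return expired;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L};                      // fake clock so the demo needs no sleeping
        MonotonicSessionTracker tracker =
                new MonotonicSessionTracker(() -> fakeNow[0], 30_000);
        tracker.touch(1L);                          // session 1 heartbeats at t = 0 s
        fakeNow[0] += 20_000L * 1_000_000L;
        tracker.touch(2L);                          // session 2 heartbeats at t = 20 s
        fakeNow[0] += 15_000L * 1_000_000L;         // now t = 35 s
        System.out.println(tracker.expireStale());  // prints [1]
    }
}
```

In real use the clock argument would be System::nanoTime. Because the client's wall clock never enters the calculation, a client whose clock is 20 minutes off is treated the same as any other; what matters is only whether its heartbeats arrive within the timeout as measured locally, which is why long GC pauses on the client remain a plausible cause.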
