Hi Gelbana, max_latency tells you the time elapsed between creating the request and FinalRequestProcessor processing it. So the cause for being that high could be basically anything. Turning on debug log for ZooKeeper server could help pinpoint at what point the request get stuck for so long.
Regards, Norbert On Wed, Apr 3, 2019 at 3:07 PM Muhammad Gelbana <[email protected]> wrote: > Another couple of things I found: > > *A couple of Zookeeper client threads are stuck at these stacktraces for > ~30 seconds* > "pool-2-thread-1-EventThread" #1218 daemon prio=5 os_prio=0 > tid=0x00007ff3f5e23800 nid=0x5cd8 waiting on condition [0x00007ff3ef803000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000000018d6d8ed8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:504) > > "pool-2-thread-1-SendThread(72.55.136.25:2181)" #1217 daemon prio=5 > os_prio=0 tid=0x00007ff3f5e23000 nid=0x5cd7 runnable [0x00007ff3ef904000] > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x000000018d68a730> (a sun.nio.ch.Util$3) > - locked <0x000000018d68a720> (a java.util.Collections$UnmodifiableSet) > - locked <0x000000018d68a258> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145) > > *Running the mntr command returned the following stats* > zk_version 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on > 06/29/2018 04:05 GMT > zk_avg_latency 0 > zk_max_latency *17657* > zk_min_latency 0 > zk_packets_received 1427134 > zk_packets_sent 1596974 > zk_num_alive_connections 64 > zk_outstanding_requests 0 > zk_server_state follower > zk_znode_count 1394 > zk_watch_count 592 > zk_ephemerals_count 192 > zk_approximate_data_size 181257 > zk_open_file_descriptor_count 94 > zk_max_file_descriptor_count 1048576 > zk_fsync_threshold_exceed_count 1 > > I find the *zk_max_latency* extremely hight. I'm wondering what kind of > latency is that ? How can I debug the reason for this value ? > > Thanks, > Gelbana > > > > On Wed, Apr 3, 2019 at 1:42 PM Muhammad Gelbana <[email protected]> > wrote: > > > I'm trying to debug a problem where our client application suddenly loses > > its Zookeeper session. I concluded that by looking at the Zookeeper > server > > logs. > > > > I increased the logging details for the client and found the following > log > > messages > > > >> DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> > > > > I noticed that the duration between each log message is ~33 seconds while > > on another environment (my laptop), the duration goes down to ~1 second. > > What could be causing this huge difference ? I doubt that whatever is > > causing this effect causes the delay to increase significantly at some > > point to the extend that makes my client lose its session. > > >
