Re: Unexpected delay between pings sent from the client to server
Try turning on verbose GC logging and look if it's GC. In my experience that's often the cause. "client application suddenly loses its Zookeeper session" - might be client GC, or it could be server GC. See if either of those shed light. Patrick On Sat, Apr 6, 2019 at 1:10 PM Norbert Kalmar wrote: > Hi Gelbana, > > max_latency tells you the time elapsed between creating the request and > FinalRequestProcessor processing it. So the cause for being that high could > be basically anything. > Turning on debug log for ZooKeeper server could help pinpoint at what point > the request get stuck for so long. > > Regards, > Norbert > > On Wed, Apr 3, 2019 at 3:07 PM Muhammad Gelbana > wrote: > > > Another couple of things I found: > > > > *A couple of Zookeeper client threads are stuck at these stacktraces for > > ~30 seconds* > > "pool-2-thread-1-EventThread" #1218 daemon prio=5 os_prio=0 > > tid=0x7ff3f5e23800 nid=0x5cd8 waiting on condition > [0x7ff3ef803000] > >java.lang.Thread.State: WAITING (parking) > > at sun.misc.Unsafe.park(Native Method) > > - parking to wait for <0x00018d6d8ed8> (a > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > > at > > > > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > > at > > > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:504) > > > > "pool-2-thread-1-SendThread(72.55.136.25:2181)" #1217 daemon prio=5 > > os_prio=0 tid=0x7ff3f5e23000 nid=0x5cd7 runnable [0x7ff3ef904000] > >java.lang.Thread.State: RUNNABLE > > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > > - locked <0x00018d68a730> (a sun.nio.ch.Util$3) > > - locked <0x00018d68a720> (a > java.util.Collections$UnmodifiableSet) > > - locked <0x00018d68a258> (a sun.nio.ch.EPollSelectorImpl) > > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > > at > > > > > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349) > > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145) > > > > *Running the mntr command returned the following stats* > > zk_version3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on > > 06/29/2018 04:05 GMT > > zk_avg_latency0 > > zk_max_latency*17657* > > zk_min_latency0 > > zk_packets_received1427134 > > zk_packets_sent1596974 > > zk_num_alive_connections64 > > zk_outstanding_requests0 > > zk_server_statefollower > > zk_znode_count1394 > > zk_watch_count592 > > zk_ephemerals_count192 > > zk_approximate_data_size181257 > > zk_open_file_descriptor_count94 > > zk_max_file_descriptor_count1048576 > > zk_fsync_threshold_exceed_count1 > > > > I find the *zk_max_latency* extremely hight. I'm wondering what kind of > > latency is that ? How can I debug the reason for this value ? > > > > Thanks, > > Gelbana > > > > > > > > On Wed, Apr 3, 2019 at 1:42 PM Muhammad Gelbana > > wrote: > > > > > I'm trying to debug a problem where our client application suddenly > loses > > > its Zookeeper session. I concluded that by looking at the Zookeeper > > server > > > logs. > > > > > > I increased the logging details for the client and found the following > > log > > > messages > > > > > >> DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for > sessionid: > > >> 0x3000da76fa904b6 after 0ms > > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > > >> DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for > sessionid: > > >> 0x3000da76fa904b6 after 0ms > > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > > >> DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for > sessionid: > > >> 0x3000da76fa904b6 after 0ms > > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > > >> DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for > sessionid: > > >> 0x3000da76fa904b6 after 0ms > > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > > >> DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for > sessionid: > > >> 0x3000da76fa904b6 after 0ms > > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > > >> DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for > sessionid: > > >> 0x3000da76fa904b6 after 0ms > > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > > >> > > > > > > I noticed that the duration between each log message is ~33 seconds > while > > > on another environment (my laptop), the duration goes down to ~1 >
Re: Unexpected delay between pings sent from the client to server
Hi Gelbana, max_latency tells you the time elapsed between creating the request and FinalRequestProcessor processing it. So the cause for being that high could be basically anything. Turning on debug log for ZooKeeper server could help pinpoint at what point the request get stuck for so long. Regards, Norbert On Wed, Apr 3, 2019 at 3:07 PM Muhammad Gelbana wrote: > Another couple of things I found: > > *A couple of Zookeeper client threads are stuck at these stacktraces for > ~30 seconds* > "pool-2-thread-1-EventThread" #1218 daemon prio=5 os_prio=0 > tid=0x7ff3f5e23800 nid=0x5cd8 waiting on condition [0x7ff3ef803000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00018d6d8ed8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:504) > > "pool-2-thread-1-SendThread(72.55.136.25:2181)" #1217 daemon prio=5 > os_prio=0 tid=0x7ff3f5e23000 nid=0x5cd7 runnable [0x7ff3ef904000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x00018d68a730> (a sun.nio.ch.Util$3) > - locked <0x00018d68a720> (a java.util.Collections$UnmodifiableSet) > - locked <0x00018d68a258> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145) > > *Running the mntr command returned the following stats* > zk_version3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on > 06/29/2018 04:05 GMT > zk_avg_latency0 > zk_max_latency*17657* > zk_min_latency0 > zk_packets_received1427134 > zk_packets_sent1596974 > zk_num_alive_connections64 > zk_outstanding_requests0 > zk_server_statefollower > zk_znode_count1394 > zk_watch_count592 > zk_ephemerals_count192 > zk_approximate_data_size181257 > zk_open_file_descriptor_count94 > zk_max_file_descriptor_count1048576 > zk_fsync_threshold_exceed_count1 > > I find the *zk_max_latency* extremely hight. I'm wondering what kind of > latency is that ? How can I debug the reason for this value ? > > Thanks, > Gelbana > > > > On Wed, Apr 3, 2019 at 1:42 PM Muhammad Gelbana > wrote: > > > I'm trying to debug a problem where our client application suddenly loses > > its Zookeeper session. I concluded that by looking at the Zookeeper > server > > logs. > > > > I increased the logging details for the client and found the following > log > > messages > > > >> DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for sessionid: > >> 0x3000da76fa904b6 after 0ms > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > >> > > > > I noticed that the duration between each log message is ~33 seconds while > > on another environment (my laptop), the duration goes down to ~1 second. > > What could be causing this huge difference ? I doubt that whatever is > > causing this effect causes the delay to increase significantly at some > > point to the extend that makes my client lose its session. > > >
Re: Unexpected delay between pings sent from the client to server
Another couple of things I found: *A couple of Zookeeper client threads are stuck at these stacktraces for ~30 seconds* "pool-2-thread-1-EventThread" #1218 daemon prio=5 os_prio=0 tid=0x7ff3f5e23800 nid=0x5cd8 waiting on condition [0x7ff3ef803000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00018d6d8ed8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:504) "pool-2-thread-1-SendThread(72.55.136.25:2181)" #1217 daemon prio=5 os_prio=0 tid=0x7ff3f5e23000 nid=0x5cd7 runnable [0x7ff3ef904000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00018d68a730> (a sun.nio.ch.Util$3) - locked <0x00018d68a720> (a java.util.Collections$UnmodifiableSet) - locked <0x00018d68a258> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145) *Running the mntr command returned the following stats* zk_version3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT zk_avg_latency0 zk_max_latency*17657* zk_min_latency0 zk_packets_received1427134 zk_packets_sent1596974 zk_num_alive_connections64 zk_outstanding_requests0 zk_server_statefollower zk_znode_count1394 zk_watch_count592 zk_ephemerals_count192 zk_approximate_data_size181257 zk_open_file_descriptor_count94 zk_max_file_descriptor_count1048576 zk_fsync_threshold_exceed_count1 I find the *zk_max_latency* extremely hight. I'm wondering what kind of latency is that ? How can I debug the reason for this value ? Thanks, Gelbana On Wed, Apr 3, 2019 at 1:42 PM Muhammad Gelbana wrote: > I'm trying to debug a problem where our client application suddenly loses > its Zookeeper session. I concluded that by looking at the Zookeeper server > logs. > > I increased the logging details for the client and found the following log > messages > >> DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for sessionid: >> 0x3000da76fa904b6 after 0ms >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] >> DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for sessionid: >> 0x3000da76fa904b6 after 0ms >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] >> DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for sessionid: >> 0x3000da76fa904b6 after 0ms >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] >> DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for sessionid: >> 0x3000da76fa904b6 after 0ms >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] >> DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for sessionid: >> 0x3000da76fa904b6 after 0ms >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] >> DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for sessionid: >> 0x3000da76fa904b6 after 0ms >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] >> > > I noticed that the duration between each log message is ~33 seconds while > on another environment (my laptop), the duration goes down to ~1 second. > What could be causing this huge difference ? I doubt that whatever is > causing this effect causes the delay to increase significantly at some > point to the extend that makes my client lose its session. >
Unexpected delay between pings sent from the client to server
I'm trying to debug a problem where our client application suddenly loses its Zookeeper session. I concluded that by looking at the Zookeeper server logs. I increased the logging details for the client and found the following log messages > DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for sessionid: > 0x3000da76fa904b6 after 0ms > [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for sessionid: > 0x3000da76fa904b6 after 0ms > [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for sessionid: > 0x3000da76fa904b6 after 0ms > [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for sessionid: > 0x3000da76fa904b6 after 0ms > [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for sessionid: > 0x3000da76fa904b6 after 0ms > [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for sessionid: > 0x3000da76fa904b6 after 0ms > [org.apache.zookeeper.ClientCnxn$SendThread.readResponse] > I noticed that the duration between each log message is ~33 seconds while on another environment (my laptop), the duration goes down to ~1 second. What could be causing this huge difference ? I doubt that whatever is causing this effect causes the delay to increase significantly at some point to the extend that makes my client lose its session.