Re: Ping and client session timeouts
On Sat, May 22, 2010 at 12:38 AM, Ted Dunning wrote:
> You may actually be swapping. That can be even worse than GC!

Just to close the loop on this one, the problem was indeed GC: not only was I running 1200 threads, I had a vicious memory leak too. Good times.

Thanks for the help on that one, folks.

Steve
--
Stephen Green
http://thesearchguy.wordpress.com
Re: Ping and client session timeouts
You may actually be swapping. That can be even worse than GC!

On Fri, May 21, 2010 at 11:32 AM, Stephen Green wrote:
> Right. The system can be very memory-intensive, but at the time these
> are occurring, it's not under a really heavy load, and there's plenty
> of heap available. However, while looking at a thread dump from one of
> the nodes, I realized that a very poor decision meant that I had more
> than 1200 threads running. I expect this is more of a problem than
> the GC at this point. I'm taking steps to correct this problem now.
Re: Ping and client session timeouts
On 05/21/2010 11:32 AM, Stephen Green wrote:
> Right. The system can be very memory-intensive, but at the time these
> are occurring, it's not under a really heavy load, and there's plenty
> of heap available. However, while looking at a thread dump from one of
> the nodes, I realized that a very poor decision meant that I had more
> than 1200 threads running. I expect this is more of a problem than
> the GC at this point. I'm taking steps to correct this problem now.
>
> Lately, I've had fewer and fewer problems with GC. In a former life, I
> sat down the hall from the folks who wrote HotSpot's GC, and they're
> pretty sharp folks :-)

GC as a cause is very common; however, had you mentioned 1200 threads I would have guessed that to be a potential issue. ;-)

> Right. I'd like to have as small a timeout as possible so that I
> notice quickly when things disappear. What's a reasonable minimum? I
> notice recommendations in other messages on the list that 2 is a good
> value.

The setting you should use is typically determined by your SLA requirements: how soon do you want ephemeral nodes to be cleaned up if a client fails? Say you were doing leader election; this timeout would gate re-election in the case where the current leader failed. Set it lower and you are more responsive (faster), but also more susceptible to "false positives" such as a temporary network glitch. Set it higher and you ride over the network glitches, but it takes longer to recover when a client really does go down.

In some cases (HBase, Solr) we've seen that the timeout had to be set artificially high due to the limitations of current JVM GC algorithms. For example, some HBase users were seeing GC pause times of > 4 minutes. So this raises the question: do you consider this a failure or not? (I could reboot the machine faster than it takes to run that GC...)

Good luck,

Patrick
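[A note on the trade-off described above: any stall longer than the session timeout, whether a real crash, a network glitch, or a long GC pause, looks identical to the server. A minimal sketch of that, with illustrative numbers rather than values from this thread:]

```java
// Illustrates the session-timeout trade-off: any client-side stall (GC
// pause, network glitch) longer than the session timeout is
// indistinguishable from a dead client, so its ephemeral nodes are cleaned
// up and, in a leader-election setup, re-election is triggered.
public class TimeoutTradeoff {
    // Worst-case time to notice a real failure is roughly the session timeout.
    static boolean sessionExpires(long stallMs, long sessionTimeoutMs) {
        return stallMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long glitchMs = 5_000;    // brief network glitch: ridden over
        long gcPauseMs = 240_000; // a 4-minute GC pause: a "false positive"
        System.out.println(sessionExpires(glitchMs, 30_000));
        System.out.println(sessionExpires(gcPauseMs, 30_000));
    }
}
```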
Re: Ping and client session timeouts
On Fri, May 21, 2010 at 2:15 PM, Patrick Hunt wrote:
> Hi Stephen, my comments inline below:

Thanks for the speedy response!

>> My understanding (having had a quick look at the code) is that the
>> client connection will send a ping every sessionTimeout * 2/3 ms or so
>> to keep the session alive, but I keep seeing sessions dying.
>
> Actually the client sends a ping every 1/3 the timeout, and then looks
> for a response before another 1/3 elapses. This allows time to
> reconnect to a different server (and still maintain the session) if
> the current server were to become unavailable.

That makes sense. I'm trying to get the system up to speed before I introduce more complexity, but I expect that I'll be using a 3-ZK setup before too long.

>> On the client side I'll see something like the following sequence of
>> events:
>>
>> [05/21/10 15:59:40.753] INFO Initiating client connection, connectString=zookeeper:2200 sessionTimeout=30000 watcher=com.echonest.cluster.zoocontai...@1eb3319f
>> [05/21/10 15:59:40.767] INFO Socket connection established to zookeeper/10.255.9.187:2200, initiating session
>> [05/21/10 15:59:40.787] INFO Session establishment complete on server zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated timeout = 30000
>
> Ok, this (^^^) says that the timeout is set to 30sec.
>
>> [05/21/10 16:13:03.729] INFO Client session timed out, have not heard from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket connection and attempting reconnect
>> [05/21/10 16:13:19.268] INFO Initiating client connection, connectString=zookeeper:2200 sessionTimeout=30000 watcher=com.echonest.cluster.zoocontai...@1eb3319f
>> [05/21/10 16:14:12.326] INFO Client session timed out, have not heard from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket connection and attempting reconnect
>
> This (^^^) is very suspicious, in particular "have not heard from
> server in 53058ms". This means the client heartbeat code didn't notice
> that the heartbeat was exceeded for 53 seconds! This should never
> happen; the client does a "select" with a timeout of 1/3 the session
> timeout (10sec here).
>
> The fact that the select is taking an additional 43 seconds (53s minus
> the 10s select timeout) tells me that your client JVM is not allowing
> the heartbeat thread to run. The most common reason for this is GC. Is
> your client application very memory-intensive? Heavy on GC? You should
> turn on GC logging and review the output after reproducing this issue
> (turning on CMS/incremental GC mode usually resolves this issue, but
> you should verify first). What we typically see here is that the
> client JVM is running GC for very long periods of time; this blocks
> all the threads, and as a result the heartbeat is not sent by the
> client!

Right. The system can be very memory-intensive, but at the time these are occurring, it's not under a really heavy load, and there's plenty of heap available. However, while looking at a thread dump from one of the nodes, I realized that a very poor decision meant that I had more than 1200 threads running. I expect this is more of a problem than the GC at this point. I'm taking steps to correct this problem now.

Lately, I've had fewer and fewer problems with GC. In a former life, I sat down the hall from the folks who wrote HotSpot's GC, and they're pretty sharp folks :-)

> As you are running in a virtualized environment, this could also be a
> factor (it's definitely an issue from a GC perspective). But I suspect
> that GC is the issue here; look at that first.

I'll do a run with -verbose:gc (and maybe a few other triple-bucky JVM options) and see what's going on.

> See this page for some common issues users have faced in the past:
> http://bit.ly/5WwS44

Thanks.

>> I started out with a session timeout of 10,000ms, but as you can see,
>> I have the same problem at 30,000ms.
>
> You may not need to use this after you resolve the GC issue.

Right. I'd like to have as small a timeout as possible so that I notice quickly when things disappear. What's a reasonable minimum? I notice recommendations in other messages on the list that 2 is a good value.

>> Do I have a fundamental misunderstanding? What else should I do to
>> figure out what's going on here?
>
> As I suggested above, give GC logging a try. I found 'gchisto' a very
> useful tool for reviewing the resulting log files.
> http://sysadminsjourney.com/content/2008/09/15/profile-your-java-gc-performance-gchisto

Tony will be chuffed to hear that folks are recommending his GC tools :-) I'll let you know how I get on.

Steve
--
Stephen Green
http://thesearchguy.wordpress.com
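[The -verbose:gc run mentioned in this message would look something like the command below. This is a common HotSpot flag combination for GC logging plus the CMS/incremental collector Patrick mentions, not a command taken from the thread; "client-app.jar" is a placeholder for the actual application.]

```shell
# GC logging plus the CMS/incremental collector on a Java 6-era HotSpot JVM.
# "client-app.jar" stands in for the actual client application jar.
java -verbose:gc \
     -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -Xloggc:gc.log \
     -XX:+UseConcMarkSweepGC \
     -XX:+CMSIncrementalMode \
     -jar client-app.jar
```

The resulting gc.log can then be fed to a tool like gchisto to look for stop-the-world pauses approaching the session timeout.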
Re: Ping and client session timeouts
Hi Stephen, my comments inline below:

On 05/21/2010 09:31 AM, Stephen Green wrote:
> I feel like I'm missing something fairly fundamental here. I'm
> building a clustered application that uses ZooKeeper (3.3.1) to store
> its configuration information. There are 33 nodes in the cluster
> (Amazon EC2 instances, if that matters), and I'm currently using a
> single ZooKeeper instance.
>
> When a node starts up, it makes a connection to ZK, sets the data on a
> few paths and makes an ephemeral node for itself. I keep the
> connection open while the node is running so that I can use watches to
> find out if a node disappears, but after the initial setup, the node
> usually won't write or read anything from ZK.
>
> My understanding (having had a quick look at the code) is that the
> client connection will send a ping every sessionTimeout * 2/3 ms or so
> to keep the session alive, but I keep seeing sessions dying.

Actually the client sends a ping every 1/3 the timeout, and then looks for a response before another 1/3 elapses. This allows time to reconnect to a different server (and still maintain the session) if the current server were to become unavailable.

> On the client side I'll see something like the following sequence of
> events:
>
> [05/21/10 15:59:40.753] INFO Initiating client connection, connectString=zookeeper:2200 sessionTimeout=30000 watcher=com.echonest.cluster.zoocontai...@1eb3319f
> [05/21/10 15:59:40.767] INFO Socket connection established to zookeeper/10.255.9.187:2200, initiating session
> [05/21/10 15:59:40.787] INFO Session establishment complete on server zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated timeout = 30000

Ok, this (^^^) says that the timeout is set to 30sec.

> [05/21/10 16:13:03.729] INFO Client session timed out, have not heard from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket connection and attempting reconnect
> [05/21/10 16:13:19.268] INFO Initiating client connection, connectString=zookeeper:2200 sessionTimeout=30000 watcher=com.echonest.cluster.zoocontai...@1eb3319f
> [05/21/10 16:14:12.326] INFO Client session timed out, have not heard from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket connection and attempting reconnect

This (^^^) is very suspicious, in particular "have not heard from server in 53058ms". This means the client heartbeat code didn't notice that the heartbeat was exceeded for 53 seconds! This should never happen; the client does a "select" with a timeout of 1/3 the session timeout (10sec here).

The fact that the select is taking an additional 43 seconds (53s minus the 10s select timeout) tells me that your client JVM is not allowing the heartbeat thread to run. The most common reason for this is GC. Is your client application very memory-intensive? Heavy on GC? You should turn on GC logging and review the output after reproducing this issue (turning on CMS/incremental GC mode usually resolves this issue, but you should verify first). What we typically see here is that the client JVM is running GC for very long periods of time; this blocks all the threads, and as a result the heartbeat is not sent by the client!

As you are running in a virtualized environment, this could also be a factor (it's definitely an issue from a GC perspective). But I suspect that GC is the issue here; look at that first.

See this page for some common issues users have faced in the past:
http://bit.ly/5WwS44

> If I'm reading this correctly, the connection gets set up and then the
> server experiences an error trying to read from the client, so it
> closes the connection. It's not clear if this causes the session
> timeout or vice versa (these systems are both running ntp, but I doubt
> that we can count on interleaving those log times correctly.)

Yes, this is due to the client not getting the heartbeat, so it will close the connection and attempt to reestablish.

> I started out with a session timeout of 10,000ms, but as you can see,
> I have the same problem at 30,000ms.

You may not need to use this after you resolve the GC issue.

> Do I have a fundamental misunderstanding? What else should I do to
> figure out what's going on here?

As I suggested above, give GC logging a try. I found 'gchisto' a very useful tool for reviewing the resulting log files.
http://sysadminsjourney.com/content/2008/09/15/profile-your-java-gc-performance-gchisto

Regards,

Patrick
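[The 1/3-ping, 2/3-response schedule described in this message can be sketched as simple arithmetic on the negotiated timeout. This is a simplified model for the reader, not ZooKeeper's actual client code:]

```java
// Simplified model of the ZooKeeper client's heartbeat schedule: send a
// ping every 1/3 of the negotiated session timeout, and treat the
// connection as dead after 2/3 of it passes with no response (leaving the
// final 1/3 to reconnect to another server before the session expires).
public class HeartbeatModel {
    static int pingIntervalMs(int negotiatedTimeoutMs) {
        return negotiatedTimeoutMs / 3;
    }

    static int readTimeoutMs(int negotiatedTimeoutMs) {
        return negotiatedTimeoutMs * 2 / 3;
    }

    public static void main(String[] args) {
        int timeout = 30_000; // negotiated timeout from the logs above
        System.out.println("ping every " + pingIntervalMs(timeout) + "ms");
        System.out.println("give up after " + readTimeoutMs(timeout) + "ms of silence");
        // "have not heard from server in 53058ms" means the heartbeat thread
        // was blocked roughly 43s beyond its 10s select timeout.
        System.out.println("stall beyond select: "
                + (53_058 - pingIntervalMs(timeout)) + "ms");
    }
}
```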
Ping and client session timeouts
I feel like I'm missing something fairly fundamental here. I'm building a clustered application that uses ZooKeeper (3.3.1) to store its configuration information. There are 33 nodes in the cluster (Amazon EC2 instances, if that matters), and I'm currently using a single ZooKeeper instance.

When a node starts up, it makes a connection to ZK, sets the data on a few paths and makes an ephemeral node for itself. I keep the connection open while the node is running so that I can use watches to find out if a node disappears, but after the initial setup, the node usually won't write or read anything from ZK.

My understanding (having had a quick look at the code) is that the client connection will send a ping every sessionTimeout * 2/3 ms or so to keep the session alive, but I keep seeing sessions dying. On the client side I'll see something like the following sequence of events:

[05/21/10 15:59:40.753] INFO Initiating client connection, connectString=zookeeper:2200 sessionTimeout=30000 watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 15:59:40.767] INFO Socket connection established to zookeeper/10.255.9.187:2200, initiating session
[05/21/10 15:59:40.787] INFO Session establishment complete on server zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated timeout = 30000
[05/21/10 16:13:03.729] INFO Client session timed out, have not heard from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket connection and attempting reconnect
[05/21/10 16:13:19.268] INFO Initiating client connection, connectString=zookeeper:2200 sessionTimeout=30000 watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 16:14:12.326] INFO Client session timed out, have not heard from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket connection and attempting reconnect
[05/21/10 16:14:29.810] INFO Socket connection established to zookeeper/10.255.9.187:2200, initiating session
[05/21/10 16:14:29.814] INFO Session establishment complete on server zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004f, negotiated timeout = 30000

While on the server side, I'll see something like:

2010-05-21 15:59:40,775 - INFO [SyncThread:0:nioserverc...@1579] - Established session 0x128bb7b828d004c with negotiated timeout 30000 for client /10.198.98.3:50376
2010-05-21 15:59:40,799 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x128bb7b828d004c type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/clusters Error:KeeperErrorCode = NodeExists for /clusters
2010-05-21 15:59:40,825 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x128bb7b828d004c type:delete cxid:0xc zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/clusters/echostore/groups/01222100-81db-439e-a3df-90e061166e44/master Error:KeeperErrorCode = NoNode for /clusters/echostore/groups/01222100-81db-439e-a3df-90e061166e44/master
2010-05-21 16:13:03,732 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:nioserverc...@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x128bb7b828d004c, likely client has closed socket
2010-05-21 16:13:03,733 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:nioserverc...@1434] - Closed socket connection for client /10.198.98.3:50376 which had sessionid 0x128bb7b828d004c
2010-05-21 16:13:20,012 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x128bb7b828d004c, timeout of 30000ms exceeded
2010-05-21 16:13:20,013 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x128bb7b828d004c

If I'm reading this correctly, the connection gets set up and then the server experiences an error trying to read from the client, so it closes the connection. It's not clear if this causes the session timeout or vice versa (these systems are both running ntp, but I doubt that we can count on interleaving those log times correctly.)

I started out with a session timeout of 10,000ms, but as you can see, I have the same problem at 30,000ms.

Do I have a fundamental misunderstanding? What else should I do to figure out what's going on here?

Steve
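[The server-side expiry in the log above ("Expiring session ..., timeout of 30000ms exceeded") amounts to a last-heard-from timestamp check. The sketch below is a toy model of that idea, not ZooKeeper's actual SessionTracker, which groups sessions into expiry buckets rounded to the tick interval:]

```java
// Toy model of server-side session expiry: a session expires once the
// server has not heard from the client (ping or request) for longer than
// the negotiated session timeout.
public class SessionExpiryModel {
    private final long timeoutMs;
    private long lastHeardMs;

    SessionExpiryModel(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastHeardMs = nowMs;
    }

    // Called whenever any traffic arrives from the client.
    void touch(long nowMs) {
        lastHeardMs = nowMs;
    }

    boolean expired(long nowMs) {
        return nowMs - lastHeardMs > timeoutMs;
    }

    public static void main(String[] args) {
        SessionExpiryModel session = new SessionExpiryModel(30_000, 0);
        session.touch(25_000);                       // a ping arrives in time
        System.out.println(session.expired(40_000)); // heard 15s ago: alive
        System.out.println(session.expired(60_000)); // silent for 35s: expired
    }
}
```

This is why a client-side stall (GC, 1200 runnable threads) expires the session even though the server itself is healthy: nothing touches the session, so the deadline simply runs out.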