Re: Ping and client session timeouts

2010-06-04 Thread Stephen Green
On Sat, May 22, 2010 at 12:38 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 You may actually be swapping.  That can be even worse than GC!

Just to close the loop on this one, the problem was indeed GC:  not
only was I running 1200 threads, I had a vicious memory leak too.
Good times.

Thanks for the help on that one, folks.

Steve
-- 
Stephen Green
http://thesearchguy.wordpress.com


Ping and client session timeouts

2010-05-21 Thread Stephen Green
I feel like I'm missing something fairly fundamental here.  I'm
building a clustered application that uses ZooKeeper (3.3.1) to store
its configuration information.  There are 33 nodes in the cluster
(Amazon EC2 instances, if that matters), and I'm currently using a
single ZooKeeper instance. When a node starts up, it makes a
connection to ZK, sets the data on a few paths and makes an ephemeral
node for itself. I keep the connection open while the node is running
so that I can use watches to find out if a node disappears, but after
the initial setup, the node usually won't write or read anything from
ZK.
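
For concreteness, here's a minimal sketch of that kind of setup using
the standard ZooKeeper Java API (the paths, hostname, and node name
below are placeholders, not my real layout):

import java.util.List;
import org.apache.zookeeper.*;

public class NodeRegistration implements Watcher {

    private ZooKeeper zk;

    public void start() throws Exception {
        // Ask for a 30000ms session; the server may negotiate it down.
        zk = new ZooKeeper("zookeeper:2200", 30000, this);

        // Publish this node's config, then register an ephemeral node
        // that disappears automatically if the session is lost.
        zk.setData("/clusters/mycluster/config", "cfg".getBytes(), -1);
        zk.create("/clusters/mycluster/nodes/node-1", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Watch the peer list so we hear about nodes that go away.
        List<String> peers = zk.getChildren("/clusters/mycluster/nodes", true);
    }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            // A peer joined or left; re-read the children to re-arm the watch.
        }
    }
}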

My understanding (having had a quick look at the code) is that the
client connection will send a ping every sessionTimeout * 2/3 ms or so
to keep the session alive, but I keep seeing sessions dying.  On the
client side I'll see something like the following sequence of events:

[05/21/10 15:59:40.753] INFO Initiating client connection,
connectString=zookeeper:2200 sessionTimeout=30000
watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 15:59:40.767] INFO Socket connection established to
zookeeper/10.255.9.187:2200, initiating session
[05/21/10 15:59:40.787] INFO Session establishment complete on server
zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated
timeout = 30000
[05/21/10 16:13:03.729] INFO Client session timed out, have not heard
from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket
connection and attempting reconnect
[05/21/10 16:13:19.268] INFO Initiating client connection,
connectString=zookeeper:2200 sessionTimeout=30000
watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 16:14:12.326] INFO Client session timed out, have not heard
from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket
connection and attempting reconnect
[05/21/10 16:14:29.810] INFO Socket connection established to
zookeeper/10.255.9.187:2200, initiating session
[05/21/10 16:14:29.814] INFO Session establishment complete on server
zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004f, negotiated
timeout = 30000

While on the server side, I'll see something like:

2010-05-21 15:59:40,775 - INFO  [SyncThread:0:nioserverc...@1579] -
Established session 0x128bb7b828d004c with negotiated timeout 30000
for client /10.198.98.3:50376
2010-05-21 15:59:40,799 - INFO
[ProcessThread:-1:preprequestproces...@405] - Got user-level
KeeperException when processing sessionid:0x128bb7b828d004c
type:create cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown
reqpath:n/a Error Path:/clusters Error:KeeperErrorCode = NodeExists
for /clusters
2010-05-21 15:59:40,825 - INFO
[ProcessThread:-1:preprequestproces...@405] - Got user-level
KeeperException when processing sessionid:0x128bb7b828d004c
type:delete cxid:0xc zxid:0xfffffffffffffffe txntype:unknown
reqpath:n/a Error
Path:/clusters/echostore/groups/01222100-81db-439e-a3df-90e061166e44/master
Error:KeeperErrorCode = NoNode for
/clusters/echostore/groups/01222100-81db-439e-a3df-90e061166e44/master
2010-05-21 16:13:03,732 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:nioserverc...@633] -
EndOfStreamException: Unable to read additional data from client
sessionid 0x128bb7b828d004c, likely client has closed socket
2010-05-21 16:13:03,733 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2200:nioserverc...@1434] -
Closed socket connection for client /10.198.98.3:50376 which had
sessionid 0x128bb7b828d004c
2010-05-21 16:13:20,012 - INFO  [SessionTracker:zookeeperser...@315] -
Expiring session 0x128bb7b828d004c, timeout of 30000ms exceeded
2010-05-21 16:13:20,013 - INFO
[ProcessThread:-1:preprequestproces...@385] - Processed session
termination for sessionid: 0x128bb7b828d004c

If I'm reading this correctly, the connection gets set up and then the
server experiences an error trying to read from the client, so it
closes the connection.  It's not clear if this causes the session
timeout or vice-versa (these systems are both running ntp, but I doubt
that we can count on interleaving those log times correctly.)

I started out with a session timeout of 10,000ms, but as you can see,
I have the same problem at 30,000ms.

Do I have a fundamental misunderstanding? What else should I do to
figure out what's going on here?

Steve


Re: Ping and client session timeouts

2010-05-21 Thread Patrick Hunt

Hi Stephen, my comments inline below:

On 05/21/2010 09:31 AM, Stephen Green wrote:

I feel like I'm missing something fairly fundamental here.  I'm
building a clustered application that uses ZooKeeper (3.3.1) to store
its configuration information.  There are 33 nodes in the cluster
(Amazon EC2 instances, if that matters), and I'm currently using a
single ZooKeeper instance. When a node starts up, it makes a
connection to ZK, sets the data on a few paths and makes an ephemeral
node for itself. I keep the connection open while the node is running
so that I can use watches to find out if a node disappears, but after
the initial setup, the node usually won't write or read anything from
ZK.

My understanding (having had a quick look at the code) is that the
client connection will send a ping every sessionTimeout * 2/3 ms or so
to keep the session alive, but I keep seeing sessions dying.  On the


Actually the client sends a ping every 1/3 the timeout, and then looks 
for a response before another 1/3 elapses. This allows time to reconnect 
to a different server (and still maintain the session) if the current 
server were to become unavailable.
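
To put rough numbers on it for a 30000ms session, the budget breaks
down something like this (illustrative arithmetic only, not the literal
client code):

int sessionTimeout = 30000;                  // negotiated with the server
int pingInterval   = sessionTimeout / 3;     // ~10s: send a heartbeat ping
int readTimeout    = sessionTimeout * 2 / 3; // ~20s: must hear back by here
// The remaining ~10s is the window for reconnecting to another server
// in the ensemble before the session expires on the server side.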



client side I'll see something like the following sequence of events:

[05/21/10 15:59:40.753] INFO Initiating client connection,
connectString=zookeeper:2200 sessionTimeout=30000
watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 15:59:40.767] INFO Socket connection established to
zookeeper/10.255.9.187:2200, initiating session
[05/21/10 15:59:40.787] INFO Session establishment complete on server
zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated
timeout = 30000


Ok, this (^^^) says that the timeout is set to 30sec.


[05/21/10 16:13:03.729] INFO Client session timed out, have not heard
from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket
connection and attempting reconnect
[05/21/10 16:13:19.268] INFO Initiating client connection,
connectString=zookeeper:2200 sessionTimeout=30000
watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 16:14:12.326] INFO Client session timed out, have not heard
from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket
connection and attempting reconnect


This (^^^) is very suspicious, in particular "have not heard from server
in 53058ms". This means that the client heartbeat code didn't notice
that the heartbeat was exceeded for 53 seconds! This should never
happen: the client does a select with a timeout of 1/3 the session
timeout (10sec here).


The fact that the select is taking 43 additional seconds (53s minus the
10sec select timeout) tells me that your client JVM is not allowing the
heartbeat thread to run. The most common reason for this is GC. Is your
client application very memory intensive? Heavy on GC? You should turn
on GC logging and review the output after reproducing this issue
(turning on CMS/incremental GC mode usually resolves this issue, but you
should verify first). What we typically see here is that the client JVM
is running GC for very long periods of time; this blocks all the
threads, and as a result the heartbeat is not sent by the client!
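
For example, a launch line along these lines will capture what you need
(the main class is just a placeholder for your application):

java -verbose:gc -Xloggc:gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -XX:+PrintGCApplicationStoppedTime \
     com.example.MyClusterNode

If the log shows long collections, -XX:+UseConcMarkSweepGC (optionally
with -XX:+CMSIncrementalMode) is the usual next step, but verify with
the log first.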


As you are running in a virtualized environment this could also be a
factor (it's definitely an issue from a GC perspective). But I suspect
that GC is the issue here; look at that first.


See this page for some common issues users have faced in the past: 
http://bit.ly/5WwS44




If I'm reading this correctly, the connection gets set up and then the
server experiences an error trying to read from the client, so it
closes the connection.  It's not clear if this causes the session
timeout or vice-versa (these systems are both running ntp, but I doubt
that we can count on interleaving those log times correctly.)



Yes, this is due to the client not getting the heartbeat, so it will 
close the connection and attempt to reestablish.
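
The distinction shows up in the session events delivered to your
Watcher; a minimal sketch of telling the cases apart (recovery logic
omitted):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class SessionWatcher implements Watcher {
    public void process(WatchedEvent event) {
        switch (event.getState()) {
        case SyncConnected:
            // Connected (or reconnected in time): the session, its
            // ephemeral nodes and watches are all still intact.
            break;
        case Disconnected:
            // Socket lost: the client library keeps trying to reconnect
            // on its own; the session may still be saved.
            break;
        case Expired:
            // The server expired the session: ephemeral nodes are gone
            // and this handle is dead, so a new ZooKeeper instance is
            // needed.
            break;
        default:
            break;
        }
    }
}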



I started out with a session timeout of 10,000ms, but as you can see,
I have the same problem at 30,000ms.



You may not need the larger timeout once you resolve the GC issue.


Do I have a fundamental misunderstanding? What else should I do to
figure out what's going on here?


As I suggested above, give GC logging a try. I found 'gchisto' a very 
useful tool for reviewing the resulting log files.

http://sysadminsjourney.com/content/2008/09/15/profile-your-java-gc-performance-gchisto

Regards,

Patrick


Re: Ping and client session timeouts

2010-05-21 Thread Stephen Green
On Fri, May 21, 2010 at 2:15 PM, Patrick Hunt ph...@apache.org wrote:
 Hi Stephen, my comments inline below:

Thanks for the speedy response!

 My understanding (having had a quick look at the code) is that the
 client connection will send a ping every sessionTimeout * 2/3 ms or so
 to keep the session alive, but I keep seeing sessions dying.  On the

 Actually the client sends a ping every 1/3 the timeout, and then looks for a
 response before another 1/3 elapses. This allows time to reconnect to a
 different server (and still maintain the session) if the current server were
 to become unavailable.

That makes sense.  I'm trying to get the system up to speed before I
introduce more complexity, but I expect that I'll be using a 3-ZK
setup before too long.
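
When I do, the only client-side change should be listing all the
servers in the connect string, something like the following (hostnames
made up; `watcher` is whatever Watcher the app already registers):

// The client picks one server and, if it becomes unreachable, fails
// over to another while trying to keep the same session alive.
ZooKeeper zk = new ZooKeeper("zk1:2200,zk2:2200,zk3:2200", 30000, watcher);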

 client side I'll see something like the following sequence of events:

 [05/21/10 15:59:40.753] INFO Initiating client connection,
 connectString=zookeeper:2200 sessionTimeout=30000
 watcher=com.echonest.cluster.zoocontai...@1eb3319f
 [05/21/10 15:59:40.767] INFO Socket connection established to
 zookeeper/10.255.9.187:2200, initiating session
 [05/21/10 15:59:40.787] INFO Session establishment complete on server
 zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated
 timeout = 30000

 Ok, this (^^^) says that the timeout is set to 30sec.

 [05/21/10 16:13:03.729] INFO Client session timed out, have not heard
 from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket
 connection and attempting reconnect
 [05/21/10 16:13:19.268] INFO Initiating client connection,
 connectString=zookeeper:2200 sessionTimeout=30000
 watcher=com.echonest.cluster.zoocontai...@1eb3319f
 [05/21/10 16:14:12.326] INFO Client session timed out, have not heard
 from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket
 connection and attempting reconnect

 This (^^^) is very suspicious, in particular "have not heard from server in
 53058ms". This means that the client heartbeat code didn't notice that the
 heartbeat was exceeded for 53 seconds! This should never happen: the client
 does a select with a timeout of 1/3 the session timeout (10sec here).

 The fact that the select is taking 43 additional seconds (53s minus the 10sec
 select timeout) tells me that your client JVM is not allowing the heartbeat
 thread to run. The most common reason for this is GC. Is your client
 application very memory intensive? Heavy on GC? You should turn on GC logging
 and review the output after reproducing this issue (turning on
 CMS/incremental GC mode usually resolves this issue, but you should verify
 first). What we typically see here is that the client JVM is running GC for
 very long periods of time; this blocks all the threads, and as a result the
 heartbeat is not sent by the client!

Right.  The system can be very memory-intensive, but at the time these
are occurring, it's not under a really heavy load, and there's plenty
of heap available. However, while looking at a thread dump from one of
the nodes, I realized that a very poor decision meant that I had more
than 1200 threads running.  I expect this is more of a problem than
the GC at this point.  I'm taking steps to correct this problem now.

Lately, I've had fewer and fewer problems with GC.  In a former life,
I sat down the hall from the folks who wrote Hotspot's GC and they're
pretty sharp folks :-)

 As you are running in a virtualized environment this could also be a factor
 (it's definitely an issue from a GC perspective). But I suspect that GC is
 the issue here; look at that first.

I'll do a run with -verbose:gc (and maybe a few other triple-bucky JVM
options) and see what's going on.

 See this page for some common issues users have faced in the past:
 http://bit.ly/5WwS44

Thanks.

 I started out with a session timeout of 10,000ms, but as you can see,
 I have the same problem at 30,000ms.


 You may not need the larger timeout once you resolve the GC issue.

Right.  I'd like to have as small a timeout as possible so that I
notice quickly when things disappear.  What's a reasonable minimum?  I
notice recommendations in other messages on the list that 2 is a
good value.

 Do I have a fundamental misunderstanding? What else should I do to
 figure out what's going on here?

 As I suggested above, give GC logging a try. I found 'gchisto' a very useful
 tool for reviewing the resulting log files.
 http://sysadminsjourney.com/content/2008/09/15/profile-your-java-gc-performance-gchisto

Tony will be chuffed to hear that folks are recommending his gc tools
:-)  I'll let you know how I get on.

Steve
-- 
Stephen Green
http://thesearchguy.wordpress.com


Re: Ping and client session timeouts

2010-05-21 Thread Patrick Hunt


On 05/21/2010 11:32 AM, Stephen Green wrote:

Right.  The system can be very memory-intensive, but at the time these
are occurring, it's not under a really heavy load, and there's plenty
of heap available. However, while looking at a thread dump from one of
the nodes, I realized that a very poor decision meant that I had more
than 1200 threads running.  I expect this is more of a problem than
the GC at this point.  I'm taking steps to correct this problem now.

Lately, I've had fewer and fewer problems with GC.  In a former life,
I sat down the hall from the folks who wrote Hotspot's GC and they're
pretty sharp folks :-)


GC as a cause is very common; however, had you mentioned the 1200
threads, I would have guessed that to be a potential issue. ;-)



Right.  I'd like to have as small a timeout as possible so that I
notice quickly when things disappear.  What's a reasonable minimum?  I
notice recommendations in other messages on the list that 2 is a
good value.



The setting you should use is typically determined by your SLA
requirements: how soon do you want ephemeral nodes to be cleaned up if a
client fails? Say you were doing leader election; this timeout would
gate re-election in the case where the current leader failed. Set it
lower and you are more responsive (faster), but also more susceptible to
false positives such as a temporary network glitch. Set it higher and
you ride over the network glitches, but it takes longer to recover when
a client really does go down.
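
For example, a follower in that scenario typically just watches the
leader's ephemeral node, and the watch can only fire after the dead
leader's session times out. A rough sketch (the path is made up):

import org.apache.zookeeper.*;

public class MasterWatch implements Watcher {

    private final ZooKeeper zk;

    public MasterWatch(ZooKeeper zk) { this.zk = zk; }

    public void watchMaster() throws KeeperException, InterruptedException {
        // exists() with a watch fires NodeDeleted once the old master's
        // session expires and its ephemeral node is removed by the server.
        zk.exists("/clusters/mycluster/master", this);
    }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            // The old master is gone; how quickly we get here is bounded
            // by the session timeout. Try to create the ephemeral master
            // node now to stand for election, then re-arm the watch.
        }
    }
}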


In some cases (HBase, Solr) we've seen that the timeout had to be set
artificially high due to the limitations of current JVM GC algos. For
example, some HBase users were seeing GC pause times of more than 4
minutes. So this raises the question: do you consider that a failure or
not? (I could reboot the machine faster than it takes to run that GC...)


Good luck,

Patrick


Re: Ping and client session timeouts

2010-05-21 Thread Ted Dunning
You may actually be swapping.  That can be even worse than GC!

On Fri, May 21, 2010 at 11:32 AM, Stephen Green eelstretch...@gmail.com wrote:

 Right.  The system can be very memory-intensive, but at the time these
 are occurring, it's not under a really heavy load, and there's plenty
 of heap available. However, while looking at a thread dump from one of
 the nodes, I realized that a very poor decision meant that I had more
 than 1200 threads running.  I expect this is more of a problem than
 the GC at this point.  I'm taking steps to correct this problem now.