Hi,

We have added code to our application to reconnect and re-establish watches 
when we receive a Disconnected event.  I am running tests on a heavily loaded 
system where the zookeeper server and clients are all impacted.  On this test 
system we regularly experience session timeouts and appropriately react to 
reconnect and set up our watches.  There is an uncommon case that I am having 
trouble puzzling out.  When running one of our tests in a loop about 1% of the 
time we hit a case where on the client side we think the session has expired 
but on the server side it has been renewed.   We will then fail to be able to 
create an ephemeral node because it already exists and does not ever get 
cleaned up (since the previous session is still valid).  I'm trying to figure 
out if we are misusing the API or if we have encountered a bug.   I'm happy to 
provide more details.  One thing I am wondering is if it is inappropriate to 
create a new session within the event thread of another session which has 
received the disconnected event.

Thanks,
Martin Serrano

  The logs are filtered a little just to keep this email a reasonable size, but 
everything relevant to the session in question is here.

Associated client application logs (the reconnecting message is ours where we 
re-establish watches, etc using a new session):

2011-09-12 18:49:55,729 INFO [ZooKeeperClientCnxn--EventThread-1325fd35663000d] 
- Session level event, session 1325fd35663000d: WatchedEvent state:Disconnected 
type:None path:null
2011-09-12 18:49:55,729 INFO [ZooKeeperClientCnxn--EventThread-1325fd35663000d] 
- Reconnecting, current session 1325fd35663000d
2011-09-12 18:49:55,729 INFO [ZooKeeperClientCnxn--EventThread-1325fd35663000d] 
- Connecting to zookeeper on localhost:16015
2011-09-12 18:50:08,823 INFO [main_master_10.2.1.100_16040] - obj[76a9b9c] 
Reestablishing agent watches
2011-09-12 18:50:09,120 WARN [ZooKeeperClientCnxn--EventThread-1325fd35663000d] 
- obj[1e5b04ae] Failed to write status node multinode-monitor/master.  Saw 
previous value null, retrying
  org.apache.zookeeper.KeeperException$NodeExistsException - KeeperErrorCode = 
NodeExists for multinode-monitor/master

Associated client zookeeper logs:

2011-09-12 18:48:15,995 INFO  ClientCnxn 
[ZooKeeperClientCnxn--SendThread-1325fd35663000d] - Session establishment 
complete on server localhost/127.0.0.1:16015, sessionid = 0x1325fd35663000d, 
negotiated timeout = 40000
2011-09-12 18:49:31,245 INFO  ClientCnxn 
[ZooKeeperClientCnxn--SendThread-1325fd35663000d] - Client session timed out, 
have not heard from server in 26672ms for sessionid 0x1325fd35663000d, closing 
socket connection and attempting reconnect
2011-09-12 18:49:55,729 INFO  ZooKeeper 
[ZooKeeperClientCnxn--EventThread-1325fd35663000d] - Initiating client 
connection, connectString=localhost:16015 sessionTimeout=600000 
watcher=com.attivio.global.AieZooKeeper@4f536eec<mailto:watcher=com.attivio.global.AieZooKeeper@4f536eec>
2011-09-12 18:50:08,057 INFO  ClientCnxn 
[ZooKeeperClientCnxn--SendThread-1325fd35663000d] - Session establishment 
complete on server localhost/127.0.0.1:16015, sessionid = 0x1325fd35663000d, 
negotiated timeout = 40000

Associated server zookeeper logs:

2011-09-12 18:48:16,010 INFO  NIOServerCnxn [SyncThread:0] - Established 
session 0x1325fd35663000d with negotiated timeout 40000 for client 
/127.0.0.1:4257
2011-09-12 18:49:54,120 WARN  NIOServerCnxn 
[NIOServerCxn.Factory:localhost/127.0.0.1:16015] - Exception causing close of 
session 0x1325fd35663000d due to java.io.IOException: An established connection 
was aborted by the software in your host machine
2011-09-12 18:49:54,120 INFO  NIOServerCnxn 
[NIOServerCxn.Factory:localhost/127.0.0.1:16015] - Closed socket connection for 
client /127.0.0.1:4257 which had sessionid 0x1325fd35663000d
2011-09-12 18:50:08,057 INFO  NIOServerCnxn 
[NIOServerCxn.Factory:localhost/127.0.0.1:16015] - Client attempting to renew 
session 0x1325fd35663000d at /127.0.0.1:4566
2011-09-12 18:50:08,057 INFO  NIOServerCnxn 
[NIOServerCxn.Factory:localhost/127.0.0.1:16015] - Established session 
0x1325fd35663000d with negotiated timeout 40000 for client /127.0.0.1:4566

Reply via email to