[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2016-10-13 Thread guoping.gp (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574315#comment-15574315
 ] 

guoping.gp commented on ZOOKEEPER-1382:
---

the same issue is still can be found on 3.4.6, 
https://issues.apache.org/jira/browse/ZOOKEEPER-2615.

when the client app reconnect to a server, the setWatchs and close still stay 
in the race condition as following possible:

Daemon Thread [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:6181] (Suspended 
(breakpoint at line 966 in NIOServerCnxn)) 
NIOServerCnxn.close() line: 966 
NIOServerCnxn.doIO(SelectionKey) line: 360  
NIOServerCnxnFactory.run() line: 208
Thread.run() line: 745

and then

Thread [CommitProcessor:1] (Suspended (breakpoint at line 303 in 
FinalRequestProcessor))
FinalRequestProcessor.processRequest(Request) line: 303 
CommitProcessor.run() line: 74

> Zookeeper server holds onto dead/expired session ids in the watch data 
> structures
> -
>
> Key: ZOOKEEPER-1382
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
>Reporter: Neha Narkhede
>Assignee: Germán Blanco
>Priority: Critical
> Fix For: 3.4.6, 3.5.0
>
> Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
> ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
> ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
> ZOOKEEPER-1382_3.3.4.patch
>
>
> I've observed that zookeeper server holds onto expired session ids in the 
> watcher data structures. The result is the wchp command reports session ids 
> that cannot be found through cons/dump and those expired session ids sit 
> there maybe until the server is restarted. Here are snippets from the client 
> and the server logs that lead to this state, for one particular session id 
> 0x134485fd7bcb26f -
> There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
> and I'm using ZkClient to connect to the cluster
> From the application log -
> application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application Session establishment complete 
> on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
> negotiated timeout = 6000
> application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
> [main-SendThread(223.prod:12913)] [application] Client session timed out, 
> have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
> socket connection and attempting reconnect
> application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
> [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
> ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
> connection
> On the leader zk, 225 -
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
> timeout of 6000ms exceeded
> zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
> [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
> for sessionid: 0x134485fd7bcb26f
> On the server, the client was initially connected to, 223 -
> zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
> 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
> zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
> [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
> /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
> Here are the log snippets from 226, which is the server, the client 
> reconnected to, before getting session expired event -
> 2012-01-27 09:52:38,190 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
> attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
> 2012-01-27 09:52:38,191 - INFO  
> [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
> 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
> 2012-01-27 09:52:38,191 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
> socket connection for client /172.17.136.82:49367 which had sessionid 
> 0x134485fd7bcb26f
> wchp output from 226, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *226.*wchp* | wc -l
> 3
> wchp output from 223, taken on 01/30 -
> nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
> *223.*wchp* | wc -l
> 0
> cons output from 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846547#comment-13846547
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

Thank you [~fournc]!

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-11 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845600#comment-13845600
 ] 

Camille Fournier commented on ZOOKEEPER-1382:
-

Pretty sweet tests. I think this looks good on trunk, will check it in.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-11 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845607#comment-13845607
 ] 

Camille Fournier commented on ZOOKEEPER-1382:
-

Looks good on 3.4 as well, will close this as soon as builds complete 
successfully. Thanks to all who helped with this!

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845636#comment-13845636
 ] 

Hudson commented on ZOOKEEPER-1382:
---

SUCCESS: Integrated in ZooKeeper-trunk #2150 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/2150/])
ZOOKEEPER-1382. Zookeeper server holds onto dead/expired session ids in the 
watch data structures
  (Germán Blanco and Michael Morello via camille) (camille: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550213)
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java
* 
/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/MockPacket.java
* 
/zookeeper/trunk/src/java/test/org/apache/zookeeper/server/MockNIOServerCnxn.java
* 
/zookeeper/trunk/src/java/test/org/apache/zookeeper/server/MockSelectorThread.java
* 
/zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/WatchLeakTest.java


 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841000#comment-13841000
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12617336/ZOOKEEPER-1382.patch
  against trunk revision 1547702.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1818//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1818//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1818//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-04 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838906#comment-13838906
 ] 

Rakesh R commented on ZOOKEEPER-1382:
-

Hi German, 
Patch looks very nice, especially test cases are really good:)

Add few suggestions:
- Please add message while asserting, this will give more clarity, like
{code}
assertEquals(There must be NO watches!, 0, watchCount);
{code}
- Formatting is not proper in MockSelectorThread.java. Normally, between 
methods there will be a newline.
- This is a suggestion and FYI. I would like to use 
@RunWith(Parameterized.class), when there are situations to run the same test 
case with different input values. In the patch, we have testcases with 
sessiontimeout and without sessiontimeout flags. Please refer 
MultiTransactionTest to understand more on this pattern. This way it will again 
make the test cases very simple.


 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-12-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839019#comment-13839019
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

Thank you [~rakeshr], but the test cases are originally from someone else, 
[~barkbay] I believe. And I agree, it seems very difficult to come up with a 
way to test this and he has done a good job.
And thank you for the suggestions, I will try to take care of them in a few 
days and propose a new patch.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-11-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832637#comment-13832637
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

[~fpj], could you please take a look at this and say if it is ok now?

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-11-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821233#comment-13821233
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

I forgot to thank [~rgs] for reviewing the patch. ... dicen que más vale tarde 
que nunca.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815855#comment-13815855
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12612572/ZOOKEEPER-1382.patch
  against trunk revision 1539529.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1747//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1747//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1747//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-11-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814641#comment-13814641
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

I wouldn't recommend to run zookeeper using ramdisk without having an 
additional method that ensures consistency. Ramdisk means data is not 
persistent after a reboot, and this JIRA shows that losing data in one node 
when the quorum was in minority may lead to permanent inconsistencies in the 
quorum.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-11-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814672#comment-13814672
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

Added a review request for this [https://reviews.apache.org/r/15259/]

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-10-24 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804293#comment-13804293
 ] 

Flavio Junqueira commented on ZOOKEEPER-1382:
-

Given that the Mock* classes are only used in WatchLeakTest, I was wondering if 
it wouldn't be best to have them defined inside the WatchLeakTest class. 

Also, there is a test case not running because @Test is commented out.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805018#comment-13805018
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12610262/ZOOKEEPER-1382.patch
  against trunk revision 1535491.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1722//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch, 
 ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801438#comment-13801438
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12609581/ZOOKEEPER-1382.patch
  against trunk revision 1534390.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1714//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1714//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1714//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Germán Blanco
Priority: Critical
 Fix For: 3.4.6, 3.5.0

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382-branch-3.4.patch, 
 ZOOKEEPER-1382.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-10-07 Thread Antonio (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788048#comment-13788048
 ] 

Antonio commented on ZOOKEEPER-1382:


Hi, is ZooKeeper 3.4.3 affected by this bug?

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 synchronized (this.factory) {
 sk.selector().wakeup();
 enableRecv();
 }
 } catch (Exception e) {
  

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-10-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788227#comment-13788227
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

Yes

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 synchronized (this.factory) {
 sk.selector().wakeup();
 enableRecv();
 }
 } catch (Exception e) {
 LOG.warn(Exception while 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-09-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777620#comment-13777620
 ] 

Germán Blanco commented on ZOOKEEPER-1382:
--

I would like to propose to skip the tests cases for this JIRA.
The modification required is minimal and it seems clearly correct. However, 
trying to build a test that works and that doesn't require future maintenance 
seems to be very complex.
I have done the mock classes as suggested by Flavio above, but the tests don't 
work any more (at least in trunk). My guess is that it is because of latest 
changes in the code. I don't see an easy fix (at least easy for me), and what 
is worse, it seems to me that these test cases will require review very often, 
since they depend very deeply on how things are implemented.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598293#comment-13598293
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12572967/ZOOKEEPER-1382.patch
  against trunk revision 1453693.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1427//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1427//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1427//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-10 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598458#comment-13598458
 ] 

Camille Fournier commented on ZOOKEEPER-1382:
-

I can't even get the TruncateCorruptionTest to run to completion or failure at 
all with or without this patch. I think we need to open a separate bug report 
on it. Besides that, this fix looks good to me.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-10 Thread Alexander Shraer (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598463#comment-13598463
 ] 

Alexander Shraer commented on ZOOKEEPER-1382:
-

I've opened a bug for this some time ago: 
https://issues.apache.org/jira/browse/ZOOKEEPER-1629
perhaps we should disable this test for now ? for example, if one of the Java 
tests fails, the C tests don't run at all.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-08 Thread Michael Morello (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13596979#comment-13596979
 ] 

Michael Morello commented on ZOOKEEPER-1382:


My bad, I have not spend enough time on the new branch, you are right. The same 
fix could be applied.
Nevertheless the unit test is no more relevant.
I'll try to work on that.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 synchronized 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-05 Thread Thawan Kooburat (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594399#comment-13594399
 ] 

Thawan Kooburat commented on ZOOKEEPER-1382:


Can you explain a bit more why we cannot rely on disableRecv() / enableRecv() 
in 3.5 to fix this issue? 

From my understanding, although the request is now process in non-blocking 
fashion, we are calling disableRecv() while processing a ConnectRequest 
packet, so the SelectorThread won't try to select this socket again for 
reading after we finished processing the packet. 



 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-04 Thread Michael Morello (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592254#comment-13592254
 ] 

Michael Morello commented on ZOOKEEPER-1382:


It seems that the behavior of disableRecv() / enableRecv() has changed a bit in 
3.5 : there is a new thread (SelectorThread) in charge of interest op updates 
and updates are processed in a non blocking fashion (updates are queued to be 
handled later by the SelectorThread, if i understand correctly it is more a 
throttle mechanism than something else). The unit test is no more relevant.

If we want to fix the two branches (3.4 and 3.5) it is imho no more a good idea 
to rely on enableRecv().
May be it would be cleaner to introduce a new function 
ServerCnxn.setValidSession(boolean valid) that informs the implementation 
(NIOServerCnxn or NettyServerCnxn) that the session has been (re)validated and 
then allow subsequent messages to be processed.

If you don't mind i will try to provide an other patch and a unit test for both 
branches.




 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591848#comment-13591848
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12571815/ZOOKEEPER-1382.patch
  against trunk revision 1448007.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1419//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, ZOOKEEPER-1382.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13591854#comment-13591854
 ] 

Hadoop QA commented on ZOOKEEPER-1382:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12571817/ZOOKEEPER-1382-branch-3.4.patch
  against trunk revision 1448007.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1420//console

This message is automatically generated.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-03-03 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592041#comment-13592041
 ] 

Mahadev konar commented on ZOOKEEPER-1382:
--

Michael,
 Would you be able to upload a patch for trunk as well?

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch, 
 ZOOKEEPER-1382-branch-3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 synchronized (this.factory) {
 sk.selector().wakeup();
 enableRecv();
 }
 } catch 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2013-02-11 Thread Michael Morello (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575670#comment-13575670
 ] 

Michael Morello commented on ZOOKEEPER-1382:


Since we monitor sessions and watches we note that we encounter this problem 
from time to time.
This is not a big issue but it causes a memory leak.
I'll try to see if I can work on it but it does seem hard to reproduce in a 
unit test.

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.4
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
 Fix For: 3.4.6

 Attachments: ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 

[jira] [Commented] (ZOOKEEPER-1382) Zookeeper server holds onto dead/expired session ids in the watch data structures

2012-02-15 Thread Camille Fournier (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208792#comment-13208792
 ] 

Camille Fournier commented on ZOOKEEPER-1382:
-

This is a lot of change for a fix that seems to be really small. Can you put 
this into reviewboard for more careful review? I'm not sure we will want all 
the logging changes so you might want to go through and trim that stuff up 
before putting it up there. Thanks!

 Zookeeper server holds onto dead/expired session ids in the watch data 
 structures
 -

 Key: ZOOKEEPER-1382
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1382
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.3.4
Reporter: Neha Narkhede
Assignee: Neha Narkhede
 Attachments: ZOOKEEPER-1382_3.3.4.patch


 I've observed that zookeeper server holds onto expired session ids in the 
 watcher data structures. The result is the wchp command reports session ids 
 that cannot be found through cons/dump and those expired session ids sit 
 there maybe until the server is restarted. Here are snippets from the client 
 and the server logs that lead to this state, for one particular session id 
 0x134485fd7bcb26f -
 There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 
 and I'm using ZkClient to connect to the cluster
 From the application log -
 application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application Session establishment complete 
 on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, 
 negotiated timeout = 6000
 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] 
 [main-SendThread(223.prod:12913)] [application] Client session timed out, 
 have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing 
 socket connection and attempting reconnect
 application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] 
 [main-SendThread(226.prod:12913)] [application] Unable to reconnect to 
 ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket 
 connection
 On the leader zk, 225 -
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, 
 timeout of 6000ms exceeded
 zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO  
 [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination 
 for sessionid: 0x134485fd7bcb26f
 On the server, the client was initially connected to, 223 -
 zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1580] - Established session 
 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO  
 [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client 
 /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f
 Here are the log snippets from 226, which is the server, the client 
 reconnected to, before getting session expired event -
 2012-01-27 09:52:38,190 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client 
 attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
 2012-01-27 09:52:38,191 - INFO  
 [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 
 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
 2012-01-27 09:52:38,191 - INFO  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed 
 socket connection for client /172.17.136.82:49367 which had sessionid 
 0x134485fd7bcb26f
 wchp output from 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*wchp* | wc -l
 3
 wchp output from 223, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*wchp* | wc -l
 0
 cons output from 223 and 226, taken on 01/30 -
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *226.*cons* | wc -l
 0
 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f 
 *223.*cons* | wc -l
 0
 So, what seems to have happened is that the client was able to re-register 
 the watches on the new server (226), after it got disconnected from 223, 
 inspite of having an expired session id. 
 In NIOServerCnxn, I saw that after suspecting that a session is expired, a 
 server removes the cnxn and its watches from its internal data structures. 
 But before that it allows more requests to be processed even if the session 
 is expired -
 // Now that the session is ready we can start receiving packets
 synchronized