[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791583#action_12791583
 ] 

Mahadev konar commented on ZOOKEEPER-628:
-----------------------------------------

qian,
 I was thinking about how this could happen and I think this could be possible 
with some amount of corruption in the zk database. ZOOKEEPER-596 should get rid 
of all those scenarios. Would it be possible for you to upload the zookeeper 
data directory and snapshots for all the servers? It would come in very handy 
to find out what the problem could have been. you can tar and zip it up and 
upload it on the jira.

> the ephemeral node wouldn't disapper due to session close error
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-628
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-628
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.2.1
>         Environment: Linux 2.6.9 x86_64
>            Reporter: Qian Ye
>
> I find a very strange scenario today, I'm not sure how it happen, I just 
> found it like this. Maybe you can give me some information about it, my 
> Zookeeper Server is version 3.2.1.
> My Zookeeper cluster contains three servers, with ip: 
> 10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create ephemeral 
> node under znode: se/diserver_tc. The client runs on the server with ip 
> 10.81.13.173. The client can create a ephemeral node on zookeeper server and 
> write the host ip (10.81.13.173) in to the node as its data. There is only 
> one client process can be running at a time, because the client will listen 
> to a certain port.
> It is strange that I found there were two ephemeral node with the ip 
> 10.81.13.173 under znode se/diserver_tc.
> se/diserver_tc/diserver_tc0000000067
> STAT:
>         czxid: 124554079820
>         mzxid: 124554079820
>         ctime: 1260609598547
>         mtime: 1260609598547
>         version: 0
>         cversion: 0
>         aversion: 0
>         ephemeralOwner: 226627854640480810
>         dataLength: 92
>         numChildren: 0
>         pzxid: 124554079820
> se/diserver_tc/diserver_tc0000000095
> STAT:
>         czxid: 128849019107
>         mzxid: 128849019107
>         ctime: 1260772197356
>         mtime: 1260772197356
>         version: 0
>         cversion: 0
>         aversion: 0
>         ephemeralOwner: 154673159808876591
>         dataLength: 92
>         numChildren: 0
>         pzxid: 128849019107
> There are TWO with different session id! And after I kill the client process 
> on the server 10.81.13.173, the se/diserver_tc/diserver_tc0000000095 node 
> disappear, but the se/diserver_tc/diserver_tc0000000067 stay the same. That 
> means it is not my coding mistake to create the node twice. I checked several 
> times and I'm sure that there is no another client instance running. And I 
> use the 'stat' command to check the three zookeeper servers, and there is no 
> client from 10.81.13.173,
> $echo stat | nc 10.81.12.144 2181   
> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> Clients:
>  /10.81.13.173:35676[1](queued=0,recved=0,sent=0) # it is caused by the nc 
> process
> Latency min/avg/max: 0/3/254
> Received: 11081
> Sent: 0
> Outstanding: 0
> Zxid: 0x1e000001f5
> Mode: follower
> Node count: 32
> $ echo stat | nc 10.81.12.141 2181
> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> Clients:
>  /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
>  /10.81.13.173:35677[1](queued=0,recved=0,sent=0) # it is caused by the nc 
> process
> Latency min/avg/max: 0/0/37
> Received: 37128
> Sent: 0
> Outstanding: 0
> Zxid: 0x1e000001f5
> Mode: follower
> Node count: 26
> $ echo stat | nc 10.81.12.145 2181
> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> Clients:
>  /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
>  /10.81.13.173:35678[1](queued=0,recved=0,sent=0) # it is caused by the nc 
> process
> Latency min/avg/max: 0/2/213
> Received: 26700
> Sent: 0
> Outstanding: 0
> Zxid: 0x1e000001f5
> Mode: leader
> Node count: 26
> The three 'stat' commands show different Node count! 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to