Hi Qian,
 That's unfortunate. I think we lost most of the information. But you can
still grep the logs for the session and, if you find anything, attach it
to the jira.
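
For example (0x32524d5440e022a is the session id in question; the log file
name depends on your log4j setup, zookeeper.log is just a common default):

$ grep 32524d5440e022a zookeeper.log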

This is quite critical, so if you can recreate the scenario with INFO-level
logging enabled, that would be great.


Thanks
mahadev


On 12/15/09 6:32 PM, "Qian Ye" <yeqian....@gmail.com> wrote:

> Sorry, my friend wrote a wrong log4j.properties that only records logs at
> the WARN level and above. Will it help if I correct the log4j.properties
> and restart the zookeeper server on 10.81.12.144? Will the information
> about session 0x32524d5440e022a be recorded that way?
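> 
> For example, would changing the root logger line back to something like
> this be enough (ROLLINGFILE is just the appender name from the sample
> config that ships with zookeeper; ours may differ)?
> 
> # conf/log4j.properties -- record INFO and above instead of only WARN
> log4j.rootLogger=INFO, ROLLINGFILE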
> 
> On Wed, Dec 16, 2009 at 1:46 AM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
> 
>> Hi Qian,
>>  This is quite weird. Are you sure the version is 3.2.1?
>>   If yes, please create a jira for this.
>> 
>>  Also, can you extract the server logs for the session
>> 
>> 
>>>>         ephemeralOwner: 226627854640480810
>> 
>> And post it on a jira? The ephemeralOwner is the session id. You can
>> convert the number above to hex and look through the logs to see what
>> happened to this session, then post the logs on the jira. It looks like
>> the close of session 226627854640480810 wasn't successful (most likely a
>> bug), so we need to trace back what happened when this session was closed
>> and why it did not go away.
>> 
>> Grepping all the server logs for the session id (0x32524d5440e022a, the
>> hex of the decimal number above) might give us some insight into this.
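>> 
>> The conversion is plain decimal to hex, e.g. with the shell's printf (any
>> converter will do):
>> 
>> $ printf '%x\n' 226627854640480810
>> 32524d5440e022a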
>> 
>> 
>> Thanks
>> mahadev
>> 
>> On 12/15/09 7:44 AM, "Benjamin Reed" <br...@yahoo-inc.com> wrote:
>> 
>>> does se/diserver_tc/diserver_tc0000000067 appear on all three servers?
>>> 
>>> ben
>>> 
>>> Qian Ye wrote:
>>>> Hi guys:
>>>> 
>>>> I found a very strange scenario today. I'm not sure how it happened; I
>>>> just found it like this. Maybe you can give me some information about
>>>> it. My Zookeeper server is version 3.2.1.
>>>> 
>>>> My Zookeeper cluster contains three servers, with ips 10.81.12.144,
>>>> 10.81.12.145, and 10.81.12.141. I wrote a client to create an ephemeral
>>>> node under the znode se/diserver_tc. The client runs on the server with
>>>> ip 10.81.13.173; it creates an ephemeral node on the zookeeper server
>>>> and writes the host ip (10.81.13.173) into the node as its data. Only
>>>> one client process can be running at a time, because the client listens
>>>> on a fixed port.
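>>>> 
>>>> In CLI terms the client effectively does the following (just a sketch
>>>> using the zkCli.sh shipped with zookeeper; -s -e asks for a sequential
>>>> ephemeral node, and the server appends the sequence number):
>>>> 
>>>> $ bin/zkCli.sh -server 10.81.12.144:2181
>>>> create -s -e /se/diserver_tc/diserver_tc 10.81.13.173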
>>>> 
>>>> It is strange that I found there were two ephemeral nodes with the ip
>>>> 10.81.13.173 under the znode se/diserver_tc:
>>>> 
>>>> se/diserver_tc/diserver_tc0000000067
>>>> STAT:
>>>>         czxid: 124554079820
>>>>         mzxid: 124554079820
>>>>         ctime: 1260609598547
>>>>         mtime: 1260609598547
>>>>         version: 0
>>>>         cversion: 0
>>>>         aversion: 0
>>>>         ephemeralOwner: 226627854640480810
>>>>         dataLength: 92
>>>>         numChildren: 0
>>>>         pzxid: 124554079820
>>>> 
>>>> se/diserver_tc/diserver_tc0000000095
>>>> STAT:
>>>>         czxid: 128849019107
>>>>         mzxid: 128849019107
>>>>         ctime: 1260772197356
>>>>         mtime: 1260772197356
>>>>         version: 0
>>>>         cversion: 0
>>>>         aversion: 0
>>>>         ephemeralOwner: 154673159808876591
>>>>         dataLength: 92
>>>>         numChildren: 0
>>>>         pzxid: 128849019107
>>>> 
>>>> There are TWO nodes, with different session ids! And after I killed the
>>>> client process on the server 10.81.13.173, the
>>>> se/diserver_tc/diserver_tc0000000095 node disappeared, but
>>>> se/diserver_tc/diserver_tc0000000067 stayed the same. So it is not a
>>>> coding mistake of mine that created the node twice. I checked several
>>>> times and I'm sure there is no other client instance running. I also
>>>> used the 'stat' command on the three zookeeper servers, and there is no
>>>> client from 10.81.13.173:
>>>> 
>>>> $ echo stat | nc 10.81.12.144 2181
>>>> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
>>>> Clients:
>>>>  /10.81.13.173:35676[1](queued=0,recved=0,sent=0)  # caused by the nc process
>>>> 
>>>> Latency min/avg/max: 0/3/254
>>>> Received: 11081
>>>> Sent: 0
>>>> Outstanding: 0
>>>> Zxid: 0x1e000001f5
>>>> Mode: follower
>>>> Node count: 32
>>>> 
>>>> $ echo stat | nc 10.81.12.141 2181
>>>> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
>>>> Clients:
>>>>  /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
>>>>  /10.81.13.173:35677[1](queued=0,recved=0,sent=0)  # caused by the nc process
>>>> 
>>>> Latency min/avg/max: 0/0/37
>>>> Received: 37128
>>>> Sent: 0
>>>> Outstanding: 0
>>>> Zxid: 0x1e000001f5
>>>> Mode: follower
>>>> Node count: 26
>>>> 
>>>> $ echo stat | nc 10.81.12.145 2181
>>>> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
>>>> Clients:
>>>>  /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
>>>>  /10.81.13.173:35678[1](queued=0,recved=0,sent=0)  # caused by the nc process
>>>> 
>>>> Latency min/avg/max: 0/2/213
>>>> Received: 26700
>>>> Sent: 0
>>>> Outstanding: 0
>>>> Zxid: 0x1e000001f5
>>>> Mode: leader
>>>> Node count: 26
>>>> 
>>>> The three 'stat' commands show different node counts! I just cannot
>>>> understand how this happened. Can anyone give me some explanation?
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 
