Hi Ben,

Comments inline..

On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> if we can't rely on the clock, we cannot say things like "if ... for 5
> seconds".
>
>
"if ... for 5 seconds" indicates the timeout give by the socket library.
After the timeout we can verify that the timeout received was not a side
effect of time jump by looking at the number of ping attempts.



> also, clients connect to servers, not visa-versa, so we cannot say things
> like "server can attempt to reconnect".
>

In the scenario described below, wouldn't it be ok for the server to just
send a ping request to see if the client is really dead?


> ben
>
>
> On 08/19/2010 10:17 AM, Vishal K wrote:
>
>> Hi Ted,
>>
>> I haven't give it a serious thought yet, but I don't think it is
>> neccessary
>> for the cluster to keep track of time.
>>
>> A node can make its own decision. For the sake of argument, lets say that
>> we
>> have a client and a server with following policy:
>> 1. Client is supposed to send a ping to server every 1 sec.
>> 2. If server does not hear from client for 5 seconds, then the server
>> declares that the client is dead.
>> 3. Similary if the client cannot communicate with the server for 5 seconds
>> client declares that the server is dead.
>>
>> If the client receives a timeout (say while doing some IO) because of a
>> time
>> jump, it should check the number of pings that has failed with the server.
>> If the number is 5, then this is a true failure, If the number is less
>> than
>> 5, then this is because of a time drift.
>>
>> At the server side, the server can attempt to reconnect (or send a ping to
>> the client) after it receives a timeout. Thus, if the timeout occured
>> because of time drift, the server will reconnect and continue. We should
>> ofcourse have an upper bound in number of retries, etc.
>>
>> For ZK, it is important to handle time jumps on ZK leader.
>>
>>
>>
>>> I believe that the pattern of these problems is a slow slippage behind
>>> and
>>> a
>>> sudden jump forward.
>>>
>>>
>>>
>>
>> You won't see the slippage. You will mainly see a jump forward. Note with
>> large enough number of nodes, multiple nodes could see their time jumping
>> forward. Therefore, checking comparing time between two servers may not
>> help.
>>
>>
>>
>>
>>> On Thu, Aug 19, 2010 at 7:51 AM, Vishal K<vishalm...@gmail.com>  wrote:
>>>
>>>
>>>
>>>> Hi,
>>>>
>>>> I remember Ben had opened a jira for clock jumps earlier:
>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon
>>>>
>>>>
>>> to
>>>
>>>
>>>> have clocks jump forward in virtualized environments.
>>>>
>>>> It is desirable to modify ZooKeeper to handle this situation (as much as
>>>> possible) internally. It would need to be done for both client - server
>>>> connections and server - server connections. One obvious solution is to
>>>> retry a few times (send ping) after getting a timeout. Another way is to
>>>> count the number of pings that have been sent after receiving the
>>>>
>>>>
>>> timeout.
>>>
>>>
>>>> If number of pings do not match the expected number (say 5 ping attempt
>>>> should be finished for a 5 sec timeout), then wait till all the pings
>>>> are
>>>> finished. In effect do not completely rely on the clock. Any comments?
>>>>
>>>> -Vishal
>>>>
>>>> On Thu, Aug 19, 2010 at 3:52 AM, Qing Yan<qing...@gmail.com>  wrote:
>>>>
>>>>
>>>>
>>>>> Oh.. our servers are also running in a virtualized environment.
>>>>>
>>>>> On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite<waite....@gmail.com>
>>>>>
>>>>>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have tripped over similar problems testing Red Hat Cluster in
>>>>>>
>>>>>>
>>>>> virtualised
>>>>>
>>>>>
>>>>>> environments.  I don't know whether recent linux kernels have
>>>>>>
>>>>>>
>>>>> improved
>>>
>>>
>>>> their
>>>>>> interaction with VMWare, but in our environments clock drift caused
>>>>>>
>>>>>>
>>>>> by
>>>
>>>
>>>> lost
>>>>>
>>>>>
>>>>>> ticks can be substantial, requiring NTP to sometimes jump the clock
>>>>>>
>>>>>>
>>>>> rather
>>>>>
>>>>>
>>>>>> than control acceleration.   In one of our internal production rigs,
>>>>>>
>>>>>>
>>>>> the
>>>>
>>>>
>>>>> local NTP servers themselves were virtualised - causing absolute
>>>>>>
>>>>>>
>>>>> mayhem
>>>
>>>
>>>> when
>>>>>> heavy loads hit the other guests on the same physical hosts.
>>>>>>
>>>>>> The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time
>>>>>>
>>>>>>
>>>>> by
>>>
>>>
>>>> 10
>>>>>
>>>>>
>>>>>> seconds always causes a member to prematurely time-out on a network
>>>>>>
>>>>>>
>>>>> read,
>>>>
>>>>
>>>>> causing the member to drop out and trigger a cluster reconfiguration.
>>>>>> Apparently NTP is integrated with RHCS version 3, but I don't know
>>>>>>
>>>>>>
>>>>> what
>>>
>>>
>>>> is
>>>>>
>>>>>
>>>>>> meant by that.
>>>>>>
>>>>>> I guess this post is not entirely relevent to ZK, but I am just
>>>>>>
>>>>>>
>>>>> making
>>>
>>>
>>>> the
>>>>>
>>>>>
>>>>>> point that virtualisation (of NTP servers and or clients) can cause
>>>>>> repeated
>>>>>> premature timeouts.  On Linux, I believe that there is a class of
>>>>>>
>>>>>>
>>>>> timers
>>>>
>>>>
>>>>> provided that is immune to this, but I doubt that there is a platform
>>>>>> independent way of coping with this.
>>>>>>
>>>>>> My 2p.
>>>>>>
>>>>>> regards,
>>>>>> Martin
>>>>>>
>>>>>> On 18 August 2010 16:53, Patrick Hunt<ph...@apache.org>  wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Do you expect the time to be "wrong" frequently? If ntp is running
>>>>>>>
>>>>>>>
>>>>>> it
>>>
>>>
>>>> should never get out of sync more than a small amount. As long as
>>>>>>>
>>>>>>>
>>>>>> this
>>>>
>>>>
>>>>> is
>>>>>
>>>>>
>>>>>> less than ~your timeout you should be fine.
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>>
>>>>>>> On 08/18/2010 01:04 AM, Qing Yan wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>    The testcase is fairly simple. We have a client which connects
>>>>>>>>
>>>>>>>>
>>>>>>> to
>>>
>>>
>>>> ZK,
>>>>>
>>>>>
>>>>>> registers an ephemeral node and watches on it. Now change the
>>>>>>>>
>>>>>>>>
>>>>>>> client
>>>
>>>
>>>> machine's time - session killed..
>>>>>>>>
>>>>>>>>    Here is the log:
>>>>>>>>
>>>>>>>> *2010-08-18 04:24:57,782 INFO
>>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Host name
>>>>>>>> kgbtest1.corp.alimama.com
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:zookeeper.version=3.2.2-888565, built on 12/08/2009
>>>>>>>>
>>>>>>>>
>>>>>>> 21:51
>>>>
>>>>
>>>>> GMT
>>>>>>
>>>>>>
>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:host.name=kgbtest1.corp.alimama.com
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:java.version=1.6.0_13
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:java.vendor=Sun Microsystems Inc.
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:java.home=/usr/java/jdk1.6.0_13/jre
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar
>>>
>>>
>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
>>>
>>>
>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:java.io.tmpdir=/tmp
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:java.compiler=<NA>
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:os.name=Linux
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:os.arch=amd64
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:os.version=2.6.18-164.el5
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:user.name=admin
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:user.home=/home/admin
>>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Client
>>>
>>>
>>>> environment:user.dir=/home/admin/TimeTunnel2/cluster/log
>>>>>>>> 2010-08-18 04:24:57,790 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>
>>>>>>>>
>>>>>>> Initiating
>>>>>
>>>>>
>>>>>> client connection, connectString=
>>>>>>>>
>>>>>>>>
>>>>>>> xentest10-vm5.corp.alimama.com:2181,
>>>>
>>>>
>>>>> xentest10-vm6.corp.alimama.com:2181,
>>>>>>>>
>>>>>>>>
>>>>>>> xentest10-vm9.corp.alimama.com:2181
>>>>>
>>>>>
>>>>>> sessionTimeout=600000
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> watcher=com.taobao.timetunnel2.cluster.service.agentserv...@48d6c16c
>>>
>>>
>>>> 2010-08-18 04:24:57,791 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>> zookeeper.disableAutoWatchReset is false
>>>>>>>> 2010-08-18 04:24:57,817 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Attempting
>>>>>
>>>>>
>>>>>> connection to server
>>>>>>>>
>>>>>>>>
>>>>>>> xentest10-vm5.corp.alimama.com/10.254.5.160:2181
>>>>
>>>>
>>>>> 2010-08-18 04:24:57,818 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Priming
>>>>
>>>>
>>>>> connection to java.nio.channels.SocketChannel[connected local=/
>>>>>>>> 10.254.2.160:48536 remote=
>>>>>>>> xentest10-vm5.corp.alimama.com/10.254.5.160:2181]
>>>>>>>> 2010-08-18 04:24:57,821 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Server
>>>
>>>
>>>> connection successful
>>>>>>>> 2010-08-18 04:24:57,832 INFO
>>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Register...
>>>>>>>> 2010-08-18 04:24:57,834 DEBUG org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Reading
>>>>
>>>>
>>>>> reply
>>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null
>>>>>>>>
>>>>>>>>
>>>>>>> serverPath:null
>>>>
>>>>
>>>>> finished:false header:: -8,101  replyHeader:: -8,-1,0  request::
>>>>>>>> 0,v{},v{},v{}  response:: null
>>>>>>>> 2010-08-18 04:24:57,846 DEBUG org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Reading
>>>>
>>>>
>>>>> reply
>>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null
>>>>>>>>
>>>>>>>>
>>>>>>> serverPath:null
>>>>
>>>>
>>>>> finished:false header:: 1,1  replyHeader:: 1,8589934649,-110
>>>>>>>>
>>>>>>>>
>>>>>>>  request::
>>>>>
>>>>>
>>>>>> '/agent,,v{s{31,s{'world,'anyone}}},0  response::
>>>>>>>> 2010-08-18 04:24:57,853 DEBUG
>>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator:
>>>>>>>>
>>>>>>>>
>>>>>>> Path
>>>
>>>
>>>> /agent
>>>>>>>> already exist
>>>>>>>> 2010-08-18 04:24:57,882 DEBUG org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Reading
>>>>
>>>>
>>>>> reply
>>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null
>>>>>>>>
>>>>>>>>
>>>>>>> serverPath:null
>>>>
>>>>
>>>>> finished:false header:: 2,1  replyHeader:: 2,8589934650,-110
>>>>>>>>
>>>>>>>>
>>>>>>>  request::
>>>>>
>>>>>
>>>>>> '/agent/status,,v{s{31,s{'world,'anyone}}},0  response::
>>>>>>>> 2010-08-18 04:24:57,882 DEBUG
>>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator:
>>>>>>>>
>>>>>>>>
>>>>>>> Path
>>>
>>>
>>>> /agent/status already exist
>>>>>>>> 2010-08-18 04:24:57,886 DEBUG org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Reading
>>>>
>>>>
>>>>> reply
>>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null
>>>>>>>>
>>>>>>>>
>>>>>>> serverPath:null
>>>>
>>>>
>>>>> finished:false header:: 3,1  replyHeader:: 3,8589934651,0
>>>>>>>>
>>>>>>>>
>>>>>>>  request::
>>>
>>>
>>>> '/agent/status/kgbtest1.corp.alimama.com
>>>>>>>>
>>>>>>>>
>>>>>>> ,,v{s{31,s{'world,'anyone}}},1
>>>>>
>>>>>
>>>>>> response:: '/agent/status/kgbtest1.corp.alimama.com
>>>>>>>> 2010-08-18 04:24:57,886 DEBUG
>>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator:
>>>>>>>>
>>>>>>>>
>>>>>>> Path
>>>
>>>
>>>> /agent/status/kgbtest1.corp.alimama.com creation OK
>>>>>>>> 2010-08-18 04:24:57,886 INFO
>>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Watch for ->
>>>>>>>> /agent/status/kgbtest1.corp.alimama.com
>>>>>>>> 2010-08-18 04:24:57,892 DEBUG org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Reading
>>>>
>>>>
>>>>> reply
>>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null
>>>>>>>>
>>>>>>>>
>>>>>>> serverPath:null
>>>>
>>>>
>>>>> finished:false header:: 4,4  replyHeader:: 4,-1,0  request::
>>>>>>>> '/agent/status/
>>>>>>>> kgbtest1.corp.alimama.com,T  response::
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> ,s{8589934651,8589934651,1282116165777,1282116165777,0,0,0,84024764344696835,0,0,8589934651}
>>>
>>>
>>>> 2010-08-18 04:24:57,892 INFO
>>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Data changed
>>>>>>>> @/agent/status/kgbtest1.corp.alimama.com
>>>>>>>> 2010-08-18 04:25:11,226 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>
>>>>>>>>
>>>>>>> ping
>>>>>
>>>>>
>>>>>> response for sessionid:0x12a8413f71a0003 after 1ms
>>>>>>>> 2010-08-18 04:25:24,560 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>
>>>>>>>>
>>>>>>> ping
>>>>>
>>>>>
>>>>>> response for sessionid:0x12a8413f71a0003 after 2ms
>>>>>>>> 2010-08-18 04:25:37,893 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>>
>>>>>>>>
>>>>>>> ping
>>>>>
>>>>>
>>>>>> response for sessionid:0x12a8413f71a0003 after 1ms
>>>>>>>> 2010-08-18 04:19:44,853 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>> notification sessionid:0x12a8413f71a0003
>>>>>>>> 2010-08-18 04:19:44,854 DEBUG org.apache.zookeeper.ClientCnxn: Got
>>>>>>>> WatchedEvent: Znode change. Path:
>>>>>>>> /agent/status/kgbtest1.corp.alimama.comType: NodeDeleted for
>>>>>>>>
>>>>>>>>
>>>>>>> sessionid
>>>>
>>>>
>>>>> 0x12a8413f71a0003
>>>>>>>> 2010-08-18 04:19:44,854 DEBUG
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher:
>>>
>>>
>>>> getChildren event =>WatchedEvent: Znode change. Path:
>>>>>>>>
>>>>>>>>
>>>>>>> /agent/status/
>>>
>>>
>>>> kgbtest1.corp.alimama.com Type: NodeDeleted
>>>>>>>> 2010-08-18 04:19:44,855 WARN org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Exception
>>>>>
>>>>>
>>>>>> closing session 0x12a8413f71a0003 to
>>>>>>>>
>>>>>>>>
>>>>>>> sun.nio.ch.selectionkeyi...@7ca3d4cf
>>>>>>
>>>>>>
>>>>>>> java.io.IOException: Read error rc = -1
>>>>>>>>
>>>>>>>>
>>>>>>> java.nio.DirectByteBuffer[pos=0
>>>>>
>>>>>
>>>>>> lim=4 cap=4]
>>>>>>>>     at
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):701)
>>>
>>>
>>>>     at
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
>>>
>>>
>>>> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Attempting
>>>>>
>>>>>
>>>>>> connection to server
>>>>>>>>
>>>>>>>>
>>>>>>> xentest10-vm6.corp.alimama.com/10.254.5.161:2181
>>>>
>>>>
>>>>> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Priming
>>>>
>>>>
>>>>> connection to java.nio.channels.SocketChannel[connected local=/
>>>>>>>> 10.254.2.160:45776 remote=
>>>>>>>> xentest10-vm6.corp.alimama.com/10.254.5.161:2181]
>>>>>>>> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Server
>>>
>>>
>>>> connection successful
>>>>>>>> 2010-08-18 04:19:45,593 WARN org.apache.zookeeper.ClientCnxn:
>>>>>>>>
>>>>>>>>
>>>>>>> Exception
>>>>>
>>>>>
>>>>>> closing session 0x12a8413f71a0003 to
>>>>>>>>
>>>>>>>>
>>>>>>> sun.nio.ch.selectionkeyi...@2fdb7df8
>>>>>>
>>>>>>
>>>>>>> java.io.IOException: Session Expired
>>>>>>>>     at
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(zookeeper:ClientCnxn.java):589)
>>>
>>>
>>>>     at
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):709)
>>>
>>>
>>>>     at
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
>>>
>>>
>>>> *   I guess ZK might rely on timestamp to  keep sessions alive,
>>>>>>>>
>>>>>>>>
>>>>>>> but
>>>
>>>
>>>> we
>>>>
>>>>
>>>>> have
>>>>>>>> NTP daemon running so machine time can get changed
>>>>>>>> automatically, is there a conflict?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>

Reply via email to