Hi Ben, Comments inline..
On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed <br...@yahoo-inc.com> wrote: > if we can't rely on the clock, we cannot say things like "if ... for 5 > seconds". > > "if ... for 5 seconds" indicates the timeout give by the socket library. After the timeout we can verify that the timeout received was not a side effect of time jump by looking at the number of ping attempts. > also, clients connect to servers, not visa-versa, so we cannot say things > like "server can attempt to reconnect". > In the scenario described below, wouldn't it be ok for the server to just send a ping request to see if the client is really dead? > ben > > > On 08/19/2010 10:17 AM, Vishal K wrote: > >> Hi Ted, >> >> I haven't give it a serious thought yet, but I don't think it is >> neccessary >> for the cluster to keep track of time. >> >> A node can make its own decision. For the sake of argument, lets say that >> we >> have a client and a server with following policy: >> 1. Client is supposed to send a ping to server every 1 sec. >> 2. If server does not hear from client for 5 seconds, then the server >> declares that the client is dead. >> 3. Similary if the client cannot communicate with the server for 5 seconds >> client declares that the server is dead. >> >> If the client receives a timeout (say while doing some IO) because of a >> time >> jump, it should check the number of pings that has failed with the server. >> If the number is 5, then this is a true failure, If the number is less >> than >> 5, then this is because of a time drift. >> >> At the server side, the server can attempt to reconnect (or send a ping to >> the client) after it receives a timeout. Thus, if the timeout occured >> because of time drift, the server will reconnect and continue. We should >> ofcourse have an upper bound in number of retries, etc. >> >> For ZK, it is important to handle time jumps on ZK leader. >> >> >> >>> I believe that the pattern of these problems is a slow slippage behind >>> and >>> a >>> sudden jump forward. >>> >>> >>> >> >> You won't see the slippage. You will mainly see a jump forward. Note with >> large enough number of nodes, multiple nodes could see their time jumping >> forward. Therefore, checking comparing time between two servers may not >> help. >> >> >> >> >>> On Thu, Aug 19, 2010 at 7:51 AM, Vishal K<vishalm...@gmail.com> wrote: >>> >>> >>> >>>> Hi, >>>> >>>> I remember Ben had opened a jira for clock jumps earlier: >>>> https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon >>>> >>>> >>> to >>> >>> >>>> have clocks jump forward in virtualized environments. >>>> >>>> It is desirable to modify ZooKeeper to handle this situation (as much as >>>> possible) internally. It would need to be done for both client - server >>>> connections and server - server connections. One obvious solution is to >>>> retry a few times (send ping) after getting a timeout. Another way is to >>>> count the number of pings that have been sent after receiving the >>>> >>>> >>> timeout. >>> >>> >>>> If number of pings do not match the expected number (say 5 ping attempt >>>> should be finished for a 5 sec timeout), then wait till all the pings >>>> are >>>> finished. In effect do not completely rely on the clock. Any comments? >>>> >>>> -Vishal >>>> >>>> On Thu, Aug 19, 2010 at 3:52 AM, Qing Yan<qing...@gmail.com> wrote: >>>> >>>> >>>> >>>>> Oh.. our servers are also running in a virtualized environment. >>>>> >>>>> On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite<waite....@gmail.com> >>>>> >>>>> >>>> wrote: >>>> >>>> >>>>> >>>>> >>>>>> Hi, >>>>>> >>>>>> I have tripped over similar problems testing Red Hat Cluster in >>>>>> >>>>>> >>>>> virtualised >>>>> >>>>> >>>>>> environments. I don't know whether recent linux kernels have >>>>>> >>>>>> >>>>> improved >>> >>> >>>> their >>>>>> interaction with VMWare, but in our environments clock drift caused >>>>>> >>>>>> >>>>> by >>> >>> >>>> lost >>>>> >>>>> >>>>>> ticks can be substantial, requiring NTP to sometimes jump the clock >>>>>> >>>>>> >>>>> rather >>>>> >>>>> >>>>>> than control acceleration. In one of our internal production rigs, >>>>>> >>>>>> >>>>> the >>>> >>>> >>>>> local NTP servers themselves were virtualised - causing absolute >>>>>> >>>>>> >>>>> mayhem >>> >>> >>>> when >>>>>> heavy loads hit the other guests on the same physical hosts. >>>>>> >>>>>> The effect on RHCS (v2.0) is quite dramatic. A forward jump in time >>>>>> >>>>>> >>>>> by >>> >>> >>>> 10 >>>>> >>>>> >>>>>> seconds always causes a member to prematurely time-out on a network >>>>>> >>>>>> >>>>> read, >>>> >>>> >>>>> causing the member to drop out and trigger a cluster reconfiguration. >>>>>> Apparently NTP is integrated with RHCS version 3, but I don't know >>>>>> >>>>>> >>>>> what >>> >>> >>>> is >>>>> >>>>> >>>>>> meant by that. >>>>>> >>>>>> I guess this post is not entirely relevent to ZK, but I am just >>>>>> >>>>>> >>>>> making >>> >>> >>>> the >>>>> >>>>> >>>>>> point that virtualisation (of NTP servers and or clients) can cause >>>>>> repeated >>>>>> premature timeouts. On Linux, I believe that there is a class of >>>>>> >>>>>> >>>>> timers >>>> >>>> >>>>> provided that is immune to this, but I doubt that there is a platform >>>>>> independent way of coping with this. >>>>>> >>>>>> My 2p. >>>>>> >>>>>> regards, >>>>>> Martin >>>>>> >>>>>> On 18 August 2010 16:53, Patrick Hunt<ph...@apache.org> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Do you expect the time to be "wrong" frequently? If ntp is running >>>>>>> >>>>>>> >>>>>> it >>> >>> >>>> should never get out of sync more than a small amount. As long as >>>>>>> >>>>>>> >>>>>> this >>>> >>>> >>>>> is >>>>> >>>>> >>>>>> less than ~your timeout you should be fine. >>>>>>> >>>>>>> Patrick >>>>>>> >>>>>>> >>>>>>> On 08/18/2010 01:04 AM, Qing Yan wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> The testcase is fairly simple. We have a client which connects >>>>>>>> >>>>>>>> >>>>>>> to >>> >>> >>>> ZK, >>>>> >>>>> >>>>>> registers an ephemeral node and watches on it. Now change the >>>>>>>> >>>>>>>> >>>>>>> client >>> >>> >>>> machine's time - session killed.. >>>>>>>> >>>>>>>> Here is the log: >>>>>>>> >>>>>>>> *2010-08-18 04:24:57,782 INFO >>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Host name >>>>>>>> kgbtest1.corp.alimama.com >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 >>>>>>>> >>>>>>>> >>>>>>> 21:51 >>>> >>>> >>>>> GMT >>>>>> >>>>>> >>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:host.name=kgbtest1.corp.alimama.com >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:java.version=1.6.0_13 >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:java.vendor=Sun Microsystems Inc. >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:java.home=/usr/java/jdk1.6.0_13/jre >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar >>> >>> >>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib >>> >>> >>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:java.io.tmpdir=/tmp >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:java.compiler=<NA> >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:os.name=Linux >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:os.arch=amd64 >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:os.version=2.6.18-164.el5 >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:user.name=admin >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:user.home=/home/admin >>>>>>>> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Client >>> >>> >>>> environment:user.dir=/home/admin/TimeTunnel2/cluster/log >>>>>>>> 2010-08-18 04:24:57,790 INFO org.apache.zookeeper.ZooKeeper: >>>>>>>> >>>>>>>> >>>>>>> Initiating >>>>> >>>>> >>>>>> client connection, connectString= >>>>>>>> >>>>>>>> >>>>>>> xentest10-vm5.corp.alimama.com:2181, >>>> >>>> >>>>> xentest10-vm6.corp.alimama.com:2181, >>>>>>>> >>>>>>>> >>>>>>> xentest10-vm9.corp.alimama.com:2181 >>>>> >>>>> >>>>>> sessionTimeout=600000 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> watcher=com.taobao.timetunnel2.cluster.service.agentserv...@48d6c16c >>> >>> >>>> 2010-08-18 04:24:57,791 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> zookeeper.disableAutoWatchReset is false >>>>>>>> 2010-08-18 04:24:57,817 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Attempting >>>>> >>>>> >>>>>> connection to server >>>>>>>> >>>>>>>> >>>>>>> xentest10-vm5.corp.alimama.com/10.254.5.160:2181 >>>> >>>> >>>>> 2010-08-18 04:24:57,818 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Priming >>>> >>>> >>>>> connection to java.nio.channels.SocketChannel[connected local=/ >>>>>>>> 10.254.2.160:48536 remote= >>>>>>>> xentest10-vm5.corp.alimama.com/10.254.5.160:2181] >>>>>>>> 2010-08-18 04:24:57,821 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Server >>> >>> >>>> connection successful >>>>>>>> 2010-08-18 04:24:57,832 INFO >>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Register... >>>>>>>> 2010-08-18 04:24:57,834 DEBUG org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Reading >>>> >>>> >>>>> reply >>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null >>>>>>>> >>>>>>>> >>>>>>> serverPath:null >>>> >>>> >>>>> finished:false header:: -8,101 replyHeader:: -8,-1,0 request:: >>>>>>>> 0,v{},v{},v{} response:: null >>>>>>>> 2010-08-18 04:24:57,846 DEBUG org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Reading >>>> >>>> >>>>> reply >>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null >>>>>>>> >>>>>>>> >>>>>>> serverPath:null >>>> >>>> >>>>> finished:false header:: 1,1 replyHeader:: 1,8589934649,-110 >>>>>>>> >>>>>>>> >>>>>>> request:: >>>>> >>>>> >>>>>> '/agent,,v{s{31,s{'world,'anyone}}},0 response:: >>>>>>>> 2010-08-18 04:24:57,853 DEBUG >>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: >>>>>>>> >>>>>>>> >>>>>>> Path >>> >>> >>>> /agent >>>>>>>> already exist >>>>>>>> 2010-08-18 04:24:57,882 DEBUG org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Reading >>>> >>>> >>>>> reply >>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null >>>>>>>> >>>>>>>> >>>>>>> serverPath:null >>>> >>>> >>>>> finished:false header:: 2,1 replyHeader:: 2,8589934650,-110 >>>>>>>> >>>>>>>> >>>>>>> request:: >>>>> >>>>> >>>>>> '/agent/status,,v{s{31,s{'world,'anyone}}},0 response:: >>>>>>>> 2010-08-18 04:24:57,882 DEBUG >>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: >>>>>>>> >>>>>>>> >>>>>>> Path >>> >>> >>>> /agent/status already exist >>>>>>>> 2010-08-18 04:24:57,886 DEBUG org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Reading >>>> >>>> >>>>> reply >>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null >>>>>>>> >>>>>>>> >>>>>>> serverPath:null >>>> >>>> >>>>> finished:false header:: 3,1 replyHeader:: 3,8589934651,0 >>>>>>>> >>>>>>>> >>>>>>> request:: >>> >>> >>>> '/agent/status/kgbtest1.corp.alimama.com >>>>>>>> >>>>>>>> >>>>>>> ,,v{s{31,s{'world,'anyone}}},1 >>>>> >>>>> >>>>>> response:: '/agent/status/kgbtest1.corp.alimama.com >>>>>>>> 2010-08-18 04:24:57,886 DEBUG >>>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: >>>>>>>> >>>>>>>> >>>>>>> Path >>> >>> >>>> /agent/status/kgbtest1.corp.alimama.com creation OK >>>>>>>> 2010-08-18 04:24:57,886 INFO >>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Watch for -> >>>>>>>> /agent/status/kgbtest1.corp.alimama.com >>>>>>>> 2010-08-18 04:24:57,892 DEBUG org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Reading >>>> >>>> >>>>> reply >>>>>>>> sessionid:0x12a8413f71a0003, packet:: clientPath:null >>>>>>>> >>>>>>>> >>>>>>> serverPath:null >>>> >>>> >>>>> finished:false header:: 4,4 replyHeader:: 4,-1,0 request:: >>>>>>>> '/agent/status/ >>>>>>>> kgbtest1.corp.alimama.com,T response:: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> ,s{8589934651,8589934651,1282116165777,1282116165777,0,0,0,84024764344696835,0,0,8589934651} >>> >>> >>>> 2010-08-18 04:24:57,892 INFO >>>>>>>> com.taobao.timetunnel2.cluster.service.AgentService: Data changed >>>>>>>> @/agent/status/kgbtest1.corp.alimama.com >>>>>>>> 2010-08-18 04:25:11,226 DEBUG org.apache.zookeeper.ClientCnxn: Got >>>>>>>> >>>>>>>> >>>>>>> ping >>>>> >>>>> >>>>>> response for sessionid:0x12a8413f71a0003 after 1ms >>>>>>>> 2010-08-18 04:25:24,560 DEBUG org.apache.zookeeper.ClientCnxn: Got >>>>>>>> >>>>>>>> >>>>>>> ping >>>>> >>>>> >>>>>> response for sessionid:0x12a8413f71a0003 after 2ms >>>>>>>> 2010-08-18 04:25:37,893 DEBUG org.apache.zookeeper.ClientCnxn: Got >>>>>>>> >>>>>>>> >>>>>>> ping >>>>> >>>>> >>>>>> response for sessionid:0x12a8413f71a0003 after 1ms >>>>>>>> 2010-08-18 04:19:44,853 DEBUG org.apache.zookeeper.ClientCnxn: Got >>>>>>>> notification sessionid:0x12a8413f71a0003 >>>>>>>> 2010-08-18 04:19:44,854 DEBUG org.apache.zookeeper.ClientCnxn: Got >>>>>>>> WatchedEvent: Znode change. Path: >>>>>>>> /agent/status/kgbtest1.corp.alimama.comType: NodeDeleted for >>>>>>>> >>>>>>>> >>>>>>> sessionid >>>> >>>> >>>>> 0x12a8413f71a0003 >>>>>>>> 2010-08-18 04:19:44,854 DEBUG >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher: >>> >>> >>>> getChildren event =>WatchedEvent: Znode change. Path: >>>>>>>> >>>>>>>> >>>>>>> /agent/status/ >>> >>> >>>> kgbtest1.corp.alimama.com Type: NodeDeleted >>>>>>>> 2010-08-18 04:19:44,855 WARN org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Exception >>>>> >>>>> >>>>>> closing session 0x12a8413f71a0003 to >>>>>>>> >>>>>>>> >>>>>>> sun.nio.ch.selectionkeyi...@7ca3d4cf >>>>>> >>>>>> >>>>>>> java.io.IOException: Read error rc = -1 >>>>>>>> >>>>>>>> >>>>>>> java.nio.DirectByteBuffer[pos=0 >>>>> >>>>> >>>>>> lim=4 cap=4] >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):701) >>> >>> >>>> at >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945) >>> >>> >>>> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Attempting >>>>> >>>>> >>>>>> connection to server >>>>>>>> >>>>>>>> >>>>>>> xentest10-vm6.corp.alimama.com/10.254.5.161:2181 >>>> >>>> >>>>> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Priming >>>> >>>> >>>>> connection to java.nio.channels.SocketChannel[connected local=/ >>>>>>>> 10.254.2.160:45776 remote= >>>>>>>> xentest10-vm6.corp.alimama.com/10.254.5.161:2181] >>>>>>>> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Server >>> >>> >>>> connection successful >>>>>>>> 2010-08-18 04:19:45,593 WARN org.apache.zookeeper.ClientCnxn: >>>>>>>> >>>>>>>> >>>>>>> Exception >>>>> >>>>> >>>>>> closing session 0x12a8413f71a0003 to >>>>>>>> >>>>>>>> >>>>>>> sun.nio.ch.selectionkeyi...@2fdb7df8 >>>>>> >>>>>> >>>>>>> java.io.IOException: Session Expired >>>>>>>> at >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(zookeeper:ClientCnxn.java):589) >>> >>> >>>> at >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):709) >>> >>> >>>> at >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945) >>> >>> >>>> * I guess ZK might rely on timestamp to keep sessions alive, >>>>>>>> >>>>>>>> >>>>>>> but >>> >>> >>>> we >>>> >>>> >>>>> have >>>>>>>> NTP daemon running so machine time can get changed >>>>>>>> automatically, is there a conflict? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >