if we can't rely on the clock, we cannot say things like "if ... for 5 seconds".

also, clients connect to servers, not visa-versa, so we cannot say things like "server can attempt to reconnect".


ben

On 08/19/2010 10:17 AM, Vishal K wrote:
Hi Ted,

I haven't give it a serious thought yet, but I don't think it is neccessary
for the cluster to keep track of time.

A node can make its own decision. For the sake of argument, lets say that we
have a client and a server with following policy:
1. Client is supposed to send a ping to server every 1 sec.
2. If server does not hear from client for 5 seconds, then the server
declares that the client is dead.
3. Similary if the client cannot communicate with the server for 5 seconds
client declares that the server is dead.

If the client receives a timeout (say while doing some IO) because of a time
jump, it should check the number of pings that has failed with the server.
If the number is 5, then this is a true failure, If the number is less than
5, then this is because of a time drift.

At the server side, the server can attempt to reconnect (or send a ping to
the client) after it receives a timeout. Thus, if the timeout occured
because of time drift, the server will reconnect and continue. We should
ofcourse have an upper bound in number of retries, etc.

For ZK, it is important to handle time jumps on ZK leader.

I believe that the pattern of these problems is a slow slippage behind and
a
sudden jump forward.


You won't see the slippage. You will mainly see a jump forward. Note with
large enough number of nodes, multiple nodes could see their time jumping
forward. Therefore, checking comparing time between two servers may not
help.


On Thu, Aug 19, 2010 at 7:51 AM, Vishal K<vishalm...@gmail.com>  wrote:

Hi,

I remember Ben had opened a jira for clock jumps earlier:
https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon
to
have clocks jump forward in virtualized environments.

It is desirable to modify ZooKeeper to handle this situation (as much as
possible) internally. It would need to be done for both client - server
connections and server - server connections. One obvious solution is to
retry a few times (send ping) after getting a timeout. Another way is to
count the number of pings that have been sent after receiving the
timeout.
If number of pings do not match the expected number (say 5 ping attempt
should be finished for a 5 sec timeout), then wait till all the pings are
finished. In effect do not completely rely on the clock. Any comments?

-Vishal

On Thu, Aug 19, 2010 at 3:52 AM, Qing Yan<qing...@gmail.com>  wrote:

Oh.. our servers are also running in a virtualized environment.

On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite<waite....@gmail.com>
wrote:
Hi,

I have tripped over similar problems testing Red Hat Cluster in
virtualised
environments.  I don't know whether recent linux kernels have
improved
their
interaction with VMWare, but in our environments clock drift caused
by
lost
ticks can be substantial, requiring NTP to sometimes jump the clock
rather
than control acceleration.   In one of our internal production rigs,
the
local NTP servers themselves were virtualised - causing absolute
mayhem
when
heavy loads hit the other guests on the same physical hosts.

The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time
by
10
seconds always causes a member to prematurely time-out on a network
read,
causing the member to drop out and trigger a cluster reconfiguration.
Apparently NTP is integrated with RHCS version 3, but I don't know
what
is
meant by that.

I guess this post is not entirely relevent to ZK, but I am just
making
the
point that virtualisation (of NTP servers and or clients) can cause
repeated
premature timeouts.  On Linux, I believe that there is a class of
timers
provided that is immune to this, but I doubt that there is a platform
independent way of coping with this.

My 2p.

regards,
Martin

On 18 August 2010 16:53, Patrick Hunt<ph...@apache.org>  wrote:

Do you expect the time to be "wrong" frequently? If ntp is running
it
should never get out of sync more than a small amount. As long as
this
is
less than ~your timeout you should be fine.

Patrick


On 08/18/2010 01:04 AM, Qing Yan wrote:

Hi,

    The testcase is fairly simple. We have a client which connects
to
ZK,
registers an ephemeral node and watches on it. Now change the
client
machine's time - session killed..

    Here is the log:

*2010-08-18 04:24:57,782 INFO
com.taobao.timetunnel2.cluster.service.AgentService: Host name
kgbtest1.corp.alimama.com
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:zookeeper.version=3.2.2-888565, built on 12/08/2009
21:51
GMT
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:host.name=kgbtest1.corp.alimama.com
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:java.version=1.6.0_13
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:java.vendor=Sun Microsystems Inc.
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:java.home=/usr/java/jdk1.6.0_13/jre
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client

environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client

environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:java.io.tmpdir=/tmp
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:java.compiler=<NA>
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:os.name=Linux
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:os.arch=amd64
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:os.version=2.6.18-164.el5
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:user.name=admin
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:user.home=/home/admin
2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper:
Client
environment:user.dir=/home/admin/TimeTunnel2/cluster/log
2010-08-18 04:24:57,790 INFO org.apache.zookeeper.ZooKeeper:
Initiating
client connection, connectString=
xentest10-vm5.corp.alimama.com:2181,
xentest10-vm6.corp.alimama.com:2181,
xentest10-vm9.corp.alimama.com:2181
sessionTimeout=600000

watcher=com.taobao.timetunnel2.cluster.service.agentserv...@48d6c16c
2010-08-18 04:24:57,791 INFO org.apache.zookeeper.ClientCnxn:
zookeeper.disableAutoWatchReset is false
2010-08-18 04:24:57,817 INFO org.apache.zookeeper.ClientCnxn:
Attempting
connection to server
xentest10-vm5.corp.alimama.com/10.254.5.160:2181
2010-08-18 04:24:57,818 INFO org.apache.zookeeper.ClientCnxn:
Priming
connection to java.nio.channels.SocketChannel[connected local=/
10.254.2.160:48536 remote=
xentest10-vm5.corp.alimama.com/10.254.5.160:2181]
2010-08-18 04:24:57,821 INFO org.apache.zookeeper.ClientCnxn:
Server
connection successful
2010-08-18 04:24:57,832 INFO
com.taobao.timetunnel2.cluster.service.AgentService: Register...
2010-08-18 04:24:57,834 DEBUG org.apache.zookeeper.ClientCnxn:
Reading
reply
sessionid:0x12a8413f71a0003, packet:: clientPath:null
serverPath:null
finished:false header:: -8,101  replyHeader:: -8,-1,0  request::
0,v{},v{},v{}  response:: null
2010-08-18 04:24:57,846 DEBUG org.apache.zookeeper.ClientCnxn:
Reading
reply
sessionid:0x12a8413f71a0003, packet:: clientPath:null
serverPath:null
finished:false header:: 1,1  replyHeader:: 1,8589934649,-110
  request::
'/agent,,v{s{31,s{'world,'anyone}}},0  response::
2010-08-18 04:24:57,853 DEBUG
com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator:
Path
/agent
already exist
2010-08-18 04:24:57,882 DEBUG org.apache.zookeeper.ClientCnxn:
Reading
reply
sessionid:0x12a8413f71a0003, packet:: clientPath:null
serverPath:null
finished:false header:: 2,1  replyHeader:: 2,8589934650,-110
  request::
'/agent/status,,v{s{31,s{'world,'anyone}}},0  response::
2010-08-18 04:24:57,882 DEBUG
com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator:
Path
/agent/status already exist
2010-08-18 04:24:57,886 DEBUG org.apache.zookeeper.ClientCnxn:
Reading
reply
sessionid:0x12a8413f71a0003, packet:: clientPath:null
serverPath:null
finished:false header:: 3,1  replyHeader:: 3,8589934651,0
  request::
'/agent/status/kgbtest1.corp.alimama.com
,,v{s{31,s{'world,'anyone}}},1
response:: '/agent/status/kgbtest1.corp.alimama.com
2010-08-18 04:24:57,886 DEBUG
com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator:
Path
/agent/status/kgbtest1.corp.alimama.com creation OK
2010-08-18 04:24:57,886 INFO
com.taobao.timetunnel2.cluster.service.AgentService: Watch for ->
/agent/status/kgbtest1.corp.alimama.com
2010-08-18 04:24:57,892 DEBUG org.apache.zookeeper.ClientCnxn:
Reading
reply
sessionid:0x12a8413f71a0003, packet:: clientPath:null
serverPath:null
finished:false header:: 4,4  replyHeader:: 4,-1,0  request::
'/agent/status/
kgbtest1.corp.alimama.com,T  response::


,s{8589934651,8589934651,1282116165777,1282116165777,0,0,0,84024764344696835,0,0,8589934651}
2010-08-18 04:24:57,892 INFO
com.taobao.timetunnel2.cluster.service.AgentService: Data changed
@/agent/status/kgbtest1.corp.alimama.com
2010-08-18 04:25:11,226 DEBUG org.apache.zookeeper.ClientCnxn: Got
ping
response for sessionid:0x12a8413f71a0003 after 1ms
2010-08-18 04:25:24,560 DEBUG org.apache.zookeeper.ClientCnxn: Got
ping
response for sessionid:0x12a8413f71a0003 after 2ms
2010-08-18 04:25:37,893 DEBUG org.apache.zookeeper.ClientCnxn: Got
ping
response for sessionid:0x12a8413f71a0003 after 1ms
2010-08-18 04:19:44,853 DEBUG org.apache.zookeeper.ClientCnxn: Got
notification sessionid:0x12a8413f71a0003
2010-08-18 04:19:44,854 DEBUG org.apache.zookeeper.ClientCnxn: Got
WatchedEvent: Znode change. Path:
/agent/status/kgbtest1.corp.alimama.comType: NodeDeleted for
sessionid
0x12a8413f71a0003
2010-08-18 04:19:44,854 DEBUG

com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher:
getChildren event =>WatchedEvent: Znode change. Path:
/agent/status/
kgbtest1.corp.alimama.com Type: NodeDeleted
2010-08-18 04:19:44,855 WARN org.apache.zookeeper.ClientCnxn:
Exception
closing session 0x12a8413f71a0003 to
sun.nio.ch.selectionkeyi...@7ca3d4cf
java.io.IOException: Read error rc = -1
java.nio.DirectByteBuffer[pos=0
lim=4 cap=4]
     at


org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):701)
     at


org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
Attempting
connection to server
xentest10-vm6.corp.alimama.com/10.254.5.161:2181
2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
Priming
connection to java.nio.channels.SocketChannel[connected local=/
10.254.2.160:45776 remote=
xentest10-vm6.corp.alimama.com/10.254.5.161:2181]
2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
Server
connection successful
2010-08-18 04:19:45,593 WARN org.apache.zookeeper.ClientCnxn:
Exception
closing session 0x12a8413f71a0003 to
sun.nio.ch.selectionkeyi...@2fdb7df8
java.io.IOException: Session Expired
     at


org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(zookeeper:ClientCnxn.java):589)
     at


org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):709)
     at


org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
*   I guess ZK might rely on timestamp to  keep sessions alive,
but
we
have
NTP daemon running so machine time can get changed
automatically, is there a conflict?



Reply via email to