Re: Session expiration caused by time change

2010-08-20 Thread Martin Waite
Hi,

In our testing of Red Hat Cluster, we could reproduce the NTP impact by
jumping the clock backwards and forwards, just using the date command in a
tight-ish loop:

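use strict;
use warnings;

my $dir = 1;

# Alternate between jumping the clock forwards and backwards.
while (1) {
   jump_time( $dir );
   $dir = $dir * -1;
}

sub jump_time {
  my ($dir) = @_;

  my $step = 20 * $dir;    # size of the jump in seconds
  my $time = scalar localtime( time() + $step );
  # Set the system clock (needs root), then print date's output and exit status.
  print `/bin/date -s "$time"`, $?, "\n";
  select(undef, undef, undef, 0.3);    # sleep for 0.3 seconds
}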

Obviously not a realistic test, but it soon flushes out problems.

regards,
Martin

On 19 August 2010 23:51, Benjamin Reed br...@yahoo-inc.com wrote:

 i'm updating ZOOKEEPER-366 with this discussion and try to get a patch out.
 Qing (or anyone else, can you reproduce it pretty easily?)

 thanx
 ben



Re: Session expiration caused by time change

2010-08-20 Thread Benjamin Reed
i put up a patch that should address the problem. now i need to write a 
test case. the only way i can think of is to change the call to 
System.currentTimeMillis to a utility class that calls 
System.currentTimeMillis that i can mock for testing. any better ideas?


ben
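
A minimal sketch of the utility-class idea, for illustration only. The class name `TimeSource` and its methods are hypothetical and are not taken from the patch; the assumption is that the server would read the clock only through this class, so a test can swap in a clock it controls.

```java
import java.util.function.LongSupplier;

/** Hypothetical utility: the single place where the wall clock is read. */
final class TimeSource {
    // The default source delegates to the real system clock.
    private static volatile LongSupplier source = System::currentTimeMillis;

    private TimeSource() {}

    /** Returns the current time in milliseconds from the active source. */
    static long currentTimeMillis() {
        return source.getAsLong();
    }

    /** Test hook: replace the clock with one the test can move at will. */
    static void setSource(LongSupplier replacement) {
        source = replacement;
    }

    /** Restores the real system clock. */
    static void reset() {
        source = System::currentTimeMillis;
    }
}
```

A test could install a source backed by an `AtomicLong`, add 20 seconds to it to simulate a forward jump, and then check that no session is expired early.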

On 08/19/2010 03:53 PM, Ted Dunning wrote:

Put in a four letter command that will put the server to sleep for 15
seconds!

:-)



Re: Session expiration caused by time change

2010-08-20 Thread Ted Dunning
Mocking the time via a utility was my thought. Mocking System itself is
scary.



Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
You can always increase your timeouts a bit.

On Thu, Aug 19, 2010 at 12:52 AM, Qing Yan qing...@gmail.com wrote:

 Oh.. our servers are also running in a virtualized environment.

 On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite waite@gmail.com wrote:

  Hi,

  I have tripped over similar problems testing Red Hat Cluster in
  virtualised environments.  I don't know whether recent Linux kernels have
  improved their interaction with VMware, but in our environments clock
  drift caused by lost ticks can be substantial, requiring NTP to sometimes
  jump the clock rather than control acceleration.  In one of our internal
  production rigs, the local NTP servers themselves were virtualised -
  causing absolute mayhem when heavy loads hit the other guests on the same
  physical hosts.

  The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time by 10
  seconds always causes a member to prematurely time out on a network read,
  causing the member to drop out and trigger a cluster reconfiguration.
  Apparently NTP is integrated with RHCS version 3, but I don't know what is
  meant by that.

  I guess this post is not entirely relevant to ZK, but I am just making the
  point that virtualisation (of NTP servers and/or clients) can cause
  repeated premature timeouts.  On Linux, I believe that there is a class of
  timers provided that is immune to this, but I doubt that there is a
  platform-independent way of coping with this.

  My 2p.

  regards,
  Martin

  On 18 August 2010 16:53, Patrick Hunt ph...@apache.org wrote:

   Do you expect the time to be wrong frequently? If ntp is running it
   should never get out of sync more than a small amount. As long as this is
   less than ~your timeout you should be fine.

   Patrick


   On 08/18/2010 01:04 AM, Qing Yan wrote:

    Hi,

    The testcase is fairly simple. We have a client which connects to ZK,
    registers an ephemeral node and watches on it. Now change the client
    machine's time - session killed..

    Here is the log:
  
    2010-08-18 04:24:57,782 INFO com.taobao.timetunnel2.cluster.service.AgentService: Host name kgbtest1.corp.alimama.com
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=kgbtest1.corp.alimama.com
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=NA
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-164.el5
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=admin
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/admin
    2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/admin/TimeTunnel2/cluster/log
    2010-08-18 04:24:57,790 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=xentest10-vm5.corp.alimama.com:2181, xentest10-vm6.corp.alimama.com:2181, xentest10-vm9.corp.alimama.com:2181 sessionTimeout=60 watcher=com.taobao.timetunnel2.cluster.service.agentserv...@48d6c16c
    2010-08-18 04:24:57,791 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    2010-08-18

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi,

I remember Ben had opened a jira for clock jumps earlier:
https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to
have clocks jump forward in virtualized environments.

It is desirable to modify ZooKeeper to handle this situation (as much as
possible) internally. It would need to be done for both client - server
connections and server - server connections. One obvious solution is to
retry a few times (send a ping) after getting a timeout. Another way is,
after receiving the timeout, to count the number of pings that have been
sent. If the number of pings does not match the expected number (say, 5 ping
attempts should have finished for a 5 sec timeout), then wait until all the
pings are finished. In effect, do not rely completely on the clock. Any
comments?

-Vishal


Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Another option would be for the cluster to compare times and note when one
member seems to be lagging.  Restoration of that lag would then be less
remarkable.

I believe that the pattern of these problems is a slow slippage behind and a
sudden jump forward.


Re: Session expiration caused by time change

2010-08-19 Thread Martin Waite
Hi,

I'm not sure if you mean the timers I was on about earlier.  If so,
http://linux.die.net/man/3/clock_gettime

  Sufficiently recent versions of GNU libc and the Linux kernel support the
  following clocks:

  ...
  CLOCK_MONOTONIC  Clock that cannot be set and represents monotonic time
  since some unspecified starting point.

Although re-reading that now, I might have applied wishful thinking to my
interpretation.

regards,
Martin
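
For comparison on the JVM side, here is a small sketch that measures elapsed time with `System.nanoTime()`, which the Java documentation describes as meant only for measuring elapsed time and unrelated to wall-clock time. The class name `ElapsedTimer` is hypothetical. Whether the JVM backs `nanoTime()` with `CLOCK_MONOTONIC` on a given platform is an assumption to verify, not something established in this thread.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: measures elapsed time without reading the wall clock. */
final class ElapsedTimer {
    // Reference point; only differences between nanoTime() readings are meaningful.
    private final long startNanos = System.nanoTime();

    /** Milliseconds elapsed since this timer was created. */
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    /** True once at least timeoutMillis have elapsed. */
    boolean hasExpired(long timeoutMillis) {
        return elapsedMillis() >= timeoutMillis;
    }
}
```

Setting the date with `date -s` changes what `System.currentTimeMillis()` returns, but it should not move the difference between two `nanoTime()` readings.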


On 19 August 2010 16:13, Benjamin Reed br...@yahoo-inc.com wrote:

 do you have a pointer to those timers?

 thanx
 ben


 On 08/18/2010 11:58 PM, Martin Waite wrote:

  On Linux, I believe that there is a class of timers
 provided that is immune to this, but I doubt that there is a platform
 independent way of coping with this.






Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
True.  But it knows that there has been a jump.

Quiet time can be distinguished from clock shift by assuming that members of
the cluster don't all jump at the same time.

I would imagine that a recent clock jump estimate could be kept and buckets
that would otherwise expire due to such a jump could be given a bit of a
second lease on life, delaying all of their expiration.  Since time-outs are
relatively short, the server would be able to forget about the bump very
shortly.
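
One loose reading of the "second lease on life" idea, sketched for illustration only. The class `JumpLease`, its fields and the forget window are all hypothetical. The sketch also simplifies the idea by delaying every bucket while the recorded jump is still recent, rather than only the buckets the jump would have expired.

```java
/** Hypothetical record of the most recent clock jump, used to delay expirations. */
final class JumpLease {
    private final long forgetAfterMillis; // how long a recorded jump stays relevant
    private long jumpMillis;              // estimated size of the last forward jump
    private long recordedAt;              // clock reading when the jump was noticed

    JumpLease(long forgetAfterMillis) {
        this.forgetAfterMillis = forgetAfterMillis;
    }

    /** Remember a forward jump of jumpMillis that was noticed at time now. */
    void recordJump(long jumpMillis, long now) {
        this.jumpMillis = jumpMillis;
        this.recordedAt = now;
    }

    /** Expiration time for a bucket, pushed back by the jump while it is still recent. */
    long adjustedExpiration(long bucketExpirationTime, long now) {
        boolean recent = jumpMillis > 0 && now - recordedAt <= forgetAfterMillis;
        return recent ? bucketExpirationTime + jumpMillis : bucketExpirationTime;
    }
}
```

Once `forgetAfterMillis` has passed, the adjustment disappears, which corresponds to the server forgetting about the bump.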

On Thu, Aug 19, 2010 at 8:22 AM, Benjamin Reed br...@yahoo-inc.com wrote:

 if we try to use network messages to detect and correct the situation, it
 seems like we would recreate the problem we are having with ntp, since that
 is exactly what it does.



Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
yes, you are right. we could do this. it turns out that the expiration 
code is very simple:


    while (running) {
        currentTime = System.currentTimeMillis();
        if (nextExpirationTime > currentTime) {
            // the next bucket is not due yet: sleep until it is
            this.wait(nextExpirationTime - currentTime);
            continue;
        }
        SessionSet set;
        set = sessionSets.remove(nextExpirationTime);
        if (set != null) {
            for (SessionImpl s : set.sessions) {
                sessionsById.remove(s.sessionId);
                expirer.expire(s);
            }
        }
        nextExpirationTime += expirationInterval;
    }

so we can detect a jump very easily: if nextExpirationTime >
currentTime, we have jumped ahead in time.


now the question is, what do we do with this information?

option 1) we could figure out the jump (nextExpirationTime-currentTime 
is a good estimate) and move all of the sessions forward by that amount.
option 2) we could converge on the time by having a policy to always 
wait at least a half a tick time.


there probably are other options as well. i kind of like option 2. worst 
case is it will make the sessions expire in half the time that they 
should, but this shouldn't be too much of a problem since clients send a 
ping if they are idle for 1/3 of their session timeout.


ben
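
One possible reading of option 2, sketched as a pure function so it can be checked in isolation. The class and method names are hypothetical, and this is not the patch itself. The assumption is that the expiration loop sleeps for the returned time before expiring each bucket, with the tick being `expirationInterval`.

```java
/** Hypothetical helper illustrating "always wait at least half a tick". */
final class ExpirationWait {
    private ExpirationWait() {}

    /**
     * How long the expiration loop sleeps before expiring the next bucket.
     * The wait is never shorter than half of expirationInterval, so a forward
     * clock jump cannot expire many buckets back to back.
     */
    static long waitMillis(long nextExpirationTime, long currentTime, long expirationInterval) {
        long untilDue = nextExpirationTime - currentTime; // negative after a forward jump
        return Math.max(untilDue, expirationInterval / 2);
    }
}
```

After a forward jump, each pass advances `nextExpirationTime` by a full interval while only half an interval of real time goes by, so the loop converges on the new clock and sessions expire in no less than half their usual time.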


Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Nice (modulo inverting the > in your text).

Option 2 seems very simple.  That always attracts me.


Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ted,

I haven't given it serious thought yet, but I don't think it is necessary
for the cluster to keep track of time.

A node can make its own decision. For the sake of argument, let's say that we
have a client and a server with the following policy:
1. The client is supposed to send a ping to the server every 1 sec.
2. If the server does not hear from the client for 5 seconds, then the server
declares that the client is dead.
3. Similarly, if the client cannot communicate with the server for 5 seconds,
the client declares that the server is dead.

If the client receives a timeout (say while doing some IO) because of a time
jump, it should check the number of pings to the server that have failed.
If the number is 5, then this is a true failure. If the number is less than
5, then the timeout was caused by time drift.

At the server side, the server can attempt to reconnect (or send a ping to
the client) after it receives a timeout. Thus, if the timeout occurred
because of time drift, the server will reconnect and continue. We should
of course have an upper bound on the number of retries, etc.

For ZK, it is important to handle time jumps on ZK leader.
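
The ping-counting check in the policy above could be sketched as follows. The class `PingFailureCheck` and its methods are hypothetical, and the expected count is assumed to be the timeout divided by the ping interval (5 for a 5 sec timeout with 1 sec pings).

```java
/** Hypothetical tracker for the ping-counting check. */
final class PingFailureCheck {
    private final int expectedPings; // pings that fit into one timeout period
    private int failedPings;         // consecutive pings that got no reply

    PingFailureCheck(long timeoutMillis, long pingIntervalMillis) {
        this.expectedPings = (int) (timeoutMillis / pingIntervalMillis);
    }

    /** Record a ping that got no reply. */
    void pingFailed() {
        failedPings++;
    }

    /** Record a reply from the peer; the failure streak starts over. */
    void pingSucceeded() {
        failedPings = 0;
    }

    /** Called when the clock-based timeout fires: true only if enough pings really failed. */
    boolean isTrueFailure() {
        return failedPings >= expectedPings;
    }
}
```

If the timeout fires after only two failed pings, `isTrueFailure()` returns false and the caller can treat the timeout as the result of a clock jump and keep pinging.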


 I believe that the pattern of these problems is a slow slippage behind and
 a sudden jump forward.


You won't see the slippage. You will mainly see a jump forward. Note that with
a large enough number of nodes, multiple nodes could see their time jump
forward. Therefore, comparing time between two servers may not help.




Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
if we can't rely on the clock, we cannot say things like "if ... for 5
seconds".

also, clients connect to servers, not vice versa, so we cannot say
things like "server can attempt to reconnect".


ben

On 08/19/2010 10:17 AM, Vishal K wrote:

Hi Ted,

I haven't give it a serious thought yet, but I don't think it is neccessary
for the cluster to keep track of time.

A node can make its own decision. For the sake of argument, lets say that we
have a client and a server with following policy:
1. Client is supposed to send a ping to server every 1 sec.
2. If server does not hear from client for 5 seconds, then the server
declares that the client is dead.
3. Similary if the client cannot communicate with the server for 5 seconds
client declares that the server is dead.

If the client receives a timeout (say while doing some IO) because of a time
jump, it should check the number of pings that has failed with the server.
If the number is 5, then this is a true failure, If the number is less than
5, then this is because of a time drift.

At the server side, the server can attempt to reconnect (or send a ping to
the client) after it receives a timeout. Thus, if the timeout occured
because of time drift, the server will reconnect and continue. We should
ofcourse have an upper bound in number of retries, etc.

For ZK, it is important to handle time jumps on ZK leader.

   

I believe that the pattern of these problems is a slow slippage behind and
a
sudden jump forward.

 


You won't see the slippage. You will mainly see a jump forward. Note that with
a large enough number of nodes, multiple nodes could see their time jumping
forward. Therefore, comparing time between two servers may not help.


   

On Thu, Aug 19, 2010 at 7:51 AM, Vishal Kvishalm...@gmail.com  wrote:

 

Hi,

I remember Ben had opened a jira for clock jumps earlier:
https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to
have clocks jump forward in virtualized environments.

It is desirable to modify ZooKeeper to handle this situation (as much as
possible) internally. It would need to be done for both client - server
connections and server - server connections. One obvious solution is to
retry a few times (send ping) after getting a timeout. Another way is to
count the number of pings that have been sent after receiving the timeout.
If the number of pings does not match the expected number (say 5 ping attempts
should be finished for a 5 sec timeout), then wait till all the pings are
finished. In effect, do not completely rely on the clock. Any comments?

-Vishal

On Thu, Aug 19, 2010 at 3:52 AM, Qing Yanqing...@gmail.com  wrote:

   

Oh.. our servers are also running in a virtualized environment.

On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite waite@gmail.com wrote:
   
 

Hi,

I have tripped over similar problems testing Red Hat Cluster in virtualised
environments.  I don't know whether recent linux kernels have improved their
interaction with VMWare, but in our environments clock drift caused by lost
ticks can be substantial, requiring NTP to sometimes jump the clock rather
than control acceleration.   In one of our internal production rigs, the
local NTP servers themselves were virtualised - causing absolute mayhem when
heavy loads hit the other guests on the same physical hosts.

The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time by 10
seconds always causes a member to prematurely time-out on a network read,
causing the member to drop out and trigger a cluster reconfiguration.
Apparently NTP is integrated with RHCS version 3, but I don't know what is
meant by that.

I guess this post is not entirely relevant to ZK, but I am just making the
point that virtualisation (of NTP servers and or clients) can cause repeated
premature timeouts.  On Linux, I believe that there is a class of timers
provided that is immune to this, but I doubt that there is a platform
independent way of coping with this.
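
For Java code such as ZooKeeper, one candidate for a timer that ignores wall-clock steps is System.nanoTime(), which the Javadoc describes as usable only for measuring elapsed time and unrelated to wall-clock time. The sketch below is an illustration only: MonotonicDeadline and its methods are hypothetical names, and whether the underlying clock is truly unaffected by a given hypervisor is an assumption that would need testing on the platform in question.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: a timeout measured on an elapsed-time clock
// instead of System.currentTimeMillis().
final class MonotonicDeadline {
    private final LongSupplier nanoClock; // injectable so tests can fake time
    private final long startNanos;
    private final long timeoutNanos;

    MonotonicDeadline(LongSupplier nanoClock, long timeoutMillis) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    // Convenience factory backed by System.nanoTime().
    static MonotonicDeadline startingNow(long timeoutMillis) {
        return new MonotonicDeadline(System::nanoTime, timeoutMillis);
    }

    // Subtract first, then compare, so numeric overflow of the counter
    // does not break the check.
    boolean expired() {
        return nanoClock.getAsLong() - startNanos >= timeoutNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicDeadline d = MonotonicDeadline.startingNow(200);
        Thread.sleep(300);
        System.out.println(d.expired());
    }
}
```

Setting the date with /bin/date while this runs changes System.currentTimeMillis() but should not move the deadline, since only elapsed nanoseconds are compared.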

My 2p.

regards,
Martin

On 18 August 2010 16:53, Patrick Huntph...@apache.org  wrote:

   

Do you expect the time to be wrong frequently? If ntp is running it
should never get out of sync more than a small amount. As long as this is
less than ~your timeout you should be fine.

Patrick


On 08/18/2010 01:04 AM, Qing Yan wrote:

 

Hi,

The testcase is fairly simple. We have a client which connects to ZK,
registers an ephemeral node and watches on it. Now change the client
machine's time - session killed..

Here is the log:

*2010-08-18 

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ben,

Comments inline..

On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 if we can't rely on the clock, we cannot say things like "if ... for 5
 seconds".


"if ... for 5 seconds" indicates the timeout given by the socket library.
After the timeout we can verify that the timeout received was not a side
effect of a time jump by looking at the number of ping attempts.



 also, clients connect to servers, not vice versa, so we cannot say things
 like "server can attempt to reconnect".


In the scenario described below, wouldn't it be ok for the server to just
send a ping request to see if the client is really dead?


 ben


 On 08/19/2010 10:17 AM, Vishal K wrote:

 Hi Ted,

 I haven't given it serious thought yet, but I don't think it is necessary
 for the cluster to keep track of time.

 A node can make its own decision. For the sake of argument, let's say that
 we have a client and a server with the following policy:
 1. The client is supposed to send a ping to the server every 1 sec.
 2. If the server does not hear from the client for 5 seconds, then the server
 declares that the client is dead.
 3. Similarly, if the client cannot communicate with the server for 5 seconds,
 the client declares that the server is dead.

 If the client receives a timeout (say while doing some IO) because of a
 time jump, it should check the number of pings to the server that have
 failed. If the number is 5, then this is a true failure. If the number is
 less than 5, then this is because of a time drift.

 At the server side, the server can attempt to reconnect (or send a ping to
 the client) after it receives a timeout. Thus, if the timeout occurred
 because of time drift, the server will reconnect and continue. We should
 of course have an upper bound on the number of retries, etc.

 For ZK, it is important to handle time jumps on ZK leader.



 I believe that the pattern of these problems is a slow slippage behind
 and a sudden jump forward.




 You won't see the slippage. You will mainly see a jump forward. Note that with
 a large enough number of nodes, multiple nodes could see their time jumping
 forward. Therefore, comparing time between two servers may not help.





Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Ben's approach is really simpler.  The client already sends keep-alive
messages and we know that
some have gone missing or a time shift has happened.  Those two
possibilities are cleanly distinguished
by Ben's suggestion of comparing current time to the bucket expiration.  If
current time is significantly after
the bucket expiration, we know something strange happened and can reschedule
the next few buckets.

As Ben mentioned, this has a cleanly bounded maximum error and is very, very
simple.  He didn't mention
that it doesn't require any more information than is already known and
doesn't require any machine interaction.
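
A rough sketch of the check described above, in Java. It is not the actual ZooKeeper patch. The class and method names are hypothetical, and reading "significantly after" as "more than one expirationInterval late" is an assumption made only for illustration.

```java
// Hypothetical sketch: detect a forward clock jump from the session
// expiration bookkeeping alone, then push the schedule forward.
final class JumpDetector {
    // Returns the estimated forward jump in milliseconds, or 0 when the
    // expirer is no more than one interval late (the assumed threshold).
    static long estimateJump(long nextExpirationTime, long currentTime,
                             long expirationInterval) {
        long lateness = currentTime - nextExpirationTime;
        return lateness > expirationInterval ? lateness : 0L;
    }

    // Moves the next bucket forward by the estimated jump so sessions
    // are not expired just because the clock stepped.
    static long reschedule(long nextExpirationTime, long currentTime,
                           long expirationInterval) {
        return nextExpirationTime
                + estimateJump(nextExpirationTime, currentTime, expirationInterval);
    }

    public static void main(String[] args) {
        // Bucket due at t=10000 ms, clock now reads t=30000 ms, interval 2000 ms.
        System.out.println(estimateJump(10_000L, 30_000L, 2_000L));
        System.out.println(reschedule(10_000L, 30_000L, 2_000L));
    }
}
```

Both inputs are values the expirer already holds, which matches the point that no extra information or machine interaction is needed.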

On Thu, Aug 19, 2010 at 3:16 PM, Vishal K vishalm...@gmail.com wrote:


 On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed br...@yahoo-inc.com
 wrote:

  if we can't rely on the clock, we cannot say things like "if ... for 5
  seconds".
 
 
 "if ... for 5 seconds" indicates the timeout given by the socket library.
 After the timeout we can verify that the timeout received was not a side
 effect of a time jump by looking at the number of ping attempts.



  also, clients connect to servers, not vice versa, so we cannot say things
  like "server can attempt to reconnect".
 

 In the scenario described below, wouldn't it be ok for the server to just
 send a ping request to see if the client is really dead?



Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch
out. Qing (or anyone else), can you reproduce it pretty easily?


thanx
ben

On 08/19/2010 09:29 AM, Ted Dunning wrote:

Nice (modulo inverting the comparison in your text).

Option 2 seems very simple.  That always attracts me.

On Thu, Aug 19, 2010 at 9:19 AM, Benjamin Reedbr...@yahoo-inc.com  wrote:

   

yes, you are right. we could do this. it turns out that the expiration code
is very simple:

    while (running) {
        currentTime = System.currentTimeMillis();
        // Not yet due: sleep until the next bucket expires, then re-check.
        if (nextExpirationTime > currentTime) {
            this.wait(nextExpirationTime - currentTime);
            continue;
        }
        // Due: expire every session in the bucket for this tick.
        SessionSet set;
        set = sessionSets.remove(nextExpirationTime);
        if (set != null) {
            for (SessionImpl s : set.sessions) {
                sessionsById.remove(s.sessionId);
                expirer.expire(s);
            }
        }
        nextExpirationTime += expirationInterval;
    }

so we can detect a jump very easily: if currentTime is well past
nextExpirationTime, we have jumped ahead in time.

now the question is, what do we do with this information?

option 1) we could figure out the jump (currentTime - nextExpirationTime is a
good estimate) and move all of the sessions forward by that amount.
option 2) we could converge on the time by having a policy to always wait
at least half a tick time.

there probably are other options as well. i kind of like option 2. worst
case is it will make the sessions expire in half the time that they should,
but this shouldn't be too much of a problem since clients send a ping if
they are idle for 1/3 of their session timeout.
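
To make option 2 concrete, here is a small Java sketch. It is an illustration and not the ZOOKEEPER-366 patch: HalfTickWait, waitMillis and bucketsUntilCaughtUp are hypothetical names, and tickTime is assumed to be the same quantity as expirationInterval in the loop above. The simulation replaces real waiting with arithmetic so it can be run instantly.

```java
// Hypothetical sketch of option 2: never wait less than half a tick
// between bucket expirations, even when the clock says we are late.
final class HalfTickWait {
    // How long to wait before expiring the next bucket.
    static long waitMillis(long nextExpirationTime, long currentTime, long tickTime) {
        long untilDue = nextExpirationTime - currentTime; // negative when late
        return Math.max(untilDue, tickTime / 2);
    }

    // Simulates the expirer after a forward jump and counts how many
    // buckets are expired (one per forced wait) before the schedule is
    // no longer behind the clock.
    static int bucketsUntilCaughtUp(long nextExpirationTime, long currentTime,
                                    long tickTime) {
        int buckets = 0;
        while (nextExpirationTime < currentTime) {
            currentTime += waitMillis(nextExpirationTime, currentTime, tickTime);
            nextExpirationTime += tickTime; // one bucket expired
            buckets++;
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Schedule at t=0, clock jumped to t=20000 ms, tick of 2000 ms.
        System.out.println(bucketsUntilCaughtUp(0L, 20_000L, 2_000L));
    }
}
```

Because each late bucket still costs half a tick of real waiting, a session's remaining time shrinks to no less than half of what it should be, which is the worst case mentioned above.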

ben


On 08/19/2010 08:39 AM, Ted Dunning wrote:

 

True.  But it knows that there has been a jump.

Quiet time can be distinguished from clock shift by assuming that members of
the cluster don't all jump at the same time.

I would imagine that a recent clock jump estimate could be kept and buckets
that would otherwise expire due to such a jump could be given a bit of a
second lease on life, delaying all of their expiration.  Since time-outs are
relatively short, the server would be able to forget about the bump very
shortly.

On Thu, Aug 19, 2010 at 8:22 AM, Benjamin Reedbr...@yahoo-inc.com
  wrote:



   

if we try to use network messages to detect and correct the situation, it
seems like we would recreate the problem we are having with ntp, since that
is exactly what it does.



 
   
 




Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Put in a four letter command that will put the server to sleep for 15
seconds!

:-)

On Thu, Aug 19, 2010 at 3:51 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch
 out. Qing (or anyone else), can you reproduce it pretty easily?



Re: Session expiration caused by time change

2010-08-18 Thread Ted Dunning
If NTP is changing your time by more than a few milliseconds then you have
other problems (big ones).

On Wed, Aug 18, 2010 at 1:04 AM, Qing Yan qing...@gmail.com wrote:

 I guess ZK might rely on timestamps to keep sessions alive, but we have an
 NTP daemon running, so machine time can get changed automatically. Is there
 a conflict?