Re: Session expiration caused by time change

2010-08-20 Thread Ted Dunning
Mocking the time via a utility was my thought. Mocking system itself is scary. On Aug 20, 2010, at 1:18 PM, Benjamin Reed wrote: i put up a patch that should address the problem. now i need to write a test case. the only way i can think of is to change the call to Sys

Re: Session expiration caused by time change

2010-08-20 Thread Benjamin Reed
i put up a patch that should address the problem. now i need to write a test case. the only way i can think of is to change the call to System.currentTimeMillis to a utility class that calls System.currentTimeMillis that i can mock for testing. any better ideas? ben On 08/19/2010 03:53 PM, Te
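
A minimal sketch of the utility-class idea, assuming production code calls a wrapper instead of System.currentTimeMillis() directly. The names Time, setSource, reset and FakeClock are hypothetical and chosen for illustration; they are not taken from the patch. A test swaps in a fake clock and jumps it to simulate an NTP step or a manual date change.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical clock indirection (illustrative names, not ZooKeeper's API).
 * Server code would call Time.currentTimeMillis() instead of
 * System.currentTimeMillis(), so tests can substitute a fake clock.
 */
final class Time {
    // Defaults to the real wall clock.
    private static volatile LongSupplier source = System::currentTimeMillis;

    private Time() {}

    static long currentTimeMillis() {
        return source.getAsLong();
    }

    /** Test hook: route all reads to the given clock. */
    static void setSource(LongSupplier newSource) {
        source = newSource;
    }

    /** Test hook: restore the real wall clock. */
    static void reset() {
        source = System::currentTimeMillis;
    }
}

/** A settable fake clock that a test can jump forwards or backwards. */
final class FakeClock implements LongSupplier {
    private final AtomicLong now;

    FakeClock(long startMillis) {
        now = new AtomicLong(startMillis);
    }

    @Override
    public long getAsLong() {
        return now.get();
    }

    /** Simulate a clock step (positive = forward, negative = backward). */
    void jump(long deltaMillis) {
        now.addAndGet(deltaMillis);
    }
}
```

In a test, one would call Time.setSource(clock), create a session, call clock.jump(60_000L), and check that the session survives; Time.reset() afterwards keeps the fake clock from leaking into other tests. Because the hook is a static field, tests that use it should not run in parallel.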

Re: Session expiration caused by time change

2010-08-19 Thread Martin Waite
Hi, In our testing of Red Hat Cluster, we could reproduce the NTP impact by jumping the clock backwards and forwards, just using the date command in a tight-ish loop: use strict; my $dir = 1; while (1) { jump_time( $dir ); $dir = $dir * -1; } sub jump_time { my ($dir) = @_; my $step

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Put in a four letter command that will put the server to sleep for 15 seconds! :-) On Thu, Aug 19, 2010 at 3:51 PM, Benjamin Reed wrote: > i'm updating ZOOKEEPER-366 with this discussion and try to get a patch out. > Qing (or anyone else, can you reproduce it pretty easily?) >

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm updating ZOOKEEPER-366 with this discussion and try to get a patch out. Qing (or anyone else, can you reproduce it pretty easily?) thanx ben On 08/19/2010 09:29 AM, Ted Dunning wrote: Nice (modulo inverting the < in your text). Option 2 seems very simple. That always attracts me. On Thu

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Ben's approach is really simpler. The client already sends keep-alive messages and we know that some have gone missing or a time shift has happened. Those two possibilities are cleanly distinguished by Ben's suggestion of comparing current time to the bucket expiration. If current time is signif
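
Ted's message is cut off, but the comparison it describes can be sketched as follows, under the assumption that "significantly past" means the expiry thread wakes up much later than the bucket expiration it was waiting for. JumpDetector and toleranceMillis are hypothetical names, and the tolerance is an assumed tuning knob, not a value from the thread.

```java
/**
 * Hypothetical helper: the expiry thread asked to wake at expectedWakeMillis.
 * If the wall clock it then observes is far beyond that, a forward clock jump
 * is a likelier explanation than a genuinely long sleep.
 */
final class JumpDetector {
    private final long toleranceMillis;

    JumpDetector(long toleranceMillis) {
        if (toleranceMillis < 0) {
            throw new IllegalArgumentException("tolerance must be >= 0");
        }
        this.toleranceMillis = toleranceMillis;
    }

    /** Returns the estimated forward jump in ms, or 0 if the wake-up looks normal. */
    long forwardJump(long expectedWakeMillis, long observedNowMillis) {
        long overshoot = observedNowMillis - expectedWakeMillis;
        return overshoot > toleranceMillis ? overshoot : 0L;
    }
}
```

One caveat: a long pause of the server process itself (a garbage-collection stall, or a suspended VM) also produces a large overshoot, so this check alone cannot tell that case apart from a clock step.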

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ben, Comments inline. On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed wrote: > if we can't rely on the clock, we cannot say things like "if ... for 5 seconds". "if ... for 5 seconds" indicates the timeout given by the socket library. After the timeout we can verify that the timeout rece

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
if we can't rely on the clock, we cannot say things like "if ... for 5 seconds". also, clients connect to servers, not vice versa, so we cannot say things like "server can attempt to reconnect". ben On 08/19/2010 10:17 AM, Vishal K wrote: Hi Ted, I haven't given it a serious thought yet, bu

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ted, I haven't given it a serious thought yet, but I don't think it is necessary for the cluster to keep track of time. A node can make its own decision. For the sake of argument, let's say that we have a client and a server with the following policy: 1. Client is supposed to send a ping to server

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Nice (modulo inverting the < in your text). Option 2 seems very simple. That always attracts me. On Thu, Aug 19, 2010 at 9:19 AM, Benjamin Reed wrote: > yes, you are right. we could do this. it turns out that the expiration code > is very simple: > >while (running) { >

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
yes, you are right. we could do this. it turns out that the expiration code is very simple: while (running) { currentTime = System.currentTimeMillis(); if (nextExpirationTime > currentTime) { this.wait(nextExpirationTime - currentTi

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
True. But it knows that there has been a jump. Quiet time can be distinguished from clock shift by assuming that members of the cluster don't all jump at the same time. I would imagine that a "recent clock jump" estimate could be kept and buckets that would otherwise expire due to such a jump co
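
The snippet ends mid-sentence, so the following is only one possible reading of a "recent clock jump" estimate: remember the size and time of the last detected forward jump, and flag a due bucket as due *only because of* that jump if it would not yet be due on the un-jumped clock. RecentJumpTracker and windowMillis are hypothetical names; what the expirer then does with such a bucket (defer it, re-check with peers) is left open.

```java
/**
 * Hypothetical "recent clock jump" estimate. Remembers the most recent
 * forward jump and when it was observed; the jump is considered "recent"
 * for windowMillis afterwards.
 */
final class RecentJumpTracker {
    private final long windowMillis;
    private long jumpMillis;       // size of the most recent forward jump
    private long observedAtMillis; // wall-clock time at which it was observed

    RecentJumpTracker(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    void recordJump(long jump, long nowMillis) {
        jumpMillis = jump;
        observedAtMillis = nowMillis;
    }

    /** The recorded jump if it is still recent at nowMillis, else 0. */
    long recentJump(long nowMillis) {
        return nowMillis - observedAtMillis <= windowMillis ? jumpMillis : 0L;
    }

    /**
     * A bucket with deadline <= now is due. It is due only because of the
     * jump if it would not be due on the un-jumped clock: deadline > now - jump.
     */
    boolean dueOnlyBecauseOfJump(long bucketDeadline, long nowMillis) {
        long jump = recentJump(nowMillis);
        return jump > 0 && bucketDeadline <= nowMillis && bucketDeadline > nowMillis - jump;
    }
}
```

Buckets that were overdue even before the jump are still reported as ordinary expirations, so genuinely silent sessions keep expiring normally.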

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
a) that only provides monotonic time, not smooth time b) that is C, the server is Java Could be hard to get the benefit we need. On Thu, Aug 19, 2010 at 8:27 AM, Martin Waite wrote: > Hi, > > I'm not sure if you mean the timers I was on about earlier. If so, > http://linux.die.net/man/3/clock

Re: Session expiration caused by time change

2010-08-19 Thread Martin Waite
Hi, I'm not sure if you mean the timers I was on about earlier. If so, http://linux.die.net/man/3/clock_gettime Sufficiently recent versions of GNU libc and the Linux kernel support the following clocks: ... *CLOCK_MONOTONIC* Clock that cannot be set and represents monotonic time since some uns
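
On Ted's point (b): the closest standard Java facility is System.nanoTime(), which the JDK documents as usable only for measuring elapsed time and as unrelated to any notion of wall-clock time, so setting the date should not move it. Which OS clock backs it is platform-dependent, and Ted's point (a) still applies: it gives monotonic, not necessarily smooth, time. A minimal sketch of an idle-timeout check built on it (ElapsedDeadline is a hypothetical name):

```java
import java.util.concurrent.TimeUnit;

/**
 * Sketch: measure elapsed time with System.nanoTime() rather than
 * System.currentTimeMillis(), so a wall-clock change does not shorten
 * or lengthen the timeout.
 */
final class ElapsedDeadline {
    private final long startNanos;
    private final long timeoutNanos;

    ElapsedDeadline(long timeout, TimeUnit unit) {
        this.startNanos = System.nanoTime();
        this.timeoutNanos = unit.toNanos(timeout);
    }

    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    /** Compare differences, never raw nanoTime values, so numeric overflow is harmless. */
    boolean expired() {
        return System.nanoTime() - startNanos >= timeoutNanos;
    }
}
```

Values from nanoTime() are only meaningful within one JVM, so they cannot be exchanged between servers; anything shared across the ensemble still has to use durations rather than timestamps.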

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm afraid it isn't that simple. we figure out who is expired by bucketizing sessions to be expired in an interval. if we hear from a session, we move it to a different bucket; otherwise, when the bucket expires, everything in that bucket goes away. when time jumps, it looks to the server like ther
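
The bucketing Ben describes can be sketched roughly like this. It is a simplified illustration with hypothetical names (BucketExpirer, touch, expire), not ZooKeeper's actual session tracker: each deadline is rounded up to a multiple of intervalMillis, hearing from a session moves it to a later bucket, and every bucket whose deadline has passed is dropped wholesale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified bucketed session expiry (illustrative only). */
final class BucketExpirer {
    private final long intervalMillis;
    private final TreeMap<Long, Set<Long>> buckets = new TreeMap<>(); // deadline -> session ids
    private final Map<Long, Long> bucketOf = new HashMap<>();          // session id -> deadline

    BucketExpirer(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Round up to the next bucket boundary strictly after t. */
    private long roundUp(long t) {
        return (t / intervalMillis + 1) * intervalMillis;
    }

    /** Called whenever we hear from a session: move it to a later bucket. */
    void touch(long sessionId, long nowMillis, long timeoutMillis) {
        long newBucket = roundUp(nowMillis + timeoutMillis);
        Long old = bucketOf.put(sessionId, newBucket);
        if (old != null && old != newBucket) {
            Set<Long> oldSet = buckets.get(old);
            oldSet.remove(sessionId);
            if (oldSet.isEmpty()) {
                buckets.remove(old);
            }
        }
        buckets.computeIfAbsent(newBucket, k -> new HashSet<>()).add(sessionId);
    }

    /** Drop every bucket whose deadline has passed; returns the expired session ids. */
    List<Long> expire(long nowMillis) {
        List<Long> expired = new ArrayList<>();
        while (!buckets.isEmpty() && buckets.firstKey() <= nowMillis) {
            for (Long id : buckets.pollFirstEntry().getValue()) {
                bucketOf.remove(id);
                expired.add(id);
            }
        }
        return expired;
    }
}
```

This shows why a clock jump hurts. If nowMillis leaps forward, expire() passes many bucket deadlines in a single call and drops sessions that were alive but had no chance to ping in between.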

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
do you have a pointer to those timers? thanx ben On 08/18/2010 11:58 PM, Martin Waite wrote: On Linux, I believe that there is a class of timers provided that is immune to this, but I doubt that there is a platform independent way of coping with this.

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Another option would be for the cluster to compare times and note when one member seems to be lagging. Restoration of that lag would then be less remarkable. I believe that the pattern of these problems is a slow slippage behind and a sudden jump forward. On Thu, Aug 19, 2010 at 7:51 AM, Vishal

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi, I remember Ben had opened a jira for clock jumps earlier: https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to have clocks jump forward in virtualized environments. It is desirable to modify ZooKeeper to handle this situation (as much as possible) internally. It would ne

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
You can always increase your timeouts a bit. On Thu, Aug 19, 2010 at 12:52 AM, Qing Yan wrote: > Oh.. our servers are also running in a virtualized environment. > > On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite wrote: > > > Hi, > > > > I have tripped over similar problems testing Red Hat Cluste

Re: Session expiration caused by time change

2010-08-19 Thread Qing Yan
Oh.. our servers are also running in a virtualized environment. On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite wrote: > Hi, > > I have tripped over similar problems testing Red Hat Cluster in virtualised > environments. I don't know whether recent linux kernels have improved > their > interactio

Re: Session expiration caused by time change

2010-08-18 Thread Martin Waite
Hi, I have tripped over similar problems testing Red Hat Cluster in virtualised environments. I don't know whether recent linux kernels have improved their interaction with VMWare, but in our environments clock drift caused by lost ticks can be substantial, requiring NTP to sometimes jump the clo

Re: Session expiration caused by time change

2010-08-18 Thread Patrick Hunt
Do you expect the time to be "wrong" frequently? If ntp is running it should never get out of sync more than a small amount. As long as this is less than ~your timeout you should be fine. Patrick On 08/18/2010 01:04 AM, Qing Yan wrote: Hi, The testcase is fairly simple. We have a client

Re: Session expiration caused by time change

2010-08-18 Thread Ted Dunning
If NTP is changing your time by more than a few milliseconds then you have other problems (big ones). On Wed, Aug 18, 2010 at 1:04 AM, Qing Yan wrote: > I guess ZK might rely on timestamp to keep sessions alive, but we have > NTP daemon running so machine time can get changed > automatically, i

Session expiration caused by time change

2010-08-18 Thread Qing Yan
Hi, The testcase is fairly simple. We have a client which connects to ZK, registers an ephemeral node and watches on it. Now change the client machine's time - session killed.. Here is the log: *2010-08-18 04:24:57,782 INFO com.taobao.timetunnel2.cluster.service.AgentService: Host name kgb