Re: Session expiration caused by time change

2010-08-19 Thread Qing Yan
Oh.. our servers are also running in a virtualized environment. On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite wrote: > Hi, > > I have tripped over similar problems testing Red Hat Cluster in virtualised > environments. I don't know whether recent linux kernels have improved > their > interactio

Zookeeper stops

2010-08-19 Thread Wim Jongman
Hi, I have a zookeeper server running that can sometimes run for days and then quits: Is there somebody with a clue to the problem? I am running 64 bit Ubuntu with java version "1.6.0_18" OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1) OpenJDK 64-Bit Server VM (build 14.0-b16, mi

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
You can always increase your timeouts a bit. On Thu, Aug 19, 2010 at 12:52 AM, Qing Yan wrote: > Oh.. our servers are also running in a virtualized environment. > > On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite wrote: > > > Hi, > > > > I have tripped over similar problems testing Red Hat Cluste

Re: Zookeeper stops

2010-08-19 Thread Mahadev Konar
Hi Wim, It looks like ZooKeeper is not able to create files on the /tmp filesystem. Is there a space shortage, or is it possible the file is being deleted as it's being written to? Sometimes admins have a crontab that cleans up the /tmp filesystem. Thanks mahadev On 8/1

Re: Zookeeper stops

2010-08-19 Thread Ted Dunning
Also, /tmp is not a great place to keep things that are intended for persistence. On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konar wrote: > Hi Wim, > It mostly looks like that zookeeper is not able to create files on the > /tmp filesystem. Is there is a space shortage or is it possible the file is

Re: Zookeeper stops

2010-08-19 Thread Wim Jongman
Ah, thanks guys! I did not realize that this was a user setting. Will try. Best regards, Wim On Thu, Aug 19, 2010 at 4:43 PM, Ted Dunning wrote: > Also, /tmp is not a great place to keep things that are intended for > persistence. > > On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konar >wrote: >

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi, I remember Ben had opened a jira for clock jumps earlier: https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to have clocks jump forward in virtualized environments. It is desirable to modify ZooKeeper to handle this situation (as much as possible) internally. It would ne

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Another option would be for the cluster to compare times and note when one member seems to be lagging. Restoration of that lag would then be less remarkable. I believe that the pattern of these problems is a slow slippage behind and a sudden jump forward. On Thu, Aug 19, 2010 at 7:51 AM, Vishal

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
do you have a pointer to those timers? thanx ben On 08/18/2010 11:58 PM, Martin Waite wrote: On Linux, I believe that there is a class of timers provided that is immune to this, but I doubt that there is a platform independent way of coping with this.

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm afraid it isn't that simple. we figure out who is expired by bucketizing sessions to be expired in an interval. if we hear from a session, we move it to a different bucket; otherwise, when the bucket expires, everything in that bucket goes away. when time jumps, it looks to the server like ther
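
The bucketing Ben describes can be sketched roughly as follows. This is a minimal illustration, not the actual ZooKeeper session tracker: the class name SessionBuckets, its methods, and the round-up rule are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of bucketed session expiry (not the real ZooKeeper class). */
final class SessionBuckets {
    private final long intervalMs;                                    // bucket width
    private final Map<Long, Set<Long>> buckets = new HashMap<>();     // bucket time -> session ids
    private final Map<Long, Long> bucketOfSession = new HashMap<>();  // session id -> bucket time

    SessionBuckets(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Round a deadline up to the next bucket boundary. */
    long roundToBucket(long timeMs) {
        return (timeMs / intervalMs + 1) * intervalMs;
    }

    /** Called when we hear from a session: move it to a later bucket. */
    void touch(long sessionId, long nowMs, long timeoutMs) {
        long bucket = roundToBucket(nowMs + timeoutMs);
        Long old = bucketOfSession.put(sessionId, bucket);
        if (old != null) {
            Set<Long> oldSet = buckets.get(old);
            if (oldSet != null) {
                oldSet.remove(sessionId);
            }
        }
        buckets.computeIfAbsent(bucket, k -> new HashSet<>()).add(sessionId);
    }

    /** When a bucket expires, everything still in it goes away. */
    Set<Long> expire(long bucketTimeMs) {
        Set<Long> expired = buckets.remove(bucketTimeMs);
        if (expired == null) {
            return new HashSet<>();
        }
        for (Long id : expired) {
            bucketOfSession.remove(id);
        }
        return expired;
    }
}
```

Because expiry is driven purely by the bucket time, a forward clock jump makes several buckets look overdue at once, even though the sessions in them never had a chance to check in.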

Re: Session expiration caused by time change

2010-08-19 Thread Martin Waite
Hi, I'm not sure if you mean the timers I was on about earlier. If so, http://linux.die.net/man/3/clock_gettime Sufficiently recent versions of GNU libc and the Linux kernel support the following clocks: ... *CLOCK_MONOTONIC* Clock that cannot be set and represents monotonic time since some uns

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
a) that only provides monotonic time, not smooth time b) that is C, the server is Java Could be hard to get the benefit we need. On Thu, Aug 19, 2010 at 8:27 AM, Martin Waite wrote: > Hi, > > I'm not sure if you mean the timers I was on about earlier. If so, > http://linux.die.net/man/3/clock
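
On point (b): Java does offer System.nanoTime(), which is documented for measuring elapsed time and is not tied to the wall clock. Whether a particular JVM backs it with CLOCK_MONOTONIC is a platform detail and should be treated as an assumption. A minimal sketch follows; the class name ElapsedTimer is hypothetical.

```java
/** Hypothetical sketch: measure elapsed time without consulting the wall clock. */
final class ElapsedTimer {
    private final long startNanos = System.nanoTime();

    /** Milliseconds since construction; setting the system date does not affect it. */
    long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }
}
```

Point (a) still stands: a monotonic source never goes backwards, but it gives no guarantee of smooth time if the whole virtual machine was paused.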

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
True. But it knows that there has been a jump. Quiet time can be distinguished from clock shift by assuming that members of the cluster don't all jump at the same time. I would imagine that a "recent clock jump" estimate could be kept and buckets that would otherwise expire due to such a jump co

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
yes, you are right. we could do this. it turns out that the expiration code is very simple: while (running) { currentTime = System.currentTimeMillis(); if (nextExpirationTime > currentTime) { this.wait(nextExpirationTime - currentTi

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Nice (modulo inverting the < in your text). Option 2 seems very simple. That always attracts me. On Thu, Aug 19, 2010 at 9:19 AM, Benjamin Reed wrote: > yes, you are right. we could do this. it turns out that the expiration code > is very simple: > >while (running) { >

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ted, I haven't given it serious thought yet, but I don't think it is necessary for the cluster to keep track of time. A node can make its own decision. For the sake of argument, let's say that we have a client and a server with the following policy: 1. Client is supposed to send a ping to server

Re: Zookeeper stops

2010-08-19 Thread Patrick Hunt
+1 on that Ted. I frequently see this issue crop up as "I just rebooted my server and lost all my data ..." -- many OSes will clean up /tmp on reboot. :-) Patrick On 08/19/2010 07:43 AM, Ted Dunning wrote: Also, /tmp is not a great place to keep things that are intended for persistence. On Thu

Re: Zookeeper stops

2010-08-19 Thread Wim Jongman
Hi, But zk does default to /tmp? Regards, Wim On Thursday, August 19, 2010, Patrick Hunt wrote: > +1 on that Ted. I frequently see this issue crop up as "I just rebooted my > server and lost all my data ..." -- many os's will cleanup tmp on reboot. :-) > > Patrick > > On 08/19/2010 07:43

Re: Zookeeper stops

2010-08-19 Thread Patrick Hunt
No. You configure it in the server configuration file. Patrick On 08/19/2010 01:19 PM, Wim Jongman wrote: Hi, But zk does default to /tmp? Regards, Wim On Thursday, August 19, 2010, Patrick Hunt wrote: +1 on that Ted. I frequently see this issue crop up as "I just rebooted my server
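
For illustration, the relevant setting is dataDir in the server configuration file. The fragment below is a sketch; the path and the other values are assumptions chosen for the example, not a recommendation from this thread.

```
# conf/zoo.cfg (illustrative values only)
tickTime=2000
clientPort=2181
# Keep snapshots and logs somewhere that survives reboots and /tmp cleaners.
dataDir=/var/lib/zookeeper
```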

Re: ZK monitoring

2010-08-19 Thread Patrick Hunt
Maybe we should have a contrib pkg for utilities such as this? I could see a python script that, given 1 server (might require additional four-letter words, but this would be useful regardless), could collect such information from the cluster. Create a JIRA? Patrick On 08/17/2010 12:14 PM, Andrei Savu w

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
if we can't rely on the clock, we cannot say things like "if ... for 5 seconds". also, clients connect to servers, not vice versa, so we cannot say things like "server can attempt to reconnect". ben On 08/19/2010 10:17 AM, Vishal K wrote: Hi Ted, I haven't give it a serious thought yet, bu

Re: ZK monitoring

2010-08-19 Thread Ted Dunning
It would be nice if it took a list of servers and verified that they all thought that they were part of the same cluster. On Thu, Aug 19, 2010 at 1:46 PM, Patrick Hunt wrote: > Maybe we should have a contrib pkg for utilities such as this? I could see > a python script that, given 1 server (migh

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ben, Comments inline.. On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed wrote: > if we can't rely on the clock, we cannot say things like "if ... for 5 > seconds". > > "if ... for 5 seconds" indicates the timeout given by the socket library. After the timeout we can verify that the timeout rece

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Ben's approach is really simpler. The client already sends keep-alive messages and we know that some have gone missing or a time shift has happened. Those two possibilities are cleanly distinguished by Ben's suggestion of comparing current time to the bucket expiration. If current time is signif
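
The comparison of current time against the bucket expiration can be sketched as a small decision function. This is an illustration under assumptions, not the patch discussed for ZOOKEEPER-366: the names ExpiryDecision, Action, and the tolerance maxLagMs are hypothetical, and what the server should do once a jump is detected is left to the caller.

```java
/** Hypothetical sketch: tell a normal bucket expiry apart from a clock jump. */
final class ExpiryDecision {
    enum Action { WAIT, EXPIRE, CLOCK_JUMP }

    /**
     * @param nowMs                current wall-clock time
     * @param nextExpirationTimeMs time at which the next bucket is due
     * @param maxLagMs             assumed tolerance for how late the expiry thread may wake up
     */
    static Action decide(long nowMs, long nextExpirationTimeMs, long maxLagMs) {
        if (nowMs < nextExpirationTimeMs) {
            return Action.WAIT;        // bucket not due yet
        }
        if (nowMs - nextExpirationTimeMs > maxLagMs) {
            return Action.CLOCK_JUMP;  // far later than a normal wake-up would explain
        }
        return Action.EXPIRE;          // due, and we woke up roughly on time
    }
}
```

For example, with a 500 ms tolerance, waking 100 ms late yields EXPIRE, while waking 19 seconds late yields CLOCK_JUMP, at which point the caller might reschedule the bucket instead of dropping its sessions.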

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch out. Qing (or anyone else), can you reproduce it pretty easily? thanx ben On 08/19/2010 09:29 AM, Ted Dunning wrote: Nice (modulo inverting the < in your text). Option 2 seems very simple. That always attracts me. On Thu

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Put in a four letter command that will put the server to sleep for 15 seconds! :-) On Thu, Aug 19, 2010 at 3:51 PM, Benjamin Reed wrote: > i'm updating ZOOKEEPER-366 with this discussion and try to get a patch out. > Qing (or anyone else, can you reproduce it pretty easily?) >

Re: Session expiration caused by time change

2010-08-19 Thread Martin Waite
Hi, In our testing of Red Hat Cluster, we could reproduce the NTP impact by jumping the clock backwards and forwards, just using the date command in a tight-ish loop: use strict; my $dir = 1; while (1) { jump_time( $dir ); $dir = $dir * -1; } sub jump_time { my ($dir) = @_; my $step