Oh.. our servers are also running in a virtualized environment.
On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite wrote:
> Hi,
>
> I have tripped over similar problems testing Red Hat Cluster in virtualised
> environments. I don't know whether recent Linux kernels have improved their
> interactio
Hi,
I have a zookeeper server running that can sometimes run for days and then
quits:
Is there somebody with a clue to the problem?
I am running 64 bit Ubuntu with
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mi
You can always increase your timeouts a bit.
On Thu, Aug 19, 2010 at 12:52 AM, Qing Yan wrote:
> Oh.. our servers are also running in a virtualized environment.
>
> On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite wrote:
>
> > Hi,
> >
> > I have tripped over similar problems testing Red Hat Cluste
Hi Wim,
It looks like ZooKeeper is not able to create files on the /tmp
filesystem. Is there a space shortage, or is it possible the file is being
deleted as it's being written to?
Sometimes admins have a cron job that cleans up the /tmp filesystem.
Thanks
mahadev
On 8/1
Also, /tmp is not a great place to keep things that are intended for
persistence.
On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konar wrote:
> Hi Wim,
> It looks like ZooKeeper is not able to create files on the
> /tmp filesystem. Is there a space shortage, or is it possible the file is
Ah, thanks guys! I did not realize that this was a user setting.
Will try.
Best regards,
Wim
On Thu, Aug 19, 2010 at 4:43 PM, Ted Dunning wrote:
> Also, /tmp is not a great place to keep things that are intended for
> persistence.
>
> On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konar wrote:
>
Hi,
I remember Ben had opened a jira for clock jumps earlier:
https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to
have clocks jump forward in virtualized environments.
It is desirable to modify ZooKeeper to handle this situation (as much as
possible) internally. It would ne
Another option would be for the cluster to compare times and note when one
member seems to be lagging. Restoration of that lag would then be less
remarkable.
I believe that the pattern of these problems is a slow slippage behind and a
sudden jump forward.
On Thu, Aug 19, 2010 at 7:51 AM, Vishal
do you have a pointer to those timers?
thanx
ben
On 08/18/2010 11:58 PM, Martin Waite wrote:
On Linux, I believe that there is a class of timers
provided that is immune to this, but I doubt that there is a platform
independent way of coping with this.
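On the Java side, the closest portable analogue to such a timer is System.nanoTime(), which the JDK documents as an elapsed-time source unrelated to wall-clock time. The following is only a minimal sketch: the class and method names are hypothetical, and whether the JVM backs nanoTime() with a monotonic kernel clock on a given platform is an assumption to verify, not something this thread confirms.

```java
import java.util.concurrent.TimeUnit;

public class MonotonicClockSketch {

    /**
     * Milliseconds on an elapsed-time line with an arbitrary origin.
     * Only differences between two readings are meaningful.
     */
    static long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }

    public static void main(String[] args) throws InterruptedException {
        long start = elapsedMillis();
        Thread.sleep(50);
        long waited = elapsedMillis() - start;
        // Setting the wall clock with date or NTP should not affect this difference.
        System.out.println("waited ms (elapsed-time clock): " + waited);
    }
}
```

Timeouts computed from differences of elapsedMillis() would not react to someone setting the system date, although this does nothing about a virtual machine that is paused and resumed.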
i'm afraid it isn't that simple. we figure out who is expired by
bucketizing sessions to be expired in an interval. if we hear from a
session, we move it to a different bucket; otherwise, when the bucket
expires, everything in that bucket goes away.
when time jumps, it looks to the server like ther
Hi,
I'm not sure if you mean the timers I was on about earlier. If so,
http://linux.die.net/man/3/clock_gettime
Sufficiently recent versions of GNU libc and the Linux kernel support the
following clocks:
...
*CLOCK_MONOTONIC* Clock that cannot be set and represents monotonic time
since some uns
a) that only provides monotonic time, not smooth time
b) that is C, the server is Java
Could be hard to get the benefit we need.
On Thu, Aug 19, 2010 at 8:27 AM, Martin Waite wrote:
> Hi,
>
> I'm not sure if you mean the timers I was on about earlier. If so,
> http://linux.die.net/man/3/clock
True. But it knows that there has been a jump.
Quiet time can be distinguished from clock shift by assuming that members of
the cluster don't all jump at the same time.
I would imagine that a "recent clock jump" estimate could be kept and
buckets that would otherwise expire due to such a jump co
yes, you are right. we could do this. it turns out that the expiration
code is very simple:
while (running) {
    currentTime = System.currentTimeMillis();
    if (nextExpirationTime > currentTime) {
        this.wait(nextExpirationTime - currentTi
Nice (modulo inverting the < in your text).
Option 2 seems very simple. That always attracts me.
On Thu, Aug 19, 2010 at 9:19 AM, Benjamin Reed wrote:
> yes, you are right. we could do this. it turns out that the expiration code
> is very simple:
>
> while (running) {
>
Hi Ted,
I haven't given it serious thought yet, but I don't think it is necessary
for the cluster to keep track of time.
A node can make its own decision. For the sake of argument, let's say that we
have a client and a server with the following policy:
1. Client is supposed to send a ping to server
+1 on that, Ted. I frequently see this issue crop up as "I just rebooted
my server and lost all my data ..." -- many OSes will clean up /tmp on
reboot. :-)
Patrick
On 08/19/2010 07:43 AM, Ted Dunning wrote:
Also, /tmp is not a great place to keep things that are intended for
persistence.
On Thu
Hi,
But zk does default to /tmp?
Regards,
Wim
On Thursday, August 19, 2010, Patrick Hunt wrote:
> +1 on that, Ted. I frequently see this issue crop up as "I just rebooted my
> server and lost all my data ..." -- many OSes will clean up /tmp on reboot. :-)
>
> Patrick
>
> On 08/19/2010 07:43
No. You configure it in the server configuration file.
Patrick
On 08/19/2010 01:19 PM, Wim Jongman wrote:
Hi,
But zk does default to /tmp?
Regards,
Wim
On Thursday, August 19, 2010, Patrick Hunt wrote:
+1 on that Ted. I frequently see this issue crop up as "I just rebooted my server
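For illustration, the data directory is set with the dataDir entry in the server configuration file. A minimal fragment follows; /var/lib/zookeeper is a hypothetical persistent location chosen only as an example, and the other values are placeholders to adapt.

```
# zoo.cfg (illustrative values only)
tickTime=2000
# hypothetical path; any directory that survives reboots and /tmp cleanup
dataDir=/var/lib/zookeeper
clientPort=2181
```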
Maybe we should have a contrib pkg for utilities such as this? I could
see a Python script that, given one server, could collect such
information from the cluster (it might require additional four-letter
words, but those would be useful regardless). Create a JIRA?
Patrick
On 08/17/2010 12:14 PM, Andrei Savu w
if we can't rely on the clock, we cannot say things like "if ... for 5
seconds".
also, clients connect to servers, not vice versa, so we cannot say
things like "server can attempt to reconnect".
ben
On 08/19/2010 10:17 AM, Vishal K wrote:
Hi Ted,
I haven't given it serious thought yet, bu
It would be nice if it took a list of servers and verified that they all
thought that they were part of the same cluster.
On Thu, Aug 19, 2010 at 1:46 PM, Patrick Hunt wrote:
> Maybe we should have a contrib pkg for utilities such as this? I could see
> a python script that, given 1 server (migh
Hi Ben,
Comments inline..
On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed wrote:
> if we can't rely on the clock, we cannot say things like "if ... for 5
> seconds".
>
>
"if ... for 5 seconds" indicates the timeout given by the socket library.
After the timeout we can verify that the timeout rece
Ben's approach is really simpler. The client already sends keep-alive
messages and we know that some have gone missing or a time shift has
happened. Those two possibilities are cleanly distinguished by Ben's
suggestion of comparing current time to the bucket expiration. If
current time is signif
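The idea of comparing the current time to the bucket expiration can be sketched roughly as follows. This is not the ZooKeeper implementation or the eventual patch: the class name, the JUMP_THRESHOLD_MS value, and the rescheduling policy are all assumptions made for illustration. The only part taken from the thread is the comparison between currentTime and nextExpirationTime.

```java
public class JumpAwareExpiry {

    /** Hypothetical threshold: a larger overshoot is treated as a clock jump. */
    static final long JUMP_THRESHOLD_MS = 2000;

    enum Action { WAIT, EXPIRE, SKIP_JUMP }

    /** Decide what the expiration loop should do on this wake-up. */
    static Action decide(long currentTime, long nextExpirationTime) {
        if (nextExpirationTime > currentTime) {
            return Action.WAIT; // bucket not due yet
        }
        long overshoot = currentTime - nextExpirationTime;
        // Waking far past the deadline suggests the clock (or the whole VM)
        // jumped, so silent sessions may not really have been silent that long.
        return overshoot > JUMP_THRESHOLD_MS ? Action.SKIP_JUMP : Action.EXPIRE;
    }

    /** Assumed policy: after a suspected jump, grant one full interval from now. */
    static long rescheduleAfterJump(long currentTime, long expirationInterval) {
        return currentTime + expirationInterval;
    }

    public static void main(String[] args) {
        long next = 10_000;
        System.out.println(decide(9_500, next));   // before the deadline
        System.out.println(decide(10_100, next));  // slightly past the deadline
        System.out.println(decide(40_000, next));  // far past the deadline
        System.out.println(rescheduleAfterJump(40_000, 3_000));
    }
}
```

One caveat with this sketch is that a long pause of the server process looks the same as a clock jump, so the threshold trades delayed expiry against spurious expiry.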
i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch
out. Qing (or anyone else), can you reproduce it pretty easily?
thanx
ben
On 08/19/2010 09:29 AM, Ted Dunning wrote:
Nice (modulo inverting the < in your text).
Option 2 seems very simple. That always attracts me.
On Thu
Put in a four letter command that will put the server to sleep for 15
seconds!
:-)
On Thu, Aug 19, 2010 at 3:51 PM, Benjamin Reed wrote:
> i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch out.
> Qing (or anyone else), can you reproduce it pretty easily?
>
Hi,
In our testing of Red Hat Cluster, we could reproduce the NTP impact by
jumping the clock backwards and forwards, just using the date command in a
tight-ish loop:
use strict;
my $dir = 1;
while (1) {
    jump_time( $dir );
    $dir = $dir * -1;
}
sub jump_time {
    my ($dir) = @_;
    my $step