Is the session recoverable in case the zk server was restarted in the
meantime?
Johannes
On Sep 12, 2008, at 3:52 PM, Benjamin Reed wrote:
If an application does not close the ZooKeeper session before shutting
down, ZooKeeper will not clean up the session until it times out. So
when an application
That graph is taken from a paper we will be publishing as a tech report.
Here is the missing text:
To show the behavior of the system over time as failures are injected we
ran a ZooKeeper service made up of 7 machines. We ran the same
saturation benchmark as before, but this time we kept the write
Thomas,
in the scenario you give, you have two simultaneous failures with 3 nodes, so it
will not recover correctly. A has failed because it is not up. B has failed
because it lost all its data.
it would be good for ZooKeeper to not come up in that scenario. perhaps what we
need is something
The command line is a very simple utility for testing and as an example of how
to use the API.
these are good suggestions, you should document them in a Jira.
ben
From: burtona...@gmail.com [burtona...@gmail.com] On Behalf Of Kevin Burton
when you shut down the full ensemble the session isn't expired. when things come
back up your session will still be active. (it would be bad if the zk service
could not survive the bounce of an ensemble.)
you are way overthinking this and i fear you are not helping yourself with
trying to
just to clarify: you also get ConnectionLossException from synchronous requests
if the request cannot be sent or no response is received.
ben
-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Wednesday, January 07, 2009 10:16 AM
To: zookeeper-user@hadoop.apache.org
if you do a getData(/a, true) and then /a changes, you will get a watch
event. if /a changes again, you will not get an event. so, if you want to
monitor /a, you need to do a new getData() after each watch event to
reregister the watch and get the new value. (re-registering watches on
. Then people could suggest
abstractions that would essentially put a box around sections of the
diagram. However I feel woefully inadequate at the former :(.
.. Adam
On Thu, Jan 8, 2009 at 4:20 PM, Benjamin Reed br...@yahoo-inc.com wrote:
For your first issue if an ensemble goes offline and comes
: Updated NodeWatcher...
Ben this is great, thanks! Do you want to close out this one and point
to the faq?
https://issues.apache.org/jira/browse/ZOOKEEPER-264
Although IMO this should be moved to the forrest docs.
Patrick
Benjamin Reed wrote:
I'm really bad at creating figures, but i've put up
'potentially' already been processed. That way he can double
check first before he goes off and processes the message again. But
adding that info in ZK might be more expensive than doing the double
check every time in consumer anyways.
On Thu, Jan 8, 2009 at 11:42 AM, Benjamin Reed br...@yahoo
we should delay. it would be good to try out quotas for a bit before we do the
release. quotas are also a key part of the release. 3 weeks seem a little long
though.
ben
From: Mahadev Konar [maha...@yahoo-inc.com]
Sent: Thursday, January 15, 2009 4:32 PM
idleness is not a problem. the client library sends heartbeats to keep the
session alive. the client library will also handle reconnects automatically if
a server dies.
since session expiration really is a rare catastrophic event. (or at least it
should be.) it is probably easiest to deal with
just a quick sanity check. are you sure your memory is not overcommitted? in
other words, you aren't swapping. since the gc does a bunch of random memory
accesses, if you swap at all things will go very slow.
ben
From: Joey Echeverria [joe...@gmail.com]
i'm ready to reevaluate it. i did the contrib for fatjar and it was
extremely painful! (and that was an extremely simple contrib!) we really
want to ramp up the contribs and get a bunch of recipe implementations
in, so we need something that makes it really easy. i'm not a fan of
maven (they
... Be aware that the
contribution process, release process and other documentation would have
to be updated as part of this. For example if we want to push jars to an
artifact repo the artifacts/pom/etc... would have to be voted on as part
of the release process.
Patrick
Benjamin Reed wrote
I realize this discussion is over, but i did want to make one quick
clarification. when we talk about ensembles, we are talking about the
servers that make up the zookeeper service. we refer to the servers that
use the zookeeper service as clients. we have systems here that use
ensembles of
it is possible for the time to pass without the session expiring. Imagine a
session timeout of 15 seconds. there is a correlated power outage affecting the
zookeeper servers. let's say it takes 5 minutes to recover power and reboot.
when the service recovers, it resets expiration times, so when
i'm not exactly clear how you use these ideas, but one source of unique
ids that are longs is the zxid. if you create a znode, every time you
write to it, you will get a unique zxid in the mzxid member of the stat
structure. (you get the stat structure back in the response to the setData.)
ben
yes, /zookeeper is part of the reserved namespace for zookeeper internals. you
should ignore it for such things.
ben
From: Satish Bhatti [cthd2...@gmail.com]
Sent: Wednesday, May 06, 2009 2:57 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Moving
good summary ted. just to add a bit. another motivation for the current design
is what scott had mentioned earlier: not sending a flood of changes when the
value of a node is changing rapidly. implicit in this is the fact that we do
not send the value in the events. not only does this make the
this is great to hear. it's great to see siblings playing together ;)
* In CXF we use Maven to build everything. To depend on Zookeeper we
need to pull it in from a Maven repository. I couldn't find Zookeeper
in any main Maven repos, so currently we're pulling it in from
sorry to jump in late.
if i understand the scenario correctly, you are partitioned from ZK, but
you still have access to the NN on which you are holding leases to
files. the problem is that even though your ephemeral nodes may time out,
you are still holding a lease on the NN and recovery
the create is atomic. we just use a data structure that does not store
the list of children in order.
ben
Erik Holstad wrote:
Hey Patrik!
Thanks for the reply.
I understand all the reasons that you posted above and totally agree that
nodes should not be sorted since you then have to pay that
Or maybe /usr/local/include/zookeeper but either way c-client-src is weird.
Please open a jira.
Thanx
ben
Sent from my phone.
-Original Message-
From: Michi Mutsuzaki mi...@cs.stanford.edu
Sent: Saturday, August 01, 2009 6:15 PM
To: zookeeper-user@hadoop.apache.org
I assume you are calling the synchronous version of exists. The callbacks for
both the watches and async calls are processed by a callback thread, so the
ordering is strict. Synchronous call responses are not queued to the callback
thread. (this allows you to make synchronous calls in callbacks
good point david! zhang can you try david's scripts? we should probably
commit those. thanx for pointing them out david.
ben
David Bosschaert wrote:
FWIW, I've uploaded some Windows versions of the zookeeper scripts to
https://issues.apache.org/jira/browse/ZOOKEEPER-426 a while ago. They
run
are you using the single threaded or multithreaded C library? the exceeded
deadline message means that our thread was supposed to get control after a
certain period, but we got control that many milliseconds late. what is your
session timeout?
ben
The connection refused message, as opposed to no route to host or
unknown host, indicates that zookeeper has not been started on the other
machines. are the other machines giving similar errors?
ben
Le Zhou wrote:
Hi,
I'm trying to install HBase 0.20.0 in fully distributed mode on my cluster.
can you clarify what you are asking for? are you just looking for
motivation? or are you trying to find out how to use it?
the myid file just has the unique identifier (number) of the server in
the cluster. that number is matched against the id in the configuration
file. there isn't much to
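For illustration, a three-server setup might look like the fragment below (hostnames are made up); the number in each server's myid file must match the N in its server.N line:

```
# zoo.cfg (hostnames here are hypothetical)
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

# and on zk2, the file <dataDir>/myid contains just the single line:
# 2
```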
not getting here :-)
Regards, Orjan
On Fri, Sep 25, 2009 at 3:56 PM, Benjamin Reed br...@yahoo-inc.com
wrote:
can you clarify what you are asking for? are you just looking for
motivation? or are you trying to find out how to use it?
the myid file just has the unique identifier (number
so you have two problems going on. both have the same root:
zookeeper_init returns before a connection and session is established
with zookeeper, so you will not be able to fill in myid until a
connection is made. you can do something with a mutex in the watcher to
wait for a connection, or
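The "wait in the watcher" idea can be sketched with an event that the connection watcher sets; here in Python, with a stand-in thread playing the part of the asynchronous session establishment (the real C client would signal a condition variable from the watcher when it sees ZOO_CONNECTED_STATE):

```python
import threading
import time

connected = threading.Event()

def watcher(event_type, state):
    # In the real client this callback runs when the session reaches the
    # connected state; here a stand-in thread invokes it.
    if state == "CONNECTED":
        connected.set()

# Stand-in for the asynchronous connect that zookeeper_init kicks off
# before returning to the caller.
threading.Thread(
    target=lambda: (time.sleep(0.05), watcher("session", "CONNECTED"))
).start()

# Block until the watcher reports a live session, with a timeout so a
# failed connect doesn't hang forever.
ok = connected.wait(timeout=5.0)
```

Only after `ok` is true is it safe to read the client id from the handle.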
right at the beginning of
http://hadoop.apache.org/zookeeper/docs/r3.2.1/zookeeperStarted.html it
shows you the minimum standalone configuration.
that doesn't explain the 0 id. i'd like to try and reproduce it. do you
have an empty data directory with a single file, myid, set to 1?
ben
there are a bunch of presentations you can grab at
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations
ben
Mark Vigeant wrote:
Hey Everyone,
I'm supposed to give a presentation next week about the basic functionality and
uses of zookeeper. I was wondering if anybody out there
there aren't any limits on the number of znodes, it's just limited by
your memory. there are two things (probably more :) to keep in mind:
1) the 1M limit also applies to the children list. you can't grow the
list of children to more than 1M (the sum of the names of all of the
children)
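As a rough check of that limit: the relevant quantity is the sum of the lengths of the child names, against the ~1 MB default buffer limit (jute.maxbuffer). The numbers below are illustrative, not a documented formula:

```python
# Estimate how many children of a given name length fit under the
# ~1 MB response limit (the sum of the child-name lengths).
LIMIT = 1 * 1024 * 1024  # default jute.maxbuffer, in bytes

name_len = len("lock-0000000001")  # a typical sequential-znode name
max_children = LIMIT // name_len
```

So with 15-character sequential names, the children list tops out well short of 100k entries before the response no longer fits.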
I agree with Ted, it doesn't seem like a good idea to do in practice.
however, you do have a couple of options if you are just testing things:
1) use tmpfs
2) you can set forceSync to no in the configuration file to disable
syncing to disk before acknowledging responses
3) if you really want
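For test setups only, option 2 is a one-line config change (never use it in production, since acknowledged writes can be lost on a crash or power failure):

```
# zoo.cfg -- testing only: skip fsync before acking writes
forceSync=no
```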
no please open a jira as a new feature request.
sent from my droid
-Original Message-
From: Steve Chu [stv...@gmail.com]
Received: 12/21/09 3:44 AM
To: zookeeper-user@hadoop.apache.org [zookeeper-u...@hadoop.apache.org]
Subject: Does zookeeper support listening on a specified address?
hi Qing,
i'm glad you like the page and Zab.
yes, we are very familiar with Paxos. that page is meant to show a
weakness of Paxos and a design point for Zab. it is not to say Paxos is
not useful. Paxos is used in the real world in production systems.
sometimes there are not order
henry is correct. just to state another way, Zab guarantees that if a
quorum of servers have accepted a transaction, the transaction will
commit. this means that if less than a quorum of servers have accepted a
transaction, we can commit or discard. the only constraint we have in
choosing is
sadly connectionloss is the really ugly part of zookeeper! it is a pain
to deal with. i'm not sure we have best practice, but i can tell you
what i do :) ZOOKEEPER-22 is meant to alleviate this problem.
i usually use the asynch API when handling the watch callback. in the
completion function
i was looking through the docs to see if we talk about handling session
expired, but i couldn't find anything. we should probably open a jira to add to
the docs, unless i missed something. did i?
ben
-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Monday,
it is a bit confusing but initLimit is the timer that is used when a
follower connects to a leader. there may be some state transfers
involved to bring the follower up to speed so we need to be able to
allow a little extra time for the initial connection.
after that we use syncLimit to figure
do you ever use zookeeper_init() with the clientid field set to
something other than null?
ben
On 03/16/2010 07:43 AM, Łukasz Osipiuk wrote:
Hi everyone!
I am writing to this group because recently we are getting some
strange errors with our production zookeeper setup.
From time to time we
weird, this does sound like a bug. do you have a reliable way of
reproducing the problem?
thanx
ben
On 03/16/2010 08:27 AM, Łukasz Osipiuk wrote:
nope.
I always pass 0 as clientid.
Łukasz
On Tue, Mar 16, 2010 at 16:20, Benjamin Reedbr...@yahoo-inc.com wrote:
do you ever use
we have updated ZOOKEEPER-713 with much more detail, but the bottom line
is that the Invalid snapshot was caused by an OutOfMemoryError. this
turns out not to be a problem since we recover using an older snapshot.
there are other things that are happening that are the real causes of
the problem.
yes it means in sync with the leader. syncLimit governs the timeout when
a follower is actively following a leader. initLimit is the initial
connection timeout. because there is the potential for more data that
needs to be transmitted during the initial connection, we want to be
able to manage
awesome! that would be great ivan. i'm sure pat has some more concrete
suggestions, but one simple thing to do is to run the unit tests and
look at the log messages that get output. there are a couple of
categories of things that need to be fixed (this is in no way exhaustive):
1) messages
i agree with ted. i think he points out some disadvantages with trying
to do more. there is a slippery slope with these kinds of things. the
implementation is complicated enough even with the simple model that we use.
ben
On 03/29/2010 08:34 PM, Ted Dunning wrote:
I perhaps should not have
is this a bug? shouldn't we be returning an error?
ben
On 05/12/2010 11:34 AM, Patrick Hunt wrote:
I think that explains it then - the server is probably dropping the new
(3.3.0) getChildren message (xid 7) as it (3.2.2 server) doesn't know
about that message type. Then the server responds to
good catch lei! if this helps gregory, can you open a jira to throw an
exception in this situation. we should be throwing an invalid argument
exception or something in this case.
thanx
ben
On 05/20/2010 09:04 AM, Lei Zhang wrote:
Seems you are passing in wrong arguments:
Should have been:
charity, do you mind going through your scenario again to give a
timeline for the failure? i'm a bit confused as to what happened.
ben
On 06/02/2010 01:32 PM, Charity Majors wrote:
Thanks. That worked for me. I'm a little confused about why it threw the
entire cluster into an unusable
the call is executed at a later time on a different thread. the zoo_a*
calls are non-blocking, so (subject to the thread scheduling) usually
they will return before the request completes.
ben
On 06/03/2010 01:24 PM, Jack Orenstein wrote:
I'm trying to figure out how to use zookeeper's C API.
yes. (except for the single threaded C-client library :)
ben
On 06/17/2010 10:16 AM, Jun Rao wrote:
Hi,
Is ZK client thread safe? Is it ok for multiple threads sharing the same ZK
client? Thanks,
Jun
we do this in our tests for ZooKeeper. bookkeeper uses the testing
classes as well, unfortunately, we haven't documented the interface.
ben
On 06/22/2010 08:42 PM, Ishaaq Chandy wrote:
Hi all,
First some background:
1. We use maven as our build tool.
2. We use Hudson as our CI server, it is
the difference between close and disconnect is that close will actually
try to tell the server to kill the session before disconnecting.
a paranoid lock implementation doesn't need to test its session. it
should just monitor watch events to look for disconnect and expired
events. if a client
can you try the following:
Index: src/contrib/fatjar/build.xml
===
--- src/contrib/fatjar/build.xml(revision 962637)
+++ src/contrib/fatjar/build.xml(working copy)
@@ -46,6 +46,7 @@
fileset
by custom QuorumVerifier are you referring to
http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperHierarchicalQuorums.html
?
ben
On 07/14/2010 12:43 PM, Sergei Babovich wrote:
Hi,
We are currently evaluating use of ZK in our infrastructure. In our
setup we have a set of servers running
i think there is a wiki page on this, but for the short answer:
the number of znodes impacts two things: memory footprint and recovery
time. there is a base overhead to each znode to store its path, pointers to
the data, pointers to the acl, etc. i believe that is around 100 bytes.
you can't just
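Taking that ~100-byte figure as a back-of-the-envelope number (an estimate from the thread, not a documented constant), the base footprint scales roughly like this:

```python
# Rough znode memory estimate: per-znode overhead plus stored data.
OVERHEAD = 100  # approximate bytes per znode (path, pointers, acl ref)

def footprint(num_znodes, avg_data_bytes):
    """Estimated heap consumed by znode storage alone (very rough)."""
    return num_znodes * (OVERHEAD + avg_data_bytes)

# e.g. 10 million znodes averaging 256 bytes of data each:
gb = footprint(10_000_000, 256) / 1e9
```

That's several GB of heap before accounting for children lists, watches, or GC headroom, which is why recovery time grows along with it.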
how big is your database? it would be good to know the timing of the two calls.
shutdown should take very little time.
sent from my droid
-Original Message-
From: Vishal K [vishalm...@gmail.com]
Received: 7/16/10 6:31 PM
To: zookeeper-user@hadoop.apache.org
you have concluded correctly.
1) bookkeeper was designed for a process to use as a write-ahead log, so
as a simplifying assumption we assume a single writer to a log. we
should be throwing an exception if you try to write to a handle that you
obtained using openLedger. can you open a jira for
i did a benchmark a while back to see the effect of turning off the
disk. (it wasn't as big as you would think.) i had to modify the code.
there is an option to turn off the sync in the config that will get you
most of the performance you would get by turning off the disk entirely.
ben
On
as long as a watcher object is only used with a single ZooKeeper object
it will be called by the same thread.
ben
On 07/21/2010 11:12 AM, Joshua Ball wrote:
Hi,
Do implementations of Watcher need to be thread-safe, or can I assume
that process(...) will always be called by the same thread?
i thought there was a jira about supporting embedded zookeeper. (i
remember rejecting a patch to fix it. one of the problems is that we
have a couple of places that do System.exit().) i can't seem to find it
though.
one case that would be great for embedding is writing test cases, so i
think
zookeeper takes care of reregistering all watchers on reconnect. you
don't need to do anything.
ben
On 08/16/2010 09:04 AM, Qian Ye wrote:
Hi all:
Will the watchers of a client be lost when the client disconnects from a
Zookeeper server? It is said at
good point ted! i should have waited a bit longer before responding :)
ben
On 08/16/2010 09:20 AM, Ted Dunning wrote:
There are two different concepts. One is connection loss. Watchers survive
this and the client automatically connects
to another member of the ZK cluster.
The other is
the client does keep track of the watches that it has outstanding. when
it reconnects to a new server it tells the server what it is watching
for and the last view of the system that it had.
ben
On 08/16/2010 09:28 AM, Qian Ye wrote:
thx for the explanation. Since the watcher can be preserved
there are two things to keep in mind when thinking about this issue:
1) if a zk client is disconnected from the cluster, the client is
essentially in limbo. because the client cannot talk to a server it
cannot know if its session is still alive. it also cannot close its session.
2) the
yes, you are right. we could do this. it turns out that the expiration
code is very simple:
while (running) {
    currentTime = System.currentTimeMillis();
    if (nextExpirationTime > currentTime) {
        this.wait(nextExpirationTime - currentTime);
if we can't rely on the clock, we cannot say things like if ... for 5
seconds.
also, clients connect to servers, not vice versa, so we cannot say
things like server can attempt to reconnect.
ben
On 08/19/2010 10:17 AM, Vishal K wrote:
Hi Ted,
I haven't give it a serious thought yet, but I
i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch
out. Qing (or anyone else), can you reproduce it pretty easily?
thanx
ben
On 08/19/2010 09:29 AM, Ted Dunning wrote:
Nice (modulo inverting the in your text).
Option 2 seems very simple. That always attracts me.
On
i put up a patch that should address the problem. now i need to write a
test case. the only way i can think of is to change the call to
System.currentTimeMillis to a utility class that calls
System.currentTimeMillis that i can mock for testing. any better ideas?
ben
On 08/19/2010 03:53 PM,
i'm a bit skeptical that this is going to work out properly. a server
may receive a socket reset even though the client is still alive:
1) client sends a request to a server
2) client is partitioned from the server
3) server starts trying to send response
4) client reconnects to a different
for this session type (so
4 would fail). Would that address your concern, others?
Patrick
On 09/01/2010 10:03 AM, Benjamin Reed wrote:
i'm a bit skeptical that this is going to work out properly. a server
may receive a socket reset even though the client is still alive:
1) client sends a request
@hadoop.apache.org
Cc: Benjamin Reed
Subject: Re: closing session on socket close vs waiting for timeout
This really is, just as Ben says a problem of false positives and false
negatives in detecting session
expiration.
On the other hand, the current algorithm isn't really using all the
information available
to waste my time if there's a fundamental reason it's a bad
idea.
Thanks,
Camille
-Original Message-
From: Benjamin Reed [mailto:br...@yahoo-inc.com]
Sent: Wednesday, September 08, 2010 4:03 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting
ah dang, i should have said generate a close request for the session
and push that through the system.
ben
On 09/10/2010 01:01 PM, Benjamin Reed wrote:
the problem is that followers don't track session timeouts. they track
when they last heard from the sessions that are connected to them
we should also point out that our ops guys here at yahoo! don't like
the break at major clause. i imagine when we do the next major release
we will try to be one release backwards compatible. (although we
shouldn't promise it until we successfully do it once :)
ben
On 09/30/2010 10:29 AM,
you will need to time how long it takes to read all that state back in
and adjust the initTime accordingly. it will probably take a while to
pull all that data into memory.
ben
On 10/05/2010 11:36 AM, Avinash Lakshman wrote:
I have run it over 5 GB of heap with over 10M znodes. We will
hi amit,
sorry for the late response. this week has been crunch time for a lot of
different things.
here are your answers:
production
1. it is still in prototype phase. we are evaluating different aspects,
but there is still some work to do to make it production ready. we also
need to
your guess is correct :) for bookkeeper and hedwig we released early
to do the development in public. originally we developed bookkeeper as a
distributed write ahead log for the NameNode in HDFS, but while we were
able to get a proof of concept going, the structure of the code of the
NameNode
this usually happens when a follower closes its connection to the leader. it is
usually caused by the follower shutting down or failing. you may get further
insight by looking at the follower logs. you should really run with timestamps
on so that you can correlate the logs of the leader and
how big is your data? you may be running into the problem where it
takes too long to do the state transfer and times out. check the
initLimit and the size of your data.
ben
On 10/10/2010 08:57 AM, Avinash Lakshman wrote:
Thanks Ben. I am not mixing processes of different clusters. I just
which scheme are you using?
ben
On 10/18/2010 11:57 PM, FANG Yang wrote:
2010/10/19 FANG Yangfa...@douban.com
hi, all
I have a simple zk client written by c ,which is attachment #1. When i
use ZOO_CREATOR_ALL_ACL, the ret code of zoo_create is -114 (Invalid ACL
specified, defined in
we should put in a test for that. it is certainly a plausible
scenario. in theory it will just flow into the next epoch and everything
will be fine, but we should try it and see.
ben
On 10/19/2010 11:33 AM, Sandy Pratt wrote:
Just as a thought experiment, I was pondering the following:
ZK
currently program1 can read and write to an open ledger, but program2
must wait for the ledger to be closed before doing the read. the problem
is that program2 needs to know the last valid entry in the ledger.
(there may be entries that may not yet be valid.) for performance
reasons, only
in hedwig one hub does both the publish and subscribe for a given
topic and therefore is the only processes reading and writing from/to a
ledger, so there isn't an issue.
The ReadAheadCache does read-ahead :) it is so that we can minimize
latency when doing sequential reads.
ben
On
how were you able to reproduce it?
all the znodes in /zkrsm were created with the sequence flag. right?
ben
On 11/01/2010 02:28 PM, Jeremy Stribling wrote:
We were able to reproduce it. A stat on all three servers looks
identical:
[zk:ip:port(CONNECTED) 0] stat /zkrsm
cZxid = 9
ctime = Mon
sequential znodes. I'm guessing this is
pretty well-tested behavior, so there must be something weird or wrong
about the way I have stuff setup.
I'm happy to provide whatever logs or snapshots might help someone
track this down. Thanks,
Jeremy
On 11/01/2010 02:42 PM, Benjamin Reed wrote:
how were
one thing to note: if you are using a DNS load balancer, some load
balancers will return the list of resolved addresses in different orders
to do the balancing. the zookeeper client will shuffle that list before
it is used, so in reality, using a single DNS hostname resolving to all
the
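The shuffle Ben mentions is easy to sketch: resolve one name to many addresses, then randomize the order per client so load spreads even if the resolver always answers in the same order (the helper below is illustrative, not the client's actual code):

```python
import random

def connect_order(resolved_addrs, seed=None):
    """Return the server list in the randomized order a client would try."""
    addrs = list(resolved_addrs)  # don't mutate the caller's list
    random.Random(seed).shuffle(addrs)
    return addrs

# One hypothetical DNS name resolving to three ensemble members:
servers = ["10.0.0.1:2181", "10.0.0.2:2181", "10.0.0.3:2181"]
order = connect_order(servers, seed=42)
```

Each client ends up with its own permutation, so connections spread across the ensemble rather than piling onto the first resolved address.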
ah i see. you are manually reestablishing the connection to B using the
session identifier for the session with A.
the problem is that when you call close on a session, it kills the
session. we don't really have a way to close a handle without doing that.
(actually there is a test class that