Pretty much all of the Java client wrappers out there in the wild have some
sort of a retry loop around operations, to make some of this easier to deal
with. But they don't, to my knowledge, deal with the situation of knowing whether
an operation succeeded in the case of a disconnect (it is not clear that this is
a case where a disconnection can be handled generically).
-JZ
On 10/14/11 7:20 AM, Fournier, Camille F. camille.fourn...@gs.com wrote:
Pretty much all of the Java client wrappers out there in the wild have some
sort of a retry loop around operations, to make some of this easier to deal
with.
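A minimal sketch of the kind of retry loop those wrappers implement (ZkOperation and RetryingZk are hypothetical names, not from any real library):

import org.apache.zookeeper.KeeperException;

// Hypothetical sketch of the retry loop most Java wrappers implement.
interface ZkOperation<T> {
    T run() throws KeeperException, InterruptedException;
}

class RetryingZk {
    // Assumes maxRetries >= 1; the backoff policy is illustrative.
    static <T> T withRetries(ZkOperation<T> op, int maxRetries) throws Exception {
        KeeperException last = null;
        for (int i = 0; i < maxRetries; i++) {
            try {
                return op.run();
            } catch (KeeperException.ConnectionLossException e) {
                // Retryable, but we don't know whether the op reached the server.
                last = e;
                Thread.sleep(1000L * (i + 1)); // simple linear backoff
            }
        }
        throw last; // out of retries
    }
}

This is exactly where the caveat above bites: if the first attempt was a create that did reach the server, the retry comes back with NodeExists, so a blind loop like this cannot distinguish success from failure.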
It's pretty easy to set up zk-smoketest to simulate what you are doing. We
can't answer this question without knowing how big the data you're writing is,
etc. I would recommend testing it out yourself on realistic data sizes.
So, the node was created by session 0x13220b93e610550 at 12:17:56, then that
session closed at 12:17:57, the node was not deleted, and a bunch of other
sessions later tried to create the node. These sessions got NodeExists failures,
I presume?
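For reference, a sketch of what those creating sessions might see on the create path, using stock ZooKeeper client calls (the path and data are placeholders); checking ephemeralOwner shows which session holds the lingering node:

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

class CreateHelper {
    // On NodeExists, inspect the ephemeral owner to see whether the node
    // belongs to a dead session that has not been cleaned up yet.
    static void tryCreate(ZooKeeper zk, String path, byte[] data) throws Exception {
        try {
            zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } catch (KeeperException.NodeExistsException e) {
            Stat stat = zk.exists(path, false);
            if (stat != null) {
                // ephemeralOwner is the session id that created the node (0 if persistent)
                System.out.printf("node held by session 0x%x%n", stat.getEphemeralOwner());
            }
        }
    }
}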
Forgive the block of text I'm going to write instead of
We spread our ZKs across 3 data centers and in fact, these data centers are
split across global regions (2 or 4 nodes in one region, one in a remote region).
To keep throughput up (and note that the throughput you have to worry about is
only write throughput), we always ensure that the master is in the primary region.
We have a monitor process that runs 'stat' against the remote ZK and, if it
reports 'leader', kills the process.
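A minimal sketch of such a monitor, assuming the four-letter 'stat' command is reachable on the client port; what to do on 'leader' (killing the process) is site-specific and omitted:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;

class ZkStatMonitor {
    // Sends the four-letter "stat" command and checks the reported Mode.
    static boolean isLeader(String host, int port) throws IOException {
        try (Socket s = new Socket(host, port);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            s.getOutputStream().write("stat".getBytes());
            s.getOutputStream().flush();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Mode:")) {
                    return line.contains("leader");
                }
            }
        }
        return false;
    }
}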
-----Original Message-----
From: Damu R [mailto:damu.devn...@gmail.com]
Sent: Thursday, September 22, 2011 12:47 PM
To: user@zookeeper.apache.org
Subject: Re: zookeeper cluster
This is expected. In cases where the network becomes unstable, it is the
responsibility of the client writer to handle disconnected events appropriately
and to verify whether writes attempted around the time of these events did or
did not succeed. It makes writing a generic retry wrapper difficult.
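One sketch of that post-disconnect verification, assuming the write in question was an ephemeral create whose path is known; comparing ephemeralOwner against our own session id tells us whether our attempt is the one that survived:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class DisconnectCheck {
    // After a ConnectionLoss during an ephemeral create(), verify whether
    // the write actually landed and whether it was ours.
    static boolean didMyEphemeralCreateSucceed(ZooKeeper zk, String path) throws Exception {
        Stat stat = zk.exists(path, false);
        return stat != null && stat.getEphemeralOwner() == zk.getSessionId();
    }
}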
Well, if you are locking member IDs that are generated strings that will never
be reused, the best you can do right now is clean those up after a period of
time.
If, on the other hand, your member IDs are meaningful, it's likely you will
want to reuse these locations in the future when you
Did anyone ever check resetting watches at client reconnect on a client with a
chroot? Looking at the code, we store the watches associated with the
non-chroot path, but they are set by the original request, which prepends the
chroot to the path. However, it looks like the SetWatches request sent on
reconnect does not prepend the chroot.
Camille,
Do you think we should put the fix in 3.3.4? I think 3.4 might take a while to
stabilize, so 3.3.4 would be a good release to get this in.
Thoughts?
mahadev
On Aug 29, 2011, at 10:50 AM, Fournier, Camille F. wrote:
Well, it causes the problem you are seeing. If you set any
As long as the two clusters can ping each other, just set up a single ZK
cluster spread across the two, 3 or 5 nodes (total, not per cluster). Note that
if these two clusters are used for business continuity purposes (spread across
2 data centers), you still risk an outage of ZooKeeper if the data center
holding a majority of the nodes goes down.
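For concreteness, with illustrative numbers: 5 nodes split 3 and 2 across the
two data centers gives a quorum of 3, so losing the 3-node data center halts
the whole ensemble; placing one node in a true third location is what removes
that failure mode.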
letting you see
time go backwards. Your situation is different of course.
On Thu, Aug 4, 2011 at 7:05 PM, Fournier, Camille F.
camille.fourn...@gs.com wrote:
Right now the server just detects that the zxid is wrong, and calls close on
the client. The client logs:
15:01:47,593 - INFO
If you have a server that thinks that it is in quorum then things are not good.
The definition of "thinks it is in quorum" is problematic of course.
On Fri, Aug 5, 2011 at 10:57 AM, Fournier, Camille F.
camille.fourn...@gs.com wrote:
Oh blah, of course it won't be backward compatible, because all the older
clients would
to a dev discussion)
C
-----Original Message-----
From: Fournier, Camille F. [Tech]
Sent: Friday, August 05, 2011 11:57 AM
To: 'user@zookeeper.apache.org'
Subject: RE: devops/admin/client question: What do you do when you rollback?
Hmmm. I thought I had another way around this but I don't. We really
We had an issue here the other day where the ZK servers were running poorly,
and in an effort to get them healthy again we ended up rolling back the cluster
state. While this was, in retrospect, not the right solution to the problem we
were facing, it brought up another problem. Namely, that
Sent: Thursday, August 04, 2011 1:51 PM
To: user@zookeeper.apache.org
Subject: Re: devops/admin/client question: What do you do when you rollback?
On Thu, Aug 4, 2011 at 10:29 AM, Fournier, Camille F.
camille.fourn...@gs.com wrote:
We had an issue here the other day where the ZK servers were running
If the zk cluster doesn't get pings from your existing master, the zk client on
that master should see a disconnected state event, not a node deletion event.
Upon seeing that event, it should stop acting as master until such time as it
can determine whether it has reconnected and is still master.
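A minimal sketch of that client-side state handling, with hypothetical leadership hooks standing in for the application logic:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class MasterWatcher implements Watcher {
    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:  pauseLeadership();  break; // may still own the session, can't tell
            case SyncConnected: resumeLeadership(); break; // reconnected within the session timeout
            case Expired:       abdicate();         break; // session gone; must re-run election
            default: break;
        }
    }
    void pauseLeadership()  { /* stop acting as master */ }
    void resumeLeadership() { /* confirm still master, then resume */ }
    void abdicate()         { /* rejoin the election */ }
}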
ZooKeeper can't possibly know that you are in GDB unless you have a special
message that you send to the server that says "I'm in a debugger now, please
don't expire me." You might be able to hack something in to do this, but do you
really want to? I think the second idea is best. If you are a
thread keeps going happily along. Though this is a very slight possibility,
theoretically it is still possible.
Or am I missing something?
Thanks
Yang
On Mon, Jul 18, 2011 at 6:51 AM, Fournier, Camille F.
camille.fourn...@gs.com wrote:
If the zk cluster doesn't get pings from your existing
) should be guaranteed to be higher than the time needed for the application
thread to detect the Disconnected event from the ZK client. The latter time
can be inflated by GC pauses, thread scheduling delays, etc.
Right?
thanks
Yang
On Mon, Jul 18, 2011 at 9:30 AM, Fournier, Camille F.
camille.fourn...@gs.com
Again, you seem too easily offended when touching on the subject,
which makes Thomas' points valid.
No it doesn't. He is allowed to be offended and it has no bearing on the truth
of the matter. This isn't a Shakespearian play.
C
-----Original Message-----
From: Gustavo Niemeyer
)?
Thanks,
Jun
On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao jun...@gmail.com wrote:
The logs don't have any state-changing entries around the time the watcher is
triggered, in any of the clients.
Jun
On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech]
camille.fourn...@gs.com wrote:
Any state
I would bet that Ted is right about this, but if you're still having problems
and want to put the YourKit profile up somewhere, I could take a look later
today.
C
-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Wednesday, June 01, 2011 2:07 AM
To:
Global clusters will affect writes greatly, and may also affect your client
reads in an indirect manner.
Writes, having to traverse from one region to another for purposes of voting,
will be slowed down considerably by the ping time between regions.
If you did a three-node deployment in the
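As a rough illustration (numbers assumed, not from the thread): with an 80 ms
inter-region ping time, every write waits at least one cross-region round trip
for its quorum ack, putting a floor of roughly 80 ms under write latency,
versus the millisecond-scale times a single-data-center ensemble can see.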
I have some ideas of where to look on this, happy to do it in the next day or
two if no one else wants to look. Please do open a jira even though you can't
reproduce it consistently yet, just so we have somewhere to track the efforts.
C
-----Original Message-----
From: Jeremy Stribling
To do this right you probably need messaging queues. I'd research the various
MQ solutions out there. They are built to handle exactly this sort of issue.
You could try to implement it via a ZK distributed queue plus some sort of
crazy transaction logic in each process (so that a document to
Have you checked out the distributed queue recipe? It is what I have used to
implement a solution to a similar problem.
http://hadoop.apache.org/zookeeper/docs/r3.3.2/recipes.html
Are the jobs worker-specific, or can all workers handle all jobs? The
distributed queue protocol is very nice and
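A compressed sketch of the linked recipe's core idea, assuming a pre-created /queue parent and leaving out the watch-based blocking the full recipe adds:

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.*;

class ZkQueueSketch {
    private final ZooKeeper zk;
    ZkQueueSketch(ZooKeeper zk) { this.zk = zk; }

    // Producers append sequential children; the suffix orders the items.
    void offer(byte[] job) throws Exception {
        zk.create("/queue/item-", job, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.PERSISTENT_SEQUENTIAL);
    }

    // Consumers claim the lowest item by deleting it; a NoNode on delete
    // means another consumer won the race, so move on to the next child.
    byte[] poll() throws Exception {
        List<String> children = zk.getChildren("/queue", false);
        Collections.sort(children);
        for (String child : children) {
            String path = "/queue/" + child;
            try {
                byte[] data = zk.getData(path, false, null);
                zk.delete(path, -1);
                return data;
            } catch (KeeperException.NoNodeException raced) {
                // item taken between getData/delete; try the next one
            }
        }
        return null; // queue empty
    }
}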