Re: 3.4.6 to 3.4.12 weird issue

2018-06-28 Thread Camille Fournier
How are you launching the cluster? Are you using zkServer.sh to run it? On Thu, Jun 28, 2018 at 2:28 PM Dan Simoes wrote: > New to the list, howdy. > I've read about the issue with dataLogDir and dataDir in 3.4.10 ( > https://issues.apache.org/jira/browse/ZOOKEEPER-2960) and how it's fixed > in

Re: ...likely client has closed socket...

2017-07-17 Thread Camille Fournier
If your clients call the close operation when they are done using ZK you won't see this error. It is unexpected for your client to close the socket without having called close. I suspect that this is an error in the way you are using the ZK client. On Mon, Jul 17, 2017 at 9:25 AM,

Re: Zookeeper scalability and memory consumption

2017-02-22 Thread Camille Fournier
It depends on how much data you store in each node. On Wed, Feb 22, 2017 at 11:21 AM, Shivanshu Goswami < sgosw...@cs.stonybrook.edu> wrote: > Thanks Jordan. > > My question is about the same. If it keeps the entire DB in memory,* is > there a limit beyond which say Zookeeper server will stop
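
A back-of-the-envelope way to see why the answer depends on data size: the whole tree lives in the server's heap, so a rough estimate is node count times (data + path + per-node overhead). This is a sketch only. The default per-node overhead of 200 bytes is a placeholder assumption, not a measured ZooKeeper figure, so measure a real deployment before sizing a heap from it.

```python
# Rough heap sizing for a ZooKeeper data tree (data tree only, no sessions/watches).
# ASSUMPTION: per_znode_overhead_bytes is a guessed placeholder for path strings,
# stat fields and object headers; it is not taken from ZooKeeper's source.

def estimate_heap_bytes(num_znodes: int,
                        avg_data_bytes: int,
                        avg_path_bytes: int,
                        per_znode_overhead_bytes: int = 200) -> int:
    """Estimate heap bytes consumed by num_znodes znodes."""
    per_node = avg_data_bytes + avg_path_bytes + per_znode_overhead_bytes
    return num_znodes * per_node


def to_gib(num_bytes: int) -> float:
    return num_bytes / (1024 ** 3)


# Same node count, very different footprints depending on payload size.
small_payloads = estimate_heap_bytes(1_000_000, avg_data_bytes=100, avg_path_bytes=50)
large_payloads = estimate_heap_bytes(1_000_000, avg_data_bytes=10_000, avg_path_bytes=50)
print(f"~{to_gib(small_payloads):.2f} GiB vs ~{to_gib(large_payloads):.2f} GiB")
```

The node count alone says little; a million 100-byte nodes and a million 10 KB nodes differ by more than an order of magnitude in heap.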

Re: etcd performance comparison

2017-02-22 Thread Camille Fournier
n create objective tests that compare common use > cases. > > > Jordan Zimmerman > > > On Feb 22, 2017, at 11:21 AM, Camille Fournier <cami...@apache.org> > wrote: > > > > I think that my biggest feeling about this blog post (besides not >

Re: etcd performance comparison

2017-02-22 Thread Camille Fournier
I think that my biggest feeling about this blog post (besides not disclosing the disk setup clearly) is that, ZK is really not designed to have massive write throughput. I would not traditionally recommend someone use ZK in that manner. If we think that evolving it to be useful for such workloads

Re: Extremely different readings on different zookeeper deployments

2017-02-07 Thread Camille Fournier
Disk writing speed is one of the major factors for zk write performance. Is the disk setup the same across both of these machines? My guess is that is a big factor. On Tue, Feb 7, 2017 at 2:24 AM, Amar Gajbhiye wrote: > Hi, > I am working on a distributed system where I
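
One way to check the disk setup is to measure fsync latency directly on each machine. By default ZooKeeper syncs its transaction log to disk before acknowledging a write, so fsync latency on the dataLogDir device tends to bound write latency. The probe below is a minimal sketch; the 512-byte record size and 200 iterations are arbitrary choices, not ZooKeeper's actual log record format.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies(directory: str, iterations: int = 200, record_bytes: int = 512) -> list:
    """Append small records to a scratch file in `directory`, fsync after each,
    and return the per-write latencies in milliseconds."""
    payload = b"x" * record_bytes
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the record to stable storage, as a txn-log sync would
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies: list) -> str:
    return f"median {statistics.median(latencies):.3f} ms, max {max(latencies):.3f} ms"


# Usage: run on each machine against the directory configured as dataLogDir, e.g.
# print(summarize(fsync_latencies("/path/to/dataLogDir")))  # path is hypothetical
```

If the medians differ by an order of magnitude between the two machines, the disks alone can explain very different write numbers.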

Re: ZK read-only issue

2016-07-29 Thread Camille Fournier
Just to clarify one thing, though: in the server logs from the fresh install, is the new read-only server reporting anything? On Jul 29, 2016 5:26 PM, "Camille Fournier" <cami...@apache.org> wrote: > Sorry again I'm having reading issues ignore that comment as I see flavio

Re: ZK read-only issue

2016-07-29 Thread Camille Fournier
Sorry again, I'm having reading issues; ignore that comment, as I see Flavio already answered it. On Jul 29, 2016 5:25 PM, "Camille Fournier" <cami...@apache.org> wrote: > Update, I was confused by races of my own doing. Was this client > previously connected when it fail

Re: ZK read-only issue

2016-07-29 Thread Camille Fournier
this issue? On Jul 29, 2016 3:07 PM, "Camille Fournier" <cami...@apache.org> wrote: > Ok yeah I think this is reproducible and a bug in the client connection > read-only logic. > > On Fri, Jul 29, 2016 at 2:43 PM, Camille Fournier <cami...@apache.org> > wrote: >

Re: ZK read-only issue

2016-07-29 Thread Camille Fournier
Ok yeah I think this is reproducible and a bug in the client connection read-only logic. On Fri, Jul 29, 2016 at 2:43 PM, Camille Fournier <cami...@apache.org> wrote: > I'm looking at the readonly mode code right now and it appears that the > only way to set readonly mode is a g

Re: ZK read-only issue

2016-07-29 Thread Camille Fournier
I'm looking at the readonly mode code right now and it appears that the only way to set readonly mode is a global system property which means that the tests for this are only testing across 3 servers, all of which have readonly mode set. So, this MAY be a bug, but what a pain to figure out how to

Re: Does Zoo still provides reads when 1 node fails out of 2 nodes with quorum set to 2?

2016-07-11 Thread Camille Fournier
We support a read-only mode configuration that allows clients who opt in to keep reading from disconnected nodes in read-only mode. We do not support defaulting clusters into this configuration based on their configured size, because the edge cases there would be a headache, and we don't recommend running two

Re: SyncLimit and client notifications

2016-07-07 Thread Camille Fournier
1) Yes, I believe that roughly if the follower doesn't hear from the leader it will go from state FOLLOWING to state LOOKING, which starts on that server a request for leader election 2) you can have ZK set to support read-only mode, which will allow reads from read-only clients even when

Re: observer changing to participant when there is no quorum

2016-06-15 Thread Camille Fournier
server is connected, but since its not a participant we get the > error above. In that case, one first needs to > convert the observer to remove the observer and then add it back. The > detailed explanation is in the doc, look for > "Changing an observer into a follower". > > On

Re: Use-case with lots of child nodes

2016-06-06 Thread Camille Fournier
Where exactly are you hitting the limit? Where in the stack is it throwing the IOException with "Unreasonable length"? I believe jute checks this limit when it serializes/deserializes the various elements of the data structure, and glancing at the code it looks like the
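
If the failure is on a getChildren call, a rough size check shows when a child list outgrows the buffer. This sketch assumes Jute encodes a string as a 4-byte length prefix plus its UTF-8 bytes and that a vector is a 4-byte count followed by its elements; it uses the documented jute.maxbuffer default of 0xfffff bytes. The 16-byte reply header (xid, zxid, err) is an approximation, and the real client may count framing bytes slightly differently, so treat the result as an estimate.

```python
JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # 1,048,575 bytes: the documented default limit

REPLY_HEADER_BYTES = 4 + 8 + 4    # ASSUMED framing: int xid, long zxid, int err
VECTOR_COUNT_BYTES = 4            # int element count of the children vector


def jute_string_size(s: str) -> int:
    # 4-byte length prefix followed by the UTF-8 encoded bytes.
    return 4 + len(s.encode("utf-8"))


def estimate_get_children_bytes(child_names) -> int:
    """Approximate serialized size of a getChildren reply for these names."""
    return (REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES
            + sum(jute_string_size(n) for n in child_names))


def max_children(name_len: int, limit: int = JUTE_MAXBUFFER_DEFAULT) -> int:
    """How many ASCII child names of length name_len fit under `limit`."""
    per_child = 4 + name_len
    return (limit - REPLY_HEADER_BYTES - VECTOR_COUNT_BYTES) // per_child


# e.g. UUID-style names (36 chars) cap out somewhere in the tens of thousands.
print(max_children(36))
```

Raising jute.maxbuffer (on servers and clients alike) moves the ceiling, but restructuring into a deeper hierarchy is usually the safer fix.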

Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Camille Fournier
three machines are up, or both machines in the preferred > datacenter, quorum can be achieved. > > On Fri, Jun 3, 2016 at 3:23 PM, Camille Fournier <cami...@apache.org> > wrote: > > > You can't solve this with weights. > > On Jun 3, 2016 6:03 PM, "Michael Han&

Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Camille Fournier
your system > > you have a reliable way to know that the other data center is really in > > fact down (this is a synchrony assumption), you could do as Camille > > suggested and > > reconfigure the system to only include the remaining data center. This > > would still be

Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Camille Fournier
2 servers is the same as 1 server wrt fault tolerance, so yes, you are correct. If they want fault tolerance, they have to run 3 (or more). On Fri, Jun 3, 2016 at 4:25 PM, Shawn Heisey wrote: > On 6/3/2016 1:44 PM, Nomar Morado wrote: > > Is there any settings to override
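
The arithmetic behind this, for plain majority quorums (not weighted or hierarchical ones): a quorum is a strict majority, so the number of servers you can lose is the ensemble size minus that majority.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an ensemble of n voting servers."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that can fail while a majority remains."""
    return n - quorum_size(n)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Both 1 and 2 servers survive zero failures, and 4 survives only one, the same as 3; odd ensemble sizes are the useful steps.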

Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Camille Fournier
I can get away of not > using ZK > > > > Printing e-mails wastes valuable natural resources. Please don't print > this message unless it is absolutely necessary. Thank you for thinking > green! > > Sent from my iPhone > > > On Jun 3, 2016, at 3:51 PM, Camille Fou

Re: zookeeper deployment strategy for multi data centers

2016-06-03 Thread Camille Fournier
You could put the remaining available node in read-only mode. You could reconfigure the cluster to have the majority nodes in the remaining data center, but it would require reconfiguration and restart of the nodes in the living data center. But there's no automatic fix for this, and if you can

Re: Zookeeper is unable to start

2016-04-21 Thread Camille Fournier
Your attachments won't come through. Can you look at the log files and copy some of the last log lines into email? On Thu, Apr 21, 2016 at 10:23 PM, Eric Gao wrote: > Dear exports, > I have encountered a problem when I have started the zookeeper: > > [root@master data]#

Re: Multi DC ( DC-1 and DC-2) zookeeper setup

2016-03-09 Thread Camille Fournier
If you're referring to my setup I explicitly don't keep the data in sync across separate zk deployments. The logic for handling lookup to different zk is in the client. Trying to keep data in sync across multiple deployments of zk is probably not a great plan, but I'm sure you can think of

Re: ZooKeeper and HA

2016-02-26 Thread Camille Fournier
Have you tried asking the cloudera support lists about this error? On Feb 26, 2016 10:17 AM, "Paul" wrote: > Hi there guys, > I'm new to hadoop and I'm trying con setup HA in CDH 5.5.1, I've already > configured ( from the admin console ) yarn and HDFS in HA, now the

Re: ZooKeeperServer#shutdown hangs

2015-12-16 Thread Camille Fournier
Blergh. We made shutdown synchronized. But decrementing the requests is also synchronized and called from a different thread. So yeah, deadlock. Can you open a ticket for this? This came in with ZOOKEEPER-1907 C On Wed, Dec 16, 2015 at 2:46 PM, Ted Yu wrote: > Hi, > HBase

Re: Multi DC ( DC-1 and DC-2) zookeeper setup

2015-12-12 Thread Camille Fournier
2 members cannot form a quorum in a 5 node setup. You cannot guarantee a quorum split across two data centers will withstand the loss of either data center. You must have a tiebreaker node in a third data center. C On Dec 12, 2015 9:46 PM, "Kaushal Shriyan" wrote: >
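
A quick way to check a placement: for every data center, remove its members and see whether a strict majority of the full ensemble is still standing. This sketch assumes plain majority quorums and treats a whole data center as the failure unit; the data-center names are illustrative.

```python
def survives_any_single_dc_loss(members_per_dc: dict) -> bool:
    """True if losing any one data center still leaves a majority of voters."""
    total = sum(members_per_dc.values())
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in members_per_dc.values())


two_dcs = {"dc1": 3, "dc2": 2}                  # losing dc1 leaves 2 of 5: no quorum
with_tiebreaker = {"dc1": 2, "dc2": 2, "dc3": 1}  # any single loss leaves >= 3 of 5
print(survives_any_single_dc_loss(two_dcs), survives_any_single_dc_loss(with_tiebreaker))
```

With only two sites, one of them always holds the majority, so losing that site loses the quorum; the third-site tiebreaker is what breaks the symmetry.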

Re: Multi DC ( DC-1 and DC-2) zookeeper setup

2015-12-12 Thread Camille Fournier
Here you go http://whilefalse.blogspot.com/2012/12/building-global-highly-available.html On Dec 12, 2015 9:56 PM, "Kaushal Shriyan" <kaushalshri...@gmail.com> wrote: > On Sun, Dec 13, 2015 at 8:21 AM, Camille Fournier <cami...@apache.org> > wrote: > > > 2 mem

Re: Reelection takes a long time

2015-11-23 Thread Camille Fournier
What does "a bit too long" mean here? 10 seconds? Two minutes? Longer, shorter? Any snippets from the log during election would also be helpful. Thanks C On Nov 23, 2015 7:49 AM, "Jens Rantil" wrote: > Hi, > > *Problem:* We are running a Zookeeper ensemble that takes a bit

Re: Behavior of server after loosing its state

2015-10-10 Thread Camille Fournier
Do you mean that the logs and snapshots are deleted off the machine? On Oct 10, 2015 1:18 PM, "Elias Levy" wrote: > Good day, > > I am wondering what is the expected behavior of a ZK server that is part of > an ensemble that looses its state after a restart, but

Re: Behavior of server after loosing its state

2015-10-10 Thread Camille Fournier
ined the cluster using the same server id? > > > On Sat, Oct 10, 2015 at 10:22 AM, Camille Fournier <cami...@apache.org> > wrote: > > > Do you mean that the logs and snapshots are deleted off the machine? >

Re: Doubts about libzookeeper

2015-08-04 Thread Camille Fournier
ZooKeeper provides a session-coherent single system image guarantee. Any request from the same session will see the results of all of its writes, regardless of which server it connects to. See: http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html#ch_zkGuarantees So, if your session

Re: LocalPeerBean getState() Possibly Incorrect

2015-08-04 Thread Camille Fournier
Yup that's a bug alright. Please feel free to file a ticket and even better file a ticket and submit a patch. Thanks, C On Tue, Aug 4, 2015 at 11:37 AM, Kevin Lee kgle...@yahoo.com.invalid wrote: Hi, When performing some JMX investigation and looking at the “State” within the LocalPeerBean,

Re: Doubts about libzookeeper

2015-08-04 Thread Camille Fournier
be able to connect. So it seems quite possible that it connects, then the request is executed (if zkserver-1 hasn't crashed after all) and the znode disappears. Alex On Tue, Aug 4, 2015 at 8:33 AM, Camille Fournier cami...@apache.org wrote: ZooKeeper provides a session-coherent single system

Re: Doubts about libzookeeper

2015-08-04 Thread Camille Fournier
1 and the leader sync won't immediately help, right ? On Tue, Aug 4, 2015 at 11:39 AM, Camille Fournier cami...@apache.org wrote: I thought that sync forced a flush of the queued events on a quorum member before completing/got it in the path of events from the leader, so that it won't

Re: Doubts about libzookeeper

2015-08-04 Thread Camille Fournier
AM, Camille Fournier cami...@apache.org wrote: ZooKeeper provides a session-coherent single system image guarantee. Any request from the same session will see the results of all of its writes, regardless of which server it connects to. See: http

Re: locking/leader election and dealing with session loss

2015-07-16 Thread Camille Fournier
They can happen, and they have happened to people in prod. I started talking about it after hearing enough people complain about just this situation on Twitter. If you are relying on very large JVM memory footprints, a 30s GC pause can and should be expected. In general I think most people don't need to worry

Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
You should not commit suicide unless it goes into SESSION_EXPIRED state. The quorum won't delete the ephemeral node immediately, it will only delete it when the session expires. So if your client for whatever reason disconnects and reconnects before the session expires, it will be fine. C On
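
The rule above as a small state-handling sketch. SessionState and LockHolder are hypothetical local stand-ins (the values mirror the names of ZooKeeper's SyncConnected, Disconnected and Expired states), not a real client API. Pausing protected work while disconnected is an extra, more conservative design choice layered on top of the message's advice; the lock itself is only given up on expiry.

```python
from enum import Enum


class SessionState(Enum):
    # Local stand-ins named after ZooKeeper's connection states.
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class LockHolder:
    """Hypothetical lock holder reacting to session state changes."""

    def __init__(self):
        self.holds_lock = True
        self.paused = False

    def on_state_change(self, state: SessionState) -> None:
        if state is SessionState.DISCONNECTED:
            # Session may still be alive and the ephemeral node still exists.
            # Pause (conservative choice) but do not give up the lock.
            self.paused = True
        elif state is SessionState.SYNC_CONNECTED:
            # Reconnected before expiry: same session, the lock node is still ours.
            if self.holds_lock:
                self.paused = False
        elif state is SessionState.EXPIRED:
            # The quorum deletes ephemeral nodes on expiry: the lock is gone.
            self.holds_lock = False
            self.paused = True

    def may_do_protected_work(self) -> bool:
        return self.holds_lock and not self.paused
```

A disconnect followed by a reconnect within the session timeout resumes work with the same lock; only Expired forces the holder to start over.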

Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
still AWOL presumably. C On Wed, Jul 15, 2015 at 2:24 PM, Camille Fournier cami...@apache.org wrote: I thought that the client itself had a notion of the session timeout internally that would conservatively let the client know that it was dead? If not, then that's my faulty memory

Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
If client A does a full GC, long enough to lose the lock, immediately before sending a message, it will send that message out of order. You cannot guarantee exclusive access without verification at the locked resource. C On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com
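
"Verification at the locked resource" is often done with a fencing token, similar in spirit to the sequencers described in the Chubby paper. This sketch assumes each lock grant hands out a monotonically increasing number (for example, the sequence suffix of an ephemeral-sequential lock znode; that choice is an assumption here) and that the resource rejects any write carrying an older token than one it has already accepted. FencedResource is hypothetical.

```python
class FencedResource:
    """Hypothetical resource that rejects writes from stale lock holders."""

    def __init__(self):
        self.highest_token = -1
        self.applied = []

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            return False  # an older holder, e.g. one that just woke from a GC pause
        self.highest_token = token
        self.applied.append((token, value))
        return True


# A holds the lock with token 1 and stalls in a full GC; its session expires,
# B acquires the lock with token 2 and writes; then A resumes and sends its write.
resource = FencedResource()
b_ok = resource.write(2, "from B")
late_write_accepted = resource.write(1, "from A")
```

The lock service alone cannot stop A's delayed message; the token lets the resource detect it.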

Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
, Jordan Zimmerman jor...@jordanzimmerman.com wrote: He’s talking about multiple writers. Given a reasonable session timeout, even a GC shouldn’t matter. If the GC causes a heartbeat miss the client will get SysDisconnected. -Jordan On July 15, 2015 at 2:05:41 PM, Camille Fournier (skami

Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
number? -Jordan On July 15, 2015 at 2:12:26 PM, Camille Fournier (skami...@gmail.com) wrote: I don't know what to tell you Jordan, but this is an observable phenomenon and it can happen. It's relatively unlikely and rare but not impossible. If you're interested in it I'd recommend reading the chubby

Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
I thought that the client itself had a notion of the session timeout internally that would conservatively let the client know that it was dead? If not, then that's my faulty memory. That being said, if you really care about the client not sending messages when it does not have the lock, the

Re: Why 1K filesize limit in ZKfuse

2015-06-13 Thread Camille Fournier
I'm not sure that there is a reason for it honestly. Possibly no one ever used it for anything larger than 1k. I don't know the package well myself but assuming it's storing this data inside of zk nodes having large files in there will fill up the zk memory fast and may be risky, but other than

Re: Q: Is ZK vulnerable Leap Second in 2015

2015-06-11 Thread Camille Fournier
I'm honestly not sure. This blog post from datastax on the JVM vulnerabilities to the leap second bug might be educational. http://www.datastax.com/dev/blog/preparing-for-the-leap-second Anyone else have any ideas? Thanks, C On Wed, Jun 10, 2015 at 5:12 PM, jawhny cooke j...@bluejeansnet.com

Re: Unit Tests failing

2015-05-02 Thread Camille Fournier
We need more context than this to help you... do you see the test output details earlier in the logs? Thanks, Camille On Fri, May 1, 2015 at 6:14 PM, Ankur Garg ankurgarg198...@gmail.com wrote: Hi , I am new to Apache Zookeeper . To set up my dev environment , I just took svn up from the

Re: Leader election duration

2015-05-01 Thread Camille Fournier
a diff isn't always possible, depending on how far behind followers compared to the leader, so the difference might be due to snapshots and diffs. -Flavio On Wednesday, April 29, 2015 6:32 PM, Camille Fournier cami...@apache.org wrote: Don't suppose you could share some snippets

Re: Leader election duration

2015-05-01 Thread Camille Fournier
. Sending a diff isn't always possible, depending on how far behind followers compared to the leader, so the difference might be due to snapshots and diffs. -Flavio On Wednesday, April 29, 2015 6:32 PM, Camille Fournier cami...@apache.org wrote: Don't suppose you could share some

Re: Leader election duration

2015-05-01 Thread Camille Fournier
-Original Message- From: Camille Fournier [mailto:cami

Re: Leader election duration

2015-05-01 Thread Camille Fournier
claim to finish. C On Fri, May 1, 2015 at 12:22 PM, Camille Fournier cami...@apache.org wrote: Unfortunately that looks like just zab. It would be super awesome if you could send me one more log snippet of one of the other machines in this bad scenario. I think that will help me figure out

Re: Leader election duration

2015-05-01 Thread Camille Fournier
to you. -Original Message- From: Camille Fournier [mailto:cami...@apache.org] Sent: 01 May 2015 16:53 To: bookkeeper-u...@zookeeper.apache.org Subject: Re: Leader election duration One thing that jumps out at me here is that a lot of these messages are from different rounds. Some say

Re: Leader election duration

2015-05-01 Thread Camille Fournier
to do about it or whether it is an actual bug or not... if anyone with FLE experience more than just squinting at the code has an idea, I'm all ears. C On Fri, May 1, 2015 at 12:30 PM, Camille Fournier cami...@apache.org wrote: The other thing I notice is that we're first off in the n.round

Re: Leader election duration

2015-04-29 Thread Camille Fournier
, total data size, etc. I don't understand why though but that may just be my limited knowledge of the election protocol. Karol On 28 Apr 2015, at 19:54, Camille Fournier cami...@apache.org wrote: Just out of curiosity, if you start the 5 node cluster up with only 3 of the nodes

Re: Leader election duration

2015-04-28 Thread Camille Fournier
Just out of curiosity, if you start the 5 node cluster up with only 3 of the nodes to begin with (like, config 5, but only bring up 3 processes), does it speed up the leader election or is it still slow? C On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski karoldudzin...@gmail.com wrote: Hi,

Re: Cannot delete node after setting ACL

2015-04-27 Thread Camille Fournier
Just reply to the original email and it will thread accordingly On Mon, Apr 27, 2015 at 12:41 PM, Dubbert, Linda S lcdu...@sandia.gov wrote: Is this email how I respond to this question regarding deleting a node after setting an ACL? I think I know what the issue is with this particular

Re: Zookeeper-Zoodiscovery auto reconnect issue

2015-04-15 Thread Camille Fournier
So we have the notion of state that you can check. zooKeeper.getState().isAlive() will tell you if the client is actually alive or not. Looking through the code I'm not 100% sure why we are sending the Disconnected state change after the while loop, or if the code ever would, since the state

Re: Adding change to ZK code

2015-04-13 Thread Camille Fournier
Try running the Ant task "ant eclipse" and refreshing the project. C On Mon, Apr 13, 2015 at 8:25 AM, Ibrahim i.s.el-san...@newcastle.ac.uk wrote: Hi folks, Here I am asking simple questions, but I really appreciate if I get help. Usually, I work with ZooKeeper 3.4.6, doing some experiments.

Re: Status of critical looped NPE issue?

2015-03-03 Thread Camille Fournier
...@squareup.com wrote: On Tue, Mar 3, 2015 at 2:36 PM, Camille Fournier cami...@apache.org wrote: Is there any more info you want to provide the ticket to help debug it in 3.4.6? There are some things that seem like they might be Solaris-specific. Many of us have dev laptops with OS X, so

Re: Status of critical looped NPE issue?

2015-03-03 Thread Camille Fournier
Is there any more info you want to provide on the ticket to help debug it in 3.4.6? There are some things that seem like they might be Solaris-specific. Many of us have dev laptops with OS X, so perhaps we can debug it there, but I don't know how many folks have access to a Solaris system to do

Re: Heartbeats not being received / responded to?

2015-01-21 Thread Camille Fournier
on my laptop, I *cannot* repro with the same on my coworkers laptop. So unfortunately I am forced to conclude that there is something strange going on locally. Thanks for the help anyhow! - Ian On Mon, Sep 29, 2014 at 9:40 AM, Camille Fournier cami...@apache.org wrote

Re: Make JDK 7 the minimum support version

2014-12-12 Thread Camille Fournier
+1 for 3.5+ On Fri, Dec 12, 2014 at 6:53 PM, Patrick Hunt ph...@apache.org wrote: +1 Makes sense to me. I'm assuming we're talking 3.5+ here, right? We would continue to support jdk6 in 3.4? Patrick On Fri, Dec 12, 2014 at 3:42 PM, Flavio Junqueira fpjunque...@yahoo.com.invalid wrote:

Re: Re: How to change zookeeper project into a maven project ?

2014-11-24 Thread Camille Fournier
Fortunately we have some Ant shortcuts that will make this easy for you! Try running "ant eclipse" and it will set up the Eclipse profile and download dependencies for you. On Mon, Nov 24, 2014 at 8:01 PM, Robin rchzz...@163.com wrote: Thanks Flavio. For a normally maven project, I can easily

Re: cross DC setup - is it Ok for ZK?

2014-10-21 Thread Camille Fournier
I have a blog post on this topic: http://whilefalse.blogspot.com/2012/12/building-global-highly-available.html I think you will find it helpful. The short answer is: the scheme you have proposed will cause the ZK to be unavailable when you do maintenance on the data center with 4 quorum members.

Re: cross DC setup - is it Ok for ZK?

2014-10-21 Thread Camille Fournier
: *if the leader is in the non-quorum side of the partition, that side of the partition will recognize that it no longer has a quorum of the ensemble* ( https://cwiki.apache.org/confluence/display/ZOOKEEPER/FailureScenarios). Where is the truth? :) On Tue, Oct 21, 2014 at 12:35 PM, Camille Fournier

Re: cross DC setup - is it Ok for ZK?

2014-10-21 Thread Camille Fournier
servers, 2 from each of 2 distinct groups. This is a different way of doing quorums in ZooKeeper if grouping makes sense in your scenario, like when you have multiple colos. -Flavio On Tuesday, October 21, 2014 10:15 PM, Camille Fournier cami...@apache.org wrote: You'll have to ask Flavio

Re: Heartbeats not being received / responded to?

2014-09-29 Thread Camille Fournier
[] Thanks, - Ian On Sun, Sep 28, 2014 at 2:22 PM, Camille Fournier cami...@apache.org wrote: Sorry but with what you've sent us I don't really see what the problem is. It does look like you connect and then nothing happens for 20s and then the connection is dropped. If you use the zkCli

Re: Heartbeats not being received / responded to?

2014-09-28 Thread Camille Fournier
Sorry but with what you've sent us I don't really see what the problem is. It does look like you connect and then nothing happens for 20s and then the connection is dropped. If you use the zkCli script to connect via the command line do you see the same problem? C On Fri, Sep 26, 2014 at 12:48

Re: Consistently running out of heap space

2014-09-05 Thread Camille Fournier
All state is stored in memory in ZK for performance reasons. It sounds like you're putting more data into it than the heap will accommodate. ZK is useful for references to data, but not for large amounts of actual data. It's not designed to be a large data store. Thanks, C On Fri, Sep 5, 2014

Re: Consistently running out of heap space

2014-09-05 Thread Camille Fournier
. Will these stay in memory? Thanks, Brian On 09/05/2014 11:26 AM, Camille Fournier wrote: All state is stored in memory in ZK for performance reasons. It sounds like you're putting more data into it than the heap will accommodate. ZK is useful for references to data, but not for large amounts

Re: entire cluster dies with EOFException

2014-07-04 Thread Camille Fournier
Do you have copies of the logs and snapshots from the time this happened? I think we'll need that to debug this. If you do can you open a ticket and attach those and tag me? C On Jul 4, 2014 9:30 AM, Aaron Zimmerman azimmer...@sproutsocial.com wrote: Hi all, We have a 5 node zookeeper cluster

Re: renaming a znode

2014-06-16 Thread Camille Fournier
Just to clarify you mean the multi API? C On Jun 16, 2014 9:40 AM, Jordan Zimmerman jor...@jordanzimmerman.com wrote: You could use the transaction api to create a new node and delete the old node. -JZ From: Mudit Verma mudit.f2004...@gmail.com Reply: user@zookeeper.apache.org
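
ZooKeeper has no rename, so the multi (transaction) approach is create-new plus delete-old, applied all-or-nothing. The sketch below simulates that atomicity on an in-memory dict; TinyTree, MultiOpError and rename are hypothetical names, not ZooKeeper API. It only handles a leaf node: children, ACLs, stat fields and ephemeral ownership are not carried over. Against a real server you would typically also pass the version you read to the delete op (or add a check op) so a concurrent update makes the whole multi fail.

```python
class MultiOpError(Exception):
    """Raised when any op in a simulated multi fails; nothing is committed."""


class TinyTree:
    """In-memory stand-in for a znode tree (path -> data bytes)."""

    def __init__(self):
        self.nodes = {"/": b""}

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def _apply(self, nodes: dict, op: tuple) -> None:
        kind, path = op[0], op[1]
        if kind == "create":
            if path in nodes:
                raise MultiOpError(f"NodeExists: {path}")
            if self._parent(path) not in nodes:
                raise MultiOpError(f"NoNode (parent): {path}")
            nodes[path] = op[2]
        elif kind == "delete":
            if path not in nodes:
                raise MultiOpError(f"NoNode: {path}")
            if any(p.startswith(path + "/") for p in nodes):
                raise MultiOpError(f"NotEmpty: {path}")
            del nodes[path]
        else:
            raise MultiOpError(f"unknown op: {kind}")

    def multi(self, ops) -> None:
        # Stage every op on a copy; swap it in only if all of them succeed.
        staged = dict(self.nodes)
        for op in ops:
            self._apply(staged, op)
        self.nodes = staged


def rename(tree: TinyTree, old: str, new: str) -> None:
    """Hypothetical rename: copy data to `new` and delete `old` in one multi."""
    if old not in tree.nodes:
        raise MultiOpError(f"NoNode: {old}")
    tree.multi([("create", new, tree.nodes[old]), ("delete", old)])
```

If the target already exists, the create fails and the delete never takes effect, so the old node is left untouched.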

Re: Support for large number of keys?

2014-05-27 Thread Camille Fournier
Well, ZK is not designed to be a database. I wouldn't recommend most people try to use it as a database. There are many good KV store databases out there that are better suited to the operations one wants to do with a database and the consistency models and tradeoffs for a database. I think it

Re: Multi-facility Ensemble

2014-05-23 Thread Camille Fournier
Well, if A can't talk to C but B can talk to both, it kind of depends on what the state was before the partition, and then what happens after the partition. If the leader is in A, all of the members of C will go into disconnected state, but may also try to become leader since they can talk to B.

Re: System clocks

2014-04-21 Thread Camille Fournier
The timezone doesn't matter AFAIK. C On Mon, Apr 21, 2014 at 4:39 PM, Benjamin Jaton benjamin.ja...@gmail.com wrote: Hello, Do system clocks have to be the same across all the nodes? Example: Server A in the US Server B in Japan Server C in France Do I have to set the same system time

Re: Zookeeper can't find org/ietf/jgss/GSSException and neither can I

2014-04-21 Thread Camille Fournier
and a number of 1.7s. I did a jarscan on the whole tree and nada. It could be because I'm using OSX. But Zookeeper runs fine as a service. I can also run it as a client outside OSGi. But as an OSGi bundle, that's when the trouble starts. On Apr 21, 2014 5:17 PM, Camille Fournier cami

Re: Thread handling

2014-03-27 Thread Camille Fournier
I love the idea. In general, it would be great to uplift the way we do threading. It is a BIG project though, which is why it hasn't been tackled. I think this will go best if you have a clear idea of how we can break down the change so it isn't a several month dev/review process. C On Thu, Mar

Re: Basic client question

2014-03-24 Thread Camille Fournier
So, client will never be able to reach the server again? The client that you use (and we generally recommend you use an existing client library like Curator or even zkClient instead of writing your own) should have some sort of timeout configured so that if it gets disconnected and can't reach the

Re: Basic client question

2014-03-24 Thread Camille Fournier
this timeout work even if you never hear back from any server? Thanks, Dan On 03/24/2014 10:31 AM, Camille Fournier wrote: So, client will never be able to reach the server again? The client that you use (and we generally recommend you use an existing client library like Curator or even

Re: sometimes I can't read data I just wrote..race condition or misunderstanding?

2014-03-04 Thread Camille Fournier
Is the same client issuing the write and the read? If so, you should see the data you wrote. Otherwise, you may be connected to a lagging server, as Raúl says. C On Tue, Mar 4, 2014 at 3:22 PM, Raúl Gutiérrez Segalés r...@itevenworks.net wrote: On 4 March 2014 07:21, Brian Tarbox

Re: ZK 3.4.5: Very Strange Write Latency Problem?

2014-02-24 Thread Camille Fournier
above will reveal the root cause. Any other suggestions are welcome. On Thu, Feb 20, 2014 at 7:57 PM, Camille Fournier cami...@apache.org wrote: I might suggest that you create a personal github and mock up a replication there :) I understand employers that own your code

Re: can't delete b/c Node not empty but the node really IS empty!

2014-02-21 Thread Camille Fournier
Can you open a JIRA ticket and attach the ZK snapshot and logs so we can debug this? On Fri, Feb 21, 2014 at 2:15 PM, Brian Tarbox tar...@cabotresearch.com wrote: I have some nodes that I can not delete because zkCli (and programmatically as well) says the nodes are not empty..but when I try to list

Re: ZK 3.4.5: Very Strange Write Latency Problem?

2014-02-20 Thread Camille Fournier
Can you share the test code somewhere (github maybe?)? Thanks, C On Thu, Feb 20, 2014 at 9:08 PM, jmmec jmmec2...@gmail.com wrote: Thanks for the quick reply. I did not try the slow test using a normal disk drive, however I first discovered this problem when writing to a 7200RPM disk drive

Re: ZK 3.4.5: Very Strange Write Latency Problem?

2014-02-20 Thread Camille Fournier
that several times now. ha... Appreciate any additional help or advice or suggestions from everyone and anyone and their brother or sister. On Thu, Feb 20, 2014 at 8:10 PM, Camille Fournier cami...@apache.org wrote: Can you share the test code somewhere (github maybe?)? Thanks, C

Re: Issue with monitoring zookeeper server state using four letter word commands

2014-02-18 Thread Camille Fournier
I'd certainly like to understand the fundamental problem you're seeing of why any server is unable to enter quorum for any period of time without being partitioned, etc. Is there a ticket open for this or do you think it's just part of your env somehow? As for the larger question, why not run the

Re: compile in eclipse error

2014-01-16 Thread Camille Fournier
Yeah, I've seen this problem for a while. I think that I switched my Eclipse VM back to JDK 1.6 and it helped. C On Thu, Jan 16, 2014 at 7:07 AM, Li Li fancye...@gmail.com wrote: I downloaded zookeeper source code of 3.4.5 and use ant to build it successfully Then I use ant eclipse to

Re: Question about multi rack/DC support

2014-01-12 Thread Camille Fournier
Cameron has it right on. You can't automatically detect the difference between "the other two racks are down for maintenance" and "I am partitioned from the other two racks." In the latter case, you will have data loss and split-brain if you automatically convert to a single server cluster because, if

Re: contrib REST?

2014-01-08 Thread Camille Fournier
I've never been super crazy about a REST interface for ZK, since so much of the correctness etc depends on handling state changes. But I realize this concern might be pedantic. Glancing at the java code, it could use at minimum a bit of a facelift... lots of System.out.printlns in there. C On

Re: Behavior when client disconnects

2013-12-04 Thread Camille Fournier
As far as I can tell from the code: c1 will send its last seen zxid to the server that it is trying to connect to. If that zxid is greater than the zxid of the server, the server will refuse the connection. In this case, if the client has not seen an ack, it is certainly possible that the last
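
The connection rule described above, sketched as a pure function. A zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits, so comparing zxids as integers orders them by epoch first. server_accepts_session is a hypothetical name mirroring the described check, not the server's actual method.

```python
def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter): high 32 bits are the epoch, low 32 the counter."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def server_accepts_session(client_last_zxid_seen: int, server_last_processed_zxid: int) -> bool:
    """Refuse a client that has already seen newer state than this server has."""
    return client_last_zxid_seen <= server_last_processed_zxid


# c1 last saw a write at epoch 5, counter 3; the server it fails over to is one txn behind.
client_seen = (5 << 32) | 3
lagging_server = (5 << 32) | 2
```

This is what keeps a reconnecting client from going back in time: it keeps trying other servers until it finds one that has caught up to what it has already seen.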

Re: Status of 3.4.6

2013-11-28 Thread Camille Fournier
I'm happy to take a look but none of the patches in 1576 apply cleanly to either branch; can you recreate and resubmit? Thanks, C On Thu, Nov 28, 2013 at 7:26 AM, Edward Ribeiro edward.ribe...@gmail.com wrote: Hi Flavio, I have a little patch that addresses both

Re: Ensure there is one master

2013-11-26 Thread Camille Fournier
The master in part b is only a master so long as it has a quorum of followers. It loses the quorum when the network partitions and so will no longer act as master from that moment. Additionally the master actions all require a consensus to proceed. So even if the master in section b doesn't know

Re: Disqualify a node from leader election

2013-11-21 Thread Camille Fournier
No. I recommend monitoring the node and, should it become leader, killing it and letting the ensemble re-elect. On Thu, Nov 21, 2013 at 9:10 PM, Owen Kim ohech...@gmail.com wrote: I have a 5-node cluster and have a node that I want to participate in leader election for quorum but never be a leader
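
A minimal monitoring sketch using the "srvr" four-letter-word command, whose output includes a "Mode:" line (leader, follower, observer or standalone). On newer releases four-letter words may need to be allowed via 4lw.commands.whitelist. What to do when the mode is "leader" (restarting the process via your own tooling) is left out, since that part is deployment-specific.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str = "srvr", timeout: float = 5.0) -> str:
    """Send a four-letter-word command and return the server's full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output: str):
    """Extract the value of the 'Mode:' line, or None if it is absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Usage (host and port are placeholders for the node you want to keep out of leadership):
# if parse_mode(four_letter_word("zk-node-5", 2181)) == "leader":
#     ...restart that server so the ensemble re-elects...
```

Polling periodically is enough here; the goal is only to notice, after the fact, that the unwanted node won an election.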

Re: Are the Zookeeper nodes created at Client Side or Server Side?

2013-11-18 Thread Camille Fournier
A minor correction: The data is both stored in memory and written to a transaction log, then snapshotted to disk. There are tools for parsing the transaction logs and snapshots but they're mostly useful only for server debugging On Nov 18, 2013 12:20 AM, Cameron McKenzie mckenzie@gmail.com wrote:
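A toy model of the storage path described in this correction, assuming a single server and ignoring fuzzy snapshots, log rolling, and fsync details: each update is appended to a log before the in-memory tree changes, and recovery is the latest snapshot plus replay of newer log entries. The class, fields, and methods are hypothetical, not ZooKeeper's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical log-then-apply store with snapshots. */
final class ToyDataStore {
    static final class Txn {
        final long zxid;
        final String path;
        final String data;

        Txn(long zxid, String path, String data) {
            this.zxid = zxid;
            this.path = path;
            this.data = data;
        }
    }

    private final List<Txn> txnLog = new ArrayList<>();       // stands in for the on-disk log
    private final Map<String, String> tree = new HashMap<>(); // the in-memory data tree
    private Map<String, String> snapshot = new HashMap<>();   // most recent snapshot
    private long snapshotZxid = 0;
    private long lastZxid = 0;

    /** Log first (a real server forces the log to disk before acking), then update memory. */
    void apply(String path, String data) {
        Txn t = new Txn(++lastZxid, path, data);
        txnLog.add(t);
        tree.put(path, data);
    }

    /** Copy the current tree out as a snapshot, remembering the last zxid it covers. */
    void takeSnapshot() {
        snapshot = new HashMap<>(tree);
        snapshotZxid = lastZxid;
    }

    /** Rebuild memory from the snapshot plus every logged txn newer than it. */
    Map<String, String> recover() {
        Map<String, String> rebuilt = new HashMap<>(snapshot);
        for (Txn t : txnLog) {
            if (t.zxid > snapshotZxid) {
                rebuilt.put(t.path, t.data);
            }
        }
        return rebuilt;
    }

    Map<String, String> tree() {
        return tree;
    }
}
```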

Re: Growing a cluster

2013-11-05 Thread Camille Fournier
Are you running a snapshot version of 3.5? Or is this for some future use case? Here is the documentation for using dynamic reconfig: https://docs.google.com/document/d/1AF8pIfQbN5cKxe0c4cQ4_DW6ZjBJqSkyANcTGUwkzjc/edit Unless you're using a snapshot of 3.5, I think you might be better just doing

Re: zkConsole not returning results basis on my command

2013-10-28 Thread Camille Fournier
Not obviously... I can do this just fine, although I'm not using cygwin but instead using zkCli.cmd and zkServer.cmd (which are the windows command lines). If you use windows command lines vs cygwin, does it work? On Mon, Oct 28, 2013 at 4:13 PM, Techy Teck comptechge...@gmail.com wrote: I

Re: Geographically redundant ZooKeeper instances

2013-09-15 Thread Camille Fournier
No, to do this you need a tiebreaker node at a tertiary site. I wrote a blog post on this a while back, you may find it useful: http://whilefalse.blogspot.com/2012/12/building-global-highly-available.html Best, Camille On Sun, Sep 15, 2013 at 6:32 PM, Cameron McKenzie
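A small check of why a tiebreaker at a third site is needed, assuming you count voting servers per site (the site names and counts are illustrative, not taken from the blog post). With two sites, whichever site holds the majority is a single point of failure; adding one voter elsewhere lets the ensemble survive the loss of any one site.

```java
import java.util.Map;

/** Hypothetical helper: does losing any single site still leave a voting majority? */
final class SiteLayout {
    static boolean survivesAnySingleSiteLoss(Map<String, Integer> votersPerSite) {
        int total = 0;
        for (int n : votersPerSite.values()) {
            total += n;
        }
        for (int lost : votersPerSite.values()) {
            // Remaining voters must be a strict majority of the full ensemble.
            if (total - lost <= total / 2) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `{dc1=3, dc2=2}` fails (losing dc1 leaves 2 of 5), while `{dc1=2, dc2=2, tiebreaker=1}` passes.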

Re: Clarification of watch behavior at the Zookeeper Server

2013-09-11 Thread Camille Fournier
OK here goes: As far as I can tell from the code, the watches are resent by the client connection when it connects to a new server. See ClientCnxn.primeConnection particularly starting at line 894. So whenever you connect to a server, we resend the live watches. I don't see anything obvious to
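A simplified model of the client-side bookkeeping this implies: the client remembers which one-shot watches are still armed and replays them on every (re)connect. The class and method names are hypothetical; in the real client, `primeConnection` also sends the last seen zxid so the server can trigger watches for changes made while the client was disconnected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical registry of armed watches, replayed on connect. */
final class ToyWatchRegistry {
    private final Set<String> dataWatches = new LinkedHashSet<>();
    private final Set<String> childWatches = new LinkedHashSet<>();

    void watchData(String path) { dataWatches.add(path); }

    void watchChildren(String path) { childWatches.add(path); }

    /** Watches are one-shot: once fired they are no longer resent. */
    void dataWatchFired(String path) { dataWatches.remove(path); }

    void childWatchFired(String path) { childWatches.remove(path); }

    /** Everything the client would re-register with a newly connected server. */
    List<String> watchesToResend() {
        List<String> out = new ArrayList<>();
        for (String p : dataWatches) {
            out.add("data:" + p);
        }
        for (String p : childWatches) {
            out.add("child:" + p);
        }
        return Collections.unmodifiableList(out);
    }
}
```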

Re: Peer that can't become leader

2013-09-05 Thread Camille Fournier
-----Original Message----- From: Ben Horowitz bhoro...@gmail.com Sent: 05/09/2013 06:57 To: user@zookeeper.apache.org user@zookeeper.apache.org Subject: Peer that can't become leader Hi all, Camille Fournier in her blog post on a global service discovery infrastructure using ZK [1] describes

Re: Zookeeper performance

2013-07-31 Thread Camille Fournier
This sounds highly error-prone to me regardless of whether or not zookeeper can handle the load. Why not just use a standard transaction model with a vector clock or other timing device to detect conflicts so you don't have to worry about a second server to talk to (zookeeper) to do an update? On
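A minimal vector clock of the kind suggested here, assuming each writer has a stable id (the ids below are hypothetical). Two versions whose clocks compare as `CONCURRENT` were written without seeing each other, which is the conflict you would detect and resolve instead of serializing every update through ZooKeeper.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Minimal vector clock: one counter per writer id. */
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    /** Record one local update by the given writer. */
    VectorClock tick(String writer) {
        counters.merge(writer, 1L, Long::sum);
        return this;
    }

    VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    /** Compare entry by entry; mixed results mean neither update saw the other. */
    Order compare(VectorClock other) {
        Set<String> keys = new HashSet<>(counters.keySet());
        keys.addAll(other.counters.keySet());
        boolean less = false;
        boolean greater = false;
        for (String k : keys) {
            long a = counters.getOrDefault(k, 0L);
            long b = other.counters.getOrDefault(k, 0L);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }
}
```

Starting from a shared version, an update by `s1` and an independent update by `s2` compare as `CONCURRENT`, while the shared base compares as `BEFORE` either one.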

Re: Zookeeper on Windows

2013-07-01 Thread Camille Fournier
Do you mean the ZooKeeper server or ZooKeeper clients? There are many places using ZooKeeper clients on windows. C On Mon, Jul 1, 2013 at 5:59 PM, Flavio Junqueira fpjunque...@yahoo.com wrote: The PoweredBy page does not say anything and I'm personally not aware of any case that has been

Re: Zookeeper on Windows

2013-07-01 Thread Camille Fournier
give names of some companies which are successfully using both java and C windows clients for zookeeper with either windows machines as zookeeper server or a linux machine as zookeeper server. Thanks Anurag On Mon, Jul 1, 2013 at 3:24 PM, Camille Fournier cami...@apache.org wrote: Do you

Re: Use zookeeper as Load Balancer

2013-05-15 Thread Camille Fournier
You can certainly use ZK for this use case. You'll have to implement the load balancing algorithm yourself but zookeeper can certainly be used for this. You might look at some related open source solutions already out there like norbert: http://data.linkedin.com/opensource/norbert C On Wed, May
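Since the balancing algorithm is yours to write, here is one minimal choice, round-robin, as a sketch. It assumes each backend registers an ephemeral child under some registration znode (for example a hypothetical `/services/myapp`) and that the caller passes in the current child names, e.g. from `getChildren` with a watch so the list is refreshed when membership changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical client-side round-robin over currently registered servers. */
final class RoundRobinPicker {
    private final AtomicLong counter = new AtomicLong();

    String pick(List<String> liveServers) {
        if (liveServers.isEmpty()) {
            throw new IllegalStateException("no servers registered");
        }
        // ZooKeeper does not guarantee child order, so sort for a stable rotation.
        List<String> sorted = new ArrayList<>(liveServers);
        Collections.sort(sorted);
        int i = (int) Math.floorMod(counter.getAndIncrement(), (long) sorted.size());
        return sorted.get(i);
    }
}
```

Least-loaded selection would work the same way if each server wrote its current load into its znode's data; round-robin just avoids that extra read.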

Re: Error Path when creating a node

2013-03-20 Thread Camille Fournier
The path /apps/neo4j/zookeeper/data has not been created in the ZK on your unix box. You need to create that path before you can write an ephemeral node to it. You probably did this already on your windows ZK which is why you aren't seeing the error. ZooKeeper does not create parent nodes in a
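Because `create()` fails with a NoNode error when the parent is missing, a caller has to create each ancestor first. A small hypothetical helper that lists the paths in creation order; each ancestor would be created as a persistent node (ephemeral nodes cannot have children), tolerating NodeExists, before creating the ephemeral leaf. If you use Apache Curator, its `creatingParentsIfNeeded()` option on the create builder does this for you.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: every path that must exist, parents first, ending with the path itself. */
final class ZkPaths {
    static List<String> pathsToCreate(String path) {
        if (!path.startsWith("/") || path.length() < 2 || path.endsWith("/")) {
            throw new IllegalArgumentException("expected an absolute, non-root path without a trailing slash: " + path);
        }
        List<String> result = new ArrayList<>();
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            result.add(path.substring(0, idx)); // each ancestor, shortest first
            idx = path.indexOf('/', idx + 1);
        }
        result.add(path);
        return result;
    }
}
```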
