Re: How to reestablish a session

2010-11-18 Thread Benjamin Reed
is_unrecoverable() means exactly that: the session is toast. nothing you do will get it back. zookeeper_init is almost never used with a non-null client_id. the main use case for it is crash recovery. i've rarely seen it used, but you can start a session, save off the client_id to disk, create

Re: How to reestablish a session

2010-11-18 Thread Benjamin Reed
oops, sorry camille, i didn't mean to replicate your answer. you explained it better than me :) ben On 11/18/2010 10:06 AM, Fournier, Camille F. [Tech] wrote: This is exactly the scenario that you use to test session expiration, make one connection to a ZK and then another with the same sessi

Re: How to reestablish a session

2010-11-18 Thread Benjamin Reed
ah i see. you are manually reestablishing the connection to B using the session identifier for the session with A. the problem is that when you call "close" on a session, it kills the session. we don't really have a way to close a handle without doing that. (actually there is a test class that do

Re: How to reestablish a session

2010-11-18 Thread Benjamin Reed
that quote is a bit out of context. it was with respect to a proposed change. in your scenario can you explain step 4)? what are you closing? ben On 11/18/2010 07:16 AM, Gustavo Niemeyer wrote: Greetings, As some of you already know, we've been using ZooKeeper at Canonical for a project we'v

Re: Running cluster behind load balancer

2010-11-04 Thread Benjamin Reed
at 3:45 PM, Benjamin Reed wrote: it would have to be a TCP based load balancer to work with ZooKeeper clients, but other than that it should work really well. The clients will be doing heart beats so the TCP connections will be long lived. The client library does random connection load balancing an

Re: Running cluster behind load balancer

2010-11-03 Thread Benjamin Reed
it would have to be a TCP based load balancer to work with ZooKeeper clients, but other than that it should work really well. The clients will be doing heart beats so the TCP connections will be long lived. The client library does random connection load balancing anyway. ben On 11/03/2010 12:

Re: Getting a "node exists" code on a sequence create

2010-11-03 Thread Benjamin Reed
it, and then trying to create more sequential znodes. I'm guessing this is pretty well-tested behavior, so there must be something weird or wrong about the way I have stuff setup. I'm happy to provide whatever logs or snapshots might help someone track this down. Thanks, Jeremy On 1

Re: Getting a "node exists" code on a sequence create

2010-11-01 Thread Benjamin Reed
how were you able to reproduce it? all the znodes in /zkrsm were created with the sequence flag. right? ben On 11/01/2010 02:28 PM, Jeremy Stribling wrote: We were able to reproduce it. A "stat" on all three servers looks identical: [zk:(CONNECTED) 0] stat /zkrsm cZxid = 9 ctime = Mon Nov 01

Re: Is it possible to read/write a ledger concurrently

2010-10-22 Thread Benjamin Reed
class in Hedwig? Is it used somewhere for this concurrent read/write problem? -regards Amit - Original Message From: Benjamin Reed To: zookeeper-user@hadoop.apache.org Sent: Fri, 22 October, 2010 11:09:07 AM Subject: Re: Is it possible to read/write a ledger concurrently currently program

Re: Is it possible to read/write a ledger concurrently

2010-10-21 Thread Benjamin Reed
currently program1 can read and write to an open ledger, but program2 must wait for the ledger to be closed before doing the read. the problem is that program2 needs to know the last valid entry in the ledger. (there may be entries that may not yet be valid.) for performance reasons, only prog

Re: zxid integer overflow

2010-10-19 Thread Benjamin Reed
we should put in a test for that. it is certainly a plausible scenario. in theory it will just flow into the next epoch and everything will be fine, but we should try it and see. ben On 10/19/2010 11:33 AM, Sandy Pratt wrote: Just as a thought experiment, I was pondering the following: ZK s
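A rough illustration of the overflow scenario in this message, assuming the usual zxid layout (a 64-bit value with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits). The helper names are hypothetical and are not part of ZooKeeper; the sketch only shows the arithmetic of a plain increment carrying into the epoch bits, not what a real server does.

```python
# Toy model of a zxid: high 32 bits = epoch, low 32 bits = counter.
# make_zxid / split_zxid are illustrative helpers, not ZooKeeper APIs.
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into one 64-bit style integer."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


# The last zxid of epoch 5, then a plain +1 on the packed value.
last_in_epoch = make_zxid(5, COUNTER_MASK)
next_zxid = last_in_epoch + 1
print(split_zxid(next_zxid))  # (6, 0): the carry lands in the epoch bits
```

Whether the rest of the system treats that carried value as a proper new epoch is exactly what the proposed test would have to establish.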

Re: invalid acl for ZOO_CREATOR_ALL_ACL

2010-10-19 Thread Benjamin Reed
which scheme are you using? ben On 10/18/2010 11:57 PM, FANG Yang wrote: 2010/10/19 FANG Yang hi, all I have a simple zk client written in c, which is attachment #1. When i use ZOO_CREATOR_ALL_ACL, the ret code of zoo_create is -114 (Invalid ACL specified, defined in zookeeper.h), but

Re: Testing zookeeper outside the source distribution?

2010-10-18 Thread Benjamin Reed
we should be exposing those classes and releasing them as a testing jar. do you want to open up a jira to track this issue? ben On 10/18/2010 05:17 AM, Anthony Urso wrote: Anyone have any pointers on how to test against ZK outside of the source distribution? All the fun classes (e.g. ClientBa

Re: Membership using ZK

2010-10-12 Thread Benjamin Reed
state. Is my understanding correct? Please advice. Thanks Avinash On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed wrote: ZooKeeper considers a client dead when it hasn't heard from that client during the timeout period. clients make sure to communicate with ZooKeeper at least once in 1/

Re: Membership using ZK

2010-10-12 Thread Benjamin Reed
ZooKeeper considers a client dead when it hasn't heard from that client during the timeout period. clients make sure to communicate with ZooKeeper at least once in 1/3 the timeout period. if the client doesn't hear from ZooKeeper in 2/3 the timeout period, the client will issue a ConnectionLos
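The thirds described in this message can be worked out for a concrete timeout. This is a minimal sketch of the arithmetic only; the function and key names are made up for illustration, and the real client applies its own scheduling on top of these bounds.

```python
def session_timing(session_timeout_ms):
    """Derive the three deadlines described in the message above.

    Names are illustrative only; this is arithmetic, not client code.
    """
    return {
        # the client tries to talk to the server at least this often
        "client_contacts_server_within_ms": session_timeout_ms // 3,
        # the client gives up on the current server after this much silence
        "client_reports_connection_loss_after_ms": 2 * session_timeout_ms // 3,
        # the server side declares the session dead after the full timeout
        "session_expires_after_ms": session_timeout_ms,
    }


print(session_timing(30000))
```

With a 30 second session timeout that gives 10 seconds, 20 seconds, and 30 seconds respectively, which leaves the client roughly one third of the timeout to find another server before the session expires.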

Re: What does this mean?

2010-10-11 Thread Benjamin Reed
d who the follower to get more insight? Thanks A On Sun, Oct 10, 2010 at 8:33 AM, Benjamin Reed wrote: this usually happens when a follower closes its connection to the leader. it is usually caused by the follower shutting down or failing. you may get further insight by looking at the follower

RE: What does this mean?

2010-10-10 Thread Benjamin Reed
this usually happens when a follower closes its connection to the leader. it is usually caused by the follower shutting down or failing. you may get further insight by looking at the follower logs. you should really run with timestamps on so that you can correlate the logs of the leader and foll

Re: Question on production readiness, deployment, data of BookKeeper / Hedwig

2010-10-08 Thread Benjamin Reed
eeper used in production as a WAL (or for any other use) anywhere? If so, for what uses? Any info (even anecdotal) would be great! -jake On Thu, Oct 7, 2010 at 9:15 AM, Benjamin Reed wrote: hi amit, sorry for the late response. this week has been crunch time for a lot of different t

Re: Question on production readiness, deployment, data of BookKeeper / Hedwig

2010-10-07 Thread Benjamin Reed
hi amit, sorry for the late response. this week has been crunch time for a lot of different things. here are your answers: production 1. it is still in prototype phase. we are evaluating different aspects, but there is still some work to do to make it production ready. we also need to get

Re: Zookeeper on 60+Gb mem

2010-10-05 Thread Benjamin Reed
you will need to time how long it takes to read all that state back in and adjust the initTime accordingly. it will probably take a while to pull all that data into memory. ben On 10/05/2010 11:36 AM, Avinash Lakshman wrote: I have run it over 5 GB of heap with over 10M znodes. We will defin

Re: ZK compatability

2010-10-01 Thread Benjamin Reed
we should also point out that our ops guys here at yahoo! don't like the break at major clause. i imagine when we do the next major release we will try to be one release backwards compatible. (although we shouldn't promise it until we successfully do it once :) ben On 09/30/2010 10:29 AM, Pa

Re: closing session on socket close vs waiting for timeout

2010-09-10 Thread Benjamin Reed
ah dang, i should have said "generate a close request for the session and push that through the system." ben On 09/10/2010 01:01 PM, Benjamin Reed wrote: the problem is that followers don't track session timeouts. they track when they last heard from the sessions that a

Re: closing session on socket close vs waiting for timeout

2010-09-10 Thread Benjamin Reed
't really want to waste my time if there's a fundamental reason it's a bad idea. Thanks, Camille -----Original Message- From: Benjamin Reed [mailto:br...@yahoo-inc.com] Sent: Wednesday, September 08, 2010 4:03 PM To: zookeeper-user@hadoop.apache.org Subject: Re: closing session

Re: closing session on socket close vs waiting for timeout

2010-09-08 Thread Benjamin Reed
@hadoop.apache.org Cc: Benjamin Reed Subject: Re: closing session on socket close vs waiting for timeout This really is, just as Ben says a problem of false positives and false negatives in detecting session expiration. On the other hand, the current algorithm isn't really using all the information avai

Re: closing session on socket close vs waiting for timeout

2010-09-06 Thread Benjamin Reed
this session type (so 4 would fail). Would that address your concern, others? Patrick On 09/01/2010 10:03 AM, Benjamin Reed wrote: i'm a bit skeptical that this is going to work out properly. a server may receive a socket reset even though the client is still alive: 1) client sends a re

Re: closing session on socket close vs waiting for timeout

2010-09-01 Thread Benjamin Reed
i'm a bit skeptical that this is going to work out properly. a server may receive a socket reset even though the client is still alive: 1) client sends a request to a server 2) client is partitioned from the server 3) server starts trying to send response 4) client reconnects to a different serv

Re: Session expiration caused by time change

2010-08-20 Thread Benjamin Reed
, Ted Dunning wrote: Put in a four letter command that will put the server to sleep for 15 seconds! :-) On Thu, Aug 19, 2010 at 3:51 PM, Benjamin Reed wrote: i'm updating ZOOKEEPER-366 with this discussion and try to get a patch out. Qing (or anyone else, can you reproduce it pretty easily?)

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
me. On Thu, Aug 19, 2010 at 9:19 AM, Benjamin Reed wrote: yes, you are right. we could do this. it turns out that the expiration code is very simple: while (running) { currentTime = System.currentTimeMillis(); if (nextExpirationTime>

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
if we can't rely on the clock, we cannot say things like "if ... for 5 seconds". also, clients connect to servers, not vice versa, so we cannot say things like "server can attempt to reconnect". ben On 08/19/2010 10:17 AM, Vishal K wrote: Hi Ted, I haven't given it a serious thought yet, bu

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
could be given a bit of a second lease on life, delaying all of their expiration. Since time-outs are relatively short, the server would be able to forget about the bump very shortly. On Thu, Aug 19, 2010 at 8:22 AM, Benjamin Reed wrote: if we try to use network messages to detect and corre

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm afraid it isn't that simple. we figure out who is expired by bucketizing sessions to be expired in an interval. if we hear from a session we move it to a different bucket, otherwise when the bucket expires, everything in that bucket goes away. when time jumps, it looks to the server like ther
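The bucketing idea in this message can be sketched as a toy model. This is not the server's actual session tracker: the class, its method names, and the rounding rule are assumptions chosen to show why a forward clock jump makes every pending bucket look overdue at once.

```python
from collections import defaultdict


class BucketedExpiry:
    """Toy model of interval-bucketed session expiration (hypothetical)."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.buckets = defaultdict(set)   # deadline -> session ids
        self.bucket_of = {}               # session id -> deadline

    def _round_up(self, t_ms):
        # Deadlines are rounded up to the next interval boundary.
        return (t_ms // self.interval_ms + 1) * self.interval_ms

    def touch(self, session_id, now_ms, timeout_ms):
        """Heard from the session: move it to a later bucket."""
        old = self.bucket_of.get(session_id)
        if old is not None:
            self.buckets[old].discard(session_id)
        new = self._round_up(now_ms + timeout_ms)
        self.buckets[new].add(session_id)
        self.bucket_of[session_id] = new

    def expire(self, now_ms):
        """Drop every bucket whose deadline has passed; return its sessions."""
        expired = set()
        for deadline in sorted(d for d in self.buckets if d <= now_ms):
            for sid in self.buckets.pop(deadline):
                expired.add(sid)
                del self.bucket_of[sid]
        return expired


tracker = BucketedExpiry(interval_ms=1000)
tracker.touch("a", now_ms=0, timeout_ms=2500)
tracker.touch("b", now_ms=0, timeout_ms=2500)
tracker.touch("a", now_ms=2000, timeout_ms=2500)   # "a" checks in again
print(tracker.expire(3000))    # {'b'}: only the silent session goes away
print(tracker.expire(60000))   # {'a'}: a clock jump expires it regardless
```

In this model the second `expire` call stands in for a wall clock that suddenly moved forward: the session had been heard from recently, but its bucket deadline is now in the past.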

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
do you have a pointer to those timers? thanx ben On 08/18/2010 11:58 PM, Martin Waite wrote: On Linux, I believe that there is a class of timers provided that is immune to this, but I doubt that there is a platform independent way of coping with this.

Re: Weird ephemeral node issue

2010-08-17 Thread Benjamin Reed
there are two things to keep in mind when thinking about this issue: 1) if a zk client is disconnected from the cluster, the client is essentially in limbo. because the client cannot talk to a server it cannot know if its session is still alive. it also cannot close its session. 2) the client

Re: A question about Watcher

2010-08-16 Thread Benjamin Reed
the client does keep track of the watches that it has outstanding. when it reconnects to a new server it tells the server what it is watching for and the last view of the system that it had. ben On 08/16/2010 09:28 AM, Qian Ye wrote: thx for explanation. Since the watcher can be preserved wh

Re: A question about Watcher

2010-08-16 Thread Benjamin Reed
good point ted! i should have waited a bit longer before responding :) ben On 08/16/2010 09:20 AM, Ted Dunning wrote: There are two different concepts. One is connection loss. Watchers survive this and the client automatically connects to another member of the ZK cluster. The other is sessio

Re: A question about Watcher

2010-08-16 Thread Benjamin Reed
zookeeper takes care of reregistering all watchers on reconnect. you don't need to do anything. ben On 08/16/2010 09:04 AM, Qian Ye wrote: Hi all: Will the watchers of a client be lost when the client disconnects from a Zookeeper server? It is said at http://hadoop.apache.org/zookeeper/docs/

Re: How to handle "Node does not exist" error?

2010-08-12 Thread Benjamin Reed
i thought there was a jira about supporting embedded zookeeper. (i remember rejecting a patch to fix it. one of the problems is that we have a couple of places that do System.exit().) i can't seem to find it though. one case that would be great for embedding is writing test cases, so i think

Re: Do implementations of Watcher need to be thread-safe?

2010-07-21 Thread Benjamin Reed
as long as a watcher object is only used with a single ZooKeeper object it will be called by the same thread. ben On 07/21/2010 11:12 AM, Joshua Ball wrote: Hi, Do implementations of Watcher need to be thread-safe, or can I assume that process(...) will always be called by the same thread? T

Re: ZK recovery questions

2010-07-21 Thread Benjamin Reed
i did a benchmark a while back to see the effect of turning off the disk. (it wasn't as big as you would think.) i had to modify the code. there is an option to turn off the sync in the config that will get you most of the performance you would get by turning off the disk entirely. ben On 07/

Re: does a ZK client read its own write

2010-07-20 Thread Benjamin Reed
it is still guaranteed to see its own write. when a client reconnects to a different server, we guarantee that the new server will be at least as up-to-date as the last server. otherwise the client would go back in time and a lot of things would go wrong. ben On 07/20/2010 08:28 AM, Jun Rao w

Re: Regarding the process method of Watcher Interface

2010-07-19 Thread Benjamin Reed
) then you need to synchronize. ben On 07/19/2010 03:36 PM, Srikanth Bondalapati: wrote: Thanks Dave& Ben. So, ultimately I need to synchronize process() method, when the same Watcher object is registered with different zookeeper handles (or Znodes). :) On Mon, Jul 19, 2010 at 3:03 PM, Benj

Re: BookKeeper Doubts

2010-07-19 Thread Benjamin Reed
you have concluded correctly. 1) bookkeeper was designed for a process to use as a write-ahead log, so as a simplifying assumption we assume a single writer to a log. we should be throwing an exception if you try to write to a handle that you obtained using openLedger. can you open a jira for

Re: Regarding the process method of Watcher Interface

2010-07-19 Thread Benjamin Reed
yes, you (and dave) are correct. watches are invoked sequentially in order. the only time you can run into trouble is if you register the same watcher object with different zookeeper handles since there is a dispatch thread per zookeeper handle. ben On 07/19/2010 02:50 PM, Srikanth Bondalapat

RE: cleanup ZK takes 40-60 seconds

2010-07-16 Thread Benjamin Reed
how big is your database? it would be good to know the timing of the two calls. shutdown should take very little time. sent from my droid -Original Message- From: Vishal K [vishalm...@gmail.com] Received: 7/16/10 6:31 PM To: zookeeper-user@hadoop.apache.org [zookeeper-u...@hadoop.apache.

Re: total # of zknodes

2010-07-15 Thread Benjamin Reed
i think there is a wiki page on this, but for the short answer: the number of znodes impacts two things: memory footprint and recovery time. there is a base overhead to znodes to store its path, pointers to the data, pointers to the acl, etc. i believe that is around 100 bytes. you can't just di

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Benjamin Reed
by custom QuorumVerifier are you referring to http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperHierarchicalQuorums.html ? ben On 07/14/2010 12:43 PM, Sergei Babovich wrote: Hi, We are currently evaluating use of ZK in our infrastructure. In our setup we have a set of servers running fr

Re: What does this exception mean?

2010-07-14 Thread Benjamin Reed
that means that your connection to zookeeper has broken. usually because the server you were connected to failed. see http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling ben On 07/14/2010 11:41 AM, Avinash Lakshman wrote: Hi All I run into this periodically. I am curious to know what this

Re: Regarding Leader election and the limit on number of clients without performance degradation

2010-07-12 Thread Benjamin Reed
ted is correct, as usual. that warning is really to avoid unnecessary load, and 16 clients really don't generate much of a load at all. even with thousands of clients, if they really need the list of children it will still be ok. the point of that note was that for leader election only one proce

Re: running the systest

2010-07-09 Thread Benjamin Reed
can you try the following: Index: src/contrib/fatjar/build.xml === --- src/contrib/fatjar/build.xml(revision 962637) +++ src/contrib/fatjar/build.xml(working copy) @@ -46,6 +46,7 @@ + thanx ben On 07/09/2010

Re: Suggested way to simulate client session expiration in unit tests?

2010-07-08 Thread Benjamin Reed
the difference between close and disconnect is that close will actually try to tell the server to kill the session before disconnecting. a paranoid lock implementation doesn't need to test it's session. it should just monitor watch events to look for disconnect and expired events. if a client

Re: Are Watchers execute sequentially or in parallel ?

2010-06-29 Thread Benjamin Reed
watchers are executed sequentially and in order. there is one dispatch thread that invokes the watch callbacks. ben ps - in 2) you do not install a watch. On 06/29/2010 06:13 AM, André Oriani wrote: Hi, Are Watchers executed sequentially or in parallel ? Suppose I want to monitor the childr
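A toy model of the single dispatch thread described in this message: callbacks queued on one handle run one at a time, in the order they were queued. The `Handle` class and its methods are hypothetical stand-ins, not the ZooKeeper client API.

```python
import queue
import threading


class Handle:
    """Hypothetical stand-in for a client handle with one dispatch thread."""

    def __init__(self):
        self._events = queue.Queue()
        self._thread = threading.Thread(target=self._dispatch, daemon=True)
        self._thread.start()

    def _dispatch(self):
        # One thread drains the queue, so callbacks never overlap.
        while True:
            item = self._events.get()
            if item is None:          # sentinel queued by close()
                break
            watcher, event = item
            watcher(event)

    def deliver(self, watcher, event):
        """Queue an event for a watcher callback."""
        self._events.put((watcher, event))

    def close(self):
        """Stop the dispatch thread after the queued events have run."""
        self._events.put(None)
        self._thread.join()


seen = []
handle = Handle()
for i in range(5):
    handle.deliver(seen.append, i)
handle.close()
print(seen)  # [0, 1, 2, 3, 4]
```

Two `Handle` objects would each have their own thread, which is why a watcher object shared across handles would need its own synchronization in this model.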

Re: integration tests

2010-06-23 Thread Benjamin Reed
we do this in our tests for ZooKeeper. bookkeeper uses the testing classes as well, unfortunately, we haven't documented the interface. ben On 06/22/2010 08:42 PM, Ishaaq Chandy wrote: Hi all, First some background: 1. We use maven as our build tool. 2. We use Hudson as our CI server, it is s

Re: is ZK client thread safe

2010-06-21 Thread Benjamin Reed
yes. (except for the single threaded C-client library :) ben On 06/17/2010 10:16 AM, Jun Rao wrote: Hi, Is ZK client thread safe? Is it ok for multiple threads sharing the same ZK client? Thanks, Jun

Re: Completions in C API

2010-06-03 Thread Benjamin Reed
the call is executed at a later time on a different thread. the zoo_a* calls are non-blocking, so (subject to the thread scheduling) usually they will return before the request completes. ben On 06/03/2010 01:24 PM, Jack Orenstein wrote: I'm trying to figure out how to use zookeeper's C API.

Re: zookeeper crash

2010-06-02 Thread Benjamin Reed
charity, do you mind going through your scenario again to give a timeline for the failure? i'm a bit confused as to what happened. ben On 06/02/2010 01:32 PM, Charity Majors wrote: Thanks. That worked for me. I'm a little confused about why it threw the entire cluster into an unusable state

Re: Securing ZooKeeper connections

2010-05-27 Thread Benjamin Reed
get SSL in. On 05/26/2010 04:44 PM, Mahadev Konar wrote: Hi Vishal, Ben (Benjamin Reed) has been working on a netty based client server protocol in ZooKeeper. I think there is an open jira for it. My network connection is pretty slow so am finding it hard to search for it. We have been

Re: problem connecting to zookeeper server

2010-05-20 Thread Benjamin Reed
good catch lei! if this helps gregory, can you open a jira to throw an exception in this situation. we should be throwing an invalid argument exception or something in this case. thanx ben On 05/20/2010 09:04 AM, Lei Zhang wrote: Seems you are passing in wrong arguments: Should have been:

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Benjamin Reed
is this a bug? shouldn't we be returning an error. ben On 05/12/2010 11:34 AM, Patrick Hunt wrote: I think that explains it then - the server is probably dropping the new (3.3.0) "getChildren" message (xid 7) as it (3.2.2 server) doesn't know about that message type. Then the server responds to

Re: How to ensure trasaction create-and-update

2010-03-30 Thread Benjamin Reed
i agree with ted. i think he points out some disadvantages with trying to do more. there is a slippery slope with these kinds of things. the implementation is complicated enough even with the simple model that we use. ben On 03/29/2010 08:34 PM, Ted Dunning wrote: I perhaps should not have sa

Re: Solitication for logging/debugging requirements

2010-03-29 Thread Benjamin Reed
awesome! that would be great ivan. i'm sure pat has some more concrete suggestions, but one simple thing to do is to run the unit tests and look at the log messages that get output. there are a couple of categories of things that need to be fixed (this is in no way exhaustive): 1) messages tha

Re: syncLimit explanation needed?

2010-03-18 Thread Benjamin Reed
yes it means in sync with the leader. syncLimit governs the timeout when a follower is actively following a leader. initLimit is the initial connection timeout. because there is the potential for more data that needs to be transmitted during the initial connection, we want to be able to manage
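Both limits are expressed in ticks, so the effective timeouts follow from tickTime. The sketch below just does that multiplication; the function name is made up and the sample values are illustrative, not recommendations.

```python
def quorum_timeouts_ms(tick_time_ms, init_limit_ticks, sync_limit_ticks):
    """Convert initLimit and syncLimit (in ticks) to milliseconds.

    Illustrative helper only; the servers do this internally.
    """
    return {
        # time allowed for a follower's initial connection and state transfer
        "init_timeout_ms": tick_time_ms * init_limit_ticks,
        # time allowed between exchanges while actively following the leader
        "sync_timeout_ms": tick_time_ms * sync_limit_ticks,
    }


# Example values, e.g. tickTime=2000, initLimit=10, syncLimit=5 in zoo.cfg.
print(quorum_timeouts_ms(2000, 10, 5))
```

With those example values a follower gets 20 seconds for the initial connection and 10 seconds between exchanges afterwards, so a large database argues for raising initLimit rather than syncLimit.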

Re: cluster fails to start - broken snapshot?

2010-03-18 Thread Benjamin Reed
we have updated ZOOKEEPER-713 with much more detail, but the bottom line is that the Invalid snapshot was caused by an OutOfMemoryError. this turns out not to be a problem since we recover using an older snapshot. there are other things that are happening that are the real causes of the problem. s

Re: permanent ZSESSIONMOVED

2010-03-16 Thread Benjamin Reed
weird, this does sound like a bug. do you have a reliable way of reproducing the problem? thanx ben On 03/16/2010 08:27 AM, Łukasz Osipiuk wrote: nope. I always pass 0 as clientid. Łukasz On Tue, Mar 16, 2010 at 16:20, Benjamin Reed wrote: do you ever use zookeeper_init() with the

Re: permanent ZSESSIONMOVED

2010-03-16 Thread Benjamin Reed
do you ever use zookeeper_init() with the clientid field set to something other than null? ben On 03/16/2010 07:43 AM, Łukasz Osipiuk wrote: Hi everyone! I am writing to this group because recently we are getting some strange errors with our production zookeeper setup. From time to time we

Re: Managing multi-site clusters with Zookeeper

2010-03-15 Thread Benjamin Reed
it is a bit confusing but initLimit is the timer that is used when a follower connects to a leader. there may be some state transfers involved to bring the follower up to speed so we need to be able to allow a little extra time for the initial connection. after that we use syncLimit to figure

Re: Znode ACL watcher?

2010-02-22 Thread Benjamin Reed
no, you cannot watch for ACL changes. it is one of the API/implementation simplifications we did since we didn't have a good use case for it. it does seem a little bit weird. we are following file system semantics here. i guess for ultimate security only clients with admin permission would be

Re: Ordering guarantees for async callbacks vs watchers

2010-02-11 Thread Benjamin Reed
just to expand on mahadev's answer a little bit: the basic guarantee is that you will see the watch event before you see the change. so let's say you call getChildren( "/foo", w, acb, ctx) twice and while you do that another client creates a child of /foo. there are three scenarios: 1) the cre

RE: When session expired event fired?

2010-02-08 Thread Benjamin Reed
i was looking through the docs to see if we talk about handling session expired, but i couldn't find anything. we should probably open a jira to add to the docs, unless i missed something. did i? ben -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Monday, Fe

Re: ephemeral node after server bounce

2010-02-04 Thread Benjamin Reed
i second ted's proposals! thanx ted. there is one other option. when you create the ZooKeeper object you can pass a session id and password. your bounced server can actually reattach to the session. (that is why we put that constructor in.) to use it you need to save the session id and passwor

Re: how to handle re-add watch fails

2010-02-01 Thread Benjamin Reed
sadly connectionloss is the really ugly part of zookeeper! it is a pain to deal with. i'm not sure we have best practice, but i can tell you what i do :) ZOOKEEPER-22 is meant to alleviate this problem. i usually use the asynch API when handling the watch callback. in the completion function i

Re: Dependency on JBoss JMX

2010-01-28 Thread Benjamin Reed
there aren't any dependencies on jboss. can you clarify the dependency that you are seeing? thanx ben Gustavo Niemeyer wrote: Hello there, Is the dependency on JBoss a hard one, or is there a way to not use it? Perhaps an alternative package providing the same interface? I'm trying to get i

Re: Q about ZK internal: how commit is being remembered

2010-01-28 Thread Benjamin Reed
henry is correct. just to state another way, Zab guarantees that if a quorum of servers have accepted a transaction, the transaction will commit. this means that if less than a quorum of servers have accepted a transaction, we can commit or discard. the only constraint we have in choosing is or
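The guarantee in this message can be illustrated with a small check. It assumes simple majority quorums (other quorum schemes exist), and the function names are hypothetical; the three outcomes mirror the wording above rather than any real Zab code.

```python
def is_quorum(acks, ensemble_size):
    """True if `acks` servers form a strict majority of the ensemble."""
    return acks > ensemble_size // 2


def outcome(acks, ensemble_size):
    """Classify a proposal by how many servers accepted it."""
    if is_quorum(acks, ensemble_size):
        return "must commit"
    return "may commit or discard"


for acks in range(6):
    print(acks, outcome(acks, 5))
```

For a five-server ensemble the boundary sits at three acceptances: three or more means the transaction will commit, two or fewer leaves the choice open.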

Re: ZAB kick Paxos butt?

2010-01-20 Thread Benjamin Reed
hi Qing, i'm glad you like the page and Zab. yes, we are very familiar with Paxos. that page is meant to show a weakness of Paxos and a design point for Zab. it is not to say Paxos is not useful. Paxos is used in the real world in production systems. sometimes there are not order dependencies

RE: Does zookeeper support listening on a specified address?

2009-12-21 Thread Benjamin Reed
no please open a jira as a new feature request. sent from my droid -Original Message- From: Steve Chu [stv...@gmail.com] Received: 12/21/09 3:44 AM To: zookeeper-user@hadoop.apache.org [zookeeper-u...@hadoop.apache.org] Subject: Does zookeeper support listening on a specified address? H

Re: Share Zookeeper instance and Connection Limits

2009-12-16 Thread Benjamin Reed
I agree with Ted, it doesn't seem like a good idea to do in practice. however, you do have a couple of options if you are just testing things: 1) use tmpfs 2) you can set forceSync to "no" in the configuration file to disable syncing to disk before acknowledging responses 3) if you really want

Re: size of data / number of znodes

2009-12-15 Thread Benjamin Reed
there aren't any limits on the number of znodes, it's just limited by your memory. there are two things (probably more :) to keep in mind: 1) the 1M limit also applies to the children list. you can't grow the list of children to more than 1M (the sum of the names of all of the children) otherw
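A quick way to sanity-check point 1 is to add up the child names. The sketch below treats "1M" as 1024 * 1024 bytes and ignores any per-name serialization overhead; both are assumptions, and the helper name is made up, so treat the result as a rough estimate rather than the exact server limit.

```python
ASSUMED_LIMIT_BYTES = 1024 * 1024  # assumption: "1M" read as one mebibyte


def children_fit(child_names, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Rough check: does the sum of the child names stay under the limit?"""
    total = sum(len(name.encode("utf-8")) for name in child_names)
    return total < limit_bytes


small = ["n-%010d" % i for i in range(1000)]      # 1,000 names of 12 bytes
large = ["n-%010d" % i for i in range(100000)]    # 100,000 names of 12 bytes
print(children_fit(small), children_fit(large))
```

With 12-byte names the estimate crosses the assumed limit somewhere below 90,000 children, which is why long sequential node names under one parent deserve attention.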

Re: Zookeeper Presentation

2009-11-13 Thread Benjamin Reed
there are a bunch of presentations you can grab at http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations ben Mark Vigeant wrote: Hey Everyone, I'm supposed to give a presentation next week about the basic functionality and uses of zookeeper. I was wondering if anybody out there had:

Re: Some thoughts on Zookeeper after using it for a while in the CXF/DOSGi subproject

2009-11-11 Thread Benjamin Reed
david, it should be pretty easy to do since we do it in our test cases. (start and stop servers.) the problem is that we haven't really exposed the interfaces. (but we have wanted to.) and we don't have tests for those non-existent exposed interfaces :) with a clean interface it should be prett

Re: Struggling with a simple configuration file.

2009-10-09 Thread Benjamin Reed
right at the beginning of http://hadoop.apache.org/zookeeper/docs/r3.2.1/zookeeperStarted.html it shows you the minimum standalone configuration. that doesn't explain the 0 id. i'd like to try to reproduce it. do you have an empty data directory with a single file, myid, set to 1? ben Leona

Re: How to expire a session

2009-09-25 Thread Benjamin Reed
so you have two problems going on. both have the same root: zookeeper_init returns before a connection and session is established with zookeeper, so you will not be able to fill in myid until a connection is made. you can do something with a mutex in the watcher to wait for a connection, or you

Re: The idea behind 'myid'

2009-09-25 Thread Benjamin Reed
is some details im not getting here :-) Regards, Orjan On Fri, Sep 25, 2009 at 3:56 PM, Benjamin Reed wrote: can you clarify what you are asking for? are you just looking for motivation? or are you trying to find out how to use it? the myid file just has the unique identifier (number) of

Re: The idea behind 'myid'

2009-09-25 Thread Benjamin Reed
can you clarify what you are asking for? are you just looking for motivation? or are you trying to find out how to use it? the myid file just has the unique identifier (number) of the server in the cluster. that number is matched against the id in the configuration file. there isn't much to sa

Re: Start problem of "Running Replicated ZooKeeper"

2009-09-23 Thread Benjamin Reed
oh yes, that is a scenario that may generate a connection refused. ben Ted Dunning wrote: Good points. On the other hand, it could still be firewall issues. On Wed, Sep 23, 2009 at 8:30 AM, Benjamin Reed wrote: The "connection refused" message as opposed to no route to host,

Re: Start problem of "Running Replicated ZooKeeper"

2009-09-23 Thread Benjamin Reed
The "connection refused" message, as opposed to no route to host or unknown host, indicates that zookeeper has not been started on the other machines. are the other machines giving similar errors? ben Le Zhou wrote: Hi, I'm trying to install HBase 0.20.0 in fully distributed mode on my cluster

Re: ACL question w/ Zookeeper 3.1.1

2009-09-18 Thread Benjamin Reed
what error do you get? ben Todd Greenwood wrote: I'm attempting to secure a zookeeper installation using zookeeper ACLs. However, I'm finding that while Ids.OPEN_ACL_UNSAFE works great, my attempts at using Ids.CREATOR_ALL_ACL are failing. Here's a code snippet: public class ZooWrapper { /*

Re: zookeeper on ec2

2009-09-03 Thread Benjamin Reed
these suggestions would be great to put in a faq! thanx ted ben Ted Dunning wrote: I always used a large node for ZK to avoid sharing the machine, but the reason for doing that turned out to be incorrect. In fact, my problem was to do with GC on the client side. I can't believe that they are

RE: A question about "Connection timed out" and "operation timeout"

2009-08-20 Thread Benjamin Reed
are you using the single threaded or multithreaded C library? the exceeded deadline message means that our thread was supposed to get control after a certain period, but we got control that many milliseconds late. what is your session timeout? ben From:

Re: Errors when run zookeeper in windows ?

2009-08-19 Thread Benjamin Reed
good point david! zhang can you try david's scripts? we should probably commit those. thanx for pointing them out david. ben David Bosschaert wrote: FWIW, I've uploaded some Windows versions of the zookeeper scripts to https://issues.apache.org/jira/browse/ZOOKEEPER-426 a while ago. They run f

RE: exist return true before event comes in

2009-08-04 Thread Benjamin Reed
: Re: exist return true before event comes in Interesting, that basically means if I want strict order, I have to use the async api? ~~~ Hadoop training and consulting http://www.scaleunlimited.com http://www.101tec.com On Aug 3, 2009, at 8:10 PM, Benjamin Reed wrote

RE: exist return true before event comes in

2009-08-03 Thread Benjamin Reed
I assume you are calling the synchronous version of exists. The callbacks for both the watches and async calls are processed by a callback thread, so the ordering is strict. Synchronous call responses are not queued to the callback thread. (this allows you to make synchronous calls in callbacks

RE: c client header location

2009-08-02 Thread Benjamin Reed
Or maybe /usr/local/include/zookeeper but either way c-client-src is weird. Please open a jira. Thanx ben Sent from my phone. -Original Message- From: Michi Mutsuzaki Sent: Saturday, August 01, 2009 6:15 PM To: zookeeper-user@hadoop.apache.org Subject: c client header location Hello

Re: Zookeeper WAN Configuration

2009-07-24 Thread Benjamin Reed
the processing of the write transaction is described in the zookeeper internals presentation on http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations i think other presentations may also touch on it. we also have it in the ZooKeeper documentation: http://hadoop.apache.org/zookeeper/do

Re: Multiple ZK clusters or a single, shared cluster?

2009-07-17 Thread Benjamin Reed
ty concerns as well? Sorry for all the questions, just trying to get the story straight so that we don't spread misinformation to HBase users. Most users start out on very small clusters, so dedicated ZK nodes are not a realistic assumption... How big of a deal is that? JG Benjamin R

Re: Multiple ZK clusters or a single, shared cluster?

2009-07-17 Thread Benjamin Reed
we designed zk to have high performance so that it can be shared by multiple applications. the main thing is that you use dedicated zk machines (with a dedicated disk for logging). once you have that in place, watch the load on your cluster, as long as you aren't saturating the cluster you shou

Re: Question about the sequential flag on create.

2009-07-14 Thread Benjamin Reed
the create is atomic. we just use a data structure that does not store the list of children in order. ben Erik Holstad wrote: Hey Patrik! Thanks for the reply. I understand all the reasons that you posted above and totally agree that nodes should not be sorted since you then have to pay that o

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Benjamin Reed
sorry to jump in late. if i understand the scenario correctly, you are partitioned from ZK, but you still have access to the NN on which you are holding leases to files. the problem is that even though your ephemeral nodes may timeout, you are still holding a lease on the NN and recovery would

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-24 Thread Benjamin Reed
hat, I'll open a jira and give it a try. J-D On Tue, Jun 23, 2009 at 6:04 PM, Benjamin Reed wrote: ZooKeeper only tells you about states that it is sure about, so you will not get the Expired event until you reconnect to ZooKeeper. if you never connect again to ZooKeeper, you will not ge

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

2009-06-23 Thread Benjamin Reed
ZooKeeper only tells you about states that it is sure about, so you will not get the Expired event until you reconnect to ZooKeeper. if you never connect again to ZooKeeper, you will not get the Expired event. if you want to timeout using some sanity value, 2 times the session timeout for examp
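
A minimal sketch of the "sanity value" idea above, written in plain Python rather than against the ZooKeeper client API. The class name `DisconnectTimer`, its methods, and the `factor` parameter are hypothetical; the only thing taken from the message is the suggestion of a local limit such as 2 times the session timeout. The application would call `on_disconnected()` and `on_connected()` from its own watcher and poll `presumed_expired()`.

```python
import time


class DisconnectTimer:
    """Hypothetical helper: applies a local sanity timeout while disconnected.

    This does not talk to ZooKeeper. It only measures how long the
    application has gone without a connection.
    """

    def __init__(self, session_timeout_s, factor=2.0, clock=time.monotonic):
        # Local limit, e.g. 2 times the session timeout (an assumption
        # based on the example value mentioned in the message).
        self.limit_s = session_timeout_s * factor
        self.clock = clock
        self.disconnected_at = None

    def on_disconnected(self):
        # Remember only the first disconnect of an outage.
        if self.disconnected_at is None:
            self.disconnected_at = self.clock()

    def on_connected(self):
        # Reconnected: the server will now report the real session state.
        self.disconnected_at = None

    def presumed_expired(self):
        # True once the outage has lasted at least the local limit.
        if self.disconnected_at is None:
            return False
        return self.clock() - self.disconnected_at >= self.limit_s
```

This is only a local guess: the authoritative Expired event still arrives only after the client reconnects, so the timer is useful for deciding when to stop waiting, not for learning the session's true state.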

"Fixing" ZooDefs.PERMS.ALL

2009-06-16 Thread Benjamin Reed
We have discovered that there is a bug in ZooDefs.PERMS.ALL: it is missing ZooDefs.PERMS.ADMIN, thus it isn't really ALL :) The problem is that the C binding includes ADMIN in ALL, so we have an inconsistency between the two bindings. We would like to fix this as a bug fix in the next release,
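
To illustrate the inconsistency described above, here is a small sketch using local constants. The bit values are assumed to mirror ZooKeeper's permission flags (READ, WRITE, CREATE, DELETE, ADMIN as successive bits); the names `ALL_WITHOUT_ADMIN`, `ALL_WITH_ADMIN`, and `allows` are hypothetical and not part of either binding.

```python
# Local constants, assumed to mirror ZooKeeper's permission bit values.
READ = 1 << 0
WRITE = 1 << 1
CREATE = 1 << 2
DELETE = 1 << 3
ADMIN = 1 << 4

# "ALL" as described for the Java binding in the message: ADMIN is missing.
ALL_WITHOUT_ADMIN = READ | WRITE | CREATE | DELETE

# "ALL" as described for the C binding: ADMIN is included.
ALL_WITH_ADMIN = ALL_WITHOUT_ADMIN | ADMIN


def allows(perms, flag):
    """Return True if every bit of `flag` is set in `perms`."""
    return (perms & flag) == flag
```

With these definitions, `allows(ALL_WITHOUT_ADMIN, ADMIN)` is false while `allows(ALL_WITH_ADMIN, ADMIN)` is true, which is the mismatch between the two bindings that the message reports.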

Re: zookeeper.getChildren asynchronous callback

2009-06-11 Thread Benjamin Reed
just to clarify, i believe you are talking about callbacks on the watch object you are passing in the asynchronous call rather than the asynchronous completion callback. (Henry is making the same assumption.) when you say you are getting the callback 10 times, i believe you are talking about 10
