ZooKeeper approved by Apache Board as TLP!

2010-11-22 Thread Patrick Hunt
We are now officially an Apache TLP! http://bit.ly/9czN2x

As part of the process for moving out from under Hadoop and into full
TLP status we need to work through the following:
http://incubator.apache.org/guides/graduation.html#new-project-hand-over
If you are involved with the project, esp. on the dev side, please
review these sections. Notice that a number of things will be
changing: mailing list, source repo, wiki, etc.

I'll be sending out updates as we work through these. Regards, and
congratulations everyone!

Patrick


Re: number of clients/watchers

2010-11-18 Thread Patrick Hunt
Camille, that's a very good question. Largest cluster I've heard about
is 10k sessions.

Jeremy - largest I've ever tested was a 3 server cluster with ~500
sessions. Each session created 10k znodes (100 bytes each znode) and
set 5 watches on each. So 5 million znodes and 25 million watches. I
then had the sessions delete the znodes and looked for the
notifications. They were processed by the clients quite quickly (order
of seconds) iirc. Note: this required some GC tuning on the servers to
operate correctly (in particular cms and incremental gc were turned on
and sufficient memory was allocated for the heaps).
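
The totals quoted above follow from simple multiplication. A quick sketch of the arithmetic, using only the figures in this message; the raw payload figure is derived here and ignores per-znode and per-watch server overhead, which the message does not quantify:

```python
# Back-of-the-envelope sizing for the test described above.
sessions = 500              # "~500 sessions" in the message, so approximate
znodes_per_session = 10_000
watches_per_znode = 5
bytes_per_znode = 100

total_znodes = sessions * znodes_per_session
total_watches = total_znodes * watches_per_znode
payload_bytes = total_znodes * bytes_per_znode  # data only, no server overhead

print(f"znodes:  {total_znodes:,}")
print(f"watches: {total_watches:,}")
print(f"payload: {payload_bytes / 1e6:.0f} MB of raw znode data")
```

The heap actually needed will be larger than the raw payload, which is presumably why the GC and heap settings mattered in this test.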

Here's a similar test setup I used:
http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
This is the latency tester tool:
https://github.com/phunt/zk-smoketest

Patrick

On Thu, Nov 18, 2010 at 9:44 AM, Fournier, Camille F. [Tech]
camille.fourn...@gs.com wrote:
 Can you clarify what you mean when you say 10-100K watchers? Do you mean 
 10-100K clients with 1 active watch, or some lesser number of clients with 
 more watches, or a few clients doing a lot of watches and other clients doing 
 other things?

 -----Original Message-----
 From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
 Sent: Thursday, November 18, 2010 12:15 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: number of clients/watchers

 I had a question about number of clients against a zookeeper cluster.  I was 
 looking at having between 10,000 and 100,000 (towards 100,000) watchers 
 within a single datacenter at a given time.  Assuming that some fraction of 
 that number are active clients and the r/w ratio is well within the zookeeper 
 norms, is that number within the realm of possibility for zookeeper?  We're 
 going to do testing and benchmarking and things, but I didn't want to go down 
 a rabbit hole if this is simply too much for a single zookeeper cluster to 
 handle.   The numbers I've seen in blog posts vary and I saw that the 
 observers feature may be useful in this kind of setting.

 Maybe I'm underestimating zookeeper or maybe I don't have enough information 
 to tell.  I'm just trying to see if zookeeper is a good fit for our use case.

 Thanks.



Re: number of clients/watchers

2010-11-18 Thread Patrick Hunt
FYI: I haven't heard of anyone running over 10k sessions. I've tried
20k before and had issues, so you may want to look at this sooner
rather than later.

* Server gc tuning will be an issue (be sure to use cms/incremental).
* Be sure to disable clients accessing the leader (server configuration param).
* You may need to use the Observers feature to scale out this large.

Patrick

On Thu, Nov 18, 2010 at 10:31 AM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:
 Can you clarify what you mean when you say 10-100K watchers? Do you mean 
 10-100K clients with 1 active watch, or some lesser number of clients with 
 more watches, or a few clients doing a lot of watches and other clients 
 doing other things?

 Probably 10-100K clients each with 1 or 2 active watches.  The clients will 
 respond to watch events and sometimes initiate actions of their own.

 here's a similar test setup I used:

 Thanks Patrick - it's really nice to have those numbers and test harness 
 basis.

 We're still in architecture mode so some of the details are still in flux, 
 but I think this gives us an idea.

 Thanks very much.

 On Nov 18, 2010, at 11:51 AM, Patrick Hunt wrote:

 Camille, that's a very good question. Largest cluster I've heard about
 is 10k sessions.

 Jeremy - largest I've ever tested was a 3 server cluster with ~500
 sessions. Each session created 10k znodes (100bytes each znode) and
 set 5 watches on each. So 5 million znodes and 25million watches. I
 then had the sessions delete the znodes and looked for the
 notifications. They were processed by the clients quite quickly (order
 of seconds) iirc. Note: this required some GC tuning on the servers to
 operate correctly (in particular cms and incremental gc was turned on
 and sufficient memory was allocated for the heaps).

 here's a similar test setup I used:
 http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
 this is the latency tester tool
 https://github.com/phunt/zk-smoketest

 Patrick

 On Thu, Nov 18, 2010 at 9:44 AM, Fournier, Camille F. [Tech]
 camille.fourn...@gs.com wrote:
 Can you clarify what you mean when you say 10-100K watchers? Do you mean 
 10-100K clients with 1 active watch, or some lesser number of clients with 
 more watches, or a few clients doing a lot of watches and other clients 
 doing other things?

 -----Original Message-----
 From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
 Sent: Thursday, November 18, 2010 12:15 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: number of clients/watchers

 I had a question about number of clients against a zookeeper cluster.  I 
 was looking at having between 10,000 and 100,000 (towards 100,000) watchers 
 within a single datacenter at a given time.  Assuming that some fraction of 
 that number are active clients and the r/w ratio is well within the 
 zookeeper norms, is that number within the realm of possibility for 
 zookeeper?  We're going to do testing and benchmarking and things, but I 
 didn't want to go down a rabbit hole if this is simply too much for a 
 single zookeeper cluster to handle.   The numbers I've seen in blog posts 
 vary and I saw that the observers feature may be useful in this kind of 
 setting.

 Maybe I'm underestimating zookeeper or maybe I don't have enough 
 information to tell.  I'm just trying to see if zookeeper is a good fit for 
 our use case.

 Thanks.





Re: number of clients/watchers

2010-11-18 Thread Patrick Hunt
On Thu, Nov 18, 2010 at 3:46 PM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:
 Unless I misunderstand, active watches aren't open sessions.  If that's the
 case, I don't think we'll hit the 10K-20K number of open sessions at a given
 time.  However, that's a good boundary to keep in mind as we put the system
 together.

Right. A session is represented by a ZooKeeper object: one session
per object. So if you have 10 client hosts each creating its own
ZooKeeper instance, you'll have 10 sessions. This is regardless of the
number of znodes, watches, etc. Watches were designed to be
lightweight and you can maintain a large number of them (25 million
spread across 500 sessions in my example).

Patrick


 On 11/18/10 2:06 PM, Fournier, Camille F. [Tech] camille.fourn...@gs.com
 wrote:

 We tested up to the ulimit (~16K) of connections against a single server and
 performance was ok, but I would definitely try to do some serious load testing
 before I put a system into production that I knew was going to have that load
 from the get-go.
 The system degrades VERY ungracefully when you hit the ulimit for the process,
 so be sure to have enough ensemble nodes to spread those connections across
 that this won't happen. I think maybe there's a JIRA out to deal with this
 issue, not sure what the status is.

 C

 -----Original Message-----
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Thursday, November 18, 2010 2:57 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: number of clients/watchers

 fyi: I haven't heard of anyone running over 10k sessions. I've tried
 20k before and had issues, you may want to look at this sooner rather
 than later.

 * Server gc tuning will be an issue (be sure to use cms/incremental).
 * Be sure to disable clients accessing the leader (server configuration
 param).
 * You may need to use the Observers feature to scale out this large.

 Patrick

 On Thu, Nov 18, 2010 at 10:31 AM, Jeremy Hanna
 jeremy.hanna1...@gmail.com wrote:
 Can you clarify what you mean when you say 10-100K watchers? Do you mean
 10-100K clients with 1 active watch, or some lesser number of clients with
 more watches, or a few clients doing a lot of watches and other clients
 doing other things?

 Probably 10-100K clients each with 1 or 2 active watches.  The clients will
 respond to watch events and sometimes initiate actions of their own.

 here's a similar test setup I used:

 Thanks Patrick - it's really nice to have those numbers and test harness
 basis.

 We're still in architecture mode so some of the details are still in flux,
 but I think this gives us an idea.

 Thanks very much.

 On Nov 18, 2010, at 11:51 AM, Patrick Hunt wrote:

 Camille, that's a very good question. Largest cluster I've heard about
 is 10k sessions.

 Jeremy - largest I've ever tested was a 3 server cluster with ~500
 sessions. Each session created 10k znodes (100bytes each znode) and
 set 5 watches on each. So 5 million znodes and 25million watches. I
 then had the sessions delete the znodes and looked for the
 notifications. They were processed by the clients quite quickly (order
 of seconds) iirc. Note: this required some GC tuning on the servers to
 operate correctly (in particular cms and incremental gc was turned on
 and sufficient memory was allocated for the heaps).

 here's a similar test setup I used:
 http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
 this is the latency tester tool
 https://github.com/phunt/zk-smoketest

 Patrick

 On Thu, Nov 18, 2010 at 9:44 AM, Fournier, Camille F. [Tech]
 camille.fourn...@gs.com wrote:
 Can you clarify what you mean when you say 10-100K watchers? Do you mean
 10-100K clients with 1 active watch, or some lesser number of clients with
 more watches, or a few clients doing a lot of watches and other clients
 doing other things?

 -----Original Message-----
 From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
 Sent: Thursday, November 18, 2010 12:15 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: number of clients/watchers

 I had a question about number of clients against a zookeeper cluster.  I
 was looking at having between 10,000 and 100,000 (towards 100,000) watchers
 within a single datacenter at a given time.  Assuming that some fraction of
 that number are active clients and the r/w ratio is well within the
 zookeeper norms, is that number within the realm of possibility for
 zookeeper?  We're going to do testing and benchmarking and things, but I
 didn't want to go down a rabbit hole if this is simply too much for a
 single zookeeper cluster to handle.   The numbers I've seen in blog posts
 vary and I saw that the observers feature may be useful in this kind of
 setting.

 Maybe I'm underestimating zookeeper or maybe I don't have enough
 information to tell.  I'm just trying to see if zookeeper is a good fit for
 our use case.

 Thanks.








Re: Verifying Changes

2010-11-10 Thread Patrick Hunt
Perhaps something similar to what Ben detailed here? (rendezvous)
http://developer.yahoo.com/blogs/hadoop/posts/2009/05/using_zookeeper_to_tame_system/

Change the key, then add child znode(s) that are deleted by the notified
client(s) once they have read the changed value. Some details need to be
worked out, but it seems reasonable.
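
The idea above can be sketched with an in-memory stand-in for the znode tree. This is only a single-process simulation under stated assumptions: FakeTree, publish, acknowledge and pending are made-up names for illustration, not the ZooKeeper client API, and the paths and client names are hypothetical.

```python
class FakeTree:
    """In-memory stand-in for a znode tree; NOT the ZooKeeper client API."""

    def __init__(self):
        self.data = {}  # path -> value

    def create(self, path, value=b""):
        if path in self.data:
            raise KeyError(f"node exists: {path}")
        self.data[path] = value

    def set(self, path, value):
        self.data[path] = value

    def get(self, path):
        return self.data[path]

    def delete(self, path):
        del self.data[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.data
                      if p.startswith(prefix) and "/" not in p[len(prefix):])


def publish(tree, key, value, clients):
    """Change the key, then add one marker child per client expected to read it."""
    tree.set(key, value)
    for client in clients:
        tree.create(f"{key}/{client}")


def acknowledge(tree, key, client):
    """What a notified client does: read the new value, then delete its marker."""
    value = tree.get(key)
    tree.delete(f"{key}/{client}")
    return value


def pending(tree, key):
    """Clients that have not yet confirmed the change."""
    return tree.children(key)


tree = FakeTree()
tree.create("/config/key", b"v1")
publish(tree, "/config/key", b"v2", ["client-a", "client-b"])
acknowledge(tree, "/config/key", "client-a")
print(pending(tree, "/config/key"))
```

The writer knows the change has reached everyone once pending() is empty. In a real deployment the ordering of the set and the marker creation, and clients that die before acknowledging, are among the details that would still need working out.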

Patrick

On Tue, Nov 9, 2010 at 6:42 PM, Ben Hall b...@zynga.com wrote:
 Hi All...

 Long time reader... First time writer...  Hehe...

 I am curious to know what successes people have had with verifying zookeeper 
 changes across a pool of clients.  I.E. Being able to verify that your 
 changed Key did in fact get pushed out to all of the subscribed clients.

 We are looking at creating a hash of the finished key value and comparing 
 that with what is on the ZK server... But curious if anyone has any smarter 
 ideas.

 Thanks
 Ben



Re: Key factors for production readiness of Hedwig

2010-11-10 Thread Patrick Hunt
On Wed, Nov 10, 2010 at 10:58 AM, Erwin Tam e...@yahoo-inc.com wrote:
 1. Ops tools including monitoring and administration.

Command port (4 letter words) for monitoring has worked extremely well
for zk. Whatever you do, put the command interface on a separate port,
and make it a full-fledged feature rather than a hack (allow clients to
maintain sessions, allow more complex requests than just a 4 letter
word, etc.). Perhaps in today's world you should just go with a REST
interface (easy using jersey) rather than try to implement a
4 letter word. You get json/xml/text for free, and it's easy to
integrate with any monitoring app or adhoc script.
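
For readers who have not used the ZooKeeper four letter words: the client opens a TCP connection, sends the four characters, and reads the reply until the server closes the connection. A minimal sketch of such a probe follows; the function name, host and port are assumptions for illustration.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four letter command (e.g. "ruok", "stat") and return the reply text."""
    if len(command) != 4:
        raise ValueError("expected a four letter command")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example use against a server assumed to be listening locally:
#   four_letter_word("localhost", 2181, "ruok")
```

This simplicity is what makes the approach easy to script, and also why it is limited to one short request per connection, which is the limitation the message above suggests avoiding in a new design.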

Patrick


[Discussion] Some proposed logging (log4j) JIRAs

2010-11-09 Thread Patrick Hunt
I wanted to highlight a couple recent JIRAs that may have impact on
users (api consumers AND admins of the service) in the 3.4 timeframe.
If you want to weigh in please comment on the respective jira:

1) proposal to move to slf4j (remove/replace log4j)
https://issues.apache.org/jira/browse/ZOOKEEPER-850

From a user perspective not much should change, as slf4j has full
support for log4j as an engine. But I'm not fully versed on every
particular.

Note that hbase is in the process of moving
(https://issues.apache.org/jira/browse/HBASE-2608) and Avro has already
moved to slf4j. I'm not sure about some of the other hadoop ex-subs.


2) On a related note: we did a bunch of work in the 3.3 timeframe to
improve logging, since log messages tended to be logged at too high a
severity (many items which should have been debug/trace were info).
Much of this was based on feedback we received from the hbase
community. However there are still some rough edges.

A recent JIRA
https://issues.apache.org/jira/browse/ZOOKEEPER-912
is proposing some additional changes. It would be good for
users/admins (consumers of the client api and those involved with
running the service itself) to weigh in if they have any
insights/preferences. My primary concern is that we are still able to
help users when they run into trouble, i.e. sufficient logging at info
level, not losing critical detail in the weeds of debug/trace level.
It's unfortunate that we only have 3 levels to play with here. Feel
free to weigh in.

Regards,

Patrick


Re: Running cluster behind load balancer

2010-11-04 Thread Patrick Hunt
Hi Chang, thanks for the insights. If you have a few minutes would you
mind updating the FAQ with some of this detail?
http://wiki.apache.org/hadoop/ZooKeeper/FAQ

Thanks!

Patrick

On Thu, Nov 4, 2010 at 6:27 AM, Chang Song tru64...@me.com wrote:

 Sorry. I made a mistake on retry timeout in load balancer section of my 
 answer.
 The same timeout applies to load balancer case as well (depends on the recv
 timeout)

 Thank you

 Chang


 On Nov 4, 2010, at 10:22 PM, Chang Song wrote:


 I would like to add some info on this.

 This may not be very important, but there are subtle differences.

 Two cases:
   1. server hardware failure or kernel panic
   2. zookeeper Java daemon process down

 In the former case, the timeout is based on the timeout argument to
 zookeeper_init() and partially on the ZK heartbeat algorithm. The client
 recognizes that the server is down after 2/3 of the timeout, then retries at
 every timeout. For example, if the timeout is 9000 msec, it first times out
 in 6 seconds, and retries every 9 seconds.
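
The timing described above can be worked through numerically. This sketch encodes only the behaviour stated in this message (detection after 2/3 of the session timeout, then one retry per full timeout); the function name is made up for illustration and this is not the client library's code.

```python
def failure_timing(session_timeout_ms):
    """Return (detect_ms, retry_ms) for the hardware-failure case described above."""
    detect_ms = session_timeout_ms * 2 // 3  # server considered down at 2/3 of the timeout
    retry_ms = session_timeout_ms            # subsequent retries, one per full timeout
    return detect_ms, retry_ms


detect_ms, retry_ms = failure_timing(9000)
print(f"detect after {detect_ms / 1000:.0f} s, then retry every {retry_ms / 1000:.0f} s")
```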

 In the latter case (Java process down), since the socket connect immediately
 returns connection refused, the client can retry immediately.

 On top of that,

 - Hardware load balancer:
 If an ensemble cluster is serviced with a hardware load balancer, the
 zookeeper client will retry every 2 seconds since we only have one IP to try.

 - DNS RR:
 Make sure that nscd on your linux box is off, since it is most likely that
 the DNS cache returns the same IP many times.
 This is actually worse than above since the ZK client will retry the same dead
 server every 2 seconds for some time.


 I think it is best not to use a load balancer for ZK clients, since ZK clients
 will try the next server immediately if the previous one fails for some reason
 (based on the timeout above). This is especially true if your cluster works in
 a pseudo realtime environment where tickTime is set very low.


 Chang


 On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:

 DNS round-robin works as well.

 On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 it would have to be a TCP based load balancer to work with ZooKeeper
 clients, but other than that it should work really well. The clients will 
 be
 doing heart beats so the TCP connections will be long lived. The client
 library does random connection load balancing anyway.

 ben

 On 11/03/2010 12:19 PM, Luka Stojanovic wrote:

 What would be expected behavior if a three node cluster is put behind a
 load
 balancer? It would ease deployment because all clients would be configured
 to target zookeeper.example.com regardless of actual cluster
 configuration,
 but I have impression that client-server connection is stateful and that
 jumping randomly from server to server could bring strange behavior.

 Cheers,

 --
 Luka Stojanovic
 lu...@vast.com
 Platform Engineering








Re: JUnit tests do not produce logs if the JVM crashes

2010-11-04 Thread Patrick Hunt
In addition to what Mahadev suggested, you can also change
log4j.properties to log to a file rather than the CONSOLE. That only
redirects the logs, though; if there is some output to stdout/stderr
then junit buffering is still in play.

Patrick

On Thu, Nov 4, 2010 at 8:15 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
 Hi Andras,
  JUnit will always buffer the logs unless you print them out to the console.

 To do that, try running this

 ant test -Dtest.output=yes

 This will print out the logs to console as they are logged.

 Thanks
 mahadev


 On 11/4/10 3:33 AM, András Kövi allp...@gmail.com wrote:

 Hi all, I'm new to Zookeeper and ran into an issue while trying to run the
 tests with ant.

 It seems like the log output is buffered until the complete test suite
 finishes and it is flushed into its specific file only after then. I had to
 make some changes to the code (no JNI or similar) that resulted in JVM
 crashes. Since the logs are lost in this case, it is a little hard to debug
 the issue.

 Do you have any idea how I could disable the buffering?

 Thanks,
 Andras





Re: Running cluster behind load balancer

2010-11-04 Thread Patrick Hunt
Great, thanks!

On Thu, Nov 4, 2010 at 10:04 PM, Chang Song tru64...@me.com wrote:

 Benjamin.
 It looks like ZK clients can handle a list of IPs from DNS query correctly.
 Yes you are right.

 I am updating wiki per Patrick's request.

 Thanks a lot.

 Chang



 On Nov 5, 2010, at 1:10 AM, Benjamin Reed wrote:

 one thing to note: if you are using a DNS load balancer, some load
 balancers will return the list of resolved addresses in different orders to
 do the balancing. the zookeeper client will shuffle that list before it is
 used, so in reality, using a single DNS hostname resolving to all the server
 addresses will probably work just as well as most DNS-based load balancers.
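
A rough sketch of the behaviour Ben describes: resolve one hostname to all of its addresses, then shuffle the list so that each client starts with a different server. This is not the ZooKeeper client's actual code; resolve_all and connection_order are hypothetical names, and the port is an assumption.

```python
import random
import socket


def resolve_all(hostname, port):
    """Return every distinct (ip, port) pair the name resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    seen = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in seen:
            seen.append(addr)
    return seen


def connection_order(addresses, rng=random):
    """Return a shuffled copy, so the order handed back by DNS does not matter."""
    order = list(addresses)
    rng.shuffle(order)
    return order


# Example use (needs a resolvable name, such as the one from this thread):
#   connection_order(resolve_all("zookeeper.example.com", 2181))
```

Because the shuffle happens on the client, any rotation a DNS-based balancer applies to the answer has little effect on which server a given client tries first.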

 ben

 On 11/04/2010 08:26 AM, Patrick Hunt wrote:
 Hi Chang, thanks for the insights, if you have a few minutes would you
 mind updating the FAQ with some of this detail?
 http://wiki.apache.org/hadoop/ZooKeeper/FAQ

 Thanks!

 Patrick

 On Thu, Nov 4, 2010 at 6:27 AM, Chang Song tru64...@me.com wrote:
 Sorry. I made a mistake on retry timeout in load balancer section of my 
 answer.
 The same timeout applies to load balancer case as well (depends on the recv
 timeout)

 Thank you

 Chang


 On Nov 4, 2010, at 10:22 PM, Chang Song wrote:

 I would like to add some info on this.

 This may not be very important, but there are subtle differences.

 Two cases:  1. server hardware failure or kernel panic
                      2. zookeeper Java daemon process down

 In former one, timeout will be based on the timeout argument in 
 zookeeper_init().
 Partially based on ZK heartbeat algorithm. It recognize server down in 
 2/3 of the timeout.
 then retries at every timeout. For example, if timeout is 9000 msec, it
 first times out in 6 second, and retries every 9 seconds.

 In latter case (Java process down), since socket connect immediately 
 returns
 refused connection, it can retry immediately.

 On top of that,

 - Hardware load balancer:
 If an ensemble cluster is serviced with hardware load balancer,
 zookeeper client will retry every 2 second since we only have one IP to 
 try.

 - DNS RR:
 Make sure that nscd on your linux box is off since it is most likely 
 that DNS cache returns the same IP many times.
 This is actually worse than above since ZK client will retry the same 
 dead server every 2 seconds for some time.


 I think it is best not to use load balancer for ZK clients since ZK 
 clients will try next server immediately
 if previous one fails for some reason (based on timeout above). And this 
 is especially true if your cluster works in
 pseudo realtime environment where tickTime is set to very low.


 Chang


 On Nov 4, 2010, at 9:17 AM, Ted Dunning wrote:

 DNS round-robin works as well.

 On Wed, Nov 3, 2010 at 3:45 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 it would have to be a TCP based load balancer to work with ZooKeeper
 clients, but other than that it should work really well. The clients 
 will be
 doing heart beats so the TCP connections will be long lived. The client
 library does random connection load balancing anyway.

 ben

 On 11/03/2010 12:19 PM, Luka Stojanovic wrote:

 What would be expected behavior if a three node cluster is put behind a
 load
 balancer? It would ease deployment because all clients would be 
 configured
 to target zookeeper.example.com regardless of actual cluster
 configuration,
 but I have impression that client-server connection is stateful and 
 that
 jumping randomly from server to server could bring strange behavior.

 Cheers,

 --
 Luka Stojanovic
 lu...@vast.com
 Platform Engineering








Re: question about watcher

2010-11-03 Thread Patrick Hunt
Definitely check out the 4 letter words then (wch*). Keep in mind
getting this data may be expensive (if there's a lot of it) and that
watches are local, so servers only know about the watches from
sessions established through them (server 1 doesn't know about watches
of sessions connected on server 2, 3, etc.).

Patrick

On Wed, Nov 3, 2010 at 1:13 AM, Qian Ye yeqian@gmail.com wrote:
 thanks Patrick, I want to know all watches set by all clients.
 I will open a jira and write up some design thoughts about it later.

 On Tue, Nov 2, 2010 at 11:53 PM, Patrick Hunt ph...@apache.org wrote:

 Hi Qian Ye, yes you should open a JIRA for this. If you want to work
 on a patch we could advise you. One thing not clear to me, are you
 interested in just the watches set by the particular client, or all
 watches set by all clients? The first should be relatively easy to
 get, the second would be more involved (the difference between getting
 local watches and having to talk to the server to get all watches).
 Does this have to be a client api or more administrative in nature?
 Also see
 http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
 specifically the wchs, wchc, and wchp 4 letter words.

 Regards,

 Patrick

 On Tue, Nov 2, 2010 at 4:11 AM, Qian Ye yeqian@gmail.com wrote:
  Hi all,
 
  Is there any progress about this issue? Should we open a new JIRA for it?
 We
  really need a way to know who set watchers on a specific node.
 
  thanks~
 
  On Thu, Aug 6, 2009 at 11:01 PM, Qian Ye yeqian@gmail.com wrote:
 
  Thanks Mahadev, I think it is a useful feature for many scenarios.
 
 
  On Thu, Aug 6, 2009 at 12:59 PM, Mahadev Konar maha...@yahoo-inc.com
 wrote:
 
  Hi Qian,
   There isn't any such api. We have been thinking about adding an api for
  cancelling a client's watches. We have also been thinking about adding a proc
  filesystem wherein a client will have a list of all the watches. This data
  can be used to know which clients are watching what znode, but this has
  always been in the future discussions for us. We DO NOT have anything
  planned in the near future for this.
 
  Thanks
  mahadev
 
 
  On 8/5/09 6:57 PM, Qian Ye yeqian@gmail.com wrote:
 
   Hi all:
  
   Is there a client API for querying the watchers' owner for a specific
  znode?
   In some situation, we want to find out who set watchers on the znode.
  
   thx
 
 
 
 
  --
  With Regards!
 
  Ye, Qian
  Made in Zhejiang University
 
 
 
 
  --
  With Regards!
 
  Ye, Qian
 




 --
 With Regards!

 Ye, Qian



Re: question about watcher

2010-11-02 Thread Patrick Hunt
Hi Qian Ye, yes you should open a JIRA for this. If you want to work
on a patch we could advise you. One thing not clear to me, are you
interested in just the watches set by the particular client, or all
watches set by all clients? The first should be relatively easy to
get, the second would be more involved (the difference between getting
local watches and having to talk to the server to get all watches).
Does this have to be a client api or more administrative in nature?
Also see
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
specifically the wchs, wchc, and wchp 4 letter words.

Regards,

Patrick

On Tue, Nov 2, 2010 at 4:11 AM, Qian Ye yeqian@gmail.com wrote:
 Hi all,

 Is there any progress about this issue? Should we open a new JIRA for it? We
 really need a way to know who set watchers on a specific node.

 thanks~

 On Thu, Aug 6, 2009 at 11:01 PM, Qian Ye yeqian@gmail.com wrote:

 Thanks Mahadev, I think it is a useful feature for many scenarios.


 On Thu, Aug 6, 2009 at 12:59 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Qian,
  There isn't any such api. We have been thinking about adding an api for
 cancelling a client's watches. We have also been thinking about adding a proc
 filesystem wherein a client will have a list of all the watches. This data
 can be used to know which clients are watching what znode, but this has
 always been in the future discussions for us. We DO NOT have anything
 planned in the near future for this.

 Thanks
 mahadev


 On 8/5/09 6:57 PM, Qian Ye yeqian@gmail.com wrote:

  Hi all:
 
  Is there a client API for querying the watchers' owner for a specific
 znode?
  In some situation, we want to find out who set watchers on the znode.
 
  thx




 --
 With Regards!

 Ye, Qian
 Made in Zhejiang University




 --
 With Regards!

 Ye, Qian



Re: Getting a node exists code on a sequence create

2010-11-01 Thread Patrick Hunt
Hi Jeremy, this sounds like a bug to me. I don't think you should be
getting NodeExists when the sequence flag is set.

Looking at the code briefly, we use the parent's cversion (incremented
each time the child list changes, i.e. a child is added or removed) to
generate the sequence number.
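
To illustrate why the parent's cversion matters here: the sequence suffix is a zero-padded 10 digit counter appended to the requested path. The sketch below is only an illustration with a hypothetical path prefix and a made-up function name, not the server's code, and the thread does not establish that a lagging cversion is the cause of the error Jeremy reported.

```python
def sequential_name(prefix, parent_cversion):
    """Append the parent's cversion as a zero-padded 10 digit suffix."""
    return f"{prefix}{parent_cversion:010d}"


# Hypothetical state: three children already exist under the parent.
existing = {sequential_name("/app/item-", n) for n in range(3)}

# A server with an up-to-date cversion (3) generates a fresh name.
fresh = sequential_name("/app/item-", 3)
print(fresh, fresh in existing)

# A server whose view of cversion lags (2) regenerates an existing name,
# which a create would then reject as "node exists".
stale = sequential_name("/app/item-", 2)
print(stale, stale in existing)
```

This is why comparing the cversion of the parent znode on each server, as asked below, is a useful first diagnostic.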

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short period
of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Stribling st...@nicira.com wrote:
 Hi everyone,

 Is there any situation in which creating a new ZK node with the SEQUENCE
 flag should result in a node exists error?  I'm seeing this happening
 after a failure of a ZK node that appeared to have been the master; when the
 new master takes over, my app is unable to create a new SEQUENCE node under
 an existing parent node.  I'm using Zookeeper 3.2.2.

 Here's a representative log snippet:

 --
 3050756 [ProcessThread:-1] TRACE
 org.apache.zookeeper.server.PrepRequestProcessor  -
 :Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
 zxid:0xfffe txntype:unknown /zkrsm/_record
 3050756 [ProcessThread:-1] WARN
 org.apache.zookeeper.server.PrepRequestProcessor  - Got exception when
 processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
 zxid:0xfffe txntype:unknown n/a
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
 NodeExists
        at
 org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at
 org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
 3050756 [ProcessThread:-1] DEBUG
 org.apache.zookeeper.server.quorum.CommitProcessor  - Processing request::
 sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050756 [ProcessThread:-1] DEBUG org.apache.zookeeper.server.quorum.Leader
  - Proposing:: sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
 zxid:0x5027e txntype:-1 n/a
 3050756 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
 Ack zxid: 0x5027e
 3050757 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
 outstanding proposal: 0x5027e
 3050757 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
 outstanding proposals all
 3050757 [SyncThread:0] DEBUG org.apache.zookeeper.server.quorum.Leader  -
 Count for zxid: 0x5027e is 1
 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
 org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
 org.apache.zookeeper.server.quorum.Leader  - outstanding proposal:
 0x5027e
 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
 org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
 org.apache.zookeeper.server.quorum.Leader  - Count for zxid: 0x5027e is
 2
 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
 org.apache.zookeeper.server.quorum.CommitProcessor  - Committing request::
 sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050757 [CommitProcessor:0] DEBUG
 org.apache.zookeeper.server.FinalRequestProcessor  - Processing request::
 sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050757 [CommitProcessor:0] TRACE
 org.apache.zookeeper.server.FinalRequestProcessor  -
 :Esessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
 org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
 org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
 3050757 [FollowerHandler-/172.16.0.28:41062] DEBUG
 org.apache.zookeeper.server.quorum.Leader  - outstanding is 0
 --

 I'm still a n00b at understanding ZK log messages, so maybe there's
 something obvious going on.  I looked in the JIRA and did my best to search
 the mailing list archives, but couldn't find anything related to this.  Any
 ideas?  Thanks very much,

 Jeremy



Re: Setting the heap size

2010-11-01 Thread Patrick Hunt
Actually if you are going to admin your own ZK it's probably a good
idea to review that Admin doc fully. Some other good detail in there
(backups and cleaning the datadir for example).

Regards,

Patrick

On Fri, Oct 29, 2010 at 7:22 AM, Tim Robertson
timrobertson...@gmail.com wrote:
 Great - thanks Patrick!


 On Thu, Oct 28, 2010 at 6:13 PM, Patrick Hunt ph...@apache.org wrote:
 Tim, one other thing you might want to be aware of:
 http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_supervision

 Patrick

 On Thu, Oct 28, 2010 at 9:11 AM, Patrick Hunt ph...@apache.org wrote:
 On Thu, Oct 28, 2010 at 2:52 AM, Tim Robertson
 timrobertson...@gmail.com wrote:
 We are setting up a small 13 node Hadoop cluster running 1 HDFS
 master, 9 region servers for HBase and 3 map reduce nodes, and are just
 installing zookeeper to perform the HBase coordination and to manage a
 few simple process locks for other tasks we run.

 Could someone please advise what kind of heap we should give to our
 single ZK node and also (ahem) how does one actually set this? It's
 not immediately obvious in the docs or config.

 The amount of heap necessary will be dependent on the application(s)
 using ZK, also configuration of the heap is dependent on what
 packaging you are using to start ZK.

 Are you using zkServer.sh from our distribution? If so then you
 probably want to set JVMFLAGS env variable. We pass this through to
 the jvm, see -Xmx in the man page
 (http://www.manpagez.com/man/1/java/)

 Given this is Hbase (which I'm reasonably familiar with) the default
 heap should be fine. However you might want to check with the Hbase
 team on that.

 I'd also encourage you to enter a JIRA on the (lack of) doc issue you
 highlighted:  https://issues.apache.org/jira/browse/ZOOKEEPER

 Regards,

 Patrick





Re: Setting the heap size

2010-10-28 Thread Patrick Hunt
On Thu, Oct 28, 2010 at 2:52 AM, Tim Robertson
timrobertson...@gmail.com wrote:
 We are setting up a small Hadoop 13 node cluster running 1 HDFS
 master, 9 region servers for HBase and 3 map reduce nodes, and are just
 installing zookeeper to perform the HBase coordination and to manage a
 few simple process locks for other tasks we run.

 Could someone please advise what kind of heap we should give to our
 single ZK node and also (ahem) how does one actually set this? It's
 not immediately obvious in the docs or config.

The amount of heap necessary will be dependent on the application(s)
using ZK, also configuration of the heap is dependent on what
packaging you are using to start ZK.

Are you using zkServer.sh from our distribution? If so then you
probably want to set JVMFLAGS env variable. We pass this through to
the jvm, see -Xmx in the man page
(http://www.manpagez.com/man/1/java/)
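
As a rough illustration of the JVMFLAGS suggestion above, here is a minimal Python sketch that builds the environment a wrapper script could pass to zkServer.sh. The helper name zk_env, the 512 MB figure, and the script path in the comment are assumptions for illustration only; nothing in this thread recommends a particular heap size.

```python
import os


def zk_env(max_heap_mb, base_env=None):
    """Return a copy of the environment with -Xmx appended to JVMFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("JVMFLAGS", "")
    # Keep any flags that are already set and append the heap limit.
    env["JVMFLAGS"] = (existing + " " + f"-Xmx{max_heap_mb}m").strip()
    return env


if __name__ == "__main__":
    env = zk_env(512)  # 512 MB is an arbitrary example value
    print("JVMFLAGS=" + env["JVMFLAGS"])
    # To start the server with this environment (path is hypothetical):
    # subprocess.run(["bin/zkServer.sh", "start"], env=env, check=True)
```

From a shell the same effect comes from exporting JVMFLAGS before calling zkServer.sh; the wrapper only makes the value explicit and repeatable.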

Given this is Hbase (which I'm reasonably familiar with) the default
heap should be fine. However you might want to check with the Hbase
team on that.

I'd also encourage you to enter a JIRA on the (lack of) doc issue you
highlighted:  https://issues.apache.org/jira/browse/ZOOKEEPER

Regards,

Patrick


Re: Setting the heap size

2010-10-28 Thread Patrick Hunt
Tim, one other thing you might want to be aware of:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_supervision

Patrick

On Thu, Oct 28, 2010 at 9:11 AM, Patrick Hunt ph...@apache.org wrote:
 On Thu, Oct 28, 2010 at 2:52 AM, Tim Robertson
 timrobertson...@gmail.com wrote:
 We are setting up a small Hadoop 13 node cluster running 1 HDFS
 master, 9 region servers for HBase and 3 map reduce nodes, and are just
 installing zookeeper to perform the HBase coordination and to manage a
 few simple process locks for other tasks we run.

 Could someone please advise what kind of heap we should give to our
 single ZK node and also (ahem) how does one actually set this? It's
 not immediately obvious in the docs or config.

 The amount of heap necessary will be dependent on the application(s)
 using ZK, also configuration of the heap is dependent on what
 packaging you are using to start ZK.

 Are you using zkServer.sh from our distribution? If so then you
 probably want to set JVMFLAGS env variable. We pass this through to
 the jvm, see -Xmx in the man page
 (http://www.manpagez.com/man/1/java/)

 Given this is Hbase (which I'm reasonably familiar with) the default
 heap should be fine. However you might want to check with the Hbase
 team on that.

 I'd also encourage you to enter a JIRA on the (lack of) doc issue you
 highlighted:  https://issues.apache.org/jira/browse/ZOOKEEPER

 Regards,

 Patrick



Re: Retrying sequential znode creation

2010-10-25 Thread Patrick Hunt
On Wed, Oct 20, 2010 at 3:27 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 These corner cases are relatively rare, I would think (I personally keep
 logs around for days or longer).


A concern I would have is that it does add complexity, would be hard to
debug...


 Would it be possible to get a partial solution in place that invokes the
 current behavior if logs aren't available?


Seems like it's possible. Finding a viable solution (one where, for
example, the memory overhead is limited) is still an open issue though. In
the end it wouldn't really help the end user, given they would still have to
code for this corner case.

Patrick


 On Wed, Oct 20, 2010 at 10:42 AM, Patrick Hunt phu...@gmail.com wrote:

  Hi Ted, Mahadev is in the best position to comment (he looked at it last)
  but iirc when we started looking into implementing this we immediately
 ran
  into some big questions. One was what to do if the logs had been cleaned up
  and the individual transactions no longer available. This could be
 overcome
  by changes wrt cleanup, log rotation, etc... There was another more
  bulletproof option, essentially to keep all the changes in memory that
  might
  be necessary to implement 22, however this might mean a significant
  increase
  in mem requirements and general bookkeeping. It turned out (again correct
  me
  if I'm wrong) that more thought was going to be necessary, esp around
  ensuring correct operation in any/all special cases.
 
  Patrick
 
  On Wed, Oct 13, 2010 at 12:49 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
   Patrick,
  
   What are these hurdles?  The last comment on ZK-22 was last winter.
  Back
   then, it didn't sound like
   it was going to be that hard.
  
   On Wed, Oct 13, 2010 at 12:08 PM, Patrick Hunt ph...@apache.org
 wrote:
  
22 would help with this issue
https://issues.apache.org/jira/browse/ZOOKEEPER-22
however there are some real hurdles to implementing 22 successfully.
   
  
 



Re: Reading znodes directly from snapshot and log files

2010-10-25 Thread Patrick Hunt
Sounds like a useful utility, the closest that I know of is this:
http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/server/LogFormatter.html
but it just dumps the txn log. Seems like it would be cool to be able to
open a shell on the datadir and query it (separate from running a server).

Another option is to just copy the datadir and start a standalone zk
instance on it. You can then use the std zk shell to query it.

Patrick

ps. I had worked on something similar in python a while back:
http://github.com/phunt/zk-txnlog-tools/blob/master/parse_txnlog.py


On Thu, Oct 21, 2010 at 2:31 PM, Vishal K vishalm...@gmail.com wrote:

 Hi,

 Is it possible to read znodes directly from snapshot and log files instead
 of using the ZooKeeper API? In case a ZK ensemble is not available, can I login
 to all available nodes and run a utility that will dump all znodes?

 Thanks.
 -Vishal



Re: Stale value for read request

2010-10-25 Thread Patrick Hunt
On Sat, Oct 23, 2010 at 9:03 PM, jingguo yao yaojing...@gmail.com wrote:

 Read requests are handled locally at each Zookeeper server. So it is
 possible for a read request to return a stale value even though a more
 recent update to the same znode has been committed. Does this statement
 still hold if the Zookeeper follower serving the read request is the one
 which has just served the recent update request?


It's probably good to start with the explicit guarantees:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkGuarantees

Yes (it could still get stale data from quorum perspective). The leader
may have committed a new change that has not yet been seen by the follower
(ie two changes in quick succession)


 For example, client A connects to follower X. And client A issues a request
 to update znode /a from 0 to 1. After receiving this request, follower X
 forwards this request to the leader. Then the leader broadcasts this update
 proposal to all the Zookeeper servers. After a quorum of the followers
 commit the update request, the update succeeds. Then client A issues a read
 request to get the value of znode /a. And follower X receives this read
 request. So if follower X is not among the quorum and follower X has not
 committed the update to catch up with the leader, it is still possible for
 client A to get a stale value of znode /a. In this case, the return value
 is
 0.

 Is my understanding correct?


That's correct. See the NOTE in the section (link) I provided above.
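
To make the stale-read case concrete, here is a toy Python model of a leader and one follower. It is not ZooKeeper code: the class names and the sync_then_read helper are invented for illustration, and the only assumption carried over from the real system is that reads are served from the follower's local state while a sync forces the follower to catch up with the leader first.

```python
class Leader:
    """Holds the committed updates in order."""

    def __init__(self):
        self.log = []  # list of (path, value) commits

    def commit(self, path, value):
        self.log.append((path, value))


class Follower:
    """Serves reads from local state, which may lag behind the leader."""

    def __init__(self, leader, initial):
        self.leader = leader
        self.data = dict(initial)
        self.applied = 0  # number of leader commits applied locally

    def catch_up(self):
        # Apply every commit this follower has not seen yet.
        for path, value in self.leader.log[self.applied:]:
            self.data[path] = value
        self.applied = len(self.leader.log)

    def read(self, path):
        return self.data[path]  # local read, no contact with the leader

    def sync_then_read(self, path):
        self.catch_up()
        return self.read(path)


if __name__ == "__main__":
    leader = Leader()
    follower_x = Follower(leader, {"/a": 0})
    leader.commit("/a", 1)                    # quorum committed without X
    print(follower_x.read("/a"))              # stale value from X
    print(follower_x.sync_then_read("/a"))    # value after catching up
```

A client that cannot tolerate the stale value would issue a sync before its read, at the cost of an extra round trip.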

Patrick


Re: Unusual exception

2010-10-20 Thread Patrick Hunt
EOS means that the client closed the connection (from the point of view of
the server). The server then tries to clean up by closing the socket
explicitly; in some cases that results in the debug messages you see afterwards.

EndOfStreamException: Unable to
read additional data from client sessionid 0x0, likely client has closed
socket

Notice that the session id is 0 - so either this is a zk client that failed
before establishing a session, or more likely it's a monitoring/4letterword
command (which never establishes a session).
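
If you want to separate these session-less disconnects from real client drops when scanning a server log, a small filter along the following lines may help. This is a sketch only: the function name is made up, and it assumes each log record has been joined back onto a single line and uses the message wording shown in this thread.

```python
import re

# Matches the "sessionid 0x..." fragment of the EndOfStreamException message.
SESSION_RE = re.compile(r"sessionid (0x[0-9a-fA-F]+)")


def is_sessionless_eos(line):
    """True for an EndOfStreamException on a connection with session id 0."""
    if "EndOfStreamException" not in line:
        return False
    match = SESSION_RE.search(line)
    return match is not None and int(match.group(1), 16) == 0


if __name__ == "__main__":
    sample = ("EndOfStreamException: Unable to read additional data from "
              "client sessionid 0x0, likely client has closed socket")
    print(is_sessionless_eos(sample))
```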

Patrick

On Wed, Oct 13, 2010 at 2:49 PM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:

 I started seeing a bunch of these exceptions. What do these mean?

 2010-10-13 14:01:33,426 - WARN [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to
 read additional data from client sessionid 0x0, likely client has closed
 socket
 2010-10-13 14:01:33,426 - INFO [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for
 client /10.138.34.195:55738 (no session established for client)
 2010-10-13 14:01:33,426 - DEBUG [CommitProcessor:1:finalrequestproces...@78
 ]
 - Processing request:: sessionid:0x12b9d1f8b907a44 type:closeSession
 cxid:0x0 zxid:0x600193996 txntype:-11 reqpath:n/a
 2010-10-13 14:01:33,427 - WARN [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to
 read additional data from client sessionid 0x12b9d1f8b907a5d, likely client
 has closed socket
 2010-10-13 14:01:33,427 - INFO [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for
 client /10.138.34.195:55979 which had sessionid 0x12b9d1f8b907a5d
 2010-10-13 14:01:33,427 - DEBUG [QuorumPeer:/0.0.0.0:5001
 :commitproces...@159] - Committing request:: sessionid:0x52b90ab45bd51af
 type:createSession cxid:0x0 zxid:0x600193cf9 txntype:-10 reqpath:n/a
 2010-10-13 14:01:33,427 - DEBUG [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1302] - ignoring exception during
 output
 shutdown
 java.net.SocketException: Transport endpoint is not connected
 at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
 at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
 at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
 at

 org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1298)
 at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263)
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609)
 at

 org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
 2010-10-13 14:01:33,428 - DEBUG [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1310] - ignoring exception during input
 shutdown
 java.net.SocketException: Transport endpoint is not connected
 at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
 at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
 at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
 at

 org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306)
 at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263)
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609)
 at

 org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
 2010-10-13 14:01:33,428 - WARN [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to
 read additional data from client sessionid 0x0, likely client has closed
 socket
 2010-10-13 14:01:33,428 - INFO [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for
 client /10.138.34.195:55731 (no session established for client)



Re: zxid integer overflow

2010-10-20 Thread Patrick Hunt
I'm not aware of sustained 1k/sec, Ben might know how long the 20k/sec test
runs for (and for how long that rate is sustained). You'd definitely want to
tune the GC, GC related pauses would be the biggest obstacle for this
(assuming you are using a dedicated log device for the transaction logs).

Patrick

On Tue, Oct 19, 2010 at 3:14 PM, Sandy Pratt prat...@adobe.com wrote:

 Follow up question: does anyone have a production cluster that handles a
 similar sustained rate of changes?

 -Original Message-
 From: Benjamin Reed [mailto:br...@yahoo-inc.com]
 Sent: Tuesday, October 19, 2010 2:53 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: zxid integer overflow

  we should put in a test for that. it is certainly a plausible scenario. in
 theory it will just flow into the next epoch and everything will be fine,
 but we should try it and see.

 ben

 On 10/19/2010 11:33 AM, Sandy Pratt wrote:
  Just as a thought experiment, I was pondering the following:
 
  ZK stamps each change to its managed state with a zxid (
 http://hadoop.apache.org/zookeeper/docs/r3.2.1/zookeeperInternals.html).
  That ID consists of a 64 bit number in which the upper 32 bits are the
 epoch, which changes when the leader does, and the bottom 32 bits are a
 counter, which is incremented by the leader with every change.  If 1000
 changes are made to ZK state each second (which is 1/20th of the peak rate
 advertised), then the counter portion will roll over in 2^32 / (86400 *
 1000) = 49 days.
 
  Now, assuming that my math is correct, is this an actual concern?  For
 example, if I'm using ZK to provide locking for a key value store that
 handles transactions at about that rate, am I setting myself up for failure?
 
  Thanks,
 
  Sandy
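
The rollover estimate above can be checked with a few lines of Python. This sketch assumes only the layout described in the message (upper 32 bits epoch, lower 32 bits counter); the function names are illustrative, and the sample zxid is one that appears in a server log elsewhere in this archive.

```python
SECONDS_PER_DAY = 86400


def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def days_until_rollover(changes_per_second):
    """Days for the 32-bit counter to wrap at a constant change rate."""
    return 2 ** 32 / (SECONDS_PER_DAY * changes_per_second)


if __name__ == "__main__":
    epoch, counter = split_zxid(0x600193996)
    print(epoch, hex(counter))
    print(round(days_until_rollover(1000), 1))
```

At a sustained 1000 changes per second the counter wraps after a little under 50 days, which matches the figure in the message.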




Re: Testing zookeeper outside the source distribution?

2010-10-18 Thread Patrick Hunt
You might check out a tool I built a while back to be used by operations
teams deploying ZooKeeper: http://bit.ly/a6tGVJ

It's really two tools actually, a smoketester and a latency tester, both of
which are important to verify when deploying a new cluster.

Patrick

On Mon, Oct 18, 2010 at 9:50 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Generally, I think a better way to do this is to use a standard mock object
 framework.  Then you don't have to fake up an interface.

 But the original poster probably has a need to do integration tests more
 than unit tests.  In such tests, they need to test against a real ZK to
 make
 sure that their assumptions about the semantics of ZK are valid.

 On Mon, Oct 18, 2010 at 8:53 AM, David Rosenstrauch dar...@darose.net
 wrote:

  Consequently, the way I write my code for ZooKeeper is against a more
  generic interface that provides operations for open, close, getData, and
  setData.  When unit testing, I substitute in a dummy implementation
 that
  just stores data in memory (i.e., a HashMap); when running live code I
 use
  an implementation that talks to ZooKeeper.
 



Re: What does this mean?

2010-10-13 Thread Patrick Hunt
On Mon, Oct 11, 2010 at 4:16 PM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:

 tickTime = 2000, initLimit = 3000 and the data is around 11GB this is log +
 snapshot. So if I need to add a new observer can I transfer state from the
 ensemble manually before starting it? If so which files do I need to
 transfer?


You can't really do it manually. As part of the bring up process for a
server it communicates with the current leader and downloads the appropriate
data (either a diff of the recent changes or a full snapshot if too far
behind). Try increasing your initLimit to 15 or so (btw, that's in ticks,
not milliseconds, so if you have 3000 now that's probably not the issue ;-)
). You might also want to increase the syncLimit at the same time. Here's
from the sample conf that ships with the release:

# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
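
Since both limits are expressed in ticks, the effective timeout is the limit multiplied by tickTime. A small Python sketch of that arithmetic, using the tickTime of 2000 ms mentioned in this thread (the helper name is illustrative):

```python
def limit_ms(tick_time_ms, limit_ticks):
    """Convert a limit given in ticks into milliseconds."""
    return tick_time_ms * limit_ticks


if __name__ == "__main__":
    tick_time = 2000  # ms, as reported in this thread
    for name, ticks in [("initLimit", 10), ("initLimit", 15),
                        ("initLimit", 3000), ("syncLimit", 5)]:
        ms = limit_ms(tick_time, ticks)
        print(f"{name}={ticks} -> {ms} ms ({ms / 1000:.0f} s)")
```

With these numbers initLimit=15 allows 30 seconds for the initial sync, while initLimit=3000 would allow 100 minutes, which is why a value of 3000 is unlikely to be the cause of a timeout.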

Patrick



 Thanks

 On Mon, Oct 11, 2010 at 10:16 AM, Benjamin Reed br...@yahoo-inc.com
 wrote:

   how big is your data? you may be running into the problem where it takes
  too long to do the state transfer and times out. check the initLimit and
 the
  size of your data.
 
  ben
 
 
  On 10/10/2010 08:57 AM, Avinash Lakshman wrote:
 
  Thanks Ben. I am not mixing processes of different clusters. I just
 double
  checked that. I have ZK deployed in a 5 node cluster and I have 20
  observers. I just started the 5 node cluster w/o starting the observers.
 I
  still see the same issue. Now my cluster won't start up. So what is the
  correct
  workaround to get this going? How can I find out who the leader is and
 who
  the follower to get more insight?
 
  Thanks
  A
 
  On Sun, Oct 10, 2010 at 8:33 AM, Benjamin Reedbr...@yahoo-inc.com
   wrote:
 
   this usually happens when a follower closes its connection to the
 leader.
  it is usually caused by the follower shutting down or failing. you may
  get
  further insight by looking at the follower logs. you should really run
  with
  timestamps on so that you can correlate the logs of the leader and
  follower.
 
  one thing that is strange is the wide divergence between zxid of
 follower
  and leader. are you mixing processes of different clusters?
 
  ben
 
  
  From: Avinash Lakshman [avinash.laksh...@gmail.com]
  Sent: Sunday, October 10, 2010 8:18 AM
  To: zookeeper-user
  Subject: What does this mean?
 
  I see this exception and the servers not doing anything.
 
  java.io.IOException: Channel eof
 at
 
 
 
 org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
  ERROR - 124554051584(higestZxid)  21477836646(next log) for type -11
  WARN - Sending snapshot last zxid of peer is 0xe  zxid of
 leader
  is
  0x1e
  WARN - Sending snapshot last zxid of peer is 0x18  zxid of
 leader
  is
  0x1eg
   WARN - Sending snapshot last zxid of peer is 0x5002dc766  zxid of
 leader
  is
  0x1e
  WARN - Sending snapshot last zxid of peer is 0x1c  zxid of
 leader
  is
  0x1e
  ERROR - Unexpected exception causing shutdown while sock still open
  java.net.SocketException: Broken pipe
 at java.net.SocketOutputStream.socketWrite0(Native Method)
 at
  java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
 at
 java.net.SocketOutputStream.write(SocketOutputStream.java:136)
 at
  java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at
  java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
 at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
 at
 
 org.apache.jute.BinaryOutputArchive.writeInt(BinaryOutputArchive.java:55)
 at
 
 org.apache.zookeeper.data.StatPersisted.serialize(StatPersisted.java:116)
 at
  org.apache.zookeeper.server.DataNode.serialize(DataNode.java:167)
 at
 
 
 
 org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
 at
  org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:967)
 at
  org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
 at
  org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
 at
  org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:982)
 at
  org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1031)
 at
 
 
 
 org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:104)
 at
 
 
 
 org.apache.zookeeper.server.ZKDatabase.serializeSnapshot(ZKDatabase.java:426)
 at
 
 
 
 org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:331)
  WARN - *** GOODBYE /10.138.34.212:33272 
 
  Avinash
 
 
 



Re: Retrying sequential znode creation

2010-10-13 Thread Patrick Hunt
On Wed, Oct 13, 2010 at 5:58 AM, Vishal K vishalm...@gmail.com wrote:

 However, gets trickier because there is no explicit way (to my knowledge)
 to
 get CreateMode for a znode. As a result, we cannot tell whether a node is
 sequential or not.


Sequentials are really just regular znodes with fancy naming applied by the
cluster at create time; subsequently it makes no distinction. Using the
format of the name would be the only/best way I know of if you want to
distinguish them yourself (or put some data into the znode itself).
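
As a sketch of the name-format approach: the server appends a zero-padded 10-digit counter to the requested path, so a client can test for that suffix. The function below is a heuristic with an invented name; a regular znode that happens to end in ten digits would match as well, so it only works if your own naming scheme rules that out.

```python
import re

# Lazy prefix followed by exactly ten trailing digits.
SEQ_SUFFIX = re.compile(r"^(?P<prefix>.*?)(?P<seq>\d{10})$")


def parse_sequential(name):
    """Return (prefix, sequence number) if name looks sequential, else None."""
    match = SEQ_SUFFIX.match(name)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("seq"))


if __name__ == "__main__":
    print(parse_sequential("lock-0000000042"))
    print(parse_sequential("config"))
```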

22 would help with this issue
https://issues.apache.org/jira/browse/ZOOKEEPER-22
however there are some real hurdles to implementing 22 successfully.

Patrick



 Thanks.
 -Vishal


 On Tue, Oct 12, 2010 at 5:36 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

  Yes.  This is indeed a problem.  I generally try to avoid sequential
 nodes
  unless they are ephemeral and if I get an error on
  creation, I generally have to either tear down the connection (losing all
  other ephemeral nodes in the process) or scan through
  all live nodes trying to determine if mine got created.  Neither is a
 very
  acceptable answer so I try to avoid the problem.
 
  Your UUID answer is one option.  At least you know what file got created
  (or
  not) and with good naming you can pretty much guarantee no collisions.
  You
  don't have to scan all children since you can simply check for the
  existence
  of the file of interest.
 
  There was a JIRA filed that was supposed to take care of this problem,
 but
  I
  don't know the state of play there.
 
  On Tue, Oct 12, 2010 at 12:11 PM, Vishal K vishalm...@gmail.com wrote:
 
   Hi,
  
   What is the best approach to have an idempotent create() operation for
 a
   sequential node?
  
   Suppose a client is trying to create a sequential node and it gets a
   ConnectionLoss KeeperException, it cannot know for sure whether the
  request
   succeeded or not. If in the meantime, the client's session is
   re-established, the client would like to create a sequential znode
 again.
   However, the client needs to know if its earlier request has succeeded
 or
   not. If it did, then the client does not need to retry. To my
  understanding
   ZooKeeper does not provide this feature. Can someone confirm this?
  
   External to ZooKeeper, the client can either set a unique UUID in the
  path
   to the create call or write the UUID as part of its data. Before
  retrying,
   it can read back all the children of the parent znode and go through
 the
   list to determine if its earlier request had succeeded. This doesn't
  sound
   that appealing to me.
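
The UUID-in-the-path approach described above can be sketched as follows. FakeZk is an in-memory stand-in written for this example, not the ZooKeeper client API; it simulates a create that succeeds on the server while the reply is lost. The names create_once, the job- prefix, and the retry count are all assumptions.

```python
import uuid


class ConnectionLoss(Exception):
    """Stands in for a ConnectionLoss error from a real client."""


class FakeZk:
    """In-memory model of one parent znode (hypothetical, for illustration)."""

    def __init__(self):
        self.children = []
        self.counter = 0
        self.lose_next_reply = False

    def create_sequential(self, prefix):
        name = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.children.append(name)
        if self.lose_next_reply:
            self.lose_next_reply = False
            raise ConnectionLoss("node created, reply lost")
        return name

    def get_children(self):
        return list(self.children)


def create_once(zk, token, retries=3):
    """Create a sequential node at most once, identified by a unique token."""
    prefix = f"job-{token}-"
    for _ in range(retries):
        # Before (re)trying, look for a node left by an earlier attempt.
        for child in zk.get_children():
            if child.startswith(prefix):
                return child
        try:
            return zk.create_sequential(prefix)
        except ConnectionLoss:
            continue
    raise ConnectionLoss("gave up after retries")


if __name__ == "__main__":
    zk = FakeZk()
    zk.lose_next_reply = True
    print(create_once(zk, uuid.uuid4().hex))
    print(len(zk.get_children()))
```

The scan over the children is the unappealing part noted above; its cost grows with the number of siblings under the parent.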
  
   I am guessing this is a common problem that many would have faced. Can
   folks
   give a feedback on what their approach was?
  
   Thanks.
   -Vishal
  
 



Re: Changing configuration

2010-10-07 Thread Patrick Hunt
You probably want to do a rolling restart; this is preferable to
restarting the whole cluster, as the service will not go down.
http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6

Patrick

On Wed, Oct 6, 2010 at 9:49 PM, Avinash Lakshman avinash.laksh...@gmail.com
 wrote:

 Suppose I have a 3 node ZK cluster composed of machines A, B and C. Now for
 whatever reason I lose C forever and the machine needs to be replaced. How
 do I handle this situation? Update the config with D in place of C and
 restart the cluster? Also if I am interested in reading just the ZAB portions
 which packages should I be looking at?

 Cheers
 A



Re: snapshots

2010-10-07 Thread Patrick Hunt
Simplified: when a server comes back up it checks its local snaps/logs to
reconstruct as much of the current state as possible. It then checks with
the leader to see how far behind it is, at which point it either gets a diff
or gets a full snapshot (from the leader) depending on how far behind it is.

Patrick

On Wed, Oct 6, 2010 at 8:11 PM, Avinash Lakshman avinash.laksh...@gmail.com
 wrote:

 Hi All

 Are snapshots serialized dumps of the DataTree taken whenever a log rolls
 over? So when a server goes down and comes back up does it construct the
 data tree from the snapshots? What if I am running this on a machine with
 SSD as extended RAM how does it affect anything?

 Cheers
 A



Re: znode inconsistencies across ZooKeeper servers

2010-10-07 Thread Patrick Hunt
Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this
description, your configuration files from all servers, and the log files
from all servers during the time of the incident? If you could run the
servers in DEBUG level logging during the time you reproduce the issue that
would probably help:
https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks!

Patrick

On Wed, Oct 6, 2010 at 2:57 PM, Vishal K vishalm...@gmail.com wrote:

 Hi Patrick,

 You are correct, the test restarts both ZooKeeper server and the client.
 The
 client opens a new connection after restarting. So we would expect that the
 ephemeral znode (/foo) to expire after the session timeout. However, the
 client with the new session creates the ephemeral znode (/foo) again after
 it reboots (it sets a watch for /foo and recreates /foo if it is deleted or
 doesn't exist). The client is not reusing the session ID. What I expect to
 see is that the older /foo should expire after which a new /foo should get
 created. Is my expectation correct?

 What confuses me is the following output of 3 successive getstat /foo
 requests on A (the zxid, time and owner fields).  Notice that the older
 znode reappeared.
 At the same time when I do getstat at B and C, I see the newer /foo.

 log4j:WARN No appenders could be found for logger
 (org.apache.zookeeper.ZooKeeper).
 log4j:WARN Please initialize the log4j system properly.
 cZxid = 0x105ef
 ctime = Tue Oct 05 15:00:50 UTC 2010
 mZxid = 0x105ef
 mtime = Tue Oct 05 15:00:50 UTC 2010
 pZxid = 0x105ef
 cversion = 0
 dataVersion = 0
 aclVersion = 0
 ephemeralOwner = 0x2b7ce57ce4
 dataLength = 54
 numChildren = 0

 log4j:WARN No appenders could be found for logger
 (org.apache.zookeeper.ZooKeeper).
 log4j:WARN Please initialize the log4j system properly.
 cZxid = 0x10607
 ctime = Tue Oct 05 15:01:07 UTC 2010
 mZxid = 0x10607
 mtime = Tue Oct 05 15:01:07 UTC 2010
 pZxid = 0x10607
 cversion = 0
 dataVersion = 0
 aclVersion = 0
 ephemeralOwner = 0x2b7ce5bda4
 dataLength = 54
 numChildren = 0

 log4j:WARN No appenders could be found for logger
 (org.apache.zookeeper.ZooKeeper).
 log4j:WARN Please initialize the log4j system properly.
 cZxid = 0x105ef
 ctime = Tue Oct 05 15:00:50 UTC 2010
 mZxid = 0x105ef
 mtime = Tue Oct 05 15:00:50 UTC 2010
 pZxid = 0x105ef
 cversion = 0
 dataVersion = 0
 aclVersion = 0
 ephemeralOwner = 0x2b7ce57ce4
 dataLength = 54
 numChildren = 0

 Thanks for your help.

 -Vishal

 On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt ph...@apache.org wrote:

  Vishal the attachment seems to be getting removed by the list daemon (I
  don't have it), can you create a JIRA and attach? Also this is a good
  question for the ppl on zookeeper-user. (ccing)
 
  You are aware that ephemeral znodes are tied to the session? And that
  sessions only expire after the session timeout period? At which time any
  znodes created during that session are then deleted. The fact that you
 are
  killing your client process leads me to believe that you are not
 closing
  the session cleanly (meaning that it will eventually expire after the
  session timeout period), in which case the ephemeral znodes _should_
  reappear when A is restarted and successfully rejoins the cluster. (at
  least
  until the session timeout is exceeded)
 
  Patrick
 
  On Tue, Oct 5, 2010 at 11:04 AM, Vishal K vishalm...@gmail.com wrote:
 
   Hi,
  
   I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A),
 I
   have a ZK client running that connects to the local server and creates
 an
   ephemeral znode to indicate clients on other nodes that it is online.
  
   I have test script that reboots the zookeeper server as well as client
 on
   A. The test does a getstat on the ephemeral znode created by the client
  on
   A. I am seeing that the view of znodes on A is different from the other
 2
   nodes. I can tell this from the session ID that the client gets after
   reconnecting to the local ZK server.
  
   So the test is simple:
   - kill zookeeper server and client process
   - wait for a few seconds
   - do zkCli.sh stat ...  test.out
  
   What I am seeing is that the ephemeral znode with old zxid, time, and
   session ID is reappearing on node A. I have attached the output of 3
   consecutive getstat requests of the test (see client_getstat.out).
 Notice
   that the third output is the same as the first one. That is, the old
   ephemeral znode reappeared at A. However, both B and C are showing the
   latest znode with correct time, zxid and session ID (output not
  attached).
  
   After this point, all following getstat requests on A are showing the
 old
   znode. Whereas, B and C show the correct znode every time the client on
 A
   comes online. This is something very perplexing. Earlier I thought this
  was
   a bug in my client implementation. But the test shows that the ZK
 server
  on
   A after reboot is out of sync with rest of the servers

Re: Too many connections

2010-10-06 Thread Patrick Hunt
On Tue, Oct 5, 2010 at 10:23 AM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:

 So shouldn't all servers in another DC just have one session? So even if I
 have 50 observers in another DC that should be 50 sessions established
 since
 the IP doesn't change correct? Am I missing something? In some ZK clients I
 see the following exception even though they are in the same DC.


This really depends on how you implemented your client. Each time you create
a ZooKeeper object a new session is established. If you have 50 clients each
creating a ZooKeeper object then you have 50 sessions.

Patrick


Re: znode inconsistencies across ZooKeeper servers

2010-10-06 Thread Patrick Hunt
Vishal the attachment seems to be getting removed by the list daemon (I
don't have it), can you create a JIRA and attach? Also this is a good
question for the ppl on zookeeper-user. (ccing)

You are aware that ephemeral znodes are tied to the session? And that
sessions only expire after the session timeout period? At which time any
znodes created during that session are then deleted. The fact that you are
killing your client process leads me to believe that you are not closing
the session cleanly (meaning that it will eventually expire after the
session timeout period), in which case the ephemeral znodes _should_
reappear when A is restarted and successfully rejoins the cluster. (at least
until the session timeout is exceeded)
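
To make the session/ephemeral relationship concrete, here is a minimal toy model in plain Python. It is not the ZooKeeper API: the class and method names (ToyServer, expire_sessions, and so on) are made up for illustration, and time is just a number passed in by the caller. It only shows why a killed client's ephemeral znode lingers until the session timeout, while a clean close removes it right away.

```python
class ToyServer:
    """Toy model of session-bound ephemeral znodes (not the ZooKeeper API)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heard = {}   # session id -> time of last heartbeat
        self.ephemerals = {}   # znode path -> owning session id

    def heartbeat(self, session_id, now):
        self.last_heard[session_id] = now

    def create_ephemeral(self, path, session_id, now):
        self.heartbeat(session_id, now)
        self.ephemerals[path] = session_id

    def close_session(self, session_id):
        # Clean close: the session's ephemeral znodes go away immediately.
        self.last_heard.pop(session_id, None)
        self._drop_ephemerals(session_id)

    def expire_sessions(self, now):
        # A killed client sends no heartbeats; its session lingers until
        # the timeout elapses, and so do its ephemeral znodes.
        expired = [s for s, t in self.last_heard.items()
                   if now - t > self.session_timeout]
        for s in expired:
            del self.last_heard[s]
            self._drop_ephemerals(s)
        return expired

    def _drop_ephemerals(self, session_id):
        self.ephemerals = {p: s for p, s in self.ephemerals.items()
                           if s != session_id}

    def exists(self, path):
        return path in self.ephemerals


server = ToyServer(session_timeout=30)
server.create_ephemeral("/online/A", session_id=1, now=0)
server.expire_sessions(now=5)        # client was killed at t=0, checked at t=5
print(server.exists("/online/A"))    # True: session has not timed out yet
server.expire_sessions(now=31)
print(server.exists("/online/A"))    # False: session expired, znode deleted
```

In this model the old znode is still visible a few seconds after the kill, which matches the "wait for a few seconds" step in the test only as long as the wait is shorter than the session timeout.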

Patrick

On Tue, Oct 5, 2010 at 11:04 AM, Vishal K vishalm...@gmail.com wrote:

 Hi,

 I have a 3 node ZK cluster (A, B, C). On one of the nodes (node A), I
 have a ZK client running that connects to the local server and creates an
 ephemeral znode to indicate to clients on other nodes that it is online.

 I have a test script that reboots the zookeeper server as well as client on
 A. The test does a getstat on the ephemeral znode created by the client on
 A. I am seeing that the view of znodes on A is different from the other 2
 nodes. I can tell this from the session ID that the client gets after
 reconnecting to the local ZK server.

 So the test is simple:
 - kill zookeeper server and client process
 - wait for a few seconds
 - do zkCli.sh stat ... > test.out

 What I am seeing is that the ephemeral znode with old zxid, time, and
 session ID is reappearing on node A. I have attached the output of 3
 consecutive getstat requests of the test (see client_getstat.out). Notice
 that the third output is the same as the first one. That is, the old
 ephemeral znode reappeared at A. However, both B and C are showing the
 latest znode with correct time, zxid and session ID (output not attached).

 After this point, all following getstat requests on A are showing the old
 znode. Whereas, B and C show the correct znode every time the client on A
 comes online. This is something very perplexing. Earlier I thought this was
 a bug in my client implementation. But the test shows that the ZK server on
 A after reboot is out of sync with rest of the servers.

 The stat command to each server shows that the servers are in sync as far
 as zxid's are concerned (see stat.out). So there is something wrong with A's
 local database that is causing this problem.

 Has anyone seen this before? I will be doing more debugging in the next few
 days. Comments/suggestions for further debugging are welcomed.

 -Vishal





Re: Zookeeper on 60+Gb mem

2010-10-05 Thread Patrick Hunt
Tuning GC is going to be critical, otw all the sessions will timeout (and
potentially expire) during GC pauses.

Patrick

On Tue, Oct 5, 2010 at 1:18 PM, Maarten Koopmans maar...@vrijheid.net wrote:

 Yes, and syncing after a crash will be interesting as well. Of note: I am
 running it with a 6GB heap now, but it's not filled yet. I do have smoke
 tests though, so maybe I'll give it a try.



 On 5 Oct 2010, at 21:13, Benjamin Reed br...@yahoo-inc.com wrote:

 
  you will need to time how long it takes to read all that state back in
 and adjust the initTime accordingly. it will probably take a while to pull
 all that data into memory.
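
 As a rough worked example of that adjustment: assuming "initTime" here refers to the initLimit setting (which is expressed in ticks of tickTime milliseconds), the arithmetic could look like the sketch below. The function name and the safety_factor parameter are hypothetical, and the 90-second load time is a made-up measurement, not a figure from this thread.

```python
import math


def min_init_limit(load_seconds, tick_time_ms=2000, safety_factor=2.0):
    """Smallest initLimit (in ticks) that covers load_seconds with headroom.

    load_seconds  -- measured time to read the snapshot back into memory
    tick_time_ms  -- the tickTime setting, in milliseconds
    safety_factor -- hypothetical headroom multiplier (an assumption)
    """
    budget_ms = load_seconds * 1000 * safety_factor
    return math.ceil(budget_ms / tick_time_ms)


# A 90 second load, tickTime=2000 and 2x headroom need 180 s, i.e. 90 ticks.
print(min_init_limit(90))   # 90
```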
 
  ben
 
  On 10/05/2010 11:36 AM, Avinash Lakshman wrote:
  I have run it over 5 GB of heap with over 10M znodes. We will definitely run
  it with over 64 GB of heap. Technically I do not see any limitation.
  However I will let the experts chime in.
 
  Avinash
 
  On Tue, Oct 5, 2010 at 11:14 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
 
  Hi Maarten,
   I definitely know of a group which uses around 3GB of memory heap for
  zookeeper but never heard of someone with such huge requirements. I would
  say it definitely would be a learning experience with such high memory,
  which I definitely think would be very very useful for others in the
  community as well.
 
  Thanks
  mahadev
 
 
  On 10/5/10 11:03 AM, Maarten Koopmans maar...@vrijheid.net wrote:
 
  Hi,
 
  I just wondered: has anybody ever run zookeeper to the max on a 68GB
  quadruple extra large high memory EC2 instance? With, say, 60GB allocated
  or so?

  Because EC2 with EBS is a nice way to grow your zookeeper cluster (data
  on the EBS volumes, upgrade as your memory utilization grows) - I just
  wonder what the limits are there, or if I am going where angels fear to
  tread...
 
  --Maarten
 
 
 
 



Re: ZK compatability

2010-09-30 Thread Patrick Hunt
Historically major releases can have non-bw compatible changes.  However if
you look back through the release history you'll see that the last time that
happened was oct 2008, when we moved the project from sourceforge to apache.

Patrick

On Tue, Sep 28, 2010 at 11:37 AM, Jun Rao jun...@gmail.com wrote:

 What about major releases going forward? Thanks,

 Jun

 On Mon, Sep 27, 2010 at 10:32 PM, Patrick Hunt ph...@apache.org wrote:

  In general yes, minor and bug fix releases are fully backward compatible.
 
  Patrick
 
 
  On Sun, Sep 26, 2010 at 9:11 PM, Jun Rao jun...@gmail.com wrote:
 
  Hi,
 
  Does ZK support (and plan to support in the future) backward compatibility
  (so that a new client can talk to an old server and vice versa)?
 
  Thanks
 
  Jun
 
 
 



Re: c client 0 state?

2010-09-28 Thread Patrick Hunt
Seems like a bug to me. Please enter a JIRA (if you haven't already).

Thanks,

Patrick

On Fri, Sep 17, 2010 at 9:10 AM, Michael Xu mx2...@gmail.com wrote:

 Hi everyone

 in the c client api:

 Is it normal for zoo_state() to return zero (not one of the valid state
 consts) when it is handling socket errors?


 In the C code, handle_error(), which handles socket errors, sets
 zh->state to zero:
 ==
    if (!is_unrecoverable(zh))
        zh->state = 0;
 ==

 If the handle is recoverable, why is the state set to zero, which is not
 even a valid state const?

 Here's a use case where the state should be connecting, but instead is
 zero:

 1) c client connects to a zkserver
 2) shutdown zkserver
 3) zoo_state() returns zero on a valid zookeeper handle.


 We are using zoo_state() to get the state of the connection, and this is
 a surprising returned value from this function.


 Thanks,

 michael




Re: zkfuse

2010-09-27 Thread Patrick Hunt
Sounds like you have an old version of autoconf, try upgrading, see similar
issue here:
http://www.mail-archive.com/thrift-u...@incubator.apache.org/msg00673.html

Patrick

2010/9/24 俊贤 junx...@taobao.com

 Hi mahadev,

 My os is Linux localhost.localdomain 2.6.18-164.el5 #1 SMP Thu Sep 3
 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux


  The error occurred when I ran the autoreconf -if command mentioned in the
 README file.

 The error output follows:

 configure.ac:51: error: possibly undefined macro: AC_TYPE_INT64_T
 configure.ac:58: error: possibly undefined macro: AC_TYPE_UINT32_T
 configure.ac:59: error: possibly undefined macro: AC_TYPE_UINT64_T
 configure.ac:60: error: possibly undefined macro: AC_TYPE_UINT8_T


 Thank you!
 JunX



 junxian
 
 From: Mahadev Konar [maha...@yahoo-inc.com]
 Sent: 25 September 2010 06:19
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: zkfuse

 Hi Jun,
  I haven't seen people using zkfuse recently. What kind of issues are you
 facing?

 Thanks
 mahadev




Re: processResults

2010-09-27 Thread Patrick Hunt
I believe what the author is trying to say is that if the getdata were to
fail (such as the example you give) the watch set as part of the original
call will fire, and this will notify the client that the node was deleted.
(call to process(event))
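
A toy illustration of that race, in plain Python rather than the Java client: the names below (ToyStore, NoNodeError, read_if_present and its "between" hook) are invented for this sketch. The point is only that the read between exists() and getData() can still fail, so the caller has to tolerate the error even though the watch will report the deletion separately.

```python
class NoNodeError(Exception):
    """Stand-in for KeeperException.NoNodeException in this toy model."""


class ToyStore:
    def __init__(self, data):
        self.data = dict(data)

    def exists(self, path):
        return path in self.data

    def get_data(self, path):
        if path not in self.data:
            raise NoNodeError(path)
        return self.data[path]


def read_if_present(store, path, between=lambda: None):
    if not store.exists(path):
        return None
    between()  # hypothetical hook: lets a delete race with the read
    try:
        return store.get_data(path)
    except NoNodeError:
        # The read failed; in the real client the watch left by exists()
        # is what tells the application that the node was deleted.
        return None


store = ToyStore({"/zk_test": b"hello"})
print(read_if_present(store, "/zk_test"))                   # b'hello'
print(read_if_present(store, "/zk_test",
                      between=lambda: store.data.pop("/zk_test")))  # None
```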

Patrick

On Mon, Sep 27, 2010 at 6:56 PM, Milind Parikh milindpar...@gmail.com wrote:

 In the explanation of the Java binding, it is mentioned: "If the file (or
 znode) exists, it gets the data from the znode, and then invoke the exists()
 callback of Executor if the state has changed. Note, it doesn't have to do
 any Exception processing for the getData call because it has watches pending
 for anything that could cause an error: if the node is deleted before it
 calls ZooKeeper.getData(), the watch event set by the ZooKeeper.exists()
 triggers a callback"

 I read this to mean that if I insert a Thread.sleep() before the getData
 call and remove the node from the CLI, somehow (magically) there would be no
 error. But of course, that is not what happens:

 Sleeps for 10 seconds
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
 NoNode for /zk_test
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
at DataMonitor.processResult(DataMonitor.java:114)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:512)

 Am I doing something wrong (or reading something wrong)?

 -- Milind



Re: possible bug in zookeeper ?

2010-09-14 Thread Patrick Hunt
That is unusual. I don't recall anyone reporting a similar issue, and
looking at the code I don't see any issues off hand. Can you try the
following?

1) on that particular zk client machine resolve the hosts zook1/zook2/zook3,
what ip addresses does this resolve to? (try dig)
2) try running the client using the 3.3.1 jar file (just replace the jar on
the client), it includes more log4j information, turn on DEBUG or TRACE
logging

Patrick

On Tue, Sep 14, 2010 at 8:44 AM, Yatir Ben Shlomo yat...@outbrain.com wrote:

 zook1:2181,zook2:2181,zook3:2181


 -Original Message-
 From: Ted Dunning [mailto:ted.dunn...@gmail.com]
 Sent: Tuesday, September 14, 2010 4:11 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: possible bug in zookeeper ?

 What was the list of servers that was given originally to open the
 connection to ZK?

 On Tue, Sep 14, 2010 at 6:15 AM, Yatir Ben Shlomo yat...@outbrain.com
 wrote:

  Hi, I am using solrCloud, which uses an ensemble of 3 zookeeper instances.

  I am performing survivability tests:
  Taking one of the zookeeper instances down, I would expect the client to use
  a different zookeeper server instance.

  But as you can see in the logs attached below,
  depending on which instance I choose to take down (in my case, the last
  one in the list of zookeeper servers),
  the client constantly insists on the same zookeeper server (Attempting
  connection to server zook3/192.168.252.78:2181)
  and does not switch to a different one.
  The problem seems to arise from ClientCnxn.java.
  Does anyone have an idea about this?

  Solr cloud is currently using zookeeper-3.2.2.jar.
  Is this a known bug that was fixed in later versions (3.3.1)?

  Thanks in advance,
  Yatir
 
 
  Logs:
 
  Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
  WARNING: Ignoring exception during shutdown input
  java.nio.channels.ClosedChannelException
 at
  sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
 at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
  Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
  WARNING: Ignoring exception during shutdown output
  java.nio.channels.ClosedChannelException
 at
  sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
 at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
  Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
  INFO: Attempting connection to server zook3/192.168.252.78:2181
  Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
  WARNING: Exception closing session 0x32b105244a20001 to
  sun.nio.ch.selectionkeyi...@3ca58cbf
  java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
 at
  sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933)
  Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
  WARNING: Ignoring exception during shutdown input
  java.nio.channels.ClosedChannelException
 at
  sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
 at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
  Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
  WARNING: Ignoring exception during shutdown output
  java.nio.channels.ClosedChannelException
 at
  sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
 at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
 at
 
 org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
  Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
  INFO: Attempting connection to server zook3/192.168.252.78:2181
  Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
  WARNING: Exception closing session 0x32b105244a2 to
  sun.nio.ch.selectionkeyi...@3960f81b
  java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
 at
  sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   

Re: Spew after call to close

2010-09-08 Thread Patrick Hunt
No worries, let us know if something else pops up.

Patrick

On Tue, Sep 7, 2010 at 3:10 PM, Stack st...@duboce.net wrote:

 Nevermind.  I figured it.  It was an hbase issue.  We were leaking a
 client reference.
 Sorry for the noise,
 St.Ack


 On Sat, Sep 4, 2010 at 10:58 AM, Stack st...@duboce.net wrote:
  Thats right -- client is shutdown first, then server... How do I stop
  the client trying to come back from the dead?
  Good on you Mahadev?
  St.Ack
 
  On Fri, Sep 3, 2010 at 8:36 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
 
  Hi Stack,
   Looks like you are shutting down the server and shutting down the client at
  the same time? Is that the issue?
 
  Thanks
  mahadev
 
  On 9/3/10 4:47 PM, Stack st...@duboce.net wrote:
 
  Have you fellas seen this before? I call close on zookeeper but it insists
  on doing the below exceptions.  Why is it doing this 'Session
  0x12ad9dccda30002 for server null, unexpected error, closing socket
  connection and attempting reconnect'?   This would seem to come after the
  close has been noticed and looking in code, i'd think we'd not do this
  since the close flag should be set to true post call to close?
 
  Thanks lads  (The below looks ugly in our logs... this is zk 3.3.1),
  St.Ack
 
  2010-09-03 16:09:52,369 INFO
  org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
  for client /fe80:0:0:0:0:0:0:1%1:56941 which had sessionid
  0x12ad9dccda30001
  2010-09-03 16:09:52,369 INFO
  org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
  for client /127.0.0.1:56942 which had sessionid 0x12ad9dccda30002
  2010-09-03 16:09:52,370 INFO org.apache.zookeeper.ClientCnxn: Unable
  to read additional data from server sessionid 0x12ad9dccda30001,
  likely server has closed socket, closing socket connection and
  attempting reconnect
  2010-09-03 16:09:52,370 INFO org.apache.zookeeper.ClientCnxn: Unable
  to read additional data from server sessionid 0x12ad9dccda30002,
  likely server has closed socket, closing socket connection and
  attempting reconnect
  2010-09-03 16:09:52,370 INFO
  org.apache.zookeeper.server.NIOServerCnxn: NIOServerCnxn factory
  exited run method
  2010-09-03 16:09:52,370 INFO
  org.apache.zookeeper.server.PrepRequestProcessor: PrepRequestProcessor
  exited loop!
  2010-09-03 16:09:52,370 INFO
  org.apache.zookeeper.server.SyncRequestProcessor: SyncRequestProcessor
  exited!
  2010-09-03 16:09:52,370 INFO
  org.apache.zookeeper.server.FinalRequestProcessor: shutdown of request
  processor complete
  2010-09-03 16:09:52,470 DEBUG
  org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
  Received ZooKeeper Event, type=None, state=Disconnected, path=null
  2010-09-03 16:09:52,470 INFO
  org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
  Received Disconnected from ZooKeeper, ignoring
  2010-09-03 16:09:52,471 DEBUG
  org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
  Received ZooKeeper Event, type=None, state=Disconnected, path=null
  2010-09-03 16:09:52,471 INFO
  org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
  Received Disconnected from ZooKeeper, ignoring
  2010-09-03 16:09:52,857 INFO org.apache.zookeeper.ClientCnxn: Opening
  socket connection to server localhost/0:0:0:0:0:0:0:1:2181
  2010-09-03 16:09:52,858 WARN org.apache.zookeeper.ClientCnxn: Session
  0x12ad9dccda30001 for server null, unexpected error, closing socket
  connection and attempting reconnect
  java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
 at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
  2010-09-03 16:09:53,149 INFO org.apache.zookeeper.ClientCnxn: Opening
  socket connection to server localhost/fe80:0:0:0:0:0:0:1%1:2181
  2010-09-03 16:09:53,150 WARN org.apache.zookeeper.ClientCnxn: Session
  0x12ad9dccda30002 for server null, unexpected error, closing socket
  connection and attempting reconnect
  java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
 at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
  2010-09-03 16:09:53,576 INFO org.apache.zookeeper.ClientCnxn: Opening
  socket connection to server localhost/127.0.0.1:2181
  2010-09-03 16:09:53,576 WARN org.apache.zookeeper.ClientCnxn: Session
  0x12ad9dccda30001 for server null, unexpected error, closing socket
  connection and attempting reconnect
  java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
 at
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
  2010-09-03 16:09:54,000 INFO
  

Re: getting created child on NodeChildrenChanged event

2010-09-07 Thread Patrick Hunt
It is good to keep things simple, but we have seen some requests related to
the client api  for children use cases that seem reasonable. In particular
the issue of handling large numbers of children efficiently is currently a
problem (queue say). We've seen proposals on this before, just no one's
followed through with them. I personally think there's room for improvement,
perhaps the current client api is too simple:

https://issues.apache.org/jira/browse/ZOOKEEPER-423

Patrick

On Fri, Sep 3, 2010 at 11:18 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Todd,
  We have always tried to lean on the side of keeping things lightweight and
 the api simple. The only way you would be able to do this is with sequential
 creates.

 1. create nodes like /queueelement-$i where i is a monotonically increasing
 number. You could use the sequential flag of zookeeper to do this.

 2. when deleting a node, you would remove the node and create a deleted node
 on

 /deletedqueueelements/queueelement-$i

 2.1 on notification you would go to /deletedqueueelements/ and find out which
 ones were deleted.

 The above only works if you are ok with monotonically unique queue elements.

 3. the above method allows the folks to see the deltas using
 /deletedqueueelements, which can be garbage collected by some clean up process
 (you can be smarter about this as well)

 Would something like this work?
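
 The steps above can be sketched as a small in-memory model. This is plain Python standing in for the znode tree, not ZooKeeper client code; the class and method names are hypothetical, and the ten-digit zero-padded suffix mimics how sequential znode names are formatted.

```python
class ToyQueueTree:
    """In-memory stand-in for the znode layout in the recipe above."""

    def __init__(self):
        self.counter = 0
        self.live = set()      # names under the queue parent
        self.deleted = set()   # names under the deleted-markers parent

    def enqueue(self):
        # Step 1: a sequential create yields a monotonically increasing name.
        name = "queueelement-%010d" % self.counter
        self.counter += 1
        self.live.add(name)
        return name

    def dequeue(self, name):
        # Step 2: remove the node and leave a marker recording the deletion.
        self.live.remove(name)
        self.deleted.add(name)

    def deletions_since(self, already_seen):
        # Step 2.1: on notification, list the markers not yet processed.
        return sorted(self.deleted - already_seen)

    def garbage_collect(self, processed):
        # Step 3: a clean-up process drops markers that have been consumed.
        self.deleted -= processed


q = ToyQueueTree()
a, b, c = q.enqueue(), q.enqueue(), q.enqueue()
q.dequeue(b)
print(q.deletions_since(set()))   # ['queueelement-0000000001']
```

 When to garbage collect is the hard part in practice: a marker can only be dropped once every interested watcher has seen it, which this sketch leaves to the caller.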


 Thanks
 mahadev


 On 8/31/10 3:55 PM, Todd Nine t...@spidertracks.co.nz wrote:

  Hi Dave,
Thanks for the response.  I understand your point about missed events
  during a watch reset period.  I may be off, here is the functionality I
  was thinking.  I'm not sure if the ZK internal versioning process could
  possibly support something like this.
 
  1. A watch is placed on children
  2. The event is fired to the client.  The client receives the Stat
  object as part of the event for the current state of the node when the
  event was created.  We'll call this Stat A with version 1
  3. The client performs processing.  Meanwhile the node has several
  children changed. Versions are incremented to version 2 and version 3
  4. Client resets the watch
  5. A node is added
  6. The event is fired to the client.  Client receives Stat B with
  version 4
  7. Client calls deltaChildren(Stat A, Stat B)
  8. zookeeper returns added nodes between stats, also returns deleted
  nodes between stats.
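
  For comparison, the client-side version of this delta (keeping the previous child list and diffing it against the new one) is short to write. The sketch below is plain Python with a hypothetical function name; deltaChildren itself is only the call proposed in this message, not an existing ZooKeeper API.

```python
def delta_children(before, after):
    """Return (added, deleted) child names between two snapshots."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)


added, deleted = delta_children(["u1", "u2", "u3"], ["u2", "u3", "u4"])
print(added, deleted)   # ['u4'] ['u1']
```

  The cost concern raised here is visible in the sketch: both full child lists have to be held in memory on the client for every watched parent.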
 
  This would handle the missed event problem since the client would have
  the 2 states it needs to compare.  It also allows clients dealing with
  large data sets to only deal with the delta over time (like a git
  replay).  Our number of queues could get quite large, and I'm concerned
  that keeping my previous event's children in a set to perform the delta
  may become quite memory and processor intensive.  Would a feature like
  this be possible without over complicating the Zookeeper core?
 
 
  Thanks,
  Todd
 
  On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote:
 
  Hi Todd -
  The general explanation for why Zookeeper doesn't pass the event information
  w/ the event notification is that an event notification is only triggered
  once, and thus may indicate multiple events. For example, if you do a
  GetChildren and set a watch, then multiple children are added at about the
  same time, the first one triggers a notification, but the second (or later)
  ones do not. When you do another GetChildren() request to get the list and
  reset the watch, you'll see all the changed nodes, however if you had just
  been told about the first change in the notification you would have missed
  the others.
  To do what you are wanting, you would really need persistent watches that
  send notifications every time a change occurs and don't need to be reset so
  you can't miss events. That isn't the design that was chosen for Zookeeper
  and I don't think it's likely to be implemented.
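
  A toy model of the one-shot behaviour described above, in plain Python (ToyParent and its methods are invented for illustration and are not the ZooKeeper API): three children are added after a single watch is set, only one notification fires, and the follow-up read returns all three.

```python
class ToyParent:
    """Toy znode with a one-shot child watch (not the ZooKeeper API)."""

    def __init__(self):
        self.children = []
        self.watch_armed = False
        self.notifications = 0

    def get_children(self, watch=False):
        if watch:
            self.watch_armed = True
        return list(self.children)

    def add_child(self, name):
        self.children.append(name)
        if self.watch_armed:
            # The watch fires once and stays cleared until it is re-armed.
            self.watch_armed = False
            self.notifications += 1


node = ToyParent()
node.get_children(watch=True)
for name in ("a", "b", "c"):
    node.add_child(name)
print(node.notifications)             # 1
print(node.get_children(watch=True))  # ['a', 'b', 'c']
```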
 
  -Dave Wright
 
  On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine t...@spidertracks.co.nz wrote:
 
  Hi all,
   I'm writing a distributed queue monitoring class for our leader node in
  the cluster.  We're queueing messages per input hardware device; this queue
  is then assigned to a node with the least load in our cluster.  To do this,
  I maintain 2 persistent znodes with the following format.
 
  data queue
 
  /dataqueue/devices/<unit id>/<data packet>
 
  processing follower
 
  /dataqueue/nodes/<node name>/<unit id>
 
  The queue monitor watches for changes on the path of /dataqueue/devices.
  When the first packet from a unit is received, the queue writer will create
  the queue with the unit id.  This triggers the watch event on the monitoring
  class, which in turn creates the znode for the path with the least loaded
  node.  This path is watched for child node creation and the node creates a
  queue consumer to consume messages from the new queue.
 
 
  Our list of queues can become quite large, and I would prefer not to
  maintain a list 

Re: election recipe

2010-09-07 Thread Patrick Hunt
Hi Andrei, the answer may not be as simple as that. In the case of passive
leader you might want to just wait till you're reconnected before taking
any action. Connection loss indicates that you aren't currently connected to
a server; it doesn't mean that you've lost leadership (if your session expires,
that would mean you lost leadership). However for an active leader you might want
to stop acting as leader immediately upon connection loss (given you don't know
if you're the leader any longer). The active vs passive leader distinction
is indicating whether the leader is the one taking the action (active), or
the followers are the ones taking the action (passive). For example in the
active case the leader may be sending out commands to the followers, in the
passive case the leader might be getting requests from the followers. In the
first case you want to stop as soon as you are not sure you're the leader,
in the passive case the followers will stop talking to you on their own if
leadership change does take place.
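
One way to write that distinction down, as a hedged sketch in plain Python: the function name and the returned action strings are hypothetical, while the event names mirror the client's session states (Disconnected, SyncConnected, Expired).

```python
def leader_action(mode, event):
    """Decide what a current leader does on a session event.

    mode  -- "active" (leader sends commands) or "passive" (followers call in)
    event -- "Disconnected", "SyncConnected" or "Expired"
    """
    if event == "Expired":
        return "step_down"          # session gone: leadership is lost
    if event == "Disconnected":
        # An active leader cannot be sure it still leads, so it pauses;
        # a passive leader just waits for the reconnect.
        return "suspend" if mode == "active" else "wait"
    if event == "SyncConnected":
        return "resume"             # same session, so leadership still holds
    raise ValueError("unknown event: %r" % (event,))


print(leader_action("active", "Disconnected"))    # suspend
print(leader_action("passive", "Disconnected"))   # wait
```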

Patrick

On Sat, Sep 4, 2010 at 11:16 AM, Andrei Savu savu.and...@gmail.com wrote:

 You should also be careful how you handle connection loss events. The
 leader should suspend itself and re-run the election process when the
 connection is reestablished.

 On Sat, Sep 4, 2010 at 8:37 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
  Hi Eric,
   As Ted and you yourself mentioned, it's mostly to avoid the herd effect.  A
  herd effect would usually mean 1000's of clients notified of some change, all
  trying to create the same node on notification.  With just 10's of clients you
  don't need to worry about this herd effect at all.
 
  Thanks
  mahadev
 
 
  On 9/2/10 3:40 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
  You are correct that this simpler recipe will work for smaller populations
  and correct that the complications are to avoid the herd effect.
 
 
 
  On Thu, Sep 2, 2010 at 12:55 PM, Eric van Orsouw
  eric.van.ors...@gmail.com wrote:
 
  Hi there,
 
  I would like to use zookeeper to implement an election scheme.
  There is a recipe on the homepage, but it is relatively complex.
  I was wondering what was wrong with the following pseudo code:
 
  forever {
      zookeeper.create -e /election my_ip_address
      if creation succeeded then {
          // do the leader thing
      } else {
          // wait for change in /election using watcher mechanism
      }
  }
 
  My assumption is that the recipe is more elaborate to eliminate the
  flood of requests if the leader falls away.
  But if there are only a handful of leader candidates, then that should not
  be a problem.
 
  Is this correct, or am I missing out on something?
 
  Thanks,
  Eric
 
 
 
 
 
 
 



 --
 Andrei Savu -- http://www.andreisavu.ro/



Re: closing session on socket close vs waiting for timeout

2010-09-07 Thread Patrick Hunt
That's a good point, however with suitable documentation, warnings and such
it seems like a reasonable feature to provide for those users who require
it. Used in moderation it seems fine to me. Perhaps we also make it
configurable at the server level for those administrators/ops who don't want
to deal with it (disable the feature entirely, or only enable on particular
servers, etc...).

Patrick

On Mon, Sep 6, 2010 at 2:10 PM, Benjamin Reed br...@yahoo-inc.com wrote:

 if this mechanism were used very often, we would get a huge number of
 session expirations when a server fails. you are trading fast error
 detection for the ability to tolerate temporary network and server outages.

 to be honest this seems like something that in theory sounds like it will
 work in practice, but once deployed we start getting session expirations for
 cases that we really do not want or expect.

 ben


 On 09/01/2010 12:47 PM, Patrick Hunt wrote:

 Ben, in this case the session would be tied directly to the connection,
 we'd explicitly deny session re-establishment for this session type (so
 4 would fail). Would that address your concern, others?

 Patrick

 On 09/01/2010 10:03 AM, Benjamin Reed wrote:


 i'm a bit skeptical that this is going to work out properly. a server
 may receive a socket reset even though the client is still alive:

 1) client sends a request to a server
 2) client is partitioned from the server
 3) server starts trying to send response
 4) client reconnects to a different server
 5) partition heals
 6) server gets a reset from client

 at step 6 i don't think you want to delete the ephemeral nodes.

 ben

 On 08/31/2010 01:41 PM, Fournier, Camille F. [Tech] wrote:


 Yes that's right. Which network issues can cause the socket to close
 without the initiating process closing the socket? In my limited
 experience in this area network issues were more prone to leave dead
 sockets open rather than vice versa so I don't know what to look out
 for.

 Thanks,
 Camille

 -Original Message-
 From: Dave Wright [mailto:wrig...@gmail.com]
 Sent: Tuesday, August 31, 2010 1:14 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: closing session on socket close vs waiting for timeout

 I think he's saying that if the socket closes because of a crash (i.e. not a
 normal zookeeper close request) then the session stays alive until the
 session timeout, which is of course true since ZK allows reconnection and
 resumption of the session in case of disconnect due to network issues.

 -Dave Wright

 On Tue, Aug 31, 2010 at 1:03 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:



 That doesn't sound right to me.

 Is there a Zookeeper expert in the house?

 On Tue, Aug 31, 2010 at 8:58 AM, Fournier, Camille F. [Tech]
 camille.fourn...@gs.com  wrote:



 I foolishly did not investigate the ZK code closely enough and it seems
 that closing the socket still waits for the session timeout to remove the
 session.








Re: closing session on socket close vs waiting for timeout

2010-09-01 Thread Patrick Hunt
Ben, in this case the session would be tied directly to the connection, 
we'd explicitly deny session re-establishment for this session type (so 
4 would fail). Would that address your concern, others?
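
To make the proposal concrete, here is a toy model in plain Python. It is
not ZooKeeper code and the feature under discussion does not exist as
written; ToyServer and all of its methods are hypothetical. It only shows
the two behaviours being compared: a normal session survives a socket close
and can be resumed, while a connection-bound session ends with its socket
and can never be re-established (so step 4 of Ben's scenario is refused).

```python
class ToyServer:
    """Toy model of the proposed session type; not real ZooKeeper behaviour."""

    def __init__(self):
        # session id -> {"bound": bool, "ephemerals": set of paths}
        self.sessions = {}

    def open_session(self, sid, bound_to_connection):
        self.sessions[sid] = {"bound": bound_to_connection, "ephemerals": set()}

    def create_ephemeral(self, sid, path):
        self.sessions[sid]["ephemerals"].add(path)

    def socket_closed(self, sid):
        """Connection-bound sessions end at once; normal ones wait for the timeout."""
        session = self.sessions.get(sid)
        if session is not None and session["bound"]:
            del self.sessions[sid]  # ephemeral nodes go away with the session
            return "expired"
        return "kept until timeout"

    def reconnect(self, sid):
        """Re-establishment is denied for connection-bound sessions."""
        session = self.sessions.get(sid)
        if session is None or session["bound"]:
            return "rejected"
        return "resumed"


server = ToyServer()
server.open_session("normal", bound_to_connection=False)
server.open_session("bound", bound_to_connection=True)
server.create_ephemeral("bound", "/election/leader")

print(server.reconnect("normal"))      # resumed
print(server.reconnect("bound"))       # rejected
print(server.socket_closed("normal"))  # kept until timeout
print(server.socket_closed("bound"))   # expired
```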


Patrick

On 09/01/2010 10:03 AM, Benjamin Reed wrote:

i'm a bit skeptical that this is going to work out properly. a server
may receive a socket reset even though the client is still alive:

1) client sends a request to a server
2) client is partitioned from the server
3) server starts trying to send response
4) client reconnects to a different server
5) partition heals
6) server gets a reset from client

at step 6 i don't think you want to delete the ephemeral nodes.

ben

On 08/31/2010 01:41 PM, Fournier, Camille F. [Tech] wrote:

Yes that's right. Which network issues can cause the socket to close
without the initiating process closing the socket? In my limited
experience in this area network issues were more prone to leave dead
sockets open rather than vice versa so I don't know what to look out for.

Thanks,
Camille

-Original Message-
From: Dave Wright [mailto:wrig...@gmail.com]
Sent: Tuesday, August 31, 2010 1:14 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting for timeout

I think he's saying that if the socket closes because of a crash (i.e. not a
normal zookeeper close request) then the session stays alive until the
session timeout, which is of course true since ZK allows reconnection and
resumption of the session in case of disconnect due to network issues.

-Dave Wright

On Tue, Aug 31, 2010 at 1:03 PM, Ted Dunning ted.dunn...@gmail.com
wrote:


That doesn't sound right to me.

Is there a Zookeeper expert in the house?

On Tue, Aug 31, 2010 at 8:58 AM, Fournier, Camille F. [Tech]
camille.fourn...@gs.com wrote:


I foolishly did not investigate the ZK code closely enough and it seems
that closing the socket still waits for the session timeout to remove the
session.




Re: Logs and in memory operations

2010-08-31 Thread Patrick Hunt
On Mon, Aug 30, 2010 at 1:11 PM, Avinash Lakshman 
avinash.laksh...@gmail.com wrote:

 From my understanding when a znode is updated/created a write happens into
 the local transaction logs and then some in-memory data structure is
 updated
 to serve the future reads.
 Where in the source code can I find this? Also how can I decide when it is
 ok for me to delete the logs off disk?


The code where the in-memory db is updated is here:
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(Request)

Regarding datadir cleanup see this section of the docs:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
Basically -- there's a tool for that, but you should back up the current state
of the database before doing the cleanup.

Patrick


Re: Zookeeper shell

2010-08-31 Thread Patrick Hunt

Depending on your classpath setup:

java org.apache.zookeeper.ZooKeeperMain -server 127.0.0.1:2181

if jline jar is in your classpath (included in the zk release 
distribution) you'll get history, auto-complete and such.


Patrick

On 08/31/2010 03:08 PM, Michi Mutsuzaki wrote:

Hello,

I'm looking for a good zookeeper shell. So far I've only used cli_mt (c
client), but it's not very user friendly. Are there any alternatives? In
particular, I'm looking for:

- command history with reverse search
- auto-complete znode path

Thanks!
--Michi



Re: IllegalArgumentException excpetion : Path cannot be null

2010-08-30 Thread Patrick Hunt
The client (solr in this case) is passing a null path to the
ZooKeeper.getChildren(path, ... ) call.

java.lang.IllegalArgumentException: Path cannot be null
   at
org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
   at
org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
   at
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)

I'm afraid you'll have to work with the solr team to determine the cause of
this.

Patrick

On Thu, Aug 26, 2010 at 12:15 AM, Yatir Ben Shlomo yat...@outbrain.com wrote:

 I am running a zookeeper ensemble of 3 zookeeper instances
 and established a solrCloud to work with it (2 masters , 2 slaves)
 on one of the masters I keep noticing ZooKeeper related exceptions which I
 can't understand:
 And the other is java.lang.IllegalArgumentException: Path cannot be null
 (PathUtils.java:45)

 Here are my logs (I set the log level to FINE on zookeeper package)

  Anyone can identify the issue?
 (I could not yet get any help from the solrCloud community)


 FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null
 serverPath:null finished:false header:: -8,101  replyHeader:: -8,-1,0
  request::
 30064776552,v{'/collections},v{},v{'/collections/ENPwl/shards/ENPWL1,'/collections/ENPwl/shards/ENPWL4,'/collections/ENPwl/shards/ENPWL2,'/collections,'/collections/ENPwl/shards/ENPWL3,'/collections/ENPwlMaster/shards/ENPWLMaster_3,'/collections/ENPwlMaster/shards/ENPWLMaster_4,'/live_nodes,'/collections/ENPwlMaster/shards/ENPWLMaster_1,'/collections/ENPwlMaster/shards/ENPWLMaster_2}
  response:: null
 Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
 FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null
 serverPath:null finished:false header:: 540,8  replyHeader:: 540,-1,0
  request:: '/collections,F  response:: v{'ENPwl,'ENPwlMaster}
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader
 updateCloudState
 INFO: Cloud state update for ZooKeeper already scheduled
 Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
 SEVERE: Error while calling watcher
 java.lang.IllegalArgumentException: Path cannot be null
at
 org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at
 org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
at
 org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at
 org.apache.solr.common.cloud.ZkStateReader$5.process(ZkStateReader.java:315)
at
 org.apache.zookeeper.ClientCnxn$EventThread.run(zookeeper:ClientCnxn.java):425)
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4
 process
 INFO: Detected a shard change under ShardId:ENPWL3 in collection:ENPwl
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader
 updateCloudState
 INFO: Cloud state update for ZooKeeper already scheduled
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4
 process
 INFO: Detected a shard change under ShardId:ENPWL4 in collection:ENPwl
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader
 updateCloudState
 INFO: Cloud state update for ZooKeeper already scheduled
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4
 process
 INFO: Detected a shard change under ShardId:ENPWL1 in collection:ENPwl
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader
 updateCloudState
 INFO: Cloud state update for ZooKeeper already scheduled
 Aug 25, 2010 5:18:19 AM org.apache.solr.cloud.ZkController$2 process
 INFO: Updating live
 nodes:org.apache.solr.common.cloud.solrzkcli...@55308275
 Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader
 updateCloudState
 INFO: Updating live nodes from ZooKeeper...
 Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
 FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null
 serverPath:null finished:false header:: 541,8  replyHeader:: 541,-1,0
  request:: '/live_nodes,F  response:: v{'ob1078.nydc1.outbrain.com:8983
 _solr2,'ob1078.nydc1.outbrain.com:8983
 _solr1,'ob1061.nydc1.outbrain.com:8983
 _solr2,'ob1062.nydc1.outbrain.com:8983
 _solr1,'ob1062.nydc1.outbrain.com:8983
 _solr2,'ob1061.nydc1.outbrain.com:8983
 _solr1,'ob1077.nydc1.outbrain.com:8983
 _solr2,'ob1077.nydc1.outbrain.com:8983_solr1}
 Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
 SEVERE: Error while calling watcher
 java.lang.IllegalArgumentException: Path cannot be null
at
 org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at
 org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
at
 org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at
 org.apache.solr.cloud.ZkController$2.process(ZkController.java:321)
at
 org.apache.zookeeper.ClientCnxn$EventThread.run(zookeeper:ClientCnxn.java):425)
 Aug 25, 2010 5:18:19 AM 

Re: Receiving create events for self with synchronous create

2010-08-30 Thread Patrick Hunt
On line 64 are you ensuring that the ZooKeeper session is active before
executing that sequence?

zookeeper = new ZooKeeper(...) is async - it returns before you're actually
connected to the server (you get notified of this in your watcher). If you
execute this sequence quickly enough your zk.create operation is queued
until the zookeeper session is actually established.
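
A small sketch of the usual fix: block until the watcher has reported a
connected session before issuing the first create. This is plain Python
with a hypothetical ConnectionGate helper, not a ZooKeeper client; the
state names mirror the Java client's KeeperState values (SyncConnected,
Disconnected, Expired), and in Java the same idea is commonly written with
a CountDownLatch.

```python
import threading


class ConnectionGate:
    """Blocks callers until the session watcher reports a connected state."""

    def __init__(self):
        self._connected = threading.Event()

    def on_event(self, state):
        # Meant to be called from the client's event thread with the
        # session state carried by each watcher notification.
        if state == "SyncConnected":
            self._connected.set()
        elif state in ("Disconnected", "Expired"):
            self._connected.clear()

    def wait_until_connected(self, timeout_seconds):
        """Return True once connected, False if the timeout elapsed first."""
        return self._connected.wait(timeout_seconds)


gate = ConnectionGate()

# Stand-in for the client library delivering the connection event later on.
threading.Timer(0.05, gate.on_event, args=("SyncConnected",)).start()

if gate.wait_until_connected(timeout_seconds=2.0):
    print("connected: safe to create the node and set watches")
else:
    print("not connected yet: do not start the create/watch sequence")
```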

Patrick

On Thu, Aug 26, 2010 at 8:09 PM, Todd Nine t...@spidertracks.co.nz wrote:

 Sure thing.  The FollowerWatcher class is instantiated by the
 IClusterManager implementation.  It then performs the following:

 FollowerWatcher.init() which is intended to do the following.

 1. Create our follower node so that other nodes know we exist at path
 /com/spidertracks/aviator/cluster/follower/10.0.1.1  where the last
 node is an ephemeral node with the internal IP address of the node.
 These are lines 67 through 72.
 2. Signal to the clusterManager that the cluster has changed (line 79).
 Ultimately the clusterManager will perform a barrier for partitioning
 data ( a separate watcher)
 3. Register a watcher to receive all future events on the follower path
 /com/spidertracks/aviator/cluster/follower/ line 81.


 Then we have the following characteristics in the watcher

 1. If a node has been added or deleted from the children of
 /com/spidertracks/aviator/cluster/follower then continue.  Otherwise,
 ignore the event.  Lines 33 through 44
 2. If this was an event we should process, our cluster has changed;
 signal to the ClusterManager that a node has either been added or
 removed. line 51.


 I'm trying to encapsulate the detection of additions and deletions of
 child nodes within this Watcher.  All other events that occur due to a
 node being added or deleted should be handled externally by the
 clustermanager.

 Thanks,
 Todd


 On Thu, 2010-08-26 at 19:26 -0700, Mahadev Konar wrote:

  Hi Todd,
    The code that you point to, I am not able to make out the sequence
  of steps.  Can you be more clear on what you are trying to do in terms of
  zookeeper api?
 
  Thanks
  mahadev
  On 8/26/10 5:58 PM, Todd Nine t...@spidertracks.co.nz wrote:
 
 
  Hi all,
    I'm running into a strange issue I could use a hand with.  I've
  implemented leader election, and this is working well.  I'm now
  implementing a follower queue with ephemeral nodes. I have an interface
  IClusterManager which simply has the api clusterChanged.  I don't care
  if nodes are added or deleted, I always want to fire this event.  I have
  the following basic algorithm.
 
  init:
    Create a path with /follower/+mynode name
    fire the clusterChangedEvent
    Set the event watcher on the path /follower.
 
  watch:
    reset the watch on /follower
    if event is not a NodeDeleted or NodeCreated, ignore
    fire the clustermanager event
 
  this seems pretty straightforward.  Here is what I'm expecting
 
  1. Create my node path
  2. fire the clusterChanged event
  3. Set watch on /follower
  4. Receive watch events for changes from any other nodes.
 
  What's actually happening
 
  1. Create my node path
  2. fire the clusterChanged event
  3. Set Watch on /follower
  4. Receive watch event for node created in step 1
  5. Receive future watch events for changes from any other nodes.
 
  Since I set the watch after I create the node, I'm not expecting to
  receive the event for it.  Am I doing something incorrectly in creating
  my watch?  Here is my code.
 
  http://pastebin.com/zDXgLagd
 
  Thanks,
  Todd
 
 
 
 
 



Re: Exception causing close of session

2010-08-30 Thread Patrick Hunt
No, by reset I meant purging the ZK database (rm -fr /zkdatadir). I've
seen a number of cases like this now, where a user plays with hbase for a
while and wants to reset back to a state with no data in hbase. They
shut down some of the hbase/zk processes but not all of them (and as a result
old zk sessions are hanging around). Really we should invalidate the
session:
https://issues.apache.org/jira/browse/ZOOKEEPER-583

Patrick

On Fri, Aug 27, 2010 at 12:00 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Patrick,

 Can you clarify what reset means?  It doesn't mean just restart, does it?

 On Thu, Aug 26, 2010 at 5:05 PM, Patrick Hunt ph...@apache.org wrote:

   Client has seen zxid 0xfa4 our last zxid is 0x42
 
  Someone reset the zk server database without restarting the clients. As a
  result the client is forward in time relative to the cluster.
 
  Patrick
 
 
  On 08/26/2010 04:03 PM, Ted Yu wrote:
 
  Hi,
  zookeeper-3.2.2 is used out of HBase 0.20.5
 
  Linux sjc1-.com 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64
  x86_64 x86_64 GNU/Linux
 
  In hbase-hadoop-zookeeper-sjc1-cml-grid00.log, I see a lot of the
  following:
 
  2010-08-26 22:58:01,930 INFO org.apache.zookeeper.server.NIOServerCnxn:
  closing session:0x0 NIOServerCnxn:
  java.nio.channels.SocketChannel[connected
  local=/10.201.9.40:2181 remote=/10.201.9.22:63316]
  2010-08-26 22:58:02,097 INFO org.apache.zookeeper.server.NIOServerCnxn:
  Connected to /10.201.9.22:63317 lastZxid 4004
  2010-08-26 22:58:02,097 WARN org.apache.zookeeper.server.NIOServerCnxn:
  Client has seen zxid 0xfa4 our last zxid is 0x42
  2010-08-26 22:58:02,097 WARN org.apache.zookeeper.server.NIOServerCnxn:
  Exception causing close of session 0x0 due to java.io.IOException:
 Client
  has seen zxid 0xfa4 our last zxid is 0x42
 
  If you can shed some thought on root cause, that would be great.
 
 



Re: Exception causing close of session

2010-08-26 Thread Patrick Hunt

 Client has seen zxid 0xfa4 our last zxid is 0x42

Someone reset the zk server database without restarting the clients. As 
a result the client is forward in time relative to the cluster.
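
A short sketch of the comparison behind that log line, in plain Python. The
function names are made up for illustration; the only ZooKeeper facts relied
on are that a zxid is a 64-bit number whose high 32 bits are the epoch and
whose low 32 bits are a counter, and that the values are the ones from the
log above.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def client_is_ahead(client_last_zxid, server_last_zxid):
    """True when the client has seen a transaction the server does not have."""
    return client_last_zxid > server_last_zxid


client_zxid = 0xfa4  # "Client has seen zxid 0xfa4" (logged as lastZxid 4004)
server_zxid = 0x42   # "our last zxid is 0x42"

print(client_zxid)                                # 4004
print(split_zxid(client_zxid))                    # (0, 4004)
print(client_is_ahead(client_zxid, server_zxid))  # True
```

A client that is ahead of the server can only happen if the server lost
history, which matches a wiped data directory with old clients still running.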


Patrick

On 08/26/2010 04:03 PM, Ted Yu wrote:

Hi,
zookeeper-3.2.2 is used out of HBase 0.20.5

Linux sjc1-.com 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64
x86_64 x86_64 GNU/Linux

In hbase-hadoop-zookeeper-sjc1-cml-grid00.log, I see a lot of the following:

2010-08-26 22:58:01,930 INFO org.apache.zookeeper.server.NIOServerCnxn:
closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected
local=/10.201.9.40:2181 remote=/10.201.9.22:63316]
2010-08-26 22:58:02,097 INFO org.apache.zookeeper.server.NIOServerCnxn:
Connected to /10.201.9.22:63317 lastZxid 4004
2010-08-26 22:58:02,097 WARN org.apache.zookeeper.server.NIOServerCnxn:
Client has seen zxid 0xfa4 our last zxid is 0x42
2010-08-26 22:58:02,097 WARN org.apache.zookeeper.server.NIOServerCnxn:
Exception causing close of session 0x0 due to java.io.IOException: Client
has seen zxid 0xfa4 our last zxid is 0x42

If you can shed some thought on root cause, that would be great.



Re: Zookeeper stops

2010-08-19 Thread Patrick Hunt
+1 on that Ted. I frequently see this issue crop up as "I just rebooted 
my server and lost all my data ..." -- many OSes will clean up /tmp on 
reboot. :-)


Patrick

On 08/19/2010 07:43 AM, Ted Dunning wrote:

Also, /tmp is not a great place to keep things that are intended for
persistence.

On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konar maha...@yahoo-inc.com wrote:


Hi Wim,
  It mostly looks like zookeeper is not able to create files on the
/tmp filesystem. Is there a space shortage, or is it possible the file is
being deleted as it's being written to?

Sometimes admins have a crontab on /tmp that cleans up the /tmp filesystem.

Thanks
mahadev


On 8/19/10 1:15 AM, Wim Jongman wim.jong...@gmail.com wrote:

Hi,

I have a zookeeper server running that can sometimes run for days and then
quits:

Is there somebody with a clue to the problem?

I am running 64 bit Ubuntu with

java version 1.6.0_18
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

Zookeeper 3.3.0

The log below has some context before it shows the fatal error. Our
component.id=40676 indicates that it is the 40676th time that I ask ZK to
publish this information. It has been seen to go up to half a million
before
stopping.

Regards,

Wim

ZooDiscovery  Service Unpublished: Aug 18, 2010 11:17:28 PM.
ServiceInfo[uri=osgiservices://

188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice
,

osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@68a1e081,
component.name=Star Wars Quotes Service, ecf.sp.ect=ecf.generic.server,
component.id=40676,

ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5b9a6ad1
}]]
ZooDiscovery  Service Published: Aug 18, 2010 11:17:29 PM.
ServiceInfo[uri=osgiservices://

188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice
,

osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4,
component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server,
component.id=40677,

ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953
}]]
[log;+0200 2010.08.18

23:17:29:545;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remoteservice;code=0;message=No
async remote service interface found with
name=org.eclipse.ecf.services.quotes.QuoteServiceAsync for proxy service

class=org.eclipse.ecf.services.quotes.QuoteService;severity2;exception=null;children=[]]]
2010-08-18 23:17:37,057 - FATAL [Snapshot Thread:zookeeperser...@262] -
Severe unrecoverable error, exiting
java.io.FileNotFoundException: /tmp/zookeeperData/version-2/snapshot.13e2e
(No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:224)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:211)
at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:260)
at org.apache.zookeeper.server.SyncRequestProcessor$1.run(SyncRequestProcessor.java:120)
ZooDiscovery  Service Unpublished: Aug 18, 2010 11:17:37 PM.
ServiceInfo[uri=osgiservices://

188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice
,

osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4,
component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server,
component.id=40677,


Re: Zookeeper stops

2010-08-19 Thread Patrick Hunt

No. You configure it in the server configuration file.
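
The setting in question is dataDir in the server configuration file. As a
rough sketch, a few lines of Python can flag a dataDir that lives under
/tmp before it bites. The helper names and the sample config text are
illustrative only; the /tmp/zookeeperData path is the one from the log in
this thread.

```python
def find_data_dir(config_text):
    """Return the dataDir value from zoo.cfg-style text, or None if absent."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dataDir":
            return value.strip()
    return None


def is_under_tmp(path):
    """True for /tmp itself or anything below it."""
    return path == "/tmp" or path.startswith("/tmp/")


# Illustrative config text, not copied from a real installation.
SAMPLE_CONFIG = """\
# example server configuration
tickTime=2000
dataDir=/tmp/zookeeperData
clientPort=2181
"""

data_dir = find_data_dir(SAMPLE_CONFIG)
if data_dir is not None and is_under_tmp(data_dir):
    print(f"warning: dataDir {data_dir} may be wiped by tmp cleanup or reboot")
```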

Patrick

On 08/19/2010 01:19 PM, Wim Jongman wrote:

Hi,

But zk does default to /tmp?

Regards,

Wim





On Thursday, August 19, 2010, Patrick Hunt ph...@apache.org wrote:

+1 on that Ted. I frequently see this issue crop up as "I just rebooted my server 
and lost all my data ..." -- many OSes will clean up /tmp on reboot. :-)

Patrick

On 08/19/2010 07:43 AM, Ted Dunning wrote:

Also, /tmp is not a great place to keep things that are intended for
persistence.

On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konar maha...@yahoo-inc.com wrote:


Hi Wim,
   It mostly looks like zookeeper is not able to create files on the
/tmp filesystem. Is there a space shortage, or is it possible the file is
being deleted as it's being written to?

Sometimes admins have a crontab on /tmp that cleans up the /tmp filesystem.

Thanks
mahadev


On 8/19/10 1:15 AM, Wim Jongman wim.jong...@gmail.com wrote:

Hi,

I have a zookeeper server running that can sometimes run for days and then
quits:

Is there somebody with a clue to the problem?

I am running 64 bit Ubuntu with

java version 1.6.0_18
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

Zookeeper 3.3.0

The log below has some context before it shows the fatal error. Our
component.id=40676 indicates that it is the 40676th time that I ask ZK to
publish this information. It has been seen to go up to half a million
before
stopping.

Regards,

Wim

ZooDiscoveryService Unpublished: Aug 18, 2010 11:17:28 PM.
ServiceInfo[uri=osgiservices://

188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice
,

osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@68a1e081,
component.name=Star Wars Quotes Service, ecf.sp.ect=ecf.generic.server,
component.id=40676,

ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5b9a6ad1
}]]
ZooDiscoveryService Published: Aug 18, 2010 11:17:29 PM.
ServiceInfo[uri=osgiservices://

188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice
,

osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4,
component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server,
component.id=40677,

ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953
}]]
[log;+0200 2010.08.18

23:17:29:545;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remo


Re: ZK monitoring

2010-08-19 Thread Patrick Hunt
Maybe we should have a contrib pkg for utilities such as this? I could 
see a python script that, given 1 server (might require additional 
four-letter words but this would be useful regardless), could collect such 
information from the cluster. Create a JIRA?
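
A rough sketch of what such a script could look like today, following
Andrei's suggestion of querying every server with 'stat' and reading the
"Mode:" line. The function names are made up, the server list has to be
supplied by the caller (nothing is discovered from a single node), and the
sample output is abbreviated and illustrative.

```python
import socket


def four_letter_word(host, port, word, timeout=5.0):
    """Send a four-letter command and return the full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Extract the value of the 'Mode:' line from 'stat' output, or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(servers):
    """Ask every (host, port) for its mode; return (leader_or_None, modes)."""
    modes = {}
    for host, port in servers:
        try:
            modes[(host, port)] = parse_mode(four_letter_word(host, port, "stat"))
        except OSError:
            modes[(host, port)] = None  # unreachable or timed out
    leaders = [server for server, mode in modes.items() if mode == "leader"]
    leader = leaders[0] if len(leaders) == 1 else None
    return leader, modes


# Abbreviated, illustrative 'stat' reply used to exercise the parser offline.
SAMPLE_STAT = """\
Latency min/avg/max: 0/0/0
Received: 10
Sent: 9
Outstanding: 0
Zxid: 0x42
Mode: follower
Node count: 4
"""

print(parse_mode(SAMPLE_STAT))  # follower
```

Usage would be along the lines of find_leader([("zk1", 2181), ("zk2", 2181),
("zk3", 2181)]), where the host names are placeholders.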


Patrick

On 08/17/2010 12:14 PM, Andrei Savu wrote:

It's not possible. You need to query all the servers in order to know
who is the current leader.

It should be pretty simple to implement this by parsing the output
from the 'stat' 4-letter command.

On Tue, Aug 17, 2010 at 9:50 PM, Jun Rao jun...@gmail.com wrote:

Hi,

Is there a way to see the current leader and a list of followers from a
single node in the ZK quorum? It seems that ZK monitoring (JMX, 4-letter
commands) only provides info local to a node.

Thanks,

Jun





-- Andrei Savu


Re: A question about Watcher

2010-08-17 Thread Patrick Hunt
All servers keep a copy - so you can shutdown the zk service entirely 
(all servers) and restart it and the sessions are maintained.


Patrick

On 08/16/2010 06:34 PM, Qian Ye wrote:

Thx Mahadev and Benjamin, it seems that I've got some misunderstanding about
the client. I will check it out.

Another relevant question. I noticed that the master zookeeper server keep a
track of all the client session which connects to every zookeeper server in
the same cluster. So when a slave zookeeper server failed, the clients it
served, can switch to another zookeeper server and keep their old session
(the new zookeeper server can get the session information from the master).
My question is, if the master failed, does that means some session
information will definitely be lost?

thx~

On Tue, Aug 17, 2010 at 12:40 AM, Benjamin Reed br...@yahoo-inc.com wrote:


the client does keep track of the watches that it has outstanding. when it
reconnects to a new server it tells the server what it is watching for and
the last view of the system that it had.

ben


On 08/16/2010 09:28 AM, Qian Ye wrote:


thx for the explanation. Since the watcher can be preserved when the client
switches the zookeeper server it connects to, does that mean all the watcher
information will be saved on all the zookeeper servers? I didn't find any
place in the source where the client can hold the watcher information.


On Tue, Aug 17, 2010 at 12:21 AM, Ted Dunning ted.dunn...@gmail.com
wrote:




I should correct this.  The watchers will deliver a session expiration
event, but since the connection is closed at that point no further
events will be delivered and the cluster will remove them.  This is as
good as the watchers disappearing.

On Mon, Aug 16, 2010 at 9:20 AM, Ted Dunning ted.dunn...@gmail.com
wrote:




The other is session expiration.  Watchers do not survive this.  This
happens when a client does not provide timely
evidence that it is alive and is marked as having disappeared by the
cluster.


Re: How to handle Node does not exist error?

2010-08-16 Thread Patrick Hunt
Try using the logs, stat command or JMX to verify that each ZK server is 
indeed a leader/follower as expected. You should have one leader and n-1 
followers. Verify that you don't have any standalone servers (this is 
the most frequent error I see - misconfiguration of a server such that 
it thinks it's a standalone server; I often see cases where a user has 3 
standalone servers which they think form a single quorum; all of the 
servers will therefore be inconsistent with each other).
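
As a sketch of that check, assuming the mode of each server has already
been collected (for example from the "Mode:" line of the stat command), a
few lines of Python can flag the misconfigurations described above. The
function name and the server labels are made up for illustration.

```python
def check_ensemble(modes):
    """Return a list of problems for a mapping of server name -> mode string."""
    problems = []
    standalone = sorted(s for s, m in modes.items() if m == "standalone")
    leaders = sorted(s for s, m in modes.items() if m == "leader")
    followers = sorted(s for s, m in modes.items() if m == "follower")

    if standalone:
        # Standalone servers never join a quorum, so their data will diverge.
        problems.append("standalone servers: " + ", ".join(standalone))
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}")
    if len(followers) != len(modes) - 1:
        problems.append(
            f"expected {len(modes) - 1} followers, found {len(followers)}"
        )
    return problems


healthy = {"zk1": "follower", "zk2": "leader", "zk3": "follower"}
broken = {"zk1": "standalone", "zk2": "standalone", "zk3": "standalone"}

print(check_ensemble(healthy))  # []
for problem in check_ensemble(broken):
    print(problem)
```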


Patrick

On 08/12/2010 05:42 PM, Ted Dunning wrote:

On Thu, Aug 12, 2010 at 4:57 PM, Dr Hao He h...@softtouchit.com wrote:


hi, Ted,

I am a little bit confused here.  So, is the node inconsistency problem
that Vishal and I have seen here most likely caused by configurations or
embedding?

If it is the former, I'd appreciate it if you could point out where those silly
mistakes have been made and the correct way to embed ZK.



I think it is likely due to misconfiguration, but I don't know what the
issue is exactly.  I think that another poster suggested that you ape the
normal ZK startup process more closely.  That sounds good but it may be
incompatible with your goals of integrating all configuration into a single
XML file and not using the normal ZK configuration process.

Your thought about forking ZK is a good one since there are calls to
System.exit() that could wreak havoc.




Although I agree with your comments about the architectural issues that
embedding may lead to and we are aware of those,  I do not agree that
embedding will always lead to those issues.



I agree that embedding won't always lead to those issues and your
application is a reasonable counter-example.  As is common, I think that the
exception proves the rule since your system is really just another way to
launch an independent ZK cluster rather than an example of ZK being embedded
into an application.



Re: client failure detection in ZK

2010-08-16 Thread Patrick Hunt

The session timeout is used for this:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions

Patrick

On 08/16/2010 01:47 PM, Jun Rao wrote:

Hi,

What config parameters in ZK determine how soon a failed client is detected?
Thanks,

Jun



Re: Backing up zk data files

2010-08-12 Thread Patrick Hunt


On 08/11/2010 06:49 PM, Adam Rosien wrote:

http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperAdmin.html#sc_dataFileManagement
says that one can copy the contents of the data directory and use it
on another machine. The example states the other instance is not in
the server list; what would happen if one did copy it to an offline
member of the quorum that then starts up?



The previously offline member will contact the quorum leader and see
that it has an older version of the db. It will then synchronize with
the leader as usual (either by downloading a diff or, if it is too far
behind, by getting a full snapshot).



Do the docs imply that one can copy the data directory as-is as a
backup method? Is it restorable to any crashed/hosed server, or only
the one with the same server id?



It can be copied as is. Keep in mind, though, that this is only needed for
catastrophic failures (the entire zk serving cluster is lost), not for the
case where a single server loses its HD, for example. In that case you
just restart the server; it will contact the leader and synchronize as
I detailed above.



What is a valid backup method for zk data?


Copy the data directory (snapshots and logs).
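
As a sketch only, the copy could be scripted along these lines. The function name and the "zk-backup-" prefix are made up for illustration, and it assumes snapshots and transaction logs both live under the one directory passed in; if dataLogDir points somewhere else, that directory would need copying too.

```python
import os
import shutil
import time

def backup_data_dir(data_dir, backup_root):
    """Copy data_dir (snapshots and logs) into a new timestamped directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_root, "zk-backup-" + stamp)
    # copytree creates dest and fails if it already exists, so an
    # earlier backup is never silently overwritten.
    shutil.copytree(data_dir, dest)
    return dest
```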

Patrick


Re: zookeeper seems to hang

2010-08-12 Thread Patrick Hunt

Great bug report Ted, the stack trace in particular is very useful.

It looks like a timing bug where the client is not shutting down cleanly 
on the close call. I reviewed the code in question but nothing pops out 
at me. Also the logs just show us shutting down, nothing else from zk in 
there.


Create a jira and attach all the detail you have available.

Patrick

On 08/11/2010 03:21 PM, Ted Yu wrote:

Hi,
Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where
Regionserver
process was shutting down and seemed to hang.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Your comment is welcome.

Here is relevant portion from jstack - I attempted to attach jstack twice in
my email to d...@hbase.apache.org but failed:

DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x]
java.lang.Thread.State: RUNNABLE

regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
 - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
 at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
 - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
 at java.lang.Thread.run(Thread.java:619)

main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for 0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

RMI TCP Accept-0 daemon prio=10 tid=0x2aabb822c800 nid=0x6c7d runnable [0x40752000]
java.lang.Thread.State: RUNNABLE
 at java.net.PlainSocketImpl.socketAccept(Native Method)
 at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
 - locked 0x2aaabf585578 (a java.net.SocksSocketImpl)
 at java.net.ServerSocket.implAccept(ServerSocket.java:453)
 at java.net.ServerSocket.accept(ServerSocket.java:421)
 at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
 at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
 at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
 at java.lang.Thread.run(Thread.java:619)



Re: Clarification on async calls in a cluster

2010-08-11 Thread Patrick Hunt


On 08/11/2010 03:25 PM, Jordan Zimmerman wrote:

If I use an async version of a call in a cluster (ensemble) what
happens if the server I'm connected to goes down? Does ZK
transparently resubmit the call to the next server in the cluster and
call my async callback or is there something I need to do? The docs
aren't clear on this and searching the archive didn't give me the
answer. Another source of confusion here is that the non-async
versions do not resubmit the call - I need to do that manually.

Thanks!


Hi Jordan, the callbacks have an rc parameter that details the result
of the request (result code). This will be one of KeeperException.Code,
in this case CONNECTIONLOSS. You receive a connection loss result when
the client has sent a request to the server but loses the connection
before the server responds. You must resubmit this request manually
(usually once you reconnect to the cluster), same as for sync calls.
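
To illustrate the rc handling, here is a small self-contained sketch. FlakyClient is a made-up stand-in, not a real ZooKeeper binding, and it invokes the callback synchronously to keep the example short; the only detail taken from the real API is the numeric value of CONNECTIONLOSS.

```python
CONNECTIONLOSS = -4  # value of KeeperException.Code.CONNECTIONLOSS
OK = 0

class FlakyClient:
    """Hypothetical stand-in for a client; fails the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def async_create(self, path, data, callback):
        self.calls += 1
        if self.failures > 0:
            self.failures -= 1
            callback(CONNECTIONLOSS, path)   # connection dropped before a reply
        else:
            callback(OK, path)

def create_with_resubmit(client, path, data, max_attempts=5):
    """Resubmit the request while the callback reports CONNECTIONLOSS."""
    result = {}

    def on_result(rc, cb_path):
        # The callback only records the result code; the loop decides what to do.
        result["rc"] = rc

    for _ in range(max_attempts):
        client.async_create(path, data, on_result)
        if result["rc"] != CONNECTIONLOSS:
            break
    return result["rc"]
```

Note that after a connection loss the original request may or may not have been applied on the server, so a real resubmit of a create would also have to cope with the node already existing.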


See these sections in the faq:
http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2

also some detail in
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions

I agree the docs could be improved here. The java api docs for the callbacks are
esp. embarrassing (there are none). Please enter JIRAs for any areas you'd
like to see improved, including adding javadoc to the callbacks.


Regards,

Patrick


Re: Sequence Number Generation With Zookeeper

2010-08-10 Thread Patrick Hunt

Great!

Basic details are here (create a jira, attach a patch, click submit 
and someone will review and help you get it into a state which we can 
commit). Probably you'd put your code into src/recipes or src/contrib 
(recipes sounds reasonable).

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

Patrick

On 08/10/2010 09:59 AM, David Rosenstrauch wrote:

Good news!  I got approval to release this code!  (Man, I love working
for a startup!!!) :-)

So anyone know: what's the next step? Do I need to obtain commit
privileges? Or do I deliver the code to someone who has commit privs who
shepherds this for me?

Also, what (if anything) do I need to tweak in the code to make it
release-ready. (e.g., Change package names? Slap an Apache license on
it? etc.)

Thanks,

DR

On 08/06/2010 10:39 PM, David Rosenstrauch wrote:

I'll run it by my boss next week.

DR

On 08/06/2010 07:30 PM, Mahadev Konar wrote:

Hi David,
I think it would be really useful. It would be very helpful for someone
looking for generating unique tokens/generation ids (I can think of plenty
of applications for this).

Please do consider contributing it back to the community!

Thanks
mahadev


On 8/6/10 7:10 AM, David Rosenstrauch dar...@darose.net wrote:


Perhaps. I'd have to ask my boss for permission to release the code.

Is this something that would be interesting/useful to other people? If
so, I can ask about it.

DR

On 08/05/2010 11:02 PM, Jonathan Holloway wrote:

Hi David,

We did discuss potentially doing this as well. It would be nice to get some
recipes for Zookeeper done for this area, if people think it's useful. Were
you thinking of submitting this back as a recipe? If not, then I could
potentially work on such a recipe instead.

Many thanks,
Jon.



I just ran into this exact situation, and handled it like so:

I wrote a library that uses the option (b) you described above. Only
instead of requesting a single sequence number, you request a block of them
at a time from Zookeeper, and then locally use them up one by one from the
block you retrieved. Retrieving by block (e.g., by blocks of 1 at a
time) eliminates the contention issue.

Then, if you're finished assigning ID's from that block, but still have a
bunch of ID's left in the block, the library has another function to push
back the unused ID's. They'll then get pulled again in the next block
retrieval.
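
The block scheme described above could be sketched roughly as follows. This is not the library from this thread (that code has not been posted); CounterStore is an in-memory stand-in for the shared state kept in ZooKeeper, and all names are hypothetical. A real version would have to make reserve() and push_back() atomic, for example with a versioned setData.

```python
class CounterStore:
    """In-memory stand-in for the shared counter kept in ZooKeeper."""

    def __init__(self):
        self.next_id = 0
        self.returned = []      # IDs pushed back by clients

    def reserve(self, count):
        # Hand out previously returned IDs first, then fresh ones.
        block = self.returned[:count]
        self.returned = self.returned[count:]
        fresh = count - len(block)
        block.extend(range(self.next_id, self.next_id + fresh))
        self.next_id += fresh
        return block

    def push_back(self, ids):
        self.returned.extend(ids)


class BlockIdAllocator:
    """Hands out IDs locally, contacting the store once per block."""

    def __init__(self, store, block_size):
        self.store = store
        self.block_size = block_size
        self.block = []

    def next_id(self):
        if not self.block:
            self.block = self.store.reserve(self.block_size)
        return self.block.pop(0)

    def release_unused(self):
        # Push the unused remainder back so a later block retrieval reuses it.
        self.store.push_back(self.block)
        self.block = []
```

Each allocator touches the shared store once per block rather than once per ID, which is where the reduction in contention comes from.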

We don't actually have this code running in production yet, so I can't
vouch for how well it works. But the design was reviewed and given the
thumbs up by the core developers on the team, and the implementation passes
all my unit tests.

HTH. Feel free to email back with specific questions if you'd like more
details.

DR




Re: Too many KeeperErrorCode = Session moved messages

2010-08-08 Thread Patrick Hunt
I suspect this is a bug with the sync call and session moved (the code 
path for sync is a bit special). Please enter a JIRA for this. Thanks.


Patrick

On 08/05/2010 01:20 PM, Vishal K wrote:

Hi All,

I am seeing a lot of these messages in our application. I would like to know
if I am doing something wrong or this is a ZK bug.

Setup:
- Server environment:zookeeper.version=3.3.0-925362
- 3 node cluster
- Each node has a few clients that connect to the local server using 127.0.0.1
as the host IP.
- The application first forms a ZK cluster. Once the ZK cluster is formed,
each node establishes sessions with its local ZK server. The clients do not know
about remote servers, so sessions are always with the local server.

As soon as the ZK clients connect to their respective followers, the ZK leader
starts spitting out the following messages:

2010-07-01 10:55:36,733 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,748 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x9 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,755 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0xb zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,795 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x10 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,850 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x1 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,910 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x1b zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:36,920 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x20 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,019 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x29 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,030 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2c zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,035 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x2e zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:37,065 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa9
type:sync: cxid:0x33 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
2010-07-01 10:55:38,840 - INFO  [ProcessThread:-1:preprequestproces...@405]
- Got user-level KeeperException when processing sessionid:0x298d3b1fa90001
type:sync: cxid:0x4 zxid:0xfffe txntype:unknown reqpath:/ Error
Path:null Error:KeeperErrorCode = Session moved
20

These sessions were established on the follower:
2010-07-01 08:59:09,890 - INFO  [CommitProcessor:0:nioserverc...@1431] -
Established session 0x298d3b1fa9 with negotiated timeout 9000 for client
/127.0.0.1:50773
2010-07-01 08:59:09,890 - INFO
[SvaDefaultBLC-SendThread(localhost.localdom:2181):clientcnxn$sendthr...@701]
- Session establishment complete on server localhost.localdom/127.0.0.1:2181,
sessionid = 0x298d3b1fa9, negotiated timeout = 9000


The server is spitting out these messages for every session that it does not
own  (session established by clients with followers). The messages are
always seen for a sync request.
No other issues are seen with the cluster. I am wondering what would be the
cause of this problem? Looking at PrepRequestProcessor, it seems like this
message is printed when the owner of the 

Re: Using watcher for being notified of children addition/removal

2010-08-02 Thread Patrick Hunt
You may want to consider adding a distributed queue to your use of ZK. 
As was mentioned previously, watches don't notify you of every change, 
just that a change was made. For example multiple changes may be 
visible when you get the notification.


A distributed queue would allow you to log every change, and have your 
watcher process easily process the result. The only issue I could see 
is one of atomicity, but depending on your use case(s) that may not be 
an issue, or perhaps one that can be worked around.
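
Because one notification can stand for several changes, a watcher typically re-reads the child list and compares it with the last list it saw. A minimal sketch of that comparison (the function name is made up; the lists would come from successive getChildren calls):

```python
def diff_children(previous, current):
    """Return (added, removed) child names between two getChildren results."""
    prev, cur = set(previous), set(current)
    return sorted(cur - prev), sorted(prev - cur)
```

A child that is created and then deleted between the two reads never shows up in either list, which is the gap a queue of logged changes would close.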


Patrick

On 08/02/2010 09:18 AM, Ted Dunning wrote:

Another option besides Steve's excellent one would be to keep something like
1000 nodes in your list per znode.  Many update patterns will give you the
same number of updates, but the ZK transactions that result (getChildren,
read znode) will likely be more efficient, especially the getChildren call.

Remember, it is not a requirement that you have a one-to-one mapping between
your in-memory objects and in-zookeeper znodes.  If that works, fine.  If
not, feel free to be creative.

On Mon, Aug 2, 2010 at 7:45 AM, Steve Gury steve.g...@mimesis-republic.com wrote:


Is there any recipe that would provide this feature (or a work around) ?





Re: JMX error while starting ZooKeeper

2010-07-19 Thread Patrick Hunt


On 07/19/2010 05:04 PM, Rakesh Aggarwal wrote:

javax.management.MBeanServer; was not found


Sounds like you are missing rt.jar for some reason (contains that class).

Try running java -verbose -version and see what jars are being picked 
up, I see a number of lines containing:


 ... /usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/rt.jar ...

Patrick


Re: Errors with Python bindings

2010-07-16 Thread Patrick Hunt
Hi Rich, the version string looks useful to have, thanks! Would you mind 
submitting this via jira? Do a svn diff (looks like you did already), 
create a jira and attach the diff, then click submit link on the jira. 
We'll review and work on getting it into a future release.

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

Thanks!

Patrick

On 07/15/2010 05:24 PM, Rich Schumacher wrote:

Hey Henry,

Good to know!  I was under the impression the 3.3.0 release had the updated 
bindings but it seems I was mistaken.  I'll get those built and then see what 
happens.  Just curious, have you ever run into or heard of this?  A quick 
Google search didn't return anything interesting.

As for the version in the Python bindings, how about this trivial patch:

Index: src/c/zookeeper.c
===
--- src/c/zookeeper.c   (revision 964617)
+++ src/c/zookeeper.c   (working copy)
@@ -1510,6 +1510,11 @@
PyModule_AddObject(module, "ZooKeeperException", ZooKeeperException);
Py_INCREF(ZooKeeperException);

+  char version_str[];
+  sprintf(version_str, "%i.%i.%i", ZOO_MAJOR_VERSION, ZOO_MINOR_VERSION, ZOO_PATCH_VERSION);
+
+  PyModule_AddStringConstant(module, "__version__", version_str);
+
ADD_INTCONSTANT(PERM_READ);
ADD_INTCONSTANT(PERM_WRITE);
ADD_INTCONSTANT(PERM_CREATE);


On Jul 14, 2010, at 2:57 PM, Henry Robinson wrote:


Hi Rich -

No, there's not a very easy way to verify the Python bindings version afaik
- would be a useful feature to have though.

My first suggestion is to move to the bindings shipped with 3.3.1 - we fixed
a lot of problems with the Python bindings which improved their stability a
lot. Could you try that and then let us know if you continue to see
problems?

cheers,
Henry

On 14 July 2010 13:14, Rich Schumacher rich.s...@gmail.com wrote:


I'm running a Tornado webserver and using ZooKeeper to store some metadata
and occasionally the ZooKeeper connection will error out irrevocably.  Any
subsequent calls to ZooKeeper from this process will result in a
SystemError.

Here is the relevant portion of the Python traceback:
snip...
File "/usr/lib/pymodules/python2.5/zuul/storage/zoo.py", line 69, in call
   return getattr(zookeeper, name)(self.handle, *args)
SystemError: NULL result without error in PyObject_Call

I found this in the ZooKeeper server logs:

2010-07-13 06:52:46,488 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - Accepted socket
connection from /10.2.128.233:54779
2010-07-13 06:52:46,489 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:nioserverc...@742] - Client attempting to renew
session 0x429b865a6270003 at /10.2.128.233:54779
2010-07-13 06:52:46,489 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:lear...@95] - Revalidating client: 299973596915630083
2010-07-13 06:52:46,793 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@1424] - Invalid session
0x429b865a6270003 for client /10.2.128.233:54779, probably expired
2010-07-13 06:52:46,794 - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket connection for
client /10.2.128.233:54779 which had sessionid 0x429b865a6270003


The ZooKeeper ensemble is healthy; each node responds as expected to the
four letter word commands and a simple restart of the Tornado processes
fixes this.

My question is, if this really is due to session expiration why is a
SessionExpiredException not raised?  Another question, is there an easy way
to determine the version of the ZooKeeper Python bindings I'm using?  I
built the 3.3.0 bindings but I just want to be able to verify that.

Thanks for the help,

Rich





--
Henry Robinson
Software Engineer
Cloudera
415-994-6679




Re: total # of zknodes

2010-07-15 Thread Patrick Hunt
I've done some tests with ~600 clients creating 5 million znodes (size
100bytes iirc) and 25million watches. I was using 8gb of memory for
this. However, in this scenario it's critical that you tune the GC;
in particular you need to turn on the CMS and incremental GC options. Otherwise,
when the GC collects it will collect for long periods of time and all of
your clients will then time out. Keep an eye on the max latency of your
servers; that's usually the most obvious indication of GC hits (it will
spike up).


You can use the latency tester from here to do the quick benchmarks Ben 
suggested:

http://github.com/phunt/zk-smoketest
also see: http://bit.ly/4ekN8G

Patrick

On 07/15/2010 08:57 AM, Benjamin Reed wrote:

i think there is a wiki page on this, but for the short answer:

the number of znodes impacts two things: memory footprint and recovery
time. there is a base overhead for each znode to store its path, pointers to
the data, pointers to the acl, etc. i believe that is around 100 bytes.
you can't just divide your memory by 100+1K (for data) though, because
the GC needs to be able to run and collect things and maintain a free
space. if you use 3/4 of your available memory, that would mean with 4G
you can store about three million znodes. when there is a crash and you
recover, servers may need to read this data back off the disk or over
the network. that means it will take about a minute to read 3G from the
disk and perhaps a bit more to read it over the network, so you will
need to adjust your initLimit accordingly.
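
The back-of-the-envelope numbers above can be reproduced with a few lines. The 100 byte overhead, 1K of data and 3/4 usable fraction are the figures from this message; the 50 MB/s disk rate is an assumption chosen only because it matches "about a minute to read 3G".

```python
GIB = 2 ** 30
MIB = 2 ** 20

def estimate_znodes(heap_bytes, data_bytes=1024, overhead_bytes=100,
                    usable_fraction=0.75):
    """Rough number of znodes that fit, leaving headroom for the GC."""
    usable = heap_bytes * usable_fraction
    return int(usable // (overhead_bytes + data_bytes))

def recovery_seconds(bytes_to_read, bytes_per_second):
    """Time to read the data back at a given sustained rate."""
    return bytes_to_read / bytes_per_second

znodes = estimate_znodes(4 * GIB)                  # a bit under three million
seconds = recovery_seconds(3 * GIB, 50 * MIB)      # roughly one minute
```

initLimit is expressed in ticks, so the recovery time would still need dividing by tickTime before comparing.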

of course this is all back-of-the-envelope. i would suggest doing some
quick benchmarks to test and make sure your results are in line with
expectation.

ben

On 07/15/2010 02:56 AM, Maarten Koopmans wrote:

Hi,

I am mapping a filesystem to ZooKeeper, and use it for locking and
mapping a filesystem namespace to a flat data object space (like S3).
So assuming proper nesting and small ZooKeeper nodes (< 1KB), how many
nodes could a cluster with a few GBs of memory per instance
realistically hold totally?

Thanks, Maarten




Re: Suggested way to simulate client session expiration in unit tests?

2010-07-06 Thread Patrick Hunt

If you want to simulate expiration use the example I sent.


http://github.com/phunt/zkexamples


Another option is to use a mock.

Patrick

On 07/06/2010 05:42 PM, Jeremy Davis wrote:

Thanks!
That seems to work, but it is approximately the same as zooKeeper.close() in
that there is no SessionExpired event that comes up through the default
Watcher.
Maybe I'm assuming more from ZK than I should, but should a paranoid lock
implementation periodically test its session by reading or writing a value?

Regards,
-JD


On Tue, Jul 6, 2010 at 10:32 AM, Mahadev Konar maha...@yahoo-inc.com wrote:


Hi Jeremy,

  zk.disconnect() is the right way to disconnect from the servers. For
session expiration you just have to make sure that the client stays
disconnected for more than the session expiration interval.

Hope that helps.

Thanks
mahadev


On 7/6/10 9:09 AM, Jeremy Davis jerdavis.cassan...@gmail.com wrote:


Is there a recommended way of simulating a client session expiration in unit
tests?
I see a TestableZooKeeper.java, with a pauseCnxn() method that does cause
the connection to timeout/disconnect and reconnect. Is there an easy way to
push this all the way through to session expiration?
Thanks,
-JD







Re: Zookeeper outage recap questions

2010-07-01 Thread Patrick Hunt
Hi Travis, as Flavio suggested, it would be great to get the logs. A few
questions:


1) how did you eventually recover, restart the zk servers?

2) was the cluster losing quorum during this time? leader re-election?

3) Any chance this could have been initially triggered by a long GC 
pause on one of the servers? (is gc logging turned on, any sort of heap 
monitoring?) Has the GC been tuned on the servers, for example CMS and 
incremental?


4) what are the clients using for timeout on the sessions?

3.4 probably not for a few months yet, but we are planning for a 3.3.2 
in a few weeks to fix a couple critical issues (which don't seem related 
to what you saw). If we can identify the problem here we should be able 
to include it in any fix release we do.


fixing something like 517 might help, but it's not clear how we got to 
this state in the first place. fixing 517 might not have any effect if 
the root cause is not addressed. 662 has only ever been reported once 
afaik, and we weren't able to identify the root cause for that one.


One thing we might also consider is modifying the zk client lib to
back off connection attempts if they keep failing (timing out, say). Today
the clients are pretty aggressive on reconnection attempts. Having some
sort of backoff (exponential?) would provide more breathing room to the
server in this situation.
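
For illustration, a capped exponential backoff with jitter might look like the sketch below. This is not what the zk client lib does today; the function name, base delay and cap are placeholders.

```python
import random

def backoff_delays(base=0.5, cap=30.0, attempts=8, rng=random.random):
    """Yield capped exponential reconnect delays with full jitter."""
    for attempt in range(attempts):
        # The ceiling doubles with every failed attempt, up to the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Picking a random point below the ceiling keeps thousands of
        # clients from retrying in lockstep after the same failure.
        yield rng() * ceiling
```

A client would sleep for each yielded delay between connection attempts and start over from the first delay once a connection succeeds.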


Patrick

On 06/30/2010 11:13 PM, Travis Crawford wrote:

Hey zookeepers -

We just experienced a total zookeeper outage, and here's a quick
post-mortem of the issue, and some questions about preventing it going
forward. Quick overview of the setup:

- RHEL5 2.6.18 kernel
- Zookeeper 3.3.0
- ulimit raised to 65k files
- 3 cluster members
- 4-5k connections in steady-state
- Primarily C and python clients, plus some java

In chronological order, the issue manifested itself as an alert about RW
tests failing. Logs were full of "too many files" errors, and the output
of netstat showed lots of CLOSE_WAIT and SYN_RECV sockets. CPU was
100%. Application logs showed lots of connection timeouts. This
suggests an event happened that caused applications to dogpile on
Zookeeper, and eventually the CLOSE_WAIT timeout caused file handles
to run out and basically game over.

I looked through lots of logs (clients+servers) and did not see a
clear indication of what happened. Graphs show a sudden decrease in
network traffic when the outage began, zookeeper goes cpu bound, and
runs out of file descriptors.

Clients are primarily a couple thousand C clients using default
connection parameters, and a couple thousand python clients using
default connection parameters.

Digging through Jira we see two issues that probably contributed to this outage:

 https://issues.apache.org/jira/browse/ZOOKEEPER-662
 https://issues.apache.org/jira/browse/ZOOKEEPER-517

Both are tagged for the 3.4.0 release. Anyone know if that's still the
case, and when 3.4.0 is roughly scheduled to ship?

Thanks!
Travis


Re: Guaranteed message delivery until session timeout?

2010-06-30 Thread Patrick Hunt

On 06/30/2010 09:37 AM, Ted Dunning wrote:

Which API are you talking about?  C?

I think that the difference between connection loss and session expiration
might mess you up slightly in your disjunction here.

On Wed, Jun 30, 2010 at 7:45 AM, Bryan Thompson br...@systap.com wrote:


I am wondering what guarantees (if any) zookeeper provides for reliable
messaging for operation return codes up to a session timeout.  Basically, I


in particular see timeliness 
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkGuarantees



would like to know whether a zookeeper client can rely on observing the
return code for a successful operation which creates an ephemeral (or
ephemeral sequential) znode -or- have a guarantee that its session was timed
out and the ephemeral znode destroyed.  That is, does zookeeper provide


Any ephemeral node(s) associated with a session will be deleted when the 
session is invalidated (session expiration or client close request).


Patrick


Re: Receive timed out error while starting zookeeper server

2010-06-27 Thread Patrick Hunt


On 06/26/2010 06:53 AM, Peeyush Kumar wrote:

 I have a 6 node cluster (5 slaves and 1 master). I am trying to


You typically want an odd number given that zk works by majority (even 
is fine, but not optimal). So 5 would be great (7 is a bit of overkill). 
3 is fine too, but 5 allows for you to take 1 server down for scheduled 
maintenance and still experience an unexpected failure w/o impact to 
service availability.
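
The majority arithmetic behind this advice, as a small sketch (function names are illustrative):

```python
def quorum_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many servers can be down while a majority is still up."""
    return n - quorum_size(n)

# Ensemble size -> number of failures it can survive.
table = {n: tolerated_failures(n) for n in (3, 5, 6, 7)}
```

A 6 server ensemble tolerates the same two failures as a 5 server one, which is why the even size buys nothing extra.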


In your exception I see DatagramSocket; this is unusual. What ZK version
are you running? As Lei suggested, please include your config file
so that we can review that as well. (If you are overriding electionAlg
this might be part of the problem. Current versions of ZK servers use
tcp for connections by default; that's why this is unusual.)


Most likely there is either a config problem or perhaps you have a 
firewall that's blocking communication btw the servers? Try verifying 
server to server connectivity on the ports you've selected.


Patrick


start the zookeeper server on the cluster. When I issue this command:
$ java -cp zookeeper.jar:lib/log4j-1.2.15.jar:conf \
org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
I get the following error:
2010-06-26 18:09:17,468 - INFO  [main:quorumpeercon...@80] - Reading configuration from: conf/zoo.cfg
2010-06-26 18:09:17,483 - INFO  [main:quorumpeercon...@232] - Defaulting to majority quorums
2010-06-26 18:09:17,545 - INFO  [main:quorumpeerm...@118] - Starting quorum peer
2010-06-26 18:09:17,585 - INFO  [QuorumPeer:/0.0.0.0:2179:quorump...@514] - LOOKING
2010-06-26 18:09:17,589 - INFO  [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: master.cf.net/192.168.1.1:2180
2010-06-26 18:09:17,589 - INFO  [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave01.cf.net/192.168.1.2:2180
2010-06-26 18:09:17,792 - WARN  [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
 at java.net.PlainDatagramSocketImpl.receive0(Native Method)
 at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
 at java.net.DatagramSocket.receive(DatagramSocket.java:725)
 at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:17,794 - INFO  [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave02.cf.net/192.168.1.3:2180
2010-06-26 18:09:17,995 - WARN  [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
 at java.net.PlainDatagramSocketImpl.receive0(Native Method)
 at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
 at java.net.DatagramSocket.receive(DatagramSocket.java:725)
 at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:17,996 - INFO  [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave03.cf.net/192.168.1.4:2180
2010-06-26 18:09:18,197 - WARN  [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
 at java.net.PlainDatagramSocketImpl.receive0(Native Method)
 at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
 at java.net.DatagramSocket.receive(DatagramSocket.java:725)
 at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:18,200 - INFO  [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave04.cf.net/192.168.1.5:2180
2010-06-26 18:09:18,401 - WARN  [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
 at java.net.PlainDatagramSocketImpl.receive0(Native Method)
 at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
 at java.net.DatagramSocket.receive(DatagramSocket.java:725)
 at org.apache.zookeeper.server.quorum.LeaderElection.lookForLeader(LeaderElection.java:170)
 at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
2010-06-26 18:09:18,402 - INFO  [QuorumPeer:/0.0.0.0:2179:leaderelect...@154] - Server address: slave05.cf.net/192.168.1.6:2180
2010-06-26 18:09:18,604 - WARN  [QuorumPeer:/0.0.0.0:2179:leaderelect...@194] - Ignoring exception while looking for leader
java.net.SocketTimeoutException: Receive timed out
 at java.net.PlainDatagramSocketImpl.receive0(Native Method)
 at

Re: Reply: Starting zookeeper in replicated mode

2010-06-22 Thread Patrick Hunt

There are 3 ports that need to be opened

1) the client port (btw client and servers)
2/3) the quorum and election ports - only btw servers

You are setting these three ports in your config file (clientport 
defaults to 2181 iirc, unless you override)
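
As a rough illustration of the three ports, here is a minimal Python sketch that reads the ports out of zoo.cfg text and prints one iptables rule per port. The helper names (zk_ports, iptables_rules) are made up for this example, the rules assume TCP and accept traffic from any source, and a real firewall setup would normally restrict the quorum and election ports to the other servers.

```python
import re

# Config text taken from the zoo.cfg posted in this thread.
SAMPLE_CFG = """
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/zookeeper
clientPort=2181
server.1=master1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
"""


def zk_ports(cfg_text):
    """Return (client_port, quorum_ports, election_ports) found in zoo.cfg text."""
    client_port = None
    quorum, election = set(), set()
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "clientPort":
            client_port = int(value)
        elif re.fullmatch(r"server\.\d+", key):
            # Expected form: server.N=host:quorumPort:electionPort
            parts = value.split(":")
            if len(parts) >= 3:
                quorum.add(int(parts[1]))
                election.add(int(parts[2]))
    return client_port, sorted(quorum), sorted(election)


def iptables_rules(cfg_text):
    """Build one ACCEPT rule per port (hypothetical helper, TCP assumed)."""
    client, quorum, election = zk_ports(cfg_text)
    ports = ([client] if client is not None else []) + quorum + election
    return ["iptables -A INPUT -p tcp --dport %d -j ACCEPT" % p for p in ports]


for rule in iptables_rules(SAMPLE_CFG):
    print(rule)
```

The generated rules are only a starting point to review by hand before applying them on each node.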


Patrick

On 06/22/2010 06:17 AM, Erik Test wrote:

Thanks for your help. The missing file issue is resolved.

I was confused by how to start zookeeper because a firewall is blocking
connections between nodes. The odd thing is hadoop can run on its own with
the configured iptables but doesn't work with zookeeper for some reason. The
problem here is I can't turn off the firewall and need to configure the
firewall so that zookeeper can work correctly.

I'm going to work on the iptables to open connections needed by zookeeper.
If any one knows of a way to do this or even just a link to configuring an
iptable with zookeeper in mind, I'd appreciate it.

Thanks again for the help.
Erik


On 21 June 2010 20:56, Joe Zou j...@hz.webex.com wrote:


Hi:
You are missing the myid file. The relevant line in your log is:
Caused by: java.lang.IllegalArgumentException: /var/zookeeper/myid file
is missing
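
For illustration, a small Python sketch of the check that fails here: in replicated mode each server needs a myid file in its dataDir containing the number N from its own server.N line. The function name check_myid and the messages are invented for this example and are not ZooKeeper's own code.

```python
import os
import re
import tempfile

# server.N lines as posted in the original zoo.cfg.
CFG = """
dataDir=/var/zookeeper
server.1=master1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
"""


def check_myid(cfg_text, data_dir):
    """Return (ok, message) for the myid file expected in data_dir."""
    server_ids = set()
    for raw in cfg_text.splitlines():
        match = re.match(r"\s*server\.(\d+)\s*=", raw)
        if match:
            server_ids.add(int(match.group(1)))
    path = os.path.join(data_dir, "myid")
    if not os.path.isfile(path):
        return False, "%s file is missing" % path
    with open(path) as handle:
        text = handle.read().strip()
    if not text.isdigit():
        return False, "myid does not contain a number"
    if int(text) not in server_ids:
        return False, "no server.%s line in the config" % text
    return True, "ok"


# Demo in a scratch directory standing in for dataDir.
scratch = tempfile.mkdtemp()
print(check_myid(CFG, scratch))          # file not created yet
with open(os.path.join(scratch, "myid"), "w") as handle:
    handle.write("1\n")                  # this node is server.1
print(check_myid(CFG, scratch))
```

On a real node the equivalent fix is to write the server's own number into the myid file under dataDir, a different number on each server.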
thanks
Joe Zou
-Original Message-
From: Erik Test [mailto:erik.shi...@gmail.com]
Sent: Tuesday, June 22, 2010 3:05 AM
To: zookeeper-user@hadoop.apache.org
Subject: Starting zookeeper in replicated mode

Hi All,

I'm having a problem with installing zookeeper on a cluster with 6 nodes in
replicated mode. I was able to install and run zookeeper in standalone mode
but I'm unable to run zookeeper in replicated mode.

I've added a list of servers in zoo.cfg as suggested by the ZooKeeper
Getting Started Guide but I get these logs displayed to screen:

*[r...@master1 bin]# ./zkServer.sh start
JMX enabled by default
Using config: /root/zookeeper-3.2.2/bin/../conf/zoo.cfg
Starting zookeeper ...
STARTED
[r...@master1 bin]# 2010-06-21 12:25:23,738 - INFO
[main:quorumpeercon...@80] - Reading configuration from:
/root/zookeeper-3.2.2/bin/../conf/zoo.cfg
2010-06-21 12:25:23,743 - INFO  [main:quorumpeercon...@232] - Defaulting
to
majority quorums
2010-06-21 12:25:23,745 - FATAL [main:quorumpeerm...@82] - Invalid config,
exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error
processing /root/zookeeper-3.2.2/bin/../conf/zoo.cfg
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:100)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:98)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:75)
Caused by: java.lang.IllegalArgumentException: /var/zookeeper/myid file is
missing
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:238)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:96)
... 2 more
Invalid config, exiting abnormally*

And here is my config file:
*
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=5
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=2
# the directory where the snapshot is stored.
dataDir=/var/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=master1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
*
I'm a little confused as to why this doesn't work and I haven't had any
luck finding answers to some questions I have.

Am I supposed to have an instance of ZooKeeper on each node started before
running in replication mode? Should I have each node that will be running
ZK listed in the config file? Should I be using an IP address to point to a
server instead of a hostname?

Thanks for your time.
Erik





Re: Free Software Solution to continuously load a large number of feeds with several servers?

2010-06-18 Thread Patrick Hunt
I've seen a number of these built as proprietary solutions using 
ZooKeeper. It would be great to see something open sourced. HBase/ZK 
seems like a good fit. You might also consider ZooKeeper/BookKeeper.


Patrick

On 06/18/2010 11:01 AM, Thomas Koch wrote:

http://stackoverflow.com/questions/3072042/free-software-solution-to-continuously-load-a-large-number-of-feeds-with-several

I need a system that schedules and conducts the loading of a large number of
Feeds. The scheduling should consider priority values for feeds provided by me
and the history of past publish frequency of the feed. Later the system should
make use of pubsub where available.
Currently I'm planning to implement my own system based on HBase and
ZooKeeper. If there isn't any free software solution by now, then I'd propose
at work to develop our solution as Free Software.

Thank you for any hints,

Thomas Koch, http://www.koch.ro


Re: zookeeper crash

2010-06-16 Thread Patrick Hunt
We are unable to reproduce this issue. If you can provide the server 
logs (all servers) and attach them to the jira it would be very helpful. 
Some detail on the approx time of the issue so we can correlate to the 
logs would help too (summary of what you did/do to cause it, etc... 
anything that might help us nail this one down).


https://issues.apache.org/jira/browse/ZOOKEEPER-335

Some detail on ZK version, OS, Java version, HW info, etc... would also 
be of use to us.


Patrick

On 06/16/2010 02:49 PM, Vishal K wrote:

Hi,

We are running into this bug very often (almost 60-75% hit rate) while
testing our newly developed application over ZK. This is almost a blocker
for us. Will the fix be simplified if backward compatibility was not an
issue?

Considering that this bug is rarely reported, I am wondering why we are
running into this problem so often. Also, on a side note, I am curious why
the systest that comes with ZooKeeper did not detect this bug. Can anyone
please give an overview of the problem?

Thanks.
-Vishal


On Wed, Jun 2, 2010 at 8:17 PM, Charity Majors char...@shopkick.com wrote:


Sure thing.

We got paged this morning because backend services were not able to write
to the database.  Each server discovers the DB master using zookeeper, so
when zookeeper goes down, they assume they no longer know who the DB master
is and stop working.

When we realized there were no problems with the database, we logged in to
the zookeeper nodes.  We weren't able to connect to zookeeper using zkCli.sh
from any of the three nodes, so we decided to restart them all, starting
with node one.  However, after restarting node one, the cluster started
responding normally again.

(The timestamps on the zookeeper processes on nodes two and three *are*
dated today, but none of us restarted them.  We checked shell histories and
sudo logs, and they seem to back us up.)

We tried getting node one to come back up and join the cluster, but that's
when we realized we weren't getting any logs, because log4j.properties was
in the wrong location.  Sorry -- I REALLY wish I had those logs for you.  We
put log4j back in place, and that's when we saw the spew I pasted in my
first message.

I'll tack this on to ZK-335.



On Jun 2, 2010, at 4:17 PM, Benjamin Reed wrote:


charity, do you mind going through your scenario again to give a
timeline for the failure? i'm a bit confused as to what happened.

ben

On 06/02/2010 01:32 PM, Charity Majors wrote:

Thanks.  That worked for me.  I'm a little confused about why it threw
the entire cluster into an unusable state, though.

I said before that we restarted all three nodes, but tracing back, we
actually didn't.  The zookeeper cluster was refusing all connections until
we restarted node one.  But once node one had been dropped from the cluster,
the other two nodes formed a quorum and started responding to queries on
their own.

Is that expected as well?  I didn't see it in ZOOKEEPER-335, so thought
I'd mention it.




On Jun 2, 2010, at 11:49 AM, Patrick Hunt wrote:



Hi Charity, unfortunately this is a known issue not specific to 3.3 that
we are working to address. See this thread for some background:

http://zookeeper-user.578899.n2.nabble.com/odd-error-message-td4933761.html

I've raised the JIRA level to blocker to ensure we address this asap.

As Ted suggested you can remove the datadir -- only on the affected
server -- and then restart it. That should resolve the issue (the server
will d/l a snapshot of the current db from the leader).

Patrick

On 06/02/2010 11:11 AM, Charity Majors wrote:


I upgraded my zookeeper cluster last week from 3.2.1 to 3.3.1, in an
attempt to get away from a client bug that was crashing my backend services.

Unfortunately, this morning I had a server crash, and it brought down
my entire cluster.  I don't have the logs leading up to the crash, because
-- argghffbuggle -- log4j wasn't set up correctly.  But I restarted all
three nodes, and nodes two and three came back up and formed a quorum.


Node one, meanwhile, does this:

2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@620] - LOOKING
2010-06-02 17:04:56,446 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:files...@82] - Reading snapshot /services/zookeeper/data/zookeeper/version-2/snapshot.a0045
2010-06-02 17:04:56,476 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@649] - New election. My id =  1, Proposed zxid = 47244640287
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@689] - Notification: 1, 47244640287, 4, 1, LOOKING, LOOKING, 1
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 38654707048, 3, 1, LOOKING, LEADING, 3
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@799] - Notification: 3, 38654707048, 3, 1, LOOKING, FOLLOWING, 2
2010-06-02 17:04:56,486 - INFO  [QuorumPeer:/0:0:0:0:0:0:0
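
The large numbers in the election log above are easier to read once split into their two halves. The sketch below assumes the usual zxid layout (high 32 bits are the epoch, low 32 bits a counter within that epoch) and assumes that 38654707048 in the notifications is also a zxid, which the log does not state explicitly. The function name split_zxid is made up for this example.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter): high and low 32 bits."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# Values copied from the election log above.
for label, zxid in (("node one proposal", 47244640287),
                    ("value in the other notifications", 38654707048)):
    epoch, counter = split_zxid(zxid)
    print("%s: zxid=%d epoch=%d counter=%d" % (label, zxid, epoch, counter))
```

Under those assumptions node one is proposing epoch 11 while the other value decodes to epoch 9, which may be a useful detail to attach to ZOOKEEPER-335 along with the logs.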

Re: Debugging help for SessionExpiredException

2010-06-15 Thread Patrick Hunt
I'm not very experienced personally with running zk on ec2 smalls, Ted 
usually has the ec2 related insight. Given these boxes are not loaded or 
lightly loaded, and you've ruled out gc/swap, the only thing I can think 
of is that something is going on under the covers at the vm level that's 
causing the high latency you're seeing.


You're seeing 15 _minutes_ max latency. I can't think of what would 
cause that inside zk. Any chance that the VM is shutting down or 
freezing during that period? I don't know. Are you monitoring that 
system from a second system? Perhaps that might shed some light (monitor 
the cpu/disk activity using some monitoring tool like ganglia, nagios, 
etc... or even more primitive, perhaps doing a ping to that system and 
tracking the round trip time/packet loss, dump to a file and review the 
next day, etc...)
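
A primitive version of that monitoring idea, sketched in Python: time a TCP connect to the box from a second machine and append the result to a file. This uses a TCP connect rather than an ICMP ping, and the names probe, log_probe and monitor are invented for the example, so treat it as a starting point only.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


def log_probe(path, host, port, timeout=2.0):
    """Run one probe and append a timestamped line to the file at path."""
    rtt = probe(host, port, timeout)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    status = "%.1fms" % rtt if rtt is not None else "FAILED"
    with open(path, "a") as handle:
        handle.write("%s %s:%d %s\n" % (stamp, host, port, status))
    return rtt


def monitor(path, host, port, samples, interval_s=10.0):
    """Take a fixed number of samples, sleeping interval_s between them."""
    for _ in range(samples):
        log_probe(path, host, port)
        time.sleep(interval_s)
```

Something like monitor("zk-rtt.log", "zk-host", 2181, samples=8640) would cover roughly a day at one sample every 10 seconds; the host name and file name here are placeholders. Gaps or FAILED runs that line up with the session expirations would point at the VM or network rather than at zk.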


Patrick

On 06/15/2010 03:59 PM, Jordan Zimmerman wrote:

They're small instances. The thing is that these machines are doing
next to no work. We're just running simple little tests. The session
expiration has not happened while I've been watching. It tends to
happen over night.

-JZ

On Jun 15, 2010, at 1:50 PM, Ted Dunning wrote:


As usual, the ZK team provides the best feedback.

I would be bold enough to ask what kind of ec2 instances you are
running on.  Small instances are small chunks of larger machines
and are sometimes subject to competition for resources from the
other tenants.

On Tue, Jun 15, 2010 at 12:30 PM, Patrick Hunt ph...@apache.org wrote:

3) under-provisioned virtual machines (ie vmware)

...

Given that you've ruled out the gc (most common), disk utilization
would be the next thing to check.






Re: zookeeper watch triggered multiple times on same event

2010-06-15 Thread Patrick Hunt
I don't think this should be possible (if it happens it's a bug in zk). 
Perhaps, for some reason, there really are 2 change actions (children 
created, or the same child created twice) and not just one?


Re-registering the watch inside the watch is fine. The server sends 
watch notifications as one way messages, when it notices a znode child 
list has changed it fires off change messages to all the registered 
clients. The client then receives the notification and calls the handler.
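
To make the expected behaviour concrete, here is a toy in-memory model in Python of one-shot child watches, with the handler re-registering itself. FakeZNode is a made-up stand-in and not the ZooKeeper client API; it only mimics the rule that a watch fires once per registration.

```python
class FakeZNode:
    """Toy parent znode with one-shot child watches (not the real client API)."""

    def __init__(self, path):
        self.path = path
        self.children = []
        self._watchers = []

    def get_children(self, watcher=None):
        # Registering the same watcher twice has no extra effect.
        if watcher is not None and watcher not in self._watchers:
            self._watchers.append(watcher)
        return list(self.children)

    def create_child(self, name):
        self.children.append(name)
        # One-shot semantics: clear the registrations, then notify.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher(self.path)


events = []
node = FakeZNode("/a")


def on_children_changed(path):
    # Re-register inside the handler, as described above.
    events.append(node.get_children(watcher=on_children_changed))


node.create_child("client1-node")                 # step 1
node.get_children(watcher=on_children_changed)    # step 2
node.create_child("client2-node")                 # step 3
print(len(events), events)
```

In this model one child creation produces exactly one callback, so a second callback would need a second change to the child list.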


Patrick

On 06/15/2010 05:47 PM, Jun Rao wrote:

Hi,

I have a quick question on ZK 3.2.2. Here is a sequence of events during a
test:
1. client 1 creates an ephemeral node under /a
2. client 1 sets a watch using getChildren on /a
3. client 2 creates an ephemeral node under /a
4. client 1's watch gets triggered (a node change event). Inside the watch,
client 1 does getChildren on /a and sets the watch.
5. client 1's watch gets triggered again (a node change event)

My question is why the same node change event gets triggered twice. It seems
that step 5 shouldn't have happened.

Thanks,

Jun



Re: Debugging help for SessionExpiredException

2010-06-11 Thread Patrick Hunt
Session expiration is due to the server not hearing heartbeats from the 
client. So either the client is partitioned from the server, or the 
client is not sending heartbeats for some reason; typically this is due 
to the client JVM gc'ing or swapping.


Patrick

On 06/10/2010 04:14 PM, Ted Dunning wrote:

Uh the options I was recommending were for your CLIENT.  You should have
similar settings on ZK, but it is your client that is likely to be pausing.

On Thu, Jun 10, 2010 at 4:08 PM, Jordan Zimmerman jzimmer...@proofpoint.com
wrote:



The thing is, this is a test instance (on AWS/EC2) that isn't getting a lot
of traffic. i.e. 1 zookeeper instance that we're testing with.

On Jun 10, 2010, at 4:06 PM, Ted Dunning wrote:


Possibly.

I have seen GC times of 4 minutes on some large processes.  Better to set
the GC parameters so you don't get long pauses.

On http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting it mentions using
the -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC options.  I recommend
adding

-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:+DisableExplicitGC

You may want to tune the actual parameters of the GC itself.  These should
not be used in general, but might be helpful for certain kinds of servers:


-XX:MaxTenuringThreshold=6
-XX:SurvivorRatio=6
-XX:CMSInitiatingOccupancyFraction=60
-XX:+UseCMSInitiatingOccupancyOnly

Finally, you should always add options for lots of GC diagnostics:

-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution

On Thu, Jun 10, 2010 at 3:49 PM, Jordan Zimmerman jzimmer...@proofpoint.com
wrote:



If I set my session timeout very high (1 minute) this shouldn't happen,
right?








Re: Debugging help for SessionExpiredException

2010-06-09 Thread Patrick Hunt
100mb partition? sounds like virtualization. resource starvation 
(worse in virtualized env) is a common cause of this. Are your clients 
gcing/swapping at all? If a client gc's for long periods of time the 
heartbeat thread won't be able to run and the server will expire the 
session. There is a min/max cap that the server places on the client 
timeouts (it's negotiated), check the client log for detail on what 
timeout it negotiated (logged in 3.3 releases)
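
A quick way to see what the negotiation can do to a requested timeout, sketched in Python. It assumes the default bounds of 2 x tickTime and 20 x tickTime on the server side (adjustable in 3.3 via minSessionTimeout and maxSessionTimeout); the function name negotiated_timeout is made up for the example.

```python
def negotiated_timeout(requested_ms, tick_ms=2000, min_ms=None, max_ms=None):
    """Clamp the client's requested session timeout to the server's bounds.

    Assumed defaults: min = 2 * tickTime, max = 20 * tickTime.
    """
    low = 2 * tick_ms if min_ms is None else min_ms
    high = 20 * tick_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))


for requested in (1000, 30000, 60000):
    print(requested, "->", negotiated_timeout(requested))
```

With tickTime=2000 a requested 1 minute timeout would come back as 40 seconds under these assumptions, so it is worth checking the negotiated value in the client log rather than relying on the requested one.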


take a look at this and see if you can make progress:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

My guess is that your client is gcing for long periods of time - you can 
rule this in/out by turning on gc logging in your clients and then 
viewing the results after another such incident happens (try gchisto for 
graphical view)


Patrick

On 06/09/2010 11:36 AM, Jordan Zimmerman wrote:

We have a test system using Zookeeper. There is a single Zookeeper
server node and 4 clients. There is very little activity in this
system. After a day's testing we start to see SessionExpiredException
on the client. Things I've tried:

* Increasing the session timeout to 1 minute
* Making sure all JVMs are running in a 100MB partition

Any help debugging this problem would be appreciated. What kind of
diagnostics can I add? Are there more config parameters that I
should try?

-JZ


Re: Debugging help for SessionExpiredException

2010-06-09 Thread Patrick Hunt


On 06/09/2010 03:35 PM, Lei Zhang wrote:


We've consistently run into issues with vmware workstation (CentOS as guest
OS) on Windows host: just by leaving the cluster idle over night leads to zk
session expire issue. My theory is: windows may have gone to hibernation,
the zk heartbeat logic hibernates, session expire exception is thrown the
moment windows is taken out of hibernation.



That sounds like a possible scenario.


On EC2 (still CentOS as guest OS), we consistently run into zk session
expire issue when our cluster is under heavy load. I am planning to raise
scheduling priority of zk server, but haven't done testing.



Before you take any action you might examine a few things to identify 
what's biting you:


this has some good general detail on issues other users have seen:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

In particular you might look at GC/swapping on your clients, that's the 
most common case we see for session expiration (apart from the obvious 
-- network level connectivity failures). In one case I remember there 
was very heavy network load for a period of time once per day, this was 
causing some issue on the switches which would result in occasional 
session expiration, but only during this short window. This was pretty 
hard to track down. Are you monitoring network connectivity in general? 
Is it possible that temporary network outages are causing this? Perhaps 
take a look at both your server and client ZK logs, see if the client is 
seeing anything other than the session expiration (is the client seeing 
session TIMED OUT for example, this happens when the client doesn't hear 
back from the server, while session expiration happens because the 
server doesn't hear from the client).


Good luck,

Patrick


Re: Simulating failures?

2010-06-04 Thread Patrick Hunt

Here's how to test session expiration (haven't tried this in a while):
http://github.com/phunt/zkexamples

It would be great to have some test 
infrastructure/examples/docs/strategies available for developers (zk 
client users). If someone would be interested in working on/contributing this 
we'd be pretty psyched to work with you on it.


Patrick

On 06/04/2010 11:28 AM, Stephen Green wrote:

Now that I've got things working pretty smoothly with my ZooKeeper
setup in normal operation, I'd like to test some of the recovery stuff
that I've put into my application.

I'd like to make sure that if a connection to ZK fails, then my
application recovers appropriately (possibly by giving up).  Obviously
I could do some of this by shutting off the server and restarting it,
but I'd like to be a bit more systematic, if possible.

Is there any way to inject failures into the ZK client so that I can
test without having to randomly kill servers/clients?

Thanks,

Steve


Re: Locking and Partial Failure

2010-05-31 Thread Patrick Hunt
Hi Charles, any luck with this? Re the issues you found with the recipes 
please enter a JIRA, it would be good to address the problem(s) you found.

https://issues.apache.org/jira/browse/ZOOKEEPER

re use of session/thread id, might you use some sort of unique token 
that's dynamically assigned to the thread making a request on the shared 
session? The calling code could then be identified by that token in 
recovery cases.
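
A rough Python sketch of the token idea: each lock attempt gets its own random token, the token goes into the node name, and after a connection loss the caller scans the children for that token. The name layout "child-lock-<token>-<sequence>" and the helper names are assumptions made for this example, not what the shipped recipe does.

```python
import uuid


def new_token():
    """One random token per lock attempt, independent of session or thread id."""
    return uuid.uuid4().hex


def lock_prefix(token):
    """Name prefix to pass to create(); the server appends the sequence number."""
    return "child-lock-%s-" % token


def find_own_node(children, token):
    """After a connection loss, look for a child created with this token."""
    prefix = lock_prefix(token)
    matches = [name for name in children if name.startswith(prefix)]
    return matches[0] if matches else None


token = new_token()
# Pretend these names came back from getChildren() after reconnecting.
children = [
    "child-lock-" + "0" * 32 + "-0000000001",
    lock_prefix(token) + "0000000002",
]
print(find_own_node(children, token))
```

If the lookup finds a node, the create() did succeed and the caller can carry on from step (2) of the recipe; if not, it is safe to create again.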


Patrick

On 05/28/2010 08:28 AM, Charles Gordon wrote:

Hello,

I am new to using Zookeeper and I have a quick question about the locking
recipe that can be found here:

http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#sc_recipes_Locks

It appears to me that there is a flaw in this algorithm related to partial
failure, and I am curious to know how to fix it.

The algorithm follows these steps:

  1. Call create() with a pathname like /some/path/to/parent/child-lock-.
  2. Call getChildren() on the lock node without the watch flag set.
  3. If the path created in step (1) has the lowest sequence number, you are
the master (skip the next steps).
  4. Otherwise, call exists() with the watch flag set on the child with the
next lowest sequence number.
  5. If exists() returns false, go to step (2), otherwise wait for a
notification from the path, then go to step (2).
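
Steps (2) to (4) above can be sketched as a pure function over the child list, shown here in Python. It assumes node names end in "-" followed by the sequence number, and it orders children by that number only; the names sequence_of and next_step are made up for the example.

```python
def sequence_of(name):
    """Sequence number appended by the server, e.g. child-lock-0000000007 -> 7."""
    return int(name.rsplit("-", 1)[1])


def next_step(my_node, children):
    """Decide what to do after getChildren().

    Returns ("acquired", None) if my_node has the lowest sequence number,
    otherwise ("watch", predecessor) naming the node to call exists() on.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(my_node)
    if index == 0:
        return ("acquired", None)
    return ("watch", ordered[index - 1])


kids = ["child-lock-0000000003", "child-lock-0000000001", "child-lock-0000000002"]
print(next_step("child-lock-0000000001", kids))
print(next_step("child-lock-0000000003", kids))
```

The function says nothing about the partial failure in step (1); it only shows the ordering and watch selection.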

The scenario that seems to be faulty is a partial failure in step (1).
Assume that my client program follows step (1) and calls create(). Assume
that the call succeeds on the Zookeeper server, but there is a
ConnectionLoss event right as the server sends the response (e.g., a network
partition, some dropped packets, the ZK server goes down, etc). Assume
further that the client immediately reconnects, so the session is not timed
out. At this point there is a child node that was created by my client, but
that my client does not know about (since it never received the response).
Since my client doesn't know about the child, it won't know to watch the
previous child to it, and it also won't know to delete it. That means all
clients using that lock will fail to make progress as soon as the orphaned
child is the lowest sequence number. This state will continue until my
client closes its session (which may be a while if I have a long lived
session, as I would like to have). Correctness is maintained here, but
liveness is not.

The only good solution I have found for this problem is to establish a new
session with Zookeeper before acquiring a lock, and to close that session
immediately upon any connection loss in step (1). If everything works, the
session could be re-used, but you'd need to guarantee that the session was
closed if there was a failure during creation of the child node. Are there
other good solutions?

I looked at the sample code that comes with the Zookeeper distribution (I'm
using 3.2.2 right now), and it uses the current session ID as part of the
child node name. Then, if there is a failure during creation, it tries to
look up the child using that session ID. This isn't really helpful in the
environment I'm using, where a single session could be shared by multiple
threads, any of which could request a lock (so I can't uniquely identify a
lock by session ID). I could use thread ID, but then I run the risk of a
thread being reused and getting the wrong lock. In any case, there is also
the risk that a second failure prevents me from looking up the lock after a
connection loss, so I'm right back to an orphaned lock child, as above. I
could, presumably, be careful enough with try/catch logic to prevent even
that case, but it makes for pretty bug-prone code. Also, as a side note,
that code appears to be sorting the child nodes by the session ID first,
then the sequence number, which could cause locks to be ordered incorrectly.

Thanks for any help you can provide!

Charles Gordon



Re: Securing ZooKeeper connections

2010-05-27 Thread Patrick Hunt


On 05/27/2010 09:47 AM, Benjamin Reed wrote:

actually pat hunt took over that issue: ZOOKEEPER-733. pat has made a
lot of progress and the patch looks close to being ready.


This is just the server side though, still need to make similar changes 
on the client. That will likely be a separate jira. But yes, it's coming 
along.




ps - actually, to be clear the patch adds netty support. the idea is
that once we have netty in and netty supports SSL quite transparently,
it should be easy to get SSL in.


SSL/netty part seems pretty simple, however there's also the key mgmt 
portion which looks more complicated (need to integrate not quite 
commons ssl or something like that, haven't gotten that far yet)




On 05/26/2010 04:44 PM, Mahadev Konar wrote:

Hi Vishal,
Ben (Benjamin Reed) has been working on a netty based client server
protocol in ZooKeeper. I think there is an open jira for it. My network
connection is pretty slow so am finding it hard to search for it.

We have been thinking about enabling secure connections via these netty
based connections in zookeeper.

Thanks
mahadev


On 5/25/10 12:20 PM, Vishal K vishalm...@gmail.com wrote:


Hi All,

Since ZooKeeper does not support secure network connections yet, I thought I
would poll and see what people are doing to address this problem. Is anyone
running ZooKeeper over secure channels (client-server and server-server
authentication/encryption)? If yes, can you please elaborate how you do it?

Thanks.

Regards,
-Vishal




Re: Securing ZooKeeper connections

2010-05-27 Thread Patrick Hunt
Short of someone else stepping up I have it on my todo list. ;-) Still 
quite a bit of work to do on 733 though getting it back into shape. (not 
to mention layering the ssl on top). Then there's also the server-server 
connectivity that also needs to have netty support added 
(quorum/election port I mean, 733 only adds netty to the server side 
client port).


Set a watch on 733 and subscribe to the dev list if you want to follow 
along.


Patrick

On 05/27/2010 10:46 AM, Gustavo Niemeyer wrote:

actually pat hunt took over that issue: ZOOKEEPER-733. pat has made a
lot of progress and the patch looks close to being ready.


This is just the server side though, still need to make similar changes on
the client. That will likely be a separate jira. But yes, it's coming along.


Oh, that's great news Patrick.  Thanks for pushing this forward!

Do you think the client side might see some attention soon as well?
Or, in other words, do you plan to shift over to the client side once
you're done with the server?



Re: Question about concurrent primitives library

2010-05-26 Thread Patrick Hunt
Hi, this was originally proposed as a google summer of code project, the 
slots for gsoc have already been given out, this was not one of the 
projects chosen by apache. So you could still work on this if you like, 
but not under the gsoc umbrella. We (zk contributor community) would be 
happy to work with you. See the following for the recipes we currently 
ship with:

http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/recipes/

You might also check JIRA, such as:
https://issues.apache.org/jira/browse/ZOOKEEPER-767

There's still a lot of work to be done in this area. Such as:

* identifying and documenting what components the library might/should 
contain. See this for the current list: 
http://hadoop.apache.org/zookeeper/docs/current/recipes.html


* even the existing recipes could benefit, improved documentation for 
example.


* queues/locks are implemented in src/recipes, however the other recipes 
are not


* add python implementations in addition to c/java?

* not all recipes are black/white, but rather there are many variations 
to each. We could add these to the docs/implementation


there's probably a lot more that could be done that I haven't identified, 
this is fertile ground. Would be great if you were interested and would 
like to contribute. Feel free to create some jiras and contribute 
patches! I encourage you to move further discussion to the 
zookeeper-dev list, that's where we discuss futures and unreleased 
software.


Regards,

Patrick

On 05/25/2010 11:34 PM, Chia-Hung Lin wrote:

Hi

I read the page at
http://wiki.apache.org/hadoop/ZooKeeper/SoC2010Ideas saying there
would be a mentor for anyone working on those projects. I am
interested in `Concurrent Primitives Library' and would like to work
on this project. Is this project still available? Is there any procedure
required to apply in order to participate in this project?

Thanks, ChiaHung








Re: Ping and client session timeouts

2010-05-21 Thread Patrick Hunt

Hi Stephen, my comments inline below:

On 05/21/2010 09:31 AM, Stephen Green wrote:

I feel like I'm missing something fairly fundamental here.  I'm
building a clustered application that uses ZooKeeper (3.3.1) to store
its configuration information.  There are 33 nodes in the cluster
(Amazon EC2 instance, if that matters), and I'm currently using a
single ZooKeeper instance. When a node starts up, it makes a
connection to ZK, sets the data on a few paths and makes an ephemeral
node for itself. I keep the connection open while the node is running
so that I can use watches to find out if a node disappears, but after
the initial setup, the node usually won't write or read anything from
ZK.

My understanding (having had a quick look at the code) is that the
client connection will send a ping every sessionTimeout * 2/3 ms or so
to keep the session alive, but I keep seeing sessions dying.  On the


Actually the client sends a ping every 1/3 the timeout, and then looks 
for a response before another 1/3 elapses. This allows time to reconnect 
to a different server (and still maintain the session) if the current 
server were to become unavailable.
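
Put as arithmetic, a small Python sketch of the schedule described above. The thirds follow this description rather than the client source, and heartbeat_schedule is a made-up name.

```python
def heartbeat_schedule(session_timeout_ms):
    """Return (ping_every, reply_deadline, reconnect_budget) in milliseconds.

    ping_every:       idle time before the client sends a ping (1/3 timeout)
    reply_deadline:   silence after which the client gives up on this server
                      (2/3 timeout)
    reconnect_budget: time left to reach another server before the session
                      can expire (the remaining 1/3)
    """
    ping_every = session_timeout_ms // 3
    reply_deadline = 2 * session_timeout_ms // 3
    reconnect_budget = session_timeout_ms - reply_deadline
    return ping_every, reply_deadline, reconnect_budget


print(heartbeat_schedule(30000))
print(heartbeat_schedule(10000))
```

For a 30 second session that is a ping after 10 seconds of idle time, giving up on the current server after 20 seconds of silence, and 10 seconds left to reconnect elsewhere.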



client side I'll see something like the following sequence of events:

[05/21/10 15:59:40.753] INFO Initiating client connection,
connectString=zookeeper:2200 sessionTimeout=3
watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 15:59:40.767] INFO Socket connection established to
zookeeper/10.255.9.187:2200, initiating session
[05/21/10 15:59:40.787] INFO Session establishment complete on server
zookeeper/10.255.9.187:2200, sessionid = 0x128bb7b828d004c, negotiated
timeout = 3


Ok, this (^^^) says that the timeout is set to 30sec.


[05/21/10 16:13:03.729] INFO Client session timed out, have not heard
from server in 33766ms for sessionid 0x128bb7b828d004c, closing socket
connection and attempting reconnect
[05/21/10 16:13:19.268] INFO Initiating client connection,
connectString=zookeeper:2200 sessionTimeout=3
watcher=com.echonest.cluster.zoocontai...@1eb3319f
[05/21/10 16:14:12.326] INFO Client session timed out, have not heard
from server in 53058ms for sessionid 0x128bb7b828d004c, closing socket
connection and attempting reconnect


This (^^^) is very suspicious, in particular "have not heard from server 
in 53058ms". This means that the client heartbeat code didn't notice 
that the heartbeat deadline was exceeded for 53 seconds! This should never 
happen, the client does a select with a timeout of 1/3 the session 
timeout (10sec here).


The fact that the select is taking 43 addl seconds (53-10sec select 
timeout) tells me that your client jvm is not allowing the heartbeat 
thread to run. The most common reason for this is GC. Is your client 
application very memory intensive? Heavy on GC? You should turn on your 
GC logging and review the output after reproducing this issue (turning 
on CMS/incremental GC mode usually resolves this issue, but you should 
verify first). What we typically see here is that the client JVM is 
running GC for very long periods of time, this blocks all the threads, 
and as a result the heartbeat is not sent by the client!
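
The same subtraction can be done straight from the client log line. This Python sketch pulls the silence out of the "have not heard from server" message and subtracts the select timeout (1/3 of the session timeout); stall_ms is a made-up helper and the 30000ms session timeout is taken from the discussion above.

```python
import re

LINE = ("Client session timed out, have not heard from server in 53058ms "
        "for sessionid 0x128bb7b828d004c, closing socket connection and "
        "attempting reconnect")


def stall_ms(log_line, session_timeout_ms):
    """Estimate how long the heartbeat thread was kept from running."""
    match = re.search(r"have not heard from server in (\d+)ms", log_line)
    if match is None:
        return None
    silence = int(match.group(1))
    select_timeout = session_timeout_ms // 3   # 10sec for a 30sec session
    return silence - select_timeout


print(stall_ms(LINE, 30000))
```

Any result well above zero means the select returned late, which is the signature of a paused or starved client JVM rather than a slow server.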


As you are running in a virtualized environment this could also be a 
factor (it's def an issue from a GC perspective). But I suspect that gc 
is the issue here, look at that first.


See this page for some common issues users have faced in the past: 
http://bit.ly/5WwS44




If I'm reading this correctly, the connection gets set up and then the
server experiences an error trying to read from the client, so it
closes the connection.  It's not clear if this causes the session
timeout or vice-versa (these systems are both running ntp, but I doubt
that we can count on interleaving those log times correctly.)



Yes, this is due to the client not getting the heartbeat, so it will 
close the connection and attempt to reestablish.



I started out with a session timeout of 10,000ms, but as you can see,
I have the same problem at 30,000ms.



You may not need to use this after you resolve the GC issue.


Do I have a fundamental misunderstanding? What else should I do to
figure out what's going on here?


As I suggested above, give GC logging a try. I found 'gchisto' a very 
useful tool for reviewing the resulting log files.

http://sysadminsjourney.com/content/2008/09/15/profile-your-java-gc-performance-gchisto

Regards,

Patrick


Re: Ping and client session timeouts

2010-05-21 Thread Patrick Hunt


On 05/21/2010 11:32 AM, Stephen Green wrote:

Right.  The system can be very memory-intensive, but at the time these
are occurring, it's not under a really heavy load, and there's plenty
of heap available. However, while looking at a thread dump from one of
the nodes, I realized that a very poor decision meant that I had more
than 1200 threads running.  I expect this is more of a problem than
the GC at this point.  I'm taking steps to correct this problem now.

Lately, I've had fewer and fewer problems with GC.  In a former life,
I sat down the hall from the folks who wrote Hotspot's GC and they're
pretty sharp folks :-)


GC as a cause is very common, however had you mentioned 1200 threads I 
would have guessed that to be a potential issue. ;-)



Right.  I'd like to have as small a timeout as possible so that I
notice quickly when things disappear.  What's a reasonable minimum?  I
notice recommendations in other messages on the list that 2 is a
good value.



The setting you should use is typically determined by your SLA
requirements. How soon do you want ephemeral nodes to be cleaned up if a
client fails? Say you were doing leader election: this would gate
re-election in the case where the current leader failed. Set it lower
and you are more responsive (faster), but also more susceptible to
false positives (such as a temporary network glitch). Set it higher and
you ride over the network glitches, but it takes longer to recover when a
client really does go down.
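
One related detail when picking a small timeout: the server does not have to grant what the client asks for. To the best of my knowledge the 3.3 servers bound the negotiated timeout between minSessionTimeout and maxSessionTimeout, which default to 2x and 20x tickTime. Treat those defaults as an assumption and check the administrator's guide for your version. The function below is a hypothetical sketch of that clamping, not server code.

```python
def negotiated_timeout_ms(requested_ms, tick_time_ms=2000,
                          min_session_ms=None, max_session_ms=None):
    """Clamp a requested session timeout the way the server is assumed to.

    Assumption: when not configured, the bounds default to 2 * tickTime
    and 20 * tickTime.
    """
    low = 2 * tick_time_ms if min_session_ms is None else min_session_ms
    high = 20 * tick_time_ms if max_session_ms is None else max_session_ms
    return max(low, min(requested_ms, high))


if __name__ == "__main__":
    for requested in (1000, 10000, 30000, 120000):
        granted = negotiated_timeout_ms(requested)
        # The client heartbeat select uses 1/3 of the granted timeout.
        print(requested, "->", granted, "select every", granted // 3, "ms")
```

With tickTime=2000 this means asking for less than 4 seconds would not actually buy faster failure detection.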


In some cases (hbase, solr) we've seen that the timeout had to be set 
artificially high due to the limitations of the current JVM GC algos. 
For example some hbase users were seeing GC pause times of  4 minutes. 
So this raises the question - do you consider this a failure or not? (I 
could reboot the machine faster than it takes to run that GC...)


Good luck,

Patrick


Re: Concurrent reads and writes on BookKeeper

2010-05-20 Thread Patrick Hunt


On 05/20/2010 08:42 AM, Flavio Junqueira wrote:

We have such a mechanism already, as Utkarsh mentions in the jira. The
question is if we need more sophisticated mechanisms implemented, or if
we should leave to the application to implement it. For now, we haven't
felt the need for such extra mechanisms implemented along with BK, but
I'd certainly be happy to hear a different perspective.



Ok, I was just saying that we shouldn't be too strict about it (impls
available out of the box). Otherwise we run into situations similar to zk
recipes, where multiple users were re-implementing common patterns.



Having said that, we have interesting projects to get folks involved
with BK, but I don't have it clear that this is one of them.



It would be great if you could enter JIRAs on this (projects), perhaps 
also a wiki 'interesting projects around bk (or hedwig, etc...)' page 
that catalogs those JIRAs.


Thanks!

Patrick


-Flavio

On May 20, 2010, at 1:36 AM, Patrick Hunt wrote:



On 05/19/2010 01:23 PM, Flavio Junqueira wrote:

Hi Andre, To guarantee that two clients that read from a ledger will
read the same sequence of entries, we need to make sure that there is
agreement on the end of the sequence. A client is still able to read
from an open ledger, though. We have an open jira about informing
clients of the progress of an open ledger (ZOOKEEPER-462), but we
haven't reached agreement on it yet. Some folks think that it is best
that each application use the mechanism it finds best. One option is to
have the writer writing periodically to a ZooKeeper znode to inform of
its progress.


Hi Flavio. Seems like wrapping up a couple/few of these options in the
client library (or a client library) would be useful for users --
reuse rather than everyone reinvent. Similar to how we now provide
recipes in zk source base rather than everyone rewriting the basic
locks/queues... Would be a great project I would think for someone
interested in getting started with bk (and to some extent zk)
development.

Patrick



I would need to know more detail of your application before recommending
you to stick with BookKeeper or switch to ZooKeeper. If your workload is
dominated by writes, then BookKeeper might be a better option.

-Flavio

On May 19, 2010, at 1:29 AM, André Oriani wrote:


Sorry, I forgot the subject on my last message :|

Hi all,
I was considering BookKeeper to implement some server replicated
application having one primary server as writer and many backup servers
reading from BookKeeper concurrently. The last documentation I had
access to says "This writer has to execute a close ledger operation before
any other client can read from it." So readers cannot read any entry on
the ledger, even the already committed ones, until the writer stops
writing to the ledger, i.e., closes it. Is my understanding right? Should
I then use ZooKeeper directly to achieve what I want?


Thanks for the attention,
André Oriani












[ANNOUNCE] Apache ZooKeeper 3.3.1

2010-05-17 Thread Patrick Hunt
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 
3.3.1


ZooKeeper is a high-performance coordination service for distributed 
applications. It exposes common services - such as naming, configuration 
management, synchronization, and group services - in a simple interface 
so you don't have to write them from scratch. You can use it 
off-the-shelf to implement consensus, group management, leader election, 
and presence protocols. And you can build on it for your own, specific 
needs.


For ZooKeeper release details and downloads, visit:
http://hadoop.apache.org/zookeeper/releases.html

ZooKeeper 3.3.1 Release Notes are at:
http://hadoop.apache.org/zookeeper/docs/r3.3.1/releasenotes.html

Regards,

The ZooKeeper Team





Re: Using ZooKeeper for managing solrCloud

2010-05-14 Thread Patrick Hunt
Mahadev pointed out the ZK monitoring details, but on the solr side of 
the house I don't think we can provide much insight as solr is acting as 
a client of the zk service. Your best bet would be to ask on the solr 
user list.


Regards,

Patrick

On 05/14/2010 04:09 AM, Rakhi Khatwani wrote:

Hi,
I just went through the zookeeper tutorial and successfully managed
to run the zookeeper server.
How do we monitor the zookeeper server? Is there a URL for it?

I pasted the following URLs into a browser, but all I get is a blank page:
http://localhost:2181
http://localhost:2181/zookeeper


I actually need zookeeper to manage solr cloud externally, but now if I
have 2 solr servers running, how do I configure zookeeper to manage
them?

Regards,
Raakhi



Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
Hi Jordan, have you seen this once or frequently? (Having the server +
client logs will help a lot.)


Patrick

On 05/12/2010 11:08 AM, Jordan Zimmerman wrote:

Sure - if you think it's a bug.

We were using Zookeeper without issue. I then refactored a bunch of
code and this new behavior started. I'm starting ZK using zkServer
start and haven't made any changes to the code at all.

I'll get the logs together and post a JIRA.

-JZ

On May 12, 2010, at 10:59 AM, Mahadev Konar wrote:


Hi Jordan, can you create a jira for this? And attach all the
server logs and client logs related to this timeline? How did you
start up the servers? Are there any changes you might have made
accidentally to the servers?


Thanks mahadev


On 5/12/10 10:49 AM, Jordan Zimmerman jzimmer...@proofpoint.com
wrote:


We've just started seeing an odd error and are having trouble
determining the cause: "Xid out of order. Got 8 expected 7". Any
hints on what can cause this? Any ideas on how to debug?

We're using ZK 3.3.0. The error occurs in ClientCnxn.java line
781

-Jordan






Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
I'm still interested though... Are you using the new getChildren API
that was added to the client in 3.3.0? (It provides a Stat object on
return; the old getChildren did not.) While we don't officially support
a 3.3.0 client with a 3.2.2 server (we do support the other way around),
there shouldn't be the type of problem you describe with this
configuration. I'd still be interested for you to create that jira.


Regards,

Patrick

On 05/12/2010 11:23 AM, Jordan Zimmerman wrote:

Apologies...

I thought I was running 3.3.0 server, but was running 3.2.2 server
with 3.3.0 client. I upgraded the server and now all works again.
Sorry to trouble y'all.

-Jordan




Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
I think that explains it then - the server is probably dropping the new 
(3.3.0) getChildren message (xid 7) as it (3.2.2 server) doesn't know 
about that message type. Then the server responds to the client for a 
subsequent operation (xid 8), and at that point the client notices that 
getChildren (xid 7) got lost.
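
The sequence I'm describing can be reproduced with a toy model. To be clear, this is not the ClientCnxn code: the class and function names are invented for illustration, and the operation names are just labels. The only behaviour it assumes is the one described above: the client expects replies in the order it sent requests, and the old server silently drops a request type it does not know.

```python
from collections import deque


class XidOutOfOrder(Exception):
    pass


class ToyClient:
    """Keeps sent requests in a FIFO queue and matches replies against its head."""

    def __init__(self):
        self.next_xid = 1
        self.pending = deque()

    def send(self, op):
        xid = self.next_xid
        self.next_xid += 1
        self.pending.append((xid, op))
        return xid

    def on_reply(self, xid):
        expected, op = self.pending[0]
        if xid != expected:
            raise XidOutOfOrder(
                "Xid out of order. Got %d expected %d" % (xid, expected))
        self.pending.popleft()
        return op


def toy_old_server(requests, known_ops):
    """Reply (by xid) only to operations the server knows; drop the rest."""
    return [xid for xid, op in requests if op in known_ops]


if __name__ == "__main__":
    client = ToyClient()
    ops = ["exists"] * 6 + ["getChildren2", "exists"]  # xids 1..8
    sent = [(client.send(op), op) for op in ops]
    try:
        for xid in toy_old_server(sent, known_ops={"exists"}):
            client.on_reply(xid)
    except XidOutOfOrder as err:
        print(err)
```

The reply to xid 8 is the first point at which the client can notice that xid 7 never came back, which matches the message in the report.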


Patrick

On 05/12/2010 11:30 AM, Jordan Zimmerman wrote:

Oh, OK. When I get a moment I'll restart the 3.2.2 and post logs,
etc.

Yes, we're calling getChildren with the callback.

-JZ



Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
I think Ben meant that the unknown operation itself (from server 
perspective) should result in an error directly on both client and server.


Patrick

On 05/12/2010 11:45 AM, Jordan Zimmerman wrote:

Technically, there is an error generated.  IMO - a more descriptive error would 
be helpful.

-JZ

On May 12, 2010, at 11:41 AM, Benjamin Reed wrote:


Is this a bug? Shouldn't we be returning an error?

ben



Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
Hm, if you don't mind, please enter that jira. I would still like to
verify by looking at the logs.


Patrick

On 05/12/2010 11:52 AM, Jordan Zimmerman wrote:

So, I'm off the Jira hook then?

-JZ

On May 12, 2010, at 11:49 AM, Patrick Hunt wrote:


You're right. Ben, would you mind entering a JIRA?

Patrick



Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Patrick Hunt


On 05/12/2010 08:30 PM, Aaron Crow wrote:

I may have a better idea of what caused the trouble. I way, WAY
underestimated the number of nodes we collect over time. Right now we're at
1.9 million. This isn't a bug of our application; it's actually a feature
(but perhaps an ill-conceived one).

A most recent snapshot from a Zookeeper db is 227MB. If I scp it over to one
of the other Zookeeper hosts, it takes about 4 seconds.



Nice. You probably hold the record for largest (znode count) production 
ZK repo. Largest I've heard of at least.



Now, there are some things I can do to limit the number of nodes we collect.
My question is, how deadly could this node size be for us? Patrick mentioned
to me that he's run Zookeeper with this many nodes, but you need to be
careful about tuning. We're currently running with the recommended JVM
settings (see below). We're using different drives for the 2 different kinds
of data dirs that Zookeeper needs. We may also have the option of running on
a 64 bit OS with added RAM, if it's worth it. What about timeout settings?
I'm copying in our current settings below, are those ok?



As long as you have enough memory/disk/IO you should be ok. Are you
monitoring the operation latency on the servers (via the four-letter
words, such as stat)?
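
If it helps, polling the stat four-letter word can be scripted. The sketch below opens a plain TCP connection to the client port, sends the word, and reads until the server closes the connection. The helper names are made up, and the "Latency min/avg/max:" line format is what I recall the stat output looking like, so treat the parsing as an assumption and adjust it to what your servers actually print.

```python
import re
import socket

# Assumed format of the latency line in the "stat" output.
LATENCY_RE = re.compile(r"Latency min/avg/max:\s*(\d+)/([\d.]+)/(\d+)")


def four_letter_word(host, port, word, timeout=5.0):
    """Send a four-letter word and return everything the server answers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def parse_latency(stat_output):
    """Return (min, avg, max) latency as floats, or None if not found."""
    match = LATENCY_RE.search(stat_output)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


if __name__ == "__main__":
    # Hypothetical host/port; point this at one of your own servers.
    print(parse_latency(four_letter_word("localhost", 2181, "stat")))
```

Sampling this periodically on each server gives a cheap latency trend to compare before and after GC or limit changes.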


You might increase the init/sync limits a bit to ensure that the 
followers have enough time to d/l the snapshot, deserialize it, and get 
setup with the leader (if this takes too long the quorum will fail and 
reelect a new leader, which might happen indefinitely).
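
To put numbers on that: initLimit and syncLimit are counted in ticks, so the time budgets are simply the limits multiplied by tickTime. A minimal sketch using the settings quoted at the end of this message (tickTime=2000, initLimit=10, syncLimit=5) and the 227MB snapshot. The function names are invented for illustration, and the transfer-rate figure is only a lower bound because the same budget also has to cover deserializing the snapshot.

```python
def quorum_budgets_ms(tick_time_ms, init_limit, sync_limit):
    """Return (init budget, sync budget) in milliseconds."""
    return tick_time_ms * init_limit, tick_time_ms * sync_limit


def min_transfer_rate_mb_per_s(snapshot_mb, init_budget_ms):
    """Slowest transfer rate that still fits the snapshot into the init budget."""
    return snapshot_mb / (init_budget_ms / 1000.0)


if __name__ == "__main__":
    init_ms, sync_ms = quorum_budgets_ms(2000, 10, 5)
    print("init budget: %d ms, sync budget: %d ms" % (init_ms, sync_ms))
    print("need at least %.2f MB/s for a 227 MB snapshot"
          % min_transfer_rate_mb_per_s(227, init_ms))
```

If download plus deserialization gets close to the init budget as the data grows, that is the point to raise initLimit.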



Or should we just figure out how to keep our node count much lower? And how
low is definitely pretty safe?



There's really no max - it's just dependent on your resources. Memory 
in particular.


You should turn on incremental GC mode though (-XX:+CMSIncrementalMode),
otherwise large GC pauses will wreck your latencies. Check out the link
below. Verbose GC is also useful for tracking down issues later (if
something bad happens you can use it to rule GC in or out as the cause).


http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html#0.0.0.0.Incremental%20mode%7Coutline

Regards,

Patrick



=== some current settings ===
-XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -Xms2560m -Xmx2560m
tickTime=2000
initLimit=10
syncLimit=5



Many thanks in advance for any good advice.
Aaron


On Wed, Apr 28, 2010 at 10:47 PM, Patrick Hunt ph...@apache.org wrote:


Hi Aaron, some questions/comments below:


On 04/28/2010 06:29 PM, Aaron Crow wrote:


We were running version 3.2.2 for about a month and it was working well
for us. Then late this past Saturday night, our cluster went pathological.
One of the 3 ZK servers spewed many WARNs (see below), and the other 2
servers were almost constantly pegging the CPU. All three servers are on
separate machines. From what we could tell, the machines were fine...
networking fine, disk fine, etc. The ZK clients were completely unable to
complete their connections to ZK.



These machines are local (not wan) connected then? What OS and java version
are you using?

Do you see any FATAL or ERROR level messages in the logs?

It would help to look at your zk config files for these servers. Could you
provide (you might want to create a JIRA first, then just attach configs and
other details/collateral to that, easier than dealing with email)

If you have logs for the time period and can share that would be most
useful. (again, gzip and attach to the jira)


We tried all sorts of restarts, running zkCleanup, etc. We even completely
shut down our clients... and the pathology continued. Our workaround was
to do an urgent upgrade to version 3.3.0. The new ZK cluster with 3.3.0
has been running well for us... so far...



Off hand and with the data we have so far nothing sticks out that 3.3 would
have resolved (JIRA is conveniently down for the last hour or so so I can't
review right now). Although there were some changes to reduce memory
consumption (see below).


I realize that, sadly, this message doesn't contain nearly enough details
to trace exactly what happened. I guess I'm wondering if anyone has seen
this general scenario, and/or knows how to prevent it? Is there anything
we might be doing client side to trigger this? Our application level
request frequency is maybe a few requests to Zookeeper per second, times 5
client applications. If we detect a SESSION EXPIRED, we do a simple create
new client and use that instead. And we were seeing this happen
occasionally.



What are the clients doing? Do you have a large number/size of znodes?

Do you see any OutOfMemoryError in the logs?

Could the ZK server java process be swapping? Are you monitoring GC,
perhaps large GC pauses are happening?

I have a suspicion that one of a few things might be happening. I see the
following in your original email:



:followerhand...@302] - Sending snapshot last zxid of peer is

0xd0007d66d

zxid of leader is 0xf
2010-04-24 23:06:03,254 - ERROR 

Re: zookeeper-3.2.2:Cannot open channel to X at election address / Connection refused

2010-05-11 Thread Patrick Hunt
In the cases where we've seen this reported in the past, the user tracked
the issue down to a firewall problem. I'm not sure what the issue is
here, given you've verified that's not the problem. The log is clearly
saying:

 Thread:quorumcnxmana...@336] - Cannot open channel to 2 at election
 address /192.168.1.3:3888
 java.net.ConnectException: Connection refused

which means that the server is attempting to open a connection to
192.168.1.3 port 3888, but the server at that ip/port is not accepting
the connection. Are you sure that both servers are up/running at the
same time? The log that you included, this was for server 1 right
(192.168.1.2)?

You might use netstat -a to verify that each server is bound to the
correct ports on each host, then take a look at the logs to see if this
connection refused is still happening (it can happen in the logs if
server 1 starts but server 2 is not yet started, but then should rectify
once both servers are bound and accepting connections).
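
If telnet or netstat aren't handy, the same check can be scripted. This is just a sketch: the function name is made up, and the addresses and ports in the usage part are the ones from this thread (2888/3888 on 192.168.1.2 and 192.168.1.3), so substitute your own. Run it from each server against the other to cover both directions.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for host in ("192.168.1.2", "192.168.1.3"):
        for port in (2888, 3888):
            state = "open" if port_accepts_connections(host, port) else "NOT reachable"
            print("%s:%d %s" % (host, port, state))
```

Note that 2888 is only listened on by the current leader, so a refusal there on a follower is expected. A refusal on the election port (3888) means the peer's server process is not running or is blocked.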

If you still have issues create a jira, attach both configs and both log
files and we'll take closer look.
https://issues.apache.org/jira/browse/ZOOKEEPER

Good Luck,

Patrick

On 05/10/2010 08:07 PM, chen peng wrote:
 Thank you for your kind reply, but I think the port(s) work
 well; that is not the problem. Any other suggestions?
 PS: In that case, I installed zookeeper-3.2.2 but hbase.
 
 
 From: chenpeng0...@hotmail.com
 To: phu...@gmail.com
 Subject: RE: zookeeper-3.2.2:Cannot open channel to X at election 
 address / Connection refused
 Date: Mon, 10 May 2010 14:30:55 +
 
 Thank you for your kind reply, but I think the port(s) work
 well; that is not the problem. Any other suggestions?
 PS: In that case, I installed zookeeper-3.2.2 but hbase.
 
 
 Date: Sat, 8 May 2010 22:43:34 -0700
 Subject: Re: zookeeper-3.2.2:Cannot open channel to X at election 
 address / Connection refused
 From: phu...@gmail.com
 To: zookeeper-user@hadoop.apache.org; chenpeng0...@hotmail.com
 
 Often this is related to the port(s) being blocked by a firewall. 
 Perhaps you could check this (2888/3888) in both directions? Telnet can 
 help:
 https://help.maximumasp.com/KB/a445/connectivity-testing-with-ping-telnet-tracert-and-pathping-.aspx
 
 Patrick
 
 2010/5/7 chen peng chenpeng0...@hotmail.com
 
 
 Hi all
 I have a question: after installation of the zookeeper according to
 the doc for zookeeper(
 
 http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_systemReq),
 abnormalities emerge as follows:
 --
 JMX enabled by default
 Using
 config: /home/baeeq/hadoop-0.20.2/zookeeper-3.2.2/bin/../conf/zoo.cfg
 Starting zookeeper ...
 STARTED
 2010-05-08 13:37:28,273 - INFO [main:quorumpeercon...@80] - Reading
 configuration
 from: /home/baeeq/hadoop-0.20.2/zookeeper-3.2.2/bin/../conf/zoo.cfg
 2010-05-08 13:37:28,284 - INFO [main:quorumpeercon...@232] - Defaulting
 to majority quorums
 2010-05-08 13:37:28,299 - INFO [main:quorumpeerm...@118] - Starting
 quorum peer
 2010-05-08 13:37:28,331 - INFO [Thread-1:quorumcnxmanager$liste...@409]
 - My election bind port: 3888
 2010-05-08 13:37:28,342 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@514] - LOOKING
 2010-05-08 13:37:28,345 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@579] - New
 election: -1
 2010-05-08 13:37:28,351 - WARN [WorkerSender
 Thread:quorumcnxmana...@336] - Cannot open channel to 2 at election
 address /192.168.1.3:3888
 java.net.ConnectException: Connection refused
 at sun.nio.ch.Net.connect(Native Method)
 at
 sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
 at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
 at
 
 org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
 at
 
 org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
 at org.apache.zookeeper.server.quorum.FastLeaderElection
 $Messenger$WorkerSender.process(FastLeaderElection.java:323)
 at org.apache.zookeeper.server.quorum.FastLeaderElection
 $Messenger$WorkerSender.run(FastLeaderElection.java:296)
 at java.lang.Thread.run(Thread.java:619)
 2010-05-08 13:37:28,352 - INFO
 

Re: zookeeper-3.2.2:Cannot open channel to X at election address / Connection refused

2010-05-11 Thread Patrick Hunt
Ok, great, good luck!

Patrick

On 05/10/2010 11:20 PM, chen peng wrote:
 My question has been resolved.
 I did not know that bin/zkServer start should be executed on each machine!
 I took it to be very close in function to hadoop (start-all.sh).
 Thanks!
   Date: Mon, 10 May 2010 23:02:48 -0700
   From: ph...@apache.org
   To: chenpeng0...@hotmail.com
   CC: zookeeper-user@hadoop.apache.org
   Subject: Re: zookeeper-3.2.2:Cannot open channel to X at election 
 address / Connection refused
  
   The cases where we've seen this reported in the past the user tracked
   the issue down to a firewall problem, I'm not sure wh at the issue is
   here given you've verified that's not the problem. The log is clearly
   saying:
  
Thread:quorumcnxmana...@336] - Cannot open channel to 2 at election
address /192.168.1.3:3888 http://192.168.1.3:3888
java.net.ConnectException: Connection refused
  
   which means that the server is attempting to open a connection to
   192.168.1.3 port 3888, but the server at that ip/port is not accepting
   the connection. Are you sure that both servers are up/running at the
   same time? The log that you included, this was for server 1 right
   (192.168.1.2)?
  
   You might use netstat -a to verify that each server is bound to the
   correct ports on each host, then take a look at the logs to see if this
   connection refused is still happening (it can happen in the logs if
   server 1 starts but server 2 is not yet started, but then sh ould rectify
   once both servers are bound and accepting connections).
  
   If you still have issues create a jira, attach both configs and both log
   files and we'll take closer look.
   https://issues.apache.org/jira/browse/ZOOKEEPER
  
   Good Luck,
  
   Patrick
  
   On 05/10/2010 08:07 PM, chen peng wrote:
*Thank http://www.iciba.com/thank/ you 
 http://www.iciba.com/you/ for
your http://www.iciba.com/your/ kind reply,but i think port(s) works
well, it **is not http://www.iciba.com/not/ a problem
http://www.iciba.com/problem/, **Any other
http://www.iciba.com/other/ suggestions?*
PS:*In that case,i installed zookeeper-3.2.2 but hbase.*
   

 
From: chenpeng0...@hotmail.com
 g t;  To: phu...@gmail.com
Subject: RE: zookeeper-3.2.2:Cannot open channel to X at election
address / Connection refused
Date: Mon, 10 May 2010 14:30:55 +
   
*Thank http://www.iciba.com/thank/ you 
 http://www.iciba.com/you/ for
your http://www.iciba.com/your/ kind reply,but i think port(s) works
well, it **is not http://www.iciba.com/not/ a problem
http://www.iciba.com/problem/, **Any other
http://www.iciba.com/other/ suggestions?*
PS:*In that case,i installed zookeeper-3.2.2 but hbase.*
   

 
Date: Sat, 8 May 2010 22:43:34 -0700
Subject: Re: zookeeper-3.2.2:Cannot open channel to X at election
address / Connection refused
From: phu...@gmail.com*  To: zookeeper-user@hadoop.apache.org; 
 chenpeng0...@hotmail.com
   
Often this is related to the port(s) being blocked by a firewall.
Perhaps you could check this (2888/3888) in both directions? Telnet 
 can
help:

 https://help.maximumasp.com/KB/a445/connectivity-testing-with-ping-telnet-tracert-and-pathping-.aspx
   
Patrick
   
2010/5/7 chen peng chenpeng0...@hotmail.com
mailto:chenpeng0...@hotmail.com
   
   
Hi all
I have a question: after installation of the zookeeper according to
the doc for zookeeper(

 http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_systemReq),
abnormalities emerge as follows:
--
JMX enabled by de fault
Using
config: /home/baeeq/hadoop-0.20.2/zookeeper-3.2.2/bin/../conf/zoo.cfg
Starting zookeeper ...
STARTED
2010-05-08 13:37:28,273 - INFO [main:quorumpeercon...@80] - Reading
configuration
from: /home/baeeq/hadoop-0.20.2/zookeeper-3.2.2/bin/../conf/zoo.cfg
2010-05-08 13:37:28,284 - INFO [main:quorumpeercon...@232] - Defaulting
to majority quorums
2010-05-08 13:37:28,299 - INFO [main:quorumpeerm...@118] - Starting
quorum peer
2010-05-08 13:37:28,331 - INFO [Thread-1:quorumcnxmanager$liste...@409]
- My election bind port: 3888
2010-05-08 13:37:28,342 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@514] - LOOKING
2010-05-08 13:37:28,345 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@579] - New
election: -1
2010-05-08 13:37:28,351 - WARN [WorkerSender
Thread:quorumcnxmana...@336] - Cannot open channel to 2 at election
address /192.168.1.3:3888
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect(Native Method)
at

Re: New ZooKeeper client library Cages

2010-05-11 Thread Patrick Hunt
Hi Dominic, this looks really interesting, thanks for open sourcing it. I 
really like the idea of providing higher level concepts. I only just 
looked at the code, and it wasn't clear on first pass what happens if you 
multilock on 3 paths and the first 2 succeed but the third fails. How 
are the locks cleared? And what happens when the client loses 
connectivity to the cluster, both when only some of the locks have been 
acquired and when all of them have? For example, how does the caller 
know whether the locks are still held or were released because the 
client was partitioned from the cluster?
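
To make the partial-failure question concrete, here is a minimal sketch of
all-or-nothing acquisition with rollback. It uses plain in-process
threading.Lock objects as stand-ins for the three paths; it is not how Cages
implements multilock, and acquire_all is a hypothetical helper invented for
illustration. It says nothing about the connectivity-loss case.

```python
import threading


def acquire_all(locks):
    """Try to take every lock without blocking.

    If any lock cannot be taken, release the ones already acquired
    (in reverse order) and report failure, so no partial set is left held.
    """
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            for held in reversed(taken):
                held.release()
            return False
    return True


# Three stand-in "paths"; the third is already locked by someone else.
a, b, c = threading.Lock(), threading.Lock(), threading.Lock()
c.acquire()

ok = acquire_all([a, b, c])
print(ok, a.locked(), b.locked())  # the first two were rolled back
```

With ZooKeeper-backed locks the rollback step itself can fail if the session
is gone, which is the second half of the question above.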


I'll try downloading the code and looking at it more. I see some javadoc in 
there as well, so that's great.


Regards,

Patrick

On 05/11/2010 04:02 PM, Dominic Williams wrote:

Anyone looking for a Java client library for ZooKeeper, please checkout:

Cages - http://cages.googlecode.com

The library will be expanded and feedback will be helpful.

Many thanks,
Dominic
ria101.wordpress.com



Re: zookeeper-3.2.2:Cannot open channel to X at election address / Connection refused

2010-05-08 Thread Patrick Hunt
Often this is related to the port(s) being blocked by a firewall. Perhaps
you could check this (2888/3888) in both directions? Telnet can help:
https://help.maximumasp.com/KB/a445/connectivity-testing-with-ping-telnet-tracert-and-pathping-.aspx
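
If telnet is not installed, the same check can be scripted. The following is
a minimal sketch using only the Python standard library; port_open and
check_peers are names made up for this example, and the default ports are the
2888/3888 pair from the zoo.cfg quoted below.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts alike.
        return False


def check_peers(hosts, ports=(2888, 3888), timeout=2.0):
    """Map each (host, port) pair to True (reachable) or False."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

Running check_peers(["192.168.1.3"]) on 192.168.1.2 and
check_peers(["192.168.1.2"]) on 192.168.1.3 covers both directions. A False
result does not by itself distinguish a firewall from a server process that
is simply not running on the other machine.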

Patrick

2010/5/7 chen peng chenpeng0...@hotmail.com


 Hi all
   I have a question: after installation of the zookeeper according to
 the doc for zookeeper
 (http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_systemReq),
 abnormalities emerge as follows:
 --
 JMX enabled by default
 Using
 config: /home/baeeq/hadoop-0.20.2/zookeeper-3.2.2/bin/../conf/zoo.cfg
 Starting zookeeper ...
 STARTED
 2010-05-08 13:37:28,273 - INFO  [main:quorumpeercon...@80] - Reading
 configuration
 from: /home/baeeq/hadoop-0.20.2/zookeeper-3.2.2/bin/../conf/zoo.cfg
 2010-05-08 13:37:28,284 - INFO  [main:quorumpeercon...@232] - Defaulting
 to majority quorums
 2010-05-08 13:37:28,299 - INFO  [main:quorumpeerm...@118] - Starting
 quorum peer
 2010-05-08 13:37:28,331 - INFO  [Thread-1:quorumcnxmanager$liste...@409]
 - My election bind port: 3888
 2010-05-08 13:37:28,342 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@514] - LOOKING
 2010-05-08 13:37:28,345 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@579] - New
 election: -1
 2010-05-08 13:37:28,351 - WARN  [WorkerSender
 Thread:quorumcnxmana...@336] - Cannot open channel to 2 at election
 address /192.168.1.3:3888
 java.net.ConnectException: Connection refused
     at sun.nio.ch.Net.connect(Native Method)
     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
     at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
     at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
     at java.lang.Thread.run(Thread.java:619)
 2010-05-08 13:37:28,352 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@618] -
 Notification: 1, -1, 1, 1, LOOKING, LOOKING, 1
 2010-05-08 13:37:28,353 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@642] - Adding vote
 2010-05-08 13:37:28,557 - WARN
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorumcnxmana...@336] - Cannot open
 channel to 2 at election address /192.168.1.3:3888
 java.net.ConnectException: Connection refused
     at sun.nio.ch.Net.connect(Native Method)
     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
     at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:356)
     at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:603)
     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:515)
 2010-05-08 13:37:28,559 - INFO
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@612] - Notification
 time out: 400
 2010-05-08 13:37:28,961 - WARN
 [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorumcnxmana...@336] - Cannot open
 channel to 2 at election address /192.168.1.3:3888
 java.net.ConnectException: Connection refused
     at sun.nio.ch.Net.connect(Native Method)
     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
     at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323).
 --
 fileinfo for zoo.cfg is listed below:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/baeeq/hadoop-0.20.2/zookeeper-data
clientPort=2181
server.1=192.168.1.2:2888:3888
server.2=192.168.1.3:2888:3888

 PS: It works well on the single computer after deleting
 server.1=192.168.1.2:2888:3888 server.2=192.168.1.3:2888:3888
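
The log above reports "Defaulting to majority quorums" and the config lists
two servers. As a rough sketch of what that implies, the snippet below parses
the server.N lines from the quoted zoo.cfg and computes the majority size.
parse_servers and majority are illustrative helpers, not part of ZooKeeper.

```python
import re

# The zoo.cfg contents quoted in this message.
ZOO_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/baeeq/hadoop-0.20.2/zookeeper-data
clientPort=2181
server.1=192.168.1.2:2888:3888
server.2=192.168.1.3:2888:3888
"""

# server.<id>=<host>:<quorum port>:<election port>
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)$")


def parse_servers(cfg_text):
    """Return {server id: (host, quorum port, election port)}."""
    servers = {}
    for line in cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m:
            sid, host, quorum_port, election_port = m.groups()
            servers[int(sid)] = (host, int(quorum_port), int(election_port))
    return servers


def majority(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


servers = parse_servers(ZOO_CFG)
print(servers)
print(majority(len(servers)))
```

With two servers the majority is two, so neither server can finish an
election until the other one is started and reachable on its election port.
"Connection refused" warnings from the first server are therefore likely to
appear at least until the second one is up.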









Re: ZKClient

2010-05-05 Thread Patrick Hunt
Thanks Travis, I've slated this for 3.4.0, I think it would be useful to 
add more examples so feel free to add more if you have any ideas for 
useful ones.


For future reference, we ask that contributions come in the form of a patch:
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

It's fine this time around, but in future it would be helpful.

(also click on submit patch link when you are ready for review - that 
pushes it through the process, incl automated testing/verification, 
that's why we ask for a patch off the root btw)


Thanks!

Patrick

On 05/04/2010 04:00 PM, Travis Crawford wrote:

On Tue, May 4, 2010 at 3:45 PM, Ted Dunning ted.dunn...@gmail.com wrote:

Travis,

Attachments are stripped from this mailing list.  Can you file a JIRA and
put your attachment on that instead?

Here is a link to get you started:
https://issues.apache.org/jira/browse/ZOOKEEPER


Whoops. Filed:

https://issues.apache.org/jira/browse/ZOOKEEPER-765

--travis




On Tue, May 4, 2010 at 3:43 PM, Travis Crawford traviscrawf...@gmail.com wrote:


Attached is a skeleton application I extracted from a script I use --
perhaps we could add this as a recipe? If there are issues I'm more
than happy to fix them, or add more comments, whatever. It took a
while to figure this out and I'd love to save others that time in the
future.

--travis


On Tue, May 4, 2010 at 3:16 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Adam,
  I don't think zk is very very hard to get right. There are examples in
src/recipes which implement locks/queues/others. There is ZOOKEEPER-22 to
make it even easier for applications to use.

Regarding re-registration of watches, you can definitely write code and
submit it as part of a well documented contrib module which lays out the
assumptions/design of it. It could very well be useful for others. It's just
that folks haven't had much time to focus on these areas as yet.

Thanks
mahadev


On 5/4/10 2:58 PM, Adam Rosien a...@rosien.net wrote:


I use zkclient in my work at kaChing and I have mixed feelings about
it. On one hand it makes easy things easy which is great, but on the
other hand I have very few ideas about what assumptions it makes under the
hood. I also dislike some of the design choices such as unchecked
exceptions, but that's neither here nor there. It would take some
extensive documentation work by the authors to really enumerate the
model and assumptions, but the project doesn't seem to be active
(either from it being adequate for its current users or just
inactive). I'm not sure I could derive the assumptions myself.

I'm a bit frustrated that zk is very, very hard to really get right.
At a project level, can't we create structures to avoid most of these
errors? Can there be a standard model with detailed assumptions and
implementations of all the recipes? How can we start this? Is there
something that makes this too hard?

I feel like a recipe page is a big fail; wouldn't an example app that
uses locks and barriers be that much more compelling?

For the common FAQ items like you need to re-register the watch,
can't we just create code that implements this pattern? My goal is to
live up to the motto: a good API is impossible to use incorrectly.

.. Adam

On Tue, May 4, 2010 at 2:21 PM, Ted Dunning ted.dunn...@gmail.com wrote:

In general, writing this sort of layer on top of ZK is very, very hard to
get really right for general use.  In a simple use-case, you can probably
nail it but distributed systems are a Zoo, to coin a phrase.  The problem is
that you are fundamentally changing the metaphors in use so assumptions can
come unglued or be introduced pretty easily.

One example of this is the fact that ZK watches *don't* fire for every
change but when you write listener oriented code, you kind of expect that
they will.  That makes it really, really easy to introduce that assumption
in the head of the programmer using the event listener library on top of
ZK.  Another example is that the way the atomic get content/set watch call
works in ZK is easy to violate in an event driven architecture because the
thread that watches ZK probably resets the watch.  If you assume that the
listener will read the data, then you have introduced a timing mismatch
between the read of the data and the resetting of the watch.  That might be
OK or it might not be.  The point is that these changes are subtle and
tricky to get exactly right.
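
The one-shot behaviour described above can be illustrated with a toy model.
This is a minimal in-memory sketch, not the ZooKeeper client API: ToyZNode
and its get/set methods are invented here purely to show why a listener sees
fewer notifications than writes, and why reading and re-arming the watch
belong in the same call.

```python
class ToyZNode:
    """In-memory stand-in for a znode with one-shot watches."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        """Read the data and, in the same step, optionally register a watcher."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        """Write data, fire each pending watcher once, then forget them."""
        self.data = data
        pending, self._watchers = self._watchers, []
        for notify in pending:
            notify()


node = ToyZNode("v0")
seen = []   # values the client actually read
fired = []  # notifications the client actually received


def on_change():
    # The notification only says "something changed"; it carries no data.
    fired.append(True)


seen.append(node.get(on_change))  # read v0 and set the watch together
node.set("v1")                    # fires the watch, which is now gone
node.set("v2")                    # no watch registered, nothing fires
node.set("v3")
seen.append(node.get(on_change))  # re-read and re-arm in one call

print(fired)
print(seen)
```

Three writes produce a single notification, and the client never observes v1
or v2. If the read and the re-registration were done by different threads at
different times, a write could also land between them unnoticed, which is the
timing mismatch mentioned above.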

On Tue, May 4, 2010 at 1:48 PM, Jonathan Holloway
jonathan.hollo...@gmail.com  wrote:


Is there any reason why this isn't part of the Zookeeper trunk already?













Re: ZKClient

2010-05-05 Thread Patrick Hunt
While I agree DS is hard, I don't think we should lose the useful 
feedback given by Jonathan/Adam - that getting started with ZK is 
challenging and can be frustrating. We need to learn from this feedback 
and create some action items to address. One of the main things I've 
heard so far that we can act on today is that we should add 
examples/docs to round things out. I agree with this. Also the recipes 
page should be updated to point to the recipe implementations we 
recently added to the release.


One suggestion, it's much easier for new contributors/users to 
contribute to the examples than it is to jump into ZK core development. 
New users feel the pain most directly (recently), I'd encourage you to 
contribute back by creating an example or two. I'm sure the existing 
contributors would be happy to work with you to get them committed and 
released.


Regards,

Patrick

On 05/04/2010 03:43 PM, Ted Dunning wrote:

Creating recipes is a great thing, but that doesn't change the fact that
distributed systems are inherently a bit tricky, especially if you start
with the assumption (as many people do) that Peter Deutsch was wrong.

One of the great contributions of MapReduce style parallelism or the java
concurrent package is that it provides safe trails in a pretty scary forest.
  Good Zookeeper recipes could provide similar guidance with similar positive
effects.

On Tue, May 4, 2010 at 3:24 PM, Adam Rosien a...@rosien.net wrote:


I'll check it out, but it is repeated in this list and on the web site
that it's not as easy as it seems. I just want to enumerate the
failure points and create abstractions to avoid them.

.. Adam

On Tue, May 4, 2010 at 3:16 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Adam,
  I don't think zk is very very hard to get right. There are examples in
src/recipes which implement locks/queues/others. There is ZOOKEEPER-22 to
make it even easier for applications to use.

Regarding re-registration of watches, you can definitely write code and
submit it as part of a well documented contrib module which lays out the
assumptions/design of it. It could very well be useful for others. It's just
that folks haven't had much time to focus on these areas as yet.

Thanks
mahadev


On 5/4/10 2:58 PM, Adam Rosien a...@rosien.net wrote:


I use zkclient in my work at kaChing and I have mixed feelings about
it. On one hand it makes easy things easy which is great, but on the
other hand I have very few ideas about what assumptions it makes under the
hood. I also dislike some of the design choices such as unchecked
exceptions, but that's neither here nor there. It would take some
extensive documentation work by the authors to really enumerate the
model and assumptions, but the project doesn't seem to be active
(either from it being adequate for its current users or just
inactive). I'm not sure I could derive the assumptions myself.

I'm a bit frustrated that zk is very, very hard to really get right.
At a project level, can't we create structures to avoid most of these
errors? Can there be a standard model with detailed assumptions and
implementations of all the recipes? How can we start this? Is there
something that makes this too hard?

I feel like a recipe page is a big fail; wouldn't an example app that
uses locks and barriers be that much more compelling?

For the common FAQ items like you need to re-register the watch,
can't we just create code that implements this pattern? My goal is to
live up to the motto: a good API is impossible to use incorrectly.

.. Adam

On Tue, May 4, 2010 at 2:21 PM, Ted Dunning ted.dunn...@gmail.com wrote:

In general, writing this sort of layer on top of ZK is very, very hard to
get really right for general use.  In a simple use-case, you can probably
nail it but distributed systems are a Zoo, to coin a phrase.  The problem is
that you are fundamentally changing the metaphors in use so assumptions can
come unglued or be introduced pretty easily.

One example of this is the fact that ZK watches *don't* fire for every
change but when you write listener oriented code, you kind of expect that
they will.  That makes it really, really easy to introduce that assumption
in the head of the programmer using the event listener library on top of
ZK.  Another example is that the way the atomic get content/set watch call
works in ZK is easy to violate in an event driven architecture because the
thread that watches ZK probably resets the watch.  If you assume that the
listener will read the data, then you have introduced a timing mismatch
between the read of the data and the resetting of the watch.  That might be
OK or it might not be.  The point is that these changes are subtle and
tricky to get exactly right.

On Tue, May 4, 2010 at 1:48 PM, Jonathan Holloway
jonathan.hollo...@gmail.com  wrote:


Is there any reason why this isn't part of the Zookeeper trunk already?













Re: ZKClient

2010-05-04 Thread Patrick Hunt

Take a look at this thread for some background.
http://www.mail-archive.com/zookeeper-user@hadoop.apache.org/msg00917.html

There were some concerns at the time, not sure if they have been 
addressed since (It has been a while since that discussion).


Patrick

On 05/04/2010 01:48 PM, Jonathan Holloway wrote:

It looks good. Having written a client already myself, I'd rather use this
than have to roll my own each time.

Is there any reason why this isn't part of the Zookeeper trunk already?
It would make working with Zookeeper a bit easier (at least from my
perspective)...

Jon.


On 4 May 2010 12:57, Ted Dunning ted.dunn...@gmail.com wrote:


This is used as part of katta where it gets a fair bit of exercise at low
update rates with small data.  It is used for managing the state of the
search cluster.

I don't think it has had much external review or use for purposes apart
from katta.  Katta generally has pretty decent code, though.

On Tue, May 4, 2010 at 12:39 PM, Jonathan Holloway
jonathan.hollo...@gmail.com  wrote:


I came across this project on Github

http://github.com/sgroschupf/zkclient

for working with the Zookeeper API.  Has anybody used it in the past?  Is
it a better way of interacting with a Zookeeper cluster?

Many thanks,
Jon.







Re: avoiding deadlocks on client handle close w/ python/c api

2010-05-04 Thread Patrick Hunt

Thanks Kapil, Mahadev perhaps you could take a look at this as well?

Patrick

On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:

I've constructed  a simple example just using the zkpython library with
condition variables, that will deadlock. I've filed a new ticket for it,

https://issues.apache.org/jira/browse/ZOOKEEPER-763

the gdb stack traces look suspiciously like the ones in 591, but sans the
watchers.
https://issues.apache.org/jira/browse/ZOOKEEPER-591

the attached example on the ticket will deadlock in zk 3.3.0 (which has the
fix for 591) and trunk.

-kapil

On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu kapil.f...@gmail.com wrote:


Hi Folks,

I'm constructing an async api on top of the zookeeper python bindings for
twisted. The intent was to make a thin wrapper that would wrap the existing
async api with one that allows for integration with the twisted python event
loop (http://www.twistedmatrix.com) primarily using the async apis.

One issue I'm running into while developing unit tests: deadlocks occur
if we attempt to close a handle while there are any outstanding async
requests (aget, acreate, etc). Normally on close both the io thread
and the completion thread are terminated and joined; however,
with outstanding async requests, the completion thread won't be in a
joinable state, and we effectively hang when the main thread does the join.

I'm curious if this would be considered a bug; afaics ideal behavior would be,
on close of a handle, to effectively clear out any remaining callbacks and
let the completion thread terminate.

i've tried adding some bookkeeping to the api to guard against closing
while there is an outstanding completion request, but it's an imperfect
solution due to the nature of the event loop integration. The problem is that
the python callback invoked by the completion thread in turn schedules a
function for the main thread. In twisted the api for this is implemented by
appending the function to a list attribute on the reactor and then writing a
byte to a pipe to wakeup the main thread. If a thread switch to the main
thread occurs before the completion thread callback returns, the scheduled
function runs and the rest of the application keeps processing, of which the
last step for the unit tests is to close the connection, which results in a
deadlock.
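
For readers following along, the kind of bookkeeping described above might
look roughly like the sketch below: count outstanding async requests and
only close once the count drains to zero. PendingTracker and fake_async_get
are hypothetical names for illustration; this is neither the zkpython API nor
the wrapper's actual code, and a timer thread stands in for the completion
thread.

```python
import threading


class PendingTracker:
    """Counts in-flight async requests so close can wait for them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def started(self):
        with self._cond:
            self._pending += 1

    def finished(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait_idle(self, timeout=None):
        """Block until nothing is in flight; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


tracker = PendingTracker()
results = []


def fake_async_get(callback):
    """Hypothetical stand-in for an async call such as aget."""
    tracker.started()

    def complete():
        try:
            callback("data")
        finally:
            # Mark the request done only after the callback has returned.
            tracker.finished()

    threading.Timer(0.05, complete).start()


fake_async_get(results.append)
idle = tracker.wait_idle(timeout=2.0)  # a real close would follow this
print(idle, results)
```

The counter is decremented only after the callback returns, which is the
ordering the paragraph above says is hard to guarantee once the callback
hands work to the reactor thread. Blocking the main thread in wait_idle is
also a questionable fit for an event loop, so treat this as an illustration
of the idea rather than a fix.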

i've included some of the client log and gdb stack traces from a deadlock'd
client process.

thanks,

Kapil







