Re: ZkClient package
Jun Rao wrote:
Hi, ZkClient (http://github.com/sgroschupf/zkclient) provides a nice wrapper around the ZooKeeper client and handles things like retrying during ConnectionLoss events and auto-reconnect. Does anyone (other than Katta) use it? Would people recommend using it? Thanks, Jun

Hi Jun,

I have some ideas for an alternative ZK client design, but haven't had the time yet to hack it together:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201005.mbox/%3c201005261509.54236.tho...@koch.ro%3e

I don't like ZkClient very much, but it's the best thing available right now, AFAIK. Also have a look at this bug: http://oss.101tec.com/jira/browse/KATTA-137

Best regards,
Thomas Koch, http://www.koch.ro
Re: building client tools
Hi Andrei,

I needed to install the following:

    apt-get install libtool autoconf libcppunit-dev

There could well be other packages that were already installed on my machine (automake, gcc, etc.), but my build works now. I have since found that ZooKeeper is already packaged in Debian testing, and the build-depends list for it is quite large:
http://git.debian.org/?p=pkg-java/zookeeper.git;a=blob;f=debian/control;h=b3d5b6d73a298784473f62a1e0ac57a378dde9c9;hb=43878542fbc30e4d8fa8d55be16044d0c9b488a4

Thanks for the assistance.

regards,
Martin

On 13 July 2010 18:39, Andrei Savu savu.and...@gmail.com wrote:
Hi, in this case I think you have to install libcppunit (it should work using apt-get). I believe that should be enough, but I don't really remember what else I installed the first time I compiled the C client. Let me know what else was needed; I would like to submit a patch to update the README file in order to avoid this problem in the future. Thanks.

On Tue, Jul 13, 2010 at 8:09 PM, Martin Waite waite@gmail.com wrote:
Hi, I am trying to build the C client on Debian lenny for ZooKeeper 3.3.1:

    autoreconf -if
    configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
    configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
    configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
          If this token and others are legitimate, please use m4_pattern_allow.
          See the Autoconf documentation.
    autoreconf: /usr/bin/autoconf failed with exit status: 1

I probably need to install some required tools. Is there a list of what tools are needed to build this, please?

regards,
Martin

--
Andrei Savu - http://andreisavu.ro/
Re: building client tools
Hi Mahadev,

The suggestions from Sergey and Andrei have fixed this for me.

regards,
Martin

On 13 July 2010 19:11, Mahadev Konar maha...@yahoo-inc.com wrote:
Hi Martin, cppunit is the only tool required to build the ZooKeeper C library. The README says the build can be done without cppunit installed, but there has been an open bug regarding this, so cppunit is required as of now.
Thanks
mahadev

On 7/13/10 10:09 AM, Martin Waite waite@gmail.com wrote:
[original build question quoted above - snipped]
unit test failure
Hi,

I am attempting to build the C client on debian lenny. autoconf, configure, make and make install all appear to work cleanly. I ran:

    autoreconf -if
    ./configure
    make
    make install
    make run-check

However, the unit tests fail:

    $ make run-check
    make zktest-st zktest-mt
    make[1]: Entering directory `/home/martin/zookeeper-3.3.1/src/c'
    make[1]: `zktest-st' is up to date.
    make[1]: `zktest-mt' is up to date.
    make[1]: Leaving directory `/home/martin/zookeeper-3.3.1/src/c'
    ./zktest-st
    ./tests/zkServer.sh: line 52: kill: (17711) - No such process
    ZooKeeper server started
    Running
    Zookeeper_operations::testPing : elapsed 1 : OK
    Zookeeper_operations::testTimeoutCausedByWatches1 : elapsed 0 : OK
    Zookeeper_operations::testTimeoutCausedByWatches2 : elapsed 0 : OK
    Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : elapsed 2 : OK
    Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : OK
    Zookeeper_operations::testConcurrentOperations1 : elapsed 206 : OK
    Zookeeper_init::testBasic : elapsed 0 : OK
    Zookeeper_init::testAddressResolution : elapsed 0 : OK
    Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
    Zookeeper_init::testNullAddressString : elapsed 0 : OK
    Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
    Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
    Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
    Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
    Zookeeper_init::testInvalidAddressString2 : elapsed 2 : OK
    Zookeeper_init::testNonexistentHost : elapsed 108 : OK
    Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
    Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
    Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 0 : OK
    Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
    Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
    Zookeeper_close::testCloseUnconnected1 : elapsed 0 : OK
    Zookeeper_close::testCloseConnected1 : elapsed 0 : OK
    Zookeeper_close::testCloseFromWatcher1 : elapsed 0 : OK
    Zookeeper_simpleSystem::testAsyncWatcherAutoReset
    terminate called after throwing an instance of 'CppUnit::Exception'
      what(): equality assertion failed
      - Expected: -101
      - Actual  : -4
    make: *** [run-check] Aborted

This appears to come from tests/TestClient.cc - but beyond that, it is hard to identify which equality assertion failed. Help !

regards,
Martin
Re: unit test failure
Hi Martin,

Can you check whether you have a stale Java process (ZooKeeperServer) running on your machine? That might cause some issues with the tests.

Thanks
mahadev

On 7/14/10 8:03 AM, Martin Waite waite@gmail.com wrote:
[original message quoted above - snipped]
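Incidentally, the two numbers in the failed assertion are ZooKeeper error codes, which can help narrow down which check in TestClient.cc tripped; the same constants are exposed by the Java client, for example:

    import org.apache.zookeeper.KeeperException.Code;

    // The assertion compares ZooKeeper error codes: -101 is NONODE and -4 is
    // CONNECTIONLOSS, in both the C client (zookeeper.h) and the Java client.
    class PrintCodes {
        public static void main(String[] args) {
            System.out.println(Code.NONODE.intValue());         // -101
            System.out.println(Code.CONNECTIONLOSS.intValue()); // -4
        }
    }

So the test expected a "no node" result but saw a connection loss instead, which is consistent with Mahadev's suggestion that a stale or flaky local test server is interfering.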
Re: ZkClient package
Thomas -

I like the ideas in your proposal; it seems very natural to use Callable/Future for ZooKeeper operations rather than something with more opaque semantics (does this method block? etc.). Let's discuss this more - I'd be more than happy to help out. We're still using 3.2.1, so I'll probably have to fix ZkClient when we upgrade in the near future.

.. Adam

On Wed, Jul 14, 2010 at 12:49 AM, Thomas Koch tho...@koch.ro wrote:
[original message quoted above - snipped]
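To make the Callable/Future idea concrete, here is a minimal sketch against the plain ZooKeeper handle; the FutureZk wrapper and its method names are made up for illustration and are not an existing library API:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Every operation is submitted as a Callable and handed back as a Future,
    // so the caller decides explicitly whether to block (get()) or carry on,
    // instead of guessing the blocking semantics of the client method itself.
    class FutureZk {
        private final ZooKeeper zk;
        private final ExecutorService pool = Executors.newSingleThreadExecutor();

        FutureZk(ZooKeeper zk) {
            this.zk = zk;
        }

        Future<String> create(final String path, final byte[] data) {
            return pool.submit(new Callable<String>() {
                public String call() throws Exception {
                    return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                     CreateMode.PERSISTENT);
                }
            });
        }
    }

A caller would then write Future<String> f = futureZk.create("/foo", data) and only block on f.get() at the point where the result is actually needed.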
What does this exception mean?
Hi All,

I run into this periodically. I am curious to know what it means, why it happens, and how I should react to it programmatically.

    org.apache.thrift.TException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /Config/Stats/count
        at com.abc.service.MyService.handleAll(MyService.java:223)
        at com.abc.service.MyService.assign(AtlasService.java:344)
        at com.abc.service.MyService.assign(AtlasService.java:364)
        at com.abc.service.MyService.assignAll(AtlasService.java:385)
    Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /Config/Stats/count
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:518)

Please advise.

Thanks
Avinash
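For what it's worth, the usual guidance is that ConnectionLoss is retryable - the client library may reconnect to another server within the session timeout and the session stays valid - whereas SessionExpired means the handle is dead and must be rebuilt. A rough sketch of the retry pattern, with the retry count and back-off picked arbitrarily:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class RetryExample {
        // Retry a create a few times across ConnectionLoss; SessionExpiredException
        // simply propagates, because the handle is then unusable.
        static String createWithRetry(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            KeeperException last = null;
            for (int i = 0; i < 3; i++) {
                try {
                    return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                     CreateMode.PERSISTENT);
                } catch (KeeperException.ConnectionLossException e) {
                    last = e;            // transient: back off and try again
                    Thread.sleep(1000);
                } catch (KeeperException.NodeExistsException e) {
                    return path;         // an earlier attempt may have reached the server
                }
            }
            throw last;
        }
    }

The NodeExists branch matters because a create can succeed on the server even though the client saw ConnectionLoss before the reply arrived.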
Achieving quorum with only half of the nodes
Hi,

We are currently evaluating the use of ZK in our infrastructure. In our setup we have a set of servers running from two different power feeds. If one power feed goes away, so does half of the servers. This makes it problematic to configure a ZK ensemble that would tolerate such an outage. Network partitioning is not an issue in our case.

The only solution I have come up with so far is to provide a custom QuorumVerifier that adds a little premium when all servers in the quorum set are from the same group. Basically, if we have only half of the votes but all of them belong to the same group, then we decide we have a quorum.

Any ideas or better solutions are very much appreciated. Sorry if this has already been discussed/answered.

Regards,
Sergei
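To make the idea concrete, here is a sketch of the verifier being described; it assumes the QuorumVerifier interface exposes getWeight and containsQuorum the way the stock majority verifier does (the exact signatures may differ between releases), and the class, field and parameter names are made up:

    import java.util.HashSet;
    import java.util.Map;

    import org.apache.zookeeper.server.quorum.flexible.QuorumVerifier;

    // Sketch only: accept exactly half of the ensemble as a quorum, but only
    // when every voter sits on the same power feed; anything above half is an
    // ordinary majority. serverFeed maps server id -> feed (e.g. 0 or 1).
    class PowerFeedQuorumVerifier implements QuorumVerifier {
        private final Map<Long, Integer> serverFeed;
        private final int ensembleSize;

        PowerFeedQuorumVerifier(Map<Long, Integer> serverFeed, int ensembleSize) {
            this.serverFeed = serverFeed;
            this.ensembleSize = ensembleSize;
        }

        public long getWeight(long id) {
            return 1;
        }

        public boolean containsQuorum(HashSet<Long> set) {
            if (2 * set.size() > ensembleSize) {
                return true;                       // ordinary majority
            }
            if (2 * set.size() == ensembleSize) {  // exactly half: same feed?
                Integer feed = null;
                for (Long id : set) {
                    Integer f = serverFeed.get(id);
                    if (feed == null) {
                        feed = f;
                    } else if (!feed.equals(f)) {
                        return false;
                    }
                }
                return feed != null;
            }
            return false;
        }
    }

Note the catch raised later in the thread: with six servers split three and three, the sets {1,2,3} and {4,5,6} both satisfy this rule, so two disjoint "quorums" can exist at once and the usual intersection guarantee is lost.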
Errors with Python bindings
I'm running a Tornado webserver and using ZooKeeper to store some metadata, and occasionally the ZooKeeper connection will error out irrevocably. Any subsequent calls to ZooKeeper from this process will result in a SystemError. Here is the relevant portion of the Python traceback:

    snip...
    File "/usr/lib/pymodules/python2.5/zuul/storage/zoo.py", line 69, in call
        return getattr(zookeeper, name)(self.handle, *args)
    SystemError: NULL result without error in PyObject_Call

I found this in the ZooKeeper server logs:

    2010-07-13 06:52:46,488 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - Accepted socket connection from /10.2.128.233:54779
    2010-07-13 06:52:46,489 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@742] - Client attempting to renew session 0x429b865a6270003 at /10.2.128.233:54779
    2010-07-13 06:52:46,489 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:lear...@95] - Revalidating client: 299973596915630083
    2010-07-13 06:52:46,793 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@1424] - Invalid session 0x429b865a6270003 for client /10.2.128.233:54779, probably expired
    2010-07-13 06:52:46,794 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket connection for client /10.2.128.233:54779 which had sessionid 0x429b865a6270003

The ZooKeeper ensemble is healthy; each node responds as expected to the four letter word commands, and a simple restart of the Tornado processes fixes this. My question is, if this really is due to session expiration, why is a SessionExpiredException not raised?

Another question: is there an easy way to determine the version of the ZooKeeper Python bindings I'm using? I built the 3.3.0 bindings but I just want to be able to verify that.

Thanks for the help,
Rich
Re: Achieving quorum with only half of the nodes
by custom QuorumVerifier are you referring to http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperHierarchicalQuorums.html ?

ben

On 07/14/2010 12:43 PM, Sergei Babovich wrote:
[original message quoted above - snipped]
Re: Achieving quorum with only half of the nodes
Hi Sergei,

I'm not sure what the implementation of QuorumVerifier you have in mind would look like to make your setting work. Even if you don't have partitions, variation in message delays can cause inconsistencies in your ZooKeeper cluster. Keep in mind that we make the assumption that quorums intersect.

-Flavio

On Jul 14, 2010, at 9:43 PM, Sergei Babovich wrote:
[original message quoted above - snipped]

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300
fax (408) 349 3301
Re: Achieving quorum with only half of the nodes
Just another implementation of QuorumVerifier (based on an existing implementation, either the majority or the hierarchical quorums one). The hierarchical quorum is probably the simplest to adjust - it already has the notion of groups, etc.

On 07/14/2010 04:46 PM, Benjamin Reed wrote:
[original message quoted above - snipped]
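For reference, the grouping syntax on the hierarchical-quorums page Ben linked looks roughly like the zoo.cfg fragment below (server ids, hostnames and weights here are made up). If I read that page correctly, a quorum needs a majority of votes from a majority of the groups, so with only two groups - one per power feed - both groups are still required and a single feed cannot form a quorum on its own; adding a small third "tie-breaker" group with one server outside both feeds would change that.

    # zoo.cfg fragment (illustrative): servers 1-3 on feed A, servers 4-6 on
    # feed B, server 7 a lone tie-breaker independent of both feeds. The usual
    # server.1 ... server.7 entries are assumed to be defined as well.
    group.1=1:2:3
    group.2=4:5:6
    group.3=7
    weight.1=1
    weight.2=1
    weight.3=1
    weight.4=1
    weight.5=1
    weight.6=1
    weight.7=1

With that layout, feed A plus the tie-breaker (or feed B plus the tie-breaker) still covers two of the three groups, which is the same effect discussed in the following replies with an EC2 node or a UPS-backed machine.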
Re: Achieving quorum with only half of the nodes
Thanks, Flavio,

Yep... I see. This is a problem. Any better idea?

As an alternative option, we could probably consider running a single ZK node on EC2 - only in order to handle this specific case. Does it make sense to you? Is it feasible? Would it result in a considerable performance impact due to network latency? I hope that, at least in theory, since a quorum can be reached without an ack from the EC2 node, the performance impact might be manageable.

Regards,
Sergei

On 07/14/2010 04:52 PM, Flavio Junqueira wrote:
[original message quoted above - snipped]
Re: Achieving quorum with only half of the nodes
On Wed, Jul 14, 2010 at 2:16 PM, Sergei Babovich sbabov...@demandware.com wrote:
> Yep... I see. This is a problem. Any better idea?

I think that producing slightly elaborate quorum rules to handle specific failure modes isn't a reasonable thing to do. What you need to do in conjunction is estimate the likelihoods of the various classes of failure modes and convince yourself that you have decreased the overall failure probability.

> As an alternative option we could probably consider running single ZK node on EC2 - only in order to handle this specific case. Does it make sense to you? Is it feasible? Would it result in considerable performance impact due to network latency? I hope that at least in theory since quorum can be reached without ack from the EC2 node performance impact might be manageable.

What about just putting a UPS on one machine in each of the two power supply groups? You are probably correct, though, that this outlier machine would almost never matter for speed except when half of your machines have failed.