Re: ZkClient package

2010-07-14 Thread Thomas Koch
Jun Rao:
 Hi,
 
 ZkClient (http://github.com/sgroschupf/zkclient) provides a nice wrapper
 around the ZooKeeper client and handles things like retry during
 ConnectionLoss events, and auto reconnect. Does anyone (other than Katta)
 use it? Would people recommend using it? Thanks,
 
 Jun
Hi Jun,

I have some ideas for an alternative Zk Client design, but haven't had the 
time yet to hack it together:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201005.mbox/%3c201005261509.54236.tho...@koch.ro%3e

I don't like zkClient very much, but it's the best thing available right
now, AFAIK. Also have a look at this bug:
http://oss.101tec.com/jira/browse/KATTA-137

Best regards,

Thomas Koch, http://www.koch.ro


Re: building client tools

2010-07-14 Thread Martin Waite
Hi Andrei,

I needed to install the following:

   apt-get install libtool autoconf libcppunit-dev

There could well be other packages that were already installed on my machine
(automake, gcc etc), but my build works now.

I have since found that zookeeper is already packaged in debian testing, and
the build-depends for this is quite large:

http://git.debian.org/?p=pkg-java/zookeeper.git;a=blob;f=debian/control;h=b3d5b6d73a298784473f62a1e0ac57a378dde9c9;hb=43878542fbc30e4d8fa8d55be16044d0c9b488a4

Thanks for the assistance.

regards,
Martin

On 13 July 2010 18:39, Andrei Savu savu.and...@gmail.com wrote:

 Hi,

 In this case I think you have to install libcppunit (should work using
 apt-get). I believe that should be enough but I don't really remember
 what else I've installed the first time I compiled the c client.

 Let me know what else was needed. I would like to submit a patch to
 update the README file in order to avoid this problem in the future.

 Thanks.

 On Tue, Jul 13, 2010 at 8:09 PM, Martin Waite waite@gmail.com wrote:
  Hi,
 
  I am trying to build the c client on debian lenny for zookeeper 3.3.1.
 
  autoreconf -if
  configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
  configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
  configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
   If this token and others are legitimate, please use
 m4_pattern_allow.
   See the Autoconf documentation.
  autoreconf: /usr/bin/autoconf failed with exit status: 1
 
  I probably need to install some required tools.   Is there a list of what
  tools are needed to build this please ?
 
  regards,
  Martin
 



 --
 Andrei Savu - http://andreisavu.ro/



Re: building client tools

2010-07-14 Thread Martin Waite
Hi Mahadev,

The suggestions from Sergey and Andrei have fixed this for me.

regards,
Martin

On 13 July 2010 19:11, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Martin,
  There is a list of tools, i.e. cppunit. That is the only required tool to
 build the zookeeper c library. The readme says that it can be done without
 cppunit being installed, but there has been an open bug regarding this. So
 cppunit is required as of now.

 Thanks
 mahadev


 On 7/13/10 10:09 AM, Martin Waite waite@gmail.com wrote:

  Hi,
 
  I am trying to build the c client on debian lenny for zookeeper 3.3.1.
 
  autoreconf -if
  configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
  configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
  configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
If this token and others are legitimate, please use
 m4_pattern_allow.
See the Autoconf documentation.
  autoreconf: /usr/bin/autoconf failed with exit status: 1
 
  I probably need to install some required tools.   Is there a list of what
  tools are needed to build this please ?
 
  regards,
  Martin




unit test failure

2010-07-14 Thread Martin Waite
Hi,

I am attempting to build the C client on debian lenny.

autoconf, configure, make and make install all appear to work cleanly.

I ran:

autoreconf -if
./configure
make
make install
make run-check

However, the unit tests fail:

$ make run-check
make  zktest-st zktest-mt
make[1]: Entering directory `/home/martin/zookeeper-3.3.1/src/c'
make[1]: `zktest-st' is up to date.
make[1]: `zktest-mt' is up to date.
make[1]: Leaving directory `/home/martin/zookeeper-3.3.1/src/c'
./zktest-st
./tests/zkServer.sh: line 52: kill: (17711) - No such process
 ZooKeeper server startedRunning
Zookeeper_operations::testPing : elapsed 1 : OK
Zookeeper_operations::testTimeoutCausedByWatches1 : elapsed 0 : OK
Zookeeper_operations::testTimeoutCausedByWatches2 : elapsed 0 : OK
Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : elapsed 2 : OK
Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : OK
Zookeeper_operations::testConcurrentOperations1 : elapsed 206 : OK
Zookeeper_init::testBasic : elapsed 0 : OK
Zookeeper_init::testAddressResolution : elapsed 0 : OK
Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
Zookeeper_init::testNullAddressString : elapsed 0 : OK
Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
Zookeeper_init::testInvalidAddressString2 : elapsed 2 : OK
Zookeeper_init::testNonexistentHost : elapsed 108 : OK
Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 0 : OK
Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
Zookeeper_close::testCloseUnconnected1 : elapsed 0 : OK
Zookeeper_close::testCloseConnected1 : elapsed 0 : OK
Zookeeper_close::testCloseFromWatcher1 : elapsed 0 : OK
Zookeeper_simpleSystem::testAsyncWatcherAutoResetterminate called after
throwing an instance of 'CppUnit::Exception'
  what():  equality assertion failed
- Expected: -101
- Actual  : -4

make: *** [run-check] Aborted

This appears to come from tests/TestClient.cc - but beyond that, it is hard
to identify which equality assertion failed.

Help !

regards,
Martin


Re: unit test failure

2010-07-14 Thread Mahadev Konar
Hi Martin,
  Can you check if you have a stale java process (ZooKeeperServer) running
on your machine? That might cause some issues with the tests.


Thanks 
mahadev





Re: ZkClient package

2010-07-14 Thread Adam Rosien
Thomas -

I like the ideas in your proposal; it seems very natural to use
Callable/Future for zk operations rather than something with more
opaque semantics (does this method block? etc.). Let's discuss this
more, I'd be more than happy to help out.

We're still using 3.2.1 so I'll probably have to fix zkclient when we
upgrade in the near future.

.. Adam




What does this exception mean?

2010-07-14 Thread Avinash Lakshman
Hi All

I run into this periodically. I am curious to know what this means, why
would this happen and how am I to react to it programmatically.

org.apache.thrift.TException:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /Config/Stats/count
at com.abc.service.MyService.handleAll(MyService.java:223)
at com.abc.service.MyService.assign(AtlasService.java:344)
at com.abc.service.MyService.assign(AtlasService.java:364)
at com.abc.service.MyService.assignAll(AtlasService.java:385)

Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for  /Config/Stats/count
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:518)

Please advise.

Thanks
Avinash
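
A note on reacting to this programmatically: ConnectionLoss means the client lost its connection to the server while the request was outstanding, so the caller cannot tell whether the create was applied. The usual reaction is the one the ZkClient thread mentions, retrying once the client has reconnected. The sketch below illustrates that idea in Python with an in-memory stand-in. Every name in it (ConnectionLoss, NodeExists, with_retry, FlakyStore, create_idempotent) is hypothetical and not part of any ZooKeeper API.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for KeeperException.ConnectionLossException."""


class NodeExists(Exception):
    """Hypothetical stand-in for KeeperException.NodeExistsException."""


def with_retry(operation, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on ConnectionLoss."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionLoss:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            sleep(base_delay * (2 ** attempt))


class FlakyStore:
    """Toy in-memory store: the first create() applies the write, then loses the reply."""

    def __init__(self):
        self.nodes = {}
        self.calls = 0

    def create(self, path, data):
        self.calls += 1
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = data
        if self.calls == 1:
            raise ConnectionLoss(path)  # write was applied, but the reply was lost
        return path


def create_idempotent(store, path, data):
    """Retry create(); treat NodeExists on a retry as 'our earlier attempt succeeded'."""
    def attempt():
        try:
            return store.create(path, data)
        except NodeExists:
            return path
    return with_retry(attempt, sleep=lambda _: None)  # no real sleeping in the demo


store = FlakyStore()
result = create_idempotent(store, "/Config/Stats/count", b"0")
```

Treating NodeExists as success is an assumption that only holds when no other client could have created the same node in the meantime; otherwise the retry logic has to check who owns the node.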


Achieving quorum with only half of the nodes

2010-07-14 Thread Sergei Babovich

Hi,
We are currently evaluating the use of ZK in our infrastructure. In our
setup we have a set of servers running from two different power feeds.
If one power feed goes away, so do half of the servers. This makes it
problematic to configure a ZK ensemble that would tolerate such an outage.
Network partitioning is not an issue in our case. The only solution
I have come up with so far is to provide a custom QuorumVerifier that adds
a little premium when all servers in the quorum set are from the
same group. Basically, if we have only half of the votes but all of them
belong to the same group, then we decide that we have a quorum.
Any ideas or better solutions would be much appreciated. Sorry if this has
already been discussed or answered.


Regards,
Sergei
This e-mail message and all attachments transmitted with it may contain 
privileged and/or confidential information intended solely for the use of the 
addressee(s). If the reader of this message is not the intended recipient, you 
are hereby notified that any reading, dissemination, distribution, copying, 
forwarding or other use of this message or its attachments is strictly 
prohibited. If you have received this message in error, please notify the 
sender immediately and delete this message, all attachments and all copies and 
backups thereof.



Errors with Python bindings

2010-07-14 Thread Rich Schumacher
I'm running a Tornado webserver and using ZooKeeper to store some metadata and 
occasionally the ZooKeeper connection will error out irrevocably.  Any 
subsequent calls to ZooKeeper from this process will result in a SystemError.

Here is the relevant portion of the Python traceback:
  snip...
  File "/usr/lib/pymodules/python2.5/zuul/storage/zoo.py", line 69, in call
    return getattr(zookeeper, name)(self.handle, *args)
SystemError: NULL result without error in PyObject_Call

I found this in the ZooKeeper server logs:

2010-07-13 06:52:46,488 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - 
Accepted socket connection from /10.2.128.233:54779
2010-07-13 06:52:46,489 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@742] - Client 
attempting to renew session 0x429b865a6270003 at /10.2.128.233:54779
2010-07-13 06:52:46,489 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:lear...@95] - Revalidating client: 
299973596915630083
2010-07-13 06:52:46,793 - INFO  
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:nioserverc...@1424] - Invalid session 
0x429b865a6270003 for client /10.2.128.233:54779, probably expired
2010-07-13 06:52:46,794 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket 
connection for client /10.2.128.233:54779 which had sessionid 0x429b865a6270003


The ZooKeeper ensemble is healthy; each node responds as expected to the four 
letter word commands and a simple restart of the Tornado processes fixes this.

My question is, if this really is due to session expiration why is a 
SessionExpiredException not raised?  Another question, is there an easy way to 
determine the version of the ZooKeeper Python bindings I'm using?  I built the 
3.3.0 bindings but I just want to be able to verify that.

Thanks for the help,

Rich

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Benjamin Reed
by custom QuorumVerifier are you referring to 
http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperHierarchicalQuorums.html 
?


ben


   




Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Flavio Junqueira
Hi Sergei,

I'm not sure what the implementation of QuorumVerifier you have in mind
would look like to make your setting work. Even if you don't have
partitions, variation in message delays can cause inconsistencies in your
ZooKeeper cluster. Keep in mind that we make the assumption that quorums
intersect.

-Flavio

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300 fax (408) 349 3301
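
The quorum-intersection assumption can be checked by brute force on a small example. The sketch below assumes a hypothetical six-server ensemble split evenly across two power feeds, and encodes one reading of the proposed rule (a majority, or exactly half of the ensemble with every vote from the same feed). The server ids, feed labels and function names are illustrative only and do not come from ZooKeeper.

```python
from itertools import combinations

# Hypothetical ensemble: server id -> power feed.
SERVERS = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}


def is_majority(votes):
    """Plain majority rule: strictly more than half of the ensemble."""
    return len(votes) > len(SERVERS) / 2


def is_half_same_group(votes):
    """Proposed rule: a majority, or exactly half the ensemble all on one feed."""
    if is_majority(votes):
        return True
    groups = {SERVERS[s] for s in votes}
    return len(votes) * 2 == len(SERVERS) and len(groups) == 1


def all_quorums(rule):
    """Enumerate every subset of servers that the rule accepts as a quorum."""
    ids = sorted(SERVERS)
    return [set(c) for n in range(1, len(ids) + 1)
            for c in combinations(ids, n) if rule(set(c))]


def disjoint_pairs(rule):
    """Return pairs of quorums that share no server, i.e. can decide independently."""
    quorums = all_quorums(rule)
    return [(a, b) for a, b in combinations(quorums, 2) if not (a & b)]


majority_conflicts = disjoint_pairs(is_majority)
proposed_conflicts = disjoint_pairs(is_half_same_group)
```

Under the majority rule no two quorums are disjoint. Under the proposed rule the two feeds form quorums with no server in common, so each side could make progress without the other if messages between them are merely delayed.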

Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Sergei Babovich
Just another implementation of QuorumVerifier (based on an existing
implementation: either majority or hierarchical quorums). Hierarchical
quorum is probably the simplest to adjust, since it already has a notion
of groups, etc.





Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Sergei Babovich

Thanks, Flavio,
Yep... I see. This is a problem. Any better idea?
As an alternative, we could consider running a single ZK node on EC2,
only to handle this specific case. Does that make sense to you? Is it
feasible? Would it have a considerable performance impact due to network
latency? I hope that, at least in theory, the impact would be manageable,
since a quorum can be reached without an ack from the EC2 node.


Regards,
Sergei


Re: Achieving quorum with only half of the nodes

2010-07-14 Thread Ted Dunning
On Wed, Jul 14, 2010 at 2:16 PM, Sergei Babovich
sbabov...@demandware.com wrote:

 Yep... I see. This is a problem. Any better idea?


I think that the production of slightly elaborate quorum rules to handle
specific failure modes isn't a reasonable thing.  What you need to do in
conjunction is to estimate likelihoods of classes of failure modes and
convince yourself that you have decreased the overall failure probability.


 As an alternative option we could probably consider running single ZK node
 on EC2 - only in order to handle this specific case. Does it make sense to
 you? Is it feasible? Would it result in considerable performance impact due
 to network latency? I hope that at least in theory since quorum can be
 reached without ack from EC2 node performance impact might be manageable.


What about just putting a UPS on one machine in each of the two power supply
groups?

You are probably correct, though, that this outlier machine would almost
never matter to speed except when half of your machines have failed.
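
The arithmetic behind the outlier machine can be sketched as follows, assuming a plain majority quorum and hypothetical ensemble sizes (three servers per feed, plus one server that depends on neither feed). The function names are illustrative.

```python
def majority(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def survives_feed_loss(feed_a, feed_b, external=0):
    """True if a majority remains after losing either whole power feed."""
    total = feed_a + feed_b + external
    worst_case_left = total - max(feed_a, feed_b)
    return worst_case_left >= majority(total)


two_feeds_only = survives_feed_loss(3, 3)      # 6 servers, 3 left, 4 needed
with_tiebreaker = survives_feed_loss(3, 3, 1)  # 7 servers, 4 left, 4 needed
```

With both feeds up, the six local servers already form a majority of seven, which is why the remote server's acknowledgement is rarely on the critical path.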