Re: txzookeeper - a twisted python client for zookeeper

2010-11-19 Thread Mahadev Konar
Nice.
Any chance of putting it back in zk?

Would be useful.

Thanks
mahadev


On 11/18/10 1:17 PM, Kapil Thangavelu kapil.f...@gmail.com wrote:

At Canonical we've been using zookeeper heavily in the development of a new
project (ensemble), as noted by Gustavo.

I just wanted to give a quick overview of the client library we're using for
it. It's called txzookeeper; it has 100% test coverage and implements various
queue, lock, and other utilities in addition to wrapping the standard zk
interface. It's based on the Twisted async networking framework for Python,
and obviates the need to use threads within the application, as all watches
and result callbacks are invoked in the main app thread. This makes
structuring the code significantly simpler, imo, than having to deal with
threads in the application, but of course tastes may vary ;-).
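A minimal sketch of that single-threaded dispatch model, using only the Python standard library (this is not the txzookeeper or Twisted API; names and paths are invented for illustration): a background I/O thread only enqueues events, and every user callback runs in the main loop.

```python
import queue
import threading

# Minimal single-threaded dispatch sketch (NOT the txzookeeper API): the
# background I/O thread only enqueues events; all user callbacks run in
# the main "reactor" loop, so the application never handles threads itself.
events = queue.Queue()

def io_thread():
    # Stand-in for the ZooKeeper connection delivering watch notifications.
    for path in ("/lock", "/queue-0000000001"):
        events.put(("node_created", path))
    events.put(None)  # sentinel: tells the main loop to stop

def on_event(kind, path):
    # User callback: always invoked from the main thread.
    print(f"{kind}: {path}")

worker = threading.Thread(target=io_thread)
worker.start()
while True:
    item = events.get()
    if item is None:
        break
    on_event(*item)
worker.join()
```

The same inversion is what an event loop gives you for free: watch handling needs no locks because only one thread ever touches application state.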

Source code is here : http://launchpad.net/txzookeeper

comments and feedback welcome.

cheers,

Kapil



Re: JUnit tests do not produce logs if the JVM crashes

2010-11-04 Thread Mahadev Konar
Hi Andras,
  JUnit will always buffer the logs unless you print them out to the console.

To do that, try running this

ant test -Dtest.output=yes

This will print out the logs to console as they are logged.

Thanks
mahadev


On 11/4/10 3:33 AM, András Kövi allp...@gmail.com wrote:

 Hi all, I'm new to Zookeeper and ran into an issue while trying to run the
 tests with ant.
 
 It seems like the log output is buffered until the complete test suite
 finishes, and it is flushed into its file only then. I had to
 make some changes to the code (no JNI or similar) that resulted in JVM
 crashes. Since the logs are lost in this case, it is a little hard to debug
 the issue.
 
 Do you have any idea how I could disable the buffering?
 
 Thanks,
 Andras
 



FW: [Hadoop Wiki] Update of ZooKeeper/ZKClientBindings by yfinkelstein

2010-11-02 Thread Mahadev Konar
Nice to see this!

Thanks
mahadev
-- Forwarded Message
From: Apache Wiki wikidi...@apache.org
Reply-To: common-...@hadoop.apache.org
Date: Tue, 2 Nov 2010 14:39:24 -0700
To: Apache Wiki wikidi...@apache.org
Subject: [Hadoop Wiki] Update of ZooKeeper/ZKClientBindings by
yfinkelstein

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Hadoop Wiki for
change notification.

The ZooKeeper/ZKClientBindings page has been changed by yfinkelstein.
http://wiki.apache.org/hadoop/ZooKeeper/ZKClientBindings?action=diff&rev1=5&rev2=6

--

  ||Binding||Author||URL||
  ||Scala||Steve Jenson, John
Corwin||http://github.com/twitter/scala-zookeeper-client||
  ||C#||Eric Hauser||http://github.com/ewhauser/zookeeper||
- || || || ||
+ ||Node.js||Yuri Finkelstein||http://github.com/yfinkelstein/node-zookeeper||
 


-- End of Forwarded Message



Re: Problem with Zookeeper cluster configuration

2010-10-27 Thread Mahadev Konar
I think Jared pointed this out, given that your clientPort and quorum port
are the same:

clientPort=5181
server.1=3.7.192.142:5181:5888
 


The above 2 ports should be different.
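For reference, a corrected server entry might look like the fragment below (the quorum and election port numbers are illustrative, not taken from the original report):

```
clientPort=5181
# server.N=host:quorumPort:electionPort -- neither may equal clientPort
server.1=3.7.192.142:2888:5888
```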

Thanks
mahadev

On 10/27/10 10:19 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 Sorry, didn't see this last bit.
 
 Hmph.  A real ZK person will have to answer this.
 
 On Wed, Oct 27, 2010 at 6:21 AM, siddhartha banik 
 siddhartha.ba...@gmail.com wrote:
 
 I have tried with the netstat command also. No other process is using port
 5181 other than the zookeeper process.
 
 Other thing I have tried: using separate ports for server 1 and server 2.
 The surprise is that after starting server 2, server 1 also starts to use the
 same port that server 2 is using as its client port. Does that matter, as
 server 1 and server 2 are running on different boxes?
 
 Any help is appreciated.
 
 
 Thanks
 Siddhartha
 
 



Re: Unusual exception

2010-10-26 Thread Mahadev Konar
Hi Avinash,
 Not sure if you got a response for your email.
  The exception that you mention usually means that the client already closed
the socket or shut down.
 Looks like a client is trying to connect but disconnects before the server
can respond.

Do you have any such clients? Is this causing any issues with your zookeeper
set up?

Thanks
mahadev


On 10/13/10 2:49 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote:

 I started seeing a bunch of these exceptions. What do these mean?
 
 2010-10-13 14:01:33,426 - WARN [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to
 read additional data from client sessionid 0x0, likely client has closed
 socket
 2010-10-13 14:01:33,426 - INFO [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for
 client /10.138.34.195:55738 (no session established for client)
 2010-10-13 14:01:33,426 - DEBUG [CommitProcessor:1:finalrequestproces...@78]
 - Processing request:: sessionid:0x12b9d1f8b907a44 type:closeSession
 cxid:0x0 zxid:0x600193996 txntype:-11 reqpath:n/a
 2010-10-13 14:01:33,427 - WARN [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to
 read additional data from client sessionid 0x12b9d1f8b907a5d, likely client
 has closed socket
 2010-10-13 14:01:33,427 - INFO [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for
 client /10.138.34.195:55979 which had sessionid 0x12b9d1f8b907a5d
 2010-10-13 14:01:33,427 - DEBUG [QuorumPeer:/0.0.0.0:5001
 :commitproces...@159] - Committing request:: sessionid:0x52b90ab45bd51af
 type:createSession cxid:0x0 zxid:0x600193cf9 txntype:-10 reqpath:n/a
 2010-10-13 14:01:33,427 - DEBUG [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1302] - ignoring exception during output
 shutdown
 java.net.SocketException: Transport endpoint is not connected
 at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
 at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
 at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
 at
 org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1298)
 at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263)
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609)
 at
 org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
 2010-10-13 14:01:33,428 - DEBUG [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1310] - ignoring exception during input
 shutdown
 java.net.SocketException: Transport endpoint is not connected
 at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
 at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
 at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
 at
 org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306)
 at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263)
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609)
 at
 org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
 2010-10-13 14:01:33,428 - WARN [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to
 read additional data from client sessionid 0x0, likely client has closed
 socket
 2010-10-13 14:01:33,428 - INFO [NIOServerCxn.Factory:
 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for
 client /10.138.34.195:55731 (no session established for client)
 



Re: Zookeeper on 60+Gb mem

2010-10-05 Thread Mahadev Konar
Hi Maarten,
  I definitely know of a group which uses around 3GB of heap memory for
zookeeper, but I've never heard of someone with such huge requirements. I would
say it would definitely be a learning experience with such high memory, one
that I think would be very useful for others in the community as
well. 

Thanks
mahadev


On 10/5/10 11:03 AM, Maarten Koopmans maar...@vrijheid.net wrote:

 Hi,
 
 I just wondered: has anybody ever run zookeeper to the max on a 68GB
 quadruple extra large high memory EC2 instance? With, say, 60GB allocated or
 so?
 
 Because EC2 with EBS is a nice way to grow your zookeeper cluster (data on the
 EBS volumes, upgrade as your memory utilization grows) - I just wonder
 what the limits are there, or if I am going where angels fear to tread...
 
 --Maarten 
 



Re: possible bug in zookeeper ?

2010-10-04 Thread Mahadev Konar
Hi Yatir,

  Any update on this? Are you still struggling with this problem?

Thanks
mahadev

On 9/15/10 12:56 AM, Yatir Ben Shlomo yat...@outbrain.com wrote:

 Thanks to all who replied, I appreciate your efforts:
 
 1. There is no connections problem from the client machine:
 (ob1078)(tom...@cass3:~)$ echo ruok | nc zook1 2181
 imok(ob1078)(tom...@cass3:~)$ echo ruok | nc zook2 2181
 imok(ob1078)(tom...@cass3:~)$ echo ruok | nc zook3 2181
 imok(ob1078)(tom...@cass3:~)$
 
 2. Unfortunately I have already tried to switch to the new jar but it does not
 seem to be backward compatible.
 It seems that the QuorumPeerConfig class does not have the following field
 protected int clientPort;
 It was replaced by InetSocketAddress clientPortAddress in the new jar
 So I am getting java.lang.NoSuchFieldError exception...
 
 3. I looked at the ClientCnxn.java code.
 It seems that the logic for iterating over the available servers
 (nextAddrToTry++ ) is used only inside the startConnect() function but not in
 the finishConnect() function, nor anywhere else.
 
 Possibly something along these lines is happening:
 some exception that happens inside the finishConnect() function is causing
 the cleanup() function to run, which in turn causes another exception.
 Nowhere in this code path is the nextAddrToTry++ applied.
 Can this make sense to someone ?
 thanks
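 The suspected fix amounts to advancing the server index on every connection
 attempt, not only inside startConnect(). An illustrative sketch (class, field,
 and method names here are invented, not the actual ClientCnxn internals):

```python
# Sketch of the round-robin behaviour the client is expected to have.
# Names are illustrative, not the real ClientCnxn fields: the point is that
# EVERY attempt -- whether it fails in startConnect or later in
# finishConnect -- should advance to the next server in the list.
SERVERS = ["zook1:2181", "zook2:2181", "zook3:2181"]

class ServerIterator:
    def __init__(self, servers):
        self._servers = servers
        self._next = 0

    def next_server(self):
        # Advance unconditionally so a dead server is never retried forever.
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server

it = ServerIterator(SERVERS)
print([it.next_server() for _ in range(4)])
# ['zook1:2181', 'zook2:2181', 'zook3:2181', 'zook1:2181']
```

 With this shape, a failed finishConnect() followed by a retry lands on a
 different host instead of hammering the one that was taken down.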
 
 
 
 
 
 
 -Original Message-
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Tuesday, September 14, 2010 6:20 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: possible bug in zookeeper ?
 
 That is unusual. I don't recall anyone reporting a similar issue, and
 looking at the code I don't see any issues off hand. Can you try the
 following?
 
 1) on that particular zk client machine resolve the hosts zook1/zook2/zook3,
 what ip addresses does this resolve to? (try dig)
 2) try running the client using the 3.3.1 jar file (just replace the jar on
 the client), it includes more log4j information, turn on DEBUG or TRACE
 logging
 
 Patrick
 
 On Tue, Sep 14, 2010 at 8:44 AM, Yatir Ben Shlomo yat...@outbrain.comwrote:
 
 zook1:2181,zook2:2181,zook3:2181
 
 
 -Original Message-
 From: Ted Dunning [mailto:ted.dunn...@gmail.com]
 Sent: Tuesday, September 14, 2010 4:11 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: possible bug in zookeeper ?
 
 What was the list of servers that was given originally to open the
 connection to ZK?
 
 On Tue, Sep 14, 2010 at 6:15 AM, Yatir Ben Shlomo yat...@outbrain.com
 wrote:
 
 Hi I am using solrCloud which uses an ensemble of 3 zookeeper instances.
 
 I am performing survivability  tests:
 Taking one of the zookeeper instances down I would expect the client to
 use
 a different zookeeper server instance.
 
 But as you can see in the below logs attached
 Depending on which instance I choose to take down (in my case, the last
 one in the list of zookeeper servers), the client constantly insists on the
 same zookeeper server (Attempting connection to server
 zook3/192.168.252.78:2181) and does not switch to a different one.
 The problem seems to arise from ClientCnxn.java.
 Does anyone have an idea about this?
 
 Solr cloud currently is using  zookeeper-3.2.2.jar
 Is this a known bug that was fixed in later versions? (3.3.1)
 
 Thanks in advance,
 Yatir
 
 
 Logs:
 
 Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown input
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at
 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java)
 :999)
at
 
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970
)
 Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown output
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at
 
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java)
 :1004)
at
 
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970
)
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
 INFO: Attempting connection to server zook3/192.168.252.78:2181
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
 WARNING: Exception closing session 0x32b105244a20001 to
 sun.nio.ch.selectionkeyi...@3ca58cbf
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at
 sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at
 
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933
)
 Sep 14, 2010 9:02:22 AM 

Re: Expiring session... timeout of 600000ms exceeded

2010-10-04 Thread Mahadev Konar
I am not sure if anyone responded to this or not. Are the clients getting
session expired or getting Connectionloss?
In any case, the zookeeper client has its own thread to update the server with
active connection status. Did you take a look at the GC activity at your
client?
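To make the timing concrete (the numbers and the timeout/3 ping rule below are a rough illustration of typical client behavior, not exact ZooKeeper internals):

```python
# Illustrative arithmetic only: the ZooKeeper client pings automatically
# from its own send thread (roughly every timeout/3), so the application
# does not need to send heartbeats itself -- but a stop-the-world GC pause
# longer than the negotiated timeout will still expire the session.
negotiated_timeout_ms = 600_000            # the 10 minutes from the report
ping_interval_ms = negotiated_timeout_ms // 3

def session_survives(pause_ms):
    # The session expires only if the client is silent past the timeout.
    return pause_ms < negotiated_timeout_ms

print(ping_interval_ms)           # 200000
print(session_survives(240_000))  # True: a 4-minute HTTP wait is fine
print(session_survives(700_000))  # False: silence past the timeout expires it
```

So a 3-4 minute HTTP wait by itself should not expire a 10-minute session; a long GC pause or a starved client thread is the more likely culprit.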

Thanks
mahadev


On 9/21/10 8:24 AM, Tim Robertson timrobertson...@gmail.com wrote:

 Hi all,
 
 I am seeing a lot of my clients being kicked out after the 10 minute
 negotiated timeout is exceeded.
 My clients are each a JVM (around 100 running on a machine) which are
 doing web crawling of specific endpoints and handling the response XML
 - so they do wait around for 3-4 minutes on HTTP timeouts, but
 certainly not 10 mins.
 I am just prototyping right now on a 2xquad core mac pro with 12GB
 memory, and the 100 child processes only get -Xmx64m and I don't see
 my machine exhausted.
 
 Do my clients need to do anything in order to initiate keep alive
 heart beats or should this be automatic (I thought the ticktime would
 dictate this)?
 
 # my conf is:
 tickTime=2000
 dataDir=/Volumes/Data/zookeeper
 clientPort=2181
 maxClientCnxns=1
 minSessionTimeout=4000
 maxSessionTimeout=80
 
 Thanks for any pointers to this newbie,
 Tim
 



Re: SessionMovedException

2010-10-01 Thread Mahadev Konar
Hi Jun,
  You can read more about the SessionMovedException at

http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperProgrammers.html

Thanks
mahadev


On 10/1/10 9:58 AM, Jun Rao jun...@gmail.com wrote:

Hi,

Could someone explain what SessionMovedException means? Should it be treated
as SessionExpiredException (therefore have to recreate ephemeral nodes,
etc)? I have seen this exception when the network is being upgraded.

Thanks,

Jun



Re: zkfuse

2010-09-24 Thread Mahadev Konar
Hi Jun,
  I haven't seen people using zkfuse recently. What kind of issues are you
facing?

Thanks
mahadev


On 9/19/10 6:46 PM, 俊贤 junx...@taobao.com wrote:

 Hi guys,
 Has anyone succeeded in installing the zkfuse?
 
 
 This email (including any attachments) is confidential and may be legally
 privileged. If you received this email in error, please delete it immediately
 and do not copy it or use it for any purpose or disclose its contents to any
 other person. Thank you.
 
 



Re: possible bug in zookeeper ?

2010-09-14 Thread Mahadev Konar
Hi Yatir,
 Can you confirm that zook1 and zook2 can be resolved (e.g., via nslookup) from
the client machine?

We haven't seen a bug like this. It would be great to nail this down.

Thanks
mahadev


On 9/14/10 8:44 AM, Yatir Ben Shlomo yat...@outbrain.com wrote:

 zook1:2181,zook2:2181,zook3:2181
 
 
 -Original Message-
 From: Ted Dunning [mailto:ted.dunn...@gmail.com]
 Sent: Tuesday, September 14, 2010 4:11 PM
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: possible bug in zookeeper ?
 
 What was the list of servers that was given originally to open the
 connection to ZK?
 
 On Tue, Sep 14, 2010 at 6:15 AM, Yatir Ben Shlomo yat...@outbrain.comwrote:
 
 Hi I am using solrCloud which uses an ensemble of 3 zookeeper instances.
 
 I am performing survivability  tests:
 Taking one of the zookeeper instances down I would expect the client to use
 a different zookeeper server instance.
 
 But as you can see in the below logs attached
 Depending on which instance I choose to take down (in my case, the last
 one in the list of zookeeper servers), the client constantly insists on the
 same zookeeper server (Attempting connection to server
 zook3/192.168.252.78:2181) and does not switch to a different one.
 The problem seems to arise from ClientCnxn.java.
 Does anyone have an idea about this?
 
 Solr cloud currently is using  zookeeper-3.2.2.jar
 Is this a known bug that was fixed in later versions? (3.3.1)
 
 Thanks in advance,
 Yatir
 
 
 Logs:
 
 Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown input
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java)
 :999)
at
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970
)
 Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown output
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java)
 :1004)
at
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970
)
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
 INFO: Attempting connection to server zook3/192.168.252.78:2181
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
 WARNING: Exception closing session 0x32b105244a20001 to
 sun.nio.ch.selectionkeyi...@3ca58cbf
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933
)
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown input
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java)
 :999)
at
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970
)
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown output
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at
 org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java)
 :1004)
at
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970
)
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
 INFO: Attempting connection to server zook3/192.168.252.78:2181
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
 WARNING: Exception closing session 0x32b105244a2 to
 sun.nio.ch.selectionkeyi...@3960f81b
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at
 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933
)
 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
 WARNING: Ignoring exception during shutdown input
 java.nio.channels.ClosedChannelException
at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at 

Re: Receiving create events for self with synchronous create

2010-09-13 Thread Mahadev Konar
Hi Todd,
 Sorry for my late response. I had marked this email to respond to but couldn't 
find the time :). Did you figure this out? It looks like, as soon as you set a 
watch on /follower, some other node instantly creates another child of 
/follower. Could that be the case?

Thanks
mahadev


On 8/26/10 8:09 PM, Todd Nine t...@spidertracks.co.nz wrote:

Sure thing.  The FollowerWatcher class is instantiated by the IClusterManager 
implementation. It then performs the following

FollowerWatcher.init() which is intended to do the following.

1. Create our follower node so that other nodes know we exist at path 
/com/spidertracks/aviator/cluster/follower/10.0.1.1  where the last node is 
an ephemeral node with the internal IP address of the node.  These are lines 67 
through 72.
2. Signal to the clusterManager that the cluster has changed (line 79).  
Ultimately the clusterManager will perform a barrier for partitioning data ( a 
separate watcher)
3. Register a watcher to receive all future events on the follower path 
/com/spidertracks/aviator/cluster/follower/ line 81.


Then we have the following characteristics in the watcher

1. If a node has been added or deleted from the children of 
/com/spidertracks/aviator/cluster/follower then continue.  Otherwise, ignore 
the event.  Lines 33 through 44
2. If this was an event we should process our cluster has changed, signal to 
the CusterManager that a node has either been added or removed. line 51.


I'm trying to encapsulate the detection of additions and deletions of child 
nodes within this Watcher.  All other events that occur due to a node being 
added or deleted should be handled externally by the clustermanager.

Thanks,
Todd


On Thu, 2010-08-26 at 19:26 -0700, Mahadev Konar wrote:
Hi Todd,
   The code that you point to, I am not able to make out the sequence of steps.
Can you be more clear on what you are trying to do in terms of zookeeper 
api?

 Thanks
 mahadev
 On 8/26/10 5:58 PM, Todd Nine t...@spidertracks.co.nz wrote:


Hi all,
   I'm running into a strange issue I could use a hand with.   I've
 implemented leader election, and this is working well.  I'm now
 implementing a follower queue with ephemeral nodes. I have an interface
 IClusterManager which simply has the api clusterChanged.  I don't care
 if nodes are added or deleted, I always want to fire this event.  I have
 the following basic algorithm.


 init

 Create a path with /follower/+mynode name

 fire the clusterChangedEvent

 Watch set the event watcher on the path /follower.


 watch:

 reset the watch on /follower

 if event is not a NodeDeleted or NodeCreated, ignore

 fire the clustermanager event


 this seems pretty straightforward.  Here is what I'm expecting


 1. Create my node path
 2. fire the clusterChanged event
 3. Set watch on /follower
 4. Receive watch events for changes from any other nodes.

 What's actually happening

 1. Create my node path
 2. fire the clusterChanged event
 3. Set Watch on /follower
 4. Receive watch event for node created in step 1
 5. Receive future watch events for changes from any other nodes.


 Here is my code.  Since I set the watch after I create the node, I'm not
 expecting to receive the event for it.  Am I doing something incorrectly
 in creating my watch?  Here is my code.

 http://pastebin.com/zDXgLagd

 Thanks,
 Todd









Re: Lock example

2010-09-13 Thread Mahadev Konar
Hi Tim,
 The lock recipe you mention is supposed to avoid the herd effect and prevent 
starvation (though it has bugs :)).
 Are you looking for something like that, or just a simple lock and unlock that 
doesn't have to worry about the above issues?
If that's the case, then just doing an ephemeral create and delete should give 
you your lock and unlock recipes.
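An in-memory simulation of that simple recipe (no real ZooKeeper calls; the dict stands in for the znode tree): "lock" is an attempted create of one ephemeral znode, "unlock" deletes it.

```python
# In-memory simulation of the simple recipe above. A real client would
# additionally watch the node and retry on NodeExists -- which is exactly
# where the herd effect enters the picture, since every waiter wakes up
# at once when the node is deleted.
znodes = {}

def try_lock(session_id, path="/lock"):
    if path in znodes:
        return False           # held by someone else; real code sets a watch
    znodes[path] = session_id  # stands in for an ephemeral create
    return True

def unlock(session_id, path="/lock"):
    if znodes.get(path) == session_id:
        del znodes[path]       # stands in for deleting the ephemeral node

print(try_lock("s1"))  # True
print(try_lock("s2"))  # False: s1 holds the lock
unlock("s1")
print(try_lock("s2"))  # True
```

With a real ephemeral node, a crashed holder's session expiry deletes the node automatically, which is what makes the recipe safe against client failure.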


Thanks
mahadev


On 9/8/10 9:58 PM, Tim Robertson timrobertson...@gmail.com wrote:

Hi all,

I am new to ZK and using the queue and lock examples that come with
zookeeper but have run into ZOOKEEPER-645 with the lock.
I have several JVMs each keeping a long running ZK client and the
first JVM (and hence client) does not respect the locks obtained by
subsequent clients - e.g. the first client always manages to get the
lock even if another client holds it.

Before I start digging, I thought I'd ask if anyone has a simple lock
implemented they might share?  My needs are simply to lock a URL to
indicate that it is being worked on, so that I don't hammer my
endpoints with multiple clients.

Thanks for any advice,
Tim



Re: Understanding ZooKeeper data file management and LogFormatter

2010-09-13 Thread Mahadev Konar
Hi Vishal,
 Usually the default retention policy is safe enough for operations.

http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html

Gives you an overview of how to use the purging library in zookeeper.
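As a rough illustration of the retention idea only (the real distribution ships a PurgeTxnLog utility, and actual snapshot names carry zxid suffixes rather than the simple numbers used here): keep the newest N snapshots and drop the rest.

```python
import os
import tempfile

# Toy purge: create six fake snapshot files, then retain only the newest 3.
KEEP = 3
snapdir = tempfile.mkdtemp()
for i in range(6):
    open(os.path.join(snapdir, "snapshot.%d" % i), "w").close()

# Sort by the numeric suffix so "newest" is well-defined, then delete
# everything except the last KEEP entries.
snaps = sorted(os.listdir(snapdir), key=lambda n: int(n.split(".")[1]))
for stale in snaps[:-KEEP]:
    os.remove(os.path.join(snapdir, stale))

print(sorted(os.listdir(snapdir)))  # ['snapshot.3', 'snapshot.4', 'snapshot.5']
```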

Thanks
mahadev


On 9/8/10 12:01 PM, Vishal K vishalm...@gmail.com wrote:

 Hi All,
 
 Can you please share your experience regarding ZK snapshot retention and
 recovery policies?
 
 We have an application where we never need to rollback (i.e., revert back to
 a previous state by using old snapshots). Given this, I am trying to
 understand under what circumstances would we ever need to use old ZK
 snapshots. I understand a lot of these decisions depend on the application
 and amount of redundancy used at every level (e.g,. RAID level where the
 snapshots are stored etc) in the product. To simplify the discussion, I
 would like to rule out any application characteristics and focus mainly on
 data consistency.
 
 - Assuming that we have a 3 node cluster I am trying to figure out when
 would I really need to use old snapshot files. With 3 nodes we already have
 at least 2 servers with consistent database. If I loose files on one of the
 servers, I can use files from the other. In fact, ZK server join will take
 care of this. I can remove files from a faulty node and reboot that node.
 The faulty node will sync with the leader.
 
 - The old files will be useful if the current snapshot and/or log files are
 lost or corrupted on all 3 servers. If  the loss is due to a disaster (case
 where we loose all 3 servers), one would have to keep the snapshots on some
 external storage to recover. However, if the current snapshot file is
 corrupted on all 3 servers, then the most likely cause would be a bug in ZK.
 In which case, how can I trust the consistency of the old snapshots?
 
 - Given a set of snapshots and log files, how can I verify the correctness
 of these files? Example, if one of the intermediate snapshot file is
 corrupt.
 
 - The Admin's guide says Using older log and snapshot files, you can look
 at the previous state of ZooKeeper servers and even restore that state. The
 LogFormatter class allows an administrator to look at the transactions in a
 log. * *Is there a tool that does this for the admin?  The LogFormatter
 only displays the transactions in the log file.
 
 - Has anyone ever had to play with the snapshot files in production?
 
 Thanks in advance.
 
 Regards,
 -Vishal
 



Re: ZooKeeper C bindings and cygwin?

2010-09-04 Thread Mahadev Konar
Hi Jan,
 It would be great to have some documentation on how to use the Windows 
install. Would you mind submitting a patch with documentation covering FAQs and 
any other issues you might have faced?

Thanks
mahadev


On 9/1/10 6:04 AM, jdeinh...@ujam.com jdeinh...@ujam.com wrote:

Dear list readers,

we've solved the problem ourselves. We found the DLL CYGZOOKEEPER_MT-2.DLL in 
/usr/local/bin.

Best regards
Jan  Jan

Am 01.09.2010 um 12:57 schrieb jdeinh...@ujam.commailto:jdeinh...@ujam.com:

Dear list readers,

we want to use the zookeeper C bindings with our applications. Some of them are 
running on Linux (e.g. Load Balancer) and others (.NET(C#) Audio Servers) on 
Windows Server 2008.
We'd like to try using cygwin to accomplish this task on windows, but we need 
further advice on how to do that.

What we did so far:

1) downloaded latest cygwin
2) ran ./configure
3) ran make
4) ran make install


Now we find some files (libzookeeper_mt.a, libzookeeper_mt.dll.a, 
libzookeeper_mt.la, libzookeeper_st.a, libzookeeper_st.dll.a and 
libzookeeper_st.la) in our cygwin/usr/local/lib folder, but these cannot be 
used in Visual Studio.
Is it somehow possible to produce a file that we can then use like a .dll or a 
.lib ?
What do we have to do to accomplish our task? Are we heading in a completely 
wrong direction?


Any help is greatly appreciated, thank you in advance!



Best regards
Jan  Jan





Jan Deinhard

Software Developer

UJAM GmbH
Speicher 1
Konsul-Smidt-Str 8d
28217 Bremen

fon  +49 421 89 80 97-04

jdeinh...@ujam.commailto:a...@ujam.com
www.ujam.comhttp://www.ujam.com/







Re: getting created child on NodeChildrenChanged event

2010-09-04 Thread Mahadev Konar
Hi Todd, 
  We have always tried to lean toward keeping things lightweight and the API
simple. The only way you would be able to do this is with sequential
creates.

1. create nodes like /queueelement-$i where i is a monotonically increasing
number. You could use the sequential flag of zookeeper to do this.

2. When deleting a node, you would remove the node and create a "deleted" node
at

/deletedqueueelements/queueelement-$i

2.1 On notification you would go to /deletedqueueelements/ and find out which
ones were deleted. 

The above only works if you are ok with monotonically unique queue elements.

3. The above method allows folks to see the deltas using
/deletedqueueelements, which can be garbage collected by some cleanup process
(you can be smarter about this as well)

Would something like this work?
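A toy sketch of the delta computation this scheme enables (the paths and the helper function are invented for illustration; it works only because the element suffix is monotonically increasing, as noted above):

```python
# Elements carry monotonically increasing sequence suffixes, so a watcher
# can find "what changed" from the highest id it has already processed,
# instead of diffing full child lists.
created = ["queueelement-%010d" % i for i in range(5)]   # queue children
deleted = ["queueelement-%010d" % i for i in (1, 3)]     # deleted-elements ledger

def new_since(children, last_seen_id):
    # Keep only the elements whose numeric suffix is past our high-water mark.
    return [c for c in children if int(c.rsplit("-", 1)[1]) > last_seen_id]

print(new_since(created, 2))  # only elements 3 and 4 are new to this watcher
print(new_since(deleted, 0))  # deletions are discovered the same way
```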


Thanks
mahadev


On 8/31/10 3:55 PM, Todd Nine t...@spidertracks.co.nz wrote:

 Hi Dave,
   Thanks for the response.  I understand your point about missed events
 during a watch reset period.  I may be off, here is the functionality I
 was thinking.  I'm not sure if the ZK internal versioning process could
 possibly support something like this.
 
 1. A watch is placed on children
 2. The event is fired to the client.  The client receives the Stat
 object as part of the event for the current state of the node when the
 event was created.  We'll call this Stat A with version 1
 3. The client performs processing.  Meanwhile the node has several
 children changed. Versions are incremented to version 2 and version 3
 4. Client resets the watch
 5. A node is added
 6. The event is fired to the client.  Client receives Stat B with
 version 4
 7. Client calls performs a deltaChildren(Stat A, Stat B)
 8. zookeeper returns added nodes between stats, also returns deleted
 nodes between stats.
 
 This would handle the missed event problem since the client would have
 the 2 states it needs to compare.  It also allows clients dealing with
 large data sets to only deal with the delta over time (like a git
 replay).  Our number of queues could get quite large, and I'm concerned
 that keeping my previous event's children in a set to perform the delta
 may become quite memory- and processor-intensive. Would a feature like
 this be possible without overcomplicating the ZooKeeper core?
 
 
 Thanks,
 Todd
 
 On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote:
 
 Hi Todd -
 The general explanation for why Zookeeper doesn't pass the event information
 w/ the event notification is that an event notification is only triggered
 once, and thus may indicate multiple events. For example, if you do a
 GetChildren and set a watch, then multiple children are added at about the
 same time, the first one triggers a notification, but the second (or later)
 ones do not. When you do another GetChildren() request to get the list and
 reset the watch, you'll see all the changed nodes, however if you had just
 been told about the first change in the notification you would have missed
 the others.
 To do what you are wanting, you would really need persistent watches that
 send notifications every time a change occurs and don't need to be reset so
 you can't miss events. That isn't the design that was chosen for Zookeeper
 and I don't think it's likely to be implemented.
 
 -Dave Wright
 
 On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine t...@spidertracks.co.nz wrote:
 
 Hi all,
  I'm writing a distributed queue monitoring class for our leader node in
 the cluster.  We're queueing messages per input hardware device, this queue
 is then assigned to a node with the least load in our cluster.  To do this,
 I maintain 2 Persistent Znode with the following format.
 
 data queue
 
 /dataqueue/devices/unit id/data packet
 
 processing follower
 
 /dataqueue/nodes/node name/unit id
 
 The queue monitor watches for changes on the path of /dataqueue/devices.
  When the first packet from a unit is received, the queue writer will
 create
 the queue with the unit id.  This triggers the watch event on the
 monitoring
 class, which in turn creates the znode for the path with the least loaded
 node.  This path is watched for child node creation and the node creates a
 queue consumer to consume messages from the new queue.
 
 
 Our list of queues can become quite large, and I would prefer not to
 maintain a list of queues I have assigned then perform a delta when the
 event fires to determine which queues are new and caused the watch event. I
 can't really use sequenced nodes and keep track of my last read position,
 because I don't want to iterate over the list of queues to determine which
 sequenced node belongs to the current unit id (it would require full
 iteration, which really doesn't save me any reads).  Is it possible to
 create a watch to return the path and Stat of the child node that caused
 the
 event to fire?
 
 Thanks,
 Todd
 
 



Re: Logs and in memory operations

2010-09-04 Thread Mahadev Konar
Hi Avinash,
  In the source code, the FinalRequestProcessor updates the in-memory data
structures and the SyncRequestProcessor logs to disk.
For deciding when to delete logs, take a look at the PurgeTxnLog.java file.

Thanks
mahadev


On 8/30/10 1:11 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote:

Hi All

From my understanding when a znode is updated/created a write happens into
the local transaction logs and then some in-memory data structure is updated
to serve the future reads.
Where in the source code can I find this? Also how can I decide when it is
ok for me to delete the logs off disk?

Please advice.

Cheers
Avinash



Re: Spew after call to close

2010-09-03 Thread Mahadev Konar

Hi Stack,
 Looks like you are shutting down the server and shutting down the client at
the same time? Is that the issue?

Thanks
mahadev

On 9/3/10 4:47 PM, Stack st...@duboce.net wrote:

 Have you fellas seen this before? I call close on zookeeper but it insists
 on doing the below exceptions.  Why is it doing this 'Session
 0x12ad9dccda30002
 for server null, unexpected error, closing socket connection and
 attempting reconnect'?   This would seem to come after the close has
 been noticed and looking in code, i'd think we'd not do this since the
 close flag should be set to true post call to close?
 
 Thanks lads.  (The below looks ugly in our logs... this is zk 3.3.1),
 St.Ack
 
 2010-09-03 16:09:52,369 INFO
 org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
 for client /fe80:0:0:0:0:0:0:1%1:56941 which had sessionid
 0x12ad9dccda30001
 2010-09-03 16:09:52,369 INFO
 org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
 for client /127.0.0.1:56942 which had sessionid 0x12ad9dccda30002
 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.ClientCnxn: Unable
 to read additional data from server sessionid 0x12ad9dccda30001,
 likely server has closed socket, closing socket connection and
 attempting reconnect
 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.ClientCnxn: Unable
 to read additional data from server sessionid 0x12ad9dccda30002,
 likely server has closed socket, closing socket connection and
 attempting reconnect
 2010-09-03 16:09:52,370 INFO
 org.apache.zookeeper.server.NIOServerCnxn: NIOServerCnxn factory
 exited run method
 2010-09-03 16:09:52,370 INFO
 org.apache.zookeeper.server.PrepRequestProcessor: PrepRequestProcessor
 exited loop!
 2010-09-03 16:09:52,370 INFO
 org.apache.zookeeper.server.SyncRequestProcessor: SyncRequestProcessor
 exited!
 2010-09-03 16:09:52,370 INFO
 org.apache.zookeeper.server.FinalRequestProcessor: shutdown of request
 processor complete
 2010-09-03 16:09:52,470 DEBUG
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
 Received ZooKeeper Event, type=None, state=Disconnected, path=null
 2010-09-03 16:09:52,470 INFO
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
 Received Disconnected from ZooKeeper, ignoring
 2010-09-03 16:09:52,471 DEBUG
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
 Received ZooKeeper Event, type=None, state=Disconnected, path=null
 2010-09-03 16:09:52,471 INFO
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase
 Received Disconnected from ZooKeeper, ignoring
 2010-09-03 16:09:52,857 INFO org.apache.zookeeper.ClientCnxn: Opening
 socket connection to server localhost/0:0:0:0:0:0:0:1:2181
 2010-09-03 16:09:52,858 WARN org.apache.zookeeper.ClientCnxn: Session
 0x12ad9dccda30001 for server null, unexpected error, closing socket
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 2010-09-03 16:09:53,149 INFO org.apache.zookeeper.ClientCnxn: Opening
 socket connection to server localhost/fe80:0:0:0:0:0:0:1%1:2181
 2010-09-03 16:09:53,150 WARN org.apache.zookeeper.ClientCnxn: Session
 0x12ad9dccda30002 for server null, unexpected error, closing socket
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 2010-09-03 16:09:53,576 INFO org.apache.zookeeper.ClientCnxn: Opening
 socket connection to server localhost/127.0.0.1:2181
 2010-09-03 16:09:53,576 WARN org.apache.zookeeper.ClientCnxn: Session
 0x12ad9dccda30001 for server null, unexpected error, closing socket
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
 2010-09-03 16:09:54,000 INFO
 org.apache.zookeeper.server.SessionTrackerImpl: SessionTrackerImpl
 exited loop!
 2010-09-03 16:09:54,002 DEBUG
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Closed
 zookeeper sessionid=0x12ad9dccda30001
 2010-09-03 16:09:54,129 INFO org.apache.zookeeper.ClientCnxn: Opening
 socket connection to server localhost/0:0:0:0:0:0:0:1:2181
 2010-09-03 16:09:54,130 WARN org.apache.zookeeper.ClientCnxn: Session
 0x12ad9dccda30002 for server null, unexpected error, closing socket
 connection and attempting reconnect
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 

Re: election recipe

2010-09-03 Thread Mahadev Konar
Hi Eric,
 As Ted and you yourself mentioned, it's mostly to avoid the herd effect.  A herd
effect would usually mean thousands of clients being notified of some change and all
trying to create the same node on notification.  With just tens of clients you
don't need to worry about this herd effect at all.

Thanks
mahadev
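The difference between the two schemes can be put in a toy model (plain Python, no ZooKeeper involved; the function and scheme names are invented for illustration): in Eric's simple scheme every candidate watches the single /election node, so a leader exit wakes all of them, while the sequential-node recipe has each candidate watch only its immediate predecessor, so exactly one client is woken.

```python
def notified_on_leader_exit(seqs, scheme):
    """Toy model: which candidates wake up when the leader's node vanishes?

    seqs   -- sequence numbers of the live candidates (leader = lowest)
    scheme -- "simple": everyone watches the one /election node
              "sequential": each candidate watches only its predecessor
    """
    rest = sorted(seqs)[1:]            # everyone except the leader
    return rest if scheme == "simple" else rest[:1]

print(notified_on_leader_exit([0, 1, 2, 3], "simple"))      # [1, 2, 3]
print(notified_on_leader_exit([0, 1, 2, 3], "sequential"))  # [1]
```

With a handful of candidates the difference is negligible, which is exactly the point above; with thousands it is the difference between one wakeup and a stampede of create attempts.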


On 9/2/10 3:40 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 You are correct that this simpler recipe will work for smaller populations
 and correct that the complications are to avoid the herd effect.
 
 
 
 On Thu, Sep 2, 2010 at 12:55 PM, Eric van Orsouw
 eric.van.ors...@gmail.comwrote:
 
 Hi there,
 
 
 
 I would like to use zookeeper to implement an election scheme.
 
 There is a recipe on the homepage, but it is relatively complex.
 
 I was wondering what was wrong with the following pseudo code;
 
 
 
 forever {
 
zookeeper.create -e /election my_ip_address
 
if creation succeeded then {
 
// do the leader thing
 
} else {
 
// wait for change in /election using watcher mechanism
 
}
 
 }
 
 
 
 My assumption is that the recipe is more elaborate to the eliminate the
 flood of requests if the leader falls away.
 
 But if there are only a handful of leader-candidates ,than that should not
 be a problem.
 
 
 
 Is this correct, or am I missing out on something.
 
 
 
 Thanks,
 
 Eric
 
 
 
 
 



Re: Zookeeper stops

2010-08-26 Thread Mahadev Konar
HI Ted,
 You can take a look at
http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperAdmin.html

to see how to set up a data directory outside of /tmp.
I am not sure if your zookeeper instance is part of an hbase installation or not.
If it is, you would be better off posting this question on the hbase list.

Thanks
mahadev
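Concretely, the persistence location is the dataDir setting in the server configuration; a fragment along these lines (paths here are illustrative, not prescriptive) keeps snapshots and transaction logs out of /tmp:

```
# zoo.cfg -- illustrative paths
tickTime=2000
clientPort=2181
dataDir=/var/lib/zookeeper          # snapshots
dataLogDir=/var/lib/zookeeper/log   # transaction logs (ideally a separate disk)
```

For an HBase-managed ZooKeeper, the equivalent knob is set through HBase's own configuration rather than a standalone zoo.cfg.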




On 8/26/10 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:

I saw the same error in hbase-hadoop-zookeeper-X.log
zookeeper-3.2.2 is used and managed by HBase.

How do I use a directory outside of /tmp for zookeeper persistence ?

Thanks

On Thu, Aug 19, 2010 at 1:42 PM, Patrick Hunt ph...@apache.org wrote:

 No. You configure it in the server configuration file.

 Patrick


 On 08/19/2010 01:19 PM, Wim Jongman wrote:

 Hi,

 But zk does default to /tmp?

 Regards,

 Wim





 On Thursday, August 19, 2010, Patrick Huntph...@apache.org  wrote:

 +1 on that Ted. I frequently see this issue crop up as I just rebooted
 my server and lost all my data ... -- many os's will cleanup tmp on reboot.
 :-)

 Patrick

 On 08/19/2010 07:43 AM, Ted Dunning wrote:

 Also, /tmp is not a great place to keep things that are intended for
 persistence.

 On Thu, Aug 19, 2010 at 7:34 AM, Mahadev Konarmaha...@yahoo-inc.com
 wrote:


 Hi Wim,
   It mostly looks like zookeeper is not able to create files on the
 /tmp filesystem. Is there a space shortage, or is it possible the file
 is being deleted as it's being written to?

 Sometimes admins have a crontab on /tmp that cleans up the /tmp
 filesystem.

 Thanks
 mahadev


 On 8/19/10 1:15 AM, Wim Jongmanwim.jong...@gmail.comwrote:

 Hi,

 I have a zookeeper server running that can sometimes run for days and
 then
 quits:

 Is there somebody with a clue to the problem?

 I am running 64 bit Ubuntu with

 java version 1.6.0_18
 OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
 OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

 Zookeeper 3.3.0

 The log below has some context before it shows the fatal error. Our
 component.id=40676 indicates that it is the 40676th time that I ask ZK
 to
 publish this information. It has been seen to go up to half a million
 before
 stopping.

 Regards,

 Wim

 ZooDiscoveryService Unpublished: Aug 18, 2010 11:17:28 PM.
 ServiceInfo[uri=osgiservices://


 188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservicehttp://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;id=ServiceID%5Btype%3DServiceTypeID%5BtypeName%3D_osgiservices._tcp.default._iana%5D%3Blocation%3Dosgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=%5D;priority=0;weight=0;props=ServiceProperties%5B%7Becf.rsvc.ns=ecf.namespace.generic.remoteservice
 ,


 osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
 ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
 =org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@68a1e081,
 component.name=Star Wars Quotes Service, ecf.sp.ect=ecf.generic.server,
 component.id=40676,


 ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5b9a6ad1
 }]]
 ZooDiscoveryService Published: Aug 18, 2010 11:17:29 PM.
 ServiceInfo[uri=osgiservices://


 188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservicehttp://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID%5Btype%3DServiceTypeID%5BtypeName%3D_osgiservices._tcp.default._iana%5D%3Blocation%3Dosgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=%5D;priority=0;weight=0;props=ServiceProperties%5B%7Becf.rsvc.ns=ecf.namespace.generic.remoteservice
 ,


 osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
 ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
 =org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4,
 component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server,
 component.id=40677,


 ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953
 }]]
 [log;+0200 2010.08.18


 23:17:29:545;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remo





Re: Receiving create events for self with synchronous create

2010-08-26 Thread Mahadev Konar
Hi Todd,
  From the code you point to, I am not able to make out the sequence of steps.
   Can you be more clear about what you are trying to do in terms of the zookeeper API?

Thanks
mahadev
On 8/26/10 5:58 PM, Todd Nine t...@spidertracks.co.nz wrote:

Hi all,
  I'm running into a strange issue I could use a hand with.   I've
implemented leader election, and this is working well.  I'm now
implementing a follower queue with ephemeral nodes. I have an interface
IClusterManager which simply has the api clusterChanged.  I don't care
if nodes are added or deleted, I always want to fire this event.  I have
the following basic algorithm.


init

Create a path with /follower/+mynode name

fire the clusterChangedEvent

Watch set the event watcher on the path /follower.


watch:

reset the watch on /follower

if event is not a NodeDeleted or NodeCreated, ignore

fire the clustermanager event


this seems pretty straightforward.  Here is what I'm expecting


1. Create my node path
2. fire the clusterChanged event
3. Set watch on /follower
4. Receive watch events for changes from any other nodes.

What's actually happening

1. Create my node path
2. fire the clusterChanged event
3. Set Watch on /follower
4. Receive watch event for node created in step 1
5. Receive future watch events for changes from any other nodes.


Here is my code.  Since I set the watch after I create the node, I'm not
expecting to receive the event for it.  Am I doing something incorrectly
in creating my watch?  Here is my code.

http://pastebin.com/zDXgLagd

Thanks,
Todd







Re: Size of a znode in memory

2010-08-25 Thread Mahadev Konar
Hi Marten,
 The usual memory footprint of a znode is around 40-80 bytes.

 I think Ben is planning to document a way to calculate the approximate memory
footprint of your zk servers given a set of updates and their sizes.

 thanks
mahadev
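Until that write-up exists, a back-of-the-envelope estimate is possible (my own sketch; the 40-80 byte per-znode overhead figure comes from the reply above): per-znode overhead plus name plus data, times the node count.

```python
def approx_heap_bytes(n_znodes, name_bytes, data_bytes, overhead=80):
    """Rough upper-bound heap estimate: fixed per-znode overhead (taking
    the high end of the quoted 40-80 byte range) plus name and payload."""
    return n_znodes * (overhead + name_bytes + data_bytes)

# Maarten's numbers: 32-byte names, up to 128 bytes of data
print(approx_heap_bytes(1_000_000, 32, 128))  # 240000000 -> roughly 240 MB
```

A smoke test while watching the heap via jconsole, as suggested, is still the way to validate the estimate, since JVM object overhead varies by VM and settings.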


On 8/25/10 11:49 AM, Maarten Koopmans maar...@vrijheid.net wrote:

 Hi,
 
 Is there a way to know/measure the size of a znode? My average znode has a
 name of 32 bytes and user data of max 128 bytes.
 
 Or is the only way to run a smoke test and watch the heap growth via jconsole
 or so?
 
 Thanks, Maarten 
 



Re: Searching more ZooKeeper content

2010-08-25 Thread Mahadev Konar
I am definitely a +1 on this, given that it's powered by Solr.

Thanks
mahadev


On 8/25/10 9:22 AM, Alex Baranau alex.barano...@gmail.com wrote:

 Hello guys,
 
 Over at http://search-hadoop.com we index ZooKeeper project's mailing lists,
 wiki, web site,
 source code, javadoc, jira...
 
 Would the community be interested in a patch that replaces the
 Google-powered
 search with that from search-hadoop.com, set to search only ZooKeeper
 project by
 default?
 
 We look into adding this search service for all Hadoop's sub-projects.
 
 Assuming people are for this, any suggestions for how the search should
 function by default or any specific instructions for how the search box
 should
 be modified would be great!
 
 Thank you,
 Alex Baranau.
 
 P.S. HBase community already accepted our proposal (please refer to
 https://issues.apache.org/jira/browse/HBASE-2886) and new version (0.90)
 will include new search box. Also the patch is available for TIKA (we are in
 the process of discussing some details now):
 https://issues.apache.org/jira/browse/TIKA-488. ZooKeeper's site looks much
 like Avro's for which we also created patch recently (
 https://issues.apache.org/jira/browse/AVRO-626).
 



Re: Parent nodes multi-step transactions

2010-08-23 Thread Mahadev Konar
Hi Gustavo,
 Usually the paradigm I like to suggest is to have something like

/A/init

Every client watches for the existence of this node and this node is only
created after /A has been initialized with the creation of /A/C or other
stuff.

Would that work for you?

Thanks
mahadev
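The ordering in this recipe can be mocked up with an in-memory stand-in for the znode tree (plain Python; ToyTree and the helper names are invented for illustration, not ZooKeeper API): the writer creates the marker node last, and readers treat /A as visible only once the marker exists.

```python
class ToyTree:
    """In-memory stand-in for a znode namespace (no real ZooKeeper here)."""
    def __init__(self):
        self.paths = set()
    def create(self, path):
        self.paths.add(path)
    def exists(self, path):
        return path in self.paths

def build_a(tree):
    tree.create("/A")
    tree.create("/A/B")
    tree.create("/A/C")
    tree.create("/A/init")   # marker goes in last, once /A is fully set up

def a_is_ready(tree):
    # readers ignore /A until the marker appears (in real code this would
    # be an exists() call with a watch on /A/init)
    return tree.exists("/A/init")

t = ToyTree()
t.create("/A")           # half-initialized: no /A/C, no marker yet
print(a_is_ready(t))     # False
build_a(t)
print(a_is_ready(t))     # True
```

If the writer dies between creating /A and /A/init, readers never see a half-initialized tree; they just keep waiting on the marker, which is the failure mode Gustavo describes.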


On 8/23/10 7:34 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

 Greetings,
 
 We (a development team at Canonical) are stumbling into a situation
 here which I'd be curious to understand what is the general practice,
 since I'm sure this is somewhat of a common issue.
 
 It's quite easy to describe it: say there's a parent node A somewhere
 in the tree.  That node was created dynamically over the course of
 running the system, because it's associated with some resource which
 has its own life-span.  Now, under this node we put some control nodes
 for different reasons (say, A/B), and we also want to track some
 information which is related to a sequence of nodes (say, A/C/D-0,
 A/C/D-1, etc).
 
 So, we end up with something like this:
 
 A/B
 A/C/D-0
 A/C/D-1
 
 The question here is about best-practices for taking care of nodes
 like A/C.  It'd be fantastic to be able to create A's structure
 together with A itself, otherwise we risk getting in a situation where
 a client can see the node A before its initialization has been
 finished (A/C doesn't exist yet).  In fact, A/C may never exist, since
 it is possible for a client to die between the creation of A and C.
 
 Anyway, I'm sure you all understand the problem.  The question here
 is: this is pretty common, and quite boring to deal with properly on
 every single client.  Is there any feature in the roadmap to deal with
 this, and any common practice besides the obvious check for
 half-initialization and wait for A/C to be created or deal with
 timeouts and whatnot on every client?
 
 I'm about to start writing another layer on top of Zookeeper's API, so
 it'd be great to have some additional insight into this issue.
 
 --
 Gustavo Niemeyer
 http://niemeyer.net
 http://niemeyer.net/blog
 http://niemeyer.net/twitter
 



Re: Non Hadoop scheduling frameworks

2010-08-23 Thread Mahadev Konar
Hi Todd,
  Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or is
this a broader question about scheduling on cassandra nodes? If the latter, this
probably isn't the right mailing list.

Thanks
mahadev


On 8/23/10 4:02 PM, Todd Nine t...@spidertracks.co.nz wrote:

Hi all,
  We're using Zookeeper for Leader Election and system monitoring.  We're
also using it for synchronizing our cluster wide jobs with  barriers.  We're
running into an issue where we now have a single job, but each node can fire
the job independently of others with different criteria in the job.  In the
event of a system failure, another node in our application cluster will need
to fire this Job.  I've used quartz previously (we're running Java 6), but
it simply isn't designed for the use case we have.  I found this article on
cloudera.

http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/


I've looked at both plugins, but they require hadoop.  We're not currently
running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as
complete.


UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same
node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs
from the failed node, and partition them across the remaining nodes.


Any input would be greatly appreciated.

Thanks,
Todd
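UC2 step 3 (the leader repartitioning a failed node's pending jobs) is independent of ZooKeeper itself and can be sketched as a simple round-robin redistribution (plain Python; the function and node/job names are invented for illustration):

```python
def repartition(failed_jobs, remaining_nodes):
    """Round-robin a dead node's pending jobs across the survivors.

    In the real system the leader would learn of the failure via an
    ephemeral-node watch and write the new assignments back into zk."""
    assignments = {node: [] for node in remaining_nodes}
    for i, job in enumerate(failed_jobs):
        assignments[remaining_nodes[i % len(remaining_nodes)]].append(job)
    return assignments

print(repartition(["j1", "j2", "j3"], ["nodeA", "nodeB"]))
# {'nodeA': ['j1', 'j3'], 'nodeB': ['j2']}
```

A load-aware variant would sort remaining_nodes by current queue depth first, matching the "least loaded node" selection described in the earlier queue-monitoring thread.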



Re: Zookeeper stops

2010-08-19 Thread Mahadev Konar
Hi Wim,
  It mostly looks like zookeeper is not able to create files on the /tmp
filesystem. Is there a space shortage, or is it possible the file is being
deleted as it's being written to?

Sometimes admins have a crontab on /tmp that cleans up the /tmp filesystem.

Thanks
mahadev


On 8/19/10 1:15 AM, Wim Jongman wim.jong...@gmail.com wrote:

Hi,

I have a zookeeper server running that can sometimes run for days and then
quits:

Is there somebody with a clue to the problem?

I am running 64 bit Ubuntu with

java version 1.6.0_18
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

Zookeeper 3.3.0

The log below has some context before it shows the fatal error. Our
component.id=40676 indicates that it is the 40676th time that I ask ZK to
publish this information. It has been seen to go up to half a million before
stopping.

Regards,

Wim

ZooDiscovery Service Unpublished: Aug 18, 2010 11:17:28 PM.
ServiceInfo[uri=osgiservices://
188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice,
osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@68a1e081,
component.name=Star Wars Quotes Service, ecf.sp.ect=ecf.generic.server,
component.id=40676,
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5b9a6ad1
}]]
ZooDiscovery Service Published: Aug 18, 2010 11:17:29 PM.
ServiceInfo[uri=osgiservices://
188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice,
osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4,
component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server,
component.id=40677,
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953
}]]
[log;+0200 2010.08.18
23:17:29:545;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remoteservice;code=0;message=No
async remote service interface found with
name=org.eclipse.ecf.services.quotes.QuoteServiceAsync for proxy service
class=org.eclipse.ecf.services.quotes.QuoteService;severity2;exception=null;children=[]]]
2010-08-18 23:17:37,057 - FATAL [Snapshot Thread:zookeeperser...@262] -
Severe unrecoverable error, exiting
java.io.FileNotFoundException: /tmp/zookeeperData/version-2/snapshot.13e2e
(No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.init(FileOutputStream.java:209)
at java.io.FileOutputStream.init(FileOutputStream.java:160)
at
org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:224)
at
org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:211)
at
org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:260)
at
org.apache.zookeeper.server.SyncRequestProcessor$1.run(SyncRequestProcessor.java:120)
ZooDiscovery Service Unpublished: Aug 18, 2010 11:17:37 PM.
ServiceInfo[uri=osgiservices://
188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice,
osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService,
ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id
=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4,
component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server,
component.id=40677,
ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953
}]]



Re: A question about Watcher

2010-08-16 Thread Mahadev Konar
Hi Qian,
 The watcher information is saved at the client, and the client will
reattach the watches to the new server it connects to.
  Hope that helps.

Thanks
mahadev


On 8/16/10 9:28 AM, Qian Ye yeqian@gmail.com wrote:

 thx for explaination. Since the watcher can be preserved when the client
 switch the zookeeper server it connects to, does that means all the watchers
 information will be saved on all the zookeeper servers? I didn't find any
 source of the client can hold the watchers information.
 
 
 On Tue, Aug 17, 2010 at 12:21 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 I should correct this.  The watchers will deliver a session expiration
 event, but since the connection is closed at that point no further
 events will be delivered and the cluster will remove them.  This is as good
 as the watchers disappearing.
 
 On Mon, Aug 16, 2010 at 9:20 AM, Ted Dunning ted.dunn...@gmail.com
 wrote:
 
 The other is session expiration.  Watchers do not survive this.  This
 happens when a client does not provide timely
 evidence that it is alive and is marked as having disappeared by the
 cluster.
 
 
 
 
 
 --
 With Regards!
 
 Ye, Qian
 



Re: How to handle Node does not exist error?

2010-08-11 Thread Mahadev Konar
Hi Dr Hao,
  Can you please post the configuration of all 3 zookeeper servers? I
suspect the cluster might be misconfigured and the servers might not belong to the
same ensemble.

Just to be clear:
/xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002807

And other such nodes exist on one of the zookeeper servers and the same node
does not exist on other servers?

Also, as Ted pointed out, can you please post the output of echo "stat" | nc
localhost 2181 (on all 3 servers) to the list?

Thanks
mahadev



On 8/11/10 12:10 AM, Dr Hao He h...@softtouchit.com wrote:

 hi, Ted,
 
 Thanks for the reply.  Here is what I did:
 
 [zk: localhost:2181(CONNECTED) 0] ls
 /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002948
 []
 zk: localhost:2181(CONNECTED) 1] ls
 /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
 [msg002807, msg002700, msg002701, msg002804, msg002704,
 msg002706, msg002601, msg001849, msg001847, msg002508,
 msg002609, msg001841, msg002607, msg002606, msg002604,
 msg002809, msg002817, msg001633, msg002812, msg002814,
 msg002711, msg002815, msg002713, msg002716, msg001772,
 msg002811, msg001635, msg001774, msg002515, msg002610,
 msg001838, msg002517, msg002612, msg002519, msg001973,
 msg001835, msg001974, msg002619, msg001831, msg002510,
 msg002512, msg002615, msg002614, msg002617, msg002104,
 msg002106, msg001769, msg001768, msg002828, msg002822,
 msg001760, msg002820, msg001963, msg001961, msg002110,
 msg002118, msg002900, msg002836, msg001757, msg002907,
 msg001753, msg001752, msg001755, msg001952, msg001958,
 msg001852, msg001956, msg001854, msg002749, msg001608,
 msg001609, msg002747, msg002882, msg001743, msg002888,
 msg001605, msg002885, msg001487, msg001746, msg002330,
 msg001749, msg001488, msg001489, msg001881, msg001491,
 msg002890, msg001889, msg002758, msg002241, msg002892,
 msg002852, msg002759, msg002898, msg002850, msg001733,
 msg002751, msg001739, msg002753, msg002756, msg002332,
 msg001872, msg002233, msg001721, msg001627, msg001720,
 msg001625, msg001628, msg001629, msg001729, msg002350,
 msg001727, msg002352, msg001622, msg001726, msg001623,
 msg001723, msg001724, msg001621, msg002736, msg002738,
 msg002363, msg001717, msg002878, msg002362, msg002361,
 msg001611, msg001894, msg002357, msg002218, msg002358,
 msg002355, msg001895, msg002356, msg001898, msg002354,
 msg001996, msg001990, msg002093, msg002880, msg002576,
 msg002579, msg002267, msg002266, msg002366, msg001901,
 msg002365, msg001903, msg001799, msg001906, msg002368,
 msg001597, msg002679, msg002166, msg001595, msg002481,
 msg002482, msg002373, msg002374, msg002371, msg001599,
 msg002773, msg002274, msg002275, msg002270, msg002583,
 msg002271, msg002580, msg002067, msg002277, msg002278,
 msg002376, msg002180, msg002467, msg002378, msg002182,
 msg002377, msg002184, msg002379, msg002187, msg002186,
 msg002665, msg002666, msg002381, msg002382, msg002661,
 msg002662, msg002663, msg002385, msg002284, msg002766,
 msg002282, msg002190, msg002599, msg002054, msg002596,
 msg002453, msg002459, msg002457, msg002456, msg002191,
 msg002652, msg002395, msg002650, msg002656, msg002655,
 msg002189, msg002047, msg002658, msg002659, msg002796,
 msg002250, msg002255, msg002589, msg002257, msg002061,
 msg002064, msg002585, msg002258, msg002587, msg002444,
 msg002446, msg002447, msg002450, msg002646, msg001501,
 msg002591, msg002592, msg001503, msg001506, msg002260,
 msg002594, msg002262, msg002263, msg002264, msg002590,
 msg002132, msg002130, msg002530, msg002931, msg001559,
 msg001808, msg002024, msg001553, msg002939, msg002937,
 msg001556, msg002935, msg002933, msg002140, msg001937,
 msg002143, msg002520, msg002522, msg002429, msg002524,
 msg002920, msg002035, msg001561, msg002134, msg002138,
 msg002925, msg002151, msg002287, msg002555, msg002010,
 msg002002, msg002290, msg001537, msg002005, msg002147,
 msg002145, msg002698, 

Re: Sequence Number Generation With Zookeeper

2010-08-06 Thread Mahadev Konar
Hi David,
 I think it would be really useful. It would be very helpful for someone
looking to generate unique tokens/generation ids (I can think of plenty
of applications for this).

Please do consider contributing it back to the community!

Thanks
mahadev


On 8/6/10 7:10 AM, David Rosenstrauch dar...@darose.net wrote:

 Perhaps.  I'd have to ask my boss for permission to release the code.
 
 Is this something that would be interesting/useful to other people?  If
 so, I can ask about it.
 
 DR
 
 On 08/05/2010 11:02 PM, Jonathan Holloway wrote:
 Hi David,
 
 We did discuss potentially doing this as well.  It would be nice to get some
 recipes for Zookeeper done for this area, if people think it's useful.  Were
 you thinking of submitting this back as a recipe? If not, I could
 potentially work on such a recipe instead.
 
 Many thanks,
 Jon.
 
 
 I just ran into this exact situation, and handled it like so:
 
 I wrote a library that uses the option (b) you described above.  Only
 instead of requesting a single sequence number, you request a block of them
 at a time from Zookeeper, and then locally use them up one by one from the
 block you retrieved.  Retrieving by block (e.g., by blocks of 1 at a
 time) eliminates the contention issue.
 
 Then, if you're finished assigning ID's from that block, but still have a
 bunch of ID's left in the block, the library has another function to push
 back the unused ID's.  They'll then get pulled again in the next block
 retrieval.
 
 We don't actually have this code running in production yet, so I can't
 vouch for how well it works.  But the design was reviewed and given the
 thumbs up by the core developers on the team, and the implementation passes
 all my unit tests.
 
 HTH.  Feel free to email back with specific questions if you'd like more
 details.
 
 DR
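The block-allocation scheme described above can be sketched as follows. This is a hypothetical illustration, not the poster's actual library: `reserve_block` stands in for the ZooKeeper side, which would typically advance a counter znode with a versioned read/modify/write retry loop.

```python
class BlockIdAllocator:
    """Hands out ids locally from blocks reserved in bulk."""

    def __init__(self, reserve_block, block_size=1000):
        # reserve_block(size) must atomically advance a shared counter
        # (e.g. a counter znode) and return the first id of a fresh block.
        self._reserve = reserve_block
        self._size = block_size
        self._next = None   # next id to hand out
        self._end = None    # one past the last id in the current block

    def next_id(self):
        if self._next is None or self._next >= self._end:
            start = self._reserve(self._size)   # one ZK round-trip per block
            self._next, self._end = start, start + self._size
        nid = self._next
        self._next += 1
        return nid

    def release_unused(self):
        # Mimics the "push back unused ids" call described above: returns
        # the ids never handed out so the caller can return them to ZK.
        if self._next is None:
            return []
        unused = list(range(self._next, self._end))
        self._next = self._end  # mark the current block as spent
        return unused
```

Contention drops because only one ZooKeeper round-trip is paid per `block_size` ids; the trade-off is that ids are no longer issued in strict global order once blocks are pushed back.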
 
 
 



Re: zkperl - skipped tests

2010-08-04 Thread Mahadev Konar
Hi Martin,
 You might have to look into the tests.
 t/50_access.t is the file you might want to take a look at. I am not a perl 
guru so am not of much help, but let me know if you can't work out the details on 
the skipped tests. I will try to dig into the perl code.

Thanks
mahadev


On 8/4/10 6:16 AM, Martin Waite waite@gmail.com wrote:

Hi,

I built the perl module and ran the test suite.   For test 50_access, 3
tests are skipped.

vm-026-lenny-mw$ ZK_TEST_HOSTS=127.0.0.1:2181 make test
PERL_DL_NONLAZY=1 /usr/bin/perl -MExtUtils::Command::MM -e
test_harness(0, 'blib/lib', 'blib/arch') t/*.t
t/10_invalid..ok 1/107# no ZooKeeper path specified in ZK_TEST_PATH env
var, using root path
t/10_invalid..ok
t/15_thread...ok
t/20_tie..ok
t/22_stat_tie.ok
t/24_watch_tieok
t/30_connect..ok
t/35_log..ok
t/40_basicok
t/45_classok
t/50_access...ok
3/38 skipped: various reasons
t/60_watchok
All tests successful, 3 subtests skipped.
Files=11, Tests=461, 18 wallclock secs ( 2.01 cusr +  3.08 csys =  5.09 CPU)

Is there any way to find out which of the 38 tests were skipped and why ?

regards,
Martin



Re: node symlinks

2010-07-26 Thread Mahadev Konar
HI Maarteen,
  Can you elaborate on your use case of ZooKeeper? We currently don't have
any symlink feature in zookeeper. The only way to do it for you would be a
client-side hash/lookup table that buckets data to different zookeeper
servers. 

Or you could also store this hash/lookup table in one of the zookeeper
clusters. This lookup table can then be cached on the client side after
reading it once from zookeeper servers.

Thanks
mahadev


On 7/24/10 2:39 PM, Maarten Koopmans maar...@vrijheid.net wrote:

 Yes, I thought about Cassandra or Voldemort, but I need ZKs guarantees
 as it will provide the file system hierarchy to a flat object store so I
 need locking primitives and consistency. Doing that on top of Voldemort
 will give me a scalable version of ZK, but just slower. Might as well
 find a way to scale across ZK clusters.
 
 Also, I want to be able to add clusters as the number of nodes grows.
 Note that the #nodes will grow with the #users of the system, so the
 clusters can grow sequentially, hence the symlink idea.
 
 --Maarten
 
 On 07/24/2010 11:12 PM, Ted Dunning wrote:
 Depending on your application, it might be good to simply hash the node name
 to decide which ZK cluster to put it on.
 
 Also, a scalable key value store like Voldemort or Cassandra might be more
 appropriate for your application.  Unless you need the hard-core guarantees
 of ZK, they can be better for large scale storage.
 
 On Sat, Jul 24, 2010 at 7:30 AM, Maarten Koopmansmaar...@vrijheid.netwrote:
 
 Hi,
 
 I have a number of nodes that will grow larger than one cluster can hold,
 so I am looking for a way to efficiently stack clusters. One way is to have
 a zookeeper node symlink to another cluster.
 
 Has anybody ever done that and some tips, or alternative approaches?
 Currently I use Scala, and traverse zookeeper trees by proper tail
 recursion, so adapting the tail recursion to process symlinks would be my
 approach.
 
 Bst, Maarten
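Ted's hashing suggestion can be sketched in a few lines. The connect strings below are hypothetical placeholders; the only real requirement is a hash that is stable across processes (Python's built-in hash() is salted per process, so a digest is used instead).

```python
import hashlib

def cluster_for(path, clusters):
    """Route a znode path to one of several ZooKeeper ensembles."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Stable across processes, so every client maps the same path
    # to the same cluster.
    index = int.from_bytes(digest[:4], "big") % len(clusters)
    return clusters[index]
```

Note that plain modulo hashing remaps most paths whenever a cluster is added, which conflicts with the plan of growing the cluster list over time; a consistent-hashing ring, or the cached lookup-table approach Mahadev describes, avoids that.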
 
 
 



Re: ZK recovery questions

2010-07-20 Thread Mahadev Konar
Hi Ashwin,
 We have seen people wanting something like ZooKeeper without the
reliability of permanent storage, who are willing to work with loosened
guarantees of the current Zookeeper. What you mention on log files is
certainly a valid use case. 

It would be great to see how much throughput you will be able to get in such
a scenario wherein we never log to a permanent store. Do you want to try
this out and see what kind of throughput difference you can get?

Thanks
mahadev


On 7/19/10 8:35 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com
wrote:

 
 Cool. I've only tried the single node server so far. I didn't know it could
 sync from other senior servers.
 
 Server/Cluster addresses: I read somewhere in the docs/todo list that the
 bootstrap server list for the clients should be the same. So, what happens
 when a new replacement server has to be brought in on a different
 IP/hostname? Do the older clients autodetect the new server or is this even
 supported? I suppose not.
 
 Log files: I have absolutely no confusion between ZK and databases (very
 tempting tho'), but running ZK servers without log files does not seem
 unusual. Especially since you said new servers can sync directly from senior
 servers without relying on log files. In that case, I'm curious to see what
 happens if you just redirect log files to /dev/null. Anyone tried this?
 
 Regards,
 Ashwin Jayaprakash.



Re: unit test failure

2010-07-14 Thread Mahadev Konar
HI Martin,
  Can you check if you have a stale java process (ZooKeeperServer) running
on your machine? That might cause some issues with the tests.


Thanks 
mahadev


On 7/14/10 8:03 AM, Martin Waite waite@gmail.com wrote:

 Hi,
 
 I am attempting to build the C client on debian lenny.
 
 autoconf, configure, make and make install all appear to work cleanly.
 
 I ran:
 
 autoreconf -if
 ./configure
 make
 make install
 make run-check
 
 However, the unit tests fail:
 
 $ make run-check
 make  zktest-st zktest-mt
 make[1]: Entering directory `/home/martin/zookeeper-3.3.1/src/c'
 make[1]: `zktest-st' is up to date.
 make[1]: `zktest-mt' is up to date.
 make[1]: Leaving directory `/home/martin/zookeeper-3.3.1/src/c'
 ./zktest-st
 ./tests/zkServer.sh: line 52: kill: (17711) - No such process
  ZooKeeper server started
 Running
 Zookeeper_operations::testPing : elapsed 1 : OK
 Zookeeper_operations::testTimeoutCausedByWatches1 : elapsed 0 : OK
 Zookeeper_operations::testTimeoutCausedByWatches2 : elapsed 0 : OK
 Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : elapsed 2 :
 OK
 Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 :
 OK
 Zookeeper_operations::testConcurrentOperations1 : elapsed 206 : OK
 Zookeeper_init::testBasic : elapsed 0 : OK
 Zookeeper_init::testAddressResolution : elapsed 0 : OK
 Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
 Zookeeper_init::testNullAddressString : elapsed 0 : OK
 Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
 Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
 Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
 Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
 Zookeeper_init::testInvalidAddressString2 : elapsed 2 : OK
 Zookeeper_init::testNonexistentHost : elapsed 108 : OK
 Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
 Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
 Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 0 : OK
 Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
 Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
 Zookeeper_close::testCloseUnconnected1 : elapsed 0 : OK
 Zookeeper_close::testCloseConnected1 : elapsed 0 : OK
 Zookeeper_close::testCloseFromWatcher1 : elapsed 0 : OK
  Zookeeper_simpleSystem::testAsyncWatcherAutoReset
  terminate called after throwing an instance of 'CppUnit::Exception'
    what():  equality assertion failed
 - Expected: -101
 - Actual  : -4
 
 make: *** [run-check] Aborted
 
 This appears to come from tests/TestClient.cc - but beyond that, it is hard
 to identify which equality assertion failed.
 
 Help !
 
 regards,
 Martin



Re: building client tools

2010-07-13 Thread Mahadev Konar
Hi Martin,
  There is a list of tools, i.e. cppunit. That is the only required tool to
build the zookeeper c library. The README says that it can be built without
cppunit being installed, but there has been an open bug regarding this. So
cppunit is required as of now.

Thanks
mahadev


On 7/13/10 10:09 AM, Martin Waite waite@gmail.com wrote:

 Hi,
 
 I am trying to build the c client on debian lenny for zookeeper 3.3.1.
 
 autoreconf -if
 configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
 configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
 configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
   If this token and others are legitimate, please use m4_pattern_allow.
   See the Autoconf documentation.
 autoreconf: /usr/bin/autoconf failed with exit status: 1
 
 I probably need to install some required tools.   Is there a list of what
 tools are needed to build this please ?
 
 regards,
 Martin



Re: running the systest

2010-07-09 Thread Mahadev Konar
Hi Stuart,
 The instructions are just out of date. If you could open a jira and post a
patch to it that would be great!

We should try getting this in 3.3.2! That would be useful!

Thanks
mahadev


On 7/9/10 6:36 AM, Stuart Halloway stuart.hallo...@gmail.com wrote:

 Hi all,
 
 I am trying to run the systest and have hit a few minor issues:
 
 (1) The readme says src/contrib/jarjar, apparently should be
 src/contrib/fatjar
 
 (2) The compiled fatjar seems to be missing junit, so the launch instructions
 do not work.
 
 I can fix or workaround these, but I wanted to see if maybe the instructions
 are just out of date, and there is an easy (but currently undocumented) way to
 launch the tests.
 
 Thanks,
 Stu
 



Re: Suggested way to simulate client session expiration in unit tests?

2010-07-06 Thread Mahadev Konar
Hi Jeremy,

 zk.disconnect() is the right way to disconnect from the servers. For
session expiration you just have to make sure that the client stays
disconnected for more than the session expiration interval.

Hope that helps.

Thanks
mahadev


On 7/6/10 9:09 AM, Jeremy Davis jerdavis.cassan...@gmail.com wrote:

 Is there a recommended way of simulating a client session expiration in unit
 tests?
 I see a TestableZooKeeper.java, with a pauseCnxn() method that does cause
 the connection to timeout/disconnect and reconnect. Is there an easy way to
 push this all the way through to session expiration?
 Thanks,
 -JD



Re: Guaranteed message delivery until session timeout?

2010-07-01 Thread Mahadev Konar
When a connection loss happens, all the watches are triggered saying that
a connection loss occurred. But on a reconnect the watches are re-registered
automagically on the new server, and will fire if the change has already
happened, or simply stay armed otherwise!

I hope that answers your question.

Thanks
mahadev


On 6/30/10 5:11 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 I think that you are correct, but a real ZK person should answer this.
 
 On Wed, Jun 30, 2010 at 4:48 PM, Bryan Thompson br...@systap.com wrote:
 
 For example, if a client registers a watch, and a state change which would
 trigger that watch occurs _after_ the client has successfuly registered the
 watch with the zookeeper quorum, is it possible that the client would not
 observe the watch trigger due to communication failure, etc., even while the
 clients session remains valid?  It sounds like the answer is no per the
 timeliness guarantee.  Is that correct?
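A sketch of the pattern behind that guarantee: watches are one-shot, so the handler must re-read and re-arm in a single call. `client.get(path, watcher)` below is a hypothetical stand-in for the real getData, assuming only that it atomically returns the current data and sets a one-shot watch.

```python
def keep_watching(client, path, on_change):
    """Re-register a one-shot watch each time it fires.

    client.get(path, watcher) is assumed to atomically return the
    data and arm a one-shot watch (the guarantee the real getData
    call provides). Returns the initial data.
    """
    def handler(event):
        # Re-read and re-arm in one call, then report the new value.
        data = client.get(path, handler)
        on_change(data)
    return client.get(path, handler)  # initial read arms the first watch
```

Because the re-read happens inside the handler, a change that races with re-registration is still observed by the immediate re-read; what can be missed is an intermediate value, never the final state.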
 
 



Re: Securing ZooKeeper connections

2010-05-26 Thread Mahadev Konar
Hi Vishal,
  Ben (Benjamin Reed) has been working on a netty based client-server
protocol in ZooKeeper. I think there is an open jira for it. My network
connection is pretty slow so I am finding it hard to search for it.

We have been thinking about enabling secure connections via this netty based
connection layer in zookeeper.

Thanks
mahadev


On 5/25/10 12:20 PM, Vishal K vishalm...@gmail.com wrote:

 Hi All,
 
 Since ZooKeeper does not support secure network connections yet, I thought I
 would poll and see what people are doing to address this problem. Is anyone
  running ZooKeeper over secure channels (client-server and server-server
 authentication/encryption)? If yes, can you please elaborate how you do it?
 
 Thanks.
 
 Regards,
 -Vishal



Re: Zookeeper EventThread and SendThread

2010-05-20 Thread Mahadev Konar

Hi Nick,
 These threads are spawned with each zookeeper client handle. As soon as you
create a zookeeper client object these threads are spawned.

Are you creating too many zookeeper client objects in your application?

Thanks
mahadev

On 5/20/10 11:30 AM, Nick Bailey nicholas.bai...@rackspace.com wrote:

 Hey guys,
 
 Question regarding zookeeper's EventThread and SendThread. I'm not quite sure
 what these are used for but a stacktrace of our client application contains
 lines similar to
 
 pool-2-thread-20-EventThread daemon prio=10 tid=0x2aac3cb29c00 nid=0x75d
 waiting on condition [0x6b08..0x6b080b10]
    java.lang.Thread.State: WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for  0x2aab1f577250 (a
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
         at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Ab
 stractQueuedSynchronizer.java:1925)
         at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
         at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
 
 pool-2-thread-20-SendThread daemon prio=10 tid=0x2aac3c35d400 nid=0x75c
 runnable [0x70ede000..0x70edeb90]
    java.lang.Thread.State: RUNNABLE
         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
         - locked 0x2aab1f571d08 (a sun.nio.ch.Util$1)
         - locked 0x2aab1f571cf0 (a
 java.util.Collections$UnmodifiableSet)
         - locked 0x2aab1f5715b8 (a sun.nio.ch.EPollSelectorImpl)
         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:921)
 
 There are pairs of threads ranging from thread-1 to thread-50 and also
 multiple pairs of these threads.  As in pool-2-thread-20-SendThread is the
 name of multiple threads in the trace.  I'm debugging some load issues with
  our system and am suspicious that the large number of zookeeper threads is
 contributing. Would anyone be able to elaborate on the purpose of these
 threads and how they are spawned?
 
 Thanks,
 
 Nick Bailey
 Rackspace Hosting
  Software Developer, Email & Apps
 nicholas.bai...@rackspace.com
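If the answer to Mahadev's question is yes, the usual fix is one shared handle per process rather than one per task. A minimal sketch, with `factory` standing in for whatever constructs the real client object (each real handle owns its own SendThread and EventThread, so creating one per task multiplies threads exactly as in the stack trace above):

```python
import threading

_lock = threading.Lock()
_client = None

def shared_client(factory):
    """Return one process-wide client handle, creating it lazily.

    Double-checked under a lock so concurrent callers still end up
    with a single handle (and a single SendThread/EventThread pair).
    """
    global _client
    if _client is None:
        with _lock:
            if _client is None:
                _client = factory()
    return _client
```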
 



Re: Using ZooKeeper for managing solrCloud

2010-05-14 Thread Mahadev Konar
Hi Rakhi,
  You can read more about monitoring zookeeper servers at

http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperAdmin.html#sc_monitoring


Thanks
mahadev


On 5/14/10 4:09 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
I just went through the zookeeper tutorial and successfully managed
 to run the zookeeper server.
  How do we monitor the zookeeper server? Is there a url for it?
 
  I pasted the following urls in a browser, but all I get is a blank page
 http://localhost:2181
 http://localhost:2181/zookeeper
 
 
  I actually needed zookeeper for managing solr cloud (managed externally),
  but now if I have 2 solr servers running, how do I configure zookeeper to
  manage them?
 
 Regards,
 Raakhi



Re: Problems with ZooKeeper apache wiki.

2010-05-13 Thread Mahadev Konar
Sudipto,
 I am ccing Flavio who might have the slides with him.

  Flavio, can you please send Sudipto the slides in case you have them with
you?

Thanks
mahadev

On 5/13/10 11:16 AM, Sudipto Das sudi...@cs.ucsb.edu wrote:

 Hi,
 
 Any update on this issue, or any way the presentations can be made
 available? I am in search for a presentation that explains the leader
 election phase in ZAB. I checked the ZAB papers and it explains everything
 except that. :(
 
 Best Regards
 Sudipto
 
 --
 Sudipto Das
 PhD Candidate
 CS @ UCSB
 Santa Barbara, CA 93106, USA
 http://www.cs.ucsb.edu/~sudipto
 
 
 On Tue, May 11, 2010 at 4:52 PM, Mahadev Konar maha...@yahoo-inc.comwrote:
 
 Thanks Paul!
 
 Does this mean folks will not be able to upload and download presentations
 on wiki from now on?
 
 If so, what is the alternative for us to upload attachments (such as
 presentations)?
 
 Thanks
 mahadev
 
 On 5/11/10 4:48 PM, Paul Querna p...@querna.org wrote:
 
 All file attachments have been disabled due to a spammer earlier this
 week.
 
 I don't know how soon or if we will enable file attachments again.
 
 On Tue, May 11, 2010 at 4:36 PM, Mahadev Konar maha...@apache.org
 wrote:
 Hi,
  We are having problems with apache wiki wherein we are getting an error
 of
 
  You are not allowed to do AttachFile on this page.
 
 Whenever we try to download an attachment from
 
 http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations.
 
 As far as I remember this worked fine when I attached the last ppt on
 the
 wiki. Can you please help us determine what the problem is with the
 wiki?
 
 Thanks
 mahadev
 
 
 
 
 



Re: Can't ls with large node count and I don't understand the use of jute.maxbuffer

2010-05-13 Thread Mahadev Konar
Hi Aaron,
  Each request and response between client and servers is sent as a
(buflen, buffer) packet. The contents of the packet are then deserialized
from this buffer. 

Looks like the size of the packet (buflen) is big in your case. We usually
avoid sending/receiving large packets just to discourage folks from using
ZooKeeper as a bulk data store.

We also discourage creating a flat hierarchy with too many direct children
(your case). This is because such directories can cause huge load on the
network/servers when a list of that directory's children is requested by a
huge number of clients. We always suggest bucketing these children into a
more hierarchical structure.

You are probably hitting the limit of 1MB for this! You might want to change
this in your client configuration as a temporary fix! But for later you
might want to think about your structure in ZooKeeper to make it more
hierarchical via some kind of bucketing!

Thanks
mahadev




On 5/13/10 10:18 AM, Aaron Crow dirtyvagab...@yahoo.com wrote:

 We're running Zookeeper with about 2 million nodes. It's working, with one
 specific exception: When I try to get all children on one of the main node
 trees, I get an IOException out of ClientCnxn (Packet len4648067 is out of
 range!). There are 150329 children under the node in question. I should
 also mention that I can successfully ls other nodes with similarly high
 children counts. But this specific node always fails.
 
 Googling led me to see that Mahadev dealt with this last year:
 http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00175.html
 
 Source diving led me to see that ClientCnxn enforces a bound based on
 the jute.maxbuffer setting:
 
  packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024);
  
  ...
  
  if (len < 0 || len >= packetLen) {
  
    throw new IOException("Packet len" + len + " is out of range!");
 
 
 So maybe I could bump this up in config... but, I'm confused when reading
 the documentation on jute.maxbuffer:
 It specifies the maximum size of the data that can be stored in a znode.
 
 It's true we have an extremely high node count. However, we've been careful
 to keep each node's data very small -- e.g., we certainly should have no
 single data entry longer than 256 characters.  The way I'm reading the docs,
 the jute.maxbuffer bound is purely against the data size of specific nodes,
 and shouldn't relate to child count. Or does it relate to child count as
 well?
 
 Here is a stat on the offending node:
 
 cZxid = 0x1000e
 
 ctime = Mon May 03 17:40:58 PDT 2010
 
 mZxid = 0x1000e
 
 mtime = Mon May 03 17:40:58 PDT 2010
 
 pZxid = 0x100315064
 
 cversion = 150654
 
 dataVersion = 0
 
 aclVersion = 0
 
 ephemeralOwner = 0x0
 
 dataLength = 0
 
 numChildren = 150372
 
 
 Thanks for any insights...
 
 
 Aaron
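The bucketing Mahadev recommends can be sketched as follows. The bucket count and path layout are hypothetical illustration parameters, not ZooKeeper settings; the idea is just that a getChildren on any one bucket returns a small fraction of the children, staying well under the response size limit. (The temporary fix he mentions is the real `jute.maxbuffer` Java system property, raised on the client, e.g. `-Djute.maxbuffer=8388608`.)

```python
import hashlib

NUM_BUCKETS = 256  # sketch parameter: how many sub-directories to spread over

def bucketed_path(parent, child):
    """Place `child` under a hashed sub-bucket of `parent`.

    e.g. /jobs + item-123 -> /jobs/b017/item-123, so listing any one
    bucket returns roughly 1/256th of the children.
    """
    h = int.from_bytes(hashlib.md5(child.encode("utf-8")).digest()[:2], "big")
    return "%s/b%03d/%s" % (parent, h % NUM_BUCKETS, child)

def all_buckets(parent):
    """Paths to enumerate when listing everything: many small
    getChildren calls instead of one giant response."""
    return ["%s/b%03d" % (parent, i) for i in range(NUM_BUCKETS)]
```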



Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Mahadev Konar
Hi Jordan,
 Can you create a jira for this? And attach all the server logs and client
logs related to this timeline? How did you start up the servers? Are there
some changes you might have made accidentally to the servers?


Thanks
mahadev


On 5/12/10 10:49 AM, Jordan Zimmerman jzimmer...@proofpoint.com wrote:

 We've just started seeing an odd error and are having trouble determining the
 cause. 
 Xid out of order. Got 8 expected 7
 Any hints on what can cause this? Any ideas on how to debug?
 
 We're using ZK 3.3.0. The error occurs in ClientCnxn.java line 781
 
 -Jordan



Re: ZookeeperPresentations Wiki

2010-05-11 Thread Mahadev Konar
I just emailed in...@apache to ask for there help on this. I wasn't able to
figure out what the problem is!

Thanks for pointing it out.

mahadev


On 5/11/10 4:01 PM, Sudipto Das sudi...@cs.ucsb.edu wrote:

 Hi,
 
 I am trying to download some presentation slides from the
 ZookeeperPresentations wiki (
 http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations) but I am
 facing a weird problem. On clicking on a link for a presentation, I am
 getting the error message You are not allowed to do AttachFile on this
 page. Login and try again. I tried creating an account, and even after
 that, I get the same error message, except the login suggestion. All
 attachment links have an action=AttachFile URL, (e.g.
  http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations?action=AttachFile&do=view&target=zookeeper_hbase.pptx
  for the zookeeper_hbase.pptx file). My intent is to just download the
 files.
 Please let me know if I am doing something wrong. Sorry for my ignorance,
 but I honestly tried out all obvious means to figure out. :(
 
 Best Regards
 Sudipto
 
 --
 Sudipto Das
 PhD Candidate
 CS @ UCSB
 Santa Barbara, CA 93106, USA
 http://www.cs.ucsb.edu/~sudipto



Re: New ZooKeeper client library Cages

2010-05-11 Thread Mahadev Konar
Hi Dominic,
  Good to see this. I like the name cages :).


You might want to post to the list what cages is useful for. I think quite a
few folks would be interested in something like this. Are you guys currently
using it with cassandra?

Thanks
mahadev


On 5/11/10 4:02 PM, Dominic Williams thedwilli...@googlemail.com wrote:

 Anyone looking for a Java client library for ZooKeeper, please checkout:
 
 Cages - http://cages.googlecode.com
 
 The library will be expanded and feedback will be helpful.
 
 Many thanks,
 Dominic
 ria101.wordpress.com



Re: Problems with ZooKeeper apache wiki.

2010-05-11 Thread Mahadev Konar
Thanks Paul!

Does this mean folks will not be able to upload and download presentations
on wiki from now on?

If so, what is the alternative for us to upload attachments (such as
presentations)? 

Thanks
mahadev

On 5/11/10 4:48 PM, Paul Querna p...@querna.org wrote:

 All file attachments have been disabled due to a spammer earlier this week.
 
 I don't know how soon or if we will enable file attachments again.
 
 On Tue, May 11, 2010 at 4:36 PM, Mahadev Konar maha...@apache.org wrote:
 Hi,
  We are having problems with apache wiki wherein we are getting an error of
 
  You are not allowed to do AttachFile on this page.
 
 Whenever we try to download an attachment from
 
 http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations.
 
 As far as I remember this worked fine when I attached the last ppt on the
 wiki. Can you please help us determine what the problem is with the wiki?
 
 Thanks
 mahadev
 
 
 



Re: avoiding deadlocks on client handle close w/ python/c api

2010-05-04 Thread Mahadev Konar
Sure, I'll take a look at it.

Thanks
mahadev


On 5/4/10 2:32 PM, Patrick Hunt ph...@apache.org wrote:

 Thanks Kapil, Mahadev perhaps you could take a look at this as well?
 
 Patrick
 
 On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:
 I've constructed  a simple example just using the zkpython library with
 condition variables, that will deadlock. I've filed a new ticket for it,
 
 https://issues.apache.org/jira/browse/ZOOKEEPER-763
 
 the gdb stack traces look suspiciously like the ones in 591, but sans the
 watchers.
 https://issues.apache.org/jira/browse/ZOOKEEPER-591
 
 the attached example on the ticket will deadlock in zk 3.3.0 (which has the
 fix for 591) and trunk.
 
 -kapil
 
 On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelukapil.f...@gmail.comwrote:
 
 Hi Folks,
 
 I'm constructing an async api on top of the zookeeper python bindings for
 twisted. The intent was to make a thin wrapper that would wrap the existing
 async api with one that allows for integration with the twisted python event
 loop (http://www.twistedmatrix.com) primarily using the async apis.
 
 One issue i'm running into while developing a unit tests, deadlocks occur
 if we attempt to close a handle while there are any outstanding async
 requests (aget, acreate, etc). Normally on close both the io thread
 terminates and the completion thread are terminated and joined, however
 w\ith outstanding async requests, the completion thread won't be in a
 joinable state, and we effectively hang when the main thread does the join.
 
 I'm curious if this would be considered bug, afaics ideal behavior would be
 on close of a handle, to effectively clear out any remaining callbacks and
 let the completion thread terminate.
 
 i've tried adding some bookkeeping to the api to guard against closing
  while there is an outstanding completion request, but it's an imperfect
  solution due to the nature of the event loop integration. The problem is that
 the python callback invoked by the completion thread in turn schedules a
 function for the main thread. In twisted the api for this is implemented by
 appending the function to a list attribute on the reactor and then writing a
 byte to a pipe to wakeup the main thread. If a thread switch to the main
 thread occurs before the completion thread callback returns, the scheduled
 function runs and the rest of the application keeps processing, of which the
 last step for the unit tests is to close the connection, which results in a
 deadlock.
 
 i've included some of the client log and gdb stack traces from a deadlock'd
 client process.
 
 thanks,
 
 Kapil
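One shape the bookkeeping guard can take, sketched with a hypothetical wrapper rather than the zkpython API itself: count completions in flight and make close() wait for them to drain instead of tearing down the completion thread under a live callback.

```python
import threading

class SafeHandle:
    """Wraps a raw handle so close() waits for outstanding completions.

    `raw` stands in for the zkpython handle; the only assumption is
    that every async call takes a completion callback.
    """

    def __init__(self, raw):
        self._raw = raw
        self._pending = 0
        self._cv = threading.Condition()

    def _wrap(self, completion):
        def done(*args):
            try:
                completion(*args)
            finally:
                with self._cv:
                    self._pending -= 1
                    self._cv.notify_all()
        return done

    def aget(self, path, completion):
        with self._cv:
            self._pending += 1
        self._raw.aget(path, self._wrap(completion))

    def close(self, timeout=10.0):
        # Drain in-flight completions before closing the raw handle.
        with self._cv:
            if not self._cv.wait_for(lambda: self._pending == 0, timeout):
                raise RuntimeError("async requests still outstanding")
        self._raw.close()
```

An alternative, as the ticket argues, is for the library itself to cancel any remaining callbacks on close so the completion thread can terminate.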
 
 
 
 
 



Re: ZKClient

2010-05-04 Thread Mahadev Konar
Hi Adam,
  I don't think zk is very, very hard to get right. There are examples in
src/recipes which implement locks/queues/others. There is ZOOKEEPER-22 to
make it even easier for applications to use.

Regarding re-registration of watches, you can definitely write code and
submit it as part of a well documented contrib module which lays out the
assumptions/design of it. It could very well be useful for others. It's just
that folks haven't had much time to focus on these areas as yet.

Thanks
mahadev


On 5/4/10 2:58 PM, Adam Rosien a...@rosien.net wrote:

 I use zkclient in my work at kaChing and I have mixed feelings about
 it. On one hand it makes easy things easy which is great, but on the
 other hand I very few ideas what assumptions it makes under the
 hood. I also dislike some of the design choices such as unchecked
 exceptions, but that's neither here nor there. It would take some
 extensive documentation work by the authors to really enumerate the
 model and assumptions, but the project doesn't seem to be active
 (either from it being adequate for its current users or just
 inactive). I'm not sure I could derive the assumptions myself.
 
 I'm a bit frustrated that zk is very, very hard to really get right.
 At a project level, can't we create structures to avoid most of these
 errors? Can there be a standard model with detailed assumptions and
 implementations of all the recipes? How can we start this? Is there
 something that makes this too hard?
 
 I feel like a recipe page is a big fail; wouldn't an example app that
 uses locks and barriers be that much more compelling?
 
 For the common FAQ items like you need to re-register the watch,
 can't we just create code that implements this pattern? My goal is to
 live up to the motto: a good API is impossible to use incorrectly.
 
 .. Adam
 
 On Tue, May 4, 2010 at 2:21 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 In general, writing this sort of layer on top of ZK is very, very hard to
 get really right for general use.  In a simple use-case, you can probably
 nail it but distributed systems are a Zoo, to coin a phrase.  The problem is
 that you are fundamentally changing the metaphors in use so assumptions can
 come unglued or be introduced pretty easily.
 
 One example of this is the fact that ZK watches *don't* fire for every
 change but when you write listener oriented code, you kind of expect that
 they will.  That makes it really, really easy to introduce that assumption
 in the heads of the programmer using the event listener library on top of
 ZK.  Another example is how the atomic get content/set watch call works in
 ZK is easy to violate in an event driven architecture because the thread
 that watches ZK probably resets the watch.  If you assume that the listener
 will read the data, then you have introduced a timing mismatch between the
 read of the data and the resetting of the watch.  That might be OK or it
 might not be.  The point is that these changes are subtle and tricky to get
 exactly right.
 
 On Tue, May 4, 2010 at 1:48 PM, Jonathan Holloway 
 jonathan.hollo...@gmail.com wrote:
 
 Is there any reason why this isn't part of the Zookeeper trunk already?
 
 



Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Mahadev Konar
Hi Dave,
 Just a question: how do you see it being used, meaning who would call
addserver and removeserver? It does seem useful to be able to do this. This
is definitely worth working on. You can link it as a subtask of
ZOOKEEPER-107.

Thanks
mahadev


On 5/3/10 7:03 AM, Dave Wright wrig...@gmail.com wrote:

 I've got a situation where I essentially need dynamic cluster
 membership, which has been talked about in ZOOKEEPER-107 but doesn't
 look like it's going to happen any time soon.
 
 For now, I'm planning on working around this by having a simple
 coordinator service on the server nodes that will re-write the configs
 and bounce the servers when membership changes. Clients will may get
 an error or two and need to reconnect, but that should be handled by
 the normal error logic.
 
 On the client side, I'd really like to dynamically update the server
 list w/o having to re-create the entire Zookeeper object. Looking at
 the code, it seems like it would be pretty trivial to add
  RemoveServer()/AddServer() functions for Zookeeper that call down
 to ClientCnxn, where they are just maintained in a list. Of course if
 the server being removed is the one currently connected, we'd need to
 disconnect, but a simple call to disconnect() seems like it would
 resolve that and trigger the automatic re-connection logic.
 
 Does anyone see an issue with that approach?
 Were I to create the patch, do you think it would be interesting
 enough to merge? It seems like that functionality will eventually be
 needed for whatever full dynamic server support is eventually
 implemented.
 
 -Dave Wright
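A sketch of the client-side bookkeeping the proposed addServer()/removeServer() calls would need: diff the current server list against the new membership when a change arrives. (The names and the array-pair return shape are hypothetical; this is not the real ClientCnxn API.)

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class MembershipDiff {
    // Given the current server list and the new one, return [toAdd, toRemove]:
    // the servers a hypothetical addServer()/removeServer() pair would act on.
    static Set<String>[] diff(Set<String> current, Set<String> updated) {
        Set<String> toAdd = new TreeSet<>(updated);
        toAdd.removeAll(current);
        Set<String> toRemove = new TreeSet<>(current);
        toRemove.removeAll(updated);
        @SuppressWarnings("unchecked")
        Set<String>[] result = new Set[] { toAdd, toRemove };
        return result;
    }

    public static void main(String[] args) {
        Set<String> current = new TreeSet<>(Arrays.asList("A:2181", "B:2181", "C:2181"));
        Set<String> updated = new TreeSet<>(Arrays.asList("B:2181", "C:2181", "D:2181"));
        Set<String>[] d = diff(current, updated);
        System.out.println("add=" + d[0] + " remove=" + d[1]); // add=[D:2181] remove=[A:2181]
    }
}
```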



Re: Dynamic adding/removing ZK servers on client

2010-05-03 Thread Mahadev Konar
Yeah, that was one of the ideas; I think it's been on the jira somewhere (I
forget)... But it could be, and would definitely be, one solution for it.

Thanks
mahadev


On 5/3/10 2:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Should this be a znode in the privileged namespace?
 
 On Mon, May 3, 2010 at 1:45 PM, Dave Wright wrig...@gmail.com wrote:
 
 Hi Dave,
  Just a question on how do you see it being used, meaning who would call
 addserver and removeserver? It does seem useful to be able to do this.
 This
 is definitely worth working on. You can link it as a subtask of
 ZOOKEEPER-107.
 
 
 In my case, it would be my client application - I would get a
 notification (probably via a watched ZK node controlled by my manager
 process) that the cluster membership was changing, and I'd adjust the
 client server list accordingly.
 
 -Dave
 



Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,
 I think you might be confusing Leader Election within ZooKeeper itself with
electing a leader for your application? Is that so?

Leader Election within ZooKeeper is totally internal to zookeeper service
and is not visible to applications.

I am a little confused about what your problem statement is. Can you please explain
it from the view point of your application?

Thanks
mahadev


On 4/30/10 12:45 PM, Lei Gao l...@linkedin.com wrote:

 Hi Ted,
 
 I 100% agree with what you said. But my question is more about what if my
 zookeeper service cluster is partitioned from a majority of nodes in my USER
 CLUSTER.  In this case, the majority nodes in one network partition can't
 select a new leader because zookeeper is out of reach.
 
 Another example will be that if there is an asymmetric network failure where a
 majority of nodes from the USER CLUSTER can't reach the leader while the
 zookeeper still can. How does zookeeper handle such situation?
 
 Thanks,
 
 Lei
 
 On 4/30/10 12:24 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 There are a variety of situations that can trigger a new leader election and a
 few that can cause the cluster to be unable to elect a new leader.  Isolation
 of just the leader is one of the situations that will cause a new leader
 election.  Isolation of nodes into groups smaller than the quorum will result
 in the cluster freezing.
 
 On Fri, Apr 30, 2010 at 11:56 AM, Lei Gao l...@linkedin.com wrote:
 Hi,
 
 I have a general question on how zookeeper can maintain its view of the user
 cluster (that zookeeper manages) that is consistent with the nodes in the user
 cluster. In other words, when zookeeper considers the current leader is
 unavailable, does it really guarantee that a majority of nodes in the user
 cluster can't reach the current leader? The same question applies to the
 membership service as well. Because the zookeeper can be partitioned from a
 majority of the nodes in the user cluster. How does the zookeeper handle
 situations like this?
 
 Thanks,
 
 Lei
 
 



Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,
 In this case, it's up to the application to decide what to do when this happens.
The application will be notified that it is disconnected from the ZooKeeper
cluster. In such a case, some applications might decide not to proceed
at all (since proceeding might lead to state corruption), and others might
decide to use cached values, where stale values are fine for the correctness
of the system. It's up to you to decide what you would want to do in such a
situation.


Also, usually you would want to set up ZooKeeper clusters in such a way that
this should not be possible, e.g. by spreading the servers across switches.

In that case, the application will be able to access one of the zookeeper
servers in the zookeeper cluster, and it will be highly unlikely that it
isn't able to reach any one of those.

Hope this helps.

Thanks
mahadev


On 4/30/10 1:26 PM, Lei Gao l...@linkedin.com wrote:

 Hi Henry,
 
 I am not talking about the leader election within zookeeper cluster. I guess
 I didn't make the discussion context clear. In my case, I run a cluster that
 uses zookeeper for doing the leader election. Yes, nodes in my cluster are
 the clients of zookeeper.  Those nodes depend on zookeeper to elect a new
 leader and figure out what the current leader is. So if the zookeeper (think
 of it as a stand-alone entity) becomes unavailable in the way I've described
 earlier, how can I handle such situation so my cluster can still function
 while a majority of nodes still connect to each other (but not to the
 zookeeper)?
 
 Thanks,
 
 Lei
 
 
 On 4/30/10 1:10 PM, Henry Robinson he...@cloudera.com wrote:
 
 Hi Lei -
 
 The 'user cluster' (by which I think you mean the set of clients of
 ZooKeeper?) plays no part in leader election. If a majority of ZooKeeper
 server nodes can talk to each other, a new leader can be elected. Clients of
 the minority server partition will be disconnected - if they too cannot
 reach the majority partition then they will not be able to reconnect.
 
 Hope this helps,
 Henry
 
 On 30 April 2010 12:45, Lei Gao l...@linkedin.com wrote:
 
 Hi Ted,
 
 I 100% agree with what you said. But my question is more about what if my
 zookeeper service cluster is partitioned from a majority of nodes in my USER
 CLUSTER.  In this case, the majority nodes in one network partition can't
 select a new leader because zookeeper is out of reach.
 
 Another example will be that if there is an asymmetric network failure
 where a majority of nodes from the USER CLUSTER can't reach the leader while
 the zookeeper still can. How does zookeeper handle such situation?
 
 Thanks,
 
 Lei
 
 On 4/30/10 12:24 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 There are a variety of situations that can trigger a new leader election
 and a few that can cause the cluster to be unable to elect a new leader.
  Isolation of just the leader is one of the situations that will cause a new
 leader election.  Isolation of nodes into groups smaller than the quorum
 will result in the cluster freezing.
 
 On Fri, Apr 30, 2010 at 11:56 AM, Lei Gao l...@linkedin.com wrote:
 Hi,
 
 I have a general question on how zookeeper can maintain its view of the
 user cluster (that zookeeper manages) that is consistent with the nodes in
 the user cluster. In other words, when zookeeper considers the current
 leader is unavailable, does it really guarantee that a majority of nodes in
 the user cluster can't reach the current leader? The same question applies
 to the membership service as well. Because the zookeeper can be partitioned
 from a majority of the nodes in the user cluster. How does the zookeeper
 handle situations like this?
 
 Thanks,
 
 Lei
 
 
 
 
 



Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,

In this case, the Leader will be disconnected from the ZK cluster and will give
up its leadership. Since it is disconnected, the ZK cluster will realize that the
Leader is dead!

When the ZK cluster realizes that the Leader is dead (because the ZK
cluster hasn't heard from the Leader for a certain time, configurable via the
session timeout parameter), the slaves will be notified of this via watchers
in the zookeeper cluster. The slaves will realize that the Leader is gone and
will re-elect a new Leader and will start working with the new Leader.

Does that answer your question?

You might want to look through the documentation of ZK to understand its use
cases and how it solves these kinds of issues.

Thanks
mahadev
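The standard recipe the slaves would use to re-elect is to create EPHEMERAL_SEQUENTIAL znodes under an election node and treat the holder of the lowest sequence number as the leader. The selection step can be sketched offline, with no ZooKeeper server involved (a minimal sketch of the recipe's tie-breaking rule, not a full election implementation):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LeaderPick {
    // Children of the election znode look like "n-0000000007" when created
    // with EPHEMERAL_SEQUENTIAL; the lowest sequence number is the leader.
    static String leader(List<String> children) {
        return children.stream()
                .min(Comparator.comparingInt(
                        c -> Integer.parseInt(c.substring(c.lastIndexOf('-') + 1))))
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("n-0000000012", "n-0000000007", "n-0000000031");
        System.out.println(leader(children)); // prints "n-0000000007"
    }
}
```

When the leader's session expires, its ephemeral znode disappears, the remaining children are re-listed, and the same rule picks the successor.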


On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote:

 Thank you all for your answers. It clarifies a lot of my confusion about the
 service guarantees of ZK. I am still struggling with one failure case (I am
 not trying to be the pain in the neck. But I need to have a full
 understanding of what ZK can offer before I make a decision on whether to
 used it in my cluster.)
 
 Assume the following topology:
 
  Leader          ZK cluster
     \\              //
      \\            //
       \\          //
         Slave(s)
 
 If there is an asymmetric network failure such that the connections between
 Leader and Slave(s) are broken while all other connections are still alive, would
 my system hang after some point? Because no new leader election will be
 initiated by slaves and the leader can't get the work to slave(s).
 
 Thanks,
 
 Lei
 
 On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 If one of your user clients can no longer reach one member of the ZK
 cluster, then it will try to reach another.  If it succeeds, then it will
 continue without any problems as long as the ZK cluster itself is OK.
 
 This applies for all the ZK recipes.  You will have to be a little bit
 careful to handle connection loss, but that should get easier soon (and
 isn't all that difficult anyway).
 
 On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote:
 
 I am not talking about the leader election within zookeeper cluster. I
 guess
 I didn't make the discussion context clear. In my case, I run a cluster
 that
 uses zookeeper for doing the leader election. Yes, nodes in my cluster are
 the clients of zookeeper.  Those nodes depend on zookeeper to elect a new
 leader and figure out what the current leader is. So if the zookeeper
 (think
 of it as a stand-alone entity) becomes unavailabe in the way I've described
 earlier, how can I handle such situation so my cluster can still function
 while a majority of nodes still connect to each other (but not to the
 zookeeper)?
 
 



Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Hi Lei,
  Sorry I misinterpreted your question! The scenario you describe could be
handled in such a way -

You could have a status node in ZooKeeper which every slave will subscribe
to and update! If one of the slave nodes sees that there have been too many
connections refused to the Leader by the slaves, the slave could go ahead and
delete the Leader znode, and force the Leader to give up its leadership. I
am not describing a detailed way to do it, but it's not very hard to come up
with a design for this.



Do you intend to have the Leader and Slaves in different Network (different
ACLs I mean) protected zones? In that case, it is a legitimate concern; otherwise
I think an asymmetric network partition would be very unlikely to happen.

Do you usually see network partitions in such scenarios?

Thanks
mahadev
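The majority test behind that status-node idea is simple to sketch (toy code; in practice the reports would live under a znode that every slave updates and watches):

```java
import java.util.HashMap;
import java.util.Map;

public class DeposeCheck {
    // Each slave reports whether it can currently reach the leader. If a
    // majority cannot, the slaves should delete the leader znode and force
    // a re-election, as described above.
    static boolean shouldDepose(Map<String, Boolean> canReachLeader) {
        long unreachable = canReachLeader.values().stream()
                .filter(ok -> !ok).count();
        return unreachable > canReachLeader.size() / 2;
    }

    public static void main(String[] args) {
        Map<String, Boolean> reports = new HashMap<>();
        reports.put("slave1", false);
        reports.put("slave2", false);
        reports.put("slave3", true);
        System.out.println(shouldDepose(reports)); // prints "true"
    }
}
```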


On 4/30/10 4:05 PM, Lei Gao l...@linkedin.com wrote:

 Hi Mahadev,
 
 Why would the leader be disconnected from ZK? ZK is fine communicating with
 the leader in this case. We are talking about asymmetric network failure.
 Yes. Leader could consider all the slaves being down if it tracks the status
 of all slaves himself. But I guess if ZK is used for for membership
 management, neither the leader nor the slaves will be considered
 disconnected because they can all connect to ZK.
 
 Thanks,
 
 Lei  
 
 
 On 4/30/10 3:47 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
 
 Hi Lei,
 
 In this case, the Leader will be disconnected from ZK cluster and will give
 up its leadership. Since its disconnected, ZK cluster will realize that the
 Leader is dead!
 
 When Zk cluster realizes that the Leader is dead (this is because the zk
 cluster hasn't heard from the Leader for a certain time Configurable via
 session timeout parameter), the slaves will be notified of this via watchers
 in zookeeper cluster. The slaves will realize that the Leader is gone and
 will relect a new Leader and will start working with the new Leader.
 
 Does that answer your question?
 
 You might want to look though the documentation of ZK to understand its use
 case and how it solves these kind of issues
 
 Thanks
 mahadev
 
 
 On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote:
 
 Thank you all for your answers. It clarifies a lot of my confusion about the
 service guarantees of ZK. I am still struggling with one failure case (I am
 not trying to be the pain in the neck. But I need to have a full
 understanding of what ZK can offer before I make a decision on whether to
 used it in my cluster.)
 
 Assume the following topology:
 
  Leader          ZK cluster
     \\              //
      \\            //
       \\          //
         Slave(s)
 
 If I am asymmetric network failure such that the connection between Leader
 and Slave(s) are broken while all other connections are still alive, would
 my system hang after some point? Because no new leader election will be
 initiated by slaves and the leader can't get the work to slave(s).
 
 Thanks,
 
 Lei
 
 On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 If one of your user clients can no longer reach one member of the ZK
 cluster, then it will try to reach another.  If it succeeds, then it will
 continue without any problems as long as the ZK cluster itself is OK.
 
 This applies for all the ZK recipes.  You will have to be a little bit
 careful to handle connection loss, but that should get easier soon (and
 isn't all that difficult anyway).
 
 On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote:
 
 I am not talking about the leader election within zookeeper cluster. I
 guess
 I didn't make the discussion context clear. In my case, I run a cluster
 that
 uses zookeeper for doing the leader election. Yes, nodes in my cluster are
 the clients of zookeeper.  Those nodes depend on zookeeper to elect a new
 leader and figure out what the current leader is. So if the zookeeper
 (think
 of it as a stand-alone entity) becomes unavailabe in the way I've
 described
 earlier, how can I handle such situation so my cluster can still function
 while a majority of nodes still connect to each other (but not to the
 zookeeper)?
 
 
 
 



Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
Maybe I jumped the gun here but Ted's response to your query is more
appropriate - 



You can then use ZK in your application to pick a lead machine for other
operations.  In that case, essentially every failure scenario is handled by
the standard recipe.  In your example where the master and slave are cut
off, but both still have access to ZK, all that will happen is that the
master cannot communicate with the slave.  Both will still be clear about
who is in which role.

The case where the master is cut off from both ZK and the slave is also
handled well as is the case where the master is cut off from ZK, but not
from the slave.  In both cases, the master will get a connection loss event
and stop trying to act like a master and the slave will be notified that the
master has dropped out of its role.

--





On 4/30/10 4:14 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

 Hi Lei,
  Sorry I minsinterpreted your question! The scenario you describe could be
 handled in such a way -
 
 You could have a status node in ZooKeeper which every slave will subscribe
 to and update! If one of the slave nodes sees that there have been too many
 connection refused to the Leader by the slaves, the slave could go ahead and
 delete the Leader znode, and force the Leader to give up its leadership. I
 am not describing a deatiled way to do it, but its not very hard to come up
 with a design for this.
 
 
 
 Do you intend to have the Leader and Slaves in different Network (different
 ACLs I mean) protected zones? In that case, it is a legitimate concern else
 I do think assymetric network partition would be very unlikely to happen.
 
 Do you usually see network partitions in such scenarios?
 
 Thanks
 mahadev
 
 
 On 4/30/10 4:05 PM, Lei Gao l...@linkedin.com wrote:
 
 Hi Mahadev,
 
 Why would the leader be disconnected from ZK? ZK is fine communicating with
 the leader in this case. We are talking about asymmetric network failure.
 Yes. Leader could consider all the slaves being down if it tracks the status
 of all slaves himself. But I guess if ZK is used for for membership
 management, neither the leader nor the slaves will be considered
 disconnected because they can all connect to ZK.
 
 Thanks,
 
 Lei  
 
 
 On 4/30/10 3:47 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
 
 Hi Lei,
 
 In this case, the Leader will be disconnected from ZK cluster and will give
 up its leadership. Since its disconnected, ZK cluster will realize that the
 Leader is dead!
 
 When Zk cluster realizes that the Leader is dead (this is because the zk
 cluster hasn't heard from the Leader for a certain time Configurable via
 session timeout parameter), the slaves will be notified of this via watchers
 in zookeeper cluster. The slaves will realize that the Leader is gone and
 will relect a new Leader and will start working with the new Leader.
 
 Does that answer your question?
 
 You might want to look though the documentation of ZK to understand its use
 case and how it solves these kind of issues
 
 Thanks
 mahadev
 
 
 On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote:
 
 Thank you all for your answers. It clarifies a lot of my confusion about
 the
 service guarantees of ZK. I am still struggling with one failure case (I am
 not trying to be the pain in the neck. But I need to have a full
 understanding of what ZK can offer before I make a decision on whether to
 used it in my cluster.)
 
 Assume the following topology:
 
  Leader          ZK cluster
     \\              //
      \\            //
       \\          //
         Slave(s)
 
 If I am asymmetric network failure such that the connection between Leader
 and Slave(s) are broken while all other connections are still alive, would
 my system hang after some point? Because no new leader election will be
 initiated by slaves and the leader can't get the work to slave(s).
 
 Thanks,
 
 Lei
 
 On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 If one of your user clients can no longer reach one member of the ZK
 cluster, then it will try to reach another.  If it succeeds, then it will
 continue without any problems as long as the ZK cluster itself is OK.
 
 This applies for all the ZK recipes.  You will have to be a little bit
 careful to handle connection loss, but that should get easier soon (and
 isn't all that difficult anyway).
 
 On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote:
 
 I am not talking about the leader election within zookeeper cluster. I
 guess
 I didn't make the discussion context clear. In my case, I run a cluster
 that
 uses zookeeper for doing the leader election. Yes, nodes in my cluster
 are
 the clients of zookeeper.  Those nodes depend on zookeeper to elect a new
 leader and figure out what the current leader is. So if the zookeeper
 (think
 of it as a stand-alone entity) becomes unavailabe in the way I've
 described

Re: Question on maintaining leader/membership status in zookeeper

2010-04-30 Thread Mahadev Konar
HI Lei,
  ZooKeeper provides a set of primitives which allows you to do all kinds of
things! You might want to take a look at the api and some examples of
zookeeper recipes to see how it works, and probably that will clear things
up for you.

Here is the link:

http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html

Thanks
mahadev


On 4/30/10 4:46 PM, Lei Gao l...@linkedin.com wrote:

 Hi Mahadev,
 
 First of all, I like to thank you for being patient with me - my questions
 seem unclear to many of you who try to help me.
 
 I guess clients have to be smart enough to trigger a new leader election by
 trying to delete the znode. But in this case, ZK should not allow any single
 or multiple (as long as they are less than a quorum) client(s) to delete the
  znode corresponding to the master, right? A new consensus among clients (NOT
 among the nodes in zk cluster) has to be there for the znode to be deleted,
 right?  Does zk have this capability or the clients have to come to this
 consensus outside of zk before trying to delete the znode in zk?
 
 Thanks,
 
 Lei
 
 Hi Lei,
  Sorry I minsinterpreted your question! The scenario you describe could be
 handled in such a way -
 
 You could have a status node in ZooKeeper which every slave will subscribe
 to and update! If one of the slave nodes sees that there have been too many
 connection refused to the Leader by the slaves, the slave could go ahead and
 delete the Leader znode, and force the Leader to give up its leadership. I
 am not describing a deatiled way to do it, but its not very hard to come up
 with a design for this.
 
 
 
 Do you intend to have the Leader and Slaves in different Network (different
 ACLs I mean) protected zones? In that case, it is a legitimate concern else
 I do think assymetric network partition would be very unlikely to happen.
 
 Do you usually see network partitions in such scenarios?
 
 Thanks
 mahadev
 
 
 On 4/30/10 4:05 PM, Lei Gao l...@linkedin.com wrote:
 
 Hi Mahadev,
 
 Why would the leader be disconnected from ZK? ZK is fine communicating with
 the leader in this case. We are talking about asymmetric network failure.
 Yes. Leader could consider all the slaves being down if it tracks the status
 of all slaves himself. But I guess if ZK is used for for membership
 management, neither the leader nor the slaves will be considered
 disconnected because they can all connect to ZK.
 
 Thanks,
 
 Lei  
 
 
 On 4/30/10 3:47 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
 
 Hi Lei,
 
 In this case, the Leader will be disconnected from ZK cluster and will give
 up its leadership. Since its disconnected, ZK cluster will realize that the
 Leader is dead!
 
 When Zk cluster realizes that the Leader is dead (this is because the zk
 cluster hasn't heard from the Leader for a certain time Configurable
 via
 session timeout parameter), the slaves will be notified of this via
 watchers
 in zookeeper cluster. The slaves will realize that the Leader is gone and
 will relect a new Leader and will start working with the new Leader.
 
 Does that answer your question?
 
 You might want to look though the documentation of ZK to understand its use
 case and how it solves these kind of issues
 
 Thanks
 mahadev
 
 
 On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote:
 
 Thank you all for your answers. It clarifies a lot of my confusion about
 the
 service guarantees of ZK. I am still struggling with one failure case (I
 am
 not trying to be the pain in the neck. But I need to have a full
 understanding of what ZK can offer before I make a decision on whether to
 used it in my cluster.)
 
 Assume the following topology:
 
  Leader          ZK cluster
     \\              //
      \\            //
       \\          //
         Slave(s)
 
 If I am asymmetric network failure such that the connection between Leader
 and Slave(s) are broken while all other connections are still alive, would
 my system hang after some point? Because no new leader election will be
 initiated by slaves and the leader can't get the work to slave(s).
 
 Thanks,
 
 Lei
 
 On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 If one of your user clients can no longer reach one member of the ZK
 cluster, then it will try to reach another.  If it succeeds, then it will
 continue without any problems as long as the ZK cluster itself is OK.
 
 This applies for all the ZK recipes.  You will have to be a little bit
 careful to handle connection loss, but that should get easier soon (and
 isn't all that difficult anyway).
 
 On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote:
 
 I am not talking about the leader election within zookeeper cluster. I
 guess
 I didn't make the discussion context clear. In my case, I run a cluster
 that
 uses zookeeper for doing the leader election. Yes, nodes in my cluster
 are
 the clients of zookeeper.  Those nodes depend on zookeeper to elect

Re: Misbehaving zk servers

2010-04-29 Thread Mahadev Konar
Hi Travis,

 How many clients did you have connected to this server? Usually the default
is 8K file descriptors. Did you have more clients than that?

Also, if clients fail to attach to a server, they will run off to another
server. We do not do any blacklisting because we expect the server to heal,
and if it does not, it usually shuts itself down.

Thanks
mahadev


On 4/29/10 12:08 AM, Travis Crawford traviscrawf...@gmail.com wrote:

 Hey zookeeper gurus -
 
 We recently had a zookeeper outage when one ZK server was started with
 a low limit after upgrading to 3.3.0. Several days later the outage
 occurred when that node reached its file descriptor limit and clients
 started having major issues.
 
 Are there any circumstances when a ZK server will get blacklisted from
 the ensemble? Something similar to how tasktrackers are blacklisted
 when too many tasks fail.
 
 Thanks!
 Travis



Re: Embedding ZK in another application

2010-04-29 Thread Mahadev Konar
We do set that, Chad, but it doesn't seem to help on some systems (especially
bsd)...

Thanks
mahadev


On 4/29/10 11:22 AM, Chad Harrington chad.harring...@gmail.com wrote:

 On Thu, Apr 29, 2010 at 8:49 AM, Patrick Hunt ph...@apache.org wrote:
 
 This is not foolproof however. We found that in general this would work,
 however there were some infrequent cases where a restarted server would fail
 to initialize due to the following issue:
 it is possible for the process to complete before the kernel has released
 the associated network resource, and this port cannot be bound to another
 process until the kernel has decided that it is done.
 
 more detail here:
 http://hea-www.harvard.edu/~fine/Tech/addrinuse.html
 
 as a result we ended up changing the test code to start each test with new
 quorum/election port numbers. This fixed the problem for us but would not be
 a solution in your case.
 
 Patrick
 
 
 I am not an expert at all on this, but I have used SO_REUSEADDR in other
 situations to avoid the address in use problem.  Would that help here?
 
 Chad Harrington
 chad.harring...@gmail.com
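Chad's SO_REUSEADDR suggestion corresponds to `ServerSocket.setReuseAddress` in the Java socket API; note that the option only takes effect if it is set before `bind()`. A minimal sketch (the class and method names here are illustrative, not ZooKeeper code):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddr {
    // Open a listening socket with SO_REUSEADDR so a restarted server can
    // rebind the port while the old socket still lingers in TIME_WAIT.
    static ServerSocket listen(int port) throws Exception {
        ServerSocket server = new ServerSocket();
        server.setReuseAddress(true);              // must precede bind()
        server.bind(new InetSocketAddress(port));
        return server;
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = listen(0);           // 0 = any free port
        System.out.println(server.getReuseAddress()); // prints "true"
        server.close();
    }
}
```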
 
 
 
 On 04/29/2010 07:13 AM, Vishal K wrote:
 
 Hi Ted,
 
 We want the application that embeds the ZK server to keep running even after
 the ZK server is shut down. So we don't want to restart the application.
 Also, we prefer not to use zkServer.sh/zkServer.cmd because these are OS
 dependent (our application will run on Win as well as Linux). Instead, we
 thought that calling QuorumPeerMain.initializeAndRun() and
 QuorumPeerMain.shutdown() will suffice to start and shutdown a ZK server
 and
 we won't have to worry about checking the OS.
 
 Is there a way to cleanly shut down the ZK server (by invoking the ZK server API)
 when it is embedded in the application without actually restarting the
 application process?
 Thanks.
 On Thu, Apr 29, 2010 at 1:54 AM, Ted Dunningted.dunn...@gmail.com
  wrote:
 
  Hmmm it isn't quite clear what you mean by restart without
 restarting.
 
 Why is killing the server and restarting it not an option?
 
 It is common to do a rolling restart on a ZK cluster.  Just restart one
 server at a time.  This is often used during system upgrades.
 
 On Wed, Apr 28, 2010 at 8:22 PM, Vishal Kvishalm...@gmail.com  wrote:
 
 
 What is a good way to restart a ZK server (standalone and quorum)
 without
 having to restart it?
 
 Currently, I have ZK server embedded in another java application.
 
 
 
 



Re: Partially partitioned connectivity

2010-04-29 Thread Mahadev Konar
Hi Kevin,
  I had the response set up but didn't hit send. Ted already answered your
question, but to give you a more technical background assuming that you know
a little bit more about transaction ids in ZooKeeper and server ids:

If B and C are partitioned from each other, and A is the leader, there would
not be a problem since A can talk to B and C and the cluster would continue
functioning.

But if B or C is the Leader, then the other would get disconnected and stop
functioning unless the partition heals. The clients connected to that server
will get disconnected and will reconnect to the working quorum of A and B/C.


If a quorum election happens in this situation, there are two possibilities

1) A is the leader
2) A is not the leader

If A is not the Leader, then either B or C is the leader, and the quorum
functions with the disconnected server not participating.

If A is the leader then since it can talk to both B and C, all three servers
will function and will work all fine.

Hope that answers your question.

Thanks
mahadev

On 4/29/10 11:22 AM, Kevin Webb kcw...@cs.ucsd.edu wrote:

 Suppose I have three zookeeper servers (A, B, and C).  A can
 communicate with both B and C, but B and C are partitioned from one
 another.
 
 Is the system behavior under such conditions documented anywhere?  If
 not, can someone explain what will happen at the servers and their
 clients?
 
 Thanks!
 
 -Kevin



Re: Zookeeper client

2010-04-27 Thread Mahadev Konar
HI Avinash,
  The zk client does itself maintain liveness information and also
randomizes the list of servers to balance the number of clients connected to
a single ZooKeeper server.

Hope that helps.

Thanks
mahadev


On 4/27/10 10:56 AM, Avinash Lakshman avinash.laksh...@gmail.com wrote:

 Let's assume I have 100 clients connecting to a cluster of 5 Zookeeper
 servers over time. On the client side I instantiate a ZooKeeper instance and
 use it whenever I need to read/write into ZK. Now I know I can pass in a
 connect string with the list of all servers that make up the ZK cluster.
 Does the ZK client automatically maintain liveness information and load
 balance my connections across the machines? How can I do this effectively? I
 basically want to spread the connections from the 100 clients to the 5 ZK
 instances effectively.
 
 Thanks
 Avinash
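The randomization Mahadev mentions amounts to shuffling the parsed connect string, roughly like this (a sketch of the idea only, not the actual ClientCnxn code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HostShuffle {
    // Parse a ZooKeeper connect string and randomize the host order, as the
    // Java client does internally to spread clients across the ensemble.
    static List<String> shuffled(String connectString) {
        List<String> hosts = new ArrayList<>(Arrays.asList(connectString.split(",")));
        Collections.shuffle(hosts);
        return hosts;
    }

    public static void main(String[] args) {
        List<String> hosts = shuffled("A:2181,B:2181,C:2181,D:2181,E:2181");
        System.out.println(hosts.size()); // prints "5" -- same hosts, random order
    }
}
```

Because each client shuffles independently, 100 clients given the same five-server connect string end up spread across the ensemble rather than piling onto the first host listed.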



Re: Zookeeper client

2010-04-27 Thread Mahadev Konar

Hi Avinash,
  No, the randomization only happens on the list of servers that are passed
to the client. If you just pass A to the client, the client will only be
able to connect to A and will not know about the other servers.

Does that help?

Thanks
mahadev

On 4/27/10 11:40 AM, Avinash Lakshman avinash.laksh...@gmail.com wrote:

 Thanks Mahadev. Does this happen irrespective of what I provide in the
 connect string? Let's say I have servers A, B, C, D and E in the ZK cluster.
 But all my clients instantiate ZooKeeper instances by providing information
 only about A. Does the randomization occur even in this case?
 
 Thanks
 Avinash
 
 On Tue, Apr 27, 2010 at 11:00 AM, Mahadev Konar maha...@yahoo-inc.comwrote:
 
 HI Avinash,
  The zk client does itself maintain liveness information and also
 randomizes the list of servers to balance the number of clients connected
 to
 a single ZooKeeper server.
 
 Hope that helps.
 
 Thanks
 mahadev
 
 
 On 4/27/10 10:56 AM, Avinash Lakshman avinash.laksh...@gmail.com
 wrote:
 
 Let's assume I have 100 clients connecting to a cluster of 5 Zookeeper
 servers over time. On the client side I instantiate a ZooKeeper instance
 and
 use it whenever I need to read/write into ZK. Now I know I can pass in a
 connect string with the list of all servers that make up the ZK cluster.
 Does the ZK client automatically maintain liveness information and load
 balance my connections across the machines? How can I do this
 effectively? I
 basically want to spread the connections from the 100 clients to the 5 ZK
 instances effectively.
 
 Thanks
 Avinash
 
 



Re: Embedding ZK in another application

2010-04-23 Thread Mahadev Konar
Hi Vishal and Ashanka,

  I think Ted and Pat had commented on this before.

Reiterating those comments below. If you are OK with these points, I see no
concern with running ZooKeeper as an embedded application.

Also, as Pat mentioned earlier  there are some cases where the server code
will system.exit. This is typically only if quorum communication fails in
some weird, unrecoverable way. We have removed most of these but there are a
few still remaining.


--- Comments by Ted 

I can't comment on the details of your code (but I have run in-process ZK's
in the past without problem)

Operationally, however, this isn't a great idea.  The problem is two-fold:

a) firstly, somebody would probably like to look at Zookeeper to understand
the state of your service.  If the service is
down, then ZK will go away.  That means that Zookeeper can't be used that
way and is mild to moderate
on the logarithmic international suckitude scale.

b) secondly, if you want to upgrade your server without upgrading Zookeeper
then you still have to bounce
Zookeeper.  This is probably not a problem, but it can be a slight pain.

c) thirdly, you can't scale your service independently of how you scale
Zookeeper.  This may or may
not bother you, but it would bother me.

d) fourthly, you will be synchronizing your server restarts with ZK's
service restarts.  Moving these events
away from each other is likely to make them slightly more reliable.  There
is no failure mode that I know
of that would be tickled here, but your service code will be slightly more
complex since it has to make sure
that ZK is up before it does stuff.  If you could make the assumption that
ZK is up or exit, that would be
simpler.

e) yes, I know that is more than two issues.  That is itself an issue since
any design where the number of worries
is increasing so fast is suspect on larger grounds.  If there are small
problems cropping up at that rate, the likelihood
of there being a large problem that comes up seems higher.


On 4/23/10 11:04 AM, Vishal K vishalm...@gmail.com wrote:

 Hi,
 
 Good question. We are planning to do something similar as well, and it will
 be great to know if there are any issues with embedding the ZK server into an app.
 We simply use QuorumPeerMain and QuorumPeer from our app to start/stop the
 ZK server. Is this not a good way to do it?
 
 On Fri, Apr 23, 2010 at 1:28 PM, Asankha C. Perera asan...@apache.orgwrote:
 
 Hi All
 
 I'm very new to ZK, and am looking at embedding ZK into an app that needs
 cluster management - and the objective is to use ZK to notify
 application cluster control operations (e.g. shutdown etc) across nodes.
 
 I came across this post [1] from the user list by Ted Dunning from some
 months back :
 My experience with Katta has led me to believe that embedding a ZK in a
 product is almost always a bad idea. - The problems are that you can't
 administer the Zookeeper cluster independently and that the cluster
 typically goes down when the associated service goes down.
 
 However, I believe that both the above are fine to live with for the
 application under consideration, as ZK will be used only to coordinate
 the larger application. Is there anything else that needs to be
 considered - and can I safely shutdown the clientPort since the
 application is always in the same JVM - but, if I do that how would I
 connect to ZK thereafter ?
 
 thanks and regards
 asankha
 
 [1] http://markmail.org/message/tjonwec7p7dhfpms
 



Re: Embedding ZK in another application

2010-04-23 Thread Mahadev Konar
That's true!

Thanks
mahadev


On 4/23/10 11:41 AM, Asankha C. Perera asan...@apache.org wrote:

 Hi Mahadev
   I think Ted and Pat had already commented on this before.
 
 Reiterating these comments below. If you are ok with these points I see no
 concern in ZooKeeper as an embedded application...
 
 Thanks, I missed this on the archives, and it helps!..
 
 I guess if we still decide to embed, the only way to connect to ZK is
 still with the normal TCP client..
 
 cheers
 asankha



Re: bug: wrong heading in recipes doc

2010-04-22 Thread Mahadev Konar
I think we should be using zookeeper locks to create jiras :) . Looks
like both of you created one!!! :)


Thanks
mahadev


On 4/22/10 1:37 PM, Patrick Hunt ph...@apache.org wrote:

 No problem.
 https://issues.apache.org/jira/browse/ZOOKEEPER-752
 
 I've seen alot of traffic on infrastruct...@apache, you might try there,
 I'm sure they could help you out.
 
 Regards,
 
 Patrick
 
 On 04/22/2010 01:26 PM, Adam Rosien wrote:
 I would, but the Apache JIRA has been f***ed since the breakin and I
 can't reset my password. Would you mind adding it for me?
 
 .. Adam
 
 On Thu, Apr 22, 2010 at 11:32 AM, Patrick Huntph...@apache.org  wrote:
 Hi Adam, would you mind creating a JIRA? That's the best way to address this
 type of issue. Thanks!
 https://issues.apache.org/jira/browse/ZOOKEEPER
 
 Patrick
 
 On 04/22/2010 11:30 AM, Adam Rosien wrote:
 
 
  http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html#sc_recoverableSharedLocks
 uses the heading recoverable locks, but the text refers to
 revocable.
 
 .. Adam
 



Re: odd error message

2010-04-20 Thread Mahadev Konar
Ok, I think this is possible.
Here is what happens currently; this has been a long-standing bug and
should be fixed in 3.4:

https://issues.apache.org/jira/browse/ZOOKEEPER-335

A newly elected leader currently doesn't log the new-leader transaction to
its database.

In your case, the follower (the 3rd server) did log it but the leader never
did. When you brought up the 3rd server, it had the transaction in its log
but the leader did not. In that case the 3rd server cried foul and shut down.

Removing the DB is totally fine. For now, we should update our docs on 3.3
and mention that this problem might occur during upgrade, and fix it in 3.4.


Thanks for bringing it up Ted.


Thanks
mahadev

On 4/20/10 2:14 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 We have just done an upgrade of ZK to 3.3.0.  Previous to this, ZK has been
 up for about a year with no problems.
 
 On two nodes, we killed the previous instance and started the 3.3.0
 instance.  The first node was a follower and the second a leader.
 
 All went according to plan and no clients seemed to notice anything.  The
 stat command showed connections moving around as expected and all other
 indicators were normal.
 
 When we did the third node, we saw this in the log:
 
 2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] -
 Leader epoch 18 is less than our epoch 19
 
 The third node refused all connections.
 
 We brought down the third node, wiped away its snapshot, restarted and it
 joined without complaint.  Note that the third node
 was originally a follower and had never been a leader during the upgrade
 process.
 
 Does anybody know why this happened?
 
 We are fully upgraded and there was no interruption to normal service, but
 this seems strange.



Re: Recovery issue - how to debug?

2010-04-19 Thread Mahadev Konar
Hi Hao,
  As Vishal already asked, how are you determining whether the writes are
being received? Also, what was the status of C2 when you checked for these
writes? Do you have the output of echo stat | nc localhost port?

How long did you wait before concluding that C2 did not receive the writes?
And what was the status of C2 (again, echo stat | nc localhost port) when
you saw that C2 had received the writes?

Thanks
mahadev
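
The stat check suggested above is ZooKeeper's four-letter-word protocol: open a TCP connection to the client port, send the word, and read until the server closes the socket. A minimal Python sketch, equivalent to echo stat | nc host port (the host/port are placeholders, and this is an illustrative helper, not part of any ZooKeeper client library):

```python
import socket

def four_letter_word(host, port, cmd=b"stat", timeout=5.0):
    """Send a ZooKeeper four-letter-word command (stat, ruok, srvr, ...)
    and return the reply; the server closes the socket when it is done."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

# Usage against a running server: four_letter_word("localhost", 2181)
```

The reply's Mode: line (leader/follower/standalone) is what tells you whether the server has rejoined the ensemble.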


On 4/18/10 10:54 PM, Dr Hao He h...@softtouchit.com wrote:

 I have zookeeper cluster E1 with 3 nodes A,B, and C.
 
 I stopped C and did some writes on E1.  Both A and B received the writes.  I
 then started C and after a short while, C also received the writes.
 
 All seem to go well so I replicated the setup to another cluster E2 with
 exactly 3 nodes: A2, B2, and C2.
 
 I stopped C2 and did some writes on E2.  A2 received the writes.  I then
 started C2.  However, no matter how long I wait, C2 never received the writes.
 
 I then did more writes on E2.  Then C2 can receive all the writes including
 the old writes when it was down.
 
 How do I find out what was wrong with the E2 setup?
 
 I am running 3.2.2 on all nodes.
 
 Regards,
 
 Dr Hao He
 
 XPE - the truly SOA platform
 
 h...@softtouchit.com
 http://softtouchit.com
 
 



Re: rolling upgrade 3.2.1 - 3.3.0

2010-04-14 Thread Mahadev Konar
Hi Charity,
   Looks like you are hitting a bug recently found in 3.3.0.

https://issues.apache.org/jira/browse/ZOOKEEPER-737


That bug makes the server not report the right status. It looks like in
your case the server is running fine, but bin/zkServer.sh status is not
returning the right result.

You can try telnet localhost port and then type stat to get the status of
the server. This bug will be fixed in the bug-fix release 3.3.1, which will
most probably be released by next week or so.

Thanks
mahadev
  


On 4/14/10 3:59 PM, Charity Majors char...@shopkick.com wrote:

 Hi.  I'm trying to upgrade a zookeeper cluster from 3.2.1 to 3.3.0, and having
 problems.  I can't get a 3.3.0 node to successfully join the cluster and stay
 joined.  
 
 If I run zkServer.sh status immediately after starting up the newly upgraded
 node, it says the service is probably not running, and shows me this:
 
 
 [char...@test-zookeeper001 zookeeper-current]$ bin/zkServer.sh status
 JMX enabled by default
 Using config: /services/zookeeper/zookeeper-20100412.1/bin/../conf/zoo.cfg
 2010-04-14 22:47:35,574 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] -
 Accepted socket connection from /127.0.0.1:40287
 2010-04-14 22:47:35,576 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@968] - Processing
 stat command from /127.0.0.1:40287
 2010-04-14 22:47:35,577 - WARN
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@606] -
 EndOfStreamException: Unable to read additional data from client sessionid
 0x0, likely client has closed socket
 2010-04-14 22:47:35,578 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket
 connection for client /127.0.0.1:40287 (no session established for client)
 Error contacting service. It is probably not running.
 [char...@test-zookeeper001 zookeeper-current]$ 2010-04-14 22:47:35,580 - DEBUG
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1310] - ignoring
 exception during input shutdown
 java.net.SocketException: Transport endpoint is not connected
 at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
 at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
 at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
 at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306)
 at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263)
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609)
 at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
 
 
 If I connect with zkCli.sh, I can list the contents of zookeeper.  If I make
 changes to the schema on either of the other two nodes, test-zookeeper002 and
 test-zookeeper003, both of which are running 3.2.1, the changes are reflected
 on test-zookeeper001, which is running 3.3.0.
 
 When I exit zkCli.sh, however, zkServer.sh status starts flapping between
 Error contacting service. It is probably not running. and Mode: follower,
 as you can see below.
 
 Any ideas?  I'd really rather not have to take the production zookeeper
 cluster down to upgrade if it's not necessary.
 
 Thanks,
 Charity.
 
 
 
 [char...@test-zookeeper001 zookeeper-current]$ bin/zkServer.sh status
 JMX enabled by default
 Using config: /services/zookeeper/zookeeper-20100412.1/bin/../conf/zoo.cfg
 2010-04-14 22:53:16,848 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] -
 Accepted socket connection from /127.0.0.1:55284
 2010-04-14 22:53:16,849 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@968] - Processing
 stat command from /127.0.0.1:55284
 2010-04-14 22:53:16,849 - WARN
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@606] -
 EndOfStreamException: Unable to read additional data from client sessionid
 0x0, likely client has closed socket
 2010-04-14 22:53:16,850 - INFO
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket
 connection for client /127.0.0.1:55284 (no session established for client)
 Error contacting service. It is probably not running.
 2010-04-14 22:53:16,850 - DEBUG
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1310] - ignoring
 exception during input shutdown
 java.net.SocketException: Transport endpoint is not connected
 at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
 at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
 at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
 at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306)
 at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263)
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609)
 at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
 
 

Re: znode cversion decreasing?

2010-04-12 Thread Mahadev Konar
HI Kevin,

 The cversion should be monotonically increasing for the the znode. It would
be a bug if its not. Can you please elaborate in which cases you are seeing
the cversion decreasing? If you can reproduce with an example that would be
great.

Thanks
mahadev


On 4/11/10 3:53 PM, Kevin Webb kcw...@cs.ucsd.edu wrote:

 I'm using Zookeeper (3.2.2) for  a simple group membership service in
 the manner that is typically described[1,2]:
 
 I create a znode for the group, and each present group member adds an
 ephemeral node under the group node.  I'm using the cversion of the
 group node as a group number. I expected this value to be
 monotonically increasing, but I'm seeing instances where this isn't
 the case.  According to the programmer's guide, changes to a node will
 cause the appropriate version number to increase, but it says nothing
 about decreasing.
 
 Am I misunderstanding something about the way node version numbers
 work?  Is there a better/recommended way to implement a monotonically
 increasing group number?
 
 Thanks!
 Kevin
 
 
 [1] http://hadoop.apache.org/zookeeper/docs/r3.2.2/recipes.html
 [2] http://eng.kaching.com/2010/01/actually-implementing-group-management.html
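
The contract under discussion can be captured in a toy model (Python, purely illustrative, not ZooKeeper code): cversion is bumped on every child create and every child delete, so a group node whose members come and go yields a counter that only moves forward in any single server's view.

```python
class ZNodeModel:
    """Toy model of a ZooKeeper znode's child-version counter."""
    def __init__(self):
        self.children = set()
        self.cversion = 0  # bumped on every child create OR delete

    def create_child(self, name):
        self.children.add(name)
        self.cversion += 1

    def delete_child(self, name):
        self.children.discard(name)
        self.cversion += 1

group = ZNodeModel()
seen = []
for member in ("m1", "m2"):
    group.create_child(member)   # members join
    seen.append(group.cversion)
group.delete_child("m1")         # a member leaving also raises cversion
seen.append(group.cversion)
print(seen)  # [1, 2, 3] -- monotonically increasing
```

Note that because deletes also bump cversion, it works as a "group number" but not as a member count.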



Re: feed queue fetcher with hadoop/zookeeper/gearman?

2010-04-12 Thread Mahadev Konar
Hi Thomas,
  There are a couple of projects inside Yahoo! that use ZooKeeper as an
event manager for feed processing.
  
I am a little unclear on your example below. As I understand it:

1. There are 1 million feeds that will be stored in HBase.
2. A map-reduce job will be run on these feeds to find out which feeds need
to be fetched.
3. This will create queues in ZooKeeper for fetching the feeds.
4. Workers will pull items from this queue and process feeds.

Did I understand it correctly? Also, if the above is the case, how many queue
items do you anticipate will accumulate every hour?

Thanks
mahadev


On 4/12/10 1:21 AM, Thomas Koch tho...@koch.ro wrote:

 Hi,
 
 I'd like to implement a feed loader with Hadoop and most likely HBase. I've
 got around 1 million feeds, that should be loaded and checked for new entries.
 However the feeds have different priorities based on their average update
 frequency in the past and their relevance.
 The feeds (url, last_fetched timestamp, priority) are stored in HBase. How
 could I implement the fetch queue for the loaders?
 
 - An hourly map-reduce job to produce new queues for each node and save them
 on the nodes?
   - but how to know, which feeds have been fetched in the last hour?
   - what to do, if a fetch node dies?
 
 - Store a fetch queue in zookeeper and add to the queue with map-reduce each
 hour?
   - Isn't that too much load for zookeeper? (I could make one znode for a
 bunch of urls...?)
 
 - Use gearman to store the fetch queue?
   - But the gearman job server still seems to be a SPOF
 
 [1] http://gearman.org
 
 Thank you!
 
 Thomas Koch, http://www.koch.ro
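
The ZooKeeper queue recipe being weighed here works by creating sequential child znodes under a queue node; workers take the child with the lowest sequence number. A hedged Python model of just the naming and ordering logic, with no real ZooKeeper involved (the item- prefix and counter are stand-ins for ZK's sequence-node naming):

```python
class SequentialQueueModel:
    """Model of the ZK queue recipe: items are children named
    'item-%010d'; consumers take (and delete) the lowest sequence."""
    def __init__(self):
        self.next_seq = 0
        self.items = {}  # child name -> payload

    def put(self, payload):
        name = "item-%010d" % self.next_seq  # zero-padded sequence naming
        self.next_seq += 1
        self.items[name] = payload
        return name

    def take(self):
        if not self.items:
            return None
        # Lexicographic min == numeric min because names are zero-padded.
        lowest = min(self.items)
        return self.items.pop(lowest)

q = SequentialQueueModel()
for url in ("feed-a", "feed-b", "feed-c"):
    q.put(url)
print([q.take() for _ in range(3)])  # ['feed-a', 'feed-b', 'feed-c'] (FIFO)
```

In real ZooKeeper each item is one znode, which is why batching many URLs per znode (as Thomas suggests) reduces load on the ensemble.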
 



Re: znode cversion decreasing?

2010-04-12 Thread Mahadev Konar
Hi Kevin,

 Thanks for the info. Could you cut and paste the code you are using that
prints the view info? That would help; we can then create a jira and follow
up on it.

Also, a ZooKeeper client can never go back in time (even if it gets
disconnected and reconnected to another server).

Thanks
mahadev


On 4/12/10 2:26 PM, Kevin Webb kcw...@cs.ucsd.edu wrote:

 On Mon, 12 Apr 2010 09:27:46 -0700
 Mahadev Konar maha...@yahoo-inc.com wrote:
 
 HI Kevin,
 
  The cversion should be monotonically increasing for the the znode.
 It would be a bug if its not. Can you please elaborate in which cases
 you are seeing the cversion decreasing? If you can reproduce with an
 example that would be great.
 
 Thanks
 mahadev
 
 Thanks Mahadev and Patrick!
 
 Here are some more details:
 
 I'm using the C client and running three servers on PlanetLab, with
 each server on a different continent.  Most of the time, the cversion
 is increasing as expected.  I'm never deleting the group node, so
 that's not the issue.
 
 Of course, now that I've emailed this list, I haven't seen it happen
 again...  
 
 I do have one old log file though:
 
 ZK(10): 1270514949 (Re)Connected to zookeeper server.
 ZK(10): 1270514952 Beginning new view #7.  Unsetting panic...
 GOSSIP(10): 1270514952 Changing view to 7
 ZK(10): 1270515798 Disconnected from zookeeper.  Setting panic...
 ZK(10): 1270515803 (Re)Connected to zookeeper server.
 ZK(10): 1270515806 Beginning new view #7.  Unsetting panic...
 GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current
 view is 7.
 ZK(10): 1270516812 Disconnected from zookeeper.  Setting panic...
 ZK(10): 1270516823 (Re)Connected to zookeeper server.
 ZK(10): 1270516826 Beginning new view #11.  Unsetting panic...
 GOSSIP(10): 1270516826 Changing view to 11
 ZK(10): 1270519191 Disconnected from zookeeper.  Setting panic...
 ZK(10): 1270519195 (Re)Connected to zookeeper server.
 ZK(10): 1270519198 Beginning new view #9.  Unsetting panic...
 GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current
 view is 11.
 
 The large integral number is a Unix seconds-since-epoch timestamp (the
 result of calling time(NULL)).
 
 In this case, the client connected, got group #7, disconnected,
 reconnected, got #7 again, disconnected, reconnected, got #11,
 disconnected, reconnected, and then got #9.
 
 The host string that I pass to zookeeper_init contains only one
 address:port, so it's not an issue of re-connecting to a different
 server and getting old/stale information.
 
 
 If/when it does happen again, I'll be sure to also save the zookeeper
 server logs.
 
 -Kevin



Re: Errors while running sytest

2010-04-07 Thread Mahadev Konar
Great. 

I was just responding with a different solution:

---

Looks like the fatjar does not include the junit classes. Also, the -jar
option does not use the classpath environment variable.

Here is an excerpt from the man page of java:

    -jar

        Execute a program encapsulated in a JAR archive. The first argument
        is the name of a JAR file instead of a startup class name. In order
        for this option to work, the manifest of the JAR file must ...

        When you use this option, the JAR file is the source of all user
        classes, and other user class path settings are ignored.


So you will have to run the main class in the fatjar with the java
-classpath option, with all the libraries on the classpath:

java -cp log4j:junit:fatjar org.apache.zookeeper.util.FatJarMain server
...


But putting it in build and including it as part of fatjar is much more
convenient!!! 



Thanks
mahadev


On 4/7/10 1:09 PM, Vishal K vishalm...@gmail.com wrote:

 Hi,
 
 It works for me now. Just for the record, I had to copy junit*.jar to
 build/lib because fat.jar expects it to be there. Then I had to rebuild
 fatjar.jar.
 
 On Wed, Apr 7, 2010 at 12:10 AM, Vishal K vishalm...@gmail.com wrote:
 
 Hi,
 
 I am trying to run systest on a 3 node cluster (
  http://svn.apache.org/repos/asf/hadoop/zookeeper/trunk/src/java/systest/README.txt ).
 
 When I reach the 4th step which is to actually run the test I get exception
 shown below.
 
 Exception in thread "main" java.lang.NoClassDefFoundError: junit/framework/TestCase
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
 at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:169)
 at org.apache.zookeeper.util.FatJarMain.main(FatJarMain.java:97)
 Caused by: java.lang.ClassNotFoundException: junit.framework.TestCase
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 ... 15 more
 
 Looks like it is not able to find classes in junit. However, my classpath
 is set right:
 
 
 :/opt/zookeeper-3.3.0/zookeeper.jar:/opt/zookeeper-3.3.0/lib/junit-4.4.jar:/opt/zookeeper-3.3.0/lib/log4j-1.2.15.jar:/opt/zookeeper-3.3.0/build/test/lib/junit-4.8.1.jar
 
 Any suggestions how I can get around this problem? Thanks.
 



Re: deleting a node - command line tool

2010-03-26 Thread Mahadev Konar
Hi Karthik,
  You can use bin/zkCli.sh which provides a nice command line shell
interface for executing commands.

Thanks
mahadev
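
One detail worth remembering when deleting a subtree like /katta: a znode with children cannot be deleted, so a recursive delete must remove children before parents (zkCli's delete command removes one node at a time). An illustrative Python sketch of that ordering over an in-memory path list, not ZooKeeper code:

```python
def recursive_delete_order(paths, root):
    """Return the order in which znodes under (and including) root must be
    deleted: deepest paths first, since ZK refuses to delete a non-empty node."""
    subtree = [p for p in paths if p == root or p.startswith(root + "/")]
    # Sort by depth, deepest first; ties broken lexicographically for determinism.
    return sorted(subtree, key=lambda p: (-p.count("/"), p))

tree = ["/katta", "/katta/shards", "/katta/shards/s1", "/katta/nodes"]
print(recursive_delete_order(tree, "/katta"))
# ['/katta/shards/s1', '/katta/nodes', '/katta/shards', '/katta']
```

Issuing the deletes in this order guarantees every node is empty by the time it is removed.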


On 3/26/10 9:42 AM, Karthik K oss@gmail.com wrote:

 Hi -
   I am looking to delete a node (say, /katta) from a running zk ensemble
 altogether and curious if there is any command-line tool that is available
 that can do a delete.
 
 --
   Karthik.



Re: java heap size

2010-03-16 Thread Mahadev Konar
I am not sure I understand it.

Lei, the default JVM heap size actually would not fit the needs of most of
the applications that use ZooKeeper. It is always good to understand the
requirements needed to run a system. I am not sure how it is confusing: is
it the language, or how to set it?

Thanks
mahadev


On 3/15/10 8:04 PM, Lei Zhang lzvoya...@gmail.com wrote:

 Sorry I was too terse - I meant to say the default JVM heap size setting
 probably fits the need of most applications. Singling out this setting in
 Zookeeper Admin Guide is confusing rather than helping.



Re: Zookeeper unit tester?

2010-03-09 Thread Mahadev Konar
Hi David,
  We don't really have a mock ZooKeeper client that does no I/O. We have
been thinking about using Mockito for this kind of testing sometime soon,
but currently there is none.

Thanks
mahadev


On 3/9/10 2:23 PM, David Rosenstrauch dar...@darose.net wrote:

 Just wondering if there was a mock/fake version of
 org.apache.zookeeper.Zookeeper that could be used for unit testing?
 What I'm envisioning would be a single instance Zookeeper that operates
 completely in memory, with no network or disk I/O.
 
 This would make it possible to pass one of the memory-only
 FakeZookeeper's into unit tests, while using a real Zookeeper in
 production code.
 
 Any such animal?  :-)
 
 Thanks,
 
 DR



Re: Managing multi-site clusters with Zookeeper

2010-03-08 Thread Mahadev Konar
Hi Martin,
  The results would be really nice information to have on the ZooKeeper
wiki, and would be very helpful for others considering the same kind of
deployment. So do send out your results on the list.


Thanks
mahadev


On 3/8/10 11:18 AM, Martin Waite waite@googlemail.com wrote:

 Hi Patrick,
 
 Thanks for you input.
 
 I am planning on having 3 zk servers per data centre, with perhaps only 2 in
 the tie-breaker site.
 
 The traffic between zk and the applications will be lots of local reads -
 who is the primary database ?.  Changes to the config will be rare (server
 rebuilds, etc - ie. planned changes) or caused by server / network / site
 failure.
 
 The interesting thing in my mind is how zookeeper will cope with inter-site
 link failure - how quickly the remote sites will notice, and how quickly
 normality can be resumed when the link reappears.
 
 I need to get this running in the lab and start pulling out wires.
 
 regards,
 Martin
 
 On 8 March 2010 17:39, Patrick Hunt ph...@apache.org wrote:
 
 IMO latency is the primary issue you will face, but also keep in mind
 reliability w/in a colo.
 
 Say you have 3 colos (obv can't be 2), if you only have 3 servers, one in
 each colo, you will be reliable but clients w/in each colo will have to
 connect to a remote colo if the local fails. You will want to prioritize the
 local colo given that reads can be serviced entirely local that way. If you
 have 7 servers (2-2-3) that would be better - if a local server fails you
 have a redundant, if both fail then you go remote.
 
 You want to keep your writes as few as possible and as small as possible.
 Why? Say you have 100ms latency between colos; let's go through a scenario
 for a client in a colo where the local servers are not the leader (ZK
 cluster leader).
 
 read:
 1) client reads a znode from local server
 2) local server (usually < 1ms if in colo comm) responds in 1ms
 
 write:
 1) client writes a znode to local server A
 2) A proposes change to the ZK Leader (L) in remote colo
 3) L gets the proposal in 100ms
 4) L proposes the change to all followers
 5) all followers (not exactly, but hopefully) get the proposal in 100ms
 6) followers ack the change
 7) L gets the acks in 100ms
 8) L commits the change (message to all followers)
 9) A gets the commit in 100ms
 10) A responds to client (< 1ms)
 
 write latency: 100 + 100 + 100 + 100 = 400ms
 
 Obviously keeping these writes small is also critical.
 
 Patrick
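
 The cross-colo write path above sums four inter-colo hops; a quick model of that arithmetic (Python, illustrative only; the 100ms figure is the assumed inter-colo latency from the example, and in-colo hops are treated as negligible):

```python
def write_latency_ms(inter_colo_ms):
    """Cross-colo hops for a write proposed through a non-leader colo:
    follower->leader (propose), leader->followers (proposal broadcast),
    followers->leader (acks), leader->followers (commit broadcast).
    In-colo client hops are ~0 by comparison."""
    hops = ["propose to leader", "broadcast proposal",
            "collect acks", "broadcast commit"]
    return len(hops) * inter_colo_ms

print(write_latency_ms(100))  # 400 (ms), versus ~1 ms for a local read
```

 This is why biasing reads to local servers while keeping writes rare is the recommended pattern in this setup.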
 
 
 Martin Waite wrote:
 
 Hi Ted,
 
 If the links do not work for us for zk, then they are unlikely to work
 with
 any other solution - such as trying to stretch Pacemaker or Red Hat
 Cluster
 with their multicast protocols across the links.
 
 If the links are not good enough, we might have to spend some more money
 to
 fix this.
 
 regards,
 Martin
 
 On 8 March 2010 02:14, Ted Dunning ted.dunn...@gmail.com wrote:
 
  If you can stand the latency for updates then zk should work well for
 you.
 It is unlikely that you will be able to better than zk does and still
 maintain correctness.
 
 Do note that you can probably bias clients to use a local server. That
 should make things more efficient.
 
 Sent from my iPhone
 
 
 On Mar 7, 2010, at 3:00 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
 
  The inter-site links are a nuisance.  We have two data-centres with
 100Mb
 
 links which I hope would be good enough for most uses, but we need a 3rd
 site - and currently that only has 2Mb links to the other sites.  This
 might
 be a problem.
 
 
 



Re: Managing multi-site clusters with Zookeeper

2010-03-07 Thread Mahadev Konar
Hi Martin,
 As Ted rightly mentions, ZooKeeper is usually run within a colo because
of the low-latency requirements of the applications it supports.

It's definitely reasonable to use it in a multi-data-center environment,
but you should realize the implications: the high latency and low throughput
mean that you should make minimal use of such a ZooKeeper ensemble.

Also, there are things like the tickTime, the syncLimit and other setup
parameters in the ZooKeeper config which you will need to tune a little to
get ZooKeeper running without many hiccups in this environment.

Thanks
mahadev


On 3/6/10 10:29 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 What you describe is relatively reasonable, even though Zookeeper is not
 normally distributed across multiple data centers with all members getting
 full votes.  If you account for the limited throughput that this will impose
 on your applications that use ZK, then I think that this can work well.
 Probably, you would have local ZK clusters for higher transaction rate
 applications.
 
 You should also consider very carefully whether having multiple data centers
 increases or decreases your overall reliability.  Unless you design very
 carefully, this will normally substantially degrade reliability.  Making
 sure that it increases reliability is a really big task that involves a lot
 of surprising (it was to me) considerations and considerable hardware and
 time investments.
 
 Good luck!
 
 On Sat, Mar 6, 2010 at 1:50 AM, Martin Waite waite@googlemail.comwrote:
 
 Is this a viable approach, or am I taking Zookeeper out of its application
 domain and just asking for trouble ?
 
 
 



Re: Managing multi-site clusters with Zookeeper

2010-03-07 Thread Mahadev Konar
Martin,
 A 2Mb link might certainly be a problem. We refer to these nodes as
ZooKeeper servers; znode refers to a data element in the ZooKeeper data
tree.

The ZooKeeper ensemble has minimal ambient traffic, basically health checks
between the members of the ensemble. We call the member leading the ensemble
the Leader and the others Followers. The Leader does periodic health checks
to see if the Followers are doing fine. This is of the order of < 1KB/sec.

There is some traffic when the leader election within the ensemble happens.
This might be of the order of 1-2KB/sec.

As you mentioned, reads happen locally, so a good enough link between the
ensemble members is important so that the Followers can stay up to date with
the Leader. But again, looking at your config, it looks like mostly read-only
traffic.

One more thing you should be aware of:
Let's say an ephemeral node was created and its client died; clients
connected to the slow ZooKeeper server (behind the 2Mb/s links) would lag
behind the clients connected to the other servers.

In my opinion you should do some testing, since 2Mb/sec seems a little
dodgy.

Thanks
mahadev
 
On 3/7/10 2:09 PM, Martin Waite waite@googlemail.com wrote:

 Hi Mahadev,
 
 The inter-site links are a nuisance.  We have two data-centres with 100Mb
 links which I hope would be good enough for most uses, but we need a 3rd
 site - and currently that only has 2Mb links to the other sites.  This might
 be a problem.
 
 The ensemble would have a lot of read traffic from applications asking which
 database to connect to for each transaction - which presumably would be
 mostly handled by local zookeeper servers (do we call these nodes as
 opposed to znodes ?).  The write traffic would be mostly changes to
 configuration (a rare event), and changes in the health of database servers
 - also hopefully rare.  I suppose the main concern is how much ambient
 zookeeper system chatter will cross the links.   Are there any measurements
 of how much traffic is used by zookeeper in maintaining the ensemble ?
 
 Another question that occurs is whether I can link sites A,B, and C in a
 ring - so that if any one site drops out, the remaining 2 continue to talk.
 I suppose that if the zookeeper servers are all in direct contact with each
 other, this issue does not exist.
 
 regards,
 Martin
 
 On 7 March 2010 21:43, Mahadev Konar maha...@yahoo-inc.com wrote:
 
 Hi Martin,
  As Ted rightly mentions that ZooKeeper usually is run within a colo
 because
 of the low latency requirements of applications that it supports.
 
  It's definitely reasonable to use it in multi data center environments but
 you should realize the implications of it. The high latency/low throughput
 means that you should make minimal use of such a ZooKeeper ensemble.
 
 Also, there are things like the tickTime, the syncLimit and others (setup
 parameters for ZooKeeper in the config) which you will need to tune a little
 to get ZooKeeper running without many hiccups in this environment.
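 For reference, those knobs live in zoo.cfg; an illustrative sketch only
 (values and hostnames are hypothetical -- tune them for your links):

```
# illustrative values only
# tickTime is the base time unit in ms; initLimit/syncLimit are counted in ticks
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=dc1-zk1:2888:3888
server.2=dc1-zk2:2888:3888
server.3=dc2-zk1:2888:3888
```

 Raising initLimit/syncLimit gives followers on slow links more ticks to
 connect and stay in sync before they are dropped from the quorum.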
 
 Thanks
 mahadev
 
 
 On 3/6/10 10:29 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 What you describe is relatively reasonable, even though Zookeeper is not
 normally distributed across multiple data centers with all members
 getting
 full votes.  If you account for the limited throughput that this will
 impose
 on your applications that use ZK, then I think that this can work well.
 Probably, you would have local ZK clusters for higher transaction rate
 applications.
 
 You should also consider very carefully whether having multiple data
 centers
 increases or decreases your overall reliability.  Unless you design very
 carefully, this will normally substantially degrade reliability.  Making
 sure that it increases reliability is a really big task that involves a
 lot
 of surprising (it was to me) considerations and considerable hardware and
 time investments.
 
 Good luck!
 
 On Sat, Mar 6, 2010 at 1:50 AM, Martin Waite waite@googlemail.com
 wrote:
 
 Is this a viable approach, or am I taking Zookeeper out of its
 application
 domain and just asking for trouble ?
 
 
 
 
 



Re: zookeeper utils

2010-03-02 Thread Mahadev Konar
Hi David,
  There is an implementation for locks and queues in src/recipes. The
documentation residres in src/recipes/{lock/queue}/README.txt.

Thanks
mahadev


On 3/2/10 1:04 PM, David Rosenstrauch dar...@darose.net wrote:

 Was reading through the zookeeper docs on the web - specifically the
 recipes and solutions page (as well as comments elsewhere inviting
 additional such contributions from the community) and was wondering:
 
 Is there a library of higher-level zookeeper utilities that people have
 contributed, beyond the barrier and queue examples provided in the docs?
 
 Thanks,
 
 DR



Re: is there a good pattern for leases ?

2010-02-24 Thread Mahadev Konar
I am not sure if I was clear enough in my last message.

What I suggested was this:

Create a client with a timeout of, let's say, 10 seconds:

ZooKeeper zk = new ZooKeeper(10000); // (for brevity, ignoring the other parameters)

zk.create("/parent/ephemeral", data, acl, CreateMode.EPHEMERAL);

// create another thread that triggers after 120 seconds

On a trigger from this thread, call zk.delete("/parent/ephemeral", -1);
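A minimal sketch of that timer thread using only java.util.concurrent (the
Runnable stands in for the zk.delete call, so nothing here touches a real
ZooKeeper session):

```java
import java.util.concurrent.*;

public class LeaseSketch {
    // Schedule the lease-expiry action (in real code: () -> zk.delete(path, -1))
    // to fire once after leaseMillis, independently of the session timeout.
    static ScheduledFuture<?> scheduleLease(ScheduledExecutorService ses,
                                            Runnable deleteAction,
                                            long leaseMillis) {
        return ses.schedule(deleteAction, leaseMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch expired = new CountDownLatch(1);
        scheduleLease(ses, expired::countDown, 100); // 100 ms lease for the demo
        expired.await();
        System.out.println("lease expired, znode would be deleted here");
        ses.shutdown();
    }
}
```

Cancelling the ScheduledFuture before it fires renews the lease; note the
ephemeral node also disappears if the session closes early, so both paths
need handling.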

That's how lease can be done at the application side.

Obviously your lease also expires on a session close and other events,
which you need to be monitoring.

Thanks
mahadev


On 2/24/10 11:09 AM, Martin Waite waite@googlemail.com wrote:

 Hi Mahadev,
 
 That is interesting.  All I need to do is hold the connection for the
 required time of a session that created an ephemeral node.
 
 Zookeeper is an interesting tool.
 
 Thanks again,
 Martin
 
 On 24 February 2010 17:00, Mahadev Konar maha...@yahoo-inc.com wrote:
 
 Hi Martin,
  There isn't an inherent model for leases in the ZooKeeper library itself.
 To implement leases you will have to implement them on your application
 side, with timeout triggers (lease triggers) leading to a session close at
 the client.
 
 
 Thanks
 mahadev
 
 
 On 2/24/10 3:40 AM, Martin Waite waite@googlemail.com wrote:
 
 Hi,
 
 Is there a good model for implementing leases in Zookeeper ?
 
 What I want to achieve is for a client to create a lock, and for that
 lock
 to disappear two minutes later - regardless of whether the client is
 still
 connected to zk.   Like ephemeral nodes - but with a time delay.
 
 regards,
 Martin
 
 



Re: how to lock one-of-many ?

2010-02-24 Thread Mahadev Konar
Hi Martin,
 Currently you cannot find out which server the client is connected to.
This was fixed in this jira

http://issues.apache.org/jira/browse/ZOOKEEPER-544

But again this does not tell you if you are connected to the primary or the
other followers. So you will anyway have to do some manual testing with
specifying the client host:port address as just the primary or just the
follower (for the follower test case).

Leaking information like (if the server is primary or not) can cause
applications to use this information in a wrong way. So we never exposed
this information! :)

Thanks
mahadev




On 2/24/10 11:25 AM, Martin Waite waite@googlemail.com wrote:

 Hi,
 
 I take the point that the watch is useful for stopping clients unnecessarily
 pestering the zk nodes.
 
 I think that this is something I will have to experiment with and see how it
 goes.  I only need to place about 10k locks per minute, so I am hoping that
 whatever approach I take is well within the headroom of Zookeeper on some
 reasonable boxes.
 
 Is it possible for the client to know whether it has connected to the
 current primary or not ?   During my testing I would like to make sure that
 the approach works both when the client is attached to the primary and when
 attached to a lagged non-primary node.
 
 regards,
 Martin
 
 On 24 February 2010 18:42, Ted Dunning ted.dunn...@gmail.com wrote:
 
 Random back-off like this is unlikely to succeed (seems to me).  Better to
 use the watch on the locks directory to make the wait as long as possible
 AND as short as possible.
 
 On Wed, Feb 24, 2010 at 8:53 AM, Patrick Hunt ph...@apache.org wrote:
 
 Anyone interested in locking an explicit resource attempts to create an
 ephemeral node in /locks with the same ### as the resource they want
 access
 to. If interested in just getting any resource then you would
 getchildren(/resources) and getchildren(/locks) and attempt to lock
 anything
 not in the intersection (avail). This could be done efficiently since
 resources won't change much, just cache the results of getchildren and
 set a
 watch at the same time. To lock a resource randomize avail and attempt
 to
 lock each in turn. If all avail fail to acq the lock, then have some
 random
 holdoff time, then re-getchildren(locks) and start over.
 
 
 
 
 --
 Ted Dunning, CTO
 DeepDyve
 



Re: how to lock one-of-many ?

2010-02-23 Thread Mahadev Konar
Hi Martin,
  How about this- 

  you have resources in a directory (say /locks)

  each process which needs a lock lists all the children of this directory
and then creates an ephemeral node called /locks/resource1/lock, depending on
which resource it wants to lock.

This ephemeral node will be deleted by the process as soon as it's done using
the resource. A process should only use resource_{i} if it has been able to
create /locks/resource_{i}/lock.
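The selection step can be sketched with plain collections (illustrative only;
in real code the two lists would come from getChildren on the resources and
locks paths, and each candidate would then be tried with an ephemeral create):

```java
import java.util.*;

public class OneOfManyLock {
    // Resources not currently locked, in random order; each candidate would
    // then be tried with an ephemeral create of /locks/<id>/lock until one wins.
    static List<String> candidates(Collection<String> resources,
                                   Collection<String> locked,
                                   Random rnd) {
        List<String> avail = new ArrayList<>(resources);
        avail.removeAll(new HashSet<>(locked));
        Collections.shuffle(avail, rnd);
        return avail;
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList("r1", "r2", "r3", "r4");
        List<String> locked = Arrays.asList("r2", "r4");
        System.out.println(candidates(resources, locked, new Random()));
    }
}
```

Randomizing the candidate order spreads contention across resources instead of
having every client hammer the same first id.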

Would that work?

Thanks
mahadev

On 2/23/10 4:05 AM, Martin Waite waite@googlemail.com wrote:

 Hi,
 
 I have a set of resources each of which has a unique identifier.  Each
 resource element must be locked before it is used, and unlocked afterwards.
 
 The logic of the application is something like:
 
 lock any one element;
 if (none locked) then
exit with error;
 else
get resource-id from lock
use resource
unlock resource
 end
 
 Zookeeper looks like a good candidate for managing these locks, being fast
 and resilient, and it seems quite simple to recover from client failure.
 
 However, I cannot think of a good way to implement this sort of one-of-many
 locking.
 
 I could create a directory called available and another called locked.
 Available would have one entry for each resource id ( or one entry
 containing a list of the resource-ids).  For locking, I could loop through
 the available ids, attempting to create a lock for that in the locked
 directory.  However this seems a bit clumsy and slow.  Also, the locks are
 held for a relatively short time (1 second on average), and by the time I
 have blundered through all the possible locks, ids that were locked at the
 start might be available by the time I finished.
 
 Can anyone think of a more elegant and efficient way of doing this ?
 
 regards,
 Martin



Re: Bit of help debugging a TIMED OUT session please

2010-02-22 Thread Mahadev Konar
Hi Stack,
 The other interesting part is with the session:
0x26ed968d880001

Looks like it gets disconnected from one of the servers (TIMEOUT). Do you
see any of these messages: "Attempting connection to server" in the logs
before you see all the consecutive

org.apache.zookeeper.ClientCnxn: Exception closing session
0x26ed968d880001 to sun.nio.ch.selectionkeyi...@788ab708
java.io.IOException: Read error rc = -1
java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
at 
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)

and


From the cient 0x26ed968d880001?

Thanks
mahadev


On 2/22/10 11:42 AM, Stack st...@duboce.net wrote:

 The thing that seems odd to me is that the connectivity complaints are
 out of the zk client, right?, why is it failing getting to member 14
 and why not move to another ensemble member if issue w/ 14?, and if
 there were a general connectivity issue, I'd think that the running
 hbase cluster would be complaining at about the same time (its talking
 to datanodes and masters at this time).
 
 (Thanks for the input lads)
 
 St.Ack
 
 
 On Mon, Feb 22, 2010 at 11:26 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
 I also looked at the logs. Ted might have a point. It does look like the
 zookeeper servers are doing fine (though as Ted mentions the skew is a
 little concerning, though that might be due to very few packets served by
 the first server). Other than that the latencies of 300 ms at max should not
 cause any timeouts.
 Also, the number of packets received is pretty low - meaning that it wasn't
 serving huge traffic. Is there anyway we can check if the network connection
 from the client to the server is not flaky?
 
 Thanks
 mahadev
 
 
 On 2/22/10 10:40 AM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 Not sure this helps at all, but these times are remarkably asymmetrical.  I
 would expect members of a ZK  cluster to have very comparable times.
 
 Additionally, 345 ms is nowhere near large enough to cause a session to
 expire.  My take is that ZK doesn't think it caused the timeout.
 
 On Mon, Feb 22, 2010 at 10:18 AM, Stack st...@duboce.net wrote:
 
        Latency min/avg/max: 2/125/345
 ...
        Latency min/avg/max: 0/7/81
 ...
        Latency min/avg/max: 1/1/1
 
 Thanks for any pointers on how to debug.
 
 
 



Re: How do I get added child znode?

2010-02-16 Thread Mahadev Konar
Hi Kim,
  The ZooKeeper API does not provide a call to get the znode that was added
or deleted. You will have to compare the last set of children with the new set
of children to see which ones were added or deleted.
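That comparison is just two set differences over the cached and fresh child
lists; a plain-Java sketch with no ZooKeeper dependency:

```java
import java.util.*;

public class ChildDiff {
    static Set<String> added(Collection<String> before, Collection<String> after) {
        Set<String> a = new HashSet<>(after);
        a.removeAll(before);            // in 'after' but not in 'before'
        return a;
    }

    static Set<String> removed(Collection<String> before, Collection<String> after) {
        Set<String> r = new HashSet<>(before);
        r.removeAll(after);             // in 'before' but not in 'after'
        return r;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("bar1", "bar2");
        List<String> after = Arrays.asList("bar2", "bar3");
        System.out.println(added(before, after));    // [bar3]
        System.out.println(removed(before, after));  // [bar1]
    }
}
```

In the watcher, 'before' is the result of the previous getChildren call and
'after' is the result of the getChildren call that re-registers the watch.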

Thanks
mahadev


On 2/16/10 5:47 AM, neptune opennept...@gmail.com wrote:

 Hi all, I'm kimhj.
 
 I have a question. I registered a Watcher on a parent znode(/foo).
 I create child znode(/foo/bar1) using a zookeeper console.
 The test program received a "children changed" event, but there is no API for
 getting the added znode.
 ZooKeeper.getChildren() method returns all children in a parent node.
 
 public class ZkTest implements Watcher {
   ZooKeeper zk;

   public void test() throws Exception {
     zk = new ZooKeeper("127.0.0.1:2181", 10 * 1000, this);
     zk.create("/foo", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
         CreateMode.PERSISTENT);
     zk.getChildren("/foo", this);
   }

   public void process(WatchedEvent event) {
     if (event.getType() == Event.EventType.NodeChildrenChanged) {
       try {
         List<String> children = zk.getChildren(event.getPath(), this);
       } catch (Exception e) {
         // handle connection loss / session expiry
       }
     }
   }
 }
 
 Thanks.



Re: ZooKeeper packages for Ubuntu

2010-02-16 Thread Mahadev Konar
Great to hear this

mahadev


On 2/16/10 11:55 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

 Hello everyone,
 
 Thanks to Matthias Klose and Thierry Carrez, we've got ZooKeeper
 packaged for Ubuntu:
 
 https://launchpad.net/~ttx/+archive/ppa
 
 This is a Personal Package Archive at the moment, but these packages
 may end up being promoted depending on how relevant they are.
 
 Please let me know if these work or do not work for you.



Re: Ordering guarantees for async callbacks vs watchers

2010-02-10 Thread Mahadev Konar
Hi martin,
 A call like getChildren(final String path, Watcher watcher,
ChildrenCallback cb, Object ctx)

means: set a watch on this node for any further changes on the server. A
client will see the response to the getChildren call before the above watch
is fired.
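That "one at a time, in order" dispatch can be mimicked with a single-threaded
executor (an illustrative sketch of the client event-thread model, not
ZooKeeper's actual code):

```java
import java.util.*;
import java.util.concurrent.*;

public class EventOrderSketch {
    // One dispatch thread => completions and watch events run serially,
    // in submission order, exactly one at a time.
    static List<String> runInOrder(List<String> events) throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        for (String e : events) {
            eventThread.submit(() -> order.add(e));
        }
        eventThread.shutdown();
        eventThread.awaitTermination(5, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInOrder(
            Arrays.asList("getChildren callback", "watch fired")));
    }
}
```

Because a single thread drains the queue, the callback submitted first (the
getChildren response) always completes before the watch notification queued
behind it.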

Hope that helps.

Thanks
mahadev


On 2/10/10 6:59 PM, Martin Traverso mtrave...@gmail.com wrote:

 What are the ordering guarantees for asynchronous callbacks vs watcher
 notifications (Java API) when both are used in the same call? E.g.,
 for getChildren(final String path, Watcher watcher, ChildrenCallback cb,
 Object ctx)
 
 Will the callback always be invoked before the watcher if there is a state
 change on the server at about the same time the call is made?
 
 I *think* that's what's implied by the documentation, but I'm not sure I'm
 reading it right:
 
 All completions for asynchronous calls and watcher callbacks will be made
 in order, one at a time. The caller can do any processing they wish, but no
 other callbacks will be processed during that time. (
 http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperProgrammers.html#Java+
 Binding
 )
 
 Thanks!
 
 Martin



Re: When session expired event fired?

2010-02-08 Thread Mahadev Konar
Hi,
 a ZooKeeper client does not expire a session unless it is able to
connect to one of the servers. In your case, if you kill all the servers, the
client is not able to connect to any of the servers and will keep trying to
connect to the three servers. It cannot expire a session on its own and
needs to hear from the server to know if the session is expired or not.

Does that help? 

Thanks
mahadev


On 2/8/10 2:37 PM, neptune opennept...@gmail.com wrote:

 Hi all.
 I have a question. I started zookeeper(3.2.2) on three servers.
 When session expired event fired in following code?
 I expected that if the client can't connect to a server (disconnected) for the
 session timeout, ZooKeeper fires a session-expired event.
 I killed the three ZooKeeper servers sequentially. The client retries connecting
 to the ZooKeeper servers, but the Expired event never occurred.
 
 class WatcherTest implements Watcher {
   private ZooKeeper zk;

   public static void main(String[] args) throws Exception {
     new WatcherTest().exec();
   }

   private WatcherTest() throws Exception {
     zk = new ZooKeeper("server1:2181,server2:2181,server3:2181", 10 * 1000,
         this);
   }

   private void exec() {
     while (true) {
       // do something
     }
   }

   public void process(WatchedEvent event) {
     if (event.getType() == Event.EventType.None) {
       switch (event.getState()) {
       case SyncConnected:
         System.out.println("ZK SyncConnected");
         break;
       case Disconnected:
         System.out.println("ZK Disconnected");
         break;
       case Expired:
         System.out.println("ZK Session Expired");
         System.exit(0);
         break;
       }
     }
   }
 }



ZOOKEEPER-22 and release 3.3

2010-02-02 Thread Mahadev Konar
Hi all,

 I had been working on zookeeper-22 and found out that it needs quite a few
extensive changes. We will need to do some memory measurements to see if it
has any memory impacts or not.

Since we are targeting the 3.3 release for early March, ZOOKEEPER-22 would be
hard to get into 3.3. I am proposing to move it to a later release (3.4), so
that it can be tested early in the release phase and gets baked in the
release. 


Thanks
mahadev



Re: Q about ZK internal: how commit is being remembered

2010-01-28 Thread Mahadev Konar
Qian,

  ZooKeeper guarantees that if a client sees some transaction response, then
it will persist, but the ones that a client does not see might be discarded
or committed. So in case a quorum does not log the transaction, there might
be a case wherein a ZooKeeper server which does not have the logged
transaction becomes the leader (because the machines with the logged
transaction are down). In that case the transaction is discarded. In a case
when a machine which has the logged transaction becomes the leader, that
transaction will be committed.
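The boundary between "guaranteed to commit" and "may be discarded" is the
strict-majority condition; a sketch of just that check (illustrative only --
Zab's real commit logic also involves ordering and epochs):

```java
public class QuorumCheck {
    // True iff 'acks' servers form a strict majority of an ensemble of
    // 'ensembleSize' servers -- the point at which Zab guarantees commit.
    static boolean quorumLogged(int acks, int ensembleSize) {
        return acks > ensembleSize / 2;
    }

    public static void main(String[] args) {
        System.out.println(quorumLogged(3, 5)); // true: 3 of 5 is a majority
        System.out.println(quorumLogged(1, 5)); // false: may be committed or discarded
    }
}
```

With fewer than a majority of acks, the outcome depends on whether a server
that logged the transaction wins the next leader election.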

Hope that clears your doubt.

mahadev


On 1/28/10 6:02 PM, Qian Ye yeqian@gmail.com wrote:

 Thanks henry and ben, actually I have read the paper henry mentioned in this
 mail, but I'm still not so clear with some of the details. Anyway, maybe
 more study on the source code can help me understanding. Since Ben said
 that, if less than a quorum of servers have accepted a transaction, we can
 commit or discard. Would this feature cause any unexpected problem? Can you
 give some hints about this issue?
 
 
 
 On Fri, Jan 29, 2010 at 1:09 AM, Benjamin Reed br...@yahoo-inc.com wrote:
 
 henry is correct. just to state another way, Zab guarantees that if a
 quorum of servers have accepted a transaction, the transaction will commit.
 this means that if less than a quorum of servers have accepted a
 transaction, we can commit or discard. the only constraint we have in
 choosing is ordering. we have to decide which partially accepted
 transactions are going to be committed and which discarded before we propose
 any new messages so that ordering is preserved.
 
 ben
 
 
 Henry Robinson wrote:
 
 Hi -
 
 Note that a machine that has the highest received zxid will necessarily
 have
 seen the most recent transaction that was logged by a quorum of followers
 (the FIFO property of TCP again ensures that all previous messages will
 have
 been seen). This is the property that ZAB needs to preserve. The idea is
 to
 avoid missing a commit that went to a node that has since failed.
 
 I was therefore slightly imprecise in my previous mail - it's possible for
 only partially-proposed proposals to be committed if the leader that is
 elected next has seen them. Only when another proposal is committed
 instead
 must the original proposal be discarded.
 
 I highly recommend Ben Reed's and Flavio Junqueira's LADIS paper on the
 subject, for those with portal.acm.org access:
 http://portal.acm.org/citation.cfm?id=1529978
 
 Henry
 
 On 27 January 2010 21:52, Qian Ye yeqian@gmail.com wrote:
 
 
 
 Hi Henry:
 
 According to your explanation, *ZAB makes the guarantee that a proposal
 which has been logged by
 a quorum of followers will eventually be committed* , however, the
 source
 code of Zookeeper, the FastLeaderElection.java file, shows that, in the
 election, the candidates only provide their zxid in the votes, the one
 with
 the max zxid would win the election. I mean, it seems that no check has
 been
 made to make sure whether the latest proposal has been logged by a quorum
 of
 servers.
 
 In this situation, the zookeeper would deliver a proposal, which is known
 as
 a failed one by the client. Imagine this scenario, a zookeeper cluster
 with
 5 servers, Leader only receives 1 ack for proposal A, after a timeout,
 the
 client is told that the proposal failed. At this time, all servers
 restart
 due to a power failure. The server have the log of proposal A would be
 the
 leader, however, the client is told the proposal A failed.
 
 Do I misunderstand this?
 
 
 On Wed, Jan 27, 2010 at 10:37 AM, Henry Robinson he...@cloudera.com
 wrote:
 
 
 
 Qing -
 
 That part of the documentation is slightly confusing. The elected leader
 must have the highest zxid that has been written to disk by a quorum of
 followers. ZAB makes the guarantee that a proposal which has been logged
 
 
 by
 
 
 a quorum of followers will eventually be committed. Conversely, any
 proposals that *don't* get logged by a quorum before the leader sending
 them
 dies will not be committed. One of the ZAB papers covers both these
 situations - making sure proposals are committed or skipped at the right
 moments.
 
 So you get the neat property that leader election can be live in exactly
 the
 case where the ZK cluster is live. If a quorum of peers aren't available
 
 
 to
 
 
 elect the leader, the resulting cluster won't be live anyhow, so it's ok
 for
 leader election to fail.
 
 FLP impossibility isn't actually strictly relevant for ZAB, because FLP
 requires that message reordering is possible (see all the stuff in that
 paper about non-deterministically drawing messages from a potentially
 deliverable set). TCP FIFO channels don't reorder, so provide the extra
 signalling that ZAB requires.
 
 cheers,
 Henry
 
 2010/1/26 Qing Yan qing...@gmail.com
 
 
 
 Hi,
 
 I have question about how zookeeper *remembers* a commit operation.
 
 According to
 
 
 
 
 
 

Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup

2010-01-22 Thread Mahadev Konar
Unfortunately no. We are planning to deploy 3.3 as the first version in the
Maven repo.


Thanks
mahadev


On 1/22/10 12:58 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 Is ZK 3.2.2 in a maven repository somewhere?
 
 -- Forwarded message --
 From: Drew Farris drew.far...@gmail.com
 Date: Fri, Jan 22, 2010 at 11:47 AM
 Subject: Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup
 To: mahout-...@lucene.apache.org
 
 
 Neither hbase 0.20.2 nor zookeeper (any version) appear to be in a
 maven repo at this point, so Mahout would have to roll and deploy
 these. What was the process that was followed to build and deploy the
 mahout-packaged hadoop 0.20.1 and hbase artifacts? Is this something I
 could submit a patch to Mahout for, or better left for the committers?
 
 As Ted pointed out, yes the release of zk is 3.2.2
 
 Drew
 
 On Thu, Jan 21, 2010 at 5:12 AM, zhao zhendong zhaozhend...@gmail.com
 wrote:
 Hi Drew,
 
 I propose to
 1) update hbase-0.20.0.jar to hbase-0.20.2.jar because the latter is stable
 and hbase-platform is based on this version,
 
 2) and add zookeeper-3.2.1.jar.
 
 Cheers,
 Zhendong
 
 On Tue, Jan 19, 2010 at 12:36 PM, zhao zhendong zhaozhend...@gmail.com
 wrote:
 
 Hi Drew,
 
 Including a source code in snapshots that will be great.
 
 Currently, the HDFS reader does not work in 0.20.2. Without source code,
 it's not convenient for me to debug the code.
 
 Cheers,
 Zhendong
 
 On Sat, Jan 9, 2010 at 12:25 AM, Drew Farris drew.far...@gmail.com
 wrote:
 
 I wonder if we can get the hadoop people to include source jars with
 their snapshots?
 
 On Fri, Jan 8, 2010 at 11:23 AM, Sean Owen sro...@gmail.com wrote:
 I need a fix after 0.20.1, that's the primary reason. As a bonus, we
 don't have to maintain our own version. The downside is relying on a
 SNAPSHOT, but seems worth it to me.
 
 On Fri, Jan 8, 2010 at 4:02 PM, zhao zhendong zhaozhend...@gmail.com
 wrote:
 Thanks Drew,
 
 +1 for me to maintain a stable hadoop release, such as 0.20.1. The
 reason is
 obvious :)
 
 Cheers,
 Zhendong
 
 
 
 
 
 
 
 --
 -
 
 Zhen-Dong Zhao (Maxim)
 
 
 
 Department of Computer Science
 School of Computing
 National University of Singapore
 
 
 
 
 
 
 
 --
 -
 
 Zhen-Dong Zhao (Maxim)
 
 
 
 Department of Computer Science
 School of Computing
 National University of Singapore
 
 
 
 
 



Re: Server exception when closing session

2010-01-22 Thread Mahadev Konar
Hi Josh,
 This warning is not of any concern. Just a quick question: is there any
reason for you to run the server at the DEBUG level?

Thanks
mahadev


On 1/22/10 5:19 PM, Josh Scheid jsch...@velocetechnologies.com wrote:

 Is it normal for client session close() to cause a server exception?
 Things seem to work, but the WARN is a bit disconcerting.
 
 2010-01-22 17:15:01,573 - WARN
 [NIOServerCxn.Factory:2181:nioserverc...@518] - Exception causing
 close of session 0x126571af282114b due to java.io.IOException: Read
 error
 2010-01-22 17:15:01,573 - DEBUG
 [NIOServerCxn.Factory:2181:nioserverc...@521] - IOException stack
 trace
 java.io.IOException: Read error
 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:396)
 at 
 org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:239)
 2010-01-22 17:15:01,573 - INFO
 [NIOServerCxn.Factory:2181:nioserverc...@857] - closing
 session:0x126571af282114b NIOServerCnxn:
 java.nio.channels.SocketChannel[connected local=/10.66.16.96:2181
 remote=/10.66.24.94:59591]
 2010-01-22 17:15:01,573 - INFO
 [ProcessThread:-1:preprequestproces...@384] - Processed session
 termination request for id: 0x126571af282114b
 2010-01-22 17:15:01,583 - DEBUG
 [SyncThread:0:finalrequestproces...@74] - Processing request::
 sessionid:0x126571af282114b type:closeSession cxid:0x4b5a4d95
 zxid:0x43f3 txntype:-11 n/a
 2010-01-22 17:15:01,583 - DEBUG
 [SyncThread:0:finalrequestproces...@147] - sessionid:0x126571af282114b
 type:closeSession cxid:0x4b5a4d95 zxid:0x43f3 txntype:-11 n/a
 
 zk 3.2.2.  Client is using zkpython.
 
 Nothing is otherwise abnormal.  I can just connect, then close the
 session and this occurs.
 
 -Josh



Re: Server exception when closing session

2010-01-22 Thread Mahadev Konar
Hi Josh,
  The server latency does seem huge. What OS and hardware are you running it
on? What is the usage model of ZooKeeper? How much memory are you allocating to
the server?
The DEBUG level will exacerbate the problem.
A dedicated disk means the following:
ZooKeeper has snapshots and transaction logs. The dataDir is the directory
that stores the transaction logs. It's highly recommended that this directory
be on a separate disk that isn't being used by any other process. The
snapshots can sit on a disk that is being used by the OS and can be shared.

Also, Pat ran some tests for server latencies at:

http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview

You can take a look at that as well and see what the expected performance
should be for your workload.

Thanks
mahadev


On 1/22/10 5:40 PM, Josh Scheid jsch...@velocetechnologies.com wrote:

 On Fri, Jan 22, 2010 at 17:22, Mahadev Konar maha...@yahoo-inc.com wrote:
  This warning is not of any concern.
 
 OK.  I'm used to warnings being things that must be addressed.  I'll
 ignore this one in the future.
 
 Just a quick question, is there any reason for you to runn the server on a
 DEBUG level?
 
 We're having issues with server latency.  Client default timeout of
 1ms gets hit.  I saw a stat output showing a 16s max latency
 today.
 Is DEBUG going to exacerbate that?  Of the recommendations I've seen,
 the one I can't yet follow is a dedicated disk: dataDir is in the root
 partition of the server right now.
 
 -Josh



Re: question about locking in java singleton class

2010-01-14 Thread Mahadev Konar
Hi Jaakko,
 The lock recipe has already been implemented in ZooKeeper under
src/recipes/lock (version 3.*, I think). It has code to deal with
connection loss as well. I would suggest that you use the recipe. You can
file JIRAs in case you see some shortcomings/bugs in the code.


Thanks
mahadev


On 1/14/10 1:32 AM, Jaakko rosvopaalli...@gmail.com wrote:

 Hi,
 
 I'm trying to provide mutex services through a singleton class
 (methods lock and unlock). It basically follows lock recipe, but I'm
 having a problem how to handle connection loss properly if it happens
 during mutex wait:
 
 pseudocode/snippet:
 
 
 public class SingletonMutex implements Watcher
 {
 private Integer mutex;
 private ZooKeeper zk;
 
 public void process(WatchedEvent event)
 {
 synchronized (mutex)
 {
 mutex.notifyAll();
 }
 }
 
 private String lock()
 {
 create ephemeral znode
 find children and do related checks
 
 if (there_is_somebody_with_smaller_number)
 {
 mutex.wait();
 
 if (isConnected() == false)
 throw new Exception(foobar);
 }
 }
 }
 
 Question is: If there is a server disconnect during mutex.wait, the
 thread will wake up without having any means to continue (or delete
 the znode), so it throws an exception. However, if this is only due to
 connection loss (and not session expire), the lock znode it has
 created previously will not be deleted, thus resulting in a deadlock.
 One way would be to use sessionId in lock name and check if we already
 have the lock when entering lock method. However, since this is
 singleton class and used by multiple threads, that approach won't
 work. Using thread ID for this purpose or some form of internal
 bookkeeping is also not very robust. Currently I just do zk.close and
 get another instance on connection loss, which seems to solve the
 problem.
 
 Is there any other way to do this?
 
 Thanks for any comments/suggestions,
 
 -Jaakko



Re: Question regarding Membership Election

2010-01-14 Thread Mahadev Konar
Hi Vijay,
  Unfortunately you won't be able to keep running the observer in the other
DC if the quorum in DC 1 is dead. Most of the folks we have talked to
also want to avoid voting across colos. They usually run two instances of
ZooKeeper in the 2 DCs and copy the state of ZooKeeper (using a bridge) across
colos to keep them in sync. Usually the data requirement across colos is
very small, and they are usually able to do that by copying data across with
their own bridge process.

Hope that helps.

Thanks
mahadev


On 1/14/10 12:12 PM, Vijay vijay2...@gmail.com wrote:

 Hi,
 
 I read about observers in other datacenter,
 
 My question is: I don't want voting across the datacenters (so I will use
 observers), but at the same time, when a DC goes down I don't want to lose the
 cluster. What's the solution for it?

 I have to have 3 nodes in the primary DC to accept 1 node failure. That's fine...
 but what about the other DC? How many nodes, and how will I make it work?
 
 Regards,
 /VJ



Re: Question regarding Membership Election

2010-01-14 Thread Mahadev Konar
Hi Vijay,
 Sadly there isn't any. It would be great to have someone contribute one to
the ZooKeeper code base :).

Thanks
mahadev


On 1/14/10 12:58 PM, Vijay vijay2...@gmail.com wrote:

 Thanks Mahadev that helps,
 
 Are there any hooks (in ZooKeeper) or examples which I can take a look at
 for the bridging process?
 
 Regards,
 /VJ
 
 
 
 
 On Thu, Jan 14, 2010 at 12:38 PM, Mahadev Konar maha...@yahoo-inc.comwrote:
 
 Hi Vijay,
  Unfortunately you wont be able to keep running the observer in the other
 DC if the quorum in the DC 1 is dead. Most of the folks we have talked to
 also want to avoid voiting across colos. They usually run two instances of
 Zookeeper in 2 DC's and copy state of zookeeper (using a bridge) across
 colos to keep them in sync. Usually the data requirement across colos is
 very small and they are usually able to do that by copying data across with
 there own bridge process.
 
 Hope that helps.
 
 Thanks
 mahadev
 
 
 On 1/14/10 12:12 PM, Vijay vijay2...@gmail.com wrote:
 
 Hi,
 
 I read about observers in other datacenter,
 
 My question is i dont want voting across the datacenters (So i will use
 observers), at the same time when a DC goes down i dont want to loose the
 cluster, whats the solution for it?
 
 I have to have 3 nodes in primary DC to accept 1 node failure. Thats
 fine...
 but what about the other DC? how many nodes and how will i make it work?
 
 Regards,
 /VJ
 
 



Re: Namespace partitioning ?

2010-01-14 Thread Mahadev Konar
Hi Kay,
  Namespace partitioning in ZooKeeper has been on the back burner for a
long time. There isn't any JIRA open on it. There have been some discussions
on this but no real work. Flavio/Ben have had this on their minds for a
while, but no real work/proposal is out yet.

May I ask: is this something you are looking for in production?

Thanks
mahadev


On 1/14/10 3:38 PM, Kay Kay kaykay.uni...@gmail.com wrote:

 Digging up some old tickets + search results, I am trying to understand
 the current state of support for namespace partitioning in ZooKeeper.
 Is it already in, or are there any tickets/mailing-list threads to
 understand the current state?
 
 
 



Re: Killing a zookeeper server

2010-01-13 Thread Mahadev Konar
Hi Adam,
  That seems fair to file as an improvement. Running 'stat' did return the
right stats, right? Saying the servers weren't able to elect a leader?

mahadev


On 1/13/10 11:52 AM, Adam Rosien a...@rosien.net wrote:

 On a related note, it was initially confusing to me that the server
 returned 'imok' when it wasn't part of the quorum. I realize the
 internal checks are probably in separate areas of the code, but if
 others feel similarly I could file an improvement in JIRA.
 
 .. Adam
 
 On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey ni...@mailtrust.com wrote:
 So the solution for us was to just nuke zookeeper and restart everywhere.
  We will also be upgrading soon as well.
 
 To answer your question, yes I believe all the servers were running normally
 except for the fact that they were experiencing high CPU usage.  As we began
 to see some CPU alerts I started restarting some of the servers.
 
 It was then that we noticed that they were not actually running according to
 'stat'.
 
 I still have the log from one server with a debug level and the rest with a
 warn level. If you would like to see any of these and analyze them just let
 me know.
 
 Thanks for the help,
 Nick Bailey
 
 On Jan 12, 2010, at 8:20 PM, Patrick Hunt ph...@apache.org wrote:
 
 Nick Bailey wrote:
 
 In my last email I failed to include a log line that may be relevant as
 well:
 2010-01-12 18:33:10,658 [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager)
 DEBUG - Queue size: 0
 2010-01-12 18:33:10,659 [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection)
 INFO  - Notification time out: 6400
 
 Yes, that is significant/interesting. I believe this means that there is
 some problem with the election process (ie the server re-joining the
 ensemble). We have a backoff on these attempts, which matches your
 description below. We have fixed some election issues in recent versions (we
 introduced fault injection testing prior to the 3.2.1 release which found a
 few issues with election). I don't have them off hand - but I've asked
 Flavio to comment directly (he's in diff tz).
 
 Can you provide a bit more background: prior to this issue, this
 particular server was running fine? You restarted it and then started seeing
 the issue? (rather than this being a new server I mean). What I'm getting at
 is that there shouldn't/couldn't be any networking/firewall type issue going
 on right?
 
 Can you provide a full/more log? What I'd suggest is shut down this one
 server, clear the log4j log file, then restart it. Let the problem
 reproduce, then gzip the log4j log file and attach to your response. Ok?
 
 Patrick
 
 We see this line occur frequently and the timeout will gradually
 increase to 6. It appears that all of our servers that seem to be
 acting
 normally are experiencing the cpu issue I mentioned earlier
 'https://issues.apache.org/jira/browse/ZOOKEEPER-427'. Perhaps that is
 causing the timeout in responding?
 Also to answer your other questions Patrick, we aren't storing a large
 amount of data really and network latency appears fine.
 Thanks for the help,
 Nick
 -Original Message-
 From: Nick Bailey nicholas.bai...@rackspace.com
 Sent: Tuesday, January 12, 2010 6:03pm
 To: zookeeper-user@hadoop.apache.org
 Subject: Re: Killing a zookeeper server
 12 was just to keep uniformity on our servers. Our clients are connecting
 from the same 12 servers.  Easily modifiable and perhaps we should look
 into
 changing that.
 The logs just seem to indicate that the servers that claim to have no
 server running are continually attempting to elect a leader.  A sample is
 provided below.  The initial exception is something we see regularly in our
 logs and the debug and info lines following are simply repeating throughout
 the log.
 2010-01-12 17:55:02,269 [NIOServerCxn.Factory:2181] (NIOServerCnxn) WARN
  - Exception causing close of session 0x0 due to java.io.IOException: Read
 error
 2010-01-12 17:55:02,269 [NIOServerCxn.Factory:2181] (NIOServerCnxn) DEBUG
 - IOException stack trace
 java.io.IOException: Read error
       at
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:295)
       at
  org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:162)
 2010-01-12 17:55:02,269 [NIOServerCxn.Factory:2181] (NIOServerCnxn) INFO
  - closing session:0x0 NIOServerCnxn:
 java.nio.channels.SocketChannel[connected local=/172.20.36.9:2181
 remote=/172.20.36.9:50367]
 2010-01-12 17:55:02,270 [NIOServerCxn.Factory:2181] (NIOServerCnxn) DEBUG
 - ignoring exception during input shutdown
 java.net.SocketException: Transport endpoint is not connected
       at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
       at
 sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
       at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
       at
 org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:767)
       at
 org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:421)
 

Re: Fetching sequential children

2009-12-23 Thread Mahadev Konar
Hi Ohad,
  there isn't a way to get a selected set of children from the servers, so
you will have to get all of them and filter out the unwanted ones. Also,
what Steve suggested in the other email might be useful for you.

Thanks
mahadev


On 12/23/09 12:29 AM, Ohad Ben Porat o...@outbrain.com wrote:

 Hey,
 
 Under the main node of my application I have the following sequential
 children: mytest1, mytest2, mytest3, sometest1, sometest2, sometest3.
 Now, I want to get all children of my main node that start with mytest,
 something like getChildren(/main/mytest*, false). Is there a command for
 that? Or must I bring all children and filter out the unwanted ones?
 
 Ohad
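The client-side filtering suggested in the reply can be a simple prefix match over the result of getChildren(); a sketch (the /main path and child names come from the question, while the helper and the `zk` client variable are illustrative assumptions):

```python
def filter_by_prefix(children, prefix):
    """Keep only the child names starting with the given prefix.
    getChildren() returns bare node names (e.g. 'mytest1'),
    so a plain startswith() test is enough."""
    return sorted(c for c in children if c.startswith(prefix))

# With a real client it would be used roughly as:
#   children = zk.get_children("/main")      # any ZooKeeper binding
#   mine = filter_by_prefix(children, "mytest")

if __name__ == "__main__":
    names = ["mytest1", "mytest2", "mytest3", "sometest1", "sometest2"]
    print(filter_by_prefix(names, "mytest"))  # ['mytest1', 'mytest2', 'mytest3']
```

For sequential nodes this keeps the sequence ordering intact, since the fixed-width sequence suffixes sort lexicographically.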



Re: Starting Zookeeper on Amazon EC2

2009-12-09 Thread Mahadev Konar
Hi,
 Can you try this?

bin/zkCli.sh 127.0.0.1:2181

The -server option was added in a later version, as far as I remember.

Thanks
mahadev 



On 12/9/09 9:05 AM, Something Something mailinglist...@gmail.com wrote:

 I am trying to start ZooKeeper on an EC2 instance.  Here's what I did:
 
 1)  Downloaded and unpacked ZooKeeper 3.1.1 on the EC2 instance.
 2)  cp /conf/zoo_sample.cfg /conf/zoo.cfg
 3)  Changed the dataDir path to point to my EBS volume.
 4)  In one command window, ran /bin/zkServer.sh start
 (The last message I see is... Snapshotting: 0)
 
 5)  Opened another command window, and ran jps
 (This shows a new process called, QuorumPeerMain.  That's the only one I
 see.)
 
 6)  As per documentation, tried
 
 bin/zkCli.sh -server 127.0.0.1:2181
 
 (This gives me IOException: USAGE)
 
 7) So I ran:
 
 bin/zkCli.sh -server 127.0.0.1:2181 ls
 
 Got UnknownHostException: -server
 
 8)  So I tried various ways of specifying IP address in EC2, such as:
 
 10.xx.xx.xx
 ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
 domU-12-31-xx-xx-xx-xx.compute-1.internal
 domU-12-31-xx-xx-xx-xx
 
 None of them worked.  Keep getting UnknownHostException.
 
 What am I doing wrong?  Please help.  Thanks.



Re: zkfuse

2009-11-24 Thread Mahadev Konar
Hi Maarten,

  zkfuse does not have any support for ACLs. We haven't had much time to
focus on zkfuse. Create/read/write/delete/ls are all supported. It was built
mostly for infrequent updates, and as more of a browsing interface on the
filesystem. I don't think zkfuse is being used in production anywhere. Would
you mind elaborating on your use case?

Thanks
mahadev


On 11/24/09 11:14 AM, Maarten Koopmans maar...@vrijheid.net wrote:

 Hi,
 
 I just started using zkfuse, and this may very well suit my needs for
 now. Thumbs up to the ZooKeeper team!
 
 What operations are supported (i.e., what is the best use of zkfuse)? I
 can see how files and dirs, their creation and listing, map quite nicely.
 ACLs?
 
 I have noticed two things on a fresh Ubuntu 9.10 (posting for future
 archive reference):
 
 - I *have* to run in debug mode (-d)
 - you have to add libboost or it won't compile
 
 Regards,
 
 Maarten



Bugfix release 3.2.2

2009-10-30 Thread Mahadev Konar
Hi all,
  We are planning to make a bugfix release, 3.2.2, which will include a
critical bugfix in the C client code. The jira is ZOOKEEPER-562,
http://issues.apache.org/jira/browse/ZOOKEEPER-562.

 If you would like a fix to be considered for this bugfix release, please
feel free to post on the zookeeper-dev list.


Thanks
Mahadev


