Re: txzookeeper - a twisted python client for zookeeper
Nice. Any chance of putting it back in zk? Would be useful. Thanks mahadev On 11/18/10 1:17 PM, Kapil Thangavelu kapil.f...@gmail.com wrote: At Canonical we've been using ZooKeeper heavily in the development of a new project (Ensemble), as noted by Gustavo. I just wanted to give a quick overview of the client library we're using for it. It's called txzookeeper; it has 100% test coverage, and it implements various queue, lock, and other utilities in addition to wrapping the standard ZK interface. It's based on the Twisted async networking framework for Python, and it obviates the need to use threads within the application, as all watches and result callbacks are invoked in the main app thread. This makes structuring the code significantly simpler imo than having to deal with threads in the application, but of course tastes may vary ;-). Source code is here: http://launchpad.net/txzookeeper comments and feedback welcome. cheers, Kapil
Re: JUnit tests do not produce logs if the JVM crashes
Hi Andras, JUnit will always buffer the logs unless you print them out to the console. To do that, try running: ant test -Dtest.output=yes This will print the logs to the console as they are logged. Thanks mahadev On 11/4/10 3:33 AM, András Kövi allp...@gmail.com wrote: Hi all, I'm new to ZooKeeper and ran into an issue while trying to run the tests with ant. It seems like the log output is buffered until the complete test suite finishes, and it is flushed into its specific file only after that. I had to make some changes to the code (no JNI or similar) that resulted in JVM crashes. Since the logs are lost in this case, it is a little hard to debug the issue. Do you have any idea how I could disable the buffering? Thanks, Andras
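As a side note, the same effect can be had by pointing log4j at the console instead of the buffered per-test file. A hypothetical log4j.properties fragment; the appender name and pattern are illustrative, not the file shipped in the ZooKeeper source tree:

```properties
# Send all test logging straight to stdout so nothing is lost if the JVM
# crashes mid-suite. Appender name and conversion pattern are illustrative.
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
```

Console output is unbuffered per log event, so the last messages before a crash survive in the captured stdout.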
FW: [Hadoop Wiki] Update of ZooKeeper/ZKClientBindings by yfinkelstein
Nice to see this! Thanks mahadev -- Forwarded Message From: Apache Wiki wikidi...@apache.org Reply-To: common-...@hadoop.apache.org Date: Tue, 2 Nov 2010 14:39:24 -0700 To: Apache Wiki wikidi...@apache.org Subject: [Hadoop Wiki] Update of ZooKeeper/ZKClientBindings by yfinkelstein Dear Wiki user, You have subscribed to a wiki page or wiki category on Hadoop Wiki for change notification. The ZooKeeper/ZKClientBindings page has been changed by yfinkelstein. http://wiki.apache.org/hadoop/ZooKeeper/ZKClientBindings?action=diff&rev1=5&rev2=6 -- ||Binding||Author||URL|| ||Scala||Steve Jenson, John Corwin||http://github.com/twitter/scala-zookeeper-client|| ||C#||Eric Hauser||http://github.com/ewhauser/zookeeper|| - || || || || + ||Node.js||Yuri Finkelstein||http://github.com/yfinkelstein/node-zookeeper|| -- End of Forwarded Message
Re: Problem with Zookeeper cluster configuration
I think Jared pointed this out: your clientPort and quorum port are the same: clientPort=5181 server.1=3.7.192.142:5181:5888 The above two ports should be different. Thanks mahadev On 10/27/10 10:19 AM, Ted Dunning ted.dunn...@gmail.com wrote: Sorry, didn't see this last bit. Hmph. A real ZK person will have to answer this. On Wed, Oct 27, 2010 at 6:21 AM, siddhartha banik siddhartha.ba...@gmail.com wrote: I have tried with the netstat command also. No other process is using port 5181 other than the ZooKeeper process. The other thing I have tried is using separate ports for server 1 and server 2. The surprise is that after starting server 2, server 1 also starts to use the same port as server 2 is using as its client port. Does that matter, as server 1 and server 2 are running on different boxes? Any help is appreciated. Thanks Siddhartha
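To make the fix concrete, a sketch of a working layout (the election port 5889 is hypothetical; the point is that clientPort must not collide with either of the two ports in a server.N entry, and additional servers follow the same pattern):

```properties
clientPort=5181

# server.N=host:quorumPort:electionPort - both must differ from clientPort
server.1=3.7.192.142:5888:5889
```

The first port after the host is used for follower-to-leader traffic, the second for leader election; neither may be the port clients connect to.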
Re: Unusual exception
Hi Avinash, Not sure if you got a response for your email. The exception that you mention mostly means that the client already closed the socket or shutdown. Looks like a client is trying to connect but disconnects before the server can respond. Do you have any such clients? Is this causing any issues with your zookeeper set up? Thanks mahadev On 10/13/10 2:49 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: I started seeing a bunch of these exceptions. What do these mean? 2010-10-13 14:01:33,426 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2010-10-13 14:01:33,426 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for client /10.138.34.195:55738 (no session established for client) 2010-10-13 14:01:33,426 - DEBUG [CommitProcessor:1:finalrequestproces...@78] - Processing request:: sessionid:0x12b9d1f8b907a44 type:closeSession cxid:0x0 zxid:0x600193996 txntype:-11 reqpath:n/a 2010-10-13 14:01:33,427 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x12b9d1f8b907a5d, likely client has closed socket 2010-10-13 14:01:33,427 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for client /10.138.34.195:55979 which had sessionid 0x12b9d1f8b907a5d 2010-10-13 14:01:33,427 - DEBUG [QuorumPeer:/0.0.0.0:5001 :commitproces...@159] - Committing request:: sessionid:0x52b90ab45bd51af type:createSession cxid:0x0 zxid:0x600193cf9 txntype:-10 reqpath:n/a 2010-10-13 14:01:33,427 - DEBUG [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@1302] - ignoring exception during output shutdown java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1298) at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) 2010-10-13 14:01:33,428 - DEBUG [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@1310] - ignoring exception during input shutdown java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306) at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) 2010-10-13 14:01:33,428 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2010-10-13 14:01:33,428 - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:5001:nioserverc...@1286] - Closed socket connection for client /10.138.34.195:55731 (no session established for client)
Re: Zookeeper on 60+Gb mem
Hi Maarten, I definitely know of a group that uses around a 3GB memory heap for ZooKeeper, but I have never heard of someone with such huge requirements. I would say it would definitely be a learning experience with such high memory, one which I think would be very useful for others in the community as well. Thanks mahadev On 10/5/10 11:03 AM, Maarten Koopmans maar...@vrijheid.net wrote: Hi, I just wondered: has anybody ever run ZooKeeper to the max on a 68GB quadruple extra large high memory EC2 instance? With, say, 60GB allocated or so? Because EC2 with EBS is a nice way to grow your ZooKeeper cluster (data on the EBS volumes, upgrade as your memory utilization grows) - I just wonder what the limits are there, or if I am going where angels fear to tread... --Maarten
Re: possible bug in zookeeper ?
Hi Yatir, Any update on this? Are you still struggling with this problem? Thanks mahadev On 9/15/10 12:56 AM, Yatir Ben Shlomo yat...@outbrain.com wrote: Thanks to all who replied, I appreciate your efforts: 1. There is no connection problem from the client machine: (ob1078)(tom...@cass3:~)$ echo ruok | nc zook1 2181 imok(ob1078)(tom...@cass3:~)$ echo ruok | nc zook2 2181 imok(ob1078)(tom...@cass3:~)$ echo ruok | nc zook3 2181 imok(ob1078)(tom...@cass3:~)$ 2. Unfortunately I have already tried to switch to the new jar, but it does not seem to be backward compatible. It seems that the QuorumPeerConfig class does not have the field protected int clientPort; it was replaced by InetSocketAddress clientPortAddress in the new jar, so I am getting a java.lang.NoSuchFieldError exception... 3. I looked at the ClientCnxn.java code. It seems that the logic for iterating over the available servers (nextAddrToTry++) is used only inside the startConnect() function, but not in the finishConnect() function, nor anywhere else. Possibly something along these lines is happening: some exception inside the finishConnect() function is causing the cleanup() function to run, which in turn causes another exception. Nowhere in this code path is nextAddrToTry++ applied. Does this make sense to someone? thanks -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Tuesday, September 14, 2010 6:20 PM To: zookeeper-user@hadoop.apache.org Subject: Re: possible bug in zookeeper ? That is unusual. I don't recall anyone reporting a similar issue, and looking at the code I don't see any issues off hand. Can you try the following? 1) on that particular zk client machine, resolve the hosts zook1/zook2/zook3; what IP addresses do they resolve to?
(try dig) 2) try running the client using the 3.3.1 jar file (just replace the jar on the client); it includes more log4j information, so turn on DEBUG or TRACE logging. Patrick On Tue, Sep 14, 2010 at 8:44 AM, Yatir Ben Shlomo yat...@outbrain.com wrote: zook1:2181,zook2:2181,zook3:2181 -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Tuesday, September 14, 2010 4:11 PM To: zookeeper-user@hadoop.apache.org Subject: Re: possible bug in zookeeper ? What was the list of servers that was given originally to open the connection to ZK? On Tue, Sep 14, 2010 at 6:15 AM, Yatir Ben Shlomo yat...@outbrain.com wrote: Hi, I am using SolrCloud, which uses an ensemble of 3 zookeeper instances. I am performing survivability tests: taking one of the zookeeper instances down, I would expect the client to use a different zookeeper server instance. But as you can see in the logs attached below, depending on which instance I choose to take down (in my case, the last one in the list of zookeeper servers), the client constantly insists on the same zookeeper server (Attempting connection to server zook3/192.168.252.78:2181) and does not switch to a different one. The problem seems to arise from ClientCnxn.java. Anyone have an idea on this?
SolrCloud is currently using zookeeper-3.2.2.jar. Is this a known bug that was fixed in later versions (3.3.1)? Thanks in advance, Yatir Logs: Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java) :999) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970 ) Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java) :1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970 ) Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info INFO: Attempting connection to server zook3/192.168.252.78:2181 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn WARNING: Exception closing session 0x32b105244a20001 to sun.nio.ch.selectionkeyi...@3ca58cbf java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933 ) Sep 14, 2010 9:02:22 AM
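As a side note on the suspected bug: the behavior Yatir describes (the client retrying the same dead server) is what you would get if the index into the server list only advances in startConnect() and not on the cleanup path. A tiny self-contained sketch of the intended behavior; the class and method names are mine, this is not the actual ClientCnxn code:

```java
import java.util.List;

// Illustrative sketch, not ZooKeeper source: the server index should
// advance before *every* connection attempt, including retries entered
// via the finishConnect()/cleanup() failure path, so that a dead server
// is eventually skipped in favor of the next one in the list.
public class ServerRoundRobin {
    private final List<String> servers;
    private int next = 0;

    public ServerRoundRobin(List<String> servers) {
        this.servers = servers;
    }

    // Returns the server to try next and advances the index, so a failed
    // attempt automatically moves on to the following server.
    public String nextServer() {
        String s = servers.get(next % servers.size());
        next++;
        return s;
    }
}
```

With zook1/zook2/zook3 in the list and zook3 down, repeated calls cycle through zook1 and zook2 instead of pinning to zook3.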
Re: Expiring session... timeout of 600000ms exceeded
I am not sure if anyone responded to this or not. Are the clients getting SessionExpired or ConnectionLoss? In any case, the ZooKeeper client has its own thread to update the server with its active connection status. Did you take a look at the GC activity on your client? Thanks mahadev On 9/21/10 8:24 AM, Tim Robertson timrobertson...@gmail.com wrote: Hi all, I am seeing a lot of my clients being kicked out after the 10 minute negotiated timeout is exceeded. My clients are each a JVM (around 100 running on a machine) which are doing web crawling of specific endpoints and handling the response XML - so they do wait around for 3-4 minutes on HTTP timeouts, but certainly not 10 mins. I am just prototyping right now on a 2x quad core mac pro with 12GB memory, and the 100 child processes only get -Xmx64m, and I don't see my machine exhausted. Do my clients need to do anything in order to initiate keep-alive heartbeats, or should this be automatic (I thought the tickTime would dictate this)? # my conf is: tickTime=2000 dataDir=/Volumes/Data/zookeeper clientPort=2181 maxClientCnxns=1 minSessionTimeout=4000 maxSessionTimeout=80 Thanks for any pointers to this newbie, Tim
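For readers puzzled by the 600000 ms figure: the session timeout a client asks for is clamped by the server between minSessionTimeout and maxSessionTimeout, which default to tickTime*2 and tickTime*20 when not set explicitly. A self-contained sketch of that clamping rule; the method is mine, not ZooKeeper source, and the 3.3-era defaults are assumed:

```java
// Illustrative clamp of a requested session timeout, mirroring the
// server-side negotiation: the result is forced into the
// [minSessionTimeout, maxSessionTimeout] range. The bounds are assumed
// to default to tickTime*2 and tickTime*20 when not set explicitly.
public class SessionTimeoutClamp {
    public static int negotiate(int requestedMs, int tickTimeMs,
                                Integer minOverrideMs, Integer maxOverrideMs) {
        int min = (minOverrideMs != null) ? minOverrideMs : tickTimeMs * 2;
        int max = (maxOverrideMs != null) ? maxOverrideMs : tickTimeMs * 20;
        if (requestedMs < min) return min;
        if (requestedMs > max) return max;
        return requestedMs;
    }
}
```

With tickTime=2000 and default bounds, a 600000 ms request would be clamped down to 40000 ms; and note that the maxSessionTimeout=80 in the quoted config would clamp every session to 80 ms, which is probably not what was intended.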
Re: SessionMovedException
Hi Jun, You can read more about the SessionMovedException at http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperProgrammers.html Thanks mahadev On 10/1/10 9:58 AM, Jun Rao jun...@gmail.com wrote: Hi, Could someone explain what SessionMovedException means? Should it be treated as SessionExpiredException (therefore have to recreate ephemeral nodes, etc)? I have seen this exception when the network is being upgraded. Thanks, Jun
Re: zkfuse
Hi Jun, I haven't seen people using zkfuse recently. What kind of issues are you facing? Thanks mahadev On 9/19/10 6:46 PM, 俊贤 junx...@taobao.com wrote: Hi guys, Has anyone succeeded in installing zkfuse?
Re: possible bug in zookeeper ?
Hi Yatir, Can you confirm that zook1 and zook2 can be looked up with nslookup from the client machine? We haven't seen a bug like this; it would be great to nail this down. Thanks mahadev On 9/14/10 8:44 AM, Yatir Ben Shlomo yat...@outbrain.com wrote: zook1:2181,zook2:2181,zook3:2181 -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Tuesday, September 14, 2010 4:11 PM To: zookeeper-user@hadoop.apache.org Subject: Re: possible bug in zookeeper ? What was the list of servers that was given originally to open the connection to ZK? On Tue, Sep 14, 2010 at 6:15 AM, Yatir Ben Shlomo yat...@outbrain.com wrote: Hi, I am using SolrCloud, which uses an ensemble of 3 zookeeper instances. I am performing survivability tests: taking one of the zookeeper instances down, I would expect the client to use a different zookeeper server instance. But as you can see in the logs attached below, depending on which instance I choose to take down (in my case, the last one in the list of zookeeper servers), the client constantly insists on the same zookeeper server (Attempting connection to server zook3/192.168.252.78:2181) and does not switch to a different one. The problem seems to arise from ClientCnxn.java. Anyone have an idea on this?
SolrCloud is currently using zookeeper-3.2.2.jar. Is this a known bug that was fixed in later versions (3.3.1)? Thanks in advance, Yatir Logs: Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java) :999) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970 ) Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java) :1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970 ) Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info INFO: Attempting connection to server zook3/192.168.252.78:2181 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn WARNING: Exception closing session 0x32b105244a20001 to sun.nio.ch.selectionkeyi...@3ca58cbf java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933 ) Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java) :999) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970 ) Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java) :1004) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970 ) Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info INFO: Attempting connection to server zook3/192.168.252.78:2181 Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn WARNING: Exception closing session 0x32b105244a2 to sun.nio.ch.selectionkeyi...@3960f81b java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933 ) Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn WARNING: Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at
Re: Receiving create events for self with synchronous create
Hi Todd, Sorry for my late response. I had marked this email to respond to but couldn't find the time :). Did you figure this out? It looks like, as soon as you set a watch on /follower, some other node instantly creates another child of /follower. Could that be the case? Thanks mahadev On 8/26/10 8:09 PM, Todd Nine t...@spidertracks.co.nz wrote: Sure thing. The FollowerWatcher class is instantiated by the IClusterManager implementation. It then performs FollowerWatcher.init(), which is intended to do the following. 1. Create our follower node so that other nodes know we exist, at path /com/spidertracks/aviator/cluster/follower/10.0.1.1, where the last node is an ephemeral node with the internal IP address of the node. These are lines 67 through 72. 2. Signal to the clusterManager that the cluster has changed (line 79). Ultimately the clusterManager will perform a barrier for partitioning data (a separate watcher). 3. Register a watcher to receive all future events on the follower path /com/spidertracks/aviator/cluster/follower/ (line 81). Then we have the following characteristics in the watcher: 1. If a node has been added to or deleted from the children of /com/spidertracks/aviator/cluster/follower, then continue. Otherwise, ignore the event. (Lines 33 through 44.) 2. If this was an event we should process, our cluster has changed; signal to the ClusterManager that a node has either been added or removed. (Line 51.) I'm trying to encapsulate the detection of additions and deletions of child nodes within this Watcher. All other events that occur due to a node being added or deleted should be handled externally by the clusterManager. Thanks, Todd On Thu, 2010-08-26 at 19:26 -0700, Mahadev Konar wrote: Hi Todd, From the code that you point to, I am not able to make out the sequence of steps. Can you be clearer about what you are trying to do in terms of the zookeeper api?
Thanks mahadev On 8/26/10 5:58 PM, Todd Nine t...@spidertracks.co.nz wrote: Hi all, I'm running into a strange issue I could use a hand with. I've implemented leader election, and this is working well. I'm now implementing a follower queue with ephemeral nodes. I have an interface IClusterManager which simply has the api clusterChanged. I don't care if nodes are added or deleted, I always want to fire this event. I have the following basic algorithm. init Create a path with /follower/+mynode name fire the clusterChangedEvent Watch set the event watcher on the path /follower. watch: reset the watch on /follower if event is not a NodeDeleted or NodeCreated, ignore fire the clustermanager event this seems pretty straightforward. Here is what I'm expecting 1. Create my node path 2. fire the clusterChanged event 3. Set watch on /follower 4. Receive watch events for changes from any other nodes. What's actually happening 1. Create my node path 2. fire the clusterChanged event 3. Set Watch on /follower 4. Receive watch event for node created in step 1 5. Receive future watch events for changes from any other nodes. Here is my code. Since I set the watch after I create the node, I'm not expecting to receive the event for it. Am I doing something incorrectly in creating my watch? Here is my code. http://pastebin.com/zDXgLagd Thanks, Todd
Re: Lock example
Hi Tim, The lock recipe you mention is supposed to avoid the herd effect and prevent starvation (though it has bugs :)). Are you looking for something like that, or just a simple lock and unlock that doesn't have to worry about the above issues? If that's the case, then just doing an ephemeral create and delete should give you your lock and unlock recipes. Thanks mahadev On 9/8/10 9:58 PM, Tim Robertson timrobertson...@gmail.com wrote: Hi all, I am new to ZK and using the queue and lock examples that come with zookeeper, but I have run into ZOOKEEPER-645 with the lock. I have several JVMs, each keeping a long-running ZK client, and the first JVM (and hence client) does not respect the locks obtained by subsequent clients - e.g. the first client always manages to get the lock even if another client holds it. Before I start digging, I thought I'd ask if anyone has a simple lock implemented they might share? My needs are simply to lock a URL to indicate that it is being worked on, so that I don't hammer my endpoints with multiple clients. Thanks for any advice, Tim
Re: Understanding ZooKeeper data file management and LogFormatter
Hi Vishal, Usually the default retention policy is safe enough for operations. http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html gives you an overview of how to use the purging library in zookeeper. Thanks mahadev On 9/8/10 12:01 PM, Vishal K vishalm...@gmail.com wrote: Hi All, Can you please share your experience regarding ZK snapshot retention and recovery policies? We have an application where we never need to roll back (i.e., revert to a previous state by using old snapshots). Given this, I am trying to understand under what circumstances we would ever need to use old ZK snapshots. I understand a lot of these decisions depend on the application and the amount of redundancy used at every level (e.g., the RAID level where the snapshots are stored, etc.) in the product. To simplify the discussion, I would like to rule out any application characteristics and focus mainly on data consistency. - Assuming that we have a 3-node cluster, I am trying to figure out when I would really need to use old snapshot files. With 3 nodes we already have at least 2 servers with a consistent database. If I lose files on one of the servers, I can use files from the other. In fact, the ZK server join will take care of this: I can remove files from a faulty node and reboot that node, and the faulty node will sync with the leader. - The old files will be useful if the current snapshot and/or log files are lost or corrupted on all 3 servers. If the loss is due to a disaster (the case where we lose all 3 servers), one would have to keep the snapshots on some external storage to recover. However, if the current snapshot file is corrupted on all 3 servers, then the most likely cause would be a bug in ZK. In which case, how can I trust the consistency of the old snapshots? - Given a set of snapshots and log files, how can I verify the correctness of these files? For example, what if one of the intermediate snapshot files is corrupt?
- The Admin's guide says: "Using older log and snapshot files, you can look at the previous state of ZooKeeper servers and even restore that state. The LogFormatter class allows an administrator to look at the transactions in a log." Is there a tool that does this for the admin? The LogFormatter only displays the transactions in the log file. - Has anyone ever had to play with the snapshot files in production? Thanks in advance. Regards, -Vishal
Re: ZooKeeper C bindings and cygwin?
Hi Jan, It would be great to have some documentation on how to use the Windows install. Would you mind submitting a patch with documentation, FAQs, and any other issues you might have faced? Thanks mahadev On 9/1/10 6:04 AM, jdeinh...@ujam.com wrote: Dear list readers, we've solved the problem ourselves. We found the DLL CYGZOOKEEPER_MT-2.DLL in /usr/local/bin. Best regards, Jan On 01.09.2010 at 12:57, jdeinh...@ujam.com wrote: Dear list readers, we want to use the zookeeper C bindings with our applications. Some of them run on Linux (e.g. a load balancer) and others (.NET (C#) audio servers) on Windows Server 2008. We'd like to try using cygwin to accomplish this task on Windows, but we need further advice on how to do that. What we did so far: 1) downloaded the latest cygwin 2) ran ./configure 3) ran make 4) ran make install Now we find some files (libzookeeper_mt.a, libzookeeper_mt.dll.a, libzookeeper_mt.la, libzookeeper_st.a, libzookeeper_st.dll.a and libzookeeper_st.la) in our cygwin/usr/local/lib folder, but these cannot be used in Visual Studio. Is it somehow possible to produce a file that we can then use like a .dll or a .lib? What do we have to do to accomplish our task? Are we heading in a completely wrong direction? Any help is greatly appreciated, thank you in advance! Best regards, Jan Jan Deinhard Software Developer UJAM GmbH Speicher 1 Konsul-Smidt-Str 8d 28217 Bremen fon +49 421 89 80 97-04 jdeinh...@ujam.com www.ujam.com
Re: getting created child on NodeChildrenChanged event
Hi Todd, We have always tried to err on the side of keeping things lightweight and the API simple. The only way you would be able to do this is with sequential creates. 1. Create nodes like /queueelement-$i, where i is a monotonically increasing number. You could use the sequential flag of zookeeper to do this. 2. When deleting a node, you would remove the node and create a deleted node at /deletedqueueelements/queueelement-$i. 2.1. On notification you would go to /deletedqueueelements/ and find out which ones were deleted. The above only works if you are ok with monotonically unique queue elements. 3. The above method lets folks see the deltas using /deletedqueueelements, which can be garbage collected by some cleanup process (you can be smarter about this as well). Would something like this work? Thanks mahadev On 8/31/10 3:55 PM, Todd Nine t...@spidertracks.co.nz wrote: Hi Dave, Thanks for the response. I understand your point about missed events during a watch reset period. I may be off; here is the functionality I was thinking of. I'm not sure if the ZK internal versioning process could possibly support something like this. 1. A watch is placed on children 2. The event is fired to the client. The client receives the Stat object as part of the event for the current state of the node when the event was created. We'll call this Stat A with version 1 3. The client performs processing. Meanwhile the node has several children changed. Versions are incremented to version 2 and version 3 4. Client resets the watch 5. A node is added 6. The event is fired to the client. Client receives Stat B with version 4 7. Client performs a deltaChildren(Stat A, Stat B) 8. zookeeper returns added nodes between stats, and also returns deleted nodes between stats. This would handle the missed event problem since the client would have the 2 states it needs to compare. It also allows clients dealing with large data sets to only deal with the delta over time (like a git replay).
Our number of queues could get quite large, and I'm concerned that keeping my previous event's children in a set to perform the delta may become quite memory and processor intensive Would a feature like this be possible without over complicating the Zookeeper core? Thanks, Todd On Tue, 2010-08-31 at 09:23 -0400, Dave Wright wrote: Hi Todd - The general explanation for why Zookeeper doesn't pass the event information w/ the event notification is that an event notification is only triggered once, and thus may indicate multiple events. For example, if you do a GetChildren and set a watch, then multiple children are added at about the same time, the first one triggers a notification, but the second (or later) ones do not. When you do another GetChildren() request to get the list and reset the watch, you'll see all the changed nodes, however if you had just been told about the first change in the notification you would have missed the others. To do what you are wanting, you would really need persistent watches that send notifications every time a change occurs and don't need to be reset so you can't miss events. That isn't the design that was chosen for Zookeeper and I don't think it's likely to be implemented. -Dave Wright On Tue, Aug 31, 2010 at 3:49 AM, Todd Nine t...@spidertracks.co.nz wrote: Hi all, I'm writing a distributed queue monitoring class for our leader node in the cluster. We're queueing messages per input hardware device, this queue is then assigned to a node with the least load in our cluster. To do this, I maintain 2 Persistent Znode with the following format. data queue /dataqueue/devices/unit id/data packet processing follower /dataqueue/nodes/node name/unit id The queue monitor watches for changes on the path of /dataqueue/devices. When the first packet from a unit is received, the queue writer will create the queue with the unit id. 
This triggers the watch event on the monitoring class, which in turn creates the znode for the path with the least loaded node. This path is watched for child node creation and the node creates a queue consumer to consume messages from the new queue. Our list of queues can become quite large, and I would prefer not to maintain a list of queues I have assigned then perform a delta when the event fires to determine which queues are new and caused the watch event. I can't really use sequenced nodes and keep track of my last read position, because I don't want to iterate over the list of queues to determine which sequenced node belongs to the current unit id (it would require full iteration, which really doesn't save me any reads). Is it possible to create a watch to return the path and Stat of the child node that caused the event to fire? Thanks, Todd
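On the delta question raised in this thread: if a client does keep the previous child list, the added and removed sets are just two set differences, linear in the number of children. A self-contained sketch with no ZooKeeper dependency (class and method names are mine):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Computes which children appeared and which disappeared between two
// snapshots of getChildren() results. This is the comparison a client
// must do itself, since a watch event does not carry the changed paths.
public class ChildrenDelta {
    public static Set<String> added(Collection<String> before, Collection<String> after) {
        Set<String> result = new HashSet<>(after);
        result.removeAll(new HashSet<>(before));
        return result;
    }

    public static Set<String> removed(Collection<String> before, Collection<String> after) {
        // A removal is just an addition viewed in the opposite direction.
        return added(after, before);
    }
}
```

The memory cost is one string per child per snapshot, which is the same order as the child list the server already ships on every getChildren() call.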
Re: Logs and in memory operations
Hi Avinash, In the source code, the FinalRequestProcessor updates the in-memory data structures and the SyncRequestProcessor logs to disk. For deciding when to delete, take a look at PurgeTxnLog.java. Thanks mahadev On 8/30/10 1:11 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: Hi All, From my understanding, when a znode is updated/created, a write happens into the local transaction logs and then some in-memory data structure is updated to serve future reads. Where in the source code can I find this? Also, how can I decide when it is OK for me to delete the logs off disk? Please advise. Cheers Avinash
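For reference, PurgeTxnLog is typically invoked from the command line. The jar name, classpath, and directory paths below are assumptions for illustration, so check them against your installation:

```
# Keep the 3 most recent snapshots (and the transaction logs they need),
# deleting older ones. Jar names and paths are examples only.
java -cp zookeeper-3.3.1.jar:lib/log4j-1.2.15.jar:conf \
  org.apache.zookeeper.server.PurgeTxnLog /var/zookeeper/dataLogDir /var/zookeeper/dataDir -n 3
```

Running something like this from cron is a common way to keep the log directories bounded.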
Re: Spew after call to close
Hi Stack, Looks like you are shutting down the server and shutting down the client at the same time? Is that the issue? Thanks mahadev On 9/3/10 4:47 PM, Stack st...@duboce.net wrote: Have you fellas seen this before? I call close on zookeeper but it insists on doing the below exceptions. Why is it doing this 'Session 0x12ad9dccda30002 for server null, unexpected error, closing socket connection and attempting reconnect'? This would seem to come after the close has been noticed and looking in code, i'd think we'd not do this since the close flag should be set to true post call to close? Thanks lads (The below looks ugly in our logs... this is zk 3.3.1), St.Ack 2010-09-03 16:09:52,369 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /fe80:0:0:0:0:0:0:1%1:56941 which had sessionid 0x12ad9dccda30001 2010-09-03 16:09:52,369 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:56942 which had sessionid 0x12ad9dccda30002 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x12ad9dccda30001, likely server has closed socket, closing socket connection and attempting reconnect 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x12ad9dccda30002, likely server has closed socket, closing socket connection and attempting reconnect 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.server.NIOServerCnxn: NIOServerCnxn factory exited run method 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.server.PrepRequestProcessor: PrepRequestProcessor exited loop! 2010-09-03 16:09:52,370 INFO org.apache.zookeeper.server.SyncRequestProcessor: SyncRequestProcessor exited! 
2010-09-03 16:09:52,370 INFO org.apache.zookeeper.server.FinalRequestProcessor: shutdown of request processor complete 2010-09-03 16:09:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase Received ZooKeeper Event, type=None, state=Disconnected, path=null 2010-09-03 16:09:52,470 INFO org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase Received Disconnected from ZooKeeper, ignoring 2010-09-03 16:09:52,471 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase Received ZooKeeper Event, type=None, state=Disconnected, path=null 2010-09-03 16:09:52,471 INFO org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: localhost:/hbase Received Disconnected from ZooKeeper, ignoring 2010-09-03 16:09:52,857 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181 2010-09-03 16:09:52,858 WARN org.apache.zookeeper.ClientCnxn: Session 0x12ad9dccda30001 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 2010-09-03 16:09:53,149 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/fe80:0:0:0:0:0:0:1%1:2181 2010-09-03 16:09:53,150 WARN org.apache.zookeeper.ClientCnxn: Session 0x12ad9dccda30002 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 2010-09-03 16:09:53,576 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 2010-09-03 
16:09:53,576 WARN org.apache.zookeeper.ClientCnxn: Session 0x12ad9dccda30001 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 2010-09-03 16:09:54,000 INFO org.apache.zookeeper.server.SessionTrackerImpl: SessionTrackerImpl exited loop! 2010-09-03 16:09:54,002 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Closed zookeeper sessionid=0x12ad9dccda30001 2010-09-03 16:09:54,129 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181 2010-09-03 16:09:54,130 WARN org.apache.zookeeper.ClientCnxn: Session 0x12ad9dccda30002 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at
Re: Receiving create events for self with synchronous create
Hi Todd, From the code that you point to, I am not able to make out the sequence of steps. Can you be clearer about what you are trying to do in terms of the ZooKeeper API? Thanks mahadev On 8/26/10 5:58 PM, Todd Nine t...@spidertracks.co.nz wrote: Hi all, I'm running into a strange issue I could use a hand with. I've implemented leader election, and this is working well. I'm now implementing a follower queue with ephemeral nodes. I have an interface IClusterManager which simply has the API clusterChanged. I don't care whether nodes are added or deleted; I always want to fire this event. I have the following basic algorithm.
init: create a path with /follower/ + my node name, fire the clusterChanged event, and set the event watcher on the path /follower.
watch: reset the watch on /follower; if the event is not a NodeDeleted or NodeCreated, ignore it; otherwise fire the cluster manager event.
This seems pretty straightforward. Here is what I'm expecting: 1. Create my node path 2. Fire the clusterChanged event 3. Set watch on /follower 4. Receive watch events for changes from any other nodes. What's actually happening: 1. Create my node path 2. Fire the clusterChanged event 3. Set watch on /follower 4. Receive a watch event for the node created in step 1 5. Receive future watch events for changes from any other nodes. Since I set the watch after I create the node, I'm not expecting to receive the event for it. Am I doing something incorrectly in creating my watch? Here is my code. http://pastebin.com/zDXgLagd Thanks, Todd
Re: Size of a znode in memory
Hi Maarten, The usual memory footprint of a znode is around 40-80 bytes. I think Ben is planning to document a way to calculate the approximate memory footprint of your ZK servers given a set of updates and their sizes. thanks mahadev On 8/25/10 11:49 AM, Maarten Koopmans maar...@vrijheid.net wrote: Hi, Is there a way to know/measure the size of a znode? My average znode has a name of 32 bytes and user data of at most 128 bytes. Or is the only way to run a smoke test and watch the heap growth via jconsole or so? Thanks, Maarten
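Taking the 40-80 byte per-znode figure above, a rough capacity estimate is simple arithmetic. A hedged sketch (the overhead constant is an assumption from the reply above, not a measured value):

```python
# Back-of-envelope heap estimate for a tree of znodes, assuming a fixed
# per-znode overhead (40-80 bytes, per the reply above; 80 used here to be
# conservative). These constants are illustrative, not measured.

def estimate_heap_bytes(n_znodes, name_bytes, data_bytes, overhead=80):
    return n_znodes * (overhead + name_bytes + data_bytes)

# One million znodes with 32-byte names and 128-byte payloads:
print(estimate_heap_bytes(1_000_000, 32, 128) / 1024 / 1024)  # about 229 (MB)
```

A smoke test watched through jconsole, as Maarten suggests, is still the way to validate the estimate against the real JVM overhead.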
Re: Searching more ZooKeeper content
I am definitely a +1 on this, given that it's powered by Solr. Thanks mahadev On 8/25/10 9:22 AM, Alex Baranau alex.barano...@gmail.com wrote: Hello guys, Over at http://search-hadoop.com we index the ZooKeeper project's mailing lists, wiki, web site, source code, javadoc, and jira. Would the community be interested in a patch that replaces the Google-powered search with the one from search-hadoop.com, set to search only the ZooKeeper project by default? We are looking into adding this search service for all of Hadoop's sub-projects. Assuming people are for this, any suggestions for how the search should function by default, or any specific instructions for how the search box should be modified, would be great! Thank you, Alex Baranau. P.S. The HBase community has already accepted our proposal (please refer to https://issues.apache.org/jira/browse/HBASE-2886) and the new version (0.90) will include the new search box. A patch is also available for TIKA (we are in the process of discussing some details now): https://issues.apache.org/jira/browse/TIKA-488. ZooKeeper's site looks much like Avro's, for which we also created a patch recently (https://issues.apache.org/jira/browse/AVRO-626).
Re: Parent nodes multi-step transactions
Hi Gustavo, The paradigm I usually like to suggest is to have something like /A/init. Every client watches for the existence of this node, and the node is only created after /A has been initialized with the creation of /A/C or other structure. Would that work for you? Thanks mahadev On 8/23/10 7:34 AM, Gustavo Niemeyer gust...@niemeyer.net wrote: Greetings, We (a development team at Canonical) are stumbling into a situation here, and I'd be curious to hear what the general practice is, since I'm sure this is a fairly common issue. It's quite easy to describe: say there's a parent node A somewhere in the tree. That node was created dynamically over the course of running the system, because it's associated with some resource which has its own life-span. Now, under this node we put some control nodes for different reasons (say, A/B), and we also want to track some information which is related to a sequence of nodes (say, A/C/D-0, A/C/D-1, etc). So, we end up with something like this: A/B, A/C/D-0, A/C/D-1. The question here is about best practices for taking care of nodes like A/C. It'd be fantastic to be able to create A's structure together with A itself; otherwise we risk a situation where a client can see node A before its initialization has finished (A/C doesn't exist yet). In fact, A/C may never exist, since it is possible for a client to die between the creation of A and C. Anyway, I'm sure you all understand the problem. This is pretty common, and quite boring to deal with properly in every single client. Is there any feature on the roadmap to deal with this, and any common practice besides the obvious check for half-initialization and wait for A/C to be created, or deal with timeouts and whatnot on every client? I'm about to start writing another layer on top of ZooKeeper's API, so it'd be great to have some additional insight into this issue.
-- Gustavo Niemeyer http://niemeyer.net http://niemeyer.net/blog http://niemeyer.net/twitter
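Mahadev's /A/init marker suggestion can be sketched without a real ZooKeeper client. In the sketch below, a plain dict-backed stand-in plays the role of the znode tree (all names are illustrative); the marker node is created last, so its existence implies the rest of the structure is in place:

```python
# Sketch of the "/A/init marker" pattern: readers treat /A as ready only
# once /A/init exists, because the writer creates it last. An in-memory
# set stands in for the ZooKeeper tree; no real client is used.

class FakeTree:
    """Minimal stand-in for a ZooKeeper namespace."""
    def __init__(self):
        self.nodes = set()

    def create(self, path):
        self.nodes.add(path)

    def exists(self, path):
        return path in self.nodes

def initialize_parent(tree):
    tree.create("/A")
    tree.create("/A/B")
    tree.create("/A/C")
    tree.create("/A/init")  # marker: created last, so it implies the rest

def parent_ready(tree):
    # In real ZooKeeper this would be an exists() call with a watch,
    # so the reader is notified the moment /A/init appears.
    return tree.exists("/A/init")

tree = FakeTree()
tree.create("/A")          # half-initialized: /A exists but /A/C does not
print(parent_ready(tree))  # False -> readers keep waiting
initialize_parent(tree)
print(parent_ready(tree))  # True
```

The pattern sidesteps the half-initialization race Gustavo describes: a client dying between creating /A and /A/C leaves /A visible but never "ready".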
Re: Non Hadoop scheduling frameworks
Hi Todd, Just to be clear, are you looking at solving UC1 and UC2 via ZooKeeper? Or is this a broader question about scheduling on Cassandra nodes? For the latter this probably isn't the right mailing list. Thanks mahadev On 8/23/10 4:02 PM, Todd Nine t...@spidertracks.co.nz wrote: Hi all, We're using ZooKeeper for leader election and system monitoring. We're also using it for synchronizing our cluster-wide jobs with barriers. We're running into an issue where we now have a single job, but each node can fire the job independently of the others with different criteria in the job. In the event of a system failure, another node in our application cluster will need to fire this job. I've used Quartz previously (we're running Java 6), but it simply isn't designed for the use case we have. I found this article on Cloudera: http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/ I've looked at both plugins, but they require Hadoop. We're not currently running Hadoop; we only have Cassandra. Here are the two basic use cases we need to support.
UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the leader node marks the job as complete.
UC2: Multiple Jobs per Node
1. A job is scheduled for a future time on a specific node (usually the same node that's creating the trigger)
2. A trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the leader will take all pending jobs from the failed node and partition them across the remaining nodes.
Any input would be greatly appreciated. Thanks, Todd
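UC1's enter/process/leave flow is essentially a double barrier. As a loose in-process analogy, the sketch below uses threading.Barrier standing in for ephemeral znodes under a barrier path; this only shows the control flow, not how you would build it on ZooKeeper itself:

```python
# In-process analogy for UC1: every participant enters the barrier,
# processes, then waits at a second barrier before anyone is "done".
# In ZooKeeper, enter/leave would be ephemeral child znodes under a
# barrier path, watched until the expected count is reached.
import threading

NODES = 3
enter = threading.Barrier(NODES)
leave = threading.Barrier(NODES)
results = []
lock = threading.Lock()

def worker(node_id):
    enter.wait()                 # block until all participants are present
    with lock:
        results.append(node_id)  # stand-in for "process the data"
    leave.wait()                 # block until everyone has finished

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2] -> leader can now mark the job complete
```

The second barrier is what lets the leader safely mark the job complete: no node can observe "done" while another is still processing.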
Re: Zookeeper stops
Hi Wim, It looks like ZooKeeper is not able to create files on the /tmp filesystem. Is there a space shortage, or is it possible the file is being deleted as it's being written? Admins sometimes have a crontab that cleans up the /tmp filesystem. Thanks mahadev On 8/19/10 1:15 AM, Wim Jongman wim.jong...@gmail.com wrote: Hi, I have a zookeeper server running that can sometimes run for days and then quits. Is there somebody with a clue to the problem? I am running 64-bit Ubuntu with java version 1.6.0_18, OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1), OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode), and ZooKeeper 3.3.0. The log below has some context before it shows the fatal error. Our component.id=40676 indicates that it is the 40676th time that I have asked ZK to publish this information. It has been seen to go up to half a million before stopping. Regards, Wim ZooDiscovery Service Unpublished: Aug 18, 2010 11:17:28 PM. ServiceInfo[uri=osgiservices:// 188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_19q0FmlQF0wEwjSl6SpUTJRlV5g=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice, osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id =org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@68a1e081, component.name=Star Wars Quotes Service, ecf.sp.ect=ecf.generic.server, component.id=40676, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5b9a6ad1 }]] ZooDiscovery Service Published: Aug 18, 2010 11:17:29 PM.
ServiceInfo[uri=osgiservices:// 188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice, osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id =org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4, component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server, component.id=40677, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953 }]] [log;+0200 2010.08.18 23:17:29:545;INFO;org.eclipse.ecf.remoteservice;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.remoteservice;code=0;message=No async remote service interface found with name=org.eclipse.ecf.services.quotes.QuoteServiceAsync for proxy service class=org.eclipse.ecf.services.quotes.QuoteService;severity2;exception=null;children=[]]] 2010-08-18 23:17:37,057 - FATAL [Snapshot Thread:zookeeperser...@262] - Severe unrecoverable error, exiting java.io.FileNotFoundException: /tmp/zookeeperData/version-2/snapshot.13e2e (No such file or directory) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:209) at java.io.FileOutputStream.init(FileOutputStream.java:160) at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:224) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:211) at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:260) at org.apache.zookeeper.server.SyncRequestProcessor$1.run(SyncRequestProcessor.java:120) ZooDiscovery Service Unpublished: Aug 18, 2010 11:17:37 PM. 
ServiceInfo[uri=osgiservices:// 188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;id=ServiceID[type=ServiceTypeID[typeName=_osgiservices._tcp.default._iana];location=osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=;full=_osgiservices._tcp.default._i...@osgiservices://188.40.116.87:3282/svc_u2GpWmF3YKSlTauWcwOMsDgiBxs=];priority=0;weight=0;props=ServiceProperties[{ecf.rsvc.ns=ecf.namespace.generic.remoteservice, osgi.remote.service.interfaces=org.eclipse.ecf.services.quotes.QuoteService, ecf.sp.cns=org.eclipse.ecf.core.identity.StringID, ecf.rsvc.id =org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@71bfa0a4, component.name=Eclipse Twitter, ecf.sp.ect=ecf.generic.server, component.id=40677, ecf.sp.cid=org.eclipse.ecf.discovery.serviceproperties$bytearraywrap...@5bcba953 }]]
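The FileNotFoundException above is consistent with a /tmp cleaner (e.g. tmpwatch or tmpreaper via cron) removing the snapshot directory out from under a long-running server. Keeping ZooKeeper's data directories out of /tmp avoids this. A hedged zoo.cfg fragment; the paths are examples, not required values:

```
# zoo.cfg: keep snapshots and transaction logs out of /tmp so periodic
# cleaners cannot delete files under a running server.
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
```

Putting dataLogDir on a dedicated device also helps write latency, independent of the /tmp issue.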
Re: A question about Watcher
Hi Qian, The watcher information is saved at the client, and the client will reattach the watches to the new server it connects to. Hope that helps. Thanks mahadev On 8/16/10 9:28 AM, Qian Ye yeqian@gmail.com wrote: Thanks for the explanation. Since the watcher can be preserved when the client switches the zookeeper server it connects to, does that mean all the watcher information will be saved on all the zookeeper servers? I didn't find any sign in the source that the client can hold the watcher information. On Tue, Aug 17, 2010 at 12:21 AM, Ted Dunning ted.dunn...@gmail.com wrote: I should correct this. The watchers will deliver a session expiration event, but since the connection is closed at that point, no further events will be delivered and the cluster will remove them. This is as good as the watchers disappearing. On Mon, Aug 16, 2010 at 9:20 AM, Ted Dunning ted.dunn...@gmail.com wrote: The other is session expiration. Watchers do not survive this. This happens when a client does not provide timely evidence that it is alive and is marked as having disappeared by the cluster. -- With Regards! Ye, Qian
Re: How to handle Node does not exist error?
Hi Dr Hao, Can you please post the configuration of all three zookeeper servers? I suspect the cluster might be misconfigured and the servers might not belong to the same ensemble. Just to be clear: /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002807 and other such nodes exist on one of the zookeeper servers, and the same nodes do not exist on the other servers? Also, as Ted pointed out, can you please post the output of echo "stat" | nc localhost 2181 (on all three servers) to the list? Thanks mahadev On 8/11/10 12:10 AM, Dr Hao He h...@softtouchit.com wrote: hi, Ted, Thanks for the reply. Here is what I did: [zk: localhost:2181(CONNECTED) 0] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg002948 [] [zk: localhost:2181(CONNECTED) 1] ls /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs [msg002807, msg002700, msg002701, msg002804, msg002704, msg002706, msg002601, msg001849, msg001847, msg002508, msg002609, msg001841, msg002607, msg002606, msg002604, msg002809, msg002817, msg001633, msg002812, msg002814, msg002711, msg002815, msg002713, msg002716, msg001772, msg002811, msg001635, msg001774, msg002515, msg002610, msg001838, msg002517, msg002612, msg002519, msg001973, msg001835, msg001974, msg002619, msg001831, msg002510, msg002512, msg002615, msg002614, msg002617, msg002104, msg002106, msg001769, msg001768, msg002828, msg002822, msg001760, msg002820, msg001963, msg001961, msg002110, msg002118, msg002900, msg002836, msg001757, msg002907, msg001753, msg001752, msg001755, msg001952, msg001958, msg001852, msg001956, msg001854, msg002749, msg001608, msg001609, msg002747, msg002882, msg001743, msg002888, msg001605, msg002885, msg001487, msg001746, msg002330, msg001749, msg001488, msg001489, msg001881, msg001491, msg002890, msg001889, msg002758, msg002241, msg002892, msg002852, msg002759, msg002898, msg002850, msg001733, msg002751, msg001739, msg002753, msg002756, msg002332, msg001872, msg002233, msg001721, msg001627, msg001720, msg001625,
msg001628, msg001629, msg001729, msg002350, msg001727, msg002352, msg001622, msg001726, msg001623, msg001723, msg001724, msg001621, msg002736, msg002738, msg002363, msg001717, msg002878, msg002362, msg002361, msg001611, msg001894, msg002357, msg002218, msg002358, msg002355, msg001895, msg002356, msg001898, msg002354, msg001996, msg001990, msg002093, msg002880, msg002576, msg002579, msg002267, msg002266, msg002366, msg001901, msg002365, msg001903, msg001799, msg001906, msg002368, msg001597, msg002679, msg002166, msg001595, msg002481, msg002482, msg002373, msg002374, msg002371, msg001599, msg002773, msg002274, msg002275, msg002270, msg002583, msg002271, msg002580, msg002067, msg002277, msg002278, msg002376, msg002180, msg002467, msg002378, msg002182, msg002377, msg002184, msg002379, msg002187, msg002186, msg002665, msg002666, msg002381, msg002382, msg002661, msg002662, msg002663, msg002385, msg002284, msg002766, msg002282, msg002190, msg002599, msg002054, msg002596, msg002453, msg002459, msg002457, msg002456, msg002191, msg002652, msg002395, msg002650, msg002656, msg002655, msg002189, msg002047, msg002658, msg002659, msg002796, msg002250, msg002255, msg002589, msg002257, msg002061, msg002064, msg002585, msg002258, msg002587, msg002444, msg002446, msg002447, msg002450, msg002646, msg001501, msg002591, msg002592, msg001503, msg001506, msg002260, msg002594, msg002262, msg002263, msg002264, msg002590, msg002132, msg002130, msg002530, msg002931, msg001559, msg001808, msg002024, msg001553, msg002939, msg002937, msg001556, msg002935, msg002933, msg002140, msg001937, msg002143, msg002520, msg002522, msg002429, msg002524, msg002920, msg002035, msg001561, msg002134, msg002138, msg002925, msg002151, msg002287, msg002555, msg002010, msg002002, msg002290, msg001537, msg002005, msg002147, msg002145, msg002698,
Re: Sequence Number Generation With Zookeeper
Hi David, I think it would be really useful. It would be very helpful for someone looking to generate unique tokens/generation ids (I can think of plenty of applications for this). Please do consider contributing it back to the community! Thanks mahadev On 8/6/10 7:10 AM, David Rosenstrauch dar...@darose.net wrote: Perhaps. I'd have to ask my boss for permission to release the code. Is this something that would be interesting/useful to other people? If so, I can ask about it. DR On 08/05/2010 11:02 PM, Jonathan Holloway wrote: Hi David, We did discuss potentially doing this as well. It would be nice to get some recipes for ZooKeeper done for this area, if people think it's useful. Were you thinking of submitting this back as a recipe? If not, I could potentially work on such a recipe instead. Many thanks, Jon. I just ran into this exact situation, and handled it like so: I wrote a library that uses the option (b) you described above. Only instead of requesting a single sequence number, you request a block of them at a time from Zookeeper, and then locally use them up one by one from the block you retrieved. Retrieving by block (e.g., by blocks of 1 at a time) eliminates the contention issue. Then, if you're finished assigning IDs from that block but still have a bunch of IDs left in it, the library has another function to push back the unused IDs. They'll then get pulled again in the next block retrieval. We don't actually have this code running in production yet, so I can't vouch for how well it works. But the design was reviewed and given the thumbs up by the core developers on the team, and the implementation passes all my unit tests. HTH. Feel free to email back with specific questions if you'd like more details. DR
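David's block-allocation scheme can be sketched with the contended counter replaced by a plain in-memory stand-in; in ZooKeeper the counter would be a znode updated atomically (e.g. a versioned setData, retried on conflict). The class and method names below are illustrative, not his library's API:

```python
# Sketch of block-based ID allocation: fetch a block of IDs in one
# (potentially contended) update, then hand them out locally one by one.

class CounterStore:
    """Stand-in for a znode holding the next unallocated ID."""
    def __init__(self):
        self.next_id = 0

    def allocate_block(self, size):
        start = self.next_id
        self.next_id += size  # in ZooKeeper: a version-conditional setData
        return range(start, start + size)

class BlockIdGenerator:
    def __init__(self, store, block_size=1000):
        self.store = store
        self.block_size = block_size
        self.pool = iter(())  # empty until the first block is fetched

    def next_id(self):
        try:
            return next(self.pool)
        except StopIteration:
            # Local pool exhausted: fetch a fresh block from the store.
            self.pool = iter(self.store.allocate_block(self.block_size))
            return next(self.pool)

gen = BlockIdGenerator(CounterStore(), block_size=3)
print([gen.next_id() for _ in range(5)])  # [0, 1, 2, 3, 4] (two block fetches)
```

Contention on the shared counter drops by roughly the block size, at the cost of "holes" in the sequence when a node dies holding unused IDs; the push-back function David mentions mitigates that for clean shutdowns.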
Re: zkperl - skipped tests
Hi Martin, You might have to look into the tests; t/50_access.t is the file you want to take a look at. I am not a perl guru so am not of much help, but let me know if you can't work out the details on the skipped tests and I will try to dig into the perl code. Thanks mahadev On 8/4/10 6:16 AM, Martin Waite waite@gmail.com wrote: Hi, I built the perl module and ran the test suite. For test 50_access, 3 tests are skipped. vm-026-lenny-mw$ ZK_TEST_HOSTS=127.0.0.1:2181 make test PERL_DL_NONLAZY=1 /usr/bin/perl -MExtUtils::Command::MM -e test_harness(0, 'blib/lib', 'blib/arch') t/*.t t/10_invalid..ok 1/107# no ZooKeeper path specified in ZK_TEST_PATH env var, using root path t/10_invalid..ok t/15_thread...ok t/20_tie..ok t/22_stat_tie.ok t/24_watch_tieok t/30_connect..ok t/35_log..ok t/40_basicok t/45_classok t/50_access...ok 3/38 skipped: various reasons t/60_watchok All tests successful, 3 subtests skipped. Files=11, Tests=461, 18 wallclock secs ( 2.01 cusr + 3.08 csys = 5.09 CPU) Is there any way to find out which of the 38 tests were skipped and why? regards, Martin
Re: node symlinks
Hi Maarten, Can you elaborate on your use case for ZooKeeper? We currently don't have a symlink feature in zookeeper. The only way to do it would be a client-side hash/lookup table that buckets data to different zookeeper clusters. You could also store this hash/lookup table in one of the zookeeper clusters; the lookup table can then be cached on the client side after reading it once from the zookeeper servers. Thanks mahadev On 7/24/10 2:39 PM, Maarten Koopmans maar...@vrijheid.net wrote: Yes, I thought about Cassandra or Voldemort, but I need ZK's guarantees, as it will provide the file system hierarchy for a flat object store, so I need locking primitives and consistency. Doing that on top of Voldemort would give me a scalable version of ZK, but just slower. Might as well find a way to scale across ZK clusters. Also, I want to be able to add clusters as the number of nodes grows. Note that the number of nodes will grow with the number of users of the system, so the clusters can grow sequentially, hence the symlink idea. --Maarten On 07/24/2010 11:12 PM, Ted Dunning wrote: Depending on your application, it might be good to simply hash the node name to decide which ZK cluster to put it on. Also, a scalable key-value store like Voldemort or Cassandra might be more appropriate for your application. Unless you need the hard-core guarantees of ZK, they can be better for large-scale storage. On Sat, Jul 24, 2010 at 7:30 AM, Maarten Koopmans maar...@vrijheid.net wrote: Hi, I have a number of nodes that will grow larger than one cluster can hold, so I am looking for a way to efficiently stack clusters. One way is to have a zookeeper node symlink to another cluster. Has anybody ever done that, and does anyone have tips or alternative approaches? Currently I use Scala and traverse zookeeper trees by proper tail recursion, so adapting the tail recursion to process symlinks would be my approach. Best, Maarten
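The client-side bucketing Mahadev describes (and Ted's hashing suggestion) can be as simple as hashing the znode path to pick a cluster. A small sketch; the cluster connect strings are made up for the example:

```python
# Client-side bucketing: hash the znode path to choose which ZooKeeper
# cluster holds it. Every client computes the same mapping, so no shared
# lookup table is needed unless you later want to rebalance.
import hashlib

CLUSTERS = [
    "zk-a1:2181,zk-a2:2181,zk-a3:2181",  # example connect strings
    "zk-b1:2181,zk-b2:2181,zk-b3:2181",
]

def cluster_for(path):
    # md5 gives a stable, well-spread hash across processes and languages
    # (unlike Python's builtin hash(), which can vary between runs).
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return CLUSTERS[digest[0] % len(CLUSTERS)]

print(cluster_for("/users/maarten/inbox"))
```

The trade-off versus a stored lookup table: pure hashing needs no coordination, but moving paths between clusters later requires rehashing or an exception list, which is where the ZooKeeper-hosted table Mahadev mentions comes back in.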
Re: ZK recovery questions
Hi Ashwin, We have seen people who want something like ZooKeeper without the reliability of permanent storage and who are willing to work with loosened guarantees compared to the current ZooKeeper. What you mention about log files is certainly a valid use case. It would be great to see how much throughput you can get in a scenario where nothing is ever logged to permanent storage. Do you want to try this out and see what kind of throughput difference you get? Thanks mahadev On 7/19/10 8:35 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com wrote: Cool. I've only tried the single-node server so far. I didn't know it could sync from other senior servers. Server/cluster addresses: I read somewhere in the docs/todo list that the bootstrap server list for the clients should be the same. So, what happens when a new replacement server has to be brought in on a different IP/hostname? Do the older clients autodetect the new server, or is this even supported? I suppose not. Log files: I have absolutely no confusion between ZK and databases (very tempting tho'), but running ZK servers without log files does not seem unusual. Especially since you said new servers can sync directly from senior servers without relying on log files. In that case, I'm curious to see what happens if you just redirect log files to /dev/null. Anyone tried this? Regards, Ashwin Jayaprakash.
Re: unit test failure
Hi Martin, Can you check whether you have a stale Java process (ZooKeeperServer) running on your machine? That might cause some issues with the tests. Thanks mahadev On 7/14/10 8:03 AM, Martin Waite waite@gmail.com wrote: Hi, I am attempting to build the C client on debian lenny. autoconf, configure, make and make install all appear to work cleanly. I ran: autoreconf -if ./configure make make install make run-check However, the unit tests fail: $ make run-check make zktest-st zktest-mt make[1]: Entering directory `/home/martin/zookeeper-3.3.1/src/c' make[1]: `zktest-st' is up to date. make[1]: `zktest-mt' is up to date. make[1]: Leaving directory `/home/martin/zookeeper-3.3.1/src/c' ./zktest-st ./tests/zkServer.sh: line 52: kill: (17711) - No such process ZooKeeper server startedRunning Zookeeper_operations::testPing : elapsed 1 : OK Zookeeper_operations::testTimeoutCausedByWatches1 : elapsed 0 : OK Zookeeper_operations::testTimeoutCausedByWatches2 : elapsed 0 : OK Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : elapsed 2 : OK Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : OK Zookeeper_operations::testConcurrentOperations1 : elapsed 206 : OK Zookeeper_init::testBasic : elapsed 0 : OK Zookeeper_init::testAddressResolution : elapsed 0 : OK Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK Zookeeper_init::testNullAddressString : elapsed 0 : OK Zookeeper_init::testEmptyAddressString : elapsed 0 : OK Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK Zookeeper_init::testInvalidAddressString2 : elapsed 2 : OK Zookeeper_init::testNonexistentHost : elapsed 108 : OK Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 0 : OK Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
Zookeeper_close::testCloseUnconnected : elapsed 0 : OK Zookeeper_close::testCloseUnconnected1 : elapsed 0 : OK Zookeeper_close::testCloseConnected1 : elapsed 0 : OK Zookeeper_close::testCloseFromWatcher1 : elapsed 0 : OK Zookeeper_simpleSystem::testAsyncWatcherAutoResetterminate called after throwing an instance of 'CppUnit::Exception' what(): equality assertion failed - Expected: -101 - Actual : -4 make: *** [run-check] Aborted This appears to come from tests/TestClient.cc - but beyond that, it is hard to identify which equality assertion failed. Help ! regards, Martin
Re: building client tools
Hi Martin, The only required tool to build the zookeeper C library is CppUnit. The README says it can be built without CppUnit installed, but there is an open bug regarding this, so CppUnit is required for now. Thanks mahadev On 7/13/10 10:09 AM, Martin Waite waite@gmail.com wrote: Hi, I am trying to build the c client on debian lenny for zookeeper 3.3.1. autoreconf -if configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT If this token and others are legitimate, please use m4_pattern_allow. See the Autoconf documentation. autoreconf: /usr/bin/autoconf failed with exit status: 1 I probably need to install some required tools. Is there a list of the tools needed to build this, please? regards, Martin
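On Debian, a missing AM_PATH_CPPUNIT macro usually means the CppUnit development package is not installed. The package name below follows Debian's naming convention and should be verified against your release:

```
# Install CppUnit headers and autoconf macros, then retry the build:
sudo apt-get install libcppunit-dev
autoreconf -if && ./configure && make
```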
Re: running the systest
Hi Stuart, The instructions are just out of date. If you could open a jira and post a patch to it that would be great! We should try getting this in 3.3.2! That would be useful! Thanks mahadev On 7/9/10 6:36 AM, Stuart Halloway stuart.hallo...@gmail.com wrote: Hi all, I am trying to run the systest and have hit a few minor issues: (1) The readme says src/contrib/jarjar, apparently should be src/contrib/fatjar (2) The compiled fatjar seems to be missing junit, so the launch instructions do not work. I can fix or workaround these, but I wanted to see if maybe the instructions are just out of date, and there is an easy (but currently undocumented) way to launch the tests. Thanks, Stu
Re: Suggested way to simulate client session expiration in unit tests?
Hi Jeremy, zk.disconnect() is the right way to disconnect from the servers. For session expiration you just have to make sure that the client stays disconnected for more than the session expiration interval. Hope that helps. Thanks mahadev On 7/6/10 9:09 AM, Jeremy Davis jerdavis.cassan...@gmail.com wrote: Is there a recommended way of simulating a client session expiration in unit tests? I see a TestableZooKeeper.java, with a pauseCnxn() method that does cause the connection to timeout/disconnect and reconnect. Is there an easy way to push this all the way through to session expiration? Thanks, -JD
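The timing rule behind Mahadev's answer can be sketched with a toy model. This is an illustration only, not ZooKeeper code (the class name and API are made up): the server expires a session once it has not heard from the client for longer than the negotiated session timeout, so a test that wants a real expiration must keep the client disconnected at least that long.

```java
// Toy model of the server-side expiry rule (illustration, not ZooKeeper code).
// A session is considered expired once the server has not heard from the
// client for longer than the negotiated session timeout.
final class ToySessionTracker {
    private final long sessionTimeoutMs;
    private long lastHeardMs;

    ToySessionTracker(long sessionTimeoutMs, long nowMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastHeardMs = nowMs;
    }

    // A heartbeat or request from the client resets the clock.
    void touch(long nowMs) { lastHeardMs = nowMs; }

    // The server-side check: expired strictly after the timeout elapses.
    boolean isExpired(long nowMs) {
        return nowMs - lastHeardMs > sessionTimeoutMs;
    }
}
```

So pausing the connection (as pauseCnxn() does) only produces a disconnect event; the expiration only follows once the pause outlasts the session timeout.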
Re: Securing ZooKeeper connections
Hi Vishal, Ben (Benjamin Reed) has been working on a netty based client server protocol in ZooKeeper. I think there is an open jira for it. My network connection is pretty slow so I am finding it hard to search for it. We have been thinking about enabling secure connections via these netty based connections in zookeeper. Thanks mahadev On 5/25/10 12:20 PM, Vishal K vishalm...@gmail.com wrote: Hi All, Since ZooKeeper does not support secure network connections yet, I thought I would poll and see what people are doing to address this problem. Is anyone running ZooKeeper over secure channels (client-server and server-server authentication/encryption)? If yes, can you please elaborate how you do it? Thanks. Regards, -Vishal
Re: Zookeeper EventThread and SendThread
Hi Nick, These threads are spawned with each zookeeper client handle. As soon as you create a zookeeper client object these threads are spawned. Are you creating too many zookeeper client objects in your application? Thanks mahadev On 5/20/10 11:30 AM, Nick Bailey nicholas.bai...@rackspace.com wrote: Hey guys, Question regarding zookeeper's EventThread and SendThread. I'm not quite sure what these are used for but a stacktrace of our client application contains lines similar to pool-2-thread-20-EventThread daemon prio=10 tid=0x2aac3cb29c00 nid=0x75d waiting on condition [0x6b08..0x6b080b10] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x2aab1f577250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) pool-2-thread-20-SendThread daemon prio=10 tid=0x2aac3c35d400 nid=0x75c runnable [0x70ede000..0x70edeb90] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked 0x2aab1f571d08 (a sun.nio.ch.Util$1) - locked 0x2aab1f571cf0 (a java.util.Collections$UnmodifiableSet) - locked 0x2aab1f5715b8 (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:921) There are pairs of threads ranging from thread-1 to thread-50 and also multiple pairs of these threads.
As in pool-2-thread-20-SendThread is the name of multiple threads in the trace. I'm debugging some load issues with our system and am suspicious that the large amount of zookeeper threads is contributing. Would anyone be able to elaborate on the purpose of these threads and how they are spawned? Thanks, Nick Bailey Rackspace Hosting Software Developer, Email Apps nicholas.bai...@rackspace.com
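Since each client handle carries its own SendThread/EventThread pair, the usual way to cut the thread count is to share one handle per process rather than creating one per task. A minimal sketch of that idea (the holder class is hypothetical; `C` stands in for org.apache.zookeeper.ZooKeeper):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: lazily create and share a single client handle,
// since every handle spawns its own SendThread/EventThread pair.
final class SharedClientHolder<C> {
    private final AtomicReference<C> ref = new AtomicReference<>();
    private final Supplier<C> factory;

    SharedClientHolder(Supplier<C> factory) { this.factory = factory; }

    C get() {
        C c = ref.get();
        if (c == null) {
            C created = factory.get();
            if (ref.compareAndSet(null, created)) {
                c = created;
            } else {
                // Another thread won the race; in real code, close `created` here.
                c = ref.get();
            }
        }
        return c;
    }
}
```

With this pattern, fifty worker threads share one thread pair instead of creating fifty pairs.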
Re: Using ZooKeeper for managing solrCloud
Hi Rakhi, You can read more about monitoring zookeeper servers at http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperAdmin.html#sc_monitoring Thanks mahadev On 5/14/10 4:09 AM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I just went through the zookeeper tutorial and successfully managed to run the zookeeper server. How do we monitor the zookeeper server? Is there a url for it? I pasted the following urls in a browser, but all I get is a blank page: http://localhost:2181 http://localhost:2181/zookeeper I actually needed zookeeper for managing solr cloud externally, but now if I have 2 solr servers running, how do I configure zookeeper to manage them? Regards, Raakhi
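The reason the browser shows a blank page is that port 2181 speaks the ZooKeeper protocol, not HTTP. The server does, however, answer "four letter word" commands on that same port — for example `ruok` (a healthy server replies `imok`) and `stat`. A sketch of sending one from Java (the helper below is made up for illustration; it is not part of the ZooKeeper API):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: send a four-letter command (ruok, stat, ...) to a
// ZooKeeper server's client port and return its reply.
final class FourLetter {
    static String send(String host, int port, String cmd) throws Exception {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            s.shutdownOutput();                 // signal we are done sending
            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] b = new byte[512];
            int n;
            while ((n = in.read(b)) != -1) buf.write(b, 0, n);
            return buf.toString("US-ASCII");    // the server closes after replying
        }
    }
}
```

From a shell, the equivalent is `echo ruok | nc localhost 2181`.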
Re: Can't ls with large node count and I don't understand the use of jute.maxbuffer
Hi Aaron, Each of the requests and responses between client and servers is sent as a (buflen, buffer) packet. The contents of the packets are then deserialized from this buffer. Looks like the size of the packet (buflen) is big in your case. We usually avoid sending/receiving large packets just to discourage folks from using it as a bulk data store. We also discourage creating a flat hierarchy with too many direct children (your case). This is because such directories can cause huge load on the network/servers when a list of that directory is done by a huge number of clients. We always suggest bucketing these children into a more hierarchical structure. You are probably hitting the limit of 1MB for this! You might want to change this in your client configuration as a temporary fix! But for later you might want to think about your structure in ZooKeeper to make it more hierarchical via some kind of bucketing! Thanks mahadev On 5/13/10 10:18 AM, Aaron Crow dirtyvagab...@yahoo.com wrote: We're running Zookeeper with about 2 million nodes. It's working, with one specific exception: When I try to get all children on one of the main node trees, I get an IOException out of ClientCnxn (Packet len4648067 is out of range!). There are 150329 children under the node in question. I should also mention that I can successfully ls other nodes with similarly high children counts. But this specific node always fails. Googling led me to see that Mahadev dealt with this last year: http://www.mail-archive.com/zookeeper-comm...@hadoop.apache.org/msg00175.html Source diving led me to see that ClientCnxn enforces a bound based on the jute.maxbuffer setting: packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024); ... if (len < 0 || len >= packetLen) { throw new IOException("Packet len" + len + " is out of range!"); So maybe I could bump this up in config... but, I'm confused when reading the documentation on jute.maxbuffer: It specifies the maximum size of the data that can be stored in a znode.
It's true we have an extremely high node count. However, we've been careful to keep each node's data very small -- e.g., we certainly should have no single data entry longer than 256 characters. The way I'm reading the docs, the jute.maxbuffer bound is purely against the data size of specific nodes, and shouldn't relate to child count. Or does it relate to child count as well? Here is a stat on the offending node: cZxid = 0x1000e ctime = Mon May 03 17:40:58 PDT 2010 mZxid = 0x1000e mtime = Mon May 03 17:40:58 PDT 2010 pZxid = 0x100315064 cversion = 150654 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 0 numChildren = 150372 Thanks for any insights... Aaron
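The bucketing Mahadev suggests can be as simple as deriving a fixed sub-bucket from a hash of the child name, so each getChildren() response stays small. A sketch under assumed names (the class, paths, and layout here are illustrative, not a ZooKeeper recipe):

```java
// Hedged sketch of "bucketing" a flat namespace: instead of 150k children
// directly under one parent, spread them across a fixed number of
// sub-buckets derived from a stable hash of the name.
final class Bucketing {
    static String bucketedPath(String parent, String child, int buckets) {
        int bucket = Math.floorMod(child.hashCode(), buckets); // stable, non-negative
        return String.format("%s/%02d/%s", parent, bucket, child);
    }
}
```

With 64 buckets, a node that had 150,000 direct children averages about 2,300 per bucket, and the response to any single getChildren() stays far below the jute.maxbuffer limit. The cost is that a full listing now takes one getChildren() per bucket.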
Re: Xid out of order. Got 8 expected 7
Hi Jordan, Can you create a jira for this? And attach all the server logs and client logs related to this timeline? How did you start up the servers? Are there some changes you might have made accidentally to the servers? Thanks mahadev On 5/12/10 10:49 AM, Jordan Zimmerman jzimmer...@proofpoint.com wrote: We've just started seeing an odd error and are having trouble determining the cause. Xid out of order. Got 8 expected 7 Any hints on what can cause this? Any ideas on how to debug? We're using ZK 3.3.0. The error occurs in ClientCnxn.java line 781 -Jordan
Re: ZookeeperPresentations Wiki
I just emailed in...@apache to ask for their help on this. I wasn't able to figure out what the problem is! Thanks for pointing it out. mahadev On 5/11/10 4:01 PM, Sudipto Das sudi...@cs.ucsb.edu wrote: Hi, I am trying to download some presentation slides from the ZookeeperPresentations wiki (http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations) but I am facing a weird problem. On clicking on a link for a presentation, I am getting the error message You are not allowed to do AttachFile on this page. Login and try again. I tried creating an account, and even after that, I get the same error message, except the login suggestion. All attachment links have an action=AttachFile URL (e.g. http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations?action=AttachFile&do=view&target=zookeeper_hbase.pptx for the zookeeper_hbase.pptx file). My intent is to just download the files. Please let me know if I am doing something wrong. Sorry for my ignorance, but I honestly tried out all obvious means to figure out. :( Best Regards Sudipto -- Sudipto Das PhD Candidate CS @ UCSB Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~sudipto
Re: New ZooKeeper client library Cages
Hi Dominic, Good to see this. I like the name cages :). You might want to post to the list what cages is useful for. I think quite a few folks would be interested in something like this. Are you guys currently using it with cassandra? Thanks mahadev On 5/11/10 4:02 PM, Dominic Williams thedwilli...@googlemail.com wrote: Anyone looking for a Java client library for ZooKeeper, please checkout: Cages - http://cages.googlecode.com The library will be expanded and feedback will be helpful. Many thanks, Dominic ria101.wordpress.com
Re: avoiding deadlocks on client handle close w/ python/c api
Sure, I'll take a look at it. Thanks mahadev On 5/4/10 2:32 PM, Patrick Hunt ph...@apache.org wrote: Thanks Kapil, Mahadev perhaps you could take a look at this as well? Patrick On 05/04/2010 06:36 AM, Kapil Thangavelu wrote: I've constructed a simple example just using the zkpython library with condition variables, that will deadlock. I've filed a new ticket for it, https://issues.apache.org/jira/browse/ZOOKEEPER-763 the gdb stack traces look suspiciously like the ones in 591, but sans the watchers. https://issues.apache.org/jira/browse/ZOOKEEPER-591 the attached example on the ticket will deadlock in zk 3.3.0 (which has the fix for 591) and trunk. -kapil On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu kapil.f...@gmail.com wrote: Hi Folks, I'm constructing an async api on top of the zookeeper python bindings for twisted. The intent was to make a thin wrapper that would wrap the existing async api with one that allows for integration with the twisted python event loop (http://www.twistedmatrix.com), primarily using the async apis. One issue I'm running into while developing unit tests: deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined, however with outstanding async requests the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. I'm curious if this would be considered a bug; afaics ideal behavior would be, on close of a handle, to effectively clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping to the api to guard against closing while there is an outstanding completion request, but it's an imperfect solution due to the nature of the event loop integration. The problem is that the python callback invoked by the completion thread in turn schedules a function for the main thread.
In twisted the api for this is implemented by appending the function to a list attribute on the reactor and then writing a byte to a pipe to wakeup the main thread. If a thread switch to the main thread occurs before the completion thread callback returns, the scheduled function runs and the rest of the application keeps processing, of which the last step for the unit tests is to close the connection, which results in a deadlock. i've included some of the client log and gdb stack traces from a deadlock'd client process. thanks, Kapil
Re: ZKClient
Hi Adam, I don't think zk is very very hard to get right. There are examples in src/recipes which implement locks/queues/others. There is ZOOKEEPER-22 to make it even easier for applications to use. Regarding re-registration of watches, you can definitely write code and submit it as part of a well documented contrib module which lays out the assumptions/design of it. It could very well be useful for others. It's just that folks haven't had much time to focus on these areas as yet. Thanks mahadev On 5/4/10 2:58 PM, Adam Rosien a...@rosien.net wrote: I use zkclient in my work at kaChing and I have mixed feelings about it. On one hand it makes easy things easy which is great, but on the other hand I have very few ideas what assumptions it makes under the hood. I also dislike some of the design choices such as unchecked exceptions, but that's neither here nor there. It would take some extensive documentation work by the authors to really enumerate the model and assumptions, but the project doesn't seem to be active (either from it being adequate for its current users or just inactive). I'm not sure I could derive the assumptions myself. I'm a bit frustrated that zk is very, very hard to really get right. At a project level, can't we create structures to avoid most of these errors? Can there be a standard model with detailed assumptions and implementations of all the recipes? How can we start this? Is there something that makes this too hard? I feel like a recipe page is a big fail; wouldn't an example app that uses locks and barriers be that much more compelling? For the common FAQ items like you need to re-register the watch, can't we just create code that implements this pattern? My goal is to live up to the motto: a good API is impossible to use incorrectly. .. Adam On Tue, May 4, 2010 at 2:21 PM, Ted Dunning ted.dunn...@gmail.com wrote: In general, writing this sort of layer on top of ZK is very, very hard to get really right for general use.
In a simple use-case, you can probably nail it but distributed systems are a Zoo, to coin a phrase. The problem is that you are fundamentally changing the metaphors in use so assumptions can come unglued or be introduced pretty easily. One example of this is the fact that ZK watches *don't* fire for every change but when you write listener oriented code, you kind of expect that they will. That makes it really, really easy to introduce that assumption in the heads of the programmer using the event listener library on top of ZK. Another example is how the atomic get content/set watch call works in ZK is easy to violate in an event driven architecture because the thread that watches ZK probably resets the watch. If you assume that the listener will read the data, then you have introduced a timing mismatch between the read of the data and the resetting of the watch. That might be OK or it might not be. The point is that these changes are subtle and tricky to get exactly right. On Tue, May 4, 2010 at 1:48 PM, Jonathan Holloway jonathan.hollo...@gmail.com wrote: Is there any reason why this isn't part of the Zookeeper trunk already?
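The "re-register the watch" FAQ pattern that both Adam and Ted touch on can be sketched in a few lines. ZooKeeper watches are one-shot: after a watch fires you must set it again or you stop hearing about subsequent changes. The interface below is a stand-in for the real client, so this is an illustration of the pattern only — and note it exhibits exactly the subtlety Ted describes: changes made between the watch firing and the re-registration are seen only as the re-read data, never as individual events.

```java
import java.util.function.Consumer;

// Stand-in for the real client: reads current data and arms a one-shot
// watch that fires on the next change.
interface WatchSource {
    byte[] getDataAndWatch(String path, Runnable watcher);
}

final class Rewatcher {
    // Keeps a watch permanently armed by re-registering from inside the
    // notification callback, then delivering the freshly read data.
    static void watchForever(WatchSource zk, String path, Consumer<byte[]> onData) {
        byte[] data = zk.getDataAndWatch(path, () -> watchForever(zk, path, onData));
        onData.accept(data);
    }
}
```

Packaging this as well-documented helper code, with its assumptions spelled out, is exactly the kind of contrib module the thread is asking for.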
Re: Dynamic adding/removing ZK servers on client
Hi Dave, Just a question on how you see it being used: who would call addserver and removeserver? It does seem useful to be able to do this. This is definitely worth working on. You can link it as a subtask of ZOOKEEPER-107. Thanks mahadev On 5/3/10 7:03 AM, Dave Wright wrig...@gmail.com wrote: I've got a situation where I essentially need dynamic cluster membership, which has been talked about in ZOOKEEPER-107 but doesn't look like it's going to happen any time soon. For now, I'm planning on working around this by having a simple coordinator service on the server nodes that will re-write the configs and bounce the servers when membership changes. Clients may get an error or two and need to reconnect, but that should be handled by the normal error logic. On the client side, I'd really like to dynamically update the server list w/o having to re-create the entire Zookeeper object. Looking at the code, it seems like it would be pretty trivial to add RemoveServer()/AddServer() functions for Zookeeper that call down to ClientCnxn, where the servers are just maintained in a list. Of course if the server being removed is the one currently connected, we'd need to disconnect, but a simple call to disconnect() seems like it would resolve that and trigger the automatic re-connection logic. Does anyone see an issue with that approach? Were I to create the patch, do you think it would be interesting enough to merge? It seems like that functionality will eventually be needed for whatever full dynamic server support is eventually implemented. -Dave Wright
Re: Dynamic adding/removing ZK servers on client
Yeah, that was one of the ideas. I think it's been on the jira somewhere (I forget)... but it would definitely be one solution for it. Thanks mahadev On 5/3/10 2:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: Should this be a znode in the privileged namespace? On Mon, May 3, 2010 at 1:45 PM, Dave Wright wrig...@gmail.com wrote: Hi Dave, Just a question on how you see it being used: who would call addserver and removeserver? It does seem useful to be able to do this. This is definitely worth working on. You can link it as a subtask of ZOOKEEPER-107. In my case, it would be my client application - I would get a notification (probably via a watched ZK node controlled by my manager process) that the cluster membership was changing, and I'd adjust the client server list accordingly. -Dave
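Dave's proposed client-side change can be sketched as a small thread-safe wrapper around the server list: removing the currently connected server triggers a disconnect so the normal reconnect logic picks a survivor. This is an illustration of the idea being discussed, not the actual ClientCnxn code — the class and its hooks are made up.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of client-side AddServer()/RemoveServer(): a mutable,
// thread-safe server list plus a hook into the existing reconnect logic.
final class DynamicServerList {
    private final CopyOnWriteArrayList<String> servers = new CopyOnWriteArrayList<>();
    private volatile String connected;      // host:port currently in use
    private final Runnable disconnect;      // stand-in for ClientCnxn's disconnect()

    DynamicServerList(Runnable disconnect) { this.disconnect = disconnect; }

    void addServer(String hostPort) { servers.addIfAbsent(hostPort); }

    void removeServer(String hostPort) {
        servers.remove(hostPort);
        if (hostPort.equals(connected)) {
            disconnect.run();               // forces reconnection to a remaining server
        }
    }

    void markConnected(String hostPort) { connected = hostPort; }

    List<String> snapshot() { return List.copyOf(servers); }
}
```

Tying removeServer() to a watched znode, as the thread suggests, would let a manager process drive membership changes for every client.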
Re: Question on maintaining leader/membership status in zookeeper
Hi Lei, In this case, the Leader will be disconnected from the ZK cluster and will give up its leadership. Since it's disconnected, the ZK cluster will realize that the Leader is dead! When the ZK cluster realizes that the Leader is dead (this is because the zk cluster hasn't heard from the Leader for a certain time, configurable via the session timeout parameter), the slaves will be notified of this via watchers in the zookeeper cluster. The slaves will realize that the Leader is gone, will elect a new Leader, and will start working with the new Leader. Does that answer your question? You might want to look through the documentation of ZK to understand its use cases and how it solves these kinds of issues. Thanks mahadev On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote: Thank you all for your answers. It clarifies a lot of my confusion about the service guarantees of ZK. I am still struggling with one failure case (I am not trying to be the pain in the neck. But I need to have a full understanding of what ZK can offer before I make a decision on whether to use it in my cluster.) Assume the following topology: Leader ZK cluster \\// \\ // \\ // Slave(s) If there is an asymmetric network failure such that the connections between the Leader and Slave(s) are broken while all other connections are still alive, would my system hang after some point? Because no new leader election will be initiated by slaves and the leader can't get the work to slave(s). Thanks, Lei On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote: If one of your user clients can no longer reach one member of the ZK cluster, then it will try to reach another. If it succeeds, then it will continue without any problems as long as the ZK cluster itself is OK. This applies for all the ZK recipes. You will have to be a little bit careful to handle connection loss, but that should get easier soon (and isn't all that difficult anyway).
On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote: I am not talking about the leader election within zookeeper cluster. I guess I didn't make the discussion context clear. In my case, I run a cluster that uses zookeeper for doing the leader election. Yes, nodes in my cluster are the clients of zookeeper. Those nodes depend on zookeeper to elect a new leader and figure out what the current leader is. So if the zookeeper (think of it as a stand-alone entity) becomes unavailabe in the way I've described earlier, how can I handle such situation so my cluster can still function while a majority of nodes still connect to each other (but not to the zookeeper)?
Re: Question on maintaining leader/membership status in zookeeper
Hi Lei, Sorry, I misinterpreted your question! The scenario you describe could be handled in such a way: you could have a status node in ZooKeeper which every slave will subscribe to and update! If one of the slave nodes sees that there have been too many connections refused to the Leader by the slaves, the slave could go ahead and delete the Leader znode, and force the Leader to give up its leadership. I am not describing a detailed way to do it, but it's not very hard to come up with a design for this. Do you intend to have the Leader and Slaves in different network (different ACLs I mean) protected zones? In that case it is a legitimate concern; otherwise I do think an asymmetric network partition would be very unlikely to happen. Do you usually see network partitions in such scenarios? Thanks mahadev On 4/30/10 4:05 PM, Lei Gao l...@linkedin.com wrote: Hi Mahadev, Why would the leader be disconnected from ZK? ZK is fine communicating with the leader in this case. We are talking about asymmetric network failure. Yes, the Leader could consider all the slaves being down if it tracks the status of all slaves himself. But I guess if ZK is used for membership management, neither the leader nor the slaves will be considered disconnected because they can all connect to ZK. Thanks, Lei On 4/30/10 3:47 PM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Lei, In this case, the Leader will be disconnected from the ZK cluster and will give up its leadership. Since it's disconnected, the ZK cluster will realize that the Leader is dead! When the ZK cluster realizes that the Leader is dead (this is because the zk cluster hasn't heard from the Leader for a certain time, configurable via the session timeout parameter), the slaves will be notified of this via watchers in the zookeeper cluster. The slaves will realize that the Leader is gone, will elect a new Leader, and will start working with the new Leader. Does that answer your question?
You might want to look though the documentation of ZK to understand its use case and how it solves these kind of issues Thanks mahadev On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote: Thank you all for your answers. It clarifies a lot of my confusion about the service guarantees of ZK. I am still struggling with one failure case (I am not trying to be the pain in the neck. But I need to have a full understanding of what ZK can offer before I make a decision on whether to used it in my cluster.) Assume the following topology: Leader ZK cluster \\// \\ // \\ // Slave(s) If I am asymmetric network failure such that the connection between Leader and Slave(s) are broken while all other connections are still alive, would my system hang after some point? Because no new leader election will be initiated by slaves and the leader can't get the work to slave(s). Thanks, Lei On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote: If one of your user clients can no longer reach one member of the ZK cluster, then it will try to reach another. If it succeeds, then it will continue without any problems as long as the ZK cluster itself is OK. This applies for all the ZK recipes. You will have to be a little bit careful to handle connection loss, but that should get easier soon (and isn't all that difficult anyway). On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote: I am not talking about the leader election within zookeeper cluster. I guess I didn't make the discussion context clear. In my case, I run a cluster that uses zookeeper for doing the leader election. Yes, nodes in my cluster are the clients of zookeeper. Those nodes depend on zookeeper to elect a new leader and figure out what the current leader is. 
So if the zookeeper (think of it as a stand-alone entity) becomes unavailabe in the way I've described earlier, how can I handle such situation so my cluster can still function while a majority of nodes still connect to each other (but not to the zookeeper)?
Re: Question on maintaining leader/membership status in zookeeper
Hi Lei, ZooKeeper provides a set of primitives which allow you to do all kinds of things! You might want to take a look at the api and some examples of zookeeper recipes to see how it works; probably that will clear things up for you. Here is the link: http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html Thanks mahadev On 4/30/10 4:46 PM, Lei Gao l...@linkedin.com wrote: Hi Mahadev, First of all, I'd like to thank you for being patient with me - my questions seem unclear to many of you who try to help me. I guess clients have to be smart enough to trigger a new leader election by trying to delete the znode. But in this case, ZK should not allow any single or multiple (as long as they are less than a quorum) client(s) to delete the znode corresponding to the master, right? A new consensus among clients (NOT among the nodes in the zk cluster) has to be there for the znode to be deleted, right? Does zk have this capability or do the clients have to come to this consensus outside of zk before trying to delete the znode in zk? Thanks, Lei Hi Lei, Sorry, I misinterpreted your question! The scenario you describe could be handled in such a way: you could have a status node in ZooKeeper which every slave will subscribe to and update! If one of the slave nodes sees that there have been too many connections refused to the Leader by the slaves, the slave could go ahead and delete the Leader znode, and force the Leader to give up its leadership. I am not describing a detailed way to do it, but it's not very hard to come up with a design for this. Do you intend to have the Leader and Slaves in different network (different ACLs I mean) protected zones? In that case it is a legitimate concern; otherwise I do think an asymmetric network partition would be very unlikely to happen. Do you usually see network partitions in such scenarios? Thanks mahadev On 4/30/10 4:05 PM, Lei Gao l...@linkedin.com wrote: Hi Mahadev, Why would the leader be disconnected from ZK?
ZK is fine communicating with the leader in this case. We are talking about asymmetric network failure. Yes. Leader could consider all the slaves being down if it tracks the status of all slaves himself. But I guess if ZK is used for for membership management, neither the leader nor the slaves will be considered disconnected because they can all connect to ZK. Thanks, Lei On 4/30/10 3:47 PM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Lei, In this case, the Leader will be disconnected from ZK cluster and will give up its leadership. Since its disconnected, ZK cluster will realize that the Leader is dead! When Zk cluster realizes that the Leader is dead (this is because the zk cluster hasn't heard from the Leader for a certain time Configurable via session timeout parameter), the slaves will be notified of this via watchers in zookeeper cluster. The slaves will realize that the Leader is gone and will relect a new Leader and will start working with the new Leader. Does that answer your question? You might want to look though the documentation of ZK to understand its use case and how it solves these kind of issues Thanks mahadev On 4/30/10 2:08 PM, Lei Gao l...@linkedin.com wrote: Thank you all for your answers. It clarifies a lot of my confusion about the service guarantees of ZK. I am still struggling with one failure case (I am not trying to be the pain in the neck. But I need to have a full understanding of what ZK can offer before I make a decision on whether to used it in my cluster.) Assume the following topology: Leader ZK cluster \\// \\ // \\ // Slave(s) If I am asymmetric network failure such that the connection between Leader and Slave(s) are broken while all other connections are still alive, would my system hang after some point? Because no new leader election will be initiated by slaves and the leader can't get the work to slave(s). 
Thanks, Lei On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com wrote: If one of your user clients can no longer reach one member of the ZK cluster, then it will try to reach another. If it succeeds, then it will continue without any problems as long as the ZK cluster itself is OK. This applies for all the ZK recipes. You will have to be a little bit careful to handle connection loss, but that should get easier soon (and isn't all that difficult anyway). On Fri, Apr 30, 2010 at 1:26 PM, Lei Gao l...@linkedin.com wrote: I am not talking about the leader election within zookeeper cluster. I guess I didn't make the discussion context clear. In my case, I run a cluster that uses zookeeper for doing the leader election. Yes, nodes in my cluster are the clients of zookeeper. Those nodes depend on zookeeper to elect
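The recipes document linked in this thread describes leader election via ephemeral sequential znodes: each candidate creates an ephemeral sequential node under an election parent, the candidate holding the lowest sequence number is the leader, and the others watch their immediate predecessor (not the leader itself) to avoid a herd effect. A sketch of just the selection rule — the node names and class below are illustrative, not ZooKeeper API:

```java
import java.util.Comparator;
import java.util.List;

// Illustrative selection rule from the leader-election recipe: given names
// like "n_0000000007" (as ZooKeeper appends to sequential znodes), the
// candidate with the lowest sequence number is the leader.
final class Election {
    static String leader(List<String> candidates) {
        return candidates.stream()
                .min(Comparator.comparingLong(Election::seq))
                .orElseThrow(IllegalArgumentException::new);
    }

    static long seq(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('_') + 1));
    }
}
```

Because the znodes are ephemeral, the leader's node vanishes when its session expires, which is what triggers re-election in the failure scenarios discussed above.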
Re: Embedding ZK in another application
We do set that Chad but it doesn't seem to help on some systems (especially bsd)... Thanks mahadev On 4/29/10 11:22 AM, Chad Harrington chad.harring...@gmail.com wrote: On Thu, Apr 29, 2010 at 8:49 AM, Patrick Hunt ph...@apache.org wrote: This is not foolproof however. We found that in general this would work, however there were some infrequent cases where a restarted server would fail to initialize due to the following issue: it is possible for the process to complete before the kernel has released the associated network resource, and this port cannot be bound to another process until the kernel has decided that it is done. more detail here: http://hea-www.harvard.edu/~fine/Tech/addrinuse.html as a result we ended up changing the test code to start each test with new quorum/election port numbers. This fixed the problem for us but would not be a solution in your case. Patrick I am not an expert at all on this, but I have used SO_REUSEADDR in other situations to avoid the address in use problem. Would that help here? Chad Harrington chad.harring...@gmail.com On 04/29/2010 07:13 AM, Vishal K wrote: Hi Ted, We want the application that embeds the ZK server to be running even after the ZK server is shutdown. So we don't want to restart the application. Also, we prefer not to use zkServer.sh/zkServer.cmd because these are OS dependent (our application will run on Win as well as Linux). Instead, we thought that calling QuorumPeerMain.initializeAndRun() and QuorumPeerMain.shutdown() will suffice to start and shutdown a ZK server and we won't have to worry about checking the OS. Is there way to cleanly shutdown the ZK server (by invoking ZK server API) when it is embedded in the application without actually restarting the application process? Thanks. On Thu, Apr 29, 2010 at 1:54 AM, Ted Dunningted.dunn...@gmail.com wrote: Hmmm it isn't quite clear what you mean by restart without restarting. Why is killing the server and restarting it not an option? 
It is common to do a rolling restart on a ZK cluster. Just restart one server at a time. This is often used during system upgrades. On Wed, Apr 28, 2010 at 8:22 PM, Vishal Kvishalm...@gmail.com wrote: What is a good way to restart a ZK server (standalone and quorum) without having to restart it? Currently, I have ZK server embedded in another java application.
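Chad's SO_REUSEADDR suggestion looks like this in Java: the option must be set before bind(), and it lets a restarted server rebind its port while the old socket lingers in TIME_WAIT. As Mahadev notes above, whether this fully avoids the restart problem is system-dependent (especially on BSD), so treat this as a sketch of the technique rather than a guaranteed fix; the helper class name is made up.

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: bind a listening socket with SO_REUSEADDR enabled,
// so a quick restart can rebind the same port despite TIME_WAIT remnants.
final class ReusableBind {
    static ServerSocket bind(int port) throws Exception {
        ServerSocket ss = new ServerSocket();   // unbound
        ss.setReuseAddress(true);               // must be set before bind()
        ss.bind(new InetSocketAddress(port));
        return ss;
    }
}
```

The alternative the test code took — allocating fresh quorum/election port numbers per test — sidesteps the kernel behavior entirely, at the cost of not exercising real restarts on a fixed port.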
Re: Zookeeper client
Hi Avinash, The zk client does maintain liveness information itself and also randomizes the list of servers to balance the number of clients connected to a single ZooKeeper server. Hope that helps. Thanks mahadev On 4/27/10 10:56 AM, Avinash Lakshman avinash.laksh...@gmail.com wrote: Let's assume I have 100 clients connecting to a cluster of 5 Zookeeper servers over time. On the client side I instantiate a ZooKeeper instance and use it whenever I need to read/write into ZK. Now I know I can pass in a connect string with the list of all servers that make up the ZK cluster. Does the ZK client automatically maintain liveness information and load balance my connections across the machines? How can I do this effectively? I basically want to spread the connections from the 100 clients to the 5 ZK instances effectively. Thanks Avinash
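The load balancing Mahadev describes happens client-side: the ZooKeeper client shuffles the hosts parsed from the connect string, so many clients given the same string spread across the ensemble. A rough, library-free Java sketch of that idea (the helper is hypothetical, not the actual client API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ConnectStringShuffle {
    // Hypothetical helper: split a ZooKeeper-style connect string into
    // host:port entries and randomize their order. The real client does
    // something similar internally before picking a server to try.
    static List<String> shuffledHosts(String connectString) {
        List<String> hosts =
            new ArrayList<>(Arrays.asList(connectString.split(",")));
        Collections.shuffle(hosts);
        return hosts;
    }

    public static void main(String[] args) {
        String cs = "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181";
        System.out.println(shuffledHosts(cs));
    }
}
```

On connection loss the client walks this list looking for a live server, which is where the liveness handling Mahadev mentions comes in.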
Re: Embedding ZK in another application
Hi Vishal and Asankha, I think Ted and Pat had somewhat commented on this before. Reiterating these comments below. If you are OK with these points, I see no concern in running ZooKeeper as an embedded application. Also, as Pat mentioned earlier, there are some cases where the server code will call System.exit. This is typically only if quorum communication fails in some weird, unrecoverable way. We have removed most of these but there are a few still remaining. --- Comments by Ted I can't comment on the details of your code (but I have run in-process ZK's in the past without problem) Operationally, however, this isn't a great idea. The problem is two-fold: a) firstly, somebody would probably like to look at Zookeeper to understand the state of your service. If the service is down, then ZK will go away. That means that Zookeeper can't be used that way and is mild to moderate on the logarithmic international suckitude scale. b) secondly, if you want to upgrade your server without upgrading Zookeeper then you still have to bounce Zookeeper. This is probably not a problem, but it can be a slight pain. c) thirdly, you can't scale your service independently of how you scale Zookeeper. This may or may not bother you, but it would bother me. d) fourthly, you will be synchronizing your server restarts with ZK's service restarts. Moving these events away from each other is likely to make them slightly more reliable. There is no failure mode that I know of that would be tickled here, but your service code will be slightly more complex since it has to make sure that ZK is up before it does stuff. If you could make the assumption that ZK is up or exit, that would be simpler. e) yes, I know that is more than two issues. That is itself an issue since any design where the number of worries is increasing so fast is suspect on larger grounds. If there are small problems cropping up at that rate, the likelihood of there being a large problem that comes up seems higher.
On 4/23/10 11:04 AM, Vishal K vishalm...@gmail.com wrote: Hi, Good question. We are planning to do something similar as well and it will be great to know if there are any issues with embedding a ZK server into an app. We simply use QuorumPeerMain and QuorumPeer from our app to start/stop the ZK server. Is this not a good way to do it? On Fri, Apr 23, 2010 at 1:28 PM, Asankha C. Perera asan...@apache.org wrote: Hi All I'm very new to ZK, and am looking at embedding ZK into an app that needs cluster management - and the objective is to use ZK to notify application cluster control operations (e.g. shutdown etc) across nodes. I came across this post [1] from the user list by Ted Dunning from some months back : My experience with Katta has led me to believe that embedding a ZK in a product is almost always a bad idea. - The problems are that you can't administer the Zookeeper cluster independently and that the cluster typically goes down when the associated service goes down. However, I believe that both the above are fine to live with for the application under consideration, as ZK will be used only to coordinate the larger application. Is there anything else that needs to be considered - and can I safely shutdown the clientPort since the application is always in the same JVM - but, if I do that how would I connect to ZK thereafter ? thanks and regards asankha [1] http://markmail.org/message/tjonwec7p7dhfpms
Re: Embedding ZK in another application
That's true! Thanks mahadev On 4/23/10 11:41 AM, Asankha C. Perera asan...@apache.org wrote: Hi Mahadev I think Ted and Pat had somewhat commented on this before. Reiterating these comments below. If you are ok with these points I see no concern in ZooKeeper as an embedded application... Thanks, I missed this on the archives, and it helps!.. I guess if we still decide to embed, the only way to connect to ZK is still with the normal TCP client.. cheers asankha
Re: bug: wrong heading in recipes doc
I think we should be using zookeeper locks to create jiras :) . Looks like both of you created one!!! :) Thanks mahadev On 4/22/10 1:37 PM, Patrick Hunt ph...@apache.org wrote: No problem. https://issues.apache.org/jira/browse/ZOOKEEPER-752 I've seen a lot of traffic on infrastruct...@apache, you might try there, I'm sure they could help you out. Regards, Patrick On 04/22/2010 01:26 PM, Adam Rosien wrote: I would, but the Apache JIRA has been f***ed since the break-in and I can't reset my password. Would you mind adding it for me? .. Adam On Thu, Apr 22, 2010 at 11:32 AM, Patrick Hunt ph...@apache.org wrote: Hi Adam, would you mind creating a JIRA? That's the best way to address this type of issue. Thanks! https://issues.apache.org/jira/browse/ZOOKEEPER Patrick On 04/22/2010 11:30 AM, Adam Rosien wrote: http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html#sc_recoverableSharedLocks uses the heading recoverable locks, but the text refers to revocable. .. Adam
Re: odd error message
Ok, I think this is possible. So here is what happens currently. This has been a long standing bug and should be fixed in 3.4 https://issues.apache.org/jira/browse/ZOOKEEPER-335 A newly elected leader currently doesn't log the new leader transaction to its database. In your case, the follower (the 3rd server) did log it but the leader never did. Now when you brought up the 3rd server it had the transaction log present but the leader did not have it. In that case the 3rd server cried foul and shut down. Removing the DB is totally fine. For now, we should update our docs on 3.3 and mention that this problem might occur during upgrade, and fix it in 3.4. Thanks for bringing it up Ted. Thanks mahadev On 4/20/10 2:14 PM, Ted Dunning ted.dunn...@gmail.com wrote: We have just done an upgrade of ZK to 3.3.0. Previous to this, ZK has been up for about a year with no problems. On two nodes, we killed the previous instance and started the 3.3.0 instance. The first node was a follower and the second a leader. All went according to plan and no clients seemed to notice anything. The stat command showed connections moving around as expected and all other indicators were normal. When we did the third node, we saw this in the log: 2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] - Leader epoch 18 is less than our epoch 19 The third node refused all connections. We brought down the third node, wiped away its snapshot, restarted and it joined without complaint. Note that the third node was originally a follower and had never been a leader during the upgrade process. Does anybody know why this happened? We are fully upgraded and there was no interruption to normal service, but this seems strange.
Re: Recovery issue - how to debug?
Hi Hao, As Vishal already asked, how are you determining if the writes are being received? Also, what was the status of C2 when you checked for these writes? Do you have the output of echo stat | nc localhost port? How long did you wait when you say that C2 did not receive the writes? What was the status of C2 (again echo stat | nc localhost port) when you saw that C2 had received the writes? Thanks mahadev On 4/18/10 10:54 PM, Dr Hao He h...@softtouchit.com wrote: I have a zookeeper cluster E1 with 3 nodes A, B, and C. I stopped C and did some writes on E1. Both A and B received the writes. I then started C and after a short while, C also received the writes. All seemed to go well, so I replicated the setup to another cluster E2 with exactly 3 nodes: A2, B2, and C2. I stopped C2 and did some writes on E2. A2 received the writes. I then started C2. However, no matter how long I wait, C2 never received the writes. I then did more writes on E2. Then C2 received all the writes, including the old writes from when it was down. How do I find out what was wrong with the E2 setup? I am running 3.2.2 on all nodes. Regards, Dr Hao He XPE - the truly SOA platform h...@softtouchit.com http://softtouchit.com
Re: rolling upgrade 3.2.1 - 3.3.0
Hi Charity, Looks like you are hitting a bug recently found in 3.3.0. https://issues.apache.org/jira/browse/ZOOKEEPER-737 is the bug, wherein the server does not show the right status. Looks like in your case the server is running fine but bin/zkServer.sh status is not returning the right result. You can try telnet localhost port and then type stat to get the status on the server. This bug will be fixed in the bug fix release 3.3.1 which most probably will be released by next week or so. Thanks mahadev On 4/14/10 3:59 PM, Charity Majors char...@shopkick.com wrote: Hi. I'm trying to upgrade a zookeeper cluster from 3.2.1 to 3.3.0, and having problems. I can't get a 3.3.0 node to successfully join the cluster and stay joined. If I run zkServer.sh status immediately after starting up the newly upgraded node, it says the service is probably not running, and shows me this: [char...@test-zookeeper001 zookeeper-current]$ bin/zkServer.sh status JMX enabled by default Using config: /services/zookeeper/zookeeper-20100412.1/bin/../conf/zoo.cfg 2010-04-14 22:47:35,574 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - Accepted socket connection from /127.0.0.1:40287 2010-04-14 22:47:35,576 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@968] - Processing stat command from /127.0.0.1:40287 2010-04-14 22:47:35,577 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2010-04-14 22:47:35,578 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket connection for client /127.0.0.1:40287 (no session established for client) Error contacting service. It is probably not running.
[char...@test-zookeeper001 zookeeper-current]$ 2010-04-14 22:47:35,580 - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1310] - ignoring exception during input shutdown java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306) at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) If I connect with zkCli.sh, I can list the contents of zookeeper. If I make changes to the schema on either of the other two nodes, test-zookeeper002 and test-zookeeper003, both of which are running 3.2.1, the changes are reflected on test-zookeeper001, which is running 3.3.0. When I exit zkCli.sh, however, zkServer.sh status starts flapping between Error contacting service. It is probably not running. and Mode: follower, as you can see below. Any ideas? I'd really rather not have to take the production zookeeper cluster down to upgrade if it's not necessary. Thanks, Charity. 
[char...@test-zookeeper001 zookeeper-current]$ bin/zkServer.sh status JMX enabled by default Using config: /services/zookeeper/zookeeper-20100412.1/bin/../conf/zoo.cfg 2010-04-14 22:53:16,848 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioservercnxn$fact...@251] - Accepted socket connection from /127.0.0.1:55284 2010-04-14 22:53:16,849 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@968] - Processing stat command from /127.0.0.1:55284 2010-04-14 22:53:16,849 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2010-04-14 22:53:16,850 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1286] - Closed socket connection for client /127.0.0.1:55284 (no session established for client) Error contacting service. It is probably not running. 2010-04-14 22:53:16,850 - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:nioserverc...@1310] - ignoring exception during input shutdown java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1306) at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1263) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:609) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
Re: feed queue fetcher with hadoop/zookeeper/gearman?
Hi Thomas, There are a couple of projects inside Yahoo! that use ZooKeeper as an event manager for feed processing. I am a little bit unclear on your example below. As I understand it: 1. There are 1 million feeds that will be stored in HBase. 2. A map reduce job will be run on these feeds to find out which feeds need to be fetched. 3. This will create queues in ZooKeeper to fetch the feeds. 4. Workers will pull items from this queue and process feeds. Did I understand it correctly? Also, if the above is the case, how many queue items would you anticipate accumulating every hour? Thanks mahadev On 4/12/10 1:21 AM, Thomas Koch tho...@koch.ro wrote: Hi, I'd like to implement a feed loader with Hadoop and most likely HBase. I've got around 1 million feeds that should be loaded and checked for new entries. However the feeds have different priorities based on their average update frequency in the past and their relevance. The feeds (url, last_fetched timestamp, priority) are stored in HBase. How could I implement the fetch queue for the loaders? - An hourly map-reduce job to produce new queues for each node and save them on the nodes? - but how to know, which feeds have been fetched in the last hour? - what to do, if a fetch node dies? - Store a fetch queue in zookeeper and add to the queue with map-reduce each hour? - Isn't that too much load for zookeeper? (I could make one znode for a bunch of urls...?) - Use gearman to store the fetch queue? - But the gearman job server still seems to be a SPOF [1] http://gearman.org Thank you! Thomas Koch, http://www.koch.ro
Re: Errors while running sytest
Great. I was just responding with a different solution: --- Looks like the fatjar does not include the junit classes. Also, the -jar option does not use the classpath environment variable. Here is an excerpt from the man page of java: -jar Execute a program encapsulated in a JAR archive. The first argument is the name of a JAR file instead of a startup class name. In order for this option to work, the manifest of the JAR file must When you use this option, the JAR file is the source of all user classes, and other user class path settings are ignored. So you will have to use the main class in fatjar with the java -classpath option with all the libraries in the classpath. java -cp log4j:junit:fatjar org.apache.zookeeper.util.FatJarMain server ... But putting it in build and including it as part of fatjar is much more convenient!!! Thanks mahadev On 4/7/10 1:09 PM, Vishal K vishalm...@gmail.com wrote: Hi, It works for me now. Just for the record, I had to copy junit*.jar to build/lib because fat.jar expects it to be there. Then, I had to rebuild fatjar.jar. On Wed, Apr 7, 2010 at 12:10 AM, Vishal K vishalm...@gmail.com wrote: Hi, I am trying to run systest on a 3 node cluster ( http://svn.apache.org/repos/asf/hadoop/zookeeper/trunk/src/java/systest/README.txt ). When I reach the 4th step which is to actually run the test I get the exception shown below.
Exception in thread main java.lang.NoClassDefFoundError: junit/framework/TestCase at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632) at java.lang.ClassLoader.defineClass(ClassLoader.java:616) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) at org.apache.zookeeper.util.FatJarMain.main(FatJarMain.java:97) Caused by: java.lang.ClassNotFoundException: junit.framework.TestCase at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 15 more Looks like it is not able to find classes in junit. However, my classpath is set right: :/opt/zookeeper-3.3.0/zookeeper.jar:/opt/zookeeper-3.3.0/lib/junit-4.4.jar:/opt/zookeeper-3.3.0/lib/log4j-1.2.15.jar:/opt/zookeeper-3.3.0/build/test/lib/junit-4.8.1.jar Any suggestions how I can get around this problem? Thanks.
Re: deleting a node - command line tool
Hi Karthik, You can use bin/zkCli.sh which provides a nice command line shell interface for executing commands. Thanks mahadev On 3/26/10 9:42 AM, Karthik K oss@gmail.com wrote: Hi - I am looking to delete a node (say, /katta) from a running zk ensemble altogether and curious if there is any command-line tool that is available that can do a delete. -- Karthik.
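For reference, a hedged sketch of such a zkCli.sh session (the host, port, and paths are made up; note that delete only removes an empty znode, so any children have to be deleted first):

```
# Connect to one server of the ensemble
bin/zkCli.sh -server localhost:2181

# Inside the shell: inspect, then delete children before the parent,
# since delete refuses to remove a znode that still has children
ls /katta
delete /katta/some-child
delete /katta
```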
Re: Zookeeper unit tester?
Hi David, We don't really have a mock/test ZooKeeper client which does not do any I/O. We have been thinking about using mockito sometime soon for this kind of testing, but currently there is none. Thanks mahadev On 3/9/10 2:23 PM, David Rosenstrauch dar...@darose.net wrote: Just wondering if there was a mock/fake version of org.apache.zookeeper.Zookeeper that could be used for unit testing? What I'm envisioning would be a single instance Zookeeper that operates completely in memory, with no network or disk I/O. This would make it possible to pass one of these memory-only FakeZookeepers into unit tests, while using a real Zookeeper in production code. Any such animal? :-) Thanks, DR
Re: Managing multi-site clusters with Zookeeper
Hi Martin, The results would be really nice information to have on the ZooKeeper wiki. Would be very helpful for others considering the same kind of deployment. So, do send out your results on the list. Thanks mahadev On 3/8/10 11:18 AM, Martin Waite waite@googlemail.com wrote: Hi Patrick, Thanks for your input. I am planning on having 3 zk servers per data centre, with perhaps only 2 in the tie-breaker site. The traffic between zk and the applications will be lots of local reads ("which is the primary database?"). Changes to the config will be rare (server rebuilds, etc - ie. planned changes) or caused by server / network / site failure. The interesting thing in my mind is how zookeeper will cope with inter-site link failure - how quickly the remote sites will notice, and how quickly normality can be resumed when the link reappears. I need to get this running in the lab and start pulling out wires. regards, Martin On 8 March 2010 17:39, Patrick Hunt ph...@apache.org wrote: IMO latency is the primary issue you will face, but also keep in mind reliability w/in a colo. Say you have 3 colos (obv can't be 2), if you only have 3 servers, one in each colo, you will be reliable but clients w/in each colo will have to connect to a remote colo if the local fails. You will want to prioritize the local colo given that reads can be serviced entirely local that way. If you have 7 servers (2-2-3) that would be better - if a local server fails you have a redundant one, if both fail then you go remote. You want to keep your writes as few as possible and as small as possible? Why? Say you have 100ms latency btw colos, let's go through a scenario for a client in a colo where the local servers are not the leader (zk cluster leader).
read: 1) client reads a znode from local server 2) local server (usually 1ms if in colo comm) responds in 1ms write: 1) client writes a znode to local server A 2) A proposes change to the ZK Leader (L) in remote colo 3) L gets the proposal in 100ms 4) L proposes the change to all followers 5) all followers (not exactly, but hopefully) get the proposal in 100ms 6) followers ack the change 7) L gets the acks in 100ms 8) L commits the change (message to all followers) 9) A gets the commit in 100ms 10) A responds to client (1ms) write latency: 100 + 100 + 100 + 100 = 400ms Obviously keeping these writes small is also critical. Patrick Martin Waite wrote: Hi Ted, If the links do not work for us for zk, then they are unlikely to work with any other solution - such as trying to stretch Pacemaker or Red Hat Cluster with their multicast protocols across the links. If the links are not good enough, we might have to spend some more money to fix this. regards, Martin On 8 March 2010 02:14, Ted Dunning ted.dunn...@gmail.com wrote: If you can stand the latency for updates then zk should work well for you. It is unlikely that you will be able to do better than zk does and still maintain correctness. Do note that you can probably bias the client to use a local server. That should make things more efficient. Sent from my iPhone On Mar 7, 2010, at 3:00 PM, Mahadev Konar maha...@yahoo-inc.com wrote: The inter-site links are a nuisance. We have two data-centres with 100Mb links which I hope would be good enough for most uses, but we need a 3rd site - and currently that only has 2Mb links to the other sites. This might be a problem.
Re: Managing multi-site clusters with Zookeeper
Hi Martin, As Ted rightly mentions, ZooKeeper is usually run within a colo because of the low latency requirements of the applications that it supports. It's definitely reasonable to use it in multi data center environments, but you should realize the implications of it. The high latency/low throughput means that you should make minimal use of such a ZooKeeper ensemble. Also, there are things like the tickTime, the syncLimit and others (setup parameters for ZooKeeper in config) which you will need to tune a little to get ZooKeeper running without many hiccups in this environment. Thanks mahadev On 3/6/10 10:29 AM, Ted Dunning ted.dunn...@gmail.com wrote: What you describe is relatively reasonable, even though Zookeeper is not normally distributed across multiple data centers with all members getting full votes. If you account for the limited throughput that this will impose on your applications that use ZK, then I think that this can work well. Probably, you would have local ZK clusters for higher transaction rate applications. You should also consider very carefully whether having multiple data centers increases or decreases your overall reliability. Unless you design very carefully, this will normally substantially degrade reliability. Making sure that it increases reliability is a really big task that involves a lot of surprising (it was to me) considerations and considerable hardware and time investments. Good luck! On Sat, Mar 6, 2010 at 1:50 AM, Martin Waite waite@googlemail.com wrote: Is this a viable approach, or am I taking Zookeeper out of its application domain and just asking for trouble ?
Re: Managing multi-site clusters with Zookeeper
Martin, A 2Mb link might certainly be a problem. We refer to these nodes as ZooKeeper servers; znodes refers to the data elements in the ZooKeeper data tree. The Zookeeper ensemble has minimal traffic, which is basically health checks between the members of the ensemble. We call the member leading the ensemble the Leader and the others Followers. The Leader does periodic health checks to see if the Followers are doing fine. This is of the order of 1KB/sec. There is some traffic when the leader election within the ensemble happens. This might be of the order of 1-2KB/sec. As you mentioned, the reads happen locally. So, a good enough link between the ensemble members is important so that the followers can be up to date with the Leader. But again, looking at your config, it looks like it's mostly read-only traffic. One more thing you should be aware of: let's say an ephemeral node was created and the client died; then the clients connected to the slow ZooKeeper server (with 2Mb/s links) would lag behind the clients connected to the other servers. In my opinion you should do some testing, since 2Mb/sec seems a little dodgy. Thanks mahadev On 3/7/10 2:09 PM, Martin Waite waite@googlemail.com wrote: Hi Mahadev, The inter-site links are a nuisance. We have two data-centres with 100Mb links which I hope would be good enough for most uses, but we need a 3rd site - and currently that only has 2Mb links to the other sites. This might be a problem. The ensemble would have a lot of read traffic from applications asking which database to connect to for each transaction - which presumably would be mostly handled by local zookeeper servers (do we call these nodes as opposed to znodes ?). The write traffic would be mostly changes to configuration (a rare event), and changes in the health of database servers - also hopefully rare. I suppose the main concern is how much ambient zookeeper system chatter will cross the links.
Are there any measurements of how much traffic is used by zookeeper in maintaining the ensemble ? Another question that occurs is whether I can link sites A,B, and C in a ring - so that if any one site drops out, the remaining 2 continue to talk. I suppose that if the zookeeper servers are all in direct contact with each other, this issue does not exist. regards, Martin On 7 March 2010 21:43, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Martin, As Ted rightly mentions, ZooKeeper is usually run within a colo because of the low latency requirements of the applications that it supports. It's definitely reasonable to use it in multi data center environments, but you should realize the implications of it. The high latency/low throughput means that you should make minimal use of such a ZooKeeper ensemble. Also, there are things like the tickTime, the syncLimit and others (setup parameters for ZooKeeper in config) which you will need to tune a little to get ZooKeeper running without many hiccups in this environment. Thanks mahadev On 3/6/10 10:29 AM, Ted Dunning ted.dunn...@gmail.com wrote: What you describe is relatively reasonable, even though Zookeeper is not normally distributed across multiple data centers with all members getting full votes. If you account for the limited throughput that this will impose on your applications that use ZK, then I think that this can work well. Probably, you would have local ZK clusters for higher transaction rate applications. You should also consider very carefully whether having multiple data centers increases or decreases your overall reliability. Unless you design very carefully, this will normally substantially degrade reliability. Making sure that it increases reliability is a really big task that involves a lot of surprising (it was to me) considerations and considerable hardware and time investments. Good luck!
On Sat, Mar 6, 2010 at 1:50 AM, Martin Waite waite@googlemail.com wrote: Is this a viable approach, or am I taking Zookeeper out of its application domain and just asking for trouble ?
Re: zookeeper utils
Hi David, There is an implementation for locks and queues in src/recipes. The documentation resides in src/recipes/{lock,queue}/README.txt. Thanks mahadev On 3/2/10 1:04 PM, David Rosenstrauch dar...@darose.net wrote: Was reading through the zookeeper docs on the web - specifically the recipes and solutions page (as well as comments elsewhere inviting additional such contributions from the community) and was wondering: Is there a library of higher-level zookeeper utilities that people have contributed, beyond the barrier and queue examples provided in the docs? Thanks, DR
Re: is there a good pattern for leases ?
I am not sure if I was clear enough in my last message. What I suggested was this: Create a client with a timeout of, let's say, 10 seconds! ZooKeeper zk = new ZooKeeper(...); (for brevity ignoring other parameters) zk.create("/parent/ephemeral", data, EPHEMERAL); // create another thread that triggers at 120 seconds On a trigger from this thread, call zk.delete("/parent/ephemeral"); That's how a lease can be done at the application side. Obviously your lease also expires on a session close and other events, which you need to be monitoring. Thanks mahadev On 2/24/10 11:09 AM, Martin Waite waite@googlemail.com wrote: Hi Mahadev, That is interesting. All I need to do is hold the connection for the required time of a session that created an ephemeral node. Zookeeper is an interesting tool. Thanks again, Martin On 24 February 2010 17:00, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Martin, There isn't an inherent model for leases in the zookeeper library itself. To implement leases you will have to implement them at your application side with timeout triggers (lease triggers) leading to session close at the client. Thanks mahadev On 2/24/10 3:40 AM, Martin Waite waite@googlemail.com wrote: Hi, Is there a good model for implementing leases in Zookeeper ? What I want to achieve is for a client to create a lock, and for that lock to disappear two minutes later - regardless of whether the client is still connected to zk. Like ephemeral nodes - but with a time delay. regards, Martin
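The timer-driven pattern above can be sketched without a live ensemble by standing a dummy delete action in for the real zk.delete call; everything here except the pattern itself (names, timings, the deleteAction callback) is made up for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LeaseSketch {
    // "Acquire" a lease: after creating the ephemeral node, schedule its
    // deletion after leaseMillis. With a real ensemble, deleteAction would
    // call zk.delete on the ephemeral node's path. Returns a latch that
    // opens once the lease has been released.
    static CountDownLatch acquireLease(long leaseMillis, Runnable deleteAction) {
        CountDownLatch released = new CountDownLatch(1);
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        timer.schedule(() -> {
            deleteAction.run();   // give the lock up when the lease expires
            released.countDown();
            timer.shutdown();
        }, leaseMillis, TimeUnit.MILLISECONDS);
        return released;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch done = acquireLease(50,
            () -> System.out.println("lease expired: deleting /parent/ephemeral"));
        done.await();             // block until the lease has been released
    }
}
```

As Mahadev says, a real implementation must also treat session expiry and connection loss as the lease ending, since the ephemeral node disappears with the session.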
Re: how to lock one-of-many ?
Hi Martin, Earlier you could not find out which server the client is connected to. This was fixed in this jira http://issues.apache.org/jira/browse/ZOOKEEPER-544 But again this does not tell you if you are connected to the primary or one of the followers. So you will anyway have to do some manual testing, specifying the client host:port address as just the primary or just the follower (for the follower test case). Leaking information (like whether the server is the primary or not) can cause applications to use this information in a wrong way. So we never exposed this information! :) Thanks mahadev On 2/24/10 11:25 AM, Martin Waite waite@googlemail.com wrote: Hi, I take the point that the watch is useful for stopping clients unnecessarily pestering the zk nodes. I think that this is something I will have to experiment with and see how it goes. I only need to place about 10k locks per minute, so I am hoping that whatever approach I take is well within the headroom of Zookeeper on some reasonable boxes. Is it possible for the client to know whether it has connected to the current primary or not ? During my testing I would like to make sure that the approach works both when the client is attached to the primary and when attached to a lagged non-primary node. regards, Martin On 24 February 2010 18:42, Ted Dunning ted.dunn...@gmail.com wrote: Random back-off like this is unlikely to succeed (seems to me). Better to use the watch on the locks directory to make the wait as long as possible AND as short as possible. On Wed, Feb 24, 2010 at 8:53 AM, Patrick Hunt ph...@apache.org wrote: Anyone interested in locking an explicit resource attempts to create an ephemeral node in /locks with the same ### as the resource they want access to. If interested in just getting any resource then you would getchildren(/resources) and getchildren(/locks) and attempt to lock anything not in the intersection (avail).
This could be done efficiently since resources won't change much: just cache the results of getchildren and set a watch at the same time. To lock a resource, randomize avail and attempt to lock each in turn. If all of avail fail to acquire the lock, then hold off for some random time, then re-getchildren(locks) and start over. -- Ted Dunning, CTO DeepDyve
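The "lock anything not in the intersection" step above is pure set arithmetic once the two child lists are in hand. A minimal sketch of just that step (the getchildren and ephemeral-create calls themselves are omitted; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Given the children of /resources and /locks, compute the candidate
// resources (those not currently locked) in randomized order, as the
// thread suggests. The caller would then attempt an ephemeral create
// under /locks for each candidate until one succeeds.
class LockCandidates {
    static List<String> available(List<String> resources, List<String> locked) {
        Set<String> lockedSet = new HashSet<>(locked);
        List<String> avail = new ArrayList<>();
        for (String r : resources) {
            if (!lockedSet.contains(r)) {
                avail.add(r);
            }
        }
        Collections.shuffle(avail);   // randomize to spread contention
        return avail;
    }
}
```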
Re: how to lock one-of-many ?
Hi Martin, How about this - you have resources in a directory (say /locks); each process which needs a lock lists all the children of this directory and then creates an ephemeral node called /locks/resource1/lock, depending on which resource it wants to lock. This ephemeral node will be deleted by the process as soon as it's done using the resource. A process should only use resource_{i} if it's been able to create /locks/resource_{i}/lock. Would that work? Thanks mahadev On 2/23/10 4:05 AM, Martin Waite waite@googlemail.com wrote: Hi, I have a set of resources, each of which has a unique identifier. Each resource element must be locked before it is used, and unlocked afterwards. The logic of the application is something like: lock any one element; if (none locked) then exit with error; else get resource-id from lock; use resource; unlock resource; end. Zookeeper looks like a good candidate for managing these locks, being fast and resilient, and it seems quite simple to recover from client failure. However, I cannot think of a good way to implement this sort of one-of-many locking. I could create a directory called available and another called locked. Available would have one entry for each resource id (or one entry containing a list of the resource-ids). For locking, I could loop through the available ids, attempting to create a lock for each in the locked directory. However, this seems a bit clumsy and slow. Also, the locks are held for a relatively short time (1 second on average), and by the time I have blundered through all the possible locks, ids that were locked at the start might be available again by the time I finished. Can anyone think of a more elegant and efficient way of doing this? regards, Martin
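The recipe above — try to create /locks/resource_{i}/lock as an ephemeral node for each resource until one create succeeds — can be sketched with the create call abstracted out. The tryCreate predicate here is a hypothetical stand-in for ZooKeeper.create with CreateMode.EPHEMERAL, returning false when the node already exists (someone else holds that lock):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Try to lock any one of the given resources by creating an ephemeral
// node under /locks/<resource>/lock. tryCreate stands in for the real
// ephemeral create, returning false on a node-already-exists failure.
class AnyResourceLock {
    static Optional<String> lockAny(List<String> resources,
                                    Predicate<String> tryCreate) {
        for (String r : resources) {
            String lockPath = "/locks/" + r + "/lock";
            if (tryCreate.test(lockPath)) {
                return Optional.of(r);   // we now hold this resource
            }
        }
        return Optional.empty();         // none available right now
    }
}
```

The caller deletes the lock node (or lets session expiry delete it) when done with the resource, which is what makes recovery from client failure simple.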
Re: Bit of help debugging a TIMED OUT session please
Hi Stack, the other interesting part is with the session: 0x26ed968d880001 Looks like it gets disconnected from one of the servers (TIMEOUT). Do you see any of these messages: Attempting connection to server in the logs before you see all the consecutive org.apache.zookeeper.ClientCnxn: Exception closing session 0x26ed968d880001 to sun.nio.ch.selectionkeyi...@788ab708 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) and from the client 0x26ed968d880001? Thanks mahadev On 2/22/10 11:42 AM, Stack st...@duboce.net wrote: The thing that seems odd to me is that the connectivity complaints are out of the zk client, right? Why is it failing getting to member 14, and why not move to another ensemble member if there's an issue with 14? And if there were a general connectivity issue, I'd think that the running hbase cluster would be complaining at about the same time (it's talking to datanodes and masters at this time). (Thanks for the input lads) St.Ack On Mon, Feb 22, 2010 at 11:26 AM, Mahadev Konar maha...@yahoo-inc.com wrote: I also looked at the logs. Ted might have a point. It does look like the zookeeper servers are doing fine (though as Ted mentions the skew is a little concerning, though that might be due to very few packets served by the first server). Other than that, the latencies of 300 ms at max should not cause any timeouts. Also, the number of packets received is pretty low - meaning that it wasn't serving huge traffic. Is there any way we can check if the network connection from the client to the server is flaky? Thanks mahadev On 2/22/10 10:40 AM, Ted Dunning ted.dunn...@gmail.com wrote: Not sure this helps at all, but these times are remarkably asymmetrical. I would expect members of a ZK cluster to have very comparable times. 
Additionally, 345 ms is nowhere near large enough to cause a session to expire. My take is that ZK doesn't think it caused the timeout. On Mon, Feb 22, 2010 at 10:18 AM, Stack st...@duboce.net wrote: Latency min/avg/max: 2/125/345 ... Latency min/avg/max: 0/7/81 ... Latency min/avg/max: 1/1/1 Thanks for any pointers on how to debug.
Re: Ordering guarantees for async callbacks vs watchers
Hi Martin, a call like getChildren(final String path, Watcher watcher, ChildrenCallback cb, Object ctx) means: set a watch on this node for any further changes on the server. A client will see the response to getChildren before the above watch is fired. Hope that helps. Thanks mahadev On 2/10/10 6:59 PM, Martin Traverso mtrave...@gmail.com wrote: What are the ordering guarantees for asynchronous callbacks vs watcher notifications (Java API) when both are used in the same call? E.g., for getChildren(final String path, Watcher watcher, ChildrenCallback cb, Object ctx) Will the callback always be invoked before the watcher if there is a state change on the server at about the same time the call is made? I *think* that's what's implied by the documentation, but I'm not sure I'm reading it right: All completions for asynchronous calls and watcher callbacks will be made in order, one at a time. The caller can do any processing they wish, but no other callbacks will be processed during that time. ( http://hadoop.apache.org/zookeeper/docs/r3.2.2/zookeeperProgrammers.html#Java+ Binding ) Thanks! Martin
ZOOKEEPER-22 and release 3.3
Hi all, I had been working on ZOOKEEPER-22 and found out that it needs quite a few extensive changes. We will need to do some memory measurements to see if it has any memory impact or not. Since we are targeting the 3.3 release for early March, ZOOKEEPER-22 would be hard to get into 3.3. I am proposing to move it to a later release (3.4), so that it can be tested early in the release phase and gets baked into the release. Thanks mahadev
Re: Q about ZK internal: how commit is being remembered
Qian, ZooKeeper guarantees that if a client sees some transaction response, then it will persist, but the ones that a client does not see might be discarded or committed. So in case a quorum does not log the transaction, there might be a case wherein a zookeeper server which does not have the logged transaction becomes the leader (because the machines with the logged transaction are down). In that case the transaction is discarded. In the case when a machine which has the logged transaction becomes the leader, that transaction will be committed. Hope that clears your doubt. mahadev On 1/28/10 6:02 PM, Qian Ye yeqian@gmail.com wrote: Thanks Henry and Ben. Actually I have read the paper Henry mentioned in this mail, but I'm still not so clear on some of the details. Anyway, maybe more study of the source code can help my understanding. Since Ben said that, if less than a quorum of servers have accepted a transaction, we can commit or discard: would this feature cause any unexpected problem? Can you give some hints about this issue? On Fri, Jan 29, 2010 at 1:09 AM, Benjamin Reed br...@yahoo-inc.com wrote: henry is correct. just to state another way, Zab guarantees that if a quorum of servers have accepted a transaction, the transaction will commit. this means that if less than a quorum of servers have accepted a transaction, we can commit or discard. the only constraint we have in choosing is ordering. we have to decide which partially accepted transactions are going to be committed and which discarded before we propose any new messages so that ordering is preserved. ben Henry Robinson wrote: Hi - Note that a machine that has the highest received zxid will necessarily have seen the most recent transaction that was logged by a quorum of followers (the FIFO property of TCP again ensures that all previous messages will have been seen). This is the property that ZAB needs to preserve. The idea is to avoid missing a commit that went to a node that has since failed. 
I was therefore slightly imprecise in my previous mail - it's possible for only partially-proposed proposals to be committed if the leader that is elected next has seen them. Only when another proposal is committed instead must the original proposal be discarded. I highly recommend Ben Reed's and Flavio Junqueira's LADIS paper on the subject, for those with portal.acm.org access: http://portal.acm.org/citation.cfm?id=1529978 Henry On 27 January 2010 21:52, Qian Ye yeqian@gmail.com wrote: Hi Henry: According to your explanation, *ZAB makes the guarantee that a proposal which has been logged by a quorum of followers will eventually be committed* , however, the source code of Zookeeper, the FastLeaderElection.java file, shows that, in the election, the candidates only provide their zxid in the votes, the one with the max zxid would win the election. I mean, it seems that no check has been made to make sure whether the latest proposal has been logged by a quorum of servers. In this situation, the zookeeper would deliver a proposal, which is known as a failed one by the client. Imagine this scenario, a zookeeper cluster with 5 servers, Leader only receives 1 ack for proposal A, after a timeout, the client is told that the proposal failed. At this time, all servers restart due to a power failure. The server have the log of proposal A would be the leader, however, the client is told the proposal A failed. Do I misunderstand this? On Wed, Jan 27, 2010 at 10:37 AM, Henry Robinson he...@cloudera.com wrote: Qing - That part of the documentation is slightly confusing. The elected leader must have the highest zxid that has been written to disk by a quorum of followers. ZAB makes the guarantee that a proposal which has been logged by a quorum of followers will eventually be committed. Conversely, any proposals that *don't* get logged by a quorum before the leader sending them dies will not be committed. 
One of the ZAB papers covers both these situations - making sure proposals are committed or skipped at the right moments. So you get the neat property that leader election can be live in exactly the case where the ZK cluster is live. If a quorum of peers aren't available to elect the leader, the resulting cluster won't be live anyhow, so it's ok for leader election to fail. FLP impossibility isn't actually strictly relevant for ZAB, because FLP requires that message reordering is possible (see all the stuff in that paper about non-deterministically drawing messages from a potentially deliverable set). TCP FIFO channels don't reorder, so provide the extra signalling that ZAB requires. cheers, Henry 2010/1/26 Qing Yan qing...@gmail.com Hi, I have question about how zookeeper *remembers* a commit operation. According to
Re: Server exception when closing session
Hi Josh, This warning is not of any concern. Just a quick question: is there any reason for you to run the server at DEBUG level? Thanks mahadev On 1/22/10 5:19 PM, Josh Scheid jsch...@velocetechnologies.com wrote: Is it normal for client session close() to cause a server exception? Things seem to work, but the WARN is a bit disconcerting. 2010-01-22 17:15:01,573 - WARN [NIOServerCxn.Factory:2181:nioserverc...@518] - Exception causing close of session 0x126571af282114b due to java.io.IOException: Read error 2010-01-22 17:15:01,573 - DEBUG [NIOServerCxn.Factory:2181:nioserverc...@521] - IOException stack trace java.io.IOException: Read error at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:396) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:239) 2010-01-22 17:15:01,573 - INFO [NIOServerCxn.Factory:2181:nioserverc...@857] - closing session:0x126571af282114b NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.66.16.96:2181 remote=/10.66.24.94:59591] 2010-01-22 17:15:01,573 - INFO [ProcessThread:-1:preprequestproces...@384] - Processed session termination request for id: 0x126571af282114b 2010-01-22 17:15:01,583 - DEBUG [SyncThread:0:finalrequestproces...@74] - Processing request:: sessionid:0x126571af282114b type:closeSession cxid:0x4b5a4d95 zxid:0x43f3 txntype:-11 n/a 2010-01-22 17:15:01,583 - DEBUG [SyncThread:0:finalrequestproces...@147] - sessionid:0x126571af282114b type:closeSession cxid:0x4b5a4d95 zxid:0x43f3 txntype:-11 n/a zk 3.2.2. Client is using zkpython. Nothing is otherwise abnormal. I can just connect, then close the session and this occurs. -Josh
Re: Server exception when closing session
Hi Josh, The server latency does seem huge. What OS and hardware are you running it on? What is the usage model of zookeeper? How much memory are you allocating to the server? The debug logging will exacerbate the problem. A dedicated disk means the following: Zookeeper has snapshots and transaction logs. The dataDir is the directory that stores the transaction logs. It's highly recommended that this directory be on a separate disk that isn't being used by any other process. The snapshots can sit on a disk that is being used by the OS and can be shared. Also, Pat ran some tests for server latencies at: http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview You can take a look at that as well and see what the expected performance should be for your workload. Thanks mahadev On 1/22/10 5:40 PM, Josh Scheid jsch...@velocetechnologies.com wrote: On Fri, Jan 22, 2010 at 17:22, Mahadev Konar maha...@yahoo-inc.com wrote: This warning is not of any concern. OK. I'm used to warnings being things that must be addressed. I'll ignore this one in the future. Just a quick question, is there any reason for you to run the server at DEBUG level? We're having issues with server latency. Client default timeout of 1ms gets hit. I saw a stat output showing a 16s max latency today. Is DEBUG going to exacerbate that? Of the recommendations I've seen, the one I can't yet follow is a dedicated disk: dataDir is in the root partition of the server right now. -Josh
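The "dedicated disk" advice above maps onto two config keys: dataDir for snapshots (which may share a disk with the OS) and dataLogDir for the transaction log (ideally on its own spindle). A hedged sketch, with illustrative mount points:

```
# zoo.cfg fragment -- the paths here are hypothetical examples
dataDir=/var/zookeeper/data           # snapshots; may share a disk with the OS
dataLogDir=/zk-txnlog-disk/zookeeper  # transaction logs; dedicated disk recommended
```

When dataLogDir is not set, ZooKeeper puts the transaction logs under dataDir, which is exactly the shared-disk situation the advice warns against.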
Re: Question regarding Membership Election
Hi Vijay, Unfortunately you won't be able to keep running the observer in the other DC if the quorum in DC 1 is dead. Most of the folks we have talked to also want to avoid voting across colos. They usually run two instances of Zookeeper in the 2 DCs and copy the state of zookeeper (using a bridge) across colos to keep them in sync. Usually the data requirement across colos is very small, and they are usually able to do that by copying data across with their own bridge process. Hope that helps. Thanks mahadev On 1/14/10 12:12 PM, Vijay vijay2...@gmail.com wrote: Hi, I read about observers in the other datacenter. My question is: I don't want voting across the datacenters (so I will use observers), but at the same time when a DC goes down I don't want to lose the cluster. What's the solution for that? I have to have 3 nodes in the primary DC to tolerate 1 node failure. That's fine... but what about the other DC? How many nodes, and how will I make it work? Regards, /VJ
Re: Namespace partitioning ?
Hi Kay, the namespace partitioning in zookeeper has been on a back burner for a long time. There isn't any jira open on it. There have been some discussions on this but no real work. Flavio/Ben have had this on their minds for a while, but no real work/proposal is out yet. May I ask: is this something you are looking for in production? Thanks mahadev On 1/14/10 3:38 PM, Kay Kay kaykay.uni...@gmail.com wrote: Digging up some old tickets + search results - I am trying to understand the current state of support for namespace partitioning in zookeeper. Is it already in / are there any tickets or mailing list threads to understand the current state?
Re: Killing a zookeeper server
Hi Adam, That seems fair to file as an improvement. Running 'stat' did return the right stats, right? Saying the servers weren't able to elect a leader? mahadev On 1/13/10 11:52 AM, Adam Rosien a...@rosien.net wrote: On a related note, it was initially confusing to me that the server returned 'imok' when it wasn't part of the quorum. I realize the internal checks are probably in separate areas of the code, but if others feel similarly I could file an improvement in JIRA. .. Adam On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey ni...@mailtrust.com wrote: So the solution for us was to just nuke zookeeper and restart everywhere. We will also be upgrading soon as well. To answer your question, yes I believe all the servers were running normally except for the fact that they were experiencing high CPU usage. As we began to see some CPU alerts I started restarting some of the servers. It was then that we noticed that they were not actually running according to 'stat'. I still have the log from one server at debug level and the rest at warn level. If you would like to see any of these and analyze them, just let me know. Thanks for the help, Nick Bailey On Jan 12, 2010, at 8:20 PM, Patrick Hunt ph...@apache.org wrote: Nick Bailey wrote: In my last email I failed to include a log line that may be relevant as well 2010-01-12 18:33:10,658 [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager) DEBUG - Queue size: 0 2010-01-12 18:33:10,659 [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO - Notification time out: 6400 Yes, that is significant/interesting. I believe this means that there is some problem with the election process (ie the server re-joining the ensemble). We have a backoff on these attempts, which matches your description below. We have fixed some election issues in recent versions (we introduced fault injection testing prior to the 3.2.1 release, which found a few issues with election). 
I don't have them off hand - but I've asked Flavio to comment directly (he's in a different tz). Can you provide a bit more background: prior to this issue, this particular server was running fine? You restarted it and then started seeing the issue? (rather than this being a new server, I mean). What I'm getting at is that there shouldn't/couldn't be any networking/firewall type issue going on, right? Can you provide a full/more complete log? What I'd suggest is: shut down this one server, clear the log4j log file, then restart it. Let the problem reproduce, then gzip the log4j log file and attach it to your response. Ok? Patrick We see this line occur frequently, and the timeout will gradually increase to 6. It appears that all of our servers that seem to be acting normally are experiencing the cpu issue I mentioned earlier 'https://issues.apache.org/jira/browse/ZOOKEEPER-427'. Perhaps that is causing the timeout in responding? Also to answer your other questions Patrick, we aren't storing a large amount of data really, and network latency appears fine. Thanks for the help, Nick -Original Message- From: Nick Bailey nicholas.bai...@rackspace.com Sent: Tuesday, January 12, 2010 6:03pm To: zookeeper-user@hadoop.apache.org Subject: Re: Killing a zookeeper server 12 was just to keep uniformity on our servers. Our clients are connecting from the same 12 servers. Easily modifiable, and perhaps we should look into changing that. The logs just seem to indicate that the servers that claim to have no server running are continually attempting to elect a leader. A sample is provided below. The initial exception is something we see regularly in our logs, and the debug and info lines following are simply repeated throughout the log. 
2010-01-12 17:55:02,269 [NIOServerCxn.Factory:2181] (NIOServerCnxn) WARN - Exception causing close of session 0x0 due to java.io.IOException: Read error 2010-01-12 17:55:02,269 [NIOServerCxn.Factory:2181] (NIOServerCnxn) DEBUG - IOException stack trace java.io.IOException: Read error at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:295) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:162) 2010-01-12 17:55:02,269 [NIOServerCxn.Factory:2181] (NIOServerCnxn) INFO - closing session:0x0 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/172.20.36.9:2181 remote=/172.20.36.9:50367] 2010-01-12 17:55:02,270 [NIOServerCxn.Factory:2181] (NIOServerCnxn) DEBUG - ignoring exception during input shutdown java.net.SocketException: Transport endpoint is not connected at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:767) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:421)
Re: Fetching sequential children
Hi Ohad, there isn't a way to get a selected set of children from the servers. So you will have to get all of them and filter out the unwanted ones. Also, what Steve suggested in the other email might be useful for you. Thanks mahadev On 12/23/09 12:29 AM, Ohad Ben Porat o...@outbrain.com wrote: Hey, Under the main node of my application I have the following sequential children: mytest1, mytest2, mytest3, sometest1, sometest2, sometest3. Now, I want to get all children of my main node that start with mytest, something like getChildren(/main/mytest*, false). Is there a command for that? Or must I fetch all children and filter out the unwanted ones? Ohad
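Since getChildren has no server-side glob, the filtering happens on the client, as described above. A minimal sketch of that step (the getChildren call itself is omitted; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Client-side filtering of a getChildren() result: keep only the
// child names that start with the given prefix, e.g. "mytest".
class ChildFilter {
    static List<String> withPrefix(List<String> children, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String child : children) {
            if (child.startsWith(prefix)) {
                matches.add(child);
            }
        }
        return matches;
    }
}
```

Note that for sequential nodes the prefix test still works, since the sequence number is appended after the name the client supplied.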
Re: zkfuse
Hi Maarten, zkfuse does not have any support for acls. We haven't had much time to focus on zkfuse. Create/read/write/delete/ls are all supported. It was built mostly for infrequent updates, as more of a browsing interface on the filesystem. I don't think zkfuse is being used in production anywhere. Would you mind elaborating your use case? Thanks mahadev On 11/24/09 11:14 AM, Maarten Koopmans maar...@vrijheid.net wrote: Hi, I just started using zkfuse, and this may very well suit my needs for now. Thumbs up to the ZooKeeper team! What operations are supported (i.e. what is the best use of zkfuse)? I can see how files and dirs, their creation and listing, map quite nicely. ACLs? I have noticed two things on a fresh Ubuntu 9.10 (posting for future archive reference): - I *have* to run in debug mode (-d) - you have to add libboost or it won't compile Regards, Maarten
Bugfix release 3.2.2
Hi all, We are planning to make a bugfix release 3.2.2 which will include a critical bugfix in the c client code. The jira is ZOOKEEPER-562, http://issues.apache.org/jira/browse/ZOOKEEPER-562. If you would like some fix to be considered for this bugfix release please feel free to post on the zookeeper-dev list. Thanks Mahadev
Re: zookeeper viewer
Hi Hamoun, Can you please mention which link is broken? Are you looking for a zookeeper tree browser? Pat created a dashboard for zookeeper at github. Below is the link: http://github.com/phunt/zookeeper_dashboard Also, there is an open jira for a zookeeper browser which you can try out - http://issues.apache.org/jira/browse/ZOOKEEPER-418 Hope this helps. Thanks mahadev On 10/24/09 4:18 PM, Hamoun gh hamoun...@gmail.com wrote: I am looking for the zookeeper viewer. It seems the link is broken. Can somebody please help? Thank you, Hamoun Ghanbari
Re: Restarting a single zookeeper Server on the same port within the process
Hi Siddharth, Usually the time taken to release the port is dependent on the OS. So you can try sleeping a few more seconds to see if the port has been released or not, or just poll on the port to see if it's in use. There isn't an easier way to restart on the same port. mahadev On 10/22/09 4:52 PM, Siddharth Raghavan siddhar...@audiencescience.com wrote: Hello, I need to restart a single zookeeper server node on the same port within my unit tests. I tried stopping the server, having a delay and restarting it on the same port. But the server doesn't start up. When I re-start on a different port, it starts up correctly. Can you let me know how I can make this one work. Thank you. Regards, Siddharth
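Polling on the port, as suggested above, can be done with a plain TCP connect attempt, retrying until the connect fails (port released) within a timeout. A JDK-only sketch (the class name is hypothetical):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Poll a TCP port on localhost: inUse() returns true if something is
// accepting connections there; waitUntilFree() retries until the OS
// has released the port or the timeout elapses. Useful in tests that
// restart a ZooKeeper server on the same client port.
class PortProbe {
    static boolean inUse(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("127.0.0.1", port), 200);
            return true;
        } catch (IOException e) {
            return false;       // nothing listening (or connect refused)
        }
    }

    static boolean waitUntilFree(int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (!inUse(port)) {
                return true;    // port released; safe to restart here
            }
            Thread.sleep(100);
        }
        return false;
    }
}
```

The same probe, inverted, can also wait until the restarted server is accepting connections before the test proceeds.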
Re: Cluster Configuration Issues
Hi Mark, ZooKeeper does not create the myid file in the data directory. Looking at the config file, it looks like it is missing the quorum configuration for the other servers. Please take a look at http://hadoop.apache.org/zookeeper/docs/r3.2.1/zookeeperAdmin.html#sc_zkMultiServerSetup You will need to add config options for the other servers in the quorum in the config file. Thanks mahadev On 10/20/09 10:12 AM, Mark Vigeant mark.vige...@riskmetrics.com wrote: Hey- So I'm trying to run hbase on 4 nodes, and in order to do that I need to run zookeeper in replicated mode (I could have hbase run the quorum for me, but it's suggested that I don't). I have an issue though. For some reason the id I'm assigning each server in the file myid in the assigned data directory is not getting read. I feel like another id is being created and put somewhere else. Does anyone have any tips on starting a zookeeper quorum? Do I create the myid file myself or do I edit one once it is created by zookeeper? This is what my config looks like: ticktime=2000 dataDir=/home/hadoop/zookeeper clientPort=2181 initLimit=5 syncLimit=2 server.1=hadoop1:2888:3888 The name of my machine is hadoop1, with user name hadoop. In /home/hadoop/zookeeper I've created a myid file with the number 1 in it. Mark Vigeant RiskMetrics Group, Inc.
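The config quoted above lists only server.1; as the reply notes, a replicated ensemble needs a server.N entry for every member, in the same config file on every machine, plus the matching myid file. A hedged sketch of a complete three-node zoo.cfg for a setup like the one described (hostnames hadoop2/hadoop3 are illustrative; only hadoop1 appears in the original):

```
# zoo.cfg -- identical on every node in the ensemble
tickTime=2000
dataDir=/home/hadoop/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
# and /home/hadoop/zookeeper/myid contains just "1", "2", or "3"
# on the corresponding machine (created by hand, not by ZooKeeper)
```

Note also that the config key is tickTime (case matters); the lowercase ticktime in the quoted config would not be recognized.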
Re: specifying the location of zookeeper.log
Hi Leonard, You should be able to set the ZOO_LOG_DIR as an environment variable to get a different log directory. I think you are using bin/zkServer.sh to start the server? Also, please open a jira for this. It would be good to fix the documentation for this. Thanks mahadev On 10/16/09 11:04 AM, Leonard Cuff lc...@valueclick.com wrote: I've read through the admin manual at http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_logging and I don't see that there is any way to specify a location for the server's own log file. zookeeper.log appears in the bin directory, regardless of setting dataDir or dataLogDir in the configuration file. Am I overlooking something? Is there a way to have this file appear somewhere else? TIA, Leonard
Re: specifying the location of zookeeper.log
Hi Leonard, Looks like you are right. bin/zkServer.sh just logs the output to console, so you should be able to redirect to any file you want. No? Anyways this is a bug. Please open a jira for it. Thanks mahadev On 10/16/09 11:27 AM, Leonard Cuff lc...@valueclick.com wrote: I should have mentioned in my original email, but I had already tried setting ZOO_LOG_DIR as an environment variable. I am using zkServer.sh and I see where it passes ZOO_LOG_DIR as a parameter to the java invocation. java -Dzookeeper.log.dir=${ZOO_LOG_DIR} -Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \ -cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG I double checked by echo'ing the value of ZOO_LOG_DIR just before the java command. It's set correctly ... but it has no effect on the location of zookeeper.log :-( Leonard On 10/16/09 11:08 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Leonard, You should be able to set the ZOO_LOG_DIR as an environment variable to get a different log directory. I think you are using bin/zkServer.sh to start the server? Also, please open a jira for this. It would be good to fix the documentation for this. Thanks mahadev On 10/16/09 11:04 AM, Leonard Cuff lc...@valueclick.com wrote: I've read through the admin manual at http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_logging and I don't see that there is any way to specify a location for the server's own log file. zookeeper.log appears in the bin directory, regardless of setting dataDir or dataLogDir in the configuration file. Am I overlooking something? Is there a way to have this file appear somewhere else? TIA, Leonard
Re: specifying the location of zookeeper.log
Sorry, some misinformation on my side. You can actually change the log4j properties to get it to write to a file. Using the following in your log4j properties file: log4j.rootLogger=INFO, FILE log4j.appender.FILE=org.apache.log4j.FileAppender log4j.appender.FILE.File=${dir}/zoo.log log4j.appender.FILE.layout=org.apache.log4j.PatternLayout log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%c...@%l] - %m%n will let you log to the output directory ${dir}. Hope that helps! mahadev On 10/16/09 11:35 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Leonard, Looks like you are right. bin/zkServer.sh just logs the output to console, so you should be able to redirect to any file you want. No? Anyways this is a bug. Please open a jira for it. Thanks mahadev On 10/16/09 11:27 AM, Leonard Cuff lc...@valueclick.com wrote: I should have mentioned in my original email, but I had already tried setting ZOO_LOG_DIR as an environment variable. I am using zkServer.sh and I see where it passes ZOO_LOG_DIR as a parameter to the java invocation. java -Dzookeeper.log.dir=${ZOO_LOG_DIR} -Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \ -cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG I double checked by echo'ing the value of ZOO_LOG_DIR just before the java command. It's set correctly ... but it has no effect on the location of zookeeper.log :-( Leonard On 10/16/09 11:08 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Leonard, You should be able to set the ZOO_LOG_DIR as an environment variable to get a different log directory. I think you are using bin/zkServer.sh to start the server? Also, please open a jira for this. It would be good to fix the documentation for this. Thanks mahadev On 10/16/09 11:04 AM, Leonard Cuff lc...@valueclick.com wrote: I've read through the admin manual at http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_logging and I don't see that there is any way to specify a location for the server's own log file. 
zookeeper.log appears in the bin directory, regardless of setting dataDir or dataLogDir in the configuration file. Am I overlooking something? Is there a way to have this file appear somewhere else? TIA, Leonard
Re: specifying the location of zookeeper.log
I just realized that as well :) Sent out an email already! mahadev On 10/16/09 11:56 AM, Leonard Cuff lc...@valueclick.com wrote: Your comment that the output goes to the console made me realize this is configurable via the log4j properties file, and that I'd configured it long ago and forgotten that I'd done so. Thanks for your attention. Leonard On 10/16/09 11:35 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Leonard, Looks like you are right. bin/zkServer.sh just logs the output to console, so you should be able to redirect to any file you want. No? Anyways this is a bug. Please open a jira for it. Thanks mahadev On 10/16/09 11:27 AM, Leonard Cuff lc...@valueclick.com wrote: I should have mentioned in my original email, but I had already tried setting ZOO_LOG_DIR as an environment variable. I am using zkServer.sh and I see where it passes ZOO_LOG_DIR as a parameter to the java invocation. java -Dzookeeper.log.dir=${ZOO_LOG_DIR} -Dzookeeper.root.logger=${ZOO_LOG4J_PROP} \ -cp $CLASSPATH $JVMFLAGS $ZOOMAIN $ZOOCFG I double checked by echo'ing the value of ZOO_LOG_DIR just before the java command. It's set correctly ... but it has no effect on the location of zookeeper.log :-( Leonard On 10/16/09 11:08 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Leonard, You should be able to set the ZOO_LOG_DIR as an environment variable to get a different log directory. I think you are using bin/zkServer.sh to start the server? Also, please open a jira for this. It would be good to fix the documentation for this. Thanks mahadev On 10/16/09 11:04 AM, Leonard Cuff lc...@valueclick.com wrote: I've read through the admin manual at http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_logging and I don't see that there is any way to specify a location for the server's own log file. zookeeper.log appears in the bin directory, regardless of setting dataDir or dataLogDir in the configuration file. Am I overlooking something? 
Is there a way to have this file appear somewhere else? TIA, Leonard
Re: Start problem of Running Replicated ZooKeeper
Hi Le, Is there some chance of these servers not being able to talk to each other? Is the zookeeper process running on debian-1? What error do you see on debian-1? The connection refused error suggests that debian-0 is not able to talk to the debian-1 machine. Thanks mahadev On 9/23/09 2:41 AM, Le Zhou lezhouy...@gmail.com wrote: Hi, I'm trying to install HBase 0.20.0 in fully distributed mode on my cluster. As HBase depends on Zookeeper, I have to know first how to make Zookeeper work. I downloaded the release 3.2.1 and installed it on each machine in my cluster. Zookeeper in standalone mode works well on each machine in my cluster. I followed the Zookeeper Getting Started Guide and got the expected output. Then I came to the Running Replicated ZooKeeper section. On each machine in my cluster (debian-0, debian-1, debian-5), I append the following lines to zoo.cfg, and create in dataDir a myid file which contains the server id (1 for debian-0, 2 for debian-1, 3 for debian-5). server.1=debian-0:2888:3888 server.2=debian-1:2888:3888 server.3=debian-5:2888:3888 Then I start the zookeeper server by running bin/zkServer.sh start, and I got the following output: cl...@debian-0:~/zookeeper$ bin/zkServer.sh start JMX enabled by default Using config: /home/cloud/zookeeper-3.2.1/bin/../conf/zoo.cfg Starting zookeeper ...
STARTED cl...@debian-0:~/zookeeper$ 2009-09-23 15:30:27,976 - INFO [main:quorumpeercon...@80] - Reading configuration from: /home/cloud/zookeeper-3.2.1/bin/../conf/zoo.cfg 2009-09-23 15:30:27,981 - INFO [main:quorumpeercon...@232] - Defaulting to majority quorums 2009-09-23 15:30:28,009 - INFO [main:quorumpeerm...@118] - Starting quorum peer 2009-09-23 15:30:28,034 - INFO [Thread-1:quorumcnxmanager$liste...@409] - My election bind port: 3888 2009-09-23 15:30:28,045 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@487] - LOOKING 2009-09-23 15:30:28,070 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@579] - New election: -1 2009-09-23 15:30:28,075 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@618] - Notification: 1, -1, 1, 1, LOOKING, LOOKING, 1 2009-09-23 15:30:28,075 - WARN [WorkerSender Thread:quorumcnxmana...@336] - Cannot open channel to 2 at election address debian-1/172.20.53.86:3888 java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507) at java.nio.channels.SocketChannel.open(SocketChannel.java:146) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296) at java.lang.Thread.run(Thread.java:619) 2009-09-23 15:30:28,085 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@642] - Adding vote 2009-09-23 15:30:28,099 - WARN [WorkerSender Thread:quorumcnxmana...@336] - Cannot open channel to 3 at election address debian-5/172.20.14.194:3888 java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect(Native Method) at
sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507) at java.nio.channels.SocketChannel.open(SocketChannel.java:146) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296) at java.lang.Thread.run(Thread.java:619) 2009-09-23 15:30:28,288 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorumcnxmana...@336] - Cannot open channel to 2 at election address debian-1/172.20.53.86:3888 java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507) at java.nio.channels.SocketChannel.open(SocketChannel.java:146) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:356) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:603) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488) The terminal keeps outputting the WARN messages until I stop the zookeeper server. I googled zookeeper cannot open channel to at address and searched the mailing list archives, but found nothing helpful. I need your help, thanks and best regards!
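For reference, the repeated "Cannot open channel" WARNs above are expected on the first server started: election cannot succeed until a quorum (2 of the 3 peers) is up and the 2888/3888 ports are reachable between the machines. A hedged sketch of the full replicated zoo.cfg being assembled in this thread (tickTime and the limits are illustrative values, not recommendations; dataDir is an assumption):

```properties
# zoo.cfg sketch for the three-server ensemble discussed above.
# Each server also needs a myid file in dataDir containing just its id.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/cloud/zookeeper-data
clientPort=2181
server.1=debian-0:2888:3888
server.2=debian-1:2888:3888
server.3=debian-5:2888:3888
```

If the WARNs persist after all three servers are started, check that nothing (a firewall, or hostnames resolving to the wrong interface) is blocking ports 2888 and 3888 between the machines.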
Re: ACL question w/ Zookeeper 3.1.1
Hi Todd, From what I understand, you are saying that CREATOR_ALL_ACL does not work with auth? I tried the following with CREATOR_ALL_ACL and it seemed to work for me...

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.ZooDefs.Ids;
import java.util.ArrayList;
import java.util.List;

public class TestACl implements Watcher {
    public static void main(String[] argv) throws Exception {
        List<ACL> acls = new ArrayList<ACL>(1);
        String authentication_type = "digest";
        String authentication = "mahadev:some";
        for (ACL ids_acl : Ids.CREATOR_ALL_ACL) {
            acls.add(ids_acl);
        }
        TestACl tacl = new TestACl();
        ZooKeeper zoo = new ZooKeeper("localhost:2181", 3000, tacl);
        zoo.addAuthInfo(authentication_type, authentication.getBytes());
        zoo.create("/some", new byte[0], acls, CreateMode.PERSISTENT);
        zoo.setData("/some", new byte[0], -1);
    }

    @Override
    public void process(WatchedEvent event) {
    }
}

And it worked on my set of zookeeper servers. Then I tried, without auth, getData("/some"), which correctly gave me the error: Exception in thread main org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /some at org.apache.zookeeper.KeeperException.create(KeeperException.java:104) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:692) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268) Is this what you are trying to do?
Thanks mahadev On 9/17/09 5:05 PM, Todd Greenwood to...@audiencescience.com wrote: I'm attempting to secure a zookeeper installation using zookeeper ACLs. However, I'm finding that while Ids.OPEN_ACL_UNSAFE works great, my attempts at using Ids.CREATOR_ALL_ACL are failing. Here's a code snippet:

public class ZooWrapper {
    /* 1. Here I'm setting up my authentication. I've got an ACL list, and
       my authentication strings. */
    private final List<ACL> acl = new ArrayList<ACL>(1);
    private static final String authentication_type = "digest";
    private static final String authentication = "audiencescience:gravy";

    public ZooWrapper(final String connection_string, final String path,
                      final int connectiontimeout) throws ZooWrapperException {
        ...
        /* 2. Here I'm adding the acls */
        // This works (creates nodes, sets data on nodes)
        for (ACL ids_acl : Ids.OPEN_ACL_UNSAFE) {
            acl.add(ids_acl);
        }
        /* NOTE: This does not work (nodes are not created, cannot set data
           on nodes b/c nodes do not exist) */
        // for (ACL ids_acl : Ids.CREATOR_ALL_ACL) {
        //     acl.add(ids_acl);
        // }
        /* 3. Finally, I create a new zookeeper instance and add my
           authorization info to it. */
        zoo = new ZooKeeper(connection_string, connectiontimeout, this);
        zoo.addAuthInfo(authentication_type, authentication.getBytes());
        /* 4. Later, I try to write some data into zookeeper by first
           creating the node, and then calling setData... */
        zoo.create(path, new byte[0], acl, CreateMode.PERSISTENT);
        zoo.setData(path, bytes, -1);

As I mentioned above, when I add Ids.OPEN_ACL_UNSAFE to acl, then both the create and setData succeed. However, when I use Ids.CREATOR_ALL_ACL, then the nodes are not created. Am I missing something obvious w/ respect to configuring ACLs?
I've used the following references: http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-commits/200807.mbox/%3c20080731201025.c62092388...@eris.apache.org%3e http://books.google.com/books?id=bKPEwR-Pt6ECpg=PT404lpg=PT404dq=zookeeper+ACL+digest+%22new+Id%22source=blots=kObz0y8eFksig=VFCAsNW0mBJyZswoweJDI31iNlohl=enei=Z82ySojRFsqRlAeqxsyIDwsa=Xoi=book_resultct=resultresnum=6#v=onepageq=zookeeper%20ACL%20digest%20%22new%20Id%22f=false -Todd
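One thing worth knowing when debugging digest ACLs like the ones in this thread: the id that ZooKeeper stores for the "digest" scheme is derived from the user:password string as user, a colon, and the base64 of the SHA-1 of the whole string. A minimal sketch of that derivation (mirroring, to the best of my understanding, what DigestAuthenticationProvider.generateDigest does; the class name here is my own, and java.util.Base64 is a modern convenience not available in the 2009-era JDK the thread used):

```java
import java.security.MessageDigest;
import java.util.Base64;

public class DigestAclId {
    // Sketch of ZooKeeper's digest-scheme id derivation, as I understand it:
    // the stored ACL id for scheme "digest" is
    //   user + ":" + base64(SHA-1("user:password"))
    static String generateDigest(String idPassword) throws Exception {
        String user = idPassword.split(":", 2)[0];
        byte[] sha = MessageDigest.getInstance("SHA-1")
                .digest(idPassword.getBytes("UTF-8"));
        return user + ":" + Base64.getEncoder().encodeToString(sha);
    }

    public static void main(String[] args) throws Exception {
        // Same user:password as in Mahadev's example above.
        System.out.println(generateDigest("mahadev:some"));
    }
}
```

This is handy when inspecting a znode's ACLs with getACL: the id you see there is the digested form, not the plaintext password.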
Re: Infinite ping after calling setData()
Hi Rob, you might want to take a look at our test cases in src/java/test/, specifically QuorumTest, wherein we start and stop a quorum of servers in a single junit test. Thanks mahadev On 9/15/09 10:15 AM, Rob Baccus r...@audiencescience.com wrote: These are not the complete logs because they are too long to add to an email at this time. Unfortunately I am having completely different issues now with the servers not shutting down. When I get past that, and if I run into this issue again, I will give more details. Thanks. -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Monday, September 14, 2009 5:37 PM To: zookeeper-user@hadoop.apache.org Subject: Re: Infinite ping after calling setData() Hi Rob, Can you be a little more clear about what you are seeing? After reading through the email, it looks like the session is getting expired for some reason (is that what you are debugging?) Also, I see a close session for 0x30002 and no createSession for that in your logs. Are you sure these are the full logs? Thanks mahadev On 9/14/09 5:03 PM, Rob Baccus r...@audiencescience.com wrote: I am trying to automate creating an ensemble, then add data to it, and then pull it out. I am finding that after I call setData() there is an infinite loop of pings. This was working, then just stopped when I changed some code around to use JUnit 4.1 instead of 3.8.1, which I would expect has nothing to do with this issue. I saw the issue relating to session expiration due to system resources, but I don't believe this is the issue since this was working fine. My configuration: Linux VM with 1.5GB RAM Running in Eclipse 3.3 Configured 4 zookeeper servers, each running with different client and leader/leader election ports and different local transaction log locations. All 4 servers come up without issues and a leader is elected. Below is the stack trace that I am seeing with log4j DEBUG turned on for org.apache.zookeeper after the setData() call is made.
DEBUG [main] (com.audiencescience.util.zookeeper.qa.failover.FailOverTest.testSimpleZKClientFailover:124) - ** Before Set Data ** INFO [main-SendThread] (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:716) - Priming connection to java.nio.channels.SocketChannel[connected local=/172.17.1.133:40933 remote=robb02linux.corp.digimine.com/172.17.1.133:2181] INFO [main-SendThread] (org.apache.zookeeper.ClientCnxn$SendThread.run:868) - Server connection successful INFO [NIOServerCxn.Factory:2181] (org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest:503) - Connected to /172.17.1.133:40933 lastZxid 0 INFO [NIOServerCxn.Factory:2181] (org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest:534) - Creating new session 0x123bafb088a DEBUG [FollowerRequestProcessor:1] (org.apache.zookeeper.server.quorum.CommitProcessor.processRequest:168) - Processing request:: sessionid:0x123bafb088a type:createSession cxid:0x0 zxid:0xfffe txntype:unknown n/a DEBUG [ProcessThread:-1] (org.apache.zookeeper.server.quorum.CommitProcessor.processRequest:168) - Processing request:: sessionid:0x123bafb088a type:createSession cxid:0x0 zxid:0x30001 txntype:-10 n/a DEBUG [ProcessThread:-1] (org.apache.zookeeper.server.quorum.Leader.propose:560) - Proposing:: sessionid:0x123bafb088a type:createSession cxid:0x0 zxid:0x30001 txntype:-10 n/a WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2183] (org.apache.zookeeper.server.quorum.Follower.followLeader:242) - Got zxid 0x30001 expected 0x1 WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2182] (org.apache.zookeeper.server.quorum.Follower.followLeader:242) - Got zxid 0x30001 expected 0x1 WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181] (org.apache.zookeeper.server.quorum.Follower.followLeader:242) - Got zxid 0x30001 expected 0x1 INFO [SessionTracker] (org.apache.zookeeper.server.SessionTrackerImpl.run:132) - Expiring session 0x123baf02f28 INFO [SessionTracker] (org.apache.zookeeper.server.ZooKeeperServer.expire:317) - Expiring session 0x123baf02f28 INFO
[ProcessThread:-1] (org.apache.zookeeper.server.PrepRequestProcessor.pRequest:360) - Processed session termination request for id: 0x123baf02f28 DEBUG [ProcessThread:-1] (org.apache.zookeeper.server.quorum.CommitProcessor.processRequest:168) - Processing request:: sessionid:0x123baf02f28 type:closeSession cxid:0x0 zxid:0x30002 txntype:-11 n/a DEBUG [ProcessThread:-1] (org.apache.zookeeper.server.quorum.Leader.propose:560) - Proposing:: sessionid:0x123baf02f28 type:closeSession cxid:0x0 zxid:0x30002 txntype:-11 n/a DEBUG [FollowerHandler-/172.17.1.133:40537] (org.apache.zookeeper.server.quorum.Leader.processAck:382) - Ack zxid: 0x30001 DEBUG [FollowerHandler-/172.17.1.133:40537
Re: zookeeper on ec2
Hi Satish, ConnectionLoss is a little trickier than just retrying blindly. Please read the following sections on this - http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling And the programmers guide: http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html to learn more about how to handle CONNECTIONLOSS. The idea is that blindly retrying would create problems with CONNECTIONLOSS, since a CONNECTIONLOSS does NOT necessarily mean that the zookeeper operation you were executing failed to execute. It might be possible that the operation actually went through on the servers. Since this has been a constant source of confusion for everyone who starts using zookeeper, we are working on a fix (ZOOKEEPER-22) which will take care of this problem, and programmers would not have to worry about CONNECTIONLOSS handling. Thanks mahadev On 9/1/09 4:13 PM, Satish Bhatti cthd2...@gmail.com wrote: I have recently started running on EC2 and am seeing quite a few ConnectionLoss exceptions. Should I just catch these and retry? Since I assume that eventually, if the shit truly hits the fan, I will get a SessionExpired? Satish On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning ted.dunn...@gmail.com wrote: We have used EC2 quite a bit for ZK. The basic lessons that I have learned include: a) EC2's biggest advantage after scaling and elasticity was conformity of configuration. Since you are bringing machines up and down all the time, they begin to act more like programs and you wind up with boot scripts that give you a very predictable environment. Nice. b) EC2 interconnect has a lot more going on than in a dedicated VLAN. That can make the ZK servers appear a bit less connected. You have to plan for ConnectionLoss events. c) for highest reliability, I switched to large instances. On reflection, I think that was helpful, but less important than I thought at the time. d) increasing and decreasing cluster size is nearly painless and is easily scriptable.
To decrease, do a rolling update on the survivors to update their configuration. Then take down the instance you want to lose. To increase, do a rolling update starting with the new instances to update the configuration to include all of the machines. The rolling update should bounce each ZK with several seconds between each bounce. Rescaling the cluster takes less than a minute, which makes it comparable to EC2 instance boot time (about 30 seconds for the Alestic ubuntu instance that we used, plus about 20 seconds for additional configuration). On Mon, Jul 6, 2009 at 4:45 AM, David Graf david.g...@28msec.com wrote: Hello I want to set up a zookeeper ensemble on amazon's ec2 service. In my system, zookeeper is used to run a locking service and to generate unique id's. Currently, for testing purposes, I am only running one instance. Now, I need to set up an ensemble to protect my system against crashes. The ec2 service has some differences from a normal server farm. E.g. the data saved on the file system of an ec2 instance is lost if the instance crashes. In the documentation of zookeeper, I have read that zookeeper saves snapshots of the in-memory data in the file system. Is that needed for recovery? Logically, it would be much easier for me if this is not the case. Additionally, ec2 brings the advantage that servers can be switched on and off dynamically depending on the load, traffic, etc. Can this advantage be utilized for a zookeeper ensemble? Is it possible to add a zookeeper server dynamically to an ensemble? E.g. depending on the in-memory load? David
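When ConnectionLoss shows up on shared infrastructure like EC2, one knob worth knowing is the tick/session timing in zoo.cfg. A hedged sketch (the values are illustrative, not recommendations; tune against your own observed network jitter):

```properties
# A larger tickTime tolerates more network jitter, at the cost of slower
# failure detection. initLimit/syncLimit are expressed in ticks.
tickTime=3000
initLimit=10
syncLimit=5
# Clients can also request a longer session timeout when constructing the
# ZooKeeper handle; the server clamps it to the range
# [2 * tickTime, 20 * tickTime] by default.
```

A longer session timeout makes SessionExpired less likely during transient EC2 network hiccups, but it also delays ephemeral-node cleanup (e.g. lock release) when a client really does die.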
Re: Runtime Interrogation of the Ensemble
Hi Todd, You can use JMX to find such information. You can also just do this:

echo stat | nc localhost <clientport>

to get status from the zookeeper servers. This is all documented in the forrest docs at http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html Hope this helps.

Ensemble : object representing a zookeeper ensemble
Long Ensemble.getLastTxid()
ZKServer Ensemble.getCurrentLeader()
ZKServer[] Ensemble.getPastLeaders()
ZKServer[] Ensemble.getConnectedServers()
ZKServer[] Ensemble.getDisConnectedServers()
Boolean doWeHaveAQuorum()

ZKServer : object representing a zookeeper server
Client[] ZKServer.getClients()
Client[] ZKServer.getDisconnectedClients()
Boolean isLeader()
Boolean isAbleToBeLeader()
int get|set VotingWeight()
Group get|set GroupMembership()

Most of this is available via the stat/jmx. It would be cool if...
1. The ensemble could trigger nagios type alerts if it no longer had a quorum
2. The ensemble could dynamically bring up new zk servers in the event that enough servers have died to prevent a quorum
3. The ensemble/server api could allow analysis of problem servers (servers that keep dropping connections and/or clients)

Nice idea... But we do not have anything like this as of now. Thanks mahadev
Re: question about watcher
Hi Qian, There isn't any such API. We have been thinking about adding an API for cancelling a client's watches. We have also been thinking about adding a proc filesystem wherein a client will have a list of all the watches. This data could be used to know which clients are watching which znode, but this has always been in the future discussions for us. We DO NOT have anything planned in the near future for this. Thanks mahadev On 8/5/09 6:57 PM, Qian Ye yeqian@gmail.com wrote: Hi all: Is there a client API for querying the watchers' owner for a specific znode? In some situations, we want to find out who set watchers on the znode. thx
Re: c client error message with chroot
This looks like a bug. Does this happen without doing any reads/writes using the zookeeper handle? Please do open a jira for this. Thanks mahadev On 8/2/09 10:53 PM, Michi Mutsuzaki mi...@cs.stanford.edu wrote: Hello, I'm doing something like this (using zookeeper-3.2.0): zhandle_t* zh = zookeeper_init(localhost:2818/servers, watcher, 1000, 0, 0, 0); and getting this error: 2009-08-03 05:48:30,693:3380(0x40a04950):zoo_i...@check_events@1439: initiated connection to server [127.0.0.1:2181] 2009-08-03 05:48:30,705:3380(0x40a04950):zoo_i...@check_events@1484: connected to server [127.0.0.1:2181] with session id=122ddb9be64016d 2009-08-03 05:48:30,705:3380(0x40c05950):zoo_er...@sub_string@730: server path does not include chroot path /servers The error log doesn't appear if I use localhost:2818 without chroot. Is this actually an error? Thanks! --Michi
Re: bad svn url : test-patch
Hi Todd, Yes, this happens with the branch 3.2. The test-patch link is broken because of the hadoop split. This file is used for the hudson test environment. It isn't used anywhere else, so the svn co should otherwise be fine. We should fix it anyway. Thanks mahadev On 7/30/09 2:57 PM, Todd Greenwood to...@audiencescience.com wrote: FYI - looks like there is a bad url in svn... $ svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 branch-3.2 ... A branch-3.2/build.xml Fetching external item into 'branch-3.2/src/java/test/bin' svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist This does not repro w/ 3.1: $ svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.1 branch-3.1 -Todd
Bug in 3.2 release.
Hi folks, We just discovered a bug in 3.2 release http://issues.apache.org/jira/browse/ZOOKEEPER-484. This bug will affect your clients whenever they switch zookeeper servers - from a zookeeper server that is a follower to a server that is leader. We should have a fix out by next week in 3.2.1 and trunk. 3.2.1 should be out in the next 2-3 weeks. If you are already using 3.2.0 in production I would suggest switching it back to 3.1.1 (though there is a workaround mentioned in the jira http://issues.apache.org/jira/browse/ZOOKEEPER-484 but I would advise against it). The 3.2.0 clients are compatible with 3.1.1 servers. Thanks mahadev -- End of Forwarded Message
Re: Leader Elections
Both of the options that Scott mentioned are quite interesting. Quite a few of our users are interested in these two features. I think for 2, we should be able to use observers with a subscription to the master cluster with interest in a special subtree. That avoids too much cross talk. Henry/Flavio, do you guys want to keep this in mind for Observers (not implement it in the jira, but generalize it in a way that partial subscription can be done later)? http://issues.apache.org/jira/browse/ZOOKEEPER-368 Wherein an observer can just register interest in a subtree and the master cluster can avoid sending updates for parts of the zookeeper tree not in the subtree. This would be very helpful in a WAN setting wherein only a small amount of data needs to be up to date across different data centers. thanks mahadev On 7/20/09 11:50 AM, Todd Greenwood to...@audiencescience.com wrote: Flavio, Ted, Henry, Scott, this would work perfectly well for my use case provided: SINGLE ENSEMBLE: GROUP A : ZK Servers w/ read/write AND Leader Elections GROUP B : ZK Servers w/ read/write W/O Leader Elections So, we can craft this via Observers and Hierarchical Quorum groups? Great. Problem solved. When will this be production ready? :o) Scott brought up a multi-feature that is very interesting for me. Namely: 1. Offline ZK servers that sync/merge on reconnect The offline servers idea seems conceptually simple, it's kind of like a messaging system. However, the merge and resolve step when two servers reconnect might be challenging. Cool idea though. 2. Partial memory graph subscriptions The second idea is partial memory graph subscriptions. This would enable virtual ensembles to interact on the same physical ensemble. For my use case, this would prevent unnecessary cross talk between nodes on a WAN, allowing me to define the subsets of the memory graph that need to be replicated, and to whom. This would be a huge scalability win for WAN use cases.
-Todd -Original Message- From: Scott Carey [mailto:sc...@richrelevance.com] Sent: Monday, July 20, 2009 11:00 AM To: zookeeper-user@hadoop.apache.org Subject: Re: Leader Elections Observers would be awesome, especially with a couple enhancements / extensions: An option for the observers to enter a special state if the WAN link goes down to the master cluster. A read-only option would be great. However, allowing certain types of writes to continue on a limited basis would be highly valuable as well. An observer could own a special node and its subnodes. Only these subnodes would be writable by the observer when there was a session break to the master cluster, and the master cluster would take all the changes when the link is reestablished. Essentially, it is a portion of the hierarchy that is writable only by a specific observer, and read-only for others. The purpose of this would be for when the WAN link goes down to the master ZKs for certain types of use cases - status updates or other changes local to the observer that are strictly read-only outside the Observer's 'realm'. On 7/19/09 12:16 PM, Henry Robinson he...@cloudera.com wrote: You can. See ZOOKEEPER-368 - at first glance it sounds like observers will be a good fit for your requirements. Do bear in mind that the patch on the jira is only for discussion purposes; I would not consider it currently fit for production use. I hope to put up a much better patch this week. Henry On Sat, Jul 18, 2009 at 7:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: Can you submit updates via an observer? On Sat, Jul 18, 2009 at 6:38 AM, Flavio Junqueira f...@yahoo-inc.com wrote: 2- Observers: you could have one computing center containing an ensemble and observers around the edge just learning committed values. -- Ted Dunning, CTO DeepDyve
Re: Queue code
Also, are there any performance numbers for zookeeper-based queues? How does it compare with JMS? thanks Kishore G Hi Kishore, We do not have any performance numbers for queues on zookeeper. I think you can get a rough idea of those numbers from your usage of zookeeper (number of reads/writes per second) and the zookeeper performance numbers at http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperOver.html Hope that helps. Thanks mahadev
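For context on what a ZooKeeper-based queue does per operation: the usual recipe creates each queue item as a PERSISTENT_SEQUENTIAL child znode, so the server appends a monotonically increasing 10-digit counter to the name, and consumers list the children and process them in sequence order. A minimal sketch of just that client-side ordering step (the "qn-" prefix is an assumption; no live server is involved here):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class QueueOrder {
    // Children of a queue znode created with CreateMode.PERSISTENT_SEQUENTIAL
    // look like "qn-0000000007". Sorting by the numeric suffix (rather than
    // plain string order) yields the dequeue order; parsing the suffix as a
    // long also stays correct if the counter ever outgrows 10 digits.
    static List<String> inQueueOrder(List<String> children) {
        List<String> sorted = new ArrayList<String>(children);
        sorted.sort(Comparator.comparingLong(
                name -> Long.parseLong(name.substring(name.lastIndexOf('-') + 1))));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> kids = List.of("qn-0000000010", "qn-0000000002", "qn-0000000007");
        System.out.println(inQueueOrder(kids));
    }
}
```

Each enqueue is one write and each dequeue is a getChildren plus a delete, which is why the generic read/write throughput numbers Mahadev points to give a reasonable first estimate of queue throughput.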
Re: Instantiating HashSet for DataNode?
Hi Erik, I am not sure if that would be a considerable optimization, but even if you wanted to do it, it would be much more than just adding a check in the constructor (the serialization/deserialization would need to have specialized code). Right now all the datanodes are treated equally for ser/deser and other purposes. mahadev On 7/14/09 1:42 PM, Erik Holstad erikhols...@gmail.com wrote: I'm not sure if I've misread the code for the DataNode, but to me it looks like every node gets a set of children even though it might be an ephemeral node which cannot have children, so we are wasting 240 B for every one of those. Not sure if it makes a big difference, but just thinking that since everything sits in memory and there is no reason to instantiate it, maybe it would be possible just to add a check in the constructor? Regards Erik
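To make the suggestion concrete, here is a hypothetical sketch of the lazy-allocation idea Erik proposes, not the actual DataNode code: defer creating the children set until a child is actually added, so znodes that never get children (e.g. ephemerals) pay nothing for an empty HashSet.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LazyNode {
    // Hypothetical stand-in for DataNode's children field: left null until
    // the first child is added, instead of eagerly allocating a HashSet in
    // the constructor for every node.
    private Set<String> children; // null == no children yet

    public void addChild(String name) {
        if (children == null) {
            children = new HashSet<String>();
        }
        children.add(name);
    }

    public Set<String> getChildren() {
        // Readers never see null; they get an immutable empty set instead.
        return children == null ? Collections.<String>emptySet() : children;
    }
}
```

As Mahadev notes, the real change would be more invasive than this, since the serialization/deserialization paths would also need to handle the null case.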
Re: Help to compile Zookeeper C API on a old system
Hi Qian, I am not sure if it will work. You should be able to back-port it in such a way that it works with gcc 3.*/4.*, but again, I have never tried it. mahadev On 7/6/09 6:35 PM, Qian Ye yeqian@gmail.com wrote: Thanks Mahadev, I followed the installation instructions in the README: autoreconf -i -f ./configure --prefix=$dir make make install Up to ./configure --prefix=$dir there were no errors; however, errors came when I ran make. My plan is to change the compiler from gcc to g++, and solve the compile errors one by one. Will that plan work? Thanks~ On Tue, Jul 7, 2009 at 2:22 AM, Mahadev Konar maha...@yahoo-inc.com wrote: Hi Qian, What issues do you face? I have never tried compiling with the configuration below, but I could give it a try in my free time to see if I can get it to compile. mahadev On 7/6/09 7:37 AM, Qian Ye yeqian@gmail.com wrote: Hi all: I'm writing to ask you to do me a favor. It's urgent. For some unchangeable reason, I have to compile libzookeeper_st.a, libzookeeper_mt.a on an old system: gcc 2.96 autoconf 2.13 automake 1.4-p5 libtool 1.4.2 I cannot compile the target libs in the usual way, and this task drives me crazy :-( could anyone help me out? Thanks a lot~
Re: General Question about Zookeeper
Hi Harold, As Henry mentioned, what ACLs give you is prevention of access to znodes through ZooKeeper itself. If someone has access to zookeeper's data stored on zookeeper's server machines, they should be able to reconstruct the data and read it (using zookeeper deserialization code). I am not sure what kind of security model you are interested in, but for ZooKeeper we expect the server-side data stored on local disks to be inaccessible to normal users and only accessible to admins. Hope this helps. Thanks mahadev On 6/25/09 11:01 AM, Henry Robinson he...@cloudera.com wrote: Hi Harold, Each ZooKeeper server stores updates to znodes in logfiles, and periodic snapshots of the state of the datatree in snapshot files. A user who has the same permissions as the server will be able to read these files, and can therefore recover the state of the datatree without the ZK server intervening. ACLs are applied only by the server; there is no filesystem-level representation of them. Henry On Thu, Jun 25, 2009 at 6:48 PM, Harold Lim rold...@yahoo.com wrote: Hi All, How does zookeeper store data/files? From reading the doc, the clients can put ACLs on files/znodes to limit read/write/create by other clients. However, I was wondering how these znodes are stored on the Zookeeper servers. I am interested in a security aspect of zookeeper, where the clients and the servers don't necessarily belong to the same group. If a client creates a znode in zookeeper, can the person who owns the zookeeper server simply look at its filesystem and read the data (out-of-band, not using a client, simply by browsing the file system of the machine hosting the zookeeper server)? Thanks, Harold
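Since ACLs do not protect the on-disk snapshots and transaction logs, the practical follow-up to this thread is to lock the data directory down at the OS level so only the account running the server can read it. A hedged sketch using the JDK's POSIX permission API (the path is an assumption; point it at your real dataDir, and note this applies only on POSIX filesystems):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class LockDownDataDir {
    public static void main(String[] args) throws Exception {
        // Hypothetical dataDir location for illustration only.
        Path dataDir = Paths.get(System.getProperty("java.io.tmpdir"), "zookeeper-data");
        Files.createDirectories(dataDir);
        // rwx for the owning (zookeeper admin) account only; group and
        // world get nothing, so snapshots/logs cannot be read out-of-band.
        Files.setPosixFilePermissions(dataDir, PosixFilePermissions.fromString("rwx------"));
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(dataDir)));
    }
}
```

The same effect is achieved with chmod 700 on the dataDir (and dataLogDir) in whatever provisioning scripts create them.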