testing.

2008-07-15 Thread Mahadev Konar
Testing.

mahadev


Re: test

2008-08-12 Thread Mahadev Konar
Testing please ignore.

mahadev

On 8/12/08 3:46 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:

> just a test, please ignore



RE: Client Termination Error

2008-08-15 Thread Mahadev Konar
Hi Satish,
 I see gnu.java* in your exceptions. Are you using gcj? 

mahadev
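For readers hitting the same trace: a quick way to confirm whether a process is running under gcj (as opposed to the Sun JDK) is to print the JVM system properties. This is a generic JVM check, not ZooKeeper-specific, and the "gcj" substring test is just a heuristic:

```java
// Print which JVM implementation is running; gcj-based runtimes
// typically report "GNU libgcj" here, while Sun's reports HotSpot.
public class JvmCheck {
    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        String vendor = System.getProperty("java.vm.vendor");
        System.out.println("VM: " + vmName + " (" + vendor + ")");
        if (vmName != null && vmName.toLowerCase().contains("gcj")) {
            System.out.println("Running under gcj -- known NIO quirks");
        }
    }
}
```

Note the `gnu.java.nio.*` frames in the trace below: the stack itself already identifies the libgcj NIO implementation.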

> -Original Message-
> From: Satish Bhatti [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 15, 2008 3:49 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Client Termination Error
> 
> Has anybody come across the following error:
> 
> (1) Start up ZooKeeper on 2 machines.
> (2) Connect 2 ZooKeeper clients, one per machine.
> (3) Exit one of the clients with a -C.
> 
> The server that client was connected to goes into an unrecoverable error
> state, and continuously spits the following out to the console:
> 
> ERROR - [NIOServerCxn.Factory:[EMAIL PROTECTED] - FIXMSG
> java.nio.channels.CancelledKeyException
>    at gnu.java.nio.SelectionKeyImpl.readyOps(libgcj.so.81)
>    at com.yahoo.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:136)
> 
> 
> Notes:
> -
> 
> I cannot reproduce this every time, but I can get it about 7 times out of 10.
> 
> ZooKeeper version
> ---
> 2.2.1. from SourceForge
> 
> OS (Ubuntu)
> 
> Linux 2.6.22-14-generic #1 SMP Fri Feb 1 04:59:50 UTC 2008 i686 GNU/Linux
> 
> Java VM (Sun JDK)
> 
> java version "1.6.0_04"
> Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
> Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
> 
> 
> Satish


Re: Leader election stalled

2008-09-02 Thread Mahadev Konar
Hi Austin,
 Did you kill the leader process? It looks like you didn't kill the
server, since it's responding to ruok. Is that true?

mahadev


On 9/2/08 9:56 AM, "Austin Shoemaker" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> We have run into a situation where killing the leader results in followers
> perpetually trying to reelect that leader.
> 
> We have 11 zookeeper (2.2.1 from SF.net) servers and 256 clients connecting
> at random. We kill the leader and observe the impact, monitoring a script
> that repeatedly prints the responses to "ruok" and "stat". All servers
> except the killed leader respond with "imok" and "ZooKeeperServer not
> running", respectively.
> 
> About half of the time, each remaining server gets into a loop of failing to
> connect to the killed leader and then reelecting the killed leader.
> 
> Here is an example log, which is representative of similar logs on the other
> servers. We additionally logged connectivity during leader election. If
> anyone would like complete logs, let me know.
> 
> Thanks,
> 
> Austin Shoemaker
> 
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
> *WARN  - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889*
> ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
> java.net.ConnectException: Connection refused
> *
>  cont'd *
> 
> ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
> java.lang.Exception: shutdown Follower
> at
> com.yahoo.zookeeper.server.quorum.Follower.shutdown(Follower.java:364)
> at
> com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:403)
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - LOOKING
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.22:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.22:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.21:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.21:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.12:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.12:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.11:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.11:2888
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.12:2890
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.12:2890
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.11:2890
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.11:2890
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.22:2889
> *WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Exception occurred when
> sending / receiving packet to / from /10.50.65.22:2889
> java.net.SocketTimeoutException: Receive timed out
> *WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to
> /10.50.65.21:2890
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.21:2890
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.21:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.21:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.12:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.12:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Sending election packet to /
> 10.50.65.11:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Received response from /
> 10.50.65.11:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - Election tally:
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - 8 -> 1
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - 4 -> 1
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - 7 -> 8
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Election complete,
> result.winner = 7
> *WARN  - [QuorumPeer:[EMAIL PROTECTED] - > Election complete, address
> = /10.50.65.22:2889
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
> WARN  - [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.22:2889
> ERROR - [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
> java.net.ConnectException: Connection refused
> *at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> at
> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:519)
> at
> com.yahoo.zookeeper.server.quorum.Follower.followLeader(Follower.java:133)
> at
> com.yahoo.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:399)
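The "Election tally" lines in the quoted log can be read as votes per proposed leader id. A simplified model (not the actual election code) shows why the dead server keeps winning: id 7 holds 8 of 11 votes, a strict majority, so the election completes in its favor even though it is down:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the election tally from the quoted log. Each entry maps
// a proposed leader id to the votes it received; a candidate wins once
// it holds a strict majority of the full ensemble.
public class ElectionTally {
    static int winner(Map<Integer, Integer> tally, int ensembleSize) {
        for (Map.Entry<Integer, Integer> e : tally.entrySet()) {
            if (e.getValue() > ensembleSize / 2) {
                return e.getKey(); // this candidate has a quorum of votes
            }
        }
        return -1; // no majority yet; election continues
    }

    public static void main(String[] args) {
        Map<Integer, Integer> tally = new HashMap<>();
        tally.put(8, 1);
        tally.put(4, 1);
        tally.put(7, 8); // 8 votes for server 7, the killed leader
        System.out.println("winner = " + winner(tally, 11));
    }
}
```

Nothing in this tally rule checks that the winner is reachable, which matches the looping behavior described: the survivors elect 7, fail to connect, and re-vote the same way.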



FW: [ANN] katta-0.1.0 release - distribute lucene indexes in a grid

2008-09-17 Thread Mahadev Konar

-- Forwarded Message
From: Stefan Groschupf <[EMAIL PROTECTED]>
Reply-To: <[EMAIL PROTECTED]>
Date: Wed, 17 Sep 2008 17:06:19 -0700
To: <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>,
<[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
Subject: [ANN] katta-0.1.0 release - distribute lucene indexes in a grid

After 5 months of work we are happy to announce the first developer
preview release of katta.
This release contains all functionality to serve a large, sharded
lucene index on many servers.
Katta is standing on the shoulders of the giants lucene, hadoop and
zookeeper.

Main features:
+ Plays well with Hadoop
+ Apache Version 2 License.
+ Node failure tolerance
+ Master failover
+ Shard replication
+ Pluggable network topologies (shard distribution and selection
policies)
+ Node load balancing at client



Please give katta a test drive and give us some feedback!

Download:
http://sourceforge.net/project/platformdownload.php?group_id=225750

website:
http://katta.sourceforge.net/

Getting started in less than 3 min:
http://katta.wiki.sourceforge.net/Getting+started

Installation on a grid:
http://katta.wiki.sourceforge.net/Installation

Katta presentation today (09/17/08) at hadoop user, yahoo mission
college:
http://upcoming.yahoo.com/event/1075456/
* slides will be available online later


Many thanks for the hard work:
Johannes Zillmann, Marko Bauhardt, Martin Schaaf (101tec)

I apologize for the cross-posting.


Yours, the Katta Team.

~~~
101tec Inc., Menlo Park, California
http://www.101tec.com





-- End of Forwarded Message



Re: JMX Documentation?

2008-11-17 Thread Mahadev Konar
Hi Garth,
 Sorry for the delayed response.

There is an open JIRA for documentation on JMX:
http://issues.apache.org/jira/browse/ZOOKEEPER-177.

We will be adding the docs soon.

For now: 
The way you can get jmx support is by running

ManagedQuorumPeerMain rather than QuorumPeerMain

Here is an example commandline:

java   -classpath 
zookeeper/conf:zookeeper/log4j-1.2.15.jar:zookeeper/zookeeper.jar
org.apache.zookeeper.server.quorum.ManagedQuorumPeerMain server1/zoo.cfg


mahadev
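Once the server is started via `ManagedQuorumPeerMain`, its MBeans are registered with the platform MBean server of that JVM. As a local sanity check (in any JVM, no ZooKeeper required), you can list the MBean domains visible on the platform server; the standard `java.lang` domain is always present:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;

// List the domains registered on the local platform MBeanServer.
// In a JVM running ManagedQuorumPeerMain, the ZooKeeper MBeans would
// show up here alongside the standard java.lang beans.
public class JmxDomains {
    public static void main(String[] args) {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        for (String domain : mbs.getDomains()) {
            System.out.println(domain);
        }
    }
}
```

From there, jconsole or any JMX client attached to the server process can browse the same beans remotely.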


On 11/16/08 2:12 PM, "Garth Patil" <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm upgrading to 3.0 from a previous version, and I just noticed the
> JMX MBean code in the tree. Is there documentation for what is exposed
> and some examples of how to use the MBeans? I couldn't find anything
> on the site or in the mailing list archive. Also, has anyone used them
> with something other than the platform MBean server (e.g. JBoss)?
> Cheers,
> Garth



Re: JMX Documentation?

2008-11-17 Thread Mahadev Konar
Hmm... On my linux machine it worked fine without the option...

mahadev


On 11/17/08 3:10 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:

> I found that I needed to run the server with
> 
> -J-Djava.rmi.server.hostname=localhost
> 
> in order to connect a local jconsole instance, this was on my ubuntu
> machine. YMMV.
> 
> Patrick
> 
> Mahadev Konar wrote:
>> Hi Garth,
>>  Sorry for the delayed response.
>> 
>> There is an open JIRA for documentation on JMX:
>> http://issues.apache.org/jira/browse/ZOOKEEPER-177.
>> 
>> We will be adding the docs soon.
>> 
>> For now: 
>> The way you can get jmx support is by running
>> 
>> ManagedQuorumPeerMain rather than QuorumPeerMain
>> 
>> Here is an example commandline:
>> 
>> java   -classpath
>> zookeeper/conf:zookeeper/log4j-1.2.15.jar:zookeeper/zookeeper.jar
>> org.apache.zookeeper.server.quorum.ManagedQuorumPeerMain server1/zoo.cfg
>> 
>> 
>> mahadev
>> 
>> 
>> On 11/16/08 2:12 PM, "Garth Patil" <[EMAIL PROTECTED]> wrote:
>> 
>>> Hi,
>>> I'm upgrading to 3.0 from a previous version, and I just noticed the
>>> JMX MBean code in the tree. Is there documentation for what is exposed
>>> and some examples of how to use the MBeans? I couldn't find anything
>>> on the site or in the mailing list archive. Also, has anyone used them
>>> with something other than the platform MBean server (e.g. JBoss)?
>>> Cheers,
>>> Garth
>> 



Re: Watches are one time trigger?

2008-11-18 Thread Mahadev Konar
Hi Thomas,
 Yes, watches are one-time triggers. Watches are meant to be lightweight
and local to the server you are connected to.
We designed them this way so that they have minimal load impact on the servers.

mahadev
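The one-time-trigger semantics can be sketched in a few lines (this is a toy model of the contract, not ZooKeeper's implementation): a watch is removed from the registry the moment it fires, so a second change produces no notification unless the client re-registers, typically by re-reading the node:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal model of one-time-trigger watches: a watch fires at most
// once and is discarded when it fires.
public class OneTimeWatches {
    interface Watch { void process(String event); }

    private final Map<String, List<Watch>> watches = new HashMap<>();

    void watch(String path, Watch w) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(w);
    }

    void nodeChanged(String path) {
        // Remove-then-fire: the registration is gone before the callback runs.
        List<Watch> fired = watches.remove(path);
        if (fired != null) {
            for (Watch w : fired) w.process("changed " + path);
        }
    }

    public static void main(String[] args) {
        OneTimeWatches zk = new OneTimeWatches();
        zk.watch("/x", e -> System.out.println("fired: " + e));
        zk.nodeChanged("/x"); // fires once
        zk.nodeChanged("/x"); // watch already consumed; nothing printed
    }
}
```

The load argument follows directly: the server never carries long-lived per-client subscriptions, only a single pending trigger per registration.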


On 11/18/08 5:58 AM, "Kiesslich, Thomas" <[EMAIL PROTECTED]>
wrote:

> 
>  Hi, 
> 
> A watch is designed as a one time trigger. Why have you designed it that way?
> Why not as a normal listener?
> 
> 
> Mit freundlichen Grüßen / With best regards
> Thomas Kießlich 
> 
> Siemens Enterprise Communications GmbH & Co. KG
> HiPath Applications
> 
> SEN LIP DA 11
> Schertlinstr. 8
> 81379 Munich, Germany 



Re: ZooKeeper Roadmap - 3.1.0 and beyond.

2008-11-18 Thread Mahadev Konar
Hi Krishna,
  Sorry for the delayed response. The responses are inline.


On 11/18/08 12:02 PM, "Krishna Sankar (ksankar)" <[EMAIL PROTECTED]> wrote:

> Have a couple of questions on the proposed multi-tenancy feature (pardon
> me if they are obvious, as I am slowly getting up to speed):
> 
> a) First, good initiative. I think this will make ZK more
> pervasive. I plan to participate and contribute
 Thanks, and we look forward to your contribution.
> b) Is there any assumption on the trust and security ? i.e. could
> we assume that the servers would be in a secure environment and so no
> need for SSL et al., could we trust the MAC Address/IP Address (this
> also raises the question of NAT et al, if they are relevant) and could
> we make an assumption that there is no need for a secure identity ?
 There is some level of trust assumed. We do trust the IP address (obtained
via TCP connections, on the assumption that it is difficult to forge), and we
use raw TCP (so there is no security at the transport layer). We do have an
authorization layer that is pluggable at the server, and the clients can
identify themselves using it.

> c) I remember seeing one of Ben's ToDo, an entry for distributed
> ZK. I couldn't find a resolution or write-up. Possible, I am missing
> something obvious. Anyway, Is it already in place or do we need to
> consider that feature in the multi-tenancy capability ?
It's still under discussion. We don't have a concrete proposal yet. For
multi-tenancy we don't need to consider it.


mahadev
> 
> Cheers & thanks
> 
> 
> |-Original Message-
> |From: Benjamin Patrick Hunt <[EMAIL PROTECTED]>
> |Sent: Mon, 27 Oct 2008 20:35:37 -0700
> |To: zookeeper-user@hadoop.apache.org
> |Subject: ZooKeeper Roadmap - 3.1.0 and beyond.
> |
> <..snip/>
> 5) (begin) multi-tenancy support. A number of users have expressed
> interest in being able to deploy ZK as a service in a cloud.
> Multi-tenancy support would be a huge benefit (quota, qos, namespace
> partitioning of nodes, billing, etc...)
> 
> <..snip>
> 



Re: 2 Server Cluster !?

2008-11-25 Thread Mahadev Konar
Hi Vladimir,
 It's not possible to run a ZooKeeper quorum of 2 such that if one dies the
other one continues to run and serve clients. The whole point of ZooKeeper is
its high reliability and the guarantees it provides.
http://wiki.apache.org/hadoop/ZooKeeper/Tao

So, it's not possible to run ZooKeeper as you describe.

mahadev
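The arithmetic behind this answer: a ZooKeeper ensemble needs a strict majority of servers up to make progress. The quorum size is n/2 + 1, so with n = 2 the quorum is 2 and losing either server halts the service. A quick sketch of the table:

```java
// Majority quorum sizes for small ensembles. With n = 2 the quorum is
// 2, so a 2-server ensemble tolerates zero failures -- which is why
// the setup asked about cannot work.
public class QuorumSize {
    static int quorum(int n) { return n / 2 + 1; }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            int q = quorum(n);
            System.out.println(n + " servers: quorum " + q
                    + ", tolerates " + (n - q) + " failure(s)");
        }
    }
}
```

This is also why ensembles are normally run with an odd number of servers: 4 servers tolerate no more failures than 3.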


On 11/25/08 7:33 AM, "Vladimir Bobic" <[EMAIL PROTECTED]> wrote:

> Ahoy everyone,
> 
> I'm interested to know if there is anyway to run 2 servers cluster in which:
>  - if 1 server fails other one continues to serve clients.
>  - when failed server is back, force elections to favored server
>  I'm aware that there is possibility for both servers to work
> independently not knowing for each other,
> difference that could immerge is not issue.
> 
> Regards,
> Vladimir



Re: ZooKeeper (not) on wikipedia

2008-11-25 Thread Mahadev Konar
Great!! Go ahead.

mahadev


On 11/25/08 4:57 PM, "Krishna Sankar (ksankar)" <[EMAIL PROTECTED]> wrote:

> If no one has done it, I will take a first cut at it.
> 
> |-Original Message-
> |From: Patrick Hunt [mailto:[EMAIL PROTECTED]
> |Sent: Tuesday, November 25, 2008 3:26 PM
> |To: [EMAIL PROTECTED]; zookeeper-user@hadoop.apache.org
> |Subject: ZooKeeper (not) on wikipedia
> |
> |I noticed today that there is no ZooKeeper page on Wikipedia. Would
> |anyone like to create it?
> |
> |This would be preferable:
> |http://en.wikipedia.org/wiki/ZooKeeper
> |
> |But I also noticed this as well (perhaps _software could redirect? Or
> |just change the disambiguation page to link to ZooKeeper instead?):
> |http://en.wikipedia.org/wiki/Zookeeper_(disambiguation)
> |http://en.wikipedia.org/wiki/ZooKeeper_(software)
> |
> |The zk docs home and overview should have more than enough to start:
> |http://hadoop.apache.org/zookeeper/docs/r3.0.0/
> |http://hadoop.apache.org/zookeeper/docs/r3.0.0/zookeeperOver.html
> |
> |Regards,
> |
> |Patrick



Re: ZooKeeper 3.0 Fix Release slated for end of this week.

2008-12-01 Thread Mahadev Konar
Hi Jake, 
 We are in the process of releasing. The release is up for a vote. As soon as
the vote passes, we will put up the release.


Here is the email by Pat.



On 11/24/08 5:50 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:
> From: Patrick Hunt <[EMAIL PROTECTED]>
> Date: November 24, 2008 5:50:24 PM PST
> To: [EMAIL PROTECTED]
> Subject: [VOTE] Release ZooKeeper 3.0.1 (candidate 1)
>
>
> I've created a second candidate build for ZooKeeper 3.0.1.
>
> *** Please download, test and VOTE before the
> *** vote closes EOD on Friday, November 28.***
>
> http://people.apache.org/~phunt/zookeeper-3.0.1-candidate-1/
>
> The only change to rc1 from rc0 is adding a missing apache license
> header to a source file:
> https://issues.apache.org/jira/browse/ZOOKEEPER-232
>
> Should we release this?
>
> Patrick


Mahadev

On 12/1/08 9:45 AM, "Jake Thompson" <[EMAIL PROTECTED]> wrote:

> Any status update?
> 
> On Tue, Nov 18, 2008 at 3:29 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> 
>> I've slated the 3.0.1 fix release of ZooKeeper for the end of this week.
>> 
>> https://issues.apache.org/jira/browse/ZOOKEEPER?report=com.atlassian.jira.plu
>> gin.system.project:roadmap-panel
>> 
>> Of particular interest are exists() NPE in ZOOKEEPER-226, and some perf
>> issues ZOOKEEPER-212 & ZOOKEEPER-223
>> 
>> If there are any questions, or issues that should be included that are not,
>> please speak up.
>> 
>> Patrick
>> 



Re: NullPointerException stopping and starting Zookeeper servers

2008-12-08 Thread Mahadev Konar
Hi Thomas,
 This looks like a bug. Can you open a JIRA describing what the problem is
and how to reproduce it?

Thanks
mahadev


On 12/8/08 11:33 AM, "Thomas Vinod Johnson" <[EMAIL PROTECTED]> wrote:

> Hi,
> I have a replicated zookeeper services consisting of 3 zookeeper (3.0.1)
> servers all running on the same host for testing purposes. I've created
> exactly one znode in this ensemble. At this point, I stop, then restart
> a single zookeeper server; moving onto the next one a few seconds later.
> A few restarts later (about 4 is usually sufficient), I get the
> following exception on one of the servers, at which point it exits:
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTx
> nLog.java:447)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTx
> nLog.java:358)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.(File
> TxnLog.java:333)
> at 
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:250)
> at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.
> java:102)
> at 
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:183)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:245)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:421)
> 2008-12-08 14:14:24,880 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:2183:[EMAIL PROTECTED] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Forcing shutdown
> at 
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:336)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
> Exception in thread "QuorumPeer:/0:0:0:0:0:0:0:0:2183"
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:339)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
> 
> The inputStream field is null, apparently because next is being called
> at line 358 even after next returns false. Having very little knowledge
> about the implementation, I don't know if the existence of hdr.getZxid()
>> = zxid is supposed to be an invariant across all invocations of the
> server; however the following change to FileTxnLog.java seems to make
> the problem go away.
> diff FileTxnLog.java /tmp/FileTxnLog.java
> 358c358,359
> < next();
> ---
>>   if (!next())
>>   return;
> 447c448,450
> < inputStream.close();
> ---
>>   if (inputStream != null) {
>>   inputStream.close();
>>   }
> 
> Is this a bug?
> 
> Thanks.
> 
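The inline patch in the quoted message amounts to two guards: `init()` must stop advancing once `next()` reports exhaustion, and `close()` must tolerate a stream that was never opened. Abstracted away from `FileTxnLog` (this is a sketch of the pattern, not the actual fix), the shape is:

```java
// Abstract model of the guards the quoted diff adds to
// FileTxnLog$FileTxnIterator: check next()'s return value instead of
// calling it unconditionally, and null-check before closing.
public class GuardedIterator {
    private final int[] records; // stands in for the txn log entries
    private int pos = -1;
    private Object stream;       // stands in for the input stream

    GuardedIterator(int[] records) {
        this.records = records;
        this.stream = records.length > 0 ? new Object() : null;
    }

    boolean next() {
        if (pos + 1 >= records.length) {
            stream = null; // exhausted: underlying stream released
            return false;
        }
        pos++;
        return true;
    }

    void init(int fromZxid) {
        // The buggy code advanced unconditionally here; checking the
        // return value means we never read past the end of the log.
        while (next()) {
            if (records[pos] >= fromZxid) return;
        }
    }

    void close() {
        // Second hunk of the patch: guard a possibly-null stream.
        if (stream != null) stream = null;
    }

    public static void main(String[] args) {
        GuardedIterator it = new GuardedIterator(new int[0]);
        it.init(5);  // empty log: next() returns false, no NPE
        it.close();  // stream is null; guarded close is a no-op
        System.out.println("ok");
    }
}
```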



Re: zoo_set() version question

2008-12-12 Thread Mahadev Konar
Hi Avery,
 If you are using zoo_set(  , int version),

And you have a success, then the version of the node that denots your
successful zoo_set() above is
 = Version +1 

Are you  using it to keep track of what revision is the one valid for your
set's? 


mahadev
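The conditional-set semantics being discussed can be modeled in memory (a sketch of the contract, not the real client library): a write succeeds only if the caller's expected version matches the znode's current version, or is -1 to skip the check, and every successful write bumps the version by one:

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of zoo_set()'s test-and-set version semantics.
public class VersionedSet {
    static class Znode { byte[] data; int version; }

    private final Map<String, Znode> nodes = new HashMap<>();

    void create(String path, byte[] data) {
        Znode z = new Znode();
        z.data = data;
        z.version = 0;
        nodes.put(path, z);
    }

    /** Returns the new version on success, or -1 on a version conflict. */
    int setData(String path, byte[] data, int expectedVersion) {
        Znode z = nodes.get(path);
        if (expectedVersion != -1 && expectedVersion != z.version) {
            return -1; // BADVERSION: someone else wrote in between
        }
        z.data = data;
        z.version++;
        return z.version;
    }

    public static void main(String[] args) {
        VersionedSet zk = new VersionedSet();
        zk.create("/a", "v0".getBytes());
        System.out.println(zk.setData("/a", "v1".getBytes(), 0));  // 1
        System.out.println(zk.setData("/a", "v2".getBytes(), 0));  // -1, stale
        System.out.println(zk.setData("/a", "v2".getBytes(), -1)); // 2, unconditional
    }
}
```

This is exactly Avery's question: after a successful conditional set against version v, the znode's version is v + 1 (whether that is a documented guarantee is the subject of the rest of the thread).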

On 12/12/08 11:36 AM, "Avery Ching"  wrote:

> Patrick,
> 
> Thanks for responding.
> 
> I agree that I can use zoo_exists and zoo_get to get the version of the
> znode as it exists currently.
> 
> The problem I am trying to solve is that getting the version from struct
> Stat in either zoo_exists or zoo_get may not be the same version that my
> last successful zoo_set used.  I would like to get the version that denotes
> my last successful zoo_set() operation to a particular znode.
> 
> I understand that the data and version to the znode may change immediately
> one or multiple times after my zoo_set() and this is fine, but I would still
> like to know the znode's versions of the data I set.
> 
> Avery
> 
> On 12/12/08 11:11 AM, "Patrick Hunt"  wrote:
> 
>> Avery Ching wrote:
>>> If zoo_set() completes successfully with version != -1, can we assume that
>>> version -> version + 1 for this znode?  If not, is there a way for the user
>>> to get the version of the successfully completed zoo_set() operation?
>> 
>> You shouldn't rely on this, it may work, but it's not part of the
>> contract. Also, nothing says that some other client won't change the
>> node immediately after you change it.
>> 
>> You can access the version using zoo_exists or zoo_get - specifically
>> the "struct Stat stat" argument of either of those methods contains a
>> "version" member.
>> 
>> Patrick
> 



Re: zoo_set() version question

2008-12-12 Thread Mahadev Konar
Also, I am assuming that you are not using version = -1; -1 means that the
set unconditionally overwrites the data on that node.

mahadev

On 12/12/08 12:12 PM, "Mahadev Konar"  wrote:

> Hi Avery,
>  If you are using zoo_set(  , int version),
> 
> And you have a success, then the version of the node that denots your
> successful zoo_set() above is
>  = Version +1 
> 
> Are you  using it to keep track of what revision is the one valid for your
> set's? 
> 
> 
> mahadev
> 
> On 12/12/08 11:36 AM, "Avery Ching"  wrote:
> 
>> Patrick,
>> 
>> Thanks for responding.
>> 
>> I agree that I can use zoo_exists and zoo_get to get the version of the
>> znode as it exists currently.
>> 
>> The problem I am trying to solve is that getting the version from struct
>> Stat in either zoo_exists or zoo_get may not be the same version that my
>> last successful zoo_set used.  I would like to get the version that denotes
>> my last successful zoo_set() operation to a particular znode.
>> 
>> I understand that the data and version to the znode may change immediately
>> one or multiple times after my zoo_set() and this is fine, but I would still
>> like to know the znode's versions of the data I set.
>> 
>> Avery
>> 
>> On 12/12/08 11:11 AM, "Patrick Hunt"  wrote:
>> 
>>> Avery Ching wrote:
>>>> If zoo_set() completes successfully with version != -1, can we assume that
>>>> version -> version + 1 for this znode?  If not, is there a way for the user
>>>> to get the version of the successfully completed zoo_set() operation?
>>> 
>>> You shouldn't rely on this, it may work, but it's not part of the
>>> contract. Also, nothing says that some other client won't change the
>>> node immediately after you change it.
>>> 
>>> You can access the version using zoo_exists or zoo_get - specifically
>>> the "struct Stat stat" argument of either of those methods contains a
>>> "version" member.
>>> 
>>> Patrick
>> 



Re: zoo_set() version question

2008-12-12 Thread Mahadev Konar
That's right, Pat. I thought about that, though Ben already mentioned that we
missed the Stat return in the C sync code.
But for the version, since it's a test-and-set, we should also guarantee that
the version is +1 from the previous one. It would be really unintuitive if it
were otherwise.

Also, I noticed after Ben's comments that the async callback zoo_aset() is
called back with a Stat argument. Only the zoo_get() sync API is missing the
Stat return :).


mahadev

On 12/12/08 12:39 PM, "Patrick Hunt"  wrote:

> Mahadev Konar wrote:
>> And you have a success, then the version of the node that denots your
>> successful zoo_set() above is
>>  = Version +1 
> 
> Mahadev, that's the current implementation, but I wasn't aware we were
> exposing that detail as something users should rely on. Is it documented
> anywhere in the docs? If this is "user visible" we should document it, I
> thought we weren't exposing this for a reason...
> 
>> 
>> 
>> mahadev
>> 
>> On 12/12/08 11:36 AM, "Avery Ching"  wrote:
>> 
>>> Patrick,
>>> 
>>> Thanks for responding.
>>> 
>>> I agree that I can use zoo_exists and zoo_get to get the version of the
>>> znode as it exists currently.
>>> 
>>> The problem I am trying to solve is that getting the version from struct
>>> Stat in either zoo_exists or zoo_get may not be the same version that my
>>> last successful zoo_set used.  I would like to get the version that denotes
>>> my last successful zoo_set() operation to a particular znode.
>>> 
>>> I understand that the data and version to the znode may change immediately
>>> one or multiple times after my zoo_set() and this is fine, but I would still
>>> like to know the znode's versions of the data I set.
>>> 
>>> Avery
>>> 
>>> On 12/12/08 11:11 AM, "Patrick Hunt"  wrote:
>>> 
>>>> Avery Ching wrote:
>>>>> If zoo_set() completes successfully with version != -1, can we assume that
>>>>> version -> version + 1 for this znode?  If not, is there a way for the
>>>>> user
>>>>> to get the version of the successfully completed zoo_set() operation?
>>>> You shouldn't rely on this, it may work, but it's not part of the
>>>> contract. Also, nothing says that some other client won't change the
>>>> node immediately after you change it.
>>>> 
>>>> You can access the version using zoo_exists or zoo_get - specifically
>>>> the "struct Stat stat" argument of either of those methods contains a
>>>> "version" member.
>>>> 
>>>> Patrick
>> 



Re: zoo_set() version question

2008-12-12 Thread Mahadev Konar
Maybe you guys are right! If we are returning the Stat, we should not
explicitly state the increment. As long as we don't have a dire need for it,
we shouldn't make it a guarantee in ZooKeeper.

@krishna,
 The count does overflow; it's an int. The calculation we did was that even
if we have 100 processes updating a node 10,000 times a day, it would take
around 5 years before you overflow the int.


mahadev
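The estimate checks out: 100 processes each doing 10,000 updates a day is 10^6 version bumps per day, and a signed 32-bit counter overflows after 2^31 - 1 increments, which works out to roughly 2,147 days, just under six years:

```java
// Back-of-the-envelope check of the overflow estimate from the thread:
// 100 processes x 10,000 updates/day = 1,000,000 version bumps per day;
// a signed 32-bit int version overflows after Integer.MAX_VALUE bumps.
public class VersionOverflow {
    public static void main(String[] args) {
        long updatesPerDay = 100L * 10_000L;
        double days = (double) Integer.MAX_VALUE / updatesPerDay;
        System.out.printf("%.0f days ~= %.1f years%n", days, days / 365.0);
    }
}
```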


On 12/12/08 3:07 PM, "Krishna Sankar (ksankar)"  wrote:

> Most probably we shouldn't explicitly state the increment count but that
> it will increase. Also is there a rest/overflow condition ?
> Cheers
> 
> 
> |-Original Message-
> |From: Patrick Hunt [mailto:ph...@apache.org]
> |Sent: Friday, December 12, 2008 2:24 PM
> |To: Mahadev Konar
> |Cc: zookeeper-user@hadoop.apache.org
> |Subject: Re: zoo_set() version question
> |
> |That's fine, but we should document it. Please enter a JIRA that the
> |docs should talk about this.
> |
> |I notice we have this in the prog guide:
> |"Each time a znode's data changes, the version number increases."
> |
> |Sort of a moot point once we fix the zoo_set api but we should
> |explicitly state that it increments by 1.
> |
> |Patrick
> |
> |Mahadev Konar wrote:
> |> That's right pat. I thought about that. Though ben already mentioned
> |that we
> |> missed the stat return in the c sync code.
> |> But for the version, since its a test and set, we should also
> |guarantee that
> |> the version is a +1 to prev one. It would be really unintutive if it
> |was
> |> otherwise.
> |>
> |> Also I noticed after ben's comments that the async callback
> zoo_aset()
> |is
> |> called back with stat argument. Only the zoo_get() sync api is
> missing
> |stat
> |> return code :).
> |>
> |>
> |> mahadev
> |>
> |> On 12/12/08 12:39 PM, "Patrick Hunt"  wrote:
> |>
> |>> Mahadev Konar wrote:
> |>>> And you have a success, then the version of the node that denots
> |your
> |>>> successful zoo_set() above is
> |>>>  = Version +1
> |>> Mahadev, that's the current implementation, but I wasn't aware we
> |were
> |>> exposing that detail as something users should rely on. Is it
> |documented
> |>> anywhere in the docs? If this is "user visible" we should document
> |it, I
> |>> thought we weren't exposing this for a reason...
> |>>
> |>>>
> |>>> mahadev
> |>>>
> |>>> On 12/12/08 11:36 AM, "Avery Ching"  wrote:
> |>>>
> |>>>> Patrick,
> |>>>>
> |>>>> Thanks for responding.
> |>>>>
> |>>>> I agree that I can use zoo_exists and zoo_get to get the version
> of
> |the
> |>>>> znode as it exists currently.
> |>>>>
> |>>>> The problem I am trying to solve is that getting the version from
> |struct
> |>>>> Stat in either zoo_exists or zoo_get may not be the same version
> |that my
> |>>>> last successful zoo_set used.  I would like to get the version
> that
> |denotes
> |>>>> my last successful zoo_set() operation to a particular znode.
> |>>>>
> |>>>> I understand that the data and version to the znode may change
> |immediately
> |>>>> one or multiple times after my zoo_set() and this is fine, but I
> |would still
> |>>>> like to know the znode's versions of the data I set.
> |>>>>
> |>>>> Avery
> |>>>>
> |>>>> On 12/12/08 11:11 AM, "Patrick Hunt"  wrote:
> |>>>>
> |>>>>> Avery Ching wrote:
> |>>>>>> If zoo_set() completes successfully with version != -1, can we
> |assume that
> |>>>>>> version -> version + 1 for this znode?  If not, is there a way
> |for the
> |>>>>>> user
> |>>>>>> to get the version of the successfully completed zoo_set()
> |operation?
> |>>>>> You shouldn't rely on this, it may work, but it's not part of the
> |>>>>> contract. Also, nothing says that some other client won't change
> |the
> |>>>>> node immediately after you change it.
> |>>>>>
> |>>>>> You can access the version using zoo_exists or zoo_get -
> |specifically
> |>>>>> the "struct Stat stat" argument of either of those methods
> |contains a
> |>>>>> "version" member.
> |>>>>>
> |>>>>> Patrick
> |>



Re: What happens when a server loses all its state?

2008-12-16 Thread Mahadev Konar
Hi Thomas,

If a ZooKeeper server loses all state and there are enough servers in the
ensemble to continue the ZooKeeper service (like 2 servers in the case of an
ensemble of 3), then the server will get the latest snapshot from the leader
and continue.


The idea of ZooKeeper persisting its state on disk is precisely so that it
does not lose state. All the guarantees that ZooKeeper makes are based on the
understanding that we do not lose the state of the data we store on disk.


There might be problems if we lose the state that we stored on the disk.
We might lose transactions that have been committed, and the ensemble might
start with some snapshot from the past.

You might want to read through how ZooKeeper's internals work. This will help
you understand why the persistence guarantees are required.

http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresentations/attachments/zk-talk-upc.pdf

mahadev



On 12/16/08 9:45 AM, "Thomas Vinod Johnson"  wrote:

> What is the expected behavior if a server in a ZooKeeper service
> restarts with all its prior state lost? Empirically, everything seems to
> work*.  Is this something that one can count on, as part of ZooKeeper
> design, or are there known conditions under which this could cause
> problems, either liveness or violation of ZooKeeper guarantees?
> 
> I'm really most interested in a situation where a single server loses
> state, but insights into issues when more than one server loses state
> and other interesting failure scenarios are appreciated.
> 
> Thanks.
> 
> * The restarted server appears to catch up to the latest snapshot (from
> the current leader?).



Re: What happens when a server loses all its state?

2008-12-16 Thread Mahadev Konar
Hi Thomas,



> More generally, is it a safe assumption to make that the ZooKeeper
> service will maintain all its guarantees if a minority of servers lose
> persistent state (due to bad disks, etc) and restart at some point in
> the future?
Yes that is true. 

mahadev

> 
> Thanks.
> Mahadev Konar wrote:
>> Hi Thomas,
>> 
>> If a zookeeper server loses all state and there are enough servers in the
>> ensemble to continue the zookeeper service (like 2 servers in the case of an
>> ensemble of 3), then the server will get the latest snapshot from the leader
>> and continue.
>> 
>> 
>> The idea of zookeeper persisting its state on disk is just so that it does
>> not lose state. All the guarantees that zookeeper makes are based on the
>> understanding that we do not lose the state of the data we store on disk.
>> 
>> 
>> There might be problems if we lose the state that we stored on the disk.
>> We might lose transactions that have been committed and the ensemble might
>> start with some snapshot in the past.
>> 
>> You might want to read through how zookeeper internals work. This will help
>> you understand why the persistence guarantees are required.
>> 
>> http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresent
>> ations/attachments/zk-talk-upc.pdf
>> 
>> mahadev
>> 
>> 
>> 
>> On 12/16/08 9:45 AM, "Thomas Vinod Johnson"  wrote:
>> 
>>   
>>> What is the expected behavior if a server in a ZooKeeper service
>>> restarts with all its prior state lost? Empirically, everything seems to
>>> work*.  Is this something that one can count on, as part of ZooKeeper
>>> design, or are there known conditions under which this could cause
>>> problems, either liveness or violation of ZooKeeper guarantees?
>>> 
>>> I'm really most interested in a situation where a single server loses
>>> state, but insights into issues when more than one server loses state
>>> and other interesting failure scenarios are appreciated.
>>> 
>>> Thanks.
>>> 
>>> * The restarted server appears to catch up to the latest snapshot (from
>>> the current leader?).
>>> 
>> 
>>   
> 



Re: What happens when a server loses all its state?

2008-12-16 Thread Mahadev Konar
Hi Thomas,
 Here is what would happen in the scenario you mentioned.

> Great - thanks Mahadev.
> 
> Not to drag this on more than necessary, please bear with me for one
> more example of 'amnesia' that comes to mind. I have a set of ZooKeeper
> servers A, B, C.
> - C is currently not running, A is the leader, B is the follower.
> - A proposes zxid1 to A and B, both acknowledge.
> - A asks A to commit (which it persists), but before the same commit
> request reaches B, all servers go down (say a power failure).
In this case, the zookeeper protocol only guarantees zxid1 if the client
received a success response for it. So zxid1 may or may not be committed if A
and B come up later. (This is a different scenario from the one you mention
later.)

> - Later, B and C come up (A is slow to reboot), but B has lost all state
> due to disk failure.
This is how zookeeper would work in this scenario ---

Now since B and C come up and B had the most recent state but lost it,
zookeeper is clueless about this. So C would say "I have some zxid, say
zxid-n", and B would say "I have zxid = 0" (since it is now stateless), and C
would become the leader (since it has the highest zxid).

This would lead to loss of data and loss of state in zookeeper. That's what
I meant when I mentioned that zookeeper relies heavily on the state being
persisted on disk.

> - C becomes the new leader and perhaps continues with some more new
> transactions.
> 
Now if A comes back again, C would say that it's the leader and ask A to
truncate all the transactions A had, in order to sync with C.

Again, you can see how loss of persistence can trigger state loss in
zookeeper. If it's just a minority of servers failing, zookeeper can take
care of it; but this scenario -- C failing, being brought back up in an
inconsistent state, combined with another failure of A and the data loss of
B -- is one that zookeeper cannot handle.
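The election rule Mahadev describes -- the server reporting the highest last-logged zxid wins -- can be sketched as a toy model in plain Java. This is not ZooKeeper's actual election code; the server names and zxid values are made up for illustration:

```java
import java.util.Collections;
import java.util.Map;

public class ZxidElection {
    // Toy model of the election rule described above: the server reporting
    // the highest last-logged zxid wins. This is not ZooKeeper's actual
    // election code; server names and zxid values are illustrative.
    static String electLeader(Map<String, Long> lastZxid) {
        return Collections.max(lastZxid.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // B lost its disk and restarts with zxid 0; C never saw zxid1 but
        // still has the highest zxid on record, so C wins the election.
        System.out.println(electLeader(Map.of("B", 0L, "C", 41L))); // prints "C"
    }
}
```

The sketch makes the failure mode concrete: a wiped server reports zxid 0 and can never outvote a server whose log is merely stale, so the stale server's history wins.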

I hope this helps. 

mahadev


On 12/16/08 4:02 PM, "Thomas Vinod Johnson"  wrote:

> Mahadev Konar wrote:
>> Hi Thomas,
>> 
>> 
>> 
>>   
>>> More generally, is it a safe assumption to make that the ZooKeeper
>>> service will maintain all its guarantees if a minority of servers lose
>>> persistent state (due to bad disks, etc) and restart at some point in
>>> the future?
>>> 
>> Yes that is true.
>> 
>>   
> Likely I'm misunderstanding the protocol, but have I effectively lost
> zxid1 at this point? What would happen when A comes back up?
> 
> Thanks.



Re: Any practical limits to the number of znodes?

2008-12-29 Thread Mahadev Konar
Hi Jon, 
  We do not have any limit on the number of znodes in Zookeeper. It's mainly
limited by memory, since Zookeeper keeps the whole namespace in memory.

Also the lock mechanism is a good use case for zookeeper.
It is listed as one of the recipes for zookeeper.

http://hadoop.apache.org/zookeeper/docs/r3.0.1/recipes.html#sc_recipes_Locks
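For reference, the ordering step at the heart of that lock recipe can be sketched as a toy model in plain Java (not the ZooKeeper API itself; the znode names are illustrative): each contender creates an ephemeral sequential znode under the lock node, and the contender owning the lowest sequence number holds the lock:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LockRecipe {
    // Toy model of the ordering step in the lock recipe linked above.
    // ZooKeeper appends a 10-digit sequence suffix to sequential znodes,
    // e.g. "lock-0000000003"; the lowest sequence number holds the lock.
    static boolean holdsLock(String myNode, List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(LockRecipe::seq));
        return sorted.get(0).equals(myNode);
    }

    static int seq(String name) {
        // Parse the 10-digit sequence suffix ZooKeeper appends.
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    public static void main(String[] args) {
        List<String> children =
                List.of("lock-0000000003", "lock-0000000001", "lock-0000000002");
        System.out.println(holdsLock("lock-0000000001", children)); // prints "true"
        System.out.println(holdsLock("lock-0000000003", children)); // prints "false"
    }
}
```

In the real recipe, a contender that is not first watches the znode immediately ahead of it and re-checks when that znode goes away.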



mahadev



On 12/29/08 5:54 PM, "Jon Stefansson"  wrote:

> I am considering using ZooKeeper as a Lock mechanism in an application that
> eventually could produce several hundred thousand znodes per day. The znodes
> will contain little or no data. There will just be a lot of them.
> I don't see anything in the ZooKeeper documentation regarding znode size
> constraints. Is this a good use case for ZooKeeper?



Re: myid....

2009-01-05 Thread Mahadev Konar
You are right Kevin. The myid file is required and is not created
automatically. I think we had it well documented; I'll check whether it's
documented well enough and open a jira in case it's not.

mahadev


On 1/5/09 9:58 AM, "Kevin Burton"  wrote:

> Doesn't look like it automatically creates it.
>     File myIdFile = new File(dataDir, "myid");
>     if (!myIdFile.exists()) {
>         throw new IllegalArgumentException(myIdFile.toString()
>                 + " file is missing");
>     }
> 
> 
> On Mon, Jan 5, 2009 at 12:01 AM, Kevin Burton  wrote:
> 
>> I'll look at the code tomorrow morning but I definitely needed to create
>> the myid file for it to work might be a bug with my configuration or the
>> code I'll check the QuorumPeerConfig code tomorrow..
>> Kevin
>> 
>> 
>> On Sun, Jan 4, 2009 at 11:58 PM, Flavio Junqueira wrote:
>> 
>>> Hi Kevin, The admin doesn't need to create "myid". This is created
>>> automaticaly after parsing the configuration file (check
>>> QuorumPeerConfig.parse(String[]) if you are interested in the internals).
>>> 
>>> -Flavio
>>> 
>>> 
>>> On Jan 5, 2009, at 1:10 AM, Kevin Burton wrote:
>>> 
>>>  This wasn't clear in the documentation. but the admin needs to create
 the myid file, correct?
 Couldn't this happen on init if we use IP addresses and we note that it's
 a
 new zk server?
 
 Kevin
 
 --
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 AIM/YIM: sfburtonator
 Skype: burtonator
 Work: http://spinn3r.com
 
>>> 
>>> 
>> 
>> 
>> --
>> Founder/CEO Spinn3r.com
>> Location: San Francisco, CA
>> AIM/YIM: sfburtonator
>> Skype: burtonator
>> Work: http://spinn3r.com
>> 
> 
> 



Re: Persistent watches........

2009-01-05 Thread Mahadev Konar
Hi Kevin,

 The watches are just a signal that something on the node you were watching
has changed. They do not send you a diff of what changed, so you will have to
read/stat the node to check what changed.

The way the watches are implemented in Zookeeper keeps the per-client state
management on a server low. This helps us keep watches lightweight operations
on the servers.
We had been looking at providing a stream of updates for a node which a
client could subscribe to, but that is not on our roadmap right now.
 
I hope this helps.

Thanks 
mahadev

On 1/3/09 8:05 PM, "Kevin Burton"  wrote:

>> Because watches are one time triggers and there is latency between getting
> the
>> event and sending a new request to get a watch you cannot reliably see
> every
>> change that happens to a node in ZooKeeper. Be prepared to handle the case
> where
>> the znode changes multiple times between getting the event and setting the
> watch
>> again. (You may not care, but at least realize it may happen.)
> 
> This seems the opposite of what would be desirable in an ideal client.
> 
> So if I'm only using watches, and a file is changed with values 1,2,3,4
> rapidly, I may lose and never see value 4?  Or I'll just not see 2,3?
> 
> In one situation I'm worried about have a distributed process I'm trying to
> start, stop. I wouldn't want them to drop the second stop.
> 
> How hard would it be to have every mutate event (delete, write, acl change,
> etc) be an event which is persistent and the client sees every change?
> 
> Kevin



Re: Not performing work in the zookeeper even thread.

2009-01-05 Thread Mahadev Konar
Hi Kevin,
 We have a single-threaded C client in which the application needs to call
zookeeper_process to process the events. What we noticed is that all the
users who use it have problems with it and would like to get rid of it
(unless they were running on BSD).
So having the events dispatched by the event thread makes it easier for users
to use zookeeper. We do assume that they do minimal work in the process
method. We should probably document this (if it's not already)... The huge
backlog is a problem for the clients, and they should be aware that doing any
complex operation in the process method is asking for trouble.

Hope this helps.

mahadev


On 1/4/09 12:23 PM, "Kevin Burton"  wrote:

> It looks like events from Zookeeper are dispatches from its event thread:
> java.lang.Exception
> at foo.zookeeper.WatcherImpl.process(NodeWatcher.java:54)
> at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:349)
> 
> ... it seems like a good idea to have implementations perform work in their
> own thread to avoid zookeeper from hitting backlog if one client decides to
> perform some complex operation in one event when there are hundreds of
> events pending delivery.
> 
> Kevin



Re: group messaging, empheral nodes on zookeeper

2009-01-06 Thread Mahadev Konar
I think Ben already responded to your second question. Just to make sure all
of the questions in 2 are answered:

> 2. What happens to Ephemeral Nodes when a zookeeper server (not client) dies or
> is separated from the group ?

> 
> Supposing there are 5 zookeeper servers: server_1, ..., server_5
> And a client c_1 connects to server_1 and creates an ephemeral node /nd_1
> 
> 2a: What happens if server_1 dies/crashes (and c_1 is disconnected from
> server_1) ?
>     Is /nd_1 automatically deleted ?
> 
In this case, c1 will connect to some other active node and nd_1 stays.

> 2b: What happens if server_1 is disconnected from the rest of the group
> (server_2 to
>     server_5, but client c_1 still remains connected to server_1 ?
In this case, server_1 should shut down, knowing that it's disconnected from
the others and does not have a majority to carry on. So c_1 should reconnect
to some other active server, and nd_1 should not be deleted.

>     Is /nd_1 automatically deleted ?
>     What happens if server_1 rejoins the rest of the group ?
>     what happens to /nd_1 then ?
I don't really understand the question, but is it that both c_1 and s_1 are
in one network partition and the other servers are in a different one?
If that's the case, then c_1 would not be able to reconnect to any of the
active servers, and thus nd_1 would get deleted.

> 
> Thanks.
> 
> 
> 

I hope this helps 

mahadev



Re: InterruptedException

2009-01-06 Thread Mahadev Konar
Hi Kevin,
 The InterruptedException is thrown when any other thread interrupts a
zookeeper client thread during a client call (it's not really interrupting
the server, but interrupting the client threads). It's like any synchronous
blocking operation: it throws an InterruptedException if interrupted by
another thread.

Mahadev

PS: I'll try responding to your other emails as soon as possible :).


On 1/6/09 7:00 PM, "Kevin Burton"  wrote:

> Why does ZK throw InterruptedException?
> Shouldn't this be a KeeperException instead of a java system exception when
> interrupt() is called?
> 
> The javadoc just says:
> 
> "If the server transaction is interrupted"
> 
> If this is a ZK related it should be KeeperException...
> 
> 



Re: event re-issue on reconnect?

2009-01-06 Thread Mahadev Konar
Does onData mean a datawatch?
onConnect
> onData path: /foo, version: 4, data: '2333'
> onDisconnect
> onConnect
> onData path: /foo, version: 4, data: '2333'


Are these the sequence of events that you get on the client?

mahadev


On 1/6/09 5:03 PM, "Kevin Burton"  wrote:

> I have an event watching a file... and if I restart the server I get this:
> 
> onConnect
> onData path: /foo, version: 4, data: '2333'
> onDisconnect
> onConnect
> onData path: /foo, version: 4, data: '2333'
> 
> It re-issues the same version of the file. I can of course watch for this in
> my code but it seems like a bug.
> 
> Shouldn't the client keep track of the stat of watches and not bubble up the
> same event on server reconnect?
> 
> Kevin



Re: Simpler ZooKeeper event interface....

2009-01-06 Thread Mahadev Konar
http://issues.apache.org/jira/browse/ZOOKEEPER-23

This has been fixed in zookeeper-3.0 release. Are you using a release from
sourceforge?


mahadev


On 1/6/09 4:57 PM, "Kevin Burton"  wrote:

> This could be simplified if the semantics for reconnect were simplified.
> Is there any reason why I should know about a disconnect if ZK is just going
> to reconnect me to another server in 1ms?
> 
> Why not hide *all* of this from the user and have the client re-issue
> watches on reconnect and hold off on throwing exceptions if the server
> returns.
> 
> This would allow the user to just handle three conditions... total ensemble
> failure, no ACL permission, or no node existing (of vice-versa).
> 
> Kevin
> 
> 
>> If I run an async request, the client should replay these if I'm
>> reconnected to another host.
>> 
>> --
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> AIM/YIM: sfburtonator
> Skype: burtonator
> Work: http://spinn3r.com



Re: Sending data during NodeDataChanged or NodeCreated

2009-01-06 Thread Mahadev Konar


> So if I understand this correctly, if I receive a NodeDataChanged event, and
> then attempt do do a read of that node, there's a race condition where the
> server could crash and I would be disconnected and my read would hit an
> Exception
> Or, the ACL could change and I no longer have permission to read the file
> (though I did for a short window).
> 
> . now I have to add all this logic to retry.  Are there any other race
> conditions I wonder.
I think you have mentioned all of them.

> 
> Why not just send the byte[] data during the NodeDataChanged or NodeCreated
> event from the server?  This would avoid all these issues.
> 
> It's almost certainly what the user wants anyway.
It's just that the watches are pretty lightweight, and sending bytes around
is just more work for the server. Though we should experiment with how much
more load it generates and how useful it would be to send the bytes along
with the NodeDataChanged and NodeCreated events.

mahadev
> 
> Kevin



Re: Simpler ZooKeeper event interface....

2009-01-06 Thread Mahadev Konar
Does javadoc help?  :)

Mahadev


On 1/6/09 4:10 PM, "Kevin Burton"  wrote:

>> 
>> 
>>  zk.getData( event.getPath(), true, this, null );
>> 
>>> 
>>> 
> Also, why not rename this getDataAsync  I can't tell the difference just
> by looking at the method and the different number of arguments.
> Should make things a bit more straight forward.
> 
> Kevin



Re: Simpler ZooKeeper event interface....

2009-01-07 Thread Mahadev Konar
You are right Pat. Replaying an async operation would involve a lot of state
management for clients across servers, and a lot more work in determining
which operations succeeded and which need to be rerun; the semantics of
zookeeper client calls would be much harder to guarantee.

mahadev


On 1/7/09 10:33 AM, "Patrick Hunt"  wrote:

> Kevin Burton wrote:
>>> 3) it's possible for your code to get notified of a change, but never
>>> process the change. This might happen if:
>>>  a) a node changed watch fires
>>>  b) your client code runs an async getData
>>>  c) you are disconnected from the server
>>> 
>> 
>> Also, this seems very confusing...
>> 
>> If I run an async request, the client should replay these if I'm reconnected
>> to another host.
> 
> (Ben/Flavio/Mahadev can correct me if I'm wrong here or missed some detail)
> 
> Async operations are tricky as the server makes the change when it gets
> the request, not when the client processes the response. So you could
> request an async operation, which the server could process and respond
> to the client, immed. after which the client is disconnected from the
> server (before it can process the response). Client replay would not
> work in this case, and given that async is typically used for high
> throughput situations there could be a number of operations effected.
> 
> Patrick



Re: Simpler ZooKeeper event interface....

2009-01-07 Thread Mahadev Konar
Hi Vinod,
 I think what Ben meant was this--

The client will never know of a session expiration until and unless it's
connected to one of the servers. So the leader cannot know to demote itself
until it reconnects: it might have lost its session (which everyone except
itself would have realized), but it has to wait until it connects to one of
the servers to find out.

mahadev


On 1/7/09 10:02 AM, "Vinod Johnson"  wrote:

> Benjamin Reed wrote:
>> You don't demote yourself on disconnect. (Everyone else may still believe you
>> are the leader.) Check out the "Things to Remember about Watches" section in
>> the programmer's guide.
>> 
>> When you are disconnected from ZK you don't know what is happening, so you
>> have to act conservatively. Your session may or may not have expired. You
>> will not know for sure until you reconnect to ZK.
>>   
> Just to make sure I'm not misunderstanding the last bit, even without
> reconnecting to ZK, the leader's session could expire at the client
> side, correct? In that case the conservative thing for the leader to do
> is to demote itself if the intent is to avoid split brain (even though
> the session may still be active at ZK for some period of time after this).



Re: ouch, zookeeper infinite loop

2009-01-07 Thread Mahadev Konar
The version of Jute we use is really an ancient version of the recordio
ser/deser library in hadoop. We do want to move to a
better (versioned/fast/well-accepted) ser/deser library.

mahadev


On 1/7/09 12:08 PM, "Kevin Burton"  wrote:

> Ah... you think it was because it was empty?  Interesting.  I will have to
> play with Jute a bit.
> Kevin
> 
> On Wed, Jan 7, 2009 at 10:07 AM, Patrick Hunt  wrote:
> 
>> Thanks for the report, entered as:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-268
>> 
>> For the time being you can work around this by setting the threshold to
>> INFO for that class (in log4j.properties). Either that or just set the data
>> to a non-empty value for the znode.
>> 
>> Patrick
>> 
>> 
>> Kevin Burton wrote:
>> 
>>> Creating this node with this ACL:
>>> Created /foo
>>> setAcl /foo world:anyone:w
>>> 
>>> Causes the exception included below.
>>> 
>>> It's an infinite loop so it's just called over and over again filling my
>>> console.
>>> 
>>> I'm just doing an exists( path, true ); ... setting a watch still causes
>>> the
>>> problem.
>>> 
>>> 
>>> 
>>> java.lang.NullPointerException
>>>at org.apache.jute.Utils.toCSVBuffer(Utils.java:234)
>>>at
>>> org.apache.jute.CsvOutputArchive.writeBuffer(CsvOutputArchive.java:101)
>>>at
>>> 
>>> org.apache.zookeeper.proto.GetDataResponse.toString(GetDataResponse.java:48)
>>>at java.lang.String.valueOf(String.java:2827)
>>>at java.lang.StringBuilder.append(StringBuilder.java:115)
>>>at
>>> org.apache.zookeeper.ClientCnxn$Packet.toString(ClientCnxn.java:230)
>>>at java.lang.String.valueOf(String.java:2827)
>>>at java.lang.StringBuilder.append(StringBuilder.java:115)
>>>at
>>> 
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:586)
>>>at
>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:626)
>>>at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:852)
>>> java.lang.NullPointerException
>>>at org.apache.jute.Utils.toCSVBuffer(Utils.java:234)
>>>at
>>> org.apache.jute.CsvOutputArchive.writeBuffer(CsvOutputArchive.java:101)
>>>at
>>> 
>>> org.apache.zookeeper.proto.GetDataResponse.toString(GetDataResponse.java:48)
>>>at java.lang.String.valueOf(String.java:2827)
>>>at java.lang.StringBuilder.append(StringBuilder.java:115)
>>>at
>>> org.apache.zookeeper.ClientCnxn$Packet.toString(ClientCnxn.java:230)
>>>at java.lang.String.valueOf(String.java:2827)
>>>at java.lang.StringBuilder.append(StringBuilder.java:115)
>>>at
>>> 
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:586)
>>>at
>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:626)
>>>at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:852)
>>> 
>>> 
> 



Re: Reconnecting to another host on failure but before session expires...

2009-01-07 Thread Mahadev Konar
Hi Kevin,

Here is the link to the jira
http://issues.apache.org/jira/browse/ZOOKEEPER-146

Comments are welcome.


mahadev


On 1/7/09 12:35 PM, "Kevin Burton"  wrote:

> I like this idea though if that URL vanishes you can't connect to ZK.
> The bootstrap problem is hard.
> 
> Another issue is adding more ensembles without reconfiguring all clients.
> 
> It might be nice to connect to a 'profile' and have the client persist hosts
> in that profile.
> 
> This way you can add new machines (ZK servers) to the profile, wait for all ZK
> nodes to DL the new profile, then safely shutdown the old machines.
> 
> The local ZK nodes would need some type of local persistent storage though.
> 
> Kevin
> 
> On Wed, Jan 7, 2009 at 9:05 AM, Benjamin Reed  wrote:
> 
>> Using a string gives us some flexibility. There is an outstanding issue to
>> be able to pass in a URL: http://aeoueu/oeueue, the idea being that we
>> pull down the content to get the list of servers and ports.
>> 



Re: Does session expiration only happen during total ensemble failure or network split?

2009-01-07 Thread Mahadev Konar
Kevin,
 In the case you mention, the session is not really expired unless the quorum
decides to expire it, so it would be wrong for the client to assume that the
session has expired. It is possible that as soon as you bring the servers
back up, the client reconnects with the same session and the session is still
valid.

Why would you want the session to expire if all the servers are down (which
should not happen unless you kill all the nodes or the datacenter is down) ?



mahadev


On 1/7/09 12:39 PM, "Kevin Burton"  wrote:

>> 
>> The ZK ensemble leader expires the client session if it doesn't hear from
>> the client w/in the timeout specified by the client when the session was
>> established.
>> 
>> A client will disconnect from a server in the ensemble and attempt
>> reconnect to another server in the ensemble if it doesn't hear from the
>> server w/in 2/3 of the specified session timeout.
>> 
> 
> OK... I got that part.  The issue I'm running into now though is that my
> sessions aren't actually timing out when I shutdown all servers in an
> ensemble.
> 
> One solution/hack would be to record how long you've been disconnected and
> assume that your session has been expired.
> 
> Kevin
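Kevin's workaround -- tracking how long the client has been disconnected -- might be sketched as a small helper like this. This is an illustrative assumption, not part of the ZooKeeper API, and as Mahadev notes only the quorum actually expires a session, so the result is a heuristic at best:

```java
public class DisconnectTimer {
    // Heuristic only: if we have been disconnected longer than the
    // negotiated session timeout, the quorum has *probably* expired our
    // session -- but only a reconnect can confirm it.
    private final long sessionTimeoutMs;
    private long disconnectedAtMs = -1; // -1 means currently connected

    DisconnectTimer(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    void onDisconnect(long nowMs) {
        if (disconnectedAtMs < 0) disconnectedAtMs = nowMs;
    }

    void onConnect() {
        disconnectedAtMs = -1;
    }

    boolean probablyExpired(long nowMs) {
        return disconnectedAtMs >= 0 && nowMs - disconnectedAtMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        DisconnectTimer t = new DisconnectTimer(10_000);
        t.onDisconnect(0);
        System.out.println(t.probablyExpired(5_000));  // prints "false"
        System.out.println(t.probablyExpired(15_000)); // prints "true"
    }
}
```

An application using this would act conservatively (e.g. a leader demoting itself) once `probablyExpired` turns true, then reconcile with the real session state on reconnect.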



Re: A modest proposal for simplifying zookeeper :)

2009-01-09 Thread Mahadev Konar
Hi Kevin,
  It would be great to have such high-level interfaces. It could be
something that you could contribute :). We haven't had the bandwidth to
provide such interfaces for zookeeper. It would be great to have all such
recipes as part of the contrib package of zookeeper.

mahadev 

On 1/9/09 11:44 AM, "Kevin Burton"  wrote:

> OK so it sounds from the group that there are still reasons to provide
> rope in ZK to enable algorithms like leader election.
> Couldn't ZK ship higher level interfaces for leader election, mutexes,
> semaphores, queues, barriers, etc instead of pushing this on developers?
> 
> Then the remaining APIs, configuration, event notification, and discovery,
> can be used on a simpler, rope free API.
> 
> The rope is what's killing me now :)
> 
> Kevin



Re: Maximum number of children

2009-01-12 Thread Mahadev Konar
I was going to suggest bucketing with predefined hashes:
/root/template/data/hashbucket/hash

For the issue raised by Joshua regarding the length of the output from the
server -- 
This is a bug. We seem to allow any number of children of a node (up to
Integer.MAX_VALUE), yet the getChildren call fails to return them. This
leads to a chicken-and-egg problem: how do you get rid of the nodes if you
cannot list them?

Here we aren't saving anything, since the server has already processed the
request and sent us the data. We should get rid of this hard-coded limit. I
am not sure why we had this limit.

Can you open a jira for this Joshua?

thanks
mahadev


On 1/12/09 5:39 PM, "Stu Hood"  wrote:

> To continue with your current design, you could create a trie based on shared
> hash prefixes.
> 
> /root/template/date/ 1a5e67/2b45dc
> /root/template/date/ 1a5e67/3d4a1f
> /root/template/date/ 3d4a1f/1a5e67
> /root/template/date/ 3d4a1f/2b45dc
> 
> Alternatively, you could use what the maildir mail storage format uses:
> /root/template/date/ eh/eharmony.com/jo/joshuatuberville
> 
> Just check with the second one that all of the characters you support in email
> addresses are supported in znode names.
> 
> Thanks,
> Stu
> 
> 
> -Original Message-
> From: "Joshua Tuberville" 
> Sent: Monday, January 12, 2009 7:53pm
> To: "'zookeeper-user@hadoop.apache.org'" 
> Subject: Maximum number of children
> 
> Hello,
> 
> We are attempting to use ZooKeeper to coordinate daily email thresholds.  To
> do this we created a node hierarchy of
> 
> /root/template/date/email_hash
> 
> The idea being that we only send the template to an email address once per
> day.  This is intended to support millions of email hashes per day. From the
> ZooKeeper perspective we just attempt a create and if it succeeds we proceed
> and if we get a node exists exception we stop processing.  This seems to
> operate fine for over 2 million email hashes so far in testing.  However we
> also want to prune all previous days nodes to conserve memory.  We have run
> into a hard limit while using the getChildren method for a given
> /root/template/date.  If the List of children exceeds the hardcoded 4,194,304
> byte limit ClientCnxn$SendThread.readLength() throws an exception on line 490.
> So we have an issue that we can not delete a node that has children nor is it
> possible to delete a node who has children whose total names exceed 4 Mb.
> 
> Any feedback or guidance is appreciated.
> 
> Joshua Tuberville
> 
> 
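The bucketing idea suggested in this thread might be sketched as follows. The path layout, bucket width, and hash choice are illustrative assumptions, not a ZooKeeper convention: a fixed-width prefix of the hash becomes an intermediate znode, so no single parent accumulates millions of children:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Buckets {
    // Sketch of the "bucketing with predefined hashes" idea: derive a
    // fixed-width bucket from the email hash so children are spread across
    // 256 intermediate znodes. Path layout is illustrative.
    static String znodePath(String template, String date, String emailHash) {
        String bucket = emailHash.substring(0, 2); // 256 buckets for hex hashes
        return "/root/" + template + "/" + date + "/" + bucket + "/" + emailHash;
    }

    static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8)))
                sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is always available
        }
    }

    public static void main(String[] args) {
        String hash = sha1Hex("user@example.com");
        System.out.println(znodePath("welcome", "2009-01-12", hash));
    }
}
```

With a uniform hash, two million entries per day land roughly 8,000 per bucket, keeping each getChildren response well under the 4 MB response limit discussed above.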



Re: Maximum number of children

2009-01-13 Thread Mahadev Konar
Thanks Joshua. 

mahadev


On 1/13/09 10:43 AM, "Joshua Tuberville" 
wrote:

> Thanks to everyone for proposed schemes and I created ZOOKEEPER-272 per your
> request Mahadev.
> 
> Joshua
> 
> 
> -Original Message-
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Monday, January 12, 2009 7:04 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Maximum number of children
> 
> I was going to suggest bucketing with predefined hashes:
> /root/template/data/hashbucket/hash
> 
> For the issue raised by Joshua regarding the length of the output from the
> server -- 
> This is a bug. We seem to allow any number of children of a node (up to
> Integer.MAX_VALUE), yet the getChildren call fails to return them. This
> leads to a chicken-and-egg problem: how do you get rid of the nodes if you
> cannot list them?
> 
> Here we aren't saving anything, since the server has already processed the
> request and sent us the data. We should get rid of this hard-coded limit. I
> am not sure why we had this limit.
> 
> Can you open a jira for this Joshua?
> 
> thanks
> mahadev
> 
> 
> On 1/12/09 5:39 PM, "Stu Hood"  wrote:
> 
>> To continue with your current design, you could create a trie based on shared
>> hash prefixes.
>> 
>> /root/template/date/ 1a5e67/2b45dc
>> /root/template/date/ 1a5e67/3d4a1f
>> /root/template/date/ 3d4a1f/1a5e67
>> /root/template/date/ 3d4a1f/2b45dc
>> 
>> Alternatively, you could use what the maildir mail storage format uses:
>> /root/template/date/ eh/eharmony.com/jo/joshuatuberville
>> 
>> Just check with the second one that all of the characters you support in
>> email
>> addresses are supported in znode names.
>> 
>> Thanks,
>> Stu
>> 
>> 
>> -Original Message-
>> From: "Joshua Tuberville" 
>> Sent: Monday, January 12, 2009 7:53pm
>> To: "'zookeeper-user@hadoop.apache.org'" 
>> Subject: Maximum number of children
>> 
>> Hello,
>> 
>> We are attempting to use ZooKeeper to coordinate daily email thresholds.  To
>> do this we created a node hierarchy of
>> 
>> /root/template/date/email_hash
>> 
>> The idea being that we only send the template to an email address once per
>> day.  This is intended to support millions of email hashes per day. From the
>> ZooKeeper perspective we just attempt a create and if it succeeds we proceed
>> and if we get a node exists exception we stop processing.  This seems to
>> operate fine for over 2 million email hashes so far in testing.  However we
>> also want to prune all previous days nodes to conserve memory.  We have run
>> into a hard limit while using the getChildren method for a given
>> /root/template/date.  If the List of children exceeds the hardcoded 4,194,304
>> byte limit ClientCnxn$SendThread.readLength() throws an exception on line
>> 490.
>> So we have an issue that we can not delete a node that has children nor is it
>> possible to delete a node who has children whose total names exceed 4 Mb.
>> 
>> Any feedback or guidance is appreciated.
>> 
>> Joshua Tuberville
>> 
>> 
> 



Delaying 3.2 release by 2 to 3 weeks?

2009-01-15 Thread Mahadev Konar
Hi all,
  I needed to get quotas in zookeeper 3.2.0 and wanted to see if delaying
the release by 2-3 weeks is ok with everyone?
Here is the jira for it -

http://issues.apache.org/jira/browse/ZOOKEEPER-231

Please respond if you have any issues with the delay.

thanks
mahadev




Re: Delaying 3.1 release by 2 to 3 weeks?

2009-01-15 Thread Mahadev Konar
That was release 3.1 and not 3.2 :)

mahadev


On 1/15/09 4:26 PM, "Mahadev Konar"  wrote:

> Hi all,
>   I needed to get quotas in zookeeper 3.2.0 and wanted to see if delaying
> the release by 2-3 weeks is ok with everyone?
> Here is the jira for it -
> 
> http://issues.apache.org/jira/browse/ZOOKEEPER-231
> 
> Please respond if you have any issues with the delay.
> 
> thanks
> mahadev
> 
> 



Re: Standard redistributable set of primitives?

2009-01-16 Thread Mahadev Konar
Hi Tom,
 It does sound like a reasonable idea. If you want to go ahead and implement
one of those, we would be happy to help out and get it into Zookeeper. We
haven't had the bandwidth to put these recipes into the Zookeeper code base.
Please go ahead and create a jira if you want to work on it.

Thanks
mahadev


On 1/16/09 6:54 AM, "Tom Nichols"  wrote:

> Hi,
> 
> I was wondering if there were plans to create a set of standard
> ZooKeeper primitives, sort of like commons-collections.  I figure it
> would be mostly based off of the recipes on the ZK wiki, but it would
> provide users with a slightly easier starting point, not to mention it
> would be well-tested and help users avoid "re-inventing the wheel."
>  Does this sound like a reasonable idea?
> 
> Thanks.
> -Tom



Re: Delaying 3.1 release by 2 to 3 weeks?

2009-01-16 Thread Mahadev Konar
> we should delay. it would be good to try out quotas for a bit before we do the
> release. quotas are also a key part of the release. 3 weeks seem a little long
> though.
3 weeks is just a worst case estimate... We should probably be done in 2
weeks. 

mahadev
> 
> ben
> ________
> From: Mahadev Konar [maha...@yahoo-inc.com]
> Sent: Thursday, January 15, 2009 4:32 PM
> To: zookeeper-...@hadoop.apache.org
> Cc: zookeeper-user@hadoop.apache.org
> Subject: Re: Delaying 3.1 release by 2 to 3 weeks?
> 
> That was release 3.1 and not 3.2 :)
> 
> mahadev
> 
> 
> On 1/15/09 4:26 PM, "Mahadev Konar"  wrote:
> 
>> Hi all,
>>   I needed to get quotas in zookeeper 3.2.0 and wanted to see if delaying
>> the release by 2-3 weeks is ok with everyone?
>> Here is the jira for it -
>> 
>> http://issues.apache.org/jira/browse/ZOOKEEPER-231
>> 
>> Please respond if you have any issues with the delay.
>> 
>> thanks
>> mahadev
>> 
>> 
> 



Re: Delaying 3.1 release by 2 to 3 weeks?

2009-01-16 Thread Mahadev Konar
I think it should be done in 2 weeks..

mahadev


On 1/16/09 2:34 PM, "Patrick Hunt"  wrote:

> Mahadev, can you complete quotas in 2 weeks? This includes completing
> the code itself, documentation, tests, and incorporating review feedback?
> 
> Patrick
> 
> Benjamin Reed wrote:
>> we should delay. it would be good to try out quotas for a bit before
>> we do the release. quotas are also a key part of the release. 3 weeks
>> seem a little long though.
>> 
>> ben ____ From: Mahadev Konar
>> [maha...@yahoo-inc.com] Sent: Thursday, January 15, 2009 4:32 PM To:
>> zookeeper-...@hadoop.apache.org Cc: zookeeper-user@hadoop.apache.org
>> Subject: Re: Delaying 3.1 release by 2 to 3 weeks?
>> 
>> That was release 3.1 and not 3.2 :)
>> 
>> mahadev
>> 
>> 
>> On 1/15/09 4:26 PM, "Mahadev Konar"  wrote:
>> 
>>> Hi all, I needed to get quotas in zookeeper 3.2.0 and wanted to see
>>> if delaying the release by 2-3 weeks is ok with everyone? Here is
>>> the jira for it -
>>> 
>>> http://issues.apache.org/jira/browse/ZOOKEEPER-231
>>> 
>>> Please respond if you have any issues with the delay.
>>> 
>>> thanks mahadev
>>> 
>>> 
>> 



Re: Testing Zookeeper

2009-02-10 Thread Mahadev Konar
HI Joshua,
  Feel free to open a jira and attach a patch.

Please take a look at how to contribute:

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

Thanks
mahadev

On 2/10/09 11:34 AM, "Joshua Tuberville" 
wrote:

> To test our zookeeper usage we built a utility class using some of the methods
> in org.apache.zookeeper.test.ClientBase out of the test folder.  This allows
> testing to be done using any framework JUnit4, JUnit5, TestNG, etc.  We would
> prefer this be in the zookeeper jar.  Should I open a JIRA item and include
> the class?
> 
> Thanks,
> Joshua



Re: Testing Zookeeper

2009-02-10 Thread Mahadev Konar
Hi Nitay and Joshua,
  It would be great if in the future we could keep these discussions about
development and patches on zookeeper-dev rather than the user list. The user
list is meant for released versions and questions from users.

Thanks
mahadev


On 2/10/09 12:51 PM, "Joshua Tuberville" 
wrote:

> Nitay,
> 
> Thanks for pointing out your ticket.  I assumed someone had already done the
> same.  I will take a look at your patch and compare to our code.  I agree that
> there should be some common way for tests both internal and external to
> buildup a server and tear it down.
> 
> Joshua
> 
> -Original Message-
> From: Nitay [mailto:nit...@gmail.com]
> Sent: Tuesday, February 10, 2009 12:46 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Testing Zookeeper
> 
> Joshua,
> 
> There may already be some JIRAs open regarding this, e.g.
> https://issues.apache.org/jira/browse/ZOOKEEPER-278. You can assign those to
> yourself and attach your stuff there if it fits your issue.
> 
> On Tue, Feb 10, 2009 at 11:44 AM, Mahadev Konar wrote:
> 
>> HI Joshua,
>>  Feel free to open a jira and attach a patch.
>> 
>> Please take a look at how to contribute:
>> 
>> http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
>> 
>> Thanks
>> mahadev
>> 
>> On 2/10/09 11:34 AM, "Joshua Tuberville" 
>> wrote:
>> 
>>> To test our zookeeper usage we built a utility class using some of the
>> methods
>>> in org.apache.zookeeper.test.ClientBase out of the test folder.  This
>> allows
>>> testing to be done using any framework JUnit4, JUnit5, TestNG, etc.  We
>> would
>>> prefer this be in the zookeeper jar.  Should I open a JIRA item and
>> include
>>> the class?
>>> 
>>> Thanks,
>>> Joshua
>> 
>> 



Re: Dealing with session expired

2009-02-12 Thread Mahadev Konar
Hi Tom,
  We prefer to discard the zookeeper instance if a session expires.
Maintaining a one to one relationship between a client handle and a session
makes it much simpler for users to understand the existence and
disappearance of ephemeral nodes and watches created by a zookeeper client.

thanks
mahadev


On 2/12/09 10:58 AM, "Tom Nichols"  wrote:

> I've come across the situation where a ZK instance will have an
> expired connection and therefore all operations fail.  Now AFAIK the
> only way to recover is to create  a new ZK instance with the old
> session ID, correct?
> 
> Now, my problem is, the ZK instance may be shared -- not between
> threads -- but maybe two classes in the same thread synchronize on
> different nodes by using different watchers.  So it makes sense that
> one ZK client instance can handle this.  Except that even if I detect
> the session expiration by catching the KeeperException, if I want to
> "resume" the session, I have to create a new ZK instance and pass it
> to any classes who were previously sharing the same instance.  Does
> this make sense so far?
> 
> Anyway, bottom line is, it would be nice if a ZK instance could itself
> recover a session rather than discarding that instance and creating a
> new one.
> 
> Thoughts?
> 
> Thanks in advance,
> 
> -Tom



Re: Dealing with session expired

2009-02-12 Thread Mahadev Konar
Hi Tom,
  The session expired event means that the server expired the client's
session, which means the watches and ephemeral nodes for that client go away.

How are you running your zookeeper quorum? A session expiry should be a
really rare event. If you have a quorum of servers it should rarely happen.

mahadev


On 2/12/09 11:17 AM, "Tom Nichols"  wrote:

> So if a session expires, my ephemeral nodes and watches have already
> disappeared?  I suppose creating a new ZK instance with the old
> session ID would not do me any good in that case.  Correct?
> 
> Thanks.
> -Tom
> 
> 
> 
> On Thu, Feb 12, 2009 at 2:12 PM, Mahadev Konar  wrote:
>> Hi Tom,
>>  We prefer to discard the zookeeper instance if a session expires.
>> Maintaining a one to one relationship between a client handle and a session
>> makes it much simpler for users to understand the existence and
>> disappearance of ephemeral nodes and watches created by a zookeeper client.
>> 
>> thanks
>> mahadev
>> 
>> 
>> On 2/12/09 10:58 AM, "Tom Nichols"  wrote:
>> 
>>> I've come across the situation where a ZK instance will have an
>>> expired connection and therefore all operations fail.  Now AFAIK the
>>> only way to recover is to create  a new ZK instance with the old
>>> session ID, correct?
>>> 
>>> Now, my problem is, the ZK instance may be shared -- not between
>>> threads -- but maybe two classes in the same thread synchronize on
>>> different nodes by using different watchers.  So it makes sense that
>>> one ZK client instance can handle this.  Except that even if I detect
>>> the session expiration by catching the KeeperException, if I want to
>>> "resume" the session, I have to create a new ZK instance and pass it
>>> to any classes who were previously sharing the same instance.  Does
>>> this make sense so far?
>>> 
>>> Anyway, bottom line is, it would be nice if a ZK instance could itself
>>> recover a session rather than discarding that instance and creating a
>>> new one.
>>> 
>>> Thoughts?
>>> 
>>> Thanks in advance,
>>> 
>>> -Tom
>> 
>> 



Re: Watcher guarantees

2009-02-13 Thread Mahadev Konar
> If client sets a watcher on a znode by doing a getData operation is it
> guaranteed to get the next change after the value it read, or can a
> change be missed?
The watch is just a notification that the node changed. By the time you do a
getData on the node, there might have been more updates to it.
So yes, you can miss changes.
> 
> In other words if the value it read had zxid z1 and the next update of
> the znode has zxid z2, will the watcher always get the event for the
> change z2?
>
The watcher will always get an event for zxid z2, but since the watch event
does not carry the data with it, the client will have to do a getData for
that node.
 
> Thanks,
> Tom
Mahadev
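The one-shot watch behavior described above can be illustrated with a small toy model (this is not the real ZooKeeper client, just a sketch of the semantics): a watch fires at most once, and changes between the notification and the next read are never delivered individually.

```java
import java.util.ArrayList;
import java.util.List;

public class OneShotWatchDemo {
    static class Znode {
        private String data;
        private final List<Runnable> watchers = new ArrayList<>();

        // Models getData(path, watch): records what the client read and
        // registers a one-shot watch.
        void getDataAndWatch(Runnable watcher, StringBuilder log) {
            log.append("read=").append(data).append(";");
            watchers.add(watcher);
        }

        void setData(String newData) {
            data = newData;
            List<Runnable> toFire = new ArrayList<>(watchers);
            watchers.clear(); // one-shot: every watch fires at most once
            for (Runnable w : toFire) w.run();
        }
    }

    // Returns what a watching client observed across three updates.
    public static String observe() {
        Znode node = new Znode();
        node.setData("v1");
        StringBuilder log = new StringBuilder();
        node.getDataAndWatch(() -> log.append("notified;"), log); // sees v1
        node.setData("v2"); // watch fires once
        node.setData("v3"); // no watch registered any more: silent
        node.getDataAndWatch(() -> log.append("notified;"), log); // sees v3
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(observe()); // read=v1;notified;read=v3;
    }
}
```

The client is notified that something changed after v1, but when it re-reads it sees only v3; the intermediate value v2 is never observed, which is exactly the "you can miss changes" point above.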



Re: Recommended session timeout

2009-02-23 Thread Mahadev Konar
Hi Joey,
 here is a link to information on session timeouts.
http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
zkSessions
  
The right session timeout depends on how sensitive you want your application
to be. A very low session timeout (1-2 seconds) might make your application
very sensitive to events like minor network problems; a higher value of, say,
30 seconds might lead to slow detection of client failures -- for example, if
a zookeeper client that owns an ephemeral node goes down, the ephemeral node
will only go away after the session timeout.

I have seen some users using 10-15 seconds of session timeout, but you
should use as per your application requirements.

Hope this helps.
mahadev


On 2/22/09 3:09 AM, "Joey Echeverria"  wrote:

> Is there a recommended session timeout? Does it change based on the
> ensemble size?
> 
> Thanks,
> 
> -Joey



Re: Recommended session timeout

2009-02-23 Thread Mahadev Konar
On 2/23/09 11:37 PM, "Joey Echeverria"  wrote:

> Thanks for the link to the documentation. I've been running tests with
> a 5 second session timeout and disconnect events appear frequent. The
> network they're operating on is generally quite, but the disconnects
> to correlate with an increase in activity (e.g. loading data into the
> system).
> 
> Does this seem normal to you or does it imply a potential
> configuration problem on my network?
How many zookeeper quorum servers are you running? What is the config for
the zookeeper servers?

> 
> On a related topic, I was reading the 3.1 client source code,
> particularly the reconnect source, and noticed that the client sleeps
> for up to 1 second before trying to reconnect. This seems excessive
> and with a 5 second session timeout leads to more frequent session
> expirations. Almost every time it sleeps for more than about 800 ms, a
> disconnect is followed by an expiration.
Can you point me to the code which you think does this? A client is supposed
to disconnect itself from a server if it does not hear a response to its
pings within 1/3 of the session timeout. It should then reconnect to the
other servers. Session expiration happening so frequently does indicate a
problem. More information on your setup will help.

Thanks
mahadev
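The 1/3-of-session-timeout rule above implies a rough timing budget; the sketch below is illustrative arithmetic only, not the real client code.

```java
public class SessionTiming {
    // After this long without a ping reply, the client treats itself as
    // disconnected from its current server (the 1/3 rule described above).
    public static long disconnectThresholdMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    // Rough time left to reconnect to another server before the session
    // would expire on the server side.
    public static long reconnectBudgetMs(long sessionTimeoutMs) {
        return sessionTimeoutMs - disconnectThresholdMs(sessionTimeoutMs);
    }

    public static void main(String[] args) {
        long timeout = 5000; // the 5 second session timeout from this thread
        System.out.println("disconnect after ~" + disconnectThresholdMs(timeout) + " ms");
        System.out.println("reconnect budget ~" + reconnectBudgetMs(timeout) + " ms");
    }
}
```

With a 5 second timeout the client gives up on its server after roughly 1.6 seconds, which is why an extra sleep of close to a second before reconnecting eats a large fraction of the remaining budget.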

> 
> Is this a bug, or desirable behavior?
> 
> Thanks,
> 
> -Joey
> 
> On Mon, Feb 23, 2009 at 10:37 PM, Patrick Hunt  wrote:
>> The latest docs (3.1.0 has some updates to that section) can be found here:
>> http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperProgrammers.html#ch_z
>> kSessions
>> 
>> Patrick
>> 
>> Mahadev Konar wrote:
>>> 
>>> Hi Joey,
>>>  here is a link to information on session timeouts.
>>> 
>>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
>>> zkSessions
>>>  The session timeouts depends on how sensitive you want your application
>>> to
>>> be. A very low session timeout like (1-2 seconds) might lead to your
>>> application being very sensitive to events like minor network problems
>>> etc.,
>>> a higher values of say (30 seconds) on the other hand might lead to slow
>>> detection of client failures -- example one of the zookeeper client which
>>> has ephemeral node goes down, in this case the ephemeral nodes will only
>>> go
>>> away after session timeout.
>>> 
>>> I have seen some users using 10-15 seconds of session timeout, but you
>>> should use as per your application requirements.
>>> 
>>> Hope this helps.
>>> mahadev
>>> 
>>> 
>>> On 2/22/09 3:09 AM, "Joey Echeverria"  wrote:
>>> 
>>>> Is there a recommended session timeout? Does it change based on the
>>>> ensemble size?
>>>> 
>>>> Thanks,
>>>> 
>>>> -Joey
>>> 
>> 



Re: Anyone using Zookeeper in AWS (Amazon Cloud)?

2009-02-26 Thread Mahadev Konar
Hi Greg,
  As for cross-datacenter deployment, we have tested zookeeper across data
centers and it works fine. The only thing is that you might have to tweak
syncLimit and tickTime to somewhat higher values for Zookeeper.

http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperAdmin.html#sc_config
uration provides documentation on these parameters.

As for the communication protocol within zookeeper servers, we currently use
raw tcp sockets to send and receive data. I cannot estimate the time it would
take to support https, but it won't be just a week of work for sure. Client
to zookeeper server communication is raw tcp as well.

We would certainly like to have security in Zookeeper. Currently, Hadoop
Core is also working on getting security in place. We plan to have a similar
security model to theirs (I think they are looking at kerberos -- not sure).

mahadev

On 2/26/09 10:24 AM, "bebble zap"  wrote:

> We're thinking about using Zookeeper as our coordination service and
> also for doing group membership in the Amazon Cloud.  Currently our
> applications are deployed in Amazon Cloud on multiple availability
> zones (i.e. data centers), so this means that ZK nodes will be talking
> across datacenters.  I'm assuming that the additional latency from
> going across datacenters shouldn't be too big of an issue.  Also, we
> are paranoid about security in the cloud, so we'd like to use https as
> the communications protocol for Zookeeper -- not sure if this is a
> trivial thing to do or not.   Wondering if anyone's already doing this
> today or whether Zookeeper is not the right solution given our
> environment currently.
> 
> Thanks
> Greg



Re: How large an ensemble can one build with Zookeeper?

2009-03-03 Thread Mahadev Konar
Hi Chad,
 The maximum number of zookeeper servers we have tested with is 13. Even
with 13 the performance starts to degrade very quickly (compared to ensembles
of 5 and 7). I am not sure we have current numbers (we have since made 3x or
so performance improvements), but the old numbers are in zookeeper.pdf on
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations

The slide is at the end.

You can see that the performance drops with 13 servers. We usually suggest 5
or 7 servers for ZooKeeper. We can get around 20K-30K writes per second and
more than 50K reads per second from an ensemble of 5 servers (as of now with
performance enhancements). With 5 servers you can tolerate a failure of 2
nodes. 
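The sizing advice above follows from simple majority-quorum arithmetic; here is a minimal sketch of it (illustrative, not ZooKeeper code):

```java
public class QuorumMath {
    // A majority of the ensemble must be up for the service to make progress.
    public static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Failures tolerated while still keeping a majority alive.
    public static int toleratedFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7, 13}) {
            System.out.println(n + " servers: quorum=" + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failures");
        }
    }
}
```

Note that an even-sized ensemble tolerates no more failures than the next smaller odd size (6 servers tolerate 2 failures, same as 5), which is one reason odd ensemble sizes such as 5 or 7 are recommended.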
Please take a look at zookeeper presentations -
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations
To find out more about Zookeeper.

What is the rationale behind having such a large number of zookeeper servers?

Thanks
mahadev


On 3/3/09 5:30 PM, "Chad Harrington"  wrote:

> Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
> an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
> Are there limitations that make the system unusable at large numbers of
> servers?
> 
> Thanks,



Re: Distributed Lock Libraries

2009-03-06 Thread Mahadev Konar
Hi Fernando,
 Our 3.2 release is focused more on providing such recipes in a cleaner and
more reliable way.
 One of the jiras that focuses on this is
http://issues.apache.org/jira/browse/ZOOKEEPER-78.

We hope to add more such recipes in 3.2.

Thanks
mahadev


On 3/6/09 2:30 PM, "Fernando Padilla"  wrote:

> When I first discovered Zookeeper last year, it was all about the
> low-level file-system semantics, and letting the client use it with
> several knowns recipes (design patterns, algorithms, etc).
> 
> And I remember it mentioning that some people within Yahoo were working
> on a nice client-side library to expose those recipes in a cleaner more
> reliable way.
> 
> I was wondering, has any been able to create a client-side library that
> exposes Zookeeper as a simple java.util.concurrent.locks interfaces?
> (Lock, ReadWriteLock)??



Re: Dynamic addition of servers to Zookeeper cluster

2009-03-13 Thread Mahadev Konar
Hi Raghu,
 You are right that the cluster configuration is a static one. To deal with
this problem, we usually suggest that you change your configs for every
server and then re hup all of them at the same time (almost the same time I
mean). The clients would lose connections to the servers but will reconnect
and should regain their old sessions.

Also, there is an open jira on this:
http://issues.apache.org/jira/browse/ZOOKEEPER-107

thanks
mahadev


On 3/13/09 4:46 PM, "rag...@yahoo.com"  wrote:

> 
> ZooKeeper gurus,
> 
> Can I add servers dynamically to a ZooKeeper cluster? If I understand ZooKeepr
> cluster correctly, each server should know about other servers in the cluster
> during server start up. Does this mean that the cluster size is static once
> the cluster is running and a new server can be added to the cluster only by
> bringing down the cluster and restarting each server with the new server name
> included in each server's configuration file?
> 
> Thanks
> Raghu
> 
> 
> 
> 



Re: Dynamic addition of servers to Zookeeper cluster

2009-03-13 Thread Mahadev Konar
By re hup I mean restart.

Thanks
mahadev


On 3/13/09 4:54 PM, "Mahadev Konar"  wrote:

> Hi Raghu,
>  You are right that the cluster configuration is a static one. To deal with
> this problem, we usually suggest that you change your configs for every
> server and then re hup all of them at the same time (almost the same time I
> mean). The clients would lose connections to the servers but will reconnect
> and should regain their old sessions.
> 
> Also, there is an open jira on this:
> http://issues.apache.org/jira/browse/ZOOKEEPER-107
> 
> thanks
> mahadev
> 
> 
> On 3/13/09 4:46 PM, "rag...@yahoo.com"  wrote:
> 
>> 
>> ZooKeeper gurus,
>> 
>> Can I add servers dynamically to a ZooKeeper cluster? If I understand
>> ZooKeepr
>> cluster correctly, each server should know about other servers in the cluster
>> during server start up. Does this mean that the cluster size is static once
>> the cluster is running and a new server can be added to the cluster only by
>> bringing down the cluster and restarting each server with the new server name
>> included in each server's configuration file?
>> 
>> Thanks
>> Raghu
>> 
>> 
>> 
>> 
> 



Re: Semantics of ConnectionLoss exception

2009-03-25 Thread Mahadev Konar
Hi Nitay,
 > 
> - Does this event happening mean my ephemeral nodes will go away?
No. The client will try connecting to the other servers. If it is not able
to reconnect within the remaining session timeout, the session will expire
and you will get a session expired event.

http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html
has this information scattered around, but we should put it in the FAQ
specifically.

- Is the ZooKeeper handle I'm using dead after this event?
Again, no. Your handle is valid until you get a session expired event or you
do a zoo_close on your handle.


Thanks 
mahadev
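The distinction drawn in this thread between a disconnect and a session expiry can be sketched as a small decision table (a toy model of the handle states, not the real client implementation):

```java
public class HandleState {
    public enum State { CONNECTED, CONNECTING, EXPIRED }

    // What happens to a CONNECTED handle after it loses its server for
    // `disconnectedMs`, given the session timeout configured on the session.
    public static State afterDisconnect(long disconnectedMs, long sessionTimeoutMs) {
        if (disconnectedMs < sessionTimeoutMs) {
            // Handle still valid: the library keeps reconnecting internally,
            // ephemerals and watches survive.
            return State.CONNECTING;
        }
        // Server-side expiry: ephemerals deleted, a new handle is needed.
        // (The client only learns this once it reaches a server again.)
        return State.EXPIRED;
    }

    public static void main(String[] args) {
        System.out.println(afterDisconnect(2000, 10000));  // CONNECTING
        System.out.println(afterDisconnect(15000, 10000)); // EXPIRED
    }
}
```

In other words, ConnectionLoss on its own does not kill the handle or the ephemeral nodes; only crossing the session timeout (as judged by the servers) does.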




On 3/25/09 5:42 PM, "Nitay"  wrote:

> I'm a little unclear about the ConnectionLoss exception as it's described in
> the FAQ and would like some clarification.
> 
> From the state diagram, http://wiki.apache.org/hadoop/ZooKeeper/FAQ#1, there
> are three events that cause a ConnectionLoss:
> 
> 1) In Connecting state, call close().
> 2) In Connected state, call close().
> 3) In Connected state, get disconnected.
> 
> It's the third one I'm unclear about.
> 
> - Does this event happening mean my ephemeral nodes will go away?
> - Is the ZooKeeper handle I'm using dead after this event? Meaning that,
> similar to the SessionExpired case, I need to construct a new connection
> handle to ZooKeeper and take care of the restarting myself. It seems from
> the diagram that this should not be the case. Rather, seeing as the
> disconnected event sends the user back to the Connecting state, my handle
> should be fine and the library will keep trying to reconnect to ZooKeeper
> internally? I understand my current operation may have failed, what I'm
> asking about is future operations.
> 
> Thanks,
> -n



Re: Semantics of ConnectionLoss exception

2009-03-26 Thread Mahadev Konar
> 
> Isn't it the case that the client won't get session expired until it's
> able to connect to a server, right? So what might happen is that the
> client loses connection to the server, the server eventually expires the
> client and deletes ephemerals (notifying all watchers) but the client
> won't see the "session expiration" until it is able to reconnect to one
> of the servers. ie the client doesn't know it's been expired until it's
> able to reconnect to the cluster, at which point it's notified that it's
> been expired.
You are right pat!

mahadev

> 
>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html
>> Has this information scattered around, but we should put it in the FAQ
>> specifically. 
> 
> 3.0.1 is a bit old, try this for the latest docs:
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html
> 
>> - Is the ZooKeeper handle I'm using dead after this event?
>> Again no. your handle is valid until you get an session expiry event or you
>> do a zoo_close on your handle.
>> 
>> 
>> Thanks 
>> mahadev
>> 
>> 
>> 
>> 
>> On 3/25/09 5:42 PM, "Nitay"  wrote:
>> 
>>> I'm a little unclear about the ConnectionLoss exception as it's described in
>>> the FAQ and would like some clarification.
>>> 
>>> From the state diagram, http://wiki.apache.org/hadoop/ZooKeeper/FAQ#1, there
>>> are three events that cause a ConnectionLoss:
>>> 
>>> 1) In Connecting state, call close().
>>> 2) In Connected state, call close().
>>> 3) In Connected state, get disconnected.
>>> 
>>> It's the third one I'm unclear about.
>>> 
>>> - Does this event happening mean my ephemeral nodes will go away?
>>> - Is the ZooKeeper handle I'm using dead after this event? Meaning that,
>>> similar to the SessionExpired case, I need to construct a new connection
>>> handle to ZooKeeper and take care of the restarting myself. It seems from
>>> the diagram that this should not be the case. Rather, seeing as the
>>> disconnected event sends the user back to the Connecting state, my handle
>>> should be fine and the library will keep trying to reconnect to ZooKeeper
>>> internally? I understand my current operation may have failed, what I'm
>>> asking about is future operations.
>>> 
>>> Thanks,
>>> -n
>> 



Re: Semantics of ConnectionLoss exception

2009-03-26 Thread Mahadev Konar
The problem is that we cannot differentiate between the servers being down
and a network problem between the client and the servers.

If the servers are down and we expire the session for the client on the
client side, the servers could come back up and would still consider the
session valid (though it would expire within a session timeout or so), so we
would have prematurely expired the session at the client.

You can look at it this way: the client does not know what is going on with
the zookeeper service, so instead of giving back a false answer to the
application it waits for a true answer from the servers. Also, we are
assuming that the client would not be able to proceed further anyway if it
cannot contact the servers.


Hope this answers your question.

mahadev

On 3/26/09 12:09 PM, "Nitay"  wrote:

> Why is it done that way? How am I supposed to reliably detect that my
> ephemeral nodes are gone? Why not deliver the Session Expired event on the
> client side after the right time has passed without communication to any
> server?
> 
> On Thu, Mar 26, 2009 at 10:58 AM, Mahadev Konar wrote:
> 
>>> 
>>> Isn't it the case that the client won't get session expired until it's
>>> able to connect to a server, right? So what might happen is that the
>>> client loses connection to the server, the server eventually expires the
>>> client and deletes ephemerals (notifying all watchers) but the client
>>> won't see the "session expiration" until it is able to reconnect to one
>>> of the servers. ie the client doesn't know it's been expired until it's
>>> able to reconnect to the cluster, at which point it's notified that it's
>>> been expired.
>> You are right pat!
>> 
>> mahadev
>> 
>>> 
>>>> 
>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html
>>>> Has this information scattered around, but we should put it in the FAQ
>>>> specifically.
>>> 
>>> 3.0.1 is a bit old, try this for the latest docs:
>>> 
>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html
>>> 
>>>> - Is the ZooKeeper handle I'm using dead after this event?
>>>> Again no. your handle is valid until you get an session expiry event or
>> you
>>>> do a zoo_close on your handle.
>>>> 
>>>> 
>>>> Thanks
>>>> mahadev
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 3/25/09 5:42 PM, "Nitay"  wrote:
>>>> 
>>>>> I'm a little unclear about the ConnectionLoss exception as it's
>> described in
>>>>> the FAQ and would like some clarification.
>>>>> 
>>>>> From the state diagram, http://wiki.apache.org/hadoop/ZooKeeper/FAQ#1,
>> there
>>>>> are three events that cause a ConnectionLoss:
>>>>> 
>>>>> 1) In Connecting state, call close().
>>>>> 2) In Connected state, call close().
>>>>> 3) In Connected state, get disconnected.
>>>>> 
>>>>> It's the third one I'm unclear about.
>>>>> 
>>>>> - Does this event happening mean my ephemeral nodes will go away?
>>>>> - Is the ZooKeeper handle I'm using dead after this event? Meaning
>> that,
>>>>> similar to the SessionExpired case, I need to construct a new
>> connection
>>>>> handle to ZooKeeper and take care of the restarting myself. It seems
>> from
>>>>> the diagram that this should not be the case. Rather, seeing as the
>>>>> disconnected event sends the user back to the Connecting state, my
>> handle
>>>>> should be fine and the library will keep trying to reconnect to
>> ZooKeeper
>>>>> internally? I understand my current operation may have failed, what I'm
>>>>> asking about is future operations.
>>>>> 
>>>>> Thanks,
>>>>> -n
>>>> 
>> 
>> 



Re: Divergence in ZK transaction logs in some corner cases?

2009-03-30 Thread Mahadev Konar
Hi Raghu,

  You are right in pointing out the problem. We have to log the leader
election zxid, which currently we don't. There is an open jira for that:

http://issues.apache.org/jira/browse/ZOOKEEPER-335.

If you log the zxid on a new leader election as well, then this would not be
a problem.

In your case 
1) a crashes
2) B is elected the leader. So the zxid of the ensemble moves to 2,0 and IS
LOGGED IN THE TRANSACTION LOG BY EVERYONE IN THE ENSEMBLE (this is the part
that is missing in the code).
Now B starts a new PROPOSAL (2,1), B logs the PROPOSAL and moves to zxid
(2,1)

3) B crashes before anyone else receives the PROPOSAL.

4) C is elected as the leader the new zxid chosen by C is
3,0 (since we logged 2,0 on C as per our last leader election)

5) Now C would start a proposal (3,1), and this way the logs do not diverge.

I hope this helps. 

mahadev
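The epoch/counter zxid layout discussed in this thread can be sketched as follows (a sketch of the scheme, not the actual ZooKeeper implementation): the high 32 bits are the leader epoch and the low 32 bits a per-epoch counter, and a new leader bumps the epoch and resets the counter.

```java
public class Zxid {
    // Pack an epoch (high 32 bits) and counter (low 32 bits) into one zxid.
    public static long make(int epoch, int counter) {
        return ((long) epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static int epoch(long zxid)   { return (int) (zxid >> 32); }
    public static int counter(long zxid) { return (int) zxid; }

    // What a freshly elected leader derives from the last zxid it logged:
    // increment the epoch, reset the counter.
    public static long newLeaderZxid(long lastLoggedZxid) {
        return make(epoch(lastLoggedZxid) + 1, 0);
    }

    public static void main(String[] args) {
        long z = make(1, 10);      // ensemble at epoch 1, counter 10
        long b = newLeaderZxid(z); // B elected: (2,0)
        // If (2,0) is logged at election time, a later leader C starts from
        // epoch 3 even if B's proposal (2,1) never reached anyone -- so zxids
        // from different leaders can never collide, avoiding the divergence
        // in Raghu's scenario.
        long c = newLeaderZxid(b);
        System.out.println(epoch(c) + "," + counter(c)); // 3,0
    }
}
```

Without logging the election zxid, C would compute its epoch from its last logged zxid (1,10) and repeat epoch 2, which is exactly the collision described above.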


On 3/30/09 1:31 PM, "rag...@yahoo.com"  wrote:

> 
> Ben,
> 
> Thanks a lot for explaining this.
> 
> I have one more corner case in mind where the transaction logs could diverge.
> I might be wrong this time as well, but would like to understand how it works.
> Reading the Leader.lead() code, it seems like the new leader reads the last
> logged zxid and bumps up the higher 32 bits while resetting the lower 32 bits.
> So this means that cascading leader crashes without a PROPOSAL in between
> would make the new leader chose the same zxid as the one before. This could
> lead to a corner case like below:
> 
> In an ensemble of 5 servers (A, B, C, D and E), say the zxid is 1,10 (higher
> 32 bits, lower 32 bits) with A as the leader. Now the following events happen:
> 
> 1. A crashes.
> 2. B is elected the leader. So the zxid of the ensemble moves to 2,0. If I
> read the code correctly, no one logs the new zxid until a new PROPOSAL is
> made. Now B starts a new PROPOSAL (2,1), B logs the PROPOSAL and moves to zxid
> (2,1).
> 3. B crashes before anyone else receives the PROPOSAL.
> 4. C is elected as the leader. Since the new zxid depends on the last logged
> zxid (which is still 1,10 according to C's log), the new zxid chosen by C is
> 2,0 as well.
> 5. Now C starts a new PROPOSAL (2,1), C logs the PROPOSAL and crashes before
> anyone else has received the PROPOSAL. We have diverged logs in B and C with
> the same zxid (2,1).
> 
> Could you tell me if this is correct?
> 
> Thanks
> Raghu
> 
> 
> 
> 
> 
> - Original Message 
> From: Benjamin Reed 
> To: "zookeeper-user@hadoop.apache.org" 
> Sent: Saturday, 28 March, 2009 10:49:32
> Subject: Re: Divergence in ZK transaction logs in some corner cases?
> 
> if recover worked the way you outline, we would have a problem indeed.
> fortunately, we specifically address this case.
> 
> the problem is in your first step. when b is elected leader, he will not
> propose 10, he will propose 101. the zxid is made up of two
> parts, the high order bits are an epoch number and the low order bits are a
> counter. whenever a new leader is elected, he will increment the epoch
> number and reset the counter.
> 
> when A restarts you have the opposite problem, you need to make sure that A
> forgets 10 because we have skipped it and committing it will mean that 10 is
> delivered out of order. we take advantage of the epoch number in that case as
> well to make sure that A forgets about 10.
> 
> there is some discussion about this in:
> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperInternals.html#sc_atom
> icBroadcast
> 
> we have a presentation as well that i'll put up that may make it more clear.
> 
> ben
> 
> rag...@yahoo.com wrote:
>> ZK gurus,
>> 
>> I think the ZK transaction logs can diverge from one another in some corner
>> cases. I have one such corner case listed below, could you please confirm if
>> my understanding is correct?
>> 
>> Imagine a 5 server ensemble (A,B,C,D,E). All the servers are @ zxid 9. A is
>> the leader and it starts a new PROPOSAL (@zxid 10). A writes the proposal to
>> the log, so A moves to zxid 10. Others haven't received the PROPOSAL yet and
>> A crashes. Now the following happens:
>> 
>> 1. B is elected as the new leader. B bumps up its in-mem zxid to 10. Since
>> other nodes are at the same zxid, it sends a SNAP so that the others can
>> rebuild their data tree. In-memory zxid of all other nodes moves to 10.
>> 2.  A comes back now, it accepts B as the leader as soon as the leader (B)
>> and N/2 other nodes vouch for B as the leader. So A joins the ensemble. Every
>> zookeeper node is at zxid 10.
>> 
>> 3. A new request is submitted to B. B runs PROPOSAL and COMMIT phases and the
>> cluster moves up to zxid 11. But the transaction log of A is different from
>> that of everyone else now. So the transaction logs have diverged.
>> 
>> Could you confirm if this can happen? Or am I reading the code wrong?
>> 
>> Thanks
>> Raghu
>> 
>> 
>>
> 
> 
> 
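
The zxid scheme Ben describes -- epoch in the high-order bits, counter in the low-order bits, with a new leader bumping the epoch and resetting the counter -- can be sketched as plain bit arithmetic. This is only an illustration of the idea; the class and method names below are invented, and this is not ZooKeeper source code.

```java
// Illustrative sketch of the zxid layout described above: a 64-bit id
// whose high 32 bits are the leader epoch and whose low 32 bits are a
// per-epoch counter. Names are invented; this is not ZooKeeper code.
public class ZxidSketch {

    // Combine an epoch and a counter into a single zxid.
    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // A new leader increments the epoch and resets the counter, so all of
    // its proposals compare greater than anything from the old epoch.
    public static long newLeaderZxid(long lastSeenZxid) {
        return makeZxid(epochOf(lastSeenZxid) + 1, 0);
    }

    public static void main(String[] args) {
        long lost = makeZxid(1, 10);       // the proposal only A logged
        long fresh = newLeaderZxid(lost);  // B's first proposal: new epoch, counter 0
        System.out.println(fresh > lost);  // prints true: new-epoch zxids compare greater
    }
}
```

Because a recovering server can compare epochs, it can tell that a proposal it logged under a dead epoch was skipped, and discard it rather than deliver it out of order.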



Re: problems on EC2?

2009-04-14 Thread Mahadev Konar
Hi Ted,
> These problems seem to manifest around getting lots of anomalous disconnects
> and session expirations even though we have the timeout values set to 2
> seconds on the server side and 5 seconds on the client side.
> 

 Your scenario might be a little different from what Nitay (HBase) is
seeing. In their scenario the zookeeper client was not able to send out
pings to the server due to gc stalling threads in their zookeeper
application process.

The latencies seen by zookeeper clients are directly related to the Zookeeper
server machines. They are very much dependent on the disk io latencies that
you would get on the zookeeper servers and the network latencies within your
cluster.

I am not sure how sensitive you want your zookeeper application to be
-- but increasing the timeout should help. Also, we recommend using a
dedicated disk for the zookeeper transaction logs.

http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html#sc_strengthsAndLimitations

Also, we have seen NTP having problems and clocks going backwards on one of
our vm setups. This would lead to sessions getting timed out earlier than the
set session timeout.

I hope this helps.


mahadev

On 4/14/09 5:48 PM, "Ted Dunning"  wrote:

> We have been using EC2 as a substrate for our search cluster with zookeeper
> as our coordination layer and have been seeing some strange problems.
> 
> These problems seem to manifest around getting lots of anomalous disconnects
> and session expirations even though we have the timeout values set to 2
> seconds on the server side and 5 seconds on the client side.
> 
> Has anybody else been seeing this?
> 
> Is this related to clock jumps in a virtualized setting?
> 
> On a related note, what is best practice for handling session expiration?
> Just deal with it as if it is a new start?



Re: Some one send me some demo of programming with C client API for Zookeeper

2009-04-16 Thread Mahadev Konar
Please take a look at src/c/src/cli.c for some examples of zookeeper c
client usage. You can also look at the test cases.

Also 
http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html

will give you some example code for c clients.

mahadev


On 4/16/09 2:30 AM, "Qian Ye"  wrote:

> Hi all:
> 
> I'm new to Zookeeper. I find that the documents at
> zookeeper.hadoop.apache.org are mostly about the Java client API. However, I
> want some C client code to get started.
> 
> Anyone could help me?
> 



Re: Running ZooKeeper inside my web app

2009-04-16 Thread Mahadev Konar
Hi David, 
 You should be able to start zookeeper from your web app. Please take a look
at src/java/test where we start up zookeeper servers in the test cases as
part of junit testing.

mahadev


On 4/16/09 6:38 AM, "David Pollak"  wrote:

> Howdy,
> I'm working on a project of which ZooKeeper is a module.
> 
> For production mode, having a separate ZooKeeper instance/cluster is fine,
> but in development mode, I'd really like to have ZooKeeper start as part of
> my web app so there's no external dependency/startup/thing to think about.
>  Is it possible to start ZooKeeper programmatically from inside my web app?
> 
> Thanks,
> 
> David



Re: Server-client connection timeout

2009-04-21 Thread Mahadev Konar
Hi raghu,

http://wiki.apache.org/hadoop/ZooKeeper/FAQ
explains what timeouts mean for a zookeeper client.
A timeout does not mean a closed session. The client will reconnect to
another server and then renew the session. A closed session will make the
zookeeper handle invalid.

Hope this helps.

There is more info at

http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

And also 
http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkSessions



mahadev


On 4/21/09 12:00 AM, "rag...@yahoo.com"  wrote:

> 
> I have a question related how ZK server deals with client timeout. If the
> client loses connectivity with the ZK server (which is still alive), then the
> ZK server will close the client session by issuing a closeSession transaction,
> correct? So even if the client has reestablished the session by connecting to
> another server by now, closeSession transaction will force the session to be
> deleted on all servers. The client will have to reconnect to one of the
> servers again and create a brand new session, right?
> 
> Could you please clarify if the above is correct?
> 
> Thanks
> Raghu
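
The distinction in the reply above -- a lost connection does not close the session; only expiry by the ensemble invalidates the handle -- can be modeled as a tiny state machine. This is a toy illustration of the semantics, not ZooKeeper code; all names are invented.

```java
// Toy model of the session semantics described above: a connection loss
// leaves the session alive, and the client may reconnect to another
// server with the same session id. Only expiry by the ensemble makes
// the handle invalid. Invented names; not ZooKeeper code.
public class SessionSketch {
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    private State state = State.CONNECTED;
    private final long sessionId;

    public SessionSketch(long id) { this.sessionId = id; }

    // Losing the TCP connection only disconnects; it does not expire.
    public void connectionLost() {
        if (state == State.CONNECTED) state = State.DISCONNECTED;
    }

    // Reconnecting (to any server) within the timeout renews the same session.
    public boolean reconnect() {
        if (state != State.DISCONNECTED) return false;
        state = State.CONNECTED;
        return true;
    }

    // Only the ensemble expires a session; afterwards the handle is invalid.
    public void expiredByEnsemble() { state = State.EXPIRED; }

    public boolean handleValid() { return state != State.EXPIRED; }

    public long sessionId() { return sessionId; }
}
```

In particular, a reconnect keeps the same session id, while an expired handle can never be revived -- the client must create a brand new session.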
> 
> 
> 
>



Re: ZooKeeper's Atomic Broadcast & Leader Election Algorithms

2009-04-21 Thread Mahadev Konar
Hi Jason,

 You should be able to get some idea from the set of presentations at
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations.

Also, please use zookeeper-...@hadoop.apache.org for questions which are not
related to zookeeper users.

mahadev


On 4/21/09 2:32 PM, "Jason Dusek"  wrote:

> 
>   I'd like to know more about how these things are implemented
>   and how message loss and leader failure affect their
>   correctness.
> 
> --
> Jason Dusek



Re: Does anyone have done some strict performance testing on current Zookeeper?

2009-04-23 Thread Mahadev Konar
Hi Qian,
  I can give you an example of one of our systems that uses zookeeper
(crawling for our Yahoo! search engine). It has on the order of 4K clients -
expecting to grow to 8K to 12K. Their write load is around 100 writes/sec
(this is pretty low) and 4K reads/sec (max reads) with an ensemble of 5
zookeeper servers.

Hope this helps. What kind of workload are you expecting?

Mahadev

On 4/22/09 7:06 AM, "Qian Ye"  wrote:

> Hi all:
> 
> I've read the materials about zookeeper at apache.org these days; it is very
> interesting. I'm planning to involve Zookeeper in our web service system,
> which is providing daily service to millions of users. So I really care
> about whether Zookeeper can perform as well as it is presented in the
> materials. I know there are some success stories with Zookeeper, but I want
> to know the details. Could someone give me some specific cases of using
> Zookeeper in a big web system? Some detailed results about performance
> testing on Zookeeper are most wanted.
> 
> Thanks very much~



Re: Unique Id Generation

2009-04-23 Thread Mahadev Konar
Hi Satish,
 Most of the sequences (versions of nodes) and the sequence flags are ints.
We do have plans to move them to long.
But in your case I can imagine you can split a long into 2 32-bit parts -

Parent (which is an int) -> child (which is an int)
Now after you run out of child ephemerals you should create a node
Parent + 1
Remove parent 
And then start creating ephemeral children

(so parent (32 bits) and child (32 bits) would form a long).

I don't think this should be very hard to implement. There is nothing in
zookeeper (out of the box) currently that would help you out.

Mahadev
 
On 4/23/09 4:52 PM, "Satish Bhatti"  wrote:

> We currently use a database sequence to generate unique ids for use by our
> application.  I was thinking about using ZooKeeper instead so I can get rid
> of the database.  My plan was to use the sequential id from ephemeral nodes,
> but looking at the code it appears that this is an int, not a long.  Is
> there any other straightforward way to generate ids using ZooKeeper?
> Thanks,
> 
> Satish
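
Mahadev's parent/child split can be sketched with two counters combined into one long. In real use the child number would come from a ZooKeeper sequential znode and the parent bump would be a znode create/delete; here both are simulated so the sketch stays self-contained, and all names are invented.

```java
// Sketch of the scheme suggested above: form a 64-bit id from a "parent"
// counter (high 32 bits) and a "child" sequence number (low 32 bits).
// The child counter stands in for a sequential-znode number; rolling the
// parent stands in for creating node "parent+1" and removing the old
// parent. Illustrative only; not backed by ZooKeeper here.
public class LongIdSketch {
    private long parent = 0;   // bumped when the child range is exhausted
    private long child = 0;    // stands in for the znode sequence number

    public synchronized long nextId() {
        if (child > 0xffffffffL) { // child range exhausted: move to next parent
            parent++;
            child = 0;
        }
        return (parent << 32) | child++;
    }
}
```

With 32 bits on each side this yields strictly increasing longs, which comfortably covers "several billion" document ids.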



Re: Unique Id Generation

2009-04-24 Thread Mahadev Konar
Hi Satish,
 take a look at 
http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html#sc_maintenance

This can be run as a cron job and will get rid of old unwanted logs and
snapshots.

mahadev


On 4/24/09 10:18 AM, "Satish Bhatti"  wrote:

> A follow up to this:  I implemented method (b), and ran a test that
> generated 100K of ids.  This generated 1.3G worth of transaction logs.
>  Question:  when can these be safely deleted?  How does one know which ones
> may be deleted?  Or do they need to exist forever?
> 
> On Fri, Apr 24, 2009 at 9:52 AM, Ted Dunning  wrote:
> 
>> Of the methods proposed,
>> 
>> a) recursive sequential files
>> 
>> b) latest state file(s) that is updated using a pseudo transaction to give
>> a
>> range of numbers to allocate
>> 
>> c) just probe zxid
>> 
>> You should be pretty good with any of them.  With (a), you have to be
>> careful to avoid race conditions when you get to the end of the range for
>> the sub-level.  With (b), you get results of a guaranteed nature, although the
>> highest throughput versions might have gaps (shouldn't bother you).  The
>> code for this is more complex than the other implementations.  With (c),
>> you
>> could have potentially large gaps in the sequence, but with 64 bits that
>> shouldn't be a big deal.  Code for that version would be the simplest of
>> any
>> of them.
>> 
>> On Fri, Apr 24, 2009 at 8:56 AM, Satish Bhatti  wrote:
>> 
>>> Hello Ben,
>>> Basically the ids are document Ids.  We will eventually have several
>>> billion
>>> documents in our system, and each has a unique long id.  Currently we are
>>> using a database sequence to generate these longs.  Having eliminated
>> other
>>> uses of the database, we didn't want to keep it around just to generate
>>> ids.
>>>  That is why I am looking to use ZooKeeper to generate them instead.
>>> 
>>> 
>> 
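
The retention rule behind that maintenance cron job -- keep the newest few snapshots plus every transaction log still needed to recover from them -- can be sketched as pure selection logic. This is an illustrative sketch, not the actual purge utility; the snapshot.<zxid> / log.<zxid> naming follows the docs, everything else is invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the log/snapshot retention rule described above. A snapshot
// is only usable together with the log that was active when it was taken,
// so we keep: the newest `keepCount` snapshots, every log starting at or
// after the oldest kept snapshot, and the single log just before it
// (which may span the snapshot). Everything older is safe to delete.
public class PurgeSketch {
    public static List<String> filesToDelete(List<Long> snapZxids,
                                             List<Long> logZxids,
                                             int keepCount) {
        List<String> doomed = new ArrayList<>();
        List<Long> snaps = new ArrayList<>(snapZxids);
        Collections.sort(snaps, Collections.reverseOrder());
        if (snaps.size() <= keepCount) return doomed; // nothing old enough yet

        long oldestKept = snaps.get(keepCount - 1);
        for (int i = keepCount; i < snaps.size(); i++) {
            doomed.add("snapshot." + Long.toHexString(snaps.get(i)));
        }

        List<Long> logs = new ArrayList<>(logZxids);
        Collections.sort(logs, Collections.reverseOrder());
        boolean keptSpanningLog = false;
        for (long z : logs) {
            if (z >= oldestKept) continue;  // still needed for recovery
            if (!keptSpanningLog) {         // the log that spans the snapshot
                keptSpanningLog = true;
                continue;
            }
            doomed.add("log." + Long.toHexString(z));
        }
        return doomed;
    }
}
```

Run periodically (e.g. from cron), such a rule keeps the data directory bounded no matter how many ids or updates are generated.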



Re: Multiple ZooKeeper client instances

2009-04-24 Thread Mahadev Konar
HI Satish,
  A zookeeper client usually has a very small footprint for memory and cpu.
The multithreaded version of the zookeeper client creates an internal thread to
do the io and callbacks. I would suggest using the same zookeeper client
across the objects, to keep the number of threads in your client process down.

mahadev

On 4/24/09 2:37 PM, "Satish Bhatti"  wrote:

> If my application has several objects who are using ZooKeeper for entirely
> unrelated reasons, is it recommended to create one ZooKeeper client instance
> and share it, or to create one per object?  Do the ZooKeeper client
> instances have a lot of overhead?  I am thinking that having one instance
> per object will lead to simpler code in terms of handling Session
> expirations.
> Satish
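
One common way to follow that advice is to hold a single shared client behind a lazily-initialized accessor. The sketch below uses a placeholder ZkHandle class instead of the real org.apache.zookeeper.ZooKeeper object so it stays self-contained; in real code you would store the actual client handle (and recreate it on session expiration) the same way.

```java
// Sketch of sharing one client handle across otherwise unrelated objects,
// as recommended above. ZkHandle is an invented stand-in for the real
// ZooKeeper client so the example compiles on its own.
public class SharedClientSketch {
    public static class ZkHandle { }   // placeholder for the real client object

    private static volatile ZkHandle instance;

    // Double-checked locking: every caller sees the same handle, so the
    // process keeps one set of client I/O threads instead of one per object.
    public static ZkHandle get() {
        if (instance == null) {
            synchronized (SharedClientSketch.class) {
                if (instance == null) {
                    instance = new ZkHandle(); // real code would construct the client here
                }
            }
        }
        return instance;
    }
}
```

Each object then calls SharedClientSketch.get() instead of constructing its own client, which also centralizes session-expiration handling in one place.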



Re: Dynamic server addition/deletion

2009-05-01 Thread Mahadev Konar
Hi Raghu and Ted,
 There is already an open jira on this --

http://issues.apache.org/jira/browse/ZOOKEEPER-107

You can go through the suggestions on it and can continue the discussion on
the jira. Please  feel free to add your ideas to the jira.

Also, I don't think anyone is working on it (to answer raghu's question).

mahadev


On 5/1/09 12:30 PM, "Ted Dunning"  wrote:

> This is relatively easy to do now, although somewhat inelegant.
> 
> You can make configuration changes and then do a rolling restart of the
> systems.
> 
> A more elegant solution in which you add additional servers without a
> restart should be relatively easy to build into the code if you can make it
> look like the new machine is simply a reboot of a previously known machine.
> 
> Folk like Ben and Patrick and Mahadev should have better informed ideas
> about that.
> 
> 
> On Fri, May 1, 2009 at 12:25 PM, rag...@yahoo.com  wrote:
> 
>> 
>> Our product would require support for dynamic addition and deletion of ZK
>> servers to the cluster. We would like to come up with a design, propose the
>> design to the ZK developers and then implement the feature once the design
>> is signed off by the ZK developers. Before we go down that path, I would
>> like to know if people already have any ideas on this that they could share
>> with us. Also, we don't want to duplicate the effort, so we would appreciate
>> if you let us know anyone is already working on a design proposal for this
>> feature.
>> 
>> Thanks
>> Raghu
>> 
>> 
>> 
>> 
> 



Re: Moving ZooKeeper Servers

2009-05-04 Thread Mahadev Konar
Hi Satish,
  Is the regeneration of state in production something that is not
acceptable? Copying over the whole datadir and datalogdir as-is,
maintaining the directory structure, would be necessary.

Also, in general this is a bad idea (just to warn you) since you would have
to be careful with the data copying (making sure that there is a one-to-one
mapping between the data copied from pre prod and prod) -- meaning

Pre prod1 -> prod1 (copying from pre prod1 to prod 1)
Pre prod2 -> prod2 (copying from pre prod2 to prod 2).

The one-to-one mapping is essential to make sure data isn't lost.

Also, you have to make sure that you have a clean database in prod1 and that
you do not have files in production that overlap old files from production
and the new files you copied over from pre production. That would cause
database corruption, since you would have an overlap of databases from pre
prod and old production.

So, zookeeper would work fine if you are careful with above but I would vote
against doing this for production since the above is pretty easy to mess up.

mahadev


On 5/4/09 11:10 AM, "Ted Dunning"  wrote:

> I think it would be easier to add the production machines to the cluster one
> by one and then remove the pre-production ZK instances from the cluster one
> by one.
> 
> This gives you continuity that you lack otherwise.  Adding machines is a
> matter of changing the configuration on each ZK and restarting ZK on that
> machine.  You could add the machines in a lump if you don't add so many as
> to prevent the cluster from having a quorum.  The configuration change and
> restart can be easily scripted and goes quite quickly.
> 
> After the hand-off, you can bring the pre-production machines back
> up with a smaller cluster configuration.
> 
> Of course, this trick only works if you have no production ZK already in
> place so it won't work the second time around.  It is also a bit unusual for
> the complete state of a pre-production staging cluster to be important
> enough to preserve.
> 
> On Mon, May 4, 2009 at 10:35 AM, Satish Bhatti  wrote:
> 
>> ... (2) At some point, we want to switch the preproduction instance to be
>> the
>> production instance.  For the ZooKeeper servers, we will copy the data +
>> logs directories from the pre machines currently running ZooKeeper to the
>> prod machines that will be running ZooKeeper, and start up ZooKeeper on
>> those machines.  Is this all that is necessary so that the new ZooKeeper
>> cluster effectively continues from where the pre cluster left off?  Am I
>> missing something?
>> 
>> --
> Ted Dunning, CTO
> DeepDyve



Re: Moving ZooKeeper Servers

2009-05-06 Thread Mahadev Konar
Yes, that is correct.

The quota node is a zookeeper service node for quotas in zookeeper.

mahadev


On 5/6/09 2:57 PM, "Satish Bhatti"  wrote:

> I ended up going with that suggestion, a short recursive function did the
> trick!  However, I noticed the following nodes:
> /zookeeper
> /zookeeper/quota
> 
> that were not created by me.  So I ignored them.  Is this correct?
> 
> Satish
> 
> 
> On Mon, May 4, 2009 at 4:33 PM, Ted Dunning  wrote:
> 
>> In fact, the much, much simpler approach of bringing up the production ZK
>> cluster and simply writing a program to read from the pre-production
>> cluster
>> and write to the production one is much more sound.  If you can't do that,
>> you may need to rethink your processes since they are likely to be delicate
>> for other reasons as well.
>> 
>> On Mon, May 4, 2009 at 2:35 PM, Mahadev Konar 
>> wrote:
>> 
>>> So, zookeeper would work fine if you are careful with above but I would
>>> vote
>>> against doing this for production since the above is pretty easy to mess
>>> up.
>>> 
>> 
>> 
>> 
>> --
>> Ted Dunning, CTO
>> DeepDyve
>> 
>> 111 West Evelyn Ave. Ste. 202
>> Sunnyvale, CA 94086
>> www.deepdyve.com
>> 858-414-0013 (m)
>> 408-773-0220 (fax)
>> 



Re: Removing Children Watches

2009-05-14 Thread Mahadev Konar
Hi Satish,
  If you call getChildren(rootPath, true), it will set the watch, and doing
the same operation with false _DOES NOT_ remove the watch.

In case you want different behaviour from these 2 different calls, you
should use the callback-specific APIs:

getChildren(String path, Watcher watcher, ChildrenCallback cb, Object ctx)

mahadev 


On 5/14/09 1:13 PM, "Satish Bhatti"  wrote:

> (1)  Call zookeeper.getChildren( rootPath, true );  This successfully sets the
> watch.
> Next time I add a node under rootPath, the watch gets triggered, as
> expected.
> 
> (2)  Call zookeeper.getChildren( rootPath, false );
> Next time I add a node under rootPath, the watch _STILL_ gets triggered!
> 
> What am I doing wrong?
> 
> Satish
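
The one-shot nature of watches, and the fact that a read with watch=false leaves a previously set watch in place, can be modeled with a tiny registry. This is a toy illustration of the semantics Mahadev describes, not ZooKeeper code; all names are invented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the child-watch semantics described above: a watch is set
// by getChildren(path, true), is NOT removed by getChildren(path, false),
// and fires exactly once before it must be re-registered.
public class WatchSemanticsSketch {
    private final Set<String> childWatches = new HashSet<>();
    private final List<String> triggered = new ArrayList<>();

    // watch=true registers a watch; watch=false is a no-op for watches.
    public void getChildren(String path, boolean watch) {
        if (watch) childWatches.add(path);
    }

    // A child created under `path` fires and consumes the watch (one-shot).
    public void createChildUnder(String path) {
        if (childWatches.remove(path)) triggered.add(path);
    }

    public int triggerCount() { return triggered.size(); }
}
```

This is exactly what Satish observed: the false call did nothing to the existing watch, so the next child creation still fired it.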



Re: Some thoughts on Zookeeper after using it for a while in the CXF/DOSGi subproject

2009-05-29 Thread Mahadev Konar
Hi David,

> 
> 
> I second this.  If you folks want to host at http://scala-tools.org we'd be
> happy to host this non-Scala but super mega interesting and valuable
> project.  We'll even do builds on our hudson server.
We have had plans to publish our releases on maven repos but haven't had the
time to do so. If you guys have any expertise on how to publish it to the
apache maven repos, we'd be happy to accept and incorporate the changes.

mahadev

> 
> 
>> 
>> * To use Zookeeper from within OSGi it has to be turned into an OSGi
>> bundle. Doing this is not hard and it's currently done in our
>> buildsystem [1]. However, I think it would make sense to have this
>> done somewhere in the Zookeeper buildsystem. Matter of fact I think
>> you should be able to release a single zookeeper.jar that's both an
>> ordinary jar and an OSGi bundle so it would work in both cases...
>> * The Zookeeper server is currently started with the zkServer.sh
>> script, but I think it would make sense to also allow it to run inside
>> an OSGi container, simply by starting a bundle. Has anyone ever done
>> any work in this regard? If not I'm planning to spend some time and
>> try to make this work.
>> * BTW I made some Windows versions of the zkCli/zkEnv/zkServer
>> scripts. Interested in taking these?
>> 
>> Thoughts anyone?
>> 
>> Best regards,
>> 
>> David Bosschaert
>> 
>> [1]
>> http://svn.apache.org/repos/asf/cxf/dosgi/trunk/discovery/distributed/zookeeper-wrapper/pom.xml
>> 
> 
> 



Re: ZooKeeper heavy CPU utilisation

2009-06-02 Thread Mahadev Konar
Hi Satish,
  Can you attach this trace to a jira? Please open one for this. Also, can
you do the following -

For all the threads of the zookeeper server you are seeing the problem on,
can you do an strace and see which thread is spinning?

Also, can you upload the configs of the servers to the jira that you create?

mahadev


On 6/2/09 1:29 PM, "Satish Bhatti"  wrote:

> Hey Ben,
> Strange you didn't get the attachment, my gmail is showing the paper clip
> thingy for that message.  ANyway, I have pasted the whole jstack output into
> this email, since it's pretty small.
> 
> Satish
> 
> 2009-06-02 11:56:26
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_03-b05 mixed
> mode):
> 
> "Attach Listener" daemon prio=10 tid=0x43e71800 nid=0x5566 waiting
> on condition [0x..0x]
>java.lang.Thread.State: RUNNABLE
> 
> "SyncThread:4" prio=10 tid=0x2aaac8274800 nid=0x1ced waiting on
> condition [0x42d63000..0x42d63c10]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x2aaab34e44d8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Ab
> stractQueuedSynchronizer.java:1925)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> at
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java
> :71)
> 
> "FollowerRequestProcessor:4" prio=10 tid=0x2aaac8273c00 nid=0x1cec
> waiting on condition [0x42b61000..0x42b61b90]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x2aaab34e4648> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Ab
> stractQueuedSynchronizer.java:1925)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> at
> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerReques
> tProcessor.java:58)
> 
> "CommitProcessor:4" prio=10 tid=0x2aaac8150400 nid=0x1ceb in
> Object.wait() [0x42c62000..0x42c62b10]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x2aaab34e46f0> (a
> org.apache.zookeeper.server.quorum.CommitProcessor)
> at java.lang.Object.wait(Object.java:485)
> at
> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:80)
> - locked <0x2aaab34e46f0> (a
> org.apache.zookeeper.server.quorum.CommitProcessor)
> 
> "Thread-13" daemon prio=10 tid=0x2aaac80f9800 nid=0x1b5e runnable
> [0x42a6..0x42a60c90]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
> at sun.nio.ch.IOUtil.read(IOUtil.java:206)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> - locked <0x2aaab34bf5c8> (a java.lang.Object)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxMa
> nager.java:551)
> 
> "Thread-12" daemon prio=10 tid=0x2aaac80f8c00 nid=0x1b5d waiting on
> condition [0x42258000..0x42258c10]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x2aaab34bf7c0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Ab
> stractQueuedSynchronizer.java:1925)
> at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:317)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxMa
> nager.java:479)
> 
> "Thread-11" prio=10 tid=0x43d22800 nid=0x1b5c runnable
> [0x4295f000..0x4295fb90]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
> at sun.nio.ch.IOUtil.read(IOUtil.java:206)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> - locked <0x2aaab34bf9a8> (a java.lang.Object)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxMa
> nager.java:551)
> 
> "Thread-10" prio=10 tid=0x43d21400 nid=0x1b5b waiting on condition
> [0x4285e000..0x4285eb10]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Meth

Re: Errors during shutdown/startup of ZooKeeper

2009-06-02 Thread Mahadev Konar
Hi Nitay,
  This is not an error but should be a warning. I have opened up a jira for
it.

http://issues.apache.org/jira/browse/ZOOKEEPER-428


The message just says that a client is connecting to a server that is behind
the server it was connected to earlier. The log level should be warn and not
error; this will be fixed in the next release.

mahadev

On 6/2/09 2:12 PM, "Nitay"  wrote:

> Hey guys,
> 
> We are getting a lot of messages like this in HBase:
> 
> [junit] 2009-06-02 11:57:23,658 ERROR [NIOServerCxn.Factory:21810]
> server.NIOServerCnxn(514): Client has seen zxid 0xe our last zxid is 0xd
> 
> For more context, the block it usually appears in is:
> 
> [junit] 2009-06-02 13:27:54,083 INFO  [main-SendThread]
> zookeeper.ClientCnxn$SendThread(737): Priming connection to
> java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1%0:56511
> remote=localhost/0:0:0:0:0:0:0:1:21810]
> [junit] 2009-06-02 13:27:54,084 INFO  [main-SendThread]
> zookeeper.ClientCnxn$SendThread(889): Server connection successful
> [junit] 2009-06-02 13:27:54,093 INFO  [NIOServerCxn.Factory:21810]
> server.NIOServerCnxn(532): Connected to /0:0:0:0:0:0:0:1%0:56511 lastZxid 16
> [junit] 2009-06-02 13:27:54,094 ERROR [NIOServerCxn.Factory:21810]
> server.NIOServerCnxn(543): Client has seen zxid 0x10 our last zxid is 0x4
> [junit] 2009-06-02 13:27:54,094 WARN  [NIOServerCxn.Factory:21810]
> server.NIOServerCnxn(444): Exception causing close of session 0x0 due to
> java.io.IOException: Client has seen zxid 0x10 our last zxid is 0x4
> [junit] 2009-06-02 13:27:54,094 DEBUG [NIOServerCxn.Factory:21810]
> server.NIOServerCnxn(447): IOException stack trace
> [junit] java.io.IOException: Client has seen zxid 0x10 our last zxid is
> 0x4
> [junit] at
> org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.jav
> a:544)
> [junit] at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:331)
> [junit] at
> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:176)
> [junit] 2009-06-02 13:27:54,094 INFO  [NIOServerCxn.Factory:21810]
> server.NIOServerCnxn(777): closing session:0x0 NIOServerCnxn:
> java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1%0:21810
> remote=/0:0:0:0:0:0:0:1%0:56511]
> [junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
> zookeeper.ClientCnxn$SendThread(919): Exception closing session
> 0x121a2a7c43a0002 to sun.nio.ch.selectionkeyi...@2c662b4e
> [junit] java.io.IOException: Read error rc = -1
> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> [junit] at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
> [junit] at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
> [junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
> zookeeper.ClientCnxn$SendThread(953): Ignoring exception during shutdown
> input
> [junit] java.net.SocketException: Socket is not connected
> [junit] at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
> [junit] at
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
> [junit] at
> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
> [junit] at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:951)
> [junit] at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
> 
> 
> This happens in a seemingly endless loop. We are not quite sure what it
> means. Can someone help shed some light on these messages?
> 
> Thanks,
> -n



Re: Errors during shutdown/startup of ZooKeeper

2009-06-02 Thread Mahadev Konar
I think my last message got bounced...

They should get fixed automatically. Are you shutting down servers often in
your unit test? A client should be able to connect to some other server
which is more recent. What's the reason behind your question that it isn't
getting fixed by itself?

mahadev
> 
> On 6/2/09 2:37 PM, "Nitay"  wrote:
> 
>> I see. That helps. However, even as warnings, these go on seemingly
>> endlessly. Why do they not get fixed by themselves? What are we doing wrong
>> here?
>> 
>> On Tue, Jun 2, 2009 at 2:24 PM, Mahadev Konar  wrote:
>> 
>>> Hi Nitay,
>>>  This is not an error but should be a warning. I have opened up a jira for
>>> it.
>>> 
>>> http://issues.apache.org/jira/browse/ZOOKEEPER-428
>>> 
>>> 
>>> The message just says that a client is connecting to a server that is
>>> behind
>>> that a server is was connected to earlier. The log should be warn and not
>>> error and should be fixed in the next release.
>>> 
>>> mahadev
>>> 
>>> On 6/2/09 2:12 PM, "Nitay"  wrote:
>>> 
>>>> Hey guys,
>>>> 
>>>> We are getting a lot of messages like this in HBase:
>>>> 
>>>> [junit] 2009-06-02 11:57:23,658 ERROR [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(514): Client has seen zxid 0xe our last zxid is 0xd
>>>> 
>>>> For more context, the block it usually appears in is:
>>>> 
>>>> [junit] 2009-06-02 13:27:54,083 INFO  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(737): Priming connection to
>>>> java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1%0:56511
>>>> remote=localhost/0:0:0:0:0:0:0:1:21810]
>>>> [junit] 2009-06-02 13:27:54,084 INFO  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(889): Server connection successful
>>>> [junit] 2009-06-02 13:27:54,093 INFO  [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(532): Connected to /0:0:0:0:0:0:0:1%0:56511 lastZxid
>>> 16
>>>> [junit] 2009-06-02 13:27:54,094 ERROR [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(543): Client has seen zxid 0x10 our last zxid is 0x4
>>>> [junit] 2009-06-02 13:27:54,094 WARN  [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(444): Exception causing close of session 0x0 due to
>>>> java.io.IOException: Client has seen zxid 0x10 our last zxid is 0x4
>>>> [junit] 2009-06-02 13:27:54,094 DEBUG [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(447): IOException stack trace
>>>> [junit] java.io.IOException: Client has seen zxid 0x10 our last zxid
>>> is
>>>> 0x4
>>>> [junit] at
>>>> 
>>>> org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:544)
>>>> [junit] at
>>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:331)
>>>> [junit] at
>>>> 
>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:176)
>>>> [junit] 2009-06-02 13:27:54,094 INFO  [NIOServerCxn.Factory:21810]
>>>> server.NIOServerCnxn(777): closing session:0x0 NIOServerCnxn:
>>>> java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1%0:21810
>>>> remote=/0:0:0:0:0:0:0:1%0:56511]
>>>> [junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(919): Exception closing session
>>>> 0x121a2a7c43a0002 to sun.nio.ch.selectionkeyi...@2c662b4e
>>>> [junit] java.io.IOException: Read error rc = -1
>>>> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>>> [junit] at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
>>>> [junit] at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
>>>> [junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
>>>> zookeeper.ClientCnxn$SendThread(953): Ignoring exception during shutdown
>>>> input
>>>> [junit] java.net.SocketException: Socket is not connected
>>>> [junit] at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>> [junit] at
>>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>>> [junit] at
>>>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>> [junit] at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:951)
>>>> [junit] at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
>>>> 
>>>> 
>>>> This happens in a seemingly endless loop. We are not quite sure what it
>>>> means. Can someone help shed some light on these messages?
>>>> 
>>>> Thanks,
>>>> -n
>>> 
>>> 
>> 



Re: Win32 as a production platform

2009-06-04 Thread Mahadev Konar
Hi Marc,
 The only thing missing would be testing and support. We do most of our
testing on linux boxes, and for the same reason it's easy for us to support
the platforms that we use. We do not have access to windows boxes to test
and (therefore) support windows as a suggested production platform. Also,
the other reason is that we currently have a very small set of people who
contribute their time to the development and support of Zookeeper, and they
are more conversant with linux than windows. It would be a stretch to expect
them to support windows as a production platform.

Hope this helps.
Thanks
mahadev


On 6/4/09 4:38 AM, "Marc Frei"  wrote:

> Dear ZooKeeper users,
> 
> the ZooKeeper Administrator's Guide states that Win32 is currently only
> supported as a development platform but not as production platform:
> 
> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperAdmin.html#sc_su
> pportedPlatforms
> 
> Since we are currently evaluating ZooKeeper for production use on both
> Linux and Windows servers, I would be interested in knowing what the
> missing pieces for full-fledged Windows support are.
> 
> Thank you very much in advance and kind regards,
> 
> Marc 



Re: Watches

2009-06-04 Thread Mahadev Konar
Hi Avinash,
  For watching just one node, zoo_exists(path, true) will let you watch
that node. But if you want to be notified when something is added under /A,
you are better off using the getChildren() API.

Also, with your code

>> stat = zk_.exists(path + "/A", true);
>>> if ( stat == null )
>>>  {
>>>zk_.create("/A", new byte[0], Ids.OPEN_ACL_UNSAFE,
>>> CreateMode.PERSISTENT);
>>>  }
>>> 

Shouldn't it be zk.exists("/A" + path)?

And similarly, wouldn't you create it with zk.create("/A" + path)?

No?

Also, for more information on watches you can read the programmer's guide at
http://hadoop.apache.org/zookeeper/docs/r3.1.1/

The javadocs should also give you an idea.


Hope this helps.
Mahadev

On 6/4/09 12:40 PM, "Avinash Lakshman"  wrote:

> I want to get notified whenever any sub znode is created under /A. Only one
> process amongst many will create these sub znodes. But everyone needs to be
> notified about this.
> 
> Cheers
> Avinash
> 
> On Thu, Jun 4, 2009 at 12:38 PM, Eric Bowman  wrote:
> 
>> Avinash Lakshman wrote:
>>> Hi All
>>> 
>>> I have a znode named /A. Now I will over time create znodes below it such
>>> /A/A1, /A/A2, ..., /A/An etc. Now every time I create this sub znode I
>> need
>>> to have all my processes notified. Can I get by just setting one watch on
>>> /A? So my set up looks as follows:
>>> 
>>> stat = zk_.exists(path + "/A", true);
>>> if ( stat == null )
>>>  {
>>>zk_.create("/A", new byte[0], Ids.OPEN_ACL_UNSAFE,
>>> CreateMode.PERSISTENT);
>>>  }
>>> 
>>> This doesn't seem to trigger any watch when I add the sub znodes. What am
>> I
>>> doing wrong
>> 
>> Try:
>> 
>> zk_.watchChildren( "/A", true )
>> 
>> What you are doing watches changes to the data at /A, not its children.
>> 
>> From the javadocs:
>> 
>> 
>>  getChildren
>> 
>>public List<String> getChildren(String path, Watcher watcher)
>>throws KeeperException, InterruptedException
>> 
>>Return the list of the children of the node of the given path.
>> 
>>If the watch is non-null and the call is successful (no exception is
>>thrown), a watch will be left on the node with the given path. The
>>watch will be triggered by a successful operation that deletes the
>>node of the given path or creates/deletes a child under the node.
>> 
>> 
>> cheers,
>> Eric
>> 
>> --
>> Eric Bowman
>> Boboco Ltd
>> ebow...@boboco.ie
>> http://www.boboco.ie/ebowman/pubkey.pgp
>> +35318394189/+353872801532
>> 
>> 
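The pattern discussed in this thread can be sketched roughly as follows in Java (the class name and the assumption of an already-connected handle are illustrative, not from the thread): watch /A with getChildren() and re-register the watch each time it fires, since ZooKeeper watches are one-shot.

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class ChildWatcher implements Watcher {
    private final ZooKeeper zk;

    public ChildWatcher(ZooKeeper zk) { this.zk = zk; }

    // Register (or re-register) the child watch and fetch the current children.
    public List<String> watchChildren() throws KeeperException, InterruptedException {
        // Passing "this" as the watcher leaves a one-shot child watch on /A.
        return zk.getChildren("/A", this);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == EventType.NodeChildrenChanged) {
            try {
                // Watches fire only once; re-register to keep being notified.
                List<String> children = watchChildren();
                System.out.println("children of /A are now: " + children);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
```

Note that a watch set via exists() on /A fires on changes to /A itself (create, delete, data change), not on additions of children, which is why getChildren() is the right call here.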



Re: Newbie Questions

2009-06-07 Thread Mahadev Konar
Hi Grant,
  I agree with Ted but just to elaborate a little more.

  It's good to have a single ZooKeeper instance connected to the server.
ZooKeeper clients are supposed to be long lived, and the expected idiom is
to have a single long-lived ZooKeeper client per application instance. Most
of the ZooKeeper recipes use ZooKeeper session capabilities, so in that case
it becomes necessary to have just a single client per app instance. Even if
you don't plan to use session capabilities (like ephemeral nodes and
watches), it is still good to use a single ZooKeeper instance.

A ZooKeeper client that is not being used by an application just sends
pings every one-third of the timeout value you set. We are working on an
optimization in 3.3 wherein we won't even send these pings if the client
does not use ephemeral nodes or watches. ZOOKEEPER-321 is the jira if you
want to track that. Hope this helps.

mahadev


On 6/6/09 12:09 AM, "Ted Dunning"  wrote:

> It is a common idiom to have a single Zookeeper instance.  One reason for
> this is that it can be hard to keep track of which instance has which
> watches if you have lots of them around.
> 
> Instantiating several Zookeeper structures and then discarding them also
> eliminates the utility of ephemeral philes.
> 
> Watches and ephemerals are two of the key characteristics of ZK, so they are
> quite a loss.
> 
> That said, keeping a single zookeeper as a static in a single class isn't
> such a strange thing to do, especially if you can't imagine closing the ZK
> instance.  That gives you some scope but can keep the existence and use of
> ZK a secret.
> 
> You do have to worry a bit about how to initialize the ZK.  For that reason
> and for mocking purposes, it is pretty good practice to always inject the ZK
> instance into your classes a la spring.
> 
> On Fri, Jun 5, 2009 at 8:56 PM, Grant Ingersoll  wrote:
> 
>> What's the overhead of connecting to a Server?  In other words, if I'm in a
>> multi-threaded web-app server environment, should I cache my ZooKeeper
>> instance and set a larger timeout value or should I just construct them as I
>> need them?
>> 
>> 
>> 
> 



Re: Question about ACL

2009-06-07 Thread Mahadev Konar
I think my last email bounced --


Hi Qian,
 I think we lack a lot of documentation on ACL’s and how to use your own
authentication schemes. 3.2 should have some of it. Zookeeper allows you to
add you own authentication schemes. You should be able to write you own
authentication scheme like the IPAuthenticationProvider.java. All you need
to do is write such a class that implements AuthenticationProvideer and
then when you start up the server, you will have to use
–Dzookeeper.authprovider.some_name=org.apache.zookeeper.server.auth.SomeAuth
enticationProvider – zookeeper server picks up these authentication
providers when it starts up from the system properties (if you want to
specify you own). Hope this helps.

mahadev




On 6/6/09 10:07 AM, "Qian Ye"  wrote:

> Hi  all:
> I've got some problem about setting ACL for node using C API. I read from the
> Guide that the scheme "ip" should accept mask, like 172.18.0.0/16. So
>   I called zoo_create() API with an ACL
> struct in the form {count:1, data:{{scheme:ip, id:172.18.0.0/16}}
>  }. However, the C API return a "Invalid ACL" error.
> Then I read the source code (the server side), and found that the ip scheme
> can only accept an id within only ip string (like, 172.18.0.0). 
> So, How can I set an ACL with the mask?
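For illustration, creating a znode with an "ip"-scheme ACL from the Java client looks roughly like this (the server address and path are made up; the C struct in the question is analogous). As found in the thread, the server in this release accepts only a plain IP string for the id, not a 172.18.0.0/16-style mask.

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class IpAclExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // Grant all permissions to clients connecting from this address.
        // A plain IP string is accepted here; the 172.18.0.0/16 mask form
        // was rejected ("Invalid ACL") by the server in this release.
        List<ACL> acl = Collections.singletonList(
                new ACL(ZooDefs.Perms.ALL, new Id("ip", "172.18.0.1")));
        zk.create("/restricted", new byte[0], acl, CreateMode.PERSISTENT);
        zk.close();
    }
}
```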



Re: Newbie Questions

2009-06-07 Thread Mahadev Konar
Hi Satish,
   My suggestion in the last email was just to prevent unnecessary
instantiation of ZooKeeper client objects. Creating a new ZooKeeper client
for different modules doing different things should be fine. I don't have
a recommended approach for either of your suggestions; it really depends
on your encapsulation, application APIs, and your usage model.

thanks
mahadev


On 6/7/09 4:58 PM, "Satish Bhatti"  wrote:

> Hey Mahadev,
> I had a question about that.  In my application I am using ZooKeeper for
> several different unrelated purposes, e.g. to generate unique ids,  for
> distributed locks, and as a property store.  I have implemented generic
> black box classes that use ZooKeeper to provide that functionality, and when
> an object of one of those classes is instantiated it internally creates a
> ZooKeeper instance for its personal use.  So, for example, one of my apps
> has a property store and an id generator, and so it ends up using 2
> ZooKeeper client objects.  In principle I could create a single ZooKeeper
> client object and pass it in to the objects, so then I would only have a
> single ZooKeeper instance.  However, if receive
> a Watcher.Event.KeeperState.Expired event, a fresh ZooKeeper client instance
> has to be created.  If I were sharing the ZooKeeper instance, then somehow
> my objects would have to be notified that they should switch to using the
> new ZooKeeper instance.  That means somewhere in my app I would need to
> maintain a list of all objects using ZooKeeper.  Is this the recommended
> approach?  Is there some other more elegant way to do this?
> 
> Satish
> 
> 
> On Sun, Jun 7, 2009 at 11:58 AM, Mahadev Konar wrote:
> 
>> HI Grant,
>>  I agree with Ted but just to elaborate a little more.
>> 
>>  Its good to have a single zookeeper instance connected to the server.
>> Zookeeper client are supposed to be long lived client and the expected
>> idiom
>> to use a zookeeper client is to have a  long lived single zookeeper client
>> per application instance. Most of the zookeeper recipes use zookeeper
>> session capabilities for implementing those recipes. So in that case, it
>> becomes necessary to have just a single client per app instance. Even if
>> you
>> don't plan to use zookeeper session capabilities (like ephemeral nodes and
>> watches) it would be good to just use a single zookeeper instance.
>> 
>> A zookeeper client if not being used in an application would just be
>> sending
>> pings every one third of the timeout values you set. We are working on an
>> opimization in 3.3 wherein we wont be even sending these pings if the
>> client
>> does not use ephemeral nodes and watches. ZOOKEEPER-321 is the jira if you
>> want to track that. Hope this helps.
>> 
>> mahadev
>> 
>> 
>> On 6/6/09 12:09 AM, "Ted Dunning"  wrote:
>> 
>>> It is a common idiom to have a single Zookeeper instance.  One reason for
>>> this is that it can be hard to keep track of which instance has which
>>> watches if you have lots of them around.
>>> 
>>> Instantiating several Zookeeper structures and then discarding them also
>>> eliminates the utility of ephemeral philes.
>>> 
>>> Watches and ephemerals are two of the key characteristics of ZK, so they
>> are
>>> quite a loss.
>>> 
>>> That said, keeping a single zookeeper as a static in a single class isn't
>>> such a strange thing to do, especially if you can't imagine closing the
>> ZK
>>> instance.  That gives you some scope but can keep the existence and use
>> of
>>> ZK a secret.
>>> 
>>> You do have to worry a bit about how to initialize the ZK.  For that
>> reason
>>> and for mocking purposes, it is pretty good practice to always inject the
>> ZK
>>> instance into your classes a la spring.
>>> 
>>> On Fri, Jun 5, 2009 at 8:56 PM, Grant Ingersoll 
>> wrote:
>>> 
>>>> What's the overhead of connecting to a Server?  In other words, if I'm
>> in a
>>>> multi-threaded web-app server environment, should I cache my ZooKeeper
>>>> instance and set a larger timeout value or should I just construct them
>> as I
>>>> need them?
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
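One possible shape for the bookkeeping Satish describes, sketched here with made-up class and method names (this is not an official ZooKeeper API): a small wrapper owns the shared handle, modules register listeners, and on session expiry the wrapper builds a fresh handle and notifies everyone.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class SharedZooKeeper implements Watcher {
    // Hypothetical callback interface for modules sharing the handle.
    public interface HandleListener {
        void handleChanged(ZooKeeper fresh);
    }

    private final String hosts;
    private final int sessionTimeout;
    private final List<HandleListener> listeners = new CopyOnWriteArrayList<HandleListener>();
    private volatile ZooKeeper zk;

    public SharedZooKeeper(String hosts, int sessionTimeout) throws Exception {
        this.hosts = hosts;
        this.sessionTimeout = sessionTimeout;
        this.zk = new ZooKeeper(hosts, sessionTimeout, this);
    }

    public void register(HandleListener l) { listeners.add(l); }
    public ZooKeeper handle() { return zk; }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == KeeperState.Expired) {
            try {
                // An expired session cannot be revived; build a fresh handle
                // and push it to every module holding the old one.
                zk = new ZooKeeper(hosts, sessionTimeout, this);
                for (HandleListener l : listeners) {
                    l.handleChanged(zk);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
```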



Re: Show your ZooKeeper pride!

2009-06-08 Thread Mahadev Konar
It's just that ZooKeeper is mostly used in applications that are proprietary
(to Yahoo!), so it's harder to update the wiki with specifics on how it is
being used.

thanks
mahadev

On 6/8/09 7:01 PM, "Ted Dunning"  wrote:

>  How come Yahoo isn't listed?
> 
> On Mon, Jun 8, 2009 at 6:31 PM, Patrick Hunt  wrote:
> 
>> The Hadoop summit is Wednesday. If you're attending please feel free to say
>> hi -- Mahadev is presenting @4, Ben and I will be attending as well.
>> 
>> Also, regardless of whether you're attending or not we'd appreciate any
>> updates to the "powered by" page, if you're too busy to update it yourself
>> send us a snippet and we'll update it for you ;-)
>> 
>> http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy
>> 
>> Regards,
>> 
>> Patrick
>> 
> 
> 



Re: Show your ZooKeeper pride!

2009-06-09 Thread Mahadev Konar
OK, I just updated the Powered By page to add Yahoo!, with a high-level
overview of what it's used for.

Thanks for pushing me Stu :)..

mahadev


On 6/9/09 8:20 AM, "Stu Hood"  wrote:

> So is Hadoop... don't let that stop you.
> 
> -Original Message-
> From: "Mahadev Konar" 
> Sent: Monday, June 8, 2009 10:09pm
> To: zookeeper-user@hadoop.apache.org, "gene...@hadoop.apache.org"
> 
> Cc: "zookeeper-...@hadoop.apache.org" 
> Subject: Re: Show your ZooKeeper pride!
> 
> Its just that Zookeeper mostly is used in applications which are proprietary
> (to Yahoo!) and so its harder to update the wiki with specifics on how it is
> being used.
> 
> thanks
> mahadev
> 
> On 6/8/09 7:01 PM, "Ted Dunning"  wrote:
> 
>>  How come Yahoo isn't listed?
>> 
>> On Mon, Jun 8, 2009 at 6:31 PM, Patrick Hunt  wrote:
>> 
>>> The Hadoop summit is Wednesday. If you're attending please feel free to say
>>> hi -- Mahadev is presenting @4, Ben and I will be attending as well.
>>> 
>>> Also, regardless of whether you're attending or not we'd appreciate any
>>> updates to the "powered by" page, if you're too busy to update it yourself
>>> send us a snippet and we'll update it for you ;-)
>>> 
>>> http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy
>>> 
>>> Regards,
>>> 
>>> Patrick
>>> 
>> 
>> 
> 
> 
> 



Re: zookeeper.getChildren asynchronous callback

2009-06-11 Thread Mahadev Konar
Agreed... It is a nice API to have, and it also reduces our memory footprint
from unwanted watches (as Ben suggested earlier).

mahadev


On 6/11/09 10:14 AM, "Satish Bhatti"  wrote:

> That's right Ben.  Basically, I would like to use it something like this:
> public boolean waitForChildrenChanged( String rootPath,
>long timeout )
> {
> BooleanLock blChildrenChanged = new BooleanLock();
> 
> Watcher tempWatcher =
> new Watcher()
> {
> public void process( WatchedEvent event )
> {
> logger.debug( "waitForAnyEntry(): Got state event: " +
> ZooKeeperUtils.watchedEventToString( event ) );
> blChildrenChanged.setValue( true );
> }
> };
> 
> zookeeper.getChildren( rootPath, tempWatcher,
> 
> new AsyncCallback.ChildrenCallback()
> {
> public void processResult( int rc, String path, Object ctx,
> List<String> children )
> {
> logger.debug( "waitForChildrenChanged():
> AsyncCallback.ChildrenCallback(): " + rc + ", " + path + ", " + ctx + ", " +
> children );
> }
> }, null );
> 
> blChildrenChanged.waitUntilTrue( timeout );
> 
> zookeeper.removeWatch( tempWatcher );
> 
> return blChildrenChanged.isTrue();
> }
> 
> The only piece missing from the API is the   zookeeper.removeWatch(
> tempWatcher );
> 
> Satish
> 
> 
> On Thu, Jun 11, 2009 at 7:09 AM, Benjamin Reed  wrote:
> 
>> just to clarify i believe you are talking about callbacks on the watch
>> object you are passing in the asynchronous call rather than the asynchronous
>> completion callback. (Henry is making the same assumption.) when you say you
>> are getting the callback 10 times, i believe your are talking about 10
>> different watch objects getting called back once each. right?
>> 
>> it turns out that the zookeeper client does know what you are watching, and
>> the zookeeper server will only register one watch. the thing that is missing
>> is the clearWatches call that Henry refers to. the thing that complicates
>> things a bit, perhaps not for you, is the scenario where we have different
>> modules sharing the same zookeeper handle. if different modules are
>> interested in watching the same object, you don't want one module to simply
>> clear a the watches for a path because one module may mess up the other.
>> 
>> we have talked about adding this ability to clear watches for a while. i
>> think the auto-watch reregistration patch made the issue slightly more
>> pressing since it means that watches can survive for the entire lifetime of
>> a session not just for the duration of a connection to a specific server.
>> i've created ZOOKEEPER-442 to track this issue.
>> 
>> ben
>> 
>> 
>> Henry Robinson wrote:
>> 
>>> Hi Satish -
>>> 
>>> As you've found out, you can set multiple identical watches per znode -
>>> the
>>> zookeeper client will not detect identical watches in case you really
>>> meant
>>> to call them several times. There's no way currently, as far as I know, to
>>> clear the watches once they've been set. So your options are either to
>>> avoid
>>> repeatedly setting them by detecting whether getChildren is a repeat call,
>>> or by dealing with multiple invocations on the callback path and not doing
>>> anything once you've established you're no longer interested.
>>> 
>>> It might well make sense to add a clearWatches(path) call to the API,
>>> which
>>> would be useful particularly for clients where callbacks are expensive and
>>> require a context switch (which I think is true for all clients right
>>> now!).
>>> 
>>> Henry
>>> 
>>> On Wed, Jun 10, 2009 at 8:05 PM, Satish Bhatti 
>>> wrote:
>>> 
>>> 
>>> 
 I am using the asynchronous (callback) version of
 zookeeper.getChildren().
  That call returns immediately, I then wait for a certain time interval
 for
 nodes to appear, and if not I exit the method that made the
 zookeeper.getChildren()
 call.  Later on, a node gets added under that node and I see in my
 logfile
 that the Watcher.process() callback that I set above gets called.  Now if
 I
 make 10 failed attempts to get a node using the above technique, and at
 some
 later time a node does get added, I see in the logfile that the
 Watcher.process() ends up being called 10 times!  Of course by this time
 I
 have totally lost interest in those callbacks.  Question:  Is there a way
 to
 remove that asynchronous callback?  i.e. If I make a asynchronous
 zookeeper.getChildren()
 call, wait time t, give up, at that point can I remove the async
 callback?
 Satish
 
 
 
>>> 
>> 



Re: Authentification for Zookeeper Server

2009-06-16 Thread Mahadev Konar
Hi David, 
 There is a jira open to document this in our forrest docs -

http://issues.apache.org/jira/browse/ZOOKEEPER-329.

I'll try to explain how to do it in this email; feel free to respond with
more questions. The C and Java APIs both have a call, add_auth/addAuth, to
add authentication data for a client. You can also write plugins on the
server side to verify this authentication. Take a look at the files in
src/java/main/org/apache/zookeeper/server/auth/.

Also, you can add a new authentication scheme to the server using the Java
system property
zookeeper.authProvider.newAuth=classname.

After adding the auth data using the client addAuth APIs, you can use
CREATOR_ALL_ACL, which means that all the auths you added using add_auth
will be stored with a znode that you create and will be required in order
to access those znodes again.

This is a very short explanation, so please feel free to ask more questions
about it.

thanks
mahadev

On 6/16/09 7:19 AM, "David Graf"  wrote:

> Hello
> 
> I've implemented a locking service with ZooKeeper (in C++). It was
> pretty easy to implement! Now, I would like to set up some kind of
> authentification on the server(s) to avoid that others are using my
> ZooKeeper server(s).
> 
> How can I do that? In the documentation (zookeeerProgrammers.pdf), I
> only found a paragraph that describes how to set up an access control
> list on every node. But nowhere, I found a possibility to set an
> authentification mechanism on the complete ZooKeeper server.
> 
> David Graf
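A rough sketch of the client-side flow described above, using the built-in "digest" scheme as an example (the server address and credentials are made up): add auth data to the session, then create a znode with CREATOR_ALL_ACL so that only sessions presenting the same auth can access it.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class AuthExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // Attach authentication info to this session; "digest" is a
        // built-in scheme. Custom schemes are registered on the server
        // via -Dzookeeper.authProvider.<name>=<class>, as described above.
        zk.addAuthInfo("digest", "alice:secret".getBytes());
        // CREATOR_ALL_ACL stores the session's auth ids with the znode;
        // later access requires presenting the same auth.
        zk.create("/protected", new byte[0], Ids.CREATOR_ALL_ACL,
                CreateMode.PERSISTENT);
        zk.close();
    }
}
```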



Re: Authentification for Zookeeper Server

2009-06-16 Thread Mahadev Konar
Hi Gustavo,
 > or is the idea that you simply allow the
> client to connect, but prevent it from touching any node at all using
> ACLs?
Yes.

   The auth plugin works at the znode level. The server-side authentication
I was talking about is just to verify a ZooKeeper client's authentication
when creating/reading/changing znodes in ZooKeeper. So, if you want it to
work at the server level, you will have to add authentication to all the
znodes that you create in ZooKeeper, so that non-authenticated clients are
not able to read anything in ZooKeeper. If you create znodes with no auths,
clients without authentication might be able to read them.


Hope this answers your question.
Thanks
mahadev

On 6/16/09 9:57 AM, "Gustavo Niemeyer"  wrote:

> Hello there,
> 
> I'm an interested newcomer to ZooKeeper, so please forgive me if I
> miss some important basic detail.
> 
> I actually had the same high-level question than the original poster,
> so I'm interested in the response too.
> 
>>  There is a jira open to document this in our forrest docs -
>> 
>> http://issues.apache.org/jira/browse/ZOOKEEPER-329.
>> 
>> Ill try and explain how to do in the email, feel free to respond with more
>> questions. The c and java api both have a call called add_auth/addAuth to
>> add authentication data for a client. Also, you can write pulgins at the
>> server side to verify this authentication. Take a look at files in
>> src/java/main/org/apache/zookeeper/server/auth/.
> 
> Oh, interesting.  So the auth plugin API works both at the node level
> and at the server level, or is the idea that you simply allow the
> client to connect, but prevent it from touching any node at all using
> ACLs?



Re: Authentification for Zookeeper Server

2009-06-17 Thread Mahadev Konar
Hi David, 
 Good question. You can set ACLs on the root. There is a minor bug related
to it (though it's easy to work around).

The jira is
http://issues.apache.org/jira/browse/ZOOKEEPER-433

The bug is that ZooKeeper does not allow you to do a getAcl on the root if
no ACL has been set on it. You will still be able to set an ACL on the root
and then do a getAcl with the right auth; it is just that a getAcl on a raw
root node, before any ACL has been set by an admin/user, fails.

Hope that helps
mahadev 


On 6/17/09 12:56 AM, "David Graf"  wrote:

> Hello
> 
> Thanks a lot for the answers!
> 
> Due to the fact that I am running my ZooKeeper servers and clients on
> Amazon EC2 instances, using the ec2 Security Groups might be the best
> choice.
> 
> Nevertheless, I have a question concerning the authentification on the
> znode level. How is it possible to prevent clients creating node on
> the root level? Is it also possible to set an ACL on the root
> (although the root is not created by a client)?
> 
> David
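Setting an ACL on the root from the Java client could look roughly like this (the server address and credentials are made up); per ZOOKEEPER-433, a getACL on a raw root that has never had an ACL set may fail, but setting one first avoids that.

```java
import java.util.List;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Stat;

public class RootAclExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        zk.addAuthInfo("digest", "admin:secret".getBytes());
        // Restrict the root so unauthenticated clients cannot create
        // top-level znodes; -1 means "any ACL version".
        zk.setACL("/", Ids.CREATOR_ALL_ACL, -1);
        // Once an ACL has been set, getACL on "/" works for authed clients.
        List<ACL> acl = zk.getACL("/", new Stat());
        System.out.println("root ACL: " + acl);
        zk.close();
    }
}
```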



Re: ZK quota

2009-06-18 Thread Mahadev Konar
Hi Raghu,
  We do have plans to enforce quotas in the future. Enforcing requires more
work than just reporting. Reporting is a good enough tool for operations to
manage a ZooKeeper cluster, but we would certainly like to enforce quotas in
the near future.

Thanks
mahadev


On 6/18/09 7:01 PM, "rag...@yahoo.com"  wrote:

> 
> Is there a reason why node count/byte quota is not actually enforced but
> rather ZK just warns? Are there any plans to enforce the quota in a future
> release?
> 
> Thanks
> Raghu
> 
> 
> 



Re: common client

2009-06-22 Thread Mahadev Konar
Hi Stefan,
 This would be a good addition. Feel free to open a jira and contribute the
code. As Nitay suggested, this can go in to src/recipes/$recipe_name and
would be quite useful.

thanks
mahadev


On 6/22/09 4:45 PM, "Nitay"  wrote:

> +1. I would be interested in things like this. I think it should be in
> some contrib/ type thing under zookeeper, like the recipes.
> 
> On Mon, Jun 22, 2009 at 4:41 PM, Stefan Groschupf wrote:
>> Hi,
>> 
>> I wonder if people are interested to work together on a zk client that
>> support some more functionality than zk offers by default.
>> Katta has this client and I copied the code into a couple other projects as
>> well but I'm sure it could be better than it is.
>> 
>> http://katta.svn.sourceforge.net/viewvc/katta/trunk/src/main/java/net/sf/katta/zk/ZKClient.java?view=markup
>> 
>> I'm sure other would benefit from such a client.
>> 
>> Some of the feature are:
>> + Connect
>> + Data and StateChangeListener - subscribe once, get events until
>> unsubscribe
>> + Threadsafe
>> 
>> It is not a lot of code but I'm just tired to have it duplicated so many
>> times.
>> Anyone interested to join in?  Or is there something like this already?
>> I could just copy this to a github project.
>> 
>> Stefan
>> 
>> 



Re: General Question about Zookeeper

2009-06-25 Thread Mahadev Konar
Hi Harold,
  As Henry mentioned, what ACLs give you is a way to prevent access to
znodes. If someone has access to ZooKeeper's data stored on the ZooKeeper
server machines, they will be able to reconstruct the data and read it
(using ZooKeeper's deserialization code).

I am not sure what kind of security model you are interested in, but for
ZooKeeper we expect the server-side data stored on local disks to be
inaccessible to normal users and accessible only to admins.

Hope this helps.
Thanks
mahadev

On 6/25/09 11:01 AM, "Henry Robinson"  wrote:

> Hi Harold,
> 
> Each ZooKeeper server stores updates to znodes in logfiles, and periodic
> snapshots of the state of the datatree in snapshot files.
> 
> A user who has the same permissions as the server will be able to read these
> files, and can therefore recover the state of the datatree without the ZK
> server intervening. ACLs are applied only by the server; there is no
> filesystem-level representation of them.
> 
> Henry
> 
> 
> 
> On Thu, Jun 25, 2009 at 6:48 PM, Harold Lim  wrote:
> 
>> 
>> Hi All,
>> 
>> How does zookeeper store data/files?
>> From reading the doc, the clients can put ACL on files/znodes to limit
>> read/write/create of other clients. However, I was wondering how are these
>> znodes stored on Zookeeper servers?
>> 
>> I am interested in a security aspect of zookeeper, where the clients and
>> the servers don't necessarily belong to the same "group". If a client
>> creates a znode in the zookeeper? Can the person, who owns the zookeeper
>> server, simply look at its filesystem and read the data (out-of-band, not
>> using a client, simply browsing the file system of the machine hosting the
>> zookeeper server)?
>> 
>> 
>> Thanks,
>> Harold
>> 
>> 
>> 
>> 



Re: General Question about Zookeeper

2009-06-25 Thread Mahadev Konar
Hi Harold,
   Let me explain the whole concept of ZooKeeper ACLs.

1) ZooKeeper servers run as some user id, say X.
2) ZooKeeper clients use the ZooKeeper client library to create znodes on
the ZooKeeper servers. They could be running as user id C. They can provide
ACLs when creating such nodes to restrict access. These ACLs have NOTHING
to do with user id X or user id C; the access controls are independent of
any user id the client or the server is running with.
3) User X can obviously reconstruct the ZooKeeper database, since he has
access to the local filesystem data that ZooKeeper writes its
snapshots/txns into.


Hope this helps.
Mahadev
 
On 6/25/09 11:20 AM, "Harold Lim"  wrote:

> 
> Hi Henry,
> 
> Does that mean for example, if I own the Zookeeper server and physical machine
> and have lots of clients using this Zookeeper server, I can simply look at the
> logfiles and snapshot files and see all of the information created by those
> clients?
> 
> 
> Thanks,
> Harold
> 
> --- On Thu, 6/25/09, Henry Robinson  wrote:
> 
>> From: Henry Robinson 
>> Subject: Re: General Question about Zookeeper
>> To: zookeeper-user@hadoop.apache.org
>> Date: Thursday, June 25, 2009, 2:01 PM
>> Hi Harold,
>> 
>> Each ZooKeeper server stores updates to znodes in logfiles,
>> and periodic
>> snapshots of the state of the datatree in snapshot files.
>> 
>> A user who has the same permissions as the server will be
>> able to read these
>> files, and can therefore recover the state of the datatree
>> without the ZK
>> server intervening. ACLs are applied only by the server;
>> there is no
>> filesystem-level representation of them.
>> 
>> Henry
>> 
>> 
>> 
>> On Thu, Jun 25, 2009 at 6:48 PM, Harold Lim 
>> wrote:
>> 
>>> 
>>> Hi All,
>>> 
>>> How does zookeeper store data/files?
>>> From reading the doc, the clients can put ACL on
>> files/znodes to limit
>>> read/write/create of other clients. However, I was
>> wondering how are these
>>> znodes stored on Zookeeper servers?
>>> 
>>> I am interested in a security aspect of zookeeper,
>> where the clients and
>>> the servers don't necessarily belong to the same
>> "group". If a client
>>> creates a znode in the zookeeper? Can the person, who
>> owns the zookeeper
>>> server, simply look at its filesystem and read the
>> data (out-of-band, not
>>> using a client, simply browsing the file system of the
>> machine hosting the
>>> zookeeper server)?
>>> 
>>> 
>>> Thanks,
>>> Harold
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
>   



Re: Some questions about Zookeeper 3.2.0

2009-06-30 Thread Mahadev Konar
Hi Qian,
 Sorry for the delayed response. ZooKeeper guarantees that there is
backwards compatibility across minor releases, so 3.2.* is backwards
compatible with 3.1.*. Therefore, the rolling upgrade Ted describes below
should work.

Hope this helps.
Thanks
Mahadev


Ted Dunning:

A rolling update works very well for that.  You can also change the number
of nodes in the cluster.

To do this, you replace the config files on the surviving servers and on the
new server.

Then take down the one that is leaving the cluster and then one by one
restart the servers that will remain in the cluster.  Then start the new
server.

If you only have 3 ZK servers, you might want to do the upgrade in two steps
if you are totally paranoid about running with just 2 servers in your
cluster.  In that scenario, you would add the new server to the config and
restart each of the 3 ZK servers and staring the new one.  That gives you a
ZK cluster with 4 servers.  Another rolling update can reconfigure it to
have only the 3 servers you want.

Make sure you leave several seconds between restarting each server.  That
will give the cluster time to calm down and do any necessary leader
elections.


On 6/28/09 11:03 PM, "Qian Ye"  wrote:

> A related question.  If I have run a Zookeeper service with many ephemeral
> znodes on five servers, then I want to replace one of the servers by a new
> one, and the new server has a new IP. How can I do the job without any loss
> of data ?
> 
> Thanks
> 
> 
> On Mon, Jun 29, 2009 at 9:24 AM, Qian Ye  wrote:
> 
>> Thanks for the explanation, it sounds great.
>> 
>> It seems that I should study more on Zookeeper :-)
>> 
>> regards
>> 
>> 
>> On Mon, Jun 29, 2009 at 2:15 AM, Ted Dunning wrote:
>> 
>>> I don't think you should be very nervous at all.
>>> 
>>> There are two questions:
>>> 
>>> 1) can 3.1.1 go to 3.2 with no down time.  This is very likely, but a
>>> wiser
>>> head than mine should have final say
>>> 
>>> 2) can 3.1.1 go to 3.2 with < 1 minute of downtime.  The is for sure.
>>> 
>>> Neither option involves data loss.
>>> 
>>> ZK is actually really really good for HA operations.
>>> 
>>> On Sun, Jun 28, 2009 at 4:55 AM, Qian Ye  wrote:
>>> 
 your answer makes me nervous about upgrade Zookeeper
 server
 
>>> 
>> 
>> 
>> 
>> --
>> With Regards!
>> 
>> Ye, Qian
>> Made in Zhejiang University
>> 
>> 
> 
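Ted's procedure could be sketched like this for each server in turn (paths, file names, and the sleep interval are illustrative; exact commands depend on your installation):

```shell
# On each surviving server, one at a time:

# 1. Put the updated config (with the new server list) in place.
cp zoo.cfg.new /path/to/zookeeper/conf/zoo.cfg

# 2. Restart this server so it picks up the new config.
/path/to/zookeeper/bin/zkServer.sh stop
/path/to/zookeeper/bin/zkServer.sh start

# 3. Give the ensemble a few seconds to settle (leader election, sync)
#    before moving on to the next server.
sleep 10
```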



Re: common client

2009-06-30 Thread Mahadev Konar
Hi Stefan,
  Feel free to use the mailing list to discuss things related to the
ZooKeeper common client. If the traffic gets huge, the discussions can be
moved out of this mailing list, as you suggested.

mahadev


On 6/30/09 9:24 PM, "Stefan Groschupf"  wrote:

> Hi All,
> we created a github repo:
> http://github.com/joa23/zkclient/tree/master
> 
> It is empty right now but my colleagues Peter and Vivek will push
> something into it within the next days.
> As soon we have something stable we can push it where ever we want.
> Github is just easy to get started.
> 
> If that is ok with the zookeeper crew we can use this mailing list to
> discuss things. As soon the traffic is too much we would start a tmp
> google group or something.
> Every helping hand but even more improvement suggestion are welcome.
> 
> Thanks,
> Stefan
> 
> ~~~
> Hadoop training and consulting
> http://www.scaleunlimited.com
> http://www.101tec.com
> 
> 
> 
> On Jun 23, 2009, at 7:42 AM, Henry Robinson wrote:
> 
>> +1 to this idea. It will be good to have some more focus on examples
>> of how
>> to build applications using ZK; experiences here will feed back into
>> the
>> design of the core.
>> 
>> Henry
>> 
>> On Tue, Jun 23, 2009 at 2:23 AM, Mahadev Konar > inc.com>wrote:
>> 
>>> Hi Stefan,
>>> This would be a good addition. Feel free to open a jira and
>>> contribute the
>>> code. As Nitay suggested, this can go in to src/recipes/
>>> $recipe_name and
>>> would be quite useful.
>>> 
>>> thanks
>>> mahadev
>>> 
>>> 
>>> On 6/22/09 4:45 PM, "Nitay"  wrote:
>>> 
>>>> +1. I would be interested in things like this. I think it should
>>>> be in
>>>> some contrib/ type thing under zookeeper, like the recipes.
>>>> 
>>>> On Mon, Jun 22, 2009 at 4:41 PM, Stefan Groschupf
>>>> wrote:
>>>>> Hi,
>>>>> 
>>>>> I wonder if people are interested to work together on a zk client
>>>>> that
>>>>> support some more functionality than zk offers by default.
>>>>> Katta has this client and I copied the code into a couple other
>>>>> projects
>>> as
>>>>> well but I'm sure it could be better than it is.
>>>>> 
>>>>> 
>>>>> http://katta.svn.sourceforge.net/viewvc/katta/trunk/src/main/java/net/sf/katta/zk/ZKClient.java?view=markup
>>>>> 
>>>>> I'm sure other would benefit from such a client.
>>>>> 
>>>>> Some of the feature are:
>>>>> + Connect
>>>>> + Data and StateChangeListener - subscribe once, get events until
>>>>> unsubscribe
>>>>> + Threadsafe
>>>>> 
>>>>> It is not a lot of code, but I'm just tired of having it duplicated
>>>>> so many times.
>>>>> Anyone interested to join in?  Or is there something like this
>>>>> already?
>>>>> I could just copy this to a github project.
>>>>> 
>>>>> Stefan
>>>>> 
>>>>> 
>>> 
>>> 
> 



Re: Help to compile Zookeeper C API on a old system

2009-07-06 Thread Mahadev Konar
Hi Qian,
  What issues do you face? I have never tried compiling with the
configuration below, but I could give it a try in my free time to see if I
can get it to compile.

mahadev


On 7/6/09 7:37 AM, "Qian Ye"  wrote:

> Hi all:
> 
> I'm writing to ask you to do me a favor. It's urgent. For some unchangeable
> reason, I have to compile "libzookeeper_st.a", "libzookeeper_mt.a" on an old
> system:
> 
> gcc 2.96
> autoconf 2.13
> automake 1.4-p5
> libtool 1.4.2
> 
> I cannot compile the target lib in the usual way, and this task drives
> me crazy :-(
> 
> could anyone help me out? Thanks a lot~



Re: zookeeper on ec2

2009-07-06 Thread Mahadev Konar
Hi David,
 Answers in line:


On 7/6/09 4:45 AM, "David Graf"  wrote:

> Hello
> 
> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my
> system, zookeeper is used to run a locking service and to generate
> unique id's. Currently, for testing purposes, I am only running one
> instance. Now, I need to set up an ensemble to protect my system
> against crashes.

> The ec2 services has some differences to a normal server farm. E.g.
> the data saved on the file system of an ec2 instance is lost if the
> instance crashes. In the documentation of zookeeper, I have read that
> zookeeper saves snapshots of the in-memory data in the file system. Is
> that needed for recovery? Logically, it would be much easier for me if
> this is not the case.
Yes, zookeeper keeps persistent state on disk. This is used for recovery and
correctness of zookeeper.

> Additionally, ec2 brings the advantage that serves can be switch on
> and off dynamically dependent on the load, traffic, etc. Can this
> advantage be utilized for a zookeeper ensemble? Is it possible to add
> a zookeeper server dynamically to an ensemble? E.g. dependent on the
> in-memory load?
It is not yet possible to add servers dynamically. There is work going on to
do that in http://issues.apache.org/jira/browse/ZOOKEEPER-107. This should
get into the next release (I am hoping). For now, you will have to do a
rolling restart if you do not want the service to go down, or else restart
all the machines at the same time (the zookeeper clients should be able to
handle a minor downtime of the zookeeper service).

Thanks
mahadev
> 
> David
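

The rolling-restart advice above assumes the client tolerates a brief
outage. Here is a minimal, hypothetical sketch (plain Java, not part of the
ZooKeeper API) of the retry-with-backoff loop a client might wrap around its
ZooKeeper calls so that a short downtime during a restart is absorbed; the
operation interface, retry count, and backoff values are all illustrative
assumptions:

```java
class RetryingCall {
    // Illustrative functional interface standing in for a ZooKeeper
    // operation that may fail with, e.g., a connection-loss exception.
    interface Op<T> { T call() throws Exception; }

    // Retry the operation up to maxRetries times with exponential backoff.
    static <T> T withRetries(Op<T> op, int maxRetries, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i <= maxRetries; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // e.g. a transient connection loss
                if (i < maxRetries) {
                    Thread.sleep(backoffMs * (1L << i)); // 1x, 2x, 4x, ...
                }
            }
        }
        throw last; // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        // Simulate an operation that fails twice, then succeeds.
        final int[] attempts = {0};
        String result = withRetries(() -> {
            if (++attempts[0] < 3) throw new RuntimeException("connection loss");
            return "ok";
        }, 5, 1);
        System.out.println(result + " after " + attempts[0] + " attempts");
        // ok after 3 attempts
    }
}
```

With a loop like this around reads and idempotent writes, a sub-minute
service interruption shows up to the application as latency rather than
failure.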



Re: Help to compile Zookeeper C API on a old system

2009-07-06 Thread Mahadev Konar
Hi Qian,
  I am not sure if it will work. You should be able to back-port it in such a
way that it works with gcc 3.*/4.*, but again I have never tried it.

mahadev


On 7/6/09 6:35 PM, "Qian Ye"  wrote:

> Thanks Mahadev,  I follow the installation instruction in the README,
> 
> autoreconf -i -f
> ./configure --prefix=$dir
> make
> make install
> 
> until "./configure --prefix=$dir", there is no error, however, errors came
> when I did make,
> 
> My plan is to change the compiler from gcc to g++, and solve the compile
> errors one by one.
> 
> Will my plan work?
> 
> Thanks~
> 
> 
> On Tue, Jul 7, 2009 at 2:22 AM, Mahadev Konar  wrote:
> 
>> Hi Qian,
>>  What issues do you face? I have never tried compiling with the
>> configuration below, but I could give it a try in my free time to see if I
>> can get it to compile.
>> 
>> mahadev
>> 
>> 
>> On 7/6/09 7:37 AM, "Qian Ye"  wrote:
>> 
>>> Hi all:
>>> 
>>> I'm writing to ask you to do me a favor. It's urgent. For some
>> unchangeable
>>> reason, I have to compile "libzookeeper_st.a", "libzookeeper_mt.a" on an
>> old
>>> system:
>>> 
>>> gcc 2.96
>>> autoconf 2.13
>>> automake 1.4-p5
>>> libtool 1.4.2
>>> 
>>> I cannot compile the target lib in the usual way, and this task
>> drives
>>> me crazy :-(
>>> 
>>> could anyone help me out? Thanks a lot~
>> 
>> 
> 



Re: Question about the sequential flag on create.

2009-07-13 Thread Mahadev Konar
Hi Erik,
 The children that you get in return are not guaranteed to be in sorted
order, which is why they need to be sorted each time on the client side.
Hope that helps.

thanks
mahadev


On 7/13/09 4:27 PM, "Erik Holstad"  wrote:

> Hey!
> I have been playing around with the queue and barrier example found on the
> home page and have some questions about the code.
> First of all I had trouble getting the queue example to work since the code
> turns the sequence number into an int and then tries to get information
> from it, dropping the padding, which caused some confusion at first. So I
> changed it to compare the strings themselves so you didn't have to
> add the padding back on.
> 
> So the fact that you have to sort the children every time you get them is a
> little bit confusing to me, does anyone have a simple answer to why that is?
> 
> Regards Erik
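
The string comparison Erik describes works because sequential znode names
carry a fixed-width, zero-padded 10-digit suffix, so lexicographic order
equals numeric order. A minimal standalone sketch (plain Java, not code from
the queue recipe; the "item-" prefix is an illustrative assumption):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortSequentialChildren {
    // Sort sequential znode names by their zero-padded suffix. Because the
    // padding gives every suffix the same width, a plain lexicographic sort
    // yields numeric order, with no need to parse the counter into an int.
    static List<String> sortBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // lexicographic == numeric for fixed width
        return sorted;
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>(List.of(
                "item-0000000010", "item-0000000002", "item-0000000001"));
        System.out.println(sortBySequence(children));
        // [item-0000000001, item-0000000002, item-0000000010]
    }
}
```

Note this only holds while all children share the same name prefix;
stripping the prefix first is safer when prefixes can differ.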



Re: Question about the sequential flag on create.

2009-07-13 Thread Mahadev Konar
Internally, the children of a node are not guaranteed to be stored sorted by
name. The counter that you mention is just a version number on the parent
that is used when creating children of a node with the Sequential flag. It
has nothing to do with how the children of a node are stored in the internal
data structures.

Hope this helps. 
mahadev


On 7/13/09 4:53 PM, "Erik Holstad"  wrote:

> Hi Mahadev!
> Thanks for the quick reply. Yeah, I saw that in the source, but was just
> curious why that is, since it is a part of an internal
> counter structure, right?
> 
> Regards Erik



Re: Instantiating HashSet for DataNode?

2009-07-14 Thread Mahadev Konar
Hi Erik,
  I am not sure that would be a considerable optimization, but even if you
wanted to do it, it would be much more than just adding a check in the
constructor (the serialization/deserialization code would need to be
specialized). Right now all the datanodes are treated equally for
ser/deser and other purposes.


mahadev




On 7/14/09 1:42 PM, "Erik Holstad"  wrote:

> I'm not sure if I've misread the code for the DataNode, but to me it looks
> like every node gets a set of children even though it might be an
> ephemeral node which cannot have children, so we are wasting 240 B for every
> one of those. Not sure if it makes a big difference, but just thinking
> that since everything sits in memory and there is no reason to instantiate
> it, maybe it would be possible just to add a check in the constructor?
> 
> Regards Erik
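
The lazy-initialization Erik is suggesting can be sketched as follows. This
is a hypothetical standalone example, NOT how ZooKeeper's DataNode is
actually implemented; as noted in the reply, the real change would also have
to touch the serialization/deserialization code to handle the null case:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class LazyChildrenNode {
    // Defer allocating the children set until a child is actually added,
    // so nodes that never get children (e.g. ephemerals) pay no memory cost.
    private Set<String> children; // null until the first child is added

    void addChild(String child) {
        if (children == null) {
            children = new HashSet<>(); // allocate on first use only
        }
        children.add(child);
    }

    Set<String> getChildren() {
        // Callers see an empty set, never null, whether or not it exists.
        return children == null ? Collections.emptySet()
                                : Collections.unmodifiableSet(children);
    }

    public static void main(String[] args) {
        LazyChildrenNode node = new LazyChildrenNode();
        System.out.println(node.getChildren().isEmpty()); // true, nothing allocated
        node.addChild("child-0000000001");
        System.out.println(node.getChildren()); // [child-0000000001]
    }
}
```

Returning an empty set from the accessor keeps callers oblivious to whether
the backing set was ever created.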



Re: zkCleanup.sh is buggy

2009-07-17 Thread Mahadev Konar
Hi Fernando,
 Please do file a jira ( http://issues.apache.org/jira/browse/ZOOKEEPER )
and the patch below as an attachment to the created jira.

Here is how to contribute:
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

 thanks
Mahadev


On 7/17/09 11:59 AM, "Fernando Padilla"  wrote:

> Actually.. this is the patch I would suggest:
> 
> 
> remove everything below and including the "eval", and change to:
> 
> 
> ZOODATADIR=$(grep '^dataDir=' $ZOOCFG | sed -e 's/.*=//')
> ZOODATALOGDIR=$(grep '^dataLogDir=' $ZOOCFG | sed -e 's/.*=//')
> 
> if [ "x${ZOODATALOGDIR}" = "x" ]
> then
> java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \
>   "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
>   -cp $CLASSPATH $JVMFLAGS \
>   org.apache.zookeeper.server.PurgeTxnLog $ZOODATADIR $*
> else
> java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \
>   "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
>   -cp $CLASSPATH $JVMFLAGS \
>   org.apache.zookeeper.server.PurgeTxnLog $ZOODATALOGDIR $ZOODATADIR $*
> fi
> 
> 
> 
> 
> Fernando Padilla wrote:
>> I am playing with the zookeeper 3.2.0 build, and it looks like the
>> zkCleanup.sh script is a little buggy. :)
>> 
>> It calls:
>> 
>> PurgeTxnLog $dataDir
>> 
>> but doesn't pass through the count of snapshots.. you could do it simply
>> by adding:
>> 
>> PurgeTxnLog $dataDir $*
>> 
>> 
>> Though I just realized, it only uses $dataDir, and is not smart enough
>> to realize if it's using a different dataLogDir...
>> 
>> Should I file a bug?


