[PATCH] a simple Leader Election or exclusive Write Lock protocol/policy

2008-07-17 Thread James Strachan
So having recently discovered ZooKeeper, I'm really liking it - good job folks!

I've seen discussions of building high level features from the core ZK
library and had not seen any available on the interweb so figured I'd
have a try creating a simple one. Feel free to ignore it if a ZK ninja
can think of a neater way of doing it - I've basically followed the
protocol defined in the recent ZK presentation...
http://developer.yahoo.com/blogs/hadoop/2008/03/intro-to-zookeeper-video.html

I've submitted the code as a patch here...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

I figured the Java Client might as well come with some helper code to
make doing things like exclusive locks or leader elections easier; we
could always spin them out into a separate library if and when
required etc. Right now it's one fairly simple class :)

Currently it's a simple class where you can register a Runnable to be
invoked when you have the lock; or you can just keep asking if you
have the lock now and again as you see fit etc.

WriteLock locker = new WriteLock(zookeeper, "/foo/bar");
locker.setWhenOwner(new Runnable() {...}); // fire this code when owner...

// let's try to own it
locker.acquire();

// I may or may not have the lock now
if (locker.isOwner()) {}

// time passes
locker.close();


Thoughts?

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: Recipe contrib -- was Re: [PATCH] a simple Leader Election or exclusive Write Lock protocol/policy

2008-07-18 Thread James Strachan
As a newbie around hadoop/zookeeper (but long time apache hacker on
various things), I think the high level protocol/recipe stuff should
definitely be a separate, released module; so folks who wanna use it
can share a versioned binary and so forth. It would be awesome to have
a few good protocols/recipes for things like leader election, write
locks, read/write locks - maybe load balancing (e.g. sparse hash maps
and whatnot) - plus maybe more distributed java-util-concurrent stuff
like barriers etc. Even if folks don't reuse the recipes at least they
can act as good educational material for using the core ZK client API
etc.

I don't see a huge need yet to split these Java recipes into multiple
individual releases/jars - as we're talking a pretty small
codebase, so a single zookeeper-recipe.jar would suffice for now. e.g.
for leader election & exclusive write locks we're talking two classes
probably; maybe with a few interface hooks or something. I'd be
tempted to stick with all the recipes in a single jar for now - then
if we find some complex recipes coming along that are more
specialised and require loads of code, we could spin those out later
but many of the core recipes are gonna be pretty small in terms of
code and I can also see a small chunk of code being reusable in a few
recipes etc. (e.g. exclusive write locks are kinda the same thing as
leader election).

The bigger decision is whether we release the recipes together with
the core Java API/impl of ZK or via a separate release process.
Certainly the core of ZK is pretty stable; the recipes are gonna be in
a state of flux & rapid development (hopefully!:) for a while. It's
gonna be a while before all the recipes are totally finished &
hardened; but I don't see why they can't be released in a kinda
work-in-progress state. e.g. so long as we document that the recipe
code may change in the future, I don't see any issue including the
recipe code along with the ZK client and server in a single release.
But down the line we could always split it up if we need to.

I tend to prefer deferring decisions until we really need to make them
and saving work where possible (particularly when it comes to things
like doing releases); so for now I'd prefer to release the recipes in
the Java ZK client/server release distro as a separate optional jar
(along with some warnings that the recipes are work in progress and
might change a little over the next few months as we figure out the
neatest way to implement the higher level protocols etc) plus separate
documentation stuff.

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: Recipe contrib -- was Re: [PATCH] a simple Leader Election or exclusive Write Lock protocol/policy

2008-07-18 Thread James Strachan
2008/7/17 Benjamin Reed <[EMAIL PROTECTED]>:
> Excellent proposal. The only thing I would add is that there should be
> an english description of the recipe in subversion. That way if someone
> wanted to do a compatible binding they can do it. If the recipe is on
> the wiki it would be hard to keep it in sync, so it is important that it
> is in subversion. My preference would be that the doc would be in the
> same contrib subdirectory as the source for ease of maintenance.

Good idea. How about for Java recipes we include the documentation as
HTML with the javadoc so we can link to it easily and so that the
recipe is kept with the code & versioned nicely (so as the
recipe/algorithm changes we version it with the source code etc)

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


javadoc for the Write Lock / Leader Election

2008-07-18 Thread James Strachan
The other thread was already quite big and covering a large range of
issues so thought I'd spin up a little separate thread :)

I've just updated the patch to include better javadoc which is linked
to embedded HTML documentation describing the protocol. The
documentation includes the pseudocode from the online ZooKeeper
presentation (that I used) and I've also included the text from
ZOOKEEPER-79 which I'm glad to say seems to match up perfectly with
the pseudocode I'd used :)
https://issues.apache.org/jira/browse/ZOOKEEPER-78

One thing confused me though; the last paragraph says...

This protocol guarantees that there is at any time only one node that
thinks it is the leader. But it does not disseminate information about
who is the leader. If you want everyone to know who is the leader, you
can have an additional Znode whose value is the name of the current
leader (or some identifying information on how to contact the leader,
etc.). Note that this cannot be done atomically, so by the time other
nodes find out who the leader is, the leadership may already have
passed on to a different node.

In the current WriteLock implementation, each node can know, whenever
it attempts to acquire the lock and doesn't get it, who the owner is.
I guess this is only true momentarily, in the split second that the
acquire() method is called (i.e. the exact moment the getChildren() is
called and the lowest value is found). Or is there some other subtle
issue I'm not seeing?
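
A rough sketch of that "lowest value wins" check, in plain Java with no
ZooKeeper dependency. The class name LockOrder, its method names and the
child-name format "x-<session>-<sequence>" are assumptions made up for
illustration; this is not the code from the ZOOKEEPER-78 patch. The list
passed in stands for a single getChildren() response, so the answer is only
as fresh as that snapshot.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: picks the lock owner out of one getChildren() snapshot.
final class LockOrder {
    private LockOrder() {}

    // Parses the numeric sequence suffix after the last '-' (assumed name format).
    static long sequenceOf(String childName) {
        int dash = childName.lastIndexOf('-');
        return Long.parseLong(childName.substring(dash + 1));
    }

    // The child with the lowest sequence number is the owner in this snapshot.
    static Optional<String> owner(List<String> children) {
        return children.stream().min(Comparator.comparingLong(LockOrder::sequenceOf));
    }

    // True only if our own node is the lowest one in this snapshot.
    static boolean isOwner(String myNode, List<String> children) {
        return owner(children).map(myNode::equals).orElse(false);
    }
}
```

An empty snapshot yields no owner at all, which is why owner() returns an
Optional rather than a name.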

I guess we could add a method to WriteLock - if folks wanted - a kinda
queryLeader() method where we just use the same algorithm to find who
the current leader is - if folks cared. Though am not sure how useful
knowing who the leader is would be :). Though I guess writing the
leader's identity to some canonical znode that any other node can read
whenever it wishes is less risky and maybe simpler.

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: javadoc for the Write Lock / Leader Election

2008-07-18 Thread James Strachan
Thanks for the clarification. I think it makes lots of sense for the
leader to write to some canonical place to advertise itself, in case
others are interested in knowing whether it is the leader.

2008/7/18 Flavio Junqueira <[EMAIL PROTECTED]>:
> Hi James, the fact that the client's node has another node n ahead of it
> in the sequence order doesn't mean that the owner of n is aware that it is
> the lock holder or the leader. This is because operations are propagated
> asynchronously. Also, a getChildren() doesn't guarantee that you have the
> latest list, and it is possible that another node is at the head of the
> ordered list of nodes at the moment you read the response of getChildren().
> This is because getChildren() will return the local state of one server,
> while the ensemble of servers is processing or have even already decided
> upon a change to the list.
>
> In the way I understand Jacob's suggestion, a leader client creates a
> separate node to acknowledge that it is actually aware that it is the
> leader, and so it is ready to perform the role of a leader.
>
> -Flavio
>
>> -Original Message-
>>
>> One thing confused me though; the last paragraph says...
>>
>> This protocol guarantees that there is at any time only one node that
>> thinks it is the leader. But it does not disseminate information about
>> who is the leader. If you want everyone to know who is the leader, you
>> can have an additional Znode whose value is the name of the current
>> leader (or some identifying information on how to contact the leader,
>> etc.). Note that this cannot be done atomically, so by the time other
>> nodes find out who the leader is, the leadership may already have
>> passed on to a different node.
>>
>> In the current implementation, WriteLock - each znode can know,
>> whenever it attempts to acquire the lock - if it didn't get the lock,
>> who the owner is. I guess this is only true momentarily the split
>> second that the acquire() method is called (i.e. the exact moment the
>> getChildren() is called and the lowest value is found). Or is there
>> some other subtle issue I'm not seeing?
>>
>> I guess we could add a method to WriteLock - if folks wanted - a kinda
>> queryLeader() method where we just use the same algorithm to find who
>> the current leader is - if folks cared. Though am not sure how useful
>> knowing who the leader is :). Though I guess writing the leader's
>> identity to some canonical znode that any other znode can read
>> whenever it wishes is less risky and maybe simpler.
>>
>> --
>> James
>> ---
>> http://macstrac.blogspot.com/
>>
>> Open Source Integration
>> http://open.iona.com
>
>



-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: An interest in increasing the DI'ness of ZooKeeper?

2008-07-18 Thread James Strachan
+1 :)

I'm a fellow ActiveMQ hacker too and would love to see ZK included
with ActiveMQ. Dependency Injection can really help keep your code
simple while leaving it flexible so it can be used in many different
ways.
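
As a tiny illustration of the idea (constructor injection), here is a
sketch in plain Java. ServerSettings and EmbeddableServer are hypothetical
names invented for this example, not ZooKeeper classes; the point is only
that the object receives its configuration from the outside, so an embedding
project can wire it up by hand or via Spring, Guice and the like.

```java
import java.nio.file.Path;
import java.util.Objects;

// Hypothetical settings object; the field names are illustrative only.
final class ServerSettings {
    final int clientPort;
    final Path dataDir;

    ServerSettings(int clientPort, Path dataDir) {
        this.clientPort = clientPort;
        this.dataDir = Objects.requireNonNull(dataDir);
    }
}

// Hypothetical embeddable server: it is handed its settings instead of
// reading a fixed config file or system properties on its own.
final class EmbeddableServer {
    private final ServerSettings settings;

    EmbeddableServer(ServerSettings settings) {
        this.settings = Objects.requireNonNull(settings);
    }

    // Returns a human-readable summary of the injected configuration.
    String describe() {
        return "port=" + settings.clientPort + " dataDir=" + settings.dataDir;
    }
}
```

Because nothing is looked up statically, a test or an embedding broker can
construct as many differently configured instances as it likes in one JVM.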

Here's some links on DI
http://martinfowler.com/articles/injection.html
http://www.theserverside.com/tt/articles/article.tss?l=SpringFramework

2008/7/18 Hiram Chirino <[EMAIL PROTECTED]>:
> Hi Guys,
>
> First off, great project!  I think ZooKeeper is a fabulous idea.  I
> can see folks wanting to embedd ZK servers in their products too.  I
> could see the ActiveMQ project embedding it for several reasons.  And
> with that in mind,  I think it would be awesome of ZK tried to use
> more dependency injection (DI) to configure it's objects.  That way
> and embedding project could directly configure it with java code, or
> use Spring or Guice etc. etc.
>
> If you guys are interested in supporting this use case, I'd be happy
> to start contributing patches to make that happen.
>
> --
> Regards,
> Hiram
>
> Blog: http://hiramchirino.com
>
> Open Source SOA
> http://open.iona.com
>



-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


auto-reconnection ZooKeeper proxy?

2008-07-18 Thread James Strachan

I work on the ActiveMQ project which implements the JMS API - which is
a kinda complex thing but it involves a number of objects
(Connections, Sessions, Producers, Consumers). In some JMS providers
it's the end user's responsibility to deal with detecting a connection
failure (as opposed to any other kind of error) and then automatically
recreating all the dependent objects.

We added support for auto-reconnection which greatly simplifies the
developer's life; it lets the JMS client automatically deal with any
socket failures, reconnecting to a broker for you and re-establishing
all of those in-flight operations (subscriptions, in progress sends
and so forth).
http://activemq.apache.org/how-can-i-support-auto-reconnection.html

Having seen the value of wrapping up the auto-reconnection within a
proxy; am thinking it's also got merits on ZK



As we start creating protocols/recipes that implement higher order
features like locks, leader elections and so forth we could probably
do with some kinda auto-reconnecting facade to ZooKeeper just to
simplify the implementation code of protocols/recipes. It's a kinda
complex area though and I'm sure different protocols will want
different things; but even for something so simple as a lock - I can
see the value in an auto-reconnecting proxy.

e.g. there's already 5 different method calls in the current WriteLock
implementation which all really need a custom try/catch around them to
detect loss of the connection which then should be wrapped in a
reconnect-retry logic.

What to do about watches is interesting; though for now the current
behaviour seems fine (fire them all forcing a re-watch). In the
future we could re-enable watches on the new server connection as an
option.

All I'm thinking about for now is a kinda ReconnectingZooKeeper which
looks like a ZooKeeper object but which internally catches dead
connections and then internally tries to reconnect to one of the ZK
servers under the covers - retrying the current read/write operation
until the ReconnectPolicy says to fail. e.g. some folks might wanna
retry connecting forever; others for a certain amount of time or
certain number of attempts etc.

So something like...

public class ReconnectingZooKeeper extends ZooKeeper {
  ...
  // for each method that reads/writes synchronously
  public Stat exists(String path) {
    // loop until the call succeeds or the policy tells us to give up
    for (int count = 0; ; count++) {
      try {
        // really do the method call!
        return super.exists(path);
      } catch (ConnectionClosedException e) {
        // let any watches or listeners respond to connection
        // loss first before we retry
        fireAnyWatchesAndStuff();

        if (!shouldRetry(count)) {
          throw e;
        }
      }
    }
  }
}
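
For illustration only, the ReconnectPolicy idea (retry forever, for a
certain amount of time, or for a certain number of attempts) could be
sketched roughly like this in plain Java. The interface shape, the method
names and the "failures so far" parameter are all assumptions; nothing like
this exists in ZooKeeper itself.

```java
import java.time.Duration;

// Hypothetical policy interface: decides whether another retry is allowed.
interface ReconnectPolicy {
    // failures is the number of failed attempts so far (1 after the first
    // failure); elapsed is the time since the first attempt started.
    boolean shouldRetry(int failures, Duration elapsed);

    // Keep retrying forever.
    static ReconnectPolicy forever() {
        return (failures, elapsed) -> true;
    }

    // Allow at most maxAttempts attempts in total.
    static ReconnectPolicy maxAttempts(int maxAttempts) {
        return (failures, elapsed) -> failures < maxAttempts;
    }

    // Keep retrying until the given amount of time has passed.
    static ReconnectPolicy maxElapsed(Duration limit) {
        return (failures, elapsed) -> elapsed.compareTo(limit) < 0;
    }
}
```

A retry loop would then ask the policy after each connection failure and
rethrow the exception once shouldRetry() returns false.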


Any watches should fire when a connection is lost - and all writes
should be replicated to the new server we connect to right? So I'm
thinking, if we had a ReconnectingZooKeeper implementation, we could
use it with the current WriteLock implementation so that the protocol
could survive ZK server loss & reconnection while still working.

e.g. on connection loss the leader/lock owner needs to lose the lock
until it gets it back just in case; but other than that I think it
should work.
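
A minimal sketch of that "assume the lock is lost on disconnect" rule, in
plain Java. OwnershipFlag and its method names are hypothetical; the real
WriteLock would call something like acquired() once the protocol confirms
ownership, and connectionLost() from whatever callback reports the dropped
connection.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical holder for "do I currently believe I own the lock?"
final class OwnershipFlag {
    private final AtomicBoolean owner = new AtomicBoolean(false);
    private final Runnable whenOwner;
    private final Runnable whenLost;

    OwnershipFlag(Runnable whenOwner, Runnable whenLost) {
        this.whenOwner = whenOwner;
        this.whenLost = whenLost;
    }

    // Call after the lock protocol confirms our node is the lowest one.
    // The callback fires only on the transition from non-owner to owner.
    void acquired() {
        if (owner.compareAndSet(false, true)) {
            whenOwner.run();
        }
    }

    // Call on any connection-loss event: assume the worst until re-checked.
    // The callback fires only if we believed we were the owner.
    void connectionLost() {
        if (owner.compareAndSet(true, false)) {
            whenLost.run();
        }
    }

    boolean isOwner() {
        return owner.get();
    }
}
```

After reconnecting, the client would re-run the normal acquire logic and
only call acquired() again if it still holds the lowest node.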

Am sure there's some gremlins somewhere in automatically reconnecting;
though provided the watch mechanism works, clients will be able to do
the right thing I think.

Thoughts?

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: Recipe contrib -- was Re: [PATCH] a simple Leader Election or exclusive Write Lock protocol/policy

2008-07-21 Thread James Strachan
It should be pretty easy to link together the various recipes from
the site/wiki and where possible share the recipe documentation across
languages.

2008/7/18 Benjamin Reed <[EMAIL PROTECTED]>:
> Some initial implementations of a recipe may only be in C, so it would
> be nice to have a standard way of finding the recipe that wasn't
> dependent on the language that implements the recipe.
>
> ben
>
> James Strachan wrote:
>> 2008/7/17 Benjamin Reed <[EMAIL PROTECTED]>:
>>
>>> Excellent proposal. The only thing I would add is that there should be
>>> an english description of the recipe in subversion. That way if someone
>>> wanted to do a compatible binding they can do it. If the recipe is on
>>> the wiki it would be hard to keep it in sync, so it is important that it
>>> is in subversion. My preference would be that the doc would be in the
>>> same contrib subdirectory as the source for ease of maintenance.
>>>
>>
>> Good idea. How about for Java recipes we include the documentation as
>> HTML with the javadoc so we can link to it easily and so that the
>> recipe is kept with the code & versioned nicely (so as the
>> recipe/algorithm changes we version it with the source code etc)
>>
>>
>
>



-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: auto-reconnection ZooKeeper proxy?

2008-07-22 Thread James Strachan
I've been experimenting with the WriteLock implementation to deal with
server failure; I've found that it's maybe too simplistic to create a
reconnecting ZooKeeper proxy; instead I'm just making it easy to retry
operations (or arbitrary ZK code blocks) using a helper class
(currently called ProtocolSupport but am open to suggestions for a
better class name for a base class for higher level protocol
implementations).

Using the WriteLock as an example; it seems you often want the retry
logic to include a number of calls to ZooKeeper; (e.g. check if a
znode exists, if it doesn't try to create it - retrying the whole
thing when ZK exceptions like connection loss occur etc).
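
To show the shape of "retry a whole block of calls", here is a rough
sketch in plain Java. It is not the ProtocolSupport class from the patch:
RetryHelper, ConnectionLostException and the linear back-off are all
assumptions, with ConnectionLostException standing in for whatever
connection-loss exception the client library throws.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a ZooKeeper connection-loss exception.
class ConnectionLostException extends Exception {
    ConnectionLostException(String message) {
        super(message);
    }
}

// Hypothetical helper: retries a whole block of operations as one unit.
final class RetryHelper {
    private final int maxAttempts;
    private final long delayMillis;

    RetryHelper(int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    // Runs the block; on connection loss, waits a little and runs it again.
    // Any other exception is passed straight through to the caller.
    <T> T retry(Callable<T> block) throws Exception {
        ConnectionLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return block.call();
            } catch (ConnectionLostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis * attempt); // linear back-off
                }
            }
        }
        throw last; // every attempt failed with connection loss
    }
}
```

The block passed to retry() has to be safe to run more than once, e.g.
"check whether the znode exists, and create it only if it doesn't".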

I'll submit the patch soon to ZOOKEEPER-78 including this...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

One thing I have found is I've managed to get a
SessionExpiredException in my test case (not sure why though; I
thought ZooKeeper automatically kept sending keep alive pings?). I
just wondered what a client should do if that happens; I didn't see
any easy way to effectively disconnect and reconnect a ZooKeeper
client in this case.

I'm assuming that the SessionExpiredException is always gonna be
possible; so I've patched ZooKeeper to allow clients to handle a
SessionExpiredException and force a reconnection (to get a new
session).

So I've created a small patch to add a reconnect() method to ZooKeeper
which just closes and recreates the cnxn object...
https://issues.apache.org/jira/browse/ZOOKEEPER-84

(I also added a toString() method for easier debugging when running
test cases with multiple clients in the same jvm).

There's maybe a less drastic way to force the re-connection of a
ZooKeeper client; but I figured trashing and recreating the cnxn
object at least is lowest risk and a simple patch :) and the code
should only be executed rarely so performance isn't such an issue.
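
The "trash and recreate the connection object" approach, sketched in plain
Java. This is not the ZOOKEEPER-84 patch: ReconnectableClient is a
hypothetical generic wrapper, and the factory stands for whatever code
builds a fresh connection (and so a fresh session).

```java
import java.util.function.Supplier;

// Hypothetical wrapper that owns one connection and can replace it wholesale.
final class ReconnectableClient<C extends AutoCloseable> {
    private final Supplier<C> factory;
    private C connection;

    ReconnectableClient(Supplier<C> factory) {
        this.factory = factory;
        this.connection = factory.get();
    }

    synchronized C connection() {
        return connection;
    }

    // Close the old connection (ignoring close failures) and build a new one.
    synchronized void reconnect() {
        try {
            connection.close();
        } catch (Exception e) {
            // The old connection is already unusable; nothing more to do with it.
        }
        connection = factory.get();
    }
}
```

Anything tied to the old session (ephemeral nodes, watches) is gone after
this, so callers have to rebuild that state themselves.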

Thoughts?


things lock up when the client reconnects?

2008-07-22 Thread James Strachan
I wonder if anyone else has seen this recently; I've been trying to
make the WriteLock implementation survive server restarts (i.e.
reconnecting to another ZK server) with some success. See the latest
patch here...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

but I've found I can reliably get things to lock up. See the
WriteLockTest.java and change the workAroundClosingLastZNodeFails to
false and you should be able to run the test yourself and see things
lock up.

It seems like things lock up when waiting on a Packet being sent to
the transport. Sometimes I get a session timed out exception, so if I
see that I try and recreate the cnxn object which is maybe causing the
issue; I tried patching the ClientCnxn.SendThread.close() method to do
a cleanup() to wake up any blocked threads before closing (it's in the
patch for ZOOKEEPER-78 which also depends on the patch for
ZOOKEEPER-84 BTW); am wondering if anyone has a better idea of dealing
with a session timeout?

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: Website

2008-07-23 Thread James Strachan
2008/7/22 Hiram Chirino <[EMAIL PROTECTED]>:
> Lol.. Apache infrastructure supports multiple wiki backends.  It's up
> to the project to pick which one you want to use.  You currently have
> picked MoinMoin, but you could have easily picked Confluence, just
> like these other Apache projects did:
>
> http://cwiki.apache.org/confluence/dashboard.action

FWIW I've been on plenty of apache projects that started on MoinMoin,
then realised down the line how much better Confluence is - then went
through the wiki migration pain.

Tools like wikis are personal things; and folks tend to prefer to use
the tool they know. But it's worth seriously thinking about which wiki
you want up front - before you realise down the line that you wanna
switch. Just because the rest of hadoop uses MoinMoin doesn't mean you
have to follow suit; all wikis can link to each other nicely. And
there might be some hadoop folks who are not exactly head over heels
about MoinMoin.

All I'm gonna say is I've used both heavily and far prefer Confluence
hands down; it means you can avoid all the Forrest stuff; Confluence
can auto-export its content to make a rather nice static HTML site for
you; keeping your website and wiki clean, updated and fresh. Plus you
can then reuse the same wiki markup inside your issue tracker (JIRA) -
so you don't need to remember multiple wiki formats. Finally, there are
great JIRA macros in Confluence for creating lovely release notes in
your website; or the snippet macros to link to fragments of test cases
in your online documentation (so your documentation gets unit tested).

Having said all that - it's up to you guys to pick something you're
happy with - so I'll bow out now and leave you to it :)

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: things lock up when the client reconnects?

2008-07-23 Thread James Strachan
2008/7/22 Flavio Junqueira <[EMAIL PROTECTED]>:
> James, I'd like to clarify what exactly is the issue you're looking at. If
> you provide a list of ZooKeeper servers, then a client will try to reconnect
> to another ZooKeeper server upon a disconnection. Reconnecting to another
> server does not guarantee maintaining the same session, though. So, are you
> trying to guarantee that the session is still the same upon a reconnection?
> If so, I don't think you can do it by just changing the client, since the
> servers might have expired the old session.

I'm trying to test the WriteLock implementation in the case where the
server dies and the client reconnects to another server.
In the test case I'm just running one server, killing it, restarting
it and trying to get the client to reconnect.

The test case is WriteLockTest in this patch...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

(unfortunately it's not been committed yet so I can't easily point you
at the code). It's very easy to run the test with different numbers of
clients and see lockups at various places.

The bizarre thing I've seen is that things do reconnect mostly fine
(apart from the SessionExpiredException issue in one of the clients)
https://issues.apache.org/jira/browse/ZOOKEEPER-84

but a lockup often happens when trying to close down the ZooKeeper instance.

When running the test case with 3 independent clients and one server;
I tend to see the last client having a session expired and it's often
the one that locks up; but when running the test with more clients I
see more lockups elsewhere.

I just wondered if folks had seen similar lockups when you try
restarting ZK servers?

(I'm testing on OS X; this lockup could be timing related maybe).

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


mailing lists archived on nabble.com?

2008-07-23 Thread James Strachan
Many apache projects including Hadoop register with nabble to host
online forums & great online archives of the mailing lists...
http://www.nabble.com/Hadoop-f17066.html
Currently there's hadoop-core, hbase and lucene on there.

I often refer to mailing list posts by nabble link; they're really
handy. Plus end users often prefer the forum style nabble approach to
getting every single email sent to a mailing list.

Does a committer on zookeeper fancy registering the ZK mailing lists
too (as a child of the Hadoop list)? I'd do it myself but then I'd be
the owner of the forum which doesn't feel right - a committer should
probably do it.

It's a pretty quick process, click here
http://n2.nabble.com/more/MailingListRequest.jtp

and fill out the details. The hardest part is knowing the mailing list
software which AFAIK is ezmlm :)

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


when should a SessionExpiredException occur?

2008-07-23 Thread James Strachan
Am just wondering if I've hit this due to some other bug. I thought ZK
did keep-alive pings to ensure each client is alive and its session
does not expire? Or does the client have to explicitly keep calling
some method on the ZooKeeper interface to ensure a steady flow of
packets to the ZK server to keep it alive?

The test case WriteLockTest in the patch for ZOOKEEPER-78 (the
WriteLock) can always reproduce a SessionExpiredException when using 3
clients (it's always the 3rd session that expires).

Now when a SessionExpiredException occurs, any recipe/protocol has to
be able to deal with it; so the ZOOKEEPER-84 issue is still valid
IMHO. But I'm wondering if in my test case it shouldn't be happening;
as I've got 3 clients and a server all in the same JVM and the JVM
isn't locked or pegged nor do the TCP sockets fail AFAIK.

So I just thought I'd ask; are the keep alive packets used by default?
If they are then maybe they are not sent very frequently or something?
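
To make the question concrete, this is the kind of timeout bookkeeping I
have in mind, as a plain Java sketch. It is not ZooKeeper's actual
heartbeat or session tracking code: SessionClock, its method names and the
"ping at a third of the timeout" margin are assumptions for illustration.

```java
// Hypothetical server-side view of one session's liveness.
final class SessionClock {
    private final long timeoutMillis;
    private long lastHeardMillis;

    SessionClock(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = nowMillis;
    }

    // Any packet from the client, including a keep-alive ping, counts as activity.
    void heardFrom(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    // The session expires once nothing has been heard for longer than the timeout.
    boolean isExpired(long nowMillis) {
        return nowMillis - lastHeardMillis > timeoutMillis;
    }

    // A client would have to ping at least this often to stay alive; dividing
    // by 3 is an arbitrary safety margin chosen for this sketch.
    long suggestedPingIntervalMillis() {
        return timeoutMillis / 3;
    }
}
```

If pings went out less often than the timeout, or were held up behind a
blocked thread, a session could expire even though the client process and
its sockets were perfectly healthy.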

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com


Re: things lock up when the client reconnects?

2008-07-23 Thread James Strachan
BTW one other observation; when I use 3 clients in the same JVM (i.e.
3 separate instances of ZooKeeper to try simulate a set of different
processes) I find that each client receives an initial WatchEvent on
startup; then from that point on, only the first 2 clients receive
further watch events for the connection starting/stopping, despite me
closing the server down, waiting a while, restarting the server then
stopping it again etc.

I'm wondering if this is related to why the 3rd client seems to kinda
lock up; that it's losing connection watch events. There's nothing
hard coded somewhere that only allows 2 ZooKeeper clients per JVM or
anything is there? :)

I'm gonna have a look around and see if there's any nasty static
variables around or something... We could maybe do with some more
tests for multiple clients with failover etc.

Anyone else seen something like this?

2008/7/23 James Strachan <[EMAIL PROTECTED]>:
> 2008/7/22 Flavio Junqueira <[EMAIL PROTECTED]>:
>> James, I'd like to clarify what exactly is the issue you're looking at. If
>> you provide a list of ZooKeeper servers, then a client will try to reconnect
>> to another ZooKeeper server upon a disconnection. Reconnecting to another
>> server does not guarantee maintaining the same session, though. So, are you
>> trying to guarantee that the session is still the same upon a reconnection?
>> If so, I don't think you can do it by just changing the client, since the
>> servers might have expired the old session.
>
> I'm trying to test the WriteLock implementation in the case where the
> server dies and the client reconnects to another server.
> In the test case I'm just running one server, killing it, restarting
> it and trying to get the client to reconnect.
>
> The test case is WriteLockTest in this patch...
> https://issues.apache.org/jira/browse/ZOOKEEPER-78
>
> (unfortunately it's not been committed yet so I can't easily point you
> at the code). It's very easy to run the test with different numbers of
> clients and see lockups at various places.
>
> The bizarre thing I've seen is that things do reconnect mostly fine
> (apart from the SessionExpiredException issue in one of the clients)
> https://issues.apache.org/jira/browse/ZOOKEEPER-84
>
> but a lockup often happens when trying to close down the ZooKeeper instance.
>
> When running the test case with 3 independent clients and one server;
> I tend to see the last client having a session expired and it's often
> the one that locks up; but when running the test with more clients I
> see more lockups elsewhere.
>
> I just wondered if folks had seen similar lockups when you try
> restarting ZK servers?
>
> (I'm testing on OS X; this lockup could be timing related maybe).
>





Re: when should a SessionExpiredException occur?

2008-07-23 Thread James Strachan
2008/7/23 Benjamin Reed <[EMAIL PROTECTED]>:
> SessionExpiredExceptions should be extremely rare. Basically they should only
> happen if a machine goes down (of course that would mean no exception would
> actually get generated since the client is dead :) or a network partition
> occurs.
>
> Having said that we seem to have a bug that causes SessionExpiredExceptions
> when nothing bad has happened. The bug must be in the heart beat code (we do
> them automatically, so the client shouldn't have to worry about it). If you
> can reproduce it well, it would greatly help to track down the bug! Can you
> send me the code to reproduce the problem?

It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
currently dependent on the ZOOKEEPER-84 patch as well (though given
your recent comment I'm gonna refactor the code to not require a
ZooKeeper change :)

I'll ping the list when I've refactored the test case to not require
the ZOOKEEPER-84 change.



Re: when should a SessionExpiredException occur?

2008-07-23 Thread James Strachan
2008/7/23 James Strachan <[EMAIL PROTECTED]>:
> 2008/7/23 Benjamin Reed <[EMAIL PROTECTED]>:
>> SessionExpiredExceptions should be extremely rare. Basically they should only
>> happen if a machine goes down (of course that would mean no exception would
>> actually get generated since the client is dead :) or a network partition
>> occurs.
>>
>> Having said that we seem to have a bug that causes SessionExpiredExceptions
>> when nothing bad has happened. The bug must be in the heart beat code (we do
>> them automatically, so the client shouldn't have to worry about it). If you
>> can reproduce it well, it would greatly help to track down the bug! Can you
>> send me the code to reproduce the problem?
>
> It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
> currently dependent on the ZOOKEEPER-84 patch as well (though given
> your recent comment I'm gonna refactor the code to not require a
> ZooKeeper change :)
>
> I'll ping the list when I've refactored the test case to not require
> the ZOOKEEPER-84 change.

I've just updated the patch on ZOOKEEPER-78 to avoid the dependency on
ZOOKEEPER-84. It now uses a ZooKeeperFacade class which wraps up the
creation of the ZooKeeper - and recreation of it if a
SessionExpiredException is received.
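
For readers following along, the facade idea described above can be sketched roughly as below. This is not the code from the ZOOKEEPER-78 patch: `ReconnectingFacade` and `SessionExpired` are hypothetical names, and the client type is left generic so the sketch runs without the ZooKeeper jar. It only illustrates "create the handle, and create it again once if an operation reports an expired session".

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for ZooKeeper's session-expired error. */
class SessionExpired extends RuntimeException {}

/**
 * Sketch of a facade that owns a client handle and rebuilds it once
 * when an operation reports that the session has expired.
 */
class ReconnectingFacade<C> {
    private final Supplier<C> factory; // creates a fresh client handle
    private C client;

    ReconnectingFacade(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs op against the current client, recreating the client once on expiry. */
    synchronized <R> R call(Function<C, R> op) {
        try {
            return op.apply(client);
        } catch (SessionExpired e) {
            client = factory.get(); // the old session is gone; start a new one
            return op.apply(client);
        }
    }
}
```

Note that a new session does not bring back ephemeral znodes created under the old one, so a lock holder would still have to bid for the lock again after the handle is recreated.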

The test case currently hangs there...

[junit] "main" prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
[junit] at java.lang.Object.wait(Native Method)
[junit] - waiting on <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at java.lang.Object.wait(Object.java:474)
[junit] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
[junit] - locked <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
[junit] - locked <0x0bd54108> (a org.apache.zookeeper.ZooKeeper)
[junit] at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
[junit] at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
[junit] at junit.framework.TestCase.runBare(TestCase.java:140)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:110)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:128)
[junit] at junit.framework.TestResult.run(TestResult.java:113)
[junit] at junit.framework.TestCase.run(TestCase.java:124)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:232)
[junit] at junit.framework.TestSuite.run(TestSuite.java:227)
[junit] at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)


Basically the 3rd ZooKeeper client cannot close down; it just hangs in
the close() method.

(BTW it might be nice to avoid the close() method waiting forever - it
might as well wait, say, 10 seconds then just close anyway).
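
A caller-side version of that "wait a bit, then give up" idea might look like the sketch below. It is an illustration only, not a change to ZooKeeper.close(): `BoundedClose` and `closeWithTimeout` are made-up names, and the close action is passed in as a plain Runnable so the example runs on the JDK alone.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedClose {
    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMillis.
     * Returns true if it finished in time, false if we gave up waiting.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> done = exec.submit(closeAction);
        try {
            done.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still closing; stop waiting for it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close failed", e.getCause());
        } finally {
            exec.shutdownNow(); // interrupts the worker if it is still running
        }
    }
}
```

In a test's tearDown this could be called as `BoundedClose.closeWithTimeout(() -> facade.close(), 10000)`, assuming close() throws no checked exceptions. Whether the client is left in a sane state after being interrupted mid-close is a separate question that this sketch does not try to answer.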

Though now I've refactored the code to avoid the patch on ZooKeeper to
deal with reconnecting when a SessionExpiredException occurs, I don't
seem to get any session expired exceptions :). I'm starting to wonder
if it's maybe related to old persistent data on disk causing the
exception?

I still get the strange lack of Watch Events on the 3rd client though
and the hang on closing (if
WriteLockTest.workAroundClosingLastZNodeFails is set to false - I've
hacked the test to pass by default).



do the test cases work for anyone else?

2008-07-23 Thread James Strachan
I've always had some tests failing on most boxes I try; I wasn't sure
if everyone else got those or if they do work on some platforms?

On OS X I get these failures
[junit] Test org.apache.zookeeper.test.AsyncTest FAILED
[junit] Test org.apache.zookeeper.test.WatcherFuncTest FAILED

On a linux box (an EC2 box) I get these failures

[junit] Test org.apache.zookeeper.test.ClientTest FAILED
[junit] Test org.apache.zookeeper.test.QuorumTest FAILED

Maybe they all work on windows? :)

I tried adding an explicit forkmode="perTest" to the <junit> task and I
still get the same results. Can anyone else get the tests to work on
linux or OS X?



Re: do the test cases work for anyone else?

2008-07-23 Thread James Strachan
FWIW I've run the tests a few times; I think all these 4 tests have
timing failures in them. I've seen all of them fail on OS X at some
point. Sometimes only 2 will fail. On Linux I've seen just ClientTest
fail.


2008/7/23 Patrick Hunt <[EMAIL PROTECTED]>:
> I'm on ubuntu (hardy heron) and they work. Our CI machine has intermittent
> failures (solaris x86):
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/
>
> there's some timing issue, what you're seeing is probably related to:
> https://issues.apache.org/jira/browse/ZOOKEEPER-61
>
> Frankly tests and docs are both areas that ZooKeeper could use _a lot_ of
> care and feeding. Tests in particular could use some refactoring and a
> better implementation for launching/testing/stopping client/server tests.
>
> As you're able to reproduce the issue reliably would you like to take on 61?
> Feel free to assign to yourself if so.

As a newbie on the project it's hard enough grokking ZK itself and
attempting to contribute patches, but fixing bad test cases of ZK is
even harder :) I was hoping the folks who know ZK really well can fix
the tests they've written :). But I'll take a look and see if I can
see anything obvious I can do to help with my limited knowledge of the
history of the code and internals.

How about we raise a JIRA for all tests that fail?



assigning JIRAs to non committers

2008-07-23 Thread James Strachan
Just an idle observation as I'd never seen this workflow before on
JIRA so thought I'd ask :)

I've been watching some of the recent JIRA activity with interest.
I've seen a few JIRAs arrive, someone submits a test case who's not a
committer, then the issue gets assigned to the person who submitted
the patch. In some cases; when there may be many patches to assign
over time, I can understand it (e.g. ZOOKEEPER-78 could take a zillion
iterations before the feature is complete) - but in general if one
JIRA gets one patch from a non-committer, should the JIRA be left
unassigned - or assigned to a committer to review and apply or
reject-with-reason the patch?

i.e. let's say I raise a JIRA and attach a patch; once we're at that
stage I can't actually do anything else, not being a committer - other
than add another version of the patch :) So I'm not sure if it's worth
assigning the issue to me. I guess the person who raised the issue &
submitted the patch can always mark it as unassigned :)

No biggie I just thought I'd ask if this was an intentional way you
guys had worked together in the past?



Re: mailing lists archived on nabble.com?

2008-07-23 Thread James Strachan
Done
https://issues.apache.org/jira/browse/ZOOKEEPER-85

2008/7/23 Patrick Hunt <[EMAIL PROTECTED]>:
> Good idea. Please enter a Jira and assign it to me.
>
> Patrick
>
>
> James Strachan wrote:
>>
>> Many apache projects including Hadoop register with nabble to host
>> online forums & great online archives of the mailing lists...
>> http://www.nabble.com/Hadoop-f17066.html
>> Currently there's hadoop-core, hbase and lucene on there.
>>
>> I often refer to mailing list posts by nabble link; they're really
>> handy. Plus end users often prefer the forum style nabble approach to
>> getting every single email sent to a mailing list.
>>
>> Does a committer on zookeeper fancy registering the ZK mailing lists
>> too (as a child of the Hadoop list)? I'd do it myself but then I'd be
>> the owner of the forum which doesn't feel right - a committer should
>> probably do it.
>>
>> It's a pretty quick process, click here
>> http://n2.nabble.com/more/MailingListRequest.jtp
>>
>> and fill out the details. The hardest part is knowing the mailing list
>> software which AFAIK is ezmlm :)
>>
>





Re: assigning JIRAs to non committers

2008-07-23 Thread James Strachan
Cool thanks for the heads up! You live and learn :) It's funny how
totally different all the various Apache projects are and how they get
things done.

My bad for not reading the contributing section of the wiki yet :)

2008/7/23 Patrick Hunt <[EMAIL PROTECTED]>:
> James Strachan wrote:
>>
>> Just an idle observation as I'd never seen this workflow before on
>> JIRA so thought I'd ask :)
>
> I'm new to JIRA as well...
>
>> I've been watching some of the recent JIRA activity with interest.
>> I've seen a few JIRAs arrive, someone submits a test case who's not a
>> committer, then the issue gets assigned to the person who submitted
>> the patch. In some cases; when there may be many patches to assign
>> over time, I can understand it (e.g. ZOOKEEPER-78 could take a zillion
>> iterations before the feature is complete) - but in general if one
>> JIRA gets one patch from a non-committer, should the JIRA be left
>> unassigned - or assigned to a committer to review and apply or
>> reject-with-reason the patch?
>
> I believe the workflow is that the jira is assigned to the person resolving
> the issue (ie submitting the patch). You/Hiram have been added as
> "contributors" to jira, this means that jiras can be assigned to you. We
> typically add ppl to the contributor list as soon as they submit a patch.
>
> After that point you do the back/forth in the comments trying to get
> everyone to agree to a resolution. If this is a patch you then change the
> status to "patch available" and ask for review/voting, after which if you
> get a "+1" it's then up to a committer to commit to svn.
>
> full details here:
> http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
>
>> i.e. lets say I raise a JIRA and attach a patch; once we're at that
>> stage I can't actually do anything else, not being a committer - other
>> than add another version of the patch :) So am not sure if its worth
>> assigning the issue to me. I guess the person who raised the issue &
>> submitted the patch can always mark it as unassigned :)
>
> It's assigned to the person who resolved the issue. If accepted it's up to
> the committers to get it into svn, but you (the resolver) are still
> responsible. This information is also important for reporting purposes.
>
>> No biggie I just thought I'd ask if this was an intentional way you
>> guys had worked together in the past?
>
> This is generally how Hadoop core/hbase do things.
>
> Patrick
>





Re: Website

2008-07-23 Thread James Strachan
2008/7/23 Doug Cutting <[EMAIL PROTECTED]>:
> James Strachan wrote:
>>
>> Tools like wikis are personal things; and folks tend to prefer to use
>> the tool they know.
>
> That's a key point.
>
> To make a switch you'd need:
>  1. Someone familiar with Confluence to lead the transition, convert the
> existing website and wiki content, set up static export etc.  Are you
> volunteering?

I would yes, but only if 2) gets approval.

>  2. Buy in from Zookeeper's primary contributors, who will end up writing
> and maintaining the documentation (Pat, Ben, etc.).  I don't really count,
> since I'm mostly a kibitzer here.
>
> Also, with Confluence export, how does one deal with versioning?  A
> convenience of keeping documentation in subversion is that it gets versioned
> with releases.  By maintaining the trunk documentation to match the trunk
> implementation, we automatically get documentation that matches each
> version, but we can still maintain the documentation in release branches.  I
> don't see how this would not add overhead with Confluence exports.  If
> Confluence always represented trunk, and we exported at release branch
> points, then it would be hard to patch branched documentation.  Maintaining
> multiple branches in Confluence would add management overhead, since these
> would need to be synchronized with subversion branching, tagging, etc.  How
> have other projects dealt with this issue?

BTW MoinMoin has the same issue; when documentation is in the wiki you
need to grab a snapshot of it to include in releases (or add it to
svn) to support versioned documentation.

What we've done in the past is copy the static HTML from the wiki with
releases; or in some projects we turn the HTML from Confluence into a
proper manual in PDF or HTML format. e.g.

if you download 1.4.0 of Camel..
http://activemq.apache.org/camel/camel-140-release.html

and look in the docs directory; you'll see a manual in PDF and HTML
format. That's generated from the wiki whenever there is a release, from
these pages
http://activemq.apache.org/camel/book.html
which combine various wiki pages to form a user manual,

and which are then included together in this page
http://activemq.apache.org/camel/book-in-one-page.html


Maybe moving away from Forrest is a step too far right now; but it's
certainly worth thinking whether for the wiki content it's gonna be
MoinMoin or Confluence. Only if you choose Confluence can you
consider generating a user manual or the static website from it
(neither AFAIK are possible with MoinMoin).


Incidentally a totally different thought; what's gonna be the split
between what's the static website (e.g. Forrest) versus stuff that's in
the wiki versus documentation that goes inside each release? It's often
a kinda slippery slope figuring out which bit does what and it's a PITA
moving content into different formats to move between them; so while
no tool is perfect, I kinda like that with Confluence there's just one
place to put docs and you can then slice and dice as you see fit (and
make multiple spaces if you want & share content across spaces) to
deal with different version issues etc.



Re: Mailing lists archived on nabble.

2008-07-23 Thread James Strachan
Awesome :)

2008/7/23 Patrick Hunt <[EMAIL PROTECTED]>:
> Both the user and dev lists are now being archived on nabble.
>
> http://n2.nabble.com/zookeeper-dev-f578911.html
> http://n2.nabble.com/zookeeper-user-f578899.html
>
> Patrick
>





Re: do the test cases work for anyone else?

2008-07-23 Thread James Strachan
Here's the first issue with a patch...
https://issues.apache.org/jira/browse/ZOOKEEPER-86

2008/7/23 Patrick Hunt <[EMAIL PROTECTED]>:
> It's important to capture this type of information in jira.
>
> James Strachan wrote:
>>
>> FWIW I've run the tests a few times; I think all these 4 tests have
>> timing failures in them. I've seen all of them fail on OS X at some
>> point. Sometimes only 2 will fail. On Linux I've seen just ClientTest
>> fail.
>>
>>
>> 2008/7/23 Patrick Hunt <[EMAIL PROTECTED]>:
>>>
>>> I'm on ubuntu (hardy heron) and they work. Our CI machine has
>>> intermittent
>>> failures (solaris x86):
>>> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/
>>>
>>> there's some timing issue, what you're seeing is probably related to:
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-61
>>>
>>> Frankly tests and docs are both areas that ZooKeeper could use _a lot_ of
>>> care and feeding. Tests in particular could use some refactoring and a
>>> better implementation for launching/testing/stopping client/server tests.
>>>
>>> As you're able to reproduce the issue reliably would you like to take on
>>> 61?
>>> Feel free to assign to yourself if so.
>>
>> As a newbie on the project it's hard enough grokking ZK itself and
>> attempting to contribute patches, but fixing bad test cases of ZK is
>> even harder :) I was hoping the folks who know ZK really well can fix
>> the tests they've written :). But I'll take a look and see if I can
>> see anything obvious I can do to help with my limited knowledge of the
>> history of the code and internals.
>>
>> How about we raise a JIRA for all tests that fail?
>>
>





[jira] Created: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-17 Thread james strachan (JIRA)
added a high level protocol/feature - for easy Leader Election or exclusive 
Write Lock creation
---

 Key: ZOOKEEPER-78
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
 Project: Zookeeper
  Issue Type: New Feature
  Components: java client
Affects Versions: 3.0.0
Reporter: james strachan
 Attachments: writeLock_protocol.patch

Here's a patch which adds a little WriteLock helper class for performing leader 
elections or creating exclusive locks in some directory znode. Note it's an 
early cut; I'm sure we can improve it over time. The aim is to spare folks 
from having to use the low level ZK stuff by providing a simpler high level 
abstraction.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-17 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: writeLock_protocol.patch

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>    Reporter: james strachan
> Attachments: writeLock_protocol.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Commented: (ZOOKEEPER-79) Document jacob's leader election on the wiki recipes page

2008-07-17 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614431#action_12614431
 ] 

james strachan commented on ZOOKEEPER-79:
-

BTW https://issues.apache.org/jira/browse/ZOOKEEPER-78 contains a patch of just 
that :)

> Document jacob's leader election on the wiki recipes page
> -
>
> Key: ZOOKEEPER-79
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-79
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: documentation
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
>
> The following discussion occurred on the zookeeper-user list. We need to 
> formalize this recipe and document on the wiki recipes page:
> -from jacob 
> Avinash
>  
> The following protocol will help you fix the observed misbehavior. As Flavio 
> points out, you cannot rely on the order of nodes in getChildren, you must 
> use an intrinsic property of each node to determine who is the leader. The 
> protocol devised by Runping Qi and described here will do that.
>  
> First of all, when you create child nodes of the node that holds the 
> leadership bids, you must create them with the EPHEMERAL and SEQUENCE flag. 
> ZooKeeper guarantees to give you an ephemeral node named uniquely and with a 
> sequence number larger by at least one than any previously created node in 
> the sequence. You provide a prefix, like "L_" or your own choice, and 
> ZooKeeper creates nodes named "L_23", "L_24", etc. The sequence number starts 
> at 0 and increases monotonically.
>  
> Once you've placed your leadership bid, you search backwards from the 
> sequence number of *your* node to see if there are any preceding (in terms of 
> the sequence number) nodes. When you find one, you place a watch on it and 
> wait for it to disappear. When you get the watch notification, you search 
> again, until you do not find a preceding node, then you know you're the 
> leader. This protocol guarantees that there is at any time only one node that 
> thinks it is the leader. But it does not disseminate information about who is 
> the leader. If you want everyone to know who is the leader, you can have an 
> additional Znode whose value is the name of the current leader (or some 
> identifying information on how to contact the leader, etc.). Note that this 
> cannot be done atomically, so by the time other nodes find out who the leader 
> is, the leadership may already have passed on to a different node.
>  
> Flavio
>  
> Might it make sense to provide a standardized implementation of leader 
> election in the library code in Java?
>  
> --Jacob
>  
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Flavio 
> Junqueira
> Sent: Friday, July 11, 2008 1:02 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Zookeeper-user] Leader election
>  
> Hi Avinash, getChildren returns a list in lexicographic order, so if you are 
> updating the children of the election node concurrently, then you may get a 
> different first node with different clients. If you are using the sequence 
> flag to create nodes, then you may consider stripping the prefix of the node 
> name and using the suffix value to determine order.
> Hope it helps.
> -Flavio
>  
> - Original Message 
> From: Avinash Lakshman <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, July 11, 2008 7:20:06 AM
> Subject: [Zookeeper-user] Leader election
> Hi
> I am trying to elect leader among 50 nodes. There is always one odd guy who 
> seems to think that someone else distinct from what some other nodes see as 
> leader. Could someone please tell me what is wrong with the following code 
> for leader election:
> public void electLeader()
> {   
> ZooKeeper zk = StorageService.instance().getZooKeeperHandle();
> String path = "/Leader";
> try
> {
> String createPath = path + "/L-"; 
>   
> LeaderElector.createLock_.lock();
> while( true )
> {
> /* Get all znodes under the Leader znode */
> List values = zk.getChildren(path, false);
> /*
>  * Get the first znode and if it is the
>  * pathCreated created above then the data
&

[jira] Commented: (ZOOKEEPER-79) Document jacob's leader election on the wiki recipes page

2008-07-17 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614448#action_12614448
 ] 

james strachan commented on ZOOKEEPER-79:
-

Ah cool :) Was just checking we were not about to do the same thing separately :).

I've basically followed the same algorithm from the wiki recipe - and the same 
one described in the ZooKeeper tutorial...
http://developer.yahoo.com/blogs/hadoop/2008/03/intro-to-zookeeper-video.html

So AFAIK yes it's the same
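
For anyone reading the recipe quoted in this issue, the "search backwards from your own sequence number" step can be sketched as below. This is only an illustration of the ordering logic, not the WriteLock code from the patch: `ElectionOrder`, `sequenceOf` and `nodeToWatch` are made-up names, and it assumes node names end in a numeric sequence after a '-' or '_' separator (as in the "L_23" example). The actual getChildren call and the watch registration are left out.

```java
import java.util.List;
import java.util.Optional;

class ElectionOrder {
    /** Parses the numeric sequence suffix after the last '-' or '_' in a node name. */
    static long sequenceOf(String nodeName) {
        int i = Math.max(nodeName.lastIndexOf('-'), nodeName.lastIndexOf('_'));
        return Long.parseLong(nodeName.substring(i + 1));
    }

    /**
     * Returns the child with the largest sequence number below ours:
     * the node to watch. Empty means no predecessor, i.e. we are the leader.
     */
    static Optional<String> nodeToWatch(String myNode, List<String> children) {
        long mine = sequenceOf(myNode);
        String best = null;
        for (String child : children) {
            long seq = sequenceOf(child);
            // Compare numerically, never by list position or string order.
            if (seq < mine && (best == null || seq > sequenceOf(best))) {
                best = child;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With children "L_24", "L_9", "L_23" and "L_100", the owner of "L_24" would watch "L_23", and the owner of "L_9" has nothing to watch and so is the leader. Each client watches only its immediate predecessor, so one node going away wakes up one client rather than all of them.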





[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-18 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614703#action_12614703
 ] 

james strachan commented on ZOOKEEPER-78:
-

Thanks Flavio! 

Totally agreed on 1. Strictly speaking we should catch all exceptions and 
handle them properly (which may mean throwing some, or responding properly to 
others or whatever).

One of the main reasons for the retry logic was to avoid errors like trying to 
create a znode that already exists or losing connection to the ZK server etc - 
but we should go through all possible exceptions and handle them much more cleanly.

In particular we really need test cases that show the server closing and 
restarting during the process of acquiring the lock or after a lock owner has 
the lock etc.  

I figured I'd send a patch first and see if anyone else had a better 
implementation lying around - or knew a neater way to solve this - before I 
spent too much time getting it totally correct etc. 

For 2) I just added that so that when running the unit tests you could see INFO 
or DEBUG level logging etc (particularly when running in your IDE)

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-18 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: writeLock_protocol_with_documentation-version2.patch

here's an updated patch which has better documentation and includes the recipe 
documentation linked to from the javadoc - but which could be used stand alone 
as well if required.

I've also included the description from ZOOKEEPER-79 as well

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol.patch, 
> writeLock_protocol_with_documentation-version2.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-18 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: (was: writeLock_protocol_with_documentation-version2.patch)

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>    Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol_version3.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-18 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: (was: writeLock_protocol.patch)

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>    Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol_version3.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-18 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: writeLock_protocol_version3.patch

Here is an improved version. 

* we use a more efficient comparison by using a ZNodeName object which caches the 
prefix & sequence number for ordering node names. We can also use this to order 
node names using different prefixes - maybe useful for read/write locks
* fixed a bug and enhanced the test case so that we now test that a leader is 
established; then when that leader fails another leader/owner is created
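
A rough sketch of the idea behind the first bullet, assuming node names of the form prefix-sequenceNumber. NodeNameSketch is a stand-in written for this note; the ZNodeName class in the patch may parse and order names differently. Ordering by sequence number first and prefix second is an assumption here, chosen so that names with different prefixes still sort by creation order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in for the ZNodeName idea; the class in the patch may differ.
final class NodeNameSketch implements Comparable<NodeNameSketch> {
    final String name;
    final String prefix;  // text before the last '-'
    final long sequence;  // number after the last '-', or -1 if there is none

    NodeNameSketch(String name) {
        this.name = name;
        String parsedPrefix = name;
        long parsedSequence = -1;
        int idx = name.lastIndexOf('-');
        if (idx >= 0) {
            try {
                // Parse once here so sorting never has to re-parse the string.
                parsedSequence = Long.parseLong(name.substring(idx + 1));
                parsedPrefix = name.substring(0, idx);
            } catch (NumberFormatException e) {
                // No numeric suffix: keep the whole name as the prefix.
            }
        }
        this.prefix = parsedPrefix;
        this.sequence = parsedSequence;
    }

    @Override
    public int compareTo(NodeNameSketch other) {
        int bySequence = Long.compare(sequence, other.sequence);
        return bySequence != 0 ? bySequence : prefix.compareTo(other.prefix);
    }

    // Convenience: returns the given names ordered by (sequence, prefix).
    static List<String> sortedNames(List<String> names) {
        List<NodeNameSketch> parsed = new ArrayList<>();
        for (String n : names) {
            parsed.add(new NodeNameSketch(n));
        }
        Collections.sort(parsed);
        List<String> result = new ArrayList<>();
        for (NodeNameSketch n : parsed) {
            result.add(n.name);
        }
        return result;
    }
}
```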

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol_version3.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-18 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614748#action_12614748
 ] 

james strachan commented on ZOOKEEPER-78:
-

BTW I just deleted the other 2 patches to avoid confusion; the latest patch 
includes the previous changes etc

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol_version3.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Commented: (ZOOKEEPER-83) Switch to using maven to build ZooKeeper

2008-07-22 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615546#action_12615546
 ] 

james strachan commented on ZOOKEEPER-83:
-

I just took a look at the patch; basically the maven conventions are to put 
source in ${module-name}/src/main/java and tests in 
${module-name}/src/test/java and resources in src/main/resources etc.

Plus it looks like hiram's split the project into multiple maven modules. (e.g. 
so that the Java 6 JMX code is a separate module so that the core of zookeeper 
can be used on Java 5 - which is a good thing IMHO - plus separating the jute 
stuff so it can be used at development time to generate code etc). It's also 
easy to generate an uber-jar if folks want later on.

This patch looks good to me - assuming folks are happy to go the maven route 
(which many other apache projects do btw - it certainly makes it much easier 
for zookeeper to be reused by other maven projects).

If this patch gets applied I'll happily volunteer to refactor my 
recipes/protocols patch to create a zookeeper-protocols module to create a 
separate jar for higher level stuff

> Switch to using maven to build ZooKeeper
> 
>
> Key: ZOOKEEPER-83
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-83
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Reporter: Hiram Chirino
>Assignee: Hiram Chirino
> Attachments: zookeeper-mavened.tgz
>
>
> Maven is a great tool for building java projects at the ASF.  It helps 
> standardize the build a bit since it's convention oriented.
> Its dependency auto downloading would remove the need to store the 
> dependencies in svn, and it will handle many of the suggested ASF policies 
> like gpg signing of the releases and such.
> The ZooKeeper build is almost vanilla except for the jute compiler bits.  
> Things that would need to change are:
>  * re-organize the source tree a little so that it uses the maven directory 
> conventions
>  * separate the jute bits out into separate modules so that a maven plugin 
> can be used with it
>  




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-22 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615618#action_12615618
 ] 

james strachan commented on ZOOKEEPER-78:
-

Great catch Benjamin! I've a working patch using your algorithm; am using 
x-sessionId-sequenceNumber and it's working a treat (though it's a tad hard to 
force ZK to fail mid-create :). 

Am working on some unit tests to try out the server stopping/starting which 
I'll attach shortly once they're working a bit better...

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>    Affects Versions: 3.0.0
>Reporter: james strachan
>Assignee: james strachan
> Attachments: writeLock_protocol_version3.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Created: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-22 Thread james strachan (JIRA)
provide a mechanism to reconnect a ZooKeeper if a client receives a 
SessionExpiredException
---

 Key: ZOOKEEPER-84
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
 Project: Zookeeper
  Issue Type: Improvement
  Components: java client
Reporter: james strachan


am about to attach a patch which adds a reconnect() method to easily 
re-establish a connection if a session expires - along with a toString() 
implementation for easier debugging




[jira] Updated: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-22 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-84:


Status: Patch Available  (was: Open)

about to submit a patch - whoops forgot to add it :)

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>    Reporter: james strachan
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Updated: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-22 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-84:


Attachment: reconnect_patch.patch

sorry I forgot to add the patch before :) here it is now, hopefully this will 
make more sense now :)

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>    Reporter: james strachan
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-22 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: (was: writeLock_protocol_version3.patch)

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>    Reporter: james strachan
>Assignee: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-22 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: patch_with_including_Benjamin's_fix.patch

this modified patch includes an implementation of Benjamin's algorithm, using 
w-sessionId-sequenceNumber as a naming convention so that we can reuse files 
created for the same session if we get a connection failure.
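
One way to read the naming convention above, as a sketch rather than the patch's actual code: if a create call fails with a connection error, the client cannot tell whether the znode was created, so before creating another it lists the children of the lock directory and looks for a name carrying its own session id. SessionNodeSketch and its method names are hypothetical, and the children list is passed in directly so the example runs without a ZooKeeper server.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the w-sessionId-sequenceNumber convention.
final class SessionNodeSketch {
    // The trailing '-' matters: without it session 12 would also match 123.
    static String prefixFor(long sessionId) {
        return "w-" + sessionId + "-";
    }

    // Returns the child created by this session, if one is already present,
    // so the caller can reuse it instead of creating a duplicate.
    static Optional<String> findOwnNode(List<String> children, long sessionId) {
        String prefix = prefixFor(sessionId);
        return children.stream()
                .filter(child -> child.startsWith(prefix))
                .findFirst();
    }
}
```

A caller would only create a new sequential node when findOwnNode returns an empty result.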

this patch also includes a unit test which tests out the WriteLock still 
working if we stop and start the server.

also the code is refactored so that all the retry logic is in ProtocolSupport

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
>Assignee: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Commented: (ZOOKEEPER-83) Switch to using maven to build ZooKeeper

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615954#action_12615954
 ] 

james strachan commented on ZOOKEEPER-83:
-

Note it's pretty trivial to maintain an Ant build as well as a Maven build if 
folks really have an aversion to Maven. There's really no reason at all to 
disallow a maven build from being created; they can both happily coexist - and 
it's also a pretty trivial bit of work - not some huge bit of R&D that's gonna 
slow down development on other things. Also note that a non committer has 
already contributed the patch for you - so no more work is required other 
than committing it :)

It's worth remembering that pretty much all popular Java software at Apache is 
now released into a Maven repository...
http://people.apache.org/repo/m2-ibiblio-rsync-repository/org/apache/

so if folks stick with Ant and refuse to allow a maven build to coexist with 
the Ant build, someone should volunteer to figure out how to hack the Ant build 
to release ZooKeeper into the Apache maven repository - otherwise it's pretty 
hard for folks who do use maven to reuse your code (and see from the repo how 
many Apache projects we're talking about not being able to easily reuse 
ZooKeeper).

i.e. as part of the move to the ASF and being a good Apache citizen, I'd 
recommend hugely that as a minimum ZooKeeper releases its jars into the Apache 
Maven repository like most other projects do.

The easiest way to do this is just to reuse a Maven build to do releases (there 
doesn't yet seem to be anything in the Ant build to do actual signed releases 
or deploy builds anywhere) - and let folks who prefer Ant to stick to that for 
day to day development.

The easier it is to reuse a project, the more likely it'll get used and the 
more likely you'll get contributions; at least that's my experience at Apache. 
That doesn't mean you have to force your Ant-loving developers to switch build 
tools!

> Switch to using maven to build ZooKeeper
> 
>
> Key: ZOOKEEPER-83
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-83
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Reporter: Hiram Chirino
>Assignee: Hiram Chirino
> Attachments: zookeeper-mavened.tgz
>
>
> Maven is a great tool for building java projects at the ASF.  It helps 
> standardize the build a bit since it's convention oriented.
> Its dependency auto downloading would remove the need to store the 
> dependencies in svn, and it will handle many of the suggested ASF policies 
> like gpg signing of the releases and such.
> The ZooKeeper build is almost vanilla except for the jute compiler bits.  
> Things that would need to change are:
>  * re-organize the source tree a little so that it uses the maven directory 
> conventions
>  * separate the jute bits out into separate modules so that a maven plugin 
> can be used with it
>  




[jira] Commented: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615957#action_12615957
 ] 

james strachan commented on ZOOKEEPER-84:
-

I'm kinda confused by that :)

So what am I meant to do if I get a SessionExpiredException? I want to 
reconnect with a new session; as the current connection is totally useless and 
invalid.

Right now the only option I can see is to recreate from scratch the entire 
ZooKeeper object?

If so that means we've gotta introduce a ZooKeeperProxy thingy which wraps the 
ZooKeeper object and just lets it be recreated if a SessionExpiredException 
occurs. Seems kinda unnecessary when really a ZooKeeper instance is capable of 
easily recreating the connection to the ZK server when the session is expired.

Maybe the issue is the phrase reconnect() - maybe a method called 
recreateNewSession() is better? We could also document it that this is only to 
be called if your client becomes invalid because the ZK server has expired the 
session?

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: james strachan
>Assignee: james strachan
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Commented: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616012#action_12616012
 ] 

james strachan commented on ZOOKEEPER-84:
-

The ZooKeeper will reconnect by default if the socket fails, right? So I'm only 
really talking about reconnectingWithNewSession() in those rare cases that the 
ZK server times out the session

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: james strachan
>Assignee: james strachan
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Commented: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616065#action_12616065
 ] 

james strachan commented on ZOOKEEPER-84:
-

I hear you :)

So an Elect Leader or Write Lock protocol has to deal with expired sessions and 
create new sessions; at some point someone has to recreate something. You can 
pass the buck and say we're not gonna allow the ZooKeeper to reconnect. Then 
say we're not allowed to have the WriteLock reconnect, then the next and next 
layer of the onion. But eventually there's gonna be something somewhere that 
recreates a session :)

For now I'll work on the assumption we're gonna have to have an object which is 
a wrapper around a ZooKeeper so that it can handle reconnections by just 
discarding one ZooKeeper instance and creating another. This object could be 
shared across Protocols (we might wanna reuse one connection with ZK to make 
multiple locks for example).
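
A minimal sketch of such a wrapper, under stated assumptions: the type parameter C stands in for the ZooKeeper client, the factory stands in for whatever constructs a new client, and ReconnectingFacadeSketch is a made-up name. Closing the discarded instance and re-registering watchers are left out.

```java
import java.util.function.Supplier;

// Hypothetical wrapper: holds one client and swaps in a fresh one on demand.
final class ReconnectingFacadeSketch<C> {
    private final Supplier<C> factory;
    private C current;

    ReconnectingFacadeSketch(Supplier<C> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    // Returns the client currently in use.
    synchronized C get() {
        return current;
    }

    // Called by whoever saw the session expire. The expired client is passed in
    // so that several callers reporting the same dead client cause only one
    // replacement; a stale report leaves the newer client untouched.
    synchronized C replaceIfCurrent(C expired) {
        if (current == expired) {
            current = factory.get();
        }
        return current;
    }
}
```

Because the wrapper is the only thing that holds the client, several protocol objects (locks, elections) can share one instance and all pick up the replacement.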


> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>      Components: java client
>    Reporter: james strachan
>Assignee: james strachan
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Commented: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616114#action_12616114
 ] 

james strachan commented on ZOOKEEPER-84:
-

You can mark this issue as RESOLVED/WILL_NOT_FIX if you like now - I've 
implemented a ZooKeeperFacade to wrap up the reconnectWithNewSession() logic 
for ZOOKEEPER-78

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: james strachan
>Assignee: james strachan
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Attachment: using_zookeeper_facade.patch

This patch no longer requires ZOOKEEPER-84, we now use a ZooKeeperFacade which 
wraps up the creation of the ZooKeeper instance and allows it to be replaced if 
a SessionExpiredException occurs.

The test case works in the current patch. To get the test case to hang closing 
the 3rd client, just edit WriteLockTest and set the 
workAroundClosingLastZNodeFails field to a value of false. You will then get 
this stack dump when the test hangs (on OS X at least :)...

{code}
[junit] "main" prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
[junit] at java.lang.Object.wait(Native Method)
[junit] - waiting on <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at java.lang.Object.wait(Object.java:474)
[junit] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
[junit] - locked <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
[junit] at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
[junit] - locked <0x0bd54108> (a org.apache.zookeeper.ZooKeeper)
[junit] at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
[junit] at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
[junit] at junit.framework.TestCase.runBare(TestCase.java:140)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:110)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:128)
[junit] at junit.framework.TestResult.run(TestResult.java:113)
[junit] at junit.framework.TestCase.run(TestCase.java:124)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:232)
[junit] at junit.framework.TestSuite.run(TestSuite.java:227)
[junit] at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
{code}

This could maybe just be an OS X based timing issue

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>    Reporter: james strachan
>Assignee: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch, 
> using_zookeeper_facade.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Updated: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-78:


Assignee: (was: james strachan)
  Status: Patch Available  (was: Open)

Patch is now attached

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>    Reporter: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch, 
> using_zookeeper_facade.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> its an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616136#action_12616136
 ] 

james strachan commented on ZOOKEEPER-63:
-

I wonder if I've seen this too - I can reliably get a hung test when trying to 
close a client (though the server is still up at the point of the hang).

I'm thinking the close() method should not wait() forever on the disconnect 
packet, just a closeTimeout length - say a few seconds. After all, blocking and 
forcing a reconnect just to redeliver the disconnect packet seems a bit silly - 
when the server has to deal with clients which just have their sockets fail 
anyway :)
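
To make the suggestion concrete, here is a small sketch of a bounded wait, assuming the acknowledgement of the disconnect packet can be signalled through a latch. BoundedCloseSketch and its method names are invented for this note and do not reflect how ClientCnxn is actually structured.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of close() giving up after a timeout.
final class BoundedCloseSketch {
    private final CountDownLatch disconnectAcked = new CountDownLatch(1);

    // Would be called by the I/O thread when the server acknowledges the disconnect.
    void onDisconnectAcknowledged() {
        disconnectAcked.countDown();
    }

    // Waits at most closeTimeoutMillis for the acknowledgement.
    // Returns true if it arrived in time, false if the caller gave up.
    boolean close(long closeTimeoutMillis) {
        // (the disconnect packet would be queued for sending here)
        try {
            return disconnectAcked.await(closeTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Either way the caller proceeds to tear down its local threads and socket; the return value only says whether the server confirmed the disconnect.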

> Race condition in client close() operation
> --
>
> Key: ZOOKEEPER-63
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
>
> There is a race condition in the java close operation on ZooKeeper.java.
> Client is sending a disconnect request to the server. Server will close any 
> open connections with the client when it receives this. If the client has not 
> yet shut down its subthreads (event/send threads for example) these threads 
> may consider the condition an error. We see this a lot in the tests where the 
> clients output error logs because they are unaware that a disconnection has 
> been requested by the client.
> Ben mentioned: perhaps we just have to change state to closed (on client) 
> before sending disconnect request.
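
A small sketch of the ordering Ben suggests, assuming a much simplified client. ClosingClient, its State enum and onSocketClosed() are hypothetical names for illustration; the real ZooKeeper client is organised differently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical miniature client; not the real ZooKeeper class.
final class ClosingClient {
    enum State { CONNECTED, CLOSED }

    private volatile State state = State.CONNECTED;
    final List<String> errorLog = new CopyOnWriteArrayList<>();

    // Flip the state FIRST; sending the disconnect request would follow (omitted here).
    void close() {
        state = State.CLOSED;
    }

    // Called by a worker thread when it notices the socket has gone away.
    void onSocketClosed() {
        if (state == State.CLOSED) {
            return; // expected: we asked for the disconnect ourselves
        }
        errorLog.add("unexpected disconnect");
    }
}
```

Because the state changes before the request goes out, a worker thread that sees the socket drop can tell a requested disconnect from a genuine failure.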




[jira] Commented: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616144#action_12616144
 ] 

james strachan commented on ZOOKEEPER-84:
-

If ZOOKEEPER-78 ever gets committed (hint, hint :) we can just refer folks to 
the ZooKeeperFacade if they ever hit the SessionExpiredException

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>    Reporter: james strachan
>Assignee: Benjamin Reed
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Commented: (ZOOKEEPER-83) Switch to using maven to build ZooKeeper

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616149#action_12616149
 ] 

james strachan commented on ZOOKEEPER-83:
-

So long as dependencies don't change much and the build process doesn't undergo 
any major radical changes, it's not gonna require much in the way of maintenance 
keeping the maven build going. (BTW Maven can auto-generate an Ant build too 
which uses the Maven ant tasks to work with dependencies and repos). Am sure 
Hiram or myself would happily keep the maven build ticking along. So it's zero 
overhead to the ZK team.

In terms of Ivy; yeah I've heard it can work with maven repos; but I've never 
worked on a project that really uses it - it just seems much easier if you 
wanna work with maven metadata, dependencies, hierarchical projects and 
repositories to just use Maven :). There's a zillion issues with keeping your 
metadata up to date on compile/optional/test dependencies and dealing with 
hierarchical projects and transitive dependency issues - so I've no idea what 
issues you'd run into using Ivy - over the years we've bumped into a zillion 
issues with Maven :( so I certainly understand the maven resistance. Everyone 
has a love/hate relationship with it - until you try to re-implement what it 
does in Ant and you end up with some kinda grudging respect for it :). 
(Especially if you ever have to do any OSGi work!)

But sure if you guys wanna figure out the Ivy route so it'd be easy to build, 
test and install the ZK jars into a local or remote Maven repo, I'd be really 
happy! Given both Hiram and myself have been really heavy Maven users for many 
years (after being heavy Ant users before then) I doubt you'll be getting any 
Ant+Ivy patches for the build system from either of us though :).

Heck if the Maven build only lives around until someone figures out how to 
replace it with Ant+Ivy I think we'd all be happy; so starting with a maven 
build then replacing it later if someone can figure out how to do it with 
Ant+Ivy sounds a reasonable approach. (And we could always keep the maven build 
ticking along until the Ant+Ivy approach really does work).

> Switch to using maven to build ZooKeeper
> 
>
> Key: ZOOKEEPER-83
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-83
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Reporter: Hiram Chirino
>Assignee: Hiram Chirino
> Attachments: zookeeper-mavened.tgz
>
>
> Maven is a great tool for building java projects at the ASF.  It helps 
> standardize the build a bit since it's convention oriented.
> Its dependency auto downloading would remove the need to store the 
> dependencies in svn, and it will handle many of the suggested ASF policies 
> like gpg signing of the releases and such.
> The ZooKeeper build is almost vanilla except for the jute compiler bits.  
> Things that would need to change are:
>  * re-organize the source tree a little so that it uses the maven directory 
> conventions
>  * separate the jute bits out into separate modules so that a maven plugin 
> can be used with it
>  




[jira] Created: (ZOOKEEPER-85) register the ZooKeeper mailing lists with nabble.com

2008-07-23 Thread james strachan (JIRA)
register the ZooKeeper mailing lists with nabble.com


 Key: ZOOKEEPER-85
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-85
 Project: Zookeeper
  Issue Type: Task
Reporter: james strachan
Assignee: Patrick Hunt


Many apache projects including Hadoop register with nabble to host
online forums & great online archives of the mailing lists...
http://www.nabble.com/Hadoop-f17066.html
Currently there's hadoop-core, hbase and lucene on there.

I often refer to mailing list posts by nabble link; they're really
handy. Plus end users often prefer the forum style nabble approach to
getting every single email sent to a mailing list.

Does a committer on zookeeper fancy registering the ZK mailing lists
too (as a child of the Hadoop list)? I'd do it myself but then I'd be
the owner of the forum which doesn't feel right - a committer should
probably do it.

It's a pretty quick process; click here
http://n2.nabble.com/more/MailingListRequest.jtp

and fill out the details. The hardest part is knowing the mailing list
software which AFAIK is ezmlm :)




[jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation

2008-07-23 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616151#action_12616151
 ] 

james strachan commented on ZOOKEEPER-63:
-

BTW here's the hang I seem to be able to get quite easily using the test case 
WriteLockTest in the ZOOKEEPER-78 patch (you need to set 
workAroundClosingLastZNodeFails to false to make it hang)


{code}
   [junit] "main" prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
   [junit] at java.lang.Object.wait(Native Method)
   [junit] - waiting on <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
   [junit] at java.lang.Object.wait(Object.java:474)
   [junit] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
   [junit] - locked <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
   [junit] at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
   [junit] - locked <0x0bd54108> (a org.apache.zookeeper.ZooKeeper)
   [junit] at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
   [junit] at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
   [junit] at junit.framework.TestCase.runBare(TestCase.java:140)
   [junit] at junit.framework.TestResult$1.protect(TestResult.java:110)
   [junit] at junit.framework.TestResult.runProtected(TestResult.java:128)
   [junit] at junit.framework.TestResult.run(TestResult.java:113)
   [junit] at junit.framework.TestCase.run(TestCase.java:124)
   [junit] at junit.framework.TestSuite.runTest(TestSuite.java:232)
   [junit] at junit.framework.TestSuite.run(TestSuite.java:227)
   [junit] at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
   [junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
   [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
   [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
   [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
{code}




[jira] Updated: (ZOOKEEPER-63) Race condition in client close() operation

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-63:


Attachment: patch_ZOOKEEPER-63.patch

This patch avoids the close() method blocking forever. It waits just once, up 
to the closeTimeout so if the socket is blocked or some other strangeness is 
going on, the calling thread will only wait up to the timeout (which defaults 
to 2 seconds).

BTW this patch fixes the hang I was having in the test case for ZOOKEEPER-78

> Race condition in client close() operation
> --
>
> Key: ZOOKEEPER-63
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
> Attachments: patch_ZOOKEEPER-63.patch
>
>
> There is a race condition in the java close operation on ZooKeeper.java.
> Client is sending a disconnect request to the server. Server will close any 
> open connections with the client when it receives this. If the client has not 
> yet shut down its subthreads (event/send threads for example) these threads 
> may consider the condition an error. We see this a lot in the tests where the 
> clients output error logs because they are unaware that a disconnection has 
> been requested by the client.
> Ben mentioned: perhaps we just have to change state to closed (on client) 
> before sending disconnect request.




[jira] Updated: (ZOOKEEPER-63) Race condition in client close() operation

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-63:


Status: Patch Available  (was: Open)

about to attach a patch




[jira] Created: (ZOOKEEPER-86) intermittent test failure of org.apache.zookeeper.test.AsyncTest

2008-07-23 Thread james strachan (JIRA)
intermittent test failure of org.apache.zookeeper.test.AsyncTest


 Key: ZOOKEEPER-86
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-86
 Project: Zookeeper
  Issue Type: Bug
  Components: tests
 Environment: OS X and linux. It sometimes passes; but mostly seems to 
fail on OS X each time
Reporter: james strachan


Will attach the test output in an attachment...




[jira] Updated: (ZOOKEEPER-86) intermittent test failure of org.apache.zookeeper.test.AsyncTest

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-86:


Attachment: TEST-org.apache.zookeeper.test.AsyncTest.txt

here's the output when run on OS X (using Leopard)

> intermittent test failure of org.apache.zookeeper.test.AsyncTest
> 
>
> Key: ZOOKEEPER-86
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-86
> Project: Zookeeper
>  Issue Type: Bug
>  Components: tests
> Environment: OS X and linux. It sometimes passes; but mostly seems to 
> fail on OS X each time
>Reporter: james strachan
> Attachments: TEST-org.apache.zookeeper.test.AsyncTest.txt
>
>
> Will attach the test output in an attachment...




[jira] Updated: (ZOOKEEPER-86) intermittent test failure of org.apache.zookeeper.test.AsyncTest

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-86:


Status: Patch Available  (was: Open)

about to attach




[jira] Updated: (ZOOKEEPER-86) intermittent test failure of org.apache.zookeeper.test.AsyncTest

2008-07-23 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-86:


Attachment: patch_for_ZOOKEEPER-86.patch

this patch seems to fix the test case on OS X at least; I've split the test 
case into 2 parts (so they are forked separately) and added more delays before 
trying to rebind to the server socket which seems to fix the error
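
For illustration, a sketch of the "wait a bit before rebinding" idea written as a retry loop. RebindHelper and bindWithRetry are hypothetical names; this is not the code in the attached patch, which adds delays inside the test itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical test helper; not taken from patch_for_ZOOKEEPER-86.patch.
final class RebindHelper {
    // Tries to bind a server socket to the port, sleeping between failed attempts
    // so the OS has time to release a socket left over from the previous run.
    static ServerSocket bindWithRetry(int port, int attempts, long delayMillis) throws IOException {
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            ServerSocket socket = new ServerSocket(); // created unbound
            try {
                socket.bind(new InetSocketAddress(port));
                return socket;
            } catch (IOException e) {
                last = e;
                socket.close();
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while waiting to rebind", ie);
                }
            }
        }
        throw last != null ? last : new IOException("no bind attempts were made");
    }
}
```

Retrying like this keeps the happy path fast, whereas a fixed sleep slows every run down even when the port is already free.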

> intermittent test failure of org.apache.zookeeper.test.AsyncTest
> 
>
> Key: ZOOKEEPER-86
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-86
> Project: Zookeeper
>  Issue Type: Bug
>  Components: tests
> Environment: OS X and linux. It sometimes passes; but mostly seems to 
> fail on OS X each time
>Reporter: james strachan
> Attachments: patch_for_ZOOKEEPER-86.patch, 
> TEST-org.apache.zookeeper.test.AsyncTest.txt
>
>
> Will attach the test output in an attachment...




[jira] Commented: (ZOOKEEPER-86) intermittent test failure of org.apache.zookeeper.test.AsyncTest

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616365#action_12616365
 ] 

james strachan commented on ZOOKEEPER-86:
-

BTW I have sometimes still seen the AsyncHammerTest fail on OS X; the 
basic issue is the restart of the quorum servers - it's often the 3rd one - the 
server socket has not yet been released by the OS which tends to cause the 
failure. While things seem to work much better now, we might wanna add a bigger 
sleep in between restarts if it starts getting more common again




[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616374#action_12616374
 ] 

james strachan commented on ZOOKEEPER-22:
-

BTW this discussion came up recently on the dev lists too...

http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/200807.mbox/[EMAIL
 PROTECTED]


To be able to retry operations on connection close (or due to session 
expiration) there is a patch in 
https://issues.apache.org/jira/browse/ZOOKEEPER-78

which adds a ZooKeeperFacade for dealing with reconnecting on session 
expiration and some helper methods in ProtocolSupport for retrying synchronous 
operations or blocks of code in light of connection failures
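
To give a flavour of what retrying a block of code looks like, here is a minimal sketch. RetrySupport, retry() and ConnectionLostException are hypothetical names for illustration only and are not the actual ProtocolSupport API from the patch.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper; NOT the ProtocolSupport class from ZOOKEEPER-78.
final class RetrySupport {
    // Hypothetical marker for a recoverable connection problem.
    static class ConnectionLostException extends Exception {
        ConnectionLostException(String message) {
            super(message);
        }
    }

    // Runs the operation, retrying only on ConnectionLostException,
    // with a delay that grows linearly with the attempt number.
    static <T> T retry(Callable<T> operation, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConnectionLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (ConnectionLostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis * attempt);
                }
            }
        }
        throw last; // every attempt failed with a connection loss
    }
}
```

One caveat with any helper like this: a blind retry is only safe for idempotent operations, since the first attempt may have succeeded on the server before the connection dropped.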

> Automatic request retries on connect failover
> -
>
> Key: ZOOKEEPER-22
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-22
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: c client, java client
>Reporter: Patrick Hunt
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail&aid=1831412&group_id=209147&atid=1008547




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616391#action_12616391
 ] 

james strachan commented on ZOOKEEPER-78:
-

h3. Moving patches for this issue to subversion for easier tracking

http://svn.apache.org/repos/asf/activemq/sandbox/zookeeper/zookeeper-protocols/

I've already submitted about five patches for this issue so far and I'm sure 
there's gonna be loads more coming. Developing higher level protocols is a much 
bigger job than I previously thought ;-) particularly with having tests for all 
the various failure scenarios and adding support for the various other higher 
level protocols.

It's kinda time consuming creating loads of patches & attaching them to the same 
issue and deleting the old ones so it's easy for committers to review - but 
more importantly, all the history of the many patches gets totally lost using 
the attach-patch-to-jira model - which also makes it harder for committers to 
watch progress on this issue.

I've never done this before on any other Apache project - and this approach is 
*temporary* and only reserved for the single ZOOKEEPER-78 issue; but I've 
checked this patch into an svn sandbox area at Apache that I have commit 
karma on and will continue to work on it there; so that all the history is 
preserved. I can then do many more frequent & smaller commits; any ZK committer 
can review and easily apply my patches whenever they feel like - and it's gonna 
be much easier for anyone in the ZK community to track progress on this issue 
and see how the implementation has changed over time as some things work or I 
find better ways of solving the issue.

This approach is totally temporary; it's not an attempt to move the code outside 
of the ZK community or anything like that. At any point feel free to commit 
(actually just copy in svn which will keep all the history & commit comments 
etc) to the ZK trunk. You could even mirror the code to the ZK tree in a 
sandbox/contrib area if you like - just like Hiram did to mirror the ZK code to 
the maven-patch example in the activemq sandbox.

I'm hoping in a few weeks my hacking on this issue will near completion and we 
can permanently move the code back into the ZK tree; but in the meantime it's 
trivial to reuse it where it is or mirror it into the ZK tree as folks in the 
ZK community see fit. Also if I ever earn committer karma on ZK I can just move 
it into some ZK contrib area myself :)


h3. Building the code

In terms of sandbox - I ended up reusing Hiram's sandbox area that shows the 
maven build working on ZK; as I prefer to use maven and it was then super easy 
for me to create a new maven module, zookeeper-protocols that just includes the 
source and test cases for the high level protocols.

If you're new to maven and want to build it, I've checked in instructions 
here...
https://svn.apache.org/repos/asf/activemq/sandbox/zookeeper/BUILDING-maven.txt

Whenever we move this code back into the ZK trunk am sure we can hack an Ant 
build for it.




[jira] Created: (ZOOKEEPER-88) implement java.util.concurrent.locks.Lock

2008-07-24 Thread james strachan (JIRA)
implement java.util.concurrent.locks.Lock
-

 Key: ZOOKEEPER-88
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-88
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: james strachan


we should implement the 
[java.util.concurrent.locks.Lock|http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/Lock.html]
 interface to make it easier for folks to reuse the Lock and to make the API more 
natural to end users
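
A very rough sketch of what such an adapter could look like, assuming the underlying lock only offers a non-blocking attempt and a release. Acquirable and LockAdapter are hypothetical names; this polls instead of using watches, so treat it as an outline of the Lock contract rather than a proposed implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

// Hypothetical minimal view of a distributed lock; NOT the WriteLock API from the patch.
interface Acquirable {
    boolean tryAcquire(); // non-blocking attempt, true if we now hold the lock
    void release();
}

// Adapts an Acquirable to java.util.concurrent.locks.Lock by polling.
final class LockAdapter implements Lock {
    private final Acquirable delegate;
    private final long pollMillis;

    LockAdapter(Acquirable delegate, long pollMillis) {
        this.delegate = delegate;
        this.pollMillis = pollMillis;
    }

    @Override
    public void lock() {
        boolean interrupted = false;
        while (!delegate.tryAcquire()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                interrupted = true; // lock() must not throw; remember and keep trying
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void lockInterruptibly() throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException();
        }
        while (!delegate.tryAcquire()) {
            Thread.sleep(pollMillis);
        }
    }

    @Override
    public boolean tryLock() {
        return delegate.tryAcquire();
    }

    @Override
    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(time);
        while (true) {
            if (Thread.interrupted()) {
                throw new InterruptedException();
            }
            if (delegate.tryAcquire()) {
                return true;
            }
            if (deadline - System.nanoTime() <= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    @Override
    public void unlock() {
        delegate.release();
    }

    @Override
    public Condition newCondition() {
        // The Lock contract allows this when conditions are not supported.
        throw new UnsupportedOperationException("conditions are not supported");
    }
}
```

With something like this in place, callers can use the familiar lock()/try/finally/unlock() idiom without knowing ZooKeeper is underneath.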




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616416#action_12616416
 ] 

james strachan commented on ZOOKEEPER-78:
-

Just added the WhenOwnerListener interface : 
http://svn.apache.org/viewvc?view=rev&revision=679325 I just need to figure out 
how to add notifications of loss of owner/leader status when the connection 
fails or the session expires etc.




[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616418#action_12616418
 ] 

james strachan commented on ZOOKEEPER-22:
-

BTW you can see the code for 
[ProtocolSupport|http://svn.apache.org/viewvc/activemq/sandbox/zookeeper/zookeeper-protocols/src/main/java/org/apache/zookeeper/protocols/ProtocolSupport.java?view=markup]
 and 
[ZooKeeperFacade|http://svn.apache.org/viewvc/activemq/sandbox/zookeeper/zookeeper-protocols/src/main/java/org/apache/zookeeper/protocols/ZooKeeperFacade.java?view=markup]
 as I've checked in the patch for ZOOKEEPER-78 into a temporary sandbox area, 
[details 
here|https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616391#action_12616391]




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616413#action_12616413
 ] 

james strachan commented on ZOOKEEPER-78:
-

Thanks for the great comments Benjamin! Have already [added the constructor for 
you|https://svn.apache.org/repos/asf/activemq/sandbox/zookeeper/zookeeper-protocols/src/main/java/org/apache/zookeeper/protocols/WriteLock.java]
 :)

BTW I was pondering about switching the whenOwner from a Runnable to some kinda 
interface that invokes you when you become the leader/owner - or when you stop 
being the leader/owner. Something like 

{code}
public interface WhenOwnerListener {
  void whenOwner();
  void whenNotOwner();
}
{code}

Where only znodes that are the owner would be notified with the whenOwner() 
method; but then if a connection fails or session times out, they'd be notified 
with a call to whenNotOwner();
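
As a usage sketch, a listener could be as simple as a flag that the rest of the application polls. The interface is repeated so the example stands alone; LeadershipFlag is a hypothetical name and nothing here touches ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Interface as proposed in the comment above, repeated so this compiles on its own.
interface WhenOwnerListener {
    void whenOwner();
    void whenNotOwner();
}

// Hypothetical listener that only records the current ownership state.
final class LeadershipFlag implements WhenOwnerListener {
    private final AtomicBoolean leader = new AtomicBoolean(false);

    @Override
    public void whenOwner() {
        leader.set(true);
    }

    @Override
    public void whenNotOwner() {
        leader.set(false);
    }

    boolean isLeader() {
        return leader.get();
    }
}
```

Application code would then check isLeader() before doing leader-only work, bearing in mind the flag can flip at any moment.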
 
Spookily - I'd set myself the target today to properly implement the watches so 
that WriteLock gets a notification of it no longer being the leader/owner when 
a connection fails (which normally auto-reconnects anyway right now in the base 
ZooKeeper). Then I was gonna add a notification mechanism so we could notify 
the leader/owner that it is no longer the leader/owner when the session expired 
exception occurs.

So we're absolutely on the same page; once I'd grokked the proper watch code 
for dealing with normal connection failures & reconnects I was hoping to add 
something vaguely similar to the ZooKeeperFacade so that higher level protocols 
can be aware of both when ZooKeeper reconnects and when ZooKeeperFacade creates 
a whole new connection.
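
Roughly what I have in mind, as a sketch with made-up names: OwnershipTracker, its nested Listener and the ConnectionEvent values are all hypothetical, and a real version would be driven by the ZooKeeper watch callbacks rather than by this enum.

```java
// Hypothetical sketch; event names are illustrative and not the ZooKeeper watcher API.
final class OwnershipTracker {
    enum ConnectionEvent { DISCONNECTED, RECONNECTED, SESSION_EXPIRED }

    interface Listener {
        void whenOwner();
        void whenNotOwner();
    }

    private final Listener listener;
    private boolean owner;

    OwnershipTracker(Listener listener) {
        this.listener = listener;
    }

    // Called once the lock protocol has established that we hold the lock.
    synchronized void becameOwner() {
        if (!owner) {
            owner = true;
            listener.whenOwner();
        }
    }

    // Called for every connection state change reported by the client.
    synchronized void onConnectionEvent(ConnectionEvent event) {
        switch (event) {
            case DISCONNECTED:
            case SESSION_EXPIRED:
                if (owner) {
                    owner = false;
                    listener.whenNotOwner(); // notified once, not on every event
                }
                break;
            case RECONNECTED:
                // Ownership has to be re-checked against the server; nothing is assumed here.
                break;
        }
    }

    synchronized boolean isOwner() {
        return owner;
    }
}
```

Treating a plain disconnect as loss of ownership is the conservative choice: while disconnected we cannot know whether the session, and with it our znode, is still alive.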

Does that make sense? I totally understand your concerns about making sure the 
WriteLock and associated helper classes like ProtocolSupport/ZooKeeperFacade do 
the right thing - I want exactly the same thing :) I'd just not yet had the 
chance to go through all the different failure conditions and scenarios and 
make sure they all work properly :)






[jira] Created: (ZOOKEEPER-89) invoke WhenOwnerListener.whenNotOwner() when the ZK connection fails

2008-07-24 Thread james strachan (JIRA)
invoke WhenOwnerListener.whenNotOwner() when the ZK connection fails


 Key: ZOOKEEPER-89
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-89
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: james strachan







[jira] Created: (ZOOKEEPER-90) invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the znode is the leader

2008-07-24 Thread james strachan (JIRA)
invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the 
znode is the leader
---

 Key: ZOOKEEPER-90
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-90
 Project: Zookeeper
  Issue Type: Sub-task
Reporter: james strachan







[jira] Commented: (ZOOKEEPER-84) provide a mechanism to reconnect a ZooKeeper if a client receives a SessionExpiredException

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616420#action_12616420
 ] 

james strachan commented on ZOOKEEPER-84:
-

BTW here is the code for 
[ZooKeeperFacade|http://svn.apache.org/viewvc/activemq/sandbox/zookeeper/zookeeper-protocols/src/main/java/org/apache/zookeeper/protocols/ZooKeeperFacade.java?view=markup]
 as I've checked in the patch for ZOOKEEPER-78 into a temporary sandbox area, 
[details 
here|https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616391#action_12616391]

> provide a mechanism to reconnect a ZooKeeper if a client receives a 
> SessionExpiredException
> ---
>
> Key: ZOOKEEPER-84
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-84
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: java client
>Reporter: james strachan
>Assignee: Benjamin Reed
> Attachments: reconnect_patch.patch
>
>
> am about to attach a patch which adds a reconnect() method to easily 
> re-establish a connection if a session expires - along with a toString() 
> implementation for easier debugging




[jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616426#action_12616426
 ] 

james strachan commented on ZOOKEEPER-63:
-

So this patch does not attempt to fix the race condition problem, apologies if 
I gave that impression :)

What it does do, though, is act as a workaround: if a client cannot properly 
send a disconnect packet to the server for *any reason at all*, such as

* a hung socket (which can be quite common) 
* no servers available
* a race condition in the ZK client code of some kind (which we definitely have 
now)

then the client application does not hang forever, as it's trying to close and 
shut down anyway :). So it's a side benefit that it acts as a band-aid until 
someone fixes all the possible race conditions and potential socket hangs.

Let me put it another way. Given that the client is closing; is it really 
correct to leave it potentially hanging around forever just because it cannot 
be sure if the disconnect packet was received and properly processed by the 
server? If the socket is dead before the call to close(), is it really correct 
to block until a connection can be re-established, just so it can be properly 
closed - when the code will effectively close the hung socket without sending a 
disconnect packet anyway :) ? 

The server has to detect and time out failed sessions, whether it receives an 
explicit disconnect packet or not (as a process could just hang). So do we 
really need to be super strict on the client side, forcing clients to block 
when trying to shut down if they can't do so cleanly within some time period?

I totally agree that we should fix the race condition though :). I just wanted 
a workaround to avoid my ZK test cases hanging forever due to the race 
condition :) 
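
To make the idea concrete, here is a minimal sketch of a bounded close. It is 
not the attached patch: BoundedClose and closeWithin are hypothetical names, 
and the Runnable stands in for whatever blocking close() call the client 
makes. The close runs on a daemon thread and the caller gives up once the 
timeout passes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of the ZooKeeper client API.
final class BoundedClose {
    private BoundedClose() {}

    /**
     * Runs blockingClose on a daemon thread and waits at most timeoutMillis.
     * Returns true if the close finished in time, false if we gave up.
     */
    static boolean closeWithin(Runnable blockingClose, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a hung close must never keep the JVM alive
            return t;
        });
        Future<?> pending = executor.submit(blockingClose);
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the hung close and move on
            return false;
        } catch (ExecutionException e) {
            return false; // close failed outright; we are shutting down anyway
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap its own close, for example 
BoundedClose.closeWithin(() -> client.close(), 5000), where client is 
whatever object owns the connection. The server still has to time out the 
session if the disconnect packet never arrives.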

> Race condition in client close() operation
> --
>
> Key: ZOOKEEPER-63
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Patrick Hunt
>Assignee: Benjamin Reed
> Attachments: patch_ZOOKEEPER-63.patch
>
>
> There is a race condition in the java close operation on ZooKeeper.java.
> Client is sending a disconnect request to the server. Server will close any 
> open connections with the client when it receives this. If the client has not 
> yet shut down its subthreads (event/send threads for example) these threads 
> may consider the condition an error. We see this a lot in the tests where the 
> clients output error logs because they are unaware that a disconnection has 
> been requested by the client.
> Ben mentioned: perhaps we just have to change state to closed (on client) 
> before sending disconnect request.




[jira] Resolved: (ZOOKEEPER-90) invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the znode is the leader

2008-07-24 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan resolved ZOOKEEPER-90.
-

Resolution: Fixed

this is now fixed in [this 
patch|http://svn.apache.org/viewvc?rev=679341&view=rev] to ZOOKEEPER-78

> invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the 
> znode is the leader
> ---
>
> Key: ZOOKEEPER-90
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-90
> Project: Zookeeper
>  Issue Type: Sub-task
>    Reporter: james strachan
>





[jira] Created: (ZOOKEEPER-91) provide an option for the WriteLock to also watch the locks own znode, so that if someone else deletes it then it is equivalent to calling WriteLock.unlock()

2008-07-24 Thread james strachan (JIRA)
provide an option for the WriteLock to also watch the locks own znode, so that 
if someone else deletes it then it is equivalent to calling WriteLock.unlock()
-

 Key: ZOOKEEPER-91
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-91
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: james strachan


Most clients probably won't need this, but it could be a handy system management 
feature to allow the WriteLock to watch its own znode so that if someone else 
deletes it, it then relinquishes the lock and tries to get it back again.




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616446#action_12616446
 ] 

james strachan commented on ZOOKEEPER-78:
-

Benjamin, I added ZOOKEEPER-89 and ZOOKEEPER-90 to track handling the loss of 
ownership/leadership on connection reconnects and on session expiration. 
I've not been able to test the latter yet, but I've tested the former, and I 
think both are now implemented via the patches for 
 [ZOOKEEPER-90|http://svn.apache.org/viewvc?rev=679354&view=r] and 
[ZOOKEEPER-89|http://svn.apache.org/viewvc?rev=679341&view=rev]


> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch, 
> using_zookeeper_facade.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> it's an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Resolved: (ZOOKEEPER-89) invoke WhenOwnerListener.whenNotOwner() when the ZK connection fails

2008-07-24 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan resolved ZOOKEEPER-89.
-

Resolution: Fixed

fixed now with [this patch|http://svn.apache.org/viewvc?rev=679354&view=rev] 
from ZOOKEEPER-78

> invoke WhenOwnerListener.whenNotOwner() when the ZK connection fails
> 
>
> Key: ZOOKEEPER-89
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-89
> Project: Zookeeper
>  Issue Type: Sub-task
>Reporter: james strachan
>





[jira] Created: (ZOOKEEPER-92) spring factory beans for ZooKeeper, ZooKeeperFacade and ZooKeeperServer

2008-07-24 Thread james strachan (JIRA)
spring factory beans for ZooKeeper, ZooKeeperFacade and ZooKeeperServer
---

 Key: ZOOKEEPER-92
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-92
 Project: Zookeeper
  Issue Type: Sub-task
  Components: java client
Reporter: james strachan


For folks who use Spring for Dependency Injection it might be handy to have a 
couple of factory beans to make it easier to create and configure the 
ZooKeeper, ZooKeeperFacade and ZooKeeperServer via the normal Spring dependency 
injection mechanism, whether through Java or XML configuration.




[jira] Commented: (ZOOKEEPER-90) invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the znode is the leader

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616455#action_12616455
 ] 

james strachan commented on ZOOKEEPER-90:
-

FWIW I don't think the ZooKeeperFacade needs a WhenOwnerListener, to be 
honest, as the underlying ZooKeeper should invoke the Watcher directly when 
the connection is closed, either via the session-expired reconnect or via 
close(). I guess it does no harm to have a few too many events firing in 
there, just in case :)

> invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the 
> znode is the leader
> ---
>
> Key: ZOOKEEPER-90
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-90
> Project: Zookeeper
>  Issue Type: Sub-task
>Reporter: james strachan
>





[jira] Commented: (ZOOKEEPER-90) invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the znode is the leader

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616456#action_12616456
 ] 

james strachan commented on ZOOKEEPER-90:
-

I guess the earlier we can notify a leader/owner that they are no longer going 
to be the leader/owner, the better. e.g. it's better to notify them up front 
than to wait for the effect of a socket closing etc.?
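
A rough sketch of what "notify up front" could look like, under stated 
assumptions: WhenOwnerListener and whenNotOwner() are named in the issue 
title, but whenOwner(), SessionEvent and OwnershipTracker are made up here 
for illustration and are not the code in the patch. The tracker tells the 
owner as soon as the session looks unhealthy, and only once per loss.

```java
// The listener name and whenNotOwner() come from the issue title; whenOwner()
// is an assumption made for this sketch.
interface WhenOwnerListener {
    void whenOwner();
    void whenNotOwner();
}

// Hypothetical stand-in for the connection events a client would observe.
enum SessionEvent { CONNECTED, DISCONNECTED, EXPIRED }

// Hypothetical helper that turns session events into ownership callbacks.
final class OwnershipTracker {
    private final WhenOwnerListener listener;
    private boolean owner;

    OwnershipTracker(WhenOwnerListener listener) {
        this.listener = listener;
    }

    // Called once the lock protocol decides that we hold the lock.
    synchronized void becameOwner() {
        if (!owner) {
            owner = true;
            listener.whenOwner();
        }
    }

    // Notify the owner as soon as the session is disconnected or expired,
    // rather than waiting for a later failure to surface the problem.
    synchronized void onSessionEvent(SessionEvent event) {
        boolean lost = event == SessionEvent.DISCONNECTED
                || event == SessionEvent.EXPIRED;
        if (lost && owner) {
            owner = false;
            listener.whenNotOwner(); // fires at most once per ownership period
        }
    }

    synchronized boolean isOwner() {
        return owner;
    }
}
```

Whether a plain disconnect should count as loss of ownership is a policy 
choice; this sketch takes the cautious side and treats it as one.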

> invoke WhenOwnerListener.whenNotOwner() when the ZK session expires and the 
> znode is the leader
> ---
>
> Key: ZOOKEEPER-90
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-90
> Project: Zookeeper
>  Issue Type: Sub-task
>    Reporter: james strachan
>





[jira] Updated: (ZOOKEEPER-88) implement java.util.concurrent.locks.Lock

2008-07-24 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-88:


Status: Patch Available  (was: Open)

I've just submitted an [initial patch implementing 
this|http://svn.apache.org/viewvc?rev=679435&view=rev], which could use some 
more tests and code review.

> implement java.util.concurrent.locks.Lock
> -
>
> Key: ZOOKEEPER-88
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-88
> Project: Zookeeper
>  Issue Type: Sub-task
>  Components: java client
>Reporter: james strachan
>
> we should implement the 
> [java.util.concurrent.locks.Lock|http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/locks/Lock.html]
>  to make it easier for folks to reuse the Lock and to help make the API be 
> more natural to end users




[jira] Commented: (ZOOKEEPER-74) Cleaning/restructuring up Zookeeper server code

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616518#action_12616518
 ] 

james strachan commented on ZOOKEEPER-74:
-

It's a very minor thing, but the contributing guide says to use Sun's coding 
conventions:

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

yet lots of the code tends to scatter fields between methods; it's sometimes 
a bit hard, looking at the source, to grok which fields are owned by which 
object. I prefer the Sun standard where all the fields are at the top, as 
most java folks and apache java projects do. 

Good IDEs can kinda work around this, though, so I tend to rely on the outline 
view in my IDE rather than the source to grok what state a class has :)

> Cleaning/restructuring up Zookeeper server code
> ---
>
> Key: ZOOKEEPER-74
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-74
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 3.0.0
>
>
> I have been thinking about this for a while and find that the zookeeper 
> server code needs some cleaning up. The server code is a little 
> tricky/confusing to read sometimes, given that there is no clarity on 
> ownership of objects. I will put down a proposal for restructuring/cleaning 
> the code up with javadocs so that the code is easier to understand and 
> develop on. Comments on what you find confusing are welcome on this jira. 




[jira] Updated: (ZOOKEEPER-58) Race condition on ClientCnxn.java

2008-07-24 Thread james strachan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

james strachan updated ZOOKEEPER-58:


Status: Patch Available  (was: Open)

just marking this issue as having a patch - does a committer fancy applying it? 
:)

> Race condition on ClientCnxn.java
> -
>
> Key: ZOOKEEPER-58
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-58
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Attachments: patch-incoming-race.txt
>
>
> There is a race condition involving the ByteBuffer incomingBuffer, a field of 
> ClientCnxn.SendThread. SendThread reads a packet in two steps: first it reads 
> the length of the packet, followed by a read of the packet itself. Each of 
> these steps corresponds to a call to doIO() from the main loop of run(). If 
> there is an exception or the session times out, then it may leave 
> incomingBuffer in an inconsistent state. 
> The attached patch adds code to reset incomingBuffer upon a call to 
> SendThread.cleanup(). This method is called upon an exception on run().
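
The two-step read and the reset described above can be sketched roughly as 
follows. This is a simplified illustration, not the ClientCnxn.SendThread 
code or the attached patch: FrameReader and feed() are hypothetical names, 
and the 4-byte length prefix is an assumption of the sketch.

```java
import java.nio.ByteBuffer;

// Hypothetical reader, loosely modelled on the issue description.
final class FrameReader {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    // Points at lenBuffer while reading the length, then at a body buffer.
    private ByteBuffer incomingBuffer = lenBuffer;

    /**
     * Consumes bytes from src. Returns the packet body once both steps
     * (length, then body) are complete, or null if more input is needed.
     */
    byte[] feed(ByteBuffer src) {
        while (true) {
            while (incomingBuffer.hasRemaining() && src.hasRemaining()) {
                incomingBuffer.put(src.get());
            }
            if (incomingBuffer.hasRemaining()) {
                return null; // current step is incomplete; wait for more bytes
            }
            if (incomingBuffer == lenBuffer) {
                // Step 1 done: decode the length and switch to the body buffer.
                lenBuffer.flip();
                int length = lenBuffer.getInt();
                lenBuffer.clear();
                incomingBuffer = ByteBuffer.allocate(length);
            } else {
                // Step 2 done: hand back the body and expect a length next.
                byte[] body = incomingBuffer.array();
                incomingBuffer = lenBuffer;
                return body;
            }
        }
    }

    /**
     * Drops any half-read packet so the next connection starts by reading a
     * length; without this, leftover state would swallow the new stream.
     */
    void cleanup() {
        lenBuffer.clear();
        incomingBuffer = lenBuffer;
    }
}
```

If cleanup() is skipped after a failure part-way through a body, the first 
bytes from the next connection are appended to the stale body buffer instead 
of being parsed as a length, which is the inconsistent state the issue 
describes.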




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616579#action_12616579
 ] 

james strachan commented on ZOOKEEPER-78:
-

Wow, I confess to being kinda surprised by that response :) I didn't realise 
you guys were so attached to the exact svn command line used to apply a patch - 
I thought you'd welcome all contributions and that the svn commit history would 
be more useful to you, given the large amount of changes and complexity of the 
code and large number of comments already on this JIRA - rather than focus 
purely on the couple of minor keystrokes required to apply the patch :)  

FWIW I've attached about 5 patch files so far, with none of them being 
committed anywhere - then made about 7 patches since then in svn with history 
and I'm sure there'll be another 5 or so changes to go before this patch is 
done. 

Never mind - I'll happily comply with your strict patch acceptance policy. Give 
me a few weeks or so to completely finish the code and documentation and I'll 
submit a single big patch for all the work to this JIRA. If you want you can 
get all the history too with a trivial alternative svn command - but if that 
offends you, please forget I'm using a sandbox svn area to work on this 
(pretend I'm just saving it on my hard drive and please disregard the links 
I've added to some JIRAs to refer to parts of this patch in a simple way) and 
just use the single patch file I'll attach in a few weeks or so.


> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch, 
> using_zookeeper_facade.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> it's an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.




[jira] Commented: (ZOOKEEPER-83) Switch to using maven to build ZooKeeper

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616580#action_12616580
 ] 

james strachan commented on ZOOKEEPER-83:
-

Hudson does support Maven and Ant natively

http://hudson.gotdns.com/wiki/display/HUDSON/Plugins#Plugins-Buildtools


> Switch to using maven to build ZooKeeper
> 
>
> Key: ZOOKEEPER-83
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-83
> Project: Zookeeper
>  Issue Type: Improvement
>  Components: build
>Reporter: Hiram Chirino
> Attachments: zookeeper-mavened.tgz
>
>
> Maven is a great tool for building java projects at the ASF.  It helps 
> standardize the build a bit since it's convention oriented.
> Its dependency auto downloading would remove the need to store the 
> dependencies in svn, and it will handle many of the suggested ASF policies 
> like gpg signing of the releases and such.
> The ZooKeeper build is almost vanilla except for the jute compiler bits.  
> Things that would need to change are:
>  * re-organize the source tree a little so that it uses the maven directory 
> conventions
>  * separate the jute bits out into separate modules so that a maven plugin 
> can be with it
>  




[jira] Commented: (ZOOKEEPER-78) added a high level protocol/feature - for easy Leader Election or exclusive Write Lock creation

2008-07-24 Thread james strachan (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616603#action_12616603
 ] 

james strachan commented on ZOOKEEPER-78:
-

{quote}
Let's stick with a process for now that all contributors can use, not just ASF 
committers.
{quote}

Huh? Everyone has access to ASF svn? Only committers can commit using either 
approach. I don't grok your point.

> added a high level protocol/feature - for easy Leader Election or exclusive 
> Write Lock creation
> ---
>
> Key: ZOOKEEPER-78
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-78
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: java client
>Affects Versions: 3.0.0
>Reporter: james strachan
> Attachments: patch_with_including_Benjamin's_fix.patch, 
> using_zookeeper_facade.patch
>
>
> Here's a patch which adds a little WriteLock helper class for performing 
> leader elections or creating exclusive locks in some directory znode. Note 
> it's an early cut; am sure we can improve it over time. The aim is to avoid 
> folks having to use the low level ZK stuff but provide a simpler high level 
> abstraction.
