Re: common client

2009-06-22 Thread Nitay
+1. I would be interested in things like this. I think it should be in
some contrib/ type thing under zookeeper, like the recipes.

On Mon, Jun 22, 2009 at 4:41 PM, Stefan Groschupf wrote:
> Hi,
>
> I wonder if people are interested in working together on a zk client that
> supports more functionality than zk offers by default.
> Katta has this client and I copied the code into a couple of other projects
> as well, but I'm sure it could be better than it is.
>
> http://katta.svn.sourceforge.net/viewvc/katta/trunk/src/main/java/net/sf/katta/zk/ZKClient.java?view=markup
>
> I'm sure others would benefit from such a client.
>
> Some of the features are:
> + Connect
> + Data and StateChangeListener - subscribe once, get events until
> unsubscribe
> + Threadsafe
>
> It is not a lot of code but I'm just tired of having it duplicated so many
> times.
> Anyone interested in joining in?  Or is there something like this already?
> I could just copy this to a github project.
>
> Stefan
>
>
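
The "subscribe once, get events until unsubscribe" feature could be sketched
roughly as below. This is only an illustration of the idea, not Katta's
ZKClient code: the names DataListener, ListenerRegistry and fireDataChange are
hypothetical, and the sketch assumes a wrapper that re-registers the underlying
one-shot ZooKeeper watch and then calls fireDataChange each time it fires.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener type; not Katta's actual interface. */
interface DataListener {
    void onDataChange(String path, byte[] data);
}

/** Thread-safe registry: subscribe once, keep receiving changes until unsubscribed. */
class ListenerRegistry {
    private final Map<String, List<DataListener>> listeners = new ConcurrentHashMap<>();

    void subscribe(String path, DataListener listener) {
        listeners.computeIfAbsent(path, p -> new CopyOnWriteArrayList<>()).add(listener);
    }

    void unsubscribe(String path, DataListener listener) {
        List<DataListener> forPath = listeners.get(path);
        if (forPath != null) {
            forPath.remove(listener);
        }
    }

    /**
     * Assumed to be called by the wrapper every time the underlying watch
     * fires, after the wrapper has set the watch again and re-read the data.
     */
    void fireDataChange(String path, byte[] data) {
        List<DataListener> forPath = listeners.get(path);
        if (forPath == null) {
            return;
        }
        for (DataListener listener : forPath) {
            listener.onDataChange(path, data);
        }
    }
}
```

The concurrent collections are one simple way to get the "Threadsafe" property
for the registry itself; the real client may do this differently.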


Re: Show your ZooKeeper pride!

2009-06-08 Thread Nitay
Added HBase.

On Mon, Jun 8, 2009 at 7:01 PM, Ted Dunning wrote:

> How come Yahoo isn't listed?
>
> On Mon, Jun 8, 2009 at 6:31 PM, Patrick Hunt wrote:
>
> > The Hadoop summit is Wednesday. If you're attending please feel free to
> > say hi -- Mahadev is presenting @4, Ben and I will be attending as well.
> >
> > Also, regardless of whether you're attending or not we'd appreciate any
> > updates to the "powered by" page, if you're too busy to update it
> > yourself send us a snippet and we'll update it for you ;-)
> >
> > http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy
> >
> > Regards,
> >
> > Patrick
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> http://www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)
>


Re: Errors during shutdown/startup of ZooKeeper

2009-06-03 Thread Nitay
I'm still working on it (going on in parallel with a bunch of other things).
Will let you guys know what I figure out as soon as I get some results. I
think you are on to something Patrick. That is some gold advice. Thanks
guys.

-n
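
Patrick's suggestion quoted below (old clients that are never explicitly closed
keep trying to reconnect to the fresh servers of later tests) points at closing
every client in test teardown. A minimal sketch of that bookkeeping, assuming a
hypothetical ClientTracker helper that is not part of ZooKeeper or JUnit:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical test helper: remembers every client a test opens. */
class ClientTracker {
    private final Deque<AutoCloseable> open = new ArrayDeque<>();

    /** Register a client so that closeAll() will close it later. */
    void track(AutoCloseable client) {
        open.push(client);
    }

    /**
     * Close all tracked clients, most recently opened first. A failure to
     * close one client does not stop the others from being closed.
     * Returns the number of clients that closed without an exception.
     */
    int closeAll() {
        int closed = 0;
        while (!open.isEmpty()) {
            try {
                open.pop().close();
                closed++;
            } catch (Exception e) {
                // In a real test this would be logged; keep closing the rest.
            }
        }
        return closed;
    }
}
```

In a test, a ZooKeeper handle zk could be registered with tracker.track(zk::close)
and closeAll() called from the teardown method, before the servers are stopped
and the datadir is cleared.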

On Wed, Jun 3, 2009 at 11:39 AM, Patrick Hunt wrote:

> Nitay, any luck? Feel free to create a JIRA to track this. If you point to
> the test code that's experiencing the problem we'll try and take a look.
>
> Patrick
>
>
> Patrick Hunt wrote:
>
>> This log manifests if the client is running ahead of the server.
>>
>> say you have:
>> 1) client connects to server A and sees some changes
>> 2) client gets disconnected from A and attempts to connect to B
>> 3) B can be running behind A by some number of changes (it will eventually
>> catch up)
>> 4) client will attempt to connect to another server that's at, or ahead of,
>> its zxid until successful.
>>
>> why? this ensures that the client never sees old data, part of the
>> guarantee you are provided when using zk. However, since a minority of the
>> servers in a quorum can run behind, you might see this.
>>
>> It's unusual to see this so many times however. I see that you are running
>> this as part of a junit test. Perhaps that has some impact? Are you shutting
>> down servers, perhaps clearing the datadir and restarting them, w/o closing
>> all of the clients? If your tests are not running in "fork mode" for junit
>> (or multiple tests w/in a junit test class) then old clients can hang around
>> _if not explicitly closed_ and try to re-connect to new servers that you are
>> using for new tests - if the servers are starting fresh (zxid=1) then you
>> can see this a lot as the old (zombie) clients cannot connect to the new
>> servers. Perhaps this is what you are seeing?
>>
>> Patrick
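
The rule Patrick describes above can be sketched as a simple comparison. This
is an illustration only, not the NIOServerCnxn source; the class and method
names are made up, and the message text simply copies the log line from this
thread.

```java
/** Illustrative model of the "client is ahead of the server" check. */
class ZxidCheck {
    /** A server may accept a client only if it is at, or ahead of, the client's zxid. */
    static boolean serverCanAccept(long clientZxid, long serverZxid) {
        return clientZxid <= serverZxid;
    }

    /** Builds the same wording as the log line quoted in this thread. */
    static String describe(long clientZxid, long serverZxid) {
        if (serverCanAccept(clientZxid, serverZxid)) {
            return "accept";
        }
        return "Client has seen zxid 0x" + Long.toHexString(clientZxid)
                + " our last zxid is 0x" + Long.toHexString(serverZxid);
    }

    /** Index of the first server the client could connect to, or -1 if none qualifies. */
    static int firstAcceptable(long clientZxid, long[] serverZxids) {
        for (int i = 0; i < serverZxids.length; i++) {
            if (serverCanAccept(clientZxid, serverZxids[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

With the values from the log, describe(0x10, 0x4) reproduces the message. A
zombie client that has seen zxid 0x10 finds no acceptable server among freshly
started ones, which would match the repeating errors.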
>>
>> Nitay wrote:
>>
>>> I see. That helps. However, even as warnings, these go on seemingly
>>> endlessly. Why do they not get fixed by themselves? What are we doing
>>> wrong here?
>>>
>>> On Tue, Jun 2, 2009 at 2:24 PM, Mahadev Konar wrote:
>>>
>>>> Hi Nitay,
>>>> This is not an error but should be a warning. I have opened up a jira
>>>> for it.
>>>>
>>>> http://issues.apache.org/jira/browse/ZOOKEEPER-428
>>>>
>>>> The message just says that a client is connecting to a server that is
>>>> behind the server it was connected to earlier. The log level should be
>>>> warn and not error, and this should be fixed in the next release.
>>>>
>>>> mahadev
>>>>

Re: Errors during shutdown/startup of ZooKeeper

2009-06-02 Thread Nitay
I see. That helps. However, even as warnings, these go on seemingly
endlessly. Why do they not get fixed by themselves? What are we doing wrong
here?

On Tue, Jun 2, 2009 at 2:24 PM, Mahadev Konar wrote:

> Hi Nitay,
> This is not an error but should be a warning. I have opened up a jira for
> it.
>
> http://issues.apache.org/jira/browse/ZOOKEEPER-428
>
>
> The message just says that a client is connecting to a server that is
> behind the server it was connected to earlier. The log level should be warn
> and not error, and this should be fixed in the next release.
>
> mahadev
>


Errors during shutdown/startup of ZooKeeper

2009-06-02 Thread Nitay
Hey guys,

We are getting a lot of messages like this in HBase:

[junit] 2009-06-02 11:57:23,658 ERROR [NIOServerCxn.Factory:21810]
server.NIOServerCnxn(514): Client has seen zxid 0xe our last zxid is 0xd

For more context, the block it usually appears in is:

[junit] 2009-06-02 13:27:54,083 INFO  [main-SendThread]
zookeeper.ClientCnxn$SendThread(737): Priming connection to
java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1%0:56511
remote=localhost/0:0:0:0:0:0:0:1:21810]
[junit] 2009-06-02 13:27:54,084 INFO  [main-SendThread]
zookeeper.ClientCnxn$SendThread(889): Server connection successful
[junit] 2009-06-02 13:27:54,093 INFO  [NIOServerCxn.Factory:21810]
server.NIOServerCnxn(532): Connected to /0:0:0:0:0:0:0:1%0:56511 lastZxid 16
[junit] 2009-06-02 13:27:54,094 ERROR [NIOServerCxn.Factory:21810]
server.NIOServerCnxn(543): Client has seen zxid 0x10 our last zxid is 0x4
[junit] 2009-06-02 13:27:54,094 WARN  [NIOServerCxn.Factory:21810]
server.NIOServerCnxn(444): Exception causing close of session 0x0 due to
java.io.IOException: Client has seen zxid 0x10 our last zxid is 0x4
[junit] 2009-06-02 13:27:54,094 DEBUG [NIOServerCxn.Factory:21810]
server.NIOServerCnxn(447): IOException stack trace
[junit] java.io.IOException: Client has seen zxid 0x10 our last zxid is
0x4
[junit] at
org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:544)
[junit] at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:331)
[junit] at
org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:176)
[junit] 2009-06-02 13:27:54,094 INFO  [NIOServerCxn.Factory:21810]
server.NIOServerCnxn(777): closing session:0x0 NIOServerCnxn:
java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1%0:21810
remote=/0:0:0:0:0:0:0:1%0:56511]
[junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
zookeeper.ClientCnxn$SendThread(919): Exception closing session
0x121a2a7c43a0002 to sun.nio.ch.selectionkeyi...@2c662b4e
[junit] java.io.IOException: Read error rc = -1
java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
[junit] at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653)
[junit] at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
[junit] 2009-06-02 13:27:54,097 WARN  [main-SendThread]
zookeeper.ClientCnxn$SendThread(953): Ignoring exception during shutdown
input
[junit] java.net.SocketException: Socket is not connected
[junit] at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
[junit] at
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
[junit] at
sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
[junit] at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:951)
[junit] at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)


This happens in a seemingly endless loop. We are not quite sure what it
means. Can someone help shed some light on these messages?

Thanks,
-n


Re: problems on EC2?

2009-04-14 Thread Nitay
Yes, we are. We currently don't handle SessionExpired very well at all in
HBase. There are two things going on in parallel to fix it:

1) Reinitialize the ZooKeeper handler (and everything else that depends on
it) on the node in question when a SessionExpired event occurs.
2) Reduce the number of SessionExpired events we get by using Joey's JNI
solution. After the various talks about session timeout, different GC flags,
etc, we decided to pursue the JNI solution. We plan on contributing his work
back to ZooKeeper, under some contrib, so that others can use it.
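
Item 1 above, reduced to its core, might look like the sketch below. It is not
HBase code: SessionManager, SessionEvent and the handle factory are
hypothetical names, and the handle type is left generic so the example stands
alone without the ZooKeeper jar.

```java
import java.util.function.Supplier;

/** Simplified session events, named for this sketch only. */
enum SessionEvent { CONNECTED, DISCONNECTED, EXPIRED }

/** Holds the current handle and replaces it when the session expires. */
class SessionManager<H> {
    private final Supplier<H> factory; // assumed to create a fresh handle
    private H handle;
    private int generation = 1;

    SessionManager(Supplier<H> factory) {
        this.factory = factory;
        this.handle = factory.get();
    }

    /** Only EXPIRED forces a new handle; a plain disconnect keeps the old one. */
    synchronized void onEvent(SessionEvent event) {
        if (event == SessionEvent.EXPIRED) {
            handle = factory.get();
            generation++; // dependents can compare generations to know they must re-register
        }
    }

    synchronized H handle() {
        return handle;
    }

    synchronized int generation() {
        return generation;
    }
}
```

Everything that depends on the handle (watches, ephemeral nodes) would still
have to be set up again after the swap; the generation counter is just one
possible way to signal that.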

In the really short term, for folks that are seeing it, using the concurrent
GC and bumping up the session timeout to 30 seconds or so seems to reduce
the frequency of the problem.

I'm curious whether your problems are the same as ours. Try tweaking the GC
parameters and session timeout; if that reduces the expirations, you are
probably hitting the same thing.

Cheers,
-n

On Tue, Apr 14, 2009 at 6:34 PM, Ted Dunning wrote:

> Very good pointer.  Thanks.
>
> Are you still having your problems?
>
> On Tue, Apr 14, 2009 at 6:09 PM, Nitay wrote:
>
> > Hi Ted,
> >
> > Fellow user coming from HBase. We were recently seeing lots of
> > SessionExpired events as well. Check out this mail thread:
> >
> > http://markmail.org/search/?q=SessionExpired#query:SessionExpired+page:1+mid:gt4c2kn4n4f5s5kw+state:results
> >
> > Perhaps this might have something to do with what you're seeing.
> >
> > Cheers,
> > -n
> >
> > On Tue, Apr 14, 2009 at 5:48 PM, Ted Dunning wrote:
> >
> > > We have been using EC2 as a substrate for our search cluster with
> > > zookeeper as our coordination layer and have been seeing some strange
> > > problems.
> > >
> > > These problems seem to manifest around getting lots of anomalous
> > > disconnects and session expirations even though we have the timeout
> > > values set to 2 seconds on the server side and 5 seconds on the client
> > > side.
> > >
> > > Has anybody else been seeing this?
> > >
> > > Is this related to clock jumps in a virtualized setting?
> > >
> > > On a related note, what is best practice for handling session
> > > expiration? Just deal with it as if it is a new start?
> > >
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
> 111 West Evelyn Ave. Ste. 202
> Sunnyvale, CA 94086
> www.deepdyve.com
> 858-414-0013 (m)
> 408-773-0220 (fax)
>


Re: problems on EC2?

2009-04-14 Thread Nitay
Hi Ted,

Fellow user coming from HBase. We were recently seeing lots of
SessionExpired events as well. Check out this mail thread:

http://markmail.org/search/?q=SessionExpired#query:SessionExpired+page:1+mid:gt4c2kn4n4f5s5kw+state:results

Perhaps this might have something to do with what you're seeing.

Cheers,
-n

On Tue, Apr 14, 2009 at 5:48 PM, Ted Dunning wrote:

> We have been using EC2 as a substrate for our search cluster with zookeeper
> as our coordination layer and have been seeing some strange problems.
>
> These problems seem to manifest around getting lots of anomalous
> disconnects
> and session expirations even though we have the timeout values set to 2
> seconds on the server side and 5 seconds on the client side.
>
> Has anybody else been seeing this?
>
> Is this related to clock jumps in a virtualized setting?
>
> On a related note, what is best practice for handling session expiration?
> Just deal with it as if it is a new start?
>


Re: Semantics of ConnectionLoss exception

2009-03-26 Thread Nitay
Why is it done that way? How am I supposed to reliably detect that my
ephemeral nodes are gone? Why not deliver the Session Expired event on the
client side after the right time has passed without communication to any
server?
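
As the quoted reply below explains, the client is only told about expiration
once it reconnects. An application that wants an earlier hint could keep its
own timer, roughly as sketched here. This is a local guess layered on top of
the client, not something ZooKeeper provides; DisconnectWatchdog and its
methods are hypothetical, and only the servers actually decide when a session
has expired.

```java
/** Application-side guess at whether the session may already have expired. */
class DisconnectWatchdog {
    private final long sessionTimeoutMs;
    private long disconnectedAtMs = -1; // -1 means "currently connected"

    DisconnectWatchdog(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Record the time of the first disconnect; repeated calls keep the earliest time. */
    synchronized void onDisconnected(long nowMs) {
        if (disconnectedAtMs < 0) {
            disconnectedAtMs = nowMs;
        }
    }

    /** A successful reconnect clears the timer. */
    synchronized void onConnected() {
        disconnectedAtMs = -1;
    }

    /** True once the client has been disconnected for at least the session timeout. */
    synchronized boolean maybeExpired(long nowMs) {
        return disconnectedAtMs >= 0 && nowMs - disconnectedAtMs >= sessionTimeoutMs;
    }
}
```

When maybeExpired returns true, the application can treat its ephemeral nodes
as possibly gone and stop acting on them, while still waiting for the real
session expired notification after reconnecting.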

On Thu, Mar 26, 2009 at 10:58 AM, Mahadev Konar wrote:

> >
> > Isn't it the case that the client won't get session expired until it's
> > able to connect to a server, right? So what might happen is that the
> > client loses connection to the server, the server eventually expires the
> > client and deletes ephemerals (notifying all watchers) but the client
> > won't see the "session expiration" until it is able to reconnect to one
> > of the servers. ie the client doesn't know it's been expired until it's
> > able to reconnect to the cluster, at which point it's notified that it's
> > been expired.
> You are right pat!
>
> mahadev
>
> >
> >> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html
> >> Has this information scattered around, but we should put it in the FAQ
> >> specifically.
> >
> > 3.0.1 is a bit old, try this for the latest docs:
> > http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html
> >
> >> - Is the ZooKeeper handle I'm using dead after this event?
> >> Again no. your handle is valid until you get a session expiry event or
> >> you do a zoo_close on your handle.
> >>
> >>
> >> Thanks
> >> mahadev
> >>
> >> On 3/25/09 5:42 PM, "Nitay" wrote:
> >>
> >>> I'm a little unclear about the ConnectionLoss exception as it's
> >>> described in the FAQ and would like some clarification.
> >>>
> >>> From the state diagram, http://wiki.apache.org/hadoop/ZooKeeper/FAQ#1,
> >>> there are three events that cause a ConnectionLoss:
> >>>
> >>> 1) In Connecting state, call close().
> >>> 2) In Connected state, call close().
> >>> 3) In Connected state, get disconnected.
> >>>
> >>> It's the third one I'm unclear about.
> >>>
> >>> - Does this event happening mean my ephemeral nodes will go away?
> >>> - Is the ZooKeeper handle I'm using dead after this event? Meaning
> >>> that, similar to the SessionExpired case, I need to construct a new
> >>> connection handle to ZooKeeper and take care of the restarting myself.
> >>> It seems from the diagram that this should not be the case. Rather,
> >>> seeing as the disconnected event sends the user back to the Connecting
> >>> state, my handle should be fine and the library will keep trying to
> >>> reconnect to ZooKeeper internally? I understand my current operation
> >>> may have failed, what I'm asking about is future operations.
> >>>
> >>> Thanks,
> >>> -n
> >>
>
>


Semantics of ConnectionLoss exception

2009-03-25 Thread Nitay
I'm a little unclear about the ConnectionLoss exception as it's described in
the FAQ and would like some clarification.

From the state diagram, http://wiki.apache.org/hadoop/ZooKeeper/FAQ#1, there
are three events that cause a ConnectionLoss:

1) In Connecting state, call close().
2) In Connected state, call close().
3) In Connected state, get disconnected.

It's the third one I'm unclear about.

- Does this event happening mean my ephemeral nodes will go away?
- Is the ZooKeeper handle I'm using dead after this event? Meaning that,
similar to the SessionExpired case, I need to construct a new connection
handle to ZooKeeper and take care of the restarting myself. It seems from
the diagram that this should not be the case. Rather, seeing as the
disconnected event sends the user back to the Connecting state, my handle
should be fine and the library will keep trying to reconnect to ZooKeeper
internally? I understand my current operation may have failed, what I'm
asking about is future operations.

Thanks,
-n
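
Going by the replies in this thread (the handle stays valid until a session
expiry event or an explicit close, and a disconnect only sends it back to
Connecting), the transitions asked about above can be modelled as below. The
enum and class names are invented for this sketch and are not the client
library's own types.

```java
/** Simplified handle states, following the wording used in this thread. */
enum HandleState { CONNECTING, CONNECTED, CLOSED }

/** Events that can move a handle between states. */
enum HandleEvent { CONNECTED, DISCONNECTED, SESSION_EXPIRED, CLOSE }

class HandleStateMachine {
    /** Returns the state a handle is in after the given event. */
    static HandleState next(HandleState state, HandleEvent event) {
        if (state == HandleState.CLOSED) {
            return HandleState.CLOSED; // a dead handle never comes back
        }
        switch (event) {
            case CONNECTED:
                return HandleState.CONNECTED;
            case DISCONNECTED:
                return HandleState.CONNECTING; // library keeps trying to reconnect
            case SESSION_EXPIRED:
            case CLOSE:
                return HandleState.CLOSED;
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    /** A handle can be used for future operations unless it is closed. */
    static boolean usable(HandleState state) {
        return state != HandleState.CLOSED;
    }
}
```

In this model a ConnectionLoss caused by a disconnect leaves the handle usable
for future operations, while session expiry or close() means a new handle has
to be constructed.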


Re: Testing Zookeeper

2009-02-10 Thread Nitay
Joshua,

There may already be some JIRAs open regarding this, e.g.
https://issues.apache.org/jira/browse/ZOOKEEPER-278. You can assign those to
yourself and attach your stuff there if it fits your issue.

On Tue, Feb 10, 2009 at 11:44 AM, Mahadev Konar wrote:

> Hi Joshua,
> Feel free to open a jira and attach a patch.
>
> Please take a look at how to contribute:
>
> http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
>
> Thanks
> mahadev
>
> On 2/10/09 11:34 AM, "Joshua Tuberville" wrote:
>
> > To test our zookeeper usage we built a utility class using some of the
> > methods in org.apache.zookeeper.test.ClientBase out of the test folder.
> > This allows testing to be done using any framework JUnit4, JUnit5, TestNG,
> > etc.  We would prefer this be in the zookeeper jar.  Should I open a JIRA
> > item and include the class?
> >
> > Thanks,
> > Joshua
>
>