Xid out of order. Got 8 expected 7

2010-05-12 Thread Jordan Zimmerman
We've just started seeing an odd error and are having trouble determining the cause. Xid out of order. Got 8 expected 7 Any hints on what can cause this? Any ideas on how to debug? We're using ZK 3.3.0. The error occurs in ClientCnxn.java line 781 -Jordan

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Mahadev Konar
Hi Jordan, Can you create a jira for this? And attach all the server logs and client logs related to this timeline? How did you start up the servers? Are there some changes you might have made accidentally to the servers? Thanks mahadev On 5/12/10 10:49 AM, Jordan Zimmerman

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Jordan Zimmerman
Sure - if you think it's a bug. We were using Zookeeper without issue. I then refactored a bunch of code and this new behavior started. I'm starting ZK using zkServer start and haven't made any changes to the code at all. I'll get the logs together and post a JIRA. -JZ On May 12, 2010, at

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
Hi Jordan, you've seen this once or frequently? (having the server + client logs will help a lot) Patrick On 05/12/2010 11:08 AM, Jordan Zimmerman wrote: Sure - if you think it's a bug. We were using Zookeeper without issue. I then refactored a bunch of code and this new behavior started. I'm

Re: New ZooKeeper client library Cages

2010-05-12 Thread Dominic Williams
Hi Patrick, Internally, ZkMultiLock constructs single path ZkReadLock and ZkWriteLock objects to handle the lock paths you add to it. These work in a similar way to that described in the ZooKeeper recipes. If you only add a single lock path to ZkMultiLock, then when you try and acquire() it

Re: New ZooKeeper client library Cages

2010-05-12 Thread Dominic Williams
Hi Mahadev, Thanks for your interest. We currently use Cages to apply locking where necessary in our operations against the Cassandra database, to manage node membership of an in-house clustered distributed platform called Starburst, and soon for maintaining centralized configuration for those

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Jordan Zimmerman
Apologies... I thought I was running 3.3.0 server, but was running 3.2.2 server with 3.3.0 client. I upgraded the server and now all works again. Sorry to trouble y'all. -Jordan On May 12, 2010, at 11:11 AM, Patrick Hunt wrote: Hi Jordan, you've seen this once or frequently? (having the

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
I'm still interested though... Are you using the new getChildren api that was added to the client in 3.3.0? (it provides a Stat object on return, the old getChildren did not). While we don't officially support 3.3.0 client with 3.2.2 server (we do support the other way around), there shouldn't

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
I think that explains it then - the server is probably dropping the new (3.3.0) getChildren message (xid 7) as it (3.2.2 server) doesn't know about that message type. Then the server responds to the client for a subsequent operation (xid 8), and at that point the client notices that
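The mechanism Patrick describes can be illustrated with a small simulation. This is only a sketch, not ZooKeeper's actual client code: the `Client` class, the `old_server` function, and the op name `getChildren2` are hypothetical stand-ins, and it assumes the client answers requests strictly in the order it sent them.

```python
from collections import deque


class XidOutOfOrder(Exception):
    """Raised when a reply's xid does not match the oldest pending request."""


class Client:
    """Toy client that numbers requests and expects replies in send order."""

    def __init__(self):
        self.next_xid = 1
        self.pending = deque()  # xids sent but not yet answered

    def send(self, op):
        xid = self.next_xid
        self.next_xid += 1
        self.pending.append(xid)
        return xid, op

    def on_reply(self, xid):
        expected = self.pending.popleft()
        if xid != expected:
            raise XidOutOfOrder(
                f"Xid out of order. Got {xid} expected {expected}"
            )


def old_server(requests, known_ops):
    """Toy older server: silently drops requests whose op it does not know."""
    return [xid for xid, op in requests if op in known_ops]


def simulate():
    """Send six known ops, one unknown op (xid 7), then a known op (xid 8)."""
    client = Client()
    requests = [client.send("exists") for _ in range(6)]
    requests.append(client.send("getChildren2"))  # unknown to the old server
    requests.append(client.send("exists"))
    try:
        for xid in old_server(requests, known_ops={"exists"}):
            client.on_reply(xid)
    except XidOutOfOrder as err:
        return str(err)
    return None
```

In this sketch, `simulate()` returns the same message as the thread subject, because the reply for xid 8 arrives while the client is still waiting on xid 7.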

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Benjamin Reed
is this a bug? shouldn't we be returning an error. ben On 05/12/2010 11:34 AM, Patrick Hunt wrote: I think that explains it then - the server is probably dropping the new (3.3.0) getChildren message (xid 7) as it (3.2.2 server) doesn't know about that message type. Then the server responds to

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Jordan Zimmerman
Technically, there is an error generated. IMO - a more descriptive error would be helpful. -JZ On May 12, 2010, at 11:41 AM, Benjamin Reed wrote: is this a bug? shouldn't we be returning an error. ben On 05/12/2010 11:34 AM, Patrick Hunt wrote: I think that explains it then - the

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
You're right. Ben, would you mind entering a JIRA? Patrick On 05/12/2010 11:41 AM, Benjamin Reed wrote: is this a bug? shouldn't we be returning an error. ben On 05/12/2010 11:34 AM, Patrick Hunt wrote: I think that explains it then - the server is probably dropping the new (3.3.0)

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
I think Ben meant that the unknown operation itself (from server perspective) should result in an error directly on both client and server. Patrick On 05/12/2010 11:45 AM, Jordan Zimmerman wrote: Technically, there is an error generated. IMO - a more descriptive error would be helpful. -JZ
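One way to picture the behavior Ben is asking for is a server-side dispatch that always replies, even to an op it does not recognize. This is a hypothetical sketch, not the ZooKeeper server: the `handle` function, the reply dictionary shape, and the `UNIMPLEMENTED` value used here are assumptions made for illustration.

```python
# Assumed error code for this sketch; treat the exact value as illustrative.
UNIMPLEMENTED = -6
OK = 0


def handle(request, handlers):
    """Reply to every request, using an error reply for unknown op types."""
    xid, op = request
    handler = handlers.get(op)
    if handler is None:
        # Unknown op: answer with an error instead of dropping the request,
        # so the client still receives a reply carrying the xid it expects.
        return {"xid": xid, "err": UNIMPLEMENTED, "result": None}
    return {"xid": xid, "err": OK, "result": handler()}
```

With a dispatch like this, the client would see an explicit error on xid 7 rather than discovering the problem indirectly when the reply for xid 8 shows up.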

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Jordan Zimmerman
So, I'm off the Jira hook then? -JZ On May 12, 2010, at 11:49 AM, Patrick Hunt wrote: You're right. Ben, would you mind entering a JIRA? Patrick On 05/12/2010 11:41 AM, Benjamin Reed wrote: is this a bug? shouldn't we be returning an error. ben On 05/12/2010 11:34 AM, Patrick Hunt

Re: Xid out of order. Got 8 expected 7

2010-05-12 Thread Patrick Hunt
Hm, if you don't mind, please enter that jira; I would still like to verify by looking at the logs. Patrick On 05/12/2010 11:52 AM, Jordan Zimmerman wrote: So, I'm off the Jira hook then? -JZ On May 12, 2010, at 11:49 AM, Patrick Hunt wrote: You're right. Ben, would you mind entering a JIRA?

Re: New ZooKeeper client library Cages

2010-05-12 Thread Dominic Williams
Anyone interested in using Cages and ZooKeeper with NoSQL databases might like this new blog post: http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/ On 12 May 2010 00:02, Dominic Williams thedwilli...@googlemail.com wrote: Anyone looking for a Java client

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Aaron Crow
I may have a better idea of what caused the trouble. I way, WAY underestimated the number of nodes we collect over time. Right now we're at 1.9 million. This isn't a bug of our application; it's actually a feature (but perhaps an ill-conceived one). A most recent snapshot from a Zookeeper db is

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Ted Dunning
Impressive number here, especially at your quoted few per second rate. Are you sure that you haven't inadvertently synchronized GC on multiple machines? On Wed, May 12, 2010 at 8:30 PM, Aaron Crow dirtyvagab...@yahoo.com wrote: Right now we're at 1.9 million. This isn't a bug of our

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Aaron Crow
Hi Ted, yeah it's a big number, eh? We're essentially using Zookeeper to track the state of cache entries, and currently we don't bound our cache. I didn't realize how many entries we grow to over a long period of time, until I started counting nodes in Zookeeper. But, sorry, I'm not sure what you

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Patrick Hunt
On 05/12/2010 08:30 PM, Aaron Crow wrote: I may have a better idea of what caused the trouble. I way, WAY underestimated the number of nodes we collect over time. Right now we're at 1.9 million. This isn't a bug of our application; it's actually a feature (but perhaps an ill-conceived one). A

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-05-12 Thread Ted Dunning
Yes. That is roughly what I mean. If one server starts a GC, it can effectively go offline. That might pressure the other servers enough that one of them starts a GC. This is unlikely with your GC settings, but you should turn on the verbose GC logging to be sure. On Wed, May 12, 2010 at