Re: Getting a node exists code on a sequence create

2010-11-03 Thread Benjamin Reed

yes, i think you have summarized the problem nicely jeremy.

i'm curious about your reasoning for running servers in standalone mode 
and then merging. can you explain that a bit more?


thanx
ben

On 11/01/2010 04:51 PM, Jeremy Stribling wrote:

I think this is caused by stupid behavior on our application's part, and
the error message just confused me.  Here's what I think is happening.

1) 3 servers are up and accepting data, creating sequential znodes under
/zkrsm.
2) 1 server dies, the other 2 continue creating sequential znodes.
3) The 1st server restarts, but instead of joining the other 2 servers,
it starts an instance by itself, knowing only about the znodes created
before it died.  [This is a bug in our application -- it is supposed to
join the other 2 servers in their cluster.]
4) Another server (#2) dies and restarts, joining the cluster of server
#1.  It knows about more sequential znodes under /zkrsm than server #1.
5) At this point, trying to create a new znode in the #1-#2 cluster
might be problematic, because servers #1 and #2 know about different
sets of znode.  If #1 allocates what it thinks is a new sequential
number for a new znode, it could be one already used by server #2, and
hence a node exists code might be returned.

So, in summary, our application is almost certainly using Zookeeper
wrong.  Sorry to waste time on the list, but maybe this thread can help
someone in the future.

(If this explanation sounds totally off-base though, let me know.  I'm
not 100% certain this is what's happening, but it definitely seems likely.)

Thanks,

Jeremy

On 11/01/2010 02:56 PM, Jeremy Stribling wrote:

Yes, every znode in /zkrsm was created with the sequence flag.

We bring up a cluster of three nodes, though we do it in a slightly
odd manner to support dynamism: each node starts up as a single-node
instance knowing only itself, and then each node is contacted by a
coordinator that kills the ZooKeeperServer object and starts a new
QuorumPeer object using the full list of three servers.  I know this
is weird; perhaps this has something to do with it.

Other than the weird setup behavior, we are just writing a few
sequential records into the system (which all seems to work fine),
killing one of the nodes (one that has been elected leader via the
standard recommended ZK leader election algorithm), restarting it, and
then trying to create more sequential znodes.  I'm guessing this is
pretty well-tested behavior, so there must be something weird or wrong
about the way I have stuff setup.

I'm happy to provide whatever logs or snapshots might help someone
track this down.  Thanks,

Jeremy

On 11/01/2010 02:42 PM, Benjamin Reed wrote:

how were you able to reproduce it?

all the znodes in /zkrsm were created with the sequence flag. right?

ben

On 11/01/2010 02:28 PM, Jeremy Stribling wrote:

We were able to reproduce it.  A stat on all three servers looks
identical:

[zk:ip:port(CONNECTED) 0] stat /zkrsm
cZxid = 9
ctime = Mon Nov 01 13:01:57 PDT 2010
mZxid = 9
mtime = Mon Nov 01 13:01:57 PDT 2010
pZxid = 12884902218
cversion = 177
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 0
numChildren = 177

Creating a sequential node through the command line also fails:

[zk:ip:port(CONNECTED) 1] create -s /zkrsm/_record
testdata
Node already exists: /zkrsm/_record

One potentially interesting thing is that numChildren above is 177,
though I have sequence numbers on that record prefix up to 214 or so.
There seem to be some gaps though -- I thin ls /zkrsm only shows
about
177.  Not sure if that's relevant or not.

Thanks,

Jeremy

On 11/01/2010 12:06 PM, Jeremy Stribling wrote:

Thanks for the reply.  It happened every time we called create, not
just once.  More than that, we tried restarting each of the nodes in
the system (one-by-one), including the new master, and the problem
continued.

Unfortunately we cleaned everything up, and it's not in that state
anymore.  We haven't yet tried to reproduce, but I will try and report
back if I can get any cversion info.

Jeremy

On 11/01/2010 11:33 AM, Patrick Hunt wrote:

Hi Jeremy, this sounds like a bug to me, I don't think you should be
getting the nodeexists when the sequence flag is set.

Looking at the code briefly we use the parent's cversion
(incremented each time the child list is changed, added/removed).

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short
period
of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Striblingst...@nicira.com
wrote:

HI everyone,

Is there any situation in which creating a new ZK node with the
SEQUENCE
flag should result in a 

Re: Getting a node exists code on a sequence create

2010-11-03 Thread Jeremy Stribling
We're building a product for users that aren't used to deploying 
distributed systems, and we're trying to make it as easy to configure 
and use as possible.  That means not requiring the full list of IP 
addresses for every node at configuration time; instead, each node can 
be configured with a single IP of an existing node and the cluster can 
grow dynamically as new nodes come up.  Furthermore, if a new node needs 
to be added to the cluster, it can be done without changing the 
configurations of the existing nodes.


As you might be able to guess, we're really hoping there's progress in 
the future on a version of Zookeeper that can properly support dynamic 
joins, but for now this approach seems to work ok for us.


Jeremy

On 11/03/2010 11:37 AM, Benjamin Reed wrote:

yes, i think you have summarized the problem nicely jeremy.

i'm curious about your reasoning for running servers in standalone 
mode and then merging. can you explain that a bit more?


thanx
ben

On 11/01/2010 04:51 PM, Jeremy Stribling wrote:

I think this is caused by stupid behavior on our application's part, and
the error message just confused me.  Here's what I think is happening.

1) 3 servers are up and accepting data, creating sequential znodes under
/zkrsm.
2) 1 server dies, the other 2 continue creating sequential znodes.
3) The 1st server restarts, but instead of joining the other 2 servers,
it starts an instance by itself, knowing only about the znodes created
before it died.  [This is a bug in our application -- it is supposed to
join the other 2 servers in their cluster.]
4) Another server (#2) dies and restarts, joining the cluster of server
#1.  It knows about more sequential znodes under /zkrsm than server #1.
5) At this point, trying to create a new znode in the #1-#2 cluster
might be problematic, because servers #1 and #2 know about different
sets of znode.  If #1 allocates what it thinks is a new sequential
number for a new znode, it could be one already used by server #2, and
hence a node exists code might be returned.

So, in summary, our application is almost certainly using Zookeeper
wrong.  Sorry to waste time on the list, but maybe this thread can help
someone in the future.

(If this explanation sounds totally off-base though, let me know.  I'm
not 100% certain this is what's happening, but it definitely seems 
likely.)


Thanks,

Jeremy

On 11/01/2010 02:56 PM, Jeremy Stribling wrote:

Yes, every znode in /zkrsm was created with the sequence flag.

We bring up a cluster of three nodes, though we do it in a slightly
odd manner to support dynamism: each node starts up as a single-node
instance knowing only itself, and then each node is contacted by a
coordinator that kills the ZooKeeperServer object and starts a new
QuorumPeer object using the full list of three servers.  I know this
is weird; perhaps this has something to do with it.

Other than the weird setup behavior, we are just writing a few
sequential records into the system (which all seems to work fine),
killing one of the nodes (one that has been elected leader via the
standard recommended ZK leader election algorithm), restarting it, and
then trying to create more sequential znodes.  I'm guessing this is
pretty well-tested behavior, so there must be something weird or wrong
about the way I have stuff setup.

I'm happy to provide whatever logs or snapshots might help someone
track this down.  Thanks,

Jeremy

On 11/01/2010 02:42 PM, Benjamin Reed wrote:

how were you able to reproduce it?

all the znodes in /zkrsm were created with the sequence flag. right?

ben

On 11/01/2010 02:28 PM, Jeremy Stribling wrote:

We were able to reproduce it.  A stat on all three servers looks
identical:

[zk:ip:port(CONNECTED) 0] stat /zkrsm
cZxid = 9
ctime = Mon Nov 01 13:01:57 PDT 2010
mZxid = 9
mtime = Mon Nov 01 13:01:57 PDT 2010
pZxid = 12884902218
cversion = 177
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 0
numChildren = 177

Creating a sequential node through the command line also fails:

[zk:ip:port(CONNECTED) 1] create -s /zkrsm/_record
testdata
Node already exists: /zkrsm/_record

One potentially interesting thing is that numChildren above is 177,
though I have sequence numbers on that record prefix up to 214 or so.
There seem to be some gaps though -- I thin ls /zkrsm only shows
about
177.  Not sure if that's relevant or not.

Thanks,

Jeremy

On 11/01/2010 12:06 PM, Jeremy Stribling wrote:

Thanks for the reply.  It happened every time we called create, not
just once.  More than that, we tried restarting each of the nodes in
the system (one-by-one), including the new master, and the problem
continued.

Unfortunately we cleaned everything up, and it's not in that state
anymore.  We haven't yet tried to reproduce, but I will try and 
report

back if I can get any cversion info.

Jeremy

On 11/01/2010 11:33 AM, Patrick Hunt wrote:
Hi Jeremy, this sounds like a bug to me, I don't think you 
should be


Re: Getting a node exists code on a sequence create

2010-11-01 Thread Patrick Hunt
Hi Jeremy, this sounds like a bug to me, I don't think you should be
getting the nodeexists when the sequence flag is set.

Looking at the code briefly we use the parent's cversion
(incremented each time the child list is changed, added/removed).

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short period
of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Stribling st...@nicira.com wrote:
 HI everyone,

 Is there any situation in which creating a new ZK node with the SEQUENCE
 flag should result in a node exists error?  I'm seeing this happening
 after a failure of a ZK node that appeared to have been the master; when the
 new master takes over, my app is unable to create a new SEQUENCE node under
 an existing parent node.  I'm using Zookeeper 3.2.2.

 Here's a representative log snippet:

 --
 3050756 [ProcessThread:-1] TRACE
 org.apache.zookeeper.server.PrepRequestProcessor  -
 :Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
 zxid:0xfffe txntype:unknown /zkrsm/_record
 3050756 [ProcessThread:-1] WARN
 org.apache.zookeeper.server.PrepRequestProcessor  - Got exception when
 processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
 zxid:0xfffe txntype:unknown n/a
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
 NodeExists
        at
 org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
        at
 org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
 3050756 [ProcessThread:-1] DEBUG
 org.apache.zookeeper.server.quorum.CommitProcessor  - Processing request::
 sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050756 [ProcessThread:-1] DEBUG org.apache.zookeeper.server.quorum.Leader
  - Proposing:: sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
 zxid:0x5027e txntype:-1 n/a
 3050756 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
 Ack zxid: 0x5027e
 3050757 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
 outstanding proposal: 0x5027e
 3050757 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
 outstanding proposals all
 3050757 [SyncThread:0] DEBUG org.apache.zookeeper.server.quorum.Leader  -
 Count for zxid: 0x5027e is 1
 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
 org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
 org.apache.zookeeper.server.quorum.Leader  - outstanding proposal:
 0x5027e
 3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
 org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
 org.apache.zookeeper.server.quorum.Leader  - Count for zxid: 0x5027e is
 2
 3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
 org.apache.zookeeper.server.quorum.CommitProcessor  - Committing request::
 sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050757 [CommitProcessor:0] DEBUG
 org.apache.zookeeper.server.FinalRequestProcessor  - Processing request::
 sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050757 [CommitProcessor:0] TRACE
 org.apache.zookeeper.server.FinalRequestProcessor  -
 :Esessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
 txntype:-1 n/a
 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
 org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
 3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
 org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
 3050757 [FollowerHandler-/172.16.0.28:41062] DEBUG
 org.apache.zookeeper.server.quorum.Leader  - outstanding is 0
 --

 I'm still a n00b at understanding ZK log messages, so maybe there's
 something obvious going on.  I looked in the JIRA and did my best to search
 the mailing list archives, but couldn't find anything related to this.  Any
 ideas?  Thanks very much,

 Jeremy



Re: Getting a node exists code on a sequence create

2010-11-01 Thread Jeremy Stribling
Thanks for the reply.  It happened every time we called create, not just 
once.  More than that, we tried restarting each of the nodes in the 
system (one-by-one), including the new master, and the problem continued.


Unfortunately we cleaned everything up, and it's not in that state 
anymore.  We haven't yet tried to reproduce, but I will try and report 
back if I can get any cversion info.


Jeremy

On 11/01/2010 11:33 AM, Patrick Hunt wrote:

Hi Jeremy, this sounds like a bug to me, I don't think you should be
getting the nodeexists when the sequence flag is set.

Looking at the code briefly we use the parent's cversion
(incremented each time the child list is changed, added/removed).

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short period
of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Striblingst...@nicira.com  wrote:
   

HI everyone,

Is there any situation in which creating a new ZK node with the SEQUENCE
flag should result in a node exists error?  I'm seeing this happening
after a failure of a ZK node that appeared to have been the master; when the
new master takes over, my app is unable to create a new SEQUENCE node under
an existing parent node.  I'm using Zookeeper 3.2.2.

Here's a representative log snippet:

--
3050756 [ProcessThread:-1] TRACE
org.apache.zookeeper.server.PrepRequestProcessor  -
:Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0xfffe txntype:unknown /zkrsm/_record
3050756 [ProcessThread:-1] WARN
org.apache.zookeeper.server.PrepRequestProcessor  - Got exception when
processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0xfffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists
at
org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
at
org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
3050756 [ProcessThread:-1] DEBUG
org.apache.zookeeper.server.quorum.CommitProcessor  - Processing request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
txntype:-1 n/a
3050756 [ProcessThread:-1] DEBUG org.apache.zookeeper.server.quorum.Leader
  - Proposing:: sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0x5027e txntype:-1 n/a
3050756 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
Ack zxid: 0x5027e
3050757 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
outstanding proposal: 0x5027e
3050757 [SyncThread:0] TRACE org.apache.zookeeper.server.quorum.Leader  -
outstanding proposals all
3050757 [SyncThread:0] DEBUG org.apache.zookeeper.server.quorum.Leader  -
Count for zxid: 0x5027e is 1
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposal:
0x5027e
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
org.apache.zookeeper.server.quorum.Leader  - Count for zxid: 0x5027e is
2
3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
org.apache.zookeeper.server.quorum.CommitProcessor  - Committing request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
txntype:-1 n/a
3050757 [CommitProcessor:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor  - Processing request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
txntype:-1 n/a
3050757 [CommitProcessor:0] TRACE
org.apache.zookeeper.server.FinalRequestProcessor  -
:Esessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 zxid:0x5027e
txntype:-1 n/a
3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
3050757 [FollowerHandler-/172.16.0.28:41062] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
3050757 [FollowerHandler-/172.16.0.28:41062] DEBUG
org.apache.zookeeper.server.quorum.Leader  - outstanding is 0
--

I'm still a n00b at understanding ZK log messages, so maybe there's
something obvious going on.  I looked in the JIRA and did my best to search
the mailing list archives, but couldn't find anything related to this.  Any
ideas?  Thanks very much,

Jeremy

 


Re: Getting a node exists code on a sequence create

2010-11-01 Thread Jeremy Stribling
We were able to reproduce it.  A stat on all three servers looks 
identical:


[zk: ip:port(CONNECTED) 0] stat /zkrsm
cZxid = 9
ctime = Mon Nov 01 13:01:57 PDT 2010
mZxid = 9
mtime = Mon Nov 01 13:01:57 PDT 2010
pZxid = 12884902218
cversion = 177
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 0
numChildren = 177

Creating a sequential node through the command line also fails:

[zk: ip:port(CONNECTED) 1] create -s /zkrsm/_record 
testdata

Node already exists: /zkrsm/_record

One potentially interesting thing is that numChildren above is 177, 
though I have sequence numbers on that record prefix up to 214 or so.  
There seem to be some gaps though -- I thin ls /zkrsm only shows about 
177.  Not sure if that's relevant or not.


Thanks,

Jeremy

On 11/01/2010 12:06 PM, Jeremy Stribling wrote:
Thanks for the reply.  It happened every time we called create, not 
just once.  More than that, we tried restarting each of the nodes in 
the system (one-by-one), including the new master, and the problem 
continued.


Unfortunately we cleaned everything up, and it's not in that state 
anymore.  We haven't yet tried to reproduce, but I will try and report 
back if I can get any cversion info.


Jeremy

On 11/01/2010 11:33 AM, Patrick Hunt wrote:

Hi Jeremy, this sounds like a bug to me, I don't think you should be
getting the nodeexists when the sequence flag is set.

Looking at the code briefly we use the parent's cversion
(incremented each time the child list is changed, added/removed).

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short period
of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Striblingst...@nicira.com  
wrote:

HI everyone,

Is there any situation in which creating a new ZK node with the 
SEQUENCE

flag should result in a node exists error?  I'm seeing this happening
after a failure of a ZK node that appeared to have been the master; 
when the
new master takes over, my app is unable to create a new SEQUENCE 
node under

an existing parent node.  I'm using Zookeeper 3.2.2.

Here's a representative log snippet:

--
3050756 [ProcessThread:-1] TRACE
org.apache.zookeeper.server.PrepRequestProcessor  -
:Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0xfffe txntype:unknown /zkrsm/_record
3050756 [ProcessThread:-1] WARN
org.apache.zookeeper.server.PrepRequestProcessor  - Got exception when
processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0xfffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: 
KeeperErrorCode =

NodeExists
at
org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245) 


at
org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114) 


3050756 [ProcessThread:-1] DEBUG
org.apache.zookeeper.server.quorum.CommitProcessor  - Processing 
request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 
zxid:0x5027e

txntype:-1 n/a
3050756 [ProcessThread:-1] DEBUG 
org.apache.zookeeper.server.quorum.Leader

  - Proposing:: sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0x5027e txntype:-1 n/a
3050756 [SyncThread:0] TRACE 
org.apache.zookeeper.server.quorum.Leader  -

Ack zxid: 0x5027e
3050757 [SyncThread:0] TRACE 
org.apache.zookeeper.server.quorum.Leader  -

outstanding proposal: 0x5027e
3050757 [SyncThread:0] TRACE 
org.apache.zookeeper.server.quorum.Leader  -

outstanding proposals all
3050757 [SyncThread:0] DEBUG 
org.apache.zookeeper.server.quorum.Leader  -

Count for zxid: 0x5027e is 1
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposal:
0x5027e
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
org.apache.zookeeper.server.quorum.Leader  - Count for zxid: 
0x5027e is

2
3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
org.apache.zookeeper.server.quorum.CommitProcessor  - Committing 
request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 
zxid:0x5027e

txntype:-1 n/a
3050757 [CommitProcessor:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor  - Processing 
request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691 
zxid:0x5027e

txntype:-1 n/a
3050757 [CommitProcessor:0] TRACE

Re: Getting a node exists code on a sequence create

2010-11-01 Thread Benjamin Reed

how were you able to reproduce it?

all the znodes in /zkrsm were created with the sequence flag. right?

ben

On 11/01/2010 02:28 PM, Jeremy Stribling wrote:

We were able to reproduce it.  A stat on all three servers looks
identical:

[zk:ip:port(CONNECTED) 0] stat /zkrsm
cZxid = 9
ctime = Mon Nov 01 13:01:57 PDT 2010
mZxid = 9
mtime = Mon Nov 01 13:01:57 PDT 2010
pZxid = 12884902218
cversion = 177
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 0
numChildren = 177

Creating a sequential node through the command line also fails:

[zk:ip:port(CONNECTED) 1] create -s /zkrsm/_record
testdata
Node already exists: /zkrsm/_record

One potentially interesting thing is that numChildren above is 177,
though I have sequence numbers on that record prefix up to 214 or so.
There seem to be some gaps though -- I thin ls /zkrsm only shows about
177.  Not sure if that's relevant or not.

Thanks,

Jeremy

On 11/01/2010 12:06 PM, Jeremy Stribling wrote:

Thanks for the reply.  It happened every time we called create, not
just once.  More than that, we tried restarting each of the nodes in
the system (one-by-one), including the new master, and the problem
continued.

Unfortunately we cleaned everything up, and it's not in that state
anymore.  We haven't yet tried to reproduce, but I will try and report
back if I can get any cversion info.

Jeremy

On 11/01/2010 11:33 AM, Patrick Hunt wrote:

Hi Jeremy, this sounds like a bug to me, I don't think you should be
getting the nodeexists when the sequence flag is set.

Looking at the code briefly we use the parent's cversion
(incremented each time the child list is changed, added/removed).

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short period
of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Striblingst...@nicira.com
wrote:

HI everyone,

Is there any situation in which creating a new ZK node with the
SEQUENCE
flag should result in a node exists error?  I'm seeing this happening
after a failure of a ZK node that appeared to have been the master;
when the
new master takes over, my app is unable to create a new SEQUENCE
node under
an existing parent node.  I'm using Zookeeper 3.2.2.

Here's a representative log snippet:

--
3050756 [ProcessThread:-1] TRACE
org.apache.zookeeper.server.PrepRequestProcessor  -
:Psessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0xfffe txntype:unknown /zkrsm/_record
3050756 [ProcessThread:-1] WARN
org.apache.zookeeper.server.PrepRequestProcessor  - Got exception when
processing sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0xfffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException:
KeeperErrorCode =
NodeExists
 at
org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)

 at
org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)

3050756 [ProcessThread:-1] DEBUG
org.apache.zookeeper.server.quorum.CommitProcessor  - Processing
request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0x5027e
txntype:-1 n/a
3050756 [ProcessThread:-1] DEBUG
org.apache.zookeeper.server.quorum.Leader
   - Proposing:: sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0x5027e txntype:-1 n/a
3050756 [SyncThread:0] TRACE
org.apache.zookeeper.server.quorum.Leader  -
Ack zxid: 0x5027e
3050757 [SyncThread:0] TRACE
org.apache.zookeeper.server.quorum.Leader  -
outstanding proposal: 0x5027e
3050757 [SyncThread:0] TRACE
org.apache.zookeeper.server.quorum.Leader  -
outstanding proposals all
3050757 [SyncThread:0] DEBUG
org.apache.zookeeper.server.quorum.Leader  -
Count for zxid: 0x5027e is 1
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - Ack zxid: 0x5027e
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposal:
0x5027e
3050757 [FollowerHandler-/172.16.0.28:48776] TRACE
org.apache.zookeeper.server.quorum.Leader  - outstanding proposals all
3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
org.apache.zookeeper.server.quorum.Leader  - Count for zxid:
0x5027e is
2
3050757 [FollowerHandler-/172.16.0.28:48776] DEBUG
org.apache.zookeeper.server.quorum.CommitProcessor  - Committing
request::
sessionid:0x12bf518350f0001 type:create cxid:0x4cca0691
zxid:0x5027e
txntype:-1 n/a
3050757 [CommitProcessor:0] DEBUG
org.apache.zookeeper.server.FinalRequestProcessor  - Processing
request::
sessionid:0x12bf518350f0001 type:create 

Re: Getting a node exists code on a sequence create

2010-11-01 Thread Jeremy Stribling
I think this is caused by stupid behavior on our application's part, and 
the error message just confused me.  Here's what I think is happening.


1) 3 servers are up and accepting data, creating sequential znodes under 
/zkrsm.

2) 1 server dies, the other 2 continue creating sequential znodes.
3) The 1st server restarts, but instead of joining the other 2 servers, 
it starts an instance by itself, knowing only about the znodes created 
before it died.  [This is a bug in our application -- it is supposed to 
join the other 2 servers in their cluster.]
4) Another server (#2) dies and restarts, joining the cluster of server 
#1.  It knows about more sequential znodes under /zkrsm than server #1.
5) At this point, trying to create a new znode in the #1-#2 cluster 
might be problematic, because servers #1 and #2 know about different 
sets of znode.  If #1 allocates what it thinks is a new sequential 
number for a new znode, it could be one already used by server #2, and 
hence a node exists code might be returned.


So, in summary, our application is almost certainly using Zookeeper 
wrong.  Sorry to waste time on the list, but maybe this thread can help 
someone in the future.


(If this explanation sounds totally off-base though, let me know.  I'm 
not 100% certain this is what's happening, but it definitely seems likely.)


Thanks,

Jeremy

On 11/01/2010 02:56 PM, Jeremy Stribling wrote:

Yes, every znode in /zkrsm was created with the sequence flag.

We bring up a cluster of three nodes, though we do it in a slightly 
odd manner to support dynamism: each node starts up as a single-node 
instance knowing only itself, and then each node is contacted by a 
coordinator that kills the ZooKeeperServer object and starts a new 
QuorumPeer object using the full list of three servers.  I know this 
is weird; perhaps this has something to do with it.


Other than the weird setup behavior, we are just writing a few 
sequential records into the system (which all seems to work fine), 
killing one of the nodes (one that has been elected leader via the 
standard recommended ZK leader election algorithm), restarting it, and 
then trying to create more sequential znodes.  I'm guessing this is 
pretty well-tested behavior, so there must be something weird or wrong 
about the way I have stuff setup.


I'm happy to provide whatever logs or snapshots might help someone 
track this down.  Thanks,


Jeremy

On 11/01/2010 02:42 PM, Benjamin Reed wrote:

how were you able to reproduce it?

all the znodes in /zkrsm were created with the sequence flag. right?

ben

On 11/01/2010 02:28 PM, Jeremy Stribling wrote:

We were able to reproduce it.  A stat on all three servers looks
identical:

[zk:ip:port(CONNECTED) 0] stat /zkrsm
cZxid = 9
ctime = Mon Nov 01 13:01:57 PDT 2010
mZxid = 9
mtime = Mon Nov 01 13:01:57 PDT 2010
pZxid = 12884902218
cversion = 177
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 0
numChildren = 177

Creating a sequential node through the command line also fails:

[zk:ip:port(CONNECTED) 1] create -s /zkrsm/_record
testdata
Node already exists: /zkrsm/_record

One potentially interesting thing is that numChildren above is 177,
though I have sequence numbers on that record prefix up to 214 or so.
There seem to be some gaps though -- I thin ls /zkrsm only shows 
about

177.  Not sure if that's relevant or not.

Thanks,

Jeremy

On 11/01/2010 12:06 PM, Jeremy Stribling wrote:

Thanks for the reply.  It happened every time we called create, not
just once.  More than that, we tried restarting each of the nodes in
the system (one-by-one), including the new master, and the problem
continued.

Unfortunately we cleaned everything up, and it's not in that state
anymore.  We haven't yet tried to reproduce, but I will try and report
back if I can get any cversion info.

Jeremy

On 11/01/2010 11:33 AM, Patrick Hunt wrote:

Hi Jeremy, this sounds like a bug to me, I don't think you should be
getting the nodeexists when the sequence flag is set.

Looking at the code briefly we use the parent's cversion
(incremented each time the child list is changed, added/removed).

Did you see this error each time you called create, or just once? If
you look at the cversion in the Stat of the znode /zkrsm on each of
the servers what does it show? You can use the java CLI to connect to
each of your servers and access this information. It would be
interesting to see if the data was out of sync only for a short 
period

of time, or forever. Is this repeatable?

Ben/Flavio do you see anything here?

Patrick

On Thu, Oct 28, 2010 at 6:06 PM, Jeremy Striblingst...@nicira.com
wrote:

HI everyone,

Is there any situation in which creating a new ZK node with the
SEQUENCE
flag should result in a node exists error?  I'm seeing this 
happening

after a failure of a ZK node that appeared to have been the master;
when the
new master takes over, my app is unable to create a new SEQUENCE
node under
an existing