Client seeing wrong data on nodeDataChanged
I'm trying to debug an issue that maybe you fellas have some ideas for figuring out. In short: Client 1 updates a znode, setting its content to X, then X again, then Y, and then finally it deletes the znode. Client 1 is watching the znode and I can see that it's getting three nodeDataChanged events and a nodeDeleted. Client 2 is also watching the znode. It gets notified three times: two nodeDataChanged events (only) and a nodeDeleted event. I'd expect three nodeDataChanged events but understand a client might skip states. The problem is that when client 2 looks at the data in the znode on nodeDataChanged, in both cases the data is Y. Not X and then Y, but Y both times. This is unexpected. This is 3.3.1 on a 5-node ensemble. I have full zk logging enabled. Would it help to post the logs? St.Ack
Re: Client seeing wrong data on nodeDataChanged
On Thu, Oct 28, 2010 at 7:32 PM, Ted Dunning ted.dunn...@gmail.com wrote: Client 2 is not guaranteed to see X if it doesn't get to asking before the value has been updated to Y.

Right, but I wouldn't expect the watch to be triggered twice with value Y. Anyway, I think we have a handle on what's going on: at the time of the above incident, the master process was experiencing a flood of zk changes, and our thought is that we're not paying sufficient attention to the order of receipt. Will be back if this is not the issue. Thanks, St.Ack
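For what it's worth, here is a toy model of how a client could end up reading Y on both nodeDataChanged events. It is a sketch only, assuming watches are one-shot and that the client re-arms its watch before its handler gets around to reading the data. `ToyZnode` and `simulate` are made-up names, not the ZooKeeper client API, and this is one possible interleaving rather than a diagnosis of the incident above.

```python
class ToyZnode:
    """In-memory stand-in for a znode with one-shot data watches (hypothetical, not the ZooKeeper API)."""

    def __init__(self, data=""):
        self.data = data
        self.version = 0
        self._watchers = []

    def get_data(self, watcher=None):
        """Return (data, version); optionally leave a one-shot watch behind."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data, self.version

    def set_data(self, data):
        """Store new data, bump the version, and fire each armed watch exactly once."""
        self.data = data
        self.version += 1
        fired, self._watchers = self._watchers, []
        for notify in fired:
            notify("NodeDataChanged")


def simulate():
    node = ToyZnode()
    inbox = []  # events delivered to client 2 but not yet handled

    node.get_data(watcher=inbox.append)  # client 2 arms its watch
    node.set_data("X")                   # event 1 is queued
    node.get_data(watcher=inbox.append)  # watch re-armed, event 1 still unhandled
    node.set_data("X")                   # event 2 is queued
    node.set_data("Y")                   # no watch armed right now, so no event

    # The handler finally runs and reads the *current* data once per event.
    reads = [node.get_data() for _event in inbox]
    return inbox, reads
```

In this model client 2 gets two events for three updates and reads Y with the same version both times. Comparing the version that comes back with each read is one way for a handler to notice that it has already seen that state.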
Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup
We're working on hbase 0.21 being the first hbase that shows up in a maven repo. St.Ack

On Fri, Jan 22, 2010 at 1:12 PM, Mahadev Konar maha...@yahoo-inc.com wrote: Unfortunately no. We are planning to deploy 3.3 as the first version on the maven repo. Thanks mahadev

On 1/22/10 12:58 PM, Ted Dunning ted.dunn...@gmail.com wrote: Is ZK 3.2.2 in a maven repository somewhere?

-- Forwarded message -- From: Drew Farris drew.far...@gmail.com Date: Fri, Jan 22, 2010 at 11:47 AM Subject: Re: [jira] Commented: (MAHOUT-238) Further Dependency Cleanup To: mahout-...@lucene.apache.org

Neither hbase 0.20.2 nor zookeeper (any version) appears to be in a maven repo at this point, so Mahout would have to roll and deploy these. What was the process that was followed to build and deploy the mahout-packaged hadoop 0.20.1 and hbase artifacts? Is this something I could submit a patch to Mahout for, or is it better left for the committers? As Ted pointed out, yes, the release of zk is 3.2.2. Drew

On Thu, Jan 21, 2010 at 5:12 AM, zhao zhendong zhaozhend...@gmail.com wrote: Hi Drew, I propose to 1) update hbase-0.20.0.jar to hbase-0.20.2.jar, since the latter is stable and hbased-platform is based on this version, and 2) add zookeeper-3.2.1.jar. Cheers, Zhendong

On Tue, Jan 19, 2010 at 12:36 PM, zhao zhendong zhaozhend...@gmail.com wrote: Hi Drew, Including source code in the snapshots would be great. Currently, the HDFS reader does not work in 0.20.2. Without source code, it's not convenient for me to debug the code. Cheers, Zhendong

On Sat, Jan 9, 2010 at 12:25 AM, Drew Farris drew.far...@gmail.com wrote: I wonder if we can get the hadoop people to include source jars with their snapshots?

On Fri, Jan 8, 2010 at 11:23 AM, Sean Owen sro...@gmail.com wrote: I need a fix after 0.20.1, that's the primary reason. As a bonus, we don't have to maintain our own version. The downside is relying on a SNAPSHOT, but seems worth it to me.

On Fri, Jan 8, 2010 at 4:02 PM, zhao zhendong zhaozhend...@gmail.com wrote: Thanks Drew, +1 for me to maintain a stable hadoop release, such as 0.20.1. The reason is obvious :) Cheers, Zhendong -- - Zhen-Dong Zhao (Maxim) Department of Computer Science School of Computing National University of Singapore
Asking zk cluster how its configured and whats this expired about?
Hey lads: I want to ask a running zk cluster what its configuration is -- ticktime, session timeout, etc. -- but do not see how. There are the four letter words. Dump and stat do not print what I want. I took a look in logs -- the leader in particular -- and do not see vitals dumped out. Am I missing something?

I was also wondering what this expire stuff in the dump output is about? Here is what I see:

$ echo dump|nc X.X.X.X 2181
SessionTracker dump:
Session Sets (12):
0 expire at Tue Nov 24 20:56:24 UTC 2009:
0 expire at Tue Nov 24 20:56:27 UTC 2009:
0 expire at Tue Nov 24 20:56:30 UTC 2009:
0 expire at Tue Nov 24 20:56:39 UTC 2009:
0 expire at Tue Nov 24 20:56:42 UTC 2009:
0 expire at Tue Nov 24 20:56:45 UTC 2009:
0 expire at Tue Nov 24 20:56:48 UTC 2009:
0 expire at Tue Nov 24 20:57:00 UTC 2009:
0 expire at Tue Nov 24 20:57:03 UTC 2009:
2 expire at Tue Nov 24 20:57:06 UTC 2009:
    82512629887926272
    154570223919497222
2 expire at Tue Nov 24 20:57:09 UTC 2009:
    82512629887926273
    10455035895349254
3 expire at Tue Nov 24 20:57:21 UTC 2009:
    154570223919497216
    10455035895349248
    154570223919497221
ephemeral nodes dump:
Sessions with Ephemerals (4):
0x2524ccbca8: /hbase/rs/1259042878053
0x12524ccb9f4: /hbase/rs/1259042878032
0x12524ccb9f40001: /hbase/rs/1259042878106
0x22524ccb993: /hbase/master

Thanks, St.Ack
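The `echo dump|nc` call above can also be scripted. A minimal sketch, assuming the server answers a four letter word on its client port and then closes the connection; the function name `four_letter_word` is made up for illustration, and the host and port are whatever your ensemble uses.

```python
import socket


def four_letter_word(host, port, word, timeout=5.0):
    """Send a four letter word (e.g. "stat" or "dump") and return the reply as text."""
    if len(word) != 4:
        raise ValueError("expected a four letter word, got %r" % word)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(word.encode("ascii"))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closes the connection once it has replied
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Hypothetical usage against a live server:
#   print(four_letter_word("X.X.X.X", 2181, "dump"))
```

This only fetches what dump and stat already print, so it does not answer the configuration question by itself.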
Re: Asking zk cluster how its configured and whats this expired about?
On Tue, Nov 24, 2009 at 1:33 PM, Patrick Hunt ph...@apache.org wrote: We can definitely add this, please create a JIRA.

ZOOKEEPER-595

I was also wondering what this expire stuff in the dump output is about?

Those are the expiration sets, or buckets. Each client session is put into a bucket based on when we last heard from it and its timeout. The leader uses this to determine when to expire sessions. Unfortunately the session ids are being printed in decimal; this is fixed in 3.3.0. Good find, actually this would be useful information for you to monitor in determining which of your hbase clients are falling behind wrt heartbeating.

Well, are the items listed under '2 expire at Tue Nov 24 20:57:06 UTC 2009' items that have expired or, rather, just a logging of when they will expire? Looking in the logs I do not see sessions expiring. Thanks, St.Ack
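Since the session sets print session ids in decimal while the ephemerals section prints them in hex, a small script can line the two up. This is a rough sketch, assuming each bucket header is followed by one decimal id per line; `parse_session_sets` and `SAMPLE` are illustrative names, and the sample is a trimmed copy of the dump output quoted earlier in this thread.

```python
import re

# Matches bucket headers such as "2 expire at Tue Nov 24 20:57:06 UTC 2009:"
BUCKET = re.compile(r"^(\d+) expire at (.+):$")


def parse_session_sets(dump_text):
    """Return [(expiry_label, [session ids as hex strings]), ...] from a 'dump' reply."""
    buckets = []
    for raw in dump_text.splitlines():
        line = raw.strip()
        match = BUCKET.match(line)
        if match:
            buckets.append((match.group(2), []))
        elif line.isdigit() and buckets:
            # Assumed layout: a decimal session id belonging to the latest bucket.
            buckets[-1][1].append(hex(int(line)))
        elif line.startswith("ephemeral nodes dump"):
            break  # the session sets section is over
    return buckets


SAMPLE = """\
SessionTracker dump:
Session Sets (12):
0 expire at Tue Nov 24 20:57:03 UTC 2009:
2 expire at Tue Nov 24 20:57:06 UTC 2009:
    82512629887926272
    154570223919497222
ephemeral nodes dump:
"""

if __name__ == "__main__":
    for label, ids in parse_session_sets(SAMPLE):
        print(label, ids)
```

Watching which ids keep landing in the nearest buckets is one way to follow up on the suggestion about spotting clients that fall behind on heartbeating.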
Please disregard - Re: Exception on close of connection (WAS - Re: c client on win32)
Please disregard. Sorry for the noise (Patrick, of note, I am seeing this session timeout on a cluster other than Zhenyu's). St.Ack

On Fri, Nov 20, 2009 at 4:24 PM, stack st...@duboce.net wrote: Sorry, I had a bad subject on the below question. St.Ack

On Fri, Nov 20, 2009 at 4:22 PM, stack st...@duboce.net wrote: Below is an excerpt from a single node zk quorum that was at the heart of a small hbase cluster. Unfortunately the log is not at DEBUG level (I've asked the gentleman to up the log level meantime). What it seems to be reporting is that an exception while closing a session caused it to time out all connected sessions. Here is the line that mentions the exception on close of session. There is no stack trace:

2009-11-20 03:41:04,766 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x124bc250d700790 due to java.io.IOException: Read error

Is it correct that an error at this stage throws out all connected sessions? Thanks, St.Ack

2009-11-20 00:00:04,948 INFO org.apache.zookeeper.server.NIOServerCnxn: Connected to /10.1.20.101:50716 lastZxid 0
2009-11-20 00:00:04,982 INFO org.apache.zookeeper.server.NIOServerCnxn: Creating new session 0x1250f26319f0016
2009-11-20 00:00:05,051 INFO org.apache.zookeeper.server.NIOServerCnxn: Finished init of 0x1250f26319f0016 valid:true
2009-11-20 00:00:05,051 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x1250f26319f0016 type:create cxid:0x1 zxid:0xfffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
    at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
    at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2009-11-20 00:00:40,150 WARN org.apache.zookeeper.server.PrepRequestProcessor: Got exception when processing sessionid:0x1250f26319f0016 type:create cxid:0x4 zxid:0xfffe txntype:unknown n/a
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
    at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
    at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
2009-11-20 00:00:50,428 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x1250f26319f0016 due to java.io.IOException: Read error
2009-11-20 00:00:50,429 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x1250f26319f0016 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.1.20.101:2181 remote=/10.1.20.101:50716]
2009-11-20 00:01:22,002 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x1250f26319f0016
2009-11-20 00:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1250f26319f0016
2009-11-20 00:01:22,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination request for id: 0x1250f26319f0016
2009-11-20 03:41:04,766 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x124bc250d700790 due to java.io.IOException: Read error
2009-11-20 03:41:04,864 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x1250f26319f
2009-11-20 03:41:04,927 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1250f26319f
2009-11-20 03:41:04,927 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x124bc250d7007a2
2009-11-20 03:41:04,927 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x124bc250d7007a2
2009-11-20 03:41:04,927 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x124bc250d700794
2009-11-20 03:41:04,927 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x124bc250d700794
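One way to read an excerpt like the one above is to measure how long after the connection close each session actually got expired. A small sketch, assuming the log line layout shown in the excerpt (timestamp, level, logger, message); `close_to_expiry_seconds` and `SAMPLE` are illustrative names, and the sample lines are copied from the log.

```python
import re
from datetime import datetime

# timestamp, level, logger (with trailing colon), message
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \S+ (.*)$")


def close_to_expiry_seconds(log_text, session_id):
    """Seconds between the first 'closing session' and first 'Expiring session' line for session_id."""
    closed = expired = None
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # stack trace lines and the like
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(2)
        if closed is None and message.startswith("closing session:" + session_id + " "):
            closed = when
        elif expired is None and message == "Expiring session " + session_id:
            expired = when
    if closed is None or expired is None:
        return None
    return (expired - closed).total_seconds()


SAMPLE = """\
2009-11-20 00:00:50,429 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x1250f26319f0016 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/10.1.20.101:2181 remote=/10.1.20.101:50716]
2009-11-20 00:01:22,002 INFO org.apache.zookeeper.server.SessionTrackerImpl: Expiring session 0x1250f26319f0016
2009-11-20 00:01:22,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1250f26319f0016
"""

if __name__ == "__main__":
    print(close_to_expiry_seconds(SAMPLE, "0x1250f26319f0016"))
```

For session 0x1250f26319f0016 the gap comes out at about 31.6 seconds, so the expiry trails the read error by roughly half a minute rather than being triggered at the same instant.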
Re: Please disregard - Re: Exception on close of connection (WAS - Re: c client on win32)
I think now I can explain the session expirations: hbase clients, especially up in a map/reduce task, can exit without closing the zk session. Will fix. St.Ack

On Fri, Nov 20, 2009 at 4:45 PM, Patrick Hunt ph...@apache.org wrote: Yes, right, that's what I meant to say - what is causing the client to die, throwing read error on the server side, and then later you end up with the session expiration because the client was not closed gracefully. (thanks mahadev) Patrick

Mahadev Konar wrote: That should be the case since the server gets an exception reading from the socket - meaning the client went away (not gracefully) and that leads the server to expire the session in 30 seconds. mahadev

On 11/20/09 4:35 PM, Patrick Hunt ph...@apache.org wrote: Oops too late. ;-) I'm perplexed as to why you see all these expirations though. Are you killing your clients, ie not cleaning up the ZK session gracefully via close()? Patrick
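To make the close() versus expiry difference concrete, here is a toy session tracker. It is a sketch under stated assumptions, not the server's implementation: the 30 second timeout is the figure mentioned in this thread, while the 3 second bucket interval and the round-up rule are assumptions picked for illustration, and `ToySessionTracker` is a made-up name.

```python
class ToySessionTracker:
    """Toy expiration buckets keyed by expiry time in ms (hypothetical, not ZooKeeper code)."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.buckets = {}    # expiry time -> set of session ids
        self.expiry_of = {}  # session id -> expiry time

    def _round_up(self, t_ms):
        # Assumed rule: align expiry times to the next bucket boundary.
        return (t_ms // self.interval_ms + 1) * self.interval_ms

    def touch(self, session_id, now_ms, timeout_ms):
        """Heard from the client: move its session to a later bucket."""
        new_expiry = self._round_up(now_ms + timeout_ms)
        old = self.expiry_of.get(session_id)
        if old is not None:
            if old >= new_expiry:
                return
            self.buckets[old].discard(session_id)
        self.expiry_of[session_id] = new_expiry
        self.buckets.setdefault(new_expiry, set()).add(session_id)

    def close(self, session_id):
        """Graceful close: the session is dropped right away."""
        old = self.expiry_of.pop(session_id, None)
        if old is not None:
            self.buckets[old].discard(session_id)

    def expire(self, now_ms):
        """Return the ids in every bucket whose time has passed."""
        expired = []
        for expiry in sorted(self.buckets):
            if expiry > now_ms:
                break
            expired.extend(sorted(self.buckets.pop(expiry)))
        for session_id in expired:
            self.expiry_of.pop(session_id, None)
        return expired


if __name__ == "__main__":
    tracker = ToySessionTracker(interval_ms=3000)
    tracker.touch("graceful", 0, 30000)
    tracker.touch("abandoned", 0, 30000)
    tracker.close("graceful")      # this client called close()
    print(tracker.expire(30000))   # nothing is due yet
    print(tracker.expire(33000))   # the abandoned session's bucket comes due
```

In this model a client that exits without close() stays in its bucket until the bucket comes due, which is the pattern of late "Expiring session" lines described above.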