NullPointerException stopping and starting Zookeeper servers
Hi, I have a replicated zookeeper service consisting of 3 zookeeper (3.0.1) servers, all running on the same host for testing purposes. I've created exactly one znode in this ensemble. At this point, I stop, then restart a single zookeeper server, moving on to the next one a few seconds later. A few restarts later (about 4 is usually sufficient), I get the following exception on one of the servers, at which point it exits:

java.lang.NullPointerException
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:447)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:358)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:333)
        at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:250)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:102)
        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:183)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:245)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:421)
2008-12-08 14:14:24,880 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:[EMAIL PROTECTED] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Forcing shutdown
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:336)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
Exception in thread QuorumPeer:/0:0:0:0:0:0:0:0:2183 java.lang.NullPointerException
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:339)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)

The inputStream field is null, apparently because next() is still being called at line 358 even after next() has returned false.
Having very little knowledge of the implementation, I don't know whether hdr.getZxid() == zxid is supposed to be an invariant across all invocations of the server; however, the following change to FileTxnLog.java seems to make the problem go away:

diff FileTxnLog.java /tmp/FileTxnLog.java
358c358,359
<             next();
---
>             if (!next())
>                 return;
447c448,450
<         inputStream.close();
---
>         if (inputStream != null) {
>             inputStream.close();
>         }

Is this a bug? Thanks.
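For what it's worth, the pattern behind the patch can be sketched in plain Java. This is a hypothetical, self-contained illustration (the names and structure are mine, not the actual FileTxnLog code): init() bails out as soon as next() reports exhaustion, and close() tolerates an already-released stream, which is exactly what the two hunks above do.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

// Defensive-iteration sketch; hypothetical names, not the real FileTxnLog.
class SafeTxnIterator {
    private final Iterator<String> records; // stands in for the log records
    private Closeable inputStream;          // may already be null at close time
    private String current;

    SafeTxnIterator(Iterator<String> records, Closeable inputStream) {
        this.records = records;
        this.inputStream = inputStream;
    }

    // Stop immediately when next() reports exhaustion instead of
    // continuing to use a stream that next() has already closed.
    void init() throws IOException {
        if (!next()) {
            return; // the patched behaviour: bail out early
        }
        // ... position on the first relevant record ...
    }

    boolean next() throws IOException {
        if (records.hasNext()) {
            current = records.next();
            return true;
        }
        close(); // end of log: release the stream
        return false;
    }

    // Guard against a stream that was already released.
    void close() throws IOException {
        if (inputStream != null) {
            inputStream.close();
            inputStream = null;
        }
    }
}
```

With an empty log, init() can now be driven repeatedly without a NullPointerException, since the second close() finds inputStream already null and does nothing.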
Re: What happens when a server loses all its state?
Sorry, I should have been a little more explicit. At this point, the situation I'm considering is this: out of 3 servers, 1 server 'A' forgets its persistent state (due to a bad disk, say) and it restarts. My guess from what I could understand/reason about the internals was that server 'A' will re-synchronize correctly on restart, by getting the entire snapshot. I just wanted to make sure that this was a good assumption to make - or find out if I was missing corner cases where the fact that A has lost all memory could lead to inconsistencies (to take an example, in plain Paxos, no acceptor can forget the highest-numbered prepare request to which it has responded). More generally, is it a safe assumption that the ZooKeeper service will maintain all its guarantees if a minority of servers lose persistent state (due to bad disks, etc.) and restart at some point in the future? Thanks.

Mahadev Konar wrote: Hi Thomas, If a zookeeper server loses all state and there are enough servers in the ensemble to continue a zookeeper service (like 2 servers in the case of an ensemble of 3), then the server will get the latest snapshot from the leader and continue. The idea of zookeeper persisting its state on disk is just so that it does not lose state. All the guarantees that zookeeper makes are based on the understanding that we do not lose the state of the data we store on the disk. There might be problems if we lose the state that we stored on the disk. We might lose transactions that have been committed, and the ensemble might start with some snapshot in the past. You might want to read through how zookeeper internals work. This will help you understand why the persistence guarantees are required.
http://wiki.apache.org/hadoop-data/attachments/ZooKeeper(2f)ZooKeeperPresentations/attachments/zk-talk-upc.pdf

mahadev

On 12/16/08 9:45 AM, Thomas Vinod Johnson thomas.john...@sun.com wrote: What is the expected behavior if a server in a ZooKeeper service restarts with all its prior state lost? Empirically, everything seems to work*. Is this something that one can count on, as part of ZooKeeper design, or are there known conditions under which this could cause problems, either liveness or violation of ZooKeeper guarantees? I'm really most interested in a situation where a single server loses state, but insights into issues when more than one server loses state, and other interesting failure scenarios, are appreciated. Thanks.

* The restarted server appears to catch up to the latest snapshot (from the current leader?).
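As a quick sanity check of the "minority" arithmetic in the exchange above, here is the standard majority-quorum rule written out (a general sketch, not taken from the ZooKeeper source): an ensemble of n servers needs floor(n/2) + 1 members to form a quorum, so it tolerates floor((n-1)/2) servers losing state, which is 1 out of 3 as in the scenario discussed.

```java
// Majority-quorum arithmetic: an ensemble of n servers stays live
// as long as floor(n/2) + 1 of them are available, so it can
// tolerate floor((n-1)/2) servers failing (or losing their disks).
class QuorumMath {
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    static int tolerableFailures(int ensembleSize) {
        return (ensembleSize - 1) / 2;
    }
}
```

So a 3-server ensemble needs 2 live servers and tolerates 1 loss, and a 5-server ensemble tolerates 2, matching the "2 servers in the case of an ensemble of 3" remark above.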
Re: Simpler ZooKeeper event interface....
In the case of an active leader, L continues to send commands (whatever) to the followers. However, a new leader L' has since been elected and is also sending commands to the followers. In this case it seems like either a) L should not send commands if it's not sync'd to the ensemble (and holds the leader token), or b) followers should not accept commands from a non-leader (only accept from the current leader). a) seems the right way to go; if L is disconnected it should stop sending commands to the followers, and if it resyncs in time it can continue.

Seems to make sense in this particular case (I had some other cases in mind that I'm not so sure about, though). Feel free to discuss... The thought is not that well formed, so perhaps it does not warrant much discussion. This is more a realization that, as far as the leader election recipe goes, if *in general* one wants to guarantee not having multiple leaders at the same time, certain assumptions have to be made about timely reception and processing of events. So, naively, if I wanted to use the recipe to ensure that only one system owns an IP address at any given time, I think there would be no way to guarantee it without making some assumptions about timing. In retrospect, this should have been obvious. In practice it may be simple enough to work around these problems (I actually think now that in my case an 'at least once' queue is more appropriate). Anyway, like I said, half-baked thoughts...
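One way to make option (b) concrete, without any timing assumptions on the leader's side, is a fencing-token (epoch) check on the follower: each new leader is elected with a higher epoch, and a follower rejects commands stamped with an older epoch. The sketch below is a generic illustration of that idea, not ZooKeeper's actual protocol code (ZooKeeper does use epochs internally, but the class and method names here are hypothetical):

```java
// Fencing sketch: a follower remembers the highest leader epoch it
// has seen and ignores commands stamped with an older epoch, so a
// deposed leader L cannot affect state once L' (with a higher
// epoch) has been elected. Hypothetical names, not ZooKeeper code.
class Follower {
    private long currentEpoch = 0;

    // Returns true if the command was accepted and applied.
    boolean accept(long leaderEpoch, String command) {
        if (leaderEpoch < currentEpoch) {
            return false;            // stale leader: reject the command
        }
        currentEpoch = leaderEpoch;  // adopt the newer leader's epoch
        // ... apply command ...
        return true;
    }
}
```

Note this protects replicated state inside the ensemble, but it does not by itself solve the external-resource case above (owning an IP address), where the resource can't check a token; that's where the timing assumptions creep back in.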