odd error message
We have just done an upgrade of ZK to 3.3.0. Prior to this, ZK had been up for about a year with no problems.

On two nodes, we killed the previous instance and started the 3.3.0 instance. The first node was a follower and the second a leader. All went according to plan and no clients seemed to notice anything. The stat command showed connections moving around as expected, and all other indicators were normal.

When we did the third node, we saw this in the log:

2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] - Leader epoch 18 is less than our epoch 19

The third node refused all connections. We brought the third node down, wiped away its snapshot, restarted it, and it joined without complaint. Note that the third node was originally a follower and had never been a leader during the upgrade process.

Does anybody know why this happened? We are fully upgraded and there was no interruption to normal service, but this seems strange.
Re: Would this work?
I can't comment on the details of your code (I have run in-process ZKs in the past without problem). Operationally, however, this isn't a great idea. The problem is two-fold:

a) Firstly, somebody would probably like to look at ZooKeeper to understand the state of your service. If the service is down, then ZK will go away. That means that ZooKeeper can't be used that way, which rates mild to moderate on the logarithmic international suckitude scale.

b) Secondly, if you want to upgrade your server without upgrading ZooKeeper, then you still have to bounce ZooKeeper. This is probably not a problem, but it can be a slight pain.

c) Thirdly, you can't scale your service independently of how you scale ZooKeeper. This may or may not bother you, but it would bother me.

d) Fourthly, you will be synchronizing your server restarts with ZK's service restarts. Moving these events away from each other is likely to make them slightly more reliable. There is no failure mode that I know of that would be tickled here, but your service code will be slightly more complex, since it has to make sure that ZK is up before it does stuff. If you could assume that ZK is either up or exited, that would be simpler.

e) Yes, I know that is more than two issues. That is itself an issue, since any design where the number of worries grows this fast is suspect on larger grounds. If small problems crop up at that rate, a large problem seems more likely to turn up as well.

Your choice, and your mileage will vary.

On Tue, Apr 20, 2010 at 1:25 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote:

> This may sound weird, but I want to know if there is something inherent that would preclude this from working. I want to have a Thrift-based service which exposes some API to read/write certain znodes. I want ZK to run within the same process, so I will start the ZK process from within my main using QuorumPeerMain.main().
>
> Now the implementation of my API would instantiate a ZooKeeper object and try reading/writing specific znodes as the case may be. I tried running this, and as soon as I instantiate my ZooKeeper object I get some really weird exceptions. What is wrong with this approach?
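For concreteness, the in-process setup Avinash describes would look roughly like the sketch below. This is a hedged illustration only, given the operational caveats above: it assumes the ZooKeeper server jars are on the classpath, that a valid zoo.cfg exists in the working directory, and that the server listens on localhost:2181; the class name and thread handling are illustrative, not a recommended pattern.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.server.quorum.QuorumPeerMain;

public class EmbeddedZkSketch {
    public static void main(String[] args) throws Exception {
        // QuorumPeerMain.main() blocks (and may call System.exit() on
        // unrecoverable errors), so it must run off the service's main thread.
        Thread server = new Thread(() -> QuorumPeerMain.main(new String[] { "zoo.cfg" }));
        server.setDaemon(true);
        server.start();

        // The embedded server comes up asynchronously, so the client must
        // wait for a connected session before touching any znodes
        // (Ted's point d: make sure ZK is up before doing stuff).
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();

        // ...the Thrift API handlers can now read/write their znodes via zk...
    }
}
```

One possible source of the "weird exceptions" is issuing reads/writes before the session has reached SyncConnected (which surfaces as ConnectionLossException); gating operations on the latch above avoids that.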
Re: Would this work?
There are a small handful of cases where the server code will call System.exit(). This is typically only if quorum communication fails in some weird, unrecoverable way. We've been working to remove this (mainly so ZK can be deployed in a container), but there are still a few cases left.

I don't see any server logs in that log snippet - having that detail would shed more light on why the client is unable to connect. Are you sure that the server is being started?

Patrick

On 04/20/2010 02:25 PM, Ted Dunning wrote:
> [snip]
Re: odd error message
Ok, I think this is possible. Here is what happens currently. This has been a long-standing bug and should be fixed in 3.4:

https://issues.apache.org/jira/browse/ZOOKEEPER-335

A newly elected leader currently doesn't log the new-leader transaction to its database. In your case, the follower (the 3rd server) did log it, but the leader never did. When you brought the 3rd server back up, it had the transaction in its log but the leader did not, so the 3rd server cried foul and shut down. Removing the DB is totally fine.

For now, we should update our docs on 3.3 to mention that this problem might occur during an upgrade, and fix it in 3.4. Thanks for bringing it up, Ted.

Thanks
mahadev

On 4/20/10 2:14 PM, Ted Dunning ted.dunn...@gmail.com wrote:
> [snip]
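The follower-side check that produced the FATAL line above boils down to an epoch comparison: a follower refuses to sync with a leader whose epoch is older than the last epoch the follower itself logged. A minimal sketch of that logic (class and method names here are illustrative, not ZooKeeper's; the real check lives in the Follower code):

```java
import java.io.IOException;

// Simplified sketch of the follower-side epoch sanity check behind the
// "Leader epoch N is less than our epoch M" FATAL message.
public class EpochCheckSketch {

    // A follower only accepts a leader whose epoch is at least as new as
    // the last epoch the follower has seen in its own log.
    static boolean leaderEpochAcceptable(long leaderEpoch, long ourEpoch) {
        return leaderEpoch >= ourEpoch;
    }

    static void followLeader(long leaderEpoch, long ourEpoch) throws IOException {
        if (!leaderEpochAcceptable(leaderEpoch, ourEpoch)) {
            // The condition behind the log line in the original report:
            //   FATAL ... Leader epoch 18 is less than our epoch 19
            throw new IOException("Leader epoch " + leaderEpoch
                    + " is less than our epoch " + ourEpoch);
        }
        // ...otherwise sync with the leader as normal.
    }

    public static void main(String[] args) {
        System.out.println(leaderEpochAcceptable(18, 19)); // the failing case from the log
        System.out.println(leaderEpochAcceptable(19, 18)); // a normal case: leader is newer
    }
}
```

In the ZOOKEEPER-335 scenario, the restarted follower had logged the new-leader transaction for epoch 19 while the leader had not, so the leader announced epoch 18, the check failed, and the follower shut itself down; wiping the follower's DB cleared its recorded epoch, so the check passed on rejoin.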