Re: odd error message
Ok, I think this is possible. Here is what happens currently. This is a long-standing bug and should be fixed in 3.4:

https://issues.apache.org/jira/browse/ZOOKEEPER-335

A newly elected leader currently doesn't log the new-leader transaction to its database. In your case, the follower (the 3rd server) did log it but the leader never did. So when you brought up the 3rd server, it had that transaction in its log but the leader did not. The 3rd server then cried foul and shut down.

Removing the DB is totally fine. For now, we should update our docs on 3.3 to mention that this problem might occur during an upgrade, and fix it in 3.4.

Thanks for bringing it up, Ted.

Thanks
mahadev

On 4/20/10 2:14 PM, "Ted Dunning" wrote:

> We have just done an upgrade of ZK to 3.3.0. Previous to this, ZK has been
> up for about a year with no problems.
>
> On two nodes, we killed the previous instance and started the 3.3.0
> instance. The first node was a follower and the second a leader.
>
> All went according to plan and no clients seemed to notice anything. The
> stat command showed connections moving around as expected and all other
> indicators were normal.
>
> When we did the third node, we saw this in the log:
>
> 2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] -
> Leader epoch 18 is less than our epoch 19
>
> The third node refused all connections.
>
> We brought down the third node, wiped away its snapshot, restarted and it
> joined without complaint. Note that the third node
> was originally a follower and had never been a leader during the upgrade
> process.
>
> Does anybody know why this happened?
>
> We are fully upgraded and there was no interruption to normal service, but
> this seems strange.
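As background to the epoch comparison behind the FATAL line quoted above: the ZooKeeper documentation describes a zxid as a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a per-epoch counter. The sketch below only illustrates that layout and a comparison that would produce a message of this shape. The class and method names are invented for the example, this is not the ZooKeeper source, and the real server may format the numbers differently.

```java
// Illustrative sketch only; names are hypothetical and this is not ZooKeeper code.
final class ZxidEpoch {
    private ZxidEpoch() {}

    /** The high 32 bits of a zxid hold the leader epoch. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** The low 32 bits of a zxid hold the counter within that epoch. */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /**
     * Hypothetical follower-side check: returns a complaint if the leader's
     * epoch is behind the epoch of the last zxid this server logged,
     * or null if the leader is acceptable.
     */
    static String checkLeaderEpoch(long leaderZxid, long ourLastZxid) {
        long leaderEpoch = epochOf(leaderZxid);
        long ourEpoch = epochOf(ourLastZxid);
        if (leaderEpoch < ourEpoch) {
            return "Leader epoch " + leaderEpoch + " is less than our epoch " + ourEpoch;
        }
        return null;
    }
}
```

In the scenario described in this thread, the third server had logged a transaction from the newer epoch while the leader had not, which is the situation in which a check like `checkLeaderEpoch` returns a complaint.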
Re: Would this work?
There are a small handful of cases where the server code will call System.exit. This typically happens only if quorum communication fails in some weird, unrecoverable way. We've been working to remove these calls (mainly so ZK can be deployed in a container), but there are still a few cases left.

I don't see any server logs in that log snippet; having that detail would shed more light on why the client is unable to connect. Are you sure that the server is being started?

Patrick

On 04/20/2010 02:25 PM, Ted Dunning wrote:

> I can't comment on the details of your code (but I have run in-process ZK's
> in the past without problem)
>
> Operationally, however, this isn't a great idea. The problem is two-fold:
>
> a) firstly, somebody would probably like to look at Zookeeper to understand
> the state of your service. If the service is down, then ZK will go away.
> That means that Zookeeper can't be used that way and is mild to moderate on
> the logarithmic international suckitude scale.
>
> b) secondly, if you want to upgrade your server without upgrading Zookeeper
> then you still have to bounce Zookeeper. This is probably not a problem,
> but it can be a slight pain.
>
> c) thirdly, you can't scale your service independently of how you scale
> Zookeeper. This may or may not bother you, but it would bother me.
>
> d) fourthly, you will be synchronizing your server restarts with ZK's
> service restarts. Moving these events away from each other is likely to
> make them slightly more reliable. There is no failure mode that I know of
> that would be tickled here, but your service code will be slightly more
> complex since it has to make sure that ZK is up before it does stuff. If
> you could make the assumption that ZK is up or exit, that would be simpler.
>
> e) yes, I know that is more than two issues. That is itself an issue since
> any design where the number of worries is increasing so fast is suspect on
> larger grounds. If there are small problems cropping up at that rate, the
> likelihood of there being a large problem that comes up seems higher.
>
> Your choice and your mileage will vary.
>
> On Tue, Apr 20, 2010 at 1:25 PM, Avinash Lakshman
> < avinash.laksh...@gmail.com> wrote:
>
> > This may sound weird but I want to know if there is something inherent
> > that would preclude this from working. I want to have a thrift based
> > service which exposes some API to read/write to certain znodes. I want ZK
> > to run within the same process. So I will start the ZK process from
> > within my main using QuorumPeerMain.main(). Now the implementation of my
> > API would instantiate a ZooKeeper object and try reading/writing from
> > specific znodes as the case may be. I tried running this and as soon as I
> > instantiate my ZooKeeper object I get some really weird exceptions. What
> > is wrong in this approach?
Re: Would this work?
I can't comment on the details of your code (but I have run in-process ZK's in the past without problem)

Operationally, however, this isn't a great idea. The problem is two-fold:

a) firstly, somebody would probably like to look at Zookeeper to understand the state of your service. If the service is down, then ZK will go away. That means that Zookeeper can't be used that way and is mild to moderate on the logarithmic international suckitude scale.

b) secondly, if you want to upgrade your server without upgrading Zookeeper then you still have to bounce Zookeeper. This is probably not a problem, but it can be a slight pain.

c) thirdly, you can't scale your service independently of how you scale Zookeeper. This may or may not bother you, but it would bother me.

d) fourthly, you will be synchronizing your server restarts with ZK's service restarts. Moving these events away from each other is likely to make them slightly more reliable. There is no failure mode that I know of that would be tickled here, but your service code will be slightly more complex since it has to make sure that ZK is up before it does stuff. If you could make the assumption that ZK is up or exit, that would be simpler.

e) yes, I know that is more than two issues. That is itself an issue since any design where the number of worries is increasing so fast is suspect on larger grounds. If there are small problems cropping up at that rate, the likelihood of there being a large problem that comes up seems higher.

Your choice and your mileage will vary.

On Tue, Apr 20, 2010 at 1:25 PM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote:

> This may sound weird but I want to know if there is something inherent that
> would preclude this from working. I want to have a thrift based service
> which exposes some API to read/write to certain znodes. I want ZK to run
> within the same process. So I will start the ZK process from within my main
> using QuorumPeerMain.main(). Now the implementation of my API would
> instantiate a ZooKeeper object and try reading/writing from specific znodes
> as the case may be. I tried running this and as soon as I instantiate my
> ZooKeeper object I get some really weird exceptions. What is wrong in this
> approach?
odd error message
We have just done an upgrade of ZK to 3.3.0. Previous to this, ZK has been up for about a year with no problems.

On two nodes, we killed the previous instance and started the 3.3.0 instance. The first node was a follower and the second a leader.

All went according to plan and no clients seemed to notice anything. The stat command showed connections moving around as expected and all other indicators were normal.

When we did the third node, we saw this in the log:

2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] - Leader epoch 18 is less than our epoch 19

The third node refused all connections.

We brought down the third node, wiped away its snapshot, restarted and it joined without complaint. Note that the third node was originally a follower and had never been a leader during the upgrade process.

Does anybody know why this happened?

We are fully upgraded and there was no interruption to normal service, but this seems strange.
Re: Would this work?
Hi Avinash,

This mostly looks like the ZooKeeper client is not able to find the ZooKeeper server on the port that you have specified. Are you sure you are running the ZooKeeper server on the port you are passing to the ZooKeeper client?

You can check whether the server is running with:

    echo stat | nc localhost <port>

Thanks
mahadev

On 4/20/10 1:25 PM, "Avinash Lakshman" wrote:

> Hi All
>
> This may sound weird but I want to know if there is something inherent that
> would preclude this from working. I want to have a thrift based service
> which exposes some API to read/write to certain znodes. I want ZK to run
> within the same process. So I will start the ZK process from within my main
> using QuorumPeerMain.main(). Now the implementation of my API would
> instantiate a ZooKeeper object and try reading/writing from specific znodes
> as the case may be. I tried running this and as soon as I instantiate my
> ZooKeeper object I get some really weird exceptions. What is wrong in this
> approach? Here is a snapshot of the stack trace:
>
> 2010-04-20 13:14:31,551 - INFO [pool-1-thread-1:environm...@97] - Client environment:zookeeper.version=3.1.1-755636, built on 03/18/2009 16:52 GMT
> 2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:host.name=a.b.c.com
> 2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.version=1.7.0-ea
> 2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.vendor=Sun Microsystems Inc.
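For a service that wants to run the same check from Java rather than from a shell, the `echo stat | nc` probe suggested in this reply can be approximated with a plain socket: send the four-letter command, then read until the server closes the connection. This is a minimal sketch under assumptions: the class name, method name, and timeout handling are invented for illustration and are not part of the ZooKeeper client API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; roughly what `echo stat | nc host port` does.
final class FourLetterProbe {
    private FourLetterProbe() {}

    /**
     * Sends a short command such as "stat" and returns whatever the server
     * writes back before closing the connection. Throws IOException if the
     * connection is refused or times out.
     */
    static String send(String host, int port, String command, int timeoutMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);

            OutputStream out = socket.getOutputStream();
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until end of stream, i.e. until the server closes its side.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                reply.write(buffer, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }
}
```

A "Connection refused" from `send` corresponds to the `java.net.ConnectException` in the trace: nothing is listening on that host and port.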
Re: Would this work?
Hi Avinash -

It's definitely possible to have an in-process ZK server - I've done it - but it's not always easy.

Are you passing a configuration file to QuorumPeerMain.main? Are there any errors when you run that method? I think, from recollection, that QPM.main should block in the standalone case, so are you constructing the ZooKeeper object in a different thread? Are you giving the server enough time to come up?

The error you have means that the server is not coming up for clients on port 2181 at 10.18.39.211. Is this the right address?

cheers,
Henry

On 20 April 2010 13:25, Avinash Lakshman wrote:

> Hi All
>
> This may sound weird but I want to know if there is something inherent that
> would preclude this from working. I want to have a thrift based service
> which exposes some API to read/write to certain znodes. I want ZK to run
> within the same process. So I will start the ZK process from within my main
> using QuorumPeerMain.main(). Now the implementation of my API would
> instantiate a ZooKeeper object and try reading/writing from specific znodes
> as the case may be. I tried running this and as soon as I instantiate my
> ZooKeeper object I get some really weird exceptions. What is wrong in this
> approach? Here is a snapshot of the stack trace:
>
> 2010-04-20 13:14:31,551 - INFO [pool-1-thread-1:environm...@97] - Client environment:zookeeper.version=3.1.1-755636, built on 03/18/2009 16:52 GMT
> 2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:host.name=a.b.c.com
> 2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.version=1.7.0-ea
> 2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.vendor=Sun Microsystems Inc.
--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
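The two questions in this reply (is the client constructed on a different thread from the blocking server call, and is the server given time to come up) can be sketched as follows. This is an illustration under assumptions, not a recommended embedding API: the class and method names are invented, and the blocking server entry point is passed in as a plain `Runnable` so that the sketch compiles without the ZooKeeper jar.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical startup helper for an in-process server whose main() blocks.
final class InProcessStartup {
    private InProcessStartup() {}

    /** Runs a blocking server entry point on its own daemon thread. */
    static Thread startInBackground(Runnable blockingServerMain, String threadName) {
        Thread thread = new Thread(blockingServerMain, threadName);
        thread.setDaemon(true); // do not keep the JVM alive just for the server thread
        thread.start();
        return thread;
    }

    /**
     * Polls until something accepts TCP connections on host:port or the
     * timeout elapses. Returns true if a connection succeeded.
     */
    static boolean awaitPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 200);
                return true;
            } catch (IOException notUpYet) {
                Thread.sleep(100); // connection refused or timed out; retry shortly
            }
        }
        return false;
    }
}
```

In the setup described in this thread, the `Runnable` would wrap the call to `QuorumPeerMain.main(...)` with the path to a configuration file, and the `ZooKeeper` object would be constructed only after `awaitPort` returns true for the client port (2181 in the trace). A listening port does not guarantee the server is ready to serve requests, so the client should still handle connection-loss events.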
Would this work?
Hi All

This may sound weird but I want to know if there is something inherent that would preclude this from working. I want to have a thrift based service which exposes some API to read/write to certain znodes. I want ZK to run within the same process. So I will start the ZK process from within my main using QuorumPeerMain.main(). Now the implementation of my API would instantiate a ZooKeeper object and try reading/writing from specific znodes as the case may be. I tried running this and as soon as I instantiate my ZooKeeper object I get some really weird exceptions. What is wrong in this approach? Here is a snapshot of the stack trace:

2010-04-20 13:14:31,551 - INFO [pool-1-thread-1:environm...@97] - Client environment:zookeeper.version=3.1.1-755636, built on 03/18/2009 16:52 GMT
2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:host.name=a.b.c.com
2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.version=1.7.0-ea
2010-04-20 13:14:31,552 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.vendor=Sun Microsystems Inc.
2010-04-20 13:14:31,553 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.home=/usr/local/jdk1.7-drop/jre
2010-04-20 13:14:31,553 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.class.path=config/:lib/zookeeper-3.1.1.jar:lib/log4j-1.2.15.jar:lib/antlr-2.7.7.jar:lib/antlr-3.0.1.jar:lib/atlas.jar:lib/commons-cli-1.1.jar:lib/DiscoveryService.jar:lib/fb303.jar:lib/if-java.jar:lib/jline-0.9.94.jar:lib/stringtemplate-3.0.jar:lib/thrift.jar:lib/atlasimpl.jar:lib/slf4j-api-1.5.8.jar:lib/slf4j-log4j12-1.5.8.jar
2010-04-20 13:14:31,553 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.library.path=/usr/local/jdk1.7-drop/jre/lib/amd64/server:/usr/local/jdk1.7-drop/jre/lib/amd64:/usr/local/jdk1.7-drop/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
2010-04-20 13:14:31,554 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.io.tmpdir=/tmp
2010-04-20 13:14:31,554 - INFO [pool-1-thread-1:environm...@97] - Client environment:java.compiler=
2010-04-20 13:14:31,554 - INFO [pool-1-thread-1:environm...@97] - Client environment:os.name=Linux
2010-04-20 13:14:31,555 - INFO [pool-1-thread-1:environm...@97] - Client environment:os.arch=amd64
2010-04-20 13:14:31,555 - INFO [pool-1-thread-1:environm...@97] - Client environment:os.version=2.6.12-1.1398_FC4smp
2010-04-20 13:14:31,555 - INFO [pool-1-thread-1:environm...@97] - Client environment:user.name=root
2010-04-20 13:14:31,555 - INFO [pool-1-thread-1:environm...@97] - Client environment:user.home=/root
2010-04-20 13:14:31,556 - INFO [pool-1-thread-1:environm...@97] - Client environment:user.dir=/var/myservice
2010-04-20 13:14:31,557 - INFO [pool-1-thread-1:zookee...@341] - Initiating client connection, host=a.b.c.com sessionTimeout=1 watcher=a.b.c.mycl...@716c9867
2010-04-20 13:14:31,558 - INFO [pool-1-thread-1:clientc...@91] - zookeeper.disableAutoWatchReset is false
2010-04-20 13:14:31,566 - INFO [pool-1-thread-1-SendThread:clientcnxn$sendthr...@800] - Attempting connection to server a.b.c.com/10.18.39.211:2181
2010-04-20 13:14:31,567 - WARN [pool-1-thread-1-SendThread:clientcnxn$sendthr...@898] - Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@7b2884e0
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:864)
2010-04-20 13:14:31,568 - WARN [pool-1-thread-1-SendThread:clientcnxn$sendthr...@932] - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:656)
    at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:930)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:901)