Re: odd error message

2010-04-20 Thread Mahadev Konar
Ok, I think this is possible.
So here is what happens currently. This has been a long-standing bug and
should be fixed in 3.4.

https://issues.apache.org/jira/browse/ZOOKEEPER-335

A newly elected leader currently doesn't log the new leader transaction to
its database.

In your case, the follower (the 3rd server) did log it but the leader never
did. So when you brought the 3rd server back up, it had the transaction in
its log but the leader did not. In that case the 3rd server cried foul and
shut down.
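
To make the check concrete, here is a minimal sketch of the kind of
comparison that produces that FATAL line. It assumes the usual zxid layout
(epoch in the high 32 bits, counter in the low 32 bits). The class and method
names are invented for illustration and are not the server's actual code, and
the epoch values are simply the ones from Ted's log.

```java
// Sketch only: EpochCheck is a made-up class, not part of ZooKeeper.
final class EpochCheck {
    // Assumption: a zxid is a 64-bit value whose high 32 bits hold the epoch
    // and whose low 32 bits hold a per-epoch counter.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // True when a follower whose log ends at followerLastZxid would accept
    // a leader that reports leaderEpoch.
    static boolean followerAccepts(long leaderEpoch, long followerLastZxid) {
        return leaderEpoch >= epochOf(followerLastZxid);
    }

    public static void main(String[] args) {
        // The follower logged the new-leader transaction, so its log is in epoch 19.
        long followerLastZxid = 19L << 32;
        // The leader never logged it, so it still reports epoch 18.
        long leaderEpoch = 18L;
        if (!followerAccepts(leaderEpoch, followerLastZxid)) {
            System.out.println("Leader epoch " + leaderEpoch
                    + " is less than our epoch " + epochOf(followerLastZxid));
        }
    }
}
```

Wiping the follower's DB resets what it has logged, which would explain why
the check passed after the restart.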

Removing the DB is totally fine. For now, we should update our 3.3 docs to
mention that this problem might occur during an upgrade, and fix it in 3.4.


Thanks for bringing it up Ted.


Thanks
mahadev

On 4/20/10 2:14 PM, "Ted Dunning" wrote:

> We have just done an upgrade of ZK to 3.3.0.  Previous to this, ZK has been
> up for about a year with no problems.
> 
> On two nodes, we killed the previous instance and started the 3.3.0
> instance.  The first node was a follower and the second a leader.
> 
> All went according to plan and no clients seemed to notice anything.  The
> stat command showed connections moving around as expected and all other
> indicators were normal.
> 
> When we did the third node, we saw this in the log:
> 
> 2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] -
> Leader epoch 18 is less than our epoch 19
> 
> The third node refused all connections.
> 
> We brought down the third node, wiped away its snapshot, restarted and it
> joined without complaint.  Note that the third node
> was originally a follower and had never been a leader during the upgrade
> process.
> 
> Does anybody know why this happened?
> 
> We are fully upgraded and there was no interruption to normal service, but
> this seems strange.



Re: Would this work?

2010-04-20 Thread Patrick Hunt
There are a small handful of cases where the server code will call
"System.exit". This is typically only if quorum communication fails in
some weird, unrecoverable way. We've been working to remove this (mainly
so zk can be deployed in a container) but there are still a few cases left.


I don't see any server logs in that log snippet - having that detail 
would shed more light on why the client is unable to connect. Are you 
sure that the server is being started?


Patrick

On 04/20/2010 02:25 PM, Ted Dunning wrote:

> I can't comment on the details of your code (but I have run in-process ZK's
> in the past without problem)
>
> Operationally, however, this isn't a great idea.  The problem is two-fold:
>
> a) firstly, somebody would probably like to look at Zookeeper to understand
> the state of your service.  If the service is
> down, then ZK will go away.  That means that Zookeeper can't be used that
> way and is mild to moderate
> on the logarithmic international suckitude scale.
>
> b) secondly, if you want to upgrade your server without upgrading Zookeeper
> then you still have to bounce
> Zookeeper.  This is probably not a problem, but it can be a slight pain.
>
> c) thirdly, you can't scale your service independently of how you scale
> Zookeeper.  This may or may
> not bother you, but it would bother me.
>
> d) fourthly, you will be synchronizing your server restarts with ZK's
> service restarts.  Moving these events
> away from each other is likely to make them slightly more reliable.  There
> is no failure mode that I know
> of that would be tickled here, but your service code will be slightly more
> complex since it has to make sure
> that ZK is up before it does stuff.  If you could make the assumption that
> ZK is up or exit, that would be
> simpler.
>
> e) yes, I know that is more than two issues.  That is itself an issue since
> any design where the number of worries
> is increasing so fast is suspect on larger grounds.  If there are small
> problems cropping up at that rate, the likelihood
> of there being a large problem that comes up seems higher.
>
> Your choice and your mileage will vary.
>
> On Tue, Apr 20, 2010 at 1:25 PM, Avinash Lakshman <
> avinash.laksh...@gmail.com> wrote:
>
>> This may sound weird but I want to know if there is something inherent that
>> would preclude this from working. I want to have a thrift based service
>> which exposes some API to read/write to certain znodes. I want ZK to run
>> within the same process. So I will start the ZK process from within my main
>> using QuorumPeerMain.main(). Now the implementation of my API would
>> instantiate a ZooKeeper object and try reading/writing from specific znodes
>> as the case may be. I tried running this and as soon as I instantiate my
>> ZooKeeper object I get some really weird exceptions. What is wrong in this
>> approach?





Re: Would this work?

2010-04-20 Thread Ted Dunning
I can't comment on the details of your code (but I have run in-process ZK's
in the past without problem)

Operationally, however, this isn't a great idea.  The problem is two-fold:

a) firstly, somebody would probably like to look at Zookeeper to understand
the state of your service.  If the service is
down, then ZK will go away.  That means that Zookeeper can't be used that
way and is mild to moderate
on the logarithmic international suckitude scale.

b) secondly, if you want to upgrade your server without upgrading Zookeeper
then you still have to bounce
Zookeeper.  This is probably not a problem, but it can be a slight pain.

c) thirdly, you can't scale your service independently of how you scale
Zookeeper.  This may or may
not bother you, but it would bother me.

d) fourthly, you will be synchronizing your server restarts with ZK's
service restarts.  Moving these events
away from each other is likely to make them slightly more reliable.  There
is no failure mode that I know
of that would be tickled here, but your service code will be slightly more
complex since it has to make sure
that ZK is up before it does stuff.  If you could make the assumption that
ZK is up or exit, that would be
simpler.

e) yes, I know that is more than two issues.  That is itself an issue since
any design where the number of worries
is increasing so fast is suspect on larger grounds.  If there are small
problems cropping up at that rate, the likelihood
of there being a large problem that comes up seems higher.

Your choice and your mileage will vary.

On Tue, Apr 20, 2010 at 1:25 PM, Avinash Lakshman <
avinash.laksh...@gmail.com> wrote:

> This may sound weird but I want to know if there is something inherent that
> would preclude this from working. I want to have a thrift based service
> which exposes some API to read/write to certain znodes. I want ZK to run
> within the same process. So I will start the ZK process from within my main
> using QuorumPeerMain.main(). Now the implementation of my API would
> instantiate a ZooKeeper object and try reading/writing from specific znodes
> as the case may be. I tried running this and as soon as I instantiate my
> ZooKeeper object I get some really weird exceptions. What is wrong in this
> approach?
>


odd error message

2010-04-20 Thread Ted Dunning
We have just done an upgrade of ZK to 3.3.0.  Previous to this, ZK has been
up for about a year with no problems.

On two nodes, we killed the previous instance and started the 3.3.0
instance.  The first node was a follower and the second a leader.

All went according to plan and no clients seemed to notice anything.  The
stat command showed connections moving around as expected and all other
indicators were normal.

When we did the third node, we saw this in the log:

2010-04-20 14:07:49,010 - FATAL [QuorumPeer:/0.0.0.0:2181:follo...@71] -
Leader epoch 18 is less than our epoch 19

The third node refused all connections.

We brought down the third node, wiped away its snapshot, restarted and it
joined without complaint.  Note that the third node
was originally a follower and had never been a leader during the upgrade
process.

Does anybody know why this happened?

We are fully upgraded and there was no interruption to normal service, but
this seems strange.


Re: Would this work?

2010-04-20 Thread Mahadev Konar
Hi Avinash,
This mostly looks like the zookeeper client is not able to find a
zookeeper server running on the port that you have specified.

Are you sure you are running the zookeeper server on the port you are passing to
the zookeeper client? You can check whether the server is running with

echo stat | nc localhost port

where port is the client port your server listens on.
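
If nc is not handy, roughly the same check can be done from Java. This is
only a sketch under assumptions: FourLetterProbe is a made-up helper (it is
not part of the ZooKeeper API), and it assumes the server closes the
connection after answering a four-letter command, so reading until end of
stream returns the whole reply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch only: FourLetterProbe is a hypothetical helper, not a ZooKeeper class.
final class FourLetterProbe {
    // Sends a four-letter command such as "stat" and returns whatever the
    // server writes back before it closes the connection.
    // Throws IOException (for example ConnectException) if nothing is
    // listening on host:port or the reply does not arrive within timeoutMs.
    static String send(String host, int port, String command, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until the peer closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return reply.toString("US-ASCII");
        }
    }
}
```

Calling FourLetterProbe.send("localhost", 2181, "stat", 2000) from the same
process would either return the stat output or fail with the same
"Connection refused" that shows up in your log, which would tell you the
server is not listening there.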


Thanks
mahadev


On 4/20/10 1:25 PM, "Avinash Lakshman" wrote:

> Hi All
> 
> This may sound weird but I want to know if there is something inherent that
> would preclude this from working. I want to have a thrift based service
> which exposes some API to read/write to certain znodes. I want ZK to run
> within the same process. So I will start the ZK process from within my main
> using QuorumPeerMain.main(). Now the implementation of my API would
> instantiate a ZooKeeper object and try reading/writing from specific znodes
> as the case may be. I tried running this and as soon as I instantiate my
> ZooKeeper object I get some really weird exceptions. What is wrong in this
> approach? Here is a snapshot of the stack trace:
> 
> 2010-04-20 13:14:31,551 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:zookeeper.version=3.1.1-755636, built on 03/18/2009 16:52 GMT
> 2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:host.name=a.b.c.com
> 2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.version=1.7.0-ea
> 2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.vendor=Sun Microsystems Inc.
> 2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.home=/usr/local/jdk1.7-drop/jre
> 2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.class.path=config/:lib/zookeeper-3.1.1.jar:lib/log4j-1.2.15.j
> ar:lib/antlr-2.7.7.jar:li
> b/antlr-3.0.1.jar:lib/atlas.jar:lib/commons-cli-1.1.jar:lib/DiscoveryService.j
> ar:lib/fb303.jar:lib/if-java.jar:lib/jline-0.9.94.jar:lib/stringtemplate-3.0.j
> ar:lib/thrift.jar:lib
> /atlasimpl.jar:lib/slf4j-api-1.5.8.jar:lib/slf4j-log4j12-1.5.8.jar
> 2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.library.path=/usr/local/jdk1.7-drop/jre/lib/amd64/server:/usr
> /local/jdk1.7-drop/jre/li
> b/amd64:/usr/local/jdk1.7-drop/jre/../lib/amd64:/usr/java/packages/lib/amd64:/
> lib:/usr/lib
> 2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.io.tmpdir=/tmp
> 2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.compiler=
> 2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:os.name=Linux
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:os.arch=amd64
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:os.version=2.6.12-1.1398_FC4smp
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:user.name=root
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:user.home=/root
> 2010-04-20 13:14:31,556 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:user.dir=/var/myservice
> 2010-04-20 13:14:31,557 - INFO  [pool-1-thread-1:zookee...@341] - Initiating
> client connection, host=a.b.c.com sessionTimeout=1
> watcher=a.b.c.mycl...@716c9867
> 2010-04-20 13:14:31,558 - INFO  [pool-1-thread-1:clientc...@91] -
> zookeeper.disableAutoWatchReset is false
> 2010-04-20 13:14:31,566 - INFO
>  [pool-1-thread-1-SendThread:clientcnxn$sendthr...@800] - Attempting
> connection to server a.b.c.com/10.18.39.211:2181
> 2010-04-20 13:14:31,567 - WARN
>  [pool-1-thread-1-SendThread:clientcnxn$sendthr...@898] - Exception closing
> session 0x0 to sun.nio.ch.selectionkeyi...@7b2884e0
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:864)
> 2010-04-20 13:14:31,568 - WARN
>  [pool-1-thread-1-SendThread:clientcnxn$sendthr...@932] - Ignoring exception
> during shutdown input
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:656)
> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:930)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:901)



Re: Would this work?

2010-04-20 Thread Henry Robinson
Hi Avinash -

It's definitely possible to have an in-process ZK server - I've done it -
but it's not always easy. Are you passing a configuration file to
QuorumPeerMain.main? Are there any errors when you run that method? I think,
from recollection, that QPM.main should block in the standalone case, so are
you constructing the ZooKeeper object in a different thread? Are you giving
the server enough time to come up?
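
In case it helps, here is a rough sketch of the "different thread, then wait
for the server" pattern using only the JDK. InProcessStartup and both of its
methods are hypothetical helpers invented for this example, not ZooKeeper
classes, and the blocking server entry point is passed in as a Runnable so
the sketch does not depend on any particular ZooKeeper version.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch only: InProcessStartup is a made-up helper, not part of ZooKeeper.
final class InProcessStartup {
    // Runs a blocking server entry point on its own daemon thread so the
    // calling thread is free to create clients afterwards.
    static Thread startInBackground(Runnable blockingServerMain) {
        Thread t = new Thread(blockingServerMain, "embedded-zk-server");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Polls until something accepts TCP connections on host:port.
    // Returns false if the timeout elapses first.
    static boolean waitForPort(String host, int port, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 200);
                return true;
            } catch (IOException notUpYet) {
                Thread.sleep(100); // server not listening yet, retry shortly
            }
        }
        return false;
    }
}
```

In your setup the Runnable would presumably wrap the QuorumPeerMain.main()
call with your config file argument, and the ZooKeeper object would only be
constructed once waitForPort(host, 2181, ...) has returned true.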

The error you have means that the server is not coming up for clients on
port 2181 at 10.18.39.211. Is this the right address?

cheers,
Henry

On 20 April 2010 13:25, Avinash Lakshman wrote:

> Hi All
>
> This may sound weird but I want to know if there is something inherent that
> would preclude this from working. I want to have a thrift based service
> which exposes some API to read/write to certain znodes. I want ZK to run
> within the same process. So I will start the ZK process from within my main
> using QuorumPeerMain.main(). Now the implementation of my API would
> instantiate a ZooKeeper object and try reading/writing from specific znodes
> as the case may be. I tried running this and as soon as I instantiate my
> ZooKeeper object I get some really weird exceptions. What is wrong in this
> approach? Here is a snapshot of the stack trace:
>
> 2010-04-20 13:14:31,551 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:zookeeper.version=3.1.1-755636, built on 03/18/2009 16:52 GMT
> 2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:host.name=a.b.c.com
> 2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.version=1.7.0-ea
> 2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.vendor=Sun Microsystems Inc.
> 2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.home=/usr/local/jdk1.7-drop/jre
> 2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
>
> environment:java.class.path=config/:lib/zookeeper-3.1.1.jar:lib/log4j-1.2.15.jar:lib/antlr-2.7.7.jar:li
>
> b/antlr-3.0.1.jar:lib/atlas.jar:lib/commons-cli-1.1.jar:lib/DiscoveryService.jar:lib/fb303.jar:lib/if-java.jar:lib/jline-0.9.94.jar:lib/stringtemplate-3.0.jar:lib/thrift.jar:lib
> /atlasimpl.jar:lib/slf4j-api-1.5.8.jar:lib/slf4j-log4j12-1.5.8.jar
> 2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
>
> environment:java.library.path=/usr/local/jdk1.7-drop/jre/lib/amd64/server:/usr/local/jdk1.7-drop/jre/li
>
> b/amd64:/usr/local/jdk1.7-drop/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
> 2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.io.tmpdir=/tmp
> 2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:java.compiler=
> 2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:os.name=Linux
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:os.arch=amd64
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:os.version=2.6.12-1.1398_FC4smp
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:user.name=root
> 2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:user.home=/root
> 2010-04-20 13:14:31,556 - INFO  [pool-1-thread-1:environm...@97] - Client
> environment:user.dir=/var/myservice
> 2010-04-20 13:14:31,557 - INFO  [pool-1-thread-1:zookee...@341] -
> Initiating
> client connection, host=a.b.c.com sessionTimeout=1
> watcher=a.b.c.mycl...@716c9867
> 2010-04-20 13:14:31,558 - INFO  [pool-1-thread-1:clientc...@91] -
> zookeeper.disableAutoWatchReset is false
> 2010-04-20 13:14:31,566 - INFO
>  [pool-1-thread-1-SendThread:clientcnxn$sendthr...@800] - Attempting
> connection to server a.b.c.com/10.18.39.211:2181
> 2010-04-20 13:14:31,567 - WARN
>  [pool-1-thread-1-SendThread:clientcnxn$sendthr...@898] - Exception
> closing
> session 0x0 to sun.nio.ch.selectionkeyi...@7b2884e0
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:864)
> 2010-04-20 13:14:31,568 - WARN
>  [pool-1-thread-1-SendThread:clientcnxn$sendthr...@932] - Ignoring
> exception
> during shutdown input
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:656)
> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:930)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:901)
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679


Would this work?

2010-04-20 Thread Avinash Lakshman
Hi All

This may sound weird but I want to know if there is something inherent that
would preclude this from working. I want to have a thrift based service
which exposes some API to read/write to certain znodes. I want ZK to run
within the same process. So I will start the ZK process from within my main
using QuorumPeerMain.main(). Now the implementation of my API would
instantiate a ZooKeeper object and try reading/writing from specific znodes
as the case may be. I tried running this and as soon as I instantiate my
ZooKeeper object I get some really weird exceptions. What is wrong in this
approach? Here is a snapshot of the stack trace:

2010-04-20 13:14:31,551 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:zookeeper.version=3.1.1-755636, built on 03/18/2009 16:52 GMT
2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:host.name=a.b.c.com
2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.version=1.7.0-ea
2010-04-20 13:14:31,552 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.vendor=Sun Microsystems Inc.
2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.home=/usr/local/jdk1.7-drop/jre
2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.class.path=config/:lib/zookeeper-3.1.1.jar:lib/log4j-1.2.15.jar:lib/antlr-2.7.7.jar:li
b/antlr-3.0.1.jar:lib/atlas.jar:lib/commons-cli-1.1.jar:lib/DiscoveryService.jar:lib/fb303.jar:lib/if-java.jar:lib/jline-0.9.94.jar:lib/stringtemplate-3.0.jar:lib/thrift.jar:lib
/atlasimpl.jar:lib/slf4j-api-1.5.8.jar:lib/slf4j-log4j12-1.5.8.jar
2010-04-20 13:14:31,553 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.library.path=/usr/local/jdk1.7-drop/jre/lib/amd64/server:/usr/local/jdk1.7-drop/jre/li
b/amd64:/usr/local/jdk1.7-drop/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.io.tmpdir=/tmp
2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:java.compiler=
2010-04-20 13:14:31,554 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:os.name=Linux
2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:os.arch=amd64
2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:os.version=2.6.12-1.1398_FC4smp
2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:user.name=root
2010-04-20 13:14:31,555 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:user.home=/root
2010-04-20 13:14:31,556 - INFO  [pool-1-thread-1:environm...@97] - Client
environment:user.dir=/var/myservice
2010-04-20 13:14:31,557 - INFO  [pool-1-thread-1:zookee...@341] - Initiating
client connection, host=a.b.c.com sessionTimeout=1
watcher=a.b.c.mycl...@716c9867
2010-04-20 13:14:31,558 - INFO  [pool-1-thread-1:clientc...@91] -
zookeeper.disableAutoWatchReset is false
2010-04-20 13:14:31,566 - INFO
 [pool-1-thread-1-SendThread:clientcnxn$sendthr...@800] - Attempting
connection to server a.b.c.com/10.18.39.211:2181
2010-04-20 13:14:31,567 - WARN
 [pool-1-thread-1-SendThread:clientcnxn$sendthr...@898] - Exception closing
session 0x0 to sun.nio.ch.selectionkeyi...@7b2884e0
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:864)
2010-04-20 13:14:31,568 - WARN
 [pool-1-thread-1-SendThread:clientcnxn$sendthr...@932] - Ignoring exception
during shutdown input
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:656)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:378)
at
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:930)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:901)