Re: feedback zkclient

2009-10-01 Thread Patrick Hunt

I started looking a bit more closely at the source, some questions:

1) I tried generating the javadocs (see my fork of the project on github 
if you want my changes to build.xml for this) but it looks like there's 
pretty much no javadoc. Some information, particularly on semantics of 
user-exposed operations would be useful (esp re my earlier README 
comment - some high level document describing the benefits, etc... of 
the library)


If I'm your prototypical lazy developer (which I am :-) ), I'm really 
expecting some helpful docs to get me bootstrapped.


2) what purpose does ZkEventThread serve?

3) there's definitely an issue in the retryUntilConnected logic that you 
need to address


let's say you call zkclient.create, and the connection to the server is 
lost while the request is in flight. At this point ConnectionLoss is 
thrown on the client side, however you (client) have no information on 
whether the server has made the change or not. The retry method's while 
loop will re-run the create (after reconnect), and the result seen by 
the caller (user code) could be either OK or may be NODEEXISTS 
exception, there's no way to know which.


Mahadev is working on ZOOKEEPER-22 which will address this issue, but 
that's a future version, not today.
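
A minimal sketch of the failure mode in point 3, assuming a hypothetical in-memory stand-in for the server. FakeServer, ConnectionLoss, NodeExists and createWithRetry are illustrative names, not zkclient or ZooKeeper API. The server applies the create but the reply is lost, so the naive retry reports NODEEXISTS for a node the caller itself created:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical exceptions standing in for ZooKeeper's error codes.
class ConnectionLoss extends RuntimeException {}
class NodeExists extends RuntimeException {}

// Hypothetical stand-in for a server; not the real ZooKeeper API.
class FakeServer {
    private final Set<String> nodes = new HashSet<>();
    private boolean dropNextReply;

    FakeServer(boolean dropNextReply) {
        this.dropNextReply = dropNextReply;
    }

    // Applies the create first, then may lose the reply on the way back.
    void create(String path) {
        if (!nodes.add(path)) {
            throw new NodeExists();
        }
        if (dropNextReply) {
            dropNextReply = false;
            throw new ConnectionLoss();
        }
    }
}

class RetryDemo {
    // Naive loop in the spirit of retryUntilConnected: re-run on connection loss.
    static String createWithRetry(FakeServer server, String path) {
        while (true) {
            try {
                server.create(path);
                return "OK";
            } catch (NodeExists e) {
                return "NODEEXISTS";
            } catch (ConnectionLoss e) {
                // pretend we waited for the reconnect, then loop and retry
            }
        }
    }

    public static void main(String[] args) {
        // Reply delivered normally.
        System.out.println(createWithRetry(new FakeServer(false), "/a"));
        // Reply lost after the server applied the change.
        System.out.println(createWithRetry(new FakeServer(true), "/a"));
    }
}
```

In both runs the node was created by the same caller, yet the caller sees two different results and cannot tell the second case apart from a node created by somebody else.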


4) when I saw that you had separated zkclient and zkconnection I thought 
"ah, this is interesting"; however, when I saw the implementation I was 
confused:


a) what purpose does this separation serve?
b) I thought it was to allow multiple zkclients to share a single 
connection, however looking at zkclient.close, it closes the underlying 
connection.


5) there's a lot of wrapping of exceptions, looks like this is done in 
order to make them unchecked. Is this wise? How much simpler does it 
really make things? Esp things like interrupted exception? As you 
mentioned, one of your intents is to simplify things, but perhaps too 
simple? Some short, clear examples of usage would be helpful here to 
compare/contrast, I took a very quick look at some of the tests but that 
didn't help much. Is there a test(s) in particular that I should look at 
to see how zkclient is used, and the benefits incurred?


Regards,

Patrick

Patrick Hunt wrote:

Hi Stefan, two suggestions off the bat:

1) fill in something in the README, doesn't have to be final or 
polished, but give some insight into the what/why/how/where/goals/etc... 
to get things moving quickly for reviewers & new users.


2) you should really discuss on the dev list. It's up to you to include 
user, but apache discourages use of user for development discussion 
(plus you'll pickup more developer insight there)


Patrick

Stefan Groschupf wrote:

Hi ZooKeeper developers,
it would be great if you guys could give us some feedback about our 
project zkclient.

http://github.com/joa23/zkclient
The main idea is to make life much easier for lazy developers who only 
want minimal zk functionality.


We have functionality such as a zkclient mock that makes testing easy and 
fast without running a real zkserver, simple callback interfaces for 
the different event types, reconnect handling in case of timeout, etc.


We feel we are getting closer to a release, so it would be great if some 
experts could have a look and give us some feedback.

Thanks,
Stefan



~~~
Hadoop training and consulting
http://www.scaleunlimited.com
http://www.101tec.com





Re: feedback zkclient

2009-10-01 Thread Peter Voss

Hi Patrick,

On 01.10.2009, at 08:57, Patrick Hunt wrote:


I started looking a bit more closely at the source, some questions:

1) I tried generating the javadocs (see my fork of the project on  
github if you want my changes to build.xml for this) but it looks  
like there's pretty much no javadoc. Some information, particularly  
on semantics of user-exposed operations would be useful (esp re my  
earlier README comment - some high level document describing the  
benefits, etc... of the library)


If I'm your proto-typical lazy developer (which I am :-) ), I'm  
really expecting some helpful docs to get me bootstrapped.


2) what purpose does ZkEventThread serve?


ZkClient receives events from ZooKeeper. Based on these it notifies  
listeners, updates its connection state, or reconnects to ZooKeeper.  
ZkClient has its own event thread to prevent deadlocks. If we had  
reused the ZooKeeper event thread to notify listeners, then a listener  
that blocks (because it waits until ZkClient has reconnected to  
ZooKeeper) would prevent ZkClient from ever receiving the reconnect  
event from ZooKeeper. See the javadoc for ZkEventThread for more  
information.
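
A minimal sketch of that idea, under stated assumptions: ListenerThread and EventThreadDemo are hypothetical names and this is not the actual ZkEventThread code. A plain thread stands in for the ZooKeeper event thread; it only hands callbacks to a separate queue-draining thread, so it stays free to deliver the later "reconnected" event that releases the blocked listener:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class EventThreadDemo {
    // Listener callbacks are queued here and run on their own thread.
    static final class ListenerThread extends Thread {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        ListenerThread() {
            setDaemon(true);
        }

        void send(Runnable callback) {
            queue.add(callback);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    queue.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag set and exit
            }
        }
    }

    // Returns true if a blocking listener was released by a later reconnect event.
    static boolean run() {
        CountDownLatch reconnected = new CountDownLatch(1);
        CountDownLatch listenerDone = new CountDownLatch(1);
        ListenerThread listeners = new ListenerThread();
        listeners.start();

        // Stands in for the ZooKeeper event thread: it dispatches and never blocks.
        Thread zkEvents = new Thread(() -> {
            // Event 1: disconnected. The listener waits for the reconnect.
            listeners.send(() -> {
                try {
                    if (reconnected.await(2, TimeUnit.SECONDS)) {
                        listenerDone.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Event 2: reconnected. Deliverable because this thread is not blocked.
            reconnected.countDown();
        });
        zkEvents.start();
        try {
            zkEvents.join();
            return listenerDone.await(3, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("listener released: " + run());
    }
}
```

If the blocking callback ran directly on the dispatching thread instead, the second event could not be delivered until the wait timed out.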


3) there's definitely an issue in the retryUntilConnected logic that  
you need to address


let's say you call zkclient.create, and the connection to the server  
is lost while the request is in flight. At this point ConnectionLoss  
is thrown on the client side, however you (client) have no  
information on whether the server has made the change or not. The  
retry method's while loop will re-run the create (after reconnect),  
and the result seen by the caller (user code) could be either OK or  
may be NODEEXISTS exception, there's no way to know which.


Mahadev is working on ZOOKEEPER-22 which will address this issue,  
but that's a future version, not today.


Good catch. I wasn't aware that nodes could still have been created  
when a ConnectionLoss is received. But how would you deal with that?
Suppose we create a znode, get a ConnectionLoss exception, wait until  
the connection is back, and then check whether the znode is there.  
There is no way of knowing whether it was us who created the node or  
somebody else, right?

Anyway. That's definitely a design issue.

4) when I saw that you had separated zkclient and zkconnection I  
thought ah, this is interesting however when I saw the  
implementation I was confused:


a) what purpose does this separation serve?


It's just to have all ZooKeeper communication in one place, while the  
higher-level stuff is in ZkClient. That way we are able to provide an  
in-memory ZkConnection implementation that doesn't connect to a real  
ZooKeeper. This can be used for easier testing.
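
A minimal sketch of this separation, assuming a heavily reduced, hypothetical interface. Connection, InMemoryConnection and Client are illustrative names; the real ZkConnection and ZkClient have different and larger APIs. The higher-level client only talks to the interface, so a test can hand it an in-memory implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical low-level interface: everything that would touch ZooKeeper.
interface Connection {
    void create(String path, byte[] data);
    byte[] read(String path);
    boolean exists(String path);
}

// In-memory implementation for tests: no sockets, no server.
class InMemoryConnection implements Connection {
    private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

    @Override
    public void create(String path, byte[] data) {
        if (nodes.putIfAbsent(path, data.clone()) != null) {
            throw new IllegalStateException("node exists: " + path);
        }
    }

    @Override
    public byte[] read(String path) {
        byte[] data = nodes.get(path);
        if (data == null) {
            throw new IllegalStateException("no node: " + path);
        }
        return data.clone();
    }

    @Override
    public boolean exists(String path) {
        return nodes.containsKey(path);
    }
}

// Higher-level client: convenience logic on top of the interface only.
class Client {
    private final Connection connection;

    Client(Connection connection) {
        this.connection = connection;
    }

    void createIfAbsent(String path, String value) {
        if (!connection.exists(path)) {
            connection.create(path, value.getBytes(StandardCharsets.UTF_8));
        }
    }

    String readString(String path) {
        return new String(connection.read(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Client client = new Client(new InMemoryConnection());
        client.createIfAbsent("/config", "v1");
        client.createIfAbsent("/config", "v2"); // no-op, node already there
        System.out.println(client.readString("/config"));
    }
}
```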


b) I thought it was to allow multiple zkclients to share a single  
connection, however looking at zkclient.close, it closes the  
underlying connection.


Actually each ZkClient instance maintains one ZooKeeper connection.

5) there's a lot of wrapping of exceptions, looks like this is done  
in order to make them unchecked. Is this wise? How much simpler  
does it really make things? Esp things like interrupted exception?  
As you mentioned, one of your intents is to simplify things, but  
perhaps too simple? Some short, clear examples of usage would be  
helpful here to compare/contrast, I took a very quick look at some  
of the tests but that didn't help much. Is there a test(s) in  
particular that I should look at to see how zkclient is used, and  
the benefits incurred?


Checked exceptions are very painful when you are assembling together a  
larger number of libraries (which is true for most enterprise  
applications). Either you wind up having a general throws  
Exception (which I don't really like, because it's too general) at  
most of your interfaces, or you have to wrap checked exceptions into  
runtime exceptions.


We didn't want a library to introduce yet another checked exception  
that you MUST catch or rethrow. I know that there are different  
opinions about that, but that's the idea behind this.


The situation is similar for InterruptedException. ZkClient also converts  
this to a runtime exception and makes sure that the interrupted flag  
doesn't get cleared. There are just too many existing libraries that  
have a catch (Exception e) somewhere that totally ignores that this  
would reset the interrupt flag if e is an InterruptedException.  
Therefore we had better avoid having all of the methods throw that  
exception.
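
A minimal sketch of that conversion, with hypothetical names (UncheckedInterruptedException and sleepUnchecked are illustrative, not necessarily what ZkClient uses). The checked exception is wrapped, and the interrupt flag is set again before rethrowing so that callers further up can still observe the interrupt:

```java
class InterruptDemo {
    // Hypothetical unchecked wrapper for InterruptedException.
    static class UncheckedInterruptedException extends RuntimeException {
        UncheckedInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    // A blocking call that no longer declares a checked exception.
    static void sleepUnchecked(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Throwing InterruptedException cleared the flag; set it again.
            Thread.currentThread().interrupt();
            throw new UncheckedInterruptedException(e);
        }
    }

    // Returns true if the flag is still set after the wrapped exception is caught.
    static boolean flagSurvives() {
        Thread.currentThread().interrupt(); // simulate an interrupt
        try {
            sleepUnchecked(10);
            return false; // not reached: sleep throws when the flag is set
        } catch (UncheckedInterruptedException e) {
            // Thread.interrupted() reads the flag and clears it again.
            return Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println("flag preserved: " + flagSurvives());
    }
}
```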


Thanks a lot for the valuable feedback,
--Peter



Regards,

Patrick

Patrick Hunt wrote:

Hi Stefan, two suggestions off the bat:
1) fill in something in the README, doesn't have to be final or  
polished, but give some insight into the what/why/how/where/goals/ 
etc... to get things moving quickly for reviewers  new users.
2) you should really discuss on the dev list. It's up to you to  
include user, but apache discourages use of user for development  

Re: feedback zkclient

2009-10-01 Thread Ted Dunning
I think that another way to say this is that zkClient is going a bit for the
Spring philosophy that if the caller can't (or won't) be handling the
situation, then they shouldn't be forced to declare it.  The Spring
jdbcTemplate is a grand example of the benefits of this.

First implementations of this policy generally are a bit too broad, though,
so this should be examined carefully.

On Thu, Oct 1, 2009 at 8:05 AM, Peter Voss i...@petervoss.org wrote:

 5) there's a lot of wrapping of exceptions, looks like this is done in
 order to make them unchecked. Is this wise? How much simpler does it
 really make things? Esp things like interrupted exception? As you mentioned,
 one of your intents is to simplify things, but perhaps too simple? Some
 short, clear examples of usage would be helpful here to compare/contrast, I
 took a very quick look at some of the tests but that didn't help much. Is
 there a test(s) in particular that I should look at to see how zkclient is
 used, and the benefits incurred?


 Checked exceptions are very painful when you are assembling together a
 larger number of libraries (which is true for most enterprise applications).
 Either you wind up having a general throws Exception (which I don't really
 like, because it's too general) at most of your interfaces, or you have to
 wrap checked exceptions into runtime exceptions.

 We didn't want a library to introduce yet another checked exception that
 you MUST catch or rethrow. I know that there are different opinions about
 that, but that's the idea behind this.

 Similar situation for the InterruptedException. ZkClient also converts this
 to a runtime exception and makes sure that the interrupted flag doesn't get
 cleared. There are just too many existing libraries that have a catch
 (Exception e) somewhere that totally ignores that this would reset the
 interrupt flag, if e is an InterruptedException. Therefore we better avoid
 having all of the methods throwing that exception.




-- 
Ted Dunning, CTO
DeepDyve


Re: feedback zkclient

2009-10-01 Thread Ted Dunning
There is no good way to avoid this entirely without massive performance
loss, because the connection loss could occur while the confirmation is
being returned.

You may be able to tell if the file is yours by examining the content and
ownership, but this is pretty implementation dependent.  In particular, it
makes queues very difficult to implement correctly.  If this happens during
the creation of an ephemeral file, the only option may be to close the
connection (thus deleting all ephemeral files) and start over.

On Thu, Oct 1, 2009 at 8:05 AM, Peter Voss i...@petervoss.org wrote:

 3) there's definitely an issue in the retryUntilConnected logic that you
 need to address

 let's say you call zkclient.create, and the connection to the server is
 lost while the request is in flight. At this point ConnectionLoss is thrown
 on the client side, however you (client) have no information on whether the
 server has made the change or not. The retry method's while loop will re-run
 the create (after reconnect), and the result seen by the caller (user code)
 could be either OK or may be NODEEXISTS exception, there's no way to know
 which.

 Mahadev is working on ZOOKEEPER-22 which will address this issue, but
 that's a future version, not today.


 Good catch. I wasn't aware that nodes could still be have been created when
 receiving a ConnectionLoss. But how would you deal with that?
 If we create a znode and get a ConnectionLoss exception, then wait until
 the connection is back and check if the znode is there. There is no way of
 knowing whether it was us who created the node or somebody else, right?




-- 
Ted Dunning, CTO
DeepDyve


Re: feedback zkclient

2009-10-01 Thread Patrick Hunt
Not to harp on this ;-) but this sounds like something that would be a 
very helpful addition to the README.


Ted Dunning wrote:

I think that another way to say this is that zkClient is going a bit for the
Spring philosophy that if the caller can't (or won't) be handling the
situation, then they shouldn't be forced to declare it.  The Spring
jdbcTemplate is a grand example of the benefits of this.

First implementations of this policy generally are a bit too broad, though,
so this should be examined carefully.

On Thu, Oct 1, 2009 at 8:05 AM, Peter Voss i...@petervoss.org wrote:


5) there's a lot of wrapping of exceptions, looks like this is done in

order to make them unchecked. Is this wise? How much simpler does it
really make things? Esp things like interrupted exception? As you mentioned,
one of your intents is to simplify things, but perhaps too simple? Some
short, clear examples of usage would be helpful here to compare/contrast, I
took a very quick look at some of the tests but that didn't help much. Is
there a test(s) in particular that I should look at to see how zkclient is
used, and the benefits incurred?


Checked exceptions are very painful when you are assembling together a
larger number of libraries (which is true for most enterprise applications).
Either you wind up having a general throws Exception (which I don't really
like, because it's too general) at most of your interfaces, or you have to
wrap checked exceptions into runtime exceptions.

We didn't want a library to introduce yet another checked exception that
you MUST catch or rethrow. I know that there are different opinions about
that, but that's the idea behind this.

Similar situation for the InterruptedException. ZkClient also converts this
to a runtime exception and makes sure that the interrupted flag doesn't get
cleared. There are just too many existing libraries that have a catch
(Exception e) somewhere that totally ignores that this would reset the
interrupt flag, if e is an InterruptedException. Therefore we better avoid
having all of the methods throwing that exception.







Re: feedback zkclient

2009-10-01 Thread Patrick Hunt

Ted Dunning wrote:

You may be able to tell if the file is yours by examining the content and
ownership, but this is pretty implementation dependent.  In particular, it
makes queues very difficult to implement correctly.  If this happens during
the creation of an ephemeral file, the only option may be to close the
connection (thus deleting all ephemeral files) and start over.


One nice thing about ephemeral nodes is that the Stat contains the owner 
session id. As you say, it's highly implementation dependent. We also 
recognize that this is a problem for users; we've slated a fix for 3.3.0:

http://issues.apache.org/jira/browse/ZOOKEEPER-22
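
A minimal sketch of the owner check, assuming the session itself survived the disconnect. EphemeralOwnerDemo and resolve are hypothetical names. In the real client the two inputs would come from Stat.getEphemeralOwner() on the node and ZooKeeper.getSessionId() on the handle; here they are plain values so the decision logic stands alone:

```java
class EphemeralOwnerDemo {
    enum Outcome { CREATED_BY_US, CREATED_BY_OTHER, NOT_CREATED }

    // Decide what happened to an ephemeral create that ended in a connection loss.
    // ephemeralOwner is the owner session id read back from the node,
    // or null if the node does not exist after the reconnect.
    static Outcome resolve(Long ephemeralOwner, long ourSessionId) {
        if (ephemeralOwner == null) {
            return Outcome.NOT_CREATED;
        }
        return ephemeralOwner.longValue() == ourSessionId
                ? Outcome.CREATED_BY_US
                : Outcome.CREATED_BY_OTHER;
    }

    public static void main(String[] args) {
        long ourSession = 0x1234L; // made-up session id for illustration
        System.out.println(resolve(0x1234L, ourSession));
        System.out.println(resolve(0x9999L, ourSession));
        System.out.println(resolve(null, ourSession));
    }
}
```

This only covers ephemeral nodes; for persistent nodes there is no owner to compare, which is the implementation-dependent part mentioned above.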

Patrick




On Thu, Oct 1, 2009 at 8:05 AM, Peter Voss i...@petervoss.org wrote:


3) there's definitely an issue in the retryUntilConnected logic that you

need to address

let's say you call zkclient.create, and the connection to the server is
lost while the request is in flight. At this point ConnectionLoss is thrown
on the client side, however you (client) have no information on whether the
server has made the change or not. The retry method's while loop will re-run
the create (after reconnect), and the result seen by the caller (user code)
could be either OK or may be NODEEXISTS exception, there's no way to know
which.

Mahadev is working on ZOOKEEPER-22 which will address this issue, but
that's a future version, not today.


Good catch. I wasn't aware that nodes could still be have been created when
receiving a ConnectionLoss. But how would you deal with that?
If we create a znode and get a ConnectionLoss exception, then wait until
the connection is back and check if the znode is there. There is no way of
knowing whether it was us who created the node or somebody else, right?







Re: feedback zkclient

2009-10-01 Thread Ted Dunning
That looks really lovely.

Judging by history and the fact that only 40/127 issues are resolved, 3.3
is probably 3-6 months away.  Is that a fair assessment?

On Thu, Oct 1, 2009 at 11:13 AM, Patrick Hunt ph...@apache.org wrote:

 One nice thing about ephemeral is that the Stat contains the owner
 sessionid. As you say, it's highly implementation dependent. It's also
 something we recognize is a problem for users, we've slated it for 3.3.0
 http://issues.apache.org/jira/browse/ZOOKEEPER-22




-- 
Ted Dunning, CTO
DeepDyve


Re: feedback zkclient

2009-10-01 Thread Patrick Hunt

Ted Dunning wrote:

Judging by history and the fact that only 40/127 issues are resolved, 3.3
is probably 3-6 months away.  Is that a fair assessment?


Yes, that's fair.

Patrick


On Thu, Oct 1, 2009 at 11:13 AM, Patrick Hunt ph...@apache.org wrote:


One nice thing about ephemeral is that the Stat contains the owner
sessionid. As you say, it's highly implementation dependent. It's also
something we recognize is a problem for users, we've slated it for 3.3.0
http://issues.apache.org/jira/browse/ZOOKEEPER-22







Re: How do we find the Server the client is connected to?

2009-10-01 Thread Patrick Hunt
That detail is purposefully not exposed through the client api, however 
it is output to the log on connection establishment.


Why would your client code need to know which server in the ensemble it 
is connected to?


Patrick

Rob Baccus wrote:

How do I determine the server the client is connected to?  It is not
exposed as far as I can see in either the ZooKeeper object or the
ClientCnxn object.  I did find, on line 790 in the ClientCnxn.StartConnect()
method, the place where the actual server connection happens, but that is
not exposed.

Rob Baccus
425-201-3812




RE: How do we find the Server the client is connected to?

2009-10-01 Thread Todd Greenwood
Failover testing.

 -Original Message-
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Thursday, October 01, 2009 3:44 PM
 To: zookeeper-user@hadoop.apache.org; Rob Baccus
 Subject: Re: How do we find the Server the client is connected to?
 
 That detail is purposefully not exposed through the client api, however
 it is output to the log on connection establishment.
 
 Why would your client code need to know which server in the ensemble it
 is connected to?
 
 Patrick
 
 Rob Baccus wrote:
  How do I determine the server the client is connected to?  It is not
  exposed as far as I can see in either the ZooKeeper object or the
  ClientCnxn object.  I did find on line 790 in ClientCnxn.StartConnect()
  method the place the actual server connection is happening but that is
  not exposed.
 
  Rob Baccus
  425-201-3812
 
 


problem starting ensemble mode

2009-10-01 Thread Hector Yuen
Hi all,

I am trying to start zookeeper on two nodes. The configuration file I have
is

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=hec-bp1:2888:3888
server.2=hec-bp2:2888:3888


I also have a /var/zookeeper/myid file on each of the machines; the
files contain 1 and 2 respectively.


When I start, I get the following

Starting zookeeper ...
STARTED
hec...@hec-bp2:/zookeeper$ 2009-10-01 15:48:15,786 - INFO
[main:quorumpeercon...@80] - Reading configuration from:
/zookeeper/bin/../conf/zoo.cfg
2009-10-01 15:48:15,882 - INFO  [main:quorumpeercon...@232] - Defaulting to
majority quorums
2009-10-01 15:48:15,899 - INFO  [main:quorumpeerm...@118] - Starting quorum
peer
2009-10-01 15:48:15,943 - INFO  [Thread-1:quorumcnxmanager$liste...@409] -
My election bind port: 3888
2009-10-01 15:48:15,961 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@487] - LOOKING
2009-10-01 15:48:15,963 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@579] - New election: -1
2009-10-01 15:48:15,978 - WARN  [WorkerSender Thread:quorumcnxmana...@336] -
Cannot open channel to 1 at election address
hec-bp1.admin.nimblestorage.com/10.12.6.192:3888
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
at java.nio.channels.SocketChannel.open(Unknown Source)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
at java.lang.Thread.run(Unknown Source)
2009-10-01 15:48:15,981 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@618] - Notification: 2,
-1, 1, 2, LOOKING, LOOKING, 2
2009-10-01 15:48:15,981 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@642] - Adding vote
2009-10-01 15:48:16,184 - WARN
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorumcnxmana...@336] - Cannot open
channel to 1 at election address
hec-bp1.admin.nimblestorage.com/10.12.6.192:3888


I would expect these kinds of messages when the other server hasn't been
started, but it keeps logging them even after a while.

I can ping and ssh between the machines.
I noticed that just port 3888 is listening when I do netstat -an, why is
port 2888 not being used?

Any ideas?

Thanks
-h


Re: How do we find the Server the client is connected to?

2009-10-01 Thread Ted Dunning
Grovel the logs.

On Thu, Oct 1, 2009 at 3:46 PM, Todd Greenwood to...@audiencescience.com wrote:

 Failover testing.

  -Original Message-
  From: Patrick Hunt [mailto:ph...@apache.org]
  Sent: Thursday, October 01, 2009 3:44 PM
  To: zookeeper-user@hadoop.apache.org; Rob Baccus
  Subject: Re: How do we find the Server the client is connected to?
 
  That detail is purposefully not exposed through the client api, however
  it is output to the log on connection establishment.
 
  Why would your client code need to know which server in the ensemble it
  is connected to?
 
  Patrick
 
  Rob Baccus wrote:
   How do I determine the server the client is connected to?  It is not
   exposed as far as I can see in either the ZooKeeper object or the
   ClientCnxn object.  I did find on line 790 in ClientCnxn.StartConnect()
   method the place the actual server connection is happening but that is
   not exposed.
  
   Rob Baccus
   425-201-3812
  
  




-- 
Ted Dunning, CTO
DeepDyve


Re: How do we find the Server the client is connected to?

2009-10-01 Thread Patrick Hunt

It's possible, but not pretty.

Try this:

1) create a subclass of ZooKeeper to be used in your tests

2) in the subclass add something like this:

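// Test-only hack: relies on ZooKeeper client internals (cnxn, sendThread, sockKey).
// Requires: import java.nio.channels.SocketChannel;
public String getConnectedServer() {
    return ((SocketChannel) cnxn.sendThread.sockKey.channel()).socket()
            .getInetAddress().toString();
}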

Feel free to add a JIRA, I think we could make this a protected method 
on ZooKeeper to make testing easier (and not expose internals).


Regards,

Patrick

Todd Greenwood wrote:

Failover testing.


-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, October 01, 2009 3:44 PM
To: zookeeper-user@hadoop.apache.org; Rob Baccus
Subject: Re: How do we find the Server the client is connected to?

That detail is purposefully not exposed through the client api, however
it is output to the log on connection establishment.

Why would your client code need to know which server in the ensemble it
is connected to?

Patrick

Rob Baccus wrote:

How do I determine the server the client is connected to?  It is not
exposed as far as I can see in either the ZooKeeper object or the
ClientCnxn object.  I did find on line 790 in ClientCnxn.StartConnect()
method the place the actual server connection is happening but that is
not exposed.

Rob Baccus
425-201-3812




Re: How do we find the Server the client is connected to?

2009-10-01 Thread Patrick Hunt

Possible, but very ugly. I do something similar to this in zk tests:
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPeerAddressInQuorum()
if you want to see an example.

Patrick

Ted Dunning wrote:

Grovel the logs.

On Thu, Oct 1, 2009 at 3:46 PM, Todd Greenwood to...@audiencescience.com wrote:


Failover testing.


-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, October 01, 2009 3:44 PM
To: zookeeper-user@hadoop.apache.org; Rob Baccus
Subject: Re: How do we find the Server the client is connected to?

That detail is purposefully not exposed through the client api, however
it is output to the log on connection establishment.

Why would your client code need to know which server in the ensemble it
is connected to?

Patrick

Rob Baccus wrote:

How do I determine the server the client is connected to?  It is not
exposed as far as I can see in either the ZooKeeper object or the
ClientCnxn object.  I did find on line 790 in ClientCnxn.StartConnect()
method the place the actual server connection is happening but that is
not exposed.

Rob Baccus
425-201-3812








Re: problem starting ensemble mode

2009-10-01 Thread Patrick Hunt

Hi Hector, looks like a connectivity issue to me: NoRouteToHostException.

3888 is the election port
2888 is the quorum port

basically, the ensemble uses the election port for leader election. Once 
a leader is elected it then uses the quorum port for subsequent 
communication.


Could it be a firewall issue? Your configs/logs look ok to me otherwise.

Try using something like telnet to verify connectivity on the 3888 & 
2888 ports between the two servers.
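
If telnet isn't handy, the same check can be sketched in a few lines of Java. This is a rough illustration, not a ZooKeeper tool: PortCheck and canConnect are hypothetical names, and the default host is simply taken from the config in this thread. Note that a refusal on 2888 may be expected while no leader has been elected yet, since the quorum port is only used after election:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, unknown hosts and "no route to host".
            return false;
        }
    }

    public static void main(String[] args) {
        // Host name from the posted zoo.cfg; pass another one as the first argument.
        String host = args.length > 0 ? args[0] : "hec-bp1";
        for (int port : new int[] {2888, 3888}) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        }
    }
}
```

Run it from each server against the other one; a false for 3888 in both directions would point at a firewall or routing problem rather than at the ZooKeeper config.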


Patrick

Hector Yuen wrote:

Hi all,

I am trying to start zookeeper in two nodes, the configuration file I have
is

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=hec-bp1:2888:3888
server.2=hec-bp2:2888:3888


i also have two files /var/zookeeper/myid  on each of the machines, the
files contain 1 and 2 on each of the servers


When I start, I get the following

Starting zookeeper ...
STARTED
hec...@hec-bp2:/zookeeper$ 2009-10-01 15:48:15,786 - INFO
[main:quorumpeercon...@80] - Reading configuration from:
/zookeeper/bin/../conf/zoo.cfg
2009-10-01 15:48:15,882 - INFO  [main:quorumpeercon...@232] - Defaulting to
majority quorums
2009-10-01 15:48:15,899 - INFO  [main:quorumpeerm...@118] - Starting quorum
peer
2009-10-01 15:48:15,943 - INFO  [Thread-1:quorumcnxmanager$liste...@409] -
My election bind port: 3888
2009-10-01 15:48:15,961 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorump...@487] - LOOKING
2009-10-01 15:48:15,963 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@579] - New election: -1
2009-10-01 15:48:15,978 - WARN  [WorkerSender Thread:quorumcnxmana...@336] -
Cannot open channel to 1 at election address
hec-bp1.admin.nimblestorage.com/10.12.6.192:3888
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
at java.nio.channels.SocketChannel.open(Unknown Source)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
at
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
at java.lang.Thread.run(Unknown Source)
2009-10-01 15:48:15,981 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@618] - Notification: 2,
-1, 1, 2, LOOKING, LOOKING, 2
2009-10-01 15:48:15,981 - INFO
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:fastleaderelect...@642] - Adding vote
2009-10-01 15:48:16,184 - WARN
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:quorumcnxmana...@336] - Cannot open
channel to 1 at election address
hec-bp1.admin.nimblestorage.com/10.12.6.192:3888


I can expect these kind of messages when the other server hasn't been
started, but even after a while keeps sending these messages.

I can ping and ssh between the machines.
I noticed that just port 3888 is listening when I do netstat -an, why is
port 2888 not being used?

Any ideas?

Thanks
-h