Hi,
I am using ZK version 3.3.6.
I have a ZK cluster consisting of three ZK servers running at
192.168.78.201, .202 and .203. To test the behavior when one ZK server is
down, I have only .202 and .203 running; .201 was shut down just before the
test. There are some znodes in ZK, among others /a/b.txt. It is protected by
a digest ACL granting ALL permissions to username/password myuser/mypass,
but that probably does not matter.
I am running the following test code:
----------- test code - start -------------------
import java.io.IOException;
import org.apache.zookeeper.ZooKeeper;
public class OneZKDownSmallTester {
    public static void main(String []args) throws Exception {
        for (int i = 0; i < 10; i++) {
            ZooKeeper zk =
                getZooKeeper("192.168.78.201:2181,192.168.78.202:2181,192.168.78.203:2181",
                    "myuser", "mypass");
            try {
                for (int j = 0; j < 10; j++) {
                    System.out.println("Exists " + i + "," + j + ": " +
                        zk.exists("/a/b.txt", false));
                }
            } finally {
                zk.close();
            }
        }
    }
    public static ZooKeeper getZooKeeper(String zkConnectionStr, String
            digestCredentialsUsername, String digestCredentialsPassword) throws
            IOException {
        ZooKeeper result = new ZooKeeper(zkConnectionStr, 10 * 1000, null);
        result.addAuthInfo("digest", (digestCredentialsUsername + ":" +
            digestCredentialsPassword).getBytes());
        return result;
    }
}
----------- test code - end -------------------
I get different results (printed to stdout). One run of the test might give me
----------- result #1 - start -------------------
Exists 0,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,0:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,1:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,2:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,3:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,4:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,5:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,6:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,7:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,8:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,9:
8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /a/b.txt
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
    at OneZKDownSmallTester.main(OneZKDownSmallTester.java:12)
----------- result #1 - end --------------------
Another run of the test might give me
----------- result #2 - start -------------------
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /a/b.txt
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
    at OneZKDownSmallTester.main(OneZKDownSmallTester.java:12)
----------- result #2 - end --------------------
Which outer iteration (iterating over variable i) fails seems to be
random. If all three ZK servers are running, it never fails. To me
it seems like the client, at creation time, picks one of the three
ZK servers at random and insists on working against that one.
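To make that guess concrete, here is a rough sketch of the arithmetic behind it. It assumes (and this is only my assumption, not confirmed ZK behavior) that each new client picks its first server uniformly at random and that the whole run fails as soon as a client picks the server that is down. The class and method names are made up for illustration.

```java
// Sketch only: models the hypothesis that every new client picks its first
// server uniformly at random and the run aborts when that server is down.
// This is an assumed model, not taken from the ZooKeeper client source.
class FirstFailureOdds {
    // Probability that the first failing outer iteration is i (0-based),
    // when `down` of `total` servers are unreachable.
    static double probFirstFailureAt(int i, int down, int total) {
        double pFail = (double) down / total;
        return Math.pow(1.0 - pFail, i) * pFail;
    }

    // Probability that all `iterations` outer iterations succeed.
    static double probNoFailure(int iterations, int down, int total) {
        return Math.pow(1.0 - (double) down / total, iterations);
    }

    public static void main(String[] args) {
        // One of three servers down, ten outer iterations, as in my test.
        for (int i = 0; i < 10; i++) {
            System.out.printf("first failure at i=%d: %.4f%n",
                    i, probFirstFailureAt(i, 1, 3));
        }
        System.out.printf("no failure in 10 iterations: %.4f%n",
                probNoFailure(10, 1, 3));
    }
}
```

Under this model each outer iteration would fail with probability 1/3, so a failure at i = 0 in one run and at i = 8 in another would both be plausible, and a run with no failure at all would be rare.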
I am not sure, but I think there is some wrong behavior here:
* Shouldn't the client pick a running ZK server to connect to? That is,
shouldn't it "see" that the server on .201 isn't running and choose one of
the others?
* It seems strange to me that the connection-loss exception is thrown by the
zk.exists operation and not by the new ZooKeeper(...) operation.
* Even if .201 was running at the point in time when the connection was
established, but went down at some point afterwards, shouldn't the
client just transparently switch to one of the others and use that,
without throwing exceptions at me (unless it was unable to reestablish
the connection with one of the other ZK servers before the session timeout)?
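For what it is worth, the kind of caller-side workaround I would like to avoid could be sketched as below. This is a minimal, generic retry helper of my own, not part of the ZooKeeper API. The names RetryingCaller and callWithRetry are hypothetical, and I do not know whether retrying like this is the intended way to handle connection loss. It should at least be harmless for a read-only call such as exists.

```java
import java.util.concurrent.Callable;

// Sketch only: a generic caller-side retry helper (hypothetical, not a
// ZooKeeper class). It retries an operation when it throws a given
// exception type and lets every other exception propagate immediately.
class RetryingCaller {
    static <T> T callWithRetry(Callable<T> op,
                               Class<? extends Exception> retryable,
                               int maxAttempts,
                               long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!retryable.isInstance(e)) {
                    throw e; // not a retryable failure
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // brief pause before retrying
                }
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}
```

In my test, the call in the inner loop would then become something like callWithRetry(() -> zk.exists("/a/b.txt", false), KeeperException.ConnectionLossException.class, 5, 500), where the attempt count and pause are arbitrary values.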
Can one of you guys confirm or reject that the above is a "bug", as in "it
is not intended behavior"? If so, can you tell me whether it has been
corrected in a version later than 3.3.6?
Kind regards, Per Steffensen