Hi

Using ZK version 3.3.6

I have a ZK cluster consisting of three ZK servers running at 192.168.78.201, .202 and .203. To test the behavior when one ZK server is down, I have only .202 and .203 running; .201 was shut down just before the test. There are some znodes in ZK, among others /a/b.txt. It is protected by a digest ACL granting ALL permissions to username/password myuser/mypass, but that probably does not matter.

Running the following test-code:
----------- test code - start -------------------
import java.io.IOException;

import org.apache.zookeeper.ZooKeeper;

public class OneZKDownSmallTester {

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++) {
            ZooKeeper zk = getZooKeeper("192.168.78.201:2181,192.168.78.202:2181,192.168.78.203:2181", "myuser", "mypass");
            try {
                for (int j = 0; j < 10; j++) {
                    System.out.println("Exists " + i + "," + j + ": " + zk.exists("/a/b.txt", false));
                }
            } finally {
                zk.close();
            }
        }
    }

    public static ZooKeeper getZooKeeper(String zkConnectionStr, String digestCredentialsUsername, String digestCredentialsPassword) throws IOException {
        ZooKeeper result = new ZooKeeper(zkConnectionStr, 10 * 1000, null);
        result.addAuthInfo("digest", (digestCredentialsUsername + ":" + digestCredentialsPassword).getBytes());
        return result;
    }

}
----------- test code - end -------------------

I get different results (printed to stdout) from run to run. A run of the test might give me
----------- result #1 - start -------------------
Exists 0,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 0,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 1,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 2,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 3,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 4,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 5,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 6,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,0: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,1: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,2: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,3: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,4: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,5: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,6: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,7: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,8: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exists 7,9: 8589934611,21474840943,1330718648589,1330802223875,180,0,0,0,89466,0,8589934611
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /a/b.txt
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
    at OneZKDownSmallTester.main(OneZKDownSmallTester.java:12)
----------- result #1 - end --------------------

Another run of the test might give me
----------- result #2 - start -------------------
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /a/b.txt
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
    at OneZKDownSmallTester.main(OneZKDownSmallTester.java:12)
----------- result #2 - end --------------------

It seems to be random at which outer iteration (iterating over variable i) it fails; in result #1 it failed at i = 8, in result #2 already at i = 0. If all three ZK servers are running, it never fails. To me it seems like the client, at creation time, picks one of the three ZK servers at random to connect to and insists on working against that one.

I am not sure, but I think there is some wrong behavior here:
* Shouldn't the client pick a running ZK server for the connection? That is, shouldn't it "see" that the server on .201 isn't running and choose one of the others?
* It is strange to me that the connection-loss exception comes at the zk.exists operation and not at the new ZooKeeper(...) operation.
* Even if .201 was running at the point in time where the connection was established, but went down at some point afterwards, shouldn't the client just transparently switch to one of the others and use that, without throwing exceptions at me (except if it wasn't able to reestablish the connection with one of the other ZK servers before session timeout)?
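
As a stopgap on the application side, the read could be retried when connection loss is reported. The following is only a minimal sketch of that idea, not something ZooKeeper provides: the class RetrySketch, the method retryOn and its parameters are hypothetical names, and it uses only the JDK so it can be tried without a cluster.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the ZooKeeper API.
class RetrySketch {

    // Runs op and returns its result. If op throws an exception of the given
    // retryable type, pauses and tries again, up to maxAttempts attempts in
    // total. Any other exception is rethrown immediately.
    static <T> T retryOn(Class<? extends Exception> retryable,
                         int maxAttempts,
                         long pauseMillis,
                         Callable<T> op) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!retryable.isInstance(e)) {
                    throw e; // not the kind of failure we want to retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // give the client time to reconnect
                }
            }
        }
        throw last; // every attempt failed with the retryable exception
    }
}
```

In the test above, the call would then look something like retryOn(KeeperException.ConnectionLossException.class, 5, 500, () -> zk.exists("/a/b.txt", false)). This is only reasonable for idempotent operations such as exists, and it does not settle whether the client ought to fail over by itself.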

Can one of you guys confirm or reject that the above is a "bug", as in "it is not intended behavior"? If so, can you tell me whether it has been corrected in a version later than 3.3.6?

Kind regards, Per Steffensen
