Hi,

In Oozie, we use Curator to connect to ZooKeeper, and both had been working fine. We recently upgraded Curator, which moved us from ZooKeeper 3.4.5 to 3.4.6; this broke our unit tests that use ZooKeeper with Kerberos/SASL.
We use the MiniKDC from Hadoop to provide a KDC for the unit tests, and set up a "zookeeper/localhost" principal. With ZK 3.4.6, this causes an "AuthFailed" error when trying to do anything with the ZooKeeper client. I did some digging and found this set of log messages:

23334 [pool-1-thread-1] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=127.0.0.1:50921 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@655bf451
23335 [pool-1-thread-1-SendThread(127.0.0.1:50921)] INFO org.apache.zookeeper.client.ZooKeeperSaslClient - Client will use GSSAPI as SASL mechanism.
23337 [pool-1-thread-1-SendThread(127.0.0.1:50921)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 127.0.0.1/127.0.0.1:50921. Will attempt to SASL-authenticate using Login Context section 'Client'
23337 [pool-1-thread-1-SendThread(127.0.0.1:50921)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 127.0.0.1/127.0.0.1:50921, initiating session
23337 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:50921] INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /127.0.0.1:50928
23339 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:50921] INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /127.0.0.1:50928
23339 [SyncThread:0] INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x1492ff641fe0001 with negotiated timeout 60000 for client /127.0.0.1:50928
23339 [pool-1-thread-1-SendThread(127.0.0.1:50921)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 127.0.0.1/127.0.0.1:50921, sessionid = 0x1492ff641fe0001, negotiated timeout = 60000
23339 [pool-1-thread-1-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
23345 [NioProcessor-1] WARN org.apache.directory.server.protocol.shared.kerberos.StoreUtils - No server entry found for kerberos principal name zookeeper/[email protected]
23345 [NioProcessor-1] WARN org.apache.directory.server.KERBEROS_LOG - No server entry found for kerberos principal name zookeeper/[email protected]
23346 [NioProcessor-1] WARN org.apache.directory.server.kerberos.protocol.KerberosProtocolHandler - Server not found in Kerberos database (7)
23346 [NioProcessor-1] WARN org.apache.directory.server.KERBEROS_LOG - Server not found in Kerberos database (7)

As you can see, the client connects to "127.0.0.1/127.0.0.1". When I force Maven to use ZK 3.4.5 and run the same test, I get similar messages:

25475 [pool-1-thread-1] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=127.0.0.1:50018 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@a9c6fd8
25538 [pool-1-thread-1-SendThread(localhost:50018)] INFO org.apache.zookeeper.client.ZooKeeperSaslClient - Client will use GSSAPI as SASL mechanism.
25556 [pool-1-thread-1-SendThread(localhost:50018)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:50018. Will attempt to SASL-authenticate using Login Context section 'Client'
25557 [pool-1-thread-1-SendThread(localhost:50018)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:50018, initiating session
25557 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:50018] INFO org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /127.0.0.1:50023
25575 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:50018] INFO org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /127.0.0.1:50023
25579 [SyncThread:0] INFO org.apache.zookeeper.server.persistence.FileTxnLog - Creating new log file: log.1
25647 [SyncThread:0] INFO org.apache.zookeeper.server.ZooKeeperServer - Established session 0x1492fb3a68b0000 with negotiated timeout 60000 for client /127.0.0.1:50023

You'll notice that here it connects to "localhost/127.0.0.1", even though the connect string is "127.0.0.1" in both runs.

I've verified that this is indeed the issue by trying a "zookeeper/127.0.0.1" principal and seeing that it works with ZK 3.4.6. I also tried a "zookeeper/127.0.0.1" principal with ZK 3.4.5, but that fails for the same reason in reverse (i.e. ZooKeeper tries to use "zookeeper/localhost", even though we specified "zookeeper/127.0.0.1").

TL;DR:
With ZooKeeper 3.4.5, the "zookeeper/localhost" principal works while the "zookeeper/127.0.0.1" principal fails.
With ZooKeeper 3.4.6, the "zookeeper/127.0.0.1" principal works while the "zookeeper/localhost" principal fails.

Any ideas what the problem is?

On a related note, is there a reason why ZooKeeper requires setting System properties to configure it for SASL/Kerberos? That greatly complicates using it, especially in tests. Are there any plans to add a way to pass in a Configuration or Properties object or file?

Thanks,
Robert
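
P.S. To illustrate the System-property pain point: below is a minimal sketch (not our actual test code) of the kind of helper a test ends up needing in order to set JVM-wide properties for one test and restore them afterwards. The class and method names are hypothetical. `java.security.auth.login.config` is the standard JAAS property, `zookeeper.sasl.clientconfig` is, as far as I understand, the property the ZooKeeper client reads to pick the login context section, and the JAAS file path is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical test helper: applies System properties for the duration of
// a block of code, then restores whatever values were there before.
class SaslTestProps {

    static void withSystemProperties(Map<String, String> props, Runnable body) {
        // Remember previous values (null means "was not set").
        Map<String, String> saved = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            saved.put(e.getKey(), System.getProperty(e.getKey()));
            System.setProperty(e.getKey(), e.getValue());
        }
        try {
            body.run();
        } finally {
            // Restore the JVM-wide state so other tests are not affected.
            for (Map.Entry<String, String> e : saved.entrySet()) {
                if (e.getValue() == null) {
                    System.clearProperty(e.getKey());
                } else {
                    System.setProperty(e.getKey(), e.getValue());
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Placeholder path; a real test would point at a generated JAAS file.
        props.put("java.security.auth.login.config", "/tmp/jaas.conf");
        // Assumed ZooKeeper client property naming the JAAS section to use.
        props.put("zookeeper.sasl.clientconfig", "Client");

        withSystemProperties(props, () ->
            System.out.println("JAAS config during test: "
                + System.getProperty("java.security.auth.login.config")));
    }
}
```

Because the properties are global to the JVM, this only works if tests touching them do not run in parallel, which is part of why being able to pass a Configuration or Properties object would be much nicer.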
