Perhaps I'm simply misunderstanding what the expected behavior would be. Why would my client by disconnected? Does zookeeper drop idle clients? And note that this isn't a spurious disconnect; my client is *always* dropped at that time.
On Friday zkCli seemed to be working just fine for me, but now I am getting disconnections similar to my kazoo-based client. Here is a session showing `ls /' failing. This behavior is reproducible for me currently. $ ./bin/zkCli.sh > Connecting to localhost:2181 > Welcome to ZooKeeper! > JLine support is enabled > [zk: localhost:2181(CONNECTING) 0] > WATCHER:: > WatchedEvent state:SyncConnected type:None path:null > [zk: localhost:2181(CONNECTED) 0] > [zk: localhost:2181(CONNECTED) 0] ls / > WATCHER:: > WatchedEvent state:Disconnected type:None path:null > Exception in thread "main" > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for / > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) > at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:720) > at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:588) > at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:360) > at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323) > at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282) And here is a session where after connecting I just sit and wait for a bit (without sending any commands), whereupon I get disconnected and then immediately reconnected. Interestingly, after the reconnecting I am able to issue multiple commands without any issues. This behavior is reproducible for me currently. $ ./bin/zkCli.sh > Connecting to localhost:2181 > Welcome to ZooKeeper! > JLine support is enabled > [zk: localhost:2181(CONNECTING) 0] > [zk: localhost:2181(CONNECTING) 0] > WATCHER:: > WatchedEvent state:SyncConnected type:None path:null > [zk: localhost:2181(CONNECTED) 0] > WATCHER:: > WatchedEvent state:Disconnected type:None path:null > WATCHER:: > WatchedEvent state:SyncConnected type:None path:null > [zk: localhost:2181(CONNECTED) 0] > [zk: localhost:2181(CONNECTED) 0] ls / > [zookeeper] > [zk: localhost:2181(CONNECTED) 1] > [zk: localhost:2181(CONNECTED) 1] > [zk: localhost:2181(CONNECTED) 1] ls /zookeeper > [quota] > [zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota > [] Thanks, - Ian On Sun, Sep 28, 2014 at 2:22 PM, Camille Fournier <[email protected]> wrote: > Sorry but with what you've sent us I don't really see what the problem is. > It does look like you connect and then nothing happens for 20s and then the > connection is dropped. If you use the zkCli script to connect via the > command line do you see the same problem? > > C > > On Fri, Sep 26, 2014 at 12:48 PM, Ian Rose <[email protected]> wrote: > > > Hi all - > > > > I've just gotten started using SolrCloud, which uses Zookeeper for > > coordination; I am otherwise completely new to Zookeeper. Now I am > trying > > to query Zookeeper directly for some simple information. I am finding, > > however, that although my clients are able to connect they very > frequently > > receive timeouts. It almost seems like the server isn't receiving the > > heartbeat messages at all (or isn't responding to them). I've seen > similar > > behavior both when using the go-zookeeper > > <https://godoc.org/github.com/samuel/go-zookeeper/zk> and kazoo > > <https://github.com/python-zk/kazoo> (python) client libraries (I wanted > > to > > try >1 to ensure that it wasn't a client lib problem). > > > > My config is very simple: I am running the client and a single Zookeeper > > node on the same machine (my laptop). There are no other clients of the > > Zookeeper node while I am running these tests, so there is no practical > > possibility that the JVM is overloaded or GCing. > > > > Here is a very basic kazoo client that I am using. Obviously it isn't > > doing any "real" work right now - this is just to demonstrate the > > disconnects. > > > > #!/usr/bin/env python2.7 > > > > from kazoo.client import KazooClient, KazooState > > import logging > > import time > > > > def my_listener(state): > > if state == KazooState.LOST: > > # Register somewhere that the session was lost > > logging.warning('handle lost') > > elif state == KazooState.SUSPENDED: > > # Handle being disconnected from Zookeeper > > logging.debug('handle being disconnected') > > else: > > # Handle being connected/reconnected to Zookeeper > > logging.debug('handle being (re)connected') > > > > > > if __name__ == '__main__': > > logging.basicConfig(format='%(asctime)-15s %(levelname)s %(message)s', > > level=logging.DEBUG) > > > > logging.debug('starting...') > > zk = KazooClient(hosts='127.0.0.1:2181', timeout=30) > > zk.start() > > zk.add_listener(my_listener) > > > > time.sleep(35) > > > > zk.stop() > > > > -------------------- > > > > Here is my zookeeper config: > > > > # The number of milliseconds of each tick > > tickTime=2000 > > > > # The number of ticks that the initial > > # synchronization phase can take > > initLimit=10 > > > > # The number of ticks that can pass between > > # sending a request and getting an acknowledgement > > syncLimit=5 > > > > # the directory where the snapshot is stored. > > dataDir=/Users/ianrose/Code/zookeeper/var/data > > > > # the port at which the clients will connect > > clientPort=2181 > > > > # The number of snapshots to retain in dataDir > > autopurge.snapRetainCount=5 > > > > # Purge task interval in hours > > # Set to "0" to disable auto purge feature > > autopurge.purgeInterval=1 > > > > > > -------------------- > > > > > > Here is the output I get on the client: > > > > 2014-09-26 12:43:20,603 DEBUG starting... > > 2014-09-26 12:43:20,604 INFO Connecting to 127.0.0.1:2181 > > 2014-09-26 12:43:20,605 DEBUG Sending request(xid=None): > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000, > session_id=0, > > > passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', > > read_only=None) > > 2014-09-26 12:43:20,609 INFO Zookeeper connection established, state: > > CONNECTED > > 2014-09-26 12:43:40,222 WARNING Connection dropped: outstanding heartbeat > > ping not received > > 2014-09-26 12:43:40,222 WARNING Transition to CONNECTING > > 2014-09-26 12:43:40,222 INFO Zookeeper connection lost > > 2014-09-26 12:43:40,222 DEBUG handle being disconnected > > 2014-09-26 12:43:40,747 INFO Connecting to 127.0.0.1:2181 > > 2014-09-26 12:43:40,748 DEBUG Sending request(xid=None): > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000, > > session_id=92520388231233540, > > passwd='\xd1M\xb9\xb3\xae\xab\xa1!@x\x06nv\xb7\xe3*', read_only=None) > > 2014-09-26 12:43:40,750 INFO Zookeeper connection established, state: > > CONNECTED > > 2014-09-26 12:43:40,751 DEBUG handle being (re)connected > > 2014-09-26 12:43:55,611 DEBUG Sending request(xid=1): Close() > > 2014-09-26 12:43:55,614 INFO Closing connection to 127.0.0.1:2181 > > 2014-09-26 12:43:55,615 INFO Zookeeper session lost, state: CLOSED > > 2014-09-26 12:43:55,615 WARNING handle lost > > > > > > And here is trace-level logging from the server side: > > > > 2014-09-26 12:43:20,605 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket > > connection from /127.0.0.1:58959 > > 2014-09-26 12:43:20,605 [myid:] - DEBUG [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment > > request > > from client /127.0.0.1:58959 client's lastZxid is 0x0 > > 2014-09-26 12:43:20,605 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868][] - Client attempting to > > establish new session at /127.0.0.1:58959 > > 2014-09-26 12:43:20,605 [myid:] - TRACE [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - SessionTrackerImpl --- Adding > > session > > 0x148b2cd8b010004 30000 > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0 > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004 > > type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown > > reqpath:n/a > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0 > > cport:-1)::ZooTrace@71][] - SessionTrackerImpl --- Existing session > > 0x148b2cd8b010004 30000 > > 2014-09-26 12:43:20,608 [myid:] - DEBUG > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request:: > > sessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f > > txntype:-10 reqpath:n/a > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] - > > :Esessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f > > txntype:-10 reqpath:n/a > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] - > > SessionTrackerImpl --- Existing session 0x148b2cd8b010004 30000 > > 2014-09-26 12:43:20,608 [myid:] - DEBUG > > [SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004 > > type:createSession cxid:0x0 zxid:0x78f txntype:-10 reqpath:n/a > > 2014-09-26 12:43:20,609 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617 > > ][] > > - Established session 0x148b2cd8b010004 with negotiated timeout 30000 for > > client /127.0.0.1:58959 > > 2014-09-26 12:43:40,748 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket > > connection from /127.0.0.1:58960 > > 2014-09-26 12:43:40,749 [myid:] - DEBUG [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment > > request > > from client /127.0.0.1:58960 client's lastZxid is 0x0 > > 2014-09-26 12:43:40,749 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861][] - Client attempting to renew > > session 0x148b2cd8b010004 at /127.0.0.1:58960 > > 2014-09-26 12:43:40,749 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection > for > > client /127.0.0.1:58959 which had sessionid 0x148b2cd8b010004 > > 2014-09-26 12:43:40,750 [myid:] - TRACE [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - Session 0x148b2cd8b010004 is > valid: > > true > > 2014-09-26 12:43:40,750 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617][] - Established session > > 0x148b2cd8b010004 with negotiated timeout 30000 for client / > > 127.0.0.1:58960 > > 2014-09-26 12:43:50,742 [myid:] - DEBUG > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request:: > > sessionid:0x148b2cd8b010004 type:ping cxid:0xfffffffffffffffe > > zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a > > 2014-09-26 12:43:50,742 [myid:] - DEBUG > > [SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004 > > type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown > > reqpath:n/a > > 2014-09-26 12:43:55,612 [myid:] - TRACE [ProcessThread(sid:0 > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004 > > type:closeSession > > cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a > > 2014-09-26 12:43:55,612 [myid:] - INFO [ProcessThread(sid:0 > > cport:-1)::SessionTrackerImpl@198][] - Session closing: > 0x148b2cd8b010004 > > 2014-09-26 12:43:55,612 [myid:] - INFO [ProcessThread(sid:0 > > cport:-1)::PrepRequestProcessor@494][] - Processed session termination > for > > sessionid: 0x148b2cd8b010004 > > 2014-09-26 12:43:55,613 [myid:] - DEBUG > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request:: > > sessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790 > > txntype:-11 reqpath:n/a > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] - > > :Esessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790 > > txntype:-11 reqpath:n/a > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] - > > SessionTrackerImpl --- Removing session 0x148b2cd8b010004 > > 2014-09-26 12:43:55,614 [myid:] - DEBUG > > [SyncThread:0:FinalRequestProcessor@160][] - sessionid:0x148b2cd8b010004 > > type:closeSession cxid:0x1 zxid:0x790 txntype:-11 reqpath:n/a > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:NIOServerCnxn@168 > ][] > > - Add a buffer to outgoingBuffers, sk > sun.nio.ch.SelectionKeyImpl@28cfdeb5 > > is valid: true > > 2014-09-26 12:43:55,615 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket connection > for > > client /127.0.0.1:58960 which had sessionid 0x148b2cd8b010004 > > > > > > I'm assuming that this is due to a misunderstanding on my part. Can > anyone > > point me in the right direction? > > > > Much appreciated! > > - Ian > > >
