Well unfortunately although I am able to repro this problem with a completely new, clean ZK install on my laptop, I *cannot* repro with the same on my coworkers laptop. So unfortunately I am forced to conclude that there is something strange going on locally.
Thanks for the help anyhow! - Ian On Mon, Sep 29, 2014 at 9:40 AM, Camille Fournier <[email protected]> wrote: > No that is not expected. Odd that you get disconnected once and then > reconnect fine. Does the same thing happen in your kazoo clients, one > disconnect but then the second connect is ok? > Which version of ZK are you running? Are you running this with some sort of > auth or password to the zk server? > > Thanks, > C > > On Mon, Sep 29, 2014 at 9:24 AM, Ian Rose <[email protected]> wrote: > > > Perhaps I'm simply misunderstanding what the expected behavior would be. > > Why would my client by disconnected? Does zookeeper drop idle clients? > > And note that this isn't a spurious disconnect; my client is *always* > > dropped > > at that time. > > > > On Friday zkCli seemed to be working just fine for me, but now I am > getting > > disconnections similar to my kazoo-based client. > > > > Here is a session showing `ls /' failing. This behavior is reproducible > > for me currently. > > > > $ ./bin/zkCli.sh > > > Connecting to localhost:2181 > > > Welcome to ZooKeeper! > > > JLine support is enabled > > > [zk: localhost:2181(CONNECTING) 0] > > > WATCHER:: > > > WatchedEvent state:SyncConnected type:None path:null > > > [zk: localhost:2181(CONNECTED) 0] > > > [zk: localhost:2181(CONNECTED) 0] ls / > > > WATCHER:: > > > WatchedEvent state:Disconnected type:None path:null > > > Exception in thread "main" > > > org.apache.zookeeper.KeeperException$ConnectionLossException: > > > KeeperErrorCode = ConnectionLoss for / > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > > > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472) > > > at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500) > > > at > > org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:720) > > > at > org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:588) > > > at > org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:360) > > > at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323) > > > at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282) > > > > > > And here is a session where after connecting I just sit and wait for a > bit > > (without sending any commands), whereupon I get disconnected and then > > immediately reconnected. Interestingly, after the reconnecting I am able > > to issue multiple commands without any issues. This behavior is > > reproducible for me currently. > > > > $ ./bin/zkCli.sh > > > Connecting to localhost:2181 > > > Welcome to ZooKeeper! > > > JLine support is enabled > > > [zk: localhost:2181(CONNECTING) 0] > > > [zk: localhost:2181(CONNECTING) 0] > > > WATCHER:: > > > WatchedEvent state:SyncConnected type:None path:null > > > [zk: localhost:2181(CONNECTED) 0] > > > WATCHER:: > > > WatchedEvent state:Disconnected type:None path:null > > > WATCHER:: > > > WatchedEvent state:SyncConnected type:None path:null > > > [zk: localhost:2181(CONNECTED) 0] > > > [zk: localhost:2181(CONNECTED) 0] ls / > > > [zookeeper] > > > [zk: localhost:2181(CONNECTED) 1] > > > [zk: localhost:2181(CONNECTED) 1] > > > [zk: localhost:2181(CONNECTED) 1] ls /zookeeper > > > [quota] > > > [zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota > > > [] > > > > > > > > Thanks, > > - Ian > > > > > > On Sun, Sep 28, 2014 at 2:22 PM, Camille Fournier <[email protected]> > > wrote: > > > > > Sorry but with what you've sent us I don't really see what the problem > > is. > > > It does look like you connect and then nothing happens for 20s and then > > the > > > connection is dropped. If you use the zkCli script to connect via the > > > command line do you see the same problem? > > > > > > C > > > > > > On Fri, Sep 26, 2014 at 12:48 PM, Ian Rose <[email protected]> > > wrote: > > > > > > > Hi all - > > > > > > > > I've just gotten started using SolrCloud, which uses Zookeeper for > > > > coordination; I am otherwise completely new to Zookeeper. Now I am > > > trying > > > > to query Zookeeper directly for some simple information. I am > finding, > > > > however, that although my clients are able to connect they very > > > frequently > > > > receive timeouts. It almost seems like the server isn't receiving > the > > > > heartbeat messages at all (or isn't responding to them). I've seen > > > similar > > > > behavior both when using the go-zookeeper > > > > <https://godoc.org/github.com/samuel/go-zookeeper/zk> and kazoo > > > > <https://github.com/python-zk/kazoo> (python) client libraries (I > > wanted > > > > to > > > > try >1 to ensure that it wasn't a client lib problem). > > > > > > > > My config is very simple: I am running the client and a single > > Zookeeper > > > > node on the same machine (my laptop). There are no other clients of > > the > > > > Zookeeper node while I am running these tests, so there is no > practical > > > > possibility that the JVM is overloaded or GCing. > > > > > > > > Here is a very basic kazoo client that I am using. Obviously it > isn't > > > > doing any "real" work right now - this is just to demonstrate the > > > > disconnects. > > > > > > > > #!/usr/bin/env python2.7 > > > > > > > > from kazoo.client import KazooClient, KazooState > > > > import logging > > > > import time > > > > > > > > def my_listener(state): > > > > if state == KazooState.LOST: > > > > # Register somewhere that the session was lost > > > > logging.warning('handle lost') > > > > elif state == KazooState.SUSPENDED: > > > > # Handle being disconnected from Zookeeper > > > > logging.debug('handle being disconnected') > > > > else: > > > > # Handle being connected/reconnected to Zookeeper > > > > logging.debug('handle being (re)connected') > > > > > > > > > > > > if __name__ == '__main__': > > > > logging.basicConfig(format='%(asctime)-15s %(levelname)s > > %(message)s', > > > > level=logging.DEBUG) > > > > > > > > logging.debug('starting...') > > > > zk = KazooClient(hosts='127.0.0.1:2181', timeout=30) > > > > zk.start() > > > > zk.add_listener(my_listener) > > > > > > > > time.sleep(35) > > > > > > > > zk.stop() > > > > > > > > -------------------- > > > > > > > > Here is my zookeeper config: > > > > > > > > # The number of milliseconds of each tick > > > > tickTime=2000 > > > > > > > > # The number of ticks that the initial > > > > # synchronization phase can take > > > > initLimit=10 > > > > > > > > # The number of ticks that can pass between > > > > # sending a request and getting an acknowledgement > > > > syncLimit=5 > > > > > > > > # the directory where the snapshot is stored. > > > > dataDir=/Users/ianrose/Code/zookeeper/var/data > > > > > > > > # the port at which the clients will connect > > > > clientPort=2181 > > > > > > > > # The number of snapshots to retain in dataDir > > > > autopurge.snapRetainCount=5 > > > > > > > > # Purge task interval in hours > > > > # Set to "0" to disable auto purge feature > > > > autopurge.purgeInterval=1 > > > > > > > > > > > > -------------------- > > > > > > > > > > > > Here is the output I get on the client: > > > > > > > > 2014-09-26 12:43:20,603 DEBUG starting... > > > > 2014-09-26 12:43:20,604 INFO Connecting to 127.0.0.1:2181 > > > > 2014-09-26 12:43:20,605 DEBUG Sending request(xid=None): > > > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000, > > > session_id=0, > > > > > > > > > > passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', > > > > read_only=None) > > > > 2014-09-26 12:43:20,609 INFO Zookeeper connection established, state: > > > > CONNECTED > > > > 2014-09-26 12:43:40,222 WARNING Connection dropped: outstanding > > heartbeat > > > > ping not received > > > > 2014-09-26 12:43:40,222 WARNING Transition to CONNECTING > > > > 2014-09-26 12:43:40,222 INFO Zookeeper connection lost > > > > 2014-09-26 12:43:40,222 DEBUG handle being disconnected > > > > 2014-09-26 12:43:40,747 INFO Connecting to 127.0.0.1:2181 > > > > 2014-09-26 12:43:40,748 DEBUG Sending request(xid=None): > > > > Connect(protocol_version=0, last_zxid_seen=0, time_out=30000, > > > > session_id=92520388231233540, > > > > passwd='\xd1M\xb9\xb3\xae\xab\xa1!@x\x06nv\xb7\xe3*', read_only=None) > > > > 2014-09-26 12:43:40,750 INFO Zookeeper connection established, state: > > > > CONNECTED > > > > 2014-09-26 12:43:40,751 DEBUG handle being (re)connected > > > > 2014-09-26 12:43:55,611 DEBUG Sending request(xid=1): Close() > > > > 2014-09-26 12:43:55,614 INFO Closing connection to 127.0.0.1:2181 > > > > 2014-09-26 12:43:55,615 INFO Zookeeper session lost, state: CLOSED > > > > 2014-09-26 12:43:55,615 WARNING handle lost > > > > > > > > > > > > And here is trace-level logging from the server side: > > > > > > > > 2014-09-26 12:43:20,605 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket > > > > connection from /127.0.0.1:58959 > > > > 2014-09-26 12:43:20,605 [myid:] - DEBUG [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment > > > > request > > > > from client /127.0.0.1:58959 client's lastZxid is 0x0 > > > > 2014-09-26 12:43:20,605 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868][] - Client attempting to > > > > establish new session at /127.0.0.1:58959 > > > > 2014-09-26 12:43:20,605 [myid:] - TRACE [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - SessionTrackerImpl --- Adding > > > > session > > > > 0x148b2cd8b010004 30000 > > > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0 > > > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004 > > > > type:createSession cxid:0x0 zxid:0xfffffffffffffffe txntype:unknown > > > > reqpath:n/a > > > > 2014-09-26 12:43:20,606 [myid:] - TRACE [ProcessThread(sid:0 > > > > cport:-1)::ZooTrace@71][] - SessionTrackerImpl --- Existing session > > > > 0x148b2cd8b010004 30000 > > > > 2014-09-26 12:43:20,608 [myid:] - DEBUG > > > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request:: > > > > sessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f > > > > txntype:-10 reqpath:n/a > > > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] > - > > > > :Esessionid:0x148b2cd8b010004 type:createSession cxid:0x0 zxid:0x78f > > > > txntype:-10 reqpath:n/a > > > > 2014-09-26 12:43:20,608 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] > - > > > > SessionTrackerImpl --- Existing session 0x148b2cd8b010004 30000 > > > > 2014-09-26 12:43:20,608 [myid:] - DEBUG > > > > [SyncThread:0:FinalRequestProcessor@160][] - > > sessionid:0x148b2cd8b010004 > > > > type:createSession cxid:0x0 zxid:0x78f txntype:-10 reqpath:n/a > > > > 2014-09-26 12:43:20,609 [myid:] - INFO > > [SyncThread:0:ZooKeeperServer@617 > > > > ][] > > > > - Established session 0x148b2cd8b010004 with negotiated timeout 30000 > > for > > > > client /127.0.0.1:58959 > > > > 2014-09-26 12:43:40,748 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197][] - Accepted socket > > > > connection from /127.0.0.1:58960 > > > > 2014-09-26 12:43:40,749 [myid:] - DEBUG [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@810][] - Session establishment > > > > request > > > > from client /127.0.0.1:58960 client's lastZxid is 0x0 > > > > 2014-09-26 12:43:40,749 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@861][] - Client attempting to > > renew > > > > session 0x148b2cd8b010004 at /127.0.0.1:58960 > > > > 2014-09-26 12:43:40,749 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket > connection > > > for > > > > client /127.0.0.1:58959 which had sessionid 0x148b2cd8b010004 > > > > 2014-09-26 12:43:40,750 [myid:] - TRACE [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooTrace@71][] - Session 0x148b2cd8b010004 is > > > valid: > > > > true > > > > 2014-09-26 12:43:40,750 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@617][] - Established session > > > > 0x148b2cd8b010004 with negotiated timeout 30000 for client / > > > > 127.0.0.1:58960 > > > > 2014-09-26 12:43:50,742 [myid:] - DEBUG > > > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request:: > > > > sessionid:0x148b2cd8b010004 type:ping cxid:0xfffffffffffffffe > > > > zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a > > > > 2014-09-26 12:43:50,742 [myid:] - DEBUG > > > > [SyncThread:0:FinalRequestProcessor@160][] - > > sessionid:0x148b2cd8b010004 > > > > type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe > > txntype:unknown > > > > reqpath:n/a > > > > 2014-09-26 12:43:55,612 [myid:] - TRACE [ProcessThread(sid:0 > > > > cport:-1)::ZooTrace@90][] - :Psessionid:0x148b2cd8b010004 > > > > type:closeSession > > > > cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a > > > > 2014-09-26 12:43:55,612 [myid:] - INFO [ProcessThread(sid:0 > > > > cport:-1)::SessionTrackerImpl@198][] - Session closing: > > > 0x148b2cd8b010004 > > > > 2014-09-26 12:43:55,612 [myid:] - INFO [ProcessThread(sid:0 > > > > cport:-1)::PrepRequestProcessor@494][] - Processed session > termination > > > for > > > > sessionid: 0x148b2cd8b010004 > > > > 2014-09-26 12:43:55,613 [myid:] - DEBUG > > > > [SyncThread:0:FinalRequestProcessor@88][] - Processing request:: > > > > sessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790 > > > > txntype:-11 reqpath:n/a > > > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@90][] > - > > > > :Esessionid:0x148b2cd8b010004 type:closeSession cxid:0x1 zxid:0x790 > > > > txntype:-11 reqpath:n/a > > > > 2014-09-26 12:43:55,614 [myid:] - TRACE [SyncThread:0:ZooTrace@71][] > - > > > > SessionTrackerImpl --- Removing session 0x148b2cd8b010004 > > > > 2014-09-26 12:43:55,614 [myid:] - DEBUG > > > > [SyncThread:0:FinalRequestProcessor@160][] - > > sessionid:0x148b2cd8b010004 > > > > type:closeSession cxid:0x1 zxid:0x790 txntype:-11 reqpath:n/a > > > > 2014-09-26 12:43:55,614 [myid:] - TRACE > [SyncThread:0:NIOServerCnxn@168 > > > ][] > > > > - Add a buffer to outgoingBuffers, sk > > > sun.nio.ch.SelectionKeyImpl@28cfdeb5 > > > > is valid: true > > > > 2014-09-26 12:43:55,615 [myid:] - INFO [NIOServerCxn.Factory: > > > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007][] - Closed socket > connection > > > for > > > > client /127.0.0.1:58960 which had sessionid 0x148b2cd8b010004 > > > > > > > > > > > > I'm assuming that this is due to a misunderstanding on my part. Can > > > anyone > > > > point me in the right direction? > > > > > > > > Much appreciated! > > > > - Ian > > > > > > > > > >
