Filed a bug at https://issues.apache.org/jira/browse/ZOOKEEPER-3018
Thanks, Daniel -----Original Message----- From: Daniel Chan Sent: Tuesday, April 3, 2018 11:49 AM To: [email protected] Subject: RE: [3.4.6] Ephemeral node not deleted after session is gone Hi Andor, Please see my replies and requested information inline. Thanks, Daniel -----Original Message----- From: Andor Molnar [mailto:[email protected]] Sent: Tuesday, April 3, 2018 2:26 AM To: [email protected] Subject: Re: [3.4.6] Ephemeral node not deleted after session is gone There're a few questions on the original thread which might be useful to answer here as well: 1) Why is the session closed, the client closed it or the cluster expired it? [Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster 2) which server was the session attached to - the first (44sec max lat) or one of the others? Which server was the leader? [Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader 3) the znode exists on all 4 servers, is that right? [Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers Would also be useful to attach server logs related to the session expiration as well as LogFormatter output of txn log files about the nodes. [Daniel Chan] Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003): 2018-03-12 03:28:35,127 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775 2018-03-12 03:28:35,131 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode 2018-03-12 03:28:35,131 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775 2018-03-12 03:28:35,137 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775 2018-03-12 03:30:36,415 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260 2018-03-12 03:30:36,422 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode 2018-03-12 03:30:36,423 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260 2018-03-12 03:30:36,428 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260 2018-03-31 01:29:58,865 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003 Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429: 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1 3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2 Regards, Andor On Tue, Apr 3, 2018 at 10:34 AM, Andor Molnar <[email protected]> wrote: > Hi Daniel, > > Thanks for the bugreport. > Interesting that this issue should have been fixed already by ages: > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org > _jira_browse_ZOOKEEPER-2D1208&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qI > rMUB65eapI_JnE&r=JE3yjNS4hXa8nS9n2uFCwEqMvv18hzzEnqunUhCoEns&m=eycsMys > gttgbjNW3XhfWJ8TgkcWxEFjljV-TpzD5EFU&s=ryVABxZ1RLdjrc3D4I6M5ZpST_jU6GV > QDWoE4AH83L0&e= > > Regards, > Andor > > > On Tue, Apr 3, 2018 at 3:22 AM, Daniel Chan > <[email protected]> > wrote: > >> We have a live Zookeeper environment (quorum size is 2) and observed >> a strange behavior: >> Kafka created 2 ephemeral nodes /brokers/ids/822712429 and >> /brokers/ids/707577499 on 2018-03-12 03:30:36.933 The Kafka clients >> were long gone but as of today, the two ephemeral nodes are still >> present >> >> Troubleshooting: >> 1) Lists the outstanding sessions and ephemeral nodes $ echo dump | >> nc $SERVER1 2181 SessionTracker dump: >> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863 >> ephemeral nodes dump: >> Sessions with Ephemerals (2): >> 0x162183ea9f70003: >> /brokers/ids/822712429 >> 0x162183ea9f70002: >> /brokers/ids/707577499 >> /controller >> >> 2) stat on /brokers/ids/822712429 >> zk> stat /brokers/ids/822712429 >> czxid: 4294967344 >> mzxid: 4294967344 >> pzxid: 4294967344 >> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700) >> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700) >> version: 0 >> cversion: 0 >> aversion: 0 >> owner: 99668799174148099 >> datalen: 102 >> children: 0 >> >> 3) List full connection/session details for all clients connected $ >> echo cons | nc $SERVER1 2181 >> /10.247.114.70:30401[0](queued=0,recved=1,sent=0) >> /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid= >> 0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000, >> lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117, >> llat=0,minlat=0,avglat=0,maxlat=31) >> >> $ echo cons | nc $SERVER2 2181 >> /10.196.18.61:28173[0](queued=0,recved=1,sent=0) >> /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800, >> sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000 >> ,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0, >> minlat=0,avglat=0,maxlat=1235) >> >> 4) health >> $ echo mntr | nc $SERVER1 2181 >> zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT >> zk_avg_latency 0 >> zk_max_latency 443 >> zk_min_latency 0 >> zk_packets_received 11158019 >> zk_packets_sent 11158244 >> zk_num_alive_connections 2 >> zk_outstanding_requests 0 >> zk_server_state follower >> zk_znode_count 344 >> zk_watch_count 0 >> zk_ephemerals_count 3 >> zk_approximate_data_size 36654 >> zk_open_file_descriptor_count 33 >> zk_max_file_descriptor_count 65536 >> >> 5) Could not find any special exception from zookeeper logs about the >> two sessions >> >> Is this a known bug in version 3.4.6? what could be the potential >> cause of the issue? >> >> Thanks, >> Daniel >> > >
