Is the server a virtualized host? I still think this might shed some light, what are you seeing for this? before/during/after the event:
>>>> Use the "stat" 4 >>>> letter word to see the server's request processing latency, see if >>>> that's high (higer than the timeout is bad news). Again, checkout the >>>> troubleshooting guide. On Fri, Feb 11, 2011 at 4:32 PM, Fournier, Camille F. [Tech] <[email protected]> wrote: > I think the gist of the problem is that while the ZK Server is expiring > SESSION_0, SESSION_1 is coming in and trying to send a ping. It is blocked a > bit behind the expiring of SESSION_0 due to the synchronous nature of > touching the session table, and then its request for a ping will be processed > behind the session expiration processing for SESSION_0. So either the expirer > takes long enough dealing with SESSION_0 that it immediately wants to expire > SESSION_1 on next processing, or, the processing of the session expiration > takes just long enough for SESSION_1 to not get a response to its heartbeat > back from the server, which then causes it to disconnect and reconnect, and > in the interim the server determines the session dead due to timeout. > > Long story short, those timeouts are too short for the server to reliably > execute responses in time to guarantee they won't be incorrectly activated. > > C > > -----Original Message----- > From: Nick Patania [mailto:[email protected]] > Sent: Friday, February 11, 2011 6:19 PM > To: [email protected] > Subject: Re: tickTime and sessionTimeout > > 3.3.2-1031432 > > On Fri, Feb 11, 2011 at 6:15 PM, Fournier, Camille F. [Tech] > <[email protected]> wrote: >> Which version of ZK? >> >> -----Original Message----- >> From: Nick Patania [mailto:[email protected]] >> Sent: Friday, February 11, 2011 6:10 PM >> To: [email protected] >> Subject: Re: tickTime and sessionTimeout >> >> A single server. I've intentionally made it trivial to demonstrate >> the behavior. >> If I increase the timeout, the issue goes away. >> >> On Fri, Feb 11, 2011 at 6:00 PM, Fournier, Camille F. [Tech] >> <[email protected]> wrote: >>> What is your ZooKeeper setup here? And do you continue to see this issue if >>> you increase your session timeout? >>> >>> C >>> >>> -----Original Message----- >>> From: Nick Patania [mailto:[email protected]] >>> Sent: Friday, February 11, 2011 5:24 PM >>> To: [email protected] >>> Subject: Re: tickTime and sessionTimeout >>> >>> Patrick, thanks for your input. >>> >>> I have rerun the test several times now while logging GC on the server >>> and running ping from CLIENT_1: >>> >>> - No GC happens on the server during the period of interest (a >>> couple of young generation runs happen before I kill HOST_0, and they >>> complete in under 3ms). >>> - Round trip times for ping from CLIENT_1 are consistently under >>> 250us throughout. >>> >>> Regarding client GC -- I can consistently reproduce this using a C >>> client. Regarding the theory of swapping on CLIENT_1 -- if that were >>> the cause, the problem wouldn't be 100% reproducible. I also looked >>> through the client log for SESSION_1 -- I see "Got ping response ... >>> after 1ms" repeatedly, followed by "Client session timed out, have not >>> heard from server in 666ms"... >>> >>> On Fri, Feb 11, 2011 at 1:46 PM, Patrick Hunt <[email protected]> wrote: >>>> Those are pretty short timeouts, many sources of delay could be >>>> causing this. Network jitter/latency, GC/swap (server or client), IO >>>> write latency, etc... See if any of this might be your issue: >>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting >>>> >>>> I can clearly see this sequence in your log for session1: >>>> >>>> --- >>>> 2011-02-11 14:18:52,672 - sessionid:SESSION_1 type:ping >>>> >>>> 2011-02-11 14:18:54,502 - Expiring session SESSION_1, timeout of 1000ms >>>> exceeded >>>> >>>> 2011-02-11 14:18:55,011 - Processing request:: sessionid:SESSION_1 >>>> type:ping >>>> --- >>>> >>>> from the looks of it session 1 doesn't send a ping to the server for >>>> ~2.5 seconds, as a result it's expired. >>>> >>>> You should also look at your session 1 client log and see what it's >>>> view of the world is like. (is it gc/swapping?). Use the "stat" 4 >>>> letter word to see the server's request processing latency, see if >>>> that's high (higer than the timeout is bad news). Again, checkout the >>>> troubleshooting guide. >>>> >>>> Patrick >>>> >>>> ps please use pastebin or attachment, otw the formatting of wrecked >>>> and it's harder to read the log >>>> >>>> On Fri, Feb 11, 2011 at 6:41 AM, Nick Patania >>>> <[email protected]> wrote: >>>>> This is the portion that seems relevant. For readability, I replaced >>>>> the host and session for the host that I kill with HOST_0 and >>>>> SESSION_0 (I expect these to timeout). The client that should be >>>>> healthy is HOST_1 and SESSION_1. >>>>> >>>>> >>>>> 2011-02-11 14:18:51,901 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_0 type:ping cxid:0xfffffffffffffffe >>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:51,901 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_0 >>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>>> txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:52,005 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:52,005 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>>> txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:52,339 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:52,339 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>>> txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:52,672 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:52,672 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>>> txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:53,002 - INFO [SessionTracker:ZooKeeperServer@314] - >>>>> Expiring session SESSION_0, timeout of 1000ms exceeded >>>>> 2011-02-11 14:18:53,002 - INFO >>>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session >>>>> termination for sessionid: SESSION_0 >>>>> 2011-02-11 14:18:53,010 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_0 type:closeSession cxid:0x0 zxid:0x103 txntype:-11 >>>>> reqpath:n/a >>>>> 2011-02-11 14:18:53,010 - INFO [SyncThread:0:NIOServerCnxn@1435] - >>>>> Closed socket connection for client /HOST_0:34618 which had sessionid >>>>> SESSION_0 >>>>> 2011-02-11 14:18:54,502 - INFO [SessionTracker:ZooKeeperServer@314] - >>>>> Expiring session SESSION_1, timeout of 1000ms exceeded >>>>> 2011-02-11 14:18:54,502 - INFO >>>>> [ProcessThread:-1:PrepRequestProcessor@387] - Processed session >>>>> termination for sessionid: SESSION_1 >>>>> 2011-02-11 14:18:55,011 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_1 type:ping cxid:0xfffffffffffffffe >>>>> zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:55,011 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@160] - sessionid:SESSION_1 >>>>> type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe >>>>> txntype:unknown reqpath:n/a >>>>> 2011-02-11 14:18:55,011 - INFO >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@251] >>>>> - Accepted socket connection from /HOST_1:40556 >>>>> 2011-02-11 14:18:55,019 - DEBUG >>>>> [SyncThread:0:FinalRequestProcessor@78] - Processing request:: >>>>> sessionid:SESSION_1 type:closeSession cxid:0x0 zxid:0x104 txntype:-11 >>>>> reqpath:n/a >>>>> 2011-02-11 14:18:55,019 - DEBUG >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:ZooKeeperServer@590] - >>>>> Dropping request: No session with sessionid SESSION_1 exists, probably >>>>> expired and removed >>>>> 2011-02-11 14:18:55,019 - INFO [SyncThread:0:NIOServerCnxn@1435] - >>>>> Closed socket connection for client /HOST_1:40555 which had sessionid >>>>> SESSION_1 >>>>> 2011-02-11 14:18:55,020 - DEBUG [SyncThread:0:NIOServerCnxn@1451] - >>>>> ignoring exception during output shutdown >>>>> java.net.SocketException: Transport endpoint is not connected >>>>> at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651) >>>>> at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1447) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1412) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSessionWithoutWakeup(NIOServerCnxn.java:343) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSession(NIOServerCnxn.java:330) >>>>> at >>>>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:133) >>>>> at >>>>> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161) >>>>> at >>>>> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) >>>>> 2011-02-11 14:18:55,021 - DEBUG [SyncThread:0:NIOServerCnxn@1459] - >>>>> ignoring exception during input shutdown >>>>> java.net.SocketException: Transport endpoint is not connected >>>>> at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >>>>> at >>>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) >>>>> at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn.closeSock(NIOServerCnxn.java:1455) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:1412) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSessionWithoutWakeup(NIOServerCnxn.java:343) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.closeSession(NIOServerCnxn.java:330) >>>>> at >>>>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:133) >>>>> at >>>>> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:161) >>>>> at >>>>> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) >>>>> 2011-02-11 14:18:55,022 - WARN >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn$Factory@272] >>>>> - Ignoring unexpected runtime exception >>>>> java.nio.channels.CancelledKeyException >>>>> at >>>>> sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) >>>>> at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69) >>>>> at >>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:241) >>>>> 2011-02-11 14:18:55,023 - DEBUG >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@735] - >>>>> Session establishment request from client /HOST_1:40556 client's >>>>> lastZxid is 0x0 >>>>> 2011-02-11 14:18:55,023 - INFO >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@770] - Client >>>>> attempting to renew session SESSION_1 at /HOST_1:40556 >>>>> 2011-02-11 14:18:55,024 - INFO >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1573] - >>>>> Invalid session SESSION_1 for client /HOST_1:40556, probably expired >>>>> 2011-02-11 14:18:55,025 - WARN >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@634] - >>>>> EndOfStreamException: Unable to read additional data from client >>>>> sessionid SESSION_1, likely client has closed socket >>>>> 2011-02-11 14:18:55,025 - INFO >>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:4799:NIOServerCnxn@1435] - >>>>> Closed socket connection for client /HOST_1:40556 which had sessionid >>>>> SESSION_1 >>>>> >>>>> >>>>> On Thu, Feb 10, 2011 at 8:11 PM, Benjamin Reed <[email protected]> >>>>> wrote: >>>>>> do you see anything in the server log? >>>>>> >>>>>> ben >>>>>> >>>>>> On 02/10/2011 03:16 PM, Patania, Nick wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I run the following test: >>>>>>> >>>>>>> >>>>>>> * Two clients connect to a zookeeper server; the tickTime on the >>>>>>> server is 500, and the sessionTimeout on the client is 1000. >>>>>>> >>>>>>> * Kill the host running one of the clients. >>>>>>> >>>>>>> * The second client receives a session timeout. >>>>>>> >>>>>>> Is there any reason why this might happen? >>>>>>> >>>>>>> Thanks >>>>>>> Nick Patania >>>>>>> >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> NOTICE: Morgan Stanley is not acting as a municipal advisor and the >>>>>>> opinions or views contained herein are not intended to be, and do not >>>>>>> constitute, advice within the meaning of Section 975 of the Dodd-Frank >>>>>>> Wall >>>>>>> Street Reform and Consumer Protection Act. If you have received this >>>>>>> communication in error, please destroy all electronic and paper copies >>>>>>> and >>>>>>> notify the sender immediately. Mistransmission is not intended to waive >>>>>>> confidentiality or privilege. Morgan Stanley reserves the right, to the >>>>>>> extent permitted under applicable law, to monitor electronic >>>>>>> communications. >>>>>>> This message is subject to terms available at the following link: >>>>>>> http://www.morganstanley.com/disclaimers. If you cannot access these >>>>>>> links, >>>>>>> please notify us by reply message and we will send the contents to you. >>>>>>> By >>>>>>> messaging with Morgan Stanley you consent to the foregoing. >>>>>> >>>>>> >>>>> >>>> >>> >> >
