Thanks. We're running on Amazon EC2, so the network is unpredictable. Our app is also doing a ton of other I/O to Amazon S3 as well as to Cassandra.
Even given that, asking for 6 seconds and getting 14-24 seems out of bounds. The problem only occurs after the app has been running for an hour under heavy load. Checking GC was a good idea, but there is nothing bad there.

I could raise the session timeout and hardcode the timeout parameter to 3 or 4 seconds... a hack, but it might get me past this issue.

Brian

Sent from my iPhone

On Feb 26, 2013, at 1:24 PM, Camille Fournier <[email protected]> wrote:

> Is it possible that something else is going on in the application that is
> using this client, or are you observing this happen in a simple test
> client? I don't think a 15-25s wait time is within any reasonable bounds of
> "more or less". This timeout is passed down to native code in the JVM so a
> bug there causing a "more" of that magnitude would probably affect a lot of
> people. I would start looking into what kind of networking conditions could
> be causing such a hang, assuming you don't have full GC happening that is
> pausing the process during this period (which could cause such a long hang).
>
>
> On Tue, Feb 26, 2013 at 12:15 PM, Brian Tarbox <[email protected]> wrote:
>
>> The main client loop involves sending keep-alive pings in-between calls to
>> the NIO selector.select call which looks for data from the server
>> (including ping responses).
>>
>> What I've found is that the select() which takes a timeout value takes a
>> hugely varying time to complete.
>>
>> When asking for a max 6 second timeout on the select call I'm in fact
>> staying in the call for 15-25 seconds. Which leads to starving the keep
>> alives which leads to timeouts.
>>
>> Looking at the NIO documentation of the timeout parameter to select it
>> says:
>> timeout - If positive, block for up to timeout milliseconds, *more or less*
>>
>> Has anyone else seen this or have a suggestion for a work around? This
>> seems like a basic flaw. If I can't count on timely return from select it
>> seems to break the whole keep-alive scheme.
>>
>> Thanks in advance for any help!
>>
>> Brian Tarbox
>>
>> --
>> http://about.me/BrianTarbox
>>
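
The two ideas discussed in this thread (measuring how long select() really blocks, and capping the timeout passed to it) can be sketched in a few lines of Java. This is only a minimal sketch, not the client's actual code: the class SelectTimeoutProbe, its methods, and the 6000 ms / 3000 ms values are hypothetical names and numbers chosen for illustration. It assumes a bare Selector with no channels registered, so each select() call should simply wait out its timeout.

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class SelectTimeoutProbe {

    // Hypothetical helper: clamp the requested timeout to a cap.
    // The floor of 1 ms matters because select(0) blocks indefinitely.
    public static long cappedTimeout(long requestedMs, long capMs) {
        return Math.max(1L, Math.min(requestedMs, capMs));
    }

    // Hypothetical helper: how far past the requested timeout a call ran.
    public static long overshootMs(long requestedMs, long elapsedMs) {
        return Math.max(0L, elapsedMs - requestedMs);
    }

    // Calls select(timeoutMs) and returns the elapsed wall-clock time in ms.
    // System.nanoTime() is used because it is monotonic.
    public static long timedSelect(Selector selector, long timeoutMs) throws IOException {
        long start = System.nanoTime();
        selector.select(timeoutMs);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws IOException {
        long requestedMs = 6000L; // the timeout the loop would like to use
        long capMs = 3000L;       // assumed cap, per the "3 or 4 seconds" idea
        long timeoutMs = cappedTimeout(requestedMs, capMs);

        try (Selector selector = Selector.open()) {
            for (int i = 0; i < 2; i++) {
                long elapsedMs = timedSelect(selector, timeoutMs);
                System.out.println("asked for " + timeoutMs + " ms, blocked for "
                        + elapsedMs + " ms, overshoot "
                        + overshootMs(timeoutMs, elapsedMs) + " ms");
                // A real client loop would send its keep-alive ping here.
            }
        }
    }
}
```

Logging the overshoot on every pass would show whether the long stalls line up with load, GC, or network events. Capping the timeout does not stop a single select() call from overrunning, but it gives the loop more frequent chances to send a keep-alive when select() behaves normally.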
