Hi James,

Can you try the patch in TS-1074 (https://issues.apache.org/jira/browse/TS-1074) to see if that resolves your problem?
Brian

On Jan 25, 2012, at 6:36 PM, "Whitaker, James X. -ND" <[email protected]> wrote:

> Hi,
>
> Apologies for the length, but this is a strange one.
>
> The one thing that might be suspicious: I have a VM for development. My development environment is VMware on a Mac (i7 MacBook Pro), with CentOS 5.7 as the guest OS that hosts ATS.
>
> I have scoured every line of the ATS source code in the last few days, to educate myself and to attempt to answer the following question:
>
> Why is every single request going to ATS taking so long to complete? Callgrind and cachegrind show approximately one millisecond of CPU time for one particular example involving a cache hit, but Wireshark shows response times that range from 13 ms to 130 ms or more for end-to-end processing of cache hits, with an average of 50 ms. ATS is apparently going to sleep for a minimum of 10 milliseconds, typically 50 to 80 milliseconds, and sometimes as long as 20 seconds (when under load) before Wireshark shows ATS issuing either a cached response or a request to a service that is queued up by our plugin.
>
> As near as I can tell, ATS is putting each and every request to sleep, rather than just blasting it through.
>
> The throughput numbers are very respectable, but the time for each request is stunningly slow.
>
> This is absolutely baffling.
>
> There is no I/O pending, no logging, no peer caches active... nothing else happening. No other polling processes, no other requests in the pipeline other than those issued by my Ruby script, which sends them out one at a time. I start it up cold (cleared cache, rebooted machine, the works) and have a 512 MB ram cache configured, with 5 GB of RAM allocated to the VM. I shut off all logging, all peer caching... anything that can hit the disks or introduce any kind of dependency... and tell it to serve up a 100-byte chunk of cached data.
>
> I tried recompiling with EAGER_SIGNALLING and other such possibilities, and they ALL increased the latency/lag.
>
> A week ago, I was seeing about 5% of requests completing in an acceptable 2.2 milliseconds, but in the last week the fastest time I have seen for processing a cache hit on an ATS server being polled by my Ruby script is 12.2 milliseconds. The shortest time Wireshark shows between the initial GET coming into ATS and ATS issuing an HTTP request queued up by our plugin is 36 ms, but this delay is usually around 130 milliseconds. I would expect to see half a millisecond. When the response comes back over the wire from the HTTP request instigated by our plugin, ATS issues the expected reply to the client in less than a millisecond, with no apparent sleep state intervening.
>
> I will run back-to-back requests from my polling Ruby script, and the CPU usage level is so low it sometimes does not show up in "top".
>
> The AVERAGE response time is 40 ms for a ram-cached request I would expect the server to handle in 1 ms. I have thrown hundreds of polling processes and threads at it, and its responses per second and overall throughput are quite respectable... there is just this bizarre, inexplicable latency where ATS seems to go to sleep on every single request before doing anything with it, for absolutely no reason.
>
> I tried disabling accept threads by setting them to zero in records.config. I tried every combination of thread counts and ratios I could think of.
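(Side note for the list: the knobs being toggled in that last paragraph are records.config overrides along these lines. The names are as I remember them from the 3.x records.config, and the values are purely illustrative, not a recommendation:

    CONFIG proxy.config.accept_threads INT 0
    CONFIG proxy.config.exec_thread.autoconfig INT 0
    CONFIG proxy.config.exec_thread.limit INT 2

With autoconfig disabled, exec_thread.limit pins the number of ET_NET worker threads instead of deriving it from the core count, which makes thread-count experiments like this reproducible.)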
> Looking over iocore/eventsystem/UnixEThread.cc, I notice that it is apparently dequeuing from one queue and putting events on another queue with a TIME TO EXECUTE marker associated with each event, and this is a priority queue that is sorted into bins. Since the contents of these priority queues are time-based, with nanosecond granularity, they are changing radically every millisecond; trying to figure out what is in them at any point in time is not proving very easy... debuggers are not cooperating with me. This code, and the nanosecond granularity, strike me as really odd.
>
> The only thing I can think of is that there is a clock granularity/jitter problem between the clock samples. The ATS code samples the time-of-day clock, causing the re-enqueued priority queue items to suddenly get an execution time stamp that is supposed to be a hundred nanoseconds or so in the past (the time I would estimate it takes to process the re-enqueuing of the priority queue items). I suspect this time stamp might have jitter in it, causing items just pulled off the queue to go to sleep waiting for their "time" to come around. This might be a CentOS problem, it might be a VMware clock problem, it might be a Mac clock problem... it might be a VMware-buffering-the-time-of-day-clock problem (that seems the most likely to me). One thought that comes to mind is that VMware might be the cause of all the lag, in the form of buffered clocks and only allowing "wake up" style signals every 50 ms or something strange like that. If it were a clock buffering problem, I would expect to see a wide distribution of delays starting at ZERO and ranging up to whatever the clock granularity is. However, every request gets a minimum of a 10 ms delay.
>
> There are bunches of timings between 11.9 and 12.3 ms, a bunch between 20.78 and 23.7, and the next batch at 31 to 34 ms, etc., with clusters between X2.0 ms and X4.0 ms for each request, where X is a random tens-place digit. This suggests a 10 ms granularity in a timer or a reference clock. Maybe VMware or CentOS only supports 10 ms timer granularity when something doesn't get pulled off the queue immediately?
>
> Another thought I had was turning off chunking mode... maybe ATS thinks we are not done forming our responses or queries, and is buffering everything up until a timeout tells it to go ahead and dump "incompletely formed stream data" as chunk-buffered I/O, in the "OK, X milliseconds have passed, just dump your buffers because there is a delay getting the next chunk out" paradigm? Are these 10, 20, 30, 40, 50, 60, 70 ms delays the product of waiting for a chunk buffering timeout? Turning off chunking mode in records.config dropped the average latency from 42 milliseconds to 36. I would think it would either make the problem go away completely or do nothing at all. Mystifying.
>
> There are some spots in the code where we call out to TSContCall(contp, TS_EVENT_VCONN_EOS, data); to tell it that a READ is done so that ATS wakes up and continues. That reduced an annoying latency from 80 ms to 15 ms.
>
> I tried putting in:
>
>     TSContCall(TSVIOContGet(mRequestVio), TS_EVENT_VCONN_WRITE_COMPLETE, mRequestVio);
>
> after one call to:
>
>     mRequestVio = TSVConnWrite(mVc, httpCont, mServerRequestBufferReader, content_length);
>
> This didn't do anything at all to the latencies. Maybe this is the wrong call?
>
> I'm rather stumped. Does anyone there have ideas?
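(A note on that last question, in case it helps: if I'm reading the API right, TSContCall() only invokes the continuation's handler directly; it does not touch the VIO's nbytes/ndone accounting, so hand-delivering TS_EVENT_VCONN_WRITE_COMPLETE is effectively a no-op as far as the core's state machine is concerned. The WRITE_COMPLETE that matters is the one the core generates itself once ndone reaches nbytes. The usual way to wake a stalled write VIO after appending data to its buffer is TSVIOReenable(). A minimal sketch against the public C API; mRequestVio and httpCont are the names from your mail, everything else is illustrative:

    /* After appending more data to the buffer behind
       mServerRequestBufferReader, nudge the core to resume the write. */
    if (TSVIONDoneGet(mRequestVio) < TSVIONBytesGet(mRequestVio)) {
      TSVIOReenable(mRequestVio);
    }

The core should then deliver TS_EVENT_VCONN_WRITE_READY / TS_EVENT_VCONN_WRITE_COMPLETE to httpCont on its own as the accounting catches up.)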
>
> Is the answer "ATS was not designed to run on VMs"?
>
> Thank you for reading this far,
>
> James Whitaker
