Also, I know that some people have had issues with CONFIG proxy.config.http.share_server_sessions; you might try changing that config value to 0 and seeing whether that improves performance.
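For reference, that change would be the following line in records.config (assuming the usual INT-typed entry for this variable; match whatever form your existing entry takes):

    CONFIG proxy.config.http.share_server_sessions INT 0

Then reload the configuration (e.g. traffic_line -x) or restart traffic_server for it to take effect.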
Brian

On 1/25/12 6:58 PM, "Brian Geffon" <[email protected]> wrote:

>Hi James,
>Can you try the patch in TS-1074
>(https://issues.apache.org/jira/browse/TS-1074) to see if that
>resolves your problem.
>
>Brian
>
>On Jan 25, 2012, at 6:36 PM, "Whitaker, James X. -ND"
><[email protected]> wrote:
>
>> Hi,
>>
>> Apologies for the length, but this is a strange one.
>>
>> The one thing that might be suspicious: I have a VM for development.
>> My development environment is VMware on a Mac (i7 MacBook Pro), with
>> CentOS 5.7 as the guest OS that hosts ATS.
>>
>> I have scoured every line of the ATS source code in the last few
>> days, to educate myself and to attempt to answer the following
>> question:
>>
>> Why is every single request going to ATS taking so long to complete?
>> Callgrind and cachegrind show approximately one millisecond of CPU
>> time for one particular example involving a cache hit, but wireshark
>> is showing response times that range from 13ms to 130ms or more for
>> end-to-end processing of cache hits, with an average of 50ms. ATS is
>> apparently going to sleep for a minimum of 10 milliseconds, typically
>> 50 to 80 milliseconds, and sometimes as long as 20 seconds (when
>> under load) before wireshark shows ATS issuing either a cached
>> response or a request to a service that is queued up by our plugin.
>>
>> As near as I can tell, ATS is putting each and every request to
>> sleep rather than just blasting it through.
>>
>> The throughput numbers are very respectable, but the time for each
>> request is stunningly slow.
>>
>> This is absolutely baffling.
>>
>> There is no I/O pending, no logging, no peer caches active... nothing
>> else happening. No other polling processes, no other requests in the
>> pipeline other than those issued by my ruby script that sends them
>> out one at a time. I start it up cold (cleared cache, rebooted
>> machine, the works) and have a 512MB ramcache configured, with 5GB
>> of RAM allocated to the VM. I shut off all logging, all peer
>> caching... anything that can hit the disks or introduce any kind of
>> dependency, and tell it to serve up a 100-byte chunk of cached
>> data...
>>
>> I tried recompiling with EAGER_SIGNALLING and other such
>> possibilities, and they ALL increased the latency/lag.
>>
>> A week ago, I was seeing about 5% of requests completing in an
>> acceptable 2.2 milliseconds, but in the last week, the fastest time
>> I have seen for processing a cache hit on an ATS server being polled
>> by my ruby script is 12.2 milliseconds. The shortest time wireshark
>> shows for issuing an http request queued up by our plugin is 36ms
>> after the initial GET coming into ATS, but this delay is usually
>> around 130 milliseconds. I would expect to see half a millisecond.
>> When the response comes back over the wire from the http request
>> instigated by our plugin, ATS issues the expected reply to the
>> client in less than a millisecond, with no apparent sleep state
>> intervening.
>>
>> I will run back-to-back requests from my polling ruby script, and
>> the CPU usage level is so low it sometimes does not show up in
>> "top".
>>
>> The AVERAGE response time is 40ms for a ram-cached request I would
>> expect the server to handle in 1ms. I have thrown hundreds of
>> polling processes and threads at it, and its responses per second
>> and overall throughput are quite respectable...
>> There is just this bizarre, inexplicable latency where ATS seems to
>> be going to sleep on every single request before doing anything with
>> it, for absolutely no reason.
>>
>> I tried disabling accept threads by setting them to zero in
>> records.config. I tried every combination of thread counts and
>> ratios I could think of.
>>
>> Looking over iocore/eventsystem/UnixEThread.cc, I notice that it is
>> apparently dequeuing from one queue and putting events on another
>> queue with a TIME TO EXECUTE marker associated with each event, and
>> this is a priority queue that is sorted into bins. Since the
>> contents of these priority queues are time based, with nanosecond
>> granularity, they are changing radically every millisecond; trying
>> to figure out what is in them at any point in time is not proving
>> very easy... debuggers are not cooperating with me. This code, and
>> the nanosecond granularity, strike me as really odd.
>>
>> The only thing I can think of is that there is a clock
>> granularity/jitter problem between the clock samples. ATS code is
>> sampling the time-of-day clock, causing the re-enqueued priority
>> queue items to suddenly get an execution time stamp (that is
>> supposed to be a hundred nanoseconds or so in the past by the time
>> I would estimate it takes to process the re-enqueuing of the
>> priority queue items). I suspect this time stamp might have jitter
>> in it, causing items just pulled off the queue to go to sleep
>> waiting for their "time" to come around. This might be a CentOS
>> problem, it might be a VMware clock problem, it might be a Mac
>> clock problem... it might be a problem with VMware buffering the
>> time-of-day clock (that seems the most likely to me). One thought
>> that comes to mind is that VMware might be the cause of all the
>> lag, in the form of buffered clocks and only allowing "wake up"
>> style signals every 50ms or something strange like that. If it
>> were a clock buffering problem, though, I would expect to see a
>> wide distribution of delays starting at ZERO and ranging up to
>> whatever the clock granularity is. However, every request gets a
>> minimum of a 10ms delay.
>>
>> There are bunches of timings between 11.9 and 12.3ms, a bunch
>> between 20.78 and 23.7ms, and then the next batch is at 31 to 34ms,
>> etc., with clusters between X2.0ms and X4.0ms for each request,
>> where X is a random tens-place digit. This suggests a 10ms
>> granularity in a timer or a reference clock. Maybe VMware or CentOS
>> only supports 10ms timer granularity when something doesn't get
>> immediately pulled off the queue?
>>
>> Another thought I had was turning off chunking mode... maybe ATS
>> thinks we are not done forming our responses or queries, and is
>> buffering everything up until a timeout tells it to go ahead and
>> dump "incompletely formed stream data" as chunk-buffered I/O, in
>> the "OK, X milliseconds have passed, just dump your buffers because
>> there is a delay getting the next chunk out" paradigm? Are these
>> 10, 20, 30, 40, 50, 60, 70ms delays the product of waiting for a
>> chunk buffering timeout? Turning off chunking mode in
>> records.config dropped the average latency from 42 milliseconds to
>> 36. I would think it would either make the problem go away
>> completely or not do anything at all. Mystifying.
>>
>> There are some spots in the code where we call out to
>> TSContCall(contp, TS_EVENT_VCONN_EOS, data); to tell it that a READ
>> is done so that ATS wakes up and continues.
>> That reduced an annoying latency from 80ms to 15ms.
>>
>> I tried putting in:
>>   TSContCall(TSVIOContGet(mRequestVio), TS_EVENT_VCONN_WRITE_COMPLETE,
>>              mRequestVio);
>> after one call to:
>>   mRequestVio = TSVConnWrite(mVc, httpCont, mServerRequestBufferReader,
>>                              content_length);
>> This didn't do anything at all to the latencies. Maybe this is the
>> wrong call?
>>
>> I'm rather stumped. Does anyone there have ideas? Is the answer "ATS
>> was not designed to run on VMs"?
>>
>> Thank you for reading this far,
>>
>> James Whitaker
>>
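
A side note on the TSVConnWrite() snippet quoted above: TSVConnWrite() hands the write to ATS, and ATS itself delivers TS_EVENT_VCONN_WRITE_READY and TS_EVENT_VCONN_WRITE_COMPLETE to the continuation it is given (httpCont here), so injecting TS_EVENT_VCONN_WRITE_COMPLETE with TSContCall() shouldn't be necessary. A rough sketch of that handler pattern follows; mVc, httpCont, mServerRequestBufferReader, and content_length are the names from James's message and are assumptions here, not the plugin's actual code:

    #include <ts/ts.h>

    /* Sketch only. ATS generates the WRITE_READY / WRITE_COMPLETE
     * events for the continuation passed to TSVConnWrite(); the
     * plugin should not need to inject them via TSContCall(). */
    static int
    http_cont_handler(TSCont contp, TSEvent event, void *edata)
    {
      TSVIO vio = (TSVIO)edata;

      switch (event) {
      case TS_EVENT_VCONN_WRITE_READY:
        /* ATS has drained part of the buffer and wants more data;
         * once more bytes are in the TSIOBuffer behind
         * mServerRequestBufferReader, wake the VIO up again. */
        TSVIOReenable(vio);
        break;

      case TS_EVENT_VCONN_WRITE_COMPLETE:
        /* Delivered by ATS once content_length bytes have gone out
         * on the VConn; shut down the write side. */
        TSVConnShutdown(TSVIOVConnGet(vio), 0, 1);
        break;

      default:
        break;
      }
      return 0;
    }

httpCont would be created with TSContCreate(http_cont_handler, TSMutexCreate()) before the TSVConnWrite() call. If those events only arrive tens of milliseconds after the data is already sitting in the buffer, that points back at event-loop scheduling rather than at the plugin code.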
