On Mar 28, 2007, at 6:28 PM, Benji York wrote:
Last week I spent a very enjoyable day (no kidding) debugging a
very, very slow cold-start situation (more than 15 minutes to
return from the first request). When making the first request to
the app (Zope 3 based), the app server and storage server would
show virtually no CPU utilization, and there would be about a
megabit of network traffic (on a gigabit link). There was no
obvious bottleneck.
After liberal application of strace, tcpdump, wireshark (aka
ethereal), and the Python profiler we discovered that while waiting
for an outstanding request for an object to load, ZEO calls a
threading.Connection
threading.Condition :)
instance's wait method with a timeout. When given a timeout that
method enters a wait loop with a time.sleep to sleep for a while
and then see if the condition has been met.
We found that time.sleep on that box had a minimum granularity of
10ms (when passed a non-zero value), thus causing each object load
to take approximately that long. As you can imagine, that somewhat
slowed down the retrieval of the several thousand objects required
to satisfy the initial request(s) (until the ZEO cache was
sufficiently warm).
The fix? Short-term: bump the operating system's timer interrupt
on that box to 1000Hz from 100Hz, increasing time.sleep's
granularity to 1ms (this was on Linux, Window's time.sleep appears
have a much higher resolution).
Long-term: Jim has found that the timeout call in the wait-for-
result code can be avoided, side-stepping the call to time.sleep
altogether.
What was the time to return the first request after the fix?
Jim
--
Jim Fulton mailto:[EMAIL PROTECTED] Python
Powered!
CTO (540) 361-1714
http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev