Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> http://web.mit.edu/~ezyang/Public/sup-performance.png
> 
> Look at String::=~. Definitely not acceptable.

Doing some profiling on my end, it looks like the majority of IMAP
syncing time is spent in these five methods:

Redwood::Index#load_entry_for_id (22%)
Redwood::IMAP#load_message (25%)
Redwood::Message#message_to_chunks (16.5%)
Redwood::IMAP#load_header (14%)
Redwood::Index#sync_message (13%)

Four of those are essentially wrappers around IMAP or Ferret methods.
The only Sup-specific one is Message#message_to_chunks, but neither it
nor its callee text_to_chunks seems to have any major culprits.
String#=~ takes only 2.3% of the CPU time, and a chunk of that comes
from RubyMail.

Just for good measure, I "manually" measured the individual regexen in
text_to_chunks. After parsing 100 messages from an IMAP server, which
took 1m27s for me, I got:

           time (s)   calls
     bqp = 0.00854 /  1789 = 209411.2 calls/second
      n1 = 0.00218 /   313 = 143709.8 calls/second
     qsp = 0.03212 /  1923 =  59873.0 calls/second
      n2 = 0.00202 /   312 = 154226.4 calls/second
   empty = 0.00061 /    90 = 146341.5 calls/second
      sp = 0.02570 /  1927 =  74995.1 calls/second
      qp = 0.03392 /  4459 = 131452.5 calls/second

The names are abbreviations for the various regexen in that method. You
can see that the cumulative time spent on any one regex is at most .034
seconds (qp, which is QUOTE_PATTERN), and that the slowest one per call
is qsp, QUOTE_START_PATTERN. Against a total run of 1m27s, none of them
is significant.
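
For anyone who wants to repeat this kind of "manual" measurement, here
is a minimal sketch of one way to do it. It is not the instrumentation
used for the numbers above: RegexTimer, its methods, and the QUOTE_LIKE
pattern are all hypothetical stand-ins, and the pattern is not one of
Sup's regexen.

```ruby
# Hypothetical helper: accumulates wall-clock time and call counts per label.
class RegexTimer
  Stat = Struct.new(:seconds, :calls)

  attr_reader :stats

  def initialize
    @stats = Hash.new { |h, k| h[k] = Stat.new(0.0, 0) }
  end

  # Times one block and returns the block's value unchanged, so the
  # call can wrap an existing match expression in place.
  def measure(label)
    start = Time.now
    result = yield
    stat = @stats[label]
    stat.seconds += Time.now - start
    stat.calls += 1
    result
  end

  # One line per label, in the same shape as the table above.
  def report
    @stats.map do |label, stat|
      rate = stat.seconds > 0 ? stat.calls / stat.seconds : 0.0
      format("%8s = %.5f / %5d = %10.1f calls/second",
             label, stat.seconds, stat.calls, rate)
    end
  end
end

# Stand-in pattern for illustration only; not taken from Sup.
QUOTE_LIKE = /^\s*>/

timer = RegexTimer.new
lines = ["> quoted", "plain", "  > also quoted"]
matches = lines.select { |line| timer.measure(:qp) { line =~ QUOTE_LIKE } }
puts timer.report
```

The timer itself adds overhead to every call, so the absolute numbers
are only good for comparing the regexen with one another.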

Incidentally, I can parse IMAP mailboxes at a little over 1 message/s,
and mbox files at ~50 messages/second, which also suggests that the IMAP
libraries are the biggest time sink here.

This is all with ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
and Sup git next.

Now that's all CPU stuff. There might be memory blowup issues. If
nothing else, Sup leaks memory over time, but fixing that involves
getting into the hellish world of Ruby<->C land or the even more hellish
internals of MRI, so I've been loath to start down that path.

You might be able to speed up sup-sync runs on IMAP by threading the
network access and the parsing. But the IMAP connection seems pretty
CPU-heavy so who knows.
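
A minimal sketch of that idea, assuming one thread fetches raw messages
and the main thread parses them as they arrive. fetch_raw and parse are
hypothetical stand-ins, not Sup or Net::IMAP methods; the sleep merely
simulates network latency.

```ruby
require 'thread'

# Hypothetical stand-in for a slow network fetch.
def fetch_raw(id)
  sleep 0.01
  "Subject: message #{id}\n\nbody #{id}"
end

# Hypothetical stand-in for the CPU-side parsing step.
def parse(raw)
  header, body = raw.split("\n\n", 2)
  { :subject => header.sub(/\ASubject: /, ""), :body => body }
end

# Fetches in a background thread while the caller's thread parses.
def sync(ids)
  queue = Queue.new
  fetcher = Thread.new do
    ids.each { |id| queue << fetch_raw(id) }
    queue << :done   # sentinel: no more messages
  end

  parsed = []
  while (raw = queue.pop) != :done
    parsed << parse(raw)
  end
  fetcher.join
  parsed
end

results = sync([1, 2, 3])
```

This only pays off if the fetching thread really spends its time
waiting rather than burning CPU, which is exactly the open question.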

None of this answers the original question of why all Ruby threads block
when Sup waits for a response from IMAP. Since I'm pretty sure the IMAP
libraries are all Ruby (which is why they're so slow!), and that core
Ruby IO should be nonblocking, this might be an interpreter bug. Are you
able to pinpoint what part of MRI is blocking?
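
One rough way to at least confirm the stall, sketched under the
assumption that a background "heartbeat" thread is an acceptable probe:
it ticks at a fixed interval and records the largest gap between ticks
while some other work runs. max_heartbeat_gap is a hypothetical helper,
and the sleep in the usage line is a placeholder for the real IMAP call.
It will not say which part of MRI is blocking, only whether other
threads stop being scheduled.

```ruby
# Hypothetical helper: runs the given block while a background thread
# ticks every `interval` seconds, then returns the largest gap (in
# seconds) observed between consecutive ticks.
def max_heartbeat_gap(interval = 0.01)
  gaps = []
  running = true

  ticker = Thread.new do
    last = Time.now
    while running
      sleep interval
      now = Time.now
      gaps << now - last
      last = now
    end
  end

  yield              # the work suspected of blocking every thread
  running = false
  ticker.join
  gaps.max
end

# Placeholder workload; substitute the IMAP request under suspicion.
gap = max_heartbeat_gap { sleep 0.2 }
puts format("largest heartbeat gap: %.3fs", gap)
```

A gap close to the interval means the ticker kept running; a gap close
to the length of the request means everything stalled along with it.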
-- 
William <wmorgan-...@masanjin.net>
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk
