Reformatted excerpts from Edward Z. Yang's message of 2009-06-03:
> http://web.mit.edu/~ezyang/Public/sup-performance.png
>
> Look at String::=~. Definitely not acceptable.
Doing some profiling on my end, it looks like the majority of IMAP syncing time is spent in these five methods:

  Redwood::Index#load_entry_for_id   (22%)
  Redwood::IMAP#load_message         (25%)
  Redwood::Message#message_to_chunks (16.5%)
  Redwood::IMAP#load_header          (14%)
  Redwood::Index#sync_message        (13%)

Four of those are essentially wrappers around IMAP or Ferret methods. The Sup-specific one is Message#message_to_chunks. But neither message_to_chunks nor its callee text_to_chunks seems to contain any major culprits: String#=~ only takes 2.3% of the CPU time, and a chunk of that is coming from RubyMail.

Just for good measure, I "manually" measured the individual regexen in text_to_chunks. After parsing 100 messages from an IMAP server, which took 1m27s for me, I got:

          time (s)   calls
  bqp   = 0.00854  / 1789 = 209411.2 calls/second
  n1    = 0.00218  /  313 = 143709.8 calls/second
  qsp   = 0.03212  / 1923 =  59873.0 calls/second
  n2    = 0.00202  /  312 = 154226.4 calls/second
  empty = 0.00061  /   90 = 146341.5 calls/second
  sp    = 0.02570  / 1927 =  74995.1 calls/second
  qp    = 0.03392  / 4459 = 131452.5 calls/second

The names are abbreviations for the various regexen in that method. You can see that the cumulative time spent on any one regex is at most .034 seconds (qp, which is QUOTE_PATTERN), and that the slowest one is qsp, QUOTE_START_PATTERN.

Incidentally, I can parse IMAP mailboxes at a little over 1 message/second, and mbox files at ~50 messages/second, which also suggests that the IMAP libraries are the biggest time sink here. This is all with ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux] and Sup git next.

Now, that's all CPU stuff. There might also be memory blowup issues. If nothing else, Sup leaks memory over time, but fixing that involves getting into the hellish world of Ruby<->C land or the even more hellish internals of MRI, so I've been loath to start down that path.

You might be able to speed up sup-sync runs on IMAP by threading the network access and the parsing.
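A minimal sketch of that pipelining idea, assuming hypothetical fetch and parse callables that stand in for the per-message IMAP round trip and the message_to_chunks-style parsing (neither is Sup's actual API): a fetcher thread fills a bounded SizedQueue while the calling thread parses. On MRI 1.8 these are green threads, so the overlap only pays off if a thread waiting on the IMAP socket actually lets the other thread run.

```ruby
require 'thread' # Queue/SizedQueue on Ruby 1.8; a harmless no-op on newer Rubies

# Hypothetical pipeline (not Sup's API): a fetcher thread does the network
# round trips while the calling thread parses, so waiting and CPU work overlap.
# fetch and parse are placeholder callables supplied by the caller.
def pipelined_sync(ids, fetch, parse, buffer = 10)
  queue = SizedQueue.new(buffer) # caps how far fetching can run ahead of parsing
  done = Object.new              # unique sentinel marking the end of the stream

  fetcher = Thread.new do
    begin
      ids.each { |id| queue.push(fetch.call(id)) }
    ensure
      queue.push(done) # always unblock the parser, even if a fetch raises
    end
  end

  results = []
  until (raw = queue.pop).equal?(done)
    results << parse.call(raw)
  end
  fetcher.join # re-raises any exception from the fetcher thread
  results
end

# Usage with simulated work: sleep stands in for waiting on the IMAP server.
fetch = lambda { |id| sleep 0.01; "raw message #{id}" }
parse = lambda { |raw| raw.upcase }
p pipelined_sync(1..3, fetch, parse)
# => ["RAW MESSAGE 1", "RAW MESSAGE 2", "RAW MESSAGE 3"]
```

The SizedQueue bound keeps memory flat if fetching outruns parsing, and the ensure clause guarantees the sentinel is pushed so the parser never waits forever when a fetch raises; join then re-raises that error in the caller.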
But the IMAP connection seems pretty CPU-heavy, so who knows whether threading would help.

None of this answers the original question of why all Ruby threads block when Sup waits for a response from the IMAP server. Since I'm pretty sure the IMAP libraries are pure Ruby (which is why they're so slow!), and core Ruby IO should be nonblocking, this might be an interpreter bug. Are you able to pinpoint what part of MRI is blocking?

-- 
William <wmorgan-...@masanjin.net>
_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk