mh_header performance patch

2005-03-24 Thread Michael Klepikov
I read my MH mail from two clients: Emacs MH mode, directly from the MH 
dirs, and Mozilla via the UW IMAP server. The IMAP server frequently 
re-reads my mailboxes, which are sometimes large (e.g. 4k, 6k messages), 
and it was taking a long time. I found that the routine mh_headers 
always reads the entire message no matter how long it is and always does 
strcrlfcpy on the entire body, while I certainly won't be requesting all 
the bodies in the client. I felt it is doing unnecessary work, both I/O 
and CPU.

I made a change so that mh_headers reads only the beginning of each 
message file in 4kb chunks until it reaches the end of headers. I 
reorganized the code a bit, to avoid duplication between mh_headers and 
mh_text which requires both headers and body. Empirically, there tends 
to be not more than 3kb of headers in my messages, so 4kb seems like a 
good compromise. I didn't make it configurable, just hardcoded 4096.

After the change, folder scan time for 6k messages went down from 20+s 
to ~13s on a dual CPU Sunblade-2500 running Solaris 8, with Mozilla 
1.7.3 and imapd running on the same machine. Everything seems to work. I 
verified with debug logging that I actually do go through the 
header-only branch.

diff -c patch attached. Please feel free to adjust as necessary.
--Michael Klepikov
*** imap-2004c1.klm/src/osdep/unix/mh.c Thu Mar 24 15:43:50 2005
--- imap-2004c1/src/osdep/unix/mh.c Thu Mar 24 15:12:49 2005
***
*** 565,580 
if (mail_elt (stream,i)-sequence) mh_header (stream,i,j,NIL);
  }
  
! /* MH mail fetch message header
   * Accepts: MAIL stream
   *message # to fetch
   *pointer to returned header text length
   *option flags
   * Returns: message header in RFC822 format
   */
  
! char *mh_header (MAILSTREAM *stream,unsigned long msgno,unsigned long *length,
!long flags)
  {
unsigned long i,hdrsize;
int fd;
--- 565,581 
if (mail_elt (stream,i)-sequence) mh_header (stream,i,j,NIL);
  }
  
! /* MH mail fetch message header and (optionally) text
   * Accepts: MAIL stream
   *message # to fetch
   *pointer to returned header text length
   *option flags
+  *  a boolean flag whether or not need msg text
   * Returns: message header in RFC822 format
   */
  
! char *mh_msg_fetch (MAILSTREAM *stream,unsigned long msgno,
! unsigned long *length, long flags, int need_text)
  {
unsigned long i,hdrsize;
int fd;
***
*** 585,591 
*length = 0;/* default to empty */
if (flags  FT_UID) return ;/* UID call impossible */
elt = mail_elt (stream,msgno);/* get elt */
!   if (!elt-private.msg.header.text.data) {
/* purge cache if too big */
  if (LOCAL-cachedtexts  max (stream-nmsgs * 4096,2097152)) {
mail_gc (stream,GC_TEXTS);/* just can't keep that much */
--- 586,593 
*length = 0;/* default to empty */
if (flags  FT_UID) return ;/* UID call impossible */
elt = mail_elt (stream,msgno);/* get elt */
!   if (!elt-private.msg.header.text.data ||
!   (need_text  !elt-private.msg.text.text.data)) {
/* purge cache if too big */
  if (LOCAL-cachedtexts  max (stream-nmsgs * 4096,2097152)) {
mail_gc (stream,GC_TEXTS);/* just can't keep that much */
***
*** 608,634 
LOCAL-buf = (char *) fs_get ((LOCAL-buflen = sbuf.st_size) + 1);
  }
/* slurp message */
! read (fd,LOCAL-buf,sbuf.st_size);
/* tie off file */
! LOCAL-buf[sbuf.st_size] = '\0';
! close (fd);   /* flush message file */
/* find end of header */
! for (i = 0,t = LOCAL-buf; *t  !(i  (*t == '\n')); i = (*t++ == 
'\n'));
/* number of header bytes */
! hdrsize = (*t ? ++t : t) - LOCAL-buf;
! elt-rfc822_size =/* size of entire message in CRLF form 
*/
!   (elt-private.msg.header.text.size =
!strcrlfcpy (elt-private.msg.header.text.data,i,LOCAL-buf,
!  hdrsize)) +
!(elt-private.msg.text.text.size =
! strcrlfcpy (elt-private.msg.text.text.data,i,t,
! sbuf.st_size - hdrsize));
/* add to cached size */
! LOCAL-cachedtexts += elt-rfc822_size;
}
*length = elt-private.msg.header.text.size;
return (char *) elt-private.msg.header.text.data;
  }
  
  /* MH mail fetch message text (body only)
   * Accepts: MAIL stream
--- 610,680 
LOCAL-buf = (char *) fs_get ((LOCAL-buflen = sbuf.st_size) + 1);
  }
/* slurp message */
! if (need_text) {
!   /*syslog (LOG_ALERT,Fetching msg with text #%ld, msgno);*/
!   read (fd,LOCAL-buf,sbuf.st_size);
   

Re: mh_header performance patch

2005-03-24 Thread Mark Crispin
Thank you very much!
The next release will be imap-2004d, a maintenance release.  imap-2004d is 
under a coding freeze for the upcoming Pine 4.63 release, and so your 
changes won't make it in in imap-2004d.

However, I'm doing some related performance improvements, as part of the 
imap-2005 work, to other drivers.  Your changes, or a variants of these 
changes, will definitely be part of imap-2005.

Thanks again!
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.