Re: [courier-users] Re: Enhancement request

2004-08-06 Thread Jon Nelson
On Wed, 4 Aug 2004, Sam Varshavchik wrote:

 Jon Nelson writes:

 
  I recently gave Dovecot a try.
  It's not nearly as featureful (or seemingly as stable) as
  courier-imap, but it has one very important distinction:
 
  It is *wicked* fast.
 
  It made me think - indexes are what makes dovecot so fast.  What would
  it take to add similar indexing capabilities to courier-imap?

 By showing a need for it without using the word “benchmark”.

OK.  I know of a number of people who refuse to use
courier-imap because it is much too slow when dealing with large
mailboxes.  By large I mean anything much over 10 thousand or so
entries.  Don't talk to me about filesystems, or CPU speed, or anything
else.  As I explain below, on identical hardware with identical
filesystems with identical everything, dovecot was much faster.

This is how I arrived at that conclusion:

To satisfy my own curiosity, I installed dovecot, because it could use
my maildirs in-place.  Most of the server-created files are identical,
if named differently (courierimapuiddb is the same as dovecot-uid, or
whatever, and so on).

I cracked open my client, pine, and opened a folder containing 4000
messages.  I then sorted it and timed the operation.  The first sort
took 17 seconds.  Repeating it several times consistently took between
5 and 6 seconds (I assume because of caching, since the client
sends the same request each time).

I closed pine, shut down courier-imap, and started dovecot.  Performing
the same tests on the same folder netted me a re-sorted folder in less
than 1 second every time.  There was a very slightly longer delay
opening the folder, the first time (I assume because of the time it
took to index the folder).  Subsequent openings were faster in
dovecot than in courier.

Then I performed the tests back-to-back, courier then dovecot (and vice
versa), each time deleting the indexes.  The results were fairly
consistent -- dovecot produced *no* delay opening a folder once the
index had been created, and *no* delay to sort the folder in any way I
chose.  Note the caveat explained above: the very first open is
slightly slower, I assume because the index is being built.

An example of the IMAP session follows.  Here, I am sorting by THREAD.

19:58:43.823813 read(0, 001a THREAD REFERENCES US-ASCII ALL\r\n, 8192) = 41
19:58:55.474217 write(1, * THREAD ((2)(3)(4 

As you can see, it took roughly 12 seconds (about 11.65 by the
timestamps) to perform the operation.  I performed these tests on
several folders.
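
For anyone who wants to check the arithmetic, here is a minimal sketch in Python that subtracts the two timestamps from the trace above. The helper name elapsed_seconds is made up for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the read() and write() lines of the trace.
thread_elapsed = elapsed_seconds("19:58:43.823813", "19:58:55.474217")
print(f"THREAD took {thread_elapsed:.6f} seconds")
```

The same helper can be applied to any request/response pair in a trace that uses this timestamp format.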

The fact of the matter is that on identical hardware, with identical
mailboxes, dovecot was faster, sometimes much much faster, than
courier-imap.

Searching and sorting are the two easiest ways of experiencing this
difference.  The client used was pine.

 People do not use IMAP servers to run benchmarks.  People use IMAP servers
 to read mail.  All IMAP-reading mail clients that might be considered
 popular will cache all message metadata.  When you're scrolling through the
 folder's index the IMAP client is not going to issue a server request for
 every new message that's scrolled into view.  All of the message metadata
 will be cached.  So if the mail client wants to resort the folder it won't
 ask the server to do it, it'll do the job itself.

I have as much healthy disdain for synthetic benchmarks as the next guy.

 So, if you want to evaluate indexing you need to take a reasonably
 popular IMAP client, log its IMAP commands, then show how indexing
 will help. Arbitrary benchmarks won't cut it, and adding indexes for
 the benefit of a lesser-used IMAP tool will come at the expense of
 greater overhead for the rest of the IMAP clients, which makes no
 sense.

If you consider pine a 'lesser-used' IMAP tool, then what you are saying
in effect is that courier-imap is not suitable for use with pine and
large mailboxes.

Why does adding an index necessarily create more overhead?  (The real
question here is whether the additional overhead results in a more
efficient or faster time-to-response on client queries).

Certainly there is overhead in maintaining an index, but the purpose of
an index is faster data retrieval.  It stands to reason that having an
index would benefit even very sophisticated IMAP clients.
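
To illustrate the trade-off, here is a rough sketch in Python of a per-folder header cache. It is not how dovecot or courier-imap actually store anything; the class name HeaderIndex, its fields, and the sample messages are all invented for this example. The point is only that each message is parsed once, and later sorts read the cached values.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


class HeaderIndex:
    """Hypothetical per-folder cache of parsed headers, keyed by UID."""

    def __init__(self):
        self.entries = {}     # uid -> (date, subject)
        self.parse_count = 0  # how many messages were actually parsed

    def add(self, uid, raw_message):
        """Parse a message once; later calls for the same UID are no-ops."""
        if uid in self.entries:
            return
        msg = message_from_string(raw_message)
        self.parse_count += 1
        date = parsedate_to_datetime(msg["Date"])
        self.entries[uid] = (date, msg["Subject"] or "")

    def sorted_uids(self, key="date"):
        """Sort from the cache; no message is re-read or re-parsed."""
        column = 0 if key == "date" else 1
        return sorted(self.entries, key=lambda uid: self.entries[uid][column])


# Made-up sample folder: UID -> raw message text.
folder = {
    1: "Date: Fri, 06 Aug 2004 19:58:43 -0500\nSubject: bravo\n\nbody one",
    2: "Date: Wed, 04 Aug 2004 09:00:00 -0500\nSubject: charlie\n\nbody two",
    3: "Date: Thu, 05 Aug 2004 12:30:00 -0500\nSubject: alpha\n\nbody three",
}

index = HeaderIndex()
for _ in range(2):  # "opening" the folder twice parses each message once
    for uid, raw in folder.items():
        index.add(uid, raw)

print(index.sorted_uids("date"), index.sorted_uids("subject"))
```

The overhead is the parse on first sight of each message plus the storage for the cache; every later sort pays neither.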

One of the reasons IMAP exists is that people wanted *less* storage
on the client end, and *less* client-side state from
invocation-to-invocation.

The overwhelming majority of IMAP clients that I'm aware of don't store
much, if anything at all, between invocations.  (This is clearly at
odds with your statement: "All IMAP-reading mail clients that might
be considered popular will cache all message metadata.")  Certainly,
some do, but for many people it's simply not practical to have to use
the same client on the same machine all the time.  It may be true that
Outlook and friends do cache metadata, but pine doesn't, and pine is
very popular, too.  Judging by this mailing list's User-Agent and
X-Mailer strings, Mozilla is the most popular, trailed by Outlook, mutt,

Re: [courier-users] Re: Enhancement request

2004-08-04 Thread Bill Taroli
Sam Varshavchik wrote:

 Jon Nelson writes:

  I recently gave Dovecot a try.
  It's not nearly as featureful (or seemingly as stable) as
  courier-imap, but it has one very important distinction:

  It is *wicked* fast.

  It made me think - indexes are what makes dovecot so fast.  What would
  it take to add similar indexing capabilities to courier-imap?

 So, if you want to evaluate indexing you need to take a reasonably
 popular IMAP client, log its IMAP commands, then show how indexing
 will help.  Arbitrary benchmarks won't cut it, and adding indexes for
 the benefit of a lesser-used IMAP tool will come at the expense of
 greater overhead for the rest of the IMAP clients, which makes no sense.

I can't speak for him, but perhaps he's referring to server-side 
searches? I can't say that my database is all that large, but I do know 
that searches can take eons (I run a Solaris 9 system with 1GB RAM, 
2x300MHz procs, and LVD disk). I agree that server-side cache would do 
little for client performance. But is there anything other than a search 
that might benefit on the server side? And, regardless, are such 
features used enough to make the effort to include the caching worthwhile?

Bill




Re: [courier-users] Re: Enhancement request

2004-08-04 Thread Bill Taroli
Sam Varshavchik wrote:

 Bill Taroli writes:

  Sam Varshavchik wrote:

   Jon Nelson writes:

    I recently gave Dovecot a try.
    It's not nearly as featureful (or seemingly as stable) as
    courier-imap, but it has one very important distinction:

    It is *wicked* fast.

    It made me think - indexes are what makes dovecot so fast.  What would
    it take to add similar indexing capabilities to courier-imap?

   So, if you want to evaluate indexing you need to take a reasonably
   popular IMAP client, log its IMAP commands, then show how indexing
   will help.  Arbitrary benchmarks won't cut it, and adding indexes
   for the benefit of a lesser-used IMAP tool will come at the expense
   of greater overhead for the rest of the IMAP clients, which makes no
   sense.

  I can't speak for him, but perhaps he's referring to server-side
  searches?

 And how would the server telepathically know what the client is going
 to search for, and thus prepare a suitable index in advance?

Well, it's not as if we're talking about a normalized database here...
there is a quite finite amount of data one can search. A brief
review of the search options in the clients I use suggests that the most
likely things to index are header fields. The body of the message, much
like blob objects in databases, might well be considered not worth
indexing but still searchable, no different than it is now.
What gets indexed might even be a local decision. As you
suggest, indexing isn't usually one-size-fits-all...
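
As a rough sketch of that idea (Python, purely illustrative): index a locally chosen set of header fields, and leave the body to a plain scan. The class name HeaderSearchIndex, the INDEXED_HEADERS setting, and the sample messages are assumptions made up for this example, and it matches whole words only, which is simpler than the substring matching a real IMAP SEARCH performs.

```python
from collections import defaultdict
from email import message_from_string

# Assumed local decision: which header fields get indexed.
INDEXED_HEADERS = ("from", "to", "subject")


class HeaderSearchIndex:
    """Hypothetical word index over selected headers; bodies are not indexed."""

    def __init__(self, indexed_headers=INDEXED_HEADERS):
        self.indexed_headers = indexed_headers
        self.postings = defaultdict(set)  # (header, word) -> set of UIDs
        self.raw = {}                     # uid -> raw message text

    def add(self, uid, raw_message):
        self.raw[uid] = raw_message
        msg = message_from_string(raw_message)
        for header in self.indexed_headers:
            for word in (msg[header] or "").lower().split():
                self.postings[(header, word)].add(uid)

    def search_header(self, header, word):
        """Indexed lookup: no message is opened."""
        return sorted(self.postings.get((header.lower(), word.lower()), set()))

    def search_body(self, text):
        """Unindexed fallback: scan every body, as happens today."""
        needle = text.lower()
        return sorted(
            uid for uid, raw in self.raw.items()
            if needle in message_from_string(raw).get_payload().lower()
        )


search_index = HeaderSearchIndex()
search_index.add(1, "From: jon@example.org\nSubject: indexing in courier-imap\n\ndovecot is wicked fast")
search_index.add(2, "From: sam@example.org\nSubject: benchmarks\n\nshow a need for it")
search_index.add(3, "From: bill@example.org\nSubject: Re: indexing headers\n\nsearches can take eons")

print(search_index.search_header("Subject", "indexing"))
print(search_index.search_body("eons"))
```

Changing INDEXED_HEADERS is the "local decision" knob: a site that never searches on To could simply leave it out.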

And *maybe* the indexing itself isn't something that gets done in imapd
at all... perhaps it would make more sense to extend the search
capability by allowing searches to be passed to an external agent,
which would itself be responsible for indexing and managing the
information needed to do searches? That seems more in keeping with the
Courier design philosophy I remember reading a long time ago: if
something isn't focused on the processing of mail, implement
it somewhere else. In this case, that something else would need
to be hooked into imapd and invoked when search requests are executed.
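
A minimal sketch of what such a hook might look like, in Python for brevity. Nothing here reflects an existing imapd interface: the SearchAgent signature, run_search, and the stand-in agent are all hypothetical. The agent may decline a query by returning None, in which case the built-in scan runs as it does now.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical hook signature: (field, text) -> matching UIDs,
# or None when the agent cannot answer the query.
SearchAgent = Callable[[str, str], Optional[Sequence[int]]]

# Made-up folder contents for the example.
MESSAGES: Dict[int, Dict[str, str]] = {
    1: {"subject": "enhancement request", "body": "dovecot is fast"},
    2: {"subject": "benchmarks", "body": "log the IMAP commands"},
}


def builtin_scan(field: str, text: str) -> List[int]:
    """Unindexed fallback: look at every message in turn."""
    return [uid for uid, msg in MESSAGES.items() if text in msg.get(field, "")]


def subject_agent(field: str, text: str) -> Optional[Sequence[int]]:
    """Stand-in external agent that only knows about subject words."""
    if field != "subject":
        return None  # decline; the server falls back to its own scan
    subject_index = {"enhancement": [1], "request": [1], "benchmarks": [2]}
    return subject_index.get(text, [])


def run_search(field: str, text: str,
               agent: Optional[SearchAgent] = None) -> List[int]:
    """Ask the external agent first, then fall back to the built-in scan."""
    if agent is not None:
        result = agent(field, text)
        if result is not None:
            return list(result)
    return builtin_scan(field, text)


print(run_search("subject", "benchmarks", subject_agent))
print(run_search("body", "dovecot", subject_agent))
```

With no agent configured, run_search behaves exactly like the existing scan, so sites that don't want indexing pay nothing for the hook.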

Bill

