Re: [courier-users] Re: Enhancement request
On Wed, 4 Aug 2004, Sam Varshavchik wrote:

> Jon Nelson writes:
>
> > I recently gave Dovecot a try. It's not nearly as featureful (or seemingly as stable) as courier-imap, but it has one very important distinction: It is *wicked* fast. It made me think - indexes are what makes dovecot so fast. What would it take to add similar indexing capabilities to courier-imap?
>
> By showing a need for it without using the word “benchmark”.

OK. I know for a fact a number of people that refuse to use courier-imap because it is much too slow when dealing with large mailboxes. By large I mean anything much over 10 thousand or so entries. Don't talk to me about filesystems, or CPU speed, or anything else. As I explain below, on identical hardware with identical filesystems with identical everything, dovecot was much faster.

This is how I arrived at that conclusion:

To satisfy my own curiosity, I installed dovecot, because it could use my maildirs in-place. Most of the server-created files are identical, if named differently (courierimapuiddb is the same as dovecot-uid, or whatever, and so on).

I cracked open my client, pine, and opened a folder. It contains 4000 messages. I then sorted and timed the operation. It took 17 seconds. I repeated this operation several times, and consistently got between 5 and 6 seconds (I assume because of cache effect, because the client sends the same request each time.)

I closed pine, shut down courier-imap, and started dovecot. Performing the same tests on the same folder netted me a re-sorted folder in less than 1 second every time. There was a very slightly longer delay opening the folder, the first time (I assume because of the time it took to index the folder). Subsequent openings were faster in dovecot than in courier.

Then I performed the tests back-to-back. courier then dovecot (and vice versa), each time deleting the indexes. The results were fairly consistent -- dovecot produced *no* delay opening a folder once the index had been created, and *no* delay to sort the folder in any way I chose. You'll notice the caveat, which I explained in the previous paragraph.

An example of the IMAP session follows. Here, I am sorting by THREAD.

    19:58:43.823813 read(0, 001a THREAD REFERENCES US-ASCII ALL\r\n, 8192) = 41
    19:58:55.474217 write(1, * THREAD ((2)(3)(4

As you can see, it took 12 seconds (11.5 or so) to perform the operation. I performed these tests on several folders.

The fact of the matter is that on identical hardware, with identical mailboxes, dovecot was faster, sometimes much much faster, than courier-imap. Searching and sorting are the two easiest ways of experiencing this difference. The client used was pine.

> People do not use IMAP servers to run benchmarks. People use IMAP servers to read mail. All IMAP-reading mail clients that might be considered popular will cache all message metadata. When you're scrolling through the folder's index the IMAP client is not going to issue a server request for every new message that's scrolled into view. All of the message metadata will be cached. So if the mail client wants to resort the folder it won't ask the server to do it, it'll do the job itself.

I have as much healthy disdain for synthetic benchmarks as the next guy.

> So, if you want to evaluate indexing you need to take a reasonably popular IMAP client, log its IMAP commands, then show how indexing will help. Arbitrary benchmarks won't cut it, and adding indexes for the benefit of a lesser-used IMAP tool will come at the expense of greater overhead for the rest of the IMAP clients, which makes no sense.

If you consider pine a 'lesser-used' IMAP tool, then what you are saying in effect is that courier-imap is not suitable for use with pine and large mailboxes.

Why does adding an index necessarily create more overhead? (The real question here is whether the additional overhead results in a more efficient or faster time-to-response on client queries). Certainly there is overhead in maintaining an index, but the purpose of an index is faster data retrieval. It stands to reason that having an index would benefit even very sophisticated IMAP clients.

One of the reasons IMAP exists is because people wanted *less* storage on the client end, *less* client-side state from invocation-to-invocation. The overwhelming majority of IMAP clients that I'm aware of don't store much if anything at all between invocations (this statement is clearly at odds with your statement: All IMAP-reading mail clients that might be considered popular will cache all message metadata..) Certainly, some do, but for many people it's simply not practical to have to use the same client on the same machine all the time.

It may be true that Outlook and friends do cache metadata, but pine doesn't, and pine is very popular, too. By looking at this mailing list's User-Agent and X-Mailer strings, Mozilla is the most popular, trailed by outlook, mutt,
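
The delay reported for the THREAD request in the message above can be checked directly from the two timestamps in the trace excerpt. The following is a minimal sketch, assuming both timestamps are wall-clock times from the same day in `HH:MM:SS.ffffff` form; the function name `elapsed_seconds` is made up for illustration.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.ffffff timestamps taken on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the two trace lines quoted in the message.
request_read = "19:58:43.823813"   # server reads the THREAD command
first_write = "19:58:55.474217"    # server starts writing the THREAD reply

print(f"{elapsed_seconds(request_read, first_write):.3f} s")  # prints "11.650 s"
```

This only measures the gap between reading the command and the start of the reply; it says nothing about why the gap exists.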
[courier-users] Re: Enhancement request
Jon Nelson writes:

> I cracked open my client, pine, and opened a folder. It contains 4000 messages. I then sorted and timed the operation. It took 17 seconds.

That's what I suspected.

Pine and mutt are fine mail clients, but unfortunately their mind share is very modest. Furthermore, they work differently, and use IMAP in a very different fashion than other, more popular IMAP clients.

You should repeat your experiment with Mozilla, Thunderbird, Evolution, or other IMAP clients whose names I won't mention, but are easy to surmise. You will get much different results.

It's certainly possible to add implicit indexing in such a way as to improve the response time in pine or mutt. Unfortunately that'll come at the expense of deteriorated response time to other, more popular IMAP clients, because the server will now waste time building indexes that it will never use.

You can't please everyone, and unless you can find some ways to change things to the benefit of everyone, this will be a difficult argument to make.
[courier-users] Re: Enhancement request
Jon Nelson writes:

> I recently gave Dovecot a try. It's not nearly as featureful (or seemingly as stable) as courier-imap, but it has one very important distinction: It is *wicked* fast. It made me think - indexes are what makes dovecot so fast. What would it take to add similar indexing capabilities to courier-imap?

By showing a need for it without using the word benchmark.

People do not use IMAP servers to run benchmarks. People use IMAP servers to read mail. All IMAP-reading mail clients that might be considered popular will cache all message metadata. When you're scrolling through the folder's index the IMAP client is not going to issue a server request for every new message that's scrolled into view. All of the message metadata will be cached. So if the mail client wants to resort the folder it won't ask the server to do it, it'll do the job itself.

So, if you want to evaluate indexing you need to take a reasonably popular IMAP client, log its IMAP commands, then show how indexing will help. Arbitrary benchmarks won't cut it, and adding indexes for the benefit of a lesser-used IMAP tool will come at the expense of greater overhead for the rest of the IMAP clients, which makes no sense.
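
The evaluation suggested above (log a client's IMAP commands, then see whether indexing would help) could start with a simple tally of command verbs. This is only a sketch: it assumes the log has already been reduced to one client command per line in the form `<tag> <COMMAND> [args]`, and the sample session is invented for illustration, not captured from any real client.

```python
from collections import Counter


def tally_commands(client_lines):
    """Count IMAP command verbs in lines of the form '<tag> <COMMAND> [args]'."""
    counts = Counter()
    for line in client_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # not a tagged command (blank line, continuation data, ...)
        verb = parts[1].upper()
        # 'UID FETCH', 'UID SEARCH' and friends carry the real verb in the third token.
        if verb == "UID" and len(parts) > 2:
            verb = "UID " + parts[2].upper()
        counts[verb] += 1
    return counts


# Hypothetical session, made up for this example.
sample_log = [
    "a001 SELECT INBOX",
    "a002 UID FETCH 1:* (FLAGS)",
    "a003 UID FETCH 1:50 (BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])",
    "a004 THREAD REFERENCES US-ASCII ALL",
    'a005 UID SEARCH BODY "dovecot"',
]

counts = tally_commands(sample_log)

# Commands that ask the server to sort, thread, or search on the client's behalf.
SERVER_SIDE = ("SORT", "THREAD", "SEARCH", "UID SORT", "UID THREAD", "UID SEARCH")
server_side = sum(counts[verb] for verb in SERVER_SIDE)

print(dict(counts))
print("server-side sort/thread/search commands:", server_side)
```

If a given client's log shows few or no such commands, a server-side index has little to speed up for that client; if it shows many, the case for indexing is stronger.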
Re: [courier-users] Re: Enhancement request
Sam Varshavchik wrote:

> Jon Nelson writes:
>
> > I recently gave Dovecot a try. It's not nearly as featureful (or seemingly as stable) as courier-imap, but it has one very important distinction: It is *wicked* fast. It made me think - indexes are what makes dovecot so fast. What would it take to add similar indexing capabilities to courier-imap?
>
> So, if you want to evaluate indexing you need to take a reasonably popular IMAP client, log its IMAP commands, then show how indexing will help. Arbitrary benchmarks won't cut it, and adding indexes for the benefit of a lesser-used IMAP tool will come at the expense of greater overhead for the rest of the IMAP clients, which makes no sense.

I can't speak for him, but perhaps he's referring to server-side searches? I can't say that my database is all that large, but I do know that searches can take eons (I run a Solaris 9 system with 1GB RAM, 2x300MHz procs, and LVD disk).

I agree that server-side cache would do little for client performance. But is there anything other than a search that might benefit on the server side? And, regardless, are such features used enough to make the effort to include the caching worthwhile?

Bill
[courier-users] Re: Enhancement request
Bill Taroli writes:

> Sam Varshavchik wrote:
>
> > Jon Nelson writes:
> >
> > > I recently gave Dovecot a try. It's not nearly as featureful (or seemingly as stable) as courier-imap, but it has one very important distinction: It is *wicked* fast. It made me think - indexes are what makes dovecot so fast. What would it take to add similar indexing capabilities to courier-imap?
> >
> > So, if you want to evaluate indexing you need to take a reasonably popular IMAP client, log its IMAP commands, then show how indexing will help. Arbitrary benchmarks won't cut it, and adding indexes for the benefit of a lesser-used IMAP tool will come at the expense of greater overhead for the rest of the IMAP clients, which makes no sense.
>
> I can't speak for him, but perhaps he's referring to server-side searches?

And how would the server telepathically know what the client is going to search for, and thus prepare a suitable index in advance?
Re: [courier-users] Re: Enhancement request
Sam Varshavchik wrote:

> Bill Taroli writes:
>
> > Sam Varshavchik wrote:
> >
> > > Jon Nelson writes:
> > >
> > > > I recently gave Dovecot a try. It's not nearly as featureful (or seemingly as stable) as courier-imap, but it has one very important distinction: It is *wicked* fast. It made me think - indexes are what makes dovecot so fast. What would it take to add similar indexing capabilities to courier-imap?
> > >
> > > So, if you want to evaluate indexing you need to take a reasonably popular IMAP client, log its IMAP commands, then show how indexing will help. Arbitrary benchmarks won't cut it, and adding indexes for the benefit of a lesser-used IMAP tool will come at the expense of greater overhead for the rest of the IMAP clients, which makes no sense.
> >
> > I can't speak for him, but perhaps he's referring to server-side searches?
>
> And how would the server telepathically know what the client is going to search for, and thus prepare a suitable index in advance?

Well, it's not as if we're talking about a normalized database here... there is a quite finite amount of data one can search. Just a brief review of search options in the clients I use suggests that the most likely things to index are header fields. The body of the message, much like blob objects in databases, might well be considered something not worthy of indexing but still searchable -- no different than it is now.

It might even be that what gets indexed be a local decision. As you suggest, indexing isn't one of those things that's usually one-size-fits-all...

And *maybe* the indexing itself isn't something that gets done in imapd at all... perhaps it would make more sense to extend the search capability by allowing the searches to be passed to an external agent, which itself would be interested in the indexing and management of the information to do searches? Seems more in keeping with the Courier design philosophy I remember reading a long time ago that says if it isn't something that's focused on the processing of mail, then implement that somewhere else. In this case, the something else would then need to be hooked via imapd when search requests were executed.

Bill
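
The idea of indexing only header fields while leaving bodies searchable by a plain scan can be illustrated with a small sketch. Nothing here reflects how courier-imap or Dovecot actually work: the function names, the choice of indexed headers, and the sample messages are all hypothetical, and the index matches whole words, which is a simplification of IMAP SEARCH's substring matching.

```python
from collections import defaultdict
from email import message_from_string

# Hypothetical local choice of which headers to index.
INDEXED_HEADERS = ("from", "to", "subject")


def build_header_index(messages):
    """Map (header, word) -> set of message ids. `messages` maps id -> raw text."""
    index = defaultdict(set)
    for msg_id, raw in messages.items():
        msg = message_from_string(raw)
        for header in INDEXED_HEADERS:
            for word in (msg.get(header) or "").lower().split():
                index[(header, word)].add(msg_id)
    return index


def search_header(index, header, word):
    """Indexed lookup: a single dictionary access, independent of mailbox size."""
    return sorted(index.get((header.lower(), word.lower()), set()))


def search_body(messages, text):
    """Unindexed fallback: scan every (non-multipart) message body."""
    needle = text.lower()
    return sorted(
        msg_id
        for msg_id, raw in messages.items()
        if needle in message_from_string(raw).get_payload().lower()
    )


# Made-up messages for illustration only.
messages = {
    1: "From: jon@example.com\nSubject: Enhancement request\n\nDovecot is fast.\n",
    2: "From: sam@example.com\nSubject: Re: Enhancement request\n\nLog the IMAP commands first.\n",
}

index = build_header_index(messages)
print(search_header(index, "Subject", "enhancement"))   # prints [1, 2]
print(search_header(index, "from", "sam@example.com"))  # prints [2]
print(search_body(messages, "dovecot"))                 # prints [1]
```

The cost side of the trade-off is `build_header_index`: it has to run, and be kept current, whether or not any client ever issues a header search.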