[Evolution-hackers] Memory usage of CamelFolderSummary and CamelImapFolder
I did another round of checking where the memory in tinymail's Camel is going. I have also tested its Camel with Evolution, successfully. Tinymail's Camel has the following features:

o. The CamelFolderSummary uses mmap. This significantly reduces memory usage because an mmap is paged in on demand.

o. The CamelMessageInfoBase structure is significantly smaller (this has been put in #ifdef's so that it can easily be reverted for Evolution). This reduces the memory that can't be mmap()ed.

o. When new message-info arrives, the CamelFolderSummary is dumped to disk and effectively reloaded after every 1000th item. The pstring_hashtable is updated as well (pstrings that no longer occur are unreferenced).

o. The CamelImapFolder consumes much less bandwidth by asking for far fewer headers and by forming IMAP commands more efficiently. The procedure has also been reimplemented as a cancellable one: if a cancellation happens, already received data can mostly be recovered (and will be). This works without the continuations that are only available in IMAP4rev1, simply by storing data on disk sooner and resuming from the previous store point.

o. When the CamelFolderSummary instance is reloaded, it reuses the existing CamelMessageInfo instances. It does not destroy them unless they were removed during an expunge. If a message-info is available in the mmap, the summary unreferences the pstrings that the message-info might have been using and reassigns the struct's char pointers to locations in the mmap. Because Camel has a property accessor, this even works on folders that are open. Messages that are currently visible should not be a problem, because the tree view copies the strings when they need to become visible. The GtkTreeView does this; I haven't checked Evolution's, but Evolution didn't crash after I did a lot of basic removing, copying, scrolling and moving of messages.

o. Per-instance booleans, such as the one that marks a message-info instance as non-mmapped, have been moved into the flags member instead of separate gbooleans, which consumed another two ints of memory per instance.

Together, these points solve all the remaining problems that the original mmap patch had. The most important issue was that newly arrived messages were not mmap()ed; now the summary is reloaded periodically, so they get mmap()ed quickly during such a reload.

Fetching new messages is actually much faster than in the original Evolution implementation, probably because a lot less bandwidth is needed. If you discount the bandwidth optimisation (on a hypothetical, extremely fast IMAP service), you would see a performance hit compared to the original, mostly because a reload happens after every 1000th received message-info.

For a massif graph of downloading 3000, 800, 3000, 800, 3000, 800 headers, check out this message on the tinymail mailing list: http://mail.gnome.org/archives/tinymail-devel-list/2006-November/msg00111.html

The relevant code:
https://svn.tinymail.org/svn/tinymail/trunk/libtinymail-camel/camel-lite/camel/camel-folder-summary.c
https://svn.tinymail.org/svn/tinymail/trunk/libtinymail-camel/camel-lite/camel/providers/imap/camel-imap-folder.c

If I can assist people with bringing their Camel into this shape, please let me know.

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
blog: http://pvanhoof.be/blog
___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers
Re: [Evolution-hackers] Memory usage of CamelFolderSummary and CamelImapFolder
On Thu, 2006-11-16 at 16:13 +0100, Philip Van Hoof wrote:
> I did another round of checking where the memory in tinymail's Camel is
> going. I have also tested its Camel with Evolution, successfully.
> Tinymail's Camel has the following features:
> [...]

Next to these features, tinymail's Camel also:

o. Removes all compilation warnings
o. Uses all of Matthew Barnes's very nice patches (except the GStringChunk one) (thanks a lot, Matthew)
o. Uses all my GSlice-and-other patches (check Evolution's patches mailing list for an overview)
o. Renames all the library names to avoid filename conflicts
o. Removes the libedataserver dependency (a very small libedataserver is statically linked)
o. Has correct linking (the providers' libraries all have incorrect LDFLAGS crap and link with WAY too many libraries). I hate it when people have no clue what they are doing in Makefile.am, or simply forget to remove old linking flags
o. Has its own configure.ac and its own build environment, and is integrated with tinymail's build and repository
o. Reimplements OpenSSL support (certificates are on the todo list)
o. Still supports all the typical features of the normal Camel (NSS/NSPR/SMIME/SSL/Kerberos/etc.)
o. Removes the folder meta-info patch (unused by Tinymail, and not done very well: it consumes memory, and it looks like a lot, but instead of testing it I simply removed it, so I don't know for sure)
o. Eum ...
o. Did I miss something?

And next to its existing features, tinymail's Camel will soon:

o. Support partial message retrieval in IMAP (certain)
o. Support partial message retrieval in POP (uncertain)
o. Support summaries in the POP provider (most likely)
o. Support merging and backup of the local cache (certain)
o. Have correct certificate management for the SSL support (low priority)
o. . . . . (you already know I won't stop, no matter what)
Re: [Evolution-hackers] Memory usage of CamelFolderSummary and CamelImapFolder
Hey Joe,

On Thu, 2006-11-16 at 11:41 -0500, Joe Shaw wrote:
> On Thu, 2006-11-16 at 16:13 +0100, Philip Van Hoof wrote:
> > o. The CamelFolderSummary uses mmap. This significantly reduces memory
> > usage because an mmap is on-demand paged.
>
> Does the on-disk format of the CamelFolderSummary change much or at all?
> In reading a summary from disk with Beagle, the main problem we've found is
> that it is entirely unsearchable, because records within the file are of
> variable length and there is no end-of-record marker, which means that you
> can't open the file, seek to some random location, and expect to find where
> the next (or previous) message begins. This means that any time the summary
> changes, we have to walk the whole thing over again to see changes.

It does change. The files are now mmap()able and all strings have end markers (\0 characters). All strings are also padded to four bytes. So you can mmap() the file and search for strings; once you find one, you can walk back to the uid of the E-mail.

The records are still variable-length. But if necessary, I can adjust the summary file format to have end-of-item markers, so that you can walk the file back until you find that marker, and then you'll know the exact location of, for example, the uid.

> There was some work a while back to do a metasummary, which was essentially
> a summary of the summary for easier searching, but I'm not sure what the end
> result of that was, or if it's in 2.8 or newer.

That is a patch that I have removed from tinymail's Camel (because I dislike its implementation).

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://www.pvanhoof.be/blog