[Evolution-hackers] Memory usage of CamelFolderSummary and CamelImapFolder

2006-11-16 Thread Philip Van Hoof
I did another round of checking where the memory goes in
tinymail's Camel.

I have also successfully tested tinymail's Camel with Evolution.

Tinymail's Camel has the following features:

  o. The CamelFolderSummary uses mmap. This significantly reduces memory
 usage because an mmap is on-demand paged.

  o. The CamelMessageInfoBase structure is significantly smaller (this
 has been put in #ifdef's so that it can easily be reverted for
 Evolution). This reduces the memory used by the data that can't be
 mmap()ed.

  o. As new message-info arrives, the CamelFolderSummary is dumped to
 disk and effectively reloaded after every 1000th item. The
 pstring_hashtable is also updated (pstrings that no longer occur
 are unreferenced).

  o. The CamelImapFolder consumes much less bandwidth by asking for
 far fewer headers and by forming IMAP commands more efficiently.
 The procedure has also been reimplemented to be cancellable: if a
 cancellation happens, most of the already received data can be
 recovered (and will be). This works without the continuations
 which are only available in IMAP4rev1, simply by storing data on
 disk more quickly and resuming from the previous store point.

  o. When the CamelFolderSummary instance is reloaded, it will reuse
 CamelMessageInfo instances. It will not destroy them unless they
 have been removed during an expunge request.

 If the message-info is available in the mmap, it will unreference
 the pstrings that might have been in use by the message-info and it
 will reassign the struct's char pointers to locations in the mmap.

 Because Camel has a property accessor, this even works on folders
 that are open. Messages that are currently visible should not be a
 problem because the tree-view makes copies of the strings when they
 need to become visible. The GtkTreeView does; I haven't checked
 Evolution's, but Evolution didn't crash after I did a lot of basic
 removing, copying, scrolling and moving of messages.

  o. The "this is a non-mmapped message-info instance" flags have
 been put in the flags member rather than in separate gbooleans,
 which consumed another two ints of memory per instance.
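A minimal C sketch of the last point: two int-sized booleans folded into bits of an existing flags word. The struct, flag and function names and the bit positions are hypothetical illustrations, not tinymail's actual CamelMessageInfoBase definitions (a real change would have to pick bits the flags member doesn't already use).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before (hypothetical): two boolean markers stored as separate ints. */
struct info_with_bools {
    uint32_t flags;
    int not_mmapped;   /* one extra int per instance */
    int other_marker;  /* and another one */
};

/* After: the same markers live as bits in the existing flags word.
 * Bit positions are hypothetical. */
#define INFO_FLAG_NOT_MMAPPED  (UINT32_C(1) << 0)
#define INFO_FLAG_OTHER_MARKER (UINT32_C(1) << 1)

struct info_with_flags {
    uint32_t flags;
};

/* Set or clear one marker bit. */
void info_set_flag(struct info_with_flags *info, uint32_t flag, int on)
{
    assert(info != NULL);
    if (on)
        info->flags |= flag;
    else
        info->flags &= ~flag;
}

/* Test one marker bit. */
int info_has_flag(const struct info_with_flags *info, uint32_t flag)
{
    assert(info != NULL);
    return (info->flags & flag) != 0;
}
```

On a typical platform with 4-byte ints this saves eight bytes per message-info, which adds up across large summaries.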

Together, these points solve all the remaining problems that the
original mmap patch had. The most important issue was that newly
arrived messages were not mmap()ed; now the summary is reloaded
periodically, so they get mmap()ed quickly during such a reload.
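The mmap side can be sketched in plain POSIX C: map the summary file read-only and hand out char pointers that aim directly into the mapping, so no string is copied onto the heap and pages are only read from disk when touched. The function names, and the assumption that strings are simply '\0'-terminated at known offsets, are illustrative and not tinymail's real code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Pages are only read from disk when touched. */
const char *summary_map(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after closing the descriptor */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t) st.st_size;
    return base;
}

/* Return a pointer to the '\0'-terminated string at `offset`, or NULL if it
 * would run past the end of the mapping. Nothing is copied: the returned
 * pointer aims straight into the mapped file. */
const char *summary_string_at(const char *base, size_t len, size_t offset)
{
    if (offset >= len)
        return NULL;
    if (memchr(base + offset, '\0', len - offset) == NULL)
        return NULL;
    return base + offset;
}

void summary_unmap(const char *base, size_t len)
{
    munmap((void *) base, len);
}
```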

Fetching new messages is actually much faster than with the original
Evolution implementation, probably because a lot less bandwidth is
needed. Without the bandwidth optimisation (on a hypothetical,
extremely fast IMAP service) you would see a performance hit compared
to the original, mostly because a reload happens after every 1000th
received message-info.
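The reload trigger itself can be as simple as a counter. A minimal sketch, assuming a hypothetical `dump_and_reload` hook standing in for the real dump-to-disk-and-remap step; none of these names come from tinymail.

```c
#include <assert.h>
#include <stddef.h>

/* Dump and reload after this many new message-infos (interval from the text). */
#define SUMMARY_RELOAD_INTERVAL 1000

struct summary_sketch {
    size_t pending;  /* infos added since the last dump+reload */
    size_t reloads;  /* number of dump+reload cycles so far */
    /* Hypothetical hook: write the summary to disk and mmap it again. */
    void (*dump_and_reload)(struct summary_sketch *s);
};

/* Called once per newly received message-info. */
void summary_add_info(struct summary_sketch *s)
{
    assert(s != NULL);
    s->pending++;
    if (s->pending == SUMMARY_RELOAD_INTERVAL) {
        if (s->dump_and_reload != NULL)
            s->dump_and_reload(s);
        s->reloads++;
        s->pending = 0;
    }
}
```

In this sketch a 3000-header download triggers three reloads, while an 800-header batch stays pending (not mmap()ed) until a later reload; how the real code treats such a remainder isn't described here.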


For a massif graph of downloading 3000, 800, 3000, 800, 3000, 800
headers you can check out this message on the tinymail mailing list:

http://mail.gnome.org/archives/tinymail-devel-list/2006-November/msg00111.html


The relevant code:
https://svn.tinymail.org/svn/tinymail/trunk/libtinymail-camel/camel-lite/camel/camel-folder-summary.c
https://svn.tinymail.org/svn/tinymail/trunk/libtinymail-camel/camel-lite/camel/providers/imap/camel-imap-folder.c

If I can help anyone bring their Camel into this shape, please let me
know.

-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
blog: http://pvanhoof.be/blog

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] Memory usage of CamelFolderSummary and CamelImapFolder

2006-11-16 Thread Philip Van Hoof
On Thu, 2006-11-16 at 16:13 +0100, Philip Van Hoof wrote:
> I did another round of checking where the memory goes in
> tinymail's Camel.
>
> I have also successfully tested tinymail's Camel with Evolution.
>
> Tinymail's Camel has the following features:
>

In addition to these features, tinymail's Camel also:

  o. Fixes all compilation warnings
  o. Includes all of Matthew Barnes's very nice patches (except the
 GStringChunk one; thanks a lot, Matthew)
  o. Includes all my GSlice-and-other patches (see Evolution's patches
 mailing list for an overview)
  o. Renames all the libraries to avoid filename conflicts
  o. Removes the libedataserver dependency (a very small libedataserver
 is statically linked)
  o. Has correct linking (the providers' libraries all have incorrect
 LDFLAGS crap and link with WAY too many libraries). I hate it
 when people have no clue about what they are doing in
 Makefile.am or simply forget to remove old linking flags
  o. Has its own configure.ac and its own build environment, and is
 integrated with tinymail's build and repository
  o. Reimplements OpenSSL support (certificates are on the todo list)
  o. Still supports all typical features of the normal Camel
 (NSS/NSPR/SMIME/SSL/Kerberos/etc.)
  o. Removes the folder meta-info patch (unused by tinymail, and not
 done very well: it consumes memory, and it looks like a lot, but
 instead of testing it I simply removed it, so I don't know for
 sure)

  o. Eum ...
  o. Did I miss something?


In addition to the existing features, tinymail's Camel will soon:

  o. Support partial message retrieval in IMAP (certain)
  o. Support partial message retrieval in POP (uncertain)
  o. Support summaries in the POP provider (most likely)
  o. Support merging and backup of local cache (certain)
  o. Have correct certificate management for the SSL support (low
 priority)
  o. ... (you already know I won't stop, no matter what)



Re: [Evolution-hackers] Memory usage of CamelFolderSummary and CamelImapFolder

2006-11-16 Thread Philip Van Hoof
On Thu, 2006-11-16 at 11:41 -0500, Joe Shaw wrote:

Hey Joe,

> On Thu, 2006-11-16 at 16:13 +0100, Philip Van Hoof wrote:
> >   o. The CamelFolderSummary uses mmap. This significantly reduces memory
> >  usage because an mmap is on-demand paged.
>
> Does the on-disk format of the CamelFolderSummary change much or at all?
> In reading a summary from disk with Beagle, the main problem we've found
> is that it is entirely unsearchable, because records within the file are
> of variable length and there is no end-of-record marker, which means
> that you can't open the file, seek to some random location, and expect
> to find where the next (or previous) message begins.  This means that
> any time the summary changes, we have to walk the whole thing over again
> to see changes.

It does change. The files are now mmap()able and all strings have end
markers (\0 characters). All strings are also padded to a multiple of
four bytes.
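A minimal sketch of writing strings in that style: each string keeps its '\0' end marker and is followed by zero padding up to the next multiple of four bytes. Padding relative to the start of the buffer is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of 4. */
size_t pad4(size_t n)
{
    return (n + 3u) & ~(size_t) 3u;
}

/* Append `str`, its terminating '\0', and zero padding up to a four-byte
 * boundary. Returns the new write offset, or 0 if `buf` is too small
 * (0 is never a valid result, since every string takes at least 4 bytes). */
size_t put_padded_string(char *buf, size_t cap, size_t offset, const char *str)
{
    assert(buf != NULL && str != NULL);
    size_t n = strlen(str) + 1;   /* include the '\0' end marker */
    size_t end = pad4(offset + n);
    if (end > cap)
        return 0;
    memcpy(buf + offset, str, n);
    memset(buf + offset + n, 0, end - offset - n);
    return end;
}
```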

So you can mmap() the file and search for strings; once you find one,
you can walk back to the uid of the e-mail.
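Searching such a file needs nothing more than a byte scan over the mapping, and because every string ends in '\0', walking back from a match to the start of the string that contains it is easy. This sketch stops there: going further back to the record's uid depends on the record layout, which isn't spelled out here. The names are hypothetical; memmem() would do the search on glibc but isn't ISO C, so a plain loop is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUMMARY_NOT_FOUND ((size_t) -1)

/* Offset of the first occurrence of `needle` in base[0..len), or
 * SUMMARY_NOT_FOUND. Works across embedded '\0' bytes. */
size_t summary_find(const char *base, size_t len, const char *needle)
{
    assert(base != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0 || n > len)
        return SUMMARY_NOT_FOUND;
    for (size_t i = 0; i + n <= len; i++) {
        if (base[i] == needle[0] && memcmp(base + i, needle, n) == 0)
            return i;
    }
    return SUMMARY_NOT_FOUND;
}

/* Walk back from `offset` to the first byte of the '\0'-terminated string
 * that contains it (padding zeros before it act as the boundary). */
size_t summary_string_start(const char *base, size_t offset)
{
    while (offset > 0 && base[offset - 1] != '\0')
        offset--;
    return offset;
}
```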

Records are still variable-length. If necessary, though, I can adjust
the summary file format to have end-of-item markers, so that you can
walk back through the file until you find that marker; then you'll know
the exact location of, for example, the uid.
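If end-of-item markers were added, finding the start of the record that contains an arbitrary offset becomes a backwards scan for the previous marker. The four-byte marker value below is purely hypothetical (the current format has none), a real format would have to guarantee that the marker can never occur inside record data, and the sketch assumes records and markers are four-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical end-of-item marker; not part of any existing format. */
static const unsigned char ITEM_END[4] = { 0xFF, 0xFE, 0xFD, 0xFC };

/* Return the offset where the item containing `offset` begins: just after
 * the nearest preceding end marker, or 0 for the first item. */
size_t item_start(const unsigned char *base, size_t offset)
{
    assert(base != NULL);
    size_t pos = offset & ~(size_t) 3u;   /* align down to a 4-byte slot */
    while (pos >= 4) {
        if (memcmp(base + pos - 4, ITEM_END, 4) == 0)
            return pos;
        pos -= 4;
    }
    return 0;
}
```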

> There was some work a while back to do a metasummary, which was
> essentially a summary of the summary for easier searching, but I'm not
> sure what the end result of that was, or if it's in 2.8 or newer.

That is a patch that I have removed from tinymail's Camel (because I
dislike its implementation).

