On Fri, 2006-09-08 at 12:44 +0930, Michael Zucchi wrote:

> Hmm, will it really reduce memory usage though? I'm not so sure.
> Remember that although the summary file contains all strings on disk,
> in memory they are actually unique-ized. For a lot of load-types
> (e.g. mailing lists), this saves a tremendous amount of memory - even
> after all the overhead of the hashtable to manage it. Address and
> subject strings in particular. And an mmap'd file isn't terribly
> different in performance to malloc'd memory - once the latter is set
> up, in both cases the kernel is the one loading it off disk (swap,
> or filesystem).
With the offsets in a separate index mmap I can do the same trick, of
course: let the offsets of two fields be equal, i.e. let both point at
one uniquified copy of the string (for example the first occurrence).
A concrete sketch of this follows below. It will indeed be tricky to
support this when writing out the summary and index, but not
impossible. Hard is fun :). If it weren't hard, it wouldn't be a
challenge, and lots of our community members wouldn't be interested in
writing it ;-) I am. I do of course have priorities for tinymail and
other projects, but this being one of those funky ideas that I can't
get rid of in my head sorts it higher on my priority list (else I go
nuts).

> (an mmap file IS just memory, so if "mmap'd file size + in-memory
> support tables" > "in-memory version", then you haven't generally
> won - even if 'ps' shows you have).

I agree, with one caveat: an mmap can (and often will) be swapped in
and out by the kernel, which in general knows better than the
application developer possibly can when it's a good idea to page parts
of a mapping in or out. My opinion is that this makes it a
particularly interesting memory technique on mobile devices. Mobile
device users often don't care a lot that switching between
applications takes a few milliseconds before the recently switched-to
application becomes usable again (the swap-in happened).

The mapping can also be shared if multiple E-mail clients work with
the same summary files. I agree this is uncommon, but on large
installations -- e.g. servers that run a lot of instances of the
E-mail client with a lot of shared folders -- this might be a
significant improvement. An application like Evolution can probably be
adapted to figure out about shared folders. Or an admin can be taught
to set symbolic links to the cache folders of shared accounts, as a
work-around that can be used already (with some inotify magic that
reloads the summaries of folders whose summary file got altered by
another process: for example, respond with an mremap in the notified
process).

> Note also that all of the summary items, and all of the strings thus
> stored, are using special allocators which reduce - significantly -
> the overhead of malloc on these small allocations. In earlier
> versions I tried a different string allocator for each messageinfo
> which mimics much of what you're doing here, but in memory. Strings
> were stored in an array structure, but I only stored a single
> pointer to the base of the array, and packed the strings into a
> NUL-separated right-sized buffer, and searched each time I needed to
> access them; overhead was only about 5 bytes (pointer to the 'array'
> plus an array 'limit'). Performance was fine, but memory usage was
> significantly larger than the current model, where strings are
> 'unique-ized' - like 30% or more iirc.

True. And we can mimic this behaviour with mmaps too. I do agree it
would be complex, especially in the current Camel design. That's why
we need people like you, Michael, to help us develop the disk summary
concept ;-)

My opinion is that with a good design, all complex programming
concepts become simple, understandable, maintainable and doable. KISS
is too often invoked by people who have no f. idea what software
development is about. KISS is not "doing everything in Python" or
"never type more than 5 lines per algorithm". I'm a supporter of
splitting complex things up into as many simple pieces as needed to
make each of the pieces understandable.
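To make that shared-offset trick concrete, here is a minimal sketch in
C. None of this is existing Camel code; the function names and the
four-field record layout are invented for illustration. One mmap holds
the NUL-terminated strings (the pool), the other holds fixed-width
records of uint32_t offsets into that pool. Uniquifying then simply
means reusing an offset that was already handed out, so two fields can
point at the same pool entry:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <glib.h>

#define FIELDS_PER_RECORD 4     /* from, subject, to, flags */

/* Reader side: resolve field 'field' of record 'n' against the pool.
 * 'index' is the mmap'd index file, 'pool' the mmap'd string file. */
static const char *
summary_field (const unsigned char *pool, const uint32_t *index,
               size_t n, size_t field)
{
        return (const char *) pool + index[n * FIELDS_PER_RECORD + field];
}

/* Writer side: append a string to the pool file, unless we already
 * wrote it, in which case the old offset is reused - the uniquifying
 * happens at summary-write time instead of at load time. */
static uint32_t
pool_intern (GHashTable *seen, FILE *pool_file, const char *str)
{
        gpointer off;
        uint32_t new_off;

        if (g_hash_table_lookup_extended (seen, str, NULL, &off))
                return GPOINTER_TO_UINT (off);

        new_off = (uint32_t) ftell (pool_file);
        fwrite (str, 1, strlen (str) + 1, pool_file);  /* include the NUL */
        g_hash_table_insert (seen, g_strdup (str),
                             GUINT_TO_POINTER (new_off));
        return new_off;
}

The 'seen' hashtable (created with g_hash_table_new (g_str_hash,
g_str_equal)) only lives while the summary is being written; once both
files are on disk and mmap'd, a lookup costs no heap at all.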
Descartes, by the way, was the dude that taught us that splitting-up
principle (other than that, I'm much more into Socrates). My point of
this dialog: "too complex" doesn't exist. We've put a few dudes on the
moon and built technology that splits atoms. We, humans, are designed
to overcome complexity.

> Having extra indices on disk - you're really just writing a database
> table - and all of the associated complexity, consistency and
> coherency issues that implies. And unless you do something 'proper'
> like a btree with a unique key, you're going to have to end up
> storing plenty of extra support stuff in memory anyway, e.g. a
> hashtable to find items by uid, the sorted list of messages for the
> view, and so on.

Finding an item would be done by calculating its offset in the index,
which can be recomputed each time access to one of its members is
needed. I wouldn't look items up by uid but by an "nth" integer. The
uid, being fixed-size, can be one of the members of the index record
itself, which makes it possible to search for a uid in this index
(that should be rather fast; in an mmap it's probably about as fast as
searching a typical hashtable). A sketch of such a record layout
follows further below.

> We did a lot to reduce per-message overhead at the camel level, but
> it's an area of diminishing returns - the ui table view will still
> take as much memory, etc.

Aha, but that can also be reduced if you develop a model that disposes
the rows that become invisible. With fixed row heights and widths and
an implemented row_unref in a custom GtkTreeModel, you can already do
something like this with GtkTreeView. I'm sure ETable and its model
type can be adapted (or already are) to support something like this
too. Tinymail is already suitable for, and tested with, this (using
GtkTreeView).

I tested this principle with the Camel POP provider, which doesn't
have support for summaries. What I did was load a CamelMimeMessage
instance for each row that became visible. It worked, yet each visible
row consumed a CamelMimeMessage: an enormous amount of memory compared
to a summary item. What didn't work was, of course, sorting, because
instantiating CamelMimeMessage instances for every to-sort item took
days. But the CamelMimeMessage instances of invisible rows were never
allocated, only the visible ones, which kept memory consumption of
tinymail below 10M for a 15,000-message POP folder (measured with
valgrind). This wasn't using the mmap concept, which in principle does
more or less the same with the summary data (but then using 4k pages
in the kernel). Viewing and scrolling, using this implementation, was
already fast enough on a mobile device like the Nokia 770. I'm not
going to call it "snappy" fast, just "fast enough" or "acceptably
fast". I am now, like Evolution, going to use the mbox provider for
this in tinymail; the support for POP without a summary was indeed a
proof of concept.

Thanks a lot for your point of view, Michael. I value it.
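To illustrate the fixed-size uid idea: a hedged sketch, again with an
invented record layout (the field widths are assumptions, not anything
Camel currently does). Record n is found by pure pointer arithmetic,
and a uid search is a scan over the mmap'd records, which the kernel
pages in on demand:

#include <stdint.h>
#include <string.h>
#include <sys/types.h>

#define UID_SIZE 16     /* assumed fixed-width uid, NUL-padded */

struct index_record {
        char     uid[UID_SIZE];
        uint32_t from_offset;    /* offsets into the string pool */
        uint32_t subject_offset;
        uint32_t to_offset;
        uint32_t flags;
};

/* The "nth" lookup: no hashtable, just arithmetic on the mapping. */
static const struct index_record *
record_nth (const void *istart, size_t n)
{
        return (const struct index_record *) istart + n;
}

/* Searching by uid: a linear scan over the mapped records. */
static ssize_t
record_find_uid (const void *istart, size_t count, const char *uid)
{
        const struct index_record *rec = istart;
        size_t i;

        for (i = 0; i < count; i++)
                if (strncmp (rec[i].uid, uid, UID_SIZE) == 0)
                        return (ssize_t) i;
        return -1;
}

If the records were written out sorted by uid, that scan could become
a bsearch(3). A real on-disk format would also have to pin down the
padding and endianness of the struct, which this sketch ignores.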
On 08/09/06, Philip Van Hoof <[EMAIL PROTECTED]> wrote:

> On Thu, 2006-09-07 at 21:14 +0200, Philip Van Hoof wrote:
>
> > To read message n, you would simply do something like:
> >
> > from    = sstart + *(istart + (sizeof (int) * 4 * n) + 1)
> > subject = sstart + *(istart + (sizeof (int) * 4 * n) + 2)
> > to      = sstart + *(istart + (sizeof (int) * 4 * n) + 3)
> > flags   = sstart + *(istart + (sizeof (int) * 4 * n) + 4)
>
> No no no no no.
>
> I meant (something like) this of course:
>
> unsigned char *sstart = (unsigned char *) mmap (..);
> unsigned char *istart = (unsigned char *) mmap (..);
>
> #define AMOUNT_OF_OFFSETS_PER_RECORD 4
> #define AOO AMOUNT_OF_OFFSETS_PER_RECORD
>
> from    = sstart + *(uint32_t *) (istart + sizeof (uint32_t) * (AOO * n + 0));
> subject = sstart + *(uint32_t *) (istart + sizeof (uint32_t) * (AOO * n + 1));
> to      = sstart + *(uint32_t *) (istart + sizeof (uint32_t) * (AOO * n + 2));
> flags   = sstart + *(uint32_t *) (istart + sizeof (uint32_t) * (AOO * n + 3));
>
> Too many pointers in my head ;)
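For completeness, here is roughly how those elided mmap (..) calls
could look. This is a sketch only: error handling is kept minimal, and
the read-only, shared flags are chosen to match the
sharing-between-processes idea earlier in this mail.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only; MAP_SHARED lets several processes that
 * map the same summary file share the physical pages. */
static void *
map_whole_file (const char *path, size_t *len_out)
{
        struct stat st;
        void *p;
        int fd;

        fd = open (path, O_RDONLY);
        if (fd < 0)
                return NULL;
        if (fstat (fd, &st) < 0) {
                close (fd);
                return NULL;
        }
        p = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close (fd);     /* the mapping outlives the descriptor */
        if (p == MAP_FAILED)
                return NULL;

        *len_out = (size_t) st.st_size;
        return p;
}

sstart and istart would then just be two calls like
map_whole_file ("summary.data", &slen) and
map_whole_file ("summary.idx", &ilen); the file names here are made
up.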
-- 
Philip Van Hoof, software developer at x-tend
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
http://www.pvanhoof.be - http://www.x-tend.be

_______________________________________________
Performance-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/performance-list