[Evolution-hackers] mmap() for the summary file

2006-06-11 Thread Philip Van Hoof
Hi there,

I've been trying to replace the fread()/fopen() implementation in
camel-folder-summary.c with an mmap() one.

I know camel-file-utils.c puts duplicate strings in a hashtable and
that way reduces memory usage for the summary information, because a lot
of mailboxes have duplicate strings for the From and To headers. I know
why and how this is implemented, and I understand that this already
reduces memory usage a lot.

However, on a small device with little memory, the kernel knows
better when to allocate and when to deallocate uncachable data like this
summary information.

Therefore I propose to replace the implementation with mmap(). Not only
do I propose it, I have also already tried it myself.

While trying this, I came to the conclusion that it *would* be possible
if the strings had been terminated by '\0' instead of being stored
Pascal-like, with an encoded unsigned 32-bit integer in front of the
string data.

That decision makes this (using the current file format) impossible,
unless the mmap'd memory (and therefore also the file on disk) is
constantly rewritten (with '\0' characters) or unless the entire
infrastructure that uses the summary strings is adapted to use this
length information rather than using the strings directly from the
mmap'ed memory *as* NUL-terminated C strings (char arrays with a '\0'
terminator). The second solution implies that all of that code would
have to be converted to GStrings.
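
As a rough illustration of the problem (not taken from the patch), the
sketch below reads a length-prefixed string straight out of a memory
buffer. It assumes a fixed 4-byte little-endian length prefix purely for
simplicity; the real encoding in camel-file-utils.c is different. The
names StrView and read_prefixed() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a length-prefixed string inside a mapped buffer. */
typedef struct {
    const char *data;   /* points into the buffer; NOT '\0' terminated */
    uint32_t    len;    /* number of bytes that belong to the string */
} StrView;

/*
 * Assumed layout (for illustration only): a 4-byte little-endian length,
 * followed by exactly that many bytes of string data.
 *
 * Returns the number of bytes consumed, or 0 if the buffer is too short.
 */
static size_t
read_prefixed(const unsigned char *buf, size_t size, StrView *out)
{
    uint32_t len;

    if (size < 4)
        return 0;

    len = (uint32_t)buf[0]
        | ((uint32_t)buf[1] << 8)
        | ((uint32_t)buf[2] << 16)
        | ((uint32_t)buf[3] << 24);

    if (len > size - 4)
        return 0;

    out->data = (const char *)buf + 4;
    out->len  = len;
    return 4 + (size_t)len;
}
```

The byte right after the string data is the start of the next record,
not a '\0', so out->data cannot be handed to strlen(), strcmp() or
printf("%s") as it is. Every consumer would have to carry the length
around (for example printf("%.*s", (int)v.len, v.data)), which is the
"adapt the entire infrastructure" option described above.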

I think it would reduce the memory usage of Evolution by ~40 MB
(depending on the total amount of summary information being loaded). It
would make the sorting of the header summary view a little bit slower on
certain machines (mainly on machines that have very little memory left,
so that the kernel will not keep a lot of this mmap'ed data in its
buffers/cache).

The file format should be adapted in two ways:

- Duplicate strings will need to be stored at only one location *on the
disk*. So the hashtable implementation wouldn't be memory-only but
would also be reflected in the summary file.

For example: A string-field can be a pointer to the first character of
the string, or a pointer to another location in the file (in the mmap).

- Strings will need to be '\0' terminated *in the file* so that they are
directly usable from the mmap() memory block. 
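
One way to picture the two changes together is a string table in which
every string field is an offset to a '\0' terminated string, and
duplicates share one offset. The sketch below is only an assumption
about how that could look; StringTable, string_table_intern() and
string_at() are hypothetical names, and a linear search stands in for
the hashtable to keep the example short.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer-side table: strings stored back to back, each
 * followed by its '\0', exactly as they would appear in the file. */
typedef struct {
    char  *bytes;
    size_t used;
    size_t cap;
} StringTable;

/* Return the offset of s in the table, appending it if it is not there
 * yet. Returns UINT32_MAX if memory cannot be allocated. */
static uint32_t
string_table_intern(StringTable *t, const char *s)
{
    size_t off = 0;
    size_t n = strlen(s) + 1;   /* include the terminator */

    /* Linear scan over existing entries (a hashtable would be faster). */
    while (off < t->used) {
        if (strcmp(t->bytes + off, s) == 0)
            return (uint32_t)off;
        off += strlen(t->bytes + off) + 1;
    }

    if (t->used + n > t->cap) {
        size_t cap = t->cap ? t->cap : 64;
        char *p;

        while (cap < t->used + n)
            cap *= 2;
        p = realloc(t->bytes, cap);
        if (p == NULL)
            return UINT32_MAX;
        t->bytes = p;
        t->cap = cap;
    }

    memcpy(t->bytes + t->used, s, n);
    t->used += n;
    return (uint32_t)(t->used - n);
}

/* Reader side: turn an offset into a C string that lives inside the
 * mapped region. Returns NULL if the offset is out of range or no
 * terminator is found before the end of the region. */
static const char *
string_at(const char *base, size_t size, uint32_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(base + off, '\0', size - off) == NULL)
        return NULL;
    return base + off;
}
```

With a layout like this, a message record would hold small integer
offsets instead of string data, and string_at() would return pointers
that are usable directly from the mmap() block without any copy.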


Who are the brave souls that want to join me with this brain-damaging
idea? And would a change like this (which would mean that a migration
procedure would need to run each time an old folder summary is loaded)
ever get upstream?

I measured (using valgrind) that most of Evolution's memory usage
goes to storing an in-memory version of the summary files. I also
measured that there's quite a lot of memory fragmentation going on (while
loading the summary file) and that it (the memory for the file) consumes
about twice as much as the on-disk size of the summary file.

Loading using mmap() would be faster and wouldn't consume as much real
memory (it would consume a mmap, that is true, and that memory would
most likely go to the buffers/cache which the kernel manages, that is
also true). Sorting might become a little bit slower (but probably not
noticeable on most desktop hardware).
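
For readers unfamiliar with the approach, a minimal sketch of mapping a
file read-only with the POSIX calls is shown below. It is not the code
from the attached patch; map_file_readonly() and unmap_file() are
hypothetical helper names.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at path read-only. On success returns the mapping
 * and stores its length in *size_out; on failure returns NULL. */
static void *
map_file_readonly(const char *path, size_t *size_out)
{
    struct stat st;
    void *addr;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return NULL;

    /* An empty file cannot be mapped (length 0 is invalid for mmap). */
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);

    if (addr == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return addr;
}

/* Release a mapping obtained from map_file_readonly(). */
static void
unmap_file(void *addr, size_t size)
{
    if (addr != NULL)
        munmap(addr, size);
}
```

Pages of such a mapping are backed by the file itself, so under memory
pressure the kernel can drop them and read them back later instead of
pushing them to swap.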

I'm being serious: I would like to waste my time on this one, if the
Camel team of Evolution likes the idea (and wouldn't mind wasting some
of their time on it as well). If not ... I'd rather wait for the
disk-summary branch or for libspruce than waste my time on it,
because forking Camel would mean wasting huge amounts of time on
maintaining a fork.

I attached a patch with my current tryout. I already load the header of
the summary file using mmap, and that is already working. The difficult
part is, however, making the strings themselves usable, because those
aren't NUL-terminated. But please check the patch, you'll immediately
see what I mean.

Copying the strings, and NUL-terminating the copy, is not a good option
because that would make the entire mmap concept pointless (you still
copy it to real memory, so the entire reason for mmap is then gone). Note
that this is what the current implementation also does: it copies the
string and NUL-terminates the copy, and then frees the malloc'd buffer
that was allocated for reading it from the file.

In fact, that copy is unnecessary. Since fread() already produces a copy
(unlike mmap, where the memory is also the real on-disk data), it
wouldn't matter if you used the original malloc()'d memory. This memory
copying is probably causing the memory fragmentation I mentioned above.
If you implement it like this, you'd better at least use GSlice.
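
To make the point about the redundant copy concrete, here is a sketch
of reading a length-prefixed string with a single allocation: the
buffer is allocated one byte larger than the string, fread() fills it,
and the terminator is written in place. It assumes a fixed 4-byte
little-endian length prefix for simplicity (the real summary encoding
differs), and read_string_once() and MAX_SUMMARY_STRING are
hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sanity limit so a corrupt length cannot trigger a huge
 * allocation. */
#define MAX_SUMMARY_STRING (1u << 20)

/* Read one length-prefixed string from in. Returns a newly allocated,
 * '\0' terminated string that the caller must free(), or NULL on a
 * short read, an oversized length, or allocation failure. */
static char *
read_string_once(FILE *in)
{
    unsigned char hdr[4];
    uint32_t len;
    char *s;

    if (fread(hdr, 1, sizeof hdr, in) != sizeof hdr)
        return NULL;

    len = (uint32_t)hdr[0]
        | ((uint32_t)hdr[1] << 8)
        | ((uint32_t)hdr[2] << 16)
        | ((uint32_t)hdr[3] << 24);

    if (len > MAX_SUMMARY_STRING)
        return NULL;

    /* One allocation, sized for the data plus the terminator. */
    s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;

    if (len > 0 && fread(s, 1, len, in) != len) {
        free(s);
        return NULL;
    }

    s[len] = '\0';  /* terminate in place; no second buffer needed */
    return s;
}
```

This keeps the fread() design but halves the number of allocations per
string, which is the kind of churn suspected above of fragmenting the
heap.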

But anyway ;)


-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be
? camel-mime-tables.c
? finalise
Index: camel-folder-summary.c
===
RCS file: /cvs/gnome/evolution-data-server/camel/camel-folder-summary.c,v

Re: [Evolution-hackers] How would this work?

2006-06-11 Thread Harish Krishnaswamy
Yes. It is indeed possible for you to use the IMAP camel provider to
talk to your custom server. Just have a look at how the GroupWise
provider is implemented.

See camel_provider_module_init () in
http://cvs.gnome.org/viewcvs/evolution-data-server/camel/providers/groupwise/camel-groupwise-provider.c?rev=1.33&view=markup

which (when use_imap is TRUE) sets the groupwise camel provider store to
that of imap.

The groupwise-account-setup plugin is also a good working model for you
to base the Zimbra account creation on. It currently has some
limitations which will be addressed in the near future (and hence it is
likely to change).


--Harish

On Sat, 2006-06-10 at 15:55 -0700, Scott Herscher wrote:
> Hey all. I'm wondering if it's possible to write a custom backend for
> evolution and evolution-data-server that re-uses the IMAP camel
> provider?
>
> I've written a custom e-book library that kinda works, and I'm getting
> started on writing a custom e-cal backend that will do calendaring.
> In the interest of time, and since the server I'm working with
> supports the IMAP protocol, I was hoping I could do something simple
> like reuse the IMAP camel-provider and use my custom addressbook and
> calendar plugins in setting the account up.  Is this possible?  If so,
> how would I do something like that?
>
> Scott
> ___
> Evolution-hackers mailing list
> Evolution-hackers@gnome.org
> http://mail.gnome.org/mailman/listinfo/evolution-hackers