[Evolution-hackers] mmap() for the summary file
Hi there,

I've been trying to replace the fread()/fopen() implementation in camel-folder-summary.c with an mmap() one.

I know camel-file-utils.c puts duplicate strings in a hashtable and that way reduces memory usage for the summary information, because a lot of mailboxes have duplicate strings for the From and To headers. I know why and how this is implemented, and I understand that this already reduces memory usage a lot.

However, on a small device with few memory resources, the kernel knows better when to allocate and when to deallocate uncachable data like this summary information. Therefore I propose to replace the implementation with mmap(). I not only propose it, I have also already tried it myself.

While trying this, I came to the conclusion that it *would* be possible if the strings had been terminated by '\0' instead of being stored Pascal-like, with an encoded unsigned 32-bit integer in front of the string data. That decision makes this impossible with the current file format, unless the mmap'ed memory (and therefore also the file on disk) is constantly rewritten (with '\0' characters), or unless the entire infrastructure that uses the summary strings is adapted to use this length information rather than using the strings directly from the mmap'ed memory *as* NUL-terminated C strings (char arrays with a '\0' terminator). The second solution implies that everything would have to be converted to GStrings.

I think it would reduce the memory usage of Evolution by ~40 MB (depending on the total amount of summary information being loaded). It would make the sorting of the header summary view a little bit slower on certain machines (mainly on machines that have very few memory resources left, so that the kernel will not keep a lot of this mmap'ed data in its buffers/cache).

The file format should be adapted in two ways:

- Duplicate strings will need to be stored at only one location *on the disk*. So the hashtable implementation wouldn't be memory-only, but also something in the summary file. For example: a string field can be a pointer to the first character of the string, or a pointer to another location in the file (in the mmap).
- Strings will need to be '\0' terminated *in the file* so that they are directly usable from the mmap() memory block.

Who are the brave souls that want to join me with this brain-damaging idea? And would a change like this (which would mean that a migration procedure would need to run each time an old folder-summary is loaded) ever get upstream?

I measured (using valgrind) that most of Evolution's memory usage goes to storing an in-memory version of the summary files. I also measured that there's quite a lot of memory segmentation going on (while loading the summary file) and that it (the memory for the file) consumes about twice as much as the on-disk size of the summary file. Loading using mmap() would be faster and wouldn't consume as much real memory (it would consume a mapping, that is true, and that memory would most likely go to the buffers/cache which the kernel manages, that is also true). Sorting might become a little bit slower (but probably not noticeably on most desktop hardware).

I'm being serious: I would like to waste my time on this one, if the Camel team of Evolution likes the idea (and wouldn't mind wasting some of their time on it as well). If not ... I'd rather wait for the disk-summary branch or for libspruce than waste my time on it, because forking Camel would mean wasting huge amounts of time on maintaining a fork.

I attached a patch with my current tryout. I already load the header of the summary file using mmap; that is already working. The difficult part, however, is making the strings themselves usable, because those aren't NUL-terminated. But please check the patch, you'll immediately see what I mean.
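To make the two format changes above concrete, here is a minimal sketch of how strings could be used straight from the mapping once they are '\0' terminated in the file and referenced by offset. It is not the attached patch and not Camel's actual on-disk layout: the SummaryMap type, the summary_map_* function names and the idea of a 32-bit byte offset as the "pointer" stored in a string field are all assumptions made for illustration.

```c
/* Sketch only: a hypothetical layout, not Camel's real summary format.
 * A string field holds a byte offset into the mapped file, and every
 * string is '\0' terminated in the file itself. */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    const char *base;   /* start of the read-only mapping */
    size_t      size;   /* length of the mapping in bytes */
} SummaryMap;           /* hypothetical type */

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
static int summary_map_open(SummaryMap *map, const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    void *mem = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);          /* the mapping remains valid after close() */
    if (mem == MAP_FAILED)
        return -1;

    map->base = mem;
    map->size = (size_t) st.st_size;
    return 0;
}

static void summary_map_close(SummaryMap *map)
{
    if (map->base != NULL)
        munmap((void *) map->base, map->size);
    map->base = NULL;
    map->size = 0;
}

/* Return the string stored at `offset` without copying it, or NULL if
 * the offset is out of range or no terminator is found in the mapping. */
static const char *summary_map_string(const SummaryMap *map, uint32_t offset)
{
    assert(map != NULL);
    if (offset >= map->size)
        return NULL;
    if (memchr(map->base + offset, '\0', map->size - offset) == NULL)
        return NULL;
    return map->base + offset;
}
```

In such a layout, two message records with the same From header would store the same offset, so the de-duplication would live in the file rather than in an in-memory hashtable, and the kernel would be free to drop and re-read the pages backing those strings.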
Copying the strings and NUL-terminating the copy is not a good option, because that would make the entire mmap concept pointless (you still copy the data to real memory, so the entire reason for mmap is then gone).

Note that this is what the current implementation also does: it copies the string and NUL-terminates the copy, and then frees the malloc()'d buffer that was allocated for reading it from the file. In fact, that copy is unnecessary. Since fread() already makes a copy (unlike mmap, where the memory is also the real on-disk data), it wouldn't matter if you used the original malloc()'d memory. This memory copying is probably causing the memory segmentation I mentioned above. If you implement it like this, you had better at least use GSlice. But anyway ;)

--
Philip Van Hoof, software developer at x-tend
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
work: vanhoof at x-tend dot be
http://www.pvanhoof.be - http://www.x-tend.be

? camel-mime-tables.c
? finalise
Index: camel-folder-summary.c
===
RCS file: /cvs/gnome/evolution-data-server/camel/camel-folder-summary.c,v
Re: [Evolution-hackers] How would this work?
Yes. It is indeed possible for you to use the IMAP camel provider to talk to your custom server. Just have a look at how the GroupWise provider is implemented. See camel_provider_module_init () in
http://cvs.gnome.org/viewcvs/evolution-data-server/camel/providers/groupwise/camel-groupwise-provider.c?rev=1.33&view=markup
which (when use_imap is TRUE) sets the GroupWise camel provider's store to that of IMAP.

The groupwise-account-setup plugin is also a good working model on which to base the Zimbra account creation. It currently has some limitations, which will be addressed in the near future (so it is likely to change).

--Harish

On Sat, 2006-06-10 at 15:55 -0700, Scott Herscher wrote:

Hey all. I'm wondering if it's possible to write a custom backend for evolution and evolution-data-server that re-uses the IMAP camel provider? I've written a custom e-book library that kinda works, and I'm getting started on writing a custom e-cal backend that will do calendaring.

In the interest of time, and since the server I'm working with supports the IMAP protocol, I was hoping I could do something simple like reuse the IMAP camel-provider and use my custom addressbook and calendar plugins in setting the account up. Is this possible? If so, how would I do something like that?

Scott

___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers