Re: [Evolution-hackers] Moving the struct instance heap space to mmap

2006-09-08 Thread Eero Tamminen
Hi,

ext Federico Mena Quintero wrote:
 gpointer
 gimme_ptr_to_subject_for_message_number (int n)
 {
   return pointers_to_message_summaries[n] + SUBJECT_OFFSET;
 }

Functions that do nothing else besides single array or struct lookup
are hopefully static inlines in some header, so that the binary is
not bloated with:
- unnecessary global function resolving (both a performance and a memory
   issue, as can be read in Drepper's ldso paper)
- code for pushing and popping the args to/from stack

(I'm not sure about x86, but if I remember correctly, on 68k the inline
version of this function would have been just one asm instruction...)
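
A minimal sketch of what such a header-level accessor could look like. It
uses plain C types instead of gpointer so that it stands alone; the
SUBJECT_OFFSET value, the record layout and the new function name are
assumptions for illustration, not taken from Camel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: a fixed-size 8-byte header, then the subject string. */
#define SUBJECT_OFFSET 8

/* In a real header this would be an extern declaration with the definition
 * in one .c file; it is static here only to keep the sketch self-contained. */
static const char **pointers_to_message_summaries;

/* static inline: the compiler can fold this into an indexed load plus an
 * add at the call site, with no call through the dynamic linker. */
static inline const char *
subject_for_message_number (size_t n)
{
    assert (pointers_to_message_summaries != NULL);
    return pointers_to_message_summaries[n] + SUBJECT_OFFSET;
}
```

Whether the compiler really reduces this to one or two instructions depends
on the target and the optimization level.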


The matter is a bit trickier if one wants to export this API from a
shared library, as then details about the library's internal structures
(offsets, array names, etc.) would be compiled into the binaries using
the library.  This could be handled with library versioning, but one
needs to be more careful with changes, and updates will more often
require rebuilding the dependent applications.


- Eero
___
Evolution-hackers mailing list
Evolution-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] Moving the struct instance heap space to mmap

2006-09-08 Thread Philip Van Hoof
On Thu, 2006-09-07 at 21:48 -0500, Federico Mena Quintero wrote:
 On Thu, 2006-09-07 at 21:14 +0200, Philip Van Hoof wrote:
 
  The *new*/*extra* idea is to create a second index file which contains
  the offsets to the pointers in the camel summary file. Then mmap also
  that file. Extra because the idea will build on top of the existing
 
 Ummm, but this won't reduce your working set by very much, will it?
 
 I haven't looked at the details, but can't you just keep an array in
 memory with pointers to the *start* of each summary block, and then
 compute the other pointers on demand?

Yes. This sounds possible. Each member of the summary, however, would
need a length (or a lot of strlen calls would be needed per access).

The SUBJECT_OFFSET would be variable because the other members in the
summary aren't fixed-size. Unless you, for example, use fixed-size
strings (which would make the mmaped file grow a lot). Consider that you
would have to support the CC field of spam E-mail: 500 E-mail
addresses :), forcing you to make that CC field roughly 2k in size per
item.

Or a subject field of ~200 bytes per item, whereas some subjects would
probably be less than 10 bytes. Even for mmap that would be wasteful.

But it would be possible, with the offset to the start of the record, to
strlen the strings and that way calculate the position of the next item.
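
A rough sketch of that strlen walk, assuming the strings of one record are
stored back to back and each one is NUL-terminated. The function name and
the field order are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the index-th string of a record whose strings are
 * stored one after another, each terminated by a NUL byte. */
static const char *
nth_string (const char *record_start, size_t index)
{
    const char *p = record_start;

    assert (record_start != NULL);
    while (index-- > 0)
        p += strlen (p) + 1;    /* skip the string and its NUL */
    return p;
}
```

The cost is one strlen per skipped field on every access, which is the
trade-off against keeping a pointer per field in memory.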

Or to put the length of the strings in a little sub-index per record (a
little bit like reiserfs, hehe). Or to put the length of the string in
front of the string, like pstrings, to avoid the strlen (note that this
makes each string 32 bits larger, unless you encode the integer which
makes data alignment more difficult).
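
A sketch of the length-prefixed variant, assuming a single unsigned byte of
length in front of each string and no NUL terminator. The field_view type
and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A view on one field inside the mapped record; data is NOT NUL-terminated. */
typedef struct {
    const unsigned char *data;
    size_t len;
} field_view;

/* Find the index-th field of a record laid out as
 * [len0][bytes0...][len1][bytes1...]..., with one length byte per field. */
static field_view
nth_field (const unsigned char *record_start, size_t index)
{
    const unsigned char *p = record_start;
    field_view v;

    assert (record_start != NULL);
    while (index-- > 0)
        p += 1 + p[0];          /* skip the length byte plus the payload */
    v.len = p[0];
    v.data = p + 1;
    return v;
}
```

Skipping a field is now a single add instead of a scan, but callers can no
longer treat the data as an ordinary C string.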


Thanks a lot for your input Federico.

-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be



Re: [Evolution-hackers] Moving the struct instance heap space to mmap

2006-09-08 Thread Philip Van Hoof
On Fri, 2006-09-08 at 11:22 -0500, Federico Mena Quintero wrote:
 On Fri, 2006-09-08 at 10:37 +0200, Philip Van Hoof wrote:

 1. What is the maximum length of those strings?  What is the maximum
 length of one message header in the summary file?  In the message info
 struct, can you use short ints for offsets instead of full pointers?  Or
 even single bytes for lengths, if the strings are really short?

At the moment the file is written using encoded integers. The algorithm
for that basically looks at the 0x80 bit to see whether or not the
integer ends on the current byte. In memory it's expanded to a normal
4-byte integer (which is one reason why the size ls reports for the
summary file is smaller than the amount of memory being used).
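
A sketch of decoding such an integer. It assumes that each byte carries 7
bits of payload, most significant group first, and that the byte with 0x80
set is the last one; the exact on-disk format in Camel may differ, and the
function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one variable-length integer from buf. Returns the number of bytes
 * consumed, or 0 if no terminating byte is found within len (at most 5). */
static size_t
decode_uint32 (const unsigned char *buf, size_t len, uint32_t *out)
{
    uint32_t value = 0;
    size_t i;

    assert (buf != NULL && out != NULL);
    for (i = 0; i < len && i < 5; i++) {
        value = (value << 7) | (buf[i] & 0x7f);
        if (buf[i] & 0x80) {    /* assumed: high bit marks the final byte */
            *out = value;
            return i + 1;
        }
    }
    return 0;
}
```

With this scheme values below 128 take one byte on disk instead of four.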

I'm sure that an unsigned char can indeed be used instead. Very few
strings are longer than 255 bytes. For those that are, use two unsigned
chars or 16 bits. Any string whose length doesn't fit in 16 bits is
probably an error rather than a summary string coming from an E-mail.

It does, however, feel a bit like a micro-optimization. But if you
multiply it by the number of headers, it does become more significant.

 2. What are the access patterns?  When the array of summaries gets
 accessed, do you need all the fields, or only some of them?  I.e. you may
 need to access some flags directly, but you may be able to afford
 computing some lengths by hand if you are doing an uncommon operation.

Only some of them. Removing unneeded pointers from CamelMessageInfoBase
is on my todo list. I tried it once and only partly succeeded (something
became unstable for some strange reason with the NNTP provider).

 3. Even if some things get accessed frequently, some others may not be.
 All my message folders show the default columns:  Flags, From, Subject,
 Date --- these are all fields in CamelMessageInfoBase.  However, some
 others don't get displayed at all:  mlist, to, cc, etc.  Can we remove
 those pointers from the struct, and compute them on demand?  [Those
 fields may be used when filtering, especially mlist.  But filtering is
 pretty much only done when you fetch new mail (isn't it?), so you can
 maybe compute that field on demand instead of keeping it around.]
 
 The goal is to reduce the working set.  Moving all that stuff to another
 mmap() cache just moves your working set from the heap to elsewhere :)

Correct. But the kernel does an awesome job of paging it out (even if
you don't have a swap partition, as on a mobile device). And an even
better job if your mmap is read-only (which it is, in this case).
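
For reference, a minimal sketch of mapping a file read-only with plain POSIX
calls. The function name is made up, and this is not how Camel opens its
summary file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty file;
 * on success *len_out receives the size of the mapping. */
static const char *
map_file_readonly (const char *path, size_t *len_out)
{
    struct stat st;
    void *addr;
    int fd;

    assert (path != NULL && len_out != NULL);
    fd = open (path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat (fd, &st) < 0 || st.st_size == 0) {
        close (fd);
        return NULL;
    }
    /* PROT_READ only: the pages are never dirtied, so under memory
     * pressure the kernel can discard them and re-read them from the file. */
    addr = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close (fd);                 /* the mapping survives closing the fd */
    if (addr == MAP_FAILED)
        return NULL;
    *len_out = (size_t) st.st_size;
    return addr;
}
```

The caller releases the mapping with munmap() when it is done with it.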

The third idea sounds a little bit like the disk summary idea. Jeffrey
and Michael can probably explain that one better than me.


-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be
