----- Original Message ----
> MemPage bitfield patch below. 
> 
> sizeof(MemPage) on Linux: 
> 
>   original: 84
>   patched:  76
> ...
> Break-even for memory is 904/8 = 113 MemPage structs allocated.

I haven't looked at the code, so take this with a grain of salt :)

If the MemPage structs are malloc'ed individually (instead of being packed into 
arrays), then they are 16-byte aligned on most platforms, which can make the 
allocated block effectively the same size either way (though that depends on how 
many bookkeeping bytes malloc places before the user block in memory).

If, on the other hand, those structs are packed into arrays, then there can be 
a real benefit.
But in that case, I would think a better experiment would be to split the 
fields into separate arrays (the same old chunky-vs-planar optimization, for 
those coming from computer graphics) and group the data by frequency of use 
and/or cache locality.
An example I remember from back in the day was a struct containing per-pixel 
data that we split into two structs (putting the less frequently used data in 
a separate struct); with that change alone we got over a 500% speed improvement 
on the typical workload, simply because the processor took fewer cache misses 
and could prefetch much more efficiently when iterating over the data.

Also, my take on bitfields is that they are not thread/multiprocessor friendly 
(there is no atomic "set bit" for a bitfield), and compilers typically don't 
optimize them well either (so before applying this patch, I would test on 
platforms other than gcc/Linux/x86).

Nicolas
