----- Original Message -----
> MemPage bitfield patch below.
>
> sizeof(MemPage) on Linux:
>
> original: 84
> patched: 76
> ...
> Break-even for memory is 904/8 = 113 MemPage structs allocated.
I didn't look at the code, so bear with me :)

If the MemPage structs are malloc'ed individually (instead of being placed in arrays), then they are 16-byte aligned on most platforms, making the allocated block effectively the same size (well, that depends on how many bytes malloc uses for bookkeeping before the user block in memory).

If, on the other hand, those structs are packed into arrays, then there can be a benefit. But in that case, I would think a better experiment would be to split the fields into different arrays (the same old chunky-vs-planar optimization, for those coming from computer graphics) and group the data by frequency of use and/or locality for the caches. An example I remember from back in the day was a struct containing per-pixel data that we split into two structs (putting the less frequently used data in a separate struct); with this change we got over a 500% speed improvement on the typical workload, simply because the processor took fewer cache misses and could prefetch much more efficiently when iterating over the data.

Also, my take on bitfields is that they are not thread/multiprocessor friendly (there is no atomic "set bit"), and compilers typically don't optimize them well (so before applying this patch, I would test on platforms other than gcc/Linux/x86).

Nicolas