On Sat, 2012-09-22 at 09:07 +0200, Keith Packard wrote:
> Adam Jackson writes:
>
> > Pick smaller types where possible, including bitfielding some Bools and
> > small enums, then shuffle the result to be hole-free. 192 -> 128 bytes
> > on LP64, 144 -> 96 bytes on ILP32.
>
> One thing that would make this easier to check for 'optimal' packing
> would be to simply start with the largest sized objects and work down to
> the smallest ones. Otherwise, I'm sitting here counting bits. Or would
> that be less efficient at run time?
That's true, as far as it goes, but I find it less tedious to just ask
what the answer is:
hate:~/xserver% pahole -C _Client hw/vfb/Xvfb
struct _Client {
        pointer                    requestBuffer;        /*     0     8 */
        pointer                    osPrivate;            /*     8     8 */
        Mask                       clientAsMask;         /*    16     4 */
        short int                  index;                /*    20     2 */
        unsigned char              majorOp;              /*    22     1 */
        unsigned char              minorOp;              /*    23     1 */
        int                        swapped:1;            /*    24:31  4 */
        int                        local:1;              /*    24:30  4 */
        int                        big_requests:1;       /*    24:29  4 */
        int                        clientGone:1;         /*    24:28  4 */
        int                        closeDownMode:2;      /*    24:26  4 */
        int                        clientState:2;        /*    24:24  4 */

        /* Bitfield combined with next fields */

        char                       smart_priority;       /*    25     1 */
        short int                  noClientException;    /*    26     2 */
        int                        priority;             /*    28     4 */
        ReplySwapPtr               pSwapReplyFunc;       /*    32     8 */
        XID                        errorValue;           /*    40     4 */
        int                        sequence;             /*    44     4 */
        int                        ignoreCount;          /*    48     4 */
        int                        numSaved;             /*    52     4 */
        SaveSetElt *               saveSet;              /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        int ()(ClientPtr) * *      requestVector;        /*    64     8 */
        CARD32                     req_len;              /*    72     4 */
        unsigned int               replyBytesRemaining;  /*    76     4 */
        PrivateRec *               devPrivates;          /*    80     8 */
        short unsigned int         xkbClientFlags;       /*    88     2 */
        short unsigned int         mapNotifyMask;        /*    90     2 */
        short unsigned int         newKeyboardNotifyMask; /*   92     2 */
        short unsigned int         vMajor;               /*    94     2 */
        short unsigned int         vMinor;               /*    96     2 */
        KeyCode                    minKC;                /*    98     1 */
        KeyCode                    maxKC;                /*    99     1 */
        int                        smart_start_tick;     /*   100     4 */
        int                        smart_stop_tick;      /*   104     4 */
        int                        smart_check_tick;     /*   108     4 */
        DeviceIntPtr               clientPtr;            /*   112     8 */
        ClientIdPtr                clientIds;            /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 37 */
};
Not the best-documented set of tools in the world, but very handy.
As far as efficiency goes, I suspect cacheline fill cost would dominate
over the cost of computing offsets. So if you really wanted to ricer-tune
this, try a multi-client benchmark with the server under cachegrind,
figure out the histogram of field accesses, put the most frequently used
member as the first element so its address constant-folds with that of
the struct itself, and then try to cram as many frequently accessed
fields into the first cacheline as you can.

Having done that, I'm not sure you'd see a statistically significant win
even in x11perf -noop.
- ajax
___
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel