Gregory Maxwell <[EMAIL PROTECTED]> writes:
> might be useful to align the structure so it always crosses two lines
> and measure the performance difference.. the delta could be basically
> attributed to the cache line bouncing since even one additional bounce
> would overwhelm the other performance effects from the changed
> alignment.
Good idea. I goosed the struct declaration and setup code to arrange
that the BufMappingLock's spinlock and the rest of its data were in
different cache lines instead of the same one. The results (still on
Red Hat's 4-way Opteron):

previous best code (slock-no-cmpb and spin-delay-2):
	1 31s	2 42s	4 51s	8 100s
with LWLock padded to 32 bytes and correctly aligned:
	1 31s	2 41s	4 51s	8 97s
with LWLocks 32 bytes, but deliberately misaligned:
	1 30s	2 50s	4 102s	8 200s

The only reason for the second and third cases to differ is having to
touch multiple cache lines: the array indexing code should be exactly
the same.

These last numbers are pretty close to what I got from the
separated-spinlock patch:
	1 31s	2 52s	4 106s	8 213s

So it seems there's no doubt that it's the doubled cache traffic that
was causing most of the problem there.

			regards, tom lane