Gregory Maxwell <[EMAIL PROTECTED]> writes:
> might be useful to align the structure so it always crosses two lines
> and measure the performance difference.. the delta could be basically
> attributed to the cache line bouncing since even one additional bounce
> would overwhelm the other performance effects from the changed
> alignment.

Good idea.  I goosed the struct declaration and setup code to arrange
that the BufMappingLock's spinlock and the rest of its data were in
different cache lines instead of the same one.  The results (still
on Red Hat's 4-way Opteron):

previous best code (slock-no-cmpb and spin-delay-2):
                1 31s   2 42s   4 51s   8 100s
with LWLock padded to 32 bytes and correctly aligned:
                1 31s   2 41s   4 51s   8 97s
with LWLocks 32 bytes, but deliberately misaligned:
                1 30s   2 50s   4 102s  8 200s

The only reason the second and third cases can differ is that the
misaligned layout has to touch multiple cache lines: the array indexing
code should be exactly the same.
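
To make the aligned-versus-misaligned distinction concrete, here is a minimal sketch in C. It is not the actual patch: `DemoLock`, `DemoLockPadded`, and the helper functions are hypothetical names, and the 64-byte cache line size is an assumption that the numbers above do not state. It pads a lock to 32 bytes with a union and counts how many slots of a lock array straddle a cache-line boundary for a given base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE  64   /* assumed line size; verify for your CPU */
#define PADDED_LOCK_SIZE 32   /* padded size used in the tests above */

/* Hypothetical stand-in for a lock: a spinlock byte plus its state. */
typedef struct DemoLock
{
    volatile unsigned char mutex;   /* the spinlock itself */
    char        exclusive;
    int         shared;
    void       *head;
    void       *tail;
} DemoLock;

/* Pad each array element to a fixed 32-byte stride. */
typedef union DemoLockPadded
{
    DemoLock    lock;
    char        pad[PADDED_LOCK_SIZE];
} DemoLockPadded;

static_assert(sizeof(DemoLockPadded) == PADDED_LOCK_SIZE,
              "padding must cover the whole lock");

/* Round addr up to a multiple of align (align must be a power of 2). */
static uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Return 1 if the byte range [addr, addr + size) spans two cache lines. */
static int
straddles_cache_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}

/* Count the slots of an n-element padded array that straddle a line. */
static int
count_straddlers(uintptr_t base, int n)
{
    int         count = 0;

    for (int i = 0; i < n; i++)
    {
        uintptr_t   slot = base + (uintptr_t) i * PADDED_LOCK_SIZE;

        if (straddles_cache_line(slot, PADDED_LOCK_SIZE))
            count++;
    }
    return count;
}
```

With a base rounded up by `align_up(base, PADDED_LOCK_SIZE)`, no slot straddles a line under these assumptions. Shifting the base by 16 bytes makes every second slot straddle, so a lock in such a slot can have its spinlock on one line and the rest of its data on the next.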

These last numbers are pretty close to what I got from the
separated-spinlock patch:
                1 31s   2 52s   4 106s  8 213s
So it seems there's no doubt that it's the doubled cache traffic that
was causing most of the problem there.
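
As a quick check of the "doubled" reading, the sketch below computes slowdown ratios from the timings quoted above. The array and function names are hypothetical; the numbers are copied from this message, in seconds, for the columns labeled 1, 2, 4, and 8.

```c
#include <stddef.h>

/* Timings in seconds for the columns labeled 1, 2, 4, 8. */
static const double aligned_s[4]    = {31, 41, 51, 97};    /* padded, aligned */
static const double misaligned_s[4] = {30, 50, 102, 200};  /* padded, misaligned */
static const double separated_s[4]  = {31, 52, 106, 213};  /* separated-spinlock patch */

/* Ratio of a slower run to a baseline run for column index i. */
static double
slowdown(const double *slow, const double *base, size_t i)
{
    return slow[i] / base[i];
}
```

For the misaligned case against the aligned one, this gives 102/51 = 2.0 at the "4" column and 200/97, about 2.06, at the "8" column. The separated-spinlock numbers give 106/51, about 2.08, and 213/97, about 2.20, against the same baseline.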

                        regards, tom lane
