On 03/26/2014 06:59 AM, Robert Haas wrote:
> On Tue, Mar 25, 2014 at 11:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> I really think we should change that rule; it leads to ugly code,
>>> and bugs.
>>
>> No doubt, but are we prepared to believe that we have working "compiler
>> barriers" other than volatile? (Which, I will remind you, is in the C
>> standard. Unlike "compiler barriers".)
>
> I'm prepared to believe that it would take some time to be certain
> that we've got that right on all compilers we support. The most
> common ones are easily dealt with, I think, but it might be harder to
> verify the behavior on more obscure compilers. But I'd rather deal
> with bullet-proofing SpinLockAcquire() and SpinLockRelease() *in
> general* than deal with bullet-proofing every call site individually,
> which is what we have to do right now.
+1. To support porting to new compilers, we can fall back to e.g. calling
a dummy function through a function pointer, if we don't have a proper
implementation.
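
For illustration, a minimal sketch of that fallback (hypothetical names,
not anything in the tree today). As long as the dummy function lives in a
separate translation unit, the compiler cannot see its body through the
pointer, so it has to assume the call may read or write any memory and
cannot move loads or stores across it:

/* In a separate translation unit, e.g. barrier.c, so the call cannot
 * be inlined (assuming link-time optimization is not in use).
 * Hypothetical names, for illustration only. */
void
pg_barrier_dummy(void)
{
}

void    (*pg_barrier_fn) (void) = pg_barrier_dummy;

/* In a header: */
extern void (*pg_barrier_fn) (void);

#define COMPILER_BARRIER()  ((*pg_barrier_fn) ())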
Aside from the correctness issue: A while ago, while working on the
xloginsert slots stuff, I looked at the assembler that gcc generated for
ReserveXLogInsertLocation(). That function is basically:
SpinLockAcquire()
<modify and read a few variables in shared memory>
SpinLockRelease()
<fairly expensive calculations based on values read>
Gcc decided to move the release of the lock so that it became:
SpinLockAcquire()
<modify and read a few variables in shared memory>
<fairly expensive calculations based on values read>
SpinLockRelease()
I went through some effort while writing the patch to make sure the
spinlock is held for as short a duration as possible. But gcc undid that
to some extent :-(.
I tried putting a compiler barrier there and ran some short performance
tests, but it didn't make any noticeable difference. So it doesn't seem
to matter in practice - maybe the CPU reorders things anyway, or the
calculations are not expensive enough to matter after all.
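
For reference, the usual gcc idiom for such a barrier is an empty asm
statement with a "memory" clobber. It emits no instructions, but gcc has
to assume it might read or write any memory, so it cannot move loads or
stores across it:

#define pg_compiler_barrier()  __asm__ __volatile__("" : : : "memory")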
I just looked at the assembler generated for LWLockAcquire, and a
similar thing is happening there. The code is:
...
641 /* We are done updating shared state of the lock itself. */
642 SpinLockRelease(&lock->mutex);
643
644 TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(l), T_ID(l), mode);
645
646 /* Add lock to list of locks held by this backend */
647 held_lwlocks[num_held_lwlocks++] = l;
...
gcc moved part of the "held_lwlocks" assignment to before the spinlock
release:
.L21:
.loc 1 647 0
movslq num_held_lwlocks(%rip), %rax
addq $16, %r12
.LVL23:
.loc 1 652 0
testl %ebx, %ebx
.loc 1 642 0
movb $0, (%r14) <--- SpinLockRelease
.loc 1 652 0
leal -1(%rbx), %r13d
.loc 1 647 0
leal 1(%rax), %edx
movq %r14, held_lwlocks(,%rax,8)
.LVL24:
movl %edx, num_held_lwlocks(%rip)
The held_lwlocks assignment was deliberately put outside the
spinlock-protected section, to minimize the time the spinlock is held,
but the compiler moved some of it back in, as basic block .L21 shows.
A compiler barrier would avoid that kind of reordering. Not using
volatile would also allow the compiler to reorder and optimize the
instructions *within* the spinlock-protected block, which might shave a
few instructions while a spinlock is held. Granted, spinlock-protected
blocks are small so there isn't much room for optimization, but still.
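
To make that concrete, a release that doubles as a compiler barrier could
look something like this sketch (gcc on x86, hypothetical names, not the
real s_lock.h code; I'm putting a barrier on both sides to be
conservative, so protected-section accesses can't sink below the release
and later code can't get hoisted back in):

typedef char slock_t;

#define pg_compiler_barrier()  __asm__ __volatile__("" : : : "memory")

/* A sketch only, not the actual implementation. */
static inline void
SpinLockRelease_sketch(slock_t *lock)
{
    pg_compiler_barrier();  /* keep critical-section code above the release */
    *((volatile slock_t *) lock) = 0;
    pg_compiler_barrier();  /* keep post-release code below it */
}

With something like that in place, callers wouldn't need
volatile-qualified pointers at all; the barriers pin the ordering instead.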
- Heikki