On 03/26/2014 06:59 AM, Robert Haas wrote:
> On Tue, Mar 25, 2014 at 11:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> I really think we should change that rule; it leads to ugly code,
>>> and bugs.
>>
>> No doubt, but are we prepared to believe that we have working "compiler
>> barriers" other than volatile? (Which, I will remind you, is in the C
>> standard. Unlike "compiler barriers".)
>
> I'm prepared to believe that it would take some time to be certain
> that we've got that right on all compilers we support. The most
> common ones are easily dealt with, I think, but it might be harder to
> verify the behavior on more obscure compilers. But I'd rather deal
> with bullet-proofing SpinLockAcquire() and SpinLockRelease() *in
> general* than deal with bullet-proofing every call site individually,
> which is what we have to do right now.
+1. To support porting to new compilers, we can fall back to e.g. calling
a dummy function through a function pointer, if we don't have a proper
implementation.
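
For illustration, a minimal sketch of that fallback (hypothetical names,
not anything in the tree today). As long as the dummy function lives in a
separate translation unit, the compiler cannot see its body through the
pointer, so it has to assume the call may read or write any memory and
cannot move loads or stores across it:

/* In a separate translation unit, e.g. barrier.c, so the call cannot
 * be inlined (assuming link-time optimization is not in use).
 * Hypothetical names, for illustration only. */
void
pg_barrier_dummy(void)
{
}

void    (*pg_barrier_fn) (void) = pg_barrier_dummy;

/* In a header: */
extern void (*pg_barrier_fn) (void);

#define COMPILER_BARRIER()  ((*pg_barrier_fn) ())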
Aside from the correctness issue: A while ago, while working on the
xloginsert slots stuff, I looked at the assembler that gcc generated for
ReserveXLogInsertLocation(). That function is basically:
SpinLockAcquire()
<modify and read a few variables in shared memory>
SpinLockRelease()
<fairly expensive calculations based on values read>
Gcc decided to move the release of the lock so that it became:
SpinLockAcquire()
<modify and read a few variables in shared memory>
<fairly expensive calculations based on values read>
SpinLockRelease()
I went through some effort while writing the patch to make sure the
spinlock is held for as short a duration as possible. But gcc undid that
to some extent :-(.
I tried putting a compiler barrier there and ran some short performance
tests, but it didn't make any noticeable difference. So it doesn't seem
to matter in practice - maybe the CPU reorders things anyway, or the
calculations are not expensive enough to matter after all.
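
For reference, the usual gcc idiom for such a barrier is an empty asm
statement with a "memory" clobber. It emits no instructions, but gcc has
to assume it might read or write any memory, so it cannot move loads or
stores across it:

#define pg_compiler_barrier()  __asm__ __volatile__("" : : : "memory")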
I just looked at the assembler generated for LWLockAcquire, and a
similar thing is happening there. The code is:
...
641 /* We are done updating shared state of the lock itself. */
642 SpinLockRelease(&lock->mutex);
643
644 TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(l), T_ID(l), mode);
645
646 /* Add lock to list of locks held by this backend */
647 held_lwlocks[num_held_lwlocks++] = l;
...
gcc moved part of the "held_lwlocks" assignment to before the spinlock
release:
.L21:
.loc 1 647 0
movslq num_held_lwlocks(%rip), %rax
addq $16, %r12
.LVL23:
.loc 1 652 0
testl %ebx, %ebx
.loc 1 642 0
movb $0, (%r14) <--- SpinLockRelease
.loc 1 652 0
leal -1(%rbx), %r13d
.loc 1 647 0
leal 1(%rax), %edx
movq %r14, held_lwlocks(,%rax,8)
.LVL24:
movl %edx, num_held_lwlocks(%rip)
The held_lwlocks assignment was deliberately put outside the
spinlock-protected section, to minimize the time the spinlock is held,
but the compiler moved some of it back in, as basic block .L21 shows.
A compiler barrier would avoid that kind of reordering. Not using
volatile would also allow the compiler to reorder and optimize the
instructions *within* the spinlock-protected block, which might shave a
few instructions while a spinlock is held. Granted, spinlock-protected
blocks are small so there isn't much room for optimization, but still.
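
To make that concrete, a release that doubles as a compiler barrier could
look something like this sketch (gcc on x86, hypothetical names, not the
real s_lock.h code; I'm putting a barrier on both sides to be
conservative, so protected-section accesses can't sink below the release
and later code can't get hoisted back in):

typedef char slock_t;

#define pg_compiler_barrier()  __asm__ __volatile__("" : : : "memory")

/* A sketch only, not the actual implementation. */
static inline void
SpinLockRelease_sketch(slock_t *lock)
{
    pg_compiler_barrier();  /* keep critical-section code above the release */
    *((volatile slock_t *) lock) = 0;
    pg_compiler_barrier();  /* keep post-release code below it */
}

With something like that in place, callers wouldn't need
volatile-qualified pointers at all; the barriers pin the ordering instead.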
- Heikki