Re: [HACKERS] Inefficient barriers on solaris with sun cc

Robert Haas Thu, 02 Oct 2014 07:56:38 -0700

On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> It's actually more complex than that :(
>
> Simple things first:
>
> Oracle's definition seems pretty iron clad:
> http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
> __machine_acq_barrier is a clear superset of __machine_r_barrier and
> __machine_rel_barrier is a clear superset of __machine_w_barrier
>
> And that's what we're essentially discussing, no? That said, there seems
> to be no reason to avoid using __machine_r/w_barrier().


So let's use those, then.
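
For concreteness, here is a minimal sketch of what that mapping might look like. The pg_read_barrier/pg_write_barrier names come from the discussion above; the <mbarrier.h> include and the __SUNPRO_C test are my assumptions about how the Sun compiler branch would be written, and the fallback branch just mirrors the acquire/release fence definitions quoted below so the sketch also compiles with a GCC-compatible compiler. The publish()/consume() helpers and their variables are hypothetical, for illustration only.

```c
#if defined(__SUNPRO_C)
/* Assumption: Sun Studio exposes the barrier intrinsics via <mbarrier.h>. */
#include <mbarrier.h>
#define pg_read_barrier()   __machine_r_barrier()
#define pg_write_barrier()  __machine_w_barrier()
#else
/* Fallback for GCC-compatible compilers, as in the quoted proposal. */
#define pg_read_barrier()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define pg_write_barrier()  __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Hypothetical shared state: a payload guarded by a ready flag. */
static int payload;
static volatile int ready;

/* Writer: make the payload visible before the flag that announces it. */
static void publish(int value)
{
    payload = value;
    pg_write_barrier();     /* order the payload store before the flag store */
    ready = 1;
}

/* Reader: returns -1 if nothing has been published yet. */
static int consume(void)
{
    if (!ready)
        return -1;
    pg_read_barrier();      /* order the flag load before the payload load */
    return payload;
}
```

The point is only that a read barrier paired with a write barrier is enough for this publish/consume pattern; nothing here needs the stronger acquire/release variants.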

> But for the reason why I defined pg_read_barrier/write_barrier to
> __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):
>
> The C11/C++11 definition it's made for is hellishly hard to
> understand. There are very subtle differences between acquire/release
> operations and acquire/release fences. 29.8.2/7.17.4 seem to be the relevant
> parts of the standards. I think they essentially guarantee the mapping
> we're talking about, but it's not entirely clear.
>
> The way acquire/release fences are defined is that they form a
> 'synchronizes-with' relationship with each other. Which would, I think,
> be sufficient given that without a release-like operation on the other
> thread a read/write barrier isn't worth much. But there's a rub in that
> it requires an atomic operation involved somewhere to give that guarantee.
>
> I *did* check that the emitted code on relevant architectures is sane,
> but that doesn't guarantee anything for the future.
>
> Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL, which
> definitely guarantees what we need, even if superfluously heavy on some
> platforms. It still is significantly more efficient than
> __sync_synchronize() which is what was used before. I.e. it generates no
> code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
> otherwise, although I don't know why) and similar on ia64.

A full barrier on x86 should be an mfence, right?  With only a
compiler barrier, you have loads ordered with respect to loads and
stores ordered with respect to stores, but the load/store ordering
isn't fully defined.
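
To illustrate the store/load case, here is a rough Dekker-style sketch. It assumes a GCC-compatible compiler; the macro names, the flag variables, and the enter_a()/enter_b() helpers are all made up for this example. The full fence is spelled as a sequentially consistent __atomic_thread_fence, which on x86 is typically compiled to mfence or an equivalent locked instruction.

```c
/* Compiler-only barrier: stops compiler reordering, emits no instruction. */
#define compiler_barrier()  __asm__ __volatile__("" ::: "memory")

/* Full barrier: also orders an earlier store against a later load. */
#define full_barrier()      __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Hypothetical flags, one per participant. */
static volatile int flag_a;
static volatile int flag_b;

/* Announce A's interest, then check whether B has announced its own.
 * Returns 1 if A may proceed. */
static int enter_a(void)
{
    flag_a = 1;
    /* With only compiler_barrier() here, x86 may still satisfy the load
     * of flag_b before the store to flag_a becomes visible to others. */
    full_barrier();
    return flag_b == 0;
}

/* Mirror image for B. */
static int enter_b(void)
{
    flag_b = 1;
    full_barrier();
    return flag_a == 0;
}
```

If both functions used compiler_barrier() instead, two CPUs could each see the other's flag as still zero and both proceed, which is exactly the load/store ordering that isn't covered.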

> Which is why these acquire/release fences, in contrast to
> acquire/release operations, have more guarantees... You put your finger
> right onto the spot.

But, uh, we still don't seem to know what those guarantees actually ARE.

>> Say I want to appear to only change things while flag is 1, so I
>> write this code:
>>
>> flag = 1
>> acquire barrier
>> things++
>> release barrier
>> flag = 0
>>
>> With the definition you (and Oracle) propose this won't work, because
>> there's nothing to keep the modification of things from being
>> reordered before flag = 1.  What good is that?  Apparently, I don't
>> have any idea!
>
> As written above, I don't think that applies to Oracle's definition?

Oracle's definition doesn't look sufficient there.  The acquire
barrier guarantees that the load operations before the barrier will be
completed before the load and store operations after the barrier, but
the only operation before the barrier is a store, not a load, so it
guarantees nothing.
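
Spelled out in C, the pseudocode above might look like the following sketch. It assumes GCC-style __atomic builtins; the function names are hypothetical, and the second variant is only my guess at what would be needed, not something either definition promises. In hardware terms, an acquire fence orders earlier loads, so it says nothing about the store to flag that precedes it.

```c
static int flag;      /* 1 while "things" is being modified */
static int things;

/* Direct transcription: acquire barrier after the store to flag. */
static void update_acquire_release(void)
{
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
    /* Orders earlier LOADS against later loads and stores. The only
     * earlier operation is a store, so things++ may move above it. */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    things++;
    /* Orders the earlier load and store of things before the later store. */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&flag, 0, __ATOMIC_RELAXED);
}

/* Variant with a full fence after setting the flag (assumed fix). */
static void update_full_fence(void)
{
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
    /* A full fence also orders the earlier store against what follows. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    things++;
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&flag, 0, __ATOMIC_RELAXED);
}
```

Run single-threaded, the two are indistinguishable; the difference only matters to an observer on another CPU that reads flag and things.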

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
