Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-23 Thread Oskari Saarenmaa
06.10.2014, 17:42, Andres Freund wrote:
 I think we can pretty much apply Oskari's patch after replacing
 acquire/release with read/write intrinsics.

Attached a patch rebased to current master using read & write barriers.

/ Oskari
From a994c0f4feff74050ade183ec26d726397fa14a7 Mon Sep 17 00:00:00 2001
From: Oskari Saarenmaa o...@ohmu.fi
Date: Thu, 23 Oct 2014 18:36:31 +0300
Subject: [PATCH] atomics: add compiler and memory barriers for solaris studio
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 configure                                 |  2 +-
 configure.in                              |  2 +-
 src/include/pg_config.h.in                |  3 +++
 src/include/port/atomics/generic-sunpro.h | 17 +++++++++++++++++
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index b403a04..1248b06 100755
--- a/configure
+++ b/configure
@@ -9164,7 +9164,7 @@ fi
 done
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo ac_cv_header_$ac_header | $as_tr_sh`
 ac_fn_c_check_header_mongrel $LINENO $ac_header $as_ac_Header $ac_includes_default
diff --git a/configure.in b/configure.in
index df86882..0a3725f 100644
--- a/configure.in
+++ b/configure.in
@@ -1016,7 +1016,7 @@ AC_SUBST(UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index ddcf4b0..3e78d65 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -340,6 +340,9 @@
 /* Define to 1 if `long long int' works and is 64 bits. */
 #undef HAVE_LONG_LONG_INT_64
 
+/* Define to 1 if you have the <mbarrier.h> header file. */
+#undef HAVE_MBARRIER_H
+
 /* Define to 1 if you have the `mbstowcs_l' function. */
 #undef HAVE_MBSTOWCS_L
 
diff --git a/src/include/port/atomics/generic-sunpro.h b/src/include/port/atomics/generic-sunpro.h
index 77d3ebe..cd84107 100644
--- a/src/include/port/atomics/generic-sunpro.h
+++ b/src/include/port/atomics/generic-sunpro.h
@@ -19,6 +19,23 @@
 
 #if defined(HAVE_ATOMICS)
 
+#ifdef HAVE_MBARRIER_H
+#include <mbarrier.h>
+
+#define pg_compiler_barrier_impl()	__compiler_barrier()
+
+#ifndef pg_memory_barrier_impl
+#	define pg_memory_barrier_impl()		__machine_rw_barrier()
+#endif
+#ifndef pg_read_barrier_impl
+#	define pg_read_barrier_impl()		__machine_r_barrier()
+#endif
+#ifndef pg_write_barrier_impl
+#	define pg_write_barrier_impl()		__machine_w_barrier()
+#endif
+
+#endif /* HAVE_MBARRIER_H */
+
 /* Older versions of the compiler don't have atomic.h... */
 #ifdef HAVE_ATOMIC_H
 
-- 
1.8.4.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-06 Thread Robert Haas
On Thu, Oct 2, 2014 at 2:06 PM, Andres Freund and...@2ndquadrant.com wrote:
 Also, I pretty much designed those definitions to match what Linux
 does.  And it doesn't require that either, though it says that in most
 cases it will work out that way.

 My point is that that read barriers aren't particularly meaningful
 without a defined store order from another thread/process. Without any
 form of pairing you don't have that. The writing side could just have
 reordered the writes in a way you didn't want them.  And the kernel docs
 do say A lack of appropriate pairing is almost certainly an error. But
 since read barriers also pair with lock releases operations, that's
 normally not a big problem.

Agreed, but it's possible to have a read-fence where an atomic
operation provides the ordering on the other side, or something like
that.

 I'm still unsure what you want to show with that example?

Me, too.  I think we've drifted off in the weeds.  Do we know what we
need to know to fix $SUBJECT?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-06 Thread Andres Freund
On 2014-10-06 11:38:47 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 2:06 PM, Andres Freund and...@2ndquadrant.com wrote:
  Also, I pretty much designed those definitions to match what Linux
  does.  And it doesn't require that either, though it says that in most
  cases it will work out that way.
 
  My point is that that read barriers aren't particularly meaningful
  without a defined store order from another thread/process. Without any
  form of pairing you don't have that. The writing side could just have
  reordered the writes in a way you didn't want them.  And the kernel docs
  do say A lack of appropriate pairing is almost certainly an error. But
  since read barriers also pair with lock releases operations, that's
  normally not a big problem.
 
 Agreed, but it's possible to have a read-fence where an atomic
 operation provides the ordering on the other side, or something like
 that.

Sure, that's one of the possible pairings. Most atomics have barrier
semantics...

  I'm still unsure what you want to show with that example?
 
 Me, too.  I think we've drifted off in the weeds.  Do we know what we
 need to know to fix $SUBJECT?

I think we can pretty much apply Oskari's patch after replacing
acquire/release with read/write intrinsics.

I'm opening a bug with the gcc folks about clarifying the docs on their
intrinsics.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Andres Freund
On 2014-09-26 10:28:21 -0400, Robert Haas wrote:
 On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
  So you think a read barrier is the same thing as an acquire barrier
  and a write barrier is the same as a release barrier?  That would be
  surprising.  It's certainly not true in general.
 
  The above doc describes the difference: read barrier requires loads before
  the barrier to be completed before loads after the barrier - an acquire
  barrier is the same, but it also requires loads to be complete before stores
  after the barrier.
 
  Similarly write barrier requires stores before the barrier to be completed
  before stores after the barrier - a release barrier is the same, but it also
  requires loads before the barrier to be completed before stores after the
  barrier.
 
  So acquire is read + loads-before-stores and release is write +
  loads-before-stores.
 
 Hmm.  My impression was that an acquire barrier means that loads and
 stores can migrate forward across the barrier but not backward; and
 that a release barrier means that loads and stores can migrate
 backward across the barrier but not forward.

It's actually more complex than that :(

Simple things first:

Oracle's definition seems pretty iron clad:
http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
__machine_acq_barrier is a clear superset of __machine_r_barrier and
__machine_rel_barrier is a clear superset of __machine_w_barrier

And that's what we're essentially discussing, no? That said, there seems
to be no reason to avoid using __machine_r/w_barrier().


But for the reason why I defined pg_read_barrier/write_barrier to
__atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):

The C11/C++11 definition it's made for is hellishly hard to
understand. There are very subtle differences between acquire/release
operations and acquire/release fences. 29.8.2/7.17.4 seem to be the relevant
parts of the standards. I think they essentially guarantee the mapping
we're talking about, but it's not entirely clear.

The way acquire/release fences are defined is that they form a
'synchronizes-with' relationship with each other. Which would, I think,
be sufficient given that without a release-like operation on the other
thread a read/write barrier isn't worth much. But there's a rub in that
it requires an atomic operation to be involved somewhere to give that guarantee.

I *did* check that the emitted code on relevant architectures is sane,
but that doesn't guarantee anything for the future.

Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL, which
definitely guarantees what we need, even if superfluously heavy on some
platforms. It is still significantly more efficient than
__sync_synchronize(), which is what was used before. I.e. it generates no
code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
otherwise, although I don't know why) and similar on ia64.
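
To make the two candidate mappings concrete, here is a minimal sketch using
gcc's __atomic builtins. The sketch_* names and the SKETCH_USE_ACQ_REL switch
are hypothetical and only for illustration; they are not the macros in
PostgreSQL's atomics headers. The flag is accessed with relaxed atomic
builtins on the assumption that the fences need an atomic operation on each
side to pair up.

```c
#include <assert.h>

/*
 * Hypothetical sketch_* names for illustration; these are not the
 * macros from PostgreSQL's atomics headers.
 */
#ifdef SKETCH_USE_ACQ_REL
/* Conservative variant: acquire+release fence for both barrier kinds. */
#define sketch_read_barrier()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
#define sketch_write_barrier()  __atomic_thread_fence(__ATOMIC_ACQ_REL)
#else
/* Lighter variant: acquire fence for reads, release fence for writes. */
#define sketch_read_barrier()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define sketch_write_barrier()  __atomic_thread_fence(__ATOMIC_RELEASE)
#endif
/* Full barrier, shown only for comparison. */
#define sketch_memory_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)

static int sketch_payload;
static int sketch_ready;

/* Writer: fill in the payload, then publish it by setting the flag. */
static void
sketch_publish(int value)
{
    /* This sketch supports publishing only once. */
    assert(!__atomic_load_n(&sketch_ready, __ATOMIC_RELAXED));

    sketch_payload = value;
    sketch_write_barrier();     /* payload store stays before flag store */
    __atomic_store_n(&sketch_ready, 1, __ATOMIC_RELAXED);
}

/* Reader: returns 1 and fills *out if a payload has been published. */
static int
sketch_consume(int *out)
{
    if (!__atomic_load_n(&sketch_ready, __ATOMIC_RELAXED))
        return 0;
    sketch_read_barrier();      /* flag load stays before payload load */
    *out = sketch_payload;
    return 1;
}
```

Building with -DSKETCH_USE_ACQ_REL switches both barriers to the heavier
fence without touching the callers, which is the trade-off under discussion.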

As a reference, relevant standard sections are:
C11: 5.1.2.4 5); 7.17.4
C++11: 29.3; 1.10
Not that we can rely on those, but I think it's a good thing to orient on.

 I'm actually not really sure what this means unless the barrier also
 does something in and of itself.

 For example, consider this:
 
 some stuff
 CAS(lock, 0, 1) // i am an acquire barrier
 more stuff
 lock = 0 // i am a release barrier
 even more stuff
 
 If the CAS() and lock = 0 instructions were FULL barriers, then we'd
 be saying that the stuff that happens in the critical section needs to
 be exactly "more stuff".  But if they are acquire and release
 barriers, respectively, then the CPU is allowed to move "some stuff"
 or "even more stuff" into the critical section; but what it can't do
 is move "more stuff" out.

 Now if you just have a naked acquire barrier that is not doing
 anything itself, I don't really know what the semantics of that should
 be.

Which is why these acquire/release fences, in contrast to
acquire/release operations, have more guarantees... You put your finger
right onto the spot.
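
The quoted lock example, where the ordering is attached to an operation
rather than to a standalone fence, might be sketched in C11 as follows. This
assumes <stdatomic.h> is available; sketch_lock and its functions are
hypothetical names, not PostgreSQL's spinlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct
{
    atomic_int locked;      /* 0 = free, 1 = held */
} sketch_lock;

static void
sketch_lock_init(sketch_lock *l)
{
    atomic_init(&l->locked, 0);
}

/*
 * CAS(lock, 0, 1) as an acquire operation: accesses after it cannot be
 * moved before it, but earlier accesses may still sink into the
 * critical section.
 */
static bool
sketch_try_lock(sketch_lock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/*
 * lock = 0 as a release operation: accesses before it cannot be moved
 * after it, but later accesses may still rise into the critical section.
 */
static void
sketch_unlock(sketch_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Here the acquire and release orderings are tied to the CAS and the store
themselves, which is the "operation" case; the "fence" case discussed in
this thread has no such access attached to it.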

 Say I want to appear to only change things while flag is 1, so I
 write this code:
 
 flag = 1
 acquire barrier
 things++
 release barrier
 flag = 0
 
 With the definition you (and Oracle) propose

As written above, I don't think that applies to oracle's definition?

 this won't work, because
 there's nothing to keep the modification of things from being
 reordered before flag = 1.  What good is that?  Apparently, I don't
 have any idea!

I hope it's a bit clearer now?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote:
 It's actually more complex than that :(

 Simple things first:

 Oracle's definition seems pretty iron clad:
 http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
 __machine_acq_barrier is a clear superset of __machine_r_barrier and
 __machine_rel_barrier is a clear superset of __machine_w_barrier

 And that's what we're essentially discussing, no? That said, there seems
 to be no reason to avoid using __machine_r/w_barrier().

So let's use those, then.

 But for the reason why I defined pg_read_barrier/write_barrier to
 __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):

 The C11/C++11 definition it's made for is hellishly hard to
 understand. There's very subtle differences between acquire/release
 operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant
 parts of the standards. I think it essentially guarantees the mapping
 we're talking about, but it's not entirely clear.

 The way acquire/release fences are defined is that they form a
 'synchronizes-with' relationship with each other. Which would, I think,
 be sufficient given that without a release like operation on the other
 thread a read/wrie barrier isn't worth much. But there's a rub in that
 it requires a atomic operation involved somehere to give that guarantee.

 I *did* check that the emitted code on relevant architectures is sane,
 but that doesn't guarantee anything for the future.

 Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
 definitely guaranteeing what we need, even if superflously heavy on some
 platforms. It still is significantly more efficient than
 __sync_synchronize() which is what was used before. I.e. it generates no
 code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
 otherwise, although I don't know why) and similar on ia64.

A full barrier on x86 should be an mfence, right?  With only a
compiler barrier, you have loads ordered with respect to loads and
stores ordered with respect to stores, but the load/store ordering
isn't fully defined.

 Which is why these acquire/release fences, in contrast to
 acquire/release operations, have more guarantees... You put your finger
 right onto the spot.

But, uh, we still don't seem to know what those guarantees actually ARE.

 Say I want to appear to only change things while flag is 1, so I
 write this code:

 flag = 1
 acquire barrier
 things++
 release barrier
 flag = 0

 With the definition you (and Oracle) propose
 this won't work, because
 there's nothing to keep the modification of things from being
 reordered before flag = 1.  What good is that?  Apparently, I don't
 have any idea!

 As written above, I don't think that applies to oracle's definition?

Oracle's definition doesn't look sufficient there.  The acquire
barrier guarantees that the load operations before the barrier will be
completed before the load and store operations after the barrier, but
the only operation before the barrier is a store, not a load, so it
guarantees nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Andres Freund
On 2014-10-02 10:55:06 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 10:34 AM, Andres Freund and...@2ndquadrant.com wrote:
  It's actually more complex than that :(
 
  Simple things first:
 
  Oracle's definition seems pretty iron clad:
  http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
  __machine_acq_barrier is a clear superset of __machine_r_barrier and
  __machine_rel_barrier is a clear superset of __machine_w_barrier
 
  And that's what we're essentially discussing, no? That said, there seems
  to be no reason to avoid using __machine_r/w_barrier().
 
 So let's use those, then.

Right, I've never contended that.

  But for the reason why I defined pg_read_barrier/write_barrier to
  __atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):
 
  The C11/C++11 definition it's made for is hellishly hard to
  understand. There's very subtle differences between acquire/release
  operation and acquire/release fences. 29.8.2/7.17.4 seems to be the relevant
  parts of the standards. I think it essentially guarantees the mapping
  we're talking about, but it's not entirely clear.
 
  The way acquire/release fences are defined is that they form a
  'synchronizes-with' relationship with each other. Which would, I think,
  be sufficient given that without a release like operation on the other
  thread a read/wrie barrier isn't worth much. But there's a rub in that
  it requires a atomic operation involved somehere to give that guarantee.
 
  I *did* check that the emitted code on relevant architectures is sane,
  but that doesn't guarantee anything for the future.
 
  Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
  definitely guaranteeing what we need, even if superflously heavy on some
  platforms. It still is significantly more efficient than
  __sync_synchronize() which is what was used before. I.e. it generates no
  code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
  otherwise, although I don't know why) and similar on ia64.
 
 A fully barrier on x86 should be an mfence, right?

Right. I've not talked about changing full barrier semantics. What I was
referring to is that until the atomics patch we always redefine
read/write barriers to be full barriers when using gcc intrinsics.

 With only a compiler barrier, you have loads ordered with respect to
 loads and stores ordered with respect to stores, but the load/store
 ordering isn't fully defined.

Yes.

  Which is why these acquire/release fences, in contrast to
  acquire/release operations, have more guarantees... You put your finger
  right onto the spot.
 
 But, uh, we still don't seem to know what those guarantees actually ARE.

Paired together they form a synchronizes-with relationship. Problem #1
is that the standard's language doesn't make clear, to me at least,
whether there are cases where that doesn't hold. Problem #2 is that our
current README.barrier definition doesn't actually require barriers to be
paired. Which imo is bad, but still a fact.

The definition of ACQ_REL is pretty clearly sufficient imo: "Full
barrier in both directions and synchronizes with acquire loads and
release stores in another thread."

  Say I want to appear to only change things while flag is 1, so I
  write this code:
 
  flag = 1
  acquire barrier
  things++
  release barrier
  flag = 0
 
  With the definition you (and Oracle) propose
  this won't work, because
  there's nothing to keep the modification of things from being
  reordered before flag = 1.  What good is that?  Apparently, I don't
  have any idea!
 
  As written above, I don't think that applies to oracle's definition?
 
 Oracle's definition doesn't look sufficient there.

Perhaps I'm just not understanding what you want to show with this
example. This started as a discussion of comparing acquire/release with
read/write barriers, right? Or are you generally wondering about the
point of acquire/release barriers?

 The acquire
 barrier guarantees that the load operations before the barrier will be
 completed before the load and store operations after the barrier, but
 the only operation before the barrier is a store, not a load, so it
 guarantees nothing.

Well, 'acquire' operations always have to be related to a load. That's why
standalone 'acquire fences' or 'acquire barriers' are more heavyweight
than just an acquiring read.

And realistically, in the above example, you'd have to read flag to see
that it's not already 1, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Robert Haas
On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund and...@2ndquadrant.com wrote:
 So let's use those, then.

 Right, I've never contended that.

OK, cool.

 A fully barrier on x86 should be an mfence, right?

 Right. I've not talked about changing full barrier semantics. What I was
 referring to is that until the atomics patch we always redefine
 read/write barriers to be full barriers when using gcc intrinsics.

OK, got it.  If there's a cheaper way to tell gcc "loads before loads"
or "stores before stores", I'm fine with doing that for those cases.

  Which is why these acquire/release fences, in contrast to
  acquire/release operations, have more guarantees... You put your finger
  right onto the spot.

 But, uh, we still don't seem to know what those guarantees actually ARE.

 Paired together they form a synchronized-with relationship. Problem #1
 is that the standard's language isn't, to me at least, clear if there's
 not some case where that's not the case. Problem #2 is that our current
 README.barrier definition doesn't actually require barriers to be
 paired. Which imo is bad, but still a fact.

I don't know what a synchronized-with relationship means.

Also, I pretty much designed those definitions to match what Linux
does.  And it doesn't require that either, though it says that in most
cases it will work out that way.

 The definition of ACQ_REL is pretty clearly sufficient imo: Full
 barrier in both directions and synchronizes with acquire loads and
 release stores in another thread..

I dunno.  What's an acquire load?  What's a release store?  I know
what loads and stores are; I don't know what the adjectives mean.

 The acquire
 barrier guarantees that the load operations before the barrier will be
 completed before the load and store operations after the barrier, but
 the only operation before the barrier is a store, not a load, so it
 guarantees nothing.

 Well, 'acquire' operations always have to be related to a load. That's why
 standalone 'acquire fences' or 'acquire barriers' are more heavyweight
 than just a acquiring read.

Again, I can't judge any of this, because you haven't defined the
terms anywhere.

 And realistically, in the above example, you'd have to read flag to see
 that it's not already 1, right?

Not necessarily.  You could be the only writer.  Think about the way
the backend entries in the stats system work.  The point of setting
the flag may be for other people to know whether the data is in the
middle of being modified.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-10-02 Thread Andres Freund
On 2014-10-02 11:35:32 -0400, Robert Haas wrote:
 On Thu, Oct 2, 2014 at 11:18 AM, Andres Freund and...@2ndquadrant.com wrote:
   Which is why these acquire/release fences, in contrast to
   acquire/release operations, have more guarantees... You put your finger
   right onto the spot.
 
  But, uh, we still don't seem to know what those guarantees actually ARE.
 
  Paired together they form a synchronized-with relationship. Problem #1
  is that the standard's language isn't, to me at least, clear if there's
  not some case where that's not the case. Problem #2 is that our current
  README.barrier definition doesn't actually require barriers to be
  paired. Which imo is bad, but still a fact.
 
 I don't know what a synchronized-with relationship means.

I'm using the standard's language here, given that I'm trying to reason
about its behaviour...

What it means is that if you have a matching pair of acquire/release
operations or barriers/fences, everything that happened *before* the last
release fence will be visible *after* executing the next acquire
operation in a different thread of execution. And 'after' is defined such
that it holds once the 'acquiring' thread can see the result of the
'releasing' operation.
I.e. no loads after the acquire can see values from before the release.

My problem with the definition in the standard is that it's not
particularly clear how acquire fences *without* an underlying explicit
atomic operation are defined.

I checked gcc's current code and it's fine in that regard. Also other
popular concurrent open source stuff like
http://git.qemu.org/?p=qemu.git;a=blob;f=include/qemu/atomic.h;hb=HEAD
does precisely what I'm talking about:

#ifndef smp_wmb
#ifdef __ATOMIC_RELEASE
#define smp_wmb()   __atomic_thread_fence(__ATOMIC_RELEASE)
#else
#define smp_wmb()   __sync_synchronize()
#endif
#endif

#ifndef smp_rmb
#ifdef __ATOMIC_ACQUIRE
#define smp_rmb()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#else
#define smp_rmb()   __sync_synchronize()
#endif
#endif

The commit that added it
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5444e768ee1abe6e021bece19a9a932351f88c88
was written by one gcc guy and reviewed by another one...

So I think we can be pretty sure that gcc's __atomic_thread_fence()
behaves like we want. We probably have to be a bit more careful about
extending that definition (by including <stdatomic.h> and doing
atomic_thread_fence(memory_order_acquire)) to use general C11. Which is
probably a couple of years away anyway.

 Also, I pretty much designed those definitions to match what Linux
 does.  And it doesn't require that either, though it says that in most
 cases it will work out that way.

My point is that read barriers aren't particularly meaningful
without a defined store order from another thread/process. Without any
form of pairing you don't have that. The writing side could just have
reordered the writes in a way you didn't want them.  And the kernel docs
do say "A lack of appropriate pairing is almost certainly an error". But
since read barriers also pair with lock release operations, that's
normally not a big problem.

  The definition of ACQ_REL is pretty clearly sufficient imo: Full
  barrier in both directions and synchronizes with acquire loads and
  release stores in another thread..
 
 I dunno.  What's an acquire load?  What's a release store?  I know
 what loads and stores are; I don't know what the adjectives mean.

An acquire load is either an explicit atomic load (tas, cmpxchg, etc
also count) or a normal load combined with an acquire barrier. The symmetric
definition is true for release store.

(so, on x86 every load/store that prevents compiler reordering is
essentially an acquire load/release store)
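
The two forms of "acquire load" and "release store" described above can be
sketched in C11 like this. It assumes <stdatomic.h>; the sketch_* helpers are
hypothetical, and the "normal load" of the fence form is written as a relaxed
atomic load to keep the example within the C11 rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Form 1: an explicit atomic load with acquire ordering. */
static int
sketch_load_acquire_op(atomic_int *p)
{
    assert(p != NULL);
    return atomic_load_explicit(p, memory_order_acquire);
}

/* Form 2: a relaxed load followed by a standalone acquire fence. */
static int
sketch_load_acquire_fence(atomic_int *p)
{
    int v;

    assert(p != NULL);
    v = atomic_load_explicit(p, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return v;
}

/* Symmetric case: a release store as a single operation... */
static void
sketch_store_release_op(atomic_int *p, int v)
{
    assert(p != NULL);
    atomic_store_explicit(p, v, memory_order_release);
}

/* ...or as a standalone release fence followed by a relaxed store. */
static void
sketch_store_release_fence(atomic_int *p, int v)
{
    assert(p != NULL);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(p, v, memory_order_relaxed);
}
```

Either store form can be paired with either load form; the fence forms order
all surrounding accesses of the relevant kind rather than just the one
access they sit next to, which is why they tend to be the heavier choice.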

  And realistically, in the above example, you'd have to read flag to see
  that it's not already 1, right?
 
 Not necessarily.  You could be the only writer.  Think about the way
 the backend entries in the stats system work.  The point of setting
 the flag may be for other people to know whether the data is in the
 middle of being modified.

So you're thinking about something seqlock-like... Isn't the problem
then that you actually don't want acquire semantics, but release or
write barrier semantics on that store? The acquire/read barrier part
would be on the reader side, no?
I'm still unsure what you want to show with that example?
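
For reference, a seqlock-like single-writer scheme of the kind alluded to
here might look as follows in C11, with the release ordering on the writer
side and the acquire ordering on the reader side. This is only a sketch
under stated assumptions: the sketch_seq type and functions are hypothetical
and are not how the stats system is actually implemented, and the data
fields are relaxed atomics to keep the example free of data races.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical structure for illustration only. */
typedef struct
{
    atomic_uint version;    /* odd while the single writer is mid-update */
    atomic_int  a;
    atomic_int  b;
} sketch_seq;

static void
sketch_seq_init(sketch_seq *s)
{
    atomic_init(&s->version, 0);
    atomic_init(&s->a, 0);
    atomic_init(&s->b, 0);
}

/* Single writer only: no attempt is made to serialize several writers. */
static void
sketch_seq_write(sketch_seq *s, int a, int b)
{
    unsigned v = atomic_load_explicit(&s->version, memory_order_relaxed);

    assert((v & 1) == 0);   /* must not already be mid-update */
    atomic_store_explicit(&s->version, v + 1, memory_order_relaxed);
    /* Keep the odd-version store ahead of the data stores. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->a, a, memory_order_relaxed);
    atomic_store_explicit(&s->b, b, memory_order_relaxed);
    /* Release store: the data stores stay ahead of the even version. */
    atomic_store_explicit(&s->version, v + 2, memory_order_release);
}

/* Returns true if a consistent snapshot was read; callers retry on false. */
static bool
sketch_seq_read(sketch_seq *s, int *a, int *b)
{
    unsigned v1 = atomic_load_explicit(&s->version, memory_order_acquire);
    unsigned v2;

    if (v1 & 1)
        return false;       /* writer is in the middle of an update */
    *a = atomic_load_explicit(&s->a, memory_order_relaxed);
    *b = atomic_load_explicit(&s->b, memory_order_relaxed);
    /* Keep the data loads ahead of the version re-check. */
    atomic_thread_fence(memory_order_acquire);
    v2 = atomic_load_explicit(&s->version, memory_order_relaxed);
    return v1 == v2;
}
```

The writer never needs an acquire barrier in this sketch; all acquire
ordering lives in the reader, which matches the question raised above.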

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-26 Thread Oskari Saarenmaa

25.09.2014, 16:34, Andres Freund wrote:

Binaries compiled on solaris using sun studio cc currently don't have
compiler and memory barriers implemented. That means we fall back to
relatively slow generic implementations for those. Especially compiler,
read, write barriers will be much slower than necessary (since they all
just need to prevent compiler reordering as both sparc and x86 are run
in TSO mode under solaris).


Attached patch implements compiler and memory barriers for Solaris 
Studio based on documentation at

http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html

I defined read and write barriers as acquire and release barriers 
instead of pure read and write ones as that's what other platforms 
appear to do.


/ Oskari
From 0d1ee2b1d720a6ff1ae6b4707356e198b763fccf Mon Sep 17 00:00:00 2001
From: Oskari Saarenmaa o...@ohmu.fi
Date: Fri, 26 Sep 2014 15:05:34 +0300
Subject: [PATCH] atomics: add compiler and memory barriers for solaris studio
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 configure                                 |  2 +-
 configure.in                              |  2 +-
 src/include/pg_config.h.in                |  3 +++
 src/include/port/atomics/generic-sunpro.h | 17 +++++++++++++++++
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index f0580ce..6aa55d1 100755
--- a/configure
+++ b/configure
@@ -9163,7 +9163,7 @@ fi
 done
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo ac_cv_header_$ac_header | $as_tr_sh`
 ac_fn_c_check_header_mongrel $LINENO $ac_header $as_ac_Header $ac_includes_default
diff --git a/configure.in b/configure.in
index 527b076..6dc9c08 100644
--- a/configure.in
+++ b/configure.in
@@ -1016,7 +1016,7 @@ AC_SUBST(UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index ddcf4b0..3e78d65 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -340,6 +340,9 @@
 /* Define to 1 if `long long int' works and is 64 bits. */
 #undef HAVE_LONG_LONG_INT_64
 
+/* Define to 1 if you have the <mbarrier.h> header file. */
+#undef HAVE_MBARRIER_H
+
 /* Define to 1 if you have the `mbstowcs_l' function. */
 #undef HAVE_MBSTOWCS_L
 
diff --git a/src/include/port/atomics/generic-sunpro.h b/src/include/port/atomics/generic-sunpro.h
index b71b523..faab3d7 100644
--- a/src/include/port/atomics/generic-sunpro.h
+++ b/src/include/port/atomics/generic-sunpro.h
@@ -17,6 +17,23 @@
  * -
  */
 
+#ifdef HAVE_MBARRIER_H
+#include <mbarrier.h>
+
+#define pg_compiler_barrier_impl()	__compiler_barrier()
+
+#ifndef pg_memory_barrier_impl
+#	define pg_memory_barrier_impl()		__machine_rw_barrier()
+#endif
+#ifndef pg_read_barrier_impl
+#	define pg_read_barrier_impl()		__machine_acq_barrier()
+#endif
+#ifndef pg_write_barrier_impl
+#	define pg_write_barrier_impl()		__machine_rel_barrier()
+#endif
+
+#endif /* HAVE_MBARRIER_H */
+
 /* Older versions of the compiler don't have atomic.h... */
 #ifdef HAVE_ATOMIC_H
 
-- 
2.1.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-26 Thread Robert Haas
On Fri, Sep 26, 2014 at 8:36 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
> 25.09.2014, 16:34, Andres Freund wrote:
>> Binaries compiled on solaris using sun studio cc currently don't have
>> compiler and memory barriers implemented. That means we fall back to
>> relatively slow generic implementations for those. Especially compiler,
>> read, write barriers will be much slower than necessary (since they all
>> just need to prevent compiler reordering as both sparc and x86 are run
>> in TSO mode under solaris).
>
> Attached patch implements compiler and memory barriers for Solaris Studio
> based on documentation at
> http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
>
> I defined read and write barriers as acquire and release barriers instead of
> pure read and write ones as that's what other platforms appear to do.

So you think a read barrier is the same thing as an acquire barrier
and a write barrier is the same as a release barrier?  That would be
surprising.  It's certainly not true in general.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-26 Thread Oskari Saarenmaa

26.09.2014, 15:39, Robert Haas wrote:
> On Fri, Sep 26, 2014 at 8:36 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
>> 25.09.2014, 16:34, Andres Freund wrote:
>>> Binaries compiled on solaris using sun studio cc currently don't have
>>> compiler and memory barriers implemented. That means we fall back to
>>> relatively slow generic implementations for those. Especially compiler,
>>> read, write barriers will be much slower than necessary (since they all
>>> just need to prevent compiler reordering as both sparc and x86 are run
>>> in TSO mode under solaris).
>>
>> Attached patch implements compiler and memory barriers for Solaris Studio
>> based on documentation at
>> http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
>>
>> I defined read and write barriers as acquire and release barriers instead of
>> pure read and write ones as that's what other platforms appear to do.
>
> So you think a read barrier is the same thing as an acquire barrier
> and a write barrier is the same as a release barrier?  That would be
> surprising.  It's certainly not true in general.


The above doc describes the difference: read barrier requires loads 
before the barrier to be completed before loads after the barrier - an 
acquire barrier is the same, but it also requires loads to be complete 
before stores after the barrier.


Similarly write barrier requires stores before the barrier to be 
completed before stores after the barrier - a release barrier is the 
same, but it also requires loads before the barrier to be completed 
before stores after the barrier.


So acquire is read + loads-before-stores and release is write + 
loads-before-stores.


The generic gcc atomics also define read barrier to __ATOMIC_ACQUIRE and 
write barrier to __ATOMIC_RELEASE.
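
The mapping described above can be sketched with standard C11 fences. This
is only an illustration, not PostgreSQL's atomics code: the demo_* names are
hypothetical, and atomic_thread_fence() with acquire/release ordering stands
in for the gcc __ATOMIC_ACQUIRE / __ATOMIC_RELEASE fences mentioned above.
The writer relies on store-before-store ordering and the reader on
load-before-load ordering, which is all a pure write or read barrier has to
provide; acquire and release fences give that and a bit more.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pg_read_barrier() / pg_write_barrier(). */
#define demo_read_barrier()   atomic_thread_fence(memory_order_acquire)
#define demo_write_barrier()  atomic_thread_fence(memory_order_release)

static int payload;          /* plain data handed from writer to reader */
static atomic_bool ready;    /* flag announcing that payload is valid */

/* Writer: store the data, then the flag.  The write barrier keeps the
 * payload store ordered before the flag store. */
static void demo_publish(int value)
{
    payload = value;
    demo_write_barrier();
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: load the flag, then the data.  The read barrier keeps the
 * flag load ordered before the payload load.  Returns false when
 * nothing has been published yet. */
static bool demo_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    demo_read_barrier();
    *out = payload;
    return true;
}
```

Calling demo_consume() before demo_publish() reports that nothing is
available; after demo_publish(42) it hands back 42.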


/ Oskari


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-26 Thread Andres Freund
On 2014-09-26 08:39:38 -0400, Robert Haas wrote:
> On Fri, Sep 26, 2014 at 8:36 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
> > 25.09.2014, 16:34, Andres Freund wrote:
> > > Binaries compiled on solaris using sun studio cc currently don't have
> > > compiler and memory barriers implemented. That means we fall back to
> > > relatively slow generic implementations for those. Especially compiler,
> > > read, write barriers will be much slower than necessary (since they all
> > > just need to prevent compiler reordering as both sparc and x86 are run
> > > in TSO mode under solaris).
> >
> > Attached patch implements compiler and memory barriers for Solaris Studio
> > based on documentation at
> > http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
> >
> > I defined read and write barriers as acquire and release barriers instead of
> > pure read and write ones as that's what other platforms appear to do.
>
> So you think a read barrier is the same thing as an acquire barrier
> and a write barrier is the same as a release barrier?  That would be
> surprising.  It's certainly not true in general.

It's generally true that a read barrier is implied by an acquire
barrier, no? Same for write barriers being implied by release
barriers. Neither is true the other way round, but that's fine.

Given how postgres uses memory barriers we actually could declare
read/write barriers to be compiler barriers when on solaris. Both
supported architectures (x86, sparc) are run in TSO mode. As the
existing barrier code for x86 says:
 * Both 32 and 64 bit x86 do not allow loads to be reordered with other loads,
 * or stores to be reordered with other stores, but a load can be performed
 * before a subsequent store.
 *
 * Technically, some x86-ish chips support uncached memory access and/or
 * special instructions that are weakly ordered.  In those cases we'd need
 * the read and write barriers to be lfence and sfence.  But since we don't
 * do those things, a compiler barrier should be enough.
 *
 * lock; addl has worked for longer than mfence. It's also rumored to be
 * faster in many scenarios

Unless I miss something the same is true for sparc *in solaris
userland*. But I'd be perfectly happy to go with something like Oskari's
version because it's still much better than the current code.
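
A minimal sketch of that idea, assuming a TSO machine and a C11 compiler:
read and write barriers shrink to compiler-only barriers, and only the full
memory barrier emits a real fence. All demo_* names are hypothetical, and
atomic_signal_fence() is used here as a portable compiler-only barrier in
place of Sun Studio's __compiler_barrier(). The handshake below shows the
one pattern TSO does not order on its own, a store followed by a load of a
different location, which is why it uses the full barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compiler-only barrier: restrains the compiler, emits no fence
 * instruction.  Assumption: the hardware runs in TSO mode. */
#define demo_compiler_barrier() atomic_signal_fence(memory_order_seq_cst)

/* Under TSO, load/load and store/store order is kept by the hardware. */
#define demo_read_barrier()     demo_compiler_barrier()
#define demo_write_barrier()    demo_compiler_barrier()

/* Store/load order is not kept by TSO, so this needs a real fence. */
#define demo_memory_barrier()   atomic_thread_fence(memory_order_seq_cst)

static atomic_int demo_flag[2];

/* Dekker-style handshake: announce ourselves, then look at the peer.
 * Without the full barrier the load could be performed before the
 * store becomes visible, and both sides could enter. */
static bool demo_try_enter(int self)
{
    atomic_store_explicit(&demo_flag[self], 1, memory_order_relaxed);
    demo_memory_barrier();
    if (atomic_load_explicit(&demo_flag[1 - self], memory_order_relaxed))
    {
        /* Peer is inside or also trying: back off. */
        atomic_store_explicit(&demo_flag[self], 0, memory_order_relaxed);
        return false;
    }
    return true;
}

/* Leave the section; the release store keeps earlier accesses before it. */
static void demo_leave(int self)
{
    atomic_store_explicit(&demo_flag[self], 0, memory_order_release);
}
```

On hardware that is not TSO, the two compiler-only definitions above would
not be sufficient.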

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-26 Thread Robert Haas
On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
>> So you think a read barrier is the same thing as an acquire barrier
>> and a write barrier is the same as a release barrier?  That would be
>> surprising.  It's certainly not true in general.
>
> The above doc describes the difference: read barrier requires loads before
> the barrier to be completed before loads after the barrier - an acquire
> barrier is the same, but it also requires loads to be complete before stores
> after the barrier.
>
> Similarly write barrier requires stores before the barrier to be completed
> before stores after the barrier - a release barrier is the same, but it also
> requires loads before the barrier to be completed before stores after the
> barrier.
>
> So acquire is read + loads-before-stores and release is write +
> loads-before-stores.

Hmm.  My impression was that an acquire barrier means that loads and
stores can migrate forward across the barrier but not backward; and
that a release barrier means that loads and stores can migrate
backward across the barrier but not forward.  I'm actually not really
sure what this means unless the barrier also does something in and of
itself.  For example, consider this:

some stuff
CAS(lock, 0, 1) // i am an acquire barrier
more stuff
lock = 0 // i am a release barrier
even more stuff

If the CAS() and lock = 0 instructions were FULL barriers, then we'd
be saying that the stuff that happens in the critical section needs to
be exactly "more stuff".  But if they are acquire and release
barriers, respectively, then the CPU is allowed to move "some stuff"
or "even more stuff" into the critical section; but what it can't do
is move "more stuff" out.
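
The lock example can be written out with C11 atomics as a rough sketch. The
demo_* names are hypothetical and this is not PostgreSQL's spinlock code;
it only shows the compare-and-swap carrying acquire semantics and the
unlocking store carrying release semantics, so that the protected work
cannot leave the critical section.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int demo_lock;   /* 0 = free, 1 = held */
static int demo_counter;       /* the "more stuff" protected by the lock */

/* CAS(lock, 0, 1) with acquire semantics: accesses that follow it
 * cannot be moved before it. */
static bool demo_try_lock(void)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&demo_lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* lock = 0 with release semantics: accesses that precede it cannot be
 * moved after it. */
static void demo_unlock(void)
{
    atomic_store_explicit(&demo_lock, 0, memory_order_release);
}

/* Critical section: the increment stays between the acquire and the
 * release, although unrelated work around it may still move inward. */
static int demo_increment(void)
{
    int value;

    while (!demo_try_lock())
        ;                       /* spin until the lock is free */
    value = ++demo_counter;
    demo_unlock();
    return value;
}
```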

Now if you just have a naked acquire barrier that is not doing
anything itself, I don't really know what the semantics of that should
be.  Say I want to appear to only change things while flag is 1, so I
write this code:

flag = 1
acquire barrier
things++
release barrier
flag = 0

With the definition you (and Oracle) propose, this won't work, because
there's nothing to keep the modification of things from being
reordered before flag = 1.  What good is that?  Apparently, I don't
have any idea!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-26 Thread Oskari Saarenmaa

26.09.2014, 17:28, Robert Haas wrote:
> On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
>>> So you think a read barrier is the same thing as an acquire barrier
>>> and a write barrier is the same as a release barrier?  That would be
>>> surprising.  It's certainly not true in general.
>>
>> The above doc describes the difference: read barrier requires loads before
>> the barrier to be completed before loads after the barrier - an acquire
>> barrier is the same, but it also requires loads to be complete before stores
>> after the barrier.
>>
>> Similarly write barrier requires stores before the barrier to be completed
>> before stores after the barrier - a release barrier is the same, but it also
>> requires loads before the barrier to be completed before stores after the
>> barrier.
>>
>> So acquire is read + loads-before-stores and release is write +
>> loads-before-stores.
>
> Hmm.  My impression was that an acquire barrier means that loads and
> stores can migrate forward across the barrier but not backward; and
> that a release barrier means that loads and stores can migrate
> backward across the barrier but not forward.  I'm actually not really
> sure what this means unless the barrier also does something in and of
> itself.  For example, consider this:

[...]

> With the definition you (and Oracle) propose, this won't work, because
> there's nothing to keep the modification of things from being
> reordered before flag = 1.  What good is that?  Apparently, I don't
> have any idea!


I'm not proposing any definition for acquire or release barriers, I was 
just proposing to use the things Solaris Studio defines as acquire and 
release barriers to implement read and write barriers in PostgreSQL 
because similar barrier names are used with gcc and on Solaris Studio 
acquire is a stronger read barrier and release is a stronger write 
barrier.  atomics.h's definition of pg_(read|write)_barrier doesn't have 
any requirements for loads before stores, though, so we could use 
__machine_r_barrier and __machine_w_barrier instead.


But as Andres pointed out all this is probably unnecessary and we could 
define read and write barrier as __compiler_barrier with Solaris Studio 
cc.  It's only available for Solaris (x86 and Sparc) and Linux (x86).


/ Oskari


[HACKERS] Inefficient barriers on solaris with sun cc

2014-09-25 Thread Andres Freund
Hi,

Binaries compiled on solaris using sun studio cc currently don't have
compiler and memory barriers implemented. That means we fall back to
relatively slow generic implementations for those. Especially compiler,
read, write barriers will be much slower than necessary (since they all
just need to prevent compiler reordering as both sparc and x86 are run
in TSO mode under solaris).

Since my estimate is that we'll use more and more barriers, that's going
to hurt more and more.

I do *not* plan to do anything about it atm, I just thought it might be
helpful to have this stated somewhere searchable.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Inefficient barriers on solaris with sun cc

2014-09-25 Thread Robert Haas
On Thu, Sep 25, 2014 at 9:34 AM, Andres Freund and...@2ndquadrant.com wrote:
> Binaries compiled on solaris using sun studio cc currently don't have
> compiler and memory barriers implemented. That means we fall back to
> relatively slow generic implementations for those. Especially compiler,
> read, write barriers will be much slower than necessary (since they all
> just need to prevent compiler reordering as both sparc and x86 are run
> in TSO mode under solaris).
>
> Since my estimate is that we'll use more and more barriers, that's going
> to hurt more and more.
>
> I do *not* plan to do anything about it atm, I just thought it might be
> helpful to have this stated somewhere searchable.

To put that another way:

If there are any Sun Studio users out there who care about performance
on big iron, please send a patch to fix this...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

