Re: [HACKERS] Spinlocks and compiler/memory barriers

Robert Haas Mon, 30 Jun 2014 05:04:52 -0700

On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund <[email protected]> wrote:
>> No, I think it's going to be *much* simpler than that.  How about I
>> take a crack at this next week and then either (a) I'll see why it's a
>> bad idea and we can go from there or (b) you can review what I come up
>> with and tell me why it sucks?
>
> Ok. I think that's going in the wrong direction (duplication of
> nontrivial knowledge), but maybe I'm taking a to 'purist' approach
> here. Prove me wrong :)


I'm not sure what you'll think of this, but see attached.  I think
newer releases of Sun Studio support that GCC way of doing a barrier,
but I don't know how to test for that, so at the moment that uses the
fallback "put it in a function" approach, along with HPPA non-GCC and
Univel CC.  Those things are obscure enough that I don't care about
making them less performance.  I think AIX is actually OK as-is; if
_check_lock() is a compiler barrier but _clear_lock() is not, that
would be pretty odd.

>> > How are you suggesting we deal with the generic S_UNLOCK
>> > case without having a huge ifdef?
>> > Why do we build an abstraction layer (barrier.h) and then not use it?
>>
>> Because (1) the abstraction doesn't fit very well unless we do a lot
>> of additional work to build acquire and release barriers for every
>> platform we support and
>
> Meh. Something like the (untested):
> #if !defined(pg_release_barrier) && defined(pg_read_barrier) && 
> defined(pg_write_barrier)
> #define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier();} 
> while (0)
> #elif !defined(pg_release_barrier)
> #define pg_release_barrier() pg_memory_barrier()
> #endif
>
> before the fallback definition of pg_read/write_barrier should suffice?

That doesn't sound like a good idea.  For example, on PPC, a read
barrier is lwsync, and so is a write barrier.  That would also suck on
any architecture where either a read or write barrier gets implemented
as a full memory barrier, though I guess it might be smart enough to
optimize away most of the cost of the second of two barriers in
immediate succession.

I think if we want to use barrier.h as the foundation for this, we're
going to need to define a new set of things in there that have acquire
and release semantics, or just use full barriers for everything -
which would be wasteful on, e.g., x86.  And I don't particularly see
the point in going to a lot of work to invent those new abstractions
everywhere when we can just tweak s_lock.h in place a bit and be done
with it.  Making those files interdependent doesn't strike me as a
win.

>> (2) I don't have much confidence that we can
>> depend on the spinlock fallback for barriers without completely
>> breaking obscure platforms, and I'd rather make a more minimal set of
>> changes.
>
> Well, it's the beginning of the cycle. And we're already depending on
> barriers for correctness (and it's not getting less), so I don't really
> see what avoiding barrier usage buys us except harder to find breakage?

If we use barriers to implement any facility other than spinlocks, I
have reasonable confidence that we won't break things more than they
already are broken today, because I think the fallback to
acquire-and-release a spinlock probably works, even though it's likely
terrible for performance.  I have significantly less confidence that
trying to implement spinlocks in terms of barriers is going to be
reliable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index efe1b43..38dc34d 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -154,6 +154,17 @@ s_lock(volatile slock_t *lock, const char *file, int line)
 	return delays;
 }
 
+#ifdef USE_DEFAULT_S_UNLOCK
+void
+s_unlock(slock_t *lock)
+{
+#ifdef TAS_ACTIVE_WORD
+	*TAS_ACTIVE_WORD(lock) = -1;
+#else
+	*lock = 0;
+#endif
+}
+#endif
 
 /*
  * Set local copy of spins_per_delay during backend startup.
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 895abe6..f1a89dc 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -55,14 +55,16 @@
  *	on Alpha TAS() will "fail" if interrupted.  Therefore a retry loop must
  *	always be used, even if you are certain the lock is free.
  *
- *	Another caution for users of these macros is that it is the caller's
- *	responsibility to ensure that the compiler doesn't re-order accesses
- *	to shared memory to precede the actual lock acquisition, or follow the
- *	lock release.  Typically we handle this by using volatile-qualified
- *	pointers to refer to both the spinlock itself and the shared data
- *	structure being accessed within the spinlocked critical section.
- *	That fixes it because compilers are not allowed to re-order accesses
- *	to volatile objects relative to other such accesses.
+ *	It is the responsibility of these macros to make sure that the compiler
+ *	does not re-order accesses to shared memory to precede the actual lock
+ *	acquisition, or follow the lock release.  Prior to PostgreSQL 9.5, this
+ *	was the caller's responsibility, which meant that callers had to use
+ *	volatile-qualified pointers to refer to both the spinlock itself and the
+ *	shared data being accessed within the spinlocked critical section.  This
+ *	was notationally awkward, easy to forget (and thus error-prone), and
+ *	prevented some useful compiler optimizations.  For these reasons, we
+ *	now require that the macros themselves prevent compiler re-ordering,
+ *	so that the caller doesn't need to take special precautions.
  *
  *	On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and
  *	S_UNLOCK() macros must further include hardware-level memory fence
@@ -478,14 +480,14 @@ tas(volatile slock_t *lock)
 #define S_UNLOCK(lock)	\
 do \
 { \
-	__asm__ __volatile__ ("	lwsync \n"); \
+	__asm__ __volatile__ ("	lwsync \n" ::: "memory"); \
 	*((volatile slock_t *) (lock)) = 0; \
 } while (0)
 #else
 #define S_UNLOCK(lock)	\
 do \
 { \
-	__asm__ __volatile__ ("	sync \n"); \
+	__asm__ __volatile__ ("	sync \n" ::: "memory"); \
 	*((volatile slock_t *) (lock)) = 0; \
 } while (0)
 #endif /* USE_PPC_LWSYNC */
@@ -593,7 +595,9 @@ do \
 		"       .set noreorder      \n" \
 		"       .set nomacro        \n" \
 		"       sync                \n" \
-		"       .set pop              "); \
+		"       .set pop              "
+:
+:		"memory");
 	*((volatile slock_t *) (lock)) = 0; \
 } while (0)
 
@@ -651,6 +655,15 @@ tas(volatile slock_t *lock)
 typedef unsigned char slock_t;
 #endif
 
+#if !defined(S_UNLOCK)
+#if defined(__INTEL_COMPILER)
+#define S_UNLOCK(lock)	\
+	do { __memory_barrier(); *(lock) = 0; } while (0)
+#else
+#define S_UNLOCK(lock)	\
+	do { __asm__ __volatile__("" : : : "memory");  *(lock) = 0; } while (0)
+#endif
+#endif
 
 #endif	/* defined(__GNUC__) || defined(__INTEL_COMPILER) */
 
@@ -724,9 +737,13 @@ tas(volatile slock_t *lock)
 	return (lockval == 0);
 }
 
-#endif /* __GNUC__ */
+#define S_UNLOCK(lock)	\
+	do { \
+		__asm__ __volatile__("" : : : "memory"); \
+		*TAS_ACTIVE_WORD(lock) = -1; \
+	} while (0)
 
-#define S_UNLOCK(lock)	(*TAS_ACTIVE_WORD(lock) = -1)
+#endif /* __GNUC__ */
 
 #define S_INIT_LOCK(lock) \
 	do { \
@@ -764,6 +781,8 @@ typedef unsigned int slock_t;
 #define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
 /* On IA64, it's a win to use a non-locking test before the xchg proper */
 #define TAS_SPIN(lock)	(*(lock) ? 1 : TAS(lock))
+#define S_UNLOCK(lock)	\
+	do { _Asm_sched_fence(); (*(lock)) = 0); } while (0)
 
 #endif	/* HPUX on IA64, non gcc */
 
@@ -826,6 +845,9 @@ spin_delay(void)
 }
 #endif
 
+#define S_UNLOCK(lock)	\
+	do { MemoryBarrier(); (*(lock)) = 0); } while (0)
+
 #endif
 
 
@@ -876,7 +898,21 @@ extern int	tas_sema(volatile slock_t *lock);
 #endif	 /* S_LOCK_FREE */
 
 #if !defined(S_UNLOCK)
-#define S_UNLOCK(lock)		(*((volatile slock_t *) (lock)) = 0)
+/*
+ * On most platforms, S_UNLOCK is essentially *(lock) = 0, but we can't just
+ * put that it into an inline macro, because the compiler might reorder
+ * instructions from the critical section to occur after the lock release.
+ * But since the compiler probably can't know what the external function
+ * s_unlock is doing, putting the same logic there should be adequate.
+ * Wherever possible, it's best not to rely on this default implementation,
+ * both because a sufficiently-smart globally optimizing compiler might be
+ * able to see through this charade, and perhaps more importantly because
+ * adding the cost of a function call to every spinlock release may hurt
+ * performance significantly.
+ */
+#define USE_DEFAULT_S_UNLOCK
+extern void s_unlock(volatile s_lock *lock);
+#define S_UNLOCK(lock)		s_unlock(lock)
 #endif	 /* S_UNLOCK */
 
 #if !defined(S_INIT_LOCK)

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Spinlocks and compiler/memory barriers

Reply via email to