On Sat, Jun 28, 2014 at 9:41 AM, Andres Freund <and...@2ndquadrant.com> wrote: >> No, I think it's going to be *much* simpler than that. How about I >> take a crack at this next week and then either (a) I'll see why it's a >> bad idea and we can go from there or (b) you can review what I come up >> with and tell me why it sucks? > > Ok. I think that's going in the wrong direction (duplication of > nontrivial knowledge), but maybe I'm taking a to 'purist' approach > here. Prove me wrong :)
I'm not sure what you'll think of this, but see attached. I think newer releases of Sun Studio support that GCC way of doing a barrier, but I don't know how to test for that, so at the moment that uses the fallback "put it in a function" approach, along with HPPA non-GCC and Univel CC. Those things are obscure enough that I don't care about making them less performance. I think AIX is actually OK as-is; if _check_lock() is a compiler barrier but _clear_lock() is not, that would be pretty odd. >> > How are you suggesting we deal with the generic S_UNLOCK >> > case without having a huge ifdef? >> > Why do we build an abstraction layer (barrier.h) and then not use it? >> >> Because (1) the abstraction doesn't fit very well unless we do a lot >> of additional work to build acquire and release barriers for every >> platform we support and > > Meh. Something like the (untested): > #if !defined(pg_release_barrier) && defined(pg_read_barrier) && > defined(pg_write_barrier) > #define pg_release_barrier() do { pg_read_barrier(); pg_write_barrier();} > while (0) > #elif !defined(pg_release_barrier) > #define pg_release_barrier() pg_memory_barrier() > #endif > > before the fallback definition of pg_read/write_barrier should suffice? That doesn't sound like a good idea. For example, on PPC, a read barrier is lwsync, and so is a write barrier. That would also suck on any architecture where either a read or write barrier gets implemented as a full memory barrier, though I guess it might be smart enough to optimize away most of the cost of the second of two barriers in immediate succession. I think if we want to use barrier.h as the foundation for this, we're going to need to define a new set of things in there that have acquire and release semantics, or just use full barriers for everything - which would be wasteful on, e.g., x86. And I don't particularly see the point in going to a lot of work to invent those new abstractions everywhere when we can just tweak s_lock.h in place a bit and be done with it. Making those files interdependent doesn't strike me as a win. >> (2) I don't have much confidence that we can >> depend on the spinlock fallback for barriers without completely >> breaking obscure platforms, and I'd rather make a more minimal set of >> changes. > > Well, it's the beginning of the cycle. And we're already depending on > barriers for correctness (and it's not getting less), so I don't really > see what avoiding barrier usage buys us except harder to find breakage? If we use barriers to implement any facility other than spinlocks, I have reasonable confidence that we won't break things more than they already are broken today, because I think the fallback to acquire-and-release a spinlock probably works, even though it's likely terrible for performance. I have significantly less confidence that trying to implement spinlocks in terms of barriers is going to be reliable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c index efe1b43..38dc34d 100644 --- a/src/backend/storage/lmgr/s_lock.c +++ b/src/backend/storage/lmgr/s_lock.c @@ -154,6 +154,17 @@ s_lock(volatile slock_t *lock, const char *file, int line) return delays; } +#ifdef USE_DEFAULT_S_UNLOCK +void +s_unlock(slock_t *lock) +{ +#ifdef TAS_ACTIVE_WORD + *TAS_ACTIVE_WORD(lock) = -1; +#else + *lock = 0; +#endif +} +#endif /* * Set local copy of spins_per_delay during backend startup. diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index 895abe6..f1a89dc 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -55,14 +55,16 @@ * on Alpha TAS() will "fail" if interrupted. Therefore a retry loop must * always be used, even if you are certain the lock is free. * - * Another caution for users of these macros is that it is the caller's - * responsibility to ensure that the compiler doesn't re-order accesses - * to shared memory to precede the actual lock acquisition, or follow the - * lock release. Typically we handle this by using volatile-qualified - * pointers to refer to both the spinlock itself and the shared data - * structure being accessed within the spinlocked critical section. - * That fixes it because compilers are not allowed to re-order accesses - * to volatile objects relative to other such accesses. + * It is the responsibility of these macros to make sure that the compiler + * does not re-order accesses to shared memory to precede the actual lock + * acquisition, or follow the lock release. Prior to PostgreSQL 9.5, this + * was the caller's responsibility, which meant that callers had to use + * volatile-qualified pointers to refer to both the spinlock itself and the + * shared data being accessed within the spinlocked critical section. This + * was notationally awkward, easy to forget (and thus error-prone), and + * prevented some useful compiler optimizations. For these reasons, we + * now require that the macros themselves prevent compiler re-ordering, + * so that the caller doesn't need to take special precautions. * * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and * S_UNLOCK() macros must further include hardware-level memory fence @@ -478,14 +480,14 @@ tas(volatile slock_t *lock) #define S_UNLOCK(lock) \ do \ { \ - __asm__ __volatile__ (" lwsync \n"); \ + __asm__ __volatile__ (" lwsync \n" ::: "memory"); \ *((volatile slock_t *) (lock)) = 0; \ } while (0) #else #define S_UNLOCK(lock) \ do \ { \ - __asm__ __volatile__ (" sync \n"); \ + __asm__ __volatile__ (" sync \n" ::: "memory"); \ *((volatile slock_t *) (lock)) = 0; \ } while (0) #endif /* USE_PPC_LWSYNC */ @@ -593,7 +595,9 @@ do \ " .set noreorder \n" \ " .set nomacro \n" \ " sync \n" \ - " .set pop "); \ + " .set pop " +: +: "memory"); *((volatile slock_t *) (lock)) = 0; \ } while (0) @@ -651,6 +655,15 @@ tas(volatile slock_t *lock) typedef unsigned char slock_t; #endif +#if !defined(S_UNLOCK) +#if defined(__INTEL_COMPILER) +#define S_UNLOCK(lock) \ + do { __memory_barrier(); *(lock) = 0; } while (0) +#else +#define S_UNLOCK(lock) \ + do { __asm__ __volatile__("" : : : "memory"); *(lock) = 0; } while (0) +#endif +#endif #endif /* defined(__GNUC__) || defined(__INTEL_COMPILER) */ @@ -724,9 +737,13 @@ tas(volatile slock_t *lock) return (lockval == 0); } -#endif /* __GNUC__ */ +#define S_UNLOCK(lock) \ + do { \ + __asm__ __volatile__("" : : : "memory"); \ + *TAS_ACTIVE_WORD(lock) = -1; \ + } while (0) -#define S_UNLOCK(lock) (*TAS_ACTIVE_WORD(lock) = -1) +#endif /* __GNUC__ */ #define S_INIT_LOCK(lock) \ do { \ @@ -764,6 +781,8 @@ typedef unsigned int slock_t; #define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE) /* On IA64, it's a win to use a non-locking test before the xchg proper */ #define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock)) +#define S_UNLOCK(lock) \ + do { _Asm_sched_fence(); (*(lock)) = 0); } while (0) #endif /* HPUX on IA64, non gcc */ @@ -826,6 +845,9 @@ spin_delay(void) } #endif +#define S_UNLOCK(lock) \ + do { MemoryBarrier(); (*(lock)) = 0); } while (0) + #endif @@ -876,7 +898,21 @@ extern int tas_sema(volatile slock_t *lock); #endif /* S_LOCK_FREE */ #if !defined(S_UNLOCK) -#define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0) +/* + * On most platforms, S_UNLOCK is essentially *(lock) = 0, but we can't just + * put that it into an inline macro, because the compiler might reorder + * instructions from the critical section to occur after the lock release. + * But since the compiler probably can't know what the external function + * s_unlock is doing, putting the same logic there should be adequate. + * Wherever possible, it's best not to rely on this default implementation, + * both because a sufficiently-smart globally optimizing compiler might be + * able to see through this charade, and perhaps more importantly because + * adding the cost of a function call to every spinlock release may hurt + * performance significantly. + */ +#define USE_DEFAULT_S_UNLOCK +extern void s_unlock(volatile s_lock *lock); +#define S_UNLOCK(lock) s_unlock(lock) #endif /* S_UNLOCK */ #if !defined(S_INIT_LOCK)
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers