Following some advice from Intel (http://www.intel.com/cd/ids/developer/asmo-na/eng/technologies/threading/20469.htm?page=2), I'm looking at whether the LWLock data structures may be within the same cache line.

Intel uses 128-byte cache lines on its high-end processors. slru.c uses BUFFERALIGN, which is currently hardcoded in pg_config_manual.h as

    #define ALIGNOF_BUFFER 32

which seems to be the wrong setting for the Intel CPUs, and possibly others.

In slru.c we have this code fragment:

    /* Release shared lock, grab per-buffer lock instead */
    LWLockRelease(shared->ControlLock);
    LWLockAcquire(shared->buffer_locks[slotno], LW_EXCLUSIVE);

The purpose of this is to reduce contention by holding finer-grained locks. ISTM what this does is drop one lock, then take another lock by accessing an array (buffer_locks) which will be in the same cache line for all locks, then access the LWLock data structure, which again will all be within the same cache line. ISTM that we have fine-grained LWLocks, but not fine-grained cache lines. That means that all Clog and Subtrans locks would be affected, since we have 8 of each.

For other global LWLocks the same thing applies, so BufMgrLock and many other locks are effectively all the same from the cache's perspective.

...and BTW, what is MMCacheLock? Is that an attempt at padding already?

It looks like padding out the LWLock struct would ensure that each of those was in a separate cache line?

Any views?

Best Regards, Simon Riggs
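
To make the padding idea concrete, here is a minimal sketch, not the actual lwlock.c code. DemoLWLock, PaddedLWLock, CACHE_LINE_SIZE and locks_per_line() are all hypothetical names invented for illustration, and the 128-byte figure is just the value quoted above. It also assumes the array of locks starts on a cache-line boundary; without that, padding alone does not guarantee one lock per line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 128-byte cache lines, the figure quoted for high-end Intel CPUs. */
#define CACHE_LINE_SIZE 128

/*
 * Hypothetical stand-in for the real LWLock struct. The fields are
 * illustrative only and do not mirror the actual definition.
 */
typedef struct DemoLWLock
{
    volatile int mutex;     /* protects the fields below */
    int          exclusive; /* number of exclusive holders (0 or 1) */
    int          shared;    /* number of shared holders */
} DemoLWLock;

/*
 * Pad each lock out to a full cache line. In an array of these that
 * starts on a cache-line boundary, no two locks share a line.
 */
typedef union PaddedLWLock
{
    DemoLWLock lock;
    char       pad[CACHE_LINE_SIZE];
} PaddedLWLock;

/*
 * How many array elements of the given stride fit in one cache line,
 * assuming the array base is cache-line aligned.
 */
static size_t
locks_per_line(size_t stride)
{
    assert(stride > 0);
    return (stride >= CACHE_LINE_SIZE) ? 1 : CACHE_LINE_SIZE / stride;
}
```

With the unpadded struct, locks_per_line(sizeof(DemoLWLock)) is greater than 1, so neighbouring locks contend for the same line; with the padded union it is exactly 1. The cost is memory: each lock now occupies a full cache line regardless of how small the struct is.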