On Mon, Jan 2, 2012 at 5:41 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Simon Riggs <si...@2ndquadrant.com> writes: >> Does anyone have a better idea for reducing BufFreelistLock >> contention? Something simple that will work for 9.2? > > Get rid of the freelist? Once shared buffers are full, it's just about > useless anyway. But you'd need to think about the test cases that you > pay attention to, as there might be scenarios where it remains useful.
Agree freelist is mostly useless, but startup and dropping objects requires it. When its full its just an integer > 0 test, which is cheap and its on the same cacheline as the nextVictimBuffer anyway, so we have to fetch it. The clock sweep is where all the time goes, in its current form. We have 2 problems 1. StrategySyncStart() requests exclusive lock on BufFreelistLock, so bgwriter has to fight with backends to find out where it should clean. As a result it spends lots of time waiting and insufficient time cleaning. 2. When a backend can't find a free buffer, it spins for a long time while holding the lock. This makes the buffer strategy O(N) in its worst case, which slows everything down. Notably, while this is happening the bgwriter sits doing nothing at all, right at the moment when it is most needed. The Clock algorithm is an approximation of an LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid worst case behaviour makes sense. How much to tweak? Well,... (1) is fixed by buffreelistlock_reduction.v1.patch (2) is fixed by freelist_ok.v2.patch -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 91cc001..86af943 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1242,7 +1242,7 @@ BufferSync(int flags) * Note that we don't read the buffer alloc count here --- that should be * left untouched till the next BgBufferSync() call. */ - buf_id = StrategySyncStart(NULL, NULL); + buf_id = StrategySyncStart(NULL, NULL, NULL); num_to_scan = NBuffers; num_written = 0; while (num_to_scan-- > 0) @@ -1315,6 +1315,7 @@ BgBufferSync(void) int strategy_buf_id; uint32 strategy_passes; uint32 recent_alloc; + bool with_lock; /* * Information saved between calls so we can determine the strategy @@ -1352,10 +1353,11 @@ BgBufferSync(void) * Find out where the freelist clock sweep currently is, and how many * buffer allocations have happened since our last call. */ - strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc); + strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc, &with_lock); /* Report buffer alloc counts to pgstat */ - BgWriterStats.m_buf_alloc += recent_alloc; + if (with_lock) + BgWriterStats.m_buf_alloc += recent_alloc; /* * If we're not running the LRU scan, just stop after doing the stats diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 3e62448..27d4ae8 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -252,20 +252,57 @@ StrategyFreeBuffer(volatile BufferDesc *buf) * being read. */ int -StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc) +StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc, bool *withlock) { int result; - LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); - result = StrategyControl->nextVictimBuffer; - if (complete_passes) - *complete_passes = StrategyControl->completePasses; - if (num_buf_alloc) + if (withlock) { - *num_buf_alloc = StrategyControl->numBufferAllocs; - StrategyControl->numBufferAllocs = 0; + bool gotlock = false; + + /* + * Try to get the lock without waiting. + */ + if (LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE)) + gotlock = true; + + /* + * If lock isn't available without waiting then mostly we don't + * care and just read the nextVictimBuffer without a lock. + * + * Bgwriter stats reporting relies on us reading accurate values + * and being able to reset the counter, so about 1% of the time + * we want to wait for the lock so we stats don't get starved out. + */ + if (!gotlock && random() <= (MAX_RANDOM_VALUE / 100)) + { + LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); + gotlock = true; + } + + if (gotlock) + { + *withlock = true; + result = StrategyControl->nextVictimBuffer; + if (complete_passes) + *complete_passes = StrategyControl->completePasses; + if (num_buf_alloc) + { + *num_buf_alloc = StrategyControl->numBufferAllocs; + StrategyControl->numBufferAllocs = 0; + } + LWLockRelease(BufFreelistLock); + + return result; + } } - LWLockRelease(BufFreelistLock); + + /* + * We didn't get the lock, but read the value anyway on the assumption + * that reading this value is atomic. + */ + result = StrategyControl->nextVictimBuffer; + return result; } diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h index e43719e..95a5a2f 100644 --- a/src/include/storage/buf_internals.h +++ b/src/include/storage/buf_internals.h @@ -187,7 +187,8 @@ extern void StrategyFreeBuffer(volatile BufferDesc *buf); extern bool StrategyRejectBuffer(BufferAccessStrategy strategy, volatile BufferDesc *buf); -extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc); +extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc, + bool *withlock); extern Size StrategyShmemSize(void); extern void StrategyInitialize(bool init);
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index bf9903b..b2935af 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -108,6 +108,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) { volatile BufferDesc *buf; int trycounter; + int freelist_limit; /* How fast we get less choosy if no victims */ + int usage_limit; /* How choosy we are about our next victim */ /* * If given a strategy object, see whether it can select a buffer. We @@ -167,7 +169,12 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) } /* Nothing on the freelist, so run the "clock sweep" algorithm */ - trycounter = NBuffers; + trycounter = 0; + +#define FREELIST_WORST_CASE_LIMIT 16 + usage_limit = 0; + freelist_limit = FREELIST_WORST_CASE_LIMIT; + for (;;) { buf = &BufferDescriptors[StrategyControl->nextVictimBuffer]; @@ -179,16 +186,28 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) } /* - * If the buffer is pinned or has a nonzero usage_count, we cannot use - * it; decrement the usage_count (unless pinned) and keep scanning. + * If the buffer is pinned we cannot use it. + * + * We first look for a buffer with usage_count == 0, but limiting out + * search to the first FREELIST_WORST_CASE_LIMIT buffers. We slowly + * get less choosy if the buffers we see are all popular. + * + * This is designed to give O(k) worst case behaviour in cases + * where there are many popular buffers and shared buffers is large, + * which is common since those two siituations are related. */ LockBufHdr(buf); if (buf->refcount == 0) { - if (buf->usage_count > 0) + if (buf->usage_count > usage_limit) { buf->usage_count--; - trycounter = NBuffers; + if (trycounter++ > freelist_limit) + { + trycounter = 0; + usage_limit++; + freelist_limit *= 2; + } } else { @@ -198,7 +217,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held) return buf; } } - else if (--trycounter == 0) + else if (trycounter++ >= NBuffers) { /* * We've scanned all the buffers without making any state changes,
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers