On Mon, Oct 27, 2014 at 9:32 AM, Andres Freund <and...@2ndquadrant.com> wrote: > I've previously posted a patch at > http://archives.postgresql.org/message-id/20141010160020.GG6670%40alap3.anarazel.de > that reduces contention in StrategyGetBuffer() by making the clock sweep > lockless. Robert asked me to post it to a new thread; I originally > wrote it to see some other contention in more detail, that's why it > ended up in that thread...
Does this LATCHPTR_ACCESS_ONCE crap really do anything? Why do we need that? I'm not convinced that it's actually solving any problem we would otherwise have with this code. But if it is, then how about adding a flag that is 4 bytes wide or less alongside bgwriterLatch, and just checking the flag instead of checking bgwriterLatch itself? Actually, the same approach would work for INT_ACCESS_ONCE. So I propose this approach instead: Add a new atomic flag to StrategyControl, useSlowPath. If StrategyGetBuffer() finds useSlowPath set, call StrategyGetBufferSlow(), which will acquire the spinlock, check whether bgwriterLatch is set and/or the freelist is non-empty and return a buffer ID if able to allocate from the freelist; otherwise -1. If StrategyGetBufferSlow() returns -1, or we decide not to call it in the first place because useSlowPath isn't set, then fall through to your clock-sweep logic. We can set useSlowPath at stratup and whenever we put buffers on the free list. The lack of memory barriers while testing useSlowPath (or, in your version, when testing the ACCESS_ONCE conditions) means that we could fail to set the bgwriter latch or pop from the freelist if another backend has very recently updated those things. But that's not actually a problem; it's fine for any individual request to skip those things, as they are more like hints than correctness requirements. The interaction between this and the bgreclaimer idea is interesting. We can't making popping the freelist lockless without somehow dealing with the resulting A-B-A problem (namely, that between the time we read &head->next and the time we CAS the list-head to that value, the head might have been popped, another item pushed, and the original list head pushed again). So even if bgreclaimer saves some work for individual backends - avoiding the need for them to clock-sweep across many buffers - it may not be worth it if it means taking a spinlock to pop the freelist instead of using an atomics-driven clock sweep. Considering that there may be a million plus buffers to scan through, that's a surprising conclusion, but it seems to be where the data is pointing us. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers