Thanks for reviewing, Andres. On Thu, Sep 11, 2014 at 7:01 AM, Andres Freund <and...@2ndquadrant.com> wrote: >> +static void bgreclaim_quickdie(SIGNAL_ARGS); >> +static void BgreclaimSigHupHandler(SIGNAL_ARGS); >> +static void ReqShutdownHandler(SIGNAL_ARGS); >> +static void bgreclaim_sigusr1_handler(SIGNAL_ARGS); > > This looks inconsistent.
It's exactly the same as what bgwriter.c does. > No LWLockReleaseAll(), AbortBufferIO(), ...? Unconvinced that that's a > good idea, regardless of it possibly being true today (which I'm not > sure about yet). We really need a more centralized way to handle error cleanup in auxiliary processes. The current state of affairs is really pretty helter-skelter. But for this patch, I think we should aim to mimic the existing style, as ugly as it is. I'm not sure whether Amit's got the logic correct, though: I'd agree LWLockReleaseAll(), at a minimum, is probably a good idea. >> +Two water mark indicators are used to maintain sufficient number of buffers >> +on freelist. Low water mark indicator is used by backends to wake >> bgreclaimer >> +when the number of buffers in freelist falls below it. High water mark >> +indicator is used by bgreclaimer to move buffers to freelist. > > For me the description of the high water as stated here doesn't seem to > explain anything. Yeah, let me try to revise and expand on that a bit: Background Reclaimer's Processing --------------------------------- The background reclaimer runs the clock sweep to identify buffers that are good candidates for eviction and puts them on the freelist. This makes buffer allocation much faster, since removing a buffer from the head of a linked list is much cheaper than linearly scanning the whole buffer pool until a promising candidate is found. It's possible that a buffer we add to the freelist may be accessed or even pinned before it's evicted; if that happens, the backend that would have evicted it will simply disregard it and take the next buffer instead (or run the clock sweep itself, if necessary). However, to make sure that doesn't happen too often, we need to keep the freelist as short as possible, so that there won't be many other buffer accesses between when the time a buffer is added to the freelist and the time when it's actually evicted. We use two water marks to control the activity of the bgreclaimer process. Each time bgreclaimer is awoken, it will move buffers to the freelist until the length of the free list reaches the high water mark. It will then sleep. When the number of buffers on the freelist reaches the low water mark, backends attempting to allocate new buffers will set the bgreclaimer's latch, waking it up again. While it's important for the high water mark to be small (for the reasons described above), we also need to ensure adequate separation between the low and high water marks, so that the bgreclaimer isn't constantly being awoken to find just a handful of additional candidate buffers, and we need to ensure that the low watermark is adequate to keep the freelist from becoming completely empty before bgreclaimer has time to wake up and beginning filling it again. > This section should have a description of how the reclaimer interacts > with the bgwriter logic. Do we put dirty buffers on the freelist that > are then cleaned by the bgwriter? Which buffers does the bgwriter write > out? The bgwriter is cleaning ahead of the strategy point, and the bgreclaimer is advancing the strategy point. I think we should consider having the bgreclaimer wake the bgwriter if it comes across a dirty buffer, because while the bgwriter only estimates the rate of buffer allocation, bgreclaimer *knows* the rate of allocation, because its own activity is tied to the allocation rate. I think there's the potential for this kind of thing to make the background writer significantly more effective than it is today, but I'm heavily in favor of leaving it for a separate patch. > I wonder if we don't want to increase the high watermark when > tmp_recent_backend_clocksweep > 0? That doesn't really work unless there's some countervailing force to eventually reduce it again; otherwise, it'd just converge to infinity. And it doesn't really seem necessary at the moment. > Hm. Perhaps we should do a bufHdr->refcount != zero check without > locking here? The atomic op will transfer the cacheline exclusively to > the reclaimer's CPU. Even though it very shortly afterwards will be > touched afterwards by the pinning backend. Meh. I'm not in favor of adding more funny games with locking unless we can prove they're necessary for performance. > * Are we sure that the freelist_lck spinlock won't cause pain? Right now > there will possibly be dozens of processes busily spinning on it... I > think it's a acceptable risk, but we should think about it. As you and I have talked about before, we could reduce contention here by partitioning the freelist, or by using a CAS loop to pop items off of it. But I am not convinced either is necessary; I think it's hard for the system to accumulate enough people hitting the freelist simultaneously to matter, because the stuff they've got to do between one freelist access and the next is generally going to be something much more expensive, like reading 8kB from the OS. One question in my mind is whether we ought to separate this patch into two - one for the changes to the locking regime, and another for the addition of the bgreclaimer process. Those things are really two different features, although they are tightly enough coupled that maybe it's OK to keep them together. I also think it would be good to get some statistics on how often regular backends are running the clocksweep vs. how often bgreclaimer is satisfying their needs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers