On Tue, Sep 23, 2014 at 6:54 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> Am I understanding you correctly that you also measured context switches
> for spinlocks? If so, I don't think that's a valid comparison. LWLocks
> explicitly yield the CPU as soon as there's any contention while
> spinlocks will, well, spin. Sure they also go to sleep, but it'll take
> longer.

No, I measured CPU consumption attributable to s_lock.  (I checked
context switches, too.)

> It's also worthwhile to remember in such comparisons that lots of the
> cost of spinlocks will be in the calling routines, not s_lock() - the
> first TAS() is inlined into it. And that's the one that'll incur cache
> misses and such...

True.  I can check that - I did not.

> Note that I'm explicitly *not* doubting the use of a spinlock
> itself. Given the short acquisition times and the exclusive use of
> exclusive acquisition a spinlock makes more sense. The lwlock's spinlock
> alone will have about as much contention.

Right.

> I think it might be possible to construct some cases where the spinlock
> performs worse than the lwlock. But I think those will be clearly in the
> minority. And at least some of those will be fixed by bgwriter. As an
> example consider a single backend COPY ... FROM of large files with a
> relatively large s_b. That's a seriously bad workload for the current
> victim buffer selection because frequently most of the buffer pool needs
> to be searched for a buffer with usagecount 0. And this patch will double
> the amount of atomic ops during that.

It will actually be far worse than that, because we'll acquire and
release the spinlock for every buffer over which we advance the clock
sweep, instead of just once for the whole thing.  That's reason to hope
that a smart bgreclaimer process may be a good improvement on top of
this.  It can do things like advance the clock sweep hand 16 buffers at
a time and then sweep them all after-the-fact, reducing contention on
the mutex by an order of magnitude, if that turns out to be an important
consideration.
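To make the batching idea a bit more concrete, here's a rough sketch of
the loop I have in mind.  This is not code from the patch: the names
(ClockSweepState, sweep_batch) and the test-and-set wrappers are made up
for illustration, and it ignores things the real code has to care about
(buffer header locks, pinned buffers, the bgwriter's bookkeeping).  The
only point it shows is that the spinlock protecting the hand is taken
once per batch of 16 buffers instead of once per buffer:

#include <stdint.h>

#define SWEEP_BATCH 16

typedef struct ClockSweepState
{
    volatile int spinlock;      /* stand-in for an slock_t */
    uint32_t victim_hand;       /* next buffer the clock hand will visit */
    uint32_t nbuffers;          /* size of the buffer pool */
    uint8_t *usage_count;       /* one counter per buffer */
} ClockSweepState;

/* Crude test-and-set wrappers; the real code uses SpinLockAcquire(). */
static void
spin_acquire(volatile int *lock)
{
    while (__sync_lock_test_and_set(lock, 1))
        ;                       /* spin until the lock was seen free */
}

static void
spin_release(volatile int *lock)
{
    __sync_lock_release(lock);
}

/*
 * Advance the clock hand SWEEP_BATCH buffers in one critical section,
 * then examine the claimed range with the spinlock already released.
 * Returns a buffer whose usage count has reached zero, or -1 if this
 * batch produced no eviction candidate and the caller should sweep
 * another batch.
 */
static int
sweep_batch(ClockSweepState *cs)
{
    uint32_t start;
    uint32_t i;

    /* One lock acquisition claims a whole range of buffers... */
    spin_acquire(&cs->spinlock);
    start = cs->victim_hand;
    cs->victim_hand = (cs->victim_hand + SWEEP_BATCH) % cs->nbuffers;
    spin_release(&cs->spinlock);

    /* ...and the per-buffer work happens without holding it. */
    for (i = 0; i < SWEEP_BATCH; i++)
    {
        uint32_t buf = (start + i) % cs->nbuffers;

        if (cs->usage_count[buf] == 0)
            return (int) buf;   /* eviction candidate */
        cs->usage_count[buf]--; /* otherwise age it, clock-style */
    }
    return -1;
}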
But I think it's right to view that as something we need to test against
the baseline established by this patch.  What's clear today is that
workloads which stress buffer eviction fall to pieces, because the
entire buffer-eviction process is essentially single-threaded: one
process can't begin evicting a buffer until another has finished doing
so.  This patch lets multiple backends do that at the same time.  We may
find cases where that leads to an unpleasant amount of contention, but
since we have several ideas for how to mitigate that if need be, I think
it's OK to go ahead.

The testing we're doing on the combined patch conflates the effects of
the new locking regimen with whatever effect the bgreclaimer has, and
it's very clear to me now that we need to keep those two things clearly
separate.

> Let me try to quantify that.

Please do.

>> What's interesting is that I can't see in the perf output any real
>> sign that the buffer mapping locks are slowing things down, but they
>> clearly are, because when I take this patch and also boost
>> NUM_BUFFER_PARTITIONS to 128, the performance goes up:
>
> What events did you profile?

cs.

>> I welcome further testing and comments, but my current inclination is
>> to go ahead and push the attached patch.  To my knowledge, nobody has
>> at any point objected to this aspect of what's being proposed, and it
>> looks safe to me and seems to be a clear win.  We can then deal with
>> the other issues on their own merits.
>
> I've taken a look at this, and all the stuff I saw that I disliked was
> there before this patch.  So +1 for going ahead.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers