Re: [HACKERS] Scaling shared buffer eviction

Robert Haas Thu, 11 Sep 2014 06:50:18 -0700

On Thu, Sep 11, 2014 at 9:22 AM, Andres Freund <[email protected]> wrote:
>> It's exactly the same as what bgwriter.c does.
>
> So what? There's no code in common, so I see no reason to have one
> signal handler using underscores and the next one camelcase names.


/me shrugs.

It's not always possible to have things be consistent with each other
within a file and also with what gets done in other files.  I'm not
sure we should fault patch authors for choosing a different one than
we would have chosen.  FWIW, I probably would have done it the way
Amit did it.  I don't actually care, though.

> I wonder if we should recheck the number of freelist items before
> sleeping. As the latch currently is reset before sleeping (IIRC) we
> might miss being woken up soon. It very well might be that bgreclaim
> needs to run for more than one cycle in a row to keep up...

The outer loop in BgMoveBuffersToFreelist() was added to address
precisely this point, which I raised in a previous review.

>> > I wonder if we don't want to increase the high watermark when
>> > tmp_recent_backend_clocksweep > 0?
>>
>> That doesn't really work unless there's some countervailing force to
>> eventually reduce it again; otherwise, it'd just converge to infinity.
>> And it doesn't really seem necessary at the moment.
>
> Right, it obviously needs to go both ways. I'm a bit sceptic about
> untunable, fixed, numbers proving to be accurate for widely varied
> workloads.

Me, too, but I'm *even more* skeptical about making things complicated
on the pure theory that a simple solution can't be correct.  I'm not
blind to the possibility that the current logic is inadequate, but
testing proves that it works well enough to produce a massive
performance boost over where we are now.  When, and if, we develop a
theory about specifically how it falls short then, sure, let's adjust
it.  But I think it would be a serious error to try to engineer a
perfect algorithm here based on the amount of testing that we can
reasonably do pre-commit.  We have no chance of getting that right,
and I'd rather have an algorithm that is simple and imperfect than an
algorithm that is complex and still imperfect.  No matter what, it's
better than what we have now.

>> > Hm. Perhaps we should do a bufHdr->refcount != zero check without
>> > locking here? The atomic op will transfer the cacheline exclusively to
>> > the reclaimer's CPU. Even though it very shortly afterwards will be
>> > touched afterwards by the pinning backend.
>>
>> Meh.  I'm not in favor of adding more funny games with locking unless
>> we can prove they're necessary for performance.
>
> Well, this in theory increases the number of processes touching buffer
> headers regularly. Currently, if you have one read IO intensive backend,
> there's pretty much only process touching the cachelines. This will make
> it two. I don't think it's unreasonable to try to reduce the cacheline
> pingpong caused by that...

It's not unreasonable, but this is a good place to apply Knuth's first
law of optimization.  There's no proof we need to do this, so let's
not until there is.

>> I also think it would be good to
>> get some statistics on how often regular backends are running the
>> clocksweep vs. how often bgreclaimer is satisfying their needs.
>
> I think that's necessary. The patch added buf_backend_clocksweep. Maybe
> we just also need buf_backend_from_freelist?

That's just (or should be just) buf_alloc - buf_backend_clocksweep.

I think buf_backend_clocksweep should really be called
buf_alloc_clocksweep, and should be added (in all relevant places)
right next to buf_alloc.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Scaling shared buffer eviction

Reply via email to