Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

Greg Smith Tue, 17 Jul 2012 21:01:18 -0700

On 07/17/2012 06:56 PM, Tom Lane wrote:

So I went to fix this in the obvious way (attached), but while testing
it I found that the number of buffers_backend events reported during
a regression test run barely changed; which surprised the heck out of
me, so I dug deeper.  The cause turns out to be extremely scary:
ForwardFsyncRequest isn't getting called at all in the bgwriter process,
because the bgwriter process has a pendingOpsTable.
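
As a rough illustration of the failure mode described above, here is a toy model, not the actual md.c code. The class, function, and segment names are all hypothetical stand-ins. It assumes the routing rule is "a process with its own pending-ops table keeps the fsync request locally; any other process forwards it to the checkpointer":

```python
class Process:
    """Toy stand-in for a server process; names are illustrative only."""

    def __init__(self, name, has_pending_ops_table):
        self.name = name
        # A local table of fsync requests. It only helps if this same
        # process is the one that later performs the fsyncs.
        self.pending_ops = set() if has_pending_ops_table else None


def register_dirty_segment(proc, segment, checkpointer_queue):
    """Note that `segment` was written and still needs an fsync."""
    if proc.pending_ops is not None:
        # Request is remembered locally and never leaves this process.
        proc.pending_ops.add(segment)
        return "local"
    # No local table: hand the request to the checkpointer (the
    # analogue of calling ForwardFsyncRequest).
    checkpointer_queue.append(segment)
    return "forwarded"


checkpointer_queue = []
bgwriter = Process("bgwriter", has_pending_ops_table=True)
backend = Process("backend", has_pending_ops_table=False)

register_dirty_segment(bgwriter, "segment_a", checkpointer_queue)
register_dirty_segment(backend, "segment_b", checkpointer_queue)

# Only the backend's request reaches the checkpointer; the bgwriter's
# request sits in a table that nothing ever processes after the split.
print(checkpointer_queue)
print(bgwriter.pending_ops)
```

In this model the checkpointer never learns about segment_a, which is the shape of the problem: writes done by the bgwriter are not fsynced at checkpoint time.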

When I did my testing early this year to look at checkpointer performance (among other 9.2 write changes like group commit), I did see some cases where buffers_backend was dramatically different on 9.2 vs. 9.1. There were plenty of cases where the totals across a 10 minute pgbench were almost identical though, so this issue didn't stick out then. That's a very different workload than the regression tests though.

This implies that nobody has done pull-the-plug testing on either HEAD
or 9.2 since the checkpointer split went in (2011-11-01), because even
a modicum of such testing would surely have shown that we're failing to
fsync a significant fraction of our write traffic.

Ugh. Most of my pull-the-plug testing the last six months has been focused on SSD tests with older versions. I want to duplicate this (and any potential fix) now that you've highlighted it.

Furthermore, I would say that any performance testing done since then,
if it wasn't looking at purely read-only scenarios, isn't worth the
electrons it's written on.  In particular, any performance gain that
anybody might have attributed to the checkpointer splitup is very
probably hogwash.

There hasn't been any performance testing that suggested the checkpointer splitup was justified. The stuff I did showed it being flat out negative for a subset of pgbench oriented cases, which didn't seem real-world enough to disprove it as the right thing to do though.

I thought there were two valid justifications for the checkpointer split (which is not a feature I have any corporate attachment to--I'm as isolated from how it was developed as you are). The first is that it seems like the right architecture to allow reworking checkpoints and background writes for future write path optimization. A good chunk of the time when I've tried to improve one of those (like my spread sync stuff from last year), the code was complicated by the background writer needing to follow the drum of checkpoint timing, and vice-versa. Being able to hack on those independently got a sigh of relief from me. And while this adds some code duplication in things like the process setup, I thought the result would be cleaner for people reading the code to follow too. This problem is terrible, but I think part of how it crept in is that the single checkpoint+background writer process was doing way too many things to even follow all of them some days.

The second justification for the split was that it seems easier to get a low power result from, which I believe was the angle Peter Geoghegan was working when this popped up originally. The checkpointer has to run sometimes, but only at a 50% duty cycle as it's tuned out of the box. It seems nice to be able to approach that in a way that's power efficient without coupling it to whatever heartbeat the BGW is running at. I could even see people changing the frequencies for each independently depending on expected system load. Tune for lower power when you don't expect many users, that sort of thing.
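
To put a number on the duty cycle remark, here is a back-of-the-envelope sketch. It assumes the 50% figure refers to the stock checkpoint_completion_target of 0.5 combined with the default checkpoint_timeout of 5 minutes; the message does not name the settings, so that reading is an assumption, and the helper function is hypothetical:

```python
def checkpoint_write_window(checkpoint_timeout_s, completion_target):
    """Split one checkpoint interval into (active, idle) seconds.

    Assumes checkpoint writes are spread over completion_target of the
    interval and the checkpointer is otherwise idle, which ignores
    WAL-triggered checkpoints and sync time.
    """
    if not 0.0 < completion_target <= 1.0:
        raise ValueError("completion_target must be in (0, 1]")
    active = checkpoint_timeout_s * completion_target
    idle = checkpoint_timeout_s - active
    return active, idle


# Assumed defaults: 300 s timeout, 0.5 completion target.
active, idle = checkpoint_write_window(300, 0.5)
print(active, idle)
```

Under those assumptions the checkpointer is busy for about half of each interval and could sleep for the other half, independent of how often the background writer wakes up.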


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

