Greg Smith wrote:
True, you'd have to replay 1.5 checkpoint intervals on average instead
of 0.5 (more or less, assuming checkpoints had been short). I don't
think we're in the business of optimizing crash recovery time though.
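To see where the 1.5 vs. 0.5 figures come from, here is a rough model. It assumes a new checkpoint starts every interval T, each takes a fraction f of T to complete, a crash is equally likely at any moment, and replay starts at the redo point of the latest *completed* checkpoint (the point where that checkpoint began). Averaging over the crash time gives an expected replay of (0.5 + f) * T. That is 0.5 intervals for instantaneous checkpoints and 1.5 when the checkpoint is spread over the whole interval. The function names are illustrative only.

```python
def expected_replay(f: float, interval: float = 1.0) -> float:
    """Closed form of the model: (0.5 + f) * interval, for 0 <= f <= 1."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    return (0.5 + f) * interval


def averaged_replay(f: float, interval: float = 1.0, samples: int = 100_000) -> float:
    """Numerically average replay length over crash times within one interval."""
    total = 0.0
    for i in range(samples):
        # u = time elapsed since the most recent checkpoint *started*
        u = (i + 0.5) / samples * interval
        if u >= f * interval:
            # That checkpoint has finished: replay from where it started.
            total += u
        else:
            # Still in progress: fall back to the previous checkpoint's start.
            total += u + interval
    return total / samples


if __name__ == "__main__":
    for f in (0.0, 0.5, 0.9, 1.0):
        print(f, expected_replay(f), round(averaged_replay(f), 4))
```

Under this model, capping the spread at some f below 1, as Greg suggests below, shrinks the extra replay linearly, from a full interval at f = 1 down to nothing at f = 0.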
If you're not, I think you should be. Keeping that replay interval time
down was one of the reasons why the people I was working with were
displeased with the implications of the very spread out style of some
LDC tunings. They were already unhappy with the implied recovery time
of how high they had to set checkpoint_segments for good performance,
and making it that much bigger aggravates the issue. Given a knob where
the LDC can be spread out a bit but not across the entire interval, that
makes it easier to control how much expansion there is relative to the
I agree on that one: we *should* optimize crash recovery time. It may
not be the most important thing on earth, but it's a significant
consideration for some systems.
However, I think shortening the checkpoint interval is a perfectly valid
solution to that. It does lead to more full page writes, but in 8.3 more
full page writes can actually make the recovery go faster, not slower,
because we no longer read in the previous contents of the page when
we restore it from a full page image. In any case, while people
sometimes complain that we have a large WAL footprint, it's not usually
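A toy model of why full-page images can speed up redo: with a full-page image the page is simply overwritten, whereas an incremental record first needs the old page read from disk so its LSN can be checked and the change applied. This is a simplified sketch with hypothetical names (WalRecord, Storage, redo), not PostgreSQL's actual redo code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalRecord:
    lsn: int
    block: int
    full_page_image: Optional[bytes] = None  # set when a full page image was logged
    delta: bytes = b""                       # hypothetical incremental change


class Storage:
    """Hypothetical page store that counts how many reads redo performs."""

    def __init__(self, pages: dict[int, tuple[int, bytes]]):
        self.pages = pages  # block number -> (page LSN, contents)
        self.reads = 0

    def read(self, block: int) -> tuple[int, bytes]:
        self.reads += 1
        return self.pages.get(block, (0, b""))

    def write(self, block: int, lsn: int, data: bytes) -> None:
        self.pages[block] = (lsn, data)


def redo(storage: Storage, rec: WalRecord) -> None:
    if rec.full_page_image is not None:
        # Whole page is in the WAL: no need to read the old contents at all.
        storage.write(rec.block, rec.lsn, rec.full_page_image)
        return
    # Incremental record: must read the page (a random I/O) before applying.
    page_lsn, data = storage.read(rec.block)
    if page_lsn >= rec.lsn:
        return  # change already present on the page; skip it
    storage.write(rec.block, rec.lsn, data + rec.delta)
```

In this model, every record carrying a full page image saves one read, which is why more full page writes need not mean slower recovery.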
This is off-topic, but at PGCon in May, Itagaki-san and his colleagues,
whose names I can't remember, pointed out to me very clearly that our
recovery is *slow*. So slow, that in the benchmarks they were running,
their warm stand-by slave couldn't keep up with the master generating
the WAL, even though both were running on the same kind of hardware.
The reason is simple: There can be tens of backends doing I/O and
generating WAL, but in recovery we serialize them. If you have decent
I/O hardware that can handle, for example, 10 concurrent random I/Os, in
recovery we'll be issuing them one at a time. That's a scalability
issue, and doesn't show up on a laptop or a small server with a single disk.
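The serialization effect can be illustrated with a small simulation. It assumes a device that services up to 10 random reads at once with a fixed 20 ms latency (made-up numbers; the helper names are hypothetical). Issuing reads one at a time, as recovery does, leaves nine of those ten slots idle, so the serial run should take roughly ten times longer than the concurrent one.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical device model: up to DEVICE_QUEUE_DEPTH random reads in flight,
# each taking IO_LATENCY_S seconds. Numbers are illustrative only.
DEVICE_QUEUE_DEPTH = 10
IO_LATENCY_S = 0.02

_device_slots = threading.Semaphore(DEVICE_QUEUE_DEPTH)


def random_read(block_no: int) -> int:
    """Simulate one random read; at most DEVICE_QUEUE_DEPTH run at once."""
    with _device_slots:
        time.sleep(IO_LATENCY_S)
    return block_no


def replay_serial(blocks: list[int]) -> float:
    """Recovery-style: one read at a time, so device parallelism goes unused."""
    start = time.perf_counter()
    for b in blocks:
        random_read(b)
    return time.perf_counter() - start


def run_concurrent(blocks: list[int], workers: int) -> float:
    """Master-style: many backends issue their reads independently."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(random_read, blocks))
    return time.perf_counter() - start


if __name__ == "__main__":
    blocks = list(range(40))
    serial = replay_serial(blocks)
    concurrent = run_concurrent(blocks, workers=20)
    print(f"serial: {serial:.2f}s  concurrent: {concurrent:.2f}s")
```

On a single-disk machine the effective queue depth is close to 1, and the two timings converge, which matches why the problem doesn't show up on a laptop.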
That's one of the first things I'm planning to tackle when the 8.4 dev
cycle opens. And I'm planning to look at recovery times in general; I've
never even measured it before, so who knows what comes up.