Partitioned checkpoints have the significant disadvantage that they
increase random write IO by the number of passes. That's a bad idea,
*especially* on SSDs.

> >So we'd need logic like this
> >1. Run through shared buffers and analyze the files contained in there
> >2. Assign files to one of N batches so we can make N roughly equal sized
> >mini-checkpoints
> >3. Make N passes through shared buffers, writing out files assigned to
> >each batch as we go

That's essentially what Fabien's sorting patch does by sorting all the
to-be-written buffers by their on-disk location, so each file's writes
are issued together.

> What I think might work better is actually keeping the write/fsync phases we
> have now, but instead of postponing the fsyncs until the next checkpoint we
> might spread them after the writes. So with target=0.5 we'd do the writes in
> the first half, then the fsyncs in the other half. Of course, we should sort
> the data like you propose, and issue the fsyncs in the same order (so that
> the OS has time to write them to the devices).

I think the approach in Fabien's patch of enforcing that there's not
very much dirty data to flush by forcing early cache flushes is
better. Having gigabytes worth of dirty data in the OS page cache can
have massive negative impact completely independent of fsyncs.

> I wonder how much the original paper (written in 1996) is effectively
> obsoleted by spread checkpoints, but the benchmark results posted by
> Horikawa-san suggest there's a possible gain. But perhaps partitioning the
> checkpoints is not the best approach?

I think it's likely that the patch will have only a very small effect if
applied on top of Fabien's patch (which'll require some massaging, I'm
afraid).


Andres Freund

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)