On 11 September 2015 at 09:07, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

> Some general comments :

Thanks for the summary Fabien.

> I understand that what this patch does is cutting the checkpoint of
> buffers in 16 partitions, each addressing 1/16 of buffers, and each with
> its own wal-log entry, pacing, fsync and so on.
> I'm not sure why it would be much better; I agree that it may have some
> small positive influence on performance, but I'm afraid it may also
> degrade performance in some conditions. So I think that a better
> understanding of why performance improves, and a focus on that, could
> help obtain a more systematic gain.

I think it's a good idea to partition the checkpoint, but not by doing it
this way.

Splitting with N=16 does nothing to guarantee the partitions are equally
sized, so there would likely be an imbalance that would reduce the
effectiveness of the patch.

> This method interacts with the current proposal to improve the
> checkpointer behavior by avoiding random I/Os, but it could be combined.
> I'm wondering whether the benefits you see are linked to the file flushing
> behavior induced by fsyncing more often, in which case it is quite close to
> the "flushing" part of the current "checkpoint continuous flushing" patch,
> and could be redundant with/less efficient than what is done there,
> especially as tests have shown that the effect of flushing is *much*
> better on sorted buffers.
> Another proposal around, suggested by Andres Freund I think, is that the
> checkpointer could fsync files while checkpointing and not wait for the end
> of the checkpoint. I think that it may also be one of the reasons why your
> patch brings a benefit, but Andres' approach would be more systematic,
> because there would be no need to fsync files several times (basically your
> patch issues 16 fsyncs per file). This suggests that the "partitioning"
> should be done at a lower level, from within CheckPointBuffers, which
> would take care of fsyncing files some time after writing buffers to them
> is finished.

The idea of doing a partial pass through shared buffers, writing only a
fraction of the dirty buffers and then fsyncing them, is a good one.

The key point is that we spread out the fsyncs across the whole checkpoint
period.

I think we should be writing out all buffers for a particular file in one
pass, then issue one fsync per file. More than one fsync per file seems
like a bad idea.

So we'd need logic like this:
1. Run through shared buffers and analyse the files contained in there.
2. Assign files to one of N batches so that the batches are roughly equal
in size.
3. Make N passes through shared buffers, writing out the files assigned to
each batch as we go.

Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
