On 11 September 2015 at 09:07, Fabien COELHO <coe...@cri.ensmp.fr> wrote:

> Some general comments :

Thanks for the summary Fabien.

> I understand that what this patch does is cutting the checkpoint of
> buffers in 16 partitions, each addressing 1/16 of buffers, and each with
> its own wal-log entry, pacing, fsync and so on.
> I'm not sure why it would be much better; I agree that it may have some
> small positive influence on performance, but I'm afraid it may also
> degrade performance in some conditions. So I think that a better
> understanding of why performance improves, and a focus on that, could
> help obtain a more systematic gain.

I think it's a good idea to partition the checkpoint, but not by doing it
this way.

Splitting with N=16 does nothing to guarantee the partitions are equally
sized, so there would likely be an imbalance that would reduce the
effectiveness of the patch.

> This method interacts with the current proposal to improve the
> checkpointer behavior by avoiding random I/Os, but it could be combined.
> I'm wondering whether the benefits you see are linked to the file flushing
> behavior induced by fsyncing more often, in which case it is quite close to
> the "flushing" part of the current "checkpoint continuous flushing" patch,
> and could be redundant with/less efficient than what is done there,
> especially as tests have shown that the effect of flushing is *much*
> better on sorted buffers.
> Another proposal around, suggested by Andres Freund I think, is that the
> checkpointer could fsync files while checkpointing and not wait for the end
> of the checkpoint. I think that it may also be one of the reasons why your
> patch brings a benefit, but Andres' approach would be more systematic,
> because there would be no need to fsync files several times (basically your
> patch issues 16 fsyncs per file). This suggests that the "partitioning"
> should be done at a lower level, from within CheckPointBuffers, which
> would take care of fsyncing files some time after writing buffers to them
> is finished.

The idea of doing a partial pass through shared buffers, writing only a
fraction of the dirty buffers and then fsyncing them, is a good one.

The key point is that we spread out the fsyncs across the whole checkpoint
period.

I think we should be writing out all buffers for a particular file in one
pass, then issue one fsync per file. More than one fsync per file seems
like a bad idea.

So we'd need logic like this:
1. Run through shared buffers and analyse the files contained in there.
2. Assign files to one of N batches so that the batches are roughly equal
in size.
3. Make N passes through shared buffers, writing out the files assigned to
each batch as we go.

Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
