On 09/11/2015 03:56 PM, Simon Riggs wrote:

> The idea to do a partial pass through shared buffers and only write a
> fraction of dirty buffers, then fsync them is a good one.
>
> The key point is that we spread out the fsyncs across the whole
> checkpoint period.

I doubt that's really what we want to do, as it defeats one of the purposes of spread checkpoints. With spread checkpoints, we write the data to the page cache and then let the OS actually write it to disk in the background. The kernel handles this by marking dirty pages as expired after some time (30 seconds by default on Linux) and then flushing them to disk.
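
As an aside, for anyone who wants to poke at this: on Linux the expiry window is the vm.dirty_expire_centisecs tunable (3000 centisecs = 30 s by default), and the flusher threads wake up every vm.dirty_writeback_centisecs. A trivial sketch to read the current values (illustration only, nothing to do with the patch itself):

/* print the Linux writeback tunables mentioned above */
#include <stdio.h>

static long
read_centisecs(const char *path)
{
    long    val = -1;
    FILE   *f = fopen(path, "r");

    if (f != NULL)
    {
        if (fscanf(f, "%ld", &val) != 1)
            val = -1;
        fclose(f);
    }
    return val;
}

int
main(void)
{
    printf("dirty_expire_centisecs = %ld\n",
           read_centisecs("/proc/sys/vm/dirty_expire_centisecs"));
    printf("dirty_writeback_centisecs = %ld\n",
           read_centisecs("/proc/sys/vm/dirty_writeback_centisecs"));
    return 0;
}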

The goal is to have everything already written to disk when we call fsync at the beginning of the next checkpoint, so that the fsyncs are cheap and don't cause I/O stalls.

What you propose (spreading the fsyncs) significantly changes that, because it shrinks the window the OS has for writing the data to disk in the background from (nearly) the whole checkpoint period to just 1/N of it. For example, with checkpoint_timeout = 5 minutes and N = 10 mini-checkpoints, data written in one pass gets at most ~30 seconds before its fsync. That's a significant change, and I'd bet it's for the worse.


> I think we should be writing out all buffers for a particular file
> in one pass, then issue one fsync per file. >1 fsyncs per file seems
> a bad idea.
>
> So we'd need logic like this
> 1. Run through shared buffers and analyze the files contained in there
> 2. Assign files to one of N batches so we can make N roughly equal sized
> mini-checkpoints
> 3. Make N passes through shared buffers, writing out files assigned to
> each batch as we go
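
For concreteness, the quoted batching logic might look something like this (a toy sketch; all the names and the trivial hash are made up, not actual backend code):

/* illustrative sketch of the batching logic quoted above */
#include <stdio.h>

#define NBUFFERS 8
#define NBATCHES 2

typedef struct
{
    int     file_id;    /* which relation file the buffer belongs to */
    int     block;      /* block number within the file */
} DirtyBuffer;

/* Step 2: map each file to one of N batches (a trivial hash here;
 * real logic would try to balance the batch sizes). */
static int
file_batch(int file_id)
{
    return file_id % NBATCHES;
}

int
main(void)
{
    DirtyBuffer buf[NBUFFERS] = {
        {1, 10}, {2, 3}, {1, 11}, {3, 7},
        {2, 4}, {4, 1}, {3, 8}, {4, 2}
    };

    /* Step 3: make N passes, writing out only buffers whose file falls
     * into the current batch, then fsync each file in the batch once. */
    for (int batch = 0; batch < NBATCHES; batch++)
    {
        printf("mini-checkpoint %d:\n", batch);

        for (int i = 0; i < NBUFFERS; i++)
        {
            if (file_batch(buf[i].file_id) == batch)
                printf("  write file %d block %d\n",
                       buf[i].file_id, buf[i].block);
        }

        /* one fsync per file in this batch */
        for (int f = 1; f <= 4; f++)
        {
            if (file_batch(f) == batch)
                printf("  fsync file %d\n", f);
        }
    }
    return 0;
}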

What I think might work better is keeping the write/fsync phases we have now, but instead of postponing the fsyncs until the next checkpoint, spreading them through the rest of the current one, after the writes. So with checkpoint_completion_target = 0.5 we'd do the writes in the first half and the fsyncs in the other half. Of course, we should sort the data like you propose, and issue the fsyncs in the same order (so that by the time each file is fsynced, the OS has had time to write its data to the devices).
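
In pseudo-C, the checkpointer would do something like this (a toy simulation only; the file names are made up and the sleeps stand in for the actual checkpoint throttling logic):

/* sketch of the write-then-spread-fsync idea described above */
#include <stdio.h>
#include <unistd.h>

#define NFILES 4

static const char *files[NFILES] = {
    "base/16384/16385", "base/16384/16389",
    "base/16384/16393", "base/16384/16397"
};

int
main(void)
{
    /* Phase 1 (first half of the interval): write dirty buffers,
     * sorted by file, so the OS sees mostly sequential I/O. */
    for (int i = 0; i < NFILES; i++)
    {
        printf("write dirty buffers of %s\n", files[i]);
        sleep(1);       /* stands in for checkpoint write throttling */
    }

    /* Phase 2 (second half): fsync the files in the same order, so the
     * file written first, which the kernel has had the longest time to
     * flush in the background, is also the first one fsynced. */
    for (int i = 0; i < NFILES; i++)
    {
        printf("fsync %s\n", files[i]);
        sleep(1);       /* spread the fsyncs, too */
    }
    return 0;
}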

I wonder how much the original paper (written in 1996) is effectively obsoleted by spread checkpoints, but the benchmark results posted by Horikawa-san suggest there's a possible gain. But perhaps partitioning the checkpoints is not the best approach?

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

