Heikki Linnakangas wrote:
Here are results from a batch of test runs with LDC. This patch only
spreads out the writes; fsyncs work as before. This patch also includes
the optimization that we don't write buffers that were dirtied after
starting the checkpoint.
See tests 276-280. Test 280 is the baseline with no patch applied; the
others are with load distributed checkpoints with different values for
checkpoint_write_percent. But after running the tests I noticed that the
spreading was actually controlled by checkpoint_write_rate, which sets
the minimum rate for the writes, so all those tests with the patch
applied are effectively the same; the writes were spread over a period
of 1 minute. I'll fix that setting and run more tests.
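To illustrate why a minimum write rate can make every checkpoint_write_percent value behave the same, here is a rough sketch. It is not the patch's code: the function name, the page count and the rates are all made up for illustration, and it assumes the write phase simply runs at whichever is faster, the rate needed to hit the target duration or the configured minimum.

```python
def write_phase_seconds(pages_to_write, target_seconds, min_rate_pages_per_sec):
    """Hypothetical model: duration of the write phase when a minimum
    write rate acts as a floor under the target-based rate."""
    # Rate needed to finish exactly at the target duration.
    target_rate = pages_to_write / target_seconds
    # The minimum rate wins whenever it is higher than the target rate.
    effective_rate = max(target_rate, min_rate_pages_per_sec)
    return pages_to_write / effective_rate


# Made-up numbers: 60000 dirty pages, minimum rate of 1000 pages/s.
for target in (120, 240, 360):
    print(target, write_phase_seconds(60000, target, 1000))
```

With these made-up numbers every target gives a 60 second write phase, because the minimum rate dominates. Lowering the minimum rate enough lets the target duration take effect again.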
I ran another series of tests, with a less aggressive bgwriter_delay
setting, which also affects the minimum rate of the writes in the WIP
patch I used.
Now that the checkpoints are spread out more, the response times are
With the 40% checkpoint_write_percent setting, the checkpoints last ~3
minutes. About 85% of the buffer cache is dirty at the beginning of
checkpoints, and thanks to the optimization of not writing pages dirtied
after checkpoint start, only ~47% of those are actually written by the
checkpoint. That explains why the checkpoints only last ~3 minutes, and
not checkpoint_timeout*checkpoint_write_percent, which would be 6
minutes. The estimate of how much progress has been made and how much
is left doesn't take the gain from that optimization into account.
The sync phase only takes ~5 seconds. I'm very happy with these results.
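The arithmetic behind the ~3 minute figure can be sketched roughly as follows. This is a simplified model, not what the patch computes: it assumes the write phase shrinks in proportion to the fraction of initially dirty buffers that actually get written, and the 15 minute checkpoint_timeout is inferred from 6 minutes / 40%, not stated above. The function names are made up for the example.

```python
def planned_write_minutes(checkpoint_timeout_min, write_percent):
    """Write-phase length the pacing aims for, ignoring skipped buffers."""
    return checkpoint_timeout_min * write_percent / 100.0


def observed_write_minutes(checkpoint_timeout_min, write_percent, fraction_written):
    """Simplified assumption: buffers that are skipped cost no time, so the
    phase shrinks in proportion to the fraction that is actually written."""
    return planned_write_minutes(checkpoint_timeout_min, write_percent) * fraction_written


# checkpoint_timeout = 15 min is inferred (6 min / 0.40), not stated in the mail.
planned = planned_write_minutes(15, 40)            # 6 minutes
observed = observed_write_minutes(15, 40, 0.47)    # a bit under 3 minutes
print(planned, round(observed, 2))
```

Under those assumptions the model lands a little under 3 minutes, in line with what the test runs show.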