Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
The main tuning knob is checkpoint_smoothing, which is defined as a
fraction of the checkpoint interval (both checkpoint_timeout and
checkpoint_segments are taken into account). Normally, the write phase
of a checkpoint takes exactly that much time.
So the question is, why in the heck would anyone want the behavior that
"checkpoints take exactly X time"?? The useful part of this whole patch
is to cap the write rate at something that doesn't interfere too much
with foreground queries. I don't see why people wouldn't prefer
"checkpoints can take any amount of time up to the checkpoint interval,
but we do our best not to exceed Y writes/second".
Because it's easier to tune. You don't need to know how much checkpoint
I/O you can tolerate. The system will use just enough I/O bandwidth to
meet the deadline, but not more than that.
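To make the "just enough I/O to meet the deadline" idea concrete, here is a rough sketch in Python. It is not code from the patch; the function name and the pages-per-second unit are assumptions made for illustration, and the checkpoint interval is simply passed in as a number of seconds.

```python
def deadline_write_rate(dirty_pages, checkpoint_interval_s, checkpoint_smoothing):
    """Pages/second needed to finish the write phase right at the deadline.

    Hypothetical helper: the deadline is taken to be
    checkpoint_smoothing * checkpoint_interval_s, as described above.
    """
    if not 0 < checkpoint_smoothing <= 1:
        raise ValueError("checkpoint_smoothing must be in (0, 1]")
    write_window_s = checkpoint_interval_s * checkpoint_smoothing
    return dirty_pages / write_window_s


# 6000 dirty pages, 300 s between checkpoints, half the interval for writing
rate = deadline_write_rate(6000, 300, 0.5)
```

The point of the sketch is that nobody has to guess a tolerable I/O rate up front: the rate falls out of the amount of dirty data and the time available.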
Basically I don't see what useful values checkpoint_smoothing would have
other than 0 and 1. You might as well make it a bool.
Well that's one option. It feels like a good thing to be able to control
how much headroom you have until the next checkpoint, but maybe we
can just hardcode it close to 1. It's also good to avoid spreading the
checkpoints unnecessarily, to keep recovery times lower, but you can
control that with the min rate setting as well.
There's another possible strategy: keep the I/O rate constant, but vary
the length of the checkpoint. checkpoint_rate allows you to do that.
But only from the lower side.
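A small sketch of what "only from the lower side" means, assuming checkpoint_rate acts as a floor in pages/second. The function name and units are made up for illustration; this is not the patch's logic.

```python
def checkpoint_duration(dirty_pages, checkpoint_rate, write_window_s):
    """Seconds the write phase takes when checkpoint_rate is a floor.

    Hypothetical model: write at checkpoint_rate unless that would miss
    the deadline, in which case write as fast as the deadline requires.
    """
    needed_rate = dirty_pages / write_window_s
    rate = max(checkpoint_rate, needed_rate)
    return dirty_pages / rate


short = checkpoint_duration(1000, 100, 150)    # few dirty pages: ends early
pinned = checkpoint_duration(60000, 100, 150)  # many: pinned to the window
```

With few dirty buffers the rate stays constant and the checkpoint gets shorter; with many, the length stays at the window and the rate rises instead.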
Now how would you replace checkpoint_smoothing with a max I/O rate?
I don't see why you think that's hard. It looks to me like the
components of the decision are the same numbers in any case: you have to
estimate your progress towards checkpoint completion, your available
time till next checkpoint, and your write rate. Then you either delay
Let me put it this way: If you define a min and a max I/O rate, when
would the max I/O rate limit take effect? If there are few dirty buffers
in the pool, so that you'll finish the checkpoint in time before the
next one is due writing at the min rate, that's what you'll use. If
there's more, you'll need to write fast enough that you'll finish the
checkpoint in time, regardless of the max rate. Or would you let the
next checkpoint slip and keep writing at the max rate? That seems like a
footgun if someone sets it to too low a value.
Or are you thinking that we have just one setting: checkpoint_rate? You
describe it as a maximum, but I've been thinking of it as a minimum
because you *will* exceed it if the next checkpoint is due soon.
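To illustrate the min/max question, here is a sketch of a policy that does honor a max rate. The names and the policy itself are assumptions for the sake of argument, not something the patch implements.

```python
def plan_writes(dirty_pages, time_left_s, min_rate, max_rate):
    """Return (write_rate, slips) under a hypothetical min/max rate policy.

    write_rate is in pages/second; slips is True when honoring max_rate
    means the checkpoint cannot finish before the next one is due.
    """
    needed_rate = dirty_pages / time_left_s
    rate = max(min_rate, needed_rate)  # never slower than the minimum
    capped = min(rate, max_rate)       # never faster than the maximum
    slips = needed_rate > max_rate     # the cap forces a late finish
    return capped, slips


few = plan_writes(1000, 100, 50, 200)    # min rate wins, max never matters
many = plan_writes(50000, 100, 50, 200)  # max rate wins, checkpoint slips
```

In the first case the max setting has no effect at all; in the second it only takes effect by letting the checkpoint run past its deadline, which is the footgun.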