Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
(BTW, the patch seems
a bit schizoid about whether checkpoint_rate is int or float.)
Yeah, I've gone back and forth on the data type. I wanted it to be a
float, but guc code doesn't let you specify a float in KB, so I switched
it to int.
I seriously question trying to claim that it's blocks at all, seeing
that the *actual* units are pages per unit time. Pretending that it's
a memory unit does more to obscure the truth than document it.
Hmm. What I wanted to avoid is that the I/O rate you get then depends on
your bgwriter_delay, so if you change that you need to change
checkpoint_min_rate as well.
Now we already have that issue with bgwriter_all/lru_maxpages, and I
don't like it there either. If you think it's better to let the user
define it directly as pages/bgwriter_delay, fine.
And checkpoint_rate really needs to be named checkpoint_min_rate, if
it's going to be a minimum. However, I question whether we need it at all.
Hmm. With bgwriter_delay of 200 ms, and checkpoint_min_rate of 512 KB/s,
using the non-broken formula above, we get:
(512*1024/8192) * 200 / 1000 = 12.8, truncated to 12.
So I think that's fine.
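
For reference, the arithmetic above can be reproduced with a small sketch. The function name and the page_size parameter are made up for illustration, and 8192 bytes is assumed as the page size because that is the divisor used in the formula; this is not code from the patch.

```python
BLCKSZ = 8192  # assumed page size in bytes, matching the divisor in the formula


def writes_per_nap(min_rate_kb_per_s, bgwriter_delay_ms, page_size=BLCKSZ):
    """Convert a rate in KB/s into whole pages written per bgwriter nap."""
    pages_per_second = min_rate_kb_per_s * 1024 / page_size
    pages_per_nap = pages_per_second * bgwriter_delay_ms / 1000
    return int(pages_per_nap)  # truncate toward zero, as in the calculation above


print(writes_per_nap(512, 200))
```

Expressing the setting as a rate means the pages-per-nap figure is derived from bgwriter_delay, so changing the delay does not silently change the I/O rate.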
"Fine?" That's 12x the value you have actually tested. That's enough
of a change to invalidate all your performance testing IMHO.
I'll reschedule the tests to be sure, after we settle on how we want to
control this feature.
I still think you've not demonstrated a need to expose this parameter.
Greg Smith wanted to explicitly control the I/O rate, and let the
checkpoint duration vary. I personally think that fixing the checkpoint
duration is better because it's easier to tune.
But if we only do that, you might end up with ridiculously long
checkpoints when there aren't many dirty pages. If we want to avoid that,
we need some way of telling what's a safe minimum rate to write at,
because that can vary greatly depending on your hardware.
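
To make the trade-off concrete, here is a rough sketch of how a minimum rate would cap checkpoint length. It assumes a simplified model where the checkpoint writes at a constant rate; the function and parameter names are hypothetical and do not come from the patch.

```python
def checkpoint_duration_s(dirty_pages, target_duration_s, min_rate_pages_per_s):
    """Estimate checkpoint length when pacing is floored by a minimum rate."""
    if dirty_pages == 0:
        return 0.0
    # Rate needed to spread the writes over exactly the target duration.
    paced_rate = dirty_pages / target_duration_s
    # Never write slower than the configured minimum.
    rate = max(paced_rate, min_rate_pages_per_s)
    return dirty_pages / rate


# Few dirty pages: the minimum rate finishes the checkpoint quickly
# instead of stretching it over the whole target duration.
print(checkpoint_duration_s(128, 270, 64))
# Many dirty pages: the target duration governs and the minimum is irrelevant.
print(checkpoint_duration_s(100000, 270, 64))
```

With a minimum of zero the first case would take the full target duration, which is the prolonged checkpoint described above.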
But maybe we don't care about prolonging checkpoints, and don't really
need any GUCs at all. We could then just hardcode writes_per_nap to some
low value, and target duration close to 1.0. You would have a checkpoint
running practically all the time, and you would use
checkpoint_timeout/checkpoint_segments to control how long it takes. I'm
a bit worried about jumping to such a completely different regime,
though. For example, pg_start_backup needs to create a new checkpoint,
so it would need to wait on average 1.5 * checkpoint_timeout/segments,
and recovery would need to process on average 1.5 times as much WAL as before.
Though with LDC, you should get away with shorter checkpoint intervals
than before, because the checkpoints aren't as invasive.
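
One way to read the 1.5 factor, sketched under stated assumptions: checkpoints run back to back, each takes one full interval, the request arrives at a uniformly distributed point within the current checkpoint, and it has to wait for that checkpoint to finish plus one complete new one. The function name is hypothetical.

```python
def average_wait_in_intervals(samples=1000):
    """Average wait, in checkpoint intervals, for a freshly started checkpoint to complete."""
    total = 0.0
    for i in range(samples):
        # Fraction of the in-progress checkpoint already elapsed at arrival.
        arrival = (i + 0.5) / samples
        remaining = 1.0 - arrival  # rest of the checkpoint already running
        total += remaining + 1.0   # plus one complete new checkpoint
    return total / samples


print(round(average_wait_in_intervals(), 6))
```

The average remaining time of the running checkpoint is half an interval, and the new checkpoint adds a full one, which gives 1.5 intervals.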
If we do that, we should remove bgwriter_all_* settings. They wouldn't
do much because we would have checkpoint running all the time, writing
out dirty pages.