Simon Riggs wrote:
On Fri, 2007-06-15 at 11:34 +0100, Heikki Linnakangas wrote:
- What units should we use for the new GUC variables? From
implementation point of view, it would be simplest if
checkpoint_write_rate is given as pages/bgwriter_delay, similarly to
bgwriter_*_maxpages. I never liked those *_maxpages settings, though, a
more natural unit from a user's perspective would be KB/s.
checkpoint_maxpages would seem like a better name; we've already had
those _maxpages settings for 3 releases, so changing that is not really
an option (at so late a stage).
As Tom pointed out, we don't promise compatibility of conf-files over
major releases. I wasn't actually thinking of changing any of the
existing parameters, just thinking about the best name and behavior for
the new ones.
We don't really care about units because
the way you use it is to nudge it up a little and see if that works
Not necessarily. If it's given in KB/s, you might very well have an idea
of how much I/O your hardware is capable of, and set aside a fraction of
that for checkpoints.
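
To make the units question concrete, here is a rough sketch of how a KB/s setting could map onto the pages/bgwriter_delay unit discussed above. It assumes the default 8 KB page size and a bgwriter_delay of 200 ms; the function names are made up for illustration and are not part of any patch.

```python
# Assumption: default PostgreSQL block size of 8 KB.
BLOCK_SIZE_KB = 8


def kb_per_sec_to_pages_per_round(rate_kb_s, bgwriter_delay_ms=200):
    """Convert a user-facing KB/s rate into pages per bgwriter round."""
    pages_per_sec = rate_kb_s / BLOCK_SIZE_KB
    return pages_per_sec * bgwriter_delay_ms / 1000.0


def pages_per_round_to_kb_per_sec(pages, bgwriter_delay_ms=200):
    """Inverse conversion: pages per bgwriter round back to KB/s."""
    return pages * BLOCK_SIZE_KB * 1000.0 / bgwriter_delay_ms
```

With these assumptions, setting aside 4000 KB/s of I/O for checkpoints would correspond to 100 pages per 200 ms round.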
Can we avoid having another parameter? There must be some protection in
there to check that a checkpoint lasts for no longer than
checkpoint_timeout, so it makes most sense to vary the checkpoint in
relation to that parameter.
Sure, that's what checkpoint_write_percent is for. checkpoint_rate can
be used to finish the checkpoint faster, if there's not much work to do.
For example, if there's only 10 pages to flush in a checkpoint,
checkpoint_timeout is 30 minutes and checkpoint_write_percent = 50%, you
don't want to spread out those 10 writes over 15 minutes, that would be
just silly. checkpoint_rate sets the *minimum* rate used to write. If
writing at that minimum rate isn't enough to finish the checkpoint in
time, as defined by checkpoint interval * checkpoint_write_percent,
we write more aggressively.
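
The rule described above can be sketched roughly as follows. This is only an illustration of the intended behavior, not the patch itself; the function name, the argument names, and the choice of pages per second as the unit are all assumptions.

```python
def checkpoint_write_rate(pages_left, seconds_elapsed, checkpoint_timeout_s,
                          write_percent, min_rate_pages_s):
    """Return the write rate (pages/s) to use for the rest of a checkpoint.

    min_rate_pages_s plays the role of checkpoint_rate: a floor, so small
    checkpoints finish quickly instead of being spread over the whole window.
    """
    # Time budget for the write phase: checkpoint interval * write percent.
    deadline = checkpoint_timeout_s * write_percent / 100.0
    time_left = deadline - seconds_elapsed
    if time_left <= 0:
        # Past the deadline: no throttling, write as fast as possible.
        return float("inf")
    # Rate needed to finish exactly on time.
    needed = pages_left / time_left
    # Never go below the minimum; go above it only when we would be late.
    return max(min_rate_pages_s, needed)
```

For the example in the text (10 pages, checkpoint_timeout of 30 minutes, checkpoint_write_percent = 50), the needed rate is tiny, so the minimum rate wins and the 10 pages are written almost immediately.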
I'm more interested in checkpoint_write_percent myself as well, but Greg
Smith said he wanted the checkpoint to use a constant I/O rate and let
the length of the checkpoint vary.
- The signaling between RequestCheckpoint and bgwriter is a bit tricky.
Bgwriter now needs to deal with immediate checkpoint requests, like those
coming from explicit CHECKPOINT or CREATE DATABASE commands, differently
from those triggered by checkpoint_segments. I'm afraid there might be
race conditions when a CHECKPOINT is issued at the same instant as
checkpoint_segments triggers one. What might happen then is that the
checkpoint is performed lazily, spreading the writes, and the CHECKPOINT
command has to wait for that to finish which might take a long time. I
have not been able to convince myself either that the race condition
exists or that it doesn't.
Is there a mechanism for requesting immediate/non-immediate checkpoints?
No, CHECKPOINT requests an immediate one. Is there a use case for
pg_start_backup() should be a normal checkpoint I think. No need for
backup to be an intrusive process.
Good point. A spread out checkpoint can take a long time to finish,
though. Is there a risk of running into a timeout or something if it
takes, say, 10 minutes for a call to pg_start_backup to finish?
- To coordinate the writes with checkpoint_segments, we need to
read the WAL insertion location. To do that, we need to acquire the
WALInsertLock. That means that in the worst case, WALInsertLock is
acquired every bgwriter_delay when a checkpoint is in progress. I don't
think that's a problem, it's only held for a very short duration, but I
thought I'd mention it.
I think that is a problem.
Do we need to know it so exactly that we look
at WALInsertLock? Maybe use info_lck to request the latest page, since
that is less heavily contended and we need never wait across I/O.
Is there such a value available, that's protected by just info_lck? I
can't see one.
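
For reference, the reason the WAL position matters can be sketched like this: the checkpoint has to stay ahead of both the clock and WAL consumption, whichever is further along. Everything here is a hypothetical illustration; in particular, segments_used stands in for whatever value is derived from the WAL insert location, and how that value is obtained (WALInsertLock or otherwise) is exactly the open question.

```python
def checkpoint_on_schedule(pages_written, pages_total, seconds_elapsed,
                           checkpoint_timeout_s, segments_used,
                           checkpoint_segments, write_percent):
    """Return True if the checkpoint is far enough along to pause writing."""
    if pages_total == 0:
        return True  # nothing to write
    target = write_percent / 100.0
    done = pages_written / pages_total
    # Fraction of the time budget already used up.
    time_frac = seconds_elapsed / (checkpoint_timeout_s * target)
    # Fraction of the WAL budget already used up (segments_used may be
    # fractional if it is computed from a byte position).
    wal_frac = segments_used / (checkpoint_segments * target)
    # We are on schedule only if progress covers the stricter of the two.
    return done >= max(time_frac, wal_frac)
```

If the WAL fraction only needs to be approximate, a slightly stale position would make this check err on the side of writing a bit later, which is why the precision of the value read is worth discussing.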