Simon Riggs wrote:
On Fri, 2007-06-15 at 11:34 +0100, Heikki Linnakangas wrote:

- What units should we use for the new GUC variables? From an implementation point of view, it would be simplest if checkpoint_write_rate were given as pages/bgwriter_delay, similarly to bgwriter_*_maxpages. I never liked those *_maxpages settings, though; a more natural unit from a user's perspective would be KB/s.
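
To make the unit question concrete, here is a minimal sketch of the conversion between the two candidate units. It assumes the default 8 KB block size and a bgwriter_delay expressed in milliseconds; the function names are purely illustrative and are not part of PostgreSQL or of the patch.

```python
BLOCK_SIZE_KB = 8  # assumption: default PostgreSQL block size (8192 bytes)


def pages_per_round_to_kb_per_s(pages_per_round, bgwriter_delay_ms):
    """Convert a pages-per-bgwriter-round limit into KB/s."""
    rounds_per_second = 1000.0 / bgwriter_delay_ms
    return pages_per_round * BLOCK_SIZE_KB * rounds_per_second


def kb_per_s_to_pages_per_round(kb_per_s, bgwriter_delay_ms):
    """Convert a KB/s limit into whole pages per bgwriter round (rounded down)."""
    rounds_per_second = 1000.0 / bgwriter_delay_ms
    return int(kb_per_s / (BLOCK_SIZE_KB * rounds_per_second))
```

Under these assumptions, 5 pages per round at bgwriter_delay = 200 ms works out to 200 KB/s, and the same page count means a different I/O rate whenever bgwriter_delay changes.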

checkpoint_maxpages would seem like a better name; we've already had
those _maxpages settings for 3 releases, so changing that is not really
an option (at so late a stage).

As Tom pointed out, we don't promise compatibility of conf-files over major releases. I wasn't actually thinking of changing any of the existing parameters, just thinking about the best name and behavior for the new ones.

We don't really care about units because
the way you use it is to nudge it up a little and see if that works

Not necessarily. If it's given in KB/s, you might very well have an idea of how much I/O your hardware is capable of, and set aside a fraction of that for checkpoints.

Can we avoid having another parameter? There must be some protection in
there to check that a checkpoint lasts for no longer than
checkpoint_timeout, so it makes most sense to vary the checkpoint in
relation to that parameter.

Sure, that's what checkpoint_write_percent is for. checkpoint_rate can be used to finish the checkpoint faster, if there's not much work to do. For example, if there are only 10 pages to flush in a checkpoint, checkpoint_timeout is 30 minutes and checkpoint_write_percent = 50%, you don't want to spread out those 10 writes over 15 minutes; that would be just silly. checkpoint_rate sets the *minimum* rate used to write. If writing at that minimum rate isn't enough to finish the checkpoint in time, as defined by checkpoint interval * checkpoint_write_percent, we write more aggressively.
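
The interaction between the two settings can be sketched as follows. This is only an illustration of the rule described above, not code from the patch: the function name, the pages-per-second unit, and the choice to return infinity once the deadline has passed are all assumptions.

```python
def checkpoint_write_rate(pages_left, seconds_elapsed, checkpoint_timeout_s,
                          write_percent, min_rate_pages_per_s):
    """Return the write rate (pages/s) to use right now."""
    # Time budget: checkpoint interval * checkpoint_write_percent.
    deadline_s = checkpoint_timeout_s * write_percent / 100.0
    seconds_left = deadline_s - seconds_elapsed
    if seconds_left <= 0:
        # Assumption: once past the deadline, write as fast as possible.
        return float("inf")
    required_rate = pages_left / seconds_left
    # Never go slower than the minimum rate; go faster if the deadline needs it.
    return max(min_rate_pages_per_s, required_rate)
```

With a 30 minute timeout and 50%, the budget is 900 seconds. Ten pages would need only about 0.01 pages/s, so the minimum rate wins and the checkpoint finishes early; 180000 pages would need 200 pages/s, which overrides a minimum of 100.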

I'm more interested in checkpoint_write_percent myself as well, but Greg Smith said he wanted the checkpoint to use a constant I/O rate and let the length of the checkpoint vary.

- The signaling between RequestCheckpoint and bgwriter is a bit tricky. Bgwriter now needs to deal with immediate checkpoint requests, like those coming from explicit CHECKPOINT or CREATE DATABASE commands, differently from those triggered by checkpoint_segments. I'm afraid there might be race conditions when a CHECKPOINT is issued at the same instant as checkpoint_segments triggers one. What might happen then is that the checkpoint is performed lazily, spreading the writes, and the CHECKPOINT command has to wait for that to finish, which might take a long time. I have not been able to convince myself either that the race condition exists or that it doesn't.
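
One way to picture the concern is the toy model below. It is a hypothetical design, not the patch's signaling code: the class, the function, and the on_round hook are invented for illustration, and resetting the flag safely after the checkpoint is deliberately left out. The idea is that requests OR an "immediate" flag in under a lock, and the writer re-checks the flag after every round, so a CHECKPOINT arriving during a lazy checkpoint stops the spreading.

```python
import threading


class CheckpointRequest:
    """Hypothetical request flag shared by backends and the bgwriter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._immediate = False

    def request(self, immediate):
        # OR the flag in, so a lazy request can never downgrade an
        # immediate one that arrived at the same instant.
        with self._lock:
            self._immediate = self._immediate or immediate

    def wants_immediate(self):
        with self._lock:
            return self._immediate


def run_checkpoint(request, dirty_pages, pages_per_round, on_round=None):
    """Write dirty_pages in rounds; return how many rounds were followed by a nap."""
    naps = 0
    written = 0
    while written < dirty_pages:
        written += min(pages_per_round, dirty_pages - written)
        if on_round is not None:
            on_round(written)  # hook to inject a request mid-checkpoint
        # Re-check every round so a late CHECKPOINT stops the spreading.
        if written < dirty_pages and not request.wants_immediate():
            naps += 1  # stand-in for sleeping bgwriter_delay
    return naps
```

In this model a purely lazy checkpoint of 100 pages at 10 pages per round naps 9 times, while one that receives an immediate request after 30 pages naps only twice. Whether the real signaling behaves this way is exactly the open question.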

Is there a mechanism for requesting immediate/non-immediate checkpoints?

No, CHECKPOINT requests an immediate one. Is there a use case for CHECKPOINT LAZY?

pg_start_backup() should be a normal checkpoint I think. No need for
backup to be an intrusive process.

Good point. A spread-out checkpoint can take a long time to finish, though. Is there a risk of running into a timeout or something if it takes, say, 10 minutes for a call to pg_start_backup to finish?

- To coordinate the writes with checkpoint_segments, we need to read the WAL insertion location. To do that, we need to acquire the WALInsertLock. That means that in the worst case, WALInsertLock is acquired every bgwriter_delay when a checkpoint is in progress. I don't think that's a problem, as it's only held for a very short duration, but I thought I'd mention it.
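
As a rough illustration of why the WAL insertion location is needed, the check below compares write progress against WAL consumption. It is a sketch under assumptions, not the patch's logic: the function and parameter names are invented, and wal_segments_used stands for the (possibly fractional) number of segments consumed since the checkpoint started, which is the value that would be derived from the WAL insertion location.

```python
def checkpoint_on_schedule(pages_written, pages_total,
                           wal_segments_used, checkpoint_segments,
                           write_percent):
    """Return True if page writes are keeping pace with WAL consumption.

    Assumes checkpoint_segments and write_percent are both positive.
    """
    if pages_total == 0:
        return True  # nothing to write, trivially on schedule
    progress = pages_written / pages_total
    # WAL budget for the write phase: checkpoint_segments * write_percent.
    wal_budget = checkpoint_segments * write_percent / 100.0
    wal_progress = wal_segments_used / wal_budget
    return progress >= wal_progress
```

For example, with checkpoint_segments = 10 and 50%, the write phase has a budget of 5 segments. Having written half the pages after 1 segment is on schedule; having written a tenth of the pages after 2.5 segments is not, and the writer would need to speed up.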

I think that is a problem.


Do we need to know it so exactly that we look
at WALInsertLock? Maybe use info_lck to request the latest page, since
that is less heavily contended and we need never wait across I/O.

Is there such a value available, that's protected by just info_lck? I can't see one.

  Heikki Linnakangas
