Re: [PATCHES] Load Distributed Checkpoints, revised patch

Heikki Linnakangas Sun, 17 Jun 2007 00:54:34 -0700

Simon Riggs wrote:

On Fri, 2007-06-15 at 11:34 +0100, Heikki Linnakangas wrote:
- What units should we use for the new GUC variables? From an implementation point of view, it would be simplest if checkpoint_write_rate is given as pages/bgwriter_delay, similarly to bgwriter_*_maxpages. I never liked those *_maxpages settings, though; a more natural unit from the user's perspective would be KB/s.
checkpoint_maxpages would seem like a better name; we've already had
those _maxpages settings for 3 releases, so changing that is not really
an option (at so late a stage).

As Tom pointed out, we don't promise compatibility of conf-files over major releases. I wasn't actually thinking of changing any of the existing parameters, just thinking about the best name and behavior for the new ones.

We don't really care about units because
the way you use it is to nudge it up a little and see if that works
etc..

Not necessarily. If it's given in KB/s, you might very well have an idea of how much I/O your hardware is capable of, and set aside a fraction of that for checkpoints.
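
To make the unit question concrete, here is a minimal sketch of how a KB/s setting could be translated into the pages/bgwriter_delay form that is simplest internally. The function name is hypothetical, and the 8 kB page size and 200 ms bgwriter_delay are assumed defaults, not something the patch prescribes.

```python
import math

BLCKSZ_KB = 8            # assumed: default PostgreSQL page size in kB
BGWRITER_DELAY_MS = 200  # assumed: default bgwriter_delay in milliseconds


def pages_per_round(rate_kb_per_s, delay_ms=BGWRITER_DELAY_MS, page_kb=BLCKSZ_KB):
    """Convert a KB/s write budget into whole pages per bgwriter_delay round.

    Hypothetical helper for illustration only; rounds up and never
    returns less than one page so that the checkpoint always progresses.
    """
    pages_per_second = rate_kb_per_s / page_kb
    return max(1, math.ceil(pages_per_second * delay_ms / 1000))


# Setting aside 5120 KB/s (5 MB/s) for checkpoints:
# 5120 / 8 = 640 pages/s, i.e. 128 pages per 200 ms round.
print(pages_per_round(5120))
```

A user who knows the disk sustains, say, 50 MB/s could reserve a tenth of that without needing to know the page size or bgwriter_delay.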

Can we avoid having another parameter? There must be some protection in
there to check that a checkpoint lasts for no longer than
checkpoint_timeout, so it makes most sense to vary the checkpoint in
relation to that parameter.

Sure, that's what checkpoint_write_percent is for. checkpoint_rate can be used to finish the checkpoint faster, if there's not much work to do. For example, if there's only 10 pages to flush in a checkpoint, checkpoint_timeout is 30 minutes and checkpoint_write_percent = 50%, you don't want to spread out those 10 writes over 15 minutes, that would be just silly. checkpoint_rate sets the *minimum* rate used to write. If writing at that minimum rate isn't enough to finish the checkpoint in time, as defined by checkpoint interval * checkpoint_write_percent, we write more aggressively.
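
The interaction between the two settings can be sketched roughly as below. This is only an illustration of the rule described above, not the patch's code; the function name, the pages/second unit and the example minimum rate of 100 pages/s are all assumptions.

```python
def write_rate(pages_left, seconds_left, min_rate):
    """Pages per second to write during a spread-out checkpoint.

    min_rate plays the role of checkpoint_rate (a floor); seconds_left is
    the time remaining until checkpoint interval * checkpoint_write_percent
    has elapsed. Hypothetical helper for illustration only.
    """
    if pages_left <= 0:
        return 0.0
    if seconds_left <= 0:
        return float("inf")  # past the deadline: write as fast as possible
    required = pages_left / seconds_left  # rate needed to finish on time
    return max(min_rate, required)


deadline = 30 * 60 * 0.50  # checkpoint_timeout = 30 min, write_percent = 50%

# Only 10 dirty pages: the floor wins, so the checkpoint finishes quickly
# instead of being spread over 15 minutes.
print(write_rate(10, deadline, min_rate=100))

# 180000 dirty pages: the floor is too slow, so we write more aggressively.
print(write_rate(180000, deadline, min_rate=100))
```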

I'm more interested in checkpoint_write_percent myself as well, but Greg Smith said he wanted the checkpoint to use a constant I/O rate and let the length of the checkpoint vary.

- The signaling between RequestCheckpoint and bgwriter is a bit tricky. Bgwriter now needs to deal with immediate checkpoint requests, like those coming from explicit CHECKPOINT or CREATE DATABASE commands, differently from those triggered by checkpoint_segments. I'm afraid there might be race conditions when a CHECKPOINT is issued at the same instant as checkpoint_segments triggers one. What might happen then is that the checkpoint is performed lazily, spreading the writes, and the CHECKPOINT command has to wait for that to finish, which might take a long time. I have not been able to convince myself either that the race condition exists or that it doesn't.
Is there a mechanism for requesting immediate/non-immediate checkpoints?

No, CHECKPOINT requests an immediate one. Is there a use case for CHECKPOINT LAZY?

pg_start_backup() should be a normal checkpoint I think. No need for
backup to be an intrusive process.

Good point. A spread out checkpoint can take a long time to finish, though. Is there a risk of running into a timeout or something if it takes say 10 minutes for a call to pg_start_backup to finish?

- To coordinate the writes with checkpoint_segments, we need to read the WAL insertion location. To do that, we need to acquire the WALInsertLock. That means that in the worst case, WALInsertLock is acquired every bgwriter_delay when a checkpoint is in progress. I don't think that's a problem, it's only held for a very short duration, but I thought I'd mention it.
I think that is a problem.


Why?

Do we need to know it so exactly that we look
at WALInsertLock? Maybe use info_lck to request the latest page, since
that is less heavily contended and we need never wait across I/O.

Is there such a value available, that's protected by just info_lck? I can't see one.
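
For context on why the WAL insertion location is needed at all, here is a rough sketch of the kind of check it feeds. It is one possible reading of "coordinating the writes with checkpoint_segments", not the patch's code; the function and parameter names are hypothetical, and wal_segments_used stands for the WAL consumed since the checkpoint started, expressed in segments.

```python
def on_schedule(pages_written, pages_total, wal_segments_used,
                checkpoint_segments, write_percent):
    """Return True if checkpoint writes are keeping up with WAL consumption.

    The checkpoint should be finished by the time
    checkpoint_segments * write_percent segments of WAL have been used,
    so the fraction of pages written must not lag behind the fraction of
    that WAL budget already consumed. Illustration only.
    """
    if pages_total == 0:
        return True  # nothing to write, trivially on schedule
    progress = pages_written / pages_total
    wal_fraction = wal_segments_used / (checkpoint_segments * write_percent)
    return progress >= wal_fraction


# checkpoint_segments = 3, write_percent = 50%: the budget is 1.5 segments.
# With 0.6 segments used, 40% of the budget is gone.
print(on_schedule(500, 1000, 0.6, 3, 0.5))  # half the pages done: on schedule
print(on_schedule(300, 1000, 0.6, 3, 0.5))  # only 30% done: behind, speed up
```

A check like this only needs an approximate WAL position, which is why a slightly stale value read under a less contended lock might be good enough, if such a value exists.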


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

