ITAGAKI Takahiro wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
We might want to call GetCheckpointProgress something else, though. It doesn't return the amount of progress made, but rather the amount of progress we should've made up to that point or we're in danger of not completing the checkpoint in time.

GetCheckpointProgress might be a bad name; it returns the progress we should
have made by that time, not the progress actually made. How about GetCheckpointTargetProgress?

Better. A bit long though. Not that I have any better suggestions ;-)

In the sync phase, we sleep between each fsync until enough time/segments have passed, assuming that the time to fsync is proportional to the file length. I'm not sure that's a very good assumption. We might have one huge file with only very little changed data, for example a logging table that is only occasionally appended to. If we begin by fsyncing that, it'll take a very short time to finish, and we'll then sleep for a long time. If we then have another large file to fsync, but that one has all pages dirty, we risk running out of time because of the unnecessarily long sleep. The segmentation of relations limits the risk of that, though, by limiting the max file size, and I don't really have any better suggestions.

It is difficult to estimate fsync costs. We would need additional statistics to
do it. For example, if we recorded the number of write() calls for each segment,
we could use that value as an estimate of the number of dirty pages in each
segment. We don't have per-file write statistics now, but if we had that
information, we could use it to control checkpoints more cleverly.

It's probably not worth it to be too clever with that. Even if we recorded the number of writes we made, we still wouldn't know how many of them haven't been flushed to disk yet.

I guess we're fine if we just avoid excessive waiting, per the discussion in the next paragraph, and use a reasonable safety margin in the default values.

Should we try doing something similar for the sync phase? If there's only 2 small files to fsync, there's no point sleeping for 5 minutes between them just to use up the checkpoint_sync_percent budget.

Hmmm... if we add a new parameter like kernel_write_throughput [kB/s] and
clamp the maximum sleep to size-of-segment / kernel_write_throughput (*1), we can avoid unnecessary sleeping in the fsync phase. Do we want to have such
a new parameter? I think we have plenty of GUC variables even now.

How about using the same parameter that controls the minimum write speed of the write-phase (the patch used bgwriter_all_maxpages, but I suggested renaming it)?

I don't want to add new parameters any more if possible...

Agreed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
