Re: [HACKERS] Controlling Load Distributed Checkpoints

Heikki Linnakangas Thu, 07 Jun 2007 05:39:40 -0700

Thinking about this whole idea a bit more, it occurred to me that the current approach of writing everything and then fsyncing everything is really a historical artifact of the fact that we used to use the system-wide sync call instead of fsyncs to flush the pages to disk. That might not be the best way to do things in the new load-distributed-checkpoint world.


How about interleaving the writes with the fsyncs?

1.
Scan all shared buffers, and build a list of all files with dirty pages, and the buffers belonging to them


2.
foreach(file in list)
{
  foreach(buffer belonging to file)
  {
    write();
    sleep(); /* to throttle the I/O rate */
  }

  sleep(); /* to give the OS a chance to flush the writes at its own pace */

  fsync();
}
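
To make the loop above concrete, here is a minimal standalone sketch using plain POSIX calls. It is not PostgreSQL code: DirtyPage, DirtyFile, SKETCH_BLCKSZ, nap_ms() and checkpoint_interleaved() are all names made up for illustration, the two nap lengths stand in for whatever throttling policy would actually be used, and a short write is simply treated as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192      /* assumed page size for this sketch */

typedef struct DirtyPage        /* hypothetical: one dirty page to write */
{
    off_t       offset;         /* byte offset of the page within its file */
    const char *data;           /* SKETCH_BLCKSZ bytes of page contents */
} DirtyPage;

typedef struct DirtyFile        /* hypothetical: a file plus its dirty pages */
{
    const char      *path;
    const DirtyPage *pages;
    size_t           npages;
} DirtyFile;

/* Sleep for a non-negative number of milliseconds. */
static void
nap_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * For each file: write its dirty pages with a nap after each write, nap
 * once more so the OS can flush at its own pace, then fsync that file
 * before moving on.  Returns 0 on success, -1 on the first error.
 */
int
checkpoint_interleaved(const DirtyFile *files, size_t nfiles,
                       long write_nap_ms, long settle_nap_ms)
{
    assert(files != NULL || nfiles == 0);

    for (size_t i = 0; i < nfiles; i++)
    {
        int fd = open(files[i].path, O_WRONLY | O_CREAT, 0600);

        if (fd < 0)
            return -1;

        for (size_t j = 0; j < files[i].npages; j++)
        {
            const DirtyPage *p = &files[i].pages[j];

            /* A short write counts as a failure in this sketch. */
            if (pwrite(fd, p->data, SKETCH_BLCKSZ, p->offset) != SKETCH_BLCKSZ)
            {
                close(fd);
                return -1;
            }
            nap_ms(write_nap_ms);   /* throttle the I/O rate */
        }

        nap_ms(settle_nap_ms);      /* let the OS flush in the background */

        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        if (close(fd) != 0)
            return -1;
    }
    return 0;
}
```

Because each fsync only has to cover one file's worth of recently written pages, the sync work ends up spread across the whole checkpoint rather than bunched at the end.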

This would spread out the fsyncs in a natural way, making the knob to control the duration of the sync phase unnecessary.

At some point we'll also need to fsync all files that have been modified since the last checkpoint, but don't have any dirty buffers in the buffer cache. I think it's a reasonable assumption that fsyncing those files doesn't generate a lot of I/O. Since the writes were made some time ago, the OS has likely already flushed them to disk.

Doing the 1st phase of just scanning the buffers to see which ones are dirty also effectively implements the optimization of not writing buffers that were dirtied after the checkpoint start. And grouping the writes per file gives the OS a better chance to group the physical writes.
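
A rough sketch of that first phase, under made-up assumptions: BufState is a hypothetical stand-in for a buffer header, file_id is an arbitrary integer identifying the underlying file, and snapshot_dirty() is an illustrative name, not an existing function. The point is only that the list is fixed at scan time, so anything dirtied later is not in it, and that sorting by file and block number puts each file's writes next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct BufState         /* hypothetical stand-in for a buffer header */
{
    unsigned file_id;           /* which file the page belongs to */
    unsigned block_no;          /* block number within that file */
    int      is_dirty;          /* nonzero if dirty at scan time */
} BufState;

typedef struct DirtyRef         /* one entry in the checkpoint's to-do list */
{
    unsigned file_id;
    unsigned block_no;
    size_t   buf_index;         /* index of the buffer in the scanned array */
} DirtyRef;

/* Order by file first, then by block number within the file. */
static int
dirty_ref_cmp(const void *a, const void *b)
{
    const DirtyRef *x = a;
    const DirtyRef *y = b;

    if (x->file_id != y->file_id)
        return x->file_id < y->file_id ? -1 : 1;
    if (x->block_no != y->block_no)
        return x->block_no < y->block_no ? -1 : 1;
    return 0;
}

/*
 * Copy every buffer that is dirty right now into out[] (which must have
 * room for nbufs entries) and sort the result by file and block.
 * Returns the number of entries written.
 */
size_t
snapshot_dirty(const BufState *bufs, size_t nbufs, DirtyRef *out)
{
    size_t n = 0;

    assert(nbufs == 0 || (bufs != NULL && out != NULL));

    for (size_t i = 0; i < nbufs; i++)
    {
        if (bufs[i].is_dirty)
        {
            out[n].file_id = bufs[i].file_id;
            out[n].block_no = bufs[i].block_no;
            out[n].buf_index = i;
            n++;
        }
    }

    if (n > 1)
        qsort(out, n, sizeof(DirtyRef), dirty_ref_cmp);
    return n;
}
```

The write loop would then walk out[] in order and issue the fsync whenever file_id changes.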

One problem is that currently the segmentation of relations into 1GB files is handled at a low level inside md.c, and we don't really have any visibility into that in the buffer manager. ISTM that some changes to the smgr interfaces would be needed for this to work well, though just doing it on a relation-by-relation basis would also be better than the current approach.
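
For reference, the mapping that is hidden from the buffer manager is just integer arithmetic. The sketch below assumes the default build settings of 8kB blocks and 131072 blocks per segment (which gives the 1GB files); SegLoc and locate_block() are illustrative names, not part of the smgr or md.c interfaces.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ      8192u    /* assumed block size in bytes */
#define SKETCH_RELSEG_SIZE 131072u  /* assumed blocks per 1GB segment */

typedef struct SegLoc           /* hypothetical: where a block lives on disk */
{
    uint32_t segno;             /* which segment file of the relation */
    uint64_t offset;            /* byte offset within that segment file */
} SegLoc;

/* Map a relation block number to its segment file and byte offset. */
SegLoc
locate_block(uint32_t block_no)
{
    SegLoc loc;

    loc.segno = block_no / SKETCH_RELSEG_SIZE;
    loc.offset = (uint64_t) (block_no % SKETCH_RELSEG_SIZE) * SKETCH_BLCKSZ;
    return loc;
}
```

If something like segno were visible when building the list of dirty buffers, the per-file grouping could key on (relation, segno) instead of the relation alone.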


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

