Re: [PATCHES] Load distributed checkpoint V3

Heikki Linnakangas Thu, 05 Apr 2007 06:00:32 -0700

ITAGAKI Takahiro wrote:

Here is the latest version of Load distributed checkpoint patch.


Bgwriter has two goals:
1. keep enough buffers clean that normal backends never need to do a write
2. smooth checkpoints by writing buffers ahead of time

Load distributed checkpoints will do 2. in a much better way than the bgwriter_all_* guc options. I think we should remove that aspect of bgwriter in favor of this patch.

The scheduling of bgwriter gets quite complicated with the patch. If I'm reading it correctly, bgwriter will keep periodically writing buffers to achieve 1. while the "write"-phase of checkpoint is in progress. That makes sense; now that checkpoints take longer, we would miss goal 1. otherwise. But we don't do that in the "sleep-between-write-and-fsync"- and "fsync"-phases. We should, shouldn't we?

I'd suggest rearranging the code so that BgBufferSync and mdsync would basically stay like they are without the patch; their signatures wouldn't change. To do the naps during a checkpoint, inject calls to new functions like CheckpointWriteNap() and CheckpointFsyncNap() inside BgBufferSync and mdsync. Those nap functions would check if enough progress has been made since the last call and sleep if so.

The piece of code that implements 1. would be refactored into a new function, let's say BgWriteLRUBuffers(). The nap functions would call BgWriteLRUBuffers if more than bgwriter_delay milliseconds have passed since the last call to it.

This way the changes to CreateCheckpoint, BgBufferSync and mdsync would be minimal, and bgwriter would keep cleaning buffers for normal backends during the whole checkpoint.
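To make the nap-function idea concrete, here is a rough standalone sketch, not code from the patch. CheckpointWriteNap, BgWriteLRUBuffers and bgwriter_delay are the names suggested above; everything else (CheckpointStart, the progress variables, the simulated clock now_ms, and the "even pacing" rule used to decide whether we are ahead of schedule) is made up for illustration. A real version would sleep for real and do actual buffer writes.

```c
#include <assert.h>

/* Simulated clock (ms) so the sketch runs without real sleeping. Hypothetical. */
long now_ms = 0;

/* Counts LRU cleaning rounds; stands in for the real buffer-writing work. */
int lru_rounds = 0;

/* Same meaning as the bgwriter_delay setting: ms between LRU rounds. */
const long bgwriter_delay = 200;

/* Hypothetical checkpoint progress state. */
long ckpt_start_ms;
long ckpt_duration_ms;
int  ckpt_total_buffers;
long last_lru_ms;

/* Goal 1: keep buffers clean for normal backends (body omitted here). */
void BgWriteLRUBuffers(void)
{
    lru_rounds++;
    last_lru_ms = now_ms;
}

/* Hypothetical setup call made when the write phase begins. */
void CheckpointStart(int total_buffers, long duration_ms)
{
    assert(total_buffers > 0);
    ckpt_start_ms = now_ms;
    ckpt_duration_ms = duration_ms;
    ckpt_total_buffers = total_buffers;
    last_lru_ms = now_ms;
}

/*
 * Would be called from inside BgBufferSync after each checkpoint write.
 * If we are ahead of an evenly paced schedule (an assumed rule), nap until
 * we are back on schedule, waking up every bgwriter_delay ms to run the
 * LRU cleaning so that goal 1 is not neglected during the checkpoint.
 */
void CheckpointWriteNap(int buffers_done)
{
    /* Time by which buffers_done writes should be finished. */
    long target_ms = ckpt_start_ms +
        ckpt_duration_ms * buffers_done / ckpt_total_buffers;

    while (now_ms < target_ms)
    {
        long nap = target_ms - now_ms;
        long until_lru = last_lru_ms + bgwriter_delay - now_ms;

        if (until_lru <= 0)
        {
            BgWriteLRUBuffers();
            continue;
        }
        if (nap > until_lru)
            nap = until_lru;
        now_ms += nap;          /* stand-in for a real sleep, e.g. pg_usleep */
    }

    /* Even when no nap was needed, don't let LRU cleaning fall behind. */
    if (now_ms - last_lru_ms >= bgwriter_delay)
        BgWriteLRUBuffers();
}
```

With this shape, BgBufferSync only gains one call per written buffer, e.g. CheckpointWriteNap(i) after the i-th write, and a CheckpointFsyncNap() in mdsync could follow the same pattern with files instead of buffers.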

Another thought is to have a separate checkpointer process, so that the bgwriter process can keep cleaning dirty buffers while the checkpoint is running in a separate process. One problem with that is that we currently collect all the fsync requests in bgwriter. If we had a separate checkpointer process, we'd need to do that in the checkpointer instead, and bgwriter would need to send a message to the checkpointer every time it flushes a buffer, which would be a lot of chatter. Alternatively, bgwriter could somehow pass the pendingOpsTable to the checkpointer process at the beginning of the checkpoint, but that's not exactly trivial either.

PS. Great that you're working on this. It's a serious problem under heavy load.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
