Re: [HACKERS] Controlling Load Distributed Checkpoints

Greg Smith Mon, 11 Jun 2007 00:55:45 -0700

On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:

> If the kernel can treat sequential writes better than random writes, is it worth sorting dirty buffers in block order per file at the start of checkpoints?

I think it has the potential to improve things. There are three obvious arguments against it that I can think of, and one subtle one:

1) Extra complexity for something that may not help. This would need good, robust benchmark results showing an improvement to justify its use.

2) Block number ordering may not reflect actual order on disk. While true, it's got to be better correlated with it than writing at random.

3) The OS disk elevator should be dealing with this issue, particularly because it may really know the actual disk ordering.
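A minimal sketch of the sorting step in the quoted question, assuming each dirty buffer is identified by a (file, block number) pair. `BufferTag`, `relfile`, and `checkpoint_write_order` are hypothetical names chosen for illustration; this is not PostgreSQL's buffer manager code, and it says nothing about whether the resulting order matches physical disk order (point 2 above).

```python
from typing import List, NamedTuple, Tuple


class BufferTag(NamedTuple):
    """Hypothetical identity of a buffered page: which file, which block."""
    relfile: int  # hypothetical relation file identifier
    block: int    # block number within that file


def checkpoint_write_order(dirty: List[Tuple[int, BufferTag]]) -> List[int]:
    """Given (buffer_id, tag) pairs in buffer-pool order, return the buffer
    ids sorted by (file, block), so each file is written in ascending
    block order."""
    ordered = sorted(dirty, key=lambda item: (item[1].relfile, item[1].block))
    return [buf_id for buf_id, _ in ordered]


if __name__ == "__main__":
    pool = [
        (0, BufferTag(relfile=16384, block=7)),
        (1, BufferTag(relfile=16390, block=2)),
        (2, BufferTag(relfile=16384, block=3)),
        (3, BufferTag(relfile=16384, block=5)),
        (4, BufferTag(relfile=16390, block=1)),
    ]
    print(checkpoint_write_order(pool))  # [2, 3, 0, 4, 1]
```

Within each file the writes come out in ascending block order, which is the pattern the elevator would otherwise have to reconstruct from a random stream; the cost is one O(n log n) sort of the dirty list when the checkpoint starts.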

Here's the subtle thing: by writing in the same order the LRU scan occurs in, you are writing dirty buffers in the optimal fashion to eliminate client backend writes during BufferAlloc. This makes the checkpoint a really effective LRU clearing mechanism. Writing in block order will change that.
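To make the subtle point concrete, here is a toy model that assumes the LRU scan behaves like a plain clock sweep over buffer ids, ignoring usage counts and pinned buffers. `sweep_order`, `dirty_hits`, and the example buffer numbers are hypothetical. It counts how many of the buffers the sweep will reach next are still dirty partway through a checkpoint, comparing sweep order against a block order that is assumed to match buffer id order.

```python
def sweep_order(dirty_ids, nbuffers, hand):
    """Order dirty buffer ids by how soon a clock sweep starting at `hand`
    (wrapping around a pool of `nbuffers`) will reach them."""
    return sorted(dirty_ids, key=lambda b: (b - hand) % nbuffers)


def dirty_hits(write_order, written, nbuffers, hand, upcoming):
    """After the checkpoint has written the first `written` buffers of
    `write_order`, count how many of the next `upcoming` buffers the sweep
    visits are still dirty, i.e. writes a backend would do itself."""
    still_dirty = set(write_order[written:])
    visited = [(hand + i) % nbuffers for i in range(upcoming)]
    return sum(1 for b in visited if b in still_dirty)


if __name__ == "__main__":
    nbuffers, hand = 8, 5
    dirty = [1, 3, 5, 6]  # hypothetical dirty buffers; assume block order == id order
    by_sweep = sweep_order(dirty, nbuffers, hand)
    print("sweep order:", by_sweep, dirty_hits(by_sweep, 2, nbuffers, hand, 4))
    print("block order:", dirty, dirty_hits(dirty, 2, nbuffers, hand, 4))
```

In this example, after two writes the sweep-ordered checkpoint has already cleaned every dirty buffer the next four allocations will reach, while block order leaves two of them dirty, so a backend in BufferAlloc would have to write those itself.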

I spent some time trying to optimize the elevator part of this operation, since I knew that on the system I was using, block order was actual order. I found that under Linux, the behavior of the pdflush daemon that manages dirty memory had a more serious impact on writing behavior at checkpoint time than playing with the elevator scheduling method did. The way pdflush works actually has several interesting implications for how to optimize this patch. For example, the way writes get blocked when dirty memory reaches certain thresholds means that you may not get the full benefit of the disk elevator at checkpoint time the way most would expect.

Since much of that was basically undocumented, I had to write my own analysis of the actual workings, which is now available at:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

I hope that anyone who wants more information about how Linux kernel parameters like dirty_background_ratio actually work, and how they impact the writing strategy, will find that article uniquely helpful.
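The threshold behavior described above can be sketched as a simple classifier. This is a simplified model, not kernel code: it assumes both percentages apply to a single "dirtyable memory" figure and ignores per-device and per-process details, and the exact boundary comparisons are an approximation. `DirtyLimits` and `writeback_state` are hypothetical names, and the 10/40 values in the example are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DirtyLimits:
    background_ratio: int  # models vm.dirty_background_ratio, in percent
    dirty_ratio: int       # models vm.dirty_ratio, in percent


def writeback_state(dirty_bytes, dirtyable_bytes, limits):
    """Classify how a writer is treated at a given amount of dirty memory.
    Simplified model: both ratios are applied to `dirtyable_bytes`."""
    background = dirtyable_bytes * limits.background_ratio // 100
    hard = dirtyable_bytes * limits.dirty_ratio // 100
    if dirty_bytes >= hard:
        return "throttled"   # the writing process blocks, doing writeback itself
    if dirty_bytes >= background:
        return "background"  # pdflush starts writing dirty pages out
    return "idle"            # dirty data simply accumulates in the page cache


if __name__ == "__main__":
    limits = DirtyLimits(background_ratio=10, dirty_ratio=40)
    gib = 2 ** 30
    for dirty in (gib // 10, 1 * gib, 2 * gib):
        print(dirty, writeback_state(dirty, 4 * gib, limits))
```

On a running Linux system the current settings can be read from /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio.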

> Some kernels or storage subsystems treat all I/Os too fairly so that user transactions waiting for reads are blocked by checkpoint writes.

In addition to that (which I've seen happen quite a bit), in the Linux case another fairness issue is that the code that handles writes allows a single process writing a lot of data to block writes for everyone else. That means that in addition to being blocked on actual reads, if a client backend starts a write in order to complete a buffer allocation to hold new information, that write can grind to a halt because of the checkpoint process as well.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD