Re: [HACKERS] Controlling Load Distributed Checkpoints

Greg Smith Thu, 07 Jun 2007 12:02:22 -0700

On Thu, 7 Jun 2007, Heikki Linnakangas wrote:

> So there are two extreme ways you can use LDC:
> 1. Finish the checkpoint as soon as possible, without disturbing other activity too much.
> 2. Disturb other activity as little as possible, as long as the checkpoint finishes in a reasonable time.
> Are both interesting use cases, or is it enough to cater for just one of them? I think 2 is easier to tune.

The motivation for the (1) case is that you've got a system that's dirtying the buffer cache very fast in normal use, where even the background writer is hard-pressed to keep the buffer pool clean. The checkpoint is the most powerful and efficient way to flush many dirty buffers out of such a buffer cache in a short period of time, so that you have room to work in again. In that situation, since there are many buffers to write out, you'll also be suffering greatly from fsync pauses. Being able to synchronize writes a little better with the underlying OS to smooth those pauses out is a huge help.

I'm completely biased because of the workloads I've been dealing with recently, but I consider (2) so much easier to tune for that it's barely worth worrying about. If your system is so underloaded that you can let the checkpoints take their own sweet time, I'd ask whether you have enough going on to be suffering much from checkpoint performance issues anyway. I'm used to being in a situation where, if you don't push out checkpoint data as fast as physically possible, you end up fighting with the client backends for write bandwidth once the LRU point moves past the point the checkpoint has already written out to. I'm not sure how much always running the LRU background writer will improve that situation.

> On a Linux system, one way to model it is that the OS flushes dirty buffers to disk at the same rate as we write them, but delayed by dirty_expire_centisecs. That should hold if the writes are spread out enough.

If they're really spread out, sure. There is congestion avoidance code inside the Linux kernel that makes dirty_expire_centisecs behave differently under load than the way it is described. All you can say in the general case is that once dirty_expire_centisecs has passed, the kernel badly wants to write those buffers out as quickly as possible; the actual writes could still happen many seconds after the expiration time on a busy system, or on one with slow I/O.
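The idealized model quoted above (each dirty page is written exactly dirty_expire_centisecs after it was dirtied) can be sketched as a simple backlog calculation. This is a hypothetical illustration of that model only, not of real kernel behavior: as described above, congestion avoidance and the memory-based thresholds break it under load. The function names and the one-second step are assumptions made for the sketch.

```python
def centisecs_to_seconds(centisecs):
    """dirty_expire_centisecs is expressed in hundredths of a second."""
    return centisecs / 100.0


def dirty_backlog(mb_written_per_step, expire_s, step_s=1.0):
    """Return the MB still dirty at the end of each time step.

    Idealized model only (hypothetical helper): data written at time t is
    flushed at t + expire_s, so the backlog at time T is whatever was
    written in (T - expire_s, T].
    mb_written_per_step[i] is the MB the database wrote during step i.
    """
    delay_steps = int(round(expire_s / step_s))
    backlog = []
    for i in range(len(mb_written_per_step)):
        # Steps older than delay_steps have already been flushed in this model.
        start = max(0, i - delay_steps + 1)
        backlog.append(sum(mb_written_per_step[start:i + 1]))
    return backlog
```

Under this model a steady write rate settles at a backlog of rate times the expire interval: writing 10 MB every second with an expire time of 3000 centisecs (30 s) would hold about 300 MB dirty. On a real system the memory-based thresholds such as dirty_background_ratio can start writeback well before that point.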

On every system where I've experimented with Postgres write performance, I've found that the memory-based parameters like dirty_background_ratio were really driving write behavior, and I now almost ignore the expire timeout. Plotting the "Dirty:" value in /proc/meminfo while you're running tests is extremely informative for figuring out what Linux is really doing underneath the database writes.
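One way to collect the "Dirty:" value for plotting is to sample /proc/meminfo at a fixed interval while a test runs. The sketch below is a minimal example under that assumption; the function names, the one-second interval, and the tab-separated output (timestamp, kB) are choices made for illustration, not an established tool.

```python
import time


def parse_dirty_kb(meminfo_text):
    """Extract the Dirty: value (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            # Lines look like "Dirty:            1234 kB".
            return int(line.split()[1])
    raise ValueError("no Dirty: line found")


def sample_dirty(path="/proc/meminfo", interval_s=1.0, count=60):
    """Print "timestamp<TAB>dirty_kB" once per interval, count times."""
    for _ in range(count):
        with open(path) as f:
            kb = parse_dirty_kb(f.read())
        print(f"{time.time():.1f}\t{kb}", flush=True)
        time.sleep(interval_s)
```

Calling sample_dirty() on a Linux box while a benchmark runs, and redirecting the output to a file, gives a two-column series that any plotting tool can read. The same parser pattern works for the "Writeback:" line if you also want to see how much is in flight to disk.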

The influence of the congestion code is why I made the comment about watching how long writes are taking, to gauge how fast you can dump data onto the disks. When one of the congestion mechanisms kicks in, the initial writes start blocking, even before the fsync. That behavior is almost undocumented outside of the relevant kernel source code.
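Watching how long writes take can be sketched by timing each write() call separately from the final fsync(), so that blocking in the write path itself shows up on its own. This is a hypothetical measurement harness, not how PostgreSQL instruments its writes; the 8 kB block size matches PostgreSQL's default page size, while the block count and the 10 ms "slow" threshold are arbitrary assumptions.

```python
import os
import tempfile
import time


def timed_writes(directory, block_size=8192, count=1000):
    """Write count blocks to a temp file in directory.

    Returns (per-write latencies in seconds, fsync time in seconds).
    Only the write() calls are timed in the loop; fsync is timed separately.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    buf = b"\0" * block_size
    latencies = []
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, buf)
            latencies.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        os.fsync(fd)
        fsync_s = time.perf_counter() - t0
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies, fsync_s


def slow_writes(latencies, threshold_s=0.01):
    """Return (index, latency) pairs for writes that took longer than threshold_s."""
    return [(i, lat) for i, lat in enumerate(latencies) if lat > threshold_s]
```

Running this against the database's disk while it is under load, and checking whether slow_writes() reports anything before the fsync, is one rough way to see when writes themselves start blocking. A single process writing zeros is a much simpler pattern than a real checkpoint, so treat the numbers as indicative only.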


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

