On Fri, 22 Dec 2006, Simon Riggs wrote:
> I have also seen cases where the WAL drive, even when separated, appears
> to spike upwards during a checkpoint. My best current theory, so far
> untested, is that the WAL and data drives are using the same CFQ
> scheduler and that the scheduler actively slows down WAL requests when
> it need not. Mounting the drives as separate block devices with
> separate schedulers, CFQ for data and Deadline for WAL, should help.
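For anyone who wants to try the per-device split being suggested here, the Linux sysfs interface makes it a one-liner per drive (the device names below are just examples; substitute whatever your data and WAL drives actually are):

```shell
# Per-device I/O scheduler selection via sysfs (needs root, and a kernel
# built with both schedulers available). /dev/sda and /dev/sdb are
# hypothetical -- sda holding data, sdb holding WAL.
cat /sys/block/sda/queue/scheduler          # list available, [current] marked
echo cfq > /sys/block/sda/queue/scheduler       # data drive: CFQ
echo deadline > /sys/block/sdb/queue/scheduler  # WAL drive: Deadline
```

Note this only works per block device, not per partition, which is another reason the drives have to be physically separate for this experiment.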
The situation I've been seeing is that the database needs a new block to
complete a query and issues a read request for it, but that read gets
queued behind the big checkpoint fsync. The client sits there for quite
some time waiting for the fsync to finish before it gets the data it
needs, and now your trivial SELECT took seconds to complete. It's fairly
easy to replicate this problem using pgbench on Linux--I've seen a query
sit there for 15 seconds when going out of my way to aggravate the
behavior. One of Takayuki's posts here mentioned a worst-case delay of
13 seconds; that's the same problem rearing its ugly head.
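A rough recipe for reproducing it, in case anyone wants to see the stall for themselves (scale and client counts here are illustrative, not the exact settings from my runs -- the point is a write load heavy enough that a checkpoint has a lot of dirty data to fsync):

```shell
# Build a data set big enough that checkpoints have real work to do,
# then hammer it with concurrent writers. Database name "bench" and the
# -s/-c/-t values are hypothetical examples.
pgbench -i -s 100 bench
pgbench -c 20 -t 10000 bench &

# While that runs, time a trivial indexed read in another session.
# Most of the time it returns instantly; when it lands behind the
# checkpoint fsync, it can block for many seconds.
time psql bench -c 'SELECT abalance FROM pgbench_accounts WHERE aid = 1;'
```

Watching the timing of that SELECT across a few checkpoint cycles makes the spike pretty obvious.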
You may be right that what you're seeing would be solved with more
complicated tuning on a per-device basis (which, by the way, isn't
available unless you're running a more recent Linux kernel than many
distributions ship). But you can tune the schedulers all day and not
make a lick of difference to what I've been running into; I know, I've
tried.
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD