Re: [HACKERS] Controlling Load Distributed Checkpoints

Gregory Stark Thu, 07 Jun 2007 12:31:53 -0700

"Greg Smith" <[EMAIL PROTECTED]> writes:

> I'm completely biased because of the workloads I've been dealing with
> recently, but I consider (2) so much easier to tune for that it's barely
> worth worrying about.  If your system is so underloaded that you can let the
> checkpoints take their own sweet time, I'd ask if you have enough going on
> that you're suffering very much from checkpoint performance issues anyway.
> I'm used to being in a situation where if you don't push out checkpoint data
> as fast as physically possible, you end up fighting with the client backends
> for write bandwidth once the LRU point moves past where the checkpoint has
> written out to already.  I'm not sure how much always running the LRU
> background writer will improve that situation.


I think you're working from a faulty premise.

There's no relationship between the volume of writes and how important the
speed of the checkpoint is. In either scenario you should assume a system that is
close to the max i/o bandwidth. The only question is which task the admin
would prefer take the hit for maxing out the bandwidth, the transactions or
the checkpoint.

You seem to have imagined that letting the checkpoint take longer will slow
down transactions. In fact that's precisely the effect we're trying to avoid.
Right now we're seeing tests where Postgres stops handling *any* transactions
for up to a minute. In virtually any real world scenario that would simply be
unacceptable.

That one-minute outage is a direct consequence of trying to finish the
checkpoint as quickly as possible. If we spread it out then it might increase
the average i/o load if you sum it up over time, but then you just need a
faster i/o controller.

The only scenario where you would prefer the absolute lowest i/o rate summed
over time would be if you were close to maxing out your i/o bandwidth,
couldn't buy a faster controller, and response time was not a factor, only
sheer volume of transactions processed mattered. That's a much less common
scenario than caring about the response time.

The flip side is that if you have to worry about response time, buying a faster
controller doesn't even help. It would shorten the duration of the checkpoint
but not eliminate it. A 30-second outage every half hour is just as
unacceptable as a 1-minute outage every half hour.
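
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in
Python. It is a toy model, not a description of what Postgres actually does:
it assumes a full-speed checkpoint monopolizes the i/o channel until it
finishes, and that a spread checkpoint writes at a constant rate over the
whole interval. It also ignores the possible increase in total i/o from
spreading. The bandwidth, dirty-page count, and transaction write rate are
made-up figures chosen only so that the burst case reproduces the one-minute
outage; the function names are hypothetical.

```python
def burst_stall_seconds(dirty_pages, bandwidth_pages_per_s):
    """Seconds a full-speed checkpoint occupies the whole i/o channel."""
    return dirty_pages / bandwidth_pages_per_s


def spread_headroom(dirty_pages, interval_s, bandwidth_pages_per_s,
                    txn_pages_per_s):
    """Spare bandwidth (pages/s) when the checkpoint is paced evenly over
    the interval while transactions keep writing at their normal rate."""
    checkpoint_rate = dirty_pages / interval_s
    return bandwidth_pages_per_s - txn_pages_per_s - checkpoint_rate


# All figures below are illustrative assumptions, not measurements.
INTERVAL_S = 30 * 60    # one checkpoint every half hour
BANDWIDTH = 1000.0      # pages/s the controller can sustain
DIRTY_PAGES = 60_000    # chosen so the burst takes 60 s at BANDWIDTH
TXN_RATE = 900.0        # pages/s written by transactions (close to the max)

stall_now = burst_stall_seconds(DIRTY_PAGES, BANDWIDTH)
stall_fast = burst_stall_seconds(DIRTY_PAGES, 2 * BANDWIDTH)
headroom = spread_headroom(DIRTY_PAGES, INTERVAL_S, BANDWIDTH, TXN_RATE)

print(f"burst, current controller: {stall_now:.0f} s stall per checkpoint")
print(f"burst, 2x controller:      {stall_fast:.0f} s stall per checkpoint")
print(f"spread over interval:      {headroom:.1f} pages/s of headroom left")
```

Under these assumptions, doubling the controller only halves the stall (60 s
down to 30 s), while pacing the same writes over the half hour costs about 33
pages/s and leaves the transactions running throughout.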

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

