Re: [HACKERS] Controlling Load Distributed Checkpoints

Greg Smith Thu, 07 Jun 2007 13:54:01 -0700

On Thu, 7 Jun 2007, Gregory Stark wrote:

> You seem to have imagined that letting the checkpoint take longer will slow
> down transactions.

And you seem to have imagined that I have so much spare time that I'm just making stuff up to entertain myself and sow confusion.

I observed some situations where delaying checkpoints too long ends up slowing down both transaction rate and response time, using earlier variants of the LDC patch and code with similar principles I wrote. I'm trying to keep the approach used here out of the worst of the corner cases I ran into, or at least to make it possible for people in those situations to have some ability to tune out of the bad spots. I am unfortunately not free to disclose all those test results, and since that project is over I can't see how the current LDC compares to what I tested at the time.

I plainly stated I had a bias here, one that's not even close to the average case. My concern here was that Heikki would end up optimizing in a direction where a really wide spread across the active checkpoint interval was strongly preferred. I wanted to offer some suggestions on the type of situation where that might not be true, but where a different tuning of LDC would still be an improvement over the current behavior. There are some tuning knobs there that I don't want to see go away until there's been a wider range of tests to prove they aren't effective.

> Right now we're seeing tests where Postgres stops handling *any* transactions
> for up to a minute. In virtually any real world scenario that would simply be
> unacceptable.

No doubt; I've seen things get close to that bad myself, both on the high and low end. I collided with the issue in a situation of "maxing out your i/o bandwidth, couldn't buy a faster controller" at one point, which is what kicked off my working in this area. It turned out there were still some software tunables left that pulled the worst case down to the 2-5 second range instead. With more checkpoint_segments to decrease the frequency, that was just enough to make the problem annoying rather than crippling. But after that, I could easily imagine a different application scenario where the behavior you describe is the best case.

This is really a serious issue with the current design of the database, one that merely changes instead of going away completely if you throw more hardware at it. I'm perversely glad to hear this is torturing more people than just me as it improves the odds the situation will improve.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

