On Thu, 7 Jun 2007, Gregory Stark wrote:
> You seem to have imagined that letting the checkpoint take longer will slow down transactions.
And you seem to have imagined that I have so much spare time that I'm just making stuff up to entertain myself and sow confusion.
I observed some situations where delaying checkpoints too long ends up slowing down both transaction rate and response time, using earlier variants of the LDC patch and code I wrote along similar principles. I'm trying to keep the approach used here out of the worst of the corner cases I ran into, or at least to make it possible for people in those situations to have some ability to tune out of the bad spots. I am unfortunately not free to disclose all those test results, and since that project is over I can't see how the current LDC compares to what I tested at the time.
I plainly stated I had a bias here, one that's not even close to the average case. My concern here was that Heikki would end up optimizing in a direction where a really wide spread across the active checkpoint interval was strongly preferred. I wanted to offer some suggestions on the type of situation where that might not be true, but where a different tuning of LDC would still be an improvement over the current behavior. There are some tuning knobs there that I don't want to see go away until there's been a wider range of tests to prove they aren't effective.
> Right now we're seeing tests where Postgres stops handling *any* transactions for up to a minute. In virtually any real world scenario that would simply be unacceptable.
No doubt; I've seen things get close to that bad myself, both on the high and low end. I collided with the issue in a situation of "maxing out your i/o bandwidth, couldn't buy a faster controller" at one point, which is what kicked off my working in this area. It turned out there were still some software tunables left that pulled the worst case down to the 2-5 second range instead. With more checkpoint_segments to decrease the frequency, that was just enough to make the problem annoying rather than crippling. But after that, I could easily imagine a different application scenario where the behavior you describe is the best case.
This is really a serious issue with the current design of the database, one that merely changes instead of going away completely if you throw more hardware at it. I'm perversely glad to hear this is torturing more people than just me, as it raises the odds the situation will improve.
--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD