On Wed, 20 Jun 2007, Heikki Linnakangas wrote:

> Another series with 150 warehouses is more interesting. At that # of warehouses, the data disks are 100% busy according to iostat. The 90th percentile response times are somewhat higher with LDC, though the variability in both the baseline and LDC test runs seems to be pretty high.

Great, this is exactly the behavior I had observed and wanted someone else to run into independently. When you're in 100% disk-busy land, LDC can shift the distribution of bad transactions around in a way that some people may not be happy with, and that might represent a step backward from the current code for them. I hope you can now understand why I've been so vocal that it must be possible to disable this new behavior, so that the current form of checkpointing is still available.

While it shows up in the 90th percentile figure, the effect is most obvious in the response time distribution graphs. Someone who is currently getting a run like #295 (http://community.enterprisedb.com/ldc/295/rt.html) might be really unhappy if they turn on LDC expecting it to smooth out checkpoints and get the shift seen in #296 instead (http://community.enterprisedb.com/ldc/296/rt.html).

That is, of course, cherry-picking the most extreme examples. But it illustrates my concern that LDC could make things worse on a really overloaded system, which is counter-intuitive because you might expect that to be exactly the case where it helps the most.

When I summarize the 90th percentile behavior from your 150-warehouse results in a table like this:

Test    LDC %   90th percentile
295     None    3.703
297     None    4.432
292     10      3.432
298     20      5.925
296     30      5.992
294     40      4.132

I think it does a better job of showing how LDC can shift the top percentile around under heavy load, even though there are runs where it's a clear improvement. Since there is so much variability in results once you get into this territory, you really need to run a lot of these tests to get a feel for the spread of behavior. I spent about a week continuously running tests stalking this bugger before I felt I'd mapped out the boundaries with my app. You've got your own priorities, but I'd suggest finding enough time for a more exhaustive look at this area before nailing down the final form of the patch.
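To make the spread argument concrete, here is a minimal Python sketch (not something from the original thread) that tabulates the table above. The RUNS list simply copies those six rows; splitting them into "baseline" and "ldc" groups and reporting min/max/range/mean are illustrative choices of mine, and the helper names are hypothetical.

```python
from statistics import mean

# Rows copied from the 150-warehouse table above:
# (test number, LDC setting in percent or None for baseline, 90th percentile)
RUNS = [
    (295, None, 3.703),
    (297, None, 4.432),
    (292, 10, 3.432),
    (298, 20, 5.925),
    (296, 30, 5.992),
    (294, 40, 4.132),
]


def spread(values):
    """Return (min, max, max - min, mean) for a list of measurements."""
    low, high = min(values), max(values)
    return low, high, high - low, mean(values)


def summarize(runs):
    """Group runs into baseline (no LDC) and LDC, and summarize each group."""
    baseline = [p90 for _, ldc, p90 in runs if ldc is None]
    with_ldc = [p90 for _, ldc, p90 in runs if ldc is not None]
    return {"baseline": spread(baseline), "ldc": spread(with_ldc)}


if __name__ == "__main__":
    for name, (low, high, rng, avg) in summarize(RUNS).items():
        print(f"{name:8s} min={low:.3f} max={high:.3f} range={rng:.3f} mean={avg:.3f}")
```

With only two baseline runs and one run per LDC setting, this says very little statistically, which is exactly why many more runs are needed before drawing conclusions.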

* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
