On Sun, 26 Aug 2007, Kevin Grittner wrote:

usagecount | count | isdirty
         0 |  8711 | f
         1 |  9394 | f
         2 |  1188 | f
         3 |   869 | f
         4 |   160 | f
         5 |   157 | f

Here's a typical sample from your set. Notice how you've got very few buffers with a high usage count. This is a situation the background writer is good at working with. Either the old or new work-in-progress LRU writer can aggressively pound away at any of the buffers with a 0 usage count shortly after they get dirty, and that won't be inefficient because there aren't large numbers of other clients using them.

Compare against this other sample:

usagecount | count | isdirty
         0 |  9093 | f
         1 |  6702 | f
         2 |  2267 | f
         3 |   602 | f
         4 |   428 | f
         5 |  1388 | f

Notice that you have a much larger number of buffers where the usage count is 4 or 5. The all-scan part of the 8.2 background writer will waste a lot of writes when you have a profile that's more like this. If there have been 4+ client backends touching the buffer recently, you'd be crazy to write it out right now if you could instead be focusing on banging out the ones where the usage count is 0. The 8.2 background writer would write them out anyway, which meant that when you hit a checkpoint both the OS and the controller cache were filled with such buffers before you even started writing the checkpoint data. The new setup in 8.3 only worries about the high usage count buffers when you hit a checkpoint, at which point it streams them out over a longer, adjustable period (as not to spike the I/O more than necessary and block your readers) than the 8.2 design, which just dumped them all immediately.

Just to be sure that I understand, are you saying it would be a bad scene if
the physical writes happened, or that the overhead of pushing them out to
the OS would be crippling?

If you have a lot of buffers where the usage_count data was high, it would be problematic to write them out every time they were touched; odds are good somebody else is going to dirty them again soon enough so why bother. On your workload, that doesn't seem to be the case. But that is the situation on some other test workloads, and balancing for that situation has been central to the parts of the redesign I've been injecting suggestions into. One of the systems I was tormented by had the usagecount of 5 for >20% of the buffers in the cache under heavy load, and had a physical write been executed every time one of those was touched that would have been crippling (even if the OS was smart to cache and therefore make redundant some of the writes, which is behavior I would prefer not to rely on).

This contrib module seems pretty safe, patch and all.  Does anyone think
there is significant risk to slipping it into the 8.2.4 database where we
have massive public exposure on the web site handling 2 million hits per

I think it's fairly safe, and my patch was pretty small; just exposing some data that nobody had been looking at before. Think how much easier your life would have been when doing your earlier tuning if you were looking at the data in these terms. Just be aware that running the query is itself intensive and causes its own tiny hiccup in throughput every time it executes, so you may want to consider this more of a snapshot you run periodically to learn more about your data rather than something you do very regularly.

I also think we need to somehow develop a set of tests which report maximum response time on (what should be) fast queries while the database is under different loads, so that those of us for whom reliable response time is more important than maximum overall throughput are protected from performance regressions.

My guess is that the DBT2 tests that Heikki has been running are a more complicated than you think they are; there are response time guarantee requirements in there as well as the throughput numbers. The tests that I run (which I haven't been publishing yet but will be with the final patch soon) also report worst-case and 90-th percentile latency numbers as well as TPS. A "regression" that improved TPS at the expense of those two would not be considered an improvement by anyone involved here.

* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at


Reply via email to