On 02/14/2012 01:45 PM, Greg Smith wrote:
scale=1000 (database is 94% of RAM), clients=4:

Version  TPS
9.0      535
9.1      491   (-8.4% relative to 9.0)
9.2      338   (-31.2% relative to 9.1)
A second pass through this data noted that the maximum number of buffers cleaned by the background writer is <=2785 in 9.0/9.1, while it goes as high as 17345 in 9.2. The background writer is now so busy that it hits the max_clean limit around 147 times in the slower of the 9.2 runs, an average of once every 4 seconds, which is quite frequent. By contrast, max_clean rarely happens in the comparable 9.0/9.1 results. This is starting to point my finger more toward this being an unintended consequence of the background writer/checkpointer split.
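As a quick sanity check on that cadence, the arithmetic works out if these are roughly ten-minute pgbench runs (the run length isn't stated above, so 600 seconds is my assumption here):

```python
# Rough check: 147 max_clean hits over an assumed 600 s run
# gives an average interval of about 4 seconds between hits.
run_seconds = 600          # assumed run length, not stated in the post
max_clean_hits = 147
interval = run_seconds / max_clean_hits
print(f"one max_clean hit every {interval:.1f} s")
```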
Thinking out loud about solutions before the problem is even nailed down: I wonder if we should consider lowering bgwriter_lru_maxpages in the default config now. In older versions, the page cleaning work had at most a 50% duty cycle; it only ran when checkpoints were not. If we wanted to keep the ceiling on background writer cleaning at the same level in the default configuration, that would require dropping bgwriter_lru_maxpages from 100 to 50, which amounts to roughly the same maximum churn. It's obviously more complicated than that, but I think there's a defensible position along those lines to consider.
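The duty-cycle argument can be put in back-of-envelope numbers. This is a minimal sketch assuming the stock defaults (bgwriter_delay = 200 ms, 8 kB pages); the duty-cycle figures are the idealized 50%/100% from the argument above, not measurements:

```python
# Ceiling on background writer cleaning throughput, in MB/s.
BGWRITER_DELAY_MS = 200   # default bgwriter_delay
PAGE_SIZE_KB = 8          # default block size

def max_clean_rate_mb_per_s(lru_maxpages, duty_cycle):
    """Most the cleaner can write per second at a given duty cycle."""
    rounds_per_s = 1000 / BGWRITER_DELAY_MS
    return lru_maxpages * rounds_per_s * duty_cycle * PAGE_SIZE_KB / 1024

# Pre-9.2: maxpages=100 but the cleaner pauses during checkpoints (~50% duty).
old_ceiling = max_clean_rate_mb_per_s(100, 0.5)     # ~2 MB/s
# 9.2: the split lets the cleaner run continuously (100% duty).
new_ceiling = max_clean_rate_mb_per_s(100, 1.0)     # ~4 MB/s, double
# Halving maxpages restores the old ceiling.
halved_ceiling = max_clean_rate_mb_per_s(50, 1.0)   # ~2 MB/s again
```

So 100 pages at a 50% duty cycle and 50 pages at a 100% duty cycle give the same maximum churn, which is the "roughly the same" equivalence claimed above.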
As a historical aside, I wonder how much this behavior might have been to blame for my failing to get spread checkpoints to show a positive outcome during 9.1 development. The way that was written also kept the cleaner running during checkpoints. I didn't measure those two changes individually as much as I did the combination.
I normally do 3 runs of every scale/client combination, and find that more useful than a single run lasting 3X as long. The first of the 3 runs I do at any scale is usually a bit faster than the later two, presumably due to table and/or disk fragmentation. I've tried to make this less of a factor in pgbench-tools by iterating through all requested client counts first, before beginning a second run of those scale/client combinations. So if the two client counts were 4 and 8, the order would be 4/8/4/8/4/8, which works much better than 4/4/4/8/8/8 in terms of fragmentation impacting the average result. Whether it would be better or worse to eliminate this difference by rebuilding the whole database multiple times at each scale is complicated. I happen to like seeing the results with a bit more fragmentation mixed in, to see how they compare with the fresh database. Since more rebuilds would also make these tests take much longer than they already do, that's the tie-breaker that's led to the current testing schedule being the preferred one.
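That interleaving scheme is simple enough to sketch; this is my paraphrase of the ordering described above, not the actual pgbench-tools code:

```python
# Run ordering: cycle through all client counts before starting the
# next repetition, so fragmentation accumulates evenly across configs.
def run_order(client_counts, runs_per_config):
    return [c for _ in range(runs_per_config) for c in client_counts]
```

For example, run_order([4, 8], 3) produces the 4/8/4/8/4/8 sequence rather than 4/4/4/8/8/8.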
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers