Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring

Greg Smith Mon, 12 Mar 2007 18:45:46 -0800

On Fri, 9 Mar 2007, Tom Lane wrote:

It strikes me that the patch would be more useful if it produced ahistogram of the observed usage_counts

Don't have something worth releasing yet, but I did code a first rev ofthis today. The results are quite instructive and it's well worth lookingat.

The main server I work on has shared_buffers=60000 for pgbench testing andthe background writers turned way up (in hopes of reducing checkpointtimes; that's a whole nother topic). I run pgbench with s=100 (~1.6GBdatabase). Here's what I get as statistics on the buffer pool after ascan when the server is "happy", from a run with 20 clients:


writes=38.3MB (8.2%) pinned+used=38.3MB (8.2%)
dirty buffer usage count histogram:
0=0.1% 1=0.3% 2=26% 3=17% 4=21% 5+=36%

Basically, everything that's left dirty is also heavily used; as I notedbefore, when I inspect with pg_buffercache these are mostly index blocks.I note that I should probably generate a second histogram that comparesthe clean data. The all scan is writing pages out as soon as they dirty,the LRU background writer is lucky if it can find anything to do. (Notethat I don't rely on the LRU writer more because that causes a significantlull in writes after a checkpoint. By the time it gets going again it'sharder to catch up, and delaying PostgreSQL writes also aggrevates issueswith the way Linux caches writes.)

What's I found really interesting was comparing a "sad" section whereperformance breaks down. This is from a minute later:


writes=441.6MB (94.2%) pinned+used=356.2MB (76.0%)
dirty buffer usage count histogram:
0=18.7% 1=26.4% 2=31% 3=11% 4=9% 5+=4%

Note how the whole buffer distribution has shifted toward lower usagecounts. The breakdown seems to involve evicting the index blocks to makeroom for the recently dirty data when it can't be written out fast enough.As the index blocks go poof, things slow further, and into the viciouscircle you go. Eventually you can end up blocked on a combination ofbuffer evictions and disk seeks for uncached data that are fighting withthe writes.

The bgwriter change this suggested to me is defining a triage behaviorwhere the all scan switches to acting like the LRU one:

-Each sample period, note what % of the usage_count=0 records are dirty-If that number is above a tunable threshold, switch to only writingusage_count=0 records until it isn't anymore.

On my system a first guess for that tunable would be 2-5%, based on whatvalues I see on either side of the sad periods. No doubt some systemswould set that much higher, haven't tested my system at home yet to have aguideline for a more typical PC.

As for why this behavior matters so much to me, I actually have aprototype auto-tuning background writer design that was hung up on thisparticular problem. It notes how often you write out max_pages, uses thatto model the average percentage of the buffer cache you're traversing eachpass, then runs a couple of weighted-average feedback loops to aim for atarget seconds/sweep.

The problem was that it went berzerk when the whole buffer cache was dirty(I hope someone appreciates that I've been calling this "Felix Unger oncrack mode" in my notes). I needed some way to prioritize which buffersto concentrate on when that happens, and so far the above has been a goodfirst-cut way to help with that.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring

Reply via email to