On Sun, 13 May 2007, Heikki Linnakangas wrote:

> StrategyReportWrite increments numClientWrites without holding the BufFreeListLock; that's a race condition. The terminology needs some adjustment; clients don't write buffers, backends do.

That was another piece of debugging code I moved into the main path without thinking too hard about it; good catch. I have a documentation/naming patch started that revises many of the pg_stat_bgwriter names to be more consistent and easier to understand (as well as re-ordering the view); the underlying code is still fluid enough that I was trying to nail it down first.

> That algorithm seems decent, but I wonder why the simple fudge factor wasn't good enough? I would've thought that a 2x or even bigger fudge factor would still be only a tiny fraction of shared_buffers, and wouldn't really affect performance.

I like the way the smoothing evens out the I/O rates. I saw occasional spots where buffer allocations drop to 0 for a few intervals while everyone waits on some other work to finish, and I didn't want all LRU cleanup to come to a halt just because nothing happened for a fraction of a second in the middle of a very busy period.
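To illustrate the idea (this is my sketch, not the patch's actual code; the function name and decay factor are invented): a weighted moving average that rises immediately but decays gradually keeps the cleanup target above zero through a brief idle interval.

```c
#include <assert.h>

/* Hypothetical sketch: smooth the per-interval buffer allocation count
 * so that a single quiet interval doesn't drop the LRU cleanup target
 * straight to zero.  The 0.8 decay factor is an assumed value. */
static double smoothed_alloc = 0.0;

static double
smooth_allocations(int recent_alloc)
{
    const double decay = 0.8;   /* assumed smoothing factor */

    if (recent_alloc > smoothed_alloc)
        smoothed_alloc = recent_alloc;  /* track new peaks immediately */
    else
        smoothed_alloc = smoothed_alloc * decay
                         + recent_alloc * (1.0 - decay);
    return smoothed_alloc;
}
```

After a busy interval of 1000 allocations followed by a completely idle one, the estimate decays to 800 rather than collapsing to 0, so cleanup keeps running.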

As for why not overestimate, if you get into a situation where the buffer cache is very dirty with much of the data being recently used (I normally see this with bulk UPDATEs on indexed tables), you can end up scanning many buffers for each one you find that can be written out. In this kind of situation, deciding that you actually need to write out twice as many just because you don't trust your estimate is very inefficient.

I was able to simulate most of the bad behavior I look for with the pgbench schema using "update accounts set abalance=abalance+1;". To throw out some sample numbers: on the test server I was doing final work on last night, I was seeing peaks of about 600-1200 buffers allocated per 200ms interval while running that simple UPDATE with shared_buffers=32768.

Let's call it 2% of the pool. If 50% of the pool is either dirty or can't be reused yet, that means I'll have to scan an average of 2%/50% = 4% of the pool to find enough reusable buffers each interval. I wouldn't describe that as a tiny fraction, and doubling it is not an insignificant load increase. I'd like to be able to increase the LRU percentage scanned without worrying that I'm wasting resources because of this situation.
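The arithmetic above can be spelled out as a trivial helper (the function name is mine; the numbers are the ones from the text):

```c
#include <assert.h>

/* Expected fraction of the buffer pool the LRU scan must examine per
 * interval: allocations as a fraction of the pool, divided by the
 * fraction of buffers that are clean and reusable. */
static double
expected_scan_fraction(double alloc_fraction, double reusable_fraction)
{
    return alloc_fraction / reusable_fraction;
}
```

With 2% of the pool allocated per interval and only 50% of buffers reusable, the scan covers 4% of the pool on average; halve the reusable fraction again and the scan cost doubles.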

The fact that this problem exists is what got me digging into the background writer code in the first place, because it's way worse on my production server than this example suggests. The buffer cache is bigger, but the server's ability to dirty it under heavy load is far better. Returning to the theme discussed in the -hackers thread I referenced: you can't have the background writer LRU do all the writes without exposing yourself to issues like this, because it doesn't touch the usage counts, and is therefore vulnerable to breakdowns when the buffer pool shifts toward dirty and non-reusable.

Having the background writer run amok when reusable buffers are rare can really pull down the performance of the other backends (as well as delay checkpoints), both in terms of CPU usage and locking issues. I don't feel it's a good idea to push it too hard unless some of these underlying issues are fixed first; I'd rather err on the side of it doing less than it has to rather than more.

> The normal way to return multiple values is to pass a pointer as an argument, though that can get ugly as well if there's a lot of return values.

I'm open to better suggestions, but after tinkering with this interface for over a month now--including pointers and enums--this is the first implementation I was happy with.

There are four things I eventually need returned here, to support the fully automatic BGW tuning. My first implementation passed in pointers, and in addition to being ugly, I found consistently checking for NULL pointers and data consistency a drag, both from the coding and the overhead perspective.

> What combinations of the flags are valid? Would an enum be better?

And my second-generation code used an enum. There are five possible return code states:

CLEAN + REUSABLE + !WRITTEN
CLEAN + !REUSABLE + !WRITTEN
!CLEAN + REUSABLE + WRITTEN
!CLEAN + !REUSABLE + WRITTEN (all-scan only)
!CLEAN + !REUSABLE + !WRITTEN (rejected by skip)

!CLEAN + REUSABLE + !WRITTEN isn't possible (all paths will write dirty reusable buffers)

I found the enum-based code more confusing, both reading it and making sure it was correct when writing it, than the current form. Right now I have lines like:

 if (buffer_state & BUF_REUSABLE)

With an enum this has to be something like

if (buffer_state == BUF_CLEAN_REUSABLE || buffer_state == BUF_REUSABLE_WRITTEN)

And that was a pain all around; I kept having to stare at the table above to make sure the code was correct. Also, in order to pass back full usage_count information, I was back to either pointers or bit shifting anyway. While this particular patch doesn't need the usage count, the later ones I'm working on do, and I'd like to get this interface complete while it's still being worked on.
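As a rough illustration of why the flag-bits style reads more cleanly (the constants and packing scheme here are illustrative, not the actual patch's): each property tests independently with a single &, and a usage count can be packed into the high bits without resorting to out-parameters.

```c
#include <assert.h>

/* Illustrative flag bits and packing; not the real bufmgr constants. */
#define BUF_WRITTEN         0x01
#define BUF_REUSABLE        0x02
#define BUF_USAGE_SHIFT     8
#define BUF_USAGE_COUNT(s)  ((s) >> BUF_USAGE_SHIFT)

static int
make_buffer_state(int written, int reusable, int usage_count)
{
    int state = 0;

    if (written)
        state |= BUF_WRITTEN;
    if (reusable)
        state |= BUF_REUSABLE;
    /* usage count rides in the high bits, so no pointer argument */
    return state | (usage_count << BUF_USAGE_SHIFT);
}
```

A caller that only cares about reusability writes `if (state & BUF_REUSABLE)` and never has to enumerate every valid combination, which is exactly the pain the enum version ran into.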

> Or how about moving the checks for dirty and pinned buffers from SyncOneBuffer to the callers?

There are three callers of SyncOneBuffer, and almost all the code is shared between them. Pushing even just the dirty/pinned checks back into the callers would turn into a cut-and-paste job duplicating many lines. That's on top of the fact that the buffer is cleanly locked and unlocked within a single section of code right now, and I didn't see how to move any of that into the callers without disrupting that clean interface.
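A heavily simplified sketch (toy types and stubs, not the real bufmgr code) of the structure being described: the pinned/dirty checks and the write all sit between one lock/unlock pair, which is exactly what hoisting the checks into the callers would break up.

```c
#include <assert.h>

/* Toy stand-ins for the buffer header and its lock. */
typedef struct
{
    int pinned;
    int dirty;
} ToyBuffer;

static void lock_buffer(ToyBuffer *buf)   { (void) buf; /* stub */ }
static void unlock_buffer(ToyBuffer *buf) { (void) buf; /* stub */ }

/* Returns 1 if the buffer was written out, 0 otherwise.  Note that the
 * checks and the write share a single lock/unlock section; callers
 * doing their own checks would each need to duplicate this dance. */
static int
sync_one_buffer(ToyBuffer *buf)
{
    int wrote = 0;

    lock_buffer(buf);
    if (!buf->pinned && buf->dirty)
    {
        buf->dirty = 0;     /* stand-in for the actual flush */
        wrote = 1;
    }
    unlock_buffer(buf);
    return wrote;
}
```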

* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
