On Sun, 13 May 2007, Heikki Linnakangas wrote:

> StrategyReportWrite increments numClientWrites without holding the BufFreeListLock; that's a race condition. The terminology needs some adjustment; clients don't write buffers, backends do.

That was another piece of debugging code I moved into the main path without thinking too hard about it; good catch. I have a documentation/naming patch started that revises many of the pg_stat_bgwriter names to be more consistent and easier to understand (as well as re-ordering the view); the underlying code is still fluid enough that I was trying to nail it down first.

> That algorithm seems decent, but I wonder why the simple fudge factor wasn't good enough? I would've thought that a 2x or even bigger fudge factor would still be only a tiny fraction of shared_buffers, and wouldn't really affect performance.

I like the way the smoothing evens out the I/O rates. I saw occasional spots where buffer allocations drop to 0 for a few intervals while everyone waits on some other work to finish, and I didn't want all LRU cleanup to come to a halt just because nothing happened for a fraction of a second in the middle of a very busy period.
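To illustrate the idea (this is my sketch, not the patch's actual code; the function name and decay factor are invented): a weighted moving average that rises immediately but decays gradually keeps the cleanup target above zero through a brief idle interval.

```c
#include <assert.h>

/* Hypothetical sketch: smooth the per-interval buffer allocation count
 * so that a single quiet interval doesn't drop the LRU cleanup target
 * straight to zero.  The 0.8 decay factor is an assumed value. */
static double smoothed_alloc = 0.0;

static double
smooth_allocations(int recent_alloc)
{
    const double decay = 0.8;   /* assumed smoothing factor */

    if (recent_alloc > smoothed_alloc)
        smoothed_alloc = recent_alloc;  /* track new peaks immediately */
    else
        smoothed_alloc = smoothed_alloc * decay
                         + recent_alloc * (1.0 - decay);
    return smoothed_alloc;
}
```

After a busy interval of 1000 allocations followed by a completely idle one, the estimate decays to 800 rather than collapsing to 0, so cleanup keeps running.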

As for why not overestimate, if you get into a situation where the buffer cache is very dirty with much of the data being recently used (I normally see this with bulk UPDATEs on indexed tables), you can end up scanning many buffers for each one you find that can be written out. In this kind of situation, deciding that you actually need to write out twice as many just because you don't trust your estimate is very inefficient.

I was able to simulate most of the bad behavior I look for with the pgbench schema using "update accounts set abalance=abalance+1;". To throw out some sample numbers: on the test server I was doing final work on last night, I was seeing peaks of about 600-1200 buffers allocated per 200ms interval while running that simple UPDATE with shared_buffers=32768.

Let's call it 2% of the pool. If 50% of the pool is either dirty or can't be reused yet, that means I'll have to scan an average of 2%/50% = 4% of the pool to find enough reusable buffers each interval. I wouldn't describe that as a tiny fraction, and doubling it is not an insignificant load increase. I'd like to be able to increase the LRU percentage scanned without worrying that I'm wasting resources because of this situation.
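The arithmetic above can be spelled out as a trivial helper (the function name is mine; the numbers are the ones from the text):

```c
#include <assert.h>

/* Expected fraction of the buffer pool the LRU scan must examine per
 * interval: allocations as a fraction of the pool, divided by the
 * fraction of buffers that are clean and reusable. */
static double
expected_scan_fraction(double alloc_fraction, double reusable_fraction)
{
    return alloc_fraction / reusable_fraction;
}
```

With 2% of the pool allocated per interval and only 50% of buffers reusable, the scan covers 4% of the pool on average; halve the reusable fraction again and the scan cost doubles.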

The fact that this problem exists is what got me digging into the background writer code in the first place, because it's way worse on my production server than this example suggests. The buffer cache is bigger, but the server's ability to dirty it under heavy load is far better. Returning to the theme discussed in the -hackers thread I referenced: you can't have the background writer LRU do all the writes without exposing yourself to issues like this, because it doesn't touch the usage counts, and is therefore vulnerable to breakdowns when the buffer pool shifts toward dirty and non-reusable.

Having the background writer run amok when reusable buffers are rare can really pull down the performance of the other backends (as well as delay checkpoints), both in terms of CPU usage and locking issues. I don't feel it's a good idea to push it too hard unless some of these underlying issues are fixed first; I'd rather err on the side of it doing less than it has to rather than more.

> The normal way to return multiple values is to pass a pointer as an argument, though that can get ugly as well if there's a lot of return values.

I'm open to better suggestions, but after tinkering with this interface for over a month now--including pointers and enums--this is the first implementation I was happy with.

There are four things I eventually need returned here, to support the fully automatic BGW tuning. My first implementation passed in pointers, and in addition to being ugly, I found consistently checking for NULL pointers and data consistency a drag, both from the coding and the overhead perspective.

> What combinations of the flags are valid? Would an enum be better?

And my second-generation code used an enum. There are five possible return code states:

CLEAN + REUSABLE + !WRITTEN
CLEAN + !REUSABLE + !WRITTEN
!CLEAN + REUSABLE + WRITTEN
!CLEAN + !REUSABLE + WRITTEN (all-scan only)
!CLEAN + !REUSABLE + !WRITTEN (rejected by skip)

!CLEAN + REUSABLE + !WRITTEN isn't possible (all paths will write dirty reusable buffers)

I found the enum-based code more confusing, both reading it and making sure it was correct when writing it, than the current form. Right now I have lines like:

 if (buffer_state & BUF_REUSABLE)

With an enum this has to be something like

if (buffer_state == BUF_CLEAN_REUSABLE || buffer_state == BUF_REUSABLE_WRITTEN)

And that was a pain all around; I kept having to stare at the table above to make sure the code was correct. Also, in order to pass back full usage_count information, I was back to either pointers or bit shifting anyway. While this particular patch doesn't need the usage count, the later ones I'm working on do, and I'd like to get this interface complete while it's still being worked on.
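As a rough illustration of why the flag-bits style reads more cleanly (the constants and packing scheme here are illustrative, not the actual patch's): each property tests independently with a single &, and a usage count can be packed into the high bits without resorting to out-parameters.

```c
#include <assert.h>

/* Illustrative flag bits and packing; not the real bufmgr constants. */
#define BUF_WRITTEN         0x01
#define BUF_REUSABLE        0x02
#define BUF_USAGE_SHIFT     8
#define BUF_USAGE_COUNT(s)  ((s) >> BUF_USAGE_SHIFT)

static int
make_buffer_state(int written, int reusable, int usage_count)
{
    int state = 0;

    if (written)
        state |= BUF_WRITTEN;
    if (reusable)
        state |= BUF_REUSABLE;
    /* usage count rides in the high bits, so no pointer argument */
    return state | (usage_count << BUF_USAGE_SHIFT);
}
```

A caller that only cares about reusability writes `if (state & BUF_REUSABLE)` and never has to enumerate every valid combination, which is exactly the pain the enum version ran into.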

> Or how about moving the checks for dirty and pinned buffers from SyncOneBuffer to the callers?

There are three callers of SyncOneBuffer, and almost all the code is shared between them. Pushing even just the dirty/pinned checks back into the callers would turn into a cut-and-paste job duplicating many lines. That's on top of the fact that the buffer is cleanly locked and unlocked within a single section of code right now, and I didn't see how to move any of that into the callers without disrupting that clean interface.
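A heavily simplified sketch (toy types and stubs, not the real bufmgr code) of the structure being described: the pinned/dirty checks and the write all sit between one lock/unlock pair, which is exactly what hoisting the checks into the callers would break up.

```c
#include <assert.h>

/* Toy stand-ins for the buffer header and its lock. */
typedef struct
{
    int pinned;
    int dirty;
} ToyBuffer;

static void lock_buffer(ToyBuffer *buf)   { (void) buf; /* stub */ }
static void unlock_buffer(ToyBuffer *buf) { (void) buf; /* stub */ }

/* Returns 1 if the buffer was written out, 0 otherwise.  Note that the
 * checks and the write share a single lock/unlock section; callers
 * doing their own checks would each need to duplicate this dance. */
static int
sync_one_buffer(ToyBuffer *buf)
{
    int wrote = 0;

    lock_buffer(buf);
    if (!buf->pinned && buf->dirty)
    {
        buf->dirty = 0;     /* stand-in for the actual flush */
        wrote = 1;
    }
    unlock_buffer(buf);
    return wrote;
}
```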

* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
