The idea that bgwriter smooths out the response time of transactions is only true if the buffer lists T1 and T2 have *some* clean buffers available for use when performing I/O. The alternative is that transactions unlucky enough to encounter the no-clean-buffers situation have to clean a space for themselves, effectively making the bgwriter redundant.
In BufferSync, we start off by calling StrategyDirtyBufferList to make a list of all the dirty buffers. Even though we know we are limited to maxpages, we still scan the whole of shared_buffers (...making it a very expensive call and thereby causing us to increase bgwriter_delay, which then negates the cleaning effect as described above). Once we've got the list, we limit ourselves to using only maxpages of the list that we just built.

We do it that way round to allow bgwriter_percent to calculate how many of the dirty buffers it should flush, on the assumption that percent < 100. If bgwriter_percent = 100, then we should actually do the sensible thing and prepare only the list that we need, i.e. limit StrategyDirtyBufferList to finding at most bgwriter_maxpages.

Thus if you have a large shared_buffers, you can still have a relatively frequent bgwriter_delay, so that the bgwriter can keep the LRUs of the T1 and T2 lists free for use...and so let backends get on with useful work.

Patch which implements this attached, for discussion.

Mark, any chance we could run this patch on STP to test whether it has a beneficial performance effect? Re-run test 207 to compare?

I'll be asking for this in 8.0, if it works, for all the same performance reasons discussed previously, as well as coming under the header of "bgwriter default changes", since this affects the default behaviour when bgwriter_percent=100.

There are some other ideas for 8.1, but that can wait.

--
Best Regards, Simon Riggs
Index: buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** buffer/bufmgr.c 24 Nov 2004 02:56:17 -0000 1.182
--- buffer/bufmgr.c 11 Dec 2004 17:09:31 -0000
***************
*** 681,686 ****
--- 681,687 ----
  {
      BufferDesc **dirty_buffers;
      BufferTag  *buftags;
+     int         maxdirty;
      int         num_buffer_dirty;
      int         i;

***************
*** 688,704 ****
      if (percent == 0 || maxpages == 0)
          return 0;

      /*
       * Get a list of all currently dirty buffers and how many there are.
       * We do not flush buffers that get dirtied after we started. They
       * have to wait until the next checkpoint.
       */
!     dirty_buffers = (BufferDesc **) palloc(NBuffers * sizeof(BufferDesc *));
!     buftags = (BufferTag *) palloc(NBuffers * sizeof(BufferTag));

      LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
!     num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
!                                                NBuffers);

      /*
       * If called by the background writer, we are usually asked to only
--- 689,714 ----
      if (percent == 0 || maxpages == 0)
          return 0;

+     /* If we know we will write all dirty buffers, up to the limit of maxpages
+      * then we can make a cheaper call to StrategyDirtyBufferList
+      */
+     if (percent == 100)
+         maxdirty = maxpages;
+     else
+         maxdirty = NBuffers;
+
      /*
       * Get a list of all currently dirty buffers and how many there are.
       * We do not flush buffers that get dirtied after we started. They
       * have to wait until the next checkpoint.
       */
!     dirty_buffers = (BufferDesc **) palloc(maxdirty * sizeof(BufferDesc *));
!     buftags = (BufferTag *) palloc(maxdirty * sizeof(BufferTag));

      LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
!     num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
!                                                maxdirty);

      /*
       * If called by the background writer, we are usually asked to only
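
For anyone who wants to see the effect in isolation, here is a minimal standalone sketch of the bounded scan the patch relies on. It is not PostgreSQL code: SketchBuffer, collect_dirty and choose_max_dirty are hypothetical names invented for illustration, and a flat array stands in for the real buffer lists. It only shows that capping the list size at maxpages lets the scan stop early when percent is 100.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer descriptor; not PostgreSQL's BufferDesc. */
typedef struct
{
    int id;
    int dirty;
} SketchBuffer;

/*
 * Pick the list limit the way the patch does: if every dirty buffer found
 * will be written anyway (percent == 100), there is no need to find more
 * than maxpages of them; otherwise the full count is needed.
 */
static int
choose_max_dirty(int percent, int maxpages, int nbuffers)
{
    return (percent == 100) ? maxpages : nbuffers;
}

/*
 * Collect up to max_out dirty buffers from bufs[0..nbufs), stopping as soon
 * as the limit is reached. Returns the number collected; *scanned receives
 * the number of buffers examined, as a rough measure of the cost.
 */
static int
collect_dirty(const SketchBuffer *bufs, int nbufs,
              const SketchBuffer **out, int max_out, int *scanned)
{
    int found = 0;
    int i;

    for (i = 0; i < nbufs && found < max_out; i++)
    {
        if (bufs[i].dirty)
            out[found++] = &bufs[i];
    }
    *scanned = i;
    return found;
}
```

With percent below 100 the limit falls back to the full buffer count, so the scan cost is unchanged in that case; the saving only appears when percent is 100 and maxpages is small relative to shared_buffers.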