Heikki, 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect.
How about using 256/blocksize?

- Luke

> -----Original Message-----
> From: Heikki Linnakangas [mailto:[EMAIL PROTECTED] On
> Behalf Of Heikki Linnakangas
> Sent: Tuesday, May 15, 2007 2:32 AM
> To: PostgreSQL-development
> Cc: Simon Riggs; Zeugswetter Andreas ADI SD; CK.Tan; Luke
> Lonergan; Jeff Davis
> Subject: Re: [HACKERS] Seq scans roadmap
>
> Just to keep you guys informed, I've been busy testing and
> pondering over different buffer ring strategies for vacuum,
> seqscans and copy.
> Here's what I'm going to do:
>
> Use a fixed size ring. Fixed as in it doesn't change after the
> ring is initialized; however, different kinds of scans use
> differently sized rings.
>
> I said earlier that it'd be an invasive change to see if a
> buffer needs a WAL flush and choose another victim if that's
> the case. I looked at it again and found a pretty clean way
> of doing that, so I took that approach for seq scans.
>
> 1. For VACUUM, use a ring of 32 buffers. 32 buffers is small
> enough to give the L2 cache benefits and keep cache pollution
> low, but at the same time it's large enough that it keeps the
> need to WAL flush reasonable (1/32 of what we do now).
>
> 2. For sequential scans, also use a ring of 32 buffers, but
> whenever a buffer in the ring would need a WAL flush to
> recycle, we throw it out of the buffer ring instead. On
> read-only scans (and scans that only update hint bits) this
> gives the L2 cache benefits and doesn't pollute the buffer
> cache. On bulk updates, it's effectively the current
> behavior. On scans that do some updates, it's something in
> between. In all cases it should be no worse than what we have
> now. 32 buffers should be large enough to leave a "cache
> trail" for Jeff's synchronized scans to work.
>
> 3. For COPY that doesn't write WAL, use the same strategy as
> for sequential scans. This keeps the cache pollution low and
> gives the L2 cache benefits.
>
> 4. For COPY that writes WAL, use a large ring of 2048-4096
> buffers. We want to use a ring that can accommodate 1 WAL
> segment worth of data, to avoid having to do any extra WAL
> flushes, and the WAL segment size is 2048 pages in the
> default configuration.
>
> Some alternatives I considered but rejected:
>
> * Instead of throwing away dirtied buffers in seq scans,
> accumulate them in another fixed sized list. When the list
> gets full, do a WAL flush and put them on the shared freelist
> or a backend-private freelist. That would eliminate the cache
> pollution of bulk DELETEs and bulk UPDATEs, and it could be
> used for vacuum as well. I think this would be the optimal
> algorithm but I don't feel like inventing something that
> complicated at this stage anymore. Maybe for 8.4.
>
> * Using a different sized ring for 1st and 2nd vacuum phase.
> Decided that it's not worth the trouble, the above is already
> an order of magnitude better than the current behavior.
>
> I'm going to rerun the performance tests I ran earlier with
> the new patch, tidy it up a bit, and submit it in the next few
> days. This turned out to be an even more laborious patch to
> review than I thought. While the patch is short and in the
> end turned out to be very close to Simon's original patch,
> there are many different usage scenarios that need to be
> catered for and tested.
>
> I still need to check the interaction with Jeff's patch. This
> is close enough to Simon's original patch that I believe the
> results of the tests Jeff ran earlier are still valid.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
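
For reference, here is a rough sketch of the arithmetic behind the ring sizes in this thread, plus a toy model of the seq scan ring from point 2. It is illustration only, not PostgreSQL code. Assumptions: "256/blocksize" is read as 256 KB divided by the block size; the defaults are taken to be 8 KB blocks and 16 MB WAL segments; and every name below (ring_bytes, ring_buffers_for_target, ScanRing, distinct_buffers) is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def ring_bytes(nbuffers, block_size):
    """Memory covered by a ring of nbuffers blocks."""
    return nbuffers * block_size


def ring_buffers_for_target(target_bytes, block_size):
    """Buffers that fit in target_bytes (never fewer than one)."""
    return max(1, target_bytes // block_size)


# Assumption: the proposal means a 256 KB ring regardless of block size.
TARGET_RING_BYTES = 256 * KB

# Assumed defaults: 16 MB WAL segment, 8 KB block.
WAL_SEGMENT_PAGES = (16 * MB) // (8 * KB)


class ScanRing:
    """Toy model of a fixed-size ring that drops buffers needing a WAL flush."""

    def __init__(self, size, alloc_shared):
        self.slots = [None] * size        # buffer ids currently in the ring
        self.cursor = 0
        self.alloc_shared = alloc_shared  # callable returning a fresh buffer id
        self.thrown_out = []              # buffers that left the ring

    def next_buffer(self, needs_wal_flush):
        i = self.cursor
        self.cursor = (self.cursor + 1) % len(self.slots)
        buf = self.slots[i]
        if buf is None or needs_wal_flush(buf):
            # Recycling would force a WAL flush (or the slot is empty):
            # leave the old buffer to the shared cache and take a new one.
            if buf is not None:
                self.thrown_out.append(buf)
            buf = self.alloc_shared()
            self.slots[i] = buf
        return buf


def distinct_buffers(ring_size, n_pages, dirty):
    """Count distinct buffers touched while scanning n_pages pages."""
    ids = iter(range(n_pages))
    ring = ScanRing(ring_size, lambda: next(ids))
    return len({ring.next_buffer(lambda b: dirty) for _ in range(n_pages)})


if __name__ == "__main__":
    for block_size in (8 * KB, 16 * KB, 32 * KB):
        fixed = ring_bytes(32, block_size) // KB
        scaled = ring_buffers_for_target(TARGET_RING_BYTES, block_size)
        print(f"{block_size // KB} KB blocks: 32 buffers = {fixed} KB, "
              f"256 KB ring = {scaled} buffers")
    print("WAL segment pages:", WAL_SEGMENT_PAGES)
    print("read-only scan touches", distinct_buffers(32, 1000, False), "buffers")
    print("bulk update touches", distinct_buffers(32, 1000, True), "buffers")
```

Under these assumptions a 32-buffer ring is 256 KB at 8 KB blocks but 1 MB at 32 KB blocks, which is the concern above. Dividing a fixed byte budget by the block size keeps the ring at 256 KB either way. In the toy ring, a read-only scan keeps reusing the same 32 buffers, while a scan that dirties every page takes a fresh buffer each time, matching the "effectively the current behavior" case.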