Heikki,

With a 32KB blocksize, 32 buffers = 1MB, which spoils the CPU
L2 cache effect.

How about using 256KB/blocksize buffers instead, so the ring
stays at 256KB regardless of blocksize?
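
Something like this, sizing the ring by a fixed byte budget
rather than a fixed buffer count (just a sketch; the names and
the 256KB budget are mine):

    /* Size the buffer ring by a byte budget so it stays
     * L2-sized at any blocksize. */
    #define RING_BYTE_BUDGET (256 * 1024)

    static int
    ring_size_for_blocksize(int blcksz)
    {
        int     nbuffers = RING_BYTE_BUDGET / blcksz;

        /* 32KB blocks -> 8 buffers, 8KB blocks -> 32 buffers */
        return (nbuffers < 1) ? 1 : nbuffers;
    }

At the default 8KB blocksize that still comes out to the 32
buffers you chose.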

- Luke

> -----Original Message-----
> From: Heikki Linnakangas [mailto:[EMAIL PROTECTED] On 
> Behalf Of Heikki Linnakangas
> Sent: Tuesday, May 15, 2007 2:32 AM
> To: PostgreSQL-development
> Cc: Simon Riggs; Zeugswetter Andreas ADI SD; CK.Tan; Luke 
> Lonergan; Jeff Davis
> Subject: Re: [HACKERS] Seq scans roadmap
> 
> Just to keep you guys informed, I've been busy testing and 
> pondering over different buffer ring strategies for vacuum, 
> seqscans and copy. 
> Here's what I'm going to do:
> 
> Use a fixed-size ring. Fixed as in it doesn't change after 
> the ring is initialized; however, different kinds of scans 
> use differently sized rings.
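> 
> The ring itself is trivial; roughly (a sketch with made-up 
> names, not the actual patch code):
> 
>     /* Fixed-size ring of buffers: a circular array of 
>      * buffer ids, sized once at scan start, never resized. */
>     typedef struct BufferRing
>     {
>         int     nbuffers;   /* fixed at initialization */
>         int     current;    /* next slot to recycle */
>         Buffer *buffers;    /* palloc'd array of nbuffers ids */
>     } BufferRing;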
> 
> I said earlier that it'd be an invasive change to check 
> whether a buffer needs a WAL flush and choose another victim 
> if that's the case. I looked at it again and found a pretty 
> clean way of doing that, so I took that approach for seq 
> scans.
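> 
> The check boils down to comparing the buffer's LSN against 
> the point up to which WAL has already been flushed. 
> Schematically (hypothetical code, the real bufmgr details 
> differ):
> 
>     /* Would recycling this buffer force a WAL flush? */
>     static bool
>     buffer_needs_wal_flush(Buffer buf, XLogRecPtr flushed_upto)
>     {
>         BufferDesc *bufHdr = &BufferDescriptors[buf - 1];
> 
>         /* Dirty, with an LSN past the flushed WAL point? */
>         return (bufHdr->flags & BM_DIRTY) &&
>                XLByteLT(flushed_upto, BufferGetLSN(bufHdr));
>     }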
> 
> 1. For VACUUM, use a ring of 32 buffers. 32 buffers is small 
> enough to give the L2 cache benefits and keep cache pollution 
> low, but at the same time large enough to keep the number of 
> WAL flushes reasonable (1/32 of what we do now).
> 
> 2. For sequential scans, also use a ring of 32 buffers, but 
> whenever a buffer in the ring would need a WAL flush to be 
> recycled, we throw it out of the buffer ring instead (see 
> the sketch after point 4). On read-only scans (and scans 
> that only update hint bits) this gives the L2 cache benefits 
> and doesn't pollute the buffer cache. On bulk updates, it's 
> effectively the current behavior. On scans that do some 
> updates, it's something in between. In all cases it should 
> be no worse than what we have now. 32 buffers should be 
> large enough to leave a "cache trail" for Jeff's 
> synchronized scans to work.
> 
> 3. For COPY that doesn't write WAL, use the same strategy as 
> for sequential scans. This keeps the cache pollution low and 
> gives the L2 cache benefits.
> 
> 4. For COPY that writes WAL, use a large ring of 2048-4096 
> buffers. We want a ring that can accommodate one WAL 
> segment's worth of data, to avoid having to do any extra WAL 
> flushes; in the default configuration a WAL segment is 16MB, 
> which at the default 8KB blocksize is 16MB / 8KB = 2048 
> pages.
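> 
> Here's the sketch promised under point 2 (hypothetical names 
> again): when the next ring slot would need a WAL flush, evict 
> that buffer from the ring and fall back to normal allocation.
> 
>     static Buffer
>     ring_next_victim(BufferRing *ring, XLogRecPtr flushed_upto)
>     {
>         int     slot = ring->current;
>         Buffer  buf = ring->buffers[slot];
> 
>         ring->current = (slot + 1) % ring->nbuffers;
> 
>         if (buf != InvalidBuffer &&
>             buffer_needs_wal_flush(buf, flushed_upto))
>         {
>             /* Don't flush WAL just to recycle: evict the 
>              * buffer from the ring; the normal replacement 
>              * policy will reclaim it eventually. */
>             ring->buffers[slot] = InvalidBuffer;
>             buf = InvalidBuffer;
>         }
> 
>         /* InvalidBuffer means the caller allocates a victim 
>          * normally and stores it back into the ring. */
>         return buf;
>     }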
> 
> Some alternatives I considered but rejected:
> 
> * Instead of throwing away dirtied buffers in seq scans, 
> accumulate them in a separate fixed-size list (sketch after 
> this list). When the list gets full, do one WAL flush and 
> put the buffers on the shared freelist or a backend-private 
> freelist. That would eliminate the cache pollution of bulk 
> DELETEs and bulk UPDATEs, and it could be used for vacuum as 
> well. I think this would be the optimal algorithm, but I 
> don't feel like inventing something that complicated at this 
> stage. Maybe for 8.4.
> 
> * Using a differently sized ring for the 1st and 2nd vacuum 
> phases. I decided it's not worth the trouble; the above is 
> already an order of magnitude better than the current 
> behavior.
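> 
> For the record, a sketch of that first alternative 
> (hypothetical names): batch the dirtied buffers and amortize 
> a single WAL flush over the whole batch.
> 
>     static void
>     dirty_list_add(DirtyList *list, Buffer buf)
>     {
>         list->buffers[list->nused++] = buf;
>         if (list->nused == list->capacity)
>         {
>             /* One XLogFlush covers the whole batch... */
>             XLogFlush(dirty_list_max_lsn(list));
>             /* ...after which every buffer in it can be 
>              * recycled cheaply via the shared freelist or a 
>              * backend-private one. */
>             dirty_list_release_to_freelist(list);
>             list->nused = 0;
>         }
>     }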
> 
> 
> I'm going to rerun the performance tests I ran earlier with 
> the new patch, tidy it up a bit, and submit it in the next 
> few days. This turned out to be an even more laborious patch 
> to review than I thought. While the patch is short and in 
> the end very close to Simon's original patch, there are many 
> different usage scenarios that need to be catered for and 
> tested.
> 
> I still need to check the interaction with Jeff's patch. This 
> is close enough to Simon's original patch that I believe the 
> results of the tests Jeff ran earlier are still valid.
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
> 
> 

