Tom,

On 3/5/07 7:58 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:
> I looked a bit at the Linux code that's being used here, but it's all
> x86_64 assembler which is something I've never studied :-(.

Here's the C wrapper routine in Solaris:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/move.c

Here's the x86 assembler routine for Solaris:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/ml/copy.s

The actual uiomove routine is a simple wrapper that calls the assembler kcopy or xcopyout routines. There are two versions (for Opteron): one uses the NTA instructions that bypass the L2 cache on writing to avoid L2 cache pollution, and the other writes normally, through the L2 cache. Which one is used depends on a global parameter based on the size of the I/O. It is tuned to identify operations that might pollute the L2 cache (sound familiar?)

I think what we're seeing is a generic artifact of the write-through behavior of the cache. I wouldn't expect this to get any better with DIRECTIO to the shared_buffers in pgsql: if we iterate over a large number of user space buffers, we'll still hit the increased L2 thrashing.

I think we're best off with a hybrid approach: when we "detect" a seq scan larger (much larger?) than buffer cache, we can switch into the "cache bypass" behavior, much like the above code uses the NTA instruction when appropriate.

We can handle syncscan using a small buffer space.

- Luke
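
To make the hybrid idea a bit more concrete, here is a minimal sketch in C of the kind of size-based switch I have in mind. It is not PostgreSQL or Solaris code: the names (choose_scan_strategy, scan_buffer_budget, ScanStrategy) and the tuning values (BYPASS_FACTOR, RING_PAGES) are all made up for illustration, and the "much larger" factor of 4 is only a placeholder for whatever testing would suggest.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning knobs; neither exists in PostgreSQL or Solaris. */
#define BYPASS_FACTOR 4   /* "much larger": scan is at least 4x the buffer cache */
#define RING_PAGES    32  /* small private set of buffers reused by a bypass scan */

typedef enum {
    SCAN_USE_CACHE,     /* normal path: pages go through the whole buffer cache */
    SCAN_BYPASS_CACHE   /* "cache bypass": recycle a small ring of buffers */
} ScanStrategy;

/*
 * Pick a strategy from the relation size and the buffer cache size,
 * both in pages. Dividing rel_pages (instead of multiplying
 * cache_pages) keeps the comparison free of integer overflow.
 */
static ScanStrategy
choose_scan_strategy(size_t rel_pages, size_t cache_pages)
{
    if (rel_pages / BYPASS_FACTOR >= cache_pages)
        return SCAN_BYPASS_CACHE;
    return SCAN_USE_CACHE;
}

/*
 * Number of buffers the scan is allowed to touch. A bypass scan is
 * capped at RING_PAGES (or the whole cache, if that is even smaller).
 */
static size_t
scan_buffer_budget(ScanStrategy strategy, size_t cache_pages)
{
    if (strategy == SCAN_BYPASS_CACHE)
        return RING_PAGES < cache_pages ? RING_PAGES : cache_pages;
    return cache_pages;
}
```

With these placeholder values, a 4000-page scan against a 1000-page cache would take the bypass path and be limited to 32 buffers, while a 3999-page scan would still use the cache normally. The decision is made once per scan from its size, which mirrors how the copy code above chooses between the NTA and normal routines based on the size of the I/O.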