Tom,

On 3/5/07 7:58 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> I looked a bit at the Linux code that's being used here, but it's all
> x86_64 assembler which is something I've never studied :-(.

Here's the C wrapper routine in Solaris:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/move.c

Here's the x86 assembler routine for Solaris:
  
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32/ml/copy.s

The actual uiomove routine is a simple wrapper that calls the assembler
kcopy or xcopyout routines.  There are two versions (for Opteron): one uses
the NTA (non-temporal) instructions, which bypass the L2 cache on writes to
avoid L2 cache pollution, and the other writes normally, through the L2
cache.  Which one is used depends on a global parameter and the size of the
I/O.  It is tuned to identify operations that might pollute the L2 cache
(sound familiar?).
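
To make the size-based dispatch concrete, here is a rough user-space sketch in
C.  It is not the Solaris code: the names (nta_copy_threshold,
choose_copy_mode, hybrid_copy) and the 128 KB cutoff are made up for
illustration, and it uses the SSE2 streaming-store intrinsics where available,
falling back to plain memcpy elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical tunable: copies at least this large skip the cache on write. */
static size_t nta_copy_threshold = 128 * 1024;

typedef enum { COPY_CACHED, COPY_NONTEMPORAL } copy_mode;

/* Pick the copy flavor from the size of the I/O alone. */
static copy_mode choose_copy_mode(size_t len)
{
    return len >= nta_copy_threshold ? COPY_NONTEMPORAL : COPY_CACHED;
}

/* Copy using non-temporal (streaming) stores when SSE2 is available. */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Streaming stores need a 16-byte aligned destination, so copy the
     * unaligned head the ordinary way first. */
    size_t head = (16 - ((uintptr_t) d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head; s += head; len -= head;

    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *) s);
        _mm_stream_si128((__m128i *) d, v);   /* write around the cache */
        d += 16; s += 16; len -= 16;
    }
    _mm_sfence();          /* order the streaming stores before later writes */
    memcpy(d, s, len);     /* remaining tail, fewer than 16 bytes */
#else
    memcpy(dst, src, len); /* no SSE2: fall back to the normal copy */
#endif
}

/* Dispatch on size; returns the mode used so callers can observe it. */
static copy_mode hybrid_copy(void *dst, const void *src, size_t len)
{
    copy_mode mode = choose_copy_mode(len);

    if (mode == COPY_NONTEMPORAL)
        copy_nontemporal(dst, src, len);
    else
        memcpy(dst, src, len);
    return mode;
}
```

With these made-up numbers an 8 KB block copy stays on the cached path, while
a 1 MB transfer takes the streaming path.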

I think what we're seeing is a generic artifact of the write-through
behavior of the cache.  I wouldn't expect this to get any better with
DIRECTIO into shared_buffers in pgsql - if we iterate over a large number
of user-space buffers, we'll still hit the increased L2 thrashing.

I think we're best off with a hybrid approach - when we "detect" a seq scan
larger (much larger?) than the buffer cache, we can switch into the "cache
bypass" behavior, much like the above code uses the NTA instructions when
appropriate.

We can handle syncscan using a small buffer space.
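
A rough sketch of what I mean, in C.  None of this is existing pgsql code:
scan_state, scan_should_bypass, scan_next_buffer, RING_SLOTS, and the
"factor" knob are all hypothetical, and the modulo in the non-bypass path is
only a stand-in for the normal buffer replacement policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size of the small buffer space a large scan recycles. */
#define RING_SLOTS 32

typedef struct {
    bool   bypass;     /* true once the scan is judged "much larger" */
    size_t next_slot;  /* next ring slot to reuse while bypassing */
} scan_state;

/* "Detect" a scan much larger than the buffer cache.  factor is the
 * hypothetical "much larger?" multiplier; overflow is ignored here. */
static bool scan_should_bypass(size_t rel_pages, size_t pool_pages,
                               size_t factor)
{
    return rel_pages > pool_pages * factor;
}

/* Return the buffer slot to use for page_no of the scan. */
static size_t scan_next_buffer(scan_state *st, size_t pool_pages,
                               size_t page_no)
{
    if (st->bypass) {
        /* Cache-bypass mode: cycle through a handful of slots so the scan
         * never displaces more than RING_SLOTS buffers. */
        size_t slot = st->next_slot;

        st->next_slot = (st->next_slot + 1) % RING_SLOTS;
        return slot;
    }
    /* Stand-in for the normal replacement policy: any slot in the pool. */
    return page_no % pool_pages;
}
```

The point of the ring is that the working set of the big scan stays small
enough to sit in L2, whatever the size of the table.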

- Luke


