Qingqing,

On 12/8/05 8:07 PM, "Qingqing Zhou" <[EMAIL PROTECTED]> wrote:
> /* prefetch ahead */
> __asm__ __volatile__ (
>     "1: prefetchnta 128(%0)\n"
>     : : "r" (s) : "memory" );

I think prefetch at this granularity is already handled by the optimizer
in recent GNU compilers, and the memory streaming operations in the
Pentium 4 ISA are likewise part of the standard optimizations gcc can do
now.

What I think would be tremendously beneficial is to implement L2 cache
blocking in certain key code paths like sort.  What I mean by "cache
blocking" is performing as many operations as possible on one block of
memory (maybe 128 pages' worth for a 1MB cache), then moving to the next
batch of memory and performing all of the work on that batch, and so on.

The other thing to consider in conjunction with this would be maximizing
use of the instruction cache, increasing use of the parallel functional
units, and minimizing pipeline stalls.  The best way to do that is to
group operations into tighter loops and to separate out the branching.

So instead of structures like this:

    function_row_at_a_time(row)
        if conditions
            do some work
        else if other
            do different work
        else if error
            print error_log

you'd have:

    function_buffer_at_a_time(buffer_of_rows)
        loop on sizeof(buffer_of_rows) / sizeof_L2cache
            do a lot of work on each row
        loop on sizeof(buffer_of_rows) / sizeof_L2cache
            if error
                exit

The ideas in the above optimizations:
- Delay work until a buffer of rows can be gathered
- Increase the "computational intensity" of the loops by putting more
  instructions together
- While in loops doing lots of work, avoid branches / jumps

I've appended a couple of rough C sketches of these ideas below my sig.

- Luke
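
For the prefetch itself: the portable spelling of that hint in modern gcc
is __builtin_prefetch rather than inline asm.  A minimal sketch -- the
copy loop and function name are mine, just to show the builtin; the third
argument (locality 0) is the closest match to prefetchnta:

    #include <stddef.h>

    /*
     * Copy with a software prefetch running ~128 bytes ahead, the same
     * distance as the asm above.  Second argument 0 = prefetch for read,
     * third argument 0 = no temporal locality (like prefetchnta).
     */
    static void
    copy_with_prefetch(char *dst, const char *src, size_t len)
    {
        size_t  i;

        for (i = 0; i < len; i++)
        {
            /* one hint per 64-byte cache line is enough */
            if ((i & 63) == 0)
                __builtin_prefetch(src + i + 128, 0, 0);
            dst[i] = src[i];
        }
    }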
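
And here is a rough C rendering of the buffer-at-a-time shape above.  All
the names here (Row, do_work, process_buffer, the 1MB block size) are
hypothetical, just to show the two-pass structure: one tight loop that
does the work an L2-sized block at a time, and a second loop that hoists
the error branch out of the hot path:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define L2_BYTES    (1024 * 1024)       /* 1MB L2, as in the example */

    typedef struct
    {
        int     key;
        bool    failed;
    } Row;                                  /* hypothetical row type */

    #define BLOCK_ROWS  (L2_BYTES / sizeof(Row))

    /* hypothetical per-row work; sets a flag instead of branching */
    static void
    do_work(Row *r)
    {
        r->failed = (r->key < 0);
    }

    static int
    process_buffer(Row *rows, size_t nrows)
    {
        size_t  base,
                i;

        for (base = 0; base < nrows; base += BLOCK_ROWS)
        {
            size_t  end = base + BLOCK_ROWS;

            if (end > nrows)
                end = nrows;

            /* pass 1: all the work on one L2-sized block, no branching */
            for (i = base; i < end; i++)
                do_work(&rows[i]);

            /* pass 2: check errors while the block is still in cache */
            for (i = base; i < end; i++)
            {
                if (rows[i].failed)
                {
                    fprintf(stderr, "row %zu failed\n", i);
                    return -1;
                }
            }
        }
        return 0;
    }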