Re: [PATCHES] COPY FROM performance improvements

Luke Lonergan Wed, 10 Aug 2005 02:07:32 -0700

Simon,
 
> That part of the code was specifically written to take advantage of
> processing pipelines in the hardware, not because the actual theoretical
> algorithm for that approach was itself faster.


Yup, good point.
 
> Nobody's said what compiler/hardware they have been using, so since both
> Alon and Tom say their character finding logic is faster, it is likely
> to be down to that? Name your platforms gentlemen, please.

In this case, we've been using gcc (3.2.3 on RHEL3 Linux, 3.4.3 on Solaris 10)
on Opteron, Intel Xeon, and Pentium 4.  Alon's parse-only performance
comparisons were done on an HT-enabled 3.0GHz P4 running RHEL3 with gcc 3.2.3,
probably at -O2, but possibly -O3.
 
Note that the level of microparallelism on upcoming CPUs is increasing with 
increasing pipeline depth.  Though there will be a step back on the Intel line 
with the introduction of the Centrino-based Xeon cores in 2006/7, other CPUs 
continue the trend, and I expect the next generation of multi-core CPUs to 
possibly introduce threaded micro-architectures which can also be scheduled as 
pipelines.  The gcc 4 compiler introduces auto-vectorization, which may speed 
up some loops.
 
I think the key thing is to make as much parallelism apparent to the compiler 
as possible, which generally means writing simple loops.  This means faster 
code on all modern CPUs, and it won't slow down older CPUs.
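
As a rough sketch of the kind of loop I mean (count_char is a hypothetical 
helper for illustration, not code from the COPY patch): a fixed trip count and 
no data-dependent branch in the body.  Whether a given compiler actually 
pipelines or vectorizes it depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count occurrences of c in buf[0..len-1].
 * Hypothetical helper, not taken from the PostgreSQL source.
 */
size_t
count_char(const char *buf, size_t len, char c)
{
    size_t n = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    /* The comparison yields 0 or 1, so the loop body has no branch. */
    for (i = 0; i < len; i++)
        n += (buf[i] == c);

    return n;
}
```

For example, count_char("a,b,,c", 6, ',') counts the three delimiters in one 
pass over the buffer.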

> My feeling is that we may learn something here that applies more widely
> across many parts of the code.

Yes, I think one thing we've learned is that there are important parts of the 
code, namely those in the data path (COPY, sort, spill to disk, etc.), that 
are in dire need of optimization.  For instance, the character-at-a-time 
fgetc() pattern should be banned everywhere in the data path.
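
To illustrate the alternative to fgetc(), here is a minimal sketch that reads 
in large blocks with fread() and scans each block in memory with memchr().  
The function name count_lines_blockwise and the 64KB block size are 
assumptions for the example, not what the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define READ_BLOCK_SIZE 65536   /* assumed block size, tune as needed */

/*
 * Count newline characters in a stream, one block at a time.
 * Hypothetical helper, not taken from the PostgreSQL source.
 */
long
count_lines_blockwise(FILE *fp)
{
    char    buf[READ_BLOCK_SIZE];
    long    lines = 0;
    size_t  nread;

    assert(fp != NULL);

    /* One library call per block instead of one per character. */
    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        const char *p = buf;
        const char *end = buf + nread;

        /* Scan the in-memory block for newlines. */
        while ((p = memchr(p, '\n', (size_t) (end - p))) != NULL)
        {
            lines++;
            p++;            /* resume just past the newline found */
        }
    }

    return lines;
}
```

The per-character call overhead moves out of the inner loop, and the scan runs 
over a contiguous buffer that the compiler and the C library can handle well.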
 
BTW - we are tracking down (in our spare time :-() the extremely slow sort 
performance.  We're seeing sort rates of 1.7MB/s on our fastest machines, even 
when work_mem is equal to the square root of the sort set.  This is a 
*serious* problem for us and we aren't getting to it - ideas are welcome.
 
Optimization here means both the use of good fundamental algorithms and 
micro-optimization (minimize memory copies, expose long runs of operations to 
the compiler, maximize computational intensity by working in cache-resident 
blocks, etc).
 
- Luke



