We’ll test this sometime soon and get back to you. We’re kind of jammed this week, hopefully we’ll get some time.
So you know, we’ve done some more work on the external sort to remove the “tape” abstraction from the code, which makes a significant improvement. This involved removing both the Knuth tapes, and the logtape.c codepath. The result is a reasonable improvement in performance (tens of percent), and a dramatic reduction in the amount of code.
Since we’re looking for a 4-fold improvement based on comparisons to “other commercial databases”, we feel we’re not done yet. Our next step (before we got jammed getting our latest MPP release out) was to implement these:
- Locate the cause for the excessive time in heap_getattr (you just did it)
- Implement something other than replacement selection for creating runs to optimize cache use
On 2/19/06 6:40 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:
After applying Simon's recent sort patch, I was doing some profiling and
noticed that sorting spends an unreasonably large fraction of its time
extracting datums from tuples (heap_getattr or index_getattr). The
attached patch does something about this by pulling out the leading sort
column of a tuple when it is received by the sort code or re-read from a
"tape". This increases the space needed by 8 or 12 bytes (depending on
sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as
the on-disk representation goes. The effort needed to extract the datum
at this point is well repaid because the tuple will normally undergo
multiple comparisons while it remains in memory. In some quick tests
the patch seemed to make for a significant speedup, on the order of 30%,
despite increasing the number of runs emitted because of the smaller
The choice to pull out just the leading column, rather than all columns,
is driven by concerns of (a) code complexity and (b) memory space.
Having the extra columns pre-extracted wouldn't buy anything anyway
in the common case where the leading key determines the result of
This is still WIP because it leaks memory intra-query (I need to fix it
to clean up palloc'd space better). I thought I'd post it now in case
anyone wants to try some measurements for their own favorite test cases.
In particular it would be interesting to see what happens for a
multi-column sort with lots of duplicated keys in the first column,
which is the case where the least advantage would be gained.
regards, tom lane