Title: Re: [PATCHES] WIP: further sorting speedup
The improvement was pre-Simon’s patch, and it came from implementing a single pass merge instead of a variable pass based on the number of tapes, as it is in Knuth’s tape algorithm.  Also, the additional tricks in logtape.c were higher in the profile than what I see here.

Simon’s patch had the effect of reducing the number of passes by increasing the number of tapes depending on the memory available, but that’s a long tail effect as seen in figure (70?) in Knuth.

Where I’d like this to go is the implementation of a two pass “create runs, merge”, where the second merge can be avoided unless random access is needed (as discussed previously on list).

In the run creation phase, the idea would be to implement something like quicksort or another L2-cache friendly algorithm (ideas?)

- Luke


On 2/19/06 8:19 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

"Luke Lonergan" <[EMAIL PROTECTED]> writes:
> So you know, we=B9ve done some more work on the external sort to remove the
> =B3tape=B2 abstraction from the code, which makes a significant improvement.

Improvement where?  That code's down in the noise so far as I can tell.
I see results like this (with the patched code):

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.08 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 240000
samples  %        symbol name
147310   31.9110  tuplesort_heap_siftup
68381    14.8130  comparetup_index
34063     7.3789  btint4cmp
22573     4.8899  AllocSetAlloc
19317     4.1845  writetup_index
18953     4.1057  tuplesort_gettuple_common
18100     3.9209  mergepreread
17083     3.7006  GetMemoryChunkSpace
12527     2.7137  LWLockAcquire
11686     2.5315  LWLockRelease
6172      1.3370  tuplesort_heap_insert
5392      1.1680  index_form_tuple
5323      1.1531  PageAddItem
4943      1.0708  LogicalTapeWrite
4525      0.9802  LogicalTapeRead
4487      0.9720  LockBuffer
4217      0.9135  heapgettup
3891      0.8429  IndexBuildHeapScan
3862      0.8366  ltsReleaseBlock

It appears that a lot of the cycles blamed on tuplesort_heap_siftup are
due to cache misses associated with referencing memtuples[] entries
that have fallen out of L2 cache.  Not sure how to improve that though.

                        regards, tom lane



Reply via email to