On Tue, Mar 20, 2012 at 3:41 PM, Greg Stark <st...@mit.edu> wrote: > On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas <robertmh...@gmail.com> wrote: >> No. It does the opposite: it slows it down. This is a highly >> surprising result but it's quite repeatable: removing comparisons >> makes it slower. As previously pontificated, I think this is probably >> because the heap can fill up with next-run tuples that are cheap to >> compare against, and that spares us having to do "real" comparisons >> involving the actual datatype comparators. > > Frankly that analysis didn't make any sense to me at the time. > Comparing integers is fast, sure, but it's still slower than not > having to do any comparison at all.
I think you're underestimating how much it costs to call the datatype-specific comparator. My belief is that it's wicked expensive. The COMPARETUP() macro extracts a function pointer from the Tuplesortstate and calls it; we end up comparetup_heap, which calls ApplySortComparator(), which pulls the comparator function out of the state and then calls that. Since I was sorting strings, which have no sortsupport, we then end up in comparison_shim(), which uses the FunctionCallInvoke method to extract the actual function pointer and jump into bttextcmp(), which unpacks its arguments and then calls text_cmp(), which unpacks its arguments some more and then calls varstr_cmp() where the actual work happens. That's not trivial either - we have to call lc_collate_is_c() and then memcmp(). I have no problem believing that 6 levels of function calls, 3 of which involve jumps through function pointers, followed by lc_collate_is_c() and memcmp() is 100x or more as expensive than the lone integer comparison that happens when the tupindex values don't match - that's a single instruction. > Fwiw I think more interesting than improving tapesort would be > implementing wholly different algorithms like radix sort or the > repeated quicksort. Being able to handle different algorithms that > require a different API would be the first step to being able to > handle parallel algorithms using threads or GPUs. Yeah, I think I'm going to try implementing quicksort-the-whole-batch-and-dump-it-out-as-a-run algorithm, just to see how good or bad that is compared to what we have now. We may not end up doing anything that remotely resembles that, in the end, but I want to see the numbers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers