On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas <robertmh...@gmail.com> wrote: >> Frankly that analysis didn't make any sense to me at the time. >> Comparing integers is fast, sure, but it's still slower than not >> having to do any comparison at all. > > I think you're underestimating how much it costs to call the > datatype-specific comparator. My belief is that it's wicked > expensive.
I'm totally with you on the datatype-specific comparator being expensive. But we must be talking about two different scenarios. I don't see why Tom's algorithm was slower than Knuth's unless there was a bug. It seems to me it should perform exactly as many comparator calls but save the integer comparisons and the extra space for them. In the current algorithm, Knuth's, it compares the new tuple against the most recently emitted tuple to set the run number then adds it to the bottom of the heap and sifts up. If it's from the current run it does a bunch of integer comparisons and skips past the next run quickly. If it's from the next run it sifts up to the right spot in the next run and if it hits the top of the next run it does a quick integer comparison and stops. In Tom's algorithm it would perform the same comparison against the recently emitted tuple to set the run number and either add it to the unsorted list or the bottom of the heap. If it's added to the unsorted list we're done, if it's added to the bottom of the heap it performs the same siftup it would have done above except that it skips the bottom few levels of the heap -- all of which were fast integer comparisons. They might be fast but they can't be faster than doing nothing at all. When the heap is empty then we do a heapify which currently would be exactly the same O(nlogn) comparisons that we do maintaining the bottom of the heap but with a O(n) heapify would be fewer. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers