On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Frankly that analysis didn't make any sense to me at the time.
>> Comparing integers is fast, sure, but it's still slower than not
>> having to do any comparison at all.
>
> I think you're underestimating how much it costs to call the
> datatype-specific comparator.  My belief is that it's wicked
> expensive.

I'm totally with you on the datatype-specific comparator being expensive.

But we must be talking about two different scenarios. I don't see why
Tom's algorithm was slower than Knuth's unless there was a bug. It
seems to me it should perform exactly as many comparator calls but
save the integer comparisons and the extra space for them.

In the current algorithm, Knuth's, it compares the new tuple against
the most recently emitted tuple to set the run number then adds it to
the bottom of the heap and sifts up. If it's from the current run it
does a bunch of integer comparisons and skips past the next run
quickly. If it's from the next run it sifts up to the right spot in
the next run and if it hits the top of the next run it does a quick
integer comparison and stops.

In Tom's algorithm it would perform the same comparison against the
recently emitted tuple to set the run number and either add it to the
unsorted list or the bottom of the heap. If it's added to the unsorted
list we're done, if it's added to the bottom of the heap it performs
the same siftup it would have done above except that it skips the
bottom few levels of the heap -- all of which were fast integer
comparisons. They might be fast but they can't be faster than doing
nothing at all. When the heap is empty then we do a heapify which
currently would be exactly the same O(nlogn) comparisons that we do
maintaining the bottom of the heap but with a O(n) heapify would be
fewer.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to