On 09/08/2016 03:36 AM, Peter Geoghegan wrote:
On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
The reason it's called siftup is that that's what Knuth calls it.
See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of
Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.

I see that Knuth explains that these steps form the siftup procedure.
What steps does tuplesort_heap_insert correspond to, if any?

Hmm. The loop in tuplesort_heap_siftup() indeed looks the same as Knuth's steps H3-H8. But the start is different. Knuth's algorithm begins with the situation that the top node of the sub-heap is in wrong place, and it pushes it downwards, until the whole sub-heap satisfies the heap condition again. But tuplesort_heap_siftup() begins by moving the last value in the heap to the top, and then it does the H3-H8 steps to move it to the right place.

Using Wikipedia's terms, tuplesort_heap_siftup() function as whole is the Extract operation. And the loop inside it corresponds to the Max-Heapify operation, which is the same as Knuth's "siftup".

I still think tuplesort_heap_siftup is a confusing name, although I'm not sure that Peter's "compact" is much better. I suggest that we rename it to tuplesort_heap_delete_top(). In comments within the function, explain that the *loop* corresponds to the "siftup" in Knuth's book.

Interestingly, as Knuth explains the siftup algorithm, it performs a "replace the top" operation, rather than "remove the top". But we use it to remove the top, by moving the last element to the top and running the algorithm after that. Peter's other patch, to change the way we currently replace the top node, to do it in one pass rather than as delete+insert, is actually Knuth's siftup algorithm.

Peter, looking at your "displace" patch in this light, I think tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm calling it now), should share a common subroutine. Displace replaces the top node with the new node passed as argument, and then calls the subroutine to push it down to the right place. Delete_top replaces the top node with the last value in the heap, and then calls the subroutine. Or perhaps delete_top should remove the last value from the heap, and then call displace() with it, to re-insert it.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to