On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes <jeff.ja...@gmail.com> wrote: > On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas <robertmh...@gmail.com> wrote: > ... >> >> One thing that seems inefficient to me about our current algorithm is >> the use of the run number as a leading column in the sort key. >> There's no real reason why the tuples destined for the next run need >> to be maintained in heap order; we could just store them unordered and >> heapify the whole lot of them when it's time to start the next run. >> That ought to save comparison cycles for the current run, since the >> heap will be less deep, and there are other possible savings as well - >> in particular, an unordered array of tuples can be heapify'd in linear >> time, whereas our current algorithm is O(n lg n). However, when I >> actually implemented this, I found that doing it this way was a loser. >> In fact, excluding the next-run tuples from the heap for the current >> run was a BIG loser even before you add in the time required to >> re-heapify. This confused the daylights out of me for a while, >> because it's hard to understand how insert and siftup can be slower on >> a small heap than a large one. >> >> My working theory is that, even though we must necessarily do more >> comparisons when manipulating a larger heap, many of those comparisons >> are resolved by comparing run numbers, and that's a lot cheaper than >> invoking the real comparator. For example, if we insert into a heap >> whose rightmost child is in the next run, and the new tuple goes into >> the current run, and the current size of the heap is such that the >> initial position of the new element is a descendent of the right node, >> it will very quickly crash through all of the next-run tuples and only >> need one REAL comparison - against the root. Either the new element >> ends up as the root, or it ends up as the direct child of the root; >> now we remove the root and, perhaps, replace it with its rightmost >> child, meaning that the next element we read in can do the exact same >> thing all over again. If we exclude the next-run elements from the >> heap, then the average depth of the heap when inserting a new element >> is smaller, but all the elements are in the same-run, and we have to >> invoke the real comparator every time. In other words, those next-run >> tuples act as dummies which effectively create a heap of uneven depth, >> and the average depth encountered when inserting tuples turns out to >> be less than what we get by pulling out the dummies and making the >> depth uniform. >> >> This makes me kind of nervous, because I don't understand why things >> should work out so well as all that. If throwing some dummy elements >> into the heap makes things work better, then maybe we should throw in >> some more. Or maybe it would work better to take some but not all of >> them out. There certainly doesn't seem to be any indication in the >> code that this is an anticipated effect, and it seems an awfully >> providential accident. > > Is there a patch or a git repo available for this change? I'd like to > see if I can analyze that if I get some spare time.
Sorry for the slow response to this. Patch is attached (crummy-external-sort.patch). > Clever. Rather than doing a binary search of the path, it might make > sense to do a reverse search of it. The tuple is likely to send up > somewhere at the bottom of the heap, both because that is where most > of the data is, and because the tuple being reinserted just came from > the bottom so it is likely to be biased that way. Patches for these approaches also attached (siftup-using-bsearch.patch and siftup-reverse-linear.patch). Neither approach seemed terribly promising in my testing, IIRC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
crummy-external-sort.patch
Description: Binary data
siftup-reverse-linear.patch
Description: Binary data
siftup-using-bsearch.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers