Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On Jul 31, 2015 4:22 AM, Jeremy Harris j...@wizmail.org wrote:
> On 30/07/15 02:05, Peter Geoghegan wrote:
>> Since heapification is now a big fraction of the total cost of a sort
>> sometimes, even where the heap invariant need not be maintained for any
>> length of time afterwards, it might be worth revisiting the patch to
>> make that an O(n) rather than a O(log n) operation [3].
>
> O(n log n)?
>
> Heapification is O(n) already, whether siftup (existing) or down.

They are both linear on average, but the way we currently do it has an
O(n log n) worst case, while the other way is linear even in the worst
case.

Cheers,

Jeff
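The worst-case gap Jeff describes is easy to observe directly. Below is a minimal sketch (not PostgreSQL code; the function names are mine) that counts sift steps for the two ways of building a min-heap on descending input, which forces every successive insertion to bubble all the way to the root:

```python
def build_by_insertion(items):
    """Insert elements one by one, sifting each up: O(n log n) worst case."""
    heap, steps = [], 0
    for x in items:
        heap.append(x)
        i = len(heap) - 1
        # Sift the new element up toward the root.
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            heap[(i - 1) // 2], heap[i] = heap[i], heap[(i - 1) // 2]
            i = (i - 1) // 2
            steps += 1
    return heap, steps

def build_bottom_up(items):
    """Floyd's method: sift down from the middle toward index 0: O(n) worst case."""
    heap, steps = list(items), 0
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:
            # Find the smaller of j's children, if any is smaller than j.
            small = j
            for c in (2 * j + 1, 2 * j + 2):
                if c < n and heap[c] < heap[small]:
                    small = c
            if small == j:
                break
            heap[j], heap[small] = heap[small], heap[j]
            j = small
            steps += 1
    return heap, steps

for n in (1024, 2048, 4096):
    worst = list(range(n, 0, -1))      # descending: worst case for insertion build
    _, ins = build_by_insertion(worst)
    _, bot = build_bottom_up(worst)
    print(n, ins, bot)                 # insertion steps grow ~n log n; bottom-up stays O(n)
```

Doubling n roughly doubles the bottom-up step count but more than doubles the insertion-build count, matching the linear vs. n log n worst cases.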
Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On Sat, Aug 1, 2015 at 9:49 AM, Jeremy Harris j...@wizmail.org wrote:
> On 31/07/15 18:31, Robert Haas wrote:
>> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
>>> Heapification is O(n) already, whether siftup (existing) or down.
>>
>> That's not my impression, or what Wikipedia says. Source?
>
> Measurements done last year:
>
>   http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
>   (spreadsheet attachment)
>
>   http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
>   (measurement procedure and spreadsheet explanation)

I don't think that running benchmarks is the right way to establish the
asymptotic runtime of an algorithm. I mean, if you test quicksort, it
will run in much less than O(n^2) time on almost any input. But that
does not mean that the worst-case run time is anything other than O(n^2).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
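Robert's quicksort example is worth making concrete: a benchmark on typical input can look n log n while the worst case is quadratic. A hedged illustration (textbook quicksort with a first-element pivot, not PostgreSQL's implementation) counting comparisons on random vs. already-sorted input:

```python
import random

def quicksort_comparisons(a):
    """Sort a in place with first-element-pivot quicksort; return comparison count."""
    comparisons = 0
    stack = [(0, len(a) - 1)]          # iterative, to avoid deep recursion on the worst case
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot, i = a[lo], lo
        # Lomuto-style partition around a[lo].
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return comparisons

random.seed(0)
n = 2000
random_cost = quicksort_comparisons(random.sample(range(n), n))
sorted_cost = quicksort_comparisons(list(range(n)))   # worst case: n*(n-1)/2 comparisons
print(random_cost, sorted_cost)
```

On the sorted input every partition is maximally lopsided, so the count is exactly n(n-1)/2, two orders of magnitude above the random-input cost at this size; no benchmark on random data would reveal that.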
Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On Sat, Aug 1, 2015 at 9:56 AM, Andres Freund and...@anarazel.de wrote:
> On 2015-07-31 13:31:54 -0400, Robert Haas wrote:
>> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
>>> Heapification is O(n) already, whether siftup (existing) or down.
>>
>> That's not my impression, or what Wikipedia says. Source?
>
> Building a binary heap via successive insertions is O(n log n), but
> building it directly is O(n). Looks like Wikipedia agrees too:
> https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap

That doesn't really address the sift-up vs. sift-down question. Maybe
I'm just confused about the terminology. I think that Wikipedia article
is saying that if you iterate from the middle element of an unsorted
array towards the beginning, establishing the heap invariant for every
item as you reach it, you will take only O(n) time. But that is not what
inittapes() does. It instead starts at the beginning of the array and
inserts each element one after the other. If this is any different from
building the heap via successive insertions, I don't understand how.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On 2015-07-31 13:31:54 -0400, Robert Haas wrote:
> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
>> Heapification is O(n) already, whether siftup (existing) or down.
>
> That's not my impression, or what Wikipedia says. Source?

Building a binary heap via successive insertions is O(n log n), but
building it directly is O(n). Looks like Wikipedia agrees too:
https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap

I'm pretty sure that there's a bunch of places where we intentionally
build a heap at once instead of successively. At least reorderbuffer.c
does so, and it looks like nodeMergeAppend does as well (that's why they
use binaryheap_add_unordered() and then binaryheap_build()).

Greetings,

Andres Freund
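The add-unordered-then-build pattern Andres points to has a direct Python analogue, shown here as a hedged sketch rather than the PostgreSQL API: collect the elements with no ordering maintained, then establish the heap invariant in a single pass. CPython's `heapq.heapify()` uses the linear bottom-up method.

```python
import heapq

items = [5, 1, 4, 1, 5, 9, 2, 6]

# "add_unordered": just collect the elements, as binaryheap_add_unordered() does.
heap = list(items)

# "build": one bottom-up pass, O(n) even in the worst case,
# as binaryheap_build() does.
heapq.heapify(heap)

print(heapq.heappop(heap))   # -> 1, the minimum
```

Inserting the same eight elements one at a time with `heapq.heappush()` would give the same heap contents but with O(n log n) worst-case work, which is exactly the distinction being debated above.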
Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On 31/07/15 18:31, Robert Haas wrote:
> On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
>> Heapification is O(n) already, whether siftup (existing) or down.
>
> That's not my impression, or what Wikipedia says. Source?

Measurements done last year:

  http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
  (spreadsheet attachment)

  http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
  (measurement procedure and spreadsheet explanation)

--
Cheers,
  Jeremy
Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
> Heapification is O(n) already, whether siftup (existing) or down.

That's not my impression, or what Wikipedia says. Source?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On 30/07/15 02:05, Peter Geoghegan wrote:
> Since heapification is now a big fraction of the total cost of a sort
> sometimes, even where the heap invariant need not be maintained for any
> length of time afterwards, it might be worth revisiting the patch to
> make that an O(n) rather than a O(log n) operation [3].

O(n log n)?

Heapification is O(n) already, whether siftup (existing) or down.

It might be worthwhile comparing actual times with a quicksort, given
that a sorted array is trivially a well-formed heap (the reverse is not
true) and that quicksort seems to be cache-friendly. Presumably there
will be a crossover N where the constant-factor (k) reduction from
cache-friendliness loses out to the log n penalty for doing a full sort;
below this it would be useful.

You could then declare the tape buffer to be the leading tranche of
work_mem (and dump it right away) and the heap to start with the
remainder.

--
Cheers,
  Jeremy
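Jeremy's aside that a sorted array is trivially a well-formed heap, but not vice versa, can be checked in a few lines (an illustrative sketch; `is_min_heap` is my name, not a PostgreSQL function):

```python
def is_min_heap(a):
    """True if a satisfies the min-heap invariant: each parent <= its children."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

sorted_array = [1, 2, 3, 4, 5, 6, 7]
print(is_min_heap(sorted_array))          # True: any ascending array is a heap

valid_heap = [1, 3, 2, 7, 4, 5, 6]        # a well-formed min-heap...
print(is_min_heap(valid_heap))            # True
print(valid_heap == sorted(valid_heap))   # False: ...that is not sorted
```

This asymmetry is what makes the comparison interesting: quicksorting does strictly more work than heapifying, so the question is whether its cache behavior buys that work back.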
[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On 30 July 2015 at 12:26, Greg Stark st...@mit.edu wrote:
> On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
>> True, you need a heap to hold the next tuple from each tape in the
>> merge step. To avoid exceeding work_mem, you'd need to push some tuples
>> from the in-memory array to the tape to make room for that. In
>> practice, though, the memory needed for the merge step's heap is tiny.
>> Even if you merge 1000 tapes, you only need memory for 1000 tuples in
>> the heap. But yeah, you'll need some logic to share/divide the
>> in-memory array between the heap and the in-memory tail of the last
>> tape.
>
> It's a bit worse than that because we buffer up a significant chunk of
> the tape to avoid randomly seeking between tapes after every tuple
> read. But I think in today's era of large memory we don't need anywhere
> near the entire work_mem just to buffer to avoid random access.
> Something simple like a fixed buffer size per tape, probably much less
> than 1MB/tape.

MERGE_BUFFER_SIZE is currently 0.25 MB, but there was benefit seen above
that. I'd say we should scale that up to 1 MB if memory allows.

Yes, that could be tiny for a small number of runs. I mention it because
Heikki's comment that we could use this in the general case would not be
true for larger numbers of runs. Number of runs decreases quickly with
more memory anyway, so the technique is mostly good for larger memory
but certainly not for small memory allocations.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On Thu, Jul 30, 2015 at 4:26 AM, Greg Stark st...@mit.edu wrote:
> I'm a bit confused where the big win comes from though. Is what's going
> on that the external sort only exceeded memory by a small amount so
> nearly all the tuples are still in memory?

Yes, that's why this can be much faster just as the work_mem threshold
is crossed. You get an almost-internal sort, which means you can mostly
quicksort, and you can avoid dumping most tuples. It's still a pretty
nice win when less than half of the tuples fit in memory, though -- just
not as nice. Below that, the optimization isn't used.

--
Peter Geoghegan
[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort
On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
> True, you need a heap to hold the next tuple from each tape in the
> merge step. To avoid exceeding work_mem, you'd need to push some tuples
> from the in-memory array to the tape to make room for that. In
> practice, though, the memory needed for the merge step's heap is tiny.
> Even if you merge 1000 tapes, you only need memory for 1000 tuples in
> the heap. But yeah, you'll need some logic to share/divide the
> in-memory array between the heap and the in-memory tail of the last
> tape.

It's a bit worse than that because we buffer up a significant chunk of
the tape to avoid randomly seeking between tapes after every tuple read.
But I think in today's era of large memory we don't need anywhere near
the entire work_mem just to buffer to avoid random access. Something
simple like a fixed buffer size per tape, probably much less than
1MB/tape.

I'm a bit confused where the big win comes from though. Is what's going
on that the external sort only exceeded memory by a small amount, so
nearly all the tuples are still in memory?

--
greg
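The merge structure Heikki and Greg are describing -- a min-heap holding one "current tuple" per tape, plus a small fixed-size read buffer per tape so we don't seek back to a tape for every single tuple -- can be sketched as follows. This is illustrative only: the names and the buffer size are mine, not PostgreSQL's, and real tapes buffer bytes rather than a tuple count.

```python
import heapq

BUFFER_TUPLES = 4   # stand-in for a fixed per-tape buffer (e.g. well under 1MB/tape)

def merge_tapes(tapes):
    """Merge pre-sorted runs ('tapes'), reading each in small batches."""
    iters = [iter(t) for t in tapes]
    buffers = [[] for _ in tapes]

    def refill(i):
        # Read the next batch from tape i -- one "seek" per batch, not per tuple.
        for _ in range(BUFFER_TUPLES):
            try:
                buffers[i].append(next(iters[i]))
            except StopIteration:
                break

    # The heap holds at most one entry per tape: (current tuple, tape number).
    heap = []
    for i in range(len(tapes)):
        refill(i)
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))

    while heap:
        value, i = heapq.heappop(heap)
        yield value
        if not buffers[i]:
            refill(i)               # buffer exhausted: fetch the next batch
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))

runs = [[1, 4, 7, 10], [2, 5, 8], [0, 3, 6, 9, 11]]
print(list(merge_tapes(runs)))
```

The heap itself needs only one tuple per tape, as Heikki says; the per-tape buffers, not the heap, are what consume meaningful memory, which is Greg's point.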