Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-07 Thread Peter Geoghegan
On Fri, Jul 31, 2015 at 12:59 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/31/2015 02:01 AM, Peter Geoghegan wrote: What prevents the tuple at the top of the in-memory heap at the point of tuplesort_performsort() (say, one of the ones added to the heap as our glut of memory

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-06 Thread Peter Geoghegan
On Tue, Aug 4, 2015 at 1:24 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Yeah, I don't think there's a big performance difference between the two approaches. I'm not wedded to either approach. Whichever approach we use, my main point was that it would be better to handle this in the logical

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-04 Thread Peter Geoghegan
On Tue, Aug 4, 2015 at 1:24 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Yeah, something like that. To paraphrase, if I'm now understanding it correctly, Peter's idea is: When all the tuples have been fed to tuplesort, and it's time to perform the sort, quicksort all the tuples currently in

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-04 Thread Heikki Linnakangas
On 08/03/2015 11:36 PM, Robert Haas wrote: On Mon, Aug 3, 2015 at 3:33 PM, Peter Geoghegan p...@heroku.com wrote: When it's time to drain the heap, in performsort, divide the array into two arrays, based on the run number of each tuple, and then quicksort the arrays separately. The first array

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-03 Thread Peter Geoghegan
On Mon, Aug 3, 2015 at 1:36 PM, Robert Haas robertmh...@gmail.com wrote: I don't think that's what Heikki is talking about. He can correct me if I'm wrong, but what I think he's saying is that we should try to exploit the fact that we've already determined which in-memory tuples can be part

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-03 Thread Peter Geoghegan
On Fri, Jul 31, 2015 at 12:59 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Oh, ok, I was confused on how the heap works. You could still abstract this as in-memory tails of the tapes, but it's more complicated than I thought at first: When it's time to drain the heap, in performsort, divide

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-03 Thread Robert Haas
On Mon, Aug 3, 2015 at 3:33 PM, Peter Geoghegan p...@heroku.com wrote: When it's time to drain the heap, in performsort, divide the array into two arrays, based on the run number of each tuple, and then quicksort the arrays separately. The first array becomes the in-memory tail of the current

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-31 Thread Peter Geoghegan
On Thu, Jul 30, 2015 at 4:01 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jul 30, 2015 at 12:00 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Hmm. You don't really need to merge the in-memory array into the tape, as you know that all the tuples in the in-memory must come after the tuples

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-31 Thread Heikki Linnakangas
On 07/31/2015 02:01 AM, Peter Geoghegan wrote: What prevents the tuple at the top of the in-memory heap at the point of tuplesort_performsort() (say, one of the ones added to the heap as our glut of memory was*partially* consumed) being less than the last/greatest tuple on tape? If the answer

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2015 at 12:00 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Hmm. You don't really need to merge the in-memory array into the tape, as you know that all the tuples in the in-memory must come after the tuples already on the tape. You can just return all the tuples from the tape

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2015 at 11:32 AM, Robert Haas robertmh...@gmail.com wrote: Very interesting. And great performance numbers. Thanks for taking the time to investigate this - really cool. Thanks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2015 at 3:47 AM, Simon Riggs si...@2ndquadrant.com wrote: This is a good optimization for the common case where tuples are mostly already in order. We could increase the usefulness of this by making UPDATE pick blocks that are close to a tuple's original block, rather than

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Robert Haas
On Wed, Jul 29, 2015 at 9:05 PM, Peter Geoghegan p...@heroku.com wrote: The behavior of external sorts that do not require any merge step due to only having one run (what EXPLAIN ANALYZE output shows as an external sort, and not a merge sort) seems like an area that can be significantly

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Heikki Linnakangas
On 07/30/2015 04:05 AM, Peter Geoghegan wrote: Patch -- quicksort with spillover = With the attached patch, I propose to add an additional, better one run special case optimization. This new special case splits the single run into 2 subruns. One of the runs is comprised

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Simon Riggs
On 30 July 2015 at 08:00, Heikki Linnakangas hlinn...@iki.fi wrote: Hmm. You don't really need to merge the in-memory array into the tape, as you know that all the tuples in the in-memory must come after the tuples already on the tape. You can just return all the tuples from the tape first,

Re: [HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Heikki Linnakangas
On 07/30/2015 01:47 PM, Simon Riggs wrote: On 30 July 2015 at 08:00, Heikki Linnakangas hlinn...@iki.fi wrote: So here's a shorter/different explanation of this optimization: When it's time to perform the sort, instead of draining the in-memory heap one tuple at a time to the last tape, you

[HACKERS] Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-29 Thread Peter Geoghegan
The behavior of external sorts that do not require any merge step due to only having one run (what EXPLAIN ANALYZE output shows as an external sort, and not a merge sort) seems like an area that can be significantly improved upon. As noted in code comments, this optimization did not appear in The