Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-03 Thread Jeff Janes
On Jul 31, 2015 4:22 AM, Jeremy Harris j...@wizmail.org wrote:

 On 30/07/15 02:05, Peter Geoghegan wrote:
  Since heapification is now a big fraction of the total cost of a sort
  sometimes, even where the heap invariant need not be maintained for
  any length of time afterwards, it might be worth revisiting the patch
  to make that an O(n) rather than an O(log n) operation [3].

  O(n log n) ?

 Heapification is O(n) already, whether siftup (existing) or down.

They are both linear on average, but the way we currently do it has an
O(n log n) worst case, while the other way is linear even in the worst case.

Cheers, Jeff


Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-03 Thread Robert Haas
On Sat, Aug 1, 2015 at 9:49 AM, Jeremy Harris j...@wizmail.org wrote:
 On 31/07/15 18:31, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
 Heapification is O(n) already, whether siftup (existing) or down.

 That's not my impression, or what Wikipedia says.  Source?

 Measurements done last year:

 http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
 (spreadsheet attachment)

 http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
 (measurement procedure and spreadsheet explanation)

I don't think that running benchmarks is the right way to establish
the asymptotic runtime of an algorithm.  I mean, if you test
quicksort, it will run in much less than O(n^2) time on almost any
input.  But that does not mean that the worst-case run time is
anything other than O(n^2).
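The point can be demonstrated concretely (illustrative Python, not PostgreSQL code; the first-element-pivot variant is chosen only because its worst case is easy to trigger):

```python
import random

def quicksort_comparisons(a):
    """Count comparisons made by a quicksort that always pivots on a[0]."""
    if len(a) <= 1:
        return 0
    pivot = a[0]
    less = [x for x in a[1:] if x < pivot]
    geq = [x for x in a[1:] if x >= pivot]
    # len(a) - 1 comparisons to partition, plus the recursive costs.
    return (len(a) - 1) + quicksort_comparisons(less) + quicksort_comparisons(geq)

n = 500
random.seed(0)
random_cost = quicksort_comparisons(random.sample(range(n), n))
sorted_cost = quicksort_comparisons(list(range(n)))

# On random input the cost is close to n log2 n; on already-sorted input
# this variant performs exactly n*(n-1)/2 comparisons -- quadratic -- even
# though a benchmark over typical inputs would never show it.
```

A benchmark over random inputs reports only the first number and misses the second entirely, which is the objection to establishing asymptotics by measurement.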

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-03 Thread Robert Haas
On Sat, Aug 1, 2015 at 9:56 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-07-31 13:31:54 -0400, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
  Heapification is O(n) already, whether siftup (existing) or down.

 That's not my impression, or what Wikipedia says.  Source?

 Building a binary heap via successive insertions is O(n log n), but
 building it directly is O(n). Looks like wikipedia agrees too
 https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap

That doesn't really address the sift-up vs. sift-down question.  Maybe
I'm just confused about the terminology.  I think that Wikipedia
article is saying that if you iterate from the middle element of an
unsorted array towards the beginning, establishing the heap invariant
for every item as you reach it, you will take only O(n) time.  But
that is not what inittapes() does.  It instead starts at the beginning
of the array and inserts each element one after the other.  If this is
any different from building the heap via successive insertions, I
don't understand how.
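The two strategies being contrasted can be sketched as follows (illustrative Python, not the tuplesort code; the function names are made up for the example):

```python
def sift_down(heap, i, n):
    """Restore the min-heap invariant at index i, assuming both subtrees are heaps."""
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and heap[child] < heap[smallest]:
                smallest = child
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def heapify_bottom_up(a):
    """O(n): sift down from the last internal node toward the root (Floyd's method)."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    return a

def heapify_by_insertion(a):
    """O(n log n) worst case: grow the heap one element at a time, sifting up."""
    heap = []
    for x in a:
        heap.append(x)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
            i = (i - 1) // 2
    return heap
```

Both produce a valid heap; the difference is that in the bottom-up version most elements sit near the leaves and sift only a short distance, which is what makes the total cost linear.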

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-01 Thread Andres Freund
On 2015-07-31 13:31:54 -0400, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
  Heapification is O(n) already, whether siftup (existing) or down.
 
 That's not my impression, or what Wikipedia says.  Source?

Building a binary heap via successive insertions is O(n log n), but
building it directly is O(n). Looks like Wikipedia agrees:
https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap

I'm pretty sure there are a bunch of places where we intentionally
build a heap at once instead of successively inserting. At least
reorderbuffer.c does so, and it looks like nodeMergeAppend does as well
(that's why they use binaryheap_add_unordered and then binaryheap_build).

Greetings,

Andres Freund




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-01 Thread Jeremy Harris
On 31/07/15 18:31, Robert Haas wrote:
 On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
 Heapification is O(n) already, whether siftup (existing) or down.
 
 That's not my impression, or what Wikipedia says.  Source?

Measurements done last year:

http://www.postgresql.org/message-id/52f35462.3030...@wizmail.org
(spreadsheet attachment)

http://www.postgresql.org/message-id/52f40ce9.1070...@wizmail.org
(measurement procedure and spreadsheet explanation)

-- 
Cheers,
  Jeremy




Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-31 Thread Robert Haas
On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote:
 Heapification is O(n) already, whether siftup (existing) or down.

That's not my impression, or what Wikipedia says.  Source?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-31 Thread Jeremy Harris
On 30/07/15 02:05, Peter Geoghegan wrote:
 Since heapification is now a big fraction of the total cost of a sort
 sometimes, even where the heap invariant need not be maintained for
 any length of time afterwards, it might be worth revisiting the patch
 to make that an O(n) rather than an O(log n) operation [3].

 O(n log n) ?

Heapification is O(n) already, whether siftup (existing) or down.

It might be worthwhile comparing actual times with a quicksort, given
that a sorted array is trivially a well-formed heap (the reverse is not
true) and that quicksort seems to be cache-friendly.  Presumably there
is a crossover N above which the constant-factor gain from
cache-friendliness loses out to the log n penalty for doing a full sort;
below that N the quicksort would be useful.
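
The claim that any sorted array is already a well-formed heap, while the reverse does not hold, is easy to check directly (illustrative Python):

```python
def is_min_heap(a):
    """True if every element is >= its parent, i.e. the binary-heap invariant holds."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

assert is_min_heap([1, 2, 3, 4, 5])   # any sorted array passes
assert is_min_heap([1, 3, 2, 5, 4])   # a valid heap that is not sorted
assert not is_min_heap([2, 1, 3])     # invariant violated at index 1
```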

You could then declare the tape buffer to be the leading tranche of
work-mem (and dump it right away) and the heap to start with the
remainder.
-- 
Cheers,
  Jeremy




[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Simon Riggs
On 30 July 2015 at 12:26, Greg Stark st...@mit.edu wrote:


 On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:


 True, you need a heap to hold the next tuple from each tape in the merge
 step. To avoid exceeding work_mem, you'd need to push some tuples from the
 in-memory array to the tape to make room for that. In practice, though, the
 memory needed for the merge step's heap is tiny. Even if you merge 1000
 tapes, you only need memory for 1000 tuples in the heap. But yeah, you'll
 need some logic to share/divide the in-memory array between the heap and
 the in-memory tail of the last tape.


 It's a bit worse than that because we buffer up a significant chunk of the
 tape to avoid randomly seeking between tapes after every tuple read. But I
 think in today's era of large memory we don't need anywhere near the entire
 work_mem just to buffer to avoid random access. Something simple like a
 fixed buffer size per tape, probably much less than 1MB/tape, would suffice.


MERGE_BUFFER_SIZE is currently 0.25 MB, but there was benefit seen above
that. I'd say we should scale that up to 1 MB if memory allows.

Yes, the heap could be tiny for a small number of runs. I mention it because
Heikki's comment that we could use this in the general case would not hold for
larger numbers of runs. The number of runs decreases quickly with more memory
anyway, so the technique is mostly good for larger memory allocations but
certainly not for small ones.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2015 at 4:26 AM, Greg Stark st...@mit.edu wrote:
 I'm a bit confused where the big win comes from though. Is what's going on
 that the external sort only exceeded memory by a small amount, so nearly all
 the tuples are still in memory?

Yes, that's why this can be much faster just as the work_mem threshold
is crossed. You get an almost-internal sort, which means you can
mostly quicksort, and you can avoid dumping most tuples. It's still a
pretty nice win when less than half of the tuples fit in memory, though --
just not as nice. Below that, the optimization isn't used.

-- 
Peter Geoghegan




[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-30 Thread Greg Stark
On Thu, Jul 30, 2015 at 12:09 PM, Heikki Linnakangas hlinn...@iki.fi
wrote:


 True, you need a heap to hold the next tuple from each tape in the merge
 step. To avoid exceeding work_mem, you'd need to push some tuples from the
 in-memory array to the tape to make room for that. In practice, though, the
 memory needed for the merge step's heap is tiny. Even if you merge 1000
 tapes, you only need memory for 1000 tuples in the heap. But yeah, you'll
 need some logic to share/divide the in-memory array between the heap and
 the in-memory tail of the last tape.


It's a bit worse than that because we buffer up a significant chunk of the
tape to avoid randomly seeking between tapes after every tuple read. But I
think in today's era of large memory we don't need anywhere near the entire
work_mem just to buffer to avoid random access. Something simple like a
fixed buffer size per tape, probably much less than 1MB/tape, would suffice.

I'm a bit confused about where the big win comes from, though. Is what's
going on that the external sort only exceeded memory by a small amount, so
nearly all the tuples are still in memory?


-- 
greg