Re: [HACKERS] Minor performance improvement in transition to external sort

```On 06/02/14 14:22, Robert Haas wrote:
```
```On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris <j...@wizmail.org> wrote:
```
```On 05/02/14 21:56, Robert Haas wrote:
```
```On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <j...@wizmail.org> wrote:
```
```The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.```
```
Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).
```
```

I think Wikipedia is right.  Inserting an a tuple into a heap is O(lg
n); doing that n times is therefore O(n lg n).  Your patch likewise
implements an outer loop which runs O(n) times and an inner loop which
runs at most O(lg n) times, so a naive analysis of that algorithm also
comes out to O(n lg n).
```
```
The patch implements a siftdown.  Measurements attached.
```
```
Uh, this spreadsheet is useless.  You haven't explained how these
numbers were generated, or what the columns in the spreadsheet
actually mean.  Ideally, you should include enough information that
someone else can try to reproduce your results.

```
```
I'm sorry I wasn't clear.

Test code was groups of the form:

set work_mem to 1024;
CREATE TABLE jgh (i integer);
INSERT INTO jgh (i) SELECT 20000 * random() FROM generate_series(1, 20000);
VACUUM ANALYZE jgh;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to off;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to on;
DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT.

Compares were counted for the heapify stage (only), and timings
taken using pg_rusage_init() before and pg_rusage_show() after
it.  Times are in seconds.

Baseline value for compares and time used 858ec11858.
Sift-down compares and time are for the patch.

"Baseline K cmps" is compares divided by work_mem (which should
be a scaled version of compares-per-tuple being heapified) pre-patch.
"Siftdown K cmps" is likewise with the patch.

"K ratio cmps" is the ratio  (patched compares)/(unpatched compares).

Repeat for time measurements.

--
Cheers,
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
```