Re: [HACKERS] The case for removing replacement selection sort

2017-09-29 Thread Peter Geoghegan
On Fri, Sep 29, 2017 at 7:19 AM, Robert Haas  wrote:
> That supports your theory that there's some confounding factor in the
> CREATE INDEX case, such as I/O scheduling.  Since this machine has an
> SSD, I guess I don't have a mental model for how that works.  We're
> not waiting for the platter to rotate...

Random I/O is still significantly more expensive with SSDs, especially
random writes, where all the wear leveling stuff comes into play.
There is a tiny universe of very complicated firmware within every SSD
[1]. (I am generally concerned about the trend towards increasingly
complicated, unauditable firmware like this, but that's another
story.)

> ...but I guess that's all irrelevant as far as this patch goes.  The
> point of this patch is to simplify things from removing a technique
> that is no longer effective, and the evidence we have supports the
> contention that it is no longer effective.  I'll go commit this.

Thanks.

[1] https://lwn.net/Articles/353411/
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 6:44 PM, Peter Geoghegan  wrote:
> I'm glad to hear that. But, I should reiterate that if sorting
> actually gets faster when my patch is applied, then that's something
> that I consider to be a bonus. (This is primarily a refactoring patch,
> to remove a bunch of obsolete code.)

I understand that.  The tests above weren't about your patch; they
were about whether replacement selection still has value.

>> Any idea what causes this regression?
>
> I don't know. My best guess is that the overall I/O scheduling is now
> suboptimal due to commit e94568e (Heikki's preloading thing), because
> this is CREATE INDEX, and there is new competitive pressure. You might
> find it hard to replicate the problem with a "SELECT COUNT(DISTINCT
> aid) FROM pgbench_accounts", which would confirm this explanation. Or,
> you could also see what happens with a separate temp tablespace.

I tried out that test case, again on 9.6 and master, again with scale
factor 300.  On 9.6, with default settings, it took ~12.5 seconds.
With 4MB of work_mem, it took about ~13.2 s with replacement selection
and ~12.5 s using quicksorted runs.  On master, with default settings,
it took ~8.4 seconds.  With 4MB of work_mem, it took about ~12.3 s
using replacement selection and ~8.4 seconds using quicksorted runs.
So here, everything was faster on master, but replacement selection
was only slightly faster while the other technique was a lot faster.

That supports your theory that there's some confounding factor in the
CREATE INDEX case, such as I/O scheduling.  Since this machine has an
SSD, I guess I don't have a mental model for how that works.  We're
not waiting for the platter to rotate...

...but I guess that's all irrelevant as far as this patch goes.  The
point of this patch is to simplify things from removing a technique
that is no longer effective, and the evidence we have supports the
contention that it is no longer effective.  I'll go commit this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 3:18 PM, Robert Haas  wrote:
> On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
>> With the additional enhancements made to Postgres 10, I doubt that
>> there are any remaining cases where it wins.
>
> I tried my favorite sorting test case -- pgbench -i -s 300 and then
> reindex index pgbench_accounts_pkey.  I set maintenance_work_mem to
> 4MB so that replacement selection would be chosen.
>
> On master, this takes ~21.5 seconds; with replacement_sort_tuples = 0,
> it takes ~19.1 seconds.  So, disabling replacement selection is a win.
>
> On 9.6, this takes ~19.1 seconds; with replacement_sort_tuples = 0, it
> takes ~23 seconds.  So, disabling replacement selection is a loss.
>
> This supports your theory that we should go ahead and remove
> replacement selection.

I'm glad to hear that. But, I should reiterate that if sorting
actually gets faster when my patch is applied, then that's something
that I consider to be a bonus. (This is primarily a refactoring patch,
to remove a bunch of obsolete code.)

> It does however both me a bit that this test
> case is apparently slower in master than in 9.6, and not just with
> very small values of work_mem.  With default maintenance_work_mem
> (64MB), this takes about 13 seconds on 9.6 but about 15 seconds on
> master -- and with that setting replacement selection is not chosen at
> all.

As I mentioned, the picture changed for replacement selection during
the Postgres 10 development cycle, which caused me to finally push for
it to be killed in this thread. So, anyone that just started following
the thread should note that it's totally expected that replacement
selection still just about pulls its weight in 9.6, but not in 10.

> Any idea what causes this regression?

I don't know. My best guess is that the overall I/O scheduling is now
suboptimal due to commit e94568e (Heikki's preloading thing), because
this is CREATE INDEX, and there is new competitive pressure. You might
find it hard to replicate the problem with a "SELECT COUNT(DISTINCT
aid) FROM pgbench_accounts", which would confirm this explanation. Or,
you could also see what happens with a separate temp tablespace.

It's really, really hard to have a 100%, unambiguous win within
tuplesort.c. We've seen this time and again over the years.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-28 Thread Robert Haas
On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
> With the additional enhancements made to Postgres 10, I doubt that
> there are any remaining cases where it wins.

I tried my favorite sorting test case -- pgbench -i -s 300 and then
reindex index pgbench_accounts_pkey.  I set maintenance_work_mem to
4MB so that replacement selection would be chosen.

On master, this takes ~21.5 seconds; with replacement_sort_tuples = 0,
it takes ~19.1 seconds.  So, disabling replacement selection is a win.

On 9.6, this takes ~19.1 seconds; with replacement_sort_tuples = 0, it
takes ~23 seconds.  So, disabling replacement selection is a loss.

This supports your theory that we should go ahead and remove
replacement selection.  It does however both me a bit that this test
case is apparently slower in master than in 9.6, and not just with
very small values of work_mem.  With default maintenance_work_mem
(64MB), this takes about 13 seconds on 9.6 but about 15 seconds on
master -- and with that setting replacement selection is not chosen at
all.

Any idea what causes this regression?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-27 Thread Peter Geoghegan
On Wed, Sep 27, 2017 at 12:06 AM, Simon Riggs  wrote:
>> I think we should remove the replacement_sort_tuples GUC, and kill
>> replacement selection entirely. There is no need to do this for
>> Postgres 10. I don't feel very strongly about it. It just doesn't make
>> sense to continue to support replacement selection.
>
> Forgive me if I missed the explanation, but how will we handle bounded sorts?

No change there at all. If anything they will probably be slightly
faster, because the heap management routines don't have to work with a
special heap comparator that compares run number, but only when
replacement selection is in use.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-27 Thread Simon Riggs
On 14 July 2017 at 23:20, Peter Geoghegan  wrote:

> I think we should remove the replacement_sort_tuples GUC, and kill
> replacement selection entirely. There is no need to do this for
> Postgres 10. I don't feel very strongly about it. It just doesn't make
> sense to continue to support replacement selection.

Forgive me if I missed the explanation, but how will we handle bounded sorts?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-15 Thread Peter Geoghegan
Tomas' e-mail from earlier today, that I've already replied to
directly, seems to have been lost to the mailing list. This must be
due to having a 1MB attachment (results spreadsheet), which seems a
bit aggressive as a reason to withold it IMV.

Here is a link to his results, converted to a Google docs spreadsheet:

https://docs.google.com/spreadsheets/d/1SL_IIkPdiJUZ9BHUgROcYkEKWZUch4wRax2SyOmMuT8/edit?usp=sharing

Here is his e-mail, in full:

On Fri, Sep 15, 2017 at 6:34 AM, Tomas Vondra
 wrote:
> Attached is a spreadsheet with updated data (using the more accurate
> timing, and comparing master with the default replacement_sort_tuples
> value (150k) and increased per Peter's instructions (to 1B).
>
> There are 6 sheets - one for each combination of a dataset size (100k
> and 1M) and machine where I ran the tests (with different CPU models).
>
> There are 5 columns - first three are medians for each of the tested
> PostgreSQL configurations:
>
> - master (default)
> - master (1billion)
> - no-replacement-sort (with patch applied)
>
> The numbers are medians from 5 runs (perhaps minimums would be better in
> this case).
>
> The last two columns are comparisons / speed-ups
>
> - master(1B) / master(default)
> - no-replacement-sort / master(default)
>
> Green numbers (below 1.0) mean speed-up, red (above 1.0) slow-down.
>
> Firstly, the master with r_s_t=1B setting performs either the same or
> worse compared to a default values, in almost every test. On the 100k
> data set the results are a bit noisy (particularly on the oldest CPU),
> but on the 1M data set the difference is quite clear. So I've only
> compared results for master(default) and patched.
>
> Quick summary, for each CPU model (which clearly affects the behavior).
>
>
> e5-2620-v4
> --
> - probably the CPU we should be looking at, as it's the current model
> - in some cases this gives us 3-5x speedup with low work_mem settings
> - consider for example the very last line, which shows that
>
>   SELECT DISTINCT a FROM text_test_padding ORDER BY a
>
> completed in ~5531ms on work_mem=8MB and 1067ms on 32MB, but with the
> patch it completes in 1784ms (1MB), 1211ms (4MB) and 1104 (8MB).
>
> - Of course, this is for already-sorted data, and for other data sets
> the improvement is more modest. It's difficult to summarize this into a
> single number, but there are plenty of improvements in 20-30% range.
>
> - Some cases are a bit slower, of course, but overall I'd say the chart
> is much more green than red. Also the slow-downs are much smaller,
> compared to speed-ups (generally within 1-5%).
>
> i5-2500k
> 
> - same story, but this CPU gives more stable results (less noise)
>
> e5-5450
> ---
> - rather noisy CPU, particularly on the small (100k) dataset
> - the larger data set mostly matches the other CPUs, although the
> regressions are somewhat more significant
> - I wouldn't really worry about this too much, it's clearly an obsolete
> CPU and not something performance-conscious person would use nowadays
> (the other CPUs are often 2-3x faster).
>
> If needed, full data is available here (each machine is pushing data to
> a separate git repository):
>
> * https://bitbucket.org/tvondra/sort-bench-i5-2500k
> * https://bitbucket.org/tvondra/sort-bench-e5-2620-v4
> * https://bitbucket.org/tvondra/sort-bench-e5-5450
>
> At this point the 10M row tests are running, but I don't expect anything
> particularly surprising from those results. That is, it's not something
> that should block getting this patch committed, if the agreement is to
> commit otherwise.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-15 Thread Peter Geoghegan
On Fri, Sep 15, 2017 at 6:34 AM, Tomas Vondra
 wrote:
> e5-2620-v4
> --
> - probably the CPU we should be looking at, as it's the current model
> - in some cases this gives us 3-5x speedup with low work_mem settings
> - consider for example the very last line, which shows that
>
>   SELECT DISTINCT a FROM text_test_padding ORDER BY a
>
> completed in ~5531ms on work_mem=8MB and 1067ms on 32MB, but with the
> patch it completes in 1784ms (1MB), 1211ms (4MB) and 1104 (8MB).

I think that this is because of the importance of getting a final
on-the-fly merge. Overlapping I/O with computation matters a lot here.

> - Of course, this is for already-sorted data, and for other data sets
> the improvement is more modest. It's difficult to summarize this into a
> single number, but there are plenty of improvements in 20-30% range.
>
> - Some cases are a bit slower, of course, but overall I'd say the chart
> is much more green than red. Also the slow-downs are much smaller,
> compared to speed-ups (generally within 1-5%).

The larger regressions mostly involve numeric. We have two
palloc()/pfree() cycles in the standard numeric comparator (one for
each datum that is compared), a cost which could be removed by
enhancing numeric SortSupport directly. That's why the numeric
abbreviated key stuff was the most effective (more effective than
text) when added to 9.5. We currently have very expensive full
comparisons of datums within L1 cache ("merge comparisons") relative
to abbreviated key comparisons -- the difference is huge, but could be
reduced.

Anyway, it seems ironic that in those few cases where replacement
selection still does better, it seems to be due to reduced CPU costs,
not reduced I/O costs. This is the opposite benefit to the one you'd
expect from reading Knuth.

Thanks for benchmarking! I hope that this removes the doubt that
replacement selection previously benefited from; it now clearly
deserves to be removed from tuplesort.c.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:55 PM, Peter Geoghegan  wrote:
> On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro
>  wrote:
>>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>>
>> This breaks the documentation build, because
>> doc/src/sgml/release-9.6.sgml still contains > linkend="guc-replacement-sort-tuples"> but you removed that id.
>
> Thanks for looking into it.
>
> I suppose that the solution is to change the 9.6 release notes to no
> longer have a replacement_sort_tuples link. Anyone else have an
> opinion on that?

Attached is a revision of the patch that no longer breaks the
documentation build, by using a literal tag to refer to
replacement_sort_tuples within doc/src/sgml/release-9.6.sgml. The
patch is otherwise unchanged, and so reviewers shouldn't bother with
it (I just want to unbreak Thomas' continuous integration build, and
to save a committer the hassle of fixing the doc build themselves). I
verified that "make check-world" passes this time.

I also eyeballed the html generated by a "make STYLE=website html", to
ensure that it looked consistent with its surroundings. The resulting
9.6 release notes looked good to me.

-- 
Peter Geoghegan
From a22a77d09a5fb37713d8486db84e9619acc449d3 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 30 Aug 2017 16:14:36 -0700
Subject: [PATCH] Remove replacement selection sort.

This simplifies tuplesort.c, since heap maintenance routines need no
longer concern themselves with cases where two runs are represented
within a heap.

Replacement selection sort's traditional advantages no longer allow it
to outperform a simple sort-merge strategy, even in sympathetic cases.
This is due to long term trends in hardware performance characteristics,
and enhancements to the tuplesort.c merge code in the past couple of
years.

This will need to be called out as an incompatibility in the v11 release
notes, since it's possible that users set replacement_sort_tuples
themselves.

Discussion: https://postgr.es/m/cah2-wzmmnjg_k0r9nqywmq3zjyjjk+hcbizynghay-zyjs6...@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  39 ---
 doc/src/sgml/release-9.6.sgml |   2 +-
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 415 +++---
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 9 files changed, 52 insertions(+), 448 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2b6255e..7c625e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1513,45 +1513,6 @@ include_dir 'conf.d'
   
  
 
- 
-  replacement_sort_tuples (integer)
-  
-   replacement_sort_tuples configuration parameter
-  
-  
-  
-   
-When the number of tuples to be sorted is smaller than this number,
-a sort will produce its first output run using replacement selection
-rather than quicksort.  This may be useful in memory-constrained
-environments where tuples that are input into larger sort operations
-have a strong physical-to-logical correlation.  Note that this does
-not include input tuples with an inverse
-correlation.  It is possible for the replacement selection algorithm
-to generate one long run that requires no merging, where use of the
-default strategy would result in many runs that must be merged
-to produce a final sorted output.  This may allow sort
-operations to complete sooner.
-   
-   
-The default is 150,000 tuples.  Note that higher values are typically
-not much more effective, and may be counter-productive, since the
-priority queue is sensitive to the size of available CPU cache, whereas
-the default strategy sorts runs using a cache
-oblivious algorithm.  This property allows the default sort
-strategy to automatically and transparently make effective use
-of available CPU cache.
-   
-   
-Setting maintenance_work_mem to its default
-value usually prevents utility command external sorts (e.g.,
-sorts used by CREATE INDEX to build B-Tree
-indexes) from ever using replacement selection sort, unless the
-input tuples are quite wide.
-   
-  
- 
-
  
   autovacuum_work_mem (integer)
   
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index fa5355f..603300a 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ -5110,7 +5110,7 @@ and many others in the same vein
 The new approach makes better use of the CPU 

Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:50 AM, Robert Haas  wrote:
> On Mon, Sep 11, 2017 at 11:47 AM, Tomas Vondra
>  wrote:
>> The question is what is the optimal replacement_sort_tuples value? I
>> assume it's the number of tuples that effectively uses CPU caches, at
>> least that's what our docs say. So I think you're right it to 1B rows
>> may break this assumption, and make it perform worse.
>>
>> But perhaps the fact that we're testing with multiple work_mem values,
>> and with smaller data sets (100k or 1M rows) makes this a non-issue?
>
> I am not sure that's the case -- I think that before Peter's changes
> it was pretty easy to find cases where lowering work_mem made sorting
> ordered data go faster.

Before enhancements to merging by both Heikki and myself went into
Postgres 10, there were cases where replacement selection of the first
run did relatively well, and cases where it did badly despite managing
to avoid a merge. The former were cases where work_mem was low, and
the latter were cases where it was high. This was true because the
heap used is cache inefficient. When it was small/cache efficient, and
when there was ordering to exploit, replacement selection could win.
Your recollection there sounds accurate to me.

These days, even a small, cache-efficient heap is generally not good
enough to beat quicksorting, since merging attained the ability to
exploit presortedness itself within commit 24598337c, and memory
utilization for merging got better, too. Very roughly speaking,
merging attained the same advantage that replacement selection had all
along, and replacement selection lost all ability to compete.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:47 AM, Tomas Vondra
 wrote:
> The question is what is the optimal replacement_sort_tuples value?

See my remarks to Robert just now.

I think that it's incredibly hard to set replacement_sort_tuples
sensibly in 9.6. As of Postgres 10, it is much more likely to hurt
than to help, because of the enhancements to merging that went into
Postgres 10 reduced the downside of not using replacement selection.
And so, for Postgres 11 replacement_sort_tuples deserves to be
removed.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:32 AM, Robert Haas  wrote:
> On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan  wrote:
>> To be clear, you'll still need to set replacement_sort_tuples high
>> when testing RS, to make sure that we really use it for at least the
>> first run when we're expected to. (There is no easy way to have
>> testing mechanically verify that we really do only have one run in the
>> end with RS, but I assume that such paranoia is unneeded.)
>
> I seem to recall that raising replacement_sort_tuples makes
> replacement selection look worse in some cases -- the optimization is
> more likely to apply, sure, but the heap is also bigger, which hurts.

But that was only because work_mem was set relatively high. If you're
going to test work_mem values that are slightly on the high side for
replacement selection (like, 8MB or 32MB), then increasing
replacement_sort_tuples ensures that replacement selection is actually
used for at least the first run, and you don't waste time comparing a
behavior (quicksorting runs) to itself.

All that matters is whether or not replacement_sort_tuples exceeds the
number of tuples that the first run would have if quicksorted
immediately. replacement_sort_tuples is only ever used to answer a
single yes/no question.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:17 AM, Tomas Vondra
 wrote:
> Overall I think the results show quite significant positive impact of
> the patch. There are a few cases of regression, but ISTM those may
> easily be noise as it's usually 0.03 vs 0.04 second, or something. I'll
> switch to the \timing (instead of /usr/bin/time) to get more accurate
> results, and rerun those tests.

I'm glad to hear it. While I'm not surprised, I still don't consider
the patch to be a performance enhancement. It is intended to lower the
complexity of tuplesort.c, and the overall performance improvement is
just a bonus IMV.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 11:47 AM, Tomas Vondra
 wrote:
> The question is what is the optimal replacement_sort_tuples value? I
> assume it's the number of tuples that effectively uses CPU caches, at
> least that's what our docs say. So I think you're right it to 1B rows
> may break this assumption, and make it perform worse.
>
> But perhaps the fact that we're testing with multiple work_mem values,
> and with smaller data sets (100k or 1M rows) makes this a non-issue?

I am not sure that's the case -- I think that before Peter's changes
it was pretty easy to find cases where lowering work_mem made sorting
ordered data go faster.

But I could easily be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Tomas Vondra
On 09/11/2017 05:32 PM, Robert Haas wrote:
> On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan  wrote:
>> To be clear, you'll still need to set replacement_sort_tuples high
>> when testing RS, to make sure that we really use it for at least the
>> first run when we're expected to. (There is no easy way to have
>> testing mechanically verify that we really do only have one run in the
>> end with RS, but I assume that such paranoia is unneeded.)
> 
> I seem to recall that raising replacement_sort_tuples makes
> replacement selection look worse in some cases -- the optimization is
> more likely to apply, sure, but the heap is also bigger, which hurts.
> 

The question is what is the optimal replacement_sort_tuples value? I
assume it's the number of tuples that effectively uses CPU caches, at
least that's what our docs say. So I think you're right it to 1B rows
may break this assumption, and make it perform worse.

But perhaps the fact that we're testing with multiple work_mem values,
and with smaller data sets (100k or 1M rows) makes this a non-issue?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Robert Haas
On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan  wrote:
> To be clear, you'll still need to set replacement_sort_tuples high
> when testing RS, to make sure that we really use it for at least the
> first run when we're expected to. (There is no easy way to have
> testing mechanically verify that we really do only have one run in the
> end with RS, but I assume that such paranoia is unneeded.)

I seem to recall that raising replacement_sort_tuples makes
replacement selection look worse in some cases -- the optimization is
more likely to apply, sure, but the heap is also bigger, which hurts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Peter Geoghegan
On Sun, Sep 10, 2017 at 5:59 PM, Tomas Vondra
 wrote:
> OK, so 1MB, 4MB, 8MB, 32MB?

Right.

> Ah, so you suggest doing all the tests on current master, by only
> tweaking the replacement_sort_tuples value? I've been testing master vs.
> your patch, but I guess setting replacement_sort_tuples=0 should have
> the same effect.

I think that there may be a very slight advantage to RS-free
performance with my patch actually applied, because it removes the
number of instructions that heap maintenance (merging) routines
consist of. This is due to my removing HEAPCOMPARE()/tupindex
handling. However, it's probably a low single digit percentage
difference -- not enough to be worth taking into account, probably. I
assume that just setting replacement_sort_tuples to zero is more
convenient for you (that's what I did).

To be clear, you'll still need to set replacement_sort_tuples high
when testing RS, to make sure that we really use it for at least the
first run when we're expected to. (There is no easy way to have
testing mechanically verify that we really do only have one run in the
end with RS, but I assume that such paranoia is unneeded.)

> I probably won't eliminate the random/DESC data sets, though. At least
> not from the two smaller data sets - I want to do a bit of benchmarking
> on Heikki's polyphase merge removal patch, and for that patch those data
> sets are still relevant. Also, it's useful to have a subset of results
> where we know we don't expect any change.

Okay. The DESC dataset is going to make my patch look good, because it
won't change anything for merging (same number of runs in the end),
but sorting will be slower for the first run with RS.

> Meh, more data is probably better. And with the reduced work_mem values
> and skipping of random/DESC data sets it should complete much faster.

Note that my own test case had a much higher number of tuples than
even 10 million -- it had 100 million tuples. I think that if any of
your test cases bring about a new insight, it will not be due to the
number of distinct tuples. But, if the extra time it takes doesn't
matter to you, then it doesn't matter to me either.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Tomas Vondra
On 09/11/2017 02:22 AM, Peter Geoghegan wrote:
> On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
>  wrote:
>> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
>> those are all sorts with a single column (see the attached script). But
>> it'd be good to add a few queries testing sorts with multiple keys. We
>> can either tweak some of the existing data sets + queries, or come up
>> with entirely new tests.
> 
> I see that work_mem is set like this in the script:
> 
> "for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"
> 
> I suggest that we forget about values over 32MB, since the question of
> how well quicksort does there was settled by your tests in 2016. I
> would also add '4MB' to the list of wm values that you'll actually
> test.

OK, so 1MB, 4MB, 8MB, 32MB?

> 
> Any case with input that is initially in random order or DESC sort 
> order is not interesting, either. I suggest you remove those, too.
> 

OK.

> I think we're only interested in benchmarks where replacement
> selection really does get its putative best case (no merge needed in
> the end). Any (almost) sorted cases (the only cases that you are
> interesting to test now) will always manage that, once you set
> replacement_sort_tuples high enough, and provided there isn't even a
> single tuple that is completely out of order. The "before" cases here
> should have a replacement_sort_tuples of 1 billion (so that we're sure
> to not have the limit prevent the use of replacement selection in the
> first place), versus the "after" cases, which should have a
> replacement_sort_tuples of 0 to represent my proposal (to represent
> performance in a world where replacement selection is totally
> removed).
> 

Ah, so you suggest doing all the tests on current master, by only
tweaking the replacement_sort_tuples value? I've been testing master vs.
your patch, but I guess setting replacement_sort_tuples=0 should have
the same effect.

I probably won't eliminate the random/DESC data sets, though. At least
not from the two smaller data sets - I want to do a bit of benchmarking
on Heikki's polyphase merge removal patch, and for that patch those data
sets are still relevant. Also, it's useful to have a subset of results
where we know we don't expect any change.

>> For the existing queries, I should have some initial results
>> tomorrow, at least for the data sets with 100k and 1M rows. The
>> tests with 10M rows will take much more time (it takes 1-2hours for
>> a single work_mem value, and we're testing 6 of them).
> 
> I myself don't see that much value in a 10M row test.
> 

Meh, more data is probably better. And with the reduced work_mem values
and skipping of random/DESC data sets it should complete much faster.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Peter Geoghegan
On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
 wrote:
> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
> those are all sorts with a single column (see the attached script). But
> it'd be good to add a few queries testing sorts with multiple keys. We
> can either tweak some of the existing data sets + queries, or come up
> with entirely new tests.

I see that work_mem is set like this in the script:

"for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"

I suggest that we forget about values over 32MB, since the question of
how well quicksort does there was settled by your tests in 2016. I
would also add '4MB' to the list of wm values that you'll actually
test.

Any case with input that is initially in random order or DESC sort
order is not interesting, either. I suggest you remove those, too.

I think we're only interested in benchmarks where replacement
selection really does get its putative best case (no merge needed in
the end). Any (almost) sorted cases (the only cases that you are
interesting to test now) will always manage that, once you set
replacement_sort_tuples high enough, and provided there isn't even a
single tuple that is completely out of order. The "before" cases here
should have a replacement_sort_tuples of 1 billion (so that we're sure
to not have the limit prevent the use of replacement selection in the
first place), versus the "after" cases, which should have a
replacement_sort_tuples of 0 to represent my proposal (to represent
performance in a world where replacement selection is totally
removed).

> For the existing queries, I should have some initial results tomorrow,
> at least for the data sets with 100k and 1M rows. The tests with 10M
> rows will take much more time (it takes 1-2hours for a single work_mem
> value, and we're testing 6 of them).

I myself don't see that much value in a 10M row test.

Thanks for volunteering to test!
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Tomas Vondra

On 09/11/2017 01:03 AM, Peter Geoghegan wrote:
> On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark  wrote:
>> This may be a bit "how long is a piece of string" but how do those two
>> compare with string sorting in an interesting encoding/locale -- say
>> /usr/share/dict/polish in pl_PL for example. It's certainly true that
>> people do sort text as well as numbers.
> 
> I haven't looked at text, because I presume that it's very rare for 
> tuples within a table to be more or less ordered by a text
> attribute. While the traditional big advantage of replacement
> selection has always been its ability to double initial run length on
> average, where text performance is quite interesting because
> localized clustering still happens, that doesn't really seem relevant
> here. The remaining use of replacement selection is expressly only
> about the "single run, no merge" best case, and in particular about
> avoiding regressions when upgrading from versions prior to 9.6 (9.6
> is the version where we began to generally prefer quicksort).
> 
>> Also, people often sort on keys of more than one column
> 
> Very true. I should test this.
> 

I'm currently re-running the benchmarks we did in 2016 for 9.6, but
those are all sorts with a single column (see the attached script). But
it'd be good to add a few queries testing sorts with multiple keys. We
can either tweak some of the existing data sets + queries, or come up
with entirely new tests.

The current tests construct data sets with different features (unique or
low/high-cardinality, random/sorted/almost-sorted). How should that work
for multi-key sorts? Same features for all columns, or some mix?

For the existing queries, I should have some initial results tomorrow,
at least for the data sets with 100k and 1M rows. The tests with 10M
rows will take much more time (it takes 1-2hours for a single work_mem
value, and we're testing 6 of them).

But perhaps we can reduce the number of tests somehow, only testing the
largest/smallest work_mem values? So that we could get some numbers now,
possibly running the complete test suite later.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


sort-bench.sh
Description: application/shellscript

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Peter Geoghegan
On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark  wrote:
> This may be a bit "how long is a piece of string" but how do those two
> compare with string sorting in an interesting encoding/locale -- say
> /usr/share/dict/polish in pl_PL for example. It's certainly true that
> people do sort text as well as numbers.

I haven't looked at text, because I presume that it's very rare for
tuples within a table to be more or less ordered by a text attribute.
While the traditional big advantage of replacement selection has
always been its ability to double initial run length on average, where
text performance is quite interesting because localized clustering
still happens, that doesn't really seem relevant here. The remaining
use of replacement selection is expressly only about the "single run,
no merge" best case, and in particular about avoiding regressions when
upgrading from versions prior to 9.6 (9.6 is the version where we
began to generally prefer quicksort).

> Also, people often sort on
> keys of more than one column

Very true. I should test this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-09 Thread Greg Stark
On 8 September 2017 at 18:06, Peter Geoghegan  wrote:

> * It's still faster with int4/int8 comparisons on modern hardware, and
> I think that most ordered inputs will be of those types in practice.

This may be a bit "how long is a piece of string" but how do those two
compare with string sorting in an interesting encoding/locale -- say
/usr/share/dict/polish in pl_PL for example. It's certainly true that
people do sort text as well as numbers. Also, people often sort on
keys of more than one column

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-08 Thread Peter Geoghegan
On Thu, Sep 7, 2017 at 2:49 PM, Robert Haas  wrote:
> On Thu, Sep 7, 2017 at 5:38 PM, Tomas Vondra
>  wrote:
>> Do we need/want to repeat some of that benchmarking on these patches? I
>> don't recall how much this code changed since those benchmarks were done
>> in the 9.6 cycle.
>
> +1 for some new benchmarking.  I'm all for removing this code if we
> don't need it any more, but it'd be a lot better to have more numbers
> before deciding that.

It isn't always a win. For example, if I alter the pgbench_accounts
table so that the column "aid" is of type numeric, the picture changes
for my test case -- replacement selection is still somewhat faster for
the "select count(distinct aid) from pgbench_accounts" query with
work_mem=2MB. It takes about 5.4 seconds with replacement selection,
versus 7.4 seconds for quicksorting. But, I still think we should
proceed with my patch, because:

* It's still faster with int4/int8 comparisons on modern hardware, and
I think that most ordered inputs will be of those types in practice.
The trade-off between those two properties (faster for int4/int8 when
ordered, slower for numeric) recommends killing replacement selection
entirely. It's not a slam dunk, but it makes sense on performance
grounds, IMV.

* The comparator numeric_fast_cmp() is not well optimized -- it
doesn't avoid allocating memory with each call, for example, unlike
varstrfastcmp_locale(). This could and should be improved, and might
even change the picture for this second test case.

* With the default work_mem of 8MB, we never use replacement selection
anyway. Whatever its merits, it's pretty rare to use replacement
selection simply because the default replacement_sort_tuples just
isn't that  many tuples (150,000).

* The upside of my patch is not inconsiderable: We can remove a lot of
code, which will enable further improvements in the future, which are
far more compelling (cleaner memory management, the use of memory
batches during run generation).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-07 Thread Robert Haas
On Thu, Sep 7, 2017 at 5:38 PM, Tomas Vondra
 wrote:
> Do we need/want to repeat some of that benchmarking on these patches? I
> don't recall how much this code changed since those benchmarks were done
> in the 9.6 cycle.

+1 for some new benchmarking.  I'm all for removing this code if we
don't need it any more, but it'd be a lot better to have more numbers
before deciding that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-07 Thread Tomas Vondra
Hi,

On 08/31/2017 02:56 AM, Peter Geoghegan wrote:
> On Wed, Aug 30, 2017 at 5:38 PM, Robert Haas  wrote:
>> Wow.  Just to be clear, I am looking for the BEST case for replacement
>> selection, not the worst case.  But I would have expected that case to
>> be a win for replacement selection, and it clearly isn't.  I can
>> reproduce your results here.
> 
> But I *was* trying to get a best case. That's why it isn't even worse.
> That's what the docs say the best case is, after all.
> 
> This is the kind of thing that replacement selection actually did do
> better with on 9.6. I clearly remember Tomas Vondra doing lots of
> benchmarking, showing some benefit with RS with a work_mem of 8MB or
> less. As I said in my introduction on this thread, we weren't wrong to
> add replacement_sort_tuples to 9.6, given where things were with
> merging at the time. But, it does very much appear to create less than
> zero benefit these days. The picture changed.
> 

Do we need/want to repeat some of that benchmarking on these patches? I
don't recall how much this code changed since those benchmarks were done
in the 9.6 cycle.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro
 wrote:
>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>
> This breaks the documentation build, because
> doc/src/sgml/release-9.6.sgml still contains  linkend="guc-replacement-sort-tuples"> but you removed that id.

Thanks for looking into it.

I suppose that the solution is to change the 9.6 release notes to no
longer have a replacement_sort_tuples link. Anyone else have an
opinion on that?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Thomas Munro
On Fri, Sep 1, 2017 at 6:00 AM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 4:59 PM, Peter Geoghegan  wrote:
>> I may submit the simple patch to remove replacement selection, if
>> other contributors are receptive. Apart from everything else, the
>> "incrementalism" of replacement selection works against cleverer batch
>> memory management of the type I just outlined, which seems like a
>> useful direction to take tuplesort in.
>
> I attach a patch to remove replacement selection, which I'll submit to CF 1.

This breaks the documentation build, because
doc/src/sgml/release-9.6.sgml still contains  but you removed that id.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-31 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 4:59 PM, Peter Geoghegan  wrote:
> I may submit the simple patch to remove replacement selection, if
> other contributors are receptive. Apart from everything else, the
> "incrementalism" of replacement selection works against cleverer batch
> memory management of the type I just outlined, which seems like a
> useful direction to take tuplesort in.

I attach a patch to remove replacement selection, which I'll submit to CF 1.


-- 
Peter Geoghegan
From 6ccad74113d3a13264a19653e13ef897be5c902d Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 30 Aug 2017 16:14:36 -0700
Subject: [PATCH] Remove replacement selection sort.

This simplifies tuplesort.c, since heap maintenance routines need no
longer concern themselves with cases where two runs are represented
within a heap.

Replacement selection sort's traditional advantages no longer allow it
to outperform a simple sort-merge strategy, even in sympathetic cases.
This is due to long term trends in hardware performance characteristics,
and enhancements to the tuplesort.c merge code in the past couple of
years.
---
 doc/src/sgml/config.sgml  |  39 ---
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 415 +++---
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 8 files changed, 51 insertions(+), 447 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2b6255e..7c625e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1513,45 +1513,6 @@ include_dir 'conf.d'
   
  
 
- 
-  replacement_sort_tuples (integer)
-  
-   replacement_sort_tuples configuration parameter
-  
-  
-  
-   
-When the number of tuples to be sorted is smaller than this number,
-a sort will produce its first output run using replacement selection
-rather than quicksort.  This may be useful in memory-constrained
-environments where tuples that are input into larger sort operations
-have a strong physical-to-logical correlation.  Note that this does
-not include input tuples with an inverse
-correlation.  It is possible for the replacement selection algorithm
-to generate one long run that requires no merging, where use of the
-default strategy would result in many runs that must be merged
-to produce a final sorted output.  This may allow sort
-operations to complete sooner.
-   
-   
-The default is 150,000 tuples.  Note that higher values are typically
-not much more effective, and may be counter-productive, since the
-priority queue is sensitive to the size of available CPU cache, whereas
-the default strategy sorts runs using a cache
-oblivious algorithm.  This property allows the default sort
-strategy to automatically and transparently make effective use
-of available CPU cache.
-   
-   
-Setting maintenance_work_mem to its default
-value usually prevents utility command external sorts (e.g.,
-sorts used by CREATE INDEX to build B-Tree
-indexes) from ever using replacement selection sort, unless the
-input tuples are quite wide.
-   
-  
- 
-
  
   autovacuum_work_mem (integer)
   
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 7c09498..9680a4b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -112,7 +112,6 @@ bool		enableFsync = true;
 bool		allowSystemTableMods = false;
 int			work_mem = 1024;
 int			maintenance_work_mem = 16384;
-int			replacement_sort_tuples = 15;
 
 /*
  * Primary determinants of sizes of shared-memory structures.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 246fea8..a459672 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1932,16 +1932,6 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"replacement_sort_tuples", PGC_USERSET, RESOURCES_MEM,
-			gettext_noop("Sets the maximum number of tuples to be sorted using replacement selection."),
-			gettext_noop("When more tuples than this are present, quicksort will be used.")
-		},
-		_sort_tuples,
-		15, 0, INT_MAX,
-		NULL, NULL, NULL
-	},
-
 	/*
 	 * We use the hopefully-safely-small value of 100kB as the compiled-in
 	 * default for max_stack_depth.  InitializeGUCOptions will increase it if
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index df5d2f3..d0177d1 100644
--- 

Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 5:38 PM, Robert Haas  wrote:
> Wow.  Just to be clear, I am looking for the BEST case for replacement
> selection, not the worst case.  But I would have expected that case to
> be a win for replacement selection, and it clearly isn't.  I can
> reproduce your results here.

But I *was* trying to get a best case. That's why it isn't even worse.
That's what the docs say the best case is, after all.

This is the kind of thing that replacement selection actually did do
better with on 9.6. I clearly remember Tomas Vondra doing lots of
benchmarking, showing some benefit with RS with a work_mem of 8MB or
less. As I said in my introduction on this thread, we weren't wrong to
add replacement_sort_tuples to 9.6, given where things were with
merging at the time. But, it does very much appear to create less than
zero benefit these days. The picture changed.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 6:14 PM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 3:01 PM, Robert Haas  wrote:
>> That may all be true, but my point is that if it wins in some cases,
>> we should keep it -- and proving it no longer wins in those cases will
>> require running tests.
>
> That's not hard. On my laptop:
>
> $ pgbench -i -s 100
> ...
>
> postgres=# set work_mem = '2MB';
> SET
> postgres=# show replacement_sort_tuples ;
>  replacement_sort_tuples
> ─
>  15
> (1 row)
> (30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
>count
> 
>  10,000,000
> (1 row)
>
> Time: 4157.267 ms (00:04.157)
> (30784) /postgres=# set replacement_sort_tuples = 0;
> SET
> (30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
>count
> 
>  10,000,000
> (1 row)
>
> Time: 3343.789 ms (00:03.344)
>
> This is significantly faster, in a way that's clearly reproducible and
> consistent, despite the fact that we need about 10 merge passes
> without replacement selection, and only have enough memory for 7
> tapes. I think that I could find a case that makes replacement
> selection look much worse, if I tried.

Wow.  Just to be clear, I am looking for the BEST case for replacement
selection, not the worst case.  But I would have expected that case to
be a win for replacement selection, and it clearly isn't.  I can
reproduce your results here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 3:14 PM, Peter Geoghegan  wrote:
> This is significantly faster, in a way that's clearly reproducible and
> consistent, despite the fact that we need about 10 merge passes
> without replacement selection, and only have enough memory for 7
> tapes. I think that I could find a case that makes replacement
> selection look much worse, if I tried.

Just to see where we end up, I quickly wrote a patch to remove
replacement selection + replacement_sort_tuples. The LOC impact worked
out like this:

$ git diff master --stat
 doc/src/sgml/config.sgml  |  39 
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 403
+--
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 8 files changed, 52 insertions(+), 434 deletions(-)

It was nice to be able to remove comments that make certain
distinctions that replacement selection cares about. These were
tedious to read. I managed to remove several paragraphs within
tuplesort.c's header comments.

Another advantage of the changes made that I noticed in passing is
that SortTuple.tupindex is now only used while merging. It would be
quite possible to remove SortTuple.tupindex entirely, as an additional
piece of work, by providing some other indirection to get that
information when merging. From there, we could replace SortTuple.tuple
with a bitfield, that has one bit for NULLness, and treats other bits
as a 63-bit or 31-bit integer. This integer would be used an offset
into a buffer that we repalloc(), thus removing all retail palloc()s,
even during run generation. Using one big buffer for tuples would make
tuplesort.c quite a bit more memory efficient (SortTuple will only be
16 bytes, we won't waste memory on palloc() headers, and sequential
access is made more cache efficient in presorted pass-by-reference
cases).

I'm not planning to work on this myself. It is similar to something
that Heikki described last year, but goes a bit further by eliminating
all palloc() header overhead, while also not playing tricks with
reclaiming bits from pointers (I recall that Tom didn't like that
aspect of Heikki's proposal at all). There would be new infrastructure
required to make this work -- we'd have to be able to ask routines
like ExecCopySlotMinimalTuple() and heap_copytuple() to copy into our
own large tuple buffer, rather than doing their own palloc() on
tuplesort's behalf. But that seems like a good idea anyway.

I may submit the simple patch to remove replacement selection, if
other contributors are receptive. Apart from everything else, the
"incrementalism" of replacement selection works against cleverer batch
memory management of the type I just outlined, which seems like a
useful direction to take tuplesort in.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 3:01 PM, Robert Haas  wrote:
> That may all be true, but my point is that if it wins in some cases,
> we should keep it -- and proving it no longer wins in those cases will
> require running tests.

That's not hard. On my laptop:

$ pgbench -i -s 100
...

postgres=# set work_mem = '2MB';
SET
postgres=# show replacement_sort_tuples ;
 replacement_sort_tuples
─
 15
(1 row)
(30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
   count

 10,000,000
(1 row)

Time: 4157.267 ms (00:04.157)
(30784) /postgres=# set replacement_sort_tuples = 0;
SET
(30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
   count

 10,000,000
(1 row)

Time: 3343.789 ms (00:03.344)

This is significantly faster, in a way that's clearly reproducible and
consistent, despite the fact that we need about 10 merge passes
without replacement selection, and only have enough memory for 7
tapes. I think that I could find a case that makes replacement
selection look much worse, if I tried.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 4:18 PM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 12:51 PM, Robert Haas  wrote:
>> On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
>>> With the additional enhancements made to Postgres 10, I doubt that
>>> there are any remaining cases where it wins.
>>
>> The thing to do about that would be to come up with some cases where
>> someone might plausibly think it would win and benchmark them to find
>> out what happens.  I find it really hard to believe that sorting a
>> long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
>> is ever going to be as fast with any other algorithm as it is with
>> replacement selection.
>
> Replacement selection as implemented in Postgres is supposed to be
> about the "single run, no merge" best case. This must use
> TSS_SORTEDONTAPE processing, which is optimized for random access,
> which is usually the wrong thing.
>
> In general, sorting is only one cost that is involved here, and is not
> the predominant cost with presorted input.

That may all be true, but my point is that if it wins in some cases,
we should keep it -- and proving it no longer wins in those cases will
require running tests.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 12:51 PM, Robert Haas  wrote:
> On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
>> With the additional enhancements made to Postgres 10, I doubt that
>> there are any remaining cases where it wins.
>
> The thing to do about that would be to come up with some cases where
> someone might plausibly think it would win and benchmark them to find
> out what happens.  I find it really hard to believe that sorting a
> long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
> is ever going to be as fast with any other algorithm as it is with
> replacement selection.

Replacement selection as implemented in Postgres is supposed to be
about the "single run, no merge" best case. This must use
TSS_SORTEDONTAPE processing, which is optimized for random access,
which is usually the wrong thing.

In general, sorting is only one cost that is involved here, and is not
the predominant cost with presorted input.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Robert Haas
On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
> With the additional enhancements made to Postgres 10, I doubt that
> there are any remaining cases where it wins.

The thing to do about that would be to come up with some cases where
someone might plausibly think it would win and benchmark them to find
out what happens.  I find it really hard to believe that sorting a
long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
is ever going to be as fast with any other algorithm as it is with
replacement selection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] The case for removing replacement selection sort

2017-07-14 Thread Peter Geoghegan
There was a number of improvements to tuplesort.c external sort
merging made for Postgres 10. One in particular, the changes to merge
heap maintenance that occurred for commit 24598337c8d, really helped
with presorted cases -- cases when there was an (inverse)
physical/logical correlation.

Replacement selection sort for run generation was not killed as part
of the improvements to external sorting in Postgres 9.6, although
there was a reasonably good case for that to be made at the time those
enhancements went in. This was because it was still possible for its
best case to come out ahead. The best case is the case where it
manages to produce a single run, all in one go, by exploiting
presortedness. The examples where this was possible were fairly
narrow, but they existed.

With the additional enhancements made to Postgres 10, I doubt that
there are any remaining cases where it wins. In addition to the point
about heap management and presortedness, you also have to consider
that TSS_SORTEDONTAPE processing is optimized for random access. This
means that one-big-replacement-selection-run cases will not take
advantage of available memory, improvements in Postgres 10 by Heikki
to tape preloading for merging, OS readahead, and so on. Merging is
often quite I/O bound these days, especially when merging naturally
requires few comparisons. TSS_SORTEDONTAPE processing is
inappropriate.

I think we should remove the replacement_sort_tuples GUC, and kill
replacement selection entirely. There is no need to do this for
Postgres 10. I don't feel very strongly about it. It just doesn't make
sense to continue to support replacement selection.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers