subject:"Re\: \[PATCH\] Incremental sort \(was\: PoC\: Partial sort\)"

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:

On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>>hyrax is not too happy with this test:
>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
>>
>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>>EXPLAIN output, but it evidently is.
>>
>
>Thanks, I'll investigate. It's not clear to me either what might be
>causing this, but I guess something must have gone wrong in
>estimation/planning.
>

OK, I know what's going on - it's a rather embarassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.

Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.

Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:

On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
>On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
>>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
>> wrote:
>>>
>>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>hyrax is not too happy with this test:
>
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
>
>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>EXPLAIN output, but it evidently is.
>

Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.

>>>
>>>OK, I know what's going on - it's a rather embarassing issue in the
>>>regression test. There's no analyze on the test tables, so it uses
>>>default estimates for number of groups etc. But with clobber cache the
>>>test runs long enough for autoanalyze to kick in and collect stats, so
>>>we generate better estimates which changes the plan.
>>>
>>>I'll get this fixed - explicit analyze and tweaking the data a bit
>>>should do the trick.
>>
>>Looking at the tests that failed, I think we should consider just adding:
>>set enable_sort = off;
>>because several of those tests have very specific amounts of data to
>>ensure we test the transition points around the different modes in the
>>incremental sort node.
>>
>
>Maybe, but I'd much rather tweak the data so that we test both the
>costing and execution part.
>

I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?

Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".

OK, good. I'll push the fix.

I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...

Yeah, but with 1000 rows we'd need a more complex CASE statement (I
don't think simply having two groups - small+large would work).

Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?

Ah, definitely. That was a mistake. Thanks for noticing.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tom Lane

James Coleman  writes:
> Should we change `analyze` to `analyze t` to avoid unnecessarily
> re-analyzing all other tables in the regression db?

Yes, a global analyze here is a remarkably horrid idea.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman

On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
> >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
> >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
> >> wrote:
> >>>
> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >hyrax is not too happy with this test:
> >
> >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
> >
> >It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >EXPLAIN output, but it evidently is.
> >
> 
> Thanks, I'll investigate. It's not clear to me either what might be
> causing this, but I guess something must have gone wrong in
> estimation/planning.
> 
> >>>
> >>>OK, I know what's going on - it's a rather embarassing issue in the
> >>>regression test. There's no analyze on the test tables, so it uses
> >>>default estimates for number of groups etc. But with clobber cache the
> >>>test runs long enough for autoanalyze to kick in and collect stats, so
> >>>we generate better estimates which changes the plan.
> >>>
> >>>I'll get this fixed - explicit analyze and tweaking the data a bit
> >>>should do the trick.
> >>
> >>Looking at the tests that failed, I think we should consider just adding:
> >>set enable_sort = off;
> >>because several of those tests have very specific amounts of data to
> >>ensure we test the transition points around the different modes in the
> >>incremental sort node.
> >>
> >
> >Maybe, but I'd much rather tweak the data so that we test both the
> >costing and execution part.
> >
>
> I do think this does the trick by increasing the number of rows a bit
> (from 100 to 1000) to make the Sort more expensive than Incremental
> Sort, while still testing the transition points.
>
> James, can you verify it that's still true?

Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".

I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...

Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.



I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/test/regress/expected/incremental_sort.out 
b/src/test/regress/expected/incremental_sort.out
index 3072d95643..fb4ab95922 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -141,7 +141,8 @@ begin
 end;
 $$;
 -- A single large group tested around each mode transition point.
-insert into t(a, b) select 1, i from generate_series(1, 100) n(i);
+insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 31;
QUERY PLAN
 -
@@ -456,7 +457,8 @@ select * from (select * from t order by a) s order by a, b 
limit 66;
 
 delete from t;
 -- An initial large group followed by a small group.
-insert into t(a, b) select (case when i < 50 then 1 else 2 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select i/50 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 55;
QUERY PLAN
 -
@@ -521,7 +523,7 @@ select * from (select * from t order by a) s order by a, b 
limit 55;
  1 | 47
  1 | 48
  1 | 49
- 2 | 50
+ 1 | 50
  2 | 51
  2 | 52
  2 | 53
@@ -538,10 +540,10 @@ select explain_analyze_without_memory('select * from 
(select * from t order by a
  Sort Key: t.a, t.b
  Presorted Key: t.a
  Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: 
avg=NNkB peak=NNkB
- ->  Sort (actual rows=100 loops=1)
+ ->  Sort (actual rows=101 loops=1)
Sort Key: t.a
Sort Method: quicksort  Memory: NNkB
-   ->  Seq Scan on t (actual rows=100 loops=1)
+   ->  Seq Scan on t (actual rows=1000 loops=1)
 (9 rows)
 
 select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * 
from (select * from t order by a) s order by a, b limit 55'));
@@ -584,7 +586,8 @@ select 
explain_analyze_inc_sort_nodes_verify_invariants('select * from (select *
 
 delete from t;
 -- An initial small group followed by a large group.
-insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 1000) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 70;
QUERY PLAN
 -
@@ -705,17 +708,17 @@ select * from t left join (select * from (select * from t 
order by a) v order by
 rollback;
 -- Test EXPLAIN ANALYZE with both fullsort and presorted groups.
 select explain_analyze_without_memory('select * from (select * from t order by 
a) s order by a, b limit 70');
-   
explain_analyze_without_memory

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> >On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >>hyrax is not too happy with this test:
> >>
> >>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
> >>
> >>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >>EXPLAIN output, but it evidently is.
> >>
> >
> >Thanks, I'll investigate. It's not clear to me either what might be
> >causing this, but I guess something must have gone wrong in
> >estimation/planning.
> >
>
> OK, I know what's going on - it's a rather embarassing issue in the
> regression test. There's no analyze on the test tables, so it uses
> default estimates for number of groups etc. But with clobber cache the
> test runs long enough for autoanalyze to kick in and collect stats, so
> we generate better estimates which changes the plan.
>
> I'll get this fixed - explicit analyze and tweaking the data a bit
> should do the trick.

Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Tom Lane

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


Hi,

I've pushed the second part of this patch series, adding incremental
sort to additional places in the planner. As explained in the commit
message (and somewhere in this thread) we've decided to only update some
of the places that require sorted input (and do create_sort). This might
be overly cautious, I expect we'll add it to more places in the future.


As for the remaining part tweaking add_partial_path to also consider
startup cost (a bit confusingly 0001 - 0003 in the v54 patchset), I've
decided not to push it now and leave it for v14. The add_partial_path
change is simple, but I came to the conclusion that the "precheck"
function should be modified to follow the same logic - it would be a bit
strange if add_partial_path_precheck considered only total cost and
add_partial_path considered both startup and total cost. It would not
matter for most places because the add_partial_path_precheck is ony used
in join planning, but it's still strange.

I could have modified the add_partial_path_precheck too, but looking at
add_path_precheck we'd probably need to compute required_outer so that
we only compare startup_cost when really useful. Or we might simply
consider startup_cost every time and leave that up to add_partial_path,
but then that would be another difference in behavior.

That seems like way too much stuff to rework on the last day of the last
commitfest. It does mean we'll fail to generate the cheapest plan in some
cases (e.g. with LIMIT, there's an example in [1]) but that's a
pre-existing condition, not something introduced by incremental sort.

regards


[1] 
https://www.postgresql.org/message-id/20190720132244.3vgg2uynfpxh3me5%40development

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread James Coleman

On Tue, Apr 7, 2020 at 7:58 PM Tomas Vondra
 wrote:
>
> On Tue, Apr 07, 2020 at 07:50:26PM -0400, James Coleman wrote:
> >On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:
> >> >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> >> >> I've pushed the fist part of this patch series - I've reorganized it a
> >> >
> >> >I scanned through this again post-commit.  Find attached some suggestions.
> >> >
> >>
> >> Thanks. The typo fixes seem clear, except for this bit:
> >>
> >>* If we've set up either of the sort states yet, we need to reset them.
> >>* We could end them and null out the pointers, but there's no reason to
> >>* repay the setup cost, and because  guard setting up pivot 
> >> comparator
> >>* state similarly, doing so might actually cause a leak.
> >>
> >> I can't figure out what  should be. James, do you recall what this
> >> should be?
> >
> >Yep, it's ExecIncrementalSort. If you look for the block guarded by
> >`if (fullsort_state == NULL)` you'll see the call to
> >preparePresortedCols(), which sets up the pivot comparator state
> >referenced by this comment.
> >
>
> OK, so it should be "... and because ExecIncrementalSort guard ..."?

Yes, "because ExecIncrementalSort guards presorted column functions by
checking to see if the full sort state has been initialized yet,
setting the sort states to null here might cause..." (that's more
specific IMO than my original "pivot comparator state...doing so").

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Apr 07, 2020 at 07:50:26PM -0400, James Coleman wrote:

On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra
 wrote:

On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:
>On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
>> I've pushed the fist part of this patch series - I've reorganized it a
>
>I scanned through this again post-commit.  Find attached some suggestions.
>

Thanks. The typo fixes seem clear, except for this bit:

   * If we've set up either of the sort states yet, we need to reset them.
   * We could end them and null out the pointers, but there's no reason to
   * repay the setup cost, and because  guard setting up pivot comparator
   * state similarly, doing so might actually cause a leak.

I can't figure out what  should be. James, do you recall what this
should be?

Yep, it's ExecIncrementalSort. If you look for the block guarded by
`if (fullsort_state == NULL)` you'll see the call to
preparePresortedCols(), which sets up the pivot comparator state
referenced by this comment.

OK, so it should be "... and because ExecIncrementalSort guard ..."?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread James Coleman

On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:
> >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> >> I've pushed the fist part of this patch series - I've reorganized it a
> >
> >I scanned through this again post-commit.  Find attached some suggestions.
> >
>
> Thanks. The typo fixes seem clear, except for this bit:
>
>* If we've set up either of the sort states yet, we need to reset them.
>* We could end them and null out the pointers, but there's no reason to
>* repay the setup cost, and because  guard setting up pivot comparator
>* state similarly, doing so might actually cause a leak.
>
> I can't figure out what  should be. James, do you recall what this
> should be?

Yep, it's ExecIncrementalSort. If you look for the block guarded by
`if (fullsort_state == NULL)` you'll see the call to
preparePresortedCols(), which sets up the pivot comparator state
referenced by this comment.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:

On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:

I've pushed the fist part of this patch series - I've reorganized it a


I scanned through this again post-commit.  Find attached some suggestions.



Thanks. The typo fixes seem clear, except for this bit:

  * If we've set up either of the sort states yet, we need to reset them.
  * We could end them and null out the pointers, but there's no reason to
  * repay the setup cost, and because  guard setting up pivot comparator
  * state similarly, doing so might actually cause a leak.

I can't figure out what  should be. James, do you recall what this
should be?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Justin Pryzby

On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
> > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> > > I've pushed the fist part of this patch series - I've reorganized it a
> >
> > I scanned through this again post-commit.  Find attached some suggestions.
> >
> > Shouldn't non-text explain output always show both disk *and* mem, including
> > zeros ?
> 
> Could you give more context on this? Is there a standard to follow?
> Regular sort nodes only ever report one type, so there's not a good
> parallel there.

Right, I'm not sure either, since it seems to be a new case.  Maybe Tomas has a
strong intuition.

See at least the commit messages here:
3ec20c7091e97a554e7447ac2b7f4ed795631395
7d91b604d9b5d6ec8c19c57a9ffd2f27129cdd94
8ebb69f85445177575684a0ba5cfedda8d840a91

Maybe this one suggests that it should *not* be present unconditionally, but
only when that sort type is used?
4b234fd8bf21cd6f5ff44f1f1c613bf40860998d

Another thought: is checking if bytes>0 really a good way to determine if a
sort type was used ?  It seems like checking a bit or a pointer would be
better.  I guess a size of 0 is unlikely, and it's ok at least in text mode.

if (groupInfo->maxMemorySpaceUsed > 0)

On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
...
> I read through that subthread, and the ending seemed to be Peter
> wanting things to be unified. Was there a conclusion beyond that?

This discussion is ongoing.  I think let's wait until that's settled before
addressing this more complex and even newer case.  We can add "explain, two
spaces and equals vs colon" to the "Open items" list if need be - I hope the
discussion will not delay the release.

-- 
Justin

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:

On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:


On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> I've pushed the fist part of this patch series - I've reorganized it a

I scanned through this again post-commit.  Find attached some suggestions.

Shouldn't non-text explain output always show both disk *and* mem, including
zeros ?


Could you give more context on this? Is there a standard to follow?
Regular sort nodes only ever report one type, so there's not a good
parallel there.


Should "Pre-sorted Groups:" be on a separate line ?
| Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB


I'd originally had that, but Tomas wanted it to be more compact. It's
easy to adjust though if the consensus changes on that.



I'm OK with changing the format if there's a consensus. The current
format seemed better to me, but I'm not particularly attached to it.


And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
Groups"?  I think you should maybe do that instead of the "semicolon
separator".  I think "two spaces" makes sense, since the units are different,
similar to hash buckets and normal sort node.

 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
 appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",

Note, I made a similar comment regarding two spaces for explain(WAL) here:
https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com

And Peter E seemed to dislike that, here:
https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com


I read through that subthread, and the ending seemed to be Peter
wanting things to be unified. Was there a conclusion beyond that?



Yeah, I don't think there was a clear consensus :-(


Also, you're showing:
ExplainPropertyInteger("Maximum Sort Space Used", "kB",
groupInfo->maxMemorySpaceUsed, es);

But in show_hash_info() and show_hashagg_info(), and in your own text output,
that's called "Peak":
ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
ExplainPropertyInteger("Peak Memory Usage", "kB",
spacePeakKb, es);


Yes, that's a miss and should be fixed.



Will fix.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread James Coleman

On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
>
> On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> > I've pushed the fist part of this patch series - I've reorganized it a
>
> I scanned through this again post-commit.  Find attached some suggestions.
>
> Shouldn't non-text explain output always show both disk *and* mem, including
> zeros ?

Could you give more context on this? Is there a standard to follow?
Regular sort nodes only ever report one type, so there's not a good
parallel there.

> Should "Pre-sorted Groups:" be on a separate line ?
> | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
> Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB

I'd originally had that, but Tomas wanted it to be more compact. It's
easy to adjust though if the consensus changes on that.

> And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
> Groups"?  I think you should maybe do that instead of the "semicolon
> separator".  I think "two spaces" makes sense, since the units are different,
> similar to hash buckets and normal sort node.
>
>  "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
>  appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",
>
> Note, I made a similar comment regarding two spaces for explain(WAL) here:
> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com
>
> And Peter E seemed to dislike that, here:
> https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com

I read through that subthread, and the ending seemed to be Peter
wanting things to be unified. Was there a conclusion beyond that?

> Also, you're showing:
> ExplainPropertyInteger("Maximum Sort Space Used", "kB",
> groupInfo->maxMemorySpaceUsed, es);
>
> But in show_hash_info() and show_hashagg_info(), and in your own text output,
> that's called "Peak":
> ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
> ExplainPropertyInteger("Peak Memory Usage", "kB",
> spacePeakKb, es);

Yes, that's a miss and should be fixed.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Justin Pryzby

On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> I've pushed the fist part of this patch series - I've reorganized it a

I scanned through this again post-commit.  Find attached some suggestions.

Shouldn't non-text explain output always show both disk *and* mem, including
zeros ?

Should "Pre-sorted Groups:" be on a separate line ?
| Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB

And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
Groups"?  I think you should maybe do that instead of the "semicolon
separator".  I think "two spaces" makes sense, since the units are different,
similar to hash buckets and normal sort node.  

 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
 appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",

Note, I made a similar comment regarding two spaces for explain(WAL) here:
https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com

And Peter E seemed to dislike that, here:
https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com

Also, you're showing:
ExplainPropertyInteger("Maximum Sort Space Used", "kB",
groupInfo->maxMemorySpaceUsed, es);

But in show_hash_info() and show_hashagg_info(), and in your own text output,
that's called "Peak":
ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
ExplainPropertyInteger("Peak Memory Usage", "kB",
spacePeakKb, es);

-- 
Justin
>From e26f2cc842792fc3c0dd4b4b97c0996d450c6dd7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 6 Apr 2020 17:37:31 -0500
Subject: [PATCH v1] comment typos and others: Incremental Sort

commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da
Author: Tomas Vondra 
---
 doc/src/sgml/perform.sgml  |  2 +-
 src/backend/commands/explain.c | 17 +++---
 src/backend/executor/nodeIncrementalSort.c | 26 +++---
 src/backend/utils/sort/tuplesort.c | 10 -
 src/include/utils/tuplesort.h  |  2 +-
 5 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 0dfc3e80e2..f448abd073 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -311,7 +311,7 @@ EXPLAIN SELECT * FROM tenk1 ORDER BY unique1;
->  Seq Scan on tenk1  (cost=0.00..445.00 rows=1 width=244)
 
 
-If the a part of the plan guarantess an ordering on a prefix of the
+If a part of the plan guarantees an ordering on a prefix of the
 required sort keys, then the planner may instead decide to use an
 incremental sort step:
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index baaa5817af..ca4fe3307d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2532,7 +2532,7 @@ show_sort_group_keys(PlanState *planstate, const char *qlabel,
 
 	ExplainPropertyList(qlabel, result, es);
 	if (nPresortedKeys > 0)
-		ExplainPropertyList("Presorted Key", resultPresorted, es);
+		ExplainPropertyList("Pre-sorted Key", resultPresorted, es);
 }
 
 /*
@@ -2829,7 +2829,6 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 
 		ExplainPropertyList("Sort Methods Used", methodNames, es);
 
-		if (groupInfo->maxMemorySpaceUsed > 0)
 		{
 			long		avgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
 			const char *spaceTypeName;
@@ -2841,12 +2840,12 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			ExplainOpenGroup("Sort Space", memoryName.data, true, es);
 
 			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es);
-			ExplainPropertyInteger("Maximum Sort Space Used", "kB",
+			ExplainPropertyInteger("Peak Sort Space Used", "kB",
    groupInfo->maxMemorySpaceUsed, es);
 
 			ExplainCloseGroup("Sort Spaces", memoryName.data, true, es);
 		}
-		if (groupInfo->maxDiskSpaceUsed > 0)
+
 		{
 			long		avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
 			const char *spaceTypeName;
@@ -2858,7 +2857,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			ExplainOpenGroup("Sort Space", diskName.data, true, es);
 
 			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es);
-			ExplainPropertyInteger("Maximum Sort Space Used", "kB",
+			ExplainPropertyInteger("Peak Sort Space Used", "kB",
    groupInfo->maxDiskSpaceUsed, es);
 
 			ExplainCloseGroup("Sort Spaces", diskName.data, true, es);
@@ -2869,7 +2868,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node
  */
 static void

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 11:00:37PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

I came to the same conclusion (that the change in TuplesortMethod
definiton is the culprit) a while ago and was about to push a fix that
initialized it correctly in ExecSortInitializeDSM. But I agree reverting
it back to the old definition is probably better.


Yeah, for the moment.  James would like to not have
SORT_TYPE_STILL_IN_PROGRESS be part of the enum at all, I think,
and I can see his point --- but then we need some out-of-band
representation of "worker not done", so I'm not sure there'll be
any net reduction of cruft.  Anyway that can be dealt with after
we have a stable buildfarm.



Agreed.


Note also that there's a separate comment-only patch in

that shouldn't be forgotten about.



OK, I'll take care of that tomorrow. I have two smaller patches to
commit in the incremental sort patchset, so I'll add it to that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> I came to the same conclusion (that the change in TuplesortMethod
> definiton is the culprit) a while ago and was about to push a fix that
> initialized it correctly in ExecSortInitializeDSM. But I agree reverting
> it back to the old definition is probably better.

Yeah, for the moment.  James would like to not have
SORT_TYPE_STILL_IN_PROGRESS be part of the enum at all, I think,
and I can see his point --- but then we need some out-of-band
representation of "worker not done", so I'm not sure there'll be
any net reduction of cruft.  Anyway that can be dealt with after
we have a stable buildfarm.

Note also that there's a separate comment-only patch in

that shouldn't be forgotten about.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 10:19:41PM -0400, Tom Lane wrote:

James Coleman  writes:

Fair enough. Unsure if Tomas is still online to comment and/or push,
but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an
initial fix.


I'm guessing he went to bed, so I'll push a fix in a moment.
The patch has survived enough test cycles here now to make me
moderately confident that it fixes the issue.



Nope, how I could I sleep with half of the buildfarm still red?

I came to the same conclusion (that the change in TuplesortMethod
definiton is the culprit) a while ago and was about to push a fix that
initialized it correctly in ExecSortInitializeDSM. But I agree reverting
it back to the old definition is probably better.

Thanks!

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

James Coleman  writes:
> Fair enough. Unsure if Tomas is still online to comment and/or push,
> but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an
> initial fix.

I'm guessing he went to bed, so I'll push a fix in a moment.
The patch has survived enough test cycles here now to make me
moderately confident that it fixes the issue.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 10:09 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Mon, Apr 6, 2020 at 9:46 PM Tom Lane  wrote:
> >> I think the correct fix is to change the enum declaration.
>
> > Hmm. I don't actually really like that, because it means the value
> > here isn't actually semantically correct. That is, the sort type is
> > not "in progress"; it's "we never started a sort at all".
>
> Well, yeah, but that pre-dated this patch, and right now is no
> time to improve it; we can debate such fine points at more leisure
> once the buildfarm isn't broken.

Fair enough. Unsure if Tomas is still online to comment and/or push,
but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an
initial fix.

> Obviously the comment needs fixed...

The one in show_short_info?

I can work on that (and the other proposed cleanup above) with Tomas
tomorrow or later.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

James Coleman  writes:
> On Mon, Apr 6, 2020 at 9:46 PM Tom Lane  wrote:
>> I think the correct fix is to change the enum declaration.

> Hmm. I don't actually really like that, because it means the value
> here isn't actually semantically correct. That is, the sort type is
> not "in progress"; it's "we never started a sort at all".

Well, yeah, but that pre-dated this patch, and right now is no
time to improve it; we can debate such fine points at more leisure
once the buildfarm isn't broken.

Obviously the comment needs fixed...

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 9:46 PM Tom Lane  wrote:
>
> Tomas Vondra  writes:
> > I don't know, I've tried running the tests on a number of machines,
> > similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked
> > everywhere while the failures seem consistent.
>
> On my machine, it reproduces about one time in six with
> force_parallel_mode = regress.  It seems possible given your
> results that reducing max_parallel_workers would make it more
> likely, but I've not tried that.
>
> What I'm seeing, after adding some debug printouts, is that sortMethod is
> frequently zero when we reach the EXPLAIN output for a worker.  In many of
> the tests this happens even though there is no visible failure, because
> we've got a filter function hiding the output :-(
>
> So I concur with James' conclusion that the existing code is relying on
> sortMethod initializing to zeroes, and that we did the wrong thing by
> trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation.
> I do not like his patch though, particularly not the type pun with NULL.

Sentinel and NULL? I hadn't caught that at all.

> I think the correct fix is to change the enum declaration.

Hmm. I don't actually really like that, because it means the value
here isn't actually semantically correct. That is, the sort type is
not "in progress"; it's "we never started a sort at all". I don't
really love the conflating of those things that the old enum
declaration had (even it'd had a helpful comment). It seems to me that
we should make "we don't have a type" and "we have a type" distinct.

We could add a new enum value SORT_TYPE_UNINITIALIZED or similar though.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> I don't know, I've tried running the tests on a number of machines,
> similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked
> everywhere while the failures seem consistent.

On my machine, it reproduces about one time in six with
force_parallel_mode = regress.  It seems possible given your
results that reducing max_parallel_workers would make it more
likely, but I've not tried that.

What I'm seeing, after adding some debug printouts, is that sortMethod is
frequently zero when we reach the EXPLAIN output for a worker.  In many of
the tests this happens even though there is no visible failure, because
we've got a filter function hiding the output :-(

So I concur with James' conclusion that the existing code is relying on
sortMethod initializing to zeroes, and that we did the wrong thing by
trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation.
I do not like his patch though, particularly not the type pun with NULL.
I think the correct fix is to change the enum declaration.

I know it's darn late where you are, do you want me to change it?

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 06, 2020 at 08:42:13PM -0400, Tom Lane wrote:

Tomas Vondra writes:

It doesn't seem to be particularly platform-specific, but I've been
unable to reproduce it so far. It seems on older gcc versions, though.

It's looking kind of like an uninitialized-memory problem. Note
the latest from spurfowl,

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl=2020-04-07%2000%3A15%3A05

which got through "make check" and then failed during pg_upgrade's
repetition of the test. Similarly on rhinoceros. So there's definitely
instability there even on one machine.

Perhaps something to do with unexpected cache flushes??

I don't know, I've tried running the tests on a number of machines,
similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked
everywhere while the failures seem consistent.

I've been able to reproduce these failures (same symptoms) by making
sure the worker (implied by force_parallel_mode=regress) won't start.

set max_parallel_workers = 0;
set force_parallel_mode = regress;

triggers exactly those failures for me (at least during make check, I
haven't tried pg_upgrade tests etc.).

So my theory is that we fail to start parallel workers on those
machines. It's not clear to me why would it be limited to some machines
and why would it be correlated to the incremental sort? I don't think
those machines have lower number of parallel workers, no?

But maybe incremental sort allowed using more parallel queries for more
queries, and we simply run out of parallel workers that way?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 07:27:19PM -0400, James Coleman wrote:

On Mon, Apr 6, 2020 at 7:09 PM James Coleman  wrote:


On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >>
> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >>>other BF members).
> >> >>
> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >> >>and they were all identical.  I can confirm that, in my laptop, the
> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >>
> >> >
> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >> >soon. I'll push a fix in a minute.
> >> >
> >>
> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >>
> >> It however seems to me a bit more needs to be done. The fix makes
> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> because IncrementalSortState does not have sort_Done flag so it still
> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> before, but not having that flag seems a bit weird to me.
> >>
> >> It also seems possibly incorrect - we may end up with
> >>
> >>fullsortGroupInfo->groupCount == 0
> >>prefixsortGroupInfo->groupCount > 0
> >>
> >> but we won't print anything.
> >
> >This shouldn't ever be possible, because the only way we get any
> >prefix groups at all is if we've already sorted a full sort group
> >during the mode transition.
> >
> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> and make it work as in plain Sort. Or some comment explaining why
> >> depending on the counts is OK (assuming it is).
> >
> >There's previous email traffic on this thread about that (I can look
> >it up later this evening), but the short of it is that I believe that
> >relying on the group count is actually more correct than a sort_Done
> >flag in the case of incremental sort (in contrast to regular sort).
> >
>
> OK. Maybe we should add a comment to explain.c saying it's OK.
>
> I've pushed a fix for failures due to different planned workers (in the
> test I added to show changes due to add_partial_path tweaks).
>
> It seems we're not out of the woods yet, though. rhinoceros and
> sidewinder failed with something like this:
>
>  Sort Method: quicksort  Memory: NNkB
> +   Sort Method: unknown  Disk: NNkB
>
> Would you mind investigating at it?

I assume that means those build farm members run with very low
work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
these tests? Or is that bad practice and these are to expose issues
with changing into disk sort mode?


On rhinoceros I see:

== pgsql.build/src/test/regress/regression.diffs
===
diff -U3 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
--- 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
2020-03-14 10:37:49.156761104 -0700
+++ 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
2020-04-06 16:01:13.766798059 -0700
@@ -1328,8 +1328,9 @@
 ->  Sort (actual rows=3 loops=1)
   Sort Key: sq_limit.c1, sq_limit.pk
   Sort Method: top-N heapsort  Memory: xxx
+   Sort Method: unknown  Disk: 0kB
   ->  Seq Scan on sq_limit (actual rows=8 loops=1)
-(6 rows)
+(7 rows)

Same on sidewinder.

Given the 0kB I'm not sure this is *just* a work_mem thing, though
that's still something I'm curious to know about, and it's still part
of the "problem" here.

I'm investigating further.



I don't see how could this be caused by low work_mem value, really. What
I think it happening is that when executed with

force_parallel_mode = regress

we run this as if in a parallel worker, but then it gets somehow
confused and either (leader + worker) or two worker stats, or something
like that. I don't know why it's happening, though, or why would it be
triggered by the incremental sort patch ...

Actually, I just managed to trigger exactly this - the trick is that we
plan for certain number of workers, but then fail to start some. So for
example like this:

create table t (a int, b int, c int);

insert into t select mod(i,10), mod(i,10), i
  from generate_series(1,100) s(i);

set force_parallel_mode = regress;

set max_parallel_workers = 0;

explain (costs off, analyze) select * from t order by a,b;

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 7:31 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >> >>
> >> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >> >>>other BF members).
> >> >> >>
> >> >> >>FWIW I looked at the eight failures there were about fifteen minutes 
> >> >> >>ago
> >> >> >>and they were all identical.  I can confirm that, in my laptop, the
> >> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >> >>
> >> >> >
> >> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns 
> >> >> >too
> >> >> >soon. I'll push a fix in a minute.
> >> >> >
> >> >>
> >> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >> >>
> >> >> It however seems to me a bit more needs to be done. The fix makes
> >> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> >> because IncrementalSortState does not have sort_Done flag so it still
> >> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> >> before, but not having that flag seems a bit weird to me.
> >> >>
> >> >> It also seems possibly incorrect - we may end up with
> >> >>
> >> >>fullsortGroupInfo->groupCount == 0
> >> >>prefixsortGroupInfo->groupCount > 0
> >> >>
> >> >> but we won't print anything.
> >> >
> >> >This shouldn't ever be possible, because the only way we get any
> >> >prefix groups at all is if we've already sorted a full sort group
> >> >during the mode transition.
> >> >
> >> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> >> and make it work as in plain Sort. Or some comment explaining why
> >> >> depending on the counts is OK (assuming it is).
> >> >
> >> >There's previous email traffic on this thread about that (I can look
> >> >it up later this evening), but the short of it is that I believe that
> >> >relying on the group count is actually more correct than a sort_Done
> >> >flag in the case of incremental sort (in contrast to regular sort).
> >> >
> >>
> >> OK. Maybe we should add a comment to explain.c saying it's OK.
> >>
> >> I've pushed a fix for failures due to different planned workers (in the
> >> test I added to show changes due to add_partial_path tweaks).
> >>
> >> It seems we're not out of the woods yet, though. rhinoceros and
> >> sidewinder failed with something like this:
> >>
> >>  Sort Method: quicksort  Memory: NNkB
> >> +   Sort Method: unknown  Disk: NNkB
> >>
> >> Would you mind investigating at it?
> >
> >I assume that means those build farm members run with very low
> >work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
> >these tests? Or is that bad practice and these are to expose issues
> >with changing into disk sort mode?
> >
>
> I don't think so - I don't see any work_mem changes in the config - see
> the extra_config at the beginning of the page with details:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2020-04-06%2023%3A00%3A16
>
> Moreover, this seems to be in regular Sort, not Incremental Sort and it
> very much seems like it gets confused to print a worker info because the
> only way for Sort to print two "Sort Method" lines seems to be to enter
> either both
>
>if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
>{
>  ... print leader info ...
>}
>
>and
>
>if (sortstate->shared_info != NULL)
>{
>  for (n = 0; n < sortstate->shared_info->num_workers; n++)
>  {
>... print worker info ...
>  }
>}
>
> or maybe there are two workers? It's strange ...
>
>
> It doesn't seem to be particularly platform-specific, but I've been
> unable to reproduce it so far. It seems on older gcc versions, though.


I haven't been able to reproduce it, but I'm 99% confident this will fix it:

-if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS
+|| sinstrument->sortMethod == NULL)
 continue;   /* ignore any unfilled slots */

Earlier we'd had this discussion about why SORT_TYPE_STILL_IN_PROGRESS
was explicitly set to 0 in the enum declaration. Since there was no
comment, we changed that, but here I believe that show_sort_info was
relying on that as an indicator that a worker didn't actually do any
work (since the DSM for the sort node gets set to all zeros, this
would work).

I'm not sure if the

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> It doesn't seem to be particularly platform-specific, but I've been
> unable to reproduce it so far. It seems on older gcc versions, though.

It's looking kind of like an uninitialized-memory problem.  Note
the latest from spurfowl,

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl=2020-04-07%2000%3A15%3A05

which got through "make check" and then failed during pg_upgrade's
repetition of the test.  Similarly on rhinoceros.  So there's definitely
instability there even on one machine.

Perhaps something to do with unexpected cache flushes??

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote:

On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
 wrote:


On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
>On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> wrote:
>>
>> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
>> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
>> >>On 2020-Apr-06, Tom Lane wrote:
>> >>
>> >>>Locally, things pass without force_parallel_mode, but turning it on
>> >>>produces failures that look similar to rhinoceros's (didn't examine
>> >>>other BF members).
>> >>
>> >>FWIW I looked at the eight failures there were about fifteen minutes ago
>> >>and they were all identical.  I can confirm that, in my laptop, the
>> >>tests work without that GUC, and fail in exactly that way with it.
>> >>
>> >
>> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
>> >soon. I'll push a fix in a minute.
>> >
>>
>> OK, I've pushed a fix - this should make the buildfarm happy again.
>>
>> It however seems to me a bit more needs to be done. The fix makes
>> show_incremental_sort_info closer to show_sort_info, but not entirely
>> because IncrementalSortState does not have sort_Done flag so it still
>> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
>> before, but not having that flag seems a bit weird to me.
>>
>> It also seems possibly incorrect - we may end up with
>>
>>fullsortGroupInfo->groupCount == 0
>>prefixsortGroupInfo->groupCount > 0
>>
>> but we won't print anything.
>
>This shouldn't ever be possible, because the only way we get any
>prefix groups at all is if we've already sorted a full sort group
>during the mode transition.
>
>> James, any opinion on this? I'd say we should restore the sort_Done flag
>> and make it work as in plain Sort. Or some comment explaining why
>> depending on the counts is OK (assuming it is).
>
>There's previous email traffic on this thread about that (I can look
>it up later this evening), but the short of it is that I believe that
>relying on the group count is actually more correct than a sort_Done
>flag in the case of incremental sort (in contrast to regular sort).
>

OK. Maybe we should add a comment to explain.c saying it's OK.

I've pushed a fix for failures due to different planned workers (in the
test I added to show changes due to add_partial_path tweaks).

It seems we're not out of the woods yet, though. rhinoceros and
sidewinder failed with something like this:

 Sort Method: quicksort  Memory: NNkB
+   Sort Method: unknown  Disk: NNkB

Would you mind investigating at it?


I assume that means those build farm members run with very low
work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
these tests? Or is that bad practice and these are to expose issues
with changing into disk sort mode?



I don't think so - I don't see any work_mem changes in the config - see
the extra_config at the beginning of the page with details:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2020-04-06%2023%3A00%3A16

Moreover, this seems to be in regular Sort, not Incremental Sort and it
very much seems like it gets confused to print a worker info because the
only way for Sort to print two "Sort Method" lines seems to be to enter
either both

  if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
  {
... print leader info ...
  }

  and

  if (sortstate->shared_info != NULL)
  {
for (n = 0; n < sortstate->shared_info->num_workers; n++)
{
  ... print worker info ...
}
  }

or maybe there are two workers? It's strange ...


It doesn't seem to be particularly platform-specific, but I've been
unable to reproduce it so far. It seems on older gcc versions, though.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 7:09 PM James Coleman  wrote:
>
> On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
>  wrote:
> >
> > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> > > wrote:
> > >>
> > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> > >> >>On 2020-Apr-06, Tom Lane wrote:
> > >> >>
> > >> >>>Locally, things pass without force_parallel_mode, but turning it on
> > >> >>>produces failures that look similar to rhinoceros's (didn't examine
> > >> >>>other BF members).
> > >> >>
> > >> >>FWIW I looked at the eight failures there were about fifteen minutes 
> > >> >>ago
> > >> >>and they were all identical.  I can confirm that, in my laptop, the
> > >> >>tests work without that GUC, and fail in exactly that way with it.
> > >> >>
> > >> >
> > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> > >> >soon. I'll push a fix in a minute.
> > >> >
> > >>
> > >> OK, I've pushed a fix - this should make the buildfarm happy again.
> > >>
> > >> It however seems to me a bit more needs to be done. The fix makes
> > >> show_incremental_sort_info closer to show_sort_info, but not entirely
> > >> because IncrementalSortState does not have sort_Done flag so it still
> > >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> > >> before, but not having that flag seems a bit weird to me.
> > >>
> > >> It also seems possibly incorrect - we may end up with
> > >>
> > >>fullsortGroupInfo->groupCount == 0
> > >>prefixsortGroupInfo->groupCount > 0
> > >>
> > >> but we won't print anything.
> > >
> > >This shouldn't ever be possible, because the only way we get any
> > >prefix groups at all is if we've already sorted a full sort group
> > >during the mode transition.
> > >
> > >> James, any opinion on this? I'd say we should restore the sort_Done flag
> > >> and make it work as in plain Sort. Or some comment explaining why
> > >> depending on the counts is OK (assuming it is).
> > >
> > >There's previous email traffic on this thread about that (I can look
> > >it up later this evening), but the short of it is that I believe that
> > >relying on the group count is actually more correct than a sort_Done
> > >flag in the case of incremental sort (in contrast to regular sort).
> > >
> >
> > OK. Maybe we should add a comment to explain.c saying it's OK.
> >
> > I've pushed a fix for failures due to different planned workers (in the
> > test I added to show changes due to add_partial_path tweaks).
> >
> > It seems we're not out of the woods yet, though. rhinoceros and
> > sidewinder failed with something like this:
> >
> >  Sort Method: quicksort  Memory: NNkB
> > +   Sort Method: unknown  Disk: NNkB
> >
> > Would you mind investigating at it?
>
> I assume that means those build farm members run with very low
> work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
> these tests? Or is that bad practice and these are to expose issues
> with changing into disk sort mode?

On rhinoceros I see:

== pgsql.build/src/test/regress/regression.diffs
===
diff -U3 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
--- 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
2020-03-14 10:37:49.156761104 -0700
+++ 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
2020-04-06 16:01:13.766798059 -0700
@@ -1328,8 +1328,9 @@
  ->  Sort (actual rows=3 loops=1)
Sort Key: sq_limit.c1, sq_limit.pk
Sort Method: top-N heapsort  Memory: xxx
+   Sort Method: unknown  Disk: 0kB
->  Seq Scan on sq_limit (actual rows=8 loops=1)
-(6 rows)
+(7 rows)

Same on sidewinder.

Given the 0kB I'm not sure this is *just* a work_mem thing, though
that's still something I'm curious to know about, and it's still part
of the "problem" here.

I'm investigating further.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >>
> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >>>other BF members).
> >> >>
> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >> >>and they were all identical.  I can confirm that, in my laptop, the
> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >>
> >> >
> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >> >soon. I'll push a fix in a minute.
> >> >
> >>
> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >>
> >> It however seems to me a bit more needs to be done. The fix makes
> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> because IncrementalSortState does not have sort_Done flag so it still
> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> before, but not having that flag seems a bit weird to me.
> >>
> >> It also seems possibly incorrect - we may end up with
> >>
> >>fullsortGroupInfo->groupCount == 0
> >>prefixsortGroupInfo->groupCount > 0
> >>
> >> but we won't print anything.
> >
> >This shouldn't ever be possible, because the only way we get any
> >prefix groups at all is if we've already sorted a full sort group
> >during the mode transition.
> >
> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> and make it work as in plain Sort. Or some comment explaining why
> >> depending on the counts is OK (assuming it is).
> >
> >There's previous email traffic on this thread about that (I can look
> >it up later this evening), but the short of it is that I believe that
> >relying on the group count is actually more correct than a sort_Done
> >flag in the case of incremental sort (in contrast to regular sort).
> >
>
> OK. Maybe we should add a comment to explain.c saying it's OK.
>
> I've pushed a fix for failures due to different planned workers (in the
> test I added to show changes due to add_partial_path tweaks).
>
> It seems we're not out of the woods yet, though. rhinoceros and
> sidewinder failed with something like this:
>
>  Sort Method: quicksort  Memory: NNkB
> +   Sort Method: unknown  Disk: NNkB
>
> Would you mind investigating at it?

I assume that means those build farm members run with very low
work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
these tests? Or is that bad practice and these are to expose issues
with changing into disk sort mode?

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 06:34:04PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote:

Well, it's *less* unhappy.  thorntail is showing that the number of
workers field is not stable; that will need to be masked.



Yeah, I've already pushed a fix for that. But there seems to be another
failure in th explain output. Looking.


I'm kind of unimpressed with that fix, because it guarantees that if
there's any problem with more than 2 workers, this test will never
find it.  I think you should do what I said and arrange to replace
the number-of-workers output with "N".



I can do that, but isn't this how every other regression test does it?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote:
>> Well, it's *less* unhappy.  thorntail is showing that the number of
>> workers field is not stable; that will need to be masked.

> Yeah, I've already pushed a fix for that. But there seems to be another
> failure in th explain output. Looking.

I'm kind of unimpressed with that fix, because it guarantees that if
there's any problem with more than 2 workers, this test will never
find it.  I think you should do what I said and arrange to replace
the number-of-workers output with "N".

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

OK, I've pushed a fix - this should make the buildfarm happy again.


Well, it's *less* unhappy.  thorntail is showing that the number of
workers field is not stable; that will need to be masked.



Yeah, I've already pushed a fix for that. But there seems to be another
failure in th explain output. Looking.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:

On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
 wrote:

On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
>On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
>>On 2020-Apr-06, Tom Lane wrote:
>>
>>>Locally, things pass without force_parallel_mode, but turning it on
>>>produces failures that look similar to rhinoceros's (didn't examine
>>>other BF members).
>>
>>FWIW I looked at the eight failures there were about fifteen minutes ago
>>and they were all identical.  I can confirm that, in my laptop, the
>>tests work without that GUC, and fail in exactly that way with it.
>>
>
>Yes, there's a thinko in show_incremental_sort_info() and it returns too
>soon. I'll push a fix in a minute.
>

OK, I've pushed a fix - this should make the buildfarm happy again.

It however seems to me a bit more needs to be done. The fix makes
show_incremental_sort_info closer to show_sort_info, but not entirely
because IncrementalSortState does not have sort_Done flag so it still
depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
before, but not having that flag seems a bit weird to me.

It also seems possibly incorrect - we may end up with

   fullsortGroupInfo->groupCount == 0
   prefixsortGroupInfo->groupCount > 0

but we won't print anything.

This shouldn't ever be possible, because the only way we get any
prefix groups at all is if we've already sorted a full sort group
during the mode transition.

James, any opinion on this? I'd say we should restore the sort_Done flag
and make it work as in plain Sort. Or some comment explaining why
depending on the counts is OK (assuming it is).

There's previous email traffic on this thread about that (I can look
it up later this evening), but the short of it is that I believe that
relying on the group count is actually more correct than a sort_Done
flag in the case of incremental sort (in contrast to regular sort).

OK. Maybe we should add a comment to explain.c saying it's OK.

I've pushed a fix for failures due to different planned workers (in the
test I added to show changes due to add_partial_path tweaks).

It seems we're not out of the woods yet, though. rhinoceros and
sidewinder failed with something like this:

Sort Method: quicksort  Memory: NNkB
+   Sort Method: unknown  Disk: NNkB

Would you mind investigating at it?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> OK, I've pushed a fix - this should make the buildfarm happy again.

Well, it's *less* unhappy.  thorntail is showing that the number of
workers field is not stable; that will need to be masked.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >>On 2020-Apr-06, Tom Lane wrote:
> >>
> >>>Locally, things pass without force_parallel_mode, but turning it on
> >>>produces failures that look similar to rhinoceros's (didn't examine
> >>>other BF members).
> >>
> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >>and they were all identical.  I can confirm that, in my laptop, the
> >>tests work without that GUC, and fail in exactly that way with it.
> >>
> >
> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >soon. I'll push a fix in a minute.
> >
>
> OK, I've pushed a fix - this should make the buildfarm happy again.
>
> It however seems to me a bit more needs to be done. The fix makes
> show_incremental_sort_info closer to show_sort_info, but not entirely
> because IncrementalSortState does not have sort_Done flag so it still
> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> before, but not having that flag seems a bit weird to me.
>
> It also seems possibly incorrect - we may end up with
>
>fullsortGroupInfo->groupCount == 0
>prefixsortGroupInfo->groupCount > 0
>
> but we won't print anything.

This shouldn't ever be possible, because the only way we get any
prefix groups at all is if we've already sorted a full sort group
during the mode transition.

> James, any opinion on this? I'd say we should restore the sort_Done flag
> and make it work as in plain Sort. Or some comment explaining why
> depending on the counts is OK (assuming it is).

There's previous email traffic on this thread about that (I can look
it up later this evening), but the short of it is that I believe that
relying on the group count is actually more correct than a sort_Done
flag in the case of incremental sort (in contrast to regular sort).

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:

On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:

On 2020-Apr-06, Tom Lane wrote:


Locally, things pass without force_parallel_mode, but turning it on
produces failures that look similar to rhinoceros's (didn't examine
other BF members).


FWIW I looked at the eight failures there were about fifteen minutes ago
and they were all identical.  I can confirm that, in my laptop, the
tests work without that GUC, and fail in exactly that way with it.



Yes, there's a thinko in show_incremental_sort_info() and it returns too
soon. I'll push a fix in a minute.



OK, I've pushed a fix - this should make the buildfarm happy again.

It however seems to me a bit more needs to be done. The fix makes
show_incremental_sort_info closer to show_sort_info, but not entirely
because IncrementalSortState does not have sort_Done flag so it still
depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
before, but not having that flag seems a bit weird to me.

It also seems possibly incorrect - we may end up with

  fullsortGroupInfo->groupCount == 0
  prefixsortGroupInfo->groupCount > 0

but we won't print anything.

James, any opinion on this? I'd say we should restore the sort_Done flag
and make it work as in plain Sort. Or some comment explaining why
depending on the counts is OK (assuming it is).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 5:22 PM James Coleman  wrote:
>
> On Mon, Apr 6, 2020 at 5:20 PM James Coleman  wrote:
> >
> > On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra
> >  wrote:
> > >
> > > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> > > >On 2020-Apr-06, Tom Lane wrote:
> > > >
> > > >> Locally, things pass without force_parallel_mode, but turning it on
> > > >> produces failures that look similar to rhinoceros's (didn't examine
> > > >> other BF members).
> > > >
> > > >FWIW I looked at the eight failures there were about fifteen minutes ago
> > > >and they were all identical.  I can confirm that, in my laptop, the
> > > >tests work without that GUC, and fail in exactly that way with it.
> > > >
> > >
> > > Yes, there's a thinko in show_incremental_sort_info() and it returns too
> > > soon. I'll push a fix in a minute.
> >
> > I'm stepping through this in a debugger; is what you're considering
> > that the for loop through the workers is off by one?
>
> Oh, nevermind, misread that.
>
> Looks like if the leader doesn't participate, then we don't show
> details for workers.
>
> Tomas: Do you already have a patch? If not, I can work one up.

Well, already have it, so I'll send it just in case.

James
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62c86ecdc5..c31f3e0987 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2880,19 +2880,22 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
 
 	fullsortGroupInfo = >incsort_info.fullsortGroupInfo;
 
-	if (!(es->analyze && fullsortGroupInfo->groupCount > 0))
+	if (!es->analyze)
 		return;
 
-	show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es);
-	prefixsortGroupInfo = >incsort_info.prefixsortGroupInfo;
-	if (prefixsortGroupInfo->groupCount > 0)
+	if (fullsortGroupInfo->groupCount > 0)
 	{
+		show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es);
+		prefixsortGroupInfo = >incsort_info.prefixsortGroupInfo;
+		if (prefixsortGroupInfo->groupCount > 0)
+		{
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+appendStringInfo(es->str, " ");
+			show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
+		}
 		if (es->format == EXPLAIN_FORMAT_TEXT)
-			appendStringInfo(es->str, " ");
-		show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
+			appendStringInfo(es->str, "\n");
 	}
-	if (es->format == EXPLAIN_FORMAT_TEXT)
-		appendStringInfo(es->str, "\n");
 
 	if (incrsortstate->shared_info != NULL)
 	{
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 3b359efa29..c46443358a 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -1,3 +1,5 @@
+set force_parallel_mode = regress;
+
 -- When we have to sort the entire table, incremental sort will
 -- be slower than plain sort, so it should not be used.
 explain (costs off)

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 5:20 PM James Coleman  wrote:
>
> On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra
>  wrote:
> >
> > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> > >On 2020-Apr-06, Tom Lane wrote:
> > >
> > >> Locally, things pass without force_parallel_mode, but turning it on
> > >> produces failures that look similar to rhinoceros's (didn't examine
> > >> other BF members).
> > >
> > >FWIW I looked at the eight failures there were about fifteen minutes ago
> > >and they were all identical.  I can confirm that, in my laptop, the
> > >tests work without that GUC, and fail in exactly that way with it.
> > >
> >
> > Yes, there's a thinko in show_incremental_sort_info() and it returns too
> > soon. I'll push a fix in a minute.
>
> I'm stepping through this in a debugger; is what you're considering
> that the for loop through the workers is off by one?

Oh, nevermind, misread that.

Looks like if the leader doesn't participate, then we don't show
details for workers.

Tomas: Do you already have a patch? If not, I can work one up.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >On 2020-Apr-06, Tom Lane wrote:
> >
> >> Locally, things pass without force_parallel_mode, but turning it on
> >> produces failures that look similar to rhinoceros's (didn't examine
> >> other BF members).
> >
> >FWIW I looked at the eight failures there were about fifteen minutes ago
> >and they were all identical.  I can confirm that, in my laptop, the
> >tests work without that GUC, and fail in exactly that way with it.
> >
>
> Yes, there's a thinko in show_incremental_sort_info() and it returns too
> soon. I'll push a fix in a minute.

I'm stepping through this in a debugger; is what you're considering
that the for loop through the workers is off by one?

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:

On 2020-Apr-06, Tom Lane wrote:


Locally, things pass without force_parallel_mode, but turning it on
produces failures that look similar to rhinoceros's (didn't examine
other BF members).


FWIW I looked at the eight failures there were about fifteen minutes ago
and they were all identical.  I can confirm that, in my laptop, the
tests work without that GUC, and fail in exactly that way with it.



Yes, there's a thinko in show_incremental_sort_info() and it returns too
soon. I'll push a fix in a minute.


thanks

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Alvaro Herrera

On 2020-Apr-06, Tom Lane wrote:

> Locally, things pass without force_parallel_mode, but turning it on
> produces failures that look similar to rhinoceros's (didn't examine
> other BF members).

FWIW I looked at the eight failures there were about fifteen minutes ago
and they were all identical.  I can confirm that, in my laptop, the
tests work without that GUC, and fail in exactly that way with it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> On Mon, Apr 06, 2020 at 04:14:38PM -0400, Tom Lane wrote:
>> Did you ever use force_parallel_mode = regress?

> Ah, not sure - probably not in this round of tests and there were some
> changes in the explain code. Thanks for the hint.

Locally, things pass without force_parallel_mode, but turning it on
produces failures that look similar to rhinoceros's (didn't examine
other BF members).  At a guess, looks like missing or incorrect logic
for propagating some state back from parallel workers.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 04:14:38PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Hmmm, I see the buildfarm is not happy about it - a couple of animals
failed, but some succeeded. The failure seems like a simple difference
in explain output, but it's not clear why would it happen (and I've
ran the tests many times but never seen this failure).


Did you ever use force_parallel_mode = regress?



Ah, not sure - probably not in this round of tests and there were some
changes in the explain code. Thanks for the hint.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

Tomas Vondra  writes:
> Hmmm, I see the buildfarm is not happy about it - a couple of animals
> failed, but some succeeded. The failure seems like a simple difference
> in explain output, but it's not clear why would it happen (and I've
> ran the tests many times but never seen this failure).

Did you ever use force_parallel_mode = regress?

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:

Hi,

I've pushed the fist part of this patch series - I've reorganized it a
bit by moving the add_partial_path changes to the end. That way I've
been able to add regression test demonstrating impact of the change on
plans involving incremental sort nodes (which wouldn't be possible when
committing the add_partial_path first). I'll wait a bit before pushing
the two additional parts, so that if something fails we know which bit
caused it.



Hmmm, I see the buildfarm is not happy about it - a couple of animals
failed, but some succeeded. The failure seems like a simple difference
in explain output, but it's not clear why would it happen (and I've
ran the tests many times but never seen this failure).

Investigating.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-05 Thread Tomas Vondra


On Sun, Apr 05, 2020 at 03:01:10PM +0200, Tomas Vondra wrote:

On Thu, Apr 02, 2020 at 09:40:45PM -0400, James Coleman wrote:

On Thu, Apr 2, 2020 at 8:46 PM James Coleman  wrote:


On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
 wrote:

...
5) Overall, I think the costing is OK. I'm sure we'll find cases that
will need improvements, but that's fine. However, we now have

- cost_tuplesort (used to be cost_sort)
- cost_full_sort
- cost_incremental_sort
- cost_sort

I find it a bit confusing that we have cost_sort and cost_full_sort. Why
don't we just keep using the dummy path in label_sort_with_costsize?
That seems to be the only external caller outside costsize.c. Then we
could either make cost_full_sort static or get rid of it entirely.


This another area of the patch I haven't really modified.


See attached for a cleanup of this; it removed cost_fullsort so
label_sort_with_costsize is back to how it was.

I've directly merged this into the patch series; if you'd like to see
the diff I can send that along.



Thanks. Attached is v54 of the patch, with some minor changes. The main
two changes are in add_partial_path_precheck(), firstly to also consider
startup_cost, as discussed before. The second change (in 0003) is a bit
of an experiment to make add_partial_precheck() cheaper by only calling
compare_pathkeys after checking the costs first (which should be cheaper
than the function call). add_path_precheck already does it in that order
anyway.



Oh, I forgot to mention a change in add_partial_path - I've removed the
reference/dependency on enable_incrementalsort. It seemed rather ugly,
and the results without it seem fine (I'm benchmarking only the case
with incremental sort enabled anyway). I also plan to look at the other
optimization we bolted on last week, i.e. checking length of pathkeys.
I'll see if that actually makes measurable difference.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-02 Thread James Coleman

On Thu, Apr 2, 2020 at 8:46 PM James Coleman  wrote:
>
> On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > Thanks, the v52 looks almost ready. I've been looking at the two or
> > three things I mentioned, and I have a couple of comments.
> >
> >
> > 1) /* XXX comparison_cost shouldn't be 0? */
> >
> > I'm not worried about this, because this is not really intriduced by
> > this patch - create_sort_path has the same comment/issue, so I think
> > it's acceptable to do the same thing for incremental sort.
>
> Sounds good.
>
> > 2) INITIAL_MEMTUPSIZE
> >
> > tuplesort.c does this:
> >
> >#define INITIAL_MEMTUPSIZE Max(1024, \
> >ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
> >
> > supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
> > But I think it fails to do that, for a couple of reasons.
> >
> > Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
> > minimum (without padding), so with 1024 elements it's guaranteed to be
> > at least 21kB - so exceeding the threshold. The maximum value is
> > something like 256.
> >
> > Secondly, the second part of the formula is guaranteed to get us over
> > the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then
> >
> >ALLOCSET_SEPARATE_THRESHOLD / 30 = 273
> >
> > but we end up using 274, resulting in 8220B array. :-(
> >
> > So I guess the formula should be more like
> >
> >Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))
> >
> > or something like that.
> >
> > FWIW I think the whole hypothesis that selecting the array size below
> > ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
> > allocate this only once (or very few times), and if we need to grow the
> > array we'll hit the threshold anyway.
> >
> > I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
> > may be OK too.
>
> That was a part of the patch I haven't touched since I inherited it,
> and I didn't feel like I knew enough about the Postgres memory
> management to make a determination on whether the reasoning made
> sense.
>
> So I' happy to use a constant as suggested.

I take that back. That code hasn't changed, it's just moved. Here's
the current code in tuplesort_begin_common on master:

/*
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
*/
state->memtupsize = Max(1024,
ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1);

I'm not sure we ought to change that in this patch...

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-02 Thread James Coleman

On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
 wrote:
>
> Hi,
>
> Thanks, the v52 looks almost ready. I've been looking at the two or
> three things I mentioned, and I have a couple of comments.
>
>
> 1) /* XXX comparison_cost shouldn't be 0? */
>
> I'm not worried about this, because this is not really intriduced by
> this patch - create_sort_path has the same comment/issue, so I think
> it's acceptable to do the same thing for incremental sort.

Sounds good.

> 2) INITIAL_MEMTUPSIZE
>
> tuplesort.c does this:
>
>#define INITIAL_MEMTUPSIZE Max(1024, \
>ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
>
> supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
> But I think it fails to do that, for a couple of reasons.
>
> Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
> minimum (without padding), so with 1024 elements it's guaranteed to be
> at least 21kB - so exceeding the threshold. The maximum value is
> something like 256.
>
> Secondly, the second part of the formula is guaranteed to get us over
> the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then
>
>ALLOCSET_SEPARATE_THRESHOLD / 30 = 273
>
> but we end up using 274, resulting in 8220B array. :-(
>
> So I guess the formula should be more like
>
>Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))
>
> or something like that.
>
> FWIW I think the whole hypothesis that selecting the array size below
> ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
> allocate this only once (or very few times), and if we need to grow the
> array we'll hit the threshold anyway.
>
> I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
> may be OK too.

That was a part of the patch I haven't touched since I inherited it,
and I didn't feel like I knew enough about the Postgres memory
management to make a determination on whether the reasoning made
sense.

So I' happy to use a constant as suggested.

> 3) add_partial_path
>
> I'm a bit torn about using enable_incrementalsort in add_partial_path. I
> know we've done that to eliminate the overhead, but it just seems weird.
> I wonder if that really saves us anything or if it was just noise. I'll
> do more testing before making a decision.
>
> While looking at add_path, I see the comparisons there are made in the
> opposite order - we first compare costs, and only then pathkeys. That
> seems like a more efficient way, I guess (cheaper float comparison
> first, pathkey comparison with a loop second). I wonder why it's not
> done like that in add_partial_path too? Perhaps this will make it cheap
> enough to remove the enable_incrementalsort check.

I would love to avoid checking enable_incrementalsort there. I think
it's pretty gross to do so. But if it's the only way to keep the patch
acceptable, then obviously I'd leave it in. So I'm hopeful we can
avoid needing it.

> 4) add_partial_path_precheck
>
> While looking at add_partial_path, I realized add_partial_path_precheck
> probably needs the same change - to consider startup cost too. I think
> the expectation is that add_partial_path_precheck only rejects paths
> that we know would then get rejected by add_partial_path.
>
> But now the behavior is inconsistent, which might lead to surprising
> behavior (I haven't seen such cases, but I haven't looked for them).
> At the moment the add_partial_path_precheck is called only from
> joinpath.c, maybe it's not an issue there.
>
> If it's OK to keep it like this, it'd probably deserve a comment why
> the difference is OK. And the comments also contain the same claim that
> parallel plans only need to look at total cost.

I remember some discussion about that much earlier in this thread (or
in the spin-off thread [1] re: that specific change that didn't get a
lot of action), and I'm pretty sure the conclusion was that we should
change both. But I guess we never got around to doing it.

> 5) Overall, I think the costing is OK. I'm sure we'll find cases that
> will need improvements, but that's fine. However, we now have
>
> - cost_tuplesort (used to be cost_sort)
> - cost_full_sort
> - cost_incremental_sort
> - cost_sort
>
> I find it a bit confusing that we have cost_sort and cost_full_sort. Why
> don't we just keep using the dummy path in label_sort_with_costsize?
> That seems to be the only external caller outside costsize.c. Then we
> could either make cost_full_sort static or get rid of it entirely.

This another area of the patch I haven't really modified.

James

[1]: 
https://www.postgresql.org/message-id/flat/CAAaqYe-5HmM4ih6FWp2RNV9rruunfrFrLhqFXF_nrrNCPy1Zhg%40mail.gmail.com

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-02 Thread Tomas Vondra


Hi,

Thanks, the v52 looks almost ready. I've been looking at the two or
three things I mentioned, and I have a couple of comments.


1) /* XXX comparison_cost shouldn't be 0? */

I'm not worried about this, because this is not really intriduced by
this patch - create_sort_path has the same comment/issue, so I think
it's acceptable to do the same thing for incremental sort.


2) INITIAL_MEMTUPSIZE

tuplesort.c does this:

  #define INITIAL_MEMTUPSIZE Max(1024, \
  ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)

supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
But I think it fails to do that, for a couple of reasons.

Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
minimum (without padding), so with 1024 elements it's guaranteed to be
at least 21kB - so exceeding the threshold. The maximum value is
something like 256.

Secondly, the second part of the formula is guaranteed to get us over
the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then

  ALLOCSET_SEPARATE_THRESHOLD / 30 = 273

but we end up using 274, resulting in 8220B array. :-(

So I guess the formula should be more like

  Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))

or something like that.

FWIW I think the whole hypothesis that selecting the array size below
ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
allocate this only once (or very few times), and if we need to grow the
array we'll hit the threshold anyway.

I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
may be OK too.


3) add_partial_path

I'm a bit torn about using enable_incrementalsort in add_partial_path. I
know we've done that to eliminate the overhead, but it just seems weird.
I wonder if that really saves us anything or if it was just noise. I'll
do more testing before making a decision.

While looking at add_path, I see the comparisons there are made in the
opposite order - we first compare costs, and only then pathkeys. That
seems like a more efficient way, I guess (cheaper float comparison
first, pathkey comparison with a loop second). I wonder why it's not
done like that in add_partial_path too? Perhaps this will make it cheap
enough to remove the enable_incrementalsort check.


4) add_partial_path_precheck

While looking at add_partial_path, I realized add_partial_path_precheck
probably needs the same change - to consider startup cost too. I think
the expectation is that add_partial_path_precheck only rejects paths
that we know would then get rejected by add_partial_path.

But now the behavior is inconsistent, which might lead to surprising
behavior (I haven't seen such cases, but I haven't looked for them).
At the moment the add_partial_path_precheck is called only from
joinpath.c, maybe it's not an issue there.

If it's OK to keep it like this, it'd probably deserve a comment why
the difference is OK. And the comments also contain the same claim that
parallel plans only need to look at total cost.


5) Overall, I think the costing is OK. I'm sure we'll find cases that
will need improvements, but that's fine. However, we now have 


- cost_tuplesort (used to be cost_sort)
- cost_full_sort
- cost_incremental_sort
- cost_sort

I find it a bit confusing that we have cost_sort and cost_full_sort. Why
don't we just keep using the dummy path in label_sort_with_costsize?
That seems to be the only external caller outside costsize.c. Then we
could either make cost_full_sort static or get rid of it entirely.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-01 Thread Tomas Vondra


On Wed, Apr 01, 2020 at 10:09:20PM -0400, James Coleman wrote:

On Wed, Apr 1, 2020 at 5:42 PM Tomas Vondra
 wrote:

...
I've realized the way get_useful_pathkeys_for_relation() is coded kinda
works against the fastpath we added when comparing pathkeys. That
depends on comparing pointers to the list, but we've been building new
lists (and then returned those) which defeats the optimization. Attached
is a patch that returns the original list in most cases (and only
creates a copy when really necessary). This might also save a few cycles
on bulding the new list, of course.

I've done a bunch of read-only pgbench tests with fairly small scales (1
and 10). First with the built-in read-only transaction, and also with a
simple custom query doing an order-by. And I did this both on the
default schema and with a bunch of extra indexes. The script I used to
run this is attached, along with a summary of results.

There are results for master and v40 and v50 patches (the v50 also
includes the extra patch fixing get_useful_pathkeys_for_relation).

Overall, I'm happy with those results - the v50 seems to be within 1% of
master, in both directions. This very much seems like a noise.

I still want to do a bit more review of the costing / tuplesort changes,
which I plan to do tomorrow. If that goes well, I plan to start
committing this. So please if you think this is not ready or wants more


I think we need to either implement this or remove the comment:
* XXX I wonder if we need to consider adding a projection here, as
* create_ordered_paths does.
in generate_useful_gather_paths().



Yeah. I think we don't need the projection here. My reasoning is that if
we don't need it in generate_gather_paths(), we don't need it here.


In the same function we have the following code:
/*
* When the partial path is already sorted, we can just add a gather
* merge on top, and we're done - no point in adding explicit sort.
*
* XXX Can't we skip this (maybe only for the cheapest partial path)
* when the path is already sorted? Then it's likely duplicate with
* the path created by generate_gather_paths.
*/
if (is_sorted)
{
   path = create_gather_merge_path(root, rel, subpath, rel->reltarget,

subpath->pathkeys, NULL, rowsp);

   add_path(rel, >path);
   continue;
}

looking at the relevant loop in generate_gather_paths:
/*
* For each useful ordering, we can consider an order-preserving Gather
* Merge.
*/
foreach(lc, rel->partial_pathlist)
{
   Path   *subpath = (Path *) lfirst(lc);
   GatherMergePath *path;

   if (subpath->pathkeys == NIL)
   continue;

   rows = subpath->rows * subpath->parallel_workers;
   path = create_gather_merge_path(root, rel, subpath, rel->reltarget,

subpath->pathkeys, NULL, rowsp);
   add_path(rel, >path);
}

I believe we can eliminate the block entirely in
generate_useful_gather_paths(). Here's my reasoning: all paths for
which is_sorted is true must necessarily have pathkeys, and since we
already add a gather merge for every subpath with pathkeys, we've
already added gather merge paths for all of these.

I've included a patch to change this, but let me know if the reasoning
isn't sound.



Good catch! I think you're correct - we don't need to generate this
path, and we can just skip that partial path entirely.


We can also remove the XXX on this comment (in the same function):
* XXX This is not redundant with the gather merge path created in
* generate_gather_paths, because that merely preserves ordering of
* the cheapest partial path, while here we add an explicit sort to
* get match the useful ordering.

because of this code in generate_gather_paths():
cheapest_partial_path = linitial(rel->partial_pathlist);
rows =
   cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
simple_gather_path = (Path *)
   create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  NULL, rowsp);
add_path(rel, simple_gather_path);

but we can cleanup the comment a bit: fix the grammar issue in the
last line and fix the reference to gather merge path (it's a gather
path).

I've included that in the same patch.



OK, makes sense.


I also noticed that in create_incremental_sort_path we have this:
/* XXX comparison_cost shouldn't be 0? */
but I guess that's part of what you're reviewing tomorrow.



Right, one of the bits.


time for a review, let me know. I'm not yet sure if I'll commit this as
a single change, or in three separate commits.


I don't love the idea of committing it as a single patch, but at least
the first two I think probably go together. Otherwise we're
introducing a "fix" with no proven impact that will slow down planning
(even if only in a small way) only to intend to condition that on a
GUC in the next commit.

But I think you could potentially make an argument for keeping the
additional paths separate...but it's not absolutely necessary IMO.



OK. I've been actually

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-01 Thread Tomas Vondra

On Wed, Apr 01, 2020 at 09:05:27AM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 11:07 PM James Coleman  wrote:

On Tue, Mar 31, 2020 at 10:44 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
> >> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> >> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> >> >> > wrote:

...

> >> >> >One small idea, but I'm not yet sure it helps us a whole lot: if the
> >> >> >query pathkeys is only length 1, then we could skip the additional
> >> >> >path creation.
> >> >> >
> >> >>
> >> >> I don't follow. Why would we create incremental sort in this case at
> >> >> all? With single-element query_pathkeys the path is either unsorted or
> >> >> fully sorted - there's no room for incremental sort. No?
> >> >
> >> >Well, we shouldn't, that's what I'm getting. But I didn't see anything
> >> >in the code now that explicitly excludes that case when decided
> >> >whether or not to create an incremental sort path, unless I'm missing
> >> >something obvious.
> >>
> >> Well, my point is that create_ordered_paths() looks like this:
> >>
> >>  is_sorted = pathkeys_common_contained_in(root->sort_patkeys, ...);
> >>
> >>  if (is_sorted)
> >>  {
> >>  ... old code
> >>  }
> >>  else
> >>  {
> >>  if (input_path == cheapest_input_path)
> >>  {
> >>  ... old code
> >>  }
> >>
> >>  /* With incremental sort disabled, don't build those paths. */
> >>  if (!enable_incrementalsort)
> >>  continue;
> >>
> >>  /* Likewise, if the path can't be used for incremental sort. */
> >>  if (!presorted_keys)
> >>  continue;
> >>
> >>  ... incremental sort path
> >>  }
> >>
> >> Now, with single-item sort_pathkeys, the input path can't be partially
> >> sorted. It's either fully sorted - in which case it's handled by the
> >> first branch. Or it's not sorted at all, so presorted_keys==0 and we
> >> never get to the incremental path.
> >>
> >> Or did you mean to use the optimization somewhere else?
> >
> >Hmm, yes, I didn't think through that properly. I'll have to look at
> >the other cases to confirm the same logic applies there.

I looked through this more carefully, and I did end up finding a few
places where we can skip iterating through a list of paths entirely
with this check, so I added it there. I also cleaned up some comments,
added comments and asserts to the other places where
list_length(pathkeys) should be guaranteed to be > 1, removed a few
asserts I found unnecessary, and merged duplicative
pathkeys_[count_]_contained_in calls.

OK

> >One other thing:in the code above we create the regular sort path
> >inside of `if (input_path == cheapest_input_path)`, but incremental
> >sort is outside of that condition. I'm not sure I'm remembering why
> >that was, and it's not obvious to me reading it right now (though it's
> >getting late here, so maybe I'm just not thinking clearly). Do you
> >happen to remember why that is?
> >
>
> It's because for the regular sort, the path is either already sorted or
> it requires a full sort. But full sort only makes sense on the cheapest
> path, because we assume the additional sort cost is independent of the
> input cost, essentially
>
> cost(path + Sort) = cost(path) + cost(Sort)
>
> and it's always
>
>  cost(path) + cost(Sort) >= cost(cheapest path) + cost(Sort)
>
> and by checking for cheapest path we simply skip building all the paths
> that we'd end up discarding anyway.
>
> With incremental sort we can't do this, the cost of the incremental sort
> depends on how well presorted is the input path.

Thanks for the explanation. I've added a comment to that effect.

Thanks.

I've realized the way get_useful_pathkeys_for_relation() is coded kinda
works against the fastpath we added when comparing pathkeys. That
depends on comparing pointers to the list, but we've been building new
lists (and then returned those) which defeats the optimization. Attached
is a patch that returns the original list in most cases (and only
creates a copy when really necessary). This might also save a few cycles
on bulding the new list, of course.

I've done a bunch of read-only pgbench tests with fairly small scales (1
and 10). First with the built-in read-only transaction, and also with a
simple custom query doing an order-by. And I did this both on the
default schema and with a bunch of extra indexes. The script I used to
run this is attached, along with a summary of results.

There are results for master and v40 and v50 patches (the v50 also
includes the extra patch fixing get_useful_pathkeys_for_relation).

Overall, I'm happy with those results - the v50 seems to be

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> wrote:
>>
>> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
>> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
>> > wrote:
>> >>
>> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >> >> >Tomas Vondra  writes:
>> >> >> >> In general, I think it'd be naive that we can make planner smarter 
with
>> >> >> >> no extra overhead spent on planning, and we can never accept patches
>> >> >> >> adding even tiny overhead. With that approach we'd probably end up 
with
>> >> >> >> a trivial planner that generates just a single query plan, because
>> >> >> >> that's going to be the fastest planner. A realistic approach needs 
to
>> >> >> >> consider both the planning and execution phase, and benefits of this
>> >> >> >> patch seem to be clear - if you have queries that do benefit from 
it.
>> >> >> >
>> >> >> >I think that's kind of attacking a straw man, though.  The thing that
>> >> >> >people push back on, or should push back on IMO, is when a proposed
>> >> >> >patch adds significant slowdown to queries that it has no or very 
little
>> >> >> >hope of improving.  The trick is to do expensive stuff only when
>> >> >> >there's a good chance of getting a better plan out of it.
>> >> >> >
>> >> >>
>> >> >> Yeah, I agree with that. I think the main issue is that we don't really
>> >> >> know what the "expensive stuff" is in this case, so it's not really
>> >> >> clear how to be smarter :-(
>> >> >
>> >> >To add to this: I agree that ideally you'd check cheaply to know
>> >> >you're in a situation that might help, and then do more work. But here
>> >> >the question is always going to be simply "would we benefit from an
>> >> >ordering, and, if so, do we have it already partially sorted". It's
>> >> >hard to imagine that reducing much conceptually, so we're left with
>> >> >optimizations of that check.
>> >> >
>> >>
>> >> I think it depends on what exactly is the expensive part. For example if
>> >> it's the construction of IncrementalSort paths, then maybe we could try
>> >> do a quick/check check if the path can even be useful by estimating the
>> >> cost and only then building the path.
>> >>
>> >> That's what we do for join algorithms, for example - we first compute
>> >> initial_cost_nestloop and only when that seems cheap enough we do the
>> >> more expensive stuff.
>> >>
>> >> But I'm not sure the path construction is the expensive part, as it
>> >> should be disabled by enable_incrementalsort=off. But the regression
>> >> does not seem to disappear, at least not entirely.
>> >>
>> >>
>> >> >> One possibility is that it's just one of those regressions due to 
change
>> >> >> in binary layout, but I'm not sure know how to verify that.
>> >> >
>> >> >If we are testing with a case that can't actually add more paths (due
>> >> >to it checking the guc before building them), doesn't that effectively
>> >> >leave one of these two options:
>> >> >
>> >> >1. Binary layout/cache/other untraceable change, or
>> >> >2. Changes due to refactored function calls.
>> >> >
>> >>
>> >> Hmm, but in case of (1) the overhead should be there even with tests
>> >> that don't really have any additional paths to consider, right? I've
>> >> tried with such test (single table with no indexes) and I don't quite
>> >> see any regression (maybe ~1%).
>> >
>> >Not necessarily, if the cost is in sort costing or useful pathkeys
>> >checking, right? We have run that code even without incremental sort,
>> >but it's changed from master.
>> >
>>
>> Ah, I should have mentioned I've done most of the tests on just the
>> basic incremental sort patch (0001+0002), without the additional useful
>> paths. I initially tested the whole patch series, but after discovering
>> the regression I removed the last part (which I suspected might be the
>> root cause). But the regression is still there, so it's not that.
>>
>> It might be in the reworked costing, yeah. But then I'd expect those
>> function to show in the perf profile.
>
>Right. I'm just grasping at straws on that.
>
>> >> (2) might have impact, but I don't see any immediate supects. Did you
>> >> have some functions in mind?
>> >
>> >I guess this is where the lines blur: I didn't see anything obvious
>> >either, but the changes to sort costing...should probably not have
>> >real impact...but...
>> >
>>
>> :-(
>>
>> >> BTW I see the patch adds pathkeys_common but it's not called from
>> >> anywhere. It's probably leftover from an earlier patch version.
>> >>
>
>BTW, I think I'm going to rename the pathkeys_common_contained_in
>function to something like pathkeys_count_contained_in, unless you
>have an

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> >> >> > wrote:
> >> >> >>
> >> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >> >> >Tomas Vondra  writes:
> >> >> >> >> In general, I think it'd be naive that we can make planner 
> >> >> >> >> smarter with
> >> >> >> >> no extra overhead spent on planning, and we can never accept 
> >> >> >> >> patches
> >> >> >> >> adding even tiny overhead. With that approach we'd probably end 
> >> >> >> >> up with
> >> >> >> >> a trivial planner that generates just a single query plan, because
> >> >> >> >> that's going to be the fastest planner. A realistic approach 
> >> >> >> >> needs to
> >> >> >> >> consider both the planning and execution phase, and benefits of 
> >> >> >> >> this
> >> >> >> >> patch seem to be clear - if you have queries that do benefit from 
> >> >> >> >> it.
> >> >> >> >
> >> >> >> >I think that's kind of attacking a straw man, though.  The thing 
> >> >> >> >that
> >> >> >> >people push back on, or should push back on IMO, is when a proposed
> >> >> >> >patch adds significant slowdown to queries that it has no or very 
> >> >> >> >little
> >> >> >> >hope of improving.  The trick is to do expensive stuff only when
> >> >> >> >there's a good chance of getting a better plan out of it.
> >> >> >> >
> >> >> >>
> >> >> >> Yeah, I agree with that. I think the main issue is that we don't 
> >> >> >> really
> >> >> >> know what the "expensive stuff" is in this case, so it's not really
> >> >> >> clear how to be smarter :-(
> >> >> >
> >> >> >To add to this: I agree that ideally you'd check cheaply to know
> >> >> >you're in a situation that might help, and then do more work. But here
> >> >> >the question is always going to be simply "would we benefit from an
> >> >> >ordering, and, if so, do we have it already partially sorted". It's
> >> >> >hard to imagine that reducing much conceptually, so we're left with
> >> >> >optimizations of that check.
> >> >> >
> >> >>
> >> >> I think it depends on what exactly is the expensive part. For example if
> >> >> it's the construction of IncrementalSort paths, then maybe we could try
> >> >> do a quick/check check if the path can even be useful by estimating the
> >> >> cost and only then building the path.
> >> >>
> >> >> That's what we do for join algorithms, for example - we first compute
> >> >> initial_cost_nestloop and only when that seems cheap enough we do the
> >> >> more expensive stuff.
> >> >>
> >> >> But I'm not sure the path construction is the expensive part, as it
> >> >> should be disabled by enable_incrementalsort=off. But the regression
> >> >> does not seem to disappear, at least not entirely.
> >> >>
> >> >>
> >> >> >> One possibility is that it's just one of those regressions due to 
> >> >> >> change
> >> >> >> in binary layout, but I'm not sure know how to verify that.
> >> >> >
> >> >> >If we are testing with a case that can't actually add more paths (due
> >> >> >to it checking the guc before building them), doesn't that effectively
> >> >> >leave one of these two options:
> >> >> >
> >> >> >1. Binary layout/cache/other untraceable change, or
> >> >> >2. Changes due to refactored function calls.
> >> >> >
> >> >>
> >> >> Hmm, but in case of (1) the overhead should be there even with tests
> >> >> that don't really have any additional paths to consider, right? I've
> >> >> tried with such test (single table with no indexes) and I don't quite
> >> >> see any regression (maybe ~1%).
> >> >
> >> >Not necessarily, if the cost is in sort costing or useful pathkeys
> >> >checking, right? We have run that code even without incremental sort,
> >> >but it's changed from master.
> >> >
> >>
> >> Ah, I should have mentioned I've done most of the tests on just the
> >> basic incremental sort patch (0001+0002), without the additional useful
> >> paths. I initially tested the whole patch series, but after discovering
> >> the regression I removed the last part (which I suspected might be the
> >> root cause). But the regression is still there, so it's not that.
> >>
> >> It might be in the reworked costing, yeah. But then I'd expect those
> >> function to show in the perf profile.
> >
> >Right. I'm just grasping at straws on that.
> >
> >> >> (2) might have impact, but I don't see any immediate supects. Did you
> >> >> have some functions in mind?
> >> >
> >> >I guess this is where the lines blur: I didn't see anything obvious
> >> >either, but the changes to sort costing...should probably not have
> >> >real impact...but...
> >> >
> >>
> >> :-(
> >>
> >> >> BTW I see the patch adds

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> wrote:
>>
>> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
>> > wrote:
>> >>
>> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >> >Tomas Vondra  writes:
>> >> >> In general, I think it'd be naive that we can make planner smarter with
>> >> >> no extra overhead spent on planning, and we can never accept patches
>> >> >> adding even tiny overhead. With that approach we'd probably end up with
>> >> >> a trivial planner that generates just a single query plan, because
>> >> >> that's going to be the fastest planner. A realistic approach needs to
>> >> >> consider both the planning and execution phase, and benefits of this
>> >> >> patch seem to be clear - if you have queries that do benefit from it.
>> >> >
>> >> >I think that's kind of attacking a straw man, though.  The thing that
>> >> >people push back on, or should push back on IMO, is when a proposed
>> >> >patch adds significant slowdown to queries that it has no or very little
>> >> >hope of improving.  The trick is to do expensive stuff only when
>> >> >there's a good chance of getting a better plan out of it.
>> >> >
>> >>
>> >> Yeah, I agree with that. I think the main issue is that we don't really
>> >> know what the "expensive stuff" is in this case, so it's not really
>> >> clear how to be smarter :-(
>> >
>> >To add to this: I agree that ideally you'd check cheaply to know
>> >you're in a situation that might help, and then do more work. But here
>> >the question is always going to be simply "would we benefit from an
>> >ordering, and, if so, do we have it already partially sorted". It's
>> >hard to imagine that reducing much conceptually, so we're left with
>> >optimizations of that check.
>> >
>>
>> I think it depends on what exactly is the expensive part. For example if
>> it's the construction of IncrementalSort paths, then maybe we could try
>> do a quick/check check if the path can even be useful by estimating the
>> cost and only then building the path.
>>
>> That's what we do for join algorithms, for example - we first compute
>> initial_cost_nestloop and only when that seems cheap enough we do the
>> more expensive stuff.
>>
>> But I'm not sure the path construction is the expensive part, as it
>> should be disabled by enable_incrementalsort=off. But the regression
>> does not seem to disappear, at least not entirely.
>>
>>
>> >> One possibility is that it's just one of those regressions due to change
>> >> in binary layout, but I'm not sure know how to verify that.
>> >
>> >If we are testing with a case that can't actually add more paths (due
>> >to it checking the guc before building them), doesn't that effectively
>> >leave one of these two options:
>> >
>> >1. Binary layout/cache/other untraceable change, or
>> >2. Changes due to refactored function calls.
>> >
>>
>> Hmm, but in case of (1) the overhead should be there even with tests
>> that don't really have any additional paths to consider, right? I've
>> tried with such test (single table with no indexes) and I don't quite
>> see any regression (maybe ~1%).
>
>Not necessarily, if the cost is in sort costing or useful pathkeys
>checking, right? We have run that code even without incremental sort,
>but it's changed from master.
>

Ah, I should have mentioned I've done most of the tests on just the
basic incremental sort patch (0001+0002), without the additional useful
paths. I initially tested the whole patch series, but after discovering
the regression I removed the last part (which I suspected might be the
root cause). But the regression is still there, so it's not that.

It might be in the reworked costing, yeah. But then I'd expect those
function to show in the perf profile.


Right. I'm just grasping at straws on that.


>> (2) might have impact, but I don't see any immediate supects. Did you
>> have some functions in mind?
>
>I guess this is where the lines blur: I didn't see anything obvious
>either, but the changes to sort costing...should probably not have
>real impact...but...
>

:-(

>> BTW I see the patch adds pathkeys_common but it's not called from
>> anywhere. It's probably leftover from an earlier patch version.
>>


BTW, I think I'm going to rename the pathkeys_common_contained_in
function to something like pathkeys_count_contained_in, unless you
have an objection to that. The name doesn't seem obvious at all to me.



WFM


>> >There's not anything obvious in point (2) that would be a big cost,
>> >but there are definitely changes there. I was surprised that just
>> >eliminating the loop through the pathkeys on the query and the index
>> >was enough to save us ~4%.
>> >
>> >Tomas: Earlier you'd wondered about if we should try to shortcut the
>> >changes

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >> >Tomas Vondra  writes:
> >> >> >> In general, I think it'd be naive that we can make planner smarter 
> >> >> >> with
> >> >> >> no extra overhead spent on planning, and we can never accept patches
> >> >> >> adding even tiny overhead. With that approach we'd probably end up 
> >> >> >> with
> >> >> >> a trivial planner that generates just a single query plan, because
> >> >> >> that's going to be the fastest planner. A realistic approach needs to
> >> >> >> consider both the planning and execution phase, and benefits of this
> >> >> >> patch seem to be clear - if you have queries that do benefit from it.
> >> >> >
> >> >> >I think that's kind of attacking a straw man, though.  The thing that
> >> >> >people push back on, or should push back on IMO, is when a proposed
> >> >> >patch adds significant slowdown to queries that it has no or very 
> >> >> >little
> >> >> >hope of improving.  The trick is to do expensive stuff only when
> >> >> >there's a good chance of getting a better plan out of it.
> >> >> >
> >> >>
> >> >> Yeah, I agree with that. I think the main issue is that we don't really
> >> >> know what the "expensive stuff" is in this case, so it's not really
> >> >> clear how to be smarter :-(
> >> >
> >> >To add to this: I agree that ideally you'd check cheaply to know
> >> >you're in a situation that might help, and then do more work. But here
> >> >the question is always going to be simply "would we benefit from an
> >> >ordering, and, if so, do we have it already partially sorted". It's
> >> >hard to imagine that reducing much conceptually, so we're left with
> >> >optimizations of that check.
> >> >
> >>
> >> I think it depends on what exactly is the expensive part. For example if
> >> it's the construction of IncrementalSort paths, then maybe we could try
> >> do a quick/check check if the path can even be useful by estimating the
> >> cost and only then building the path.
> >>
> >> That's what we do for join algorithms, for example - we first compute
> >> initial_cost_nestloop and only when that seems cheap enough we do the
> >> more expensive stuff.
> >>
> >> But I'm not sure the path construction is the expensive part, as it
> >> should be disabled by enable_incrementalsort=off. But the regression
> >> does not seem to disappear, at least not entirely.
> >>
> >>
> >> >> One possibility is that it's just one of those regressions due to change
> >> >> in binary layout, but I'm not sure know how to verify that.
> >> >
> >> >If we are testing with a case that can't actually add more paths (due
> >> >to it checking the guc before building them), doesn't that effectively
> >> >leave one of these two options:
> >> >
> >> >1. Binary layout/cache/other untraceable change, or
> >> >2. Changes due to refactored function calls.
> >> >
> >>
> >> Hmm, but in case of (1) the overhead should be there even with tests
> >> that don't really have any additional paths to consider, right? I've
> >> tried with such test (single table with no indexes) and I don't quite
> >> see any regression (maybe ~1%).
> >
> >Not necessarily, if the cost is in sort costing or useful pathkeys
> >checking, right? We have run that code even without incremental sort,
> >but it's changed from master.
> >
>
> Ah, I should have mentioned I've done most of the tests on just the
> basic incremental sort patch (0001+0002), without the additional useful
> paths. I initially tested the whole patch series, but after discovering
> the regression I removed the last part (which I suspected might be the
> root cause). But the regression is still there, so it's not that.
>
> It might be in the reworked costing, yeah. But then I'd expect those
> function to show in the perf profile.

Right. I'm just grasping at straws on that.

> >> (2) might have impact, but I don't see any immediate supects. Did you
> >> have some functions in mind?
> >
> >I guess this is where the lines blur: I didn't see anything obvious
> >either, but the changes to sort costing...should probably not have
> >real impact...but...
> >
>
> :-(
>
> >> BTW I see the patch adds pathkeys_common but it's not called from
> >> anywhere. It's probably leftover from an earlier patch version.
> >>

BTW, I think I'm going to rename the pathkeys_common_contained_in
function to something like pathkeys_count_contained_in, unless you
have an objection to that. The name doesn't seem obvious at all to me.

> >> >There's not anything obvious in point (2) that would be a big cost,
> >> >but there are definitely changes there. I was surprised that just
> >> >eliminating the loop through the pathkeys on

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
 wrote:

On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> wrote:
>>
>> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >Tomas Vondra  writes:
>> >> In general, I think it'd be naive that we can make planner smarter with
>> >> no extra overhead spent on planning, and we can never accept patches
>> >> adding even tiny overhead. With that approach we'd probably end up with
>> >> a trivial planner that generates just a single query plan, because
>> >> that's going to be the fastest planner. A realistic approach needs to
>> >> consider both the planning and execution phase, and benefits of this
>> >> patch seem to be clear - if you have queries that do benefit from it.
>> >
>> >I think that's kind of attacking a straw man, though.  The thing that
>> >people push back on, or should push back on IMO, is when a proposed
>> >patch adds significant slowdown to queries that it has no or very little
>> >hope of improving.  The trick is to do expensive stuff only when
>> >there's a good chance of getting a better plan out of it.
>> >
>>
>> Yeah, I agree with that. I think the main issue is that we don't really
>> know what the "expensive stuff" is in this case, so it's not really
>> clear how to be smarter :-(
>
>To add to this: I agree that ideally you'd check cheaply to know
>you're in a situation that might help, and then do more work. But here
>the question is always going to be simply "would we benefit from an
>ordering, and, if so, do we have it already partially sorted". It's
>hard to imagine that reducing much conceptually, so we're left with
>optimizations of that check.
>

I think it depends on what exactly is the expensive part. For example if
it's the construction of IncrementalSort paths, then maybe we could try
do a quick/check check if the path can even be useful by estimating the
cost and only then building the path.

That's what we do for join algorithms, for example - we first compute
initial_cost_nestloop and only when that seems cheap enough we do the
more expensive stuff.

But I'm not sure the path construction is the expensive part, as it
should be disabled by enable_incrementalsort=off. But the regression
does not seem to disappear, at least not entirely.

>> One possibility is that it's just one of those regressions due to change
>> in binary layout, but I'm not sure know how to verify that.
>
>If we are testing with a case that can't actually add more paths (due
>to it checking the guc before building them), doesn't that effectively
>leave one of these two options:
>
>1. Binary layout/cache/other untraceable change, or
>2. Changes due to refactored function calls.
>

Hmm, but in case of (1) the overhead should be there even with tests
that don't really have any additional paths to consider, right? I've
tried with such test (single table with no indexes) and I don't quite
see any regression (maybe ~1%).

Not necessarily, if the cost is in sort costing or useful pathkeys
checking, right? We have run that code even without incremental sort,
but it's changed from master.

Ah, I should have mentioned I've done most of the tests on just the
basic incremental sort patch (0001+0002), without the additional useful
paths. I initially tested the whole patch series, but after discovering
the regression I removed the last part (which I suspected might be the
root cause). But the regression is still there, so it's not that.

It might be in the reworked costing, yeah. But then I'd expect those
function to show in the perf profile.

(2) might have impact, but I don't see any immediate supects. Did you
have some functions in mind?

I guess this is where the lines blur: I didn't see anything obvious
either, but the changes to sort costing...should probably not have
real impact...but...

:-(

BTW I see the patch adds pathkeys_common but it's not called from
anywhere. It's probably leftover from an earlier patch version.

>There's not anything obvious in point (2) that would be a big cost,
>but there are definitely changes there. I was surprised that just
>eliminating the loop through the pathkeys on the query and the index
>was enough to save us ~4%.
>
>Tomas: Earlier you'd wondered about if we should try to shortcut the
>changes in costing...I was skeptical of that originally, but maybe
>it's worth looking into? I'm going to try backing that out and see
>what the numbers look like.
>

BTW, I did this test, and it looks like we can get back something
close to 1% by reverting that initial fix on partial path costing. But
we can't get rid of it all the time, at the very least. *Maybe* we
could condition it on incremental sort, but I'm not convinced that's
the only place it's needed as a fix.

Sounds interesting. I actually tried how much the add_partial_path
change accounts for, and you're right it was quite a bit. But

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Alvaro Herrera

On 2020-Mar-31, Tom Lane wrote:

> James Coleman  writes:
> > On Tue, Mar 31, 2020 at 1:04 PM Tom Lane  wrote:
> >> Perhaps the semantics are such that that's actually sensible, but it's
> >> far from a straightforward remapping of the old enum.
> 
> > Right, I didn't see the explicit "= 0" in other enums there, so it
> > made me wonder if it was intentional to designate that one had to be
> > 0, but I guess without a comment that's a lot of inference.
> 
> It's possible that somebody meant that as an indicator that the code
> depends on palloc0() leaving the field with that value.  But if so,
> you'd soon find that out ... and an actual comment would be better,
> anyway.

git blame fingers this:

commit bf11e7ee2e3607bb67d25aec73aa53b2d7e9961b
Author: Robert Haas 
AuthorDate: Tue Aug 29 13:22:49 2017 -0400
CommitDate: Tue Aug 29 13:26:33 2017 -0400

Propagate sort instrumentation from workers back to leader.

Up until now, when parallel query was used, no details about the
sort method or space used by the workers were available; details
were shown only for any sorting done by the leader.  Fix that.

Commit 1177ab1dabf72bafee8f19d904cee3a299f25892 forced the test case
added by commit 1f6d515a67ec98194c23a5db25660856c9aab944 to run
without parallelism; now that we have this infrastructure, allow
that again, with a little tweaking to make it pass with and without
force_parallel_mode.

Robert Haas and Tom Lane

Discussion: 
http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=rf6rqq2br+zjvegwj0uod...@mail.gmail.com

I looked at the discussion thread.  That patch was first posted by
Robert at
https://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=rf6rqq2br+zjvegwj0uod...@mail.gmail.com
without the "= 0" part; later Tom posted v2 here
https://postgr.es/m/11223.1503695...@sss.pgh.pa.us
containing the "= 0", but I see no actual discussion of that point.

I suppose it could also be important to clarify that it's 0 if it were
used as an array index of some sort, but I don't see that in 2017's
commit.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >Tomas Vondra  writes:
> >> >> In general, I think it'd be naive that we can make planner smarter with
> >> >> no extra overhead spent on planning, and we can never accept patches
> >> >> adding even tiny overhead. With that approach we'd probably end up with
> >> >> a trivial planner that generates just a single query plan, because
> >> >> that's going to be the fastest planner. A realistic approach needs to
> >> >> consider both the planning and execution phase, and benefits of this
> >> >> patch seem to be clear - if you have queries that do benefit from it.
> >> >
> >> >I think that's kind of attacking a straw man, though.  The thing that
> >> >people push back on, or should push back on IMO, is when a proposed
> >> >patch adds significant slowdown to queries that it has no or very little
> >> >hope of improving.  The trick is to do expensive stuff only when
> >> >there's a good chance of getting a better plan out of it.
> >> >
> >>
> >> Yeah, I agree with that. I think the main issue is that we don't really
> >> know what the "expensive stuff" is in this case, so it's not really
> >> clear how to be smarter :-(
> >
> >To add to this: I agree that ideally you'd check cheaply to know
> >you're in a situation that might help, and then do more work. But here
> >the question is always going to be simply "would we benefit from an
> >ordering, and, if so, do we have it already partially sorted". It's
> >hard to imagine that reducing much conceptually, so we're left with
> >optimizations of that check.
> >
>
> I think it depends on what exactly is the expensive part. For example if
> it's the construction of IncrementalSort paths, then maybe we could try
> do a quick/check check if the path can even be useful by estimating the
> cost and only then building the path.
>
> That's what we do for join algorithms, for example - we first compute
> initial_cost_nestloop and only when that seems cheap enough we do the
> more expensive stuff.
>
> But I'm not sure the path construction is the expensive part, as it
> should be disabled by enable_incrementalsort=off. But the regression
> does not seem to disappear, at least not entirely.
>
>
> >> One possibility is that it's just one of those regressions due to change
> >> in binary layout, but I'm not sure know how to verify that.
> >
> >If we are testing with a case that can't actually add more paths (due
> >to it checking the guc before building them), doesn't that effectively
> >leave one of these two options:
> >
> >1. Binary layout/cache/other untraceable change, or
> >2. Changes due to refactored function calls.
> >
>
> Hmm, but in case of (1) the overhead should be there even with tests
> that don't really have any additional paths to consider, right? I've
> tried with such test (single table with no indexes) and I don't quite
> see any regression (maybe ~1%).

Not necessarily, if the cost is in sort costing or useful pathkeys
checking, right? We have run that code even without incremental sort,
but it's changed from master.

> (2) might have impact, but I don't see any immediate supects. Did you
> have some functions in mind?

I guess this is where the lines blur: I didn't see anything obvious
either, but the changes to sort costing...should probably not have
real impact...but...

> BTW I see the patch adds pathkeys_common but it's not called from
> anywhere. It's probably leftover from an earlier patch version.
>
> >There's not anything obvious in point (2) that would be a big cost,
> >but there are definitely changes there. I was surprised that just
> >eliminating the loop through the pathkeys on the query and the index
> >was enough to save us ~4%.
> >
> >Tomas: Earlier you'd wondered about if we should try to shortcut the
> >changes in costing...I was skeptical of that originally, but maybe
> >it's worth looking into? I'm going to try backing that out and see
> >what the numbers look like.
> >

BTW, I did this test, and it looks like we can get back something
close to 1% by reverting that initial fix on partial path costing. But
we can't get rid of it all the time, at the very least. *Maybe* we
could condition it on incremental sort, but I'm not convinced that's
the only place it's needed as a fix.

> I've described the idea about something like initial_cost_nestloop and
> so on. But I'm a bit skeptical about it, considering that the GUC only
> has limited effect.
>
>
> A somewhat note is that the number of indexes has pretty significant
> impact on planning time, even on master. This is timing of the same
> explain script (similar to the one shown before) with different number
> of indexes on master:
>
>  0 indexes7 indexes   49 indexes
>  ---
>

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
 wrote:

On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>Tomas Vondra  writes:
>> In general, I think it'd be naive that we can make planner smarter with
>> no extra overhead spent on planning, and we can never accept patches
>> adding even tiny overhead. With that approach we'd probably end up with
>> a trivial planner that generates just a single query plan, because
>> that's going to be the fastest planner. A realistic approach needs to
>> consider both the planning and execution phase, and benefits of this
>> patch seem to be clear - if you have queries that do benefit from it.
>
>I think that's kind of attacking a straw man, though.  The thing that
>people push back on, or should push back on IMO, is when a proposed
>patch adds significant slowdown to queries that it has no or very little
>hope of improving.  The trick is to do expensive stuff only when
>there's a good chance of getting a better plan out of it.
>

Yeah, I agree with that. I think the main issue is that we don't really
know what the "expensive stuff" is in this case, so it's not really
clear how to be smarter :-(

To add to this: I agree that ideally you'd check cheaply to know
you're in a situation that might help, and then do more work. But here
the question is always going to be simply "would we benefit from an
ordering, and, if so, do we have it already partially sorted". It's
hard to imagine that reducing much conceptually, so we're left with
optimizations of that check.

I think it depends on what exactly is the expensive part. For example if
it's the construction of IncrementalSort paths, then maybe we could try
do a quick/check check if the path can even be useful by estimating the
cost and only then building the path.

That's what we do for join algorithms, for example - we first compute
initial_cost_nestloop and only when that seems cheap enough we do the
more expensive stuff.

But I'm not sure the path construction is the expensive part, as it
should be disabled by enable_incrementalsort=off. But the regression
does not seem to disappear, at least not entirely.

One possibility is that it's just one of those regressions due to change
in binary layout, but I'm not sure know how to verify that.

If we are testing with a case that can't actually add more paths (due
to it checking the guc before building them), doesn't that effectively
leave one of these two options:

1. Binary layout/cache/other untraceable change, or
2. Changes due to refactored function calls.

Hmm, but in case of (1) the overhead should be there even with tests
that don't really have any additional paths to consider, right? I've
tried with such test (single table with no indexes) and I don't quite
see any regression (maybe ~1%).

(2) might have impact, but I don't see any immediate supects. Did you
have some functions in mind?

BTW I see the patch adds pathkeys_common but it's not called from
anywhere. It's probably leftover from an earlier patch version.

There's not anything obvious in point (2) that would be a big cost,
but there are definitely changes there. I was surprised that just
eliminating the loop through the pathkeys on the query and the index
was enough to save us ~4%.

Tomas: Earlier you'd wondered about if we should try to shortcut the
changes in costing...I was skeptical of that originally, but maybe
it's worth looking into? I'm going to try backing that out and see
what the numbers look like.

I've described the idea about something like initial_cost_nestloop and
so on. But I'm a bit skeptical about it, considering that the GUC only
has limited effect.

A somewhat note is that the number of indexes has pretty significant
impact on planning time, even on master. This is timing of the same
explain script (similar to the one shown before) with different number
of indexes on master:

0 indexes7 indexes   49 indexes
---
10.8512.5627.83

The way I look at incremental sort is that it allows using indexes for
queries that couldn't use it before, possibly requiring a separate
index. So incremental sort might easily reduce the number of indexes
needed, compensating for the overhead we're discussing here. Of course,
that may or may not be true.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >Tomas Vondra  writes:
> >> In general, I think it'd be naive that we can make planner smarter with
> >> no extra overhead spent on planning, and we can never accept patches
> >> adding even tiny overhead. With that approach we'd probably end up with
> >> a trivial planner that generates just a single query plan, because
> >> that's going to be the fastest planner. A realistic approach needs to
> >> consider both the planning and execution phase, and benefits of this
> >> patch seem to be clear - if you have queries that do benefit from it.
> >
> >I think that's kind of attacking a straw man, though.  The thing that
> >people push back on, or should push back on IMO, is when a proposed
> >patch adds significant slowdown to queries that it has no or very little
> >hope of improving.  The trick is to do expensive stuff only when
> >there's a good chance of getting a better plan out of it.
> >
>
> Yeah, I agree with that. I think the main issue is that we don't really
> know what the "expensive stuff" is in this case, so it's not really
> clear how to be smarter :-(

To add to this: I agree that ideally you'd check cheaply to know
you're in a situation that might help, and then do more work. But here
the question is always going to be simply "would we benefit from an
ordering, and, if so, do we have it already partially sorted". It's
hard to imagine that reducing much conceptually, so we're left with
optimizations of that check.

> One possibility is that it's just one of those regressions due to change
> in binary layout, but I'm not sure know how to verify that.

If we are testing with a case that can't actually add more paths (due
to it checking the guc before building them), doesn't that effectively
leave one of these two options:
1. Binary layout/cache/other untraceable change, or
2. Changes due to refactored function calls.

There's not anything obvious in point (2) that would be a big cost,
but there are definitely changes there. I was surprised that just
eliminating the loop through the pathkeys on the query and the index
was enough to save us ~4%.

Tomas: Earlier you'd wondered about if we should try to shortcut the
changes in costing...I was skeptical of that originally, but maybe
it's worth looking into? I'm going to try backing that out and see
what the numbers look like.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

In general, I think it'd be naive that we can make planner smarter with
no extra overhead spent on planning, and we can never accept patches
adding even tiny overhead. With that approach we'd probably end up with
a trivial planner that generates just a single query plan, because
that's going to be the fastest planner. A realistic approach needs to
consider both the planning and execution phase, and benefits of this
patch seem to be clear - if you have queries that do benefit from it.


I think that's kind of attacking a straw man, though.  The thing that
people push back on, or should push back on IMO, is when a proposed
patch adds significant slowdown to queries that it has no or very little
hope of improving.  The trick is to do expensive stuff only when
there's a good chance of getting a better plan out of it.



Yeah, I agree with that. I think the main issue is that we don't really
know what the "expensive stuff" is in this case, so it's not really
clear how to be smarter :-(

One possibility is that it's just one of those regressions due to change
in binary layout, but I'm not sure know how to verify that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 06:00:00PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra
 wrote:

On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:
>On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
> wrote:
>>
>> The main thing I've been working on today is benchmarking how this
>> affects planning. And I'm seeing a regression that worries me a bit,
>> unfortunately.
>>
>> The test I'm doing is pretty simple - build a small table with a bunch
>> of columns:
>>
>>create table t (a int, b int, c int, d int, e int, f int, g int);
>>
>>insert into t select 100*random(), 100*random(), 100*random(),
>>  100*random(), 100*random(), 100*random(), 100*random()
>>from generate_series(1,10) s(i);
>>
>> and then a number of indexes on subsets of up to 3 columns, as generated
>> using the attached build-indexes.py script. And then run a bunch of
>> explains (so no actual execution) sorting the data by at least 4 columns
>> (to trigger incremental sort paths), measuring timing of the script.
>>
>> I did a bunch of runs on current master and v46 with incremental sort
>> disabled and enabled, and the results look like this:
>>
>>  master   offon
>>  --
>>  34.60937.46337.729
>>
>> which means about 8-9% regression with incremental sort. Of course, this
>> is only for planning time, for execution the impact is going to be much
>> smaller. But it's still a bit annoying.
>>
>> I've suspected this might be either due to the add_partial_path changes
>> or the patch adding incremental sort to additional places, so I tested
>> those parts individually and the answer is no - add_partial_path changes
>> have very small impact (~1%, which might be noise). The regression comes
>> mostly from the 0002 part that adds incremental sort. At least in this
>> particular test - different tests might behave differently, of course.
>>
>> The annoying bit is that the overhead does not disappear after disabling
>> incremental sort. That suggests this is not merely due to considering
>> and creating higher number of paths, but due to something that happens
>> before we even look at the GUC ...
>>
>> I think I've found one such place - if you look at compare_pathkeys, it
>> has this check right before the forboth() loop:
>>
>>  if (keys1 == keys2)
>>  return PATHKEYS_EQUAL;
>>
>> But with incremental sort we don't call pathkeys_contained_in, we call
>> pathkeys_common_contained_in instead. And that does not call
>> compare_pathkeys and does not have the simple equality check. Adding
>> the following check seems to cut the overhead in half, which is nice:
>>
>>  if (keys1 == keys2)
>>  {
>>  *n_common = list_length(keys1);
>> return true;
>>  }
>>
>> Not sure where the rest of the regression comes from yet.
>
>I noticed in the other function we also optimize by checking if either
>keys list is NIL, so I tried adding that, and it might have made a
>minor difference, but it's hard to tell as it was under 1%.
>

Which other function? I don't see this optimization in compare_pathkeys,
that only checks key1/key2 after the forboth loop, but that's something
different.

pathkeys_useful_for_ordering checks both inputs.

I may be wrong, but my guess would be that this won't save much, because
the loop should terminate immediately. The whole point is not to loop
over possibly many pathkeys when we know it's exactly the same pathkeys
list (same pointer). When one of the lists is NIL the loop should
terminate right away ...

>I also ran perf with a slightly modified version of your test that
>uses psql, and after the above changes was seeing something like a
>3.5% delta between master and this patch series. Nothing obvious in
>the perf report though.
>

Yeah, I don't think there's anything obviously more expensive.

>This test is intended to be somewhat worst case, no? At what point do
>we consider the trade-off worth it (given that it's not plausible to
>have zero impact)?
>

Yes, more or less. It was definitely designed to do that - it merely
plans the query (no execution), with many applicable indexes etc. It's
definitely true that once the exection starts to take more time, the
overhead will get negligible. Same for reducing number of indexes.

And of course, for queries that can benefit from incremental sort, the
speedup is likely way larger than this.

In general, I think it'd be naive that we can make planner smarter with
no extra overhead spent on planning, and we can never accept patches
adding even tiny overhead. With that approach we'd probably end up with
a trivial planner that generates just a single query plan, because
that's going to be the fastest planner. A realistic approach needs to
consider both the planning and execution phase, and benefits of this
patch seem to be clear - if you have queries that do benefit from it.

Let's try to minimize the overhead a bit more, and then we'll see.

Any

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tom Lane

Tomas Vondra  writes:
> In general, I think it'd be naive that we can make planner smarter with
> no extra overhead spent on planning, and we can never accept patches
> adding even tiny overhead. With that approach we'd probably end up with
> a trivial planner that generates just a single query plan, because
> that's going to be the fastest planner. A realistic approach needs to
> consider both the planning and execution phase, and benefits of this
> patch seem to be clear - if you have queries that do benefit from it.

I think that's kind of attacking a straw man, though.  The thing that
people push back on, or should push back on IMO, is when a proposed
patch adds significant slowdown to queries that it has no or very little
hope of improving.  The trick is to do expensive stuff only when
there's a good chance of getting a better plan out of it.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:
> >On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
> > wrote:
> >>
> >> The main thing I've been working on today is benchmarking how this
> >> affects planning. And I'm seeing a regression that worries me a bit,
> >> unfortunately.
> >>
> >> The test I'm doing is pretty simple - build a small table with a bunch
> >> of columns:
> >>
> >>create table t (a int, b int, c int, d int, e int, f int, g int);
> >>
> >>insert into t select 100*random(), 100*random(), 100*random(),
> >>  100*random(), 100*random(), 100*random(), 100*random()
> >>from generate_series(1,10) s(i);
> >>
> >> and then a number of indexes on subsets of up to 3 columns, as generated
> >> using the attached build-indexes.py script. And then run a bunch of
> >> explains (so no actual execution) sorting the data by at least 4 columns
> >> (to trigger incremental sort paths), measuring timing of the script.
> >>
> >> I did a bunch of runs on current master and v46 with incremental sort
> >> disabled and enabled, and the results look like this:
> >>
> >>  master   offon
> >>  --
> >>  34.60937.46337.729
> >>
> >> which means about 8-9% regression with incremental sort. Of course, this
> >> is only for planning time, for execution the impact is going to be much
> >> smaller. But it's still a bit annoying.
> >>
> >> I've suspected this might be either due to the add_partial_path changes
> >> or the patch adding incremental sort to additional places, so I tested
> >> those parts individually and the answer is no - add_partial_path changes
> >> have very small impact (~1%, which might be noise). The regression comes
> >> mostly from the 0002 part that adds incremental sort. At least in this
> >> particular test - different tests might behave differently, of course.
> >>
> >> The annoying bit is that the overhead does not disappear after disabling
> >> incremental sort. That suggests this is not merely due to considering
> >> and creating higher number of paths, but due to something that happens
> >> before we even look at the GUC ...
> >>
> >> I think I've found one such place - if you look at compare_pathkeys, it
> >> has this check right before the forboth() loop:
> >>
> >>  if (keys1 == keys2)
> >>  return PATHKEYS_EQUAL;
> >>
> >> But with incremental sort we don't call pathkeys_contained_in, we call
> >> pathkeys_common_contained_in instead. And that does not call
> >> compare_pathkeys and does not have the simple equality check. Adding
> >> the following check seems to cut the overhead in half, which is nice:
> >>
> >>  if (keys1 == keys2)
> >>  {
> >>  *n_common = list_length(keys1);
> >> return true;
> >>  }
> >>
> >> Not sure where the rest of the regression comes from yet.
> >
> >I noticed in the other function we also optimize by checking if either
> >keys list is NIL, so I tried adding that, and it might have made a
> >minor difference, but it's hard to tell as it was under 1%.
> >
>
> Which other function? I don't see this optimization in compare_pathkeys,
> that only checks key1/key2 after the forboth loop, but that's something
> different.

pathkeys_useful_for_ordering checks both inputs.

> I may be wrong, but my guess would be that this won't save much, because
> the loop should terminate immediately. The whole point is not to loop
> over possibly many pathkeys when we know it's exactly the same pathkeys
> list (same pointer). When one of the lists is NIL the loop should
> terminate right away ...
>
> >I also ran perf with a slightly modified version of your test that
> >uses psql, and after the above changes was seeing something like a
> >3.5% delta between master and this patch series. Nothing obvious in
> >the perf report though.
> >
>
> Yeah, I don't think there's anything obviously more expensive.
>
> >This test is intended to be somewhat worst case, no? At what point do
> >we consider the trade-off worth it (given that it's not plausible to
> >have zero impact)?
> >
>
> Yes, more or less. It was definitely designed to do that - it merely
> plans the query (no execution), with many applicable indexes etc. It's
> definitely true that once the exection starts to take more time, the
> overhead will get negligible. Same for reducing number of indexes.
>
> And of course, for queries that can benefit from incremental sort, the
> speedup is likely way larger than this.
>
> In general, I think it'd be naive that we can make planner smarter with
> no extra overhead spent on planning, and we can never accept patches
> adding even tiny overhead. With that approach we'd probably end up with
> a trivial planner that generates just a single query plan, because
> that's going to be the fastest planner. A realistic approach needs to
> consider both the planning and execution phase, and benefits of this
> patch

Re: [PATCH] Incremental sort (was: PoC: Partial sort)


On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:

On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
 wrote:


The main thing I've been working on today is benchmarking how this
affects planning. And I'm seeing a regression that worries me a bit,
unfortunately.

The test I'm doing is pretty simple - build a small table with a bunch
of columns:

   create table t (a int, b int, c int, d int, e int, f int, g int);

   insert into t select 100*random(), 100*random(), 100*random(),
 100*random(), 100*random(), 100*random(), 100*random()
   from generate_series(1,10) s(i);

and then a number of indexes on subsets of up to 3 columns, as generated
using the attached build-indexes.py script. And then run a bunch of
explains (so no actual execution) sorting the data by at least 4 columns
(to trigger incremental sort paths), measuring timing of the script.

I did a bunch of runs on current master and v46 with incremental sort
disabled and enabled, and the results look like this:

 master   offon
 --
 34.60937.46337.729

which means about 8-9% regression with incremental sort. Of course, this
is only for planning time, for execution the impact is going to be much
smaller. But it's still a bit annoying.

I've suspected this might be either due to the add_partial_path changes
or the patch adding incremental sort to additional places, so I tested
those parts individually and the answer is no - add_partial_path changes
have very small impact (~1%, which might be noise). The regression comes
mostly from the 0002 part that adds incremental sort. At least in this
particular test - different tests might behave differently, of course.

The annoying bit is that the overhead does not disappear after disabling
incremental sort. That suggests this is not merely due to considering
and creating higher number of paths, but due to something that happens
before we even look at the GUC ...

I think I've found one such place - if you look at compare_pathkeys, it
has this check right before the forboth() loop:

 if (keys1 == keys2)
 return PATHKEYS_EQUAL;

But with incremental sort we don't call pathkeys_contained_in, we call
pathkeys_common_contained_in instead. And that does not call
compare_pathkeys and does not have the simple equality check. Adding
the following check seems to cut the overhead in half, which is nice:

 if (keys1 == keys2)
 {
 *n_common = list_length(keys1);
return true;
 }

Not sure where the rest of the regression comes from yet.


I noticed in the other function we also optimize by checking if either
keys list is NIL, so I tried adding that, and it might have made a
minor difference, but it's hard to tell as it was under 1%.



Which other function? I don't see this optimization in compare_pathkeys,
that only checks key1/key2 after the forboth loop, but that's something
different.

I may be wrong, but my guess would be that this won't save much, because
the loop should terminate immediately. The whole point is not to loop
over possibly many pathkeys when we know it's exactly the same pathkeys
list (same pointer). When one of the lists is NIL the loop should
terminate right away ...


I also ran perf with a slightly modified version of your test that
uses psql, and after the above changes was seeing something like a
3.5% delta between master and this patch series. Nothing obvious in
the perf report though.



Yeah, I don't think there's anything obviously more expensive.


This test is intended to be somewhat worst case, no? At what point do
we consider the trade-off worth it (given that it's not plausible to
have zero impact)?



Yes, more or less. It was definitely designed to do that - it merely
plans the query (no execution), with many applicable indexes etc. It's
definitely true that once the exection starts to take more time, the
overhead will get negligible. Same for reducing number of indexes.

And of course, for queries that can benefit from incremental sort, the
speedup is likely way larger than this.

In general, I think it'd be naive that we can make planner smarter with
no extra overhead spent on planning, and we can never accept patches
adding even tiny overhead. With that approach we'd probably end up with
a trivial planner that generates just a single query plan, because
that's going to be the fastest planner. A realistic approach needs to
consider both the planning and execution phase, and benefits of this
patch seem to be clear - if you have queries that do benefit from it.

Let's try to minimize the overhead a bit more, and then we'll see.



Also, while looking at pathkeys_common_contained_in(), I've been a bit
puzzled why does is this correct:

 return (key1 == NULL);

It's easy to not notice it's key1 and not keys1, so I suggest we add a
comment 'key1 == NULL' means we've processed whole keys1 list.


Done.

I've included fixes for Alvaro's comments in this patch series also.

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tom Lane

James Coleman  writes:
> On Tue, Mar 31, 2020 at 1:04 PM Tom Lane  wrote:
>> Perhaps the semantics are such that that's actually sensible, but it's
>> far from a straightforward remapping of the old enum.

> Right, I didn't see the explicit "= 0" in other enums there, so it
> made me wonder if it was intentional to designate that one had to be
> 0, but I guess without a comment that's a lot of inference.

It's possible that somebody meant that as an indicator that the code
depends on palloc0() leaving the field with that value.  But if so,
you'd soon find that out ... and an actual comment would be better,
anyway.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

On Tue, Mar 31, 2020 at 1:04 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory
> > + * instrumentation so needs to have each value be a separate bit.
>
> >> I don't quite understand why you skipped "1".  (Also, is the use of zero
> >> a wise choice?)
>
> > The assignment of 0 was already there, and there wasn't a comment to
> > indicate why. That ends up meaning we wouldn't display "still in
> > progress" as a type here, which is maybe desirable, but I'm honestly
> > not sure why it was that way originally. I'm curious if you have any
> > thoughts on it.
>
> As things stood, the "= 0" was a no-op, since the first enum value
> would've been that anyway.  But if you're converting this set of symbols
> to bits that can be OR'd together, it seems pretty strange to use zero,
> because that can't be distinguished from "absence of any entry".
>
> Perhaps the semantics are such that that's actually sensible, but it's
> far from a straightforward remapping of the old enum.

Right, I didn't see the explicit "= 0" in other enums there, so it
made me wonder if it was intentional to designate that one had to be
0, but I guess without a comment that's a lot of inference.

The semantics seemed somewhat useful here in theory, but since I'm not
hearing a "yeah, that was intentional but not commented", I'm just
going to change it to what you'd naturally expect.

James

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tom Lane

James Coleman  writes:
> + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory
> + * instrumentation so needs to have each value be a separate bit.

>> I don't quite understand why you skipped "1".  (Also, is the use of zero
>> a wise choice?)

> The assignment of 0 was already there, and there wasn't a comment to
> indicate why. That ends up meaning we wouldn't display "still in
> progress" as a type here, which is maybe desirable, but I'm honestly
> not sure why it was that way originally. I'm curious if you have any
> thoughts on it.

As things stood, the "= 0" was a no-op, since the first enum value
would've been that anyway.  But if you're converting this set of symbols
to bits that can be OR'd together, it seems pretty strange to use zero,
because that can't be distinguished from "absence of any entry".

Perhaps the semantics are such that that's actually sensible, but it's
far from a straightforward remapping of the old enum.

regards, tom lane

Re: [PATCH] Incremental sort (was: PoC: Partial sort)