Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-14 Thread Peter Geoghegan
On Thu, Jul 2, 2020 at 8:47 AM James Coleman  wrote:
> It seems like the consensus over at another discussion on this topic
> [1] is that we ought to go ahead and print the zeros [for machine
> readable output formats], even though that creates some interesting
> scenarios like the fact that disk sorts will print 0 for memory even
> though that's not true.

What about having it print -1 for memory in this case instead? That's
still not ideal, but machine readable EXPLAIN output ought to
consistently show the same information per node, even when the answer
is in some sense indeterminate. That seems to be the standard that
we've settled on.

It might be worth teaching the JSON format to show a JSON null or
something instead. Not sure about that, though.

-- 
Peter Geoghegan




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-14 Thread Peter Geoghegan
On Thu, Jul 9, 2020 at 12:06 PM Jonathan S. Katz  wrote:
> On 7/2/20 11:47 AM, James Coleman wrote:
> > It seems like the consensus over at another discussion on this topic
> > [1] is that we ought to go ahead and print the zeros [for machine
> > readable output formats], even though that creates some interesting
> > scenarios like the fact that disk sorts will print 0 for memory even
> > though that's not true.
> >
> > The change has already been made and pushed for hash disk spilling, so
> > I think we ought to use Justin's patch here.
>
> Do people agree with James analysis? From the RMT perspective, we would
> like to get this open item wrapped up for the next beta, given[1] is now
> resolved.

Tomas, Justin: Ping? Can we get an update on this?

Just for the record, David Rowley fixed the similar hashagg issue in
commit 40efbf8706cdd96e06bc4d1754272e46d9857875. I don't see any
reason for the delay here.

Thanks
-- 
Peter Geoghegan




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-09 Thread Jonathan S. Katz
On 7/2/20 11:47 AM, James Coleman wrote:
> It seems like the consensus over at another discussion on this topic
> [1] is that we ought to go ahead and print the zeros [for machine
> readable output formats], even though that creates some interesting
> scenarios like the fact that disk sorts will print 0 for memory even
> though that's not true.
> 
> The change has already been made and pushed for hash disk spilling, so
> I think we ought to use Justin's patch here.

Do people agree with James analysis? From the RMT perspective, we would
like to get this open item wrapped up for the next beta, given[1] is now
resolved.

Thanks!

Jonathan

[1]
https://www.postgresql.org/message-id/CAApHDvpBQx4Shmisjp7oKr%3DECX18KYKPB%3DKpdWYxMKQNvisgvQ%40mail.gmail.com



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-07-02 Thread James Coleman
It seems like the consensus over at another discussion on this topic
[1] is that we ought to go ahead and print the zeros [for machine
readable output formats], even though that creates some interesting
scenarios like the fact that disk sorts will print 0 for memory even
though that's not true.

The change has already been made and pushed for hash disk spilling, so
I think we ought to use Justin's patch here.

James

[1] https://www.postgresql.org/message-id/2276865.1593102811%40sss.pgh.pa.us




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-06-24 Thread James Coleman
On Fri, Jun 19, 2020 at 12:04 AM Justin Pryzby  wrote:
>
> On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
> > > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> > > > I've pushed the fist part of this patch series - I've reorganized it a
> > >
> > > I scanned through this again post-commit.  Find attached some suggestions.
> > >
> > > Shouldn't non-text explain output always show both disk *and* mem, 
> > > including
> > > zeros ?
> >
> > Could you give more context on this? Is there a standard to follow?
> > Regular sort nodes only ever report one type, so there's not a good
> > parallel there.
>
> The change I proposed was like:
>
> @@ -2829,7 +2829,6 @@ 
> show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>
> ExplainPropertyList("Sort Methods Used", methodNames, es);
>
> -   if (groupInfo->maxMemorySpaceUsed > 0)
> {
> longavgSpace = 
> groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
> const char *spaceTypeName;
> ...
> -   if (groupInfo->maxDiskSpaceUsed > 0)
> {
> ...
>
> To show in non-text format *both* disk and memory space used, even if zero.
>
> I still think that's what's desirable.

I'm of the opposite opinion. I believe showing both unnecessarily is confusing.

> If it's important to show *whether* a sort space was used, then I think there
> should be a boolean, or an int 0/1.  But I don't think it's actually needed,
> since someone parsing the explain output could just check
> |if _dict['Peak Sort Space Used'] > 0: ...
> the same as you're doing, without having to write some variation on:
> |if 'Peak Sort Space Used' in _dict and _dict['Peak Sort Space Used'] > 0: ...

I think it's desirable for code to be explicitly about the type having
been used rather than implicitly assuming it based on 0/non-zero
values.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-06-18 Thread Justin Pryzby
On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
> > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> > > I've pushed the fist part of this patch series - I've reorganized it a
> >
> > I scanned through this again post-commit.  Find attached some suggestions.
> >
> > Shouldn't non-text explain output always show both disk *and* mem, including
> > zeros ?
>
> Could you give more context on this? Is there a standard to follow?
> Regular sort nodes only ever report one type, so there's not a good
> parallel there.

The change I proposed was like:

@@ -2829,7 +2829,6 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo 
*groupInfo,

ExplainPropertyList("Sort Methods Used", methodNames, es);

-   if (groupInfo->maxMemorySpaceUsed > 0)
{
longavgSpace = 
groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
const char *spaceTypeName;
...
-   if (groupInfo->maxDiskSpaceUsed > 0)
{
...

To show in non-text format *both* disk and memory space used, even if zero.

I still think that's what's desirable.

If it's important to show *whether* a sort space was used, then I think there
should be a boolean, or an int 0/1.  But I don't think it's actually needed,
since someone parsing the explain output could just check
|if _dict['Peak Sort Space Used'] > 0: ...
the same as you're doing, without having to write some variation on:
|if 'Peak Sort Space Used' in _dict and _dict['Peak Sort Space Used'] > 0: ...

-- 
Justin




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-05-12 Thread Peter Geoghegan
On Tue, May 12, 2020 at 11:18 AM Tomas Vondra
 wrote:
> I've pushed both patches, fixing typos and explain format.

Thanks, Tomas.

-- 
Peter Geoghegan




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-05-12 Thread Tomas Vondra

On Sun, May 10, 2020 at 02:25:23PM +0200, Tomas Vondra wrote:

On Sat, May 09, 2020 at 06:48:02PM -0700, Peter Geoghegan wrote:

On Sat, May 9, 2020 at 3:19 PM Tomas Vondra
 wrote:

I'm generally OK with most of this - I'd probably keep the single-line
format, but I don't feel very strongly about that and if others think
using two lines is better ...

Barring objections I'll get this polished and pushed soon-ish (say,
early next week).


I see something about starting a new thread on the Open Items page.
Please CC me on this.

Speaking in my capacity as an RMT member: Glad to see that you plan to
close this item out next week. (I had planned on giving you a nudge
about this, but it looks like I don't really have to now.)



Not sure about about the new thread - the discussion continues on the
main incremental sort thread, I don't see any proposal to start a new
thread there. IMO it'd be pointless at this point.



I've pushed both patches, fixing typos and explain format.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-05-10 Thread Tomas Vondra

On Sat, May 09, 2020 at 06:48:02PM -0700, Peter Geoghegan wrote:

On Sat, May 9, 2020 at 3:19 PM Tomas Vondra
 wrote:

I'm generally OK with most of this - I'd probably keep the single-line
format, but I don't feel very strongly about that and if others think
using two lines is better ...

Barring objections I'll get this polished and pushed soon-ish (say,
early next week).


I see something about starting a new thread on the Open Items page.
Please CC me on this.

Speaking in my capacity as an RMT member: Glad to see that you plan to
close this item out next week. (I had planned on giving you a nudge
about this, but it looks like I don't really have to now.)



Not sure about about the new thread - the discussion continues on the
main incremental sort thread, I don't see any proposal to start a new
thread there. IMO it'd be pointless at this point.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-05-09 Thread Peter Geoghegan
On Sat, May 9, 2020 at 3:19 PM Tomas Vondra
 wrote:
> I'm generally OK with most of this - I'd probably keep the single-line
> format, but I don't feel very strongly about that and if others think
> using two lines is better ...
>
> Barring objections I'll get this polished and pushed soon-ish (say,
> early next week).

I see something about starting a new thread on the Open Items page.
Please CC me on this.

Speaking in my capacity as an RMT member: Glad to see that you plan to
close this item out next week. (I had planned on giving you a nudge
about this, but it looks like I don't really have to now.)

Thanks
-- 
Peter Geoghegan




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-05-09 Thread Tomas Vondra

On Sat, May 09, 2020 at 03:18:36PM -0500, Justin Pryzby wrote:

Checking if it's possible to address this Opened Item before 13b1.

https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
consistency of explain output: two spaces, equals vs colons, semicolons 
(incremental sort)



Yes. Now that the other items related to incremental sort are fixed,
this is next on my TODO.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-05-09 Thread Justin Pryzby
Checking if it's possible to address this Opened Item before 13b1.

https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
consistency of explain output: two spaces, equals vs colons, semicolons 
(incremental sort) 

On Sun, Apr 19, 2020 at 09:46:55AM -0400, James Coleman wrote:
> On Sat, Apr 18, 2020 at 10:36 PM Justin Pryzby  wrote:
> >
> > On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote:
> > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > > > And, should it use two spaces before "Sort Method", "Memory" and 
> > > > > "Pre-sorted
> > > ...
> > > > I read through that subthread, and the ending seemed to be Peter
> > > > wanting things to be unified. Was there a conclusion beyond that?
> > >
> > > This discussion is ongoing.  I think let's wait until that's settled 
> > > before
> > > addressing this more complex and even newer case.  We can add "explain, 
> > > two
> > > spaces and equals vs colon" to the "Open items" list if need be - I hope 
> > > the
> > > discussion will not delay the release.
> >
> > The change proposed on the WAL thread is minimal, and makes new explain(WAL)
> > output consistent with the that of explain(BUFFERS).
> >
> > That uses a different format from "Sort", which is what incremental sort 
> > should
> > follow.  (Hashjoin also uses the Sort's format of two-spaces and colons 
> > rather
> > than equals).
> 
> I think it's not great that buffers/sort are different, but I agree
> that we should follow sort.
> 
> > So the attached 0001 makes explain output for incremental sort more 
> > consistent
> > with sort:
> >
> >  - Two spaces;
> >  - colons rather than equals;
> >  - Don't use semicolon, which isn't in use anywhere else;
> >
> > I tested with this:
> > template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t 
> > SELECT a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON 
> > t(i); VACUUM VERBOSE ANALYZE t;
> > template1=# explain analyze SELECT * FROM t a ORDER BY i,j;
> > ...
> >Full-sort Groups: 1000  Sort Method: quicksort  Average Memory: 28kB  
> > Peak Memory: 28kB  Pre-sorted Groups: 1000  Sort Method: quicksort  Average 
> > Memory: 30kB  Peak Memory: 30kB
> >
> > On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote:
> > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  
> > > > wrote:
> > > > > Should "Pre-sorted Groups:" be on a separate line ?
> > > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB 
> > > > > peak=28kB Pre-sorted Groups: 1 Sort Method: quicksort Memory: 
> > > > > avg=30kB peak=30kB
> > > >
> > > > I'd originally had that, but Tomas wanted it to be more compact. It's
> > > > easy to adjust though if the consensus changes on that.
> > >
> > > I'm OK with changing the format if there's a consensus. The current
> > > format seemed better to me, but I'm not particularly attached to it.
> >
> > I still think Pre-sorted groups should be on a separate line, as in 0002.
> > In addition to looking better (to me), and being easier to read, another 
> > reason
> > is that there are essentially key=>values here, but the keys are repeated 
> > (Sort
> > Method, etc).
> 
> I collapsed this into 0001 because I think that if we're going to do
> away with the key=value style then we effectively to have to do this
> to avoid the repeated values being confusing (with key=value it kinda
> worked, because that made it seem like the avg/peak were clearly a
> subset of the Sort Groups info).
> 
> I also cleaned up the newline patch a bit in the process (we already
> have a way to indent with a flag so don't need to do it directly).
> 
> > I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that 
> > here.
> 
> I went ahead and did that too; we already use "Full-sort", so the
> proposed change makes both parallel.
> 
> > Michael already patched most of the comment typos, the remainder I'm 
> > including
> > here as a "nearby patch"..
> 
> Modified slightly.
> 
> James

> From becd60ba348558fa241db5cc2100a84b757cbdc5 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Mon, 6 Apr 2020 17:37:31 -0500
> Subject: [PATCH v2 2/2] comment typos: Incremental Sort
> 
> commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da
> Author: Tomas Vondra 
> 
> Previously reported here:
> https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com
> ---
>  src/backend/commands/explain.c |  4 ++--
>  src/backend/executor/nodeIncrementalSort.c | 10 ++
>  src/backend/utils/sort/tuplesort.c |  8 
>  3 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 5f91c569a0..86c10458f4 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2865,7 +2865,7 @@ 
> show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
>  }
>  
>  

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-19 Thread James Coleman
On Sat, Apr 18, 2020 at 10:36 PM Justin Pryzby  wrote:
>
> On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote:
> > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > > And, should it use two spaces before "Sort Method", "Memory" and 
> > > > "Pre-sorted
> > ...
> > > I read through that subthread, and the ending seemed to be Peter
> > > wanting things to be unified. Was there a conclusion beyond that?
> >
> > This discussion is ongoing.  I think let's wait until that's settled before
> > addressing this more complex and even newer case.  We can add "explain, two
> > spaces and equals vs colon" to the "Open items" list if need be - I hope the
> > discussion will not delay the release.
>
> The change proposed on the WAL thread is minimal, and makes new explain(WAL)
> output consistent with the that of explain(BUFFERS).
>
> That uses a different format from "Sort", which is what incremental sort 
> should
> follow.  (Hashjoin also uses the Sort's format of two-spaces and colons rather
> than equals).

I think it's not great that buffers/sort are different, but I agree
that we should follow sort.

> So the attached 0001 makes explain output for incremental sort more consistent
> with sort:
>
>  - Two spaces;
>  - colons rather than equals;
>  - Don't use semicolon, which isn't in use anywhere else;
>
> I tested with this:
> template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT 
> a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON t(i); 
> VACUUM VERBOSE ANALYZE t;
> template1=# explain analyze SELECT * FROM t a ORDER BY i,j;
> ...
>Full-sort Groups: 1000  Sort Method: quicksort  Average Memory: 28kB  Peak 
> Memory: 28kB  Pre-sorted Groups: 1000  Sort Method: quicksort  Average 
> Memory: 30kB  Peak Memory: 30kB
>
> On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote:
> > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  
> > > wrote:
> > > > Should "Pre-sorted Groups:" be on a separate line ?
> > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
> > > > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB
> > >
> > > I'd originally had that, but Tomas wanted it to be more compact. It's
> > > easy to adjust though if the consensus changes on that.
> >
> > I'm OK with changing the format if there's a consensus. The current
> > format seemed better to me, but I'm not particularly attached to it.
>
> I still think Pre-sorted groups should be on a separate line, as in 0002.
> In addition to looking better (to me), and being easier to read, another 
> reason
> is that there are essentially key=>values here, but the keys are repeated 
> (Sort
> Method, etc).

I collapsed this into 0001 because I think that if we're going to do
away with the key=value style then we effectively to have to do this
to avoid the repeated values being confusing (with key=value it kinda
worked, because that made it seem like the avg/peak were clearly a
subset of the Sort Groups info).

I also cleaned up the newline patch a bit in the process (we already
have a way to indent with a flag so don't need to do it directly).

> I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that 
> here.

I went ahead and did that too; we already use "Full-sort", so the
proposed change makes both parallel.

> Michael already patched most of the comment typos, the remainder I'm including
> here as a "nearby patch"..

Modified slightly.

James
From becd60ba348558fa241db5cc2100a84b757cbdc5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 6 Apr 2020 17:37:31 -0500
Subject: [PATCH v2 2/2] comment typos: Incremental Sort

commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da
Author: Tomas Vondra 

Previously reported here:
https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com
---
 src/backend/commands/explain.c |  4 ++--
 src/backend/executor/nodeIncrementalSort.c | 10 ++
 src/backend/utils/sort/tuplesort.c |  8 
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5f91c569a0..86c10458f4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2865,7 +2865,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node
  */
 static void
 show_incremental_sort_info(IncrementalSortState *incrsortstate,
@@ -2913,7 +2913,7 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
 			>shared_info->sinfo[n];
 
 			/*
-			 * If a worker hasn't process any sort groups at all, then exclude
+			 * If a worker hasn't processed any sort groups at all, then exclude
 			 * it from output since it either 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-18 Thread Justin Pryzby
On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote:
> On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > > And, should it use two spaces before "Sort Method", "Memory" and 
> > > "Pre-sorted
> ...
> > I read through that subthread, and the ending seemed to be Peter
> > wanting things to be unified. Was there a conclusion beyond that?
> 
> This discussion is ongoing.  I think let's wait until that's settled before
> addressing this more complex and even newer case.  We can add "explain, two
> spaces and equals vs colon" to the "Open items" list if need be - I hope the
> discussion will not delay the release.

The change proposed on the WAL thread is minimal, and makes new explain(WAL)
output consistent with the that of explain(BUFFERS).

That uses a different format from "Sort", which is what incremental sort should
follow.  (Hashjoin also uses the Sort's format of two-spaces and colons rather
than equals).

So the attached 0001 makes explain output for incremental sort more consistent
with sort:

 - Two spaces;
 - colons rather than equals;
 - Don't use semicolon, which isn't in use anywhere else;

I tested with this:
template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT 
a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON t(i); VACUUM 
VERBOSE ANALYZE t;
template1=# explain analyze SELECT * FROM t a ORDER BY i,j;
...
   Full-sort Groups: 1000  Sort Method: quicksort  Average Memory: 28kB  Peak 
Memory: 28kB  Pre-sorted Groups: 1000  Sort Method: quicksort  Average Memory: 
30kB  Peak Memory: 30kB

On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
> > > Should "Pre-sorted Groups:" be on a separate line ?
> > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
> > > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB
> > 
> > I'd originally had that, but Tomas wanted it to be more compact. It's
> > easy to adjust though if the consensus changes on that.
> 
> I'm OK with changing the format if there's a consensus. The current
> format seemed better to me, but I'm not particularly attached to it.

I still think Pre-sorted groups should be on a separate line, as in 0002.
In addition to looking better (to me), and being easier to read, another reason
is that there are essentially key=>values here, but the keys are repeated (Sort
Method, etc).

I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that here.

Michael already patched most of the comment typos, the remainder I'm including
here as a "nearby patch"..

-- 
Justin
>From 55044341f82b847d136cd17df5a3c8d80c8371b4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 15 Apr 2020 08:45:21 -0500
Subject: [PATCH v1 1/3] Fix explain output for incr sort:

 - Two spaces;
 - colons rather than equals;
 - Don't use semicolon;
---
 src/backend/commands/explain.c | 18 +++---
 src/test/regress/expected/incremental_sort.out | 12 ++--
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..9257c52707 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2778,7 +2778,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 	{
 		if (indent)
 			appendStringInfoSpaces(es->str, es->indent * 2);
-		appendStringInfo(es->str, "%s Groups: " INT64_FORMAT " Sort Method", groupLabel,
+		appendStringInfo(es->str, "%s Groups: " INT64_FORMAT "  Sort Method", groupLabel,
 		 groupInfo->groupCount);
 		/* plural/singular based on methodNames size */
 		if (list_length(methodNames) > 1)
@@ -2798,24 +2798,20 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			const char *spaceTypeName;
 
 			spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
-			appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
+			appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
 			 spaceTypeName, avgSpace,
-			 groupInfo->maxMemorySpaceUsed);
+			 spaceTypeName, groupInfo->maxMemorySpaceUsed);
 		}
 
 		if (groupInfo->maxDiskSpaceUsed > 0)
 		{
 			long		avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
-
 			const char *spaceTypeName;
 
 			spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
-			/* Add a semicolon separator only if memory stats were printed. */
-			if (groupInfo->maxMemorySpaceUsed > 0)
-appendStringInfo(es->str, ";");
-			appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB",
+			appendStringInfo(es->str, "  Average %s: %ldkB  Peak %s: %ldkB",
 			 spaceTypeName, avgSpace,
-			 groupInfo->maxDiskSpaceUsed);
+			 spaceTypeName, groupInfo->maxDiskSpaceUsed);
 		}
 	}
 	else
@@ -2899,7 +2895,7 @@ 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-16 Thread James Coleman
On Thu, Apr 16, 2020 at 1:10 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Fri, Apr 10, 2020 at 10:12 AM James Coleman  wrote:
> >> One thing I just noticed and had a question about: in
> >> preparePresortedCols (which sets up a function call context), do we
> >> need to call pg_proc_aclcheck?
>
> > Background: this came up because I noticed that pg_proc_aclcheck is
> > called in the scalar array op case in execExpr.c.
>
> > However grepping through the source code I see several places where a
> > function (including an equality op for an ordering op, like the case
> > we have here) gets looked up without calling pg_proc_aclcheck, but
> > then other places where the acl check is invoked.
>
> Rule of thumb is that we don't apply ACL checks to functions/ops
> we get out of an opclass; adding a function to an opclass is tantamount
> to giving public execute permission on it.  If the function/operator
> reference came directly from the SQL query it must be checked.

All right, in that case I believe we're OK here without modification.
We're looking up the equality op based on the ordering op the planner
has already selected for sorting the query, and I'm assuming that
looking that up via the op family is in the same category as "getting
out of an opclass" (since opclasses are part of an opfamily).

Thanks for the explanation.

> > In addition, I haven't been able to discern a reason for why sometimes
> > InvokeFunctionExecuteHook gets called with the function after lookup,
> > but not others.
>
> I would not stand here and say that that hook infrastructure is worth
> anything at all.  Maybe the coverage is sufficient for some use-cases,
> but who's to say?

Interesting. It does look to be particularly underused. Just grepping
for that hook invocation macro shows, for example, that it's not used
in nodeSort.c or tuplesort.c, so clearly it's not executed for the
functions we'd use in regular sort. Given that...I think we can
proceed without it here too.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-16 Thread Tom Lane
James Coleman  writes:
> On Fri, Apr 10, 2020 at 10:12 AM James Coleman  wrote:
>> One thing I just noticed and had a question about: in
>> preparePresortedCols (which sets up a function call context), do we
>> need to call pg_proc_aclcheck?

> Background: this came up because I noticed that pg_proc_aclcheck is
> called in the scalar array op case in execExpr.c.

> However grepping through the source code I see several places where a
> function (including an equality op for an ordering op, like the case
> we have here) gets looked up without calling pg_proc_aclcheck, but
> then other places where the acl check is invoked.

Rule of thumb is that we don't apply ACL checks to functions/ops
we get out of an opclass; adding a function to an opclass is tantamount
to giving public execute permission on it.  If the function/operator
reference came directly from the SQL query it must be checked.

> In addition, I haven't been able to discern a reason for why sometimes
> InvokeFunctionExecuteHook gets called with the function after lookup,
> but not others.

I would not stand here and say that that hook infrastructure is worth
anything at all.  Maybe the coverage is sufficient for some use-cases,
but who's to say?

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-16 Thread James Coleman
On Fri, Apr 10, 2020 at 10:12 AM James Coleman  wrote:
>
> One thing I just noticed and had a question about: in
> preparePresortedCols (which sets up a function call context), do we
> need to call pg_proc_aclcheck?

Background: this came up because I noticed that pg_proc_aclcheck is
called in the scalar array op case in execExpr.c.

However grepping through the source code I see several places where a
function (including an equality op for an ordering op, like the case
we have here) gets looked up without calling pg_proc_aclcheck, but
then other places where the acl check is invoked.

In addition, I haven't been able to discern a reason for why sometimes
InvokeFunctionExecuteHook gets called with the function after lookup,
but not others.

So I'm not sure if either of these needed to be added to the equality
op/function lookup code in nodeIncrementalSort's preparePresortedCols
or not.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-10 Thread James Coleman
One thing I just noticed and had a question about: in
preparePresortedCols (which sets up a function call context), do we
need to call pg_proc_aclcheck?

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>>hyrax is not too happy with this test:
>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
>>
>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>>EXPLAIN output, but it evidently is.
>>
>
>Thanks, I'll investigate. It's not clear to me either what might be
>causing this, but I guess something must have gone wrong in
>estimation/planning.
>

OK, I know what's going on - it's a rather embarassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
>On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
>>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
>> wrote:
>>>
>>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>hyrax is not too happy with this test:
>
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
>
>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>EXPLAIN output, but it evidently is.
>

Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.

>>>
>>>OK, I know what's going on - it's a rather embarassing issue in the
>>>regression test. There's no analyze on the test tables, so it uses
>>>default estimates for number of groups etc. But with clobber cache the
>>>test runs long enough for autoanalyze to kick in and collect stats, so
>>>we generate better estimates which changes the plan.
>>>
>>>I'll get this fixed - explicit analyze and tweaking the data a bit
>>>should do the trick.
>>
>>Looking at the tests that failed, I think we should consider just adding:
>>set enable_sort = off;
>>because several of those tests have very specific amounts of data to
>>ensure we test the transition points around the different modes in the
>>incremental sort node.
>>
>
>Maybe, but I'd much rather tweak the data so that we test both the
>costing and execution part.
>

I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?


Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".



OK, good. I'll push the fix.


I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...



Yeah, but with 1000 rows we'd need a more complex CASE statement (I
don't think simply having two groups - small+large would work).


Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?



Ah, definitely. That was a mistake. Thanks for noticing.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tom Lane
James Coleman  writes:
> Should we change `analyze` to `analyze t` to avoid unnecessarily
> re-analyzing all other tables in the regression db?

Yes, a global analyze here is a remarkably horrid idea.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
> >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
> >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
> >> wrote:
> >>>
> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >hyrax is not too happy with this test:
> >
> >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
> >
> >It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >EXPLAIN output, but it evidently is.
> >
> 
> Thanks, I'll investigate. It's not clear to me either what might be
> causing this, but I guess something must have gone wrong in
> estimation/planning.
> 
> >>>
> >>>OK, I know what's going on - it's a rather embarassing issue in the
> >>>regression test. There's no analyze on the test tables, so it uses
> >>>default estimates for number of groups etc. But with clobber cache the
> >>>test runs long enough for autoanalyze to kick in and collect stats, so
> >>>we generate better estimates which changes the plan.
> >>>
> >>>I'll get this fixed - explicit analyze and tweaking the data a bit
> >>>should do the trick.
> >>
> >>Looking at the tests that failed, I think we should consider just adding:
> >>set enable_sort = off;
> >>because several of those tests have very specific amounts of data to
> >>ensure we test the transition points around the different modes in the
> >>incremental sort node.
> >>
> >
> >Maybe, but I'd much rather tweak the data so that we test both the
> >costing and execution part.
> >
>
> I do think this does the trick by increasing the number of rows a bit
> (from 100 to 1000) to make the Sort more expensive than Incremental
> Sort, while still testing the transition points.
>
> James, can you verify it that's still true?

Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".

I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...

Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.



I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/test/regress/expected/incremental_sort.out 
b/src/test/regress/expected/incremental_sort.out
index 3072d95643..fb4ab95922 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -141,7 +141,8 @@ begin
 end;
 $$;
 -- A single large group tested around each mode transition point.
-insert into t(a, b) select 1, i from generate_series(1, 100) n(i);
+insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 31;
QUERY PLAN
 -
@@ -456,7 +457,8 @@ select * from (select * from t order by a) s order by a, b 
limit 66;
 
 delete from t;
 -- An initial large group followed by a small group.
-insert into t(a, b) select (case when i < 50 then 1 else 2 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select i/50 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 55;
QUERY PLAN
 -
@@ -521,7 +523,7 @@ select * from (select * from t order by a) s order by a, b 
limit 55;
  1 | 47
  1 | 48
  1 | 49
- 2 | 50
+ 1 | 50
  2 | 51
  2 | 52
  2 | 53
@@ -538,10 +540,10 @@ select explain_analyze_without_memory('select * from 
(select * from t order by a
  Sort Key: t.a, t.b
  Presorted Key: t.a
  Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: 
avg=NNkB peak=NNkB
- ->  Sort (actual rows=100 loops=1)
+ ->  Sort (actual rows=101 loops=1)
Sort Key: t.a
Sort Method: quicksort  Memory: NNkB
-   ->  Seq Scan on t (actual rows=100 loops=1)
+   ->  Seq Scan on t (actual rows=1000 loops=1)
 (9 rows)
 
 select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * 
from (select * from t order by a) s order by a, b limit 55'));
@@ -584,7 +586,8 @@ select 
explain_analyze_inc_sort_nodes_verify_invariants('select * from (select *
 
 delete from t;
 -- An initial small group followed by a large group.
-insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 1000) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 70;
QUERY PLAN
 -
@@ -705,17 +708,17 @@ select * from t left join (select * from (select * from t 
order by a) v order by
 rollback;
 -- Test EXPLAIN ANALYZE with both fullsort and presorted groups.
 select explain_analyze_without_memory('select * from (select * from t order by 
a) s order by a, b limit 70');
-   
explain_analyze_without_memory  
  

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> >On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >>hyrax is not too happy with this test:
> >>
> >>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15
> >>
> >>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >>EXPLAIN output, but it evidently is.
> >>
> >
> >Thanks, I'll investigate. It's not clear to me either what might be
> >causing this, but I guess something must have gone wrong in
> >estimation/planning.
> >
>
> OK, I know what's going on - it's a rather embarassing issue in the
> regression test. There's no analyze on the test tables, so it uses
> default estimates for number of groups etc. But with clobber cache the
> test runs long enough for autoanalyze to kick in and collect stats, so
> we generate better estimates which changes the plan.
>
> I'll get this fixed - explicit analyze and tweaking the data a bit
> should do the trick.

Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Tom Lane
hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Tomas Vondra

Hi,

I've pushed the second part of this patch series, adding incremental
sort to additional places in the planner. As explained in the commit
message (and somewhere in this thread) we've decided to only update some
of the places that require sorted input (and do create_sort). This might
be overly cautious, I expect we'll add it to more places in the future.


As for the remaining part tweaking add_partial_path to also consider
startup cost (a bit confusingly 0001 - 0003 in the v54 patchset), I've
decided not to push it now and leave it for v14. The add_partial_path
change is simple, but I came to the conclusion that the "precheck"
function should be modified to follow the same logic - it would be a bit
strange if add_partial_path_precheck considered only total cost and
add_partial_path considered both startup and total cost. It would not
matter for most places because the add_partial_path_precheck is ony used
in join planning, but it's still strange.

I could have modified the add_partial_path_precheck too, but looking at
add_path_precheck we'd probably need to compute required_outer so that
we only compare startup_cost when really useful. Or we might simply
consider startup_cost every time and leave that up to add_partial_path,
but then that would be another difference in behavior.

That seems like way too much stuff to rework on the last day of the last
commitfest. It does mean we'll fail to generate the cheapest plan in some
cases (e.g. with LIMIT, there's an example in [1]) but that's a
pre-existing condition, not something introduced by incremental sort.

regards


[1] 
https://www.postgresql.org/message-id/20190720132244.3vgg2uynfpxh3me5%40development

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread James Coleman
On Tue, Apr 7, 2020 at 7:58 PM Tomas Vondra
 wrote:
>
> On Tue, Apr 07, 2020 at 07:50:26PM -0400, James Coleman wrote:
> >On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:
> >> >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> >> >> I've pushed the fist part of this patch series - I've reorganized it a
> >> >
> >> >I scanned through this again post-commit.  Find attached some suggestions.
> >> >
> >>
> >> Thanks. The typo fixes seem clear, except for this bit:
> >>
> >>* If we've set up either of the sort states yet, we need to reset them.
> >>* We could end them and null out the pointers, but there's no reason to
> >>* repay the setup cost, and because  guard setting up pivot 
> >> comparator
> >>* state similarly, doing so might actually cause a leak.
> >>
> >> I can't figure out what  should be. James, do you recall what this
> >> should be?
> >
> >Yep, it's ExecIncrementalSort. If you look for the block guarded by
> >`if (fullsort_state == NULL)` you'll see the call to
> >preparePresortedCols(), which sets up the pivot comparator state
> >referenced by this comment.
> >
>
> OK, so it should be "... and because ExecIncrementalSort guard ..."?

Yes, "because ExecIncrementalSort guards presorted column functions by
checking to see if the full sort state has been initialized yet,
setting the sort states to null here might cause..." (that's more
specific IMO than my original "pivot comparator state...doing so").

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 07:50:26PM -0400, James Coleman wrote:

On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra
 wrote:


On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:
>On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
>> I've pushed the fist part of this patch series - I've reorganized it a
>
>I scanned through this again post-commit.  Find attached some suggestions.
>

Thanks. The typo fixes seem clear, except for this bit:

   * If we've set up either of the sort states yet, we need to reset them.
   * We could end them and null out the pointers, but there's no reason to
   * repay the setup cost, and because  guard setting up pivot comparator
   * state similarly, doing so might actually cause a leak.

I can't figure out what  should be. James, do you recall what this
should be?


Yep, it's ExecIncrementalSort. If you look for the block guarded by
`if (fullsort_state == NULL)` you'll see the call to
preparePresortedCols(), which sets up the pivot comparator state
referenced by this comment.



OK, so it should be "... and because ExecIncrementalSort guard ..."?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread James Coleman
On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:
> >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> >> I've pushed the fist part of this patch series - I've reorganized it a
> >
> >I scanned through this again post-commit.  Find attached some suggestions.
> >
>
> Thanks. The typo fixes seem clear, except for this bit:
>
>* If we've set up either of the sort states yet, we need to reset them.
>* We could end them and null out the pointers, but there's no reason to
>* repay the setup cost, and because  guard setting up pivot comparator
>* state similarly, doing so might actually cause a leak.
>
> I can't figure out what  should be. James, do you recall what this
> should be?

Yep, it's ExecIncrementalSort. If you look for the block guarded by
`if (fullsort_state == NULL)` you'll see the call to
preparePresortedCols(), which sets up the pivot comparator state
referenced by this comment.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote:

On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:

I've pushed the fist part of this patch series - I've reorganized it a


I scanned through this again post-commit.  Find attached some suggestions.



Thanks. The typo fixes seem clear, except for this bit:

  * If we've set up either of the sort states yet, we need to reset them.
  * We could end them and null out the pointers, but there's no reason to
  * repay the setup cost, and because  guard setting up pivot comparator
  * state similarly, doing so might actually cause a leak.

I can't figure out what  should be. James, do you recall what this
should be?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Justin Pryzby
On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
> > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> > > I've pushed the fist part of this patch series - I've reorganized it a
> >
> > I scanned through this again post-commit.  Find attached some suggestions.
> >
> > Shouldn't non-text explain output always show both disk *and* mem, including
> > zeros ?
> 
> Could you give more context on this? Is there a standard to follow?
> Regular sort nodes only ever report one type, so there's not a good
> parallel there.

Right, I'm not sure either, since it seems to be a new case.  Maybe Tomas has a
strong intuition.

See at least the commit messages here:
3ec20c7091e97a554e7447ac2b7f4ed795631395
7d91b604d9b5d6ec8c19c57a9ffd2f27129cdd94
8ebb69f85445177575684a0ba5cfedda8d840a91

Maybe this one suggests that it should *not* be present unconditionally, but
only when that sort type is used?
4b234fd8bf21cd6f5ff44f1f1c613bf40860998d

Another thought: is checking if bytes>0 really a good way to determine if a
sort type was used ?  It seems like checking a bit or a pointer would be
better.  I guess a size of 0 is unlikely, and it's ok at least in text mode.

if (groupInfo->maxMemorySpaceUsed > 0)

On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:
> > And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
...
> I read through that subthread, and the ending seemed to be Peter
> wanting things to be unified. Was there a conclusion beyond that?

This discussion is ongoing.  I think let's wait until that's settled before
addressing this more complex and even newer case.  We can add "explain, two
spaces and equals vs colon" to the "Open items" list if need be - I hope the
discussion will not delay the release.

-- 
Justin




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote:

On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:


On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> I've pushed the fist part of this patch series - I've reorganized it a

I scanned through this again post-commit.  Find attached some suggestions.

Shouldn't non-text explain output always show both disk *and* mem, including
zeros ?


Could you give more context on this? Is there a standard to follow?
Regular sort nodes only ever report one type, so there's not a good
parallel there.


Should "Pre-sorted Groups:" be on a separate line ?
| Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB


I'd originally had that, but Tomas wanted it to be more compact. It's
easy to adjust though if the consensus changes on that.



I'm OK with changing the format if there's a consensus. The current
format seemed better to me, but I'm not particularly attached to it.


And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
Groups"?  I think you should maybe do that instead of the "semicolon
separator".  I think "two spaces" makes sense, since the units are different,
similar to hash buckets and normal sort node.

 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
 appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",

Note, I made a similar comment regarding two spaces for explain(WAL) here:
https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com

And Peter E seemed to dislike that, here:
https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com


I read through that subthread, and the ending seemed to be Peter
wanting things to be unified. Was there a conclusion beyond that?



Yeah, I don't think there was a clear consensus :-(


Also, you're showing:
ExplainPropertyInteger("Maximum Sort Space Used", "kB",
groupInfo->maxMemorySpaceUsed, es);

But in show_hash_info() and show_hashagg_info(), and in your own text output,
that's called "Peak":
ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
ExplainPropertyInteger("Peak Memory Usage", "kB",
spacePeakKb, es);


Yes, that's a miss and should be fixed.



Will fix.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-07 Thread James Coleman
On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby  wrote:
>
> On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> > I've pushed the fist part of this patch series - I've reorganized it a
>
> I scanned through this again post-commit.  Find attached some suggestions.
>
> Shouldn't non-text explain output always show both disk *and* mem, including
> zeros ?

Could you give more context on this? Is there a standard to follow?
Regular sort nodes only ever report one type, so there's not a good
parallel there.

> Should "Pre-sorted Groups:" be on a separate line ?
> | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
> Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB

I'd originally had that, but Tomas wanted it to be more compact. It's
easy to adjust though if the consensus changes on that.

> And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
> Groups"?  I think you should maybe do that instead of the "semicolon
> separator".  I think "two spaces" makes sense, since the units are different,
> similar to hash buckets and normal sort node.
>
>  "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
>  appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",
>
> Note, I made a similar comment regarding two spaces for explain(WAL) here:
> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com
>
> And Peter E seemed to dislike that, here:
> https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com

I read through that subthread, and the ending seemed to be Peter
wanting things to be unified. Was there a conclusion beyond that?

> Also, you're showing:
> ExplainPropertyInteger("Maximum Sort Space Used", "kB",
> groupInfo->maxMemorySpaceUsed, es);
>
> But in show_hash_info() and show_hashagg_info(), and in your own text output,
> that's called "Peak":
> ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
> ExplainPropertyInteger("Peak Memory Usage", "kB",
> spacePeakKb, es);

Yes, that's a miss and should be fixed.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Justin Pryzby
On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:
> I've pushed the fist part of this patch series - I've reorganized it a

I scanned through this again post-commit.  Find attached some suggestions.

Shouldn't non-text explain output always show both disk *and* mem, including
zeros ?

Should "Pre-sorted Groups:" be on a separate line ?
| Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB 
Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB

And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted
Groups"?  I think you should maybe do that instead of the "semicolon
separator".  I think "two spaces" makes sense, since the units are different,
similar to hash buckets and normal sort node.  

 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
 appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",

Note, I made a similar comment regarding two spaces for explain(WAL) here:
https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com

And Peter E seemed to dislike that, here:
https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com

Also, you're showing:
ExplainPropertyInteger("Maximum Sort Space Used", "kB",
groupInfo->maxMemorySpaceUsed, es);

But in show_hash_info() and show_hashagg_info(), and in your own text output,
that's called "Peak":
ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
ExplainPropertyInteger("Peak Memory Usage", "kB",
spacePeakKb, es);

-- 
Justin
>From e26f2cc842792fc3c0dd4b4b97c0996d450c6dd7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 6 Apr 2020 17:37:31 -0500
Subject: [PATCH v1] comment typos and others: Incremental Sort

commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da
Author: Tomas Vondra 
---
 doc/src/sgml/perform.sgml  |  2 +-
 src/backend/commands/explain.c | 17 +++---
 src/backend/executor/nodeIncrementalSort.c | 26 +++---
 src/backend/utils/sort/tuplesort.c | 10 -
 src/include/utils/tuplesort.h  |  2 +-
 5 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 0dfc3e80e2..f448abd073 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -311,7 +311,7 @@ EXPLAIN SELECT * FROM tenk1 ORDER BY unique1;
->  Seq Scan on tenk1  (cost=0.00..445.00 rows=1 width=244)
 
 
-If the a part of the plan guarantess an ordering on a prefix of the
+If a part of the plan guarantees an ordering on a prefix of the
 required sort keys, then the planner may instead decide to use an
 incremental sort step:
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index baaa5817af..ca4fe3307d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2532,7 +2532,7 @@ show_sort_group_keys(PlanState *planstate, const char *qlabel,
 
 	ExplainPropertyList(qlabel, result, es);
 	if (nPresortedKeys > 0)
-		ExplainPropertyList("Presorted Key", resultPresorted, es);
+		ExplainPropertyList("Pre-sorted Key", resultPresorted, es);
 }
 
 /*
@@ -2829,7 +2829,6 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 
 		ExplainPropertyList("Sort Methods Used", methodNames, es);
 
-		if (groupInfo->maxMemorySpaceUsed > 0)
 		{
 			long		avgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
 			const char *spaceTypeName;
@@ -2841,12 +2840,12 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			ExplainOpenGroup("Sort Space", memoryName.data, true, es);
 
 			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es);
-			ExplainPropertyInteger("Maximum Sort Space Used", "kB",
+			ExplainPropertyInteger("Peak Sort Space Used", "kB",
    groupInfo->maxMemorySpaceUsed, es);
 
 			ExplainCloseGroup("Sort Spaces", memoryName.data, true, es);
 		}
-		if (groupInfo->maxDiskSpaceUsed > 0)
+
 		{
 			long		avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
 			const char *spaceTypeName;
@@ -2858,7 +2857,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 			ExplainOpenGroup("Sort Space", diskName.data, true, es);
 
 			ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es);
-			ExplainPropertyInteger("Maximum Sort Space Used", "kB",
+			ExplainPropertyInteger("Peak Sort Space Used", "kB",
    groupInfo->maxDiskSpaceUsed, es);
 
 			ExplainCloseGroup("Sort Spaces", diskName.data, true, es);
@@ -2869,7 +2868,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
+ * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node
  */
 static void
 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 11:00:37PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

I came to the same conclusion (that the change in TuplesortMethod
definiton is the culprit) a while ago and was about to push a fix that
initialized it correctly in ExecSortInitializeDSM. But I agree reverting
it back to the old definition is probably better.


Yeah, for the moment.  James would like to not have
SORT_TYPE_STILL_IN_PROGRESS be part of the enum at all, I think,
and I can see his point --- but then we need some out-of-band
representation of "worker not done", so I'm not sure there'll be
any net reduction of cruft.  Anyway that can be dealt with after
we have a stable buildfarm.



Agreed.


Note also that there's a separate comment-only patch in

that shouldn't be forgotten about.



OK, I'll take care of that tomorrow. I have two smaller patches to
commit in the incremental sort patchset, so I'll add it to that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> I came to the same conclusion (that the change in TuplesortMethod
> definiton is the culprit) a while ago and was about to push a fix that
> initialized it correctly in ExecSortInitializeDSM. But I agree reverting
> it back to the old definition is probably better.

Yeah, for the moment.  James would like to not have
SORT_TYPE_STILL_IN_PROGRESS be part of the enum at all, I think,
and I can see his point --- but then we need some out-of-band
representation of "worker not done", so I'm not sure there'll be
any net reduction of cruft.  Anyway that can be dealt with after
we have a stable buildfarm.

Note also that there's a separate comment-only patch in

that shouldn't be forgotten about.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 10:19:41PM -0400, Tom Lane wrote:

James Coleman  writes:

Fair enough. Unsure if Tomas is still online to comment and/or push,
but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an
initial fix.


I'm guessing he went to bed, so I'll push a fix in a moment.
The patch has survived enough test cycles here now to make me
moderately confident that it fixes the issue.



Nope, how I could I sleep with half of the buildfarm still red?

I came to the same conclusion (that the change in TuplesortMethod
definiton is the culprit) a while ago and was about to push a fix that
initialized it correctly in ExecSortInitializeDSM. But I agree reverting
it back to the old definition is probably better.

Thanks!

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
James Coleman  writes:
> Fair enough. Unsure if Tomas is still online to comment and/or push,
> but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an
> initial fix.

I'm guessing he went to bed, so I'll push a fix in a moment.
The patch has survived enough test cycles here now to make me
moderately confident that it fixes the issue.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 10:09 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Mon, Apr 6, 2020 at 9:46 PM Tom Lane  wrote:
> >> I think the correct fix is to change the enum declaration.
>
> > Hmm. I don't actually really like that, because it means the value
> > here isn't actually semantically correct. That is, the sort type is
> > not "in progress"; it's "we never started a sort at all".
>
> Well, yeah, but that pre-dated this patch, and right now is no
> time to improve it; we can debate such fine points at more leisure
> once the buildfarm isn't broken.

Fair enough. Unsure if Tomas is still online to comment and/or push,
but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an
initial fix.

> Obviously the comment needs fixed...

The one in show_short_info?

I can work on that (and the other proposed cleanup above) with Tomas
tomorrow or later.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
James Coleman  writes:
> On Mon, Apr 6, 2020 at 9:46 PM Tom Lane  wrote:
>> I think the correct fix is to change the enum declaration.

> Hmm. I don't actually really like that, because it means the value
> here isn't actually semantically correct. That is, the sort type is
> not "in progress"; it's "we never started a sort at all".

Well, yeah, but that pre-dated this patch, and right now is no
time to improve it; we can debate such fine points at more leisure
once the buildfarm isn't broken.

Obviously the comment needs fixed...

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 9:46 PM Tom Lane  wrote:
>
> Tomas Vondra  writes:
> > I don't know, I've tried running the tests on a number of machines,
> > similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked
> > everywhere while the failures seem consistent.
>
> On my machine, it reproduces about one time in six with
> force_parallel_mode = regress.  It seems possible given your
> results that reducing max_parallel_workers would make it more
> likely, but I've not tried that.
>
> What I'm seeing, after adding some debug printouts, is that sortMethod is
> frequently zero when we reach the EXPLAIN output for a worker.  In many of
> the tests this happens even though there is no visible failure, because
> we've got a filter function hiding the output :-(
>
> So I concur with James' conclusion that the existing code is relying on
> sortMethod initializing to zeroes, and that we did the wrong thing by
> trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation.
> I do not like his patch though, particularly not the type pun with NULL.

Sentinel and NULL? I hadn't caught that at all.

> I think the correct fix is to change the enum declaration.

Hmm. I don't actually really like that, because it means the value
here isn't actually semantically correct. That is, the sort type is
not "in progress"; it's "we never started a sort at all". I don't
really love the conflating of those things that the old enum
declaration had (even it'd had a helpful comment). It seems to me that
we should make "we don't have a type" and "we have a type" distinct.

We could add a new enum value SORT_TYPE_UNINITIALIZED or similar though.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> I don't know, I've tried running the tests on a number of machines,
> similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked
> everywhere while the failures seem consistent.

On my machine, it reproduces about one time in six with
force_parallel_mode = regress.  It seems possible given your
results that reducing max_parallel_workers would make it more
likely, but I've not tried that.

What I'm seeing, after adding some debug printouts, is that sortMethod is
frequently zero when we reach the EXPLAIN output for a worker.  In many of
the tests this happens even though there is no visible failure, because
we've got a filter function hiding the output :-(

So I concur with James' conclusion that the existing code is relying on
sortMethod initializing to zeroes, and that we did the wrong thing by
trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation.
I do not like his patch though, particularly not the type pun with NULL.
I think the correct fix is to change the enum declaration.

I know it's darn late where you are, do you want me to change it?

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 08:42:13PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

It doesn't seem to be particularly platform-specific, but I've been
unable to reproduce it so far. It seems on older gcc versions, though.


It's looking kind of like an uninitialized-memory problem.  Note
the latest from spurfowl,

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl=2020-04-07%2000%3A15%3A05

which got through "make check" and then failed during pg_upgrade's
repetition of the test.  Similarly on rhinoceros.  So there's definitely
instability there even on one machine.

Perhaps something to do with unexpected cache flushes??



I don't know, I've tried running the tests on a number of machines,
similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked
everywhere while the failures seem consistent.

I've been able to reproduce these failures (same symptoms) by making
sure the worker (implied by force_parallel_mode=regress) won't start.

  set max_parallel_workers = 0;
  set force_parallel_mode = regress;

triggers exactly those failures for me (at least during make check, I
haven't tried pg_upgrade tests etc.).

So my theory is that we fail to start parallel workers on those
machines. It's not clear to me why would it be limited to some machines
and why would it be correlated to the incremental sort? I don't think
those machines have lower number of parallel workers, no?

But maybe incremental sort allowed using more parallel queries for more
queries, and we simply run out of parallel workers that way?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 07:27:19PM -0400, James Coleman wrote:

On Mon, Apr 6, 2020 at 7:09 PM James Coleman  wrote:


On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >>
> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >>>other BF members).
> >> >>
> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >> >>and they were all identical.  I can confirm that, in my laptop, the
> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >>
> >> >
> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >> >soon. I'll push a fix in a minute.
> >> >
> >>
> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >>
> >> It however seems to me a bit more needs to be done. The fix makes
> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> because IncrementalSortState does not have sort_Done flag so it still
> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> before, but not having that flag seems a bit weird to me.
> >>
> >> It also seems possibly incorrect - we may end up with
> >>
> >>fullsortGroupInfo->groupCount == 0
> >>prefixsortGroupInfo->groupCount > 0
> >>
> >> but we won't print anything.
> >
> >This shouldn't ever be possible, because the only way we get any
> >prefix groups at all is if we've already sorted a full sort group
> >during the mode transition.
> >
> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> and make it work as in plain Sort. Or some comment explaining why
> >> depending on the counts is OK (assuming it is).
> >
> >There's previous email traffic on this thread about that (I can look
> >it up later this evening), but the short of it is that I believe that
> >relying on the group count is actually more correct than a sort_Done
> >flag in the case of incremental sort (in contrast to regular sort).
> >
>
> OK. Maybe we should add a comment to explain.c saying it's OK.
>
> I've pushed a fix for failures due to different planned workers (in the
> test I added to show changes due to add_partial_path tweaks).
>
> It seems we're not out of the woods yet, though. rhinoceros and
> sidewinder failed with something like this:
>
>  Sort Method: quicksort  Memory: NNkB
> +   Sort Method: unknown  Disk: NNkB
>
> Would you mind investigating at it?

I assume that means those build farm members run with very low
work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
these tests? Or is that bad practice and these are to expose issues
with changing into disk sort mode?


On rhinoceros I see:

== pgsql.build/src/test/regress/regression.diffs
===
diff -U3 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
--- 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
2020-03-14 10:37:49.156761104 -0700
+++ 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
2020-04-06 16:01:13.766798059 -0700
@@ -1328,8 +1328,9 @@
 ->  Sort (actual rows=3 loops=1)
   Sort Key: sq_limit.c1, sq_limit.pk
   Sort Method: top-N heapsort  Memory: xxx
+   Sort Method: unknown  Disk: 0kB
   ->  Seq Scan on sq_limit (actual rows=8 loops=1)
-(6 rows)
+(7 rows)

Same on sidewinder.

Given the 0kB I'm not sure this is *just* a work_mem thing, though
that's still something I'm curious to know about, and it's still part
of the "problem" here.

I'm investigating further.



I don't see how could this be caused by low work_mem value, really. What
I think it happening is that when executed with

force_parallel_mode = regress

we run this as if in a parallel worker, but then it gets somehow
confused and either (leader + worker) or two worker stats, or something
like that. I don't know why it's happening, though, or why would it be
triggered by the incremental sort patch ...

Actually, I just managed to trigger exactly this - the trick is that we
plan for certain number of workers, but then fail to start some. So for
example like this:

create table t (a int, b int, c int);

insert into t select mod(i,10), mod(i,10), i
  from generate_series(1,100) s(i);

set force_parallel_mode = regress;

set max_parallel_workers = 0;

explain (costs off, analyze) select * from t order by a,b;

   

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 7:31 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >> >>
> >> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >> >>>other BF members).
> >> >> >>
> >> >> >>FWIW I looked at the eight failures there were about fifteen minutes 
> >> >> >>ago
> >> >> >>and they were all identical.  I can confirm that, in my laptop, the
> >> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >> >>
> >> >> >
> >> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns 
> >> >> >too
> >> >> >soon. I'll push a fix in a minute.
> >> >> >
> >> >>
> >> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >> >>
> >> >> It however seems to me a bit more needs to be done. The fix makes
> >> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> >> because IncrementalSortState does not have sort_Done flag so it still
> >> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> >> before, but not having that flag seems a bit weird to me.
> >> >>
> >> >> It also seems possibly incorrect - we may end up with
> >> >>
> >> >>fullsortGroupInfo->groupCount == 0
> >> >>prefixsortGroupInfo->groupCount > 0
> >> >>
> >> >> but we won't print anything.
> >> >
> >> >This shouldn't ever be possible, because the only way we get any
> >> >prefix groups at all is if we've already sorted a full sort group
> >> >during the mode transition.
> >> >
> >> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> >> and make it work as in plain Sort. Or some comment explaining why
> >> >> depending on the counts is OK (assuming it is).
> >> >
> >> >There's previous email traffic on this thread about that (I can look
> >> >it up later this evening), but the short of it is that I believe that
> >> >relying on the group count is actually more correct than a sort_Done
> >> >flag in the case of incremental sort (in contrast to regular sort).
> >> >
> >>
> >> OK. Maybe we should add a comment to explain.c saying it's OK.
> >>
> >> I've pushed a fix for failures due to different planned workers (in the
> >> test I added to show changes due to add_partial_path tweaks).
> >>
> >> It seems we're not out of the woods yet, though. rhinoceros and
> >> sidewinder failed with something like this:
> >>
> >>  Sort Method: quicksort  Memory: NNkB
> >> +   Sort Method: unknown  Disk: NNkB
> >>
> >> Would you mind investigating at it?
> >
> >I assume that means those build farm members run with very low
> >work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
> >these tests? Or is that bad practice and these are to expose issues
> >with changing into disk sort mode?
> >
>
> I don't think so - I don't see any work_mem changes in the config - see
> the extra_config at the beginning of the page with details:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2020-04-06%2023%3A00%3A16
>
> Moreover, this seems to be in regular Sort, not Incremental Sort and it
> very much seems like it gets confused to print a worker info because the
> only way for Sort to print two "Sort Method" lines seems to be to enter
> either both
>
>if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
>{
>  ... print leader info ...
>}
>
>and
>
>if (sortstate->shared_info != NULL)
>{
>  for (n = 0; n < sortstate->shared_info->num_workers; n++)
>  {
>... print worker info ...
>  }
>}
>
> or maybe there are two workers? It's strange ...
>
>
> It doesn't seem to be particularly platform-specific, but I've been
> unable to reproduce it so far. It seems on older gcc versions, though.


I haven't been able to reproduce it, but I'm 99% confident this will fix it:

-if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS
+|| sinstrument->sortMethod == NULL)
 continue;   /* ignore any unfilled slots */

Earlier we'd had this discussion about why SORT_TYPE_STILL_IN_PROGRESS
was explicitly set to 0 in the enum declaration. Since there was no
comment, we changed that, but here I believe that show_sort_info was
relying on that as an indicator that a worker didn't actually do any
work (since the DSM for the sort node gets set to all zeros, this
would work).

I'm not sure if the 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> It doesn't seem to be particularly platform-specific, but I've been
> unable to reproduce it so far. It seems on older gcc versions, though.

It's looking kind of like an uninitialized-memory problem.  Note
the latest from spurfowl,

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl=2020-04-07%2000%3A15%3A05

which got through "make check" and then failed during pg_upgrade's
repetition of the test.  Similarly on rhinoceros.  So there's definitely
instability there even on one machine.

Perhaps something to do with unexpected cache flushes??

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote:

On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
 wrote:


On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
>On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> wrote:
>>
>> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
>> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
>> >>On 2020-Apr-06, Tom Lane wrote:
>> >>
>> >>>Locally, things pass without force_parallel_mode, but turning it on
>> >>>produces failures that look similar to rhinoceros's (didn't examine
>> >>>other BF members).
>> >>
>> >>FWIW I looked at the eight failures there were about fifteen minutes ago
>> >>and they were all identical.  I can confirm that, in my laptop, the
>> >>tests work without that GUC, and fail in exactly that way with it.
>> >>
>> >
>> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
>> >soon. I'll push a fix in a minute.
>> >
>>
>> OK, I've pushed a fix - this should make the buildfarm happy again.
>>
>> It however seems to me a bit more needs to be done. The fix makes
>> show_incremental_sort_info closer to show_sort_info, but not entirely
>> because IncrementalSortState does not have sort_Done flag so it still
>> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
>> before, but not having that flag seems a bit weird to me.
>>
>> It also seems possibly incorrect - we may end up with
>>
>>fullsortGroupInfo->groupCount == 0
>>prefixsortGroupInfo->groupCount > 0
>>
>> but we won't print anything.
>
>This shouldn't ever be possible, because the only way we get any
>prefix groups at all is if we've already sorted a full sort group
>during the mode transition.
>
>> James, any opinion on this? I'd say we should restore the sort_Done flag
>> and make it work as in plain Sort. Or some comment explaining why
>> depending on the counts is OK (assuming it is).
>
>There's previous email traffic on this thread about that (I can look
>it up later this evening), but the short of it is that I believe that
>relying on the group count is actually more correct than a sort_Done
>flag in the case of incremental sort (in contrast to regular sort).
>

OK. Maybe we should add a comment to explain.c saying it's OK.

I've pushed a fix for failures due to different planned workers (in the
test I added to show changes due to add_partial_path tweaks).

It seems we're not out of the woods yet, though. rhinoceros and
sidewinder failed with something like this:

 Sort Method: quicksort  Memory: NNkB
+   Sort Method: unknown  Disk: NNkB

Would you mind investigating at it?


I assume that means those build farm members run with very low
work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
these tests? Or is that bad practice and these are to expose issues
with changing into disk sort mode?



I don't think so - I don't see any work_mem changes in the config - see
the extra_config at the beginning of the page with details:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2020-04-06%2023%3A00%3A16

Moreover, this seems to be in regular Sort, not Incremental Sort and it
very much seems like it gets confused to print a worker info because the
only way for Sort to print two "Sort Method" lines seems to be to enter
either both

  if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
  {
... print leader info ...
  }

  and

  if (sortstate->shared_info != NULL)
  {
for (n = 0; n < sortstate->shared_info->num_workers; n++)
{
  ... print worker info ...
}
  }

or maybe there are two workers? It's strange ...


It doesn't seem to be particularly platform-specific, but I've been
unable to reproduce it so far. It seems on older gcc versions, though.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 7:09 PM James Coleman  wrote:
>
> On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
>  wrote:
> >
> > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> > > wrote:
> > >>
> > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> > >> >>On 2020-Apr-06, Tom Lane wrote:
> > >> >>
> > >> >>>Locally, things pass without force_parallel_mode, but turning it on
> > >> >>>produces failures that look similar to rhinoceros's (didn't examine
> > >> >>>other BF members).
> > >> >>
> > >> >>FWIW I looked at the eight failures there were about fifteen minutes 
> > >> >>ago
> > >> >>and they were all identical.  I can confirm that, in my laptop, the
> > >> >>tests work without that GUC, and fail in exactly that way with it.
> > >> >>
> > >> >
> > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> > >> >soon. I'll push a fix in a minute.
> > >> >
> > >>
> > >> OK, I've pushed a fix - this should make the buildfarm happy again.
> > >>
> > >> It however seems to me a bit more needs to be done. The fix makes
> > >> show_incremental_sort_info closer to show_sort_info, but not entirely
> > >> because IncrementalSortState does not have sort_Done flag so it still
> > >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> > >> before, but not having that flag seems a bit weird to me.
> > >>
> > >> It also seems possibly incorrect - we may end up with
> > >>
> > >>fullsortGroupInfo->groupCount == 0
> > >>prefixsortGroupInfo->groupCount > 0
> > >>
> > >> but we won't print anything.
> > >
> > >This shouldn't ever be possible, because the only way we get any
> > >prefix groups at all is if we've already sorted a full sort group
> > >during the mode transition.
> > >
> > >> James, any opinion on this? I'd say we should restore the sort_Done flag
> > >> and make it work as in plain Sort. Or some comment explaining why
> > >> depending on the counts is OK (assuming it is).
> > >
> > >There's previous email traffic on this thread about that (I can look
> > >it up later this evening), but the short of it is that I believe that
> > >relying on the group count is actually more correct than a sort_Done
> > >flag in the case of incremental sort (in contrast to regular sort).
> > >
> >
> > OK. Maybe we should add a comment to explain.c saying it's OK.
> >
> > I've pushed a fix for failures due to different planned workers (in the
> > test I added to show changes due to add_partial_path tweaks).
> >
> > It seems we're not out of the woods yet, though. rhinoceros and
> > sidewinder failed with something like this:
> >
> >  Sort Method: quicksort  Memory: NNkB
> > +   Sort Method: unknown  Disk: NNkB
> >
> > Would you mind investigating at it?
>
> I assume that means those build farm members run with very low
> work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
> these tests? Or is that bad practice and these are to expose issues
> with changing into disk sort mode?

On rhinoceros I see:

== pgsql.build/src/test/regress/regression.diffs
===
diff -U3 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
--- 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
2020-03-14 10:37:49.156761104 -0700
+++ 
/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
2020-04-06 16:01:13.766798059 -0700
@@ -1328,8 +1328,9 @@
  ->  Sort (actual rows=3 loops=1)
Sort Key: sq_limit.c1, sq_limit.pk
Sort Method: top-N heapsort  Memory: xxx
+   Sort Method: unknown  Disk: 0kB
->  Seq Scan on sq_limit (actual rows=8 loops=1)
-(6 rows)
+(7 rows)

Same on sidewinder.

Given the 0kB I'm not sure this is *just* a work_mem thing, though
that's still something I'm curious to know about, and it's still part
of the "problem" here.

I'm investigating further.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >>
> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >>>other BF members).
> >> >>
> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >> >>and they were all identical.  I can confirm that, in my laptop, the
> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >>
> >> >
> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >> >soon. I'll push a fix in a minute.
> >> >
> >>
> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >>
> >> It however seems to me a bit more needs to be done. The fix makes
> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> because IncrementalSortState does not have sort_Done flag so it still
> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> before, but not having that flag seems a bit weird to me.
> >>
> >> It also seems possibly incorrect - we may end up with
> >>
> >>fullsortGroupInfo->groupCount == 0
> >>prefixsortGroupInfo->groupCount > 0
> >>
> >> but we won't print anything.
> >
> >This shouldn't ever be possible, because the only way we get any
> >prefix groups at all is if we've already sorted a full sort group
> >during the mode transition.
> >
> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> and make it work as in plain Sort. Or some comment explaining why
> >> depending on the counts is OK (assuming it is).
> >
> >There's previous email traffic on this thread about that (I can look
> >it up later this evening), but the short of it is that I believe that
> >relying on the group count is actually more correct than a sort_Done
> >flag in the case of incremental sort (in contrast to regular sort).
> >
>
> OK. Maybe we should add a comment to explain.c saying it's OK.
>
> I've pushed a fix for failures due to different planned workers (in the
> test I added to show changes due to add_partial_path tweaks).
>
> It seems we're not out of the woods yet, though. rhinoceros and
> sidewinder failed with something like this:
>
>  Sort Method: quicksort  Memory: NNkB
> +   Sort Method: unknown  Disk: NNkB
>
> Would you mind investigating at it?

I assume that means those build farm members run with very low
work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
these tests? Or is that bad practice and these are to expose issues
with changing into disk sort mode?

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 06:34:04PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote:

Well, it's *less* unhappy.  thorntail is showing that the number of
workers field is not stable; that will need to be masked.



Yeah, I've already pushed a fix for that. But there seems to be another
failure in th explain output. Looking.


I'm kind of unimpressed with that fix, because it guarantees that if
there's any problem with more than 2 workers, this test will never
find it.  I think you should do what I said and arrange to replace
the number-of-workers output with "N".



I can do that, but isn't this how every other regression test does it?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote:
>> Well, it's *less* unhappy.  thorntail is showing that the number of
>> workers field is not stable; that will need to be masked.

> Yeah, I've already pushed a fix for that. But there seems to be another
> failure in th explain output. Looking.

I'm kind of unimpressed with that fix, because it guarantees that if
there's any problem with more than 2 workers, this test will never
find it.  I think you should do what I said and arrange to replace
the number-of-workers output with "N".

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

OK, I've pushed a fix - this should make the buildfarm happy again.


Well, it's *less* unhappy.  thorntail is showing that the number of
workers field is not stable; that will need to be masked.



Yeah, I've already pushed a fix for that. But there seems to be another
failure in th explain output. Looking.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:

On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
 wrote:


On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
>On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
>>On 2020-Apr-06, Tom Lane wrote:
>>
>>>Locally, things pass without force_parallel_mode, but turning it on
>>>produces failures that look similar to rhinoceros's (didn't examine
>>>other BF members).
>>
>>FWIW I looked at the eight failures there were about fifteen minutes ago
>>and they were all identical.  I can confirm that, in my laptop, the
>>tests work without that GUC, and fail in exactly that way with it.
>>
>
>Yes, there's a thinko in show_incremental_sort_info() and it returns too
>soon. I'll push a fix in a minute.
>

OK, I've pushed a fix - this should make the buildfarm happy again.

It however seems to me a bit more needs to be done. The fix makes
show_incremental_sort_info closer to show_sort_info, but not entirely
because IncrementalSortState does not have sort_Done flag so it still
depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
before, but not having that flag seems a bit weird to me.

It also seems possibly incorrect - we may end up with

   fullsortGroupInfo->groupCount == 0
   prefixsortGroupInfo->groupCount > 0

but we won't print anything.


This shouldn't ever be possible, because the only way we get any
prefix groups at all is if we've already sorted a full sort group
during the mode transition.


James, any opinion on this? I'd say we should restore the sort_Done flag
and make it work as in plain Sort. Or some comment explaining why
depending on the counts is OK (assuming it is).


There's previous email traffic on this thread about that (I can look
it up later this evening), but the short of it is that I believe that
relying on the group count is actually more correct than a sort_Done
flag in the case of incremental sort (in contrast to regular sort).



OK. Maybe we should add a comment to explain.c saying it's OK.

I've pushed a fix for failures due to different planned workers (in the
test I added to show changes due to add_partial_path tweaks).

It seems we're not out of the woods yet, though. rhinoceros and
sidewinder failed with something like this:

Sort Method: quicksort  Memory: NNkB
+   Sort Method: unknown  Disk: NNkB

Would you mind investigating at it?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> OK, I've pushed a fix - this should make the buildfarm happy again.

Well, it's *less* unhappy.  thorntail is showing that the number of
workers field is not stable; that will need to be masked.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >>On 2020-Apr-06, Tom Lane wrote:
> >>
> >>>Locally, things pass without force_parallel_mode, but turning it on
> >>>produces failures that look similar to rhinoceros's (didn't examine
> >>>other BF members).
> >>
> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >>and they were all identical.  I can confirm that, in my laptop, the
> >>tests work without that GUC, and fail in exactly that way with it.
> >>
> >
> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >soon. I'll push a fix in a minute.
> >
>
> OK, I've pushed a fix - this should make the buildfarm happy again.
>
> It however seems to me a bit more needs to be done. The fix makes
> show_incremental_sort_info closer to show_sort_info, but not entirely
> because IncrementalSortState does not have sort_Done flag so it still
> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> before, but not having that flag seems a bit weird to me.
>
> It also seems possibly incorrect - we may end up with
>
>fullsortGroupInfo->groupCount == 0
>prefixsortGroupInfo->groupCount > 0
>
> but we won't print anything.

This shouldn't ever be possible, because the only way we get any
prefix groups at all is if we've already sorted a full sort group
during the mode transition.

> James, any opinion on this? I'd say we should restore the sort_Done flag
> and make it work as in plain Sort. Or some comment explaining why
> depending on the counts is OK (assuming it is).

There's previous email traffic on this thread about that (I can look
it up later this evening), but the short of it is that I believe that
relying on the group count is actually more correct than a sort_Done
flag in the case of incremental sort (in contrast to regular sort).

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:

On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:

On 2020-Apr-06, Tom Lane wrote:


Locally, things pass without force_parallel_mode, but turning it on
produces failures that look similar to rhinoceros's (didn't examine
other BF members).


FWIW I looked at the eight failures there were about fifteen minutes ago
and they were all identical.  I can confirm that, in my laptop, the
tests work without that GUC, and fail in exactly that way with it.



Yes, there's a thinko in show_incremental_sort_info() and it returns too
soon. I'll push a fix in a minute.



OK, I've pushed a fix - this should make the buildfarm happy again.

It however seems to me a bit more needs to be done. The fix makes
show_incremental_sort_info closer to show_sort_info, but not entirely
because IncrementalSortState does not have sort_Done flag so it still
depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
before, but not having that flag seems a bit weird to me.

It also seems possibly incorrect - we may end up with

  fullsortGroupInfo->groupCount == 0
  prefixsortGroupInfo->groupCount > 0

but we won't print anything.

James, any opinion on this? I'd say we should restore the sort_Done flag
and make it work as in plain Sort. Or some comment explaining why
depending on the counts is OK (assuming it is).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 5:22 PM James Coleman  wrote:
>
> On Mon, Apr 6, 2020 at 5:20 PM James Coleman  wrote:
> >
> > On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra
> >  wrote:
> > >
> > > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> > > >On 2020-Apr-06, Tom Lane wrote:
> > > >
> > > >> Locally, things pass without force_parallel_mode, but turning it on
> > > >> produces failures that look similar to rhinoceros's (didn't examine
> > > >> other BF members).
> > > >
> > > >FWIW I looked at the eight failures there were about fifteen minutes ago
> > > >and they were all identical.  I can confirm that, in my laptop, the
> > > >tests work without that GUC, and fail in exactly that way with it.
> > > >
> > >
> > > Yes, there's a thinko in show_incremental_sort_info() and it returns too
> > > soon. I'll push a fix in a minute.
> >
> > I'm stepping through this in a debugger; is what you're considering
> > that the for loop through the workers is off by one?
>
> Oh, nevermind, misread that.
>
> Looks like if the leader doesn't participate, then we don't show
> details for workers.
>
> Tomas: Do you already have a patch? If not, I can work one up.

Well, already have it, so I'll send it just in case.

James
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62c86ecdc5..c31f3e0987 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2880,19 +2880,22 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate,
 
 	fullsortGroupInfo = >incsort_info.fullsortGroupInfo;
 
-	if (!(es->analyze && fullsortGroupInfo->groupCount > 0))
+	if (!es->analyze)
 		return;
 
-	show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es);
-	prefixsortGroupInfo = >incsort_info.prefixsortGroupInfo;
-	if (prefixsortGroupInfo->groupCount > 0)
+	if (fullsortGroupInfo->groupCount > 0)
 	{
+		show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es);
+		prefixsortGroupInfo = >incsort_info.prefixsortGroupInfo;
+		if (prefixsortGroupInfo->groupCount > 0)
+		{
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+appendStringInfo(es->str, " ");
+			show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
+		}
 		if (es->format == EXPLAIN_FORMAT_TEXT)
-			appendStringInfo(es->str, " ");
-		show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es);
+			appendStringInfo(es->str, "\n");
 	}
-	if (es->format == EXPLAIN_FORMAT_TEXT)
-		appendStringInfo(es->str, "\n");
 
 	if (incrsortstate->shared_info != NULL)
 	{
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 3b359efa29..c46443358a 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -1,3 +1,5 @@
+set force_parallel_mode = regress;
+
 -- When we have to sort the entire table, incremental sort will
 -- be slower than plain sort, so it should not be used.
 explain (costs off)


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 5:20 PM James Coleman  wrote:
>
> On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra
>  wrote:
> >
> > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> > >On 2020-Apr-06, Tom Lane wrote:
> > >
> > >> Locally, things pass without force_parallel_mode, but turning it on
> > >> produces failures that look similar to rhinoceros's (didn't examine
> > >> other BF members).
> > >
> > >FWIW I looked at the eight failures there were about fifteen minutes ago
> > >and they were all identical.  I can confirm that, in my laptop, the
> > >tests work without that GUC, and fail in exactly that way with it.
> > >
> >
> > Yes, there's a thinko in show_incremental_sort_info() and it returns too
> > soon. I'll push a fix in a minute.
>
> I'm stepping through this in a debugger; is what you're considering
> that the for loop through the workers is off by one?

Oh, nevermind, misread that.

Looks like if the leader doesn't participate, then we don't show
details for workers.

Tomas: Do you already have a patch? If not, I can work one up.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread James Coleman
On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra
 wrote:
>
> On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >On 2020-Apr-06, Tom Lane wrote:
> >
> >> Locally, things pass without force_parallel_mode, but turning it on
> >> produces failures that look similar to rhinoceros's (didn't examine
> >> other BF members).
> >
> >FWIW I looked at the eight failures there were about fifteen minutes ago
> >and they were all identical.  I can confirm that, in my laptop, the
> >tests work without that GUC, and fail in exactly that way with it.
> >
>
> Yes, there's a thinko in show_incremental_sort_info() and it returns too
> soon. I'll push a fix in a minute.

I'm stepping through this in a debugger; is what you're considering
that the for loop through the workers is off by one?

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:

On 2020-Apr-06, Tom Lane wrote:


Locally, things pass without force_parallel_mode, but turning it on
produces failures that look similar to rhinoceros's (didn't examine
other BF members).


FWIW I looked at the eight failures there were about fifteen minutes ago
and they were all identical.  I can confirm that, in my laptop, the
tests work without that GUC, and fail in exactly that way with it.



Yes, there's a thinko in show_incremental_sort_info() and it returns too
soon. I'll push a fix in a minute.


thanks

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Alvaro Herrera
On 2020-Apr-06, Tom Lane wrote:

> Locally, things pass without force_parallel_mode, but turning it on
> produces failures that look similar to rhinoceros's (didn't examine
> other BF members).

FWIW I looked at the eight failures there were about fifteen minutes ago
and they were all identical.  I can confirm that, in my laptop, the
tests work without that GUC, and fail in exactly that way with it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> On Mon, Apr 06, 2020 at 04:14:38PM -0400, Tom Lane wrote:
>> Did you ever use force_parallel_mode = regress?

> Ah, not sure - probably not in this round of tests and there were some
> changes in the explain code. Thanks for the hint.

Locally, things pass without force_parallel_mode, but turning it on
produces failures that look similar to rhinoceros's (didn't examine
other BF members).  At a guess, looks like missing or incorrect logic
for propagating some state back from parallel workers.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 04:14:38PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

Hmmm, I see the buildfarm is not happy about it - a couple of animals
failed, but some succeeded. The failure seems like a simple difference
in explain output, but it's not clear why would it happen (and I've
ran the tests many times but never seen this failure).


Did you ever use force_parallel_mode = regress?



Ah, not sure - probably not in this round of tests and there were some
changes in the explain code. Thanks for the hint.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tom Lane
Tomas Vondra  writes:
> Hmmm, I see the buildfarm is not happy about it - a couple of animals
> failed, but some succeeded. The failure seems like a simple difference
> in explain output, but it's not clear why would it happen (and I've
> ran the tests many times but never seen this failure).

Did you ever use force_parallel_mode = regress?

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-06 Thread Tomas Vondra

On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote:

Hi,

I've pushed the fist part of this patch series - I've reorganized it a
bit by moving the add_partial_path changes to the end. That way I've
been able to add regression test demonstrating impact of the change on
plans involving incremental sort nodes (which wouldn't be possible when
committing the add_partial_path first). I'll wait a bit before pushing
the two additional parts, so that if something fails we know which bit
caused it.



Hmmm, I see the buildfarm is not happy about it - a couple of animals
failed, but some succeeded. The failure seems like a simple difference
in explain output, but it's not clear why would it happen (and I've
ran the tests many times but never seen this failure).

Investigating.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-05 Thread Tomas Vondra

On Sun, Apr 05, 2020 at 03:01:10PM +0200, Tomas Vondra wrote:

On Thu, Apr 02, 2020 at 09:40:45PM -0400, James Coleman wrote:

On Thu, Apr 2, 2020 at 8:46 PM James Coleman  wrote:


On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
 wrote:

...
5) Overall, I think the costing is OK. I'm sure we'll find cases that
will need improvements, but that's fine. However, we now have

- cost_tuplesort (used to be cost_sort)
- cost_full_sort
- cost_incremental_sort
- cost_sort

I find it a bit confusing that we have cost_sort and cost_full_sort. Why
don't we just keep using the dummy path in label_sort_with_costsize?
That seems to be the only external caller outside costsize.c. Then we
could either make cost_full_sort static or get rid of it entirely.


This another area of the patch I haven't really modified.


See attached for a cleanup of this; it removed cost_fullsort so
label_sort_with_costsize is back to how it was.

I've directly merged this into the patch series; if you'd like to see
the diff I can send that along.



Thanks. Attached is v54 of the patch, with some minor changes. The main
two changes are in add_partial_path_precheck(), firstly to also consider
startup_cost, as discussed before. The second change (in 0003) is a bit
of an experiment to make add_partial_precheck() cheaper by only calling
compare_pathkeys after checking the costs first (which should be cheaper
than the function call). add_path_precheck already does it in that order
anyway.



Oh, I forgot to mention a change in add_partial_path - I've removed the
reference/dependency on enable_incrementalsort. It seemed rather ugly,
and the results without it seem fine (I'm benchmarking only the case
with incremental sort enabled anyway). I also plan to look at the other
optimization we bolted on last week, i.e. checking length of pathkeys.
I'll see if that actually makes measurable difference.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-02 Thread James Coleman
On Thu, Apr 2, 2020 at 8:46 PM James Coleman  wrote:
>
> On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > Thanks, the v52 looks almost ready. I've been looking at the two or
> > three things I mentioned, and I have a couple of comments.
> >
> >
> > 1) /* XXX comparison_cost shouldn't be 0? */
> >
> > I'm not worried about this, because this is not really intriduced by
> > this patch - create_sort_path has the same comment/issue, so I think
> > it's acceptable to do the same thing for incremental sort.
>
> Sounds good.
>
> > 2) INITIAL_MEMTUPSIZE
> >
> > tuplesort.c does this:
> >
> >#define INITIAL_MEMTUPSIZE Max(1024, \
> >ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
> >
> > supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
> > But I think it fails to do that, for a couple of reasons.
> >
> > Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
> > minimum (without padding), so with 1024 elements it's guaranteed to be
> > at least 21kB - so exceeding the threshold. The maximum value is
> > something like 256.
> >
> > Secondly, the second part of the formula is guaranteed to get us over
> > the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then
> >
> >ALLOCSET_SEPARATE_THRESHOLD / 30 = 273
> >
> > but we end up using 274, resulting in 8220B array. :-(
> >
> > So I guess the formula should be more like
> >
> >Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))
> >
> > or something like that.
> >
> > FWIW I think the whole hypothesis that selecting the array size below
> > ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
> > allocate this only once (or very few times), and if we need to grow the
> > array we'll hit the threshold anyway.
> >
> > I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
> > may be OK too.
>
> That was a part of the patch I haven't touched since I inherited it,
> and I didn't feel like I knew enough about the Postgres memory
> management to make a determination on whether the reasoning made
> sense.
>
> So I' happy to use a constant as suggested.

I take that back. That code hasn't changed, it's just moved. Here's
the current code in tuplesort_begin_common on master:

/*
* Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
* see comments in grow_memtuples().
*/
state->memtupsize = Max(1024,
ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1);

I'm not sure we ought to change that in this patch...

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-02 Thread James Coleman
On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
 wrote:
>
> Hi,
>
> Thanks, the v52 looks almost ready. I've been looking at the two or
> three things I mentioned, and I have a couple of comments.
>
>
> 1) /* XXX comparison_cost shouldn't be 0? */
>
> I'm not worried about this, because this is not really intriduced by
> this patch - create_sort_path has the same comment/issue, so I think
> it's acceptable to do the same thing for incremental sort.

Sounds good.

> 2) INITIAL_MEMTUPSIZE
>
> tuplesort.c does this:
>
>#define INITIAL_MEMTUPSIZE Max(1024, \
>ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)
>
> supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
> But I think it fails to do that, for a couple of reasons.
>
> Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
> minimum (without padding), so with 1024 elements it's guaranteed to be
> at least 21kB - so exceeding the threshold. The maximum value is
> something like 256.
>
> Secondly, the second part of the formula is guaranteed to get us over
> the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then
>
>ALLOCSET_SEPARATE_THRESHOLD / 30 = 273
>
> but we end up using 274, resulting in 8220B array. :-(
>
> So I guess the formula should be more like
>
>Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))
>
> or something like that.
>
> FWIW I think the whole hypothesis that selecting the array size below
> ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
> allocate this only once (or very few times), and if we need to grow the
> array we'll hit the threshold anyway.
>
> I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
> may be OK too.

That was a part of the patch I haven't touched since I inherited it,
and I didn't feel like I knew enough about the Postgres memory
management to make a determination on whether the reasoning made
sense.

So I' happy to use a constant as suggested.

> 3) add_partial_path
>
> I'm a bit torn about using enable_incrementalsort in add_partial_path. I
> know we've done that to eliminate the overhead, but it just seems weird.
> I wonder if that really saves us anything or if it was just noise. I'll
> do more testing before making a decision.
>
> While looking at add_path, I see the comparisons there are made in the
> opposite order - we first compare costs, and only then pathkeys. That
> seems like a more efficient way, I guess (cheaper float comparison
> first, pathkey comparison with a loop second). I wonder why it's not
> done like that in add_partial_path too? Perhaps this will make it cheap
> enough to remove the enable_incrementalsort check.

I would love to avoid checking enable_incrementalsort there. I think
it's pretty gross to do so. But if it's the only way to keep the patch
acceptable, then obviously I'd leave it in. So I'm hopeful we can
avoid needing it.

> 4) add_partial_path_precheck
>
> While looking at add_partial_path, I realized add_partial_path_precheck
> probably needs the same change - to consider startup cost too. I think
> the expectation is that add_partial_path_precheck only rejects paths
> that we know would then get rejected by add_partial_path.
>
> But now the behavior is inconsistent, which might lead to surprising
> behavior (I haven't seen such cases, but I haven't looked for them).
> At the moment the add_partial_path_precheck is called only from
> joinpath.c, maybe it's not an issue there.
>
> If it's OK to keep it like this, it'd probably deserve a comment why
> the difference is OK. And the comments also contain the same claim that
> parallel plans only need to look at total cost.

I remember some discussion about that much earlier in this thread (or
in the spin-off thread [1] re: that specific change that didn't get a
lot of action), and I'm pretty sure the conclusion was that we should
change both. But I guess we never got around to doing it.

> 5) Overall, I think the costing is OK. I'm sure we'll find cases that
> will need improvements, but that's fine. However, we now have
>
> - cost_tuplesort (used to be cost_sort)
> - cost_full_sort
> - cost_incremental_sort
> - cost_sort
>
> I find it a bit confusing that we have cost_sort and cost_full_sort. Why
> don't we just keep using the dummy path in label_sort_with_costsize?
> That seems to be the only external caller outside costsize.c. Then we
> could either make cost_full_sort static or get rid of it entirely.

This another area of the patch I haven't really modified.

James

[1]: 
https://www.postgresql.org/message-id/flat/CAAaqYe-5HmM4ih6FWp2RNV9rruunfrFrLhqFXF_nrrNCPy1Zhg%40mail.gmail.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-02 Thread Tomas Vondra

Hi,

Thanks, the v52 looks almost ready. I've been looking at the two or
three things I mentioned, and I have a couple of comments.


1) /* XXX comparison_cost shouldn't be 0? */

I'm not worried about this, because this is not really intriduced by
this patch - create_sort_path has the same comment/issue, so I think
it's acceptable to do the same thing for incremental sort.


2) INITIAL_MEMTUPSIZE

tuplesort.c does this:

  #define INITIAL_MEMTUPSIZE Max(1024, \
  ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1)

supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD.
But I think it fails to do that, for a couple of reasons.

Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at
minimum (without padding), so with 1024 elements it's guaranteed to be
at least 21kB - so exceeding the threshold. The maximum value is
something like 256.

Secondly, the second part of the formula is guaranteed to get us over
the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then

  ALLOCSET_SEPARATE_THRESHOLD / 30 = 273

but we end up using 274, resulting in 8220B array. :-(

So I guess the formula should be more like

  Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple))

or something like that.

FWIW I think the whole hypothesis that selecting the array size below
ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we
allocate this only once (or very few times), and if we need to grow the
array we'll hit the threshold anyway.

I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024
may be OK too.


3) add_partial_path

I'm a bit torn about using enable_incrementalsort in add_partial_path. I
know we've done that to eliminate the overhead, but it just seems weird.
I wonder if that really saves us anything or if it was just noise. I'll
do more testing before making a decision.

While looking at add_path, I see the comparisons there are made in the
opposite order - we first compare costs, and only then pathkeys. That
seems like a more efficient way, I guess (cheaper float comparison
first, pathkey comparison with a loop second). I wonder why it's not
done like that in add_partial_path too? Perhaps this will make it cheap
enough to remove the enable_incrementalsort check.


4) add_partial_path_precheck

While looking at add_partial_path, I realized add_partial_path_precheck
probably needs the same change - to consider startup cost too. I think
the expectation is that add_partial_path_precheck only rejects paths
that we know would then get rejected by add_partial_path.

But now the behavior is inconsistent, which might lead to surprising
behavior (I haven't seen such cases, but I haven't looked for them).
At the moment the add_partial_path_precheck is called only from
joinpath.c, maybe it's not an issue there.

If it's OK to keep it like this, it'd probably deserve a comment why
the difference is OK. And the comments also contain the same claim that
parallel plans only need to look at total cost.


5) Overall, I think the costing is OK. I'm sure we'll find cases that
will need improvements, but that's fine. However, we now have 


- cost_tuplesort (used to be cost_sort)
- cost_full_sort
- cost_incremental_sort
- cost_sort

I find it a bit confusing that we have cost_sort and cost_full_sort. Why
don't we just keep using the dummy path in label_sort_with_costsize?
That seems to be the only external caller outside costsize.c. Then we
could either make cost_full_sort static or get rid of it entirely.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-01 Thread Tomas Vondra

On Wed, Apr 01, 2020 at 10:09:20PM -0400, James Coleman wrote:

On Wed, Apr 1, 2020 at 5:42 PM Tomas Vondra
 wrote:

...
I've realized the way get_useful_pathkeys_for_relation() is coded kinda
works against the fastpath we added when comparing pathkeys. That
depends on comparing pointers to the list, but we've been building new
lists (and then returned those) which defeats the optimization. Attached
is a patch that returns the original list in most cases (and only
creates a copy when really necessary). This might also save a few cycles
on bulding the new list, of course.

I've done a bunch of read-only pgbench tests with fairly small scales (1
and 10). First with the built-in read-only transaction, and also with a
simple custom query doing an order-by. And I did this both on the
default schema and with a bunch of extra indexes. The script I used to
run this is attached, along with a summary of results.

There are results for master and v40 and v50 patches (the v50 also
includes the extra patch fixing get_useful_pathkeys_for_relation).

Overall, I'm happy with those results - the v50 seems to be within 1% of
master, in both directions. This very much seems like a noise.

I still want to do a bit more review of the costing / tuplesort changes,
which I plan to do tomorrow. If that goes well, I plan to start
committing this. So please if you think this is not ready or wants more


I think we need to either implement this or remove the comment:
* XXX I wonder if we need to consider adding a projection here, as
* create_ordered_paths does.
in generate_useful_gather_paths().



Yeah. I think we don't need the projection here. My reasoning is that if
we don't need it in generate_gather_paths(), we don't need it here.


In the same function we have the following code:
/*
* When the partial path is already sorted, we can just add a gather
* merge on top, and we're done - no point in adding explicit sort.
*
* XXX Can't we skip this (maybe only for the cheapest partial path)
* when the path is already sorted? Then it's likely duplicate with
* the path created by generate_gather_paths.
*/
if (is_sorted)
{
   path = create_gather_merge_path(root, rel, subpath, rel->reltarget,

subpath->pathkeys, NULL, rowsp);

   add_path(rel, >path);
   continue;
}

looking at the relevant loop in generate_gather_paths:
/*
* For each useful ordering, we can consider an order-preserving Gather
* Merge.
*/
foreach(lc, rel->partial_pathlist)
{
   Path   *subpath = (Path *) lfirst(lc);
   GatherMergePath *path;

   if (subpath->pathkeys == NIL)
   continue;

   rows = subpath->rows * subpath->parallel_workers;
   path = create_gather_merge_path(root, rel, subpath, rel->reltarget,

subpath->pathkeys, NULL, rowsp);
   add_path(rel, >path);
}

I believe we can eliminate the block entirely in
generate_useful_gather_paths(). Here's my reasoning: all paths for
which is_sorted is true must necessarily have pathkeys, and since we
already add a gather merge for every subpath with pathkeys, we've
already added gather merge paths for all of these.

I've included a patch to change this, but let me know if the reasoning
isn't sound.



Good catch! I think you're correct - we don't need to generate this
path, and we can just skip that partial path entirely.


We can also remove the XXX on this comment (in the same function):
* XXX This is not redundant with the gather merge path created in
* generate_gather_paths, because that merely preserves ordering of
* the cheapest partial path, while here we add an explicit sort to
* get match the useful ordering.

because of this code in generate_gather_paths():
cheapest_partial_path = linitial(rel->partial_pathlist);
rows =
   cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
simple_gather_path = (Path *)
   create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  NULL, rowsp);
add_path(rel, simple_gather_path);

but we can cleanup the comment a bit: fix the grammar issue in the
last line and fix the reference to gather merge path (it's a gather
path).

I've included that in the same patch.



OK, makes sense.


I also noticed that in create_incremental_sort_path we have this:
/* XXX comparison_cost shouldn't be 0? */
but I guess that's part of what you're reviewing tomorrow.



Right, one of the bits.


time for a review, let me know. I'm not yet sure if I'll commit this as
a single change, or in three separate commits.


I don't love the idea of committing it as a single patch, but at least
the first two I think probably go together. Otherwise we're
introducing a "fix" with no proven impact that will slow down planning
(even if only in a small way) only to intend to condition that on a
GUC in the next commit.

But I think you could potentially make an argument for keeping the
additional paths separate...but it's not absolutely necessary IMO.



OK. I've been actually 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-01 Thread Tomas Vondra

On Wed, Apr 01, 2020 at 09:05:27AM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 11:07 PM James Coleman  wrote:


On Tue, Mar 31, 2020 at 10:44 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
> >> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> >> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> >> >> > wrote:

...

> >> >> >One small idea, but I'm not yet sure it helps us a whole lot: if the
> >> >> >query pathkeys is only length 1, then we could skip the additional
> >> >> >path creation.
> >> >> >
> >> >>
> >> >> I don't follow. Why would we create incremental sort in this case at
> >> >> all? With single-element query_pathkeys the path is either unsorted or
> >> >> fully sorted - there's no room for incremental sort. No?
> >> >
> >> >Well, we shouldn't, that's what I'm getting. But I didn't see anything
> >> >in the code now that explicitly excludes that case when decided
> >> >whether or not to create an incremental sort path, unless I'm missing
> >> >something obvious.
> >>
> >> Well, my point is that create_ordered_paths() looks like this:
> >>
> >>  is_sorted = pathkeys_common_contained_in(root->sort_patkeys, ...);
> >>
> >>  if (is_sorted)
> >>  {
> >>  ... old code
> >>  }
> >>  else
> >>  {
> >>  if (input_path == cheapest_input_path)
> >>  {
> >>  ... old code
> >>  }
> >>
> >>  /* With incremental sort disabled, don't build those paths. */
> >>  if (!enable_incrementalsort)
> >>  continue;
> >>
> >>  /* Likewise, if the path can't be used for incremental sort. */
> >>  if (!presorted_keys)
> >>  continue;
> >>
> >>  ... incremental sort path
> >>  }
> >>
> >> Now, with single-item sort_pathkeys, the input path can't be partially
> >> sorted. It's either fully sorted - in which case it's handled by the
> >> first branch. Or it's not sorted at all, so presorted_keys==0 and we
> >> never get to the incremental path.
> >>
> >> Or did you mean to use the optimization somewhere else?
> >
> >Hmm, yes, I didn't think through that properly. I'll have to look at
> >the other cases to confirm the same logic applies there.


I looked through this more carefully, and I did end up finding a few
places where we can skip iterating through a list of paths entirely
with this check, so I added it there. I also cleaned up some comments,
added comments and asserts to the other places where
list_length(pathkeys) should be guaranteed to be > 1, removed a few
asserts I found unnecessary, and merged duplicative
pathkeys_[count_]_contained_in calls.



OK


> >One other thing:in the code above we create the regular sort path
> >inside of `if (input_path == cheapest_input_path)`, but incremental
> >sort is outside of that condition. I'm not sure I'm remembering why
> >that was, and it's not obvious to me reading it right now (though it's
> >getting late here, so maybe I'm just not thinking clearly). Do you
> >happen to remember why that is?
> >
>
> It's because for the regular sort, the path is either already sorted or
> it requires a full sort. But full sort only makes sense on the cheapest
> path, because we assume the additional sort cost is independent of the
> input cost, essentially
>
> cost(path + Sort) = cost(path) + cost(Sort)
>
> and it's always
>
>  cost(path) + cost(Sort) >= cost(cheapest path) + cost(Sort)
>
> and by checking for cheapest path we simply skip building all the paths
> that we'd end up discarding anyway.
>
> With incremental sort we can't do this, the cost of the incremental sort
> depends on how well presorted is the input path.


Thanks for the explanation. I've added a comment to that effect.



Thanks.

I've realized the way get_useful_pathkeys_for_relation() is coded kinda
works against the fastpath we added when comparing pathkeys. That
depends on comparing pointers to the list, but we've been building new
lists (and then returned those) which defeats the optimization. Attached
is a patch that returns the original list in most cases (and only
creates a copy when really necessary). This might also save a few cycles
on bulding the new list, of course.

I've done a bunch of read-only pgbench tests with fairly small scales (1
and 10). First with the built-in read-only transaction, and also with a
simple custom query doing an order-by. And I did this both on the
default schema and with a bunch of extra indexes. The script I used to
run this is attached, along with a summary of results.

There are results for master and v40 and v50 patches (the v50 also
includes the extra patch fixing get_useful_pathkeys_for_relation).

Overall, I'm happy with those results - the v50 seems to be 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> wrote:
>>
>> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
>> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
>> > wrote:
>> >>
>> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >> >> >Tomas Vondra  writes:
>> >> >> >> In general, I think it'd be naive that we can make planner smarter 
with
>> >> >> >> no extra overhead spent on planning, and we can never accept patches
>> >> >> >> adding even tiny overhead. With that approach we'd probably end up 
with
>> >> >> >> a trivial planner that generates just a single query plan, because
>> >> >> >> that's going to be the fastest planner. A realistic approach needs 
to
>> >> >> >> consider both the planning and execution phase, and benefits of this
>> >> >> >> patch seem to be clear - if you have queries that do benefit from 
it.
>> >> >> >
>> >> >> >I think that's kind of attacking a straw man, though.  The thing that
>> >> >> >people push back on, or should push back on IMO, is when a proposed
>> >> >> >patch adds significant slowdown to queries that it has no or very 
little
>> >> >> >hope of improving.  The trick is to do expensive stuff only when
>> >> >> >there's a good chance of getting a better plan out of it.
>> >> >> >
>> >> >>
>> >> >> Yeah, I agree with that. I think the main issue is that we don't really
>> >> >> know what the "expensive stuff" is in this case, so it's not really
>> >> >> clear how to be smarter :-(
>> >> >
>> >> >To add to this: I agree that ideally you'd check cheaply to know
>> >> >you're in a situation that might help, and then do more work. But here
>> >> >the question is always going to be simply "would we benefit from an
>> >> >ordering, and, if so, do we have it already partially sorted". It's
>> >> >hard to imagine that reducing much conceptually, so we're left with
>> >> >optimizations of that check.
>> >> >
>> >>
>> >> I think it depends on what exactly is the expensive part. For example if
>> >> it's the construction of IncrementalSort paths, then maybe we could try
>> >> do a quick/check check if the path can even be useful by estimating the
>> >> cost and only then building the path.
>> >>
>> >> That's what we do for join algorithms, for example - we first compute
>> >> initial_cost_nestloop and only when that seems cheap enough we do the
>> >> more expensive stuff.
>> >>
>> >> But I'm not sure the path construction is the expensive part, as it
>> >> should be disabled by enable_incrementalsort=off. But the regression
>> >> does not seem to disappear, at least not entirely.
>> >>
>> >>
>> >> >> One possibility is that it's just one of those regressions due to 
change
>> >> >> in binary layout, but I'm not sure know how to verify that.
>> >> >
>> >> >If we are testing with a case that can't actually add more paths (due
>> >> >to it checking the guc before building them), doesn't that effectively
>> >> >leave one of these two options:
>> >> >
>> >> >1. Binary layout/cache/other untraceable change, or
>> >> >2. Changes due to refactored function calls.
>> >> >
>> >>
>> >> Hmm, but in case of (1) the overhead should be there even with tests
>> >> that don't really have any additional paths to consider, right? I've
>> >> tried with such test (single table with no indexes) and I don't quite
>> >> see any regression (maybe ~1%).
>> >
>> >Not necessarily, if the cost is in sort costing or useful pathkeys
>> >checking, right? We have run that code even without incremental sort,
>> >but it's changed from master.
>> >
>>
>> Ah, I should have mentioned I've done most of the tests on just the
>> basic incremental sort patch (0001+0002), without the additional useful
>> paths. I initially tested the whole patch series, but after discovering
>> the regression I removed the last part (which I suspected might be the
>> root cause). But the regression is still there, so it's not that.
>>
>> It might be in the reworked costing, yeah. But then I'd expect those
>> function to show in the perf profile.
>
>Right. I'm just grasping at straws on that.
>
>> >> (2) might have impact, but I don't see any immediate supects. Did you
>> >> have some functions in mind?
>> >
>> >I guess this is where the lines blur: I didn't see anything obvious
>> >either, but the changes to sort costing...should probably not have
>> >real impact...but...
>> >
>>
>> :-(
>>
>> >> BTW I see the patch adds pathkeys_common but it's not called from
>> >> anywhere. It's probably leftover from an earlier patch version.
>> >>
>
>BTW, I think I'm going to rename the pathkeys_common_contained_in
>function to something like pathkeys_count_contained_in, unless you
>have an 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> >> >> > wrote:
> >> >> >>
> >> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >> >> >Tomas Vondra  writes:
> >> >> >> >> In general, I think it'd be naive that we can make planner 
> >> >> >> >> smarter with
> >> >> >> >> no extra overhead spent on planning, and we can never accept 
> >> >> >> >> patches
> >> >> >> >> adding even tiny overhead. With that approach we'd probably end 
> >> >> >> >> up with
> >> >> >> >> a trivial planner that generates just a single query plan, because
> >> >> >> >> that's going to be the fastest planner. A realistic approach 
> >> >> >> >> needs to
> >> >> >> >> consider both the planning and execution phase, and benefits of 
> >> >> >> >> this
> >> >> >> >> patch seem to be clear - if you have queries that do benefit from 
> >> >> >> >> it.
> >> >> >> >
> >> >> >> >I think that's kind of attacking a straw man, though.  The thing 
> >> >> >> >that
> >> >> >> >people push back on, or should push back on IMO, is when a proposed
> >> >> >> >patch adds significant slowdown to queries that it has no or very 
> >> >> >> >little
> >> >> >> >hope of improving.  The trick is to do expensive stuff only when
> >> >> >> >there's a good chance of getting a better plan out of it.
> >> >> >> >
> >> >> >>
> >> >> >> Yeah, I agree with that. I think the main issue is that we don't 
> >> >> >> really
> >> >> >> know what the "expensive stuff" is in this case, so it's not really
> >> >> >> clear how to be smarter :-(
> >> >> >
> >> >> >To add to this: I agree that ideally you'd check cheaply to know
> >> >> >you're in a situation that might help, and then do more work. But here
> >> >> >the question is always going to be simply "would we benefit from an
> >> >> >ordering, and, if so, do we have it already partially sorted". It's
> >> >> >hard to imagine that reducing much conceptually, so we're left with
> >> >> >optimizations of that check.
> >> >> >
> >> >>
> >> >> I think it depends on what exactly is the expensive part. For example if
> >> >> it's the construction of IncrementalSort paths, then maybe we could try
> >> >> do a quick/check check if the path can even be useful by estimating the
> >> >> cost and only then building the path.
> >> >>
> >> >> That's what we do for join algorithms, for example - we first compute
> >> >> initial_cost_nestloop and only when that seems cheap enough we do the
> >> >> more expensive stuff.
> >> >>
> >> >> But I'm not sure the path construction is the expensive part, as it
> >> >> should be disabled by enable_incrementalsort=off. But the regression
> >> >> does not seem to disappear, at least not entirely.
> >> >>
> >> >>
> >> >> >> One possibility is that it's just one of those regressions due to 
> >> >> >> change
> >> >> >> in binary layout, but I'm not sure know how to verify that.
> >> >> >
> >> >> >If we are testing with a case that can't actually add more paths (due
> >> >> >to it checking the guc before building them), doesn't that effectively
> >> >> >leave one of these two options:
> >> >> >
> >> >> >1. Binary layout/cache/other untraceable change, or
> >> >> >2. Changes due to refactored function calls.
> >> >> >
> >> >>
> >> >> Hmm, but in case of (1) the overhead should be there even with tests
> >> >> that don't really have any additional paths to consider, right? I've
> >> >> tried with such test (single table with no indexes) and I don't quite
> >> >> see any regression (maybe ~1%).
> >> >
> >> >Not necessarily, if the cost is in sort costing or useful pathkeys
> >> >checking, right? We have run that code even without incremental sort,
> >> >but it's changed from master.
> >> >
> >>
> >> Ah, I should have mentioned I've done most of the tests on just the
> >> basic incremental sort patch (0001+0002), without the additional useful
> >> paths. I initially tested the whole patch series, but after discovering
> >> the regression I removed the last part (which I suspected might be the
> >> root cause). But the regression is still there, so it's not that.
> >>
> >> It might be in the reworked costing, yeah. But then I'd expect those
> >> function to show in the perf profile.
> >
> >Right. I'm just grasping at straws on that.
> >
> >> >> (2) might have impact, but I don't see any immediate supects. Did you
> >> >> have some functions in mind?
> >> >
> >> >I guess this is where the lines blur: I didn't see anything obvious
> >> >either, but the changes to sort costing...should probably not have
> >> >real impact...but...
> >> >
> >>
> >> :-(
> >>
> >> >> BTW I see the patch adds 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> wrote:
>>
>> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
>> > wrote:
>> >>
>> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >> >Tomas Vondra  writes:
>> >> >> In general, I think it'd be naive that we can make planner smarter with
>> >> >> no extra overhead spent on planning, and we can never accept patches
>> >> >> adding even tiny overhead. With that approach we'd probably end up with
>> >> >> a trivial planner that generates just a single query plan, because
>> >> >> that's going to be the fastest planner. A realistic approach needs to
>> >> >> consider both the planning and execution phase, and benefits of this
>> >> >> patch seem to be clear - if you have queries that do benefit from it.
>> >> >
>> >> >I think that's kind of attacking a straw man, though.  The thing that
>> >> >people push back on, or should push back on IMO, is when a proposed
>> >> >patch adds significant slowdown to queries that it has no or very little
>> >> >hope of improving.  The trick is to do expensive stuff only when
>> >> >there's a good chance of getting a better plan out of it.
>> >> >
>> >>
>> >> Yeah, I agree with that. I think the main issue is that we don't really
>> >> know what the "expensive stuff" is in this case, so it's not really
>> >> clear how to be smarter :-(
>> >
>> >To add to this: I agree that ideally you'd check cheaply to know
>> >you're in a situation that might help, and then do more work. But here
>> >the question is always going to be simply "would we benefit from an
>> >ordering, and, if so, do we have it already partially sorted". It's
>> >hard to imagine that reducing much conceptually, so we're left with
>> >optimizations of that check.
>> >
>>
>> I think it depends on what exactly is the expensive part. For example if
>> it's the construction of IncrementalSort paths, then maybe we could try
>> do a quick/check check if the path can even be useful by estimating the
>> cost and only then building the path.
>>
>> That's what we do for join algorithms, for example - we first compute
>> initial_cost_nestloop and only when that seems cheap enough we do the
>> more expensive stuff.
>>
>> But I'm not sure the path construction is the expensive part, as it
>> should be disabled by enable_incrementalsort=off. But the regression
>> does not seem to disappear, at least not entirely.
>>
>>
>> >> One possibility is that it's just one of those regressions due to change
>> >> in binary layout, but I'm not sure know how to verify that.
>> >
>> >If we are testing with a case that can't actually add more paths (due
>> >to it checking the guc before building them), doesn't that effectively
>> >leave one of these two options:
>> >
>> >1. Binary layout/cache/other untraceable change, or
>> >2. Changes due to refactored function calls.
>> >
>>
>> Hmm, but in case of (1) the overhead should be there even with tests
>> that don't really have any additional paths to consider, right? I've
>> tried with such test (single table with no indexes) and I don't quite
>> see any regression (maybe ~1%).
>
>Not necessarily, if the cost is in sort costing or useful pathkeys
>checking, right? We have run that code even without incremental sort,
>but it's changed from master.
>

Ah, I should have mentioned I've done most of the tests on just the
basic incremental sort patch (0001+0002), without the additional useful
paths. I initially tested the whole patch series, but after discovering
the regression I removed the last part (which I suspected might be the
root cause). But the regression is still there, so it's not that.

It might be in the reworked costing, yeah. But then I'd expect those
function to show in the perf profile.


Right. I'm just grasping at straws on that.


>> (2) might have impact, but I don't see any immediate supects. Did you
>> have some functions in mind?
>
>I guess this is where the lines blur: I didn't see anything obvious
>either, but the changes to sort costing...should probably not have
>real impact...but...
>

:-(

>> BTW I see the patch adds pathkeys_common but it's not called from
>> anywhere. It's probably leftover from an earlier patch version.
>>


BTW, I think I'm going to rename the pathkeys_common_contained_in
function to something like pathkeys_count_contained_in, unless you
have an objection to that. The name doesn't seem obvious at all to me.



WFM


>> >There's not anything obvious in point (2) that would be a big cost,
>> >but there are definitely changes there. I was surprised that just
>> >eliminating the loop through the pathkeys on the query and the index
>> >was enough to save us ~4%.
>> >
>> >Tomas: Earlier you'd wondered about if we should try to shortcut the
>> >changes 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >> >Tomas Vondra  writes:
> >> >> >> In general, I think it'd be naive that we can make planner smarter 
> >> >> >> with
> >> >> >> no extra overhead spent on planning, and we can never accept patches
> >> >> >> adding even tiny overhead. With that approach we'd probably end up 
> >> >> >> with
> >> >> >> a trivial planner that generates just a single query plan, because
> >> >> >> that's going to be the fastest planner. A realistic approach needs to
> >> >> >> consider both the planning and execution phase, and benefits of this
> >> >> >> patch seem to be clear - if you have queries that do benefit from it.
> >> >> >
> >> >> >I think that's kind of attacking a straw man, though.  The thing that
> >> >> >people push back on, or should push back on IMO, is when a proposed
> >> >> >patch adds significant slowdown to queries that it has no or very 
> >> >> >little
> >> >> >hope of improving.  The trick is to do expensive stuff only when
> >> >> >there's a good chance of getting a better plan out of it.
> >> >> >
> >> >>
> >> >> Yeah, I agree with that. I think the main issue is that we don't really
> >> >> know what the "expensive stuff" is in this case, so it's not really
> >> >> clear how to be smarter :-(
> >> >
> >> >To add to this: I agree that ideally you'd check cheaply to know
> >> >you're in a situation that might help, and then do more work. But here
> >> >the question is always going to be simply "would we benefit from an
> >> >ordering, and, if so, do we have it already partially sorted". It's
> >> >hard to imagine that reducing much conceptually, so we're left with
> >> >optimizations of that check.
> >> >
> >>
> >> I think it depends on what exactly is the expensive part. For example if
> >> it's the construction of IncrementalSort paths, then maybe we could try
> >> do a quick/check check if the path can even be useful by estimating the
> >> cost and only then building the path.
> >>
> >> That's what we do for join algorithms, for example - we first compute
> >> initial_cost_nestloop and only when that seems cheap enough we do the
> >> more expensive stuff.
> >>
> >> But I'm not sure the path construction is the expensive part, as it
> >> should be disabled by enable_incrementalsort=off. But the regression
> >> does not seem to disappear, at least not entirely.
> >>
> >>
> >> >> One possibility is that it's just one of those regressions due to change
> >> >> in binary layout, but I'm not sure know how to verify that.
> >> >
> >> >If we are testing with a case that can't actually add more paths (due
> >> >to it checking the guc before building them), doesn't that effectively
> >> >leave one of these two options:
> >> >
> >> >1. Binary layout/cache/other untraceable change, or
> >> >2. Changes due to refactored function calls.
> >> >
> >>
> >> Hmm, but in case of (1) the overhead should be there even with tests
> >> that don't really have any additional paths to consider, right? I've
> >> tried with such test (single table with no indexes) and I don't quite
> >> see any regression (maybe ~1%).
> >
> >Not necessarily, if the cost is in sort costing or useful pathkeys
> >checking, right? We have run that code even without incremental sort,
> >but it's changed from master.
> >
>
> Ah, I should have mentioned I've done most of the tests on just the
> basic incremental sort patch (0001+0002), without the additional useful
> paths. I initially tested the whole patch series, but after discovering
> the regression I removed the last part (which I suspected might be the
> root cause). But the regression is still there, so it's not that.
>
> It might be in the reworked costing, yeah. But then I'd expect those
> function to show in the perf profile.

Right. I'm just grasping at straws on that.

> >> (2) might have impact, but I don't see any immediate supects. Did you
> >> have some functions in mind?
> >
> >I guess this is where the lines blur: I didn't see anything obvious
> >either, but the changes to sort costing...should probably not have
> >real impact...but...
> >
>
> :-(
>
> >> BTW I see the patch adds pathkeys_common but it's not called from
> >> anywhere. It's probably leftover from an earlier patch version.
> >>

BTW, I think I'm going to rename the pathkeys_common_contained_in
function to something like pathkeys_count_contained_in, unless you
have an objection to that. The name doesn't seem obvious at all to me.

> >> >There's not anything obvious in point (2) that would be a big cost,
> >> >but there are definitely changes there. I was surprised that just
> >> >eliminating the loop through the pathkeys on 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
>On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> wrote:
>>
>> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>> >Tomas Vondra  writes:
>> >> In general, I think it'd be naive that we can make planner smarter with
>> >> no extra overhead spent on planning, and we can never accept patches
>> >> adding even tiny overhead. With that approach we'd probably end up with
>> >> a trivial planner that generates just a single query plan, because
>> >> that's going to be the fastest planner. A realistic approach needs to
>> >> consider both the planning and execution phase, and benefits of this
>> >> patch seem to be clear - if you have queries that do benefit from it.
>> >
>> >I think that's kind of attacking a straw man, though.  The thing that
>> >people push back on, or should push back on IMO, is when a proposed
>> >patch adds significant slowdown to queries that it has no or very little
>> >hope of improving.  The trick is to do expensive stuff only when
>> >there's a good chance of getting a better plan out of it.
>> >
>>
>> Yeah, I agree with that. I think the main issue is that we don't really
>> know what the "expensive stuff" is in this case, so it's not really
>> clear how to be smarter :-(
>
>To add to this: I agree that ideally you'd check cheaply to know
>you're in a situation that might help, and then do more work. But here
>the question is always going to be simply "would we benefit from an
>ordering, and, if so, do we have it already partially sorted". It's
>hard to imagine that reducing much conceptually, so we're left with
>optimizations of that check.
>

I think it depends on what exactly is the expensive part. For example if
it's the construction of IncrementalSort paths, then maybe we could try
do a quick/check check if the path can even be useful by estimating the
cost and only then building the path.

That's what we do for join algorithms, for example - we first compute
initial_cost_nestloop and only when that seems cheap enough we do the
more expensive stuff.

But I'm not sure the path construction is the expensive part, as it
should be disabled by enable_incrementalsort=off. But the regression
does not seem to disappear, at least not entirely.


>> One possibility is that it's just one of those regressions due to change
>> in binary layout, but I'm not sure know how to verify that.
>
>If we are testing with a case that can't actually add more paths (due
>to it checking the guc before building them), doesn't that effectively
>leave one of these two options:
>
>1. Binary layout/cache/other untraceable change, or
>2. Changes due to refactored function calls.
>

Hmm, but in case of (1) the overhead should be there even with tests
that don't really have any additional paths to consider, right? I've
tried with such test (single table with no indexes) and I don't quite
see any regression (maybe ~1%).


Not necessarily, if the cost is in sort costing or useful pathkeys
checking, right? We have run that code even without incremental sort,
but it's changed from master.



Ah, I should have mentioned I've done most of the tests on just the
basic incremental sort patch (0001+0002), without the additional useful
paths. I initially tested the whole patch series, but after discovering
the regression I removed the last part (which I suspected might be the
root cause). But the regression is still there, so it's not that.

It might be in the reworked costing, yeah. But then I'd expect those
function to show in the perf profile.


(2) might have impact, but I don't see any immediate supects. Did you
have some functions in mind?


I guess this is where the lines blur: I didn't see anything obvious
either, but the changes to sort costing...should probably not have
real impact...but...



:-(


BTW I see the patch adds pathkeys_common but it's not called from
anywhere. It's probably leftover from an earlier patch version.

>There's not anything obvious in point (2) that would be a big cost,
>but there are definitely changes there. I was surprised that just
>eliminating the loop through the pathkeys on the query and the index
>was enough to save us ~4%.
>
>Tomas: Earlier you'd wondered about if we should try to shortcut the
>changes in costing...I was skeptical of that originally, but maybe
>it's worth looking into? I'm going to try backing that out and see
>what the numbers look like.
>


BTW, I did this test, and it looks like we can get back something
close to 1% by reverting that initial fix on partial path costing. But
we can't get rid of it all the time, at the very least. *Maybe* we
could condition it on incremental sort, but I'm not convinced that's
the only place it's needed as a fix.



Sounds interesting. I actually tried how much the add_partial_path
change accounts for, and you're right it was quite a bit. But 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Alvaro Herrera
On 2020-Mar-31, Tom Lane wrote:

> James Coleman  writes:
> > On Tue, Mar 31, 2020 at 1:04 PM Tom Lane  wrote:
> >> Perhaps the semantics are such that that's actually sensible, but it's
> >> far from a straightforward remapping of the old enum.
> 
> > Right, I didn't see the explicit "= 0" in other enums there, so it
> > made me wonder if it was intentional to designate that one had to be
> > 0, but I guess without a comment that's a lot of inference.
> 
> It's possible that somebody meant that as an indicator that the code
> depends on palloc0() leaving the field with that value.  But if so,
> you'd soon find that out ... and an actual comment would be better,
> anyway.

git blame fingers this:

commit bf11e7ee2e3607bb67d25aec73aa53b2d7e9961b
Author: Robert Haas 
AuthorDate: Tue Aug 29 13:22:49 2017 -0400
CommitDate: Tue Aug 29 13:26:33 2017 -0400

Propagate sort instrumentation from workers back to leader.

Up until now, when parallel query was used, no details about the
sort method or space used by the workers were available; details
were shown only for any sorting done by the leader.  Fix that.

Commit 1177ab1dabf72bafee8f19d904cee3a299f25892 forced the test case
added by commit 1f6d515a67ec98194c23a5db25660856c9aab944 to run
without parallelism; now that we have this infrastructure, allow
that again, with a little tweaking to make it pass with and without
force_parallel_mode.

Robert Haas and Tom Lane

Discussion: 
http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=rf6rqq2br+zjvegwj0uod...@mail.gmail.com

I looked at the discussion thread.  That patch was first posted by
Robert at
https://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=rf6rqq2br+zjvegwj0uod...@mail.gmail.com
without the "= 0" part; later Tom posted v2 here
https://postgr.es/m/11223.1503695...@sss.pgh.pa.us
containing the "= 0", but I see no actual discussion of that point.

I suppose it could also be important to clarify that it's 0 if it were
used as an array index of some sort, but I don't see that in 2017's
commit.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >Tomas Vondra  writes:
> >> >> In general, I think it'd be naive that we can make planner smarter with
> >> >> no extra overhead spent on planning, and we can never accept patches
> >> >> adding even tiny overhead. With that approach we'd probably end up with
> >> >> a trivial planner that generates just a single query plan, because
> >> >> that's going to be the fastest planner. A realistic approach needs to
> >> >> consider both the planning and execution phase, and benefits of this
> >> >> patch seem to be clear - if you have queries that do benefit from it.
> >> >
> >> >I think that's kind of attacking a straw man, though.  The thing that
> >> >people push back on, or should push back on IMO, is when a proposed
> >> >patch adds significant slowdown to queries that it has no or very little
> >> >hope of improving.  The trick is to do expensive stuff only when
> >> >there's a good chance of getting a better plan out of it.
> >> >
> >>
> >> Yeah, I agree with that. I think the main issue is that we don't really
> >> know what the "expensive stuff" is in this case, so it's not really
> >> clear how to be smarter :-(
> >
> >To add to this: I agree that ideally you'd check cheaply to know
> >you're in a situation that might help, and then do more work. But here
> >the question is always going to be simply "would we benefit from an
> >ordering, and, if so, do we have it already partially sorted". It's
> >hard to imagine that reducing much conceptually, so we're left with
> >optimizations of that check.
> >
>
> I think it depends on what exactly is the expensive part. For example if
> it's the construction of IncrementalSort paths, then maybe we could try
> do a quick/check check if the path can even be useful by estimating the
> cost and only then building the path.
>
> That's what we do for join algorithms, for example - we first compute
> initial_cost_nestloop and only when that seems cheap enough we do the
> more expensive stuff.
>
> But I'm not sure the path construction is the expensive part, as it
> should be disabled by enable_incrementalsort=off. But the regression
> does not seem to disappear, at least not entirely.
>
>
> >> One possibility is that it's just one of those regressions due to change
> >> in binary layout, but I'm not sure know how to verify that.
> >
> >If we are testing with a case that can't actually add more paths (due
> >to it checking the guc before building them), doesn't that effectively
> >leave one of these two options:
> >
> >1. Binary layout/cache/other untraceable change, or
> >2. Changes due to refactored function calls.
> >
>
> Hmm, but in case of (1) the overhead should be there even with tests
> that don't really have any additional paths to consider, right? I've
> tried with such test (single table with no indexes) and I don't quite
> see any regression (maybe ~1%).

Not necessarily, if the cost is in sort costing or useful pathkeys
checking, right? We have run that code even without incremental sort,
but it's changed from master.

> (2) might have impact, but I don't see any immediate supects. Did you
> have some functions in mind?

I guess this is where the lines blur: I didn't see anything obvious
either, but the changes to sort costing...should probably not have
real impact...but...

> BTW I see the patch adds pathkeys_common but it's not called from
> anywhere. It's probably leftover from an earlier patch version.
>
> >There's not anything obvious in point (2) that would be a big cost,
> >but there are definitely changes there. I was surprised that just
> >eliminating the loop through the pathkeys on the query and the index
> >was enough to save us ~4%.
> >
> >Tomas: Earlier you'd wondered about if we should try to shortcut the
> >changes in costing...I was skeptical of that originally, but maybe
> >it's worth looking into? I'm going to try backing that out and see
> >what the numbers look like.
> >

BTW, I did this test, and it looks like we can get back something
close to 1% by reverting that initial fix on partial path costing. But
we can't get rid of it all the time, at the very least. *Maybe* we
could condition it on incremental sort, but I'm not convinced that's
the only place it's needed as a fix.

> I've described the idea about something like initial_cost_nestloop and
> so on. But I'm a bit skeptical about it, considering that the GUC only
> has limited effect.
>
>
> A somewhat note is that the number of indexes has pretty significant
> impact on planning time, even on master. This is timing of the same
> explain script (similar to the one shown before) with different number
> of indexes on master:
>
>  0 indexes7 indexes   49 indexes
>  ---
>   

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
>Tomas Vondra  writes:
>> In general, I think it'd be naive that we can make planner smarter with
>> no extra overhead spent on planning, and we can never accept patches
>> adding even tiny overhead. With that approach we'd probably end up with
>> a trivial planner that generates just a single query plan, because
>> that's going to be the fastest planner. A realistic approach needs to
>> consider both the planning and execution phase, and benefits of this
>> patch seem to be clear - if you have queries that do benefit from it.
>
>I think that's kind of attacking a straw man, though.  The thing that
>people push back on, or should push back on IMO, is when a proposed
>patch adds significant slowdown to queries that it has no or very little
>hope of improving.  The trick is to do expensive stuff only when
>there's a good chance of getting a better plan out of it.
>

Yeah, I agree with that. I think the main issue is that we don't really
know what the "expensive stuff" is in this case, so it's not really
clear how to be smarter :-(


To add to this: I agree that ideally you'd check cheaply to know
you're in a situation that might help, and then do more work. But here
the question is always going to be simply "would we benefit from an
ordering, and, if so, do we have it already partially sorted". It's
hard to imagine that reducing much conceptually, so we're left with
optimizations of that check.



I think it depends on what exactly is the expensive part. For example if
it's the construction of IncrementalSort paths, then maybe we could try
do a quick/check check if the path can even be useful by estimating the
cost and only then building the path.

That's what we do for join algorithms, for example - we first compute
initial_cost_nestloop and only when that seems cheap enough we do the
more expensive stuff.

But I'm not sure the path construction is the expensive part, as it
should be disabled by enable_incrementalsort=off. But the regression
does not seem to disappear, at least not entirely.



One possibility is that it's just one of those regressions due to change
in binary layout, but I'm not sure know how to verify that.


If we are testing with a case that can't actually add more paths (due
to it checking the guc before building them), doesn't that effectively
leave one of these two options:

1. Binary layout/cache/other untraceable change, or
2. Changes due to refactored function calls.



Hmm, but in case of (1) the overhead should be there even with tests
that don't really have any additional paths to consider, right? I've
tried with such test (single table with no indexes) and I don't quite
see any regression (maybe ~1%).

(2) might have impact, but I don't see any immediate supects. Did you
have some functions in mind?

BTW I see the patch adds pathkeys_common but it's not called from
anywhere. It's probably leftover from an earlier patch version.


There's not anything obvious in point (2) that would be a big cost,
but there are definitely changes there. I was surprised that just
eliminating the loop through the pathkeys on the query and the index
was enough to save us ~4%.

Tomas: Earlier you'd wondered about if we should try to shortcut the
changes in costing...I was skeptical of that originally, but maybe
it's worth looking into? I'm going to try backing that out and see
what the numbers look like.



I've described the idea about something like initial_cost_nestloop and
so on. But I'm a bit skeptical about it, considering that the GUC only
has limited effect.


A somewhat note is that the number of indexes has pretty significant
impact on planning time, even on master. This is timing of the same
explain script (similar to the one shown before) with different number
of indexes on master:

0 indexes7 indexes   49 indexes
---
10.8512.5627.83

The way I look at incremental sort is that it allows using indexes for
queries that couldn't use it before, possibly requiring a separate
index. So incremental sort might easily reduce the number of indexes
needed, compensating for the overhead we're discussing here. Of course,
that may or may not be true.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >Tomas Vondra  writes:
> >> In general, I think it'd be naive that we can make planner smarter with
> >> no extra overhead spent on planning, and we can never accept patches
> >> adding even tiny overhead. With that approach we'd probably end up with
> >> a trivial planner that generates just a single query plan, because
> >> that's going to be the fastest planner. A realistic approach needs to
> >> consider both the planning and execution phase, and benefits of this
> >> patch seem to be clear - if you have queries that do benefit from it.
> >
> >I think that's kind of attacking a straw man, though.  The thing that
> >people push back on, or should push back on IMO, is when a proposed
> >patch adds significant slowdown to queries that it has no or very little
> >hope of improving.  The trick is to do expensive stuff only when
> >there's a good chance of getting a better plan out of it.
> >
>
> Yeah, I agree with that. I think the main issue is that we don't really
> know what the "expensive stuff" is in this case, so it's not really
> clear how to be smarter :-(

To add to this: I agree that ideally you'd check cheaply to know
you're in a situation that might help, and then do more work. But here
the question is always going to be simply "would we benefit from an
ordering, and, if so, do we have it already partially sorted". It's
hard to imagine that reducing much conceptually, so we're left with
optimizations of that check.

> One possibility is that it's just one of those regressions due to change
> in binary layout, but I'm not sure know how to verify that.

If we are testing with a case that can't actually add more paths (due
to it checking the guc before building them), doesn't that effectively
leave one of these two options:
1. Binary layout/cache/other untraceable change, or
2. Changes due to refactored function calls.

There's not anything obvious in point (2) that would be a big cost,
but there are definitely changes there. I was surprised that just
eliminating the loop through the pathkeys on the query and the index
was enough to save us ~4%.

Tomas: Earlier you'd wondered about if we should try to shortcut the
changes in costing...I was skeptical of that originally, but maybe
it's worth looking into? I'm going to try backing that out and see
what the numbers look like.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:

Tomas Vondra  writes:

In general, I think it'd be naive that we can make planner smarter with
no extra overhead spent on planning, and we can never accept patches
adding even tiny overhead. With that approach we'd probably end up with
a trivial planner that generates just a single query plan, because
that's going to be the fastest planner. A realistic approach needs to
consider both the planning and execution phase, and benefits of this
patch seem to be clear - if you have queries that do benefit from it.


I think that's kind of attacking a straw man, though.  The thing that
people push back on, or should push back on IMO, is when a proposed
patch adds significant slowdown to queries that it has no or very little
hope of improving.  The trick is to do expensive stuff only when
there's a good chance of getting a better plan out of it.



Yeah, I agree with that. I think the main issue is that we don't really
know what the "expensive stuff" is in this case, so it's not really
clear how to be smarter :-(

One possibility is that it's just one of those regressions due to change
in binary layout, but I'm not sure know how to verify that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 06:00:00PM -0400, James Coleman wrote:

On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra
 wrote:


On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:
>On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
> wrote:
>>
>> The main thing I've been working on today is benchmarking how this
>> affects planning. And I'm seeing a regression that worries me a bit,
>> unfortunately.
>>
>> The test I'm doing is pretty simple - build a small table with a bunch
>> of columns:
>>
>>create table t (a int, b int, c int, d int, e int, f int, g int);
>>
>>insert into t select 100*random(), 100*random(), 100*random(),
>>  100*random(), 100*random(), 100*random(), 100*random()
>>from generate_series(1,10) s(i);
>>
>> and then a number of indexes on subsets of up to 3 columns, as generated
>> using the attached build-indexes.py script. And then run a bunch of
>> explains (so no actual execution) sorting the data by at least 4 columns
>> (to trigger incremental sort paths), measuring timing of the script.
>>
>> I did a bunch of runs on current master and v46 with incremental sort
>> disabled and enabled, and the results look like this:
>>
>>  master   offon
>>  --
>>  34.60937.46337.729
>>
>> which means about 8-9% regression with incremental sort. Of course, this
>> is only for planning time, for execution the impact is going to be much
>> smaller. But it's still a bit annoying.
>>
>> I've suspected this might be either due to the add_partial_path changes
>> or the patch adding incremental sort to additional places, so I tested
>> those parts individually and the answer is no - add_partial_path changes
>> have very small impact (~1%, which might be noise). The regression comes
>> mostly from the 0002 part that adds incremental sort. At least in this
>> particular test - different tests might behave differently, of course.
>>
>> The annoying bit is that the overhead does not disappear after disabling
>> incremental sort. That suggests this is not merely due to considering
>> and creating higher number of paths, but due to something that happens
>> before we even look at the GUC ...
>>
>> I think I've found one such place - if you look at compare_pathkeys, it
>> has this check right before the forboth() loop:
>>
>>  if (keys1 == keys2)
>>  return PATHKEYS_EQUAL;
>>
>> But with incremental sort we don't call pathkeys_contained_in, we call
>> pathkeys_common_contained_in instead. And that does not call
>> compare_pathkeys and does not have the simple equality check. Adding
>> the following check seems to cut the overhead in half, which is nice:
>>
>>  if (keys1 == keys2)
>>  {
>>  *n_common = list_length(keys1);
>> return true;
>>  }
>>
>> Not sure where the rest of the regression comes from yet.
>
>I noticed in the other function we also optimize by checking if either
>keys list is NIL, so I tried adding that, and it might have made a
>minor difference, but it's hard to tell as it was under 1%.
>

Which other function? I don't see this optimization in compare_pathkeys,
that only checks key1/key2 after the forboth loop, but that's something
different.


pathkeys_useful_for_ordering checks both inputs.


I may be wrong, but my guess would be that this won't save much, because
the loop should terminate immediately. The whole point is not to loop
over possibly many pathkeys when we know it's exactly the same pathkeys
list (same pointer). When one of the lists is NIL the loop should
terminate right away ...

>I also ran perf with a slightly modified version of your test that
>uses psql, and after the above changes was seeing something like a
>3.5% delta between master and this patch series. Nothing obvious in
>the perf report though.
>

Yeah, I don't think there's anything obviously more expensive.

>This test is intended to be somewhat worst case, no? At what point do
>we consider the trade-off worth it (given that it's not plausible to
>have zero impact)?
>

Yes, more or less. It was definitely designed to do that - it merely
plans the query (no execution), with many applicable indexes etc. It's
definitely true that once the exection starts to take more time, the
overhead will get negligible. Same for reducing number of indexes.

And of course, for queries that can benefit from incremental sort, the
speedup is likely way larger than this.

In general, I think it'd be naive that we can make planner smarter with
no extra overhead spent on planning, and we can never accept patches
adding even tiny overhead. With that approach we'd probably end up with
a trivial planner that generates just a single query plan, because
that's going to be the fastest planner. A realistic approach needs to
consider both the planning and execution phase, and benefits of this
patch seem to be clear - if you have queries that do benefit from it.

Let's try to minimize the overhead a bit more, and then we'll see.


Any 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tom Lane
Tomas Vondra  writes:
> In general, I think it'd be naive that we can make planner smarter with
> no extra overhead spent on planning, and we can never accept patches
> adding even tiny overhead. With that approach we'd probably end up with
> a trivial planner that generates just a single query plan, because
> that's going to be the fastest planner. A realistic approach needs to
> consider both the planning and execution phase, and benefits of this
> patch seem to be clear - if you have queries that do benefit from it.

I think that's kind of attacking a straw man, though.  The thing that
people push back on, or should push back on IMO, is when a proposed
patch adds significant slowdown to queries that it has no or very little
hope of improving.  The trick is to do expensive stuff only when
there's a good chance of getting a better plan out of it.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:
> >On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
> > wrote:
> >>
> >> The main thing I've been working on today is benchmarking how this
> >> affects planning. And I'm seeing a regression that worries me a bit,
> >> unfortunately.
> >>
> >> The test I'm doing is pretty simple - build a small table with a bunch
> >> of columns:
> >>
> >>create table t (a int, b int, c int, d int, e int, f int, g int);
> >>
> >>insert into t select 100*random(), 100*random(), 100*random(),
> >>  100*random(), 100*random(), 100*random(), 100*random()
> >>from generate_series(1,10) s(i);
> >>
> >> and then a number of indexes on subsets of up to 3 columns, as generated
> >> using the attached build-indexes.py script. And then run a bunch of
> >> explains (so no actual execution) sorting the data by at least 4 columns
> >> (to trigger incremental sort paths), measuring timing of the script.
> >>
> >> I did a bunch of runs on current master and v46 with incremental sort
> >> disabled and enabled, and the results look like this:
> >>
> >>  master   offon
> >>  --
> >>  34.60937.46337.729
> >>
> >> which means about 8-9% regression with incremental sort. Of course, this
> >> is only for planning time, for execution the impact is going to be much
> >> smaller. But it's still a bit annoying.
> >>
> >> I've suspected this might be either due to the add_partial_path changes
> >> or the patch adding incremental sort to additional places, so I tested
> >> those parts individually and the answer is no - add_partial_path changes
> >> have very small impact (~1%, which might be noise). The regression comes
> >> mostly from the 0002 part that adds incremental sort. At least in this
> >> particular test - different tests might behave differently, of course.
> >>
> >> The annoying bit is that the overhead does not disappear after disabling
> >> incremental sort. That suggests this is not merely due to considering
> >> and creating higher number of paths, but due to something that happens
> >> before we even look at the GUC ...
> >>
> >> I think I've found one such place - if you look at compare_pathkeys, it
> >> has this check right before the forboth() loop:
> >>
> >>  if (keys1 == keys2)
> >>  return PATHKEYS_EQUAL;
> >>
> >> But with incremental sort we don't call pathkeys_contained_in, we call
> >> pathkeys_common_contained_in instead. And that does not call
> >> compare_pathkeys and does not have the simple equality check. Adding
> >> the following check seems to cut the overhead in half, which is nice:
> >>
> >>  if (keys1 == keys2)
> >>  {
> >>  *n_common = list_length(keys1);
> >> return true;
> >>  }
> >>
> >> Not sure where the rest of the regression comes from yet.
> >
> >I noticed in the other function we also optimize by checking if either
> >keys list is NIL, so I tried adding that, and it might have made a
> >minor difference, but it's hard to tell as it was under 1%.
> >
>
> Which other function? I don't see this optimization in compare_pathkeys,
> that only checks key1/key2 after the forboth loop, but that's something
> different.

pathkeys_useful_for_ordering checks both inputs.

> I may be wrong, but my guess would be that this won't save much, because
> the loop should terminate immediately. The whole point is not to loop
> over possibly many pathkeys when we know it's exactly the same pathkeys
> list (same pointer). When one of the lists is NIL the loop should
> terminate right away ...
>
> >I also ran perf with a slightly modified version of your test that
> >uses psql, and after the above changes was seeing something like a
> >3.5% delta between master and this patch series. Nothing obvious in
> >the perf report though.
> >
>
> Yeah, I don't think there's anything obviously more expensive.
>
> >This test is intended to be somewhat worst case, no? At what point do
> >we consider the trade-off worth it (given that it's not plausible to
> >have zero impact)?
> >
>
> Yes, more or less. It was definitely designed to do that - it merely
> plans the query (no execution), with many applicable indexes etc. It's
> definitely true that once the exection starts to take more time, the
> overhead will get negligible. Same for reducing number of indexes.
>
> And of course, for queries that can benefit from incremental sort, the
> speedup is likely way larger than this.
>
> In general, I think it'd be naive that we can make planner smarter with
> no extra overhead spent on planning, and we can never accept patches
> adding even tiny overhead. With that approach we'd probably end up with
> a trivial planner that generates just a single query plan, because
> that's going to be the fastest planner. A realistic approach needs to
> consider both the planning and execution phase, and benefits of this
> patch 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tomas Vondra

On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote:

On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra
 wrote:


The main thing I've been working on today is benchmarking how this
affects planning. And I'm seeing a regression that worries me a bit,
unfortunately.

The test I'm doing is pretty simple - build a small table with a bunch
of columns:

   create table t (a int, b int, c int, d int, e int, f int, g int);

   insert into t select 100*random(), 100*random(), 100*random(),
 100*random(), 100*random(), 100*random(), 100*random()
   from generate_series(1,10) s(i);

and then a number of indexes on subsets of up to 3 columns, as generated
using the attached build-indexes.py script. And then run a bunch of
explains (so no actual execution) sorting the data by at least 4 columns
(to trigger incremental sort paths), measuring timing of the script.

I did a bunch of runs on current master and v46 with incremental sort
disabled and enabled, and the results look like this:

 master   offon
 --
 34.60937.46337.729

which means about 8-9% regression with incremental sort. Of course, this
is only for planning time, for execution the impact is going to be much
smaller. But it's still a bit annoying.

I've suspected this might be either due to the add_partial_path changes
or the patch adding incremental sort to additional places, so I tested
those parts individually and the answer is no - add_partial_path changes
have very small impact (~1%, which might be noise). The regression comes
mostly from the 0002 part that adds incremental sort. At least in this
particular test - different tests might behave differently, of course.

The annoying bit is that the overhead does not disappear after disabling
incremental sort. That suggests this is not merely due to considering
and creating higher number of paths, but due to something that happens
before we even look at the GUC ...

I think I've found one such place - if you look at compare_pathkeys, it
has this check right before the forboth() loop:

 if (keys1 == keys2)
 return PATHKEYS_EQUAL;

But with incremental sort we don't call pathkeys_contained_in, we call
pathkeys_common_contained_in instead. And that does not call
compare_pathkeys and does not have the simple equality check. Adding
the following check seems to cut the overhead in half, which is nice:

 if (keys1 == keys2)
 {
 *n_common = list_length(keys1);
return true;
 }

Not sure where the rest of the regression comes from yet.


I noticed in the other function we also optimize by checking if either
keys list is NIL, so I tried adding that, and it might have made a
minor difference, but it's hard to tell as it was under 1%.



Which other function? I don't see this optimization in compare_pathkeys,
that only checks key1/key2 after the forboth loop, but that's something
different.

I may be wrong, but my guess would be that this won't save much, because
the loop should terminate immediately. The whole point is not to loop
over possibly many pathkeys when we know it's exactly the same pathkeys
list (same pointer). When one of the lists is NIL the loop should
terminate right away ...


I also ran perf with a slightly modified version of your test that
uses psql, and after the above changes was seeing something like a
3.5% delta between master and this patch series. Nothing obvious in
the perf report though.



Yeah, I don't think there's anything obviously more expensive.


This test is intended to be somewhat worst case, no? At what point do
we consider the trade-off worth it (given that it's not plausible to
have zero impact)?



Yes, more or less. It was definitely designed to do that - it merely
plans the query (no execution), with many applicable indexes etc. It's
definitely true that once the exection starts to take more time, the
overhead will get negligible. Same for reducing number of indexes.

And of course, for queries that can benefit from incremental sort, the
speedup is likely way larger than this.

In general, I think it'd be naive that we can make planner smarter with
no extra overhead spent on planning, and we can never accept patches
adding even tiny overhead. With that approach we'd probably end up with
a trivial planner that generates just a single query plan, because
that's going to be the fastest planner. A realistic approach needs to
consider both the planning and execution phase, and benefits of this
patch seem to be clear - if you have queries that do benefit from it.

Let's try to minimize the overhead a bit more, and then we'll see.



Also, while looking at pathkeys_common_contained_in(), I've been a bit
puzzled why does is this correct:

 return (key1 == NULL);

It's easy to not notice it's key1 and not keys1, so I suggest we add a
comment 'key1 == NULL' means we've processed whole keys1 list.


Done.

I've included fixes for Alvaro's comments in this patch series also.



Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tom Lane
James Coleman  writes:
> On Tue, Mar 31, 2020 at 1:04 PM Tom Lane  wrote:
>> Perhaps the semantics are such that that's actually sensible, but it's
>> far from a straightforward remapping of the old enum.

> Right, I didn't see the explicit "= 0" in other enums there, so it
> made me wonder if it was intentional to designate that one had to be
> 0, but I guess without a comment that's a lot of inference.

It's possible that somebody meant that as an indicator that the code
depends on palloc0() leaving the field with that value.  But if so,
you'd soon find that out ... and an actual comment would be better,
anyway.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 1:04 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory
> > + * instrumentation so needs to have each value be a separate bit.
>
> >> I don't quite understand why you skipped "1".  (Also, is the use of zero
> >> a wise choice?)
>
> > The assignment of 0 was already there, and there wasn't a comment to
> > indicate why. That ends up meaning we wouldn't display "still in
> > progress" as a type here, which is maybe desirable, but I'm honestly
> > not sure why it was that way originally. I'm curious if you have any
> > thoughts on it.
>
> As things stood, the "= 0" was a no-op, since the first enum value
> would've been that anyway.  But if you're converting this set of symbols
> to bits that can be OR'd together, it seems pretty strange to use zero,
> because that can't be distinguished from "absence of any entry".
>
> Perhaps the semantics are such that that's actually sensible, but it's
> far from a straightforward remapping of the old enum.

Right, I didn't see the explicit "= 0" in other enums there, so it
made me wonder if it was intentional to designate that one had to be
0, but I guess without a comment that's a lot of inference.

The semantics seemed somewhat useful here in theory, but since I'm not
hearing a "yeah, that was intentional but not commented", I'm just
going to change it to what you'd naturally expect.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Tom Lane
James Coleman  writes:
> + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory
> + * instrumentation so needs to have each value be a separate bit.

>> I don't quite understand why you skipped "1".  (Also, is the use of zero
>> a wise choice?)

> The assignment of 0 was already there, and there wasn't a comment to
> indicate why. That ends up meaning we wouldn't display "still in
> progress" as a type here, which is maybe desirable, but I'm honestly
> not sure why it was that way originally. I'm curious if you have any
> thoughts on it.

As things stood, the "= 0" was a no-op, since the first enum value
would've been that anyway.  But if you're converting this set of symbols
to bits that can be OR'd together, it seems pretty strange to use zero,
because that can't be distinguished from "absence of any entry".

Perhaps the semantics are such that that's actually sensible, but it's
far from a straightforward remapping of the old enum.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread James Coleman
On Tue, Mar 31, 2020 at 12:31 PM Alvaro Herrera
 wrote:
>
> On 2020-Mar-30, James Coleman wrote:
>
> > +/* 
> > + *Instruementation information for IncrementalSort
> > + * 
> > + */
> > +typedef struct IncrementalSortGroupInfo
> > +{
> > + int64   groupCount;
> > + longmaxDiskSpaceUsed;
> > + longtotalDiskSpaceUsed;
> > + longmaxMemorySpaceUsed;
> > + longtotalMemorySpaceUsed;
> > + SizesortMethods; /* bitmask of TuplesortMethod */
> > +} IncrementalSortGroupInfo;
>
> There's a typo "Instruementation" in the comment, but I'm more surprised
> that type Size is being used to store a bitmask.  It looks weird to me.
> Wouldn't it be more reasonable to use bits32 or some such?  (I first
> noticed this in the "sizeof(Size)" code that appears in the explain
> code.)

I just didn't know about bits32; I'll change.

> OTOH, aesthetically it would seem to be better to define these values
> using ones and increasing shifts (1 << 1 and so on), rather than powers
> of two:
>
> > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory
> > + * instrumentation so needs to have each value be a separate bit.
> >   */
> >  typedef enum
> >  {
> >   SORT_TYPE_STILL_IN_PROGRESS = 0,
> > - SORT_TYPE_TOP_N_HEAPSORT,
> > - SORT_TYPE_QUICKSORT,
> > - SORT_TYPE_EXTERNAL_SORT,
> > - SORT_TYPE_EXTERNAL_MERGE
> > + SORT_TYPE_TOP_N_HEAPSORT = 2,
> > + SORT_TYPE_QUICKSORT = 4,
> > + SORT_TYPE_EXTERNAL_SORT = 8,
> > + SORT_TYPE_EXTERNAL_MERGE = 16
> >  } TuplesortMethod;
>
> I don't quite understand why you skipped "1".  (Also, is the use of zero
> a wise choice?)

The assignment of 0 was already there, and there wasn't a comment to
indicate why. That ends up meaning we wouldn't display "still in
progress" as a type here, which is maybe desirable, but I'm honestly
not sure why it was that way originally. I'm curious if you have any
thoughts on it.

I knew some projects used increasing shifts, but I wasn't sure what
the preference was here. I'll correct.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-31 Thread Alvaro Herrera
On 2020-Mar-30, James Coleman wrote:

> +/* 
> + *Instruementation information for IncrementalSort
> + * 
> + */
> +typedef struct IncrementalSortGroupInfo
> +{
> + int64   groupCount;
> + longmaxDiskSpaceUsed;
> + longtotalDiskSpaceUsed;
> + longmaxMemorySpaceUsed;
> + longtotalMemorySpaceUsed;
> + SizesortMethods; /* bitmask of TuplesortMethod */
> +} IncrementalSortGroupInfo;

There's a typo "Instruementation" in the comment, but I'm more surprised
that type Size is being used to store a bitmask.  It looks weird to me.
Wouldn't it be more reasonable to use bits32 or some such?  (I first
noticed this in the "sizeof(Size)" code that appears in the explain
code.)


OTOH, aesthetically it would seem to be better to define these values
using ones and increasing shifts (1 << 1 and so on), rather than powers
of two:

> + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory
> + * instrumentation so needs to have each value be a separate bit.
>   */
>  typedef enum
>  {
>   SORT_TYPE_STILL_IN_PROGRESS = 0,
> - SORT_TYPE_TOP_N_HEAPSORT,
> - SORT_TYPE_QUICKSORT,
> - SORT_TYPE_EXTERNAL_SORT,
> - SORT_TYPE_EXTERNAL_MERGE
> + SORT_TYPE_TOP_N_HEAPSORT = 2,
> + SORT_TYPE_QUICKSORT = 4,
> + SORT_TYPE_EXTERNAL_SORT = 8,
> + SORT_TYPE_EXTERNAL_MERGE = 16
>  } TuplesortMethod;

I don't quite understand why you skipped "1".  (Also, is the use of zero
a wise choice?)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-30 Thread Tomas Vondra

On Mon, Mar 30, 2020 at 06:53:47PM -0400, James Coleman wrote:

On Mon, Mar 30, 2020 at 8:24 AM Tomas Vondra
 wrote:


On Sun, Mar 29, 2020 at 10:16:53PM -0400, James Coleman wrote:
>On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra
> wrote:
>>
>> Hi,
>>
>> Attached is a slightly reorganized patch series. I've merged the fixes
>> into the appropriate matches, and I've also combined the two patches
>> adding incremental sort paths to additional places in planner.
>>
>> A couple more comments:
>>
>>
>> 1) I think the GUC documentation in src/sgml/config.sgml is a bit too
>> detailed, compared to the other enable_* GUCs. I wonder if there's a
>> better place where to move the details. What about adding some examples
>> and explanation to perform.sgml?
>
>I'll take a look at that and include in a patch series tomorrow.


Attached.


>> 2) Looking at the explain output, the verbose mode looks like this:
>>
>> test=# explain (verbose, analyze) select a from t order by a, b, c;
>>
QUERY PLAN
>> 
--
>>   Gather Merge  (cost=66.31..816072.71 rows=8333226 width=24) (actual 
time=4.787..20092.555 rows=1000 loops=1)
>> Output: a, b, c
>> Workers Planned: 2
>> Workers Launched: 2
>> ->  Incremental Sort  (cost=66.28..729200.36 rows=4166613 width=24) 
(actual time=1.308..14021.575 rows=333 loops=3)
>>   Output: a, b, c
>>   Sort Key: t.a, t.b, t.c
>>   Presorted Key: t.a, t.b
>>   Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB 
peak=30kB
>>   Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB 
peak=138kB
>>   Worker 0:  actual time=0.766..16122.368 rows=3841573 loops=1
>>   Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB
>> Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB 
peak=141kB
>>   Worker 1:  actual time=1.986..16189.831 rows=3845490 loops=1
>>   Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB
>> Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB 
peak=139kB
>>   ->  Parallel Index Scan using t_a_b_idx on public.t  
(cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 rows=333 
loops=3)
>> Output: a, b, c
>> Worker 0:  actual time=0.048..11275.178 rows=3841573 loops=1
>> Worker 1:  actual time=0.041..11314.133 rows=3845490 loops=1
>>   Planning Time: 0.166 ms
>>   Execution Time: 25135.029 ms
>> (22 rows)
>>
>> There seems to be missing indentation for the first line of worker info.
>
>Working on that too.


See attached. I've folded in the original "explain fixes" patch into
the main series, and the "explain fixes" patch in this series contains
only the changes for the above.



Thanks. I'll take a look at those changes tomorrow.


>> I'm still not quite convinced we should be printing two lines - I know
>> you mentioned the lines might be too long, but see how long the other
>> lines may get ...
>
>All right, I give in :)
>
>Do you think non-workers (both the leader and non-parallel plans)
>should also move to one line?
>

I think we should use the same formatting for both cases, so yes.

FWIW I forgot to mention I tweaked the INSTRUMENT_SORT_GROUP macro a
bit, by moving the if condition in it. That makes the calls easier.


Ah, that actually fixed some of the compile warnings. The other is
fixed in my explain fixes patch.


>> 3) I see the new nodes (plan state, ...) have "presortedCols" which does
>> not indicate it's a "number of". I think we usually prefix names of such
>> fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.)
>
>I can fix this too.


Changed everywhere we used this var name. I'm tempted to change to
nPresortedKeys, but a cursory glance suggests some cases might
actually be consistent with other var names reference columns, so I'm
not sure if we want to go down that path (and change more than just
this).



Not sure. We use "sort keys" and "path keys" for this, but I think
"columns" is good enough.


The main thing I've been working on today is benchmarking how this
affects planning. And I'm seeing a regression that worries me a bit,
unfortunately.

The test I'm doing is pretty simple - build a small table with a bunch
of columns:

  create table t (a int, b int, c int, d int, e int, f int, g int);

  insert into t select 100*random(), 100*random(), 100*random(),
100*random(), 100*random(), 100*random(), 100*random()
  from generate_series(1,10) s(i);

and then a number of indexes on subsets of up to 3 columns, as generated
using the attached build-indexes.py script. And then run a bunch of
explains (so no actual execution) sorting the data by at least 4 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-30 Thread Tomas Vondra

On Sun, Mar 29, 2020 at 10:16:53PM -0400, James Coleman wrote:

On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra
 wrote:


Hi,

Attached is a slightly reorganized patch series. I've merged the fixes
into the appropriate matches, and I've also combined the two patches
adding incremental sort paths to additional places in planner.

A couple more comments:


1) I think the GUC documentation in src/sgml/config.sgml is a bit too
detailed, compared to the other enable_* GUCs. I wonder if there's a
better place where to move the details. What about adding some examples
and explanation to perform.sgml?


I'll take a look at that and include in a patch series tomorrow.


2) Looking at the explain output, the verbose mode looks like this:

test=# explain (verbose, analyze) select a from t order by a, b, c;
   
QUERY PLAN
--
  Gather Merge  (cost=66.31..816072.71 rows=8333226 width=24) (actual 
time=4.787..20092.555 rows=1000 loops=1)
Output: a, b, c
Workers Planned: 2
Workers Launched: 2
->  Incremental Sort  (cost=66.28..729200.36 rows=4166613 width=24) (actual 
time=1.308..14021.575 rows=333 loops=3)
  Output: a, b, c
  Sort Key: t.a, t.b, t.c
  Presorted Key: t.a, t.b
  Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB 
peak=30kB
  Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB 
peak=138kB
  Worker 0:  actual time=0.766..16122.368 rows=3841573 loops=1
  Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB
Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB 
peak=141kB
  Worker 1:  actual time=1.986..16189.831 rows=3845490 loops=1
  Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB
Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB 
peak=139kB
  ->  Parallel Index Scan using t_a_b_idx on public.t  
(cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 
rows=333 loops=3)
Output: a, b, c
Worker 0:  actual time=0.048..11275.178 rows=3841573 loops=1
Worker 1:  actual time=0.041..11314.133 rows=3845490 loops=1
  Planning Time: 0.166 ms
  Execution Time: 25135.029 ms
(22 rows)

There seems to be missing indentation for the first line of worker info.


Working on that too.


I'm still not quite convinced we should be printing two lines - I know
you mentioned the lines might be too long, but see how long the other
lines may get ...


All right, I give in :)

Do you think non-workers (both the leader and non-parallel plans)
should also move to one line?



I think we should use the same formatting for both cases, so yes.

FWIW I forgot to mention I tweaked the INSTRUMENT_SORT_GROUP macro a
bit, by moving the if condition in it. That makes the calls easier.


3) I see the new nodes (plan state, ...) have "presortedCols" which does
not indicate it's a "number of". I think we usually prefix names of such
fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.)


I can fix this too.

Also I noticed a few compiler warnings I'll fixup in tomorrow's reply.



OK

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-29 Thread James Coleman
On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra
 wrote:
>
> Hi,
>
> Attached is a slightly reorganized patch series. I've merged the fixes
> into the appropriate matches, and I've also combined the two patches
> adding incremental sort paths to additional places in planner.
>
> A couple more comments:
>
>
> 1) I think the GUC documentation in src/sgml/config.sgml is a bit too
> detailed, compared to the other enable_* GUCs. I wonder if there's a
> better place where to move the details. What about adding some examples
> and explanation to perform.sgml?

I'll take a look at that and include in a patch series tomorrow.

> 2) Looking at the explain output, the verbose mode looks like this:
>
> test=# explain (verbose, analyze) select a from t order by a, b, c;
>
> QUERY PLAN
> --
>   Gather Merge  (cost=66.31..816072.71 rows=8333226 width=24) (actual 
> time=4.787..20092.555 rows=1000 loops=1)
> Output: a, b, c
> Workers Planned: 2
> Workers Launched: 2
> ->  Incremental Sort  (cost=66.28..729200.36 rows=4166613 width=24) 
> (actual time=1.308..14021.575 rows=333 loops=3)
>   Output: a, b, c
>   Sort Key: t.a, t.b, t.c
>   Presorted Key: t.a, t.b
>   Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB 
> peak=30kB
>   Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB 
> peak=138kB
>   Worker 0:  actual time=0.766..16122.368 rows=3841573 loops=1
>   Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB
> Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB 
> peak=141kB
>   Worker 1:  actual time=1.986..16189.831 rows=3845490 loops=1
>   Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB
> Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB 
> peak=139kB
>   ->  Parallel Index Scan using t_a_b_idx on public.t  
> (cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 
> rows=333 loops=3)
> Output: a, b, c
> Worker 0:  actual time=0.048..11275.178 rows=3841573 loops=1
> Worker 1:  actual time=0.041..11314.133 rows=3845490 loops=1
>   Planning Time: 0.166 ms
>   Execution Time: 25135.029 ms
> (22 rows)
>
> There seems to be missing indentation for the first line of worker info.

Working on that too.

> I'm still not quite convinced we should be printing two lines - I know
> you mentioned the lines might be too long, but see how long the other
> lines may get ...

All right, I give in :)

Do you think non-workers (both the leader and non-parallel plans)
should also move to one line?

> 3) I see the new nodes (plan state, ...) have "presortedCols" which does
> not indicate it's a "number of". I think we usually prefix names of such
> fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.)

I can fix this too.

Also I noticed a few compiler warnings I'll fixup in tomorrow's reply.

> My TODO for this patch is this:
>
> - review the costing (I think the estimates are OK, but I recall I
>haven't been entirely happy with how it's broken into functions.)
>
> - review the tuplesort changes (the memory contexts etc.)
>
> - do more testing of performance impact on planning

Sounds good.

Thanks,
James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread James Coleman
On Sat, Mar 28, 2020 at 11:14 PM Tomas Vondra
 wrote:
>
> On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote:
> >On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra
> > wrote:
> >>
> >> Hi,
> >>
> >> Attached is my take on simplification of the useful pathkeyes thing. It
> >> keeps the function, but it truncates query_pathkeys to only members with
> >> EC members in the relation. I think that's essentially the optimization
> >> you've proposed.
> >
> >Thanks. I've included that in the patch series in this email (as a
> >separate patch) with a few additional comments.
> >
>
> Thanks.
>
> >I've also noticed that the enabled_incrementalsort check in
> >generate_useful_gather_paths seemed broken, because it returned us out
> >of the function before creating either a plain gather merge (if
> >already sorted) or an explicit sort path. I've included a patch that
> >moves it to the if block that actually builds the incremental sort
> >path.
> >
>
> Hmmm, that's probably right.
>
> The reason why the GUC check was right after generate_gather_paths is
> that the intent to disable all the useful-pathkeys business, essentially
> reducing it back to plain generate_gather_paths.
>
> But I think you're right that's wrong, because it might lead to strange
> behavior when the GUC switches between plans without any incremental
> sort nodes - setting it to 'true' might end up picking GM on top of
> plain Sort, for example.

Thanks.

> >>
> >> ...
> >>
> >> 1) Missing worker identification (Worker #).
> >
> >Fixed.
> >
> >> 2) Missing method for workers (we have it for the leader, though).
> >
> >Fixed. Since we can't have pointers in the parallel shared memory
> >space, we can't store the sort methods used in a list. To accomplish
> >the same goal, I've assigned the TuplesortMethod enum entries uique
> >bit positions, and store the methods used in a bitmask.
> >
>
> OK, makes sense.
>
> >> 3) I'm not sure why the lable is "Methods" instead of "Sort Method", and
> >> why it's in parenthesis.
> >
> >I've removed the parentheses. It's labeled "Methods" since there can
> >be more than one (different batches could use different methods). I've
> >updated this to properly use singular/plural depending on the number
> >of methods used.
> >
>
> I'm a bit confused. How do I know which batch used which method? Is it
> actually worth printing in explain analyze? Maybe only print it in the
> verbose mode?

The alternative is showing no sort method information at all, or only
showing it if all batches used the same method (which seems confusing
to me). It seems weird that we wouldn't try to find some rough
analogue to what a regular sort node outputs, so I've attempted to
summarize.

This is similar to the memory information: the average doesn't apply
to any one batch, and you don't know which one (or how many) hit the
peak memory usage either, but I think it's meaningful to know a
summary.

With the sort methods, I think it's useful to be able to, for example,
know if any of the groups happened to trigger the top-n heapsort
optimization, or not, and as a corollary, if all of them did or not.

> >> 4) Not sure having two lines for each worker is a great idea.
> >
> >I've left these in for now because the lines are already very long
> >(much, much longer than the worker lines in a standard sort node).
> >This is largely because we're trying to summarize many sort batches,
> >while standard sort nodes only have to give the exact stats from a
> >single batch.
> >
> >See the example output later in the email.
> >
>
> OK
>
> >> 5) I'd probably prefer having multiple labels for avg/max memory values,
> >> instead of (avg) and (max) notes. Also, I think we use "peak" in this
> >> context instead of "max".
> >
> >Updated.
> >
>
> OK
>
> >Here's the current output:
> >
> > Limit  (cost=1887419.20..1889547.68 rows=1 width=8) (actual
> >time=13218.403..13222.519 rows=1 loo
> >ps=1)
> >   ->  Gather Merge  (cost=1887419.20..19624748.03 rows=8360
> >width=8) (actual time=13218.401..13229.7
> >50 rows=1 loops=1)
> > Workers Planned: 2
> > Workers Launched: 2
> > ->  Incremental Sort  (cost=1886419.17..10005010.55
> >rows=4180 width=8) (actual time=13208.00
> >4..13208.586 rows=4425 loops=3)
> >   Sort Key: a, b
> >   Presorted Key: a
> >   Full-sort Groups: 1 Sort Method: quicksort Memory:
> >avg=28kB peak=28kB
> >   Presorted Groups: 1 Sort Method: top-N heapsort Memory:
> >avg=1681kB peak=1681kB
> >   Worker 0:  Full-sort Groups: 1 Sort Method: quicksort
> >Memory: avg=28kB peak=28kB
> > Presorted Groups: 1 Sort Method: top-N heapsort
> >Memory: avg=1680kB peak=1680kB
> >   Worker 1:  Full-sort Groups: 1 Sort Method: quicksort
> >Memory: avg=28kB peak=28kB
> > Presorted Groups: 1 Sort Method: top-N heapsort
> >Memory: avg=1682kB peak=1682kB
> >   ->  Parallel Index Scan using index_s_a on 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote:

On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra
 wrote:


Hi,

Attached is my take on simplification of the useful pathkeyes thing. It
keeps the function, but it truncates query_pathkeys to only members with
EC members in the relation. I think that's essentially the optimization
you've proposed.


Thanks. I've included that in the patch series in this email (as a
separate patch) with a few additional comments.



Thanks.


I've also noticed that the enabled_incrementalsort check in
generate_useful_gather_paths seemed broken, because it returned us out
of the function before creating either a plain gather merge (if
already sorted) or an explicit sort path. I've included a patch that
moves it to the if block that actually builds the incremental sort
path.



Hmmm, that's probably right.

The reason why the GUC check was right after generate_gather_paths is
that the intent to disable all the useful-pathkeys business, essentially
reducing it back to plain generate_gather_paths.

But I think you're right that's wrong, because it might lead to strange
behavior when the GUC switches between plans without any incremental
sort nodes - setting it to 'true' might end up picking GM on top of
plain Sort, for example.



...

1) Missing worker identification (Worker #).


Fixed.


2) Missing method for workers (we have it for the leader, though).


Fixed. Since we can't have pointers in the parallel shared memory
space, we can't store the sort methods used in a list. To accomplish
the same goal, I've assigned the TuplesortMethod enum entries uique
bit positions, and store the methods used in a bitmask.



OK, makes sense.


3) I'm not sure why the lable is "Methods" instead of "Sort Method", and
why it's in parenthesis.


I've removed the parentheses. It's labeled "Methods" since there can
be more than one (different batches could use different methods). I've
updated this to properly use singular/plural depending on the number
of methods used.



I'm a bit confused. How do I know which batch used which method? Is it
actually worth printing in explain analyze? Maybe only print it in the
verbose mode?


4) Not sure having two lines for each worker is a great idea.


I've left these in for now because the lines are already very long
(much, much longer than the worker lines in a standard sort node).
This is largely because we're trying to summarize many sort batches,
while standard sort nodes only have to give the exact stats from a
single batch.

See the example output later in the email.



OK


5) I'd probably prefer having multiple labels for avg/max memory values,
instead of (avg) and (max) notes. Also, I think we use "peak" in this
context instead of "max".


Updated.



OK


Here's the current output:

Limit  (cost=1887419.20..1889547.68 rows=1 width=8) (actual
time=13218.403..13222.519 rows=1 loo
ps=1)
  ->  Gather Merge  (cost=1887419.20..19624748.03 rows=8360
width=8) (actual time=13218.401..13229.7
50 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Incremental Sort  (cost=1886419.17..10005010.55
rows=4180 width=8) (actual time=13208.00
4..13208.586 rows=4425 loops=3)
  Sort Key: a, b
  Presorted Key: a
  Full-sort Groups: 1 Sort Method: quicksort Memory:
avg=28kB peak=28kB
  Presorted Groups: 1 Sort Method: top-N heapsort Memory:
avg=1681kB peak=1681kB
  Worker 0:  Full-sort Groups: 1 Sort Method: quicksort
Memory: avg=28kB peak=28kB
Presorted Groups: 1 Sort Method: top-N heapsort
Memory: avg=1680kB peak=1680kB
  Worker 1:  Full-sort Groups: 1 Sort Method: quicksort
Memory: avg=28kB peak=28kB
Presorted Groups: 1 Sort Method: top-N heapsort
Memory: avg=1682kB peak=1682kB
  ->  Parallel Index Scan using index_s_a on s
(cost=0.57..4967182.06 rows=4180 width=8
) (actual time=0.455..11730.878 rows=668 loops=3)

James


Looks reasonable. Did you try it in other output formats - json/yaml?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

Hi,

Attached is my take on simplification of the useful pathkeyes thing. It
keeps the function, but it truncates query_pathkeys to only members with
EC members in the relation. I think that's essentially the optimization
you've proposed.

I've also noticed an issue in explain output. EXPLAIN ANALYZE on a simple
query gives me this:

  QUERY PLAN
---
 Gather Merge  (cost=66.30..816060.48 rows=8333226 width=24) (actual 
time=6.464..19091.006 rows=1000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Incremental Sort  (cost=66.28..729188.13 rows=4166613 width=24) (actual 
time=1.836..13401.109 rows=333 loops=3)
 Sort Key: a, b, c
 Presorted Key: a, b
 Full-sort Groups: 4156 (Methods: quicksort) Memory: 30kB (avg), 30kB 
(max)
 Presorted Groups: 4137 (Methods: quicksort) Memory: 108kB (avg), 111kB 
(max)
 Full-sort Groups: 6888 (Methods: ) Memory: 30kB (avg), 30kB (max)
 Presorted Groups: 6849 (Methods: ) Memory: 121kB (avg), 131kB (max)
 Full-sort Groups: 6869 (Methods: ) Memory: 30kB (avg), 30kB (max)
 Presorted Groups: 6816 (Methods: ) Memory: 128kB (avg), 132kB (max)
 ->  Parallel Index Scan using t_a_b_idx on t  (cost=0.43..382353.69 
rows=4166613 width=24) (actual time=0.033..9346.679 rows=333 loops=3)
 Planning Time: 0.133 ms
 Execution Time: 23998.669 ms
(15 rows)

while with incremental sort disabled it looks like this:

 QUERY PLAN
-
 Gather Merge  (cost=734387.50..831676.35 rows=8333226 width=24) (actual 
time=5597.978..14967.246 rows=1000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=734387.47..744804.00 rows=4166613 width=24) (actual 
time=5584.616..7645.711 rows=333 loops=3)
 Sort Key: a, b, c
 Sort Method: external merge  Disk: 111216kB
 Worker 0:  Sort Method: external merge  Disk: 109552kB
 Worker 1:  Sort Method: external merge  Disk: 112112kB
 ->  Parallel Seq Scan on t  (cost=0.00..105361.13 rows=4166613 
width=24) (actual time=0.011..1753.128 rows=333 loops=3)
 Planning Time: 0.048 ms
 Execution Time: 19682.582 ms
(11 rows)

So I think there's a couple of issues:

1) Missing worker identification (Worker #).

2) Missing method for workers (we have it for the leader, though).

3) I'm not sure why the lable is "Methods" instead of "Sort Method", and
why it's in parenthesis.

4) Not sure having two lines for each worker is a great idea.

5) I'd probably prefer having multiple labels for avg/max memory values,
instead of (avg) and (max) notes. Also, I think we use "peak" in this
context instead of "max".


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 32bf734820..49d4990e66 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2748,7 +2748,6 @@ static List *
 get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
 {
List   *useful_pathkeys_list = NIL;
-   ListCell   *lc;
 
/*
 * Considering query_pathkeys is always worth it, because it might 
allow us
@@ -2756,29 +2755,27 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, 
RelOptInfo *rel)
 */
if (root->query_pathkeys)
{
-   boolquery_pathkeys_ok = true;
+   ListCell   *lc;
+   List   *pathkeys = NIL;
 
foreach(lc, root->query_pathkeys)
{
PathKey*pathkey = (PathKey *) lfirst(lc);
EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-   Expr   *em_expr;
 
/*
-* We can't use incremental sort for pathkeys 
containing volatile
-* expressions. We could walk the exppression itself, 
but checking
-* ec_has_volatile here saves some cycles.
+* We can't build Incremental Sort when pathkeys 
contain elements
+* without an EC member not contained in the current 
relation, so
+* just ignore anything after the first such pathkey.
 */
-   if (pathkey_ec->ec_has_volatile ||
-   !(em_expr = find_em_expr_for_rel(pathkey_ec, 
rel)))
-   {
-   

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread James Coleman
On Sat, Mar 28, 2020 at 5:30 PM Tomas Vondra
 wrote:
>
> On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:
> >On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
> > wrote:
> >>
> >> ...
> >>
> >> The more I look at pathkeys_useful_for_ordering() the more I think the
> >> get_useful_pathkeys_for_relation() function should look more like it
> >> than the postgres_fdw one ...
> >
> >If we go down that path (and indeed this is actually implied by
> >removing the volatile check too), doesn't that really just mean (at
> >least for now) that get_useful_pathkeys_for_relation goes away
> >entirely and we effectively use root->query_pathkeys directly? The
> >only thing your lose them is the check that each eclass has a member
> >in the rel. But that check probably wasn't quite right anyway: at
> >least for incremental sort (maybe not regular sort), I think we
> >probably don't necessarily care that the entire list has members in
> >the rel, but rather that some prefix does, right? The idea there would
> >be that we could create a gather merge here that supplies a partial
> >ordering (that prefix of query_pathkeys) and then above it the planner
> >might place another incremental sort (say above a join), or perhaps
> >even a join above that would actually be sufficient (since many joins
> >are capable of providing an ordering anyway).
> >
> >I've attached (added prefix .txt to avoid the cfbot assuming this is a
> >full series) an idea in that direction to see if we're thinking along
> >the same route. You'll want to apply and view with `git diff -w`
> >probably.
> >
>
> Hmmm, I'm not sure the patch is quite correct.
>
> Firstly, I suggest we don't remove get_useful_pathkeys_for_relation
> entirely, because that allows us to add more useful pathkeys in the
> future (even if we don't consider pathkeys useful for merging now).
> We could also do the EC optimization in the function, return NIL and
> not loop over partial_pathlist at all. That's cosmetic issue, though.
>
> More importantly, I'm not sure this makes sense:
>
>/* consider incremental sort for interesting orderings */
>useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys);
>
>...
>
>is_sorted = pathkeys_common_contained_in(useful_pathkeys,
> subpath->pathkeys,
> _keys);
>
> I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys,
> right? So how could this ever return is_sorted=false?
>
> The whole point is to end up with query_pathkeys (or whatever pathkeys
> we deem useful), but this does not do that.

Yes, that's obviously a thinko in my rush to get an idea out.

I think useful_pathkeys there would be essentially
root->query_pathkeys, or, more correctly the prefix of query_pathkeys
that has eclasses shared with the current rel. Or, if we go with the
more restrictive approach (see the two approaches I mentioned
earlier), then either NIL or query_pathkeys (if they all matches
eclasses in the current rel).

Given those requirements, I agree that keeping
get_useful_pathkeys_for_relation makes sense to wrap up all of that
behavior.

James




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-28 Thread Tomas Vondra

On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:

On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
 wrote:


...

The more I look at pathkeys_useful_for_ordering() the more I think the
get_useful_pathkeys_for_relation() function should look more like it
than the postgres_fdw one ...


If we go down that path (and indeed this is actually implied by
removing the volatile check too), doesn't that really just mean (at
least for now) that get_useful_pathkeys_for_relation goes away
entirely and we effectively use root->query_pathkeys directly? The
only thing your lose them is the check that each eclass has a member
in the rel. But that check probably wasn't quite right anyway: at
least for incremental sort (maybe not regular sort), I think we
probably don't necessarily care that the entire list has members in
the rel, but rather that some prefix does, right? The idea there would
be that we could create a gather merge here that supplies a partial
ordering (that prefix of query_pathkeys) and then above it the planner
might place another incremental sort (say above a join), or perhaps
even a join above that would actually be sufficient (since many joins
are capable of providing an ordering anyway).

I've attached (added prefix .txt to avoid the cfbot assuming this is a
full series) an idea in that direction to see if we're thinking along
the same route. You'll want to apply and view with `git diff -w`
probably.



Hmmm, I'm not sure the patch is quite correct.

Firstly, I suggest we don't remove get_useful_pathkeys_for_relation
entirely, because that allows us to add more useful pathkeys in the
future (even if we don't consider pathkeys useful for merging now).
We could also do the EC optimization in the function, return NIL and
not loop over partial_pathlist at all. That's cosmetic issue, though.

More importantly, I'm not sure this makes sense:

  /* consider incremental sort for interesting orderings */
  useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys);

  ...

  is_sorted = pathkeys_common_contained_in(useful_pathkeys,
   subpath->pathkeys,
   _keys);

I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys,
right? So how could this ever return is_sorted=false?

The whole point is to end up with query_pathkeys (or whatever pathkeys
we deem useful), but this does not do that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




  1   2   3   >