Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Jul 2, 2020 at 8:47 AM James Coleman wrote: > It seems like the consensus over at another discussion on this topic > [1] is that we ought to go ahead and print the zeros [for machine > readable output formats], even though that creates some interesting > scenarios like the fact that disk sorts will print 0 for memory even > though that's not true. What about having it print -1 for memory in this case instead? That's still not ideal, but machine readable EXPLAIN output ought to consistently show the same information per node, even when the answer is in some sense indeterminate. That seems to be the standard that we've settled on. It might be worth teaching the JSON format to show a JSON null or something instead. Not sure about that, though. -- Peter Geoghegan
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Jul 9, 2020 at 12:06 PM Jonathan S. Katz wrote: > On 7/2/20 11:47 AM, James Coleman wrote: > > It seems like the consensus over at another discussion on this topic > > [1] is that we ought to go ahead and print the zeros [for machine > > readable output formats], even though that creates some interesting > > scenarios like the fact that disk sorts will print 0 for memory even > > though that's not true. > > > > The change has already been made and pushed for hash disk spilling, so > > I think we ought to use Justin's patch here. > > Do people agree with James analysis? From the RMT perspective, we would > like to get this open item wrapped up for the next beta, given[1] is now > resolved. Tomas, Justin: Ping? Can we get an update on this? Just for the record, David Rowley fixed the similar hashagg issue in commit 40efbf8706cdd96e06bc4d1754272e46d9857875. I don't see any reason for the delay here. Thanks -- Peter Geoghegan
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On 7/2/20 11:47 AM, James Coleman wrote: > It seems like the consensus over at another discussion on this topic > [1] is that we ought to go ahead and print the zeros [for machine > readable output formats], even though that creates some interesting > scenarios like the fact that disk sorts will print 0 for memory even > though that's not true. > > The change has already been made and pushed for hash disk spilling, so > I think we ought to use Justin's patch here. Do people agree with James analysis? From the RMT perspective, we would like to get this open item wrapped up for the next beta, given[1] is now resolved. Thanks! Jonathan [1] https://www.postgresql.org/message-id/CAApHDvpBQx4Shmisjp7oKr%3DECX18KYKPB%3DKpdWYxMKQNvisgvQ%40mail.gmail.com signature.asc Description: OpenPGP digital signature
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
It seems like the consensus over at another discussion on this topic [1] is that we ought to go ahead and print the zeros [for machine readable output formats], even though that creates some interesting scenarios like the fact that disk sorts will print 0 for memory even though that's not true. The change has already been made and pushed for hash disk spilling, so I think we ought to use Justin's patch here. James [1] https://www.postgresql.org/message-id/2276865.1593102811%40sss.pgh.pa.us
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Fri, Jun 19, 2020 at 12:04 AM Justin Pryzby wrote: > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby wrote: > > > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > > > > I've pushed the fist part of this patch series - I've reorganized it a > > > > > > I scanned through this again post-commit. Find attached some suggestions. > > > > > > Shouldn't non-text explain output always show both disk *and* mem, > > > including > > > zeros ? > > > > Could you give more context on this? Is there a standard to follow? > > Regular sort nodes only ever report one type, so there's not a good > > parallel there. > > The change I proposed was like: > > @@ -2829,7 +2829,6 @@ > show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, > > ExplainPropertyList("Sort Methods Used", methodNames, es); > > - if (groupInfo->maxMemorySpaceUsed > 0) > { > longavgSpace = > groupInfo->totalMemorySpaceUsed / groupInfo->groupCount; > const char *spaceTypeName; > ... > - if (groupInfo->maxDiskSpaceUsed > 0) > { > ... > > To show in non-text format *both* disk and memory space used, even if zero. > > I still think that's what's desirable. I'm of the opposite opinion. I believe showing both unnecessarily is confusing. > If it's important to show *whether* a sort space was used, then I think there > should be a boolean, or an int 0/1. But I don't think it's actually needed, > since someone parsing the explain output could just check > |if _dict['Peak Sort Space Used'] > 0: ... > the same as you're doing, without having to write some variation on: > |if 'Peak Sort Space Used' in _dict and _dict['Peak Sort Space Used'] > 0: ... I think it's desirable for code to be explicitly about the type having been used rather than implicitly assuming it based on 0/non-zero values. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby wrote: > > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > > > I've pushed the fist part of this patch series - I've reorganized it a > > > > I scanned through this again post-commit. Find attached some suggestions. > > > > Shouldn't non-text explain output always show both disk *and* mem, including > > zeros ? > > Could you give more context on this? Is there a standard to follow? > Regular sort nodes only ever report one type, so there's not a good > parallel there. The change I proposed was like: @@ -2829,7 +2829,6 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, ExplainPropertyList("Sort Methods Used", methodNames, es); - if (groupInfo->maxMemorySpaceUsed > 0) { longavgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount; const char *spaceTypeName; ... - if (groupInfo->maxDiskSpaceUsed > 0) { ... To show in non-text format *both* disk and memory space used, even if zero. I still think that's what's desirable. If it's important to show *whether* a sort space was used, then I think there should be a boolean, or an int 0/1. But I don't think it's actually needed, since someone parsing the explain output could just check |if _dict['Peak Sort Space Used'] > 0: ... the same as you're doing, without having to write some variation on: |if 'Peak Sort Space Used' in _dict and _dict['Peak Sort Space Used'] > 0: ... -- Justin
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, May 12, 2020 at 11:18 AM Tomas Vondra wrote: > I've pushed both patches, fixing typos and explain format. Thanks, Tomas. -- Peter Geoghegan
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, May 10, 2020 at 02:25:23PM +0200, Tomas Vondra wrote: On Sat, May 09, 2020 at 06:48:02PM -0700, Peter Geoghegan wrote: On Sat, May 9, 2020 at 3:19 PM Tomas Vondra wrote: I'm generally OK with most of this - I'd probably keep the single-line format, but I don't feel very strongly about that and if others think using two lines is better ... Barring objections I'll get this polished and pushed soon-ish (say, early next week). I see something about starting a new thread on the Open Items page. Please CC me on this. Speaking in my capacity as an RMT member: Glad to see that you plan to close this item out next week. (I had planned on giving you a nudge about this, but it looks like I don't really have to now.) Not sure about about the new thread - the discussion continues on the main incremental sort thread, I don't see any proposal to start a new thread there. IMO it'd be pointless at this point. I've pushed both patches, fixing typos and explain format. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, May 09, 2020 at 06:48:02PM -0700, Peter Geoghegan wrote: On Sat, May 9, 2020 at 3:19 PM Tomas Vondra wrote: I'm generally OK with most of this - I'd probably keep the single-line format, but I don't feel very strongly about that and if others think using two lines is better ... Barring objections I'll get this polished and pushed soon-ish (say, early next week). I see something about starting a new thread on the Open Items page. Please CC me on this. Speaking in my capacity as an RMT member: Glad to see that you plan to close this item out next week. (I had planned on giving you a nudge about this, but it looks like I don't really have to now.) Not sure about about the new thread - the discussion continues on the main incremental sort thread, I don't see any proposal to start a new thread there. IMO it'd be pointless at this point. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, May 9, 2020 at 3:19 PM Tomas Vondra wrote: > I'm generally OK with most of this - I'd probably keep the single-line > format, but I don't feel very strongly about that and if others think > using two lines is better ... > > Barring objections I'll get this polished and pushed soon-ish (say, > early next week). I see something about starting a new thread on the Open Items page. Please CC me on this. Speaking in my capacity as an RMT member: Glad to see that you plan to close this item out next week. (I had planned on giving you a nudge about this, but it looks like I don't really have to now.) Thanks -- Peter Geoghegan
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, May 09, 2020 at 03:18:36PM -0500, Justin Pryzby wrote: Checking if it's possible to address this Opened Item before 13b1. https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items consistency of explain output: two spaces, equals vs colons, semicolons (incremental sort) Yes. Now that the other items related to incremental sort are fixed, this is next on my TODO. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Checking if it's possible to address this Opened Item before 13b1. https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items consistency of explain output: two spaces, equals vs colons, semicolons (incremental sort) On Sun, Apr 19, 2020 at 09:46:55AM -0400, James Coleman wrote: > On Sat, Apr 18, 2020 at 10:36 PM Justin Pryzby wrote: > > > > On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote: > > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > > > > And, should it use two spaces before "Sort Method", "Memory" and > > > > > "Pre-sorted > > > ... > > > > I read through that subthread, and the ending seemed to be Peter > > > > wanting things to be unified. Was there a conclusion beyond that? > > > > > > This discussion is ongoing. I think let's wait until that's settled > > > before > > > addressing this more complex and even newer case. We can add "explain, > > > two > > > spaces and equals vs colon" to the "Open items" list if need be - I hope > > > the > > > discussion will not delay the release. > > > > The change proposed on the WAL thread is minimal, and makes new explain(WAL) > > output consistent with the that of explain(BUFFERS). > > > > That uses a different format from "Sort", which is what incremental sort > > should > > follow. (Hashjoin also uses the Sort's format of two-spaces and colons > > rather > > than equals). > > I think it's not great that buffers/sort are different, but I agree > that we should follow sort. > > > So the attached 0001 makes explain output for incremental sort more > > consistent > > with sort: > > > > - Two spaces; > > - colons rather than equals; > > - Don't use semicolon, which isn't in use anywhere else; > > > > I tested with this: > > template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t > > SELECT a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON > > t(i); VACUUM VERBOSE ANALYZE t; > > template1=# explain analyze SELECT * FROM t a ORDER BY i,j; > > ... > >Full-sort Groups: 1000 Sort Method: quicksort Average Memory: 28kB > > Peak Memory: 28kB Pre-sorted Groups: 1000 Sort Method: quicksort Average > > Memory: 30kB Peak Memory: 30kB > > > > On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote: > > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby > > > > wrote: > > > > > Should "Pre-sorted Groups:" be on a separate line ? > > > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB > > > > > peak=28kB Pre-sorted Groups: 1 Sort Method: quicksort Memory: > > > > > avg=30kB peak=30kB > > > > > > > > I'd originally had that, but Tomas wanted it to be more compact. It's > > > > easy to adjust though if the consensus changes on that. > > > > > > I'm OK with changing the format if there's a consensus. The current > > > format seemed better to me, but I'm not particularly attached to it. > > > > I still think Pre-sorted groups should be on a separate line, as in 0002. > > In addition to looking better (to me), and being easier to read, another > > reason > > is that there are essentially key=>values here, but the keys are repeated > > (Sort > > Method, etc). > > I collapsed this into 0001 because I think that if we're going to do > away with the key=value style then we effectively to have to do this > to avoid the repeated values being confusing (with key=value it kinda > worked, because that made it seem like the avg/peak were clearly a > subset of the Sort Groups info). > > I also cleaned up the newline patch a bit in the process (we already > have a way to indent with a flag so don't need to do it directly). > > > I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that > > here. > > I went ahead and did that too; we already use "Full-sort", so the > proposed change makes both parallel. > > > Michael already patched most of the comment typos, the remainder I'm > > including > > here as a "nearby patch".. > > Modified slightly. > > James > From becd60ba348558fa241db5cc2100a84b757cbdc5 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby > Date: Mon, 6 Apr 2020 17:37:31 -0500 > Subject: [PATCH v2 2/2] comment typos: Incremental Sort > > commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da > Author: Tomas Vondra > > Previously reported here: > https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com > --- > src/backend/commands/explain.c | 4 ++-- > src/backend/executor/nodeIncrementalSort.c | 10 ++ > src/backend/utils/sort/tuplesort.c | 8 > 3 files changed, 12 insertions(+), 10 deletions(-) > > diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c > index 5f91c569a0..86c10458f4 100644 > --- a/src/backend/commands/explain.c > +++ b/src/backend/commands/explain.c > @@ -2865,7 +2865,7 @@ > show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, > } > >
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, Apr 18, 2020 at 10:36 PM Justin Pryzby wrote: > > On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote: > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > > > And, should it use two spaces before "Sort Method", "Memory" and > > > > "Pre-sorted > > ... > > > I read through that subthread, and the ending seemed to be Peter > > > wanting things to be unified. Was there a conclusion beyond that? > > > > This discussion is ongoing. I think let's wait until that's settled before > > addressing this more complex and even newer case. We can add "explain, two > > spaces and equals vs colon" to the "Open items" list if need be - I hope the > > discussion will not delay the release. > > The change proposed on the WAL thread is minimal, and makes new explain(WAL) > output consistent with the that of explain(BUFFERS). > > That uses a different format from "Sort", which is what incremental sort > should > follow. (Hashjoin also uses the Sort's format of two-spaces and colons rather > than equals). I think it's not great that buffers/sort are different, but I agree that we should follow sort. > So the attached 0001 makes explain output for incremental sort more consistent > with sort: > > - Two spaces; > - colons rather than equals; > - Don't use semicolon, which isn't in use anywhere else; > > I tested with this: > template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT > a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON t(i); > VACUUM VERBOSE ANALYZE t; > template1=# explain analyze SELECT * FROM t a ORDER BY i,j; > ... >Full-sort Groups: 1000 Sort Method: quicksort Average Memory: 28kB Peak > Memory: 28kB Pre-sorted Groups: 1000 Sort Method: quicksort Average > Memory: 30kB Peak Memory: 30kB > > On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote: > > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby > > > wrote: > > > > Should "Pre-sorted Groups:" be on a separate line ? > > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB > > > > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB > > > > > > I'd originally had that, but Tomas wanted it to be more compact. It's > > > easy to adjust though if the consensus changes on that. > > > > I'm OK with changing the format if there's a consensus. The current > > format seemed better to me, but I'm not particularly attached to it. > > I still think Pre-sorted groups should be on a separate line, as in 0002. > In addition to looking better (to me), and being easier to read, another > reason > is that there are essentially key=>values here, but the keys are repeated > (Sort > Method, etc). I collapsed this into 0001 because I think that if we're going to do away with the key=value style then we effectively to have to do this to avoid the repeated values being confusing (with key=value it kinda worked, because that made it seem like the avg/peak were clearly a subset of the Sort Groups info). I also cleaned up the newline patch a bit in the process (we already have a way to indent with a flag so don't need to do it directly). > I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that > here. I went ahead and did that too; we already use "Full-sort", so the proposed change makes both parallel. > Michael already patched most of the comment typos, the remainder I'm including > here as a "nearby patch".. Modified slightly. James From becd60ba348558fa241db5cc2100a84b757cbdc5 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 6 Apr 2020 17:37:31 -0500 Subject: [PATCH v2 2/2] comment typos: Incremental Sort commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da Author: Tomas Vondra Previously reported here: https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com --- src/backend/commands/explain.c | 4 ++-- src/backend/executor/nodeIncrementalSort.c | 10 ++ src/backend/utils/sort/tuplesort.c | 8 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 5f91c569a0..86c10458f4 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2865,7 +2865,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, } /* - * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node + * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node */ static void show_incremental_sort_info(IncrementalSortState *incrsortstate, @@ -2913,7 +2913,7 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate, >shared_info->sinfo[n]; /* - * If a worker hasn't process any sort groups at all, then exclude + * If a worker hasn't processed any sort groups at all, then exclude * it from output since it either
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 07, 2020 at 10:53:05AM -0500, Justin Pryzby wrote: > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > > And, should it use two spaces before "Sort Method", "Memory" and > > > "Pre-sorted > ... > > I read through that subthread, and the ending seemed to be Peter > > wanting things to be unified. Was there a conclusion beyond that? > > This discussion is ongoing. I think let's wait until that's settled before > addressing this more complex and even newer case. We can add "explain, two > spaces and equals vs colon" to the "Open items" list if need be - I hope the > discussion will not delay the release. The change proposed on the WAL thread is minimal, and makes new explain(WAL) output consistent with the that of explain(BUFFERS). That uses a different format from "Sort", which is what incremental sort should follow. (Hashjoin also uses the Sort's format of two-spaces and colons rather than equals). So the attached 0001 makes explain output for incremental sort more consistent with sort: - Two spaces; - colons rather than equals; - Don't use semicolon, which isn't in use anywhere else; I tested with this: template1=# DROP TABLE t; CREATE TABLE t(i int, j int); INSERT INTO t SELECT a-(a%100), a%1000 FROM generate_series(1,9)a; CREATE INDEX ON t(i); VACUUM VERBOSE ANALYZE t; template1=# explain analyze SELECT * FROM t a ORDER BY i,j; ... Full-sort Groups: 1000 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB Pre-sorted Groups: 1000 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB On Tue, Apr 07, 2020 at 05:34:15PM +0200, Tomas Vondra wrote: > On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby wrote: > > > Should "Pre-sorted Groups:" be on a separate line ? > > > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB > > > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB > > > > I'd originally had that, but Tomas wanted it to be more compact. It's > > easy to adjust though if the consensus changes on that. > > I'm OK with changing the format if there's a consensus. The current > format seemed better to me, but I'm not particularly attached to it. I still think Pre-sorted groups should be on a separate line, as in 0002. In addition to looking better (to me), and being easier to read, another reason is that there are essentially key=>values here, but the keys are repeated (Sort Method, etc). I also suggested to rename: s/Presorted/Pre-sorted/, but I didn't do that here. Michael already patched most of the comment typos, the remainder I'm including here as a "nearby patch".. -- Justin >From 55044341f82b847d136cd17df5a3c8d80c8371b4 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Wed, 15 Apr 2020 08:45:21 -0500 Subject: [PATCH v1 1/3] Fix explain output for incr sort: - Two spaces; - colons rather than equals; - Don't use semicolon; --- src/backend/commands/explain.c | 18 +++--- src/test/regress/expected/incremental_sort.out | 12 ++-- 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 7ae6131676..9257c52707 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2778,7 +2778,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, { if (indent) appendStringInfoSpaces(es->str, es->indent * 2); - appendStringInfo(es->str, "%s Groups: " INT64_FORMAT " Sort Method", groupLabel, + appendStringInfo(es->str, "%s Groups: " INT64_FORMAT " Sort Method", groupLabel, groupInfo->groupCount); /* plural/singular based on methodNames size */ if (list_length(methodNames) > 1) @@ -2798,24 +2798,20 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, const char *spaceTypeName; spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY); - appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB", + appendStringInfo(es->str, " Average %s: %ldkB Peak %s: %ldkB", spaceTypeName, avgSpace, - groupInfo->maxMemorySpaceUsed); + spaceTypeName, groupInfo->maxMemorySpaceUsed); } if (groupInfo->maxDiskSpaceUsed > 0) { long avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount; - const char *spaceTypeName; spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK); - /* Add a semicolon separator only if memory stats were printed. */ - if (groupInfo->maxMemorySpaceUsed > 0) -appendStringInfo(es->str, ";"); - appendStringInfo(es->str, " %s: avg=%ldkB peak=%ldkB", + appendStringInfo(es->str, " Average %s: %ldkB Peak %s: %ldkB", spaceTypeName, avgSpace, - groupInfo->maxDiskSpaceUsed); + spaceTypeName, groupInfo->maxDiskSpaceUsed); } } else @@ -2899,7 +2895,7 @@
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Apr 16, 2020 at 1:10 PM Tom Lane wrote: > > James Coleman writes: > > On Fri, Apr 10, 2020 at 10:12 AM James Coleman wrote: > >> One thing I just noticed and had a question about: in > >> preparePresortedCols (which sets up a function call context), do we > >> need to call pg_proc_aclcheck? > > > Background: this came up because I noticed that pg_proc_aclcheck is > > called in the scalar array op case in execExpr.c. > > > However grepping through the source code I see several places where a > > function (including an equality op for an ordering op, like the case > > we have here) gets looked up without calling pg_proc_aclcheck, but > > then other places where the acl check is invoked. > > Rule of thumb is that we don't apply ACL checks to functions/ops > we get out of an opclass; adding a function to an opclass is tantamount > to giving public execute permission on it. If the function/operator > reference came directly from the SQL query it must be checked. All right, in that case I believe we're OK here without modification. We're looking up the equality op based on the ordering op the planner has already selected for sorting the query, and I'm assuming that looking that up via the op family is in the same category as "getting out of an opclass" (since opclasses are part of an opfamily). Thanks for the explanation. > > In addition, I haven't been able to discern a reason for why sometimes > > InvokeFunctionExecuteHook gets called with the function after lookup, > > but not others. > > I would not stand here and say that that hook infrastructure is worth > anything at all. Maybe the coverage is sufficient for some use-cases, > but who's to say? Interesting. It does look to be particularly underused. Just grepping for that hook invocation macro shows, for example, that it's not used in nodeSort.c or tuplesort.c, so clearly it's not executed for the functions we'd use in regular sort. Given that...I think we can proceed without it here too. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > On Fri, Apr 10, 2020 at 10:12 AM James Coleman wrote: >> One thing I just noticed and had a question about: in >> preparePresortedCols (which sets up a function call context), do we >> need to call pg_proc_aclcheck? > Background: this came up because I noticed that pg_proc_aclcheck is > called in the scalar array op case in execExpr.c. > However grepping through the source code I see several places where a > function (including an equality op for an ordering op, like the case > we have here) gets looked up without calling pg_proc_aclcheck, but > then other places where the acl check is invoked. Rule of thumb is that we don't apply ACL checks to functions/ops we get out of an opclass; adding a function to an opclass is tantamount to giving public execute permission on it. If the function/operator reference came directly from the SQL query it must be checked. > In addition, I haven't been able to discern a reason for why sometimes > InvokeFunctionExecuteHook gets called with the function after lookup, > but not others. I would not stand here and say that that hook infrastructure is worth anything at all. Maybe the coverage is sufficient for some use-cases, but who's to say? regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Fri, Apr 10, 2020 at 10:12 AM James Coleman wrote: > > One thing I just noticed and had a question about: in > preparePresortedCols (which sets up a function call context), do we > need to call pg_proc_aclcheck? Background: this came up because I noticed that pg_proc_aclcheck is called in the scalar array op case in execExpr.c. However grepping through the source code I see several places where a function (including an equality op for an ordering op, like the case we have here) gets looked up without calling pg_proc_aclcheck, but then other places where the acl check is invoked. In addition, I haven't been able to discern a reason for why sometimes InvokeFunctionExecuteHook gets called with the function after lookup, but not others. So I'm not sure if either of these needed to be added to the equality op/function lookup code in nodeIncrementalSort's preparePresortedCols or not. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
One thing I just noticed and had a question about: in preparePresortedCols (which sets up a function call context), do we need to call pg_proc_aclcheck? James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote: On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra wrote: On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: >On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: >>hyrax is not too happy with this test: >> >>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 >> >>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking >>EXPLAIN output, but it evidently is. >> > >Thanks, I'll investigate. It's not clear to me either what might be >causing this, but I guess something must have gone wrong in >estimation/planning. > OK, I know what's going on - it's a rather embarassing issue in the regression test. There's no analyze on the test tables, so it uses default estimates for number of groups etc. But with clobber cache the test runs long enough for autoanalyze to kick in and collect stats, so we generate better estimates which changes the plan. I'll get this fixed - explicit analyze and tweaking the data a bit should do the trick. Looking at the tests that failed, I think we should consider just adding: set enable_sort = off; because several of those tests have very specific amounts of data to ensure we test the transition points around the different modes in the incremental sort node. Maybe, but I'd much rather tweak the data so that we test both the costing and execution part. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote: On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra wrote: On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote: >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote: >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra >> wrote: >>> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: >hyrax is not too happy with this test: > >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 > >It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking >EXPLAIN output, but it evidently is. > Thanks, I'll investigate. It's not clear to me either what might be causing this, but I guess something must have gone wrong in estimation/planning. >>> >>>OK, I know what's going on - it's a rather embarassing issue in the >>>regression test. There's no analyze on the test tables, so it uses >>>default estimates for number of groups etc. But with clobber cache the >>>test runs long enough for autoanalyze to kick in and collect stats, so >>>we generate better estimates which changes the plan. >>> >>>I'll get this fixed - explicit analyze and tweaking the data a bit >>>should do the trick. >> >>Looking at the tests that failed, I think we should consider just adding: >>set enable_sort = off; >>because several of those tests have very specific amounts of data to >>ensure we test the transition points around the different modes in the >>incremental sort node. >> > >Maybe, but I'd much rather tweak the data so that we test both the >costing and execution part. > I do think this does the trick by increasing the number of rows a bit (from 100 to 1000) to make the Sort more expensive than Incremental Sort, while still testing the transition points. James, can you verify it that's still true? Those changes all look good to me from a "testing correctness" POV. Also I like that we now test multiple sort methods in the explain output, like: "Sort Methods: top-N heapsort, quicksort". OK, good. I'll push the fix. I personally find the `i/100` notation harder to read than a case, but that's just an opinion... Yeah, but with 1000 rows we'd need a more complex CASE statement (I don't think simply having two groups - small+large would work). Should we change `analyze` to `analyze t` to avoid unnecessarily re-analyzing all other tables in the regression db? Ah, definitely. That was a mistake. Thanks for noticing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > Should we change `analyze` to `analyze t` to avoid unnecessarily > re-analyzing all other tables in the regression db? Yes, a global analyze here is a remarkably horrid idea. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra wrote: > > On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote: > >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote: > >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra > >> wrote: > >>> > >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: > On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: > >hyrax is not too happy with this test: > > > >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 > > > >It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking > >EXPLAIN output, but it evidently is. > > > > Thanks, I'll investigate. It's not clear to me either what might be > causing this, but I guess something must have gone wrong in > estimation/planning. > > >>> > >>>OK, I know what's going on - it's a rather embarassing issue in the > >>>regression test. There's no analyze on the test tables, so it uses > >>>default estimates for number of groups etc. But with clobber cache the > >>>test runs long enough for autoanalyze to kick in and collect stats, so > >>>we generate better estimates which changes the plan. > >>> > >>>I'll get this fixed - explicit analyze and tweaking the data a bit > >>>should do the trick. > >> > >>Looking at the tests that failed, I think we should consider just adding: > >>set enable_sort = off; > >>because several of those tests have very specific amounts of data to > >>ensure we test the transition points around the different modes in the > >>incremental sort node. > >> > > > >Maybe, but I'd much rather tweak the data so that we test both the > >costing and execution part. > > > > I do think this does the trick by increasing the number of rows a bit > (from 100 to 1000) to make the Sort more expensive than Incremental > Sort, while still testing the transition points. > > James, can you verify it that's still true? Those changes all look good to me from a "testing correctness" POV. Also I like that we now test multiple sort methods in the explain output, like: "Sort Methods: top-N heapsort, quicksort". I personally find the `i/100` notation harder to read than a case, but that's just an opinion... Should we change `analyze` to `analyze t` to avoid unnecessarily re-analyzing all other tables in the regression db? James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote: On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote: On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra wrote: On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: hyrax is not too happy with this test: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking EXPLAIN output, but it evidently is. Thanks, I'll investigate. It's not clear to me either what might be causing this, but I guess something must have gone wrong in estimation/planning. OK, I know what's going on - it's a rather embarassing issue in the regression test. There's no analyze on the test tables, so it uses default estimates for number of groups etc. But with clobber cache the test runs long enough for autoanalyze to kick in and collect stats, so we generate better estimates which changes the plan. I'll get this fixed - explicit analyze and tweaking the data a bit should do the trick. Looking at the tests that failed, I think we should consider just adding: set enable_sort = off; because several of those tests have very specific amounts of data to ensure we test the transition points around the different modes in the incremental sort node. Maybe, but I'd much rather tweak the data so that we test both the costing and execution part. I do think this does the trick by increasing the number of rows a bit (from 100 to 1000) to make the Sort more expensive than Incremental Sort, while still testing the transition points. James, can you verify it that's still true? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out index 3072d95643..fb4ab95922 100644 --- a/src/test/regress/expected/incremental_sort.out +++ b/src/test/regress/expected/incremental_sort.out @@ -141,7 +141,8 @@ begin end; $$; -- A single large group tested around each mode transition point. -insert into t(a, b) select 1, i from generate_series(1, 100) n(i); +insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i); +analyze; explain (costs off) select * from (select * from t order by a) s order by a, b limit 31; QUERY PLAN - @@ -456,7 +457,8 @@ select * from (select * from t order by a) s order by a, b limit 66; delete from t; -- An initial large group followed by a small group. -insert into t(a, b) select (case when i < 50 then 1 else 2 end), i from generate_series(1, 100) n(i); +insert into t(a, b) select i/50 + 1, i + 1 from generate_series(0, 999) n(i); +analyze; explain (costs off) select * from (select * from t order by a) s order by a, b limit 55; QUERY PLAN - @@ -521,7 +523,7 @@ select * from (select * from t order by a) s order by a, b limit 55; 1 | 47 1 | 48 1 | 49 - 2 | 50 + 1 | 50 2 | 51 2 | 52 2 | 53 @@ -538,10 +540,10 @@ select explain_analyze_without_memory('select * from (select * from t order by a Sort Key: t.a, t.b Presorted Key: t.a Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: avg=NNkB peak=NNkB - -> Sort (actual rows=100 loops=1) + -> Sort (actual rows=101 loops=1) Sort Key: t.a Sort Method: quicksort Memory: NNkB - -> Seq Scan on t (actual rows=100 loops=1) + -> Seq Scan on t (actual rows=1000 loops=1) (9 rows) select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * from (select * from t order by a) s order by a, b limit 55')); @@ -584,7 +586,8 @@ select explain_analyze_inc_sort_nodes_verify_invariants('select * from (select * delete from t; -- An initial small group followed by a large group. -insert into t(a, b) select (case when i < 5 then i else 9 end), i from generate_series(1, 100) n(i); +insert into t(a, b) select (case when i < 5 then i else 9 end), i from generate_series(1, 1000) n(i); +analyze; explain (costs off) select * from (select * from t order by a) s order by a, b limit 70; QUERY PLAN - @@ -705,17 +708,17 @@ select * from t left join (select * from (select * from t order by a) v order by rollback; -- Test EXPLAIN ANALYZE with both fullsort and presorted groups. select explain_analyze_without_memory('select * from (select * from t order by a) s order by a, b limit 70'); - explain_analyze_without_memory
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra wrote: > > On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: > >On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: > >>hyrax is not too happy with this test: > >> > >>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 > >> > >>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking > >>EXPLAIN output, but it evidently is. > >> > > > >Thanks, I'll investigate. It's not clear to me either what might be > >causing this, but I guess something must have gone wrong in > >estimation/planning. > > > > OK, I know what's going on - it's a rather embarassing issue in the > regression test. There's no analyze on the test tables, so it uses > default estimates for number of groups etc. But with clobber cache the > test runs long enough for autoanalyze to kick in and collect stats, so > we generate better estimates which changes the plan. > > I'll get this fixed - explicit analyze and tweaking the data a bit > should do the trick. Looking at the tests that failed, I think we should consider just adding: set enable_sort = off; because several of those tests have very specific amounts of data to ensure we test the transition points around the different modes in the incremental sort node. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: hyrax is not too happy with this test: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking EXPLAIN output, but it evidently is. Thanks, I'll investigate. It's not clear to me either what might be causing this, but I guess something must have gone wrong in estimation/planning. OK, I know what's going on - it's a rather embarassing issue in the regression test. There's no analyze on the test tables, so it uses default estimates for number of groups etc. But with clobber cache the test runs long enough for autoanalyze to kick in and collect stats, so we generate better estimates which changes the plan. I'll get this fixed - explicit analyze and tweaking the data a bit should do the trick. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: hyrax is not too happy with this test: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking EXPLAIN output, but it evidently is. Thanks, I'll investigate. It's not clear to me either what might be causing this, but I guess something must have gone wrong in estimation/planning. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
hyrax is not too happy with this test: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax=2020-04-07%2004%3A55%3A15 It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking EXPLAIN output, but it evidently is. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Hi, I've pushed the second part of this patch series, adding incremental sort to additional places in the planner. As explained in the commit message (and somewhere in this thread) we've decided to only update some of the places that require sorted input (and do create_sort). This might be overly cautious, I expect we'll add it to more places in the future. As for the remaining part tweaking add_partial_path to also consider startup cost (a bit confusingly 0001 - 0003 in the v54 patchset), I've decided not to push it now and leave it for v14. The add_partial_path change is simple, but I came to the conclusion that the "precheck" function should be modified to follow the same logic - it would be a bit strange if add_partial_path_precheck considered only total cost and add_partial_path considered both startup and total cost. It would not matter for most places because the add_partial_path_precheck is ony used in join planning, but it's still strange. I could have modified the add_partial_path_precheck too, but looking at add_path_precheck we'd probably need to compute required_outer so that we only compare startup_cost when really useful. Or we might simply consider startup_cost every time and leave that up to add_partial_path, but then that would be another difference in behavior. That seems like way too much stuff to rework on the last day of the last commitfest. It does mean we'll fail to generate the cheapest plan in some cases (e.g. with LIMIT, there's an example in [1]) but that's a pre-existing condition, not something introduced by incremental sort. regards [1] https://www.postgresql.org/message-id/20190720132244.3vgg2uynfpxh3me5%40development -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 7, 2020 at 7:58 PM Tomas Vondra wrote: > > On Tue, Apr 07, 2020 at 07:50:26PM -0400, James Coleman wrote: > >On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra > > wrote: > >> > >> On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote: > >> >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > >> >> I've pushed the fist part of this patch series - I've reorganized it a > >> > > >> >I scanned through this again post-commit. Find attached some suggestions. > >> > > >> > >> Thanks. The typo fixes seem clear, except for this bit: > >> > >>* If we've set up either of the sort states yet, we need to reset them. > >>* We could end them and null out the pointers, but there's no reason to > >>* repay the setup cost, and because guard setting up pivot > >> comparator > >>* state similarly, doing so might actually cause a leak. > >> > >> I can't figure out what should be. James, do you recall what this > >> should be? > > > >Yep, it's ExecIncrementalSort. If you look for the block guarded by > >`if (fullsort_state == NULL)` you'll see the call to > >preparePresortedCols(), which sets up the pivot comparator state > >referenced by this comment. > > > > OK, so it should be "... and because ExecIncrementalSort guard ..."? Yes, "because ExecIncrementalSort guards presorted column functions by checking to see if the full sort state has been initialized yet, setting the sort states to null here might cause..." (that's more specific IMO than my original "pivot comparator state...doing so"). James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 07, 2020 at 07:50:26PM -0400, James Coleman wrote: On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra wrote: On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote: >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: >> I've pushed the fist part of this patch series - I've reorganized it a > >I scanned through this again post-commit. Find attached some suggestions. > Thanks. The typo fixes seem clear, except for this bit: * If we've set up either of the sort states yet, we need to reset them. * We could end them and null out the pointers, but there's no reason to * repay the setup cost, and because guard setting up pivot comparator * state similarly, doing so might actually cause a leak. I can't figure out what should be. James, do you recall what this should be? Yep, it's ExecIncrementalSort. If you look for the block guarded by `if (fullsort_state == NULL)` you'll see the call to preparePresortedCols(), which sets up the pivot comparator state referenced by this comment. OK, so it should be "... and because ExecIncrementalSort guard ..."? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 7, 2020 at 7:02 PM Tomas Vondra wrote: > > On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote: > >On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > >> I've pushed the fist part of this patch series - I've reorganized it a > > > >I scanned through this again post-commit. Find attached some suggestions. > > > > Thanks. The typo fixes seem clear, except for this bit: > >* If we've set up either of the sort states yet, we need to reset them. >* We could end them and null out the pointers, but there's no reason to >* repay the setup cost, and because guard setting up pivot comparator >* state similarly, doing so might actually cause a leak. > > I can't figure out what should be. James, do you recall what this > should be? Yep, it's ExecIncrementalSort. If you look for the block guarded by `if (fullsort_state == NULL)` you'll see the call to preparePresortedCols(), which sets up the pivot comparator state referenced by this comment. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 11:25:21PM -0500, Justin Pryzby wrote: On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: I've pushed the fist part of this patch series - I've reorganized it a I scanned through this again post-commit. Find attached some suggestions. Thanks. The typo fixes seem clear, except for this bit: * If we've set up either of the sort states yet, we need to reset them. * We could end them and null out the pointers, but there's no reason to * repay the setup cost, and because guard setting up pivot comparator * state similarly, doing so might actually cause a leak. I can't figure out what should be. James, do you recall what this should be? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby wrote: > > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > > > I've pushed the fist part of this patch series - I've reorganized it a > > > > I scanned through this again post-commit. Find attached some suggestions. > > > > Shouldn't non-text explain output always show both disk *and* mem, including > > zeros ? > > Could you give more context on this? Is there a standard to follow? > Regular sort nodes only ever report one type, so there's not a good > parallel there. Right, I'm not sure either, since it seems to be a new case. Maybe Tomas has a strong intuition. See at least the commit messages here: 3ec20c7091e97a554e7447ac2b7f4ed795631395 7d91b604d9b5d6ec8c19c57a9ffd2f27129cdd94 8ebb69f85445177575684a0ba5cfedda8d840a91 Maybe this one suggests that it should *not* be present unconditionally, but only when that sort type is used? 4b234fd8bf21cd6f5ff44f1f1c613bf40860998d Another thought: is checking if bytes>0 really a good way to determine if a sort type was used ? It seems like checking a bit or a pointer would be better. I guess a size of 0 is unlikely, and it's ok at least in text mode. if (groupInfo->maxMemorySpaceUsed > 0) On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: > > And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted ... > I read through that subthread, and the ending seemed to be Peter > wanting things to be unified. Was there a conclusion beyond that? This discussion is ongoing. I think let's wait until that's settled before addressing this more complex and even newer case. We can add "explain, two spaces and equals vs colon" to the "Open items" list if need be - I hope the discussion will not delay the release. -- Justin
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 07, 2020 at 08:40:30AM -0400, James Coleman wrote: On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby wrote: On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > I've pushed the fist part of this patch series - I've reorganized it a I scanned through this again post-commit. Find attached some suggestions. Shouldn't non-text explain output always show both disk *and* mem, including zeros ? Could you give more context on this? Is there a standard to follow? Regular sort nodes only ever report one type, so there's not a good parallel there. Should "Pre-sorted Groups:" be on a separate line ? | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB I'd originally had that, but Tomas wanted it to be more compact. It's easy to adjust though if the consensus changes on that. I'm OK with changing the format if there's a consensus. The current format seemed better to me, but I'm not particularly attached to it. And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted Groups"? I think you should maybe do that instead of the "semicolon separator". I think "two spaces" makes sense, since the units are different, similar to hash buckets and normal sort node. "Buckets: %d Batches: %d Memory Usage: %ldkB\n", appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n", Note, I made a similar comment regarding two spaces for explain(WAL) here: https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com And Peter E seemed to dislike that, here: https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com I read through that subthread, and the ending seemed to be Peter wanting things to be unified. Was there a conclusion beyond that? Yeah, I don't think there was a clear consensus :-( Also, you're showing: ExplainPropertyInteger("Maximum Sort Space Used", "kB", groupInfo->maxMemorySpaceUsed, es); But in show_hash_info() and show_hashagg_info(), and in your own text output, that's called "Peak": ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es); ExplainPropertyInteger("Peak Memory Usage", "kB", spacePeakKb, es); Yes, that's a miss and should be fixed. Will fix. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Apr 7, 2020 at 12:25 AM Justin Pryzby wrote: > > On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > > I've pushed the fist part of this patch series - I've reorganized it a > > I scanned through this again post-commit. Find attached some suggestions. > > Shouldn't non-text explain output always show both disk *and* mem, including > zeros ? Could you give more context on this? Is there a standard to follow? Regular sort nodes only ever report one type, so there's not a good parallel there. > Should "Pre-sorted Groups:" be on a separate line ? > | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB > Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB I'd originally had that, but Tomas wanted it to be more compact. It's easy to adjust though if the consensus changes on that. > And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted > Groups"? I think you should maybe do that instead of the "semicolon > separator". I think "two spaces" makes sense, since the units are different, > similar to hash buckets and normal sort node. > > "Buckets: %d Batches: %d Memory Usage: %ldkB\n", > appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n", > > Note, I made a similar comment regarding two spaces for explain(WAL) here: > https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com > > And Peter E seemed to dislike that, here: > https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com I read through that subthread, and the ending seemed to be Peter wanting things to be unified. Was there a conclusion beyond that? > Also, you're showing: > ExplainPropertyInteger("Maximum Sort Space Used", "kB", > groupInfo->maxMemorySpaceUsed, es); > > But in show_hash_info() and show_hashagg_info(), and in your own text output, > that's called "Peak": > ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es); > ExplainPropertyInteger("Peak Memory Usage", "kB", > spacePeakKb, es); Yes, that's a miss and should be fixed. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: > I've pushed the fist part of this patch series - I've reorganized it a I scanned through this again post-commit. Find attached some suggestions. Shouldn't non-text explain output always show both disk *and* mem, including zeros ? Should "Pre-sorted Groups:" be on a separate line ? | Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB Pre-sorted Groups: 1 Sort Method: quicksort Memory: avg=30kB peak=30kB And, should it use two spaces before "Sort Method", "Memory" and "Pre-sorted Groups"? I think you should maybe do that instead of the "semicolon separator". I think "two spaces" makes sense, since the units are different, similar to hash buckets and normal sort node. "Buckets: %d Batches: %d Memory Usage: %ldkB\n", appendStringInfo(es->str, "Sort Method: %s %s: %ldkB\n", Note, I made a similar comment regarding two spaces for explain(WAL) here: https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com And Peter E seemed to dislike that, here: https://www.postgresql.org/message-id/ef8c966f-e50a-c583-7b1e-85de6f4ca0d3%402ndquadrant.com Also, you're showing: ExplainPropertyInteger("Maximum Sort Space Used", "kB", groupInfo->maxMemorySpaceUsed, es); But in show_hash_info() and show_hashagg_info(), and in your own text output, that's called "Peak": ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es); ExplainPropertyInteger("Peak Memory Usage", "kB", spacePeakKb, es); -- Justin >From e26f2cc842792fc3c0dd4b4b97c0996d450c6dd7 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 6 Apr 2020 17:37:31 -0500 Subject: [PATCH v1] comment typos and others: Incremental Sort commit d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da Author: Tomas Vondra --- doc/src/sgml/perform.sgml | 2 +- src/backend/commands/explain.c | 17 +++--- src/backend/executor/nodeIncrementalSort.c | 26 +++--- src/backend/utils/sort/tuplesort.c | 10 - src/include/utils/tuplesort.h | 2 +- 5 files changed, 28 insertions(+), 29 deletions(-) diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 0dfc3e80e2..f448abd073 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -311,7 +311,7 @@ EXPLAIN SELECT * FROM tenk1 ORDER BY unique1; -> Seq Scan on tenk1 (cost=0.00..445.00 rows=1 width=244) -If the a part of the plan guarantess an ordering on a prefix of the +If a part of the plan guarantees an ordering on a prefix of the required sort keys, then the planner may instead decide to use an incremental sort step: diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index baaa5817af..ca4fe3307d 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2532,7 +2532,7 @@ show_sort_group_keys(PlanState *planstate, const char *qlabel, ExplainPropertyList(qlabel, result, es); if (nPresortedKeys > 0) - ExplainPropertyList("Presorted Key", resultPresorted, es); + ExplainPropertyList("Pre-sorted Key", resultPresorted, es); } /* @@ -2829,7 +2829,6 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, ExplainPropertyList("Sort Methods Used", methodNames, es); - if (groupInfo->maxMemorySpaceUsed > 0) { long avgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount; const char *spaceTypeName; @@ -2841,12 +2840,12 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, ExplainOpenGroup("Sort Space", memoryName.data, true, es); ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es); - ExplainPropertyInteger("Maximum Sort Space Used", "kB", + ExplainPropertyInteger("Peak Sort Space Used", "kB", groupInfo->maxMemorySpaceUsed, es); ExplainCloseGroup("Sort Spaces", memoryName.data, true, es); } - if (groupInfo->maxDiskSpaceUsed > 0) + { long avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount; const char *spaceTypeName; @@ -2858,7 +2857,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, ExplainOpenGroup("Sort Space", diskName.data, true, es); ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es); - ExplainPropertyInteger("Maximum Sort Space Used", "kB", + ExplainPropertyInteger("Peak Sort Space Used", "kB", groupInfo->maxDiskSpaceUsed, es); ExplainCloseGroup("Sort Spaces", diskName.data, true, es); @@ -2869,7 +2868,7 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, } /* - * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node + * If it's EXPLAIN ANALYZE, show tuplesort stats for an incremental sort node */ static void
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 11:00:37PM -0400, Tom Lane wrote: Tomas Vondra writes: I came to the same conclusion (that the change in TuplesortMethod definiton is the culprit) a while ago and was about to push a fix that initialized it correctly in ExecSortInitializeDSM. But I agree reverting it back to the old definition is probably better. Yeah, for the moment. James would like to not have SORT_TYPE_STILL_IN_PROGRESS be part of the enum at all, I think, and I can see his point --- but then we need some out-of-band representation of "worker not done", so I'm not sure there'll be any net reduction of cruft. Anyway that can be dealt with after we have a stable buildfarm. Agreed. Note also that there's a separate comment-only patch in that shouldn't be forgotten about. OK, I'll take care of that tomorrow. I have two smaller patches to commit in the incremental sort patchset, so I'll add it to that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > I came to the same conclusion (that the change in TuplesortMethod > definiton is the culprit) a while ago and was about to push a fix that > initialized it correctly in ExecSortInitializeDSM. But I agree reverting > it back to the old definition is probably better. Yeah, for the moment. James would like to not have SORT_TYPE_STILL_IN_PROGRESS be part of the enum at all, I think, and I can see his point --- but then we need some out-of-band representation of "worker not done", so I'm not sure there'll be any net reduction of cruft. Anyway that can be dealt with after we have a stable buildfarm. Note also that there's a separate comment-only patch in that shouldn't be forgotten about. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 10:19:41PM -0400, Tom Lane wrote: James Coleman writes: Fair enough. Unsure if Tomas is still online to comment and/or push, but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an initial fix. I'm guessing he went to bed, so I'll push a fix in a moment. The patch has survived enough test cycles here now to make me moderately confident that it fixes the issue. Nope, how I could I sleep with half of the buildfarm still red? I came to the same conclusion (that the change in TuplesortMethod definiton is the culprit) a while ago and was about to push a fix that initialized it correctly in ExecSortInitializeDSM. But I agree reverting it back to the old definition is probably better. Thanks! -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > Fair enough. Unsure if Tomas is still online to comment and/or push, > but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an > initial fix. I'm guessing he went to bed, so I'll push a fix in a moment. The patch has survived enough test cycles here now to make me moderately confident that it fixes the issue. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 10:09 PM Tom Lane wrote: > > James Coleman writes: > > On Mon, Apr 6, 2020 at 9:46 PM Tom Lane wrote: > >> I think the correct fix is to change the enum declaration. > > > Hmm. I don't actually really like that, because it means the value > > here isn't actually semantically correct. That is, the sort type is > > not "in progress"; it's "we never started a sort at all". > > Well, yeah, but that pre-dated this patch, and right now is no > time to improve it; we can debate such fine points at more leisure > once the buildfarm isn't broken. Fair enough. Unsure if Tomas is still online to comment and/or push, but reverting SORT_TYPE_STILL_IN_PROGRESS back to 0 works for me as an initial fix. > Obviously the comment needs fixed... The one in show_short_info? I can work on that (and the other proposed cleanup above) with Tomas tomorrow or later. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > On Mon, Apr 6, 2020 at 9:46 PM Tom Lane wrote: >> I think the correct fix is to change the enum declaration. > Hmm. I don't actually really like that, because it means the value > here isn't actually semantically correct. That is, the sort type is > not "in progress"; it's "we never started a sort at all". Well, yeah, but that pre-dated this patch, and right now is no time to improve it; we can debate such fine points at more leisure once the buildfarm isn't broken. Obviously the comment needs fixed... regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 9:46 PM Tom Lane wrote: > > Tomas Vondra writes: > > I don't know, I've tried running the tests on a number of machines, > > similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked > > everywhere while the failures seem consistent. > > On my machine, it reproduces about one time in six with > force_parallel_mode = regress. It seems possible given your > results that reducing max_parallel_workers would make it more > likely, but I've not tried that. > > What I'm seeing, after adding some debug printouts, is that sortMethod is > frequently zero when we reach the EXPLAIN output for a worker. In many of > the tests this happens even though there is no visible failure, because > we've got a filter function hiding the output :-( > > So I concur with James' conclusion that the existing code is relying on > sortMethod initializing to zeroes, and that we did the wrong thing by > trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation. > I do not like his patch though, particularly not the type pun with NULL. Sentinel and NULL? I hadn't caught that at all. > I think the correct fix is to change the enum declaration. Hmm. I don't actually really like that, because it means the value here isn't actually semantically correct. That is, the sort type is not "in progress"; it's "we never started a sort at all". I don't really love the conflating of those things that the old enum declaration had (even it'd had a helpful comment). It seems to me that we should make "we don't have a type" and "we have a type" distinct. We could add a new enum value SORT_TYPE_UNINITIALIZED or similar though. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > I don't know, I've tried running the tests on a number of machines, > similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked > everywhere while the failures seem consistent. On my machine, it reproduces about one time in six with force_parallel_mode = regress. It seems possible given your results that reducing max_parallel_workers would make it more likely, but I've not tried that. What I'm seeing, after adding some debug printouts, is that sortMethod is frequently zero when we reach the EXPLAIN output for a worker. In many of the tests this happens even though there is no visible failure, because we've got a filter function hiding the output :-( So I concur with James' conclusion that the existing code is relying on sortMethod initializing to zeroes, and that we did the wrong thing by trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation. I do not like his patch though, particularly not the type pun with NULL. I think the correct fix is to change the enum declaration. I know it's darn late where you are, do you want me to change it? regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 08:42:13PM -0400, Tom Lane wrote: Tomas Vondra writes: It doesn't seem to be particularly platform-specific, but I've been unable to reproduce it so far. It seems on older gcc versions, though. It's looking kind of like an uninitialized-memory problem. Note the latest from spurfowl, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl=2020-04-07%2000%3A15%3A05 which got through "make check" and then failed during pg_upgrade's repetition of the test. Similarly on rhinoceros. So there's definitely instability there even on one machine. Perhaps something to do with unexpected cache flushes?? I don't know, I've tried running the tests on a number of machines, similar to those failing. Rapsberry Pi, Fedora 31, ... and it worked everywhere while the failures seem consistent. I've been able to reproduce these failures (same symptoms) by making sure the worker (implied by force_parallel_mode=regress) won't start. set max_parallel_workers = 0; set force_parallel_mode = regress; triggers exactly those failures for me (at least during make check, I haven't tried pg_upgrade tests etc.). So my theory is that we fail to start parallel workers on those machines. It's not clear to me why would it be limited to some machines and why would it be correlated to the incremental sort? I don't think those machines have lower number of parallel workers, no? But maybe incremental sort allowed using more parallel queries for more queries, and we simply run out of parallel workers that way? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 07:27:19PM -0400, James Coleman wrote: On Mon, Apr 6, 2020 at 7:09 PM James Coleman wrote: On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra wrote: > > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > > wrote: > >> > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >> >>On 2020-Apr-06, Tom Lane wrote: > >> >> > >> >>>Locally, things pass without force_parallel_mode, but turning it on > >> >>>produces failures that look similar to rhinoceros's (didn't examine > >> >>>other BF members). > >> >> > >> >>FWIW I looked at the eight failures there were about fifteen minutes ago > >> >>and they were all identical. I can confirm that, in my laptop, the > >> >>tests work without that GUC, and fail in exactly that way with it. > >> >> > >> > > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too > >> >soon. I'll push a fix in a minute. > >> > > >> > >> OK, I've pushed a fix - this should make the buildfarm happy again. > >> > >> It however seems to me a bit more needs to be done. The fix makes > >> show_incremental_sort_info closer to show_sort_info, but not entirely > >> because IncrementalSortState does not have sort_Done flag so it still > >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > >> before, but not having that flag seems a bit weird to me. > >> > >> It also seems possibly incorrect - we may end up with > >> > >>fullsortGroupInfo->groupCount == 0 > >>prefixsortGroupInfo->groupCount > 0 > >> > >> but we won't print anything. > > > >This shouldn't ever be possible, because the only way we get any > >prefix groups at all is if we've already sorted a full sort group > >during the mode transition. > > > >> James, any opinion on this? I'd say we should restore the sort_Done flag > >> and make it work as in plain Sort. Or some comment explaining why > >> depending on the counts is OK (assuming it is). > > > >There's previous email traffic on this thread about that (I can look > >it up later this evening), but the short of it is that I believe that > >relying on the group count is actually more correct than a sort_Done > >flag in the case of incremental sort (in contrast to regular sort). > > > > OK. Maybe we should add a comment to explain.c saying it's OK. > > I've pushed a fix for failures due to different planned workers (in the > test I added to show changes due to add_partial_path tweaks). > > It seems we're not out of the woods yet, though. rhinoceros and > sidewinder failed with something like this: > > Sort Method: quicksort Memory: NNkB > + Sort Method: unknown Disk: NNkB > > Would you mind investigating at it? I assume that means those build farm members run with very low work_mem? Is it an acceptable fix to adjust work_mem up a bit just for these tests? Or is that bad practice and these are to expose issues with changing into disk sort mode? On rhinoceros I see: == pgsql.build/src/test/regress/regression.diffs === diff -U3 /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out --- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out 2020-03-14 10:37:49.156761104 -0700 +++ /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out 2020-04-06 16:01:13.766798059 -0700 @@ -1328,8 +1328,9 @@ -> Sort (actual rows=3 loops=1) Sort Key: sq_limit.c1, sq_limit.pk Sort Method: top-N heapsort Memory: xxx + Sort Method: unknown Disk: 0kB -> Seq Scan on sq_limit (actual rows=8 loops=1) -(6 rows) +(7 rows) Same on sidewinder. Given the 0kB I'm not sure this is *just* a work_mem thing, though that's still something I'm curious to know about, and it's still part of the "problem" here. I'm investigating further. I don't see how could this be caused by low work_mem value, really. What I think it happening is that when executed with force_parallel_mode = regress we run this as if in a parallel worker, but then it gets somehow confused and either (leader + worker) or two worker stats, or something like that. I don't know why it's happening, though, or why would it be triggered by the incremental sort patch ... Actually, I just managed to trigger exactly this - the trick is that we plan for certain number of workers, but then fail to start some. So for example like this: create table t (a int, b int, c int); insert into t select mod(i,10), mod(i,10), i from generate_series(1,100) s(i); set force_parallel_mode = regress; set max_parallel_workers = 0; explain (costs off, analyze) select * from t order by a,b;
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 7:31 PM Tomas Vondra wrote: > > On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote: > >On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra > > wrote: > >> > >> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: > >> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > >> > wrote: > >> >> > >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > >> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >> >> >>On 2020-Apr-06, Tom Lane wrote: > >> >> >> > >> >> >>>Locally, things pass without force_parallel_mode, but turning it on > >> >> >>>produces failures that look similar to rhinoceros's (didn't examine > >> >> >>>other BF members). > >> >> >> > >> >> >>FWIW I looked at the eight failures there were about fifteen minutes > >> >> >>ago > >> >> >>and they were all identical. I can confirm that, in my laptop, the > >> >> >>tests work without that GUC, and fail in exactly that way with it. > >> >> >> > >> >> > > >> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns > >> >> >too > >> >> >soon. I'll push a fix in a minute. > >> >> > > >> >> > >> >> OK, I've pushed a fix - this should make the buildfarm happy again. > >> >> > >> >> It however seems to me a bit more needs to be done. The fix makes > >> >> show_incremental_sort_info closer to show_sort_info, but not entirely > >> >> because IncrementalSortState does not have sort_Done flag so it still > >> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > >> >> before, but not having that flag seems a bit weird to me. > >> >> > >> >> It also seems possibly incorrect - we may end up with > >> >> > >> >>fullsortGroupInfo->groupCount == 0 > >> >>prefixsortGroupInfo->groupCount > 0 > >> >> > >> >> but we won't print anything. > >> > > >> >This shouldn't ever be possible, because the only way we get any > >> >prefix groups at all is if we've already sorted a full sort group > >> >during the mode transition. > >> > > >> >> James, any opinion on this? I'd say we should restore the sort_Done flag > >> >> and make it work as in plain Sort. Or some comment explaining why > >> >> depending on the counts is OK (assuming it is). > >> > > >> >There's previous email traffic on this thread about that (I can look > >> >it up later this evening), but the short of it is that I believe that > >> >relying on the group count is actually more correct than a sort_Done > >> >flag in the case of incremental sort (in contrast to regular sort). > >> > > >> > >> OK. Maybe we should add a comment to explain.c saying it's OK. > >> > >> I've pushed a fix for failures due to different planned workers (in the > >> test I added to show changes due to add_partial_path tweaks). > >> > >> It seems we're not out of the woods yet, though. rhinoceros and > >> sidewinder failed with something like this: > >> > >> Sort Method: quicksort Memory: NNkB > >> + Sort Method: unknown Disk: NNkB > >> > >> Would you mind investigating at it? > > > >I assume that means those build farm members run with very low > >work_mem? Is it an acceptable fix to adjust work_mem up a bit just for > >these tests? Or is that bad practice and these are to expose issues > >with changing into disk sort mode? > > > > I don't think so - I don't see any work_mem changes in the config - see > the extra_config at the beginning of the page with details: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2020-04-06%2023%3A00%3A16 > > Moreover, this seems to be in regular Sort, not Incremental Sort and it > very much seems like it gets confused to print a worker info because the > only way for Sort to print two "Sort Method" lines seems to be to enter > either both > >if (sortstate->sort_Done && sortstate->tuplesortstate != NULL) >{ > ... print leader info ... >} > >and > >if (sortstate->shared_info != NULL) >{ > for (n = 0; n < sortstate->shared_info->num_workers; n++) > { >... print worker info ... > } >} > > or maybe there are two workers? It's strange ... > > > It doesn't seem to be particularly platform-specific, but I've been > unable to reproduce it so far. It seems on older gcc versions, though. I haven't been able to reproduce it, but I'm 99% confident this will fix it: -if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS) +if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS +|| sinstrument->sortMethod == NULL) continue; /* ignore any unfilled slots */ Earlier we'd had this discussion about why SORT_TYPE_STILL_IN_PROGRESS was explicitly set to 0 in the enum declaration. Since there was no comment, we changed that, but here I believe that show_sort_info was relying on that as an indicator that a worker didn't actually do any work (since the DSM for the sort node gets set to all zeros, this would work). I'm not sure if the
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > It doesn't seem to be particularly platform-specific, but I've been > unable to reproduce it so far. It seems on older gcc versions, though. It's looking kind of like an uninitialized-memory problem. Note the latest from spurfowl, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl=2020-04-07%2000%3A15%3A05 which got through "make check" and then failed during pg_upgrade's repetition of the test. Similarly on rhinoceros. So there's definitely instability there even on one machine. Perhaps something to do with unexpected cache flushes?? regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote: On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra wrote: On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > wrote: >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: >> >>On 2020-Apr-06, Tom Lane wrote: >> >> >> >>>Locally, things pass without force_parallel_mode, but turning it on >> >>>produces failures that look similar to rhinoceros's (didn't examine >> >>>other BF members). >> >> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago >> >>and they were all identical. I can confirm that, in my laptop, the >> >>tests work without that GUC, and fail in exactly that way with it. >> >> >> > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too >> >soon. I'll push a fix in a minute. >> > >> >> OK, I've pushed a fix - this should make the buildfarm happy again. >> >> It however seems to me a bit more needs to be done. The fix makes >> show_incremental_sort_info closer to show_sort_info, but not entirely >> because IncrementalSortState does not have sort_Done flag so it still >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that >> before, but not having that flag seems a bit weird to me. >> >> It also seems possibly incorrect - we may end up with >> >>fullsortGroupInfo->groupCount == 0 >>prefixsortGroupInfo->groupCount > 0 >> >> but we won't print anything. > >This shouldn't ever be possible, because the only way we get any >prefix groups at all is if we've already sorted a full sort group >during the mode transition. > >> James, any opinion on this? I'd say we should restore the sort_Done flag >> and make it work as in plain Sort. Or some comment explaining why >> depending on the counts is OK (assuming it is). > >There's previous email traffic on this thread about that (I can look >it up later this evening), but the short of it is that I believe that >relying on the group count is actually more correct than a sort_Done >flag in the case of incremental sort (in contrast to regular sort). > OK. Maybe we should add a comment to explain.c saying it's OK. I've pushed a fix for failures due to different planned workers (in the test I added to show changes due to add_partial_path tweaks). It seems we're not out of the woods yet, though. rhinoceros and sidewinder failed with something like this: Sort Method: quicksort Memory: NNkB + Sort Method: unknown Disk: NNkB Would you mind investigating at it? I assume that means those build farm members run with very low work_mem? Is it an acceptable fix to adjust work_mem up a bit just for these tests? Or is that bad practice and these are to expose issues with changing into disk sort mode? I don't think so - I don't see any work_mem changes in the config - see the extra_config at the beginning of the page with details: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2020-04-06%2023%3A00%3A16 Moreover, this seems to be in regular Sort, not Incremental Sort and it very much seems like it gets confused to print a worker info because the only way for Sort to print two "Sort Method" lines seems to be to enter either both if (sortstate->sort_Done && sortstate->tuplesortstate != NULL) { ... print leader info ... } and if (sortstate->shared_info != NULL) { for (n = 0; n < sortstate->shared_info->num_workers; n++) { ... print worker info ... } } or maybe there are two workers? It's strange ... It doesn't seem to be particularly platform-specific, but I've been unable to reproduce it so far. It seems on older gcc versions, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 7:09 PM James Coleman wrote: > > On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra > wrote: > > > > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: > > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > > > wrote: > > >> > > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > > >> >>On 2020-Apr-06, Tom Lane wrote: > > >> >> > > >> >>>Locally, things pass without force_parallel_mode, but turning it on > > >> >>>produces failures that look similar to rhinoceros's (didn't examine > > >> >>>other BF members). > > >> >> > > >> >>FWIW I looked at the eight failures there were about fifteen minutes > > >> >>ago > > >> >>and they were all identical. I can confirm that, in my laptop, the > > >> >>tests work without that GUC, and fail in exactly that way with it. > > >> >> > > >> > > > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too > > >> >soon. I'll push a fix in a minute. > > >> > > > >> > > >> OK, I've pushed a fix - this should make the buildfarm happy again. > > >> > > >> It however seems to me a bit more needs to be done. The fix makes > > >> show_incremental_sort_info closer to show_sort_info, but not entirely > > >> because IncrementalSortState does not have sort_Done flag so it still > > >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > > >> before, but not having that flag seems a bit weird to me. > > >> > > >> It also seems possibly incorrect - we may end up with > > >> > > >>fullsortGroupInfo->groupCount == 0 > > >>prefixsortGroupInfo->groupCount > 0 > > >> > > >> but we won't print anything. > > > > > >This shouldn't ever be possible, because the only way we get any > > >prefix groups at all is if we've already sorted a full sort group > > >during the mode transition. > > > > > >> James, any opinion on this? I'd say we should restore the sort_Done flag > > >> and make it work as in plain Sort. Or some comment explaining why > > >> depending on the counts is OK (assuming it is). > > > > > >There's previous email traffic on this thread about that (I can look > > >it up later this evening), but the short of it is that I believe that > > >relying on the group count is actually more correct than a sort_Done > > >flag in the case of incremental sort (in contrast to regular sort). > > > > > > > OK. Maybe we should add a comment to explain.c saying it's OK. > > > > I've pushed a fix for failures due to different planned workers (in the > > test I added to show changes due to add_partial_path tweaks). > > > > It seems we're not out of the woods yet, though. rhinoceros and > > sidewinder failed with something like this: > > > > Sort Method: quicksort Memory: NNkB > > + Sort Method: unknown Disk: NNkB > > > > Would you mind investigating at it? > > I assume that means those build farm members run with very low > work_mem? Is it an acceptable fix to adjust work_mem up a bit just for > these tests? Or is that bad practice and these are to expose issues > with changing into disk sort mode? On rhinoceros I see: == pgsql.build/src/test/regress/regression.diffs === diff -U3 /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out --- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out 2020-03-14 10:37:49.156761104 -0700 +++ /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out 2020-04-06 16:01:13.766798059 -0700 @@ -1328,8 +1328,9 @@ -> Sort (actual rows=3 loops=1) Sort Key: sq_limit.c1, sq_limit.pk Sort Method: top-N heapsort Memory: xxx + Sort Method: unknown Disk: 0kB -> Seq Scan on sq_limit (actual rows=8 loops=1) -(6 rows) +(7 rows) Same on sidewinder. Given the 0kB I'm not sure this is *just* a work_mem thing, though that's still something I'm curious to know about, and it's still part of the "problem" here. I'm investigating further. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra wrote: > > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > > wrote: > >> > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >> >>On 2020-Apr-06, Tom Lane wrote: > >> >> > >> >>>Locally, things pass without force_parallel_mode, but turning it on > >> >>>produces failures that look similar to rhinoceros's (didn't examine > >> >>>other BF members). > >> >> > >> >>FWIW I looked at the eight failures there were about fifteen minutes ago > >> >>and they were all identical. I can confirm that, in my laptop, the > >> >>tests work without that GUC, and fail in exactly that way with it. > >> >> > >> > > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too > >> >soon. I'll push a fix in a minute. > >> > > >> > >> OK, I've pushed a fix - this should make the buildfarm happy again. > >> > >> It however seems to me a bit more needs to be done. The fix makes > >> show_incremental_sort_info closer to show_sort_info, but not entirely > >> because IncrementalSortState does not have sort_Done flag so it still > >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > >> before, but not having that flag seems a bit weird to me. > >> > >> It also seems possibly incorrect - we may end up with > >> > >>fullsortGroupInfo->groupCount == 0 > >>prefixsortGroupInfo->groupCount > 0 > >> > >> but we won't print anything. > > > >This shouldn't ever be possible, because the only way we get any > >prefix groups at all is if we've already sorted a full sort group > >during the mode transition. > > > >> James, any opinion on this? I'd say we should restore the sort_Done flag > >> and make it work as in plain Sort. Or some comment explaining why > >> depending on the counts is OK (assuming it is). > > > >There's previous email traffic on this thread about that (I can look > >it up later this evening), but the short of it is that I believe that > >relying on the group count is actually more correct than a sort_Done > >flag in the case of incremental sort (in contrast to regular sort). > > > > OK. Maybe we should add a comment to explain.c saying it's OK. > > I've pushed a fix for failures due to different planned workers (in the > test I added to show changes due to add_partial_path tweaks). > > It seems we're not out of the woods yet, though. rhinoceros and > sidewinder failed with something like this: > > Sort Method: quicksort Memory: NNkB > + Sort Method: unknown Disk: NNkB > > Would you mind investigating at it? I assume that means those build farm members run with very low work_mem? Is it an acceptable fix to adjust work_mem up a bit just for these tests? Or is that bad practice and these are to expose issues with changing into disk sort mode? James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 06:34:04PM -0400, Tom Lane wrote: Tomas Vondra writes: On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote: Well, it's *less* unhappy. thorntail is showing that the number of workers field is not stable; that will need to be masked. Yeah, I've already pushed a fix for that. But there seems to be another failure in th explain output. Looking. I'm kind of unimpressed with that fix, because it guarantees that if there's any problem with more than 2 workers, this test will never find it. I think you should do what I said and arrange to replace the number-of-workers output with "N". I can do that, but isn't this how every other regression test does it? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote: >> Well, it's *less* unhappy. thorntail is showing that the number of >> workers field is not stable; that will need to be masked. > Yeah, I've already pushed a fix for that. But there seems to be another > failure in th explain output. Looking. I'm kind of unimpressed with that fix, because it guarantees that if there's any problem with more than 2 workers, this test will never find it. I think you should do what I said and arrange to replace the number-of-workers output with "N". regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 05:51:43PM -0400, Tom Lane wrote: Tomas Vondra writes: OK, I've pushed a fix - this should make the buildfarm happy again. Well, it's *less* unhappy. thorntail is showing that the number of workers field is not stable; that will need to be masked. Yeah, I've already pushed a fix for that. But there seems to be another failure in th explain output. Looking. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra wrote: On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: >>On 2020-Apr-06, Tom Lane wrote: >> >>>Locally, things pass without force_parallel_mode, but turning it on >>>produces failures that look similar to rhinoceros's (didn't examine >>>other BF members). >> >>FWIW I looked at the eight failures there were about fifteen minutes ago >>and they were all identical. I can confirm that, in my laptop, the >>tests work without that GUC, and fail in exactly that way with it. >> > >Yes, there's a thinko in show_incremental_sort_info() and it returns too >soon. I'll push a fix in a minute. > OK, I've pushed a fix - this should make the buildfarm happy again. It however seems to me a bit more needs to be done. The fix makes show_incremental_sort_info closer to show_sort_info, but not entirely because IncrementalSortState does not have sort_Done flag so it still depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that before, but not having that flag seems a bit weird to me. It also seems possibly incorrect - we may end up with fullsortGroupInfo->groupCount == 0 prefixsortGroupInfo->groupCount > 0 but we won't print anything. This shouldn't ever be possible, because the only way we get any prefix groups at all is if we've already sorted a full sort group during the mode transition. James, any opinion on this? I'd say we should restore the sort_Done flag and make it work as in plain Sort. Or some comment explaining why depending on the counts is OK (assuming it is). There's previous email traffic on this thread about that (I can look it up later this evening), but the short of it is that I believe that relying on the group count is actually more correct than a sort_Done flag in the case of incremental sort (in contrast to regular sort). OK. Maybe we should add a comment to explain.c saying it's OK. I've pushed a fix for failures due to different planned workers (in the test I added to show changes due to add_partial_path tweaks). It seems we're not out of the woods yet, though. rhinoceros and sidewinder failed with something like this: Sort Method: quicksort Memory: NNkB + Sort Method: unknown Disk: NNkB Would you mind investigating at it? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > OK, I've pushed a fix - this should make the buildfarm happy again. Well, it's *less* unhappy. thorntail is showing that the number of workers field is not stable; that will need to be masked. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra wrote: > > On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >>On 2020-Apr-06, Tom Lane wrote: > >> > >>>Locally, things pass without force_parallel_mode, but turning it on > >>>produces failures that look similar to rhinoceros's (didn't examine > >>>other BF members). > >> > >>FWIW I looked at the eight failures there were about fifteen minutes ago > >>and they were all identical. I can confirm that, in my laptop, the > >>tests work without that GUC, and fail in exactly that way with it. > >> > > > >Yes, there's a thinko in show_incremental_sort_info() and it returns too > >soon. I'll push a fix in a minute. > > > > OK, I've pushed a fix - this should make the buildfarm happy again. > > It however seems to me a bit more needs to be done. The fix makes > show_incremental_sort_info closer to show_sort_info, but not entirely > because IncrementalSortState does not have sort_Done flag so it still > depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > before, but not having that flag seems a bit weird to me. > > It also seems possibly incorrect - we may end up with > >fullsortGroupInfo->groupCount == 0 >prefixsortGroupInfo->groupCount > 0 > > but we won't print anything. This shouldn't ever be possible, because the only way we get any prefix groups at all is if we've already sorted a full sort group during the mode transition. > James, any opinion on this? I'd say we should restore the sort_Done flag > and make it work as in plain Sort. Or some comment explaining why > depending on the counts is OK (assuming it is). There's previous email traffic on this thread about that (I can look it up later this evening), but the short of it is that I believe that relying on the group count is actually more correct than a sort_Done flag in the case of incremental sort (in contrast to regular sort). James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: On 2020-Apr-06, Tom Lane wrote: Locally, things pass without force_parallel_mode, but turning it on produces failures that look similar to rhinoceros's (didn't examine other BF members). FWIW I looked at the eight failures there were about fifteen minutes ago and they were all identical. I can confirm that, in my laptop, the tests work without that GUC, and fail in exactly that way with it. Yes, there's a thinko in show_incremental_sort_info() and it returns too soon. I'll push a fix in a minute. OK, I've pushed a fix - this should make the buildfarm happy again. It however seems to me a bit more needs to be done. The fix makes show_incremental_sort_info closer to show_sort_info, but not entirely because IncrementalSortState does not have sort_Done flag so it still depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that before, but not having that flag seems a bit weird to me. It also seems possibly incorrect - we may end up with fullsortGroupInfo->groupCount == 0 prefixsortGroupInfo->groupCount > 0 but we won't print anything. James, any opinion on this? I'd say we should restore the sort_Done flag and make it work as in plain Sort. Or some comment explaining why depending on the counts is OK (assuming it is). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 5:22 PM James Coleman wrote: > > On Mon, Apr 6, 2020 at 5:20 PM James Coleman wrote: > > > > On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra > > wrote: > > > > > > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > > > >On 2020-Apr-06, Tom Lane wrote: > > > > > > > >> Locally, things pass without force_parallel_mode, but turning it on > > > >> produces failures that look similar to rhinoceros's (didn't examine > > > >> other BF members). > > > > > > > >FWIW I looked at the eight failures there were about fifteen minutes ago > > > >and they were all identical. I can confirm that, in my laptop, the > > > >tests work without that GUC, and fail in exactly that way with it. > > > > > > > > > > Yes, there's a thinko in show_incremental_sort_info() and it returns too > > > soon. I'll push a fix in a minute. > > > > I'm stepping through this in a debugger; is what you're considering > > that the for loop through the workers is off by one? > > Oh, nevermind, misread that. > > Looks like if the leader doesn't participate, then we don't show > details for workers. > > Tomas: Do you already have a patch? If not, I can work one up. Well, already have it, so I'll send it just in case. James diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 62c86ecdc5..c31f3e0987 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2880,19 +2880,22 @@ show_incremental_sort_info(IncrementalSortState *incrsortstate, fullsortGroupInfo = >incsort_info.fullsortGroupInfo; - if (!(es->analyze && fullsortGroupInfo->groupCount > 0)) + if (!es->analyze) return; - show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es); - prefixsortGroupInfo = >incsort_info.prefixsortGroupInfo; - if (prefixsortGroupInfo->groupCount > 0) + if (fullsortGroupInfo->groupCount > 0) { + show_incremental_sort_group_info(fullsortGroupInfo, "Full-sort", true, es); + prefixsortGroupInfo = >incsort_info.prefixsortGroupInfo; + if (prefixsortGroupInfo->groupCount > 0) + { + if (es->format == EXPLAIN_FORMAT_TEXT) +appendStringInfo(es->str, " "); + show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es); + } if (es->format == EXPLAIN_FORMAT_TEXT) - appendStringInfo(es->str, " "); - show_incremental_sort_group_info(prefixsortGroupInfo, "Presorted", false, es); + appendStringInfo(es->str, "\n"); } - if (es->format == EXPLAIN_FORMAT_TEXT) - appendStringInfo(es->str, "\n"); if (incrsortstate->shared_info != NULL) { diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql index 3b359efa29..c46443358a 100644 --- a/src/test/regress/sql/incremental_sort.sql +++ b/src/test/regress/sql/incremental_sort.sql @@ -1,3 +1,5 @@ +set force_parallel_mode = regress; + -- When we have to sort the entire table, incremental sort will -- be slower than plain sort, so it should not be used. explain (costs off)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 5:20 PM James Coleman wrote: > > On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra > wrote: > > > > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > > >On 2020-Apr-06, Tom Lane wrote: > > > > > >> Locally, things pass without force_parallel_mode, but turning it on > > >> produces failures that look similar to rhinoceros's (didn't examine > > >> other BF members). > > > > > >FWIW I looked at the eight failures there were about fifteen minutes ago > > >and they were all identical. I can confirm that, in my laptop, the > > >tests work without that GUC, and fail in exactly that way with it. > > > > > > > Yes, there's a thinko in show_incremental_sort_info() and it returns too > > soon. I'll push a fix in a minute. > > I'm stepping through this in a debugger; is what you're considering > that the for loop through the workers is off by one? Oh, nevermind, misread that. Looks like if the leader doesn't participate, then we don't show details for workers. Tomas: Do you already have a patch? If not, I can work one up. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 6, 2020 at 5:12 PM Tomas Vondra wrote: > > On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >On 2020-Apr-06, Tom Lane wrote: > > > >> Locally, things pass without force_parallel_mode, but turning it on > >> produces failures that look similar to rhinoceros's (didn't examine > >> other BF members). > > > >FWIW I looked at the eight failures there were about fifteen minutes ago > >and they were all identical. I can confirm that, in my laptop, the > >tests work without that GUC, and fail in exactly that way with it. > > > > Yes, there's a thinko in show_incremental_sort_info() and it returns too > soon. I'll push a fix in a minute. I'm stepping through this in a debugger; is what you're considering that the for loop through the workers is off by one? James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: On 2020-Apr-06, Tom Lane wrote: Locally, things pass without force_parallel_mode, but turning it on produces failures that look similar to rhinoceros's (didn't examine other BF members). FWIW I looked at the eight failures there were about fifteen minutes ago and they were all identical. I can confirm that, in my laptop, the tests work without that GUC, and fail in exactly that way with it. Yes, there's a thinko in show_incremental_sort_info() and it returns too soon. I'll push a fix in a minute. thanks -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On 2020-Apr-06, Tom Lane wrote: > Locally, things pass without force_parallel_mode, but turning it on > produces failures that look similar to rhinoceros's (didn't examine > other BF members). FWIW I looked at the eight failures there were about fifteen minutes ago and they were all identical. I can confirm that, in my laptop, the tests work without that GUC, and fail in exactly that way with it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > On Mon, Apr 06, 2020 at 04:14:38PM -0400, Tom Lane wrote: >> Did you ever use force_parallel_mode = regress? > Ah, not sure - probably not in this round of tests and there were some > changes in the explain code. Thanks for the hint. Locally, things pass without force_parallel_mode, but turning it on produces failures that look similar to rhinoceros's (didn't examine other BF members). At a guess, looks like missing or incorrect logic for propagating some state back from parallel workers. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 04:14:38PM -0400, Tom Lane wrote: Tomas Vondra writes: Hmmm, I see the buildfarm is not happy about it - a couple of animals failed, but some succeeded. The failure seems like a simple difference in explain output, but it's not clear why would it happen (and I've ran the tests many times but never seen this failure). Did you ever use force_parallel_mode = regress? Ah, not sure - probably not in this round of tests and there were some changes in the explain code. Thanks for the hint. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > Hmmm, I see the buildfarm is not happy about it - a couple of animals > failed, but some succeeded. The failure seems like a simple difference > in explain output, but it's not clear why would it happen (and I've > ran the tests many times but never seen this failure). Did you ever use force_parallel_mode = regress? regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Apr 06, 2020 at 09:57:22PM +0200, Tomas Vondra wrote: Hi, I've pushed the fist part of this patch series - I've reorganized it a bit by moving the add_partial_path changes to the end. That way I've been able to add regression test demonstrating impact of the change on plans involving incremental sort nodes (which wouldn't be possible when committing the add_partial_path first). I'll wait a bit before pushing the two additional parts, so that if something fails we know which bit caused it. Hmmm, I see the buildfarm is not happy about it - a couple of animals failed, but some succeeded. The failure seems like a simple difference in explain output, but it's not clear why would it happen (and I've ran the tests many times but never seen this failure). Investigating. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Apr 05, 2020 at 03:01:10PM +0200, Tomas Vondra wrote: On Thu, Apr 02, 2020 at 09:40:45PM -0400, James Coleman wrote: On Thu, Apr 2, 2020 at 8:46 PM James Coleman wrote: On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra wrote: ... 5) Overall, I think the costing is OK. I'm sure we'll find cases that will need improvements, but that's fine. However, we now have - cost_tuplesort (used to be cost_sort) - cost_full_sort - cost_incremental_sort - cost_sort I find it a bit confusing that we have cost_sort and cost_full_sort. Why don't we just keep using the dummy path in label_sort_with_costsize? That seems to be the only external caller outside costsize.c. Then we could either make cost_full_sort static or get rid of it entirely. This another area of the patch I haven't really modified. See attached for a cleanup of this; it removed cost_fullsort so label_sort_with_costsize is back to how it was. I've directly merged this into the patch series; if you'd like to see the diff I can send that along. Thanks. Attached is v54 of the patch, with some minor changes. The main two changes are in add_partial_path_precheck(), firstly to also consider startup_cost, as discussed before. The second change (in 0003) is a bit of an experiment to make add_partial_precheck() cheaper by only calling compare_pathkeys after checking the costs first (which should be cheaper than the function call). add_path_precheck already does it in that order anyway. Oh, I forgot to mention a change in add_partial_path - I've removed the reference/dependency on enable_incrementalsort. It seemed rather ugly, and the results without it seem fine (I'm benchmarking only the case with incremental sort enabled anyway). I also plan to look at the other optimization we bolted on last week, i.e. checking length of pathkeys. I'll see if that actually makes measurable difference. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Apr 2, 2020 at 8:46 PM James Coleman wrote: > > On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra > wrote: > > > > Hi, > > > > Thanks, the v52 looks almost ready. I've been looking at the two or > > three things I mentioned, and I have a couple of comments. > > > > > > 1) /* XXX comparison_cost shouldn't be 0? */ > > > > I'm not worried about this, because this is not really intriduced by > > this patch - create_sort_path has the same comment/issue, so I think > > it's acceptable to do the same thing for incremental sort. > > Sounds good. > > > 2) INITIAL_MEMTUPSIZE > > > > tuplesort.c does this: > > > >#define INITIAL_MEMTUPSIZE Max(1024, \ > >ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1) > > > > supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD. > > But I think it fails to do that, for a couple of reasons. > > > > Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at > > minimum (without padding), so with 1024 elements it's guaranteed to be > > at least 21kB - so exceeding the threshold. The maximum value is > > something like 256. > > > > Secondly, the second part of the formula is guaranteed to get us over > > the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then > > > >ALLOCSET_SEPARATE_THRESHOLD / 30 = 273 > > > > but we end up using 274, resulting in 8220B array. :-( > > > > So I guess the formula should be more like > > > >Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple)) > > > > or something like that. > > > > FWIW I think the whole hypothesis that selecting the array size below > > ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we > > allocate this only once (or very few times), and if we need to grow the > > array we'll hit the threshold anyway. > > > > I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024 > > may be OK too. > > That was a part of the patch I haven't touched since I inherited it, > and I didn't feel like I knew enough about the Postgres memory > management to make a determination on whether the reasoning made > sense. > > So I' happy to use a constant as suggested. I take that back. That code hasn't changed, it's just moved. Here's the current code in tuplesort_begin_common on master: /* * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD; * see comments in grow_memtuples(). */ state->memtupsize = Max(1024, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1); I'm not sure we ought to change that in this patch... James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra wrote: > > Hi, > > Thanks, the v52 looks almost ready. I've been looking at the two or > three things I mentioned, and I have a couple of comments. > > > 1) /* XXX comparison_cost shouldn't be 0? */ > > I'm not worried about this, because this is not really intriduced by > this patch - create_sort_path has the same comment/issue, so I think > it's acceptable to do the same thing for incremental sort. Sounds good. > 2) INITIAL_MEMTUPSIZE > > tuplesort.c does this: > >#define INITIAL_MEMTUPSIZE Max(1024, \ >ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1) > > supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD. > But I think it fails to do that, for a couple of reasons. > > Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at > minimum (without padding), so with 1024 elements it's guaranteed to be > at least 21kB - so exceeding the threshold. The maximum value is > something like 256. > > Secondly, the second part of the formula is guaranteed to get us over > the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then > >ALLOCSET_SEPARATE_THRESHOLD / 30 = 273 > > but we end up using 274, resulting in 8220B array. :-( > > So I guess the formula should be more like > >Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple)) > > or something like that. > > FWIW I think the whole hypothesis that selecting the array size below > ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we > allocate this only once (or very few times), and if we need to grow the > array we'll hit the threshold anyway. > > I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024 > may be OK too. That was a part of the patch I haven't touched since I inherited it, and I didn't feel like I knew enough about the Postgres memory management to make a determination on whether the reasoning made sense. So I' happy to use a constant as suggested. > 3) add_partial_path > > I'm a bit torn about using enable_incrementalsort in add_partial_path. I > know we've done that to eliminate the overhead, but it just seems weird. > I wonder if that really saves us anything or if it was just noise. I'll > do more testing before making a decision. > > While looking at add_path, I see the comparisons there are made in the > opposite order - we first compare costs, and only then pathkeys. That > seems like a more efficient way, I guess (cheaper float comparison > first, pathkey comparison with a loop second). I wonder why it's not > done like that in add_partial_path too? Perhaps this will make it cheap > enough to remove the enable_incrementalsort check. I would love to avoid checking enable_incrementalsort there. I think it's pretty gross to do so. But if it's the only way to keep the patch acceptable, then obviously I'd leave it in. So I'm hopeful we can avoid needing it. > 4) add_partial_path_precheck > > While looking at add_partial_path, I realized add_partial_path_precheck > probably needs the same change - to consider startup cost too. I think > the expectation is that add_partial_path_precheck only rejects paths > that we know would then get rejected by add_partial_path. > > But now the behavior is inconsistent, which might lead to surprising > behavior (I haven't seen such cases, but I haven't looked for them). > At the moment the add_partial_path_precheck is called only from > joinpath.c, maybe it's not an issue there. > > If it's OK to keep it like this, it'd probably deserve a comment why > the difference is OK. And the comments also contain the same claim that > parallel plans only need to look at total cost. I remember some discussion about that much earlier in this thread (or in the spin-off thread [1] re: that specific change that didn't get a lot of action), and I'm pretty sure the conclusion was that we should change both. But I guess we never got around to doing it. > 5) Overall, I think the costing is OK. I'm sure we'll find cases that > will need improvements, but that's fine. However, we now have > > - cost_tuplesort (used to be cost_sort) > - cost_full_sort > - cost_incremental_sort > - cost_sort > > I find it a bit confusing that we have cost_sort and cost_full_sort. Why > don't we just keep using the dummy path in label_sort_with_costsize? > That seems to be the only external caller outside costsize.c. Then we > could either make cost_full_sort static or get rid of it entirely. This another area of the patch I haven't really modified. James [1]: https://www.postgresql.org/message-id/flat/CAAaqYe-5HmM4ih6FWp2RNV9rruunfrFrLhqFXF_nrrNCPy1Zhg%40mail.gmail.com
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Hi, Thanks, the v52 looks almost ready. I've been looking at the two or three things I mentioned, and I have a couple of comments. 1) /* XXX comparison_cost shouldn't be 0? */ I'm not worried about this, because this is not really intriduced by this patch - create_sort_path has the same comment/issue, so I think it's acceptable to do the same thing for incremental sort. 2) INITIAL_MEMTUPSIZE tuplesort.c does this: #define INITIAL_MEMTUPSIZE Max(1024, \ ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple) + 1) supposedly to keep the array size within ALLOCSET_SEPARATE_THRESHOLD. But I think it fails to do that, for a couple of reasons. Firstly, ALLOCSET_SEPARATE_THRESHOLD is 8192, and SortTuple is 21B at minimum (without padding), so with 1024 elements it's guaranteed to be at least 21kB - so exceeding the threshold. The maximum value is something like 256. Secondly, the second part of the formula is guaranteed to get us over the threshold anyway, thanks to the +1. Let's say SortTuple is 30B. Then ALLOCSET_SEPARATE_THRESHOLD / 30 = 273 but we end up using 274, resulting in 8220B array. :-( So I guess the formula should be more like Min(128, ALLOCSET_SEPARATE_THRESHOLD / sizeof(SortTuple)) or something like that. FWIW I think the whole hypothesis that selecting the array size below ALLOCSET_SEPARATE_THRESHOLD reduces overhead is dubious. AFAIC we allocate this only once (or very few times), and if we need to grow the array we'll hit the threshold anyway. I'd just pick a reasonable constant - 128 or 256 seems reasonable, 1024 may be OK too. 3) add_partial_path I'm a bit torn about using enable_incrementalsort in add_partial_path. I know we've done that to eliminate the overhead, but it just seems weird. I wonder if that really saves us anything or if it was just noise. I'll do more testing before making a decision. While looking at add_path, I see the comparisons there are made in the opposite order - we first compare costs, and only then pathkeys. That seems like a more efficient way, I guess (cheaper float comparison first, pathkey comparison with a loop second). I wonder why it's not done like that in add_partial_path too? Perhaps this will make it cheap enough to remove the enable_incrementalsort check. 4) add_partial_path_precheck While looking at add_partial_path, I realized add_partial_path_precheck probably needs the same change - to consider startup cost too. I think the expectation is that add_partial_path_precheck only rejects paths that we know would then get rejected by add_partial_path. But now the behavior is inconsistent, which might lead to surprising behavior (I haven't seen such cases, but I haven't looked for them). At the moment the add_partial_path_precheck is called only from joinpath.c, maybe it's not an issue there. If it's OK to keep it like this, it'd probably deserve a comment why the difference is OK. And the comments also contain the same claim that parallel plans only need to look at total cost. 5) Overall, I think the costing is OK. I'm sure we'll find cases that will need improvements, but that's fine. However, we now have - cost_tuplesort (used to be cost_sort) - cost_full_sort - cost_incremental_sort - cost_sort I find it a bit confusing that we have cost_sort and cost_full_sort. Why don't we just keep using the dummy path in label_sort_with_costsize? That seems to be the only external caller outside costsize.c. Then we could either make cost_full_sort static or get rid of it entirely. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 01, 2020 at 10:09:20PM -0400, James Coleman wrote: On Wed, Apr 1, 2020 at 5:42 PM Tomas Vondra wrote: ... I've realized the way get_useful_pathkeys_for_relation() is coded kinda works against the fastpath we added when comparing pathkeys. That depends on comparing pointers to the list, but we've been building new lists (and then returned those) which defeats the optimization. Attached is a patch that returns the original list in most cases (and only creates a copy when really necessary). This might also save a few cycles on bulding the new list, of course. I've done a bunch of read-only pgbench tests with fairly small scales (1 and 10). First with the built-in read-only transaction, and also with a simple custom query doing an order-by. And I did this both on the default schema and with a bunch of extra indexes. The script I used to run this is attached, along with a summary of results. There are results for master and v40 and v50 patches (the v50 also includes the extra patch fixing get_useful_pathkeys_for_relation). Overall, I'm happy with those results - the v50 seems to be within 1% of master, in both directions. This very much seems like a noise. I still want to do a bit more review of the costing / tuplesort changes, which I plan to do tomorrow. If that goes well, I plan to start committing this. So please if you think this is not ready or wants more I think we need to either implement this or remove the comment: * XXX I wonder if we need to consider adding a projection here, as * create_ordered_paths does. in generate_useful_gather_paths(). Yeah. I think we don't need the projection here. My reasoning is that if we don't need it in generate_gather_paths(), we don't need it here. In the same function we have the following code: /* * When the partial path is already sorted, we can just add a gather * merge on top, and we're done - no point in adding explicit sort. * * XXX Can't we skip this (maybe only for the cheapest partial path) * when the path is already sorted? Then it's likely duplicate with * the path created by generate_gather_paths. */ if (is_sorted) { path = create_gather_merge_path(root, rel, subpath, rel->reltarget, subpath->pathkeys, NULL, rowsp); add_path(rel, >path); continue; } looking at the relevant loop in generate_gather_paths: /* * For each useful ordering, we can consider an order-preserving Gather * Merge. */ foreach(lc, rel->partial_pathlist) { Path *subpath = (Path *) lfirst(lc); GatherMergePath *path; if (subpath->pathkeys == NIL) continue; rows = subpath->rows * subpath->parallel_workers; path = create_gather_merge_path(root, rel, subpath, rel->reltarget, subpath->pathkeys, NULL, rowsp); add_path(rel, >path); } I believe we can eliminate the block entirely in generate_useful_gather_paths(). Here's my reasoning: all paths for which is_sorted is true must necessarily have pathkeys, and since we already add a gather merge for every subpath with pathkeys, we've already added gather merge paths for all of these. I've included a patch to change this, but let me know if the reasoning isn't sound. Good catch! I think you're correct - we don't need to generate this path, and we can just skip that partial path entirely. We can also remove the XXX on this comment (in the same function): * XXX This is not redundant with the gather merge path created in * generate_gather_paths, because that merely preserves ordering of * the cheapest partial path, while here we add an explicit sort to * get match the useful ordering. because of this code in generate_gather_paths(): cheapest_partial_path = linitial(rel->partial_pathlist); rows = cheapest_partial_path->rows * cheapest_partial_path->parallel_workers; simple_gather_path = (Path *) create_gather_path(root, rel, cheapest_partial_path, rel->reltarget, NULL, rowsp); add_path(rel, simple_gather_path); but we can cleanup the comment a bit: fix the grammar issue in the last line and fix the reference to gather merge path (it's a gather path). I've included that in the same patch. OK, makes sense. I also noticed that in create_incremental_sort_path we have this: /* XXX comparison_cost shouldn't be 0? */ but I guess that's part of what you're reviewing tomorrow. Right, one of the bits. time for a review, let me know. I'm not yet sure if I'll commit this as a single change, or in three separate commits. I don't love the idea of committing it as a single patch, but at least the first two I think probably go together. Otherwise we're introducing a "fix" with no proven impact that will slow down planning (even if only in a small way) only to intend to condition that on a GUC in the next commit. But I think you could potentially make an argument for keeping the additional paths separate...but it's not absolutely necessary IMO. OK. I've been actually
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Wed, Apr 01, 2020 at 09:05:27AM -0400, James Coleman wrote: On Tue, Mar 31, 2020 at 11:07 PM James Coleman wrote: On Tue, Mar 31, 2020 at 10:44 PM Tomas Vondra wrote: > > On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote: > >On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra > > wrote: > >> > >> On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote: > >> >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra > >> > wrote: > >> >> > >> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote: > >> >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra > >> >> > wrote: ... > >> >> >One small idea, but I'm not yet sure it helps us a whole lot: if the > >> >> >query pathkeys is only length 1, then we could skip the additional > >> >> >path creation. > >> >> > > >> >> > >> >> I don't follow. Why would we create incremental sort in this case at > >> >> all? With single-element query_pathkeys the path is either unsorted or > >> >> fully sorted - there's no room for incremental sort. No? > >> > > >> >Well, we shouldn't, that's what I'm getting. But I didn't see anything > >> >in the code now that explicitly excludes that case when decided > >> >whether or not to create an incremental sort path, unless I'm missing > >> >something obvious. > >> > >> Well, my point is that create_ordered_paths() looks like this: > >> > >> is_sorted = pathkeys_common_contained_in(root->sort_patkeys, ...); > >> > >> if (is_sorted) > >> { > >> ... old code > >> } > >> else > >> { > >> if (input_path == cheapest_input_path) > >> { > >> ... old code > >> } > >> > >> /* With incremental sort disabled, don't build those paths. */ > >> if (!enable_incrementalsort) > >> continue; > >> > >> /* Likewise, if the path can't be used for incremental sort. */ > >> if (!presorted_keys) > >> continue; > >> > >> ... incremental sort path > >> } > >> > >> Now, with single-item sort_pathkeys, the input path can't be partially > >> sorted. It's either fully sorted - in which case it's handled by the > >> first branch. Or it's not sorted at all, so presorted_keys==0 and we > >> never get to the incremental path. > >> > >> Or did you mean to use the optimization somewhere else? > > > >Hmm, yes, I didn't think through that properly. I'll have to look at > >the other cases to confirm the same logic applies there. I looked through this more carefully, and I did end up finding a few places where we can skip iterating through a list of paths entirely with this check, so I added it there. I also cleaned up some comments, added comments and asserts to the other places where list_length(pathkeys) should be guaranteed to be > 1, removed a few asserts I found unnecessary, and merged duplicative pathkeys_[count_]_contained_in calls. OK > >One other thing:in the code above we create the regular sort path > >inside of `if (input_path == cheapest_input_path)`, but incremental > >sort is outside of that condition. I'm not sure I'm remembering why > >that was, and it's not obvious to me reading it right now (though it's > >getting late here, so maybe I'm just not thinking clearly). Do you > >happen to remember why that is? > > > > It's because for the regular sort, the path is either already sorted or > it requires a full sort. But full sort only makes sense on the cheapest > path, because we assume the additional sort cost is independent of the > input cost, essentially > > cost(path + Sort) = cost(path) + cost(Sort) > > and it's always > > cost(path) + cost(Sort) >= cost(cheapest path) + cost(Sort) > > and by checking for cheapest path we simply skip building all the paths > that we'd end up discarding anyway. > > With incremental sort we can't do this, the cost of the incremental sort > depends on how well presorted is the input path. Thanks for the explanation. I've added a comment to that effect. Thanks. I've realized the way get_useful_pathkeys_for_relation() is coded kinda works against the fastpath we added when comparing pathkeys. That depends on comparing pointers to the list, but we've been building new lists (and then returned those) which defeats the optimization. Attached is a patch that returns the original list in most cases (and only creates a copy when really necessary). This might also save a few cycles on bulding the new list, of course. I've done a bunch of read-only pgbench tests with fairly small scales (1 and 10). First with the built-in read-only transaction, and also with a simple custom query doing an order-by. And I did this both on the default schema and with a bunch of extra indexes. The script I used to run this is attached, along with a summary of results. There are results for master and v40 and v50 patches (the v50 also includes the extra patch fixing get_useful_pathkeys_for_relation). Overall, I'm happy with those results - the v50 seems to be
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 10:12:29PM -0400, James Coleman wrote: On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra wrote: On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote: >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra > wrote: >> >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote: >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra >> > wrote: >> >> >> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: >> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra >> >> > wrote: >> >> >> >> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: >> >> >> >Tomas Vondra writes: >> >> >> >> In general, I think it'd be naive that we can make planner smarter with >> >> >> >> no extra overhead spent on planning, and we can never accept patches >> >> >> >> adding even tiny overhead. With that approach we'd probably end up with >> >> >> >> a trivial planner that generates just a single query plan, because >> >> >> >> that's going to be the fastest planner. A realistic approach needs to >> >> >> >> consider both the planning and execution phase, and benefits of this >> >> >> >> patch seem to be clear - if you have queries that do benefit from it. >> >> >> > >> >> >> >I think that's kind of attacking a straw man, though. The thing that >> >> >> >people push back on, or should push back on IMO, is when a proposed >> >> >> >patch adds significant slowdown to queries that it has no or very little >> >> >> >hope of improving. The trick is to do expensive stuff only when >> >> >> >there's a good chance of getting a better plan out of it. >> >> >> > >> >> >> >> >> >> Yeah, I agree with that. I think the main issue is that we don't really >> >> >> know what the "expensive stuff" is in this case, so it's not really >> >> >> clear how to be smarter :-( >> >> > >> >> >To add to this: I agree that ideally you'd check cheaply to know >> >> >you're in a situation that might help, and then do more work. But here >> >> >the question is always going to be simply "would we benefit from an >> >> >ordering, and, if so, do we have it already partially sorted". It's >> >> >hard to imagine that reducing much conceptually, so we're left with >> >> >optimizations of that check. >> >> > >> >> >> >> I think it depends on what exactly is the expensive part. For example if >> >> it's the construction of IncrementalSort paths, then maybe we could try >> >> do a quick/check check if the path can even be useful by estimating the >> >> cost and only then building the path. >> >> >> >> That's what we do for join algorithms, for example - we first compute >> >> initial_cost_nestloop and only when that seems cheap enough we do the >> >> more expensive stuff. >> >> >> >> But I'm not sure the path construction is the expensive part, as it >> >> should be disabled by enable_incrementalsort=off. But the regression >> >> does not seem to disappear, at least not entirely. >> >> >> >> >> >> >> One possibility is that it's just one of those regressions due to change >> >> >> in binary layout, but I'm not sure know how to verify that. >> >> > >> >> >If we are testing with a case that can't actually add more paths (due >> >> >to it checking the guc before building them), doesn't that effectively >> >> >leave one of these two options: >> >> > >> >> >1. Binary layout/cache/other untraceable change, or >> >> >2. Changes due to refactored function calls. >> >> > >> >> >> >> Hmm, but in case of (1) the overhead should be there even with tests >> >> that don't really have any additional paths to consider, right? I've >> >> tried with such test (single table with no indexes) and I don't quite >> >> see any regression (maybe ~1%). >> > >> >Not necessarily, if the cost is in sort costing or useful pathkeys >> >checking, right? We have run that code even without incremental sort, >> >but it's changed from master. >> > >> >> Ah, I should have mentioned I've done most of the tests on just the >> basic incremental sort patch (0001+0002), without the additional useful >> paths. I initially tested the whole patch series, but after discovering >> the regression I removed the last part (which I suspected might be the >> root cause). But the regression is still there, so it's not that. >> >> It might be in the reworked costing, yeah. But then I'd expect those >> function to show in the perf profile. > >Right. I'm just grasping at straws on that. > >> >> (2) might have impact, but I don't see any immediate supects. Did you >> >> have some functions in mind? >> > >> >I guess this is where the lines blur: I didn't see anything obvious >> >either, but the changes to sort costing...should probably not have >> >real impact...but... >> > >> >> :-( >> >> >> BTW I see the patch adds pathkeys_common but it's not called from >> >> anywhere. It's probably leftover from an earlier patch version. >> >> > >BTW, I think I'm going to rename the pathkeys_common_contained_in >function to something like pathkeys_count_contained_in, unless you >have an
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 9:59 PM Tomas Vondra wrote: > > On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote: > >On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra > > wrote: > >> > >> On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote: > >> >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra > >> > wrote: > >> >> > >> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: > >> >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra > >> >> > wrote: > >> >> >> > >> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: > >> >> >> >Tomas Vondra writes: > >> >> >> >> In general, I think it'd be naive that we can make planner > >> >> >> >> smarter with > >> >> >> >> no extra overhead spent on planning, and we can never accept > >> >> >> >> patches > >> >> >> >> adding even tiny overhead. With that approach we'd probably end > >> >> >> >> up with > >> >> >> >> a trivial planner that generates just a single query plan, because > >> >> >> >> that's going to be the fastest planner. A realistic approach > >> >> >> >> needs to > >> >> >> >> consider both the planning and execution phase, and benefits of > >> >> >> >> this > >> >> >> >> patch seem to be clear - if you have queries that do benefit from > >> >> >> >> it. > >> >> >> > > >> >> >> >I think that's kind of attacking a straw man, though. The thing > >> >> >> >that > >> >> >> >people push back on, or should push back on IMO, is when a proposed > >> >> >> >patch adds significant slowdown to queries that it has no or very > >> >> >> >little > >> >> >> >hope of improving. The trick is to do expensive stuff only when > >> >> >> >there's a good chance of getting a better plan out of it. > >> >> >> > > >> >> >> > >> >> >> Yeah, I agree with that. I think the main issue is that we don't > >> >> >> really > >> >> >> know what the "expensive stuff" is in this case, so it's not really > >> >> >> clear how to be smarter :-( > >> >> > > >> >> >To add to this: I agree that ideally you'd check cheaply to know > >> >> >you're in a situation that might help, and then do more work. But here > >> >> >the question is always going to be simply "would we benefit from an > >> >> >ordering, and, if so, do we have it already partially sorted". It's > >> >> >hard to imagine that reducing much conceptually, so we're left with > >> >> >optimizations of that check. > >> >> > > >> >> > >> >> I think it depends on what exactly is the expensive part. For example if > >> >> it's the construction of IncrementalSort paths, then maybe we could try > >> >> do a quick/check check if the path can even be useful by estimating the > >> >> cost and only then building the path. > >> >> > >> >> That's what we do for join algorithms, for example - we first compute > >> >> initial_cost_nestloop and only when that seems cheap enough we do the > >> >> more expensive stuff. > >> >> > >> >> But I'm not sure the path construction is the expensive part, as it > >> >> should be disabled by enable_incrementalsort=off. But the regression > >> >> does not seem to disappear, at least not entirely. > >> >> > >> >> > >> >> >> One possibility is that it's just one of those regressions due to > >> >> >> change > >> >> >> in binary layout, but I'm not sure know how to verify that. > >> >> > > >> >> >If we are testing with a case that can't actually add more paths (due > >> >> >to it checking the guc before building them), doesn't that effectively > >> >> >leave one of these two options: > >> >> > > >> >> >1. Binary layout/cache/other untraceable change, or > >> >> >2. Changes due to refactored function calls. > >> >> > > >> >> > >> >> Hmm, but in case of (1) the overhead should be there even with tests > >> >> that don't really have any additional paths to consider, right? I've > >> >> tried with such test (single table with no indexes) and I don't quite > >> >> see any regression (maybe ~1%). > >> > > >> >Not necessarily, if the cost is in sort costing or useful pathkeys > >> >checking, right? We have run that code even without incremental sort, > >> >but it's changed from master. > >> > > >> > >> Ah, I should have mentioned I've done most of the tests on just the > >> basic incremental sort patch (0001+0002), without the additional useful > >> paths. I initially tested the whole patch series, but after discovering > >> the regression I removed the last part (which I suspected might be the > >> root cause). But the regression is still there, so it's not that. > >> > >> It might be in the reworked costing, yeah. But then I'd expect those > >> function to show in the perf profile. > > > >Right. I'm just grasping at straws on that. > > > >> >> (2) might have impact, but I don't see any immediate supects. Did you > >> >> have some functions in mind? > >> > > >> >I guess this is where the lines blur: I didn't see anything obvious > >> >either, but the changes to sort costing...should probably not have > >> >real impact...but... > >> > > >> > >> :-( > >> > >> >> BTW I see the patch adds
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 08:42:47PM -0400, James Coleman wrote: On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra wrote: On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote: >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra > wrote: >> >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra >> > wrote: >> >> >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: >> >> >Tomas Vondra writes: >> >> >> In general, I think it'd be naive that we can make planner smarter with >> >> >> no extra overhead spent on planning, and we can never accept patches >> >> >> adding even tiny overhead. With that approach we'd probably end up with >> >> >> a trivial planner that generates just a single query plan, because >> >> >> that's going to be the fastest planner. A realistic approach needs to >> >> >> consider both the planning and execution phase, and benefits of this >> >> >> patch seem to be clear - if you have queries that do benefit from it. >> >> > >> >> >I think that's kind of attacking a straw man, though. The thing that >> >> >people push back on, or should push back on IMO, is when a proposed >> >> >patch adds significant slowdown to queries that it has no or very little >> >> >hope of improving. The trick is to do expensive stuff only when >> >> >there's a good chance of getting a better plan out of it. >> >> > >> >> >> >> Yeah, I agree with that. I think the main issue is that we don't really >> >> know what the "expensive stuff" is in this case, so it's not really >> >> clear how to be smarter :-( >> > >> >To add to this: I agree that ideally you'd check cheaply to know >> >you're in a situation that might help, and then do more work. But here >> >the question is always going to be simply "would we benefit from an >> >ordering, and, if so, do we have it already partially sorted". It's >> >hard to imagine that reducing much conceptually, so we're left with >> >optimizations of that check. >> > >> >> I think it depends on what exactly is the expensive part. For example if >> it's the construction of IncrementalSort paths, then maybe we could try >> do a quick/check check if the path can even be useful by estimating the >> cost and only then building the path. >> >> That's what we do for join algorithms, for example - we first compute >> initial_cost_nestloop and only when that seems cheap enough we do the >> more expensive stuff. >> >> But I'm not sure the path construction is the expensive part, as it >> should be disabled by enable_incrementalsort=off. But the regression >> does not seem to disappear, at least not entirely. >> >> >> >> One possibility is that it's just one of those regressions due to change >> >> in binary layout, but I'm not sure know how to verify that. >> > >> >If we are testing with a case that can't actually add more paths (due >> >to it checking the guc before building them), doesn't that effectively >> >leave one of these two options: >> > >> >1. Binary layout/cache/other untraceable change, or >> >2. Changes due to refactored function calls. >> > >> >> Hmm, but in case of (1) the overhead should be there even with tests >> that don't really have any additional paths to consider, right? I've >> tried with such test (single table with no indexes) and I don't quite >> see any regression (maybe ~1%). > >Not necessarily, if the cost is in sort costing or useful pathkeys >checking, right? We have run that code even without incremental sort, >but it's changed from master. > Ah, I should have mentioned I've done most of the tests on just the basic incremental sort patch (0001+0002), without the additional useful paths. I initially tested the whole patch series, but after discovering the regression I removed the last part (which I suspected might be the root cause). But the regression is still there, so it's not that. It might be in the reworked costing, yeah. But then I'd expect those function to show in the perf profile. Right. I'm just grasping at straws on that. >> (2) might have impact, but I don't see any immediate supects. Did you >> have some functions in mind? > >I guess this is where the lines blur: I didn't see anything obvious >either, but the changes to sort costing...should probably not have >real impact...but... > :-( >> BTW I see the patch adds pathkeys_common but it's not called from >> anywhere. It's probably leftover from an earlier patch version. >> BTW, I think I'm going to rename the pathkeys_common_contained_in function to something like pathkeys_count_contained_in, unless you have an objection to that. The name doesn't seem obvious at all to me. WFM >> >There's not anything obvious in point (2) that would be a big cost, >> >but there are definitely changes there. I was surprised that just >> >eliminating the loop through the pathkeys on the query and the index >> >was enough to save us ~4%. >> > >> >Tomas: Earlier you'd wondered about if we should try to shortcut the >> >changes
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 8:38 PM Tomas Vondra wrote: > > On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote: > >On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra > > wrote: > >> > >> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: > >> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra > >> > wrote: > >> >> > >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: > >> >> >Tomas Vondra writes: > >> >> >> In general, I think it'd be naive that we can make planner smarter > >> >> >> with > >> >> >> no extra overhead spent on planning, and we can never accept patches > >> >> >> adding even tiny overhead. With that approach we'd probably end up > >> >> >> with > >> >> >> a trivial planner that generates just a single query plan, because > >> >> >> that's going to be the fastest planner. A realistic approach needs to > >> >> >> consider both the planning and execution phase, and benefits of this > >> >> >> patch seem to be clear - if you have queries that do benefit from it. > >> >> > > >> >> >I think that's kind of attacking a straw man, though. The thing that > >> >> >people push back on, or should push back on IMO, is when a proposed > >> >> >patch adds significant slowdown to queries that it has no or very > >> >> >little > >> >> >hope of improving. The trick is to do expensive stuff only when > >> >> >there's a good chance of getting a better plan out of it. > >> >> > > >> >> > >> >> Yeah, I agree with that. I think the main issue is that we don't really > >> >> know what the "expensive stuff" is in this case, so it's not really > >> >> clear how to be smarter :-( > >> > > >> >To add to this: I agree that ideally you'd check cheaply to know > >> >you're in a situation that might help, and then do more work. But here > >> >the question is always going to be simply "would we benefit from an > >> >ordering, and, if so, do we have it already partially sorted". It's > >> >hard to imagine that reducing much conceptually, so we're left with > >> >optimizations of that check. > >> > > >> > >> I think it depends on what exactly is the expensive part. For example if > >> it's the construction of IncrementalSort paths, then maybe we could try > >> do a quick/check check if the path can even be useful by estimating the > >> cost and only then building the path. > >> > >> That's what we do for join algorithms, for example - we first compute > >> initial_cost_nestloop and only when that seems cheap enough we do the > >> more expensive stuff. > >> > >> But I'm not sure the path construction is the expensive part, as it > >> should be disabled by enable_incrementalsort=off. But the regression > >> does not seem to disappear, at least not entirely. > >> > >> > >> >> One possibility is that it's just one of those regressions due to change > >> >> in binary layout, but I'm not sure know how to verify that. > >> > > >> >If we are testing with a case that can't actually add more paths (due > >> >to it checking the guc before building them), doesn't that effectively > >> >leave one of these two options: > >> > > >> >1. Binary layout/cache/other untraceable change, or > >> >2. Changes due to refactored function calls. > >> > > >> > >> Hmm, but in case of (1) the overhead should be there even with tests > >> that don't really have any additional paths to consider, right? I've > >> tried with such test (single table with no indexes) and I don't quite > >> see any regression (maybe ~1%). > > > >Not necessarily, if the cost is in sort costing or useful pathkeys > >checking, right? We have run that code even without incremental sort, > >but it's changed from master. > > > > Ah, I should have mentioned I've done most of the tests on just the > basic incremental sort patch (0001+0002), without the additional useful > paths. I initially tested the whole patch series, but after discovering > the regression I removed the last part (which I suspected might be the > root cause). But the regression is still there, so it's not that. > > It might be in the reworked costing, yeah. But then I'd expect those > function to show in the perf profile. Right. I'm just grasping at straws on that. > >> (2) might have impact, but I don't see any immediate supects. Did you > >> have some functions in mind? > > > >I guess this is where the lines blur: I didn't see anything obvious > >either, but the changes to sort costing...should probably not have > >real impact...but... > > > > :-( > > >> BTW I see the patch adds pathkeys_common but it's not called from > >> anywhere. It's probably leftover from an earlier patch version. > >> BTW, I think I'm going to rename the pathkeys_common_contained_in function to something like pathkeys_count_contained_in, unless you have an objection to that. The name doesn't seem obvious at all to me. > >> >There's not anything obvious in point (2) that would be a big cost, > >> >but there are definitely changes there. I was surprised that just > >> >eliminating the loop through the pathkeys on
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 08:11:15PM -0400, James Coleman wrote: On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra wrote: On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra > wrote: >> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: >> >Tomas Vondra writes: >> >> In general, I think it'd be naive that we can make planner smarter with >> >> no extra overhead spent on planning, and we can never accept patches >> >> adding even tiny overhead. With that approach we'd probably end up with >> >> a trivial planner that generates just a single query plan, because >> >> that's going to be the fastest planner. A realistic approach needs to >> >> consider both the planning and execution phase, and benefits of this >> >> patch seem to be clear - if you have queries that do benefit from it. >> > >> >I think that's kind of attacking a straw man, though. The thing that >> >people push back on, or should push back on IMO, is when a proposed >> >patch adds significant slowdown to queries that it has no or very little >> >hope of improving. The trick is to do expensive stuff only when >> >there's a good chance of getting a better plan out of it. >> > >> >> Yeah, I agree with that. I think the main issue is that we don't really >> know what the "expensive stuff" is in this case, so it's not really >> clear how to be smarter :-( > >To add to this: I agree that ideally you'd check cheaply to know >you're in a situation that might help, and then do more work. But here >the question is always going to be simply "would we benefit from an >ordering, and, if so, do we have it already partially sorted". It's >hard to imagine that reducing much conceptually, so we're left with >optimizations of that check. > I think it depends on what exactly is the expensive part. For example if it's the construction of IncrementalSort paths, then maybe we could try do a quick/check check if the path can even be useful by estimating the cost and only then building the path. That's what we do for join algorithms, for example - we first compute initial_cost_nestloop and only when that seems cheap enough we do the more expensive stuff. But I'm not sure the path construction is the expensive part, as it should be disabled by enable_incrementalsort=off. But the regression does not seem to disappear, at least not entirely. >> One possibility is that it's just one of those regressions due to change >> in binary layout, but I'm not sure know how to verify that. > >If we are testing with a case that can't actually add more paths (due >to it checking the guc before building them), doesn't that effectively >leave one of these two options: > >1. Binary layout/cache/other untraceable change, or >2. Changes due to refactored function calls. > Hmm, but in case of (1) the overhead should be there even with tests that don't really have any additional paths to consider, right? I've tried with such test (single table with no indexes) and I don't quite see any regression (maybe ~1%). Not necessarily, if the cost is in sort costing or useful pathkeys checking, right? We have run that code even without incremental sort, but it's changed from master. Ah, I should have mentioned I've done most of the tests on just the basic incremental sort patch (0001+0002), without the additional useful paths. I initially tested the whole patch series, but after discovering the regression I removed the last part (which I suspected might be the root cause). But the regression is still there, so it's not that. It might be in the reworked costing, yeah. But then I'd expect those function to show in the perf profile. (2) might have impact, but I don't see any immediate supects. Did you have some functions in mind? I guess this is where the lines blur: I didn't see anything obvious either, but the changes to sort costing...should probably not have real impact...but... :-( BTW I see the patch adds pathkeys_common but it's not called from anywhere. It's probably leftover from an earlier patch version. >There's not anything obvious in point (2) that would be a big cost, >but there are definitely changes there. I was surprised that just >eliminating the loop through the pathkeys on the query and the index >was enough to save us ~4%. > >Tomas: Earlier you'd wondered about if we should try to shortcut the >changes in costing...I was skeptical of that originally, but maybe >it's worth looking into? I'm going to try backing that out and see >what the numbers look like. > BTW, I did this test, and it looks like we can get back something close to 1% by reverting that initial fix on partial path costing. But we can't get rid of it all the time, at the very least. *Maybe* we could condition it on incremental sort, but I'm not convinced that's the only place it's needed as a fix. Sounds interesting. I actually tried how much the add_partial_path change accounts for, and you're right it was quite a bit. But
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On 2020-Mar-31, Tom Lane wrote: > James Coleman writes: > > On Tue, Mar 31, 2020 at 1:04 PM Tom Lane wrote: > >> Perhaps the semantics are such that that's actually sensible, but it's > >> far from a straightforward remapping of the old enum. > > > Right, I didn't see the explicit "= 0" in other enums there, so it > > made me wonder if it was intentional to designate that one had to be > > 0, but I guess without a comment that's a lot of inference. > > It's possible that somebody meant that as an indicator that the code > depends on palloc0() leaving the field with that value. But if so, > you'd soon find that out ... and an actual comment would be better, > anyway. git blame fingers this: commit bf11e7ee2e3607bb67d25aec73aa53b2d7e9961b Author: Robert Haas AuthorDate: Tue Aug 29 13:22:49 2017 -0400 CommitDate: Tue Aug 29 13:26:33 2017 -0400 Propagate sort instrumentation from workers back to leader. Up until now, when parallel query was used, no details about the sort method or space used by the workers were available; details were shown only for any sorting done by the leader. Fix that. Commit 1177ab1dabf72bafee8f19d904cee3a299f25892 forced the test case added by commit 1f6d515a67ec98194c23a5db25660856c9aab944 to run without parallelism; now that we have this infrastructure, allow that again, with a little tweaking to make it pass with and without force_parallel_mode. Robert Haas and Tom Lane Discussion: http://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=rf6rqq2br+zjvegwj0uod...@mail.gmail.com I looked at the discussion thread. That patch was first posted by Robert at https://postgr.es/m/CA+Tgmoa2VBZW6S8AAXfhpHczb=rf6rqq2br+zjvegwj0uod...@mail.gmail.com without the "= 0" part; later Tom posted v2 here https://postgr.es/m/11223.1503695...@sss.pgh.pa.us containing the "= 0", but I see no actual discussion of that point. I suppose it could also be important to clarify that it's 0 if it were used as an array index of some sort, but I don't see that in 2017's commit. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra wrote: > > On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: > >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra > > wrote: > >> > >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: > >> >Tomas Vondra writes: > >> >> In general, I think it'd be naive that we can make planner smarter with > >> >> no extra overhead spent on planning, and we can never accept patches > >> >> adding even tiny overhead. With that approach we'd probably end up with > >> >> a trivial planner that generates just a single query plan, because > >> >> that's going to be the fastest planner. A realistic approach needs to > >> >> consider both the planning and execution phase, and benefits of this > >> >> patch seem to be clear - if you have queries that do benefit from it. > >> > > >> >I think that's kind of attacking a straw man, though. The thing that > >> >people push back on, or should push back on IMO, is when a proposed > >> >patch adds significant slowdown to queries that it has no or very little > >> >hope of improving. The trick is to do expensive stuff only when > >> >there's a good chance of getting a better plan out of it. > >> > > >> > >> Yeah, I agree with that. I think the main issue is that we don't really > >> know what the "expensive stuff" is in this case, so it's not really > >> clear how to be smarter :-( > > > >To add to this: I agree that ideally you'd check cheaply to know > >you're in a situation that might help, and then do more work. But here > >the question is always going to be simply "would we benefit from an > >ordering, and, if so, do we have it already partially sorted". It's > >hard to imagine that reducing much conceptually, so we're left with > >optimizations of that check. > > > > I think it depends on what exactly is the expensive part. For example if > it's the construction of IncrementalSort paths, then maybe we could try > do a quick/check check if the path can even be useful by estimating the > cost and only then building the path. > > That's what we do for join algorithms, for example - we first compute > initial_cost_nestloop and only when that seems cheap enough we do the > more expensive stuff. > > But I'm not sure the path construction is the expensive part, as it > should be disabled by enable_incrementalsort=off. But the regression > does not seem to disappear, at least not entirely. > > > >> One possibility is that it's just one of those regressions due to change > >> in binary layout, but I'm not sure know how to verify that. > > > >If we are testing with a case that can't actually add more paths (due > >to it checking the guc before building them), doesn't that effectively > >leave one of these two options: > > > >1. Binary layout/cache/other untraceable change, or > >2. Changes due to refactored function calls. > > > > Hmm, but in case of (1) the overhead should be there even with tests > that don't really have any additional paths to consider, right? I've > tried with such test (single table with no indexes) and I don't quite > see any regression (maybe ~1%). Not necessarily, if the cost is in sort costing or useful pathkeys checking, right? We have run that code even without incremental sort, but it's changed from master. > (2) might have impact, but I don't see any immediate supects. Did you > have some functions in mind? I guess this is where the lines blur: I didn't see anything obvious either, but the changes to sort costing...should probably not have real impact...but... > BTW I see the patch adds pathkeys_common but it's not called from > anywhere. It's probably leftover from an earlier patch version. > > >There's not anything obvious in point (2) that would be a big cost, > >but there are definitely changes there. I was surprised that just > >eliminating the loop through the pathkeys on the query and the index > >was enough to save us ~4%. > > > >Tomas: Earlier you'd wondered about if we should try to shortcut the > >changes in costing...I was skeptical of that originally, but maybe > >it's worth looking into? I'm going to try backing that out and see > >what the numbers look like. > > BTW, I did this test, and it looks like we can get back something close to 1% by reverting that initial fix on partial path costing. But we can't get rid of it all the time, at the very least. *Maybe* we could condition it on incremental sort, but I'm not convinced that's the only place it's needed as a fix. > I've described the idea about something like initial_cost_nestloop and > so on. But I'm a bit skeptical about it, considering that the GUC only > has limited effect. > > > A somewhat note is that the number of indexes has pretty significant > impact on planning time, even on master. This is timing of the same > explain script (similar to the one shown before) with different number > of indexes on master: > > 0 indexes7 indexes 49 indexes > --- >
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote: On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra wrote: On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: >Tomas Vondra writes: >> In general, I think it'd be naive that we can make planner smarter with >> no extra overhead spent on planning, and we can never accept patches >> adding even tiny overhead. With that approach we'd probably end up with >> a trivial planner that generates just a single query plan, because >> that's going to be the fastest planner. A realistic approach needs to >> consider both the planning and execution phase, and benefits of this >> patch seem to be clear - if you have queries that do benefit from it. > >I think that's kind of attacking a straw man, though. The thing that >people push back on, or should push back on IMO, is when a proposed >patch adds significant slowdown to queries that it has no or very little >hope of improving. The trick is to do expensive stuff only when >there's a good chance of getting a better plan out of it. > Yeah, I agree with that. I think the main issue is that we don't really know what the "expensive stuff" is in this case, so it's not really clear how to be smarter :-( To add to this: I agree that ideally you'd check cheaply to know you're in a situation that might help, and then do more work. But here the question is always going to be simply "would we benefit from an ordering, and, if so, do we have it already partially sorted". It's hard to imagine that reducing much conceptually, so we're left with optimizations of that check. I think it depends on what exactly is the expensive part. For example if it's the construction of IncrementalSort paths, then maybe we could try do a quick/check check if the path can even be useful by estimating the cost and only then building the path. That's what we do for join algorithms, for example - we first compute initial_cost_nestloop and only when that seems cheap enough we do the more expensive stuff. But I'm not sure the path construction is the expensive part, as it should be disabled by enable_incrementalsort=off. But the regression does not seem to disappear, at least not entirely. One possibility is that it's just one of those regressions due to change in binary layout, but I'm not sure know how to verify that. If we are testing with a case that can't actually add more paths (due to it checking the guc before building them), doesn't that effectively leave one of these two options: 1. Binary layout/cache/other untraceable change, or 2. Changes due to refactored function calls. Hmm, but in case of (1) the overhead should be there even with tests that don't really have any additional paths to consider, right? I've tried with such test (single table with no indexes) and I don't quite see any regression (maybe ~1%). (2) might have impact, but I don't see any immediate supects. Did you have some functions in mind? BTW I see the patch adds pathkeys_common but it's not called from anywhere. It's probably leftover from an earlier patch version. There's not anything obvious in point (2) that would be a big cost, but there are definitely changes there. I was surprised that just eliminating the loop through the pathkeys on the query and the index was enough to save us ~4%. Tomas: Earlier you'd wondered about if we should try to shortcut the changes in costing...I was skeptical of that originally, but maybe it's worth looking into? I'm going to try backing that out and see what the numbers look like. I've described the idea about something like initial_cost_nestloop and so on. But I'm a bit skeptical about it, considering that the GUC only has limited effect. A somewhat note is that the number of indexes has pretty significant impact on planning time, even on master. This is timing of the same explain script (similar to the one shown before) with different number of indexes on master: 0 indexes7 indexes 49 indexes --- 10.8512.5627.83 The way I look at incremental sort is that it allows using indexes for queries that couldn't use it before, possibly requiring a separate index. So incremental sort might easily reduce the number of indexes needed, compensating for the overhead we're discussing here. Of course, that may or may not be true. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra wrote: > > On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: > >Tomas Vondra writes: > >> In general, I think it'd be naive that we can make planner smarter with > >> no extra overhead spent on planning, and we can never accept patches > >> adding even tiny overhead. With that approach we'd probably end up with > >> a trivial planner that generates just a single query plan, because > >> that's going to be the fastest planner. A realistic approach needs to > >> consider both the planning and execution phase, and benefits of this > >> patch seem to be clear - if you have queries that do benefit from it. > > > >I think that's kind of attacking a straw man, though. The thing that > >people push back on, or should push back on IMO, is when a proposed > >patch adds significant slowdown to queries that it has no or very little > >hope of improving. The trick is to do expensive stuff only when > >there's a good chance of getting a better plan out of it. > > > > Yeah, I agree with that. I think the main issue is that we don't really > know what the "expensive stuff" is in this case, so it's not really > clear how to be smarter :-( To add to this: I agree that ideally you'd check cheaply to know you're in a situation that might help, and then do more work. But here the question is always going to be simply "would we benefit from an ordering, and, if so, do we have it already partially sorted". It's hard to imagine that reducing much conceptually, so we're left with optimizations of that check. > One possibility is that it's just one of those regressions due to change > in binary layout, but I'm not sure know how to verify that. If we are testing with a case that can't actually add more paths (due to it checking the guc before building them), doesn't that effectively leave one of these two options: 1. Binary layout/cache/other untraceable change, or 2. Changes due to refactored function calls. There's not anything obvious in point (2) that would be a big cost, but there are definitely changes there. I was surprised that just eliminating the loop through the pathkeys on the query and the index was enough to save us ~4%. Tomas: Earlier you'd wondered about if we should try to shortcut the changes in costing...I was skeptical of that originally, but maybe it's worth looking into? I'm going to try backing that out and see what the numbers look like. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote: Tomas Vondra writes: In general, I think it'd be naive that we can make planner smarter with no extra overhead spent on planning, and we can never accept patches adding even tiny overhead. With that approach we'd probably end up with a trivial planner that generates just a single query plan, because that's going to be the fastest planner. A realistic approach needs to consider both the planning and execution phase, and benefits of this patch seem to be clear - if you have queries that do benefit from it. I think that's kind of attacking a straw man, though. The thing that people push back on, or should push back on IMO, is when a proposed patch adds significant slowdown to queries that it has no or very little hope of improving. The trick is to do expensive stuff only when there's a good chance of getting a better plan out of it. Yeah, I agree with that. I think the main issue is that we don't really know what the "expensive stuff" is in this case, so it's not really clear how to be smarter :-( One possibility is that it's just one of those regressions due to change in binary layout, but I'm not sure know how to verify that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 06:00:00PM -0400, James Coleman wrote: On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra wrote: On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote: >On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra > wrote: >> >> The main thing I've been working on today is benchmarking how this >> affects planning. And I'm seeing a regression that worries me a bit, >> unfortunately. >> >> The test I'm doing is pretty simple - build a small table with a bunch >> of columns: >> >>create table t (a int, b int, c int, d int, e int, f int, g int); >> >>insert into t select 100*random(), 100*random(), 100*random(), >> 100*random(), 100*random(), 100*random(), 100*random() >>from generate_series(1,10) s(i); >> >> and then a number of indexes on subsets of up to 3 columns, as generated >> using the attached build-indexes.py script. And then run a bunch of >> explains (so no actual execution) sorting the data by at least 4 columns >> (to trigger incremental sort paths), measuring timing of the script. >> >> I did a bunch of runs on current master and v46 with incremental sort >> disabled and enabled, and the results look like this: >> >> master offon >> -- >> 34.60937.46337.729 >> >> which means about 8-9% regression with incremental sort. Of course, this >> is only for planning time, for execution the impact is going to be much >> smaller. But it's still a bit annoying. >> >> I've suspected this might be either due to the add_partial_path changes >> or the patch adding incremental sort to additional places, so I tested >> those parts individually and the answer is no - add_partial_path changes >> have very small impact (~1%, which might be noise). The regression comes >> mostly from the 0002 part that adds incremental sort. At least in this >> particular test - different tests might behave differently, of course. >> >> The annoying bit is that the overhead does not disappear after disabling >> incremental sort. That suggests this is not merely due to considering >> and creating higher number of paths, but due to something that happens >> before we even look at the GUC ... >> >> I think I've found one such place - if you look at compare_pathkeys, it >> has this check right before the forboth() loop: >> >> if (keys1 == keys2) >> return PATHKEYS_EQUAL; >> >> But with incremental sort we don't call pathkeys_contained_in, we call >> pathkeys_common_contained_in instead. And that does not call >> compare_pathkeys and does not have the simple equality check. Adding >> the following check seems to cut the overhead in half, which is nice: >> >> if (keys1 == keys2) >> { >> *n_common = list_length(keys1); >> return true; >> } >> >> Not sure where the rest of the regression comes from yet. > >I noticed in the other function we also optimize by checking if either >keys list is NIL, so I tried adding that, and it might have made a >minor difference, but it's hard to tell as it was under 1%. > Which other function? I don't see this optimization in compare_pathkeys, that only checks key1/key2 after the forboth loop, but that's something different. pathkeys_useful_for_ordering checks both inputs. I may be wrong, but my guess would be that this won't save much, because the loop should terminate immediately. The whole point is not to loop over possibly many pathkeys when we know it's exactly the same pathkeys list (same pointer). When one of the lists is NIL the loop should terminate right away ... >I also ran perf with a slightly modified version of your test that >uses psql, and after the above changes was seeing something like a >3.5% delta between master and this patch series. Nothing obvious in >the perf report though. > Yeah, I don't think there's anything obviously more expensive. >This test is intended to be somewhat worst case, no? At what point do >we consider the trade-off worth it (given that it's not plausible to >have zero impact)? > Yes, more or less. It was definitely designed to do that - it merely plans the query (no execution), with many applicable indexes etc. It's definitely true that once the exection starts to take more time, the overhead will get negligible. Same for reducing number of indexes. And of course, for queries that can benefit from incremental sort, the speedup is likely way larger than this. In general, I think it'd be naive that we can make planner smarter with no extra overhead spent on planning, and we can never accept patches adding even tiny overhead. With that approach we'd probably end up with a trivial planner that generates just a single query plan, because that's going to be the fastest planner. A realistic approach needs to consider both the planning and execution phase, and benefits of this patch seem to be clear - if you have queries that do benefit from it. Let's try to minimize the overhead a bit more, and then we'll see. Any
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Tomas Vondra writes: > In general, I think it'd be naive that we can make planner smarter with > no extra overhead spent on planning, and we can never accept patches > adding even tiny overhead. With that approach we'd probably end up with > a trivial planner that generates just a single query plan, because > that's going to be the fastest planner. A realistic approach needs to > consider both the planning and execution phase, and benefits of this > patch seem to be clear - if you have queries that do benefit from it. I think that's kind of attacking a straw man, though. The thing that people push back on, or should push back on IMO, is when a proposed patch adds significant slowdown to queries that it has no or very little hope of improving. The trick is to do expensive stuff only when there's a good chance of getting a better plan out of it. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 5:53 PM Tomas Vondra wrote: > > On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote: > >On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra > > wrote: > >> > >> The main thing I've been working on today is benchmarking how this > >> affects planning. And I'm seeing a regression that worries me a bit, > >> unfortunately. > >> > >> The test I'm doing is pretty simple - build a small table with a bunch > >> of columns: > >> > >>create table t (a int, b int, c int, d int, e int, f int, g int); > >> > >>insert into t select 100*random(), 100*random(), 100*random(), > >> 100*random(), 100*random(), 100*random(), 100*random() > >>from generate_series(1,10) s(i); > >> > >> and then a number of indexes on subsets of up to 3 columns, as generated > >> using the attached build-indexes.py script. And then run a bunch of > >> explains (so no actual execution) sorting the data by at least 4 columns > >> (to trigger incremental sort paths), measuring timing of the script. > >> > >> I did a bunch of runs on current master and v46 with incremental sort > >> disabled and enabled, and the results look like this: > >> > >> master offon > >> -- > >> 34.60937.46337.729 > >> > >> which means about 8-9% regression with incremental sort. Of course, this > >> is only for planning time, for execution the impact is going to be much > >> smaller. But it's still a bit annoying. > >> > >> I've suspected this might be either due to the add_partial_path changes > >> or the patch adding incremental sort to additional places, so I tested > >> those parts individually and the answer is no - add_partial_path changes > >> have very small impact (~1%, which might be noise). The regression comes > >> mostly from the 0002 part that adds incremental sort. At least in this > >> particular test - different tests might behave differently, of course. > >> > >> The annoying bit is that the overhead does not disappear after disabling > >> incremental sort. That suggests this is not merely due to considering > >> and creating higher number of paths, but due to something that happens > >> before we even look at the GUC ... > >> > >> I think I've found one such place - if you look at compare_pathkeys, it > >> has this check right before the forboth() loop: > >> > >> if (keys1 == keys2) > >> return PATHKEYS_EQUAL; > >> > >> But with incremental sort we don't call pathkeys_contained_in, we call > >> pathkeys_common_contained_in instead. And that does not call > >> compare_pathkeys and does not have the simple equality check. Adding > >> the following check seems to cut the overhead in half, which is nice: > >> > >> if (keys1 == keys2) > >> { > >> *n_common = list_length(keys1); > >> return true; > >> } > >> > >> Not sure where the rest of the regression comes from yet. > > > >I noticed in the other function we also optimize by checking if either > >keys list is NIL, so I tried adding that, and it might have made a > >minor difference, but it's hard to tell as it was under 1%. > > > > Which other function? I don't see this optimization in compare_pathkeys, > that only checks key1/key2 after the forboth loop, but that's something > different. pathkeys_useful_for_ordering checks both inputs. > I may be wrong, but my guess would be that this won't save much, because > the loop should terminate immediately. The whole point is not to loop > over possibly many pathkeys when we know it's exactly the same pathkeys > list (same pointer). When one of the lists is NIL the loop should > terminate right away ... > > >I also ran perf with a slightly modified version of your test that > >uses psql, and after the above changes was seeing something like a > >3.5% delta between master and this patch series. Nothing obvious in > >the perf report though. > > > > Yeah, I don't think there's anything obviously more expensive. > > >This test is intended to be somewhat worst case, no? At what point do > >we consider the trade-off worth it (given that it's not plausible to > >have zero impact)? > > > > Yes, more or less. It was definitely designed to do that - it merely > plans the query (no execution), with many applicable indexes etc. It's > definitely true that once the exection starts to take more time, the > overhead will get negligible. Same for reducing number of indexes. > > And of course, for queries that can benefit from incremental sort, the > speedup is likely way larger than this. > > In general, I think it'd be naive that we can make planner smarter with > no extra overhead spent on planning, and we can never accept patches > adding even tiny overhead. With that approach we'd probably end up with > a trivial planner that generates just a single query plan, because > that's going to be the fastest planner. A realistic approach needs to > consider both the planning and execution phase, and benefits of this > patch
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 02:23:15PM -0400, James Coleman wrote: On Mon, Mar 30, 2020 at 9:14 PM Tomas Vondra wrote: The main thing I've been working on today is benchmarking how this affects planning. And I'm seeing a regression that worries me a bit, unfortunately. The test I'm doing is pretty simple - build a small table with a bunch of columns: create table t (a int, b int, c int, d int, e int, f int, g int); insert into t select 100*random(), 100*random(), 100*random(), 100*random(), 100*random(), 100*random(), 100*random() from generate_series(1,10) s(i); and then a number of indexes on subsets of up to 3 columns, as generated using the attached build-indexes.py script. And then run a bunch of explains (so no actual execution) sorting the data by at least 4 columns (to trigger incremental sort paths), measuring timing of the script. I did a bunch of runs on current master and v46 with incremental sort disabled and enabled, and the results look like this: master offon -- 34.60937.46337.729 which means about 8-9% regression with incremental sort. Of course, this is only for planning time, for execution the impact is going to be much smaller. But it's still a bit annoying. I've suspected this might be either due to the add_partial_path changes or the patch adding incremental sort to additional places, so I tested those parts individually and the answer is no - add_partial_path changes have very small impact (~1%, which might be noise). The regression comes mostly from the 0002 part that adds incremental sort. At least in this particular test - different tests might behave differently, of course. The annoying bit is that the overhead does not disappear after disabling incremental sort. That suggests this is not merely due to considering and creating higher number of paths, but due to something that happens before we even look at the GUC ... I think I've found one such place - if you look at compare_pathkeys, it has this check right before the forboth() loop: if (keys1 == keys2) return PATHKEYS_EQUAL; But with incremental sort we don't call pathkeys_contained_in, we call pathkeys_common_contained_in instead. And that does not call compare_pathkeys and does not have the simple equality check. Adding the following check seems to cut the overhead in half, which is nice: if (keys1 == keys2) { *n_common = list_length(keys1); return true; } Not sure where the rest of the regression comes from yet. I noticed in the other function we also optimize by checking if either keys list is NIL, so I tried adding that, and it might have made a minor difference, but it's hard to tell as it was under 1%. Which other function? I don't see this optimization in compare_pathkeys, that only checks key1/key2 after the forboth loop, but that's something different. I may be wrong, but my guess would be that this won't save much, because the loop should terminate immediately. The whole point is not to loop over possibly many pathkeys when we know it's exactly the same pathkeys list (same pointer). When one of the lists is NIL the loop should terminate right away ... I also ran perf with a slightly modified version of your test that uses psql, and after the above changes was seeing something like a 3.5% delta between master and this patch series. Nothing obvious in the perf report though. Yeah, I don't think there's anything obviously more expensive. This test is intended to be somewhat worst case, no? At what point do we consider the trade-off worth it (given that it's not plausible to have zero impact)? Yes, more or less. It was definitely designed to do that - it merely plans the query (no execution), with many applicable indexes etc. It's definitely true that once the exection starts to take more time, the overhead will get negligible. Same for reducing number of indexes. And of course, for queries that can benefit from incremental sort, the speedup is likely way larger than this. In general, I think it'd be naive that we can make planner smarter with no extra overhead spent on planning, and we can never accept patches adding even tiny overhead. With that approach we'd probably end up with a trivial planner that generates just a single query plan, because that's going to be the fastest planner. A realistic approach needs to consider both the planning and execution phase, and benefits of this patch seem to be clear - if you have queries that do benefit from it. Let's try to minimize the overhead a bit more, and then we'll see. Also, while looking at pathkeys_common_contained_in(), I've been a bit puzzled why does is this correct: return (key1 == NULL); It's easy to not notice it's key1 and not keys1, so I suggest we add a comment 'key1 == NULL' means we've processed whole keys1 list. Done. I've included fixes for Alvaro's comments in this patch series also.
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > On Tue, Mar 31, 2020 at 1:04 PM Tom Lane wrote: >> Perhaps the semantics are such that that's actually sensible, but it's >> far from a straightforward remapping of the old enum. > Right, I didn't see the explicit "= 0" in other enums there, so it > made me wonder if it was intentional to designate that one had to be > 0, but I guess without a comment that's a lot of inference. It's possible that somebody meant that as an indicator that the code depends on palloc0() leaving the field with that value. But if so, you'd soon find that out ... and an actual comment would be better, anyway. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 1:04 PM Tom Lane wrote: > > James Coleman writes: > > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory > > + * instrumentation so needs to have each value be a separate bit. > > >> I don't quite understand why you skipped "1". (Also, is the use of zero > >> a wise choice?) > > > The assignment of 0 was already there, and there wasn't a comment to > > indicate why. That ends up meaning we wouldn't display "still in > > progress" as a type here, which is maybe desirable, but I'm honestly > > not sure why it was that way originally. I'm curious if you have any > > thoughts on it. > > As things stood, the "= 0" was a no-op, since the first enum value > would've been that anyway. But if you're converting this set of symbols > to bits that can be OR'd together, it seems pretty strange to use zero, > because that can't be distinguished from "absence of any entry". > > Perhaps the semantics are such that that's actually sensible, but it's > far from a straightforward remapping of the old enum. Right, I didn't see the explicit "= 0" in other enums there, so it made me wonder if it was intentional to designate that one had to be 0, but I guess without a comment that's a lot of inference. The semantics seemed somewhat useful here in theory, but since I'm not hearing a "yeah, that was intentional but not commented", I'm just going to change it to what you'd naturally expect. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory > + * instrumentation so needs to have each value be a separate bit. >> I don't quite understand why you skipped "1". (Also, is the use of zero >> a wise choice?) > The assignment of 0 was already there, and there wasn't a comment to > indicate why. That ends up meaning we wouldn't display "still in > progress" as a type here, which is maybe desirable, but I'm honestly > not sure why it was that way originally. I'm curious if you have any > thoughts on it. As things stood, the "= 0" was a no-op, since the first enum value would've been that anyway. But if you're converting this set of symbols to bits that can be OR'd together, it seems pretty strange to use zero, because that can't be distinguished from "absence of any entry". Perhaps the semantics are such that that's actually sensible, but it's far from a straightforward remapping of the old enum. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Tue, Mar 31, 2020 at 12:31 PM Alvaro Herrera wrote: > > On 2020-Mar-30, James Coleman wrote: > > > +/* > > + *Instruementation information for IncrementalSort > > + * > > + */ > > +typedef struct IncrementalSortGroupInfo > > +{ > > + int64 groupCount; > > + longmaxDiskSpaceUsed; > > + longtotalDiskSpaceUsed; > > + longmaxMemorySpaceUsed; > > + longtotalMemorySpaceUsed; > > + SizesortMethods; /* bitmask of TuplesortMethod */ > > +} IncrementalSortGroupInfo; > > There's a typo "Instruementation" in the comment, but I'm more surprised > that type Size is being used to store a bitmask. It looks weird to me. > Wouldn't it be more reasonable to use bits32 or some such? (I first > noticed this in the "sizeof(Size)" code that appears in the explain > code.) I just didn't know about bits32; I'll change. > OTOH, aesthetically it would seem to be better to define these values > using ones and increasing shifts (1 << 1 and so on), rather than powers > of two: > > > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory > > + * instrumentation so needs to have each value be a separate bit. > > */ > > typedef enum > > { > > SORT_TYPE_STILL_IN_PROGRESS = 0, > > - SORT_TYPE_TOP_N_HEAPSORT, > > - SORT_TYPE_QUICKSORT, > > - SORT_TYPE_EXTERNAL_SORT, > > - SORT_TYPE_EXTERNAL_MERGE > > + SORT_TYPE_TOP_N_HEAPSORT = 2, > > + SORT_TYPE_QUICKSORT = 4, > > + SORT_TYPE_EXTERNAL_SORT = 8, > > + SORT_TYPE_EXTERNAL_MERGE = 16 > > } TuplesortMethod; > > I don't quite understand why you skipped "1". (Also, is the use of zero > a wise choice?) The assignment of 0 was already there, and there wasn't a comment to indicate why. That ends up meaning we wouldn't display "still in progress" as a type here, which is maybe desirable, but I'm honestly not sure why it was that way originally. I'm curious if you have any thoughts on it. I knew some projects used increasing shifts, but I wasn't sure what the preference was here. I'll correct. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On 2020-Mar-30, James Coleman wrote: > +/* > + *Instruementation information for IncrementalSort > + * > + */ > +typedef struct IncrementalSortGroupInfo > +{ > + int64 groupCount; > + longmaxDiskSpaceUsed; > + longtotalDiskSpaceUsed; > + longmaxMemorySpaceUsed; > + longtotalMemorySpaceUsed; > + SizesortMethods; /* bitmask of TuplesortMethod */ > +} IncrementalSortGroupInfo; There's a typo "Instruementation" in the comment, but I'm more surprised that type Size is being used to store a bitmask. It looks weird to me. Wouldn't it be more reasonable to use bits32 or some such? (I first noticed this in the "sizeof(Size)" code that appears in the explain code.) OTOH, aesthetically it would seem to be better to define these values using ones and increasing shifts (1 << 1 and so on), rather than powers of two: > + * TuplesortMethod is used in a bitmask in Increment Sort's shared memory > + * instrumentation so needs to have each value be a separate bit. > */ > typedef enum > { > SORT_TYPE_STILL_IN_PROGRESS = 0, > - SORT_TYPE_TOP_N_HEAPSORT, > - SORT_TYPE_QUICKSORT, > - SORT_TYPE_EXTERNAL_SORT, > - SORT_TYPE_EXTERNAL_MERGE > + SORT_TYPE_TOP_N_HEAPSORT = 2, > + SORT_TYPE_QUICKSORT = 4, > + SORT_TYPE_EXTERNAL_SORT = 8, > + SORT_TYPE_EXTERNAL_MERGE = 16 > } TuplesortMethod; I don't quite understand why you skipped "1". (Also, is the use of zero a wise choice?) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Mon, Mar 30, 2020 at 06:53:47PM -0400, James Coleman wrote: On Mon, Mar 30, 2020 at 8:24 AM Tomas Vondra wrote: On Sun, Mar 29, 2020 at 10:16:53PM -0400, James Coleman wrote: >On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra > wrote: >> >> Hi, >> >> Attached is a slightly reorganized patch series. I've merged the fixes >> into the appropriate matches, and I've also combined the two patches >> adding incremental sort paths to additional places in planner. >> >> A couple more comments: >> >> >> 1) I think the GUC documentation in src/sgml/config.sgml is a bit too >> detailed, compared to the other enable_* GUCs. I wonder if there's a >> better place where to move the details. What about adding some examples >> and explanation to perform.sgml? > >I'll take a look at that and include in a patch series tomorrow. Attached. >> 2) Looking at the explain output, the verbose mode looks like this: >> >> test=# explain (verbose, analyze) select a from t order by a, b, c; >> QUERY PLAN >> -- >> Gather Merge (cost=66.31..816072.71 rows=8333226 width=24) (actual time=4.787..20092.555 rows=1000 loops=1) >> Output: a, b, c >> Workers Planned: 2 >> Workers Launched: 2 >> -> Incremental Sort (cost=66.28..729200.36 rows=4166613 width=24) (actual time=1.308..14021.575 rows=333 loops=3) >> Output: a, b, c >> Sort Key: t.a, t.b, t.c >> Presorted Key: t.a, t.b >> Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB peak=30kB >> Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB peak=138kB >> Worker 0: actual time=0.766..16122.368 rows=3841573 loops=1 >> Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB >> Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB peak=141kB >> Worker 1: actual time=1.986..16189.831 rows=3845490 loops=1 >> Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB >> Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB peak=139kB >> -> Parallel Index Scan using t_a_b_idx on public.t (cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 rows=333 loops=3) >> Output: a, b, c >> Worker 0: actual time=0.048..11275.178 rows=3841573 loops=1 >> Worker 1: actual time=0.041..11314.133 rows=3845490 loops=1 >> Planning Time: 0.166 ms >> Execution Time: 25135.029 ms >> (22 rows) >> >> There seems to be missing indentation for the first line of worker info. > >Working on that too. See attached. I've folded in the original "explain fixes" patch into the main series, and the "explain fixes" patch in this series contains only the changes for the above. Thanks. I'll take a look at those changes tomorrow. >> I'm still not quite convinced we should be printing two lines - I know >> you mentioned the lines might be too long, but see how long the other >> lines may get ... > >All right, I give in :) > >Do you think non-workers (both the leader and non-parallel plans) >should also move to one line? > I think we should use the same formatting for both cases, so yes. FWIW I forgot to mention I tweaked the INSTRUMENT_SORT_GROUP macro a bit, by moving the if condition in it. That makes the calls easier. Ah, that actually fixed some of the compile warnings. The other is fixed in my explain fixes patch. >> 3) I see the new nodes (plan state, ...) have "presortedCols" which does >> not indicate it's a "number of". I think we usually prefix names of such >> fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.) > >I can fix this too. Changed everywhere we used this var name. I'm tempted to change to nPresortedKeys, but a cursory glance suggests some cases might actually be consistent with other var names reference columns, so I'm not sure if we want to go down that path (and change more than just this). Not sure. We use "sort keys" and "path keys" for this, but I think "columns" is good enough. The main thing I've been working on today is benchmarking how this affects planning. And I'm seeing a regression that worries me a bit, unfortunately. The test I'm doing is pretty simple - build a small table with a bunch of columns: create table t (a int, b int, c int, d int, e int, f int, g int); insert into t select 100*random(), 100*random(), 100*random(), 100*random(), 100*random(), 100*random(), 100*random() from generate_series(1,10) s(i); and then a number of indexes on subsets of up to 3 columns, as generated using the attached build-indexes.py script. And then run a bunch of explains (so no actual execution) sorting the data by at least 4
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Mar 29, 2020 at 10:16:53PM -0400, James Coleman wrote: On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra wrote: Hi, Attached is a slightly reorganized patch series. I've merged the fixes into the appropriate matches, and I've also combined the two patches adding incremental sort paths to additional places in planner. A couple more comments: 1) I think the GUC documentation in src/sgml/config.sgml is a bit too detailed, compared to the other enable_* GUCs. I wonder if there's a better place where to move the details. What about adding some examples and explanation to perform.sgml? I'll take a look at that and include in a patch series tomorrow. 2) Looking at the explain output, the verbose mode looks like this: test=# explain (verbose, analyze) select a from t order by a, b, c; QUERY PLAN -- Gather Merge (cost=66.31..816072.71 rows=8333226 width=24) (actual time=4.787..20092.555 rows=1000 loops=1) Output: a, b, c Workers Planned: 2 Workers Launched: 2 -> Incremental Sort (cost=66.28..729200.36 rows=4166613 width=24) (actual time=1.308..14021.575 rows=333 loops=3) Output: a, b, c Sort Key: t.a, t.b, t.c Presorted Key: t.a, t.b Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB peak=30kB Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB peak=138kB Worker 0: actual time=0.766..16122.368 rows=3841573 loops=1 Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB peak=141kB Worker 1: actual time=1.986..16189.831 rows=3845490 loops=1 Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB peak=139kB -> Parallel Index Scan using t_a_b_idx on public.t (cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 rows=333 loops=3) Output: a, b, c Worker 0: actual time=0.048..11275.178 rows=3841573 loops=1 Worker 1: actual time=0.041..11314.133 rows=3845490 loops=1 Planning Time: 0.166 ms Execution Time: 25135.029 ms (22 rows) There seems to be missing indentation for the first line of worker info. Working on that too. I'm still not quite convinced we should be printing two lines - I know you mentioned the lines might be too long, but see how long the other lines may get ... All right, I give in :) Do you think non-workers (both the leader and non-parallel plans) should also move to one line? I think we should use the same formatting for both cases, so yes. FWIW I forgot to mention I tweaked the INSTRUMENT_SORT_GROUP macro a bit, by moving the if condition in it. That makes the calls easier. 3) I see the new nodes (plan state, ...) have "presortedCols" which does not indicate it's a "number of". I think we usually prefix names of such fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.) I can fix this too. Also I noticed a few compiler warnings I'll fixup in tomorrow's reply. OK regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Mar 29, 2020 at 9:44 PM Tomas Vondra wrote: > > Hi, > > Attached is a slightly reorganized patch series. I've merged the fixes > into the appropriate matches, and I've also combined the two patches > adding incremental sort paths to additional places in planner. > > A couple more comments: > > > 1) I think the GUC documentation in src/sgml/config.sgml is a bit too > detailed, compared to the other enable_* GUCs. I wonder if there's a > better place where to move the details. What about adding some examples > and explanation to perform.sgml? I'll take a look at that and include in a patch series tomorrow. > 2) Looking at the explain output, the verbose mode looks like this: > > test=# explain (verbose, analyze) select a from t order by a, b, c; > > QUERY PLAN > -- > Gather Merge (cost=66.31..816072.71 rows=8333226 width=24) (actual > time=4.787..20092.555 rows=1000 loops=1) > Output: a, b, c > Workers Planned: 2 > Workers Launched: 2 > -> Incremental Sort (cost=66.28..729200.36 rows=4166613 width=24) > (actual time=1.308..14021.575 rows=333 loops=3) > Output: a, b, c > Sort Key: t.a, t.b, t.c > Presorted Key: t.a, t.b > Full-sort Groups: 4169 Sort Method: quicksort Memory: avg=30kB > peak=30kB > Presorted Groups: 4144 Sort Method: quicksort Memory: avg=128kB > peak=138kB > Worker 0: actual time=0.766..16122.368 rows=3841573 loops=1 > Full-sort Groups: 6871 Sort Method: quicksort Memory: avg=30kB peak=30kB > Presorted Groups: 6823 Sort Method: quicksort Memory: avg=132kB > peak=141kB > Worker 1: actual time=1.986..16189.831 rows=3845490 loops=1 > Full-sort Groups: 6874 Sort Method: quicksort Memory: avg=30kB peak=30kB > Presorted Groups: 6847 Sort Method: quicksort Memory: avg=130kB > peak=139kB > -> Parallel Index Scan using t_a_b_idx on public.t > (cost=0.43..382365.92 rows=4166613 width=24) (actual time=0.040..9808.449 > rows=333 loops=3) > Output: a, b, c > Worker 0: actual time=0.048..11275.178 rows=3841573 loops=1 > Worker 1: actual time=0.041..11314.133 rows=3845490 loops=1 > Planning Time: 0.166 ms > Execution Time: 25135.029 ms > (22 rows) > > There seems to be missing indentation for the first line of worker info. Working on that too. > I'm still not quite convinced we should be printing two lines - I know > you mentioned the lines might be too long, but see how long the other > lines may get ... All right, I give in :) Do you think non-workers (both the leader and non-parallel plans) should also move to one line? > 3) I see the new nodes (plan state, ...) have "presortedCols" which does > not indicate it's a "number of". I think we usually prefix names of such > fields "n" or "num". What about "nPresortedCols"? (Nitpicking, I know.) I can fix this too. Also I noticed a few compiler warnings I'll fixup in tomorrow's reply. > My TODO for this patch is this: > > - review the costing (I think the estimates are OK, but I recall I >haven't been entirely happy with how it's broken into functions.) > > - review the tuplesort changes (the memory contexts etc.) > > - do more testing of performance impact on planning Sounds good. Thanks, James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, Mar 28, 2020 at 11:14 PM Tomas Vondra wrote: > > On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote: > >On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra > > wrote: > >> > >> Hi, > >> > >> Attached is my take on simplification of the useful pathkeyes thing. It > >> keeps the function, but it truncates query_pathkeys to only members with > >> EC members in the relation. I think that's essentially the optimization > >> you've proposed. > > > >Thanks. I've included that in the patch series in this email (as a > >separate patch) with a few additional comments. > > > > Thanks. > > >I've also noticed that the enabled_incrementalsort check in > >generate_useful_gather_paths seemed broken, because it returned us out > >of the function before creating either a plain gather merge (if > >already sorted) or an explicit sort path. I've included a patch that > >moves it to the if block that actually builds the incremental sort > >path. > > > > Hmmm, that's probably right. > > The reason why the GUC check was right after generate_gather_paths is > that the intent to disable all the useful-pathkeys business, essentially > reducing it back to plain generate_gather_paths. > > But I think you're right that's wrong, because it might lead to strange > behavior when the GUC switches between plans without any incremental > sort nodes - setting it to 'true' might end up picking GM on top of > plain Sort, for example. Thanks. > >> > >> ... > >> > >> 1) Missing worker identification (Worker #). > > > >Fixed. > > > >> 2) Missing method for workers (we have it for the leader, though). > > > >Fixed. Since we can't have pointers in the parallel shared memory > >space, we can't store the sort methods used in a list. To accomplish > >the same goal, I've assigned the TuplesortMethod enum entries uique > >bit positions, and store the methods used in a bitmask. > > > > OK, makes sense. > > >> 3) I'm not sure why the lable is "Methods" instead of "Sort Method", and > >> why it's in parenthesis. > > > >I've removed the parentheses. It's labeled "Methods" since there can > >be more than one (different batches could use different methods). I've > >updated this to properly use singular/plural depending on the number > >of methods used. > > > > I'm a bit confused. How do I know which batch used which method? Is it > actually worth printing in explain analyze? Maybe only print it in the > verbose mode? The alternative is showing no sort method information at all, or only showing it if all batches used the same method (which seems confusing to me). It seems weird that we wouldn't try to find some rough analogue to what a regular sort node outputs, so I've attempted to summarize. This is similar to the memory information: the average doesn't apply to any one batch, and you don't know which one (or how many) hit the peak memory usage either, but I think it's meaningful to know a summary. With the sort methods, I think it's useful to be able to, for example, know if any of the groups happened to trigger the top-n heapsort optimization, or not, and as a corollary, if all of them did or not. > >> 4) Not sure having two lines for each worker is a great idea. > > > >I've left these in for now because the lines are already very long > >(much, much longer than the worker lines in a standard sort node). > >This is largely because we're trying to summarize many sort batches, > >while standard sort nodes only have to give the exact stats from a > >single batch. > > > >See the example output later in the email. > > > > OK > > >> 5) I'd probably prefer having multiple labels for avg/max memory values, > >> instead of (avg) and (max) notes. Also, I think we use "peak" in this > >> context instead of "max". > > > >Updated. > > > > OK > > >Here's the current output: > > > > Limit (cost=1887419.20..1889547.68 rows=1 width=8) (actual > >time=13218.403..13222.519 rows=1 loo > >ps=1) > > -> Gather Merge (cost=1887419.20..19624748.03 rows=8360 > >width=8) (actual time=13218.401..13229.7 > >50 rows=1 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Incremental Sort (cost=1886419.17..10005010.55 > >rows=4180 width=8) (actual time=13208.00 > >4..13208.586 rows=4425 loops=3) > > Sort Key: a, b > > Presorted Key: a > > Full-sort Groups: 1 Sort Method: quicksort Memory: > >avg=28kB peak=28kB > > Presorted Groups: 1 Sort Method: top-N heapsort Memory: > >avg=1681kB peak=1681kB > > Worker 0: Full-sort Groups: 1 Sort Method: quicksort > >Memory: avg=28kB peak=28kB > > Presorted Groups: 1 Sort Method: top-N heapsort > >Memory: avg=1680kB peak=1680kB > > Worker 1: Full-sort Groups: 1 Sort Method: quicksort > >Memory: avg=28kB peak=28kB > > Presorted Groups: 1 Sort Method: top-N heapsort > >Memory: avg=1682kB peak=1682kB > > -> Parallel Index Scan using index_s_a on
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote: On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra wrote: Hi, Attached is my take on simplification of the useful pathkeyes thing. It keeps the function, but it truncates query_pathkeys to only members with EC members in the relation. I think that's essentially the optimization you've proposed. Thanks. I've included that in the patch series in this email (as a separate patch) with a few additional comments. Thanks. I've also noticed that the enabled_incrementalsort check in generate_useful_gather_paths seemed broken, because it returned us out of the function before creating either a plain gather merge (if already sorted) or an explicit sort path. I've included a patch that moves it to the if block that actually builds the incremental sort path. Hmmm, that's probably right. The reason why the GUC check was right after generate_gather_paths is that the intent to disable all the useful-pathkeys business, essentially reducing it back to plain generate_gather_paths. But I think you're right that's wrong, because it might lead to strange behavior when the GUC switches between plans without any incremental sort nodes - setting it to 'true' might end up picking GM on top of plain Sort, for example. ... 1) Missing worker identification (Worker #). Fixed. 2) Missing method for workers (we have it for the leader, though). Fixed. Since we can't have pointers in the parallel shared memory space, we can't store the sort methods used in a list. To accomplish the same goal, I've assigned the TuplesortMethod enum entries uique bit positions, and store the methods used in a bitmask. OK, makes sense. 3) I'm not sure why the lable is "Methods" instead of "Sort Method", and why it's in parenthesis. I've removed the parentheses. It's labeled "Methods" since there can be more than one (different batches could use different methods). I've updated this to properly use singular/plural depending on the number of methods used. I'm a bit confused. How do I know which batch used which method? Is it actually worth printing in explain analyze? Maybe only print it in the verbose mode? 4) Not sure having two lines for each worker is a great idea. I've left these in for now because the lines are already very long (much, much longer than the worker lines in a standard sort node). This is largely because we're trying to summarize many sort batches, while standard sort nodes only have to give the exact stats from a single batch. See the example output later in the email. OK 5) I'd probably prefer having multiple labels for avg/max memory values, instead of (avg) and (max) notes. Also, I think we use "peak" in this context instead of "max". Updated. OK Here's the current output: Limit (cost=1887419.20..1889547.68 rows=1 width=8) (actual time=13218.403..13222.519 rows=1 loo ps=1) -> Gather Merge (cost=1887419.20..19624748.03 rows=8360 width=8) (actual time=13218.401..13229.7 50 rows=1 loops=1) Workers Planned: 2 Workers Launched: 2 -> Incremental Sort (cost=1886419.17..10005010.55 rows=4180 width=8) (actual time=13208.00 4..13208.586 rows=4425 loops=3) Sort Key: a, b Presorted Key: a Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1681kB peak=1681kB Worker 0: Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1680kB peak=1680kB Worker 1: Full-sort Groups: 1 Sort Method: quicksort Memory: avg=28kB peak=28kB Presorted Groups: 1 Sort Method: top-N heapsort Memory: avg=1682kB peak=1682kB -> Parallel Index Scan using index_s_a on s (cost=0.57..4967182.06 rows=4180 width=8 ) (actual time=0.455..11730.878 rows=668 loops=3) James Looks reasonable. Did you try it in other output formats - json/yaml? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Hi, Attached is my take on simplification of the useful pathkeyes thing. It keeps the function, but it truncates query_pathkeys to only members with EC members in the relation. I think that's essentially the optimization you've proposed. I've also noticed an issue in explain output. EXPLAIN ANALYZE on a simple query gives me this: QUERY PLAN --- Gather Merge (cost=66.30..816060.48 rows=8333226 width=24) (actual time=6.464..19091.006 rows=1000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Incremental Sort (cost=66.28..729188.13 rows=4166613 width=24) (actual time=1.836..13401.109 rows=333 loops=3) Sort Key: a, b, c Presorted Key: a, b Full-sort Groups: 4156 (Methods: quicksort) Memory: 30kB (avg), 30kB (max) Presorted Groups: 4137 (Methods: quicksort) Memory: 108kB (avg), 111kB (max) Full-sort Groups: 6888 (Methods: ) Memory: 30kB (avg), 30kB (max) Presorted Groups: 6849 (Methods: ) Memory: 121kB (avg), 131kB (max) Full-sort Groups: 6869 (Methods: ) Memory: 30kB (avg), 30kB (max) Presorted Groups: 6816 (Methods: ) Memory: 128kB (avg), 132kB (max) -> Parallel Index Scan using t_a_b_idx on t (cost=0.43..382353.69 rows=4166613 width=24) (actual time=0.033..9346.679 rows=333 loops=3) Planning Time: 0.133 ms Execution Time: 23998.669 ms (15 rows) while with incremental sort disabled it looks like this: QUERY PLAN - Gather Merge (cost=734387.50..831676.35 rows=8333226 width=24) (actual time=5597.978..14967.246 rows=1000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=734387.47..744804.00 rows=4166613 width=24) (actual time=5584.616..7645.711 rows=333 loops=3) Sort Key: a, b, c Sort Method: external merge Disk: 111216kB Worker 0: Sort Method: external merge Disk: 109552kB Worker 1: Sort Method: external merge Disk: 112112kB -> Parallel Seq Scan on t (cost=0.00..105361.13 rows=4166613 width=24) (actual time=0.011..1753.128 rows=333 loops=3) Planning Time: 0.048 ms Execution Time: 19682.582 ms (11 rows) So I think there's a couple of issues: 1) Missing worker identification (Worker #). 2) Missing method for workers (we have it for the leader, though). 3) I'm not sure why the lable is "Methods" instead of "Sort Method", and why it's in parenthesis. 4) Not sure having two lines for each worker is a great idea. 5) I'd probably prefer having multiple labels for avg/max memory values, instead of (avg) and (max) notes. Also, I think we use "peak" in this context instead of "max". regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 32bf734820..49d4990e66 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -2748,7 +2748,6 @@ static List * get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel) { List *useful_pathkeys_list = NIL; - ListCell *lc; /* * Considering query_pathkeys is always worth it, because it might allow us @@ -2756,29 +2755,27 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel) */ if (root->query_pathkeys) { - boolquery_pathkeys_ok = true; + ListCell *lc; + List *pathkeys = NIL; foreach(lc, root->query_pathkeys) { PathKey*pathkey = (PathKey *) lfirst(lc); EquivalenceClass *pathkey_ec = pathkey->pk_eclass; - Expr *em_expr; /* -* We can't use incremental sort for pathkeys containing volatile -* expressions. We could walk the exppression itself, but checking -* ec_has_volatile here saves some cycles. +* We can't build Incremental Sort when pathkeys contain elements +* without an EC member not contained in the current relation, so +* just ignore anything after the first such pathkey. */ - if (pathkey_ec->ec_has_volatile || - !(em_expr = find_em_expr_for_rel(pathkey_ec, rel))) - { -
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, Mar 28, 2020 at 5:30 PM Tomas Vondra wrote: > > On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote: > >On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra > > wrote: > >> > >> ... > >> > >> The more I look at pathkeys_useful_for_ordering() the more I think the > >> get_useful_pathkeys_for_relation() function should look more like it > >> than the postgres_fdw one ... > > > >If we go down that path (and indeed this is actually implied by > >removing the volatile check too), doesn't that really just mean (at > >least for now) that get_useful_pathkeys_for_relation goes away > >entirely and we effectively use root->query_pathkeys directly? The > >only thing your lose them is the check that each eclass has a member > >in the rel. But that check probably wasn't quite right anyway: at > >least for incremental sort (maybe not regular sort), I think we > >probably don't necessarily care that the entire list has members in > >the rel, but rather that some prefix does, right? The idea there would > >be that we could create a gather merge here that supplies a partial > >ordering (that prefix of query_pathkeys) and then above it the planner > >might place another incremental sort (say above a join), or perhaps > >even a join above that would actually be sufficient (since many joins > >are capable of providing an ordering anyway). > > > >I've attached (added prefix .txt to avoid the cfbot assuming this is a > >full series) an idea in that direction to see if we're thinking along > >the same route. You'll want to apply and view with `git diff -w` > >probably. > > > > Hmmm, I'm not sure the patch is quite correct. > > Firstly, I suggest we don't remove get_useful_pathkeys_for_relation > entirely, because that allows us to add more useful pathkeys in the > future (even if we don't consider pathkeys useful for merging now). > We could also do the EC optimization in the function, return NIL and > not loop over partial_pathlist at all. That's cosmetic issue, though. > > More importantly, I'm not sure this makes sense: > >/* consider incremental sort for interesting orderings */ >useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys); > >... > >is_sorted = pathkeys_common_contained_in(useful_pathkeys, > subpath->pathkeys, > _keys); > > I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys, > right? So how could this ever return is_sorted=false? > > The whole point is to end up with query_pathkeys (or whatever pathkeys > we deem useful), but this does not do that. Yes, that's obviously a thinko in my rush to get an idea out. I think useful_pathkeys there would be essentially root->query_pathkeys, or, more correctly the prefix of query_pathkeys that has eclasses shared with the current rel. Or, if we go with the more restrictive approach (see the two approaches I mentioned earlier), then either NIL or query_pathkeys (if they all matches eclasses in the current rel). Given those requirements, I agree that keeping get_useful_pathkeys_for_relation makes sense to wrap up all of that behavior. James
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote: On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra wrote: ... The more I look at pathkeys_useful_for_ordering() the more I think the get_useful_pathkeys_for_relation() function should look more like it than the postgres_fdw one ... If we go down that path (and indeed this is actually implied by removing the volatile check too), doesn't that really just mean (at least for now) that get_useful_pathkeys_for_relation goes away entirely and we effectively use root->query_pathkeys directly? The only thing your lose them is the check that each eclass has a member in the rel. But that check probably wasn't quite right anyway: at least for incremental sort (maybe not regular sort), I think we probably don't necessarily care that the entire list has members in the rel, but rather that some prefix does, right? The idea there would be that we could create a gather merge here that supplies a partial ordering (that prefix of query_pathkeys) and then above it the planner might place another incremental sort (say above a join), or perhaps even a join above that would actually be sufficient (since many joins are capable of providing an ordering anyway). I've attached (added prefix .txt to avoid the cfbot assuming this is a full series) an idea in that direction to see if we're thinking along the same route. You'll want to apply and view with `git diff -w` probably. Hmmm, I'm not sure the patch is quite correct. Firstly, I suggest we don't remove get_useful_pathkeys_for_relation entirely, because that allows us to add more useful pathkeys in the future (even if we don't consider pathkeys useful for merging now). We could also do the EC optimization in the function, return NIL and not loop over partial_pathlist at all. That's cosmetic issue, though. More importantly, I'm not sure this makes sense: /* consider incremental sort for interesting orderings */ useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys); ... is_sorted = pathkeys_common_contained_in(useful_pathkeys, subpath->pathkeys, _keys); I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys, right? So how could this ever return is_sorted=false? The whole point is to end up with query_pathkeys (or whatever pathkeys we deem useful), but this does not do that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services