Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Tue, Sep 26, 2017 at 2:56 PM, Robert Haaswrote: > On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud wrote: >> That's true, but numCols, sortColdIdx etc are also used to display the >> sort key in an explain. If an append can return sorted data, it >> should also display the sort information, so I think these fields are >> still required in an Append node. > > I don't think so. An index scan doesn't output that information, nor > does a nested loop which inherits a sort order from its outer path. I > think the rule is that a plan node which takes steps to get the data > into a certain order might want to output something about that, but a > plan node which somehow gets that ordering without taking any explicit > action doesn't print anything. Oh, ok that indeed makes sense. As I said I'll remove all the useless noise and send an updated patch. Thanks again. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaudwrote: > That's true, but numCols, sortColdIdx etc are also used to display the > sort key in an explain. If an append can return sorted data, it > should also display the sort information, so I think these fields are > still required in an Append node. I don't think so. An index scan doesn't output that information, nor does a nested loop which inherits a sort order from its outer path. I think the rule is that a plan node which takes steps to get the data into a certain order might want to output something about that, but a plan node which somehow gets that ordering without taking any explicit action doesn't print anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Fri, Sep 22, 2017 at 11:09 PM, Robert Haaswrote: > During planning, *every* node has a list of pathkeys associated with > it which represent the presumed output ordering. You can support > ordered append paths without changing any data structures at all; it's > just a matter of putting pathkeys into an AppendPath. > I totally agree. > The reason why MergeAppend has extra stuff in the node (numCols, > sortColIdx, etc.) is so that it can actually perform the sort at > runtime. But this feature requires no runtime support -- that's kinda > the point -- so there's no need for it to carry that information, or > to add any new nodes. > > At least, IIUC. That's true, but numCols, sortColdIdx etc are also used to display the sort key in an explain. If an append can return sorted data, it should also display the sort information, so I think these fields are still required in an Append node. If it's ok to only add these fields in an Append node for the explain part, I'll revert the Append/MergeAppend refactoring and do some other cleanup as you advised earlier. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Fri, Sep 22, 2017 at 4:23 PM, Julien Rouhaudwrote: > That was one of the first question we had with the initial POC. The > reason is that this "sorted append" is not using a merge algorithm but > just appending partitions in the right order, so it looked like we > could either create a new SortedAppend node, or use current Append > node and allow it to return sorted data. We chose the 2nd solution, > and ended up with a lot of duplicated code (all the code for the > ordering), so we tried to have Append and MergeAppend share this code. > I'm not at all satisfied with current shape, but I'm still not sure on > what node to use for this. Do you have any idea on this? During planning, *every* node has a list of pathkeys associated with it which represent the presumed output ordering. You can support ordered append paths without changing any data structures at all; it's just a matter of putting pathkeys into an AppendPath. The reason why MergeAppend has extra stuff in the node (numCols, sortColIdx, etc.) is so that it can actually perform the sort at runtime. But this feature requires no runtime support -- that's kinda the point -- so there's no need for it to carry that information, or to add any new nodes. At least, IIUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Fri, Sep 22, 2017 at 9:58 PM, Robert Haaswrote: > On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud wrote: >> PFA v3 of the patch, once again rebased and that now use all the newly >> available infrastructure. >> >> I also added a check that a sorted append can't be generated when a >> default partition has been declared. > > I just took a quick look at this and was kind of surprised to see that > it didn't look much like what I would expect. > > What I would expect is: >[...] Thanks a lot for the pointers, that's really helpful! > The extensive patch does a lot of other stuff, like > whacking around the structure of AppendPath vs. MergeAppendPath, and > I'm not sure why we need or want any of that. I might be missing > something, though. That was one of the first question we had with the initial POC. The reason is that this "sorted append" is not using a merge algorithm but just appending partitions in the right order, so it looked like we could either create a new SortedAppend node, or use current Append node and allow it to return sorted data. We chose the 2nd solution, and ended up with a lot of duplicated code (all the code for the ordering), so we tried to have Append and MergeAppend share this code. I'm not at all satisfied with current shape, but I'm still not sure on what node to use for this. Do you have any idea on this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaudwrote: > PFA v3 of the patch, once again rebased and that now use all the newly > available infrastructure. > > I also added a check that a sorted append can't be generated when a > default partition has been declared. I just took a quick look at this and was kind of surprised to see that it didn't look much like what I would expect. What I would expect is: 1. The PartitionDescData grows a new flag indicating whether partitions can be regarded as being in bound order. This is true for range partitions without a default partition, list partitions without a default partition and without interleaved bounds, and maybe other cases we want to worry about later. We set this flag correctly when we build the PartitionDesc and propagate it into the RelOptInfo for the partitioned table when we set up its other partitioning details. 2. generate_mergeappend_paths() gets renamed to generate_sorted_append_paths(). At the top of the function, it checks whether the above flag is set; if not, give up on this optimization. Otherwise, figure out what the set of pathkeys that correspond to the partition key would look like. Call these the partition_order_pathkeys. 3. For each set of possible pathkeys, it checks whether the proposed pathkeys are equal to (or an initial prefix of) the partition_order_pathkeys. 4. If so, instead of calling create_merge_append_path(), it calls create_append_path(). 5. create_append_path() gets the proposed pathkeys via a new List *pathkeys argument and sticks them on the path. And that's it. The extensive patch does a lot of other stuff, like whacking around the structure of AppendPath vs. MergeAppendPath, and I'm not sure why we need or want any of that. I might be missing something, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Thu, Sep 21, 2017 at 11:13 AM, Julien Rouhaudwrote: > On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat > wrote: >> With 9140cf8269b0c4ae002b2748d93979d535891311, we store the >> RelOptInfos of partitions in the RelOptInfo of partitioned table. For >> range partitioned table without default partition, they are arranged >> in the ascending order of partition bounds. This patch may avoid >> MergeAppend if the sort keys match partition keys by creating an >> Append path by picking sorted paths from partition RelOptInfos in >> order. You may use slightly modified version of >> add_paths_to_append_rel(). > > > Yes, I just saw this commit this morning, and this is exactly what I > was missing, thanks for the pointer and the patch! PFA v3 of the patch, once again rebased and that now use all the newly available infrastructure. I also added a check that a sorted append can't be generated when a default partition has been declared. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c1602c59cc..20e63b3204 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -80,6 +80,8 @@ static void show_upper_qual(List *qual, const char *qlabel, ExplainState *es); static void show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es); +static void show_append_keys(AppendState *mstate, List *ancestors, + ExplainState *es); static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors, ExplainState *es); static void show_agg_keys(AggState *astate, List *ancestors, @@ -1602,6 +1604,10 @@ ExplainNode(PlanState *planstate, List *ancestors, show_sort_keys(castNode(SortState, planstate), ancestors, es); show_sort_info(castNode(SortState, planstate), es); break; + case T_Append: + show_append_keys(castNode(AppendState, planstate), + ancestors, es); + break; case T_MergeAppend: show_merge_append_keys(castNode(MergeAppendState, planstate), ancestors, es); @@ -1744,7 +1750,7 @@ ExplainNode(PlanState *planstate, List *ancestors, ancestors, es); break; case T_MergeAppend: - ExplainMemberNodes(((MergeAppend *) plan)->mergeplans, + ExplainMemberNodes(((MergeAppend*) plan)->plan.appendplans, ((MergeAppendState *) planstate)->mergeplans, ancestors, es); break; @@ -1935,6 +1941,22 @@ show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es) ancestors, es); } +/* + * Likewise, for an Append node. + */ +static void +show_append_keys(AppendState *mstate, List *ancestors, + ExplainState *es) +{ + Append *plan = (Append *) mstate->ps.plan; + + show_sort_group_keys((PlanState *) mstate, "Sort Key", +plan->numCols, plan->sortColIdx, +plan->sortOperators, plan->collations, +plan->nullsFirst, +ancestors, es); +} + /* * Likewise, for a MergeAppend node. */ @@ -1942,7 +1964,7 @@ static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors, ExplainState *es) { - MergeAppend *plan = (MergeAppend *) mstate->ps.plan; + Append *plan = (Append *) mstate->ps.plan; show_sort_group_keys((PlanState *) mstate, "Sort Key", plan->numCols, plan->sortColIdx, diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c index 6bf490bd70..601f2547d3 100644 --- a/src/backend/executor/nodeMergeAppend.c +++ b/src/backend/executor/nodeMergeAppend.c @@ -64,6 +64,7 @@ MergeAppendState * ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags) { MergeAppendState *mergestate = makeNode(MergeAppendState); + Append *append = >plan; PlanState **mergeplanstates; int nplans; int i; @@ -76,12 +77,12 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags) * Lock the non-leaf tables in the partition tree
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapatwrote: > With 9140cf8269b0c4ae002b2748d93979d535891311, we store the > RelOptInfos of partitions in the RelOptInfo of partitioned table. For > range partitioned table without default partition, they are arranged > in the ascending order of partition bounds. This patch may avoid > MergeAppend if the sort keys match partition keys by creating an > Append path by picking sorted paths from partition RelOptInfos in > order. You may use slightly modified version of > add_paths_to_append_rel(). Yes, I just saw this commit this morning, and this is exactly what I was missing, thanks for the pointer and the patch! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Thu, Sep 21, 2017 at 3:30 AM, Julien Rouhaudwrote: > On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentraut > wrote: >> On 3/20/17 11:03, Ronan Dunklau wrote: Great idea. This is too late for v10 at this point, but please add it to the next CommitFest so we don't forget about it. >>> I know it is too late, and thought that it was too early to add it to the >>> commitfest properly since so many design decisions should be discussed. >>> Thanks >>> for the feedback, I added it. >> >> This patch no longer applies. Please send an update. > > > Thanks for the reminder, and sorry for very very late answer. PFA > rebased patch on current head. > > I unfortunately didn't have time yet to read carefully the newly added > infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and > following), so this patch is still using its own copy of partitions > list. I hope I can work on it shortly, but I prefer to send the > rebased version now, since it's still a POC with knowns problems and > questions, and I'll more than welcome any guidance with it. > > With 9140cf8269b0c4ae002b2748d93979d535891311, we store the RelOptInfos of partitions in the RelOptInfo of partitioned table. For range partitioned table without default partition, they are arranged in the ascending order of partition bounds. This patch may avoid MergeAppend if the sort keys match partition keys by creating an Append path by picking sorted paths from partition RelOptInfos in order. You may use slightly modified version of add_paths_to_append_rel(). -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentrautwrote: > On 3/20/17 11:03, Ronan Dunklau wrote: >>> Great idea. This is too late for v10 at this point, but please add it >>> to the next CommitFest so we don't forget about it. >> I know it is too late, and thought that it was too early to add it to the >> commitfest properly since so many design decisions should be discussed. >> Thanks >> for the feedback, I added it. > > This patch no longer applies. Please send an update. Thanks for the reminder, and sorry for very very late answer. PFA rebased patch on current head. I unfortunately didn't have time yet to read carefully the newly added infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and following), so this patch is still using its own copy of partitions list. I hope I can work on it shortly, but I prefer to send the rebased version now, since it's still a POC with knowns problems and questions, and I'll more than welcome any guidance with it. diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c1602c59cc..20e63b3204 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -80,6 +80,8 @@ static void show_upper_qual(List *qual, const char *qlabel, ExplainState *es); static void show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es); +static void show_append_keys(AppendState *mstate, List *ancestors, + ExplainState *es); static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors, ExplainState *es); static void show_agg_keys(AggState *astate, List *ancestors, @@ -1602,6 +1604,10 @@ ExplainNode(PlanState *planstate, List *ancestors, show_sort_keys(castNode(SortState, planstate), ancestors, es); show_sort_info(castNode(SortState, planstate), es); break; + case T_Append: + show_append_keys(castNode(AppendState, planstate), + ancestors, es); + break; case T_MergeAppend: show_merge_append_keys(castNode(MergeAppendState, planstate), ancestors, es); @@ -1744,7 +1750,7 @@ ExplainNode(PlanState *planstate, List *ancestors, ancestors, es); break; case T_MergeAppend: - ExplainMemberNodes(((MergeAppend *) plan)->mergeplans, + ExplainMemberNodes(((MergeAppend*) plan)->plan.appendplans, ((MergeAppendState *) planstate)->mergeplans, ancestors, es); break; @@ -1935,6 +1941,22 @@ show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es) ancestors, es); } +/* + * Likewise, for an Append node. + */ +static void +show_append_keys(AppendState *mstate, List *ancestors, + ExplainState *es) +{ + Append *plan = (Append *) mstate->ps.plan; + + show_sort_group_keys((PlanState *) mstate, "Sort Key", +plan->numCols, plan->sortColIdx, +plan->sortOperators, plan->collations, +plan->nullsFirst, +ancestors, es); +} + /* * Likewise, for a MergeAppend node. */ @@ -1942,7 +1964,7 @@ static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors, ExplainState *es) { - MergeAppend *plan = (MergeAppend *) mstate->ps.plan; + Append *plan = (Append *) mstate->ps.plan; show_sort_group_keys((PlanState *) mstate, "Sort Key", plan->numCols, plan->sortColIdx, diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c index 6bf490bd70..601f2547d3 100644 --- a/src/backend/executor/nodeMergeAppend.c +++ b/src/backend/executor/nodeMergeAppend.c @@ -64,6 +64,7 @@ MergeAppendState * ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags) { MergeAppendState *mergestate = makeNode(MergeAppendState); + Append *append = >plan; PlanState **mergeplanstates; int nplans; int i; @@ -76,12 +77,12 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags) * Lock the non-leaf tables in the
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On 3/20/17 11:03, Ronan Dunklau wrote: >> Great idea. This is too late for v10 at this point, but please add it >> to the next CommitFest so we don't forget about it. > I know it is too late, and thought that it was too early to add it to the > commitfest properly since so many design decisions should be discussed. > Thanks > for the feedback, I added it. This patch no longer applies. Please send an update. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On vendredi 24 mars 2017 08:14:03 CEST Ashutosh Bapat wrote: > On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklauwrote: > > On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote: > >> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau > > > > wrote: > >> > With range partitioning, we guarantee that each partition contains non- > >> > overlapping values. Since we know the range allowed for each partition, > >> > it > >> > is possible to sort them according to the partition key (as is done > >> > already for looking up partitions). > >> > > >> > Thus, we ca generate sorted Append plans instead of MergeAppend when > >> > sorting occurs on the partition key. > >> > >> Great idea. This is too late for v10 at this point, but please add it > >> to the next CommitFest so we don't forget about it. > > > > I know it is too late, and thought that it was too early to add it to the > > commitfest properly since so many design decisions should be discussed. > > Thanks for the feedback, I added it. > > Thanks for working on this. I was also thinking about the same. > > I will try to get back to review/work with this for v11. Mean time, I > am working on partition-wise joins [1]. In those patches, I have added > a structure called PartitionScheme, which represents how a relation is > partitioned. For base relations it's mostly copy of PartitionDesc and > PartitionKey, but then it bubbles up across join, with each > partitioned join getting relevant partitioning scheme. If you could > base your design such that is uses PartitionScheme, it could be used > for joins and probably when Jeevan's patch for partition-wise > aggregate [2] comes along, it can be used with grouping. > > [1] > https://www.postgresql.org/message-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQ > vQTwzQOGczq_sw%40mail.gmail.com [2] > http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html Thank you. I should probably wait until your patch is finalised before spending too much time on it, and I could probably also leverage the incremental sort patch if there is progress on it. -- Ronan Dunklau http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklauwrote: > On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote: >> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau > wrote: >> > With range partitioning, we guarantee that each partition contains non- >> > overlapping values. Since we know the range allowed for each partition, it >> > is possible to sort them according to the partition key (as is done >> > already for looking up partitions). >> > >> > Thus, we ca generate sorted Append plans instead of MergeAppend when >> > sorting occurs on the partition key. >> >> Great idea. This is too late for v10 at this point, but please add it >> to the next CommitFest so we don't forget about it. > > I know it is too late, and thought that it was too early to add it to the > commitfest properly since so many design decisions should be discussed. Thanks > for the feedback, I added it. Thanks for working on this. I was also thinking about the same. I will try to get back to review/work with this for v11. Mean time, I am working on partition-wise joins [1]. In those patches, I have added a structure called PartitionScheme, which represents how a relation is partitioned. For base relations it's mostly copy of PartitionDesc and PartitionKey, but then it bubbles up across join, with each partitioned join getting relevant partitioning scheme. If you could base your design such that is uses PartitionScheme, it could be used for joins and probably when Jeevan's patch for partition-wise aggregate [2] comes along, it can be used with grouping. [1] https://www.postgresql.org/message-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQvQTwzQOGczq_sw%40mail.gmail.com [2] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote: > On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklauwrote: > > With range partitioning, we guarantee that each partition contains non- > > overlapping values. Since we know the range allowed for each partition, it > > is possible to sort them according to the partition key (as is done > > already for looking up partitions). > > > > Thus, we ca generate sorted Append plans instead of MergeAppend when > > sorting occurs on the partition key. > > Great idea. This is too late for v10 at this point, but please add it > to the next CommitFest so we don't forget about it. I know it is too late, and thought that it was too early to add it to the commitfest properly since so many design decisions should be discussed. Thanks for the feedback, I added it. -- Ronan Dunklau http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering
On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklauwrote: > With range partitioning, we guarantee that each partition contains non- > overlapping values. Since we know the range allowed for each partition, it is > possible to sort them according to the partition key (as is done already for > looking up partitions). > > Thus, we ca generate sorted Append plans instead of MergeAppend when sorting > occurs on the partition key. Great idea. This is too late for v10 at this point, but please add it to the next CommitFest so we don't forget about it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers