Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/10/02 16:45), Michael Paquier wrote: On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote: I tried to close it as being committed, but couldn't do so, because I can't find Fujita-san's name in the list of committers in the CF app's drop down box that lists all committers. Indeed, Fujita-san has been added to the list. And I switched this CF entry at the same time. Thank you guys! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote: > I tried to close it as being committed, but couldn't do so, because I > can't find Fujita-san's name in the list of committers in the CF app's > drop down box that lists all committers. Indeed, Fujita-san has been added to the list. And I switched this CF entry at the same time. -- Michael signature.asc Description: PGP signature
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On 2018/08/31 21:40, Etsuro Fujita wrote: > (2018/08/31 21:30), Jonathan S. Katz wrote: >> Thank you for taking care of that and for committing the patch. I have >> now closed this issues on the open items list. > > Thanks! I noticed that the CF entry for this was not closed. As of this morning, it's been moved to the next 2018-11 CF: https://commitfest.postgresql.org/20/1554/ I tried to close it as being committed, but couldn't do so, because I can't find Fujita-san's name in the list of committers in the CF app's drop down box that lists all committers. Thanks, Amit
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/31 21:30), Jonathan S. Katz wrote: On 8/31/18 7:54 AM, Etsuro Fujita wrote: (2018/08/30 20:25), Etsuro Fujita wrote: (2018/08/29 18:40), Etsuro Fujita wrote: (2018/08/29 0:21), Jonathan S. Katz wrote: On Aug 24, 2018, at 8:38 AM, Etsuro Fujita wrote: (2018/08/24 11:47), Michael Paquier wrote: On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected them but still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue in my terminal environment, so I'm trying to resolve that. Are there any updates on getting this patch committed? That investigation has shown that the cause is my company firewall, not my terminal environment; that firewall has to be configured to allow me to access to that repository. So, I'm asking my company about that. I got the approval from my boss today, so I think that I can have that access soon. I fixed typos in the commit message, which Alvaro pointed out off-list, and revised the message a little bit. Also, I reread the patch and noticed that the latest version includes now-unnecessary regression test cases; those were added to the previous version (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() work for cases where whole-row Vars are involved, because I modified those functions, but the latest version doesn't touch those functions anymore, so those test cases seems unnecessary. So I removed those (no other changes), and committed the patch. Thank you for taking care of that and for committing the patch. I have now closed this issues on the open items list. Thanks! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On 8/31/18 7:54 AM, Etsuro Fujita wrote: > (2018/08/30 20:25), Etsuro Fujita wrote: >> (2018/08/29 18:40), Etsuro Fujita wrote: >>> (2018/08/29 0:21), Jonathan S. Katz wrote: > On Aug 24, 2018, at 8:38 AM, Etsuro > Fujita wrote: > (2018/08/24 11:47), Michael Paquier wrote: >> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>> I tried this today, but doing git behind the corporate firewall >>> doesn't >>> work. I don't know the clear cause of that, so I'll investigate >>> that >>> tomorrow. >> >> You may be able to tweak that by using https as origin point or >> proper >> git proxy settings? > > Yeah, my proxy settings were not correct. With the help of my > colleagues Horiguchi-san and Yamada-san, I corrected them but still > can't clone the master repository. Running git with GIT_CURL_VERBOSE > shows that there is another issue in my terminal environment, so I'm > trying to resolve that. Are there any updates on getting this patch committed? >>> >>> That investigation has shown that the cause is my company firewall, not >>> my terminal environment; that firewall has to be configured to allow me >>> to access to that repository. So, I'm asking my company about that. >> >> I got the approval from my boss today, so I think that I can have that >> access soon. > > I fixed typos in the commit message, which Alvaro pointed out > off-list, and revised the message a little bit. Also, I reread the > patch and noticed that the latest version includes now-unnecessary > regression test cases; those were added to the previous version > (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that > apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() > work for cases where whole-row Vars are involved, because I modified > those functions, but the latest version doesn't touch those functions > anymore, so those test cases seems unnecessary. So I removed those > (no other changes), and committed the patch. Thank you for taking care of that and for committing the patch. I have now closed this issues on the open items list. Jonathan signature.asc Description: OpenPGP digital signature
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/30 20:25), Etsuro Fujita wrote: (2018/08/29 18:40), Etsuro Fujita wrote: (2018/08/29 0:21), Jonathan S. Katz wrote: On Aug 24, 2018, at 8:38 AM, Etsuro Fujita wrote: (2018/08/24 11:47), Michael Paquier wrote: On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected them but still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue in my terminal environment, so I'm trying to resolve that. Are there any updates on getting this patch committed? That investigation has shown that the cause is my company firewall, not my terminal environment; that firewall has to be configured to allow me to access to that repository. So, I'm asking my company about that. I got the approval from my boss today, so I think that I can have that access soon. I fixed typos in the commit message, which Alvaro pointed out off-list, and revised the message a little bit. Also, I reread the patch and noticed that the latest version includes now-unnecessary regression test cases; those were added to the previous version (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() work for cases where whole-row Vars are involved, because I modified those functions, but the latest version doesn't touch those functions anymore, so those test cases seems unnecessary. So I removed those (no other changes), and committed the patch. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5B683F60.2070806%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/29 18:40), Etsuro Fujita wrote: (2018/08/29 0:21), Jonathan S. Katz wrote: On Aug 24, 2018, at 8:38 AM, Etsuro Fujita wrote: (2018/08/24 11:47), Michael Paquier wrote: On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected them but still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue in my terminal environment, so I'm trying to resolve that. Are there any updates on getting this patch committed? That investigation has shown that the cause is my company firewall, not my terminal environment; that firewall has to be configured to allow me to access to that repository. So, I'm asking my company about that. I got the approval from my boss today, so I think that I can have that access soon. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/29 0:21), Jonathan S. Katz wrote: On Aug 24, 2018, at 8:38 AM, Etsuro Fujita wrote: (2018/08/24 11:47), Michael Paquier wrote: On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected them but still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue in my terminal environment, so I'm trying to resolve that. Are there any updates on getting this patch committed? That investigation has shown that the cause is my company firewall, not my terminal environment; that firewall has to be configured to allow me to access to that repository. So, I'm asking my company about that. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
> On Aug 24, 2018, at 8:38 AM, Etsuro Fujita > wrote: > > (2018/08/24 11:47), Michael Paquier wrote: >> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: >>> I tried this today, but doing git behind the corporate firewall doesn't >>> work. I don't know the clear cause of that, so I'll investigate that >>> tomorrow. >> >> You may be able to tweak that by using https as origin point or proper >> git proxy settings? > > Yeah, my proxy settings were not correct. With the help of my colleagues > Horiguchi-san and Yamada-san, I corrected them but still can't clone the > master repository. Running git with GIT_CURL_VERBOSE shows that there is > another issue in my terminal environment, so I'm trying to resolve that. Are there any updates on getting this patch committed? Thanks, Jonathan signature.asc Description: Message signed with OpenPGP
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/24 11:47), Michael Paquier wrote: On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? Yeah, my proxy settings were not correct. With the help of my colleagues Horiguchi-san and Yamada-san, I corrected them but still can't clone the master repository. Running git with GIT_CURL_VERBOSE shows that there is another issue in my terminal environment, so I'm trying to resolve that. Thanks for the advice! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote: > I tried this today, but doing git behind the corporate firewall doesn't > work. I don't know the clear cause of that, so I'll investigate that > tomorrow. You may be able to tweak that by using https as origin point or proper git proxy settings? -- Michael signature.asc Description: PGP signature
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/22 20:08), Etsuro Fujita wrote: (2018/08/16 12:00), Etsuro Fujita wrote: (2018/08/15 23:40), Robert Haas wrote: Given the comments from the RMT, and also on general principle, it seems like we really need to get on with committing something here. It's my understanding you plan to do that, since it's your patch. When? I plan to do that late next week as I'll go on leave until next Tuesday. I added the commit message. Please find attached an updated version of the patch. Does that make sense? If there are not objections, I'll push this tomorrow. I tried this today, but doing git behind the corporate firewall doesn't work. I don't know the clear cause of that, so I'll investigate that tomorrow. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/16 12:00), Etsuro Fujita wrote: (2018/08/15 23:40), Robert Haas wrote: Given the comments from the RMT, and also on general principle, it seems like we really need to get on with committing something here. It's my understanding you plan to do that, since it's your patch. When? I plan to do that late next week as I'll go on leave until next Tuesday. I added the commit message. Please find attached an updated version of the patch. Does that make sense? If there are not objections, I'll push this tomorrow. Best regards, Etsuro Fujita >From e8e5fa32984a51e73831529b055bddbed03feaa1 Mon Sep 17 00:00:00 2001 From: Etsuro Fujita Date: Wed, 22 Aug 2018 19:59:01 +0900 Subject: [PATCH] Disable support for partitionwise joins in problematic cases. Commit f49842d1ee31b976c681322f76025d7732e860f3, which added support for partitionwise joins, built the child's tlist by applying adjust_appendrel_attrs() to the parent's. So in the case where the parent tlist included a whole-row Var for the parent, the child tlist contained a ConverRowtypeExpr. That commit added code to handle that to the planner, but some code paths still made the old assumption that a scan or join rel's would only include Vars and PlaceHolderVars, causing the following errors: * When creating an explicit sort node for an input path for building a mergejoin plan for a child join, prepare_sort_from_pathkeys() throws the 'could not find pathkey item to sort' error. * When deparsing a relation participating a pushed down foreign child join as a subquery in contrib/postgres_fdw, get_relation_column_alias_ids() throws the 'unexpected expression in subquery output' error. * When performing set_plan_references() on a local join plan created by contrib/postgres_fdw for EvalPlanQual support, fix_join_expr() throws the 'variable not found in subplan target lists' error. To fix these, two approaches have been proposed: one by Ashutosh Bapat and one by me. While the former keeps building the child's tlist with a ConvertRowtypeExpr, the latter builds it with a whole-row Var for the child and tries to fix it up later. But both approaches need more work, so refuse to generate partitionwise join paths when whole-row Vars are involved, instead. We don't need to handle ConvertRowtypeExprs in the child's tlists for now, so this commit also removes the changes to the planner, such as setrefs.c. Previously, partitionwise join computed attr_needed data for each child separately, and built the child join's tlists using that data, which also required an extra step for adding PlaceHolderVars to the child join's tlists, but it would be more efficient to build the child join's tlist by applying adjust_appendrel_attrs() to the parent join's. So this commit also changes it that way, and simplifies build_joinrel_tlist() and placeholder.c as well as part of set_append_rel_size() to basically what they were before partitionwise join went in. Back-patch to PG11 where partitionwise join was introduced. Report by Rajkumar Raghuwanshi. Analysis by Ashutosh Bapat, who also provided some of regression tests. Patch by me, reviewed by Robert Haas. Discussion: https://postgr.es/m/cakcux6ktu-8teflwtquuzbyfaza83vuzurd7c1yhc-yewyy...@mail.gmail.com --- contrib/postgres_fdw/expected/postgres_fdw.out| 71 ++--- contrib/postgres_fdw/sql/postgres_fdw.sql | 14 ++- src/backend/nodes/outfuncs.c |1 + src/backend/optimizer/path/allpaths.c | 78 + src/backend/optimizer/path/joinrels.c |7 + src/backend/optimizer/plan/setrefs.c | 58 +- src/backend/optimizer/util/placeholder.c | 58 -- src/backend/optimizer/util/relnode.c | 125 +++-- src/include/nodes/relation.h |8 +- src/test/regress/expected/partition_aggregate.out | 84 ++ src/test/regress/expected/partition_join.out | 93 +--- src/test/regress/sql/partition_aggregate.sql | 10 ++ src/test/regress/sql/partition_join.sql | 14 ++- 13 files changed, 365 insertions(+), 256 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index d912bd9..21a2ef5 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8337,8 +8337,9 @@ ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; -CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) +CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); +ALTER
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/15 23:40), Robert Haas wrote: Given the comments from the RMT, and also on general principle, it seems like we really need to get on with committing something here. It's my understanding you plan to do that, since it's your patch. When? I plan to do that late next week as I'll go on leave until next Tuesday. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Aug 15, 2018 at 7:35 AM, Etsuro Fujita wrote: > Thanks for the comments, Robert! Given the comments from the RMT, and also on general principle, it seems like we really need to get on with committing something here. It's my understanding you plan to do that, since it's your patch. When? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/15 13:04), Amit Langote wrote: On 2018/08/15 12:25, Etsuro Fujita wrote: (2018/08/15 0:51), Robert Haas wrote: On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita wrote: One thing I noticed might be an improvement is to skip build_joinrel_partition_info if the given joinrel will be to have consider_partitionwise_join=false; in the previous patch, that function created the joinrel's partition info such as part_scheme and part_rels if the joinrel is considered as partitioned, independently of the flag consider_partitionwise_join for it, but if that flag is false, we don't generate PWJ paths for the joinrel, so we would not need to create that partition info at all. This would not only avoid unnecessary processing in that function, but also make unnecessary the changes I made to try_partitionwise_join, generate_partitionwise_join_paths, apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I updated the patch that way. Please find attached an updated version of the patch. I guess the question is whether there are (or might be in the future) other dependencies on part_scheme. For example, it looks like partition pruning uses it. I'm not sure whether partition pruning supports a plan like: Append -> Nested Loop -> Seq Scan on p1 -> Index Scan on q1 I'm not sure that either, but if a join relation doesn't have part_scheme set, it means that that relation is considered as non-partitioned, as in the case when enable_partitionwise_join is off, so there would be no PWJ paths generated for it, to begin with. So in that case, ISTM that we don't need to worry about that at least for partition pruning. Fwiw, partition pruning works only for base rels, which applies to both planning-time pruning (pruning is performed only during base rel size estimation) and run-time pruning (we'll add pruning info to the Append plan only if the source AppendPath's parent rel is a base rel). Thanks for that, Amit! I looked into the question for the join or upper rel case, but couldn't find any places in the PG11 code where we assume that a join or upper rel has non-NULL part_scheme, as described below: * Both try_partitionwise_join() and generate_partitionwise_join_paths() check whether a join rel to be considered has non-NULL part_scheme, through the IS_PARTITIONED_REL macro: #define IS_PARTITIONED_REL(rel) \ ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \ (rel)->part_rels && !(IS_DUMMY_REL(rel))) If IS_PARTITIONED_REL, the former adds paths to child-joins, and the latter builds PWJ paths; but both don't assume non-NULL part_scheme. * apply_scanjoin_target_to_paths() checks whether the topmost scan/join rel has non-NULL part_scheme, and if so, it recursively adjusts all partitions; it doesn't assume non-NULL part_scheme. * create_ordinary_grouping_paths() also checks whether the topmost scan/join rel adjusted by apply_scanjoin_target_to_paths() has non-NULL part_scheme, and if so, it considers PWA paths; it doesn't assume non-NULL part_scheme either. * add_paths_to_append_rel(), which is called from each of the above (ie, generate_partitionwise_join_paths(), apply_scanjoin_target_to_paths(), and create_partitionwise_grouping_paths() in create_ordinary_grouping_paths() for creating PWJ paths, adjusting PWJ paths, and creating PWA paths respectively) also checks whether a rel to be considered has non-NULL part_scheme, but never assumes that. And actually, if the rel's part_scheme is NULL, add_paths_to_append_rel() wouldn't be called for the rel, because of the reason described above. Thanks for the comments, Robert! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On 2018/08/15 12:25, Etsuro Fujita wrote: > (2018/08/15 0:51), Robert Haas wrote: >> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita >> wrote: >>> One thing I noticed might be an improvement is to skip >>> build_joinrel_partition_info if the given joinrel will be to have >>> consider_partitionwise_join=false; in the previous patch, that function >>> created the joinrel's partition info such as part_scheme and part_rels if >>> the joinrel is considered as partitioned, independently of the flag >>> consider_partitionwise_join for it, but if that flag is false, we don't >>> generate PWJ paths for the joinrel, so we would not need to create that >>> partition info at all. This would not only avoid unnecessary >>> processing in >>> that function, but also make unnecessary the changes I made to >>> try_partitionwise_join, generate_partitionwise_join_paths, >>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I >>> updated the patch that way. Please find attached an updated version of >>> the >>> patch. >> >> I guess the question is whether there are (or might be in the future) >> other dependencies on part_scheme. For example, it looks like >> partition pruning uses it. I'm not sure whether partition pruning >> supports a plan like: >> >> Append >> -> Nested Loop >> -> Seq Scan on p1 >> -> Index Scan on q1 >> > > I'm not sure that either, but if a join relation doesn't have part_scheme > set, it means that that relation is considered as non-partitioned, as in > the case when enable_partitionwise_join is off, so there would be no PWJ > paths generated for it, to begin with. So in that case, ISTM that we > don't need to worry about that at least for partition pruning. Fwiw, partition pruning works only for base rels, which applies to both planning-time pruning (pruning is performed only during base rel size estimation) and run-time pruning (we'll add pruning info to the Append plan only if the source AppendPath's parent rel is a base rel). Thanks, Amit
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/15 0:51), Robert Haas wrote: On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita wrote: One thing I noticed might be an improvement is to skip build_joinrel_partition_info if the given joinrel will be to have consider_partitionwise_join=false; in the previous patch, that function created the joinrel's partition info such as part_scheme and part_rels if the joinrel is considered as partitioned, independently of the flag consider_partitionwise_join for it, but if that flag is false, we don't generate PWJ paths for the joinrel, so we would not need to create that partition info at all. This would not only avoid unnecessary processing in that function, but also make unnecessary the changes I made to try_partitionwise_join, generate_partitionwise_join_paths, apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I updated the patch that way. Please find attached an updated version of the patch. I guess the question is whether there are (or might be in the future) other dependencies on part_scheme. For example, it looks like partition pruning uses it. I'm not sure whether partition pruning supports a plan like: Append -> Nested Loop -> Seq Scan on p1 -> Index Scan on q1 I'm not sure that either, but if a join relation doesn't have part_scheme set, it means that that relation is considered as non-partitioned, as in the case when enable_partitionwise_join is off, so there would be no PWJ paths generated for it, to begin with. So in that case, ISTM that we don't need to worry about that at least for partition pruning. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
> On Aug 14, 2018, at 11:51 AM, Robert Haas wrote: > > On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita > wrote: >> One thing I noticed might be an improvement is to skip >> build_joinrel_partition_info if the given joinrel will be to have >> consider_partitionwise_join=false; in the previous patch, that function >> created the joinrel's partition info such as part_scheme and part_rels if >> the joinrel is considered as partitioned, independently of the flag >> consider_partitionwise_join for it, but if that flag is false, we don't >> generate PWJ paths for the joinrel, so we would not need to create that >> partition info at all. This would not only avoid unnecessary processing in >> that function, but also make unnecessary the changes I made to >> try_partitionwise_join, generate_partitionwise_join_paths, >> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I >> updated the patch that way. Please find attached an updated version of the >> patch. > > I guess the question is whether there are (or might be in the future) > other dependencies on part_scheme. For example, it looks like > partition pruning uses it. I'm not sure whether partition pruning > supports a plan like: > > Append > -> Nested Loop > -> Seq Scan on p1 > -> Index Scan on q1 > > > If it doesn't, that's just an implementation restriction; somebody > might want to fix things so it works, if it doesn't already. While we (the RMT) are happy to see there has been progress on this thread, 11 Beta 3 was released containing this problem and the time to adequately test this feature prior to GA release is rapidly dwindling. The RMT would like to see this prioritized and resolved. At this point we have considered the option of having partitionwise joins completely disabled in PostgreSQL 11. This is preferably not the path we want to choose, but this option is now on the table and we will issue an ultimatum soon if we do not see further progress. In the previous discussion, the generally accepted solution for PostgreSQL 11 seemed to be to disable the problematic case and any further work would be for PostgreSQL 12 only. If something is different, please indicate why ASAP and work to resolve. The RMT will also conduct some additional technical analysis as well. Sincerely, Alvaro, Andres, Jonathan signature.asc Description: Message signed with OpenPGP
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita wrote: > One thing I noticed might be an improvement is to skip > build_joinrel_partition_info if the given joinrel will be to have > consider_partitionwise_join=false; in the previous patch, that function > created the joinrel's partition info such as part_scheme and part_rels if > the joinrel is considered as partitioned, independently of the flag > consider_partitionwise_join for it, but if that flag is false, we don't > generate PWJ paths for the joinrel, so we would not need to create that > partition info at all. This would not only avoid unnecessary processing in > that function, but also make unnecessary the changes I made to > try_partitionwise_join, generate_partitionwise_join_paths, > apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I > updated the patch that way. Please find attached an updated version of the > patch. I guess the question is whether there are (or might be in the future) other dependencies on part_scheme. For example, it looks like partition pruning uses it. I'm not sure whether partition pruning supports a plan like: Append -> Nested Loop -> Seq Scan on p1 -> Index Scan on q1 If it doesn't, that's just an implementation restriction; somebody might want to fix things so it works, if it doesn't already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/13 11:57), Robert Haas wrote: On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita wrote: In the above I used the test whether the relation's reloptkind is RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a multi-level partitioned table. So I fixed that and added regression test cases for that. I also revised comments a bit. Attached is an updated version of the patch. + /* If so, consider partitionwise joins for that join. */ + if (IS_PARTITIONED_REL(joinrel)) + joinrel->consider_partitionwise_join = true; Maybe this should assert that the inner and outer rels have consider_partitionwise_join set. There is an Assert quite a bit earlier in the function that the parent join have it set, but I think it might make sense to check the children have it set whenever we set the flag. Agreed. Done. One thing I noticed might be an improvement is to skip build_joinrel_partition_info if the given joinrel will be to have consider_partitionwise_join=false; in the previous patch, that function created the joinrel's partition info such as part_scheme and part_rels if the joinrel is considered as partitioned, independently of the flag consider_partitionwise_join for it, but if that flag is false, we don't generate PWJ paths for the joinrel, so we would not need to create that partition info at all. This would not only avoid unnecessary processing in that function, but also make unnecessary the changes I made to try_partitionwise_join, generate_partitionwise_join_paths, apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths. So I updated the patch that way. Please find attached an updated version of the patch. Thanks for the review! Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 8337,8344 ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; --- 8337,8345 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); + ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; *** *** 8389,8416 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) 8 | | (5 rows) ! -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !QUERY PLAN ! - Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) !-> Append ! -> Foreign Scan !Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) ! -> Foreign Scan !Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) ! (7 rows) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !t1 | t2 + (0,0,) | (0,0,) (150,150,0003) | (150,150,0003) (250,250,0005) | (250,250,0005) (400,400,0008) | (400,400,0008) ! (4 rows) -- join with lateral reference EXPLAIN (COSTS OFF) --- 8390,8431 8 | | (5 rows) ! -- with whole-row reference; partitionwise join does not apply EXPLAIN (COSTS OFF) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !QUERY PLAN ! Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) !-> Hash Full Join ! Hash Cond: (t1.a = t2.b) ! -> Append !-> Foreign Scan on
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita wrote: > In the above I used the test whether the relation's reloptkind is > RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a > multi-level partitioned table. So I fixed that and added regression test > cases for that. I also revised comments a bit. Attached is an updated > version of the patch. + /* If so, consider partitionwise joins for that join. */ + if (IS_PARTITIONED_REL(joinrel)) + joinrel->consider_partitionwise_join = true; Maybe this should assert that the inner and outer rels have consider_partitionwise_join set. There is an Assert quite a bit earlier in the function that the parent join have it set, but I think it might make sense to check the children have it set whenever we set the flag. Aside from that I don't really have any suggestions on this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/03 22:28), Etsuro Fujita wrote: (2018/08/03 22:18), Etsuro Fujita wrote: Here is a patch for refusing to generate PWJ paths when WRVs are involved: * We no longer need to handle WRVs, so I've simplified build_joinrel_tlist() and setrefs.c to what they were before partition-wise join went in, as in the previous patch. * attr_needed data for each child is used for building child-joins' tlists, but I think we can build those by applying adjust_appendrel_attrs to the parent's tlists, without attr_needed. So, I've also removed that as in the previous patch. One thing to add: as for the latter, we don't need the changes to placeholder.c either, so I've also simplified that to what they were before partition-wise join went in. *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *** *** 3960,3965 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, --- 3960,3967 */ if (extra->patype != PARTITIONWISE_AGGREGATE_NONE && input_rel->part_scheme && input_rel->part_rels && + (input_rel->reloptkind == RELOPT_BASEREL || +input_rel->consider_partitionwise_join) && !IS_DUMMY_REL(input_rel)) { /* *** *** 6913,6919 apply_scanjoin_target_to_paths(PlannerInfo *root, * projection-capable, that might save a separate Result node, and it also * is important for partitionwise aggregate. */ ! if (rel->part_scheme && rel->part_rels) { int partition_idx; List *live_children = NIL; --- 6915,6923 * projection-capable, that might save a separate Result node, and it also * is important for partitionwise aggregate. */ ! if (rel->part_scheme && rel->part_rels && ! (rel->reloptkind == RELOPT_BASEREL || !rel->consider_partitionwise_join)) { int partition_idx; List *live_children = NIL; In the above I used the test whether the relation's reloptkind is RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a multi-level partitioned table. So I fixed that and added regression test cases for that. I also revised comments a bit. Attached is an updated version of the patch. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 8337,8344 ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; --- 8337,8345 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); + ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; *** *** 8389,8416 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) 8 | | (5 rows) ! -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !QUERY PLAN ! - Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) !-> Append ! -> Foreign Scan !Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) ! -> Foreign Scan !Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) ! (7 rows) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !t1 | t2 + (0,0,) | (0,0,) (150,150,0003) | (150,150,0003) (250,250,0005) | (250,250,0005) (400,400,0008) | (400,400,0008) ! (4 rows) -- join with lateral reference EXPLAIN (COSTS OFF) --- 8390,8431 8 | | (5 rows) ! -- with whole-row
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, Aug 3, 2018 at 10:36 AM, Tom Lane wrote: > Anyway, what I'm basically suggesting is that we just disable support for > PWJ in the problematic cases in v11. As long as PWJ isn't even on by > default, that's not much of a loss. Obviously we'll want to fix it in the > future, but the hour grows late for v11, and I think either of these > patches would need quite a bit more work to be committable. That's not my first choice, but I'm OK with it. >> There are definitely some things not to like about this approach. In >> particular, I definitely agree that treating a converted whole-row >> reference specially is not very nice. It feels like it might be >> substantially cleaner to be able to represent this idea as a single >> node rather than a combination of ConvertRowtypeExpr and var with >> varattno = 0. Perhaps in the future we ought to consider either (a) >> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) >> inventing a new WholeRowExpr node that stores two RTIs, one for the >> relation that's generating the whole-row reference and the other for >> the relation that is controlling the column ordering of the result or >> (c) allowing a Var to represent a whole-row expression that has to >> produce its outputs according to the ordering of some other RTE. But >> I don't think it's wise or necessary to whack that around just to fix >> this bug; it is refactoring or improvement work best left to a future >> release. > > I agree with all of that except your conclusion that it's unnecessary > to do it to fix this bug. I find that entirely unsupported and over- > optimistic, especially in view of the number of iterations Ashutosh > went through trying to fix the fallout from not making a clear > distinction. I don't think that the question of how a converted whole-row expr can really make a difference between being able to fix this bug and not being able to fix this bug. It's just a representational choice. If we decide that a converted whole-row expr is represented as ConvertRowTypeExpr on top of a Var, then we have to check for that. If we revise the representation to use a Var node directly, or to use a new WholeRowVar node, then we'll just need to check for those things instead. I think that the shape of the fix itself does not change. Moreover, it's pretty much a 1:1 mapping. It's not like, say, replacing plans with paths in the whole upper half of the planner, where the representation is different enough that things that would have been very difficult to manage in the old representation become relatively simple in the new one. It's basically just a question of how you spell it. Writing a test for ConvertRowTypeExpr atop Var and using it a bunch of places is not beautiful, and it's certainly going to be marginally slower than a direct test for WholeRowVar or some other representation that bundles the whole thing into one node, but I think it should work just fine, and the representation can be revised later in a separate patch once we agree on an approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Robert Haas writes: > On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane wrote: >> As far as I can see from the example that started this thread, >> postgres_fdw injects WRVs into a PWJ whether or not the query involves >> FOR UPDATE; that's why this bug is reproducible in a query without FOR >> UPDATE. But we shouldn't need any EPQ support in that case. > I might be missing something, but the original reproducer involves a > FOR UPDATE clause, and the expected regression test output in > postgres_fdw.out appears to show a whole-row var in the foreign scan's > target list only in those examples where a locking clause is present. Oh, sigh, never mind that --- I was thinking that only the first of the two example queries involved FOR UPDATE, but they both do. So no mystery there after all. Anyway, what I'm basically suggesting is that we just disable support for PWJ in the problematic cases in v11. As long as PWJ isn't even on by default, that's not much of a loss. Obviously we'll want to fix it in the future, but the hour grows late for v11, and I think either of these patches would need quite a bit more work to be committable. BTW, I forgot to respond to this: >> I do not trust Ashutosh's patch because of the point you noted that it >> will kick in on ConvertRowtypeExprs that are not related to partitioning. > I don't remember making that point -- can you clarify? I'm looking at https://www.postgresql.org/message-id/CA%2BTgmoZSaKq-fYALn5jf6c_X3%3D%3DRb2s8eqLDwGpV%3DLNNhTXYwg%40mail.gmail.com wherein you said > There are definitely some things not to like about this approach. In > particular, I definitely agree that treating a converted whole-row > reference specially is not very nice. It feels like it might be > substantially cleaner to be able to represent this idea as a single > node rather than a combination of ConvertRowtypeExpr and var with > varattno = 0. Perhaps in the future we ought to consider either (a) > trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) > inventing a new WholeRowExpr node that stores two RTIs, one for the > relation that's generating the whole-row reference and the other for > the relation that is controlling the column ordering of the result or > (c) allowing a Var to represent a whole-row expression that has to > produce its outputs according to the ordering of some other RTE. But > I don't think it's wise or necessary to whack that around just to fix > this bug; it is refactoring or improvement work best left to a future > release. I agree with all of that except your conclusion that it's unnecessary to do it to fix this bug. I find that entirely unsupported and over- optimistic, especially in view of the number of iterations Ashutosh went through trying to fix the fallout from not making a clear distinction. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane wrote: >>> Now, that's a bit of a problem for postgres_fdw, because it seems to >>> insist on injecting WRVs even when the query text does not require any. >>> Why is that, and can't we get rid of it? > >> I don't quite know what you mean here -- postgres_fdw does use >> whole-row vars for EPQ handling, which may be what you're thinking >> about. > > As far as I can see from the example that started this thread, > postgres_fdw injects WRVs into a PWJ whether or not the query involves > FOR UPDATE; that's why this bug is reproducible in a query without FOR > UPDATE. But we shouldn't need any EPQ support in that case. I might be missing something, but the original reproducer involves a FOR UPDATE clause, and the expected regression test output in postgres_fdw.out appears to show a whole-row var in the foreign scan's target list only in those examples where a locking clause is present. >> Honestly, I'm pretty impressed that we have added not one but two >> members to the RelOptKind enum without as little collateral damage as >> there has been. > > Color me a bit more skeptical about the bug density in that, given > that enable_partitionwise_join is off by default; that means you're > not getting a lot of testing. You're hard to please. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Robert Haas writes: > On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane wrote: >> Now, that's a bit of a problem for postgres_fdw, because it seems to >> insist on injecting WRVs even when the query text does not require any. >> Why is that, and can't we get rid of it? > I don't quite know what you mean here -- postgres_fdw does use > whole-row vars for EPQ handling, which may be what you're thinking > about. As far as I can see from the example that started this thread, postgres_fdw injects WRVs into a PWJ whether or not the query involves FOR UPDATE; that's why this bug is reproducible in a query without FOR UPDATE. But we shouldn't need any EPQ support in that case. > Honestly, I'm pretty impressed that we have added not one but two > members to the RelOptKind enum without as little collateral damage as > there has been. Color me a bit more skeptical about the bug density in that, given that enable_partitionwise_join is off by default; that means you're not getting a lot of testing. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane wrote: > I do not trust Ashutosh's patch because of the point you noted that it > will kick in on ConvertRowtypeExprs that are not related to partitioning. I don't remember making that point -- can you clarify? > It's also got a rather fundamental conceptual (or at least documentation) > error in that it says it's making pull_var_clause do certain things to > all ConvertRowtypeExprs, but that's not what the code actually does. > I think the tension around that has a lot to do with the fact that the > patch went through so many revisions, and I don't have any faith that > it's right even yet. As you mentioned upthread, this might be insoluble > without some representational change for converted whole-row Vars. Insoluble is a strong word > What I'm thinking might be a more appropriate thing, at least for > getting v11 out the door, is to refuse to generate partitionwise > joins when any whole-row vars are involved. (Perhaps that's not > enough to dodge all the problems, though?) It's enough to dodge the problem being discussed on this thread. > Now, that's a bit of a problem for postgres_fdw, because it seems to > insist on injecting WRVs even when the query text does not require any. > Why is that, and can't we get rid of it? It's horrid for performance > even without any of these other considerations. But if we fail to get > rid of that in time for v11, it would mean that postgres_fdw can't > participate in PWJ, which isn't great but I think we could live with it > for now. I don't quite know what you mean here -- postgres_fdw does use whole-row vars for EPQ handling, which may be what you're thinking about. > BTW, what exactly do we think the production status of PWJ is, anyway? > I notice that upthread it was stated that enable_partitionwise_join > defaults to off, as indeed it still does, and we'd turn it on later > when we'd gotten rid of some memory-hogging problems. If that hasn't > happened yet (and I don't see any open item about considering enabling > PWJ by default for v11), then I have exactly no hesitation about > lobotomizing PWJ as hard as we need to to mask this problem for v11. > I'd certainly prefer a solution along those lines to either of these > patches. Yeah, that hasn't been done yet. Partition-wise join probably needs a good bit of work in a number of areas to do all the things that people will want it to do -- see the thread on advanced partition-matching, which is an improvement but still doesn't cover every case that could come up. In terms of memory usage, I think that we need some discussion of the approach that should be taken. I see a couple of possible things we could do, not necessarily mutually exclusive. 1. We could try to avoid translating all of the parent's data structures for every child RelOptInfo, either by rejiggering things so that the child doesn't need all of that data, or by making it able to use the parent's copy of the data. 2. We could try to recycle memory more quickly. For example, if we're planning a partition-wise join of A-B-C-D, first plan paths for A1-B1-C1-D1 (and all proper subsets of those rels), then throw away most of the memory, then plan paths for A2-B2-C2-D2, etc. 3. We could generate paths for on "representative" children (e.g. the biggest ones) and use that to estimate the cost of the partition-wise plan. If that plan is selected, then go back and figure out real paths for the other children. All of these things help in different ways, and Ashutosh had code for some version of all of them at various points, but the problem is quite difficult. If you have three tables with 1000 partitions each, then without partition-wise join you need 3000 (partitions) + 3 (baserels) + 3 (level-2 joinrels) + 1 (level-3 joinrel) RelOptInfos. If you do a partition-wise join, then you get 3000 level-2 child-joinrels and 1000 level-3 child joinrels. Those RelOptInfo structures, and the paths attached to them, cannot help but take up memory. Perhaps that's OK, and we ought to just say "well, if you want better plans, you're going to have to allow more memory for planning". If not, we have to decide how hard we want to work in which areas and how good the result needs to be in order to have this turned on by difficult. Honestly, I'm pretty impressed that we have added not one but two members to the RelOptKind enum without as little collateral damage as there has been. This issue is annoying and the discussion has gone on longer than anyone would probably prefer, but it's really pretty minor in the grand scheme of things, at least IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/03 22:18), Etsuro Fujita wrote: Here is a patch for refusing to generate PWJ paths when WRVs are involved: * We no longer need to handle WRVs, so I've simplified build_joinrel_tlist() and setrefs.c to what they were before partition-wise join went in, as in the previous patch. * attr_needed data for each child is used for building child-joins' tlists, but I think we can build those by applying adjust_appendrel_attrs to the parent's tlists, without attr_needed. So, I've also removed that as in the previous patch. One thing to add: as for the latter, we don't need the changes to placeholder.c either, so I've also simplified that to what they were before partition-wise join went in. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/03 5:25), Tom Lane wrote: What I'm thinking might be a more appropriate thing, at least for getting v11 out the door, is to refuse to generate partitionwise joins when any whole-row vars are involved. Agreed. (Perhaps that's not enough to dodge all the problems, though?) Now, that's a bit of a problem for postgres_fdw, because it seems to insist on injecting WRVs even when the query text does not require any. Why is that, and can't we get rid of it? It's horrid for performance even without any of these other considerations. But if we fail to get rid of that in time for v11, it would mean that postgres_fdw can't participate in PWJ, which isn't great but I think we could live with it for now. Sorry, I don't understand this. Could you elaborate on that a bit more? BTW, what exactly do we think the production status of PWJ is, anyway? I notice that upthread it was stated that enable_partitionwise_join defaults to off, as indeed it still does, and we'd turn it on later when we'd gotten rid of some memory-hogging problems. If that hasn't happened yet (and I don't see any open item about considering enabling PWJ by default for v11), then I have exactly no hesitation about lobotomizing PWJ as hard as we need to to mask this problem for v11. That hasn't happened yet; I think we left that for PG12, IIRC. Here is a patch for refusing to generate PWJ paths when WRVs are involved: * We no longer need to handle WRVs, so I've simplified build_joinrel_tlist() and setrefs.c to what they were before partition-wise join went in, as in the previous patch. * attr_needed data for each child is used for building child-joins' tlists, but I think we can build those by applying adjust_appendrel_attrs to the parent's tlists, without attr_needed. So, I've also removed that as in the previous patch. Maybe I'm missing something, though. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 8337,8344 ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; --- 8337,8345 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); + ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; *** *** 8389,8416 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) 8 | | (5 rows) ! -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !QUERY PLAN ! - Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) !-> Append ! -> Foreign Scan !Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) ! -> Foreign Scan !Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) ! (7 rows) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !t1 | t2 + (0,0,) | (0,0,) (150,150,0003) | (150,150,0003) (250,250,0005) | (250,250,0005) (400,400,0008) | (400,400,0008) ! (4 rows) -- join with lateral reference EXPLAIN (COSTS OFF) --- 8390,8431 8 | | (5 rows) ! -- with whole-row reference; partitionwise join does not apply EXPLAIN (COSTS OFF) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !QUERY PLAN ! Sort Sort Key: ((t1.*)::fprt1),
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Robert Haas writes: > Does anyone else want to weigh in on this? It seems to me that there > are several people who are quite willing to complain about the fact > that this hasn't been fixed, but not willing to express an opinion > about the shape of the fix. Either the RMT needs to take executive > action, or we need some more votes. [ reads thread ... ] Well, my vote is that I've got zero faith in either of these patches. I do not trust Ashutosh's patch because of the point you noted that it will kick in on ConvertRowtypeExprs that are not related to partitioning. It's also got a rather fundamental conceptual (or at least documentation) error in that it says it's making pull_var_clause do certain things to all ConvertRowtypeExprs, but that's not what the code actually does. I think the tension around that has a lot to do with the fact that the patch went through so many revisions, and I don't have any faith that it's right even yet. As you mentioned upthread, this might be insoluble without some representational change for converted whole-row Vars. As for Etsuro-san's patch, I don't like that for the same reasons you gave. Also, I note that back towards the beginning of July it was said that that patch depends on the assumption of no Gather below an Append. That's broken *now*, not in some hypothetical future. (See the nearby "FailedAssertion on partprune" thread, wherein we're trying to fix the planner's handling of exactly such plans.) What I'm thinking might be a more appropriate thing, at least for getting v11 out the door, is to refuse to generate partitionwise joins when any whole-row vars are involved. (Perhaps that's not enough to dodge all the problems, though?) Now, that's a bit of a problem for postgres_fdw, because it seems to insist on injecting WRVs even when the query text does not require any. Why is that, and can't we get rid of it? It's horrid for performance even without any of these other considerations. But if we fail to get rid of that in time for v11, it would mean that postgres_fdw can't participate in PWJ, which isn't great but I think we could live with it for now. BTW, what exactly do we think the production status of PWJ is, anyway? I notice that upthread it was stated that enable_partitionwise_join defaults to off, as indeed it still does, and we'd turn it on later when we'd gotten rid of some memory-hogging problems. If that hasn't happened yet (and I don't see any open item about considering enabling PWJ by default for v11), then I have exactly no hesitation about lobotomizing PWJ as hard as we need to to mask this problem for v11. I'd certainly prefer a solution along those lines to either of these patches. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Hi, On 2018-08-02 15:19:49 -0400, Robert Haas wrote: > Does anyone else want to weigh in on this? It seems to me that there > are several people who are quite willing to complain about the fact > that this hasn't been fixed, but not willing to express an opinion > about the shape of the fix. Either the RMT needs to take executive > action, or we need some more votes. (taking my RMT hat off, as it's not been coordinated) Well, I think it's enough in the weeds that not too many people are going to have an informed opinion on this, and I'm not sure uninformed opinions help. But I do think the fact that one of the authors and the committer are on one side, and Tom seems to tentatively agree, weighs quite a bit here. As far as I can tell, it's "just" Etsuro on the other side - obviously his opinion counts, but it doesn't outweigh others. I think if you feel confident, you should be ok just commit the approach you favor, and I read you as being confident that that's the better approach. Greetings, Andres Freund
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Aug 2, 2018 at 7:20 AM, Etsuro Fujita wrote: > The new approach seems to me more localized than what Ashutosh had proposed. > One obvious advantage of the new approach is that we no longer need changes > to postgres_fdw for it to work for partitionwise joins, which would apply > for any other FDWs, making the FDW authors' life easy. Well, I don't know what to say here expect that I don't agree. I think Ashutosh's approach is a pretty natural extension of the system that we have now. It involves needing to handle converted whole row vars in some new places, most of which were handled in the original patch, but postgres_fdw was missed. Your approach involves changing the meaning of the target list, but only in narrow corner cases. I completely disagree that we can say it's less invasive. It may indeed be less work for extension authors, though, though perhaps at the expense of moving the conversion from the remote server to the local one. >> But in general, with your approach, any code that >> looks at the tlist for a child-join has to know that the tlist is to >> be used as-is *except that* ConvertRowtypeExpr may need to be >> inserted. I think that special case could be easy to overlook, and it >> seems pretty arbitrary. > > I think we can check to see whether the conversion is needed, from the flags > added to RelOptInfo. Sure, I don't dispute that it can be made to work. I just think it's an ugly kludge. > I have to admit that it's weird to adjust the child's targetlist in that > case when putting the child under the parent's Append/MergeAppend, but IMHO > I think that would be much localized. Yeah, I just don't agree at all. Does anyone else want to weigh in on this? It seems to me that there are several people who are quite willing to complain about the fact that this hasn't been fixed, but not willing to express an opinion about the shape of the fix. Either the RMT needs to take executive action, or we need some more votes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/02 4:30), Robert Haas wrote: On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita wrote: I updated the patch that way. Updated patch attached. I fixed a bug and did a bit of cleanups as well. Looking this over from a technical point of view, I think it's better than what you proposed before but I still don't agree with the approach. I don't think there is any getting around the fact that converted whole-row vars are going to result in some warts. Ashutosh inserted most of the necessary warts in the original partition-wise join patch, but missed some cases in postgres_fdw; his approach is, basically, to add the matching warts there. Your approach is to rip out all of those warts and insert different ones. You've simplified build_tlist_index_other_vars() and add_placeholders_to_joinrel() and build_joinrel_tlist() to basically what they were before partition-wise join went in. On the other hand, RelOptInfo grows three new fields, Sorry, I forgot to remove one of the fields that isn't needed anymore. RelOptInfo needs two new fields, in the new approach. set_append_rel_size() ends up more complicated than it was before when you include the node code added in the build_childrel_tlist function it calls, build_child_joinrel() has a different set of complications, and most importantly create_append_path() and create_merge_append_path() now do surgery on their input paths that knows specifically about the converted-whole-row var case. I would be glad to be rid of the stuff that you're ripping out, but in terms of complexity, I don't think we really come out ahead with the stuff you're adding. The new approach seems to me more localized than what Ashutosh had proposed. One obvious advantage of the new approach is that we no longer need changes to postgres_fdw for it to work for partitionwise joins, which would apply for any other FDWs, making the FDW authors' life easy. I'm also still concerned about the design. In your old approach, you were creating the paths with with the wrong target list and then fixing it when you turned the paths into a plan. In your new approach, you're still creating the paths with the wrong target list, but now you're fixing it when you put an Append or MergeAppend path on top of them. I still think it's a bad idea to have anything other than the correct paths in the target list from the beginning. I don't think so; actually, the child's targetlist would not have to be the same as the parent's until the child gets in under the parent's Append or MergeAppend. So, I modified the child's target list so as to make it easy to join the child relations. For one thing, what happens if no Append or MergeAppend is ever put on top of them? One way that can happen today is partition-wise aggregate, although I haven't been able to demonstrate a bug, so maybe that's covered somehow. Right. In the new approach, the targetlist for the topmost child scan/join is guaranteed to be the same as that for the topmost parent scan/join by the planner work that I added. But in general, with your approach, any code that looks at the tlist for a child-join has to know that the tlist is to be used as-is *except that* ConvertRowtypeExpr may need to be inserted. I think that special case could be easy to overlook, and it seems pretty arbitrary. I think we can check to see whether the conversion is needed, from the flags added to RelOptInfo. A tlist is a list of arbitrary expressions to be produced; with this patch, we'd end up with the tlist being the list of expressions to be produced in every case *except for* child-joins containing whole-row-vars. I just can't get behind a strange exception like that. I have to admit that it's weird to adjust the child's targetlist in that case when putting the child under the parent's Append/MergeAppend, but IMHO I think that would be much localized. Thanks for the comments! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/02 2:44), Robert Haas wrote: Tom, Ashutosh, and I all seem to agree that we shouldn't try to re-jigger things at create-plan time. I think that a 3-1 consensus against your proposal is sufficient to say we shouldn't go that way. Agreed. Now, here you've got a new approach which avoids that, which I have not yet reviewed. I'll try to spend some time on it this afternoon, but really, I think it's too late for this. This bug was reported in February, and we're supposed to be pushing 11 final out the door in not much more than a month. Proposing a new approach in August is not good. Agreed. Can't we just do what Ashutosh proposed for now and revisit this for v12? I think that may be possible. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita wrote: > I updated the patch that way. Updated patch attached. I fixed a bug and > did a bit of cleanups as well. Looking this over from a technical point of view, I think it's better than what you proposed before but I still don't agree with the approach. I don't think there is any getting around the fact that converted whole-row vars are going to result in some warts. Ashutosh inserted most of the necessary warts in the original partition-wise join patch, but missed some cases in postgres_fdw; his approach is, basically, to add the matching warts there. Your approach is to rip out all of those warts and insert different ones. You've simplified build_tlist_index_other_vars() and add_placeholders_to_joinrel() and build_joinrel_tlist() to basically what they were before partition-wise join went in. On the other hand, RelOptInfo grows three new fields, set_append_rel_size() ends up more complicated than it was before when you include the node code added in the build_childrel_tlist function it calls, build_child_joinrel() has a different set of complications, and most importantly create_append_path() and create_merge_append_path() now do surgery on their input paths that knows specifically about the converted-whole-row var case. I would be glad to be rid of the stuff that you're ripping out, but in terms of complexity, I don't think we really come out ahead with the stuff you're adding. I'm also still concerned about the design. In your old approach, you were creating the paths with with the wrong target list and then fixing it when you turned the paths into a plan. In your new approach, you're still creating the paths with the wrong target list, but now you're fixing it when you put an Append or MergeAppend path on top of them. I still think it's a bad idea to have anything other than the correct paths in the target list from the beginning. For one thing, what happens if no Append or MergeAppend is ever put on top of them? One way that can happen today is partition-wise aggregate, although I haven't been able to demonstrate a bug, so maybe that's covered somehow. But in general, with your approach, any code that looks at the tlist for a child-join has to know that the tlist is to be used as-is *except that* ConvertRowtypeExpr may need to be inserted. I think that special case could be easy to overlook, and it seems pretty arbitrary. A tlist is a list of arbitrary expressions to be produced; with this patch, we'd end up with the tlist being the list of expressions to be produced in every case *except for* child-joins containing whole-row-vars. I just can't get behind a strange exception like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Aug 1, 2018 at 7:46 AM, Etsuro Fujita wrote: > I posted the updated patch [1]. Etsuro-san: I really think we should just go with what Ashutosh had proposed. Tom, Ashutosh, and I all seem to agree that we shouldn't try to re-jigger things at create-plan time. I think that a 3-1 consensus against your proposal is sufficient to say we shouldn't go that way. Now, here you've got a new approach which avoids that, which I have not yet reviewed. I'll try to spend some time on it this afternoon, but really, I think it's too late for this. This bug was reported in February, and we're supposed to be pushing 11 final out the door in not much more than a month. Proposing a new approach in August is not good. Can't we just do what Ashutosh proposed for now and revisit this for v12? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/08/01 5:31), Alvaro Herrera wrote: On 2018-Jul-31, Etsuro Fujita wrote: (2018/07/31 4:06), Andres Freund wrote: On 2018-07-20 08:38:09 -0400, Robert Haas wrote: I'm going to study this some more now, but I really think this is going in the wrong direction. We're going to have to get somewhere on this topic soon. This thread has been open for nearly half a year, and we're late in the beta phase now. What can we do to get to an agreement soon? I'd like to propose an updated patch as proposed in [1]. But there is no patch there, and neither there is agreement from the other party that the approach described there is sound. I posted the updated patch [1]. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5B619D23.401%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/26 21:11), Etsuro Fujita wrote: (2018/07/26 5:27), Robert Haas wrote: Well, I could have the wrong idea here, but I tend to think allowing for ConvertRowTypeExpr elsewhere won't be that bad. I still don't like that because in my opinion, changes needed for that would not be localized, and that would make code complicated more than necessary. As I mentioned in a previous email, another idea to avoid that would be to adjust tlists for children at path creation time, not plan creation time; we could adjust the tlist for each of subpaths accumulated for an Append/MergeAppend path in add_paths_to_append_rel called from set_append_rel_pathlist or generate_partitionwise_join_paths, with create_projection_path adding ConvertRowTypeExpr. It seems unlikely that performing create_projection_path to such a subpath would change its property of being the cheapest, so it would be safe to adjust the tlists that way. This would not require making create_plan complicated anymore. I might be missing something, though. I updated the patch that way. Updated patch attached. I fixed a bug and did a bit of cleanups as well. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 8337,8344 ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; --- 8337,8345 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); + ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; *** *** 8391,8416 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !QUERY PLAN ! - Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) -> Append -> Foreign Scan !Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) -> Foreign Scan !Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) (7 rows) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !t1 | t2 + (0,0,) | (0,0,) (150,150,0003) | (150,150,0003) (250,250,0005) | (250,250,0005) (400,400,0008) | (400,400,0008) ! (4 rows) -- join with lateral reference EXPLAIN (COSTS OFF) --- 8392,8427 -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !QUERY PLAN ! Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) -> Append -> Foreign Scan !Relations: (public.ftprt1_p1 t1) FULL JOIN (public.ftprt2_p1 t2) -> Foreign Scan !Relations: (public.ftprt1_p2 t1) FULL JOIN (public.ftprt2_p2 t2) (7 rows) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !wr | wr + (0,0,) | (0,0,) + (50,50,0001) | + (100,100,0002) | (150,150,0003) | (150,150,0003) + (200,200,0004) | (250,250,0005) | (250,250,0005) + (300,300,0006) | + (350,350,0007) | (400,400,0008) | (400,400,0008) ! (450,450,0009)
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On 2018-Jul-31, Etsuro Fujita wrote: > (2018/07/31 4:06), Andres Freund wrote: > > On 2018-07-20 08:38:09 -0400, Robert Haas wrote: > > > I'm going to study this some more now, but I really think this is > > > going in the wrong direction. > > > > We're going to have to get somewhere on this topic soon. This thread has > > been open for nearly half a year, and we're late in the beta phase now. > > What can we do to get to an agreement soon? > > I'd like to propose an updated patch as proposed in [1]. But there is no patch there, and neither there is agreement from the other party that the approach described there is sound. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/31 4:06), Andres Freund wrote: On 2018-07-20 08:38:09 -0400, Robert Haas wrote: I'm going to study this some more now, but I really think this is going in the wrong direction. We're going to have to get somewhere on this topic soon. This thread has been open for nearly half a year, and we're late in the beta phase now. What can we do to get to an agreement soon? I'd like to propose an updated patch as proposed in [1]. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5B59BA55.30200%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Hi, On 2018-07-20 08:38:09 -0400, Robert Haas wrote: > I'm going to study this some more now, but I really think this is > going in the wrong direction. We're going to have to get somewhere on this topic soon. This thread has been open for nearly half a year, and we're late in the beta phase now. What can we do to get to an agreement soon? Greetings, Andres Freund
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/26 5:27), Robert Haas wrote: On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita wrote: Isn't that assumption fundamental to your whole approach? I don't think so. What I mean here is: currently the subplan would be a scan/join node, but in future we might have eg, a Sort node atop the scan/join node, so it would be better to update the patch to handle such a case as well. But how would you do that? What I had in mind was to insert a Rusult node with inject_projection_plan and adjust the tlist of the Result, as done for adding sort columns to a tlist in prepare_sort_from_pathkeys. I think that's a bad idea. The target list affects lots of things, like costing. If we don't insert a ConvertRowTypeExpr into the child's target list, the costing will be wrong to the extent that ConvertRowTypeExpr has any cost, which it does. Actually, this is not true at least currently, because set_append_rel_size doesn't do anything about the costing: Why would it? Append can't project, so the cost of any expressions that appear in its target list is irrelevant. What is affected is the cost of the scans below the Append -- see e.g. cost_seqscan(), which uses the data produced by set_pathtarget_cost_width(). By set_rel_size()? Sorry, I don't understand what you mean by this. I think the data used by such a costing function is computed by set_rel_size in set_append_rel_size, not set_pathtarget_cost_width; in the case of a plain partition, for example, set_rel_size would call set_plain_rel_size, and then set_plain_rel_size would eventually call set_rel_width, which sets reltarget->cost, which I think would be used by e.g., cost_seqscan. cost_qual_eval_node, which is called in set_rel_width for computing the cost of ConvertRowTypeExpr, ignores that expression, so currently, we don't charge any cost for it to the partition's reltarget->cost, and to the cost of a scan below the Append. I'm not sure that's a good idea, because I think we have a trade-off relation; the more we make create_plan simple, the more we need to make earlier states of the planner complicated. And it looks to me like the partitionwise join code is making earlier (and later) stages of the planner too complicated, to make create_plan simple. I think that create_plan is *supposed* to be simple. Its purpose is to prune away data that's only needed during planning and add things that can be computed at the last minute which are needed at execution time. Making it do anything else is, in my opinion, not good. I agree on that point. When considering paritionwise joining, it would make things complicated to have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed upthread, it deviates from the planner's assumption that a rel's targetlist would only include Vars and PHVs. So, I propose to include a child whole-row Var in the targetlist instead, in which case, we need to fix the Var after the fact, but can avoid making many other parts of the planner complicated. Well, I could have the wrong idea here, but I tend to think allowing for ConvertRowTypeExpr elsewhere won't be that bad. I still don't like that because in my opinion, changes needed for that would not be localized, and that would make code complicated more than necessary. As I mentioned in a previous email, another idea to avoid that would be to adjust tlists for children at path creation time, not plan creation time; we could adjust the tlist for each of subpaths accumulated for an Append/MergeAppend path in add_paths_to_append_rel called from set_append_rel_pathlist or generate_partitionwise_join_paths, with create_projection_path adding ConvertRowTypeExpr. It seems unlikely that performing create_projection_path to such a subpath would change its property of being the cheapest, so it would be safe to adjust the tlists that way. This would not require making create_plan complicated anymore. I might be missing something, though. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Jul 26, 2018 at 2:12 AM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita >> wrote: >>> I'm not sure that's a good idea, because I think we have a trade-off >>> relation; the more we make create_plan simple, the more we need to make >>> earlier states of the planner complicated. >>> >>> And it looks to me like the partitionwise join code is making earlier (and >>> later) stages of the planner too complicated, to make create_plan simple. > >> I think that create_plan is *supposed* to be simple. Its purpose is >> to prune away data that's only needed during planning and add things >> that can be computed at the last minute which are needed at execution >> time. Making it do anything else is, in my opinion, not good. > > I tend to agree with Robert on this. In particular, if you put anything > into create_plan or later that affects cost estimates, you're really doing > it wrong, because changing those estimates might've changed the plan > entirely. (As I recall, we have a couple of cheats like that now, > associated with dropping no-op projection and subquery scan nodes. > But let's not add more.) +1 > > TBH, I think this entire discussion is proving what sheer folly it was > to allow partitions to have rowtypes not identical to their parent. > But I suppose we're stuck with that now. > Every table created has a different rowtype and when the partition's rowtype is different from that of the partitioned table, we add a ConvertRowtypeExpr node on top of the whole-row expression. 2153 if (appinfo->parent_reltype != appinfo->child_reltype) 2154 { 2155 ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr); We may be able to set rowtype of a partition to be same as the rowtype of a partitioned table when a partition table is created using an ALTER TABLE syntax. But are you suggesting that we change the reltype of a partition being attached? What happens when we detach a partition and let the table live independent of the partitioned table? Do we create a new reltype? I am not even going into the problem when we try to attach a partition with different tuple layout. I do believe that such usecases arise and rewriting the table being attached is a viable option. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Robert Haas writes: > On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita > wrote: >> I'm not sure that's a good idea, because I think we have a trade-off >> relation; the more we make create_plan simple, the more we need to make >> earlier states of the planner complicated. >> >> And it looks to me like the partitionwise join code is making earlier (and >> later) stages of the planner too complicated, to make create_plan simple. > I think that create_plan is *supposed* to be simple. Its purpose is > to prune away data that's only needed during planning and add things > that can be computed at the last minute which are needed at execution > time. Making it do anything else is, in my opinion, not good. I tend to agree with Robert on this. In particular, if you put anything into create_plan or later that affects cost estimates, you're really doing it wrong, because changing those estimates might've changed the plan entirely. (As I recall, we have a couple of cheats like that now, associated with dropping no-op projection and subquery scan nodes. But let's not add more.) TBH, I think this entire discussion is proving what sheer folly it was to allow partitions to have rowtypes not identical to their parent. But I suppose we're stuck with that now. regards, tom lane
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita wrote: >> Isn't that assumption fundamental to your whole approach? > > I don't think so. What I mean here is: currently the subplan would be a > scan/join node, but in future we might have eg, a Sort node atop the > scan/join node, so it would be better to update the patch to handle such a > case as well. But how would you do that? I think that's a bad idea. The target list affects lots of things, like costing. If we don't insert a ConvertRowTypeExpr into the child's target list, the costing will be wrong to the extent that ConvertRowTypeExpr has any cost, which it does. >>> >>> >>> Actually, this is not true at least currently, because >>> set_append_rel_size >>> doesn't do anything about the costing: >> >> >> Why would it? Append can't project, so the cost of any expressions >> that appear in its target list is irrelevant. What is affected is the >> cost of the scans below the Append -- see e.g. cost_seqscan(), which >> uses the data produced by set_pathtarget_cost_width(). > > By set_rel_size()? Sorry, I don't understand what you mean by this. > I'm not sure that's a good idea, because I think we have a trade-off > relation; the more we make create_plan simple, the more we need to make > earlier states of the planner complicated. > > And it looks to me like the partitionwise join code is making earlier (and > later) stages of the planner too complicated, to make create_plan simple. I think that create_plan is *supposed* to be simple. Its purpose is to prune away data that's only needed during planning and add things that can be computed at the last minute which are needed at execution time. Making it do anything else is, in my opinion, not good. > When considering paritionwise joining, it would make things complicated to > have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed > upthread, it deviates from the planner's assumption that a rel's targetlist > would only include Vars and PHVs. So, I propose to include a child > whole-row Var in the targetlist instead, in which case, we need to fix the > Var after the fact, but can avoid making many other parts of the planner > complicated. Well, I could have the wrong idea here, but I tend to think allowing for ConvertRowTypeExpr elsewhere won't be that bad. Does anyone else want to weigh in on this issue? Tom, perhaps? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/24 3:09), Robert Haas wrote: On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita wrote: I have to admit that that is not a good idea. So, I'll update the patch so that we don't assume the projection capability of the subplan anymore, if we go this way. Isn't that assumption fundamental to your whole approach? I don't think so. What I mean here is: currently the subplan would be a scan/join node, but in future we might have eg, a Sort node atop the scan/join node, so it would be better to update the patch to handle such a case as well. Also, even today, this would fail if the subplan happened to be a Sort, and it's not very obvious why that couldn't happen. You mean the MergeAppend case? In that case, the patch will adjust the subplan before adding a Sort above the subplan, so that could not happen. That would only help if create_merge_append_plan() itself inserted a Sort node. It wouldn't help if the Sort node came from a child path manufactured by create_sort_path(). As I mentioned above, I think we could handle such a case as well. I think that's a bad idea. The target list affects lots of things, like costing. If we don't insert a ConvertRowTypeExpr into the child's target list, the costing will be wrong to the extent that ConvertRowTypeExpr has any cost, which it does. Actually, this is not true at least currently, because set_append_rel_size doesn't do anything about the costing: Why would it? Append can't project, so the cost of any expressions that appear in its target list is irrelevant. What is affected is the cost of the scans below the Append -- see e.g. cost_seqscan(), which uses the data produced by set_pathtarget_cost_width(). By set_rel_size()? Some createplan.c routines already change the tlists of their nodes. For example, create_merge_append_plan would add sort columns to each of its subplans if necessary. I think it would be similar to that to add a ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist. You have a point, but I think that code is actually not a very good idea, and I'd like to see us do less of that sort of thing, not more. Any case in which we change the plan while creating it has many of the same problems that I discussed in the previous email. For example, create_merge_append_path() has to know that a Sort node might get inserted and set the costing accordingly. If the callers guaranteed that the necessary sort path had already been inserted, then we wouldn't need that special handling. I'm not sure that's a good idea, because I think we have a trade-off relation; the more we make create_plan simple, the more we need to make earlier states of the planner complicated. And it looks to me like the partitionwise join code is making earlier (and later) stages of the planner too complicated, to make create_plan simple. Also, that code is adding additional columns, computed from the columns we have available, so that we can sort. Those extra columns then get discarded at the next level of the Plan tree. What you're trying to do is different. Perhaps this is too harsh a judgement, but it looks to me like you're using a deliberately-wrong representation of the value that you ultimately want to produce and then patching it up after the fact. When considering paritionwise joining, it would make things complicated to have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed upthread, it deviates from the planner's assumption that a rel's targetlist would only include Vars and PHVs. So, I propose to include a child whole-row Var in the targetlist instead, in which case, we need to fix the Var after the fact, but can avoid making many other parts of the planner complicated. That seems quite a bit worse than what the existing code is doing. One idea to address that concern would be to adjust the pathtarget of each path created in generate_partitionwise_join_paths accordingly. I'm missing something, though. I reviewed his patch, and thought that that would fix the issue, but this is about the current design on the child tlist handling in partitionise join rather than that patch: it deviates from the assumption that we had for the scan/join planning till PG10 that "a rel's targetlist would only include Vars and PlaceHolderVars", and turns out that there are a lot of places where we need to modify code to suppress that deviation, as partly shown in that patch. So, ISTM that the current design in itself is not that localized. I used to think that was the assumption, too, but it seems to me that Tom told me a while back that it was never really true, and that assumption was in my head. Unfortunately, I don't have a link to that email handy. Either way, I think the solution is to get the tlist right from the start and cope with the consequences downstream, not start with the wrong thing and try to fix it later. I'm not fully convinced that's really the right solution,
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita wrote: > I have to admit that that is not a good idea. So, I'll update the patch so > that we don't assume the projection capability of the subplan anymore, if we > go this way. Isn't that assumption fundamental to your whole approach? >> Also, even today, this would fail if the subplan happened to be >> a Sort, and it's not very obvious why that couldn't happen. > > You mean the MergeAppend case? In that case, the patch will adjust the > subplan before adding a Sort above the subplan, so that could not happen. That would only help if create_merge_append_plan() itself inserted a Sort node. It wouldn't help if the Sort node came from a child path manufactured by create_sort_path(). >> I think that's a bad idea. The target list affects lots >> of things, like costing. If we don't insert a ConvertRowTypeExpr into >> the child's target list, the costing will be wrong to the extent that >> ConvertRowTypeExpr has any cost, which it does. > > Actually, this is not true at least currently, because set_append_rel_size > doesn't do anything about the costing: Why would it? Append can't project, so the cost of any expressions that appear in its target list is irrelevant. What is affected is the cost of the scans below the Append -- see e.g. cost_seqscan(), which uses the data produced by set_pathtarget_cost_width(). > Some createplan.c routines already change the tlists of their nodes. For > example, create_merge_append_plan would add sort columns to each of its > subplans if necessary. I think it would be similar to that to add a > ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist. You have a point, but I think that code is actually not a very good idea, and I'd like to see us do less of that sort of thing, not more. Any case in which we change the plan while creating it has many of the same problems that I discussed in the previous email. For example, create_merge_append_path() has to know that a Sort node might get inserted and set the costing accordingly. If the callers guaranteed that the necessary sort path had already been inserted, then we wouldn't need that special handling. Also, that code is adding additional columns, computed from the columns we have available, so that we can sort. Those extra columns then get discarded at the next level of the Plan tree. What you're trying to do is different. Perhaps this is too harsh a judgement, but it looks to me like you're using a deliberately-wrong representation of the value that you ultimately want to produce and then patching it up after the fact. That seems quite a bit worse than what the existing code is doing. > I reviewed his patch, and thought that that would fix the issue, but this is > about the current design on the child tlist handling in partitionise join > rather than that patch: it deviates from the assumption that we had for the > scan/join planning till PG10 that "a rel's targetlist would only include > Vars and PlaceHolderVars", and turns out that there are a lot of places > where we need to modify code to suppress that deviation, as partly shown in > that patch. So, ISTM that the current design in itself is not that > localized. I used to think that was the assumption, too, but it seems to me that Tom told me a while back that it was never really true, and that assumption was in my head. Unfortunately, I don't have a link to that email handy. Either way, I think the solution is to get the tlist right from the start and cope with the consequences downstream, not start with the wrong thing and try to fix it later. To see an example of why I think that's a valuable approach, see 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5, especially the changes in the regression test outputs. The old code discovered after it had generated an Append that it had the wrong tlist, but since the Append can't project, it had to add a Result node. With the new code, we get the children of the Append to produce the right output from the start, and then the Append just needs to concatenate all that output, so no Result node is needed. As noted in the commit message, it also made the costing more accurate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, Jul 20, 2018 at 8:56 PM, Robert Haas wrote: [ ... clipped portion ...] > > In short, plan creation time is just the wrong time to change the > plan. It's just a time to translate the plan from the form needed by > the planner to the form needed by the executor. It would be fine if > the change you were making were only cosmetic, but it's not. > Agree with all that, including the clipped paras. > After looking over both patches, I think Ashutosh Bapat has basically > the right idea. What he is proposing is a localized fix that doesn't > seem to make any changes to the way things work overall. I do think > his patches need to be fixed up a bit to avoid conflating > ConvertRowtypeExpr with "converted whole-row reference." The two are > certainly not equivalent; the latter is a narrow special case. Some > of the comments could use some more wordsmithing, too, I think. Apart > from those gripes, though, I think it's solving the problem the > correct way: don't build the wrong plan and try to fix it, just build > the right plan from the beginning. I will work on that if everyone agrees that that's the right way to go. Fujita-san seems to have some arguments still. I have argued on the same lines as yours but your explanation is better. I don't have anything more to add. I will wait for that to be resolved. > > There are definitely some things not to like about this approach. In > particular, I definitely agree that treating a converted whole-row > reference specially is not very nice. It feels like it might be > substantially cleaner to be able to represent this idea as a single > node rather than a combination of ConvertRowtypeExpr and var with > varattno = 0. Perhaps in the future we ought to consider either (a) > trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) > inventing a new WholeRowExpr node that stores two RTIs, one for the > relation that's generating the whole-row reference and the other for > the relation that is controlling the column ordering of the result or > (c) allowing a Var to represent a whole-row expression that has to > produce its outputs according to the ordering of some other RTE. I never liked representing a whole-row expression as a Var worst with varattno = 0. We should have always casted it as RowExpr(set of vars, one var per column). This problem wouldn't arise then. Many other problems wouldn't arise then, I think. I think we do that so that we can convert a tuple from buffer into a datum and put it in place of Var with varattno = 0. Given that I prefer (a) or (b) in all the cases. If not that, then c. But I agree that we have to avoid ConvertRowtypeExpr being used in this case. > But > I don't think it's wise or necessary to whack that around just to fix > this bug; it is refactoring or improvement work best left to a future > release. > > Also, it is definitely a real shame that we have to build attr_needed > data for each child separately. Right now, we've got to build a whole > lot of things for each child individually, and that's slow and > inefficient. We're not going to scale nicely to large partitioning > hierarchies unless we can figure out some way of minimizing the work > we do for each partition. So, the fact that Fujii-san's approach > avoids needing to compute attr_needed in some cases is very appealing. > However, in my opinion, it's nowhere near appealing enough to justify > trying to do surgery on the target list at plan-creation time. I > think we need to leave optimizing this to a future effort. > Partition-wise join/aggregate, and partitioning in general, need a lot > of optimization in a lot of places, and fortunately there are people > working on that, but our goal here should just be to fix things up > well enough that we can ship it. I agree. > I don't see anything in Ashutosh's > patch that is so ugly that we can't live with it for the time being. Thanks. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/21 0:26), Robert Haas wrote: On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas wrote: This looks like a terrible design to me. If somebody in future invents a new plan type that is not projection-capable, then this could fail an assertion here and there won't be any simple fix. And if you reach here and the child targetlists somehow haven't been adjusted, then you won't even fail an assertion, you'll just crash (or maybe return wrong answers). To elaborate on this thought a bit, it is currently the case that all scan and join types are projection-capable, and most likely we would make any such node types added in the future projection-capable as well, but it still doesn't seem like a good idea to me to have tangentially-related parts of the system depend on that in subtle ways. I have to admit that that is not a good idea. So, I'll update the patch so that we don't assume the projection capability of the subplan anymore, if we go this way. Also, even today, this would fail if the subplan happened to be a Sort, and it's not very obvious why that couldn't happen. You mean the MergeAppend case? In that case, the patch will adjust the subplan before adding a Sort above the subplan, so that could not happen. More broadly, the central idea of this patch is that we can set the target list for a child rel to something other than the target list that we expect it will eventually produce, and then at plan creation time fix it. Yeah, the patch would fix the the final output to the appendrel parent at plan creation time if necessary, but it would produces the tlist of an intermediate child scan/join node as written in the child relation's reltarget at that time. I think that's a bad idea. The target list affects lots of things, like costing. If we don't insert a ConvertRowTypeExpr into the child's target list, the costing will be wrong to the extent that ConvertRowTypeExpr has any cost, which it does. Actually, this is not true at least currently, because set_append_rel_size doesn't do anything about the costing: * NB: the resulting childrel->reltarget->exprs may contain arbitrary * expressions, which otherwise would not occur in a rel's targetlist. * Code that might be looking at an appendrel child must cope with * such. (Normally, a rel's targetlist would only include Vars and * PlaceHolderVars.) XXX we do not bother to update the cost or width * fields of childrel->reltarget; not clear if that would be useful. Any other current or future property that is computed based on the target list -- parallel-safety, width, security-level, whatever -- could also potentially end up with a wrong value if the target list as written does not match the target list as will actually be produced. It's true that, in all of these areas, the ConvertRowTypeExpr isn't likely to have a lot of impact; it probably won't have a big effect on costing and may not actually cause a problem for the other things that I mentioned. On the other hand, if it does cause a serious problem, what options would we have for fixing it other than to back out your entire patch and solve the problem some other way? The idea that derived properties won't be correct unless they are based on the real target list is pretty fundamental, and I think deviating from the idea that we get the correct target list first and then compute the derived properties afterward is likely to cause real headaches for future developers. In short, plan creation time is just the wrong time to change the plan. It's just a time to translate the plan from the form needed by the planner to the form needed by the executor. It would be fine if the change you were making were only cosmetic, but it's not. Some createplan.c routines already change the tlists of their nodes. For example, create_merge_append_plan would add sort columns to each of its subplans if necessary. I think it would be similar to that to add a ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist. After looking over both patches, I think Ashutosh Bapat has basically the right idea. What he is proposing is a localized fix that doesn't seem to make any changes to the way things work overall. I reviewed his patch, and thought that that would fix the issue, but this is about the current design on the child tlist handling in partitionise join rather than that patch: it deviates from the assumption that we had for the scan/join planning till PG10 that "a rel's targetlist would only include Vars and PlaceHolderVars", and turns out that there are a lot of places where we need to modify code to suppress that deviation, as partly shown in that patch. So, ISTM that the current design in itself is not that localized. Partition-wise join/aggregate, and partitioning in general, need a lot of optimization in a lot of places, and fortunately there are people working on that, but our goal
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas wrote: > This looks like a terrible design to me. If somebody in future > invents a new plan type that is not projection-capable, then this > could fail an assertion here and there won't be any simple fix. And > if you reach here and the child targetlists somehow haven't been > adjusted, then you won't even fail an assertion, you'll just crash (or > maybe return wrong answers). To elaborate on this thought a bit, it is currently the case that all scan and join types are projection-capable, and most likely we would make any such node types added in the future projection-capable as well, but it still doesn't seem like a good idea to me to have tangentially-related parts of the system depend on that in subtle ways. Also, even today, this would fail if the subplan happened to be a Sort, and it's not very obvious why that couldn't happen. If it can't happen today, it's not obvious why a future code change couldn't introduce cases where it can happen. More broadly, the central idea of this patch is that we can set the target list for a child rel to something other than the target list that we expect it will eventually produce, and then at plan creation time fix it. I think that's a bad idea. The target list affects lots of things, like costing. If we don't insert a ConvertRowTypeExpr into the child's target list, the costing will be wrong to the extent that ConvertRowTypeExpr has any cost, which it does. Any other current or future property that is computed based on the target list -- parallel-safety, width, security-level, whatever -- could also potentially end up with a wrong value if the target list as written does not match the target list as will actually be produced. It's true that, in all of these areas, the ConvertRowTypeExpr isn't likely to have a lot of impact; it probably won't have a big effect on costing and may not actually cause a problem for the other things that I mentioned. On the other hand, if it does cause a serious problem, what options would we have for fixing it other than to back out your entire patch and solve the problem some other way? The idea that derived properties won't be correct unless they are based on the real target list is pretty fundamental, and I think deviating from the idea that we get the correct target list first and then compute the derived properties afterward is likely to cause real headaches for future developers. In short, plan creation time is just the wrong time to change the plan. It's just a time to translate the plan from the form needed by the planner to the form needed by the executor. It would be fine if the change you were making were only cosmetic, but it's not. After looking over both patches, I think Ashutosh Bapat has basically the right idea. What he is proposing is a localized fix that doesn't seem to make any changes to the way things work overall. I do think his patches need to be fixed up a bit to avoid conflating ConvertRowtypeExpr with "converted whole-row reference." The two are certainly not equivalent; the latter is a narrow special case. Some of the comments could use some more wordsmithing, too, I think. Apart from those gripes, though, I think it's solving the problem the correct way: don't build the wrong plan and try to fix it, just build the right plan from the beginning. There are definitely some things not to like about this approach. In particular, I definitely agree that treating a converted whole-row reference specially is not very nice. It feels like it might be substantially cleaner to be able to represent this idea as a single node rather than a combination of ConvertRowtypeExpr and var with varattno = 0. Perhaps in the future we ought to consider either (a) trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b) inventing a new WholeRowExpr node that stores two RTIs, one for the relation that's generating the whole-row reference and the other for the relation that is controlling the column ordering of the result or (c) allowing a Var to represent a whole-row expression that has to produce its outputs according to the ordering of some other RTE. But I don't think it's wise or necessary to whack that around just to fix this bug; it is refactoring or improvement work best left to a future release. Also, it is definitely a real shame that we have to build attr_needed data for each child separately. Right now, we've got to build a whole lot of things for each child individually, and that's slow and inefficient. We're not going to scale nicely to large partitioning hierarchies unless we can figure out some way of minimizing the work we do for each partition. So, the fact that Fujii-san's approach avoids needing to compute attr_needed in some cases is very appealing. However, in my opinion, it's nowhere near appealing enough to justify trying to do surgery on the target list at plan-creation time. I think we need to leave optimizing this to
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Jul 18, 2018 at 8:00 AM, Etsuro Fujita wrote: > [ new patch ] + /* + * If the child plan is an Append or MergeAppend, the targetlists of its + * subplans would have already been adjusted before we get here, so need + * to work here. + */ + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) + return; + + /* The child plan should be a scan or join */ + Assert(is_projection_capable_plan(subplan)); This looks like a terrible design to me. If somebody in future invents a new plan type that is not projection-capable, then this could fail an assertion here and there won't be any simple fix. And if you reach here and the child targetlists somehow haven't been adjusted, then you won't even fail an assertion, you'll just crash (or maybe return wrong answers). I'm going to study this some more now, but I really think this is going in the wrong direction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/13 23:05), Ashutosh Bapat wrote: On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita wrote: In this example, the value of the whole-row reference to the child table ptp1 for that record is ('foo',1), and that of the index expression for that record is (1,'foo'). Those have different column orders, but the latter could be mapped to the former by a technique like do_convert_tuple. The expression in this case would look like ptp1::pt::ptp1 which won't match targetlist expression ptp1. I am also doubtful that the planner will be able to deduce that it need to apply an inverse function of ::pt and what exactly such an inverse function is. So index only scan won't be picked. we could support index-only scans with such an index in the case where we have the whole-row reference in the targetlist, not the index expression itself. Can you please show an index only scan path being created in this case? We currently don't consider index-only scan with index expressions, so I haven't thought in detail yet about how the planner would work. But once we have that index-only scan, I think we could extend that to the case mentioned above, by adding this to the planner: if the index expression is of the form var::parenttype, consider that (not only the expression itself but) var can be returned from the index. I think the expression like ptp1::pt::ptp1 would be useful to get the value of ptp1 from the index at execution time. There's a patch in an adjacent thread started by David Rowley to rip out Append/MergeAppend when there is only one subplan. So, your solution won't work there. Thanks for sharing that information! I skimmed the thread. I haven't yet caught up with all the discussions there, so I think I'm missing something, but it looks like that we haven't yet reached any consensus on the way to go. In my opinion, I like the approach mentioned in [1]. And if we go that way, my patch seems to fit into that, because in that approach the Append/MergeAppend could be removed after adjusting the targetlists for its subplans in create_append_plan/create_merge_append_plan. Anyway, I'd like to join in that work for PG12. Whatever may be the outcome of that work, I think what we fix here shouldn't require to reverted in a few months from now, just so that that patch works. I think we could add that optimization without reverting this change because the essential part of this change is to make create_plan adjust the tlists of the subplans based on the instruction stored into the subplans' RelOptInfos (ie, need_adjust_tlist in the second version of the patch). I think this technique could be extended even to the case where we have that optimization. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/04 11:00), Etsuro Fujita wrote: (2018/07/04 1:35), Andres Freund wrote: What's the plan forward here? I still think that this approach would be the right way to go, so I plan to polish the patch. The approach would not require extra cycles where partitioning is not involved as discussed before, and makes code much simpler, which would make maintenance and the FDW author's life easy, so I still think that the approach would be the direction to go in. So, I updated the patch. Here are changes: * I assumed that the child plan passed to adjust_subplan_tlist was a scan or join unless it was Append or MergeAppend, but I noticed I was wrong; if the topmost scan/join relation is partitioned, apply_scanjoin_target_to_paths might have pushed down ProjectSet or Result to each of partitions. So, in that case the child's plan would have ProjectSet or Result atop the scan or join plan. (I could not produce such an example, though.) To address this, I modified build_childrel_tlist so that we just apply adjust_appendrel_attrs to the parent's targetlist to get the child's if the parent is the topmost scan/join relation. So, we don't need to care about that in adjust_subplan_tlist. I think this change would also increase the efficiency for that case. I added regression test cases for that as well. * I fixed an oversight in outfuncs.c. * I renamed member variables I added to RelOptInfo before, and revised code/comments a bit. Attached is an updated version of the patch. Best regards, Etsuro Fujita *** a/postgres-master/contrib/postgres_fdw/expected/postgres_fdw.out --- b/postgres-master/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 8337,8344 ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; --- 8337,8345 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); + ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; *** *** 8391,8416 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !QUERY PLAN ! - Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) -> Append -> Foreign Scan !Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) -> Foreign Scan !Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) (7 rows) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !t1 | t2 + (0,0,) | (0,0,) (150,150,0003) | (150,150,0003) (250,250,0005) | (250,250,0005) (400,400,0008) | (400,400,0008) ! (4 rows) -- join with lateral reference EXPLAIN (COSTS OFF) --- 8392,8427 -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !QUERY PLAN ! Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) -> Append -> Foreign Scan !Relations: (public.ftprt1_p1 t1) FULL JOIN (public.ftprt2_p1 t2) -> Foreign Scan !Relations: (public.ftprt1_p2 t1) FULL JOIN (public.ftprt2_p2 t2) (7 rows) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita wrote: > > In this example, the value of the whole-row reference to the child table > ptp1 for that record is ('foo',1), and that of the index expression for that > record is (1,'foo'). Those have different column orders, but the latter > could be mapped to the former by a technique like do_convert_tuple. So, > what I suspect is: by producing the value of the whole-row reference from > that of the index expression like that, The expression in this case would look like ptp1::pt::ptp1 which won't match targetlist expression ptp1. I am also doubtful that the planner will be able to deduce that it need to apply an inverse function of ::pt and what exactly such an inverse function is. So index only scan won't be picked. > we could support index-only scans > with such an index in the case where we have the whole-row reference in the > targetlist, not the index expression itself. Can you please show an index only scan path being created in this case? > >> If that time is long >> enough, that's ok. In this case, I agree that if we haven't heard from >> users till now that they need to create indexes on whole-row >> expressions, there's chance that we will never implement this feature. > > > I don't object to adding that feature to future versions of PG if required. > >So, I think it is safe to have whole-row > Vars instead of ConvertRowtypeExprs in the targetlist. when it's looking for an expression, it finds a whole-row expression so it think it needs to add a ConvertRowtypeExpr on that. But when the plan is created, there is ConvertRowtypeExpr already, but there is no way to know that a new ConvertRowtypeExpr is not needed anymore. So, we may have two ConvertRowtypeExprs giving wrong results. >>> >>> >>> >>> Maybe I'm missing something, but I don't think that we need to worry >>> about >>> that, because in the approach I proposed, we only add CREs above >>> whole-row >>> Vars in the targetlists for subplans of an Append/MergeAppend for a >>> partitioned relation at plan creation time. >> >> >> There's a patch in an adjacent thread started by David Rowley to rip >> out Append/MergeAppend when there is only one subplan. So, your >> solution won't work there. > > > Thanks for sharing that information! I skimmed the thread. I haven't yet > caught up with all the discussions there, so I think I'm missing something, > but it looks like that we haven't yet reached any consensus on the way to > go. In my opinion, I like the approach mentioned in [1]. And if we go that > way, my patch seems to fit into that, because in that approach the > Append/MergeAppend could be removed after adjusting the targetlists for its > subplans in create_append_plan/create_merge_append_plan. Anyway, I'd like > to join in that work for PG12. Whatever may be the outcome of that work, I think what we fix here shouldn't require to reverted in a few months from now, just so that that patch works. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/12 13:38), Ashutosh Bapat wrote: On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita wrote: (2018/07/11 20:02), Ashutosh Bapat wrote: On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita wrote: Actually, even if we could create such an index on the child table and the targetlist had the ConvertRowtypeExpr, the planner would still not be able to use an index-only scan with that index; because check_index_only would not consider that an index-only scan is possible for that index because that index is an expression index and that function currently does not consider that index expressions are able to be returned back in an index-only scan. That behavior of the planner might be improved in future, though. Right and when we do so, not having ConvertRowtypeExpr in the targetlist will be a problem. Yeah, but I don't think that that's unsolvable; because in that case the CRE as an index expression could be converted back to the whole-row Var's rowtype by adding another CRE to the index expression for that conversion, I suspect that that special handling could allow us to support an index-only scan even when having the whole-row Var instead of the CRE in the targetlist. I am not able to understand this. Can you please provide an example? Sorry, my explanation was not good. Let me try. We can't create an inherited index on the whole-row reference in a partitioned table, but actually, we can directly create the corresponding index on a child table: postgres=# create table pt (a int, b text) partition by list (a); CREATE TABLE postgres=# create table ptp1 (b text, a int check (a in (1))); CREATE TABLE postgres=# alter table pt attach partition ptp1 for values in (1); ALTER TABLE postgres=# create index ptp1_conv_wr_idx on ptp1 ((ptp1::pt)); CREATE INDEX postgres=# insert into pt values (1, 'foo'); INSERT 0 1 postgres=# select *, ptp1 as wr, ptp1::pt as conv_wr from ptp1; b | a | wr| conv_wr -+---+-+- foo | 1 | (foo,1) | (1,foo) (1 row) In this example, the value of the whole-row reference to the child table ptp1 for that record is ('foo',1), and that of the index expression for that record is (1,'foo'). Those have different column orders, but the latter could be mapped to the former by a technique like do_convert_tuple. So, what I suspect is: by producing the value of the whole-row reference from that of the index expression like that, we could support index-only scans with such an index in the case where we have the whole-row reference in the targetlist, not the index expression itself. (Having said that, I'm not 100% sure we need to solve that problem when we improve the planner, because there doesn't seem to me to be enough use-case to justify making the code complicated for that.) Anyway, I think that that would be a matter of future versions of PG. I am afraid that a fix which just works only till we change the other parts of the system is useful only till that time. At that time, it needs to be replaced with a different fix. I agree on that point. If that time is long enough, that's ok. In this case, I agree that if we haven't heard from users till now that they need to create indexes on whole-row expressions, there's chance that we will never implement this feature. I don't object to adding that feature to future versions of PG if required. So, I think it is safe to have whole-row Vars instead of ConvertRowtypeExprs in the targetlist. when it's looking for an expression, it finds a whole-row expression so it think it needs to add a ConvertRowtypeExpr on that. But when the plan is created, there is ConvertRowtypeExpr already, but there is no way to know that a new ConvertRowtypeExpr is not needed anymore. So, we may have two ConvertRowtypeExprs giving wrong results. Maybe I'm missing something, but I don't think that we need to worry about that, because in the approach I proposed, we only add CREs above whole-row Vars in the targetlists for subplans of an Append/MergeAppend for a partitioned relation at plan creation time. There's a patch in an adjacent thread started by David Rowley to rip out Append/MergeAppend when there is only one subplan. So, your solution won't work there. Thanks for sharing that information! I skimmed the thread. I haven't yet caught up with all the discussions there, so I think I'm missing something, but it looks like that we haven't yet reached any consensus on the way to go. In my opinion, I like the approach mentioned in [1]. And if we go that way, my patch seems to fit into that, because in that approach the Append/MergeAppend could be removed after adjusting the targetlists for its subplans in create_append_plan/create_merge_append_plan. Anyway, I'd like to join in that work for PG12. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/1867.1509032831%40sss.pgh.pa.us
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita wrote: > (2018/07/11 20:02), Ashutosh Bapat wrote: >> >> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita >> wrote: >>> >>> Actually, even if we could create such an index on the child table and >>> the >>> targetlist had the ConvertRowtypeExpr, the planner would still not be >>> able >>> to use an index-only scan with that index; because check_index_only would >>> not consider that an index-only scan is possible for that index because >>> that >>> index is an expression index and that function currently does not >>> consider >>> that index expressions are able to be returned back in an index-only >>> scan. >>> That behavior of the planner might be improved in future, though. > > >> Right and when we do so, not having ConvertRowtypeExpr in the >> targetlist will be a problem. > > > Yeah, but I don't think that that's unsolvable; because in that case the CRE > as an index expression could be converted back to the whole-row Var's > rowtype by adding another CRE to the index expression for that conversion, I > suspect that that special handling could allow us to support an index-only > scan even when having the whole-row Var instead of the CRE in the > targetlist. I am not able to understand this. Can you please provide an example? > (Having said that, I'm not 100% sure we need to solve that > problem when we improve the planner, because there doesn't seem to me to be > enough use-case to justify making the code complicated for that.) Anyway, I > think that that would be a matter of future versions of PG. I am afraid that a fix which just works only till we change the other parts of the system is useful only till that time. At that time, it needs to be replaced with a different fix. If that time is long enough, that's ok. In this case, I agree that if we haven't heard from users till now that they need to create indexes on whole-row expressions, there's chance that we will never implement this feature. > At places in planner we match equivalence members to the targetlist entries. This matching will fail unexpectedly when ConvertRowtypeExpr is removed from a child's targetlist. But again I couldn't reproduce a problem when such a mismatch arises. >>> >>> >>> >>> IIUC, I don't think the planner assumes that for an equivalence member >>> there >>> is an matching entry for that member in the targetlist; what I think the >>> planner assumes is: an equivalence member is able to be computed from >>> expressions in the targetlist. >> >> >> This is true. However, >> >>> So, I think it is safe to have whole-row >>> Vars instead of ConvertRowtypeExprs in the targetlist. >> >> >> when it's looking for an expression, it finds a whole-row expression >> so it think it needs to add a ConvertRowtypeExpr on that. But when the >> plan is created, there is ConvertRowtypeExpr already, but there is no >> way to know that a new ConvertRowtypeExpr is not needed anymore. So, >> we may have two ConvertRowtypeExprs giving wrong results. > > > Maybe I'm missing something, but I don't think that we need to worry about > that, because in the approach I proposed, we only add CREs above whole-row > Vars in the targetlists for subplans of an Append/MergeAppend for a > partitioned relation at plan creation time. There's a patch in an adjacent thread started by David Rowley to rip out Append/MergeAppend when there is only one subplan. So, your solution won't work there. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/11 20:02), Ashutosh Bapat wrote: On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita wrote: Actually, even if we could create such an index on the child table and the targetlist had the ConvertRowtypeExpr, the planner would still not be able to use an index-only scan with that index; because check_index_only would not consider that an index-only scan is possible for that index because that index is an expression index and that function currently does not consider that index expressions are able to be returned back in an index-only scan. That behavior of the planner might be improved in future, though. Right and when we do so, not having ConvertRowtypeExpr in the targetlist will be a problem. Yeah, but I don't think that that's unsolvable; because in that case the CRE as an index expression could be converted back to the whole-row Var's rowtype by adding another CRE to the index expression for that conversion, I suspect that that special handling could allow us to support an index-only scan even when having the whole-row Var instead of the CRE in the targetlist. (Having said that, I'm not 100% sure we need to solve that problem when we improve the planner, because there doesn't seem to me to be enough use-case to justify making the code complicated for that.) Anyway, I think that that would be a matter of future versions of PG. At places in planner we match equivalence members to the targetlist entries. This matching will fail unexpectedly when ConvertRowtypeExpr is removed from a child's targetlist. But again I couldn't reproduce a problem when such a mismatch arises. IIUC, I don't think the planner assumes that for an equivalence member there is an matching entry for that member in the targetlist; what I think the planner assumes is: an equivalence member is able to be computed from expressions in the targetlist. This is true. However, So, I think it is safe to have whole-row Vars instead of ConvertRowtypeExprs in the targetlist. when it's looking for an expression, it finds a whole-row expression so it think it needs to add a ConvertRowtypeExpr on that. But when the plan is created, there is ConvertRowtypeExpr already, but there is no way to know that a new ConvertRowtypeExpr is not needed anymore. So, we may have two ConvertRowtypeExprs giving wrong results. Maybe I'm missing something, but I don't think that we need to worry about that, because in the approach I proposed, we only add CREs above whole-row Vars in the targetlists for subplans of an Append/MergeAppend for a partitioned relation at plan creation time. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita wrote: > > > Actually, even if we could create such an index on the child table and the > targetlist had the ConvertRowtypeExpr, the planner would still not be able > to use an index-only scan with that index; because check_index_only would > not consider that an index-only scan is possible for that index because that > index is an expression index and that function currently does not consider > that index expressions are able to be returned back in an index-only scan. > That behavior of the planner might be improved in future, though. > >> Pathkey points to an equivalence class, which contains equivalence >> members. A parent equivalence class member containing a whole-row >> reference gets translated into a child equivalence member containing a >> ConvertRowtypeExpr. Right and when we do so, not having ConvertRowtypeExpr in the targetlist will be a problem. > > > I think so too. > >> At places in planner we match equivalence members >> to the targetlist entries. This matching will fail unexpectedly when >> ConvertRowtypeExpr is removed from a child's targetlist. But again I >> couldn't reproduce a problem when such a mismatch arises. > > > IIUC, I don't think the planner assumes that for an equivalence member there > is an matching entry for that member in the targetlist; what I think the > planner assumes is: an equivalence member is able to be computed from > expressions in the targetlist. This is true. However, > So, I think it is safe to have whole-row > Vars instead of ConvertRowtypeExprs in the targetlist. when it's looking for an expression, it finds a whole-row expression so it think it needs to add a ConvertRowtypeExpr on that. But when the plan is created, there is ConvertRowtypeExpr already, but there is no way to know that a new ConvertRowtypeExpr is not needed anymore. So, we may have two ConvertRowtypeExprs giving wrong results. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/10 19:58), Ashutosh Bapat wrote: On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita wrote: (2018/07/09 20:43), Ashutosh Bapat wrote: At the cost of having targetlist being type inconsistent. I don't have any testcase either to show that that's a problem in practice. As I said before, I don't see any issue in having such a targetlist, but I might be missing something, so I'd like to discuss a bit more about that. Could you tell me the logic/place in the PG code where you think the problem might occur. (IIRC, you mentioned something about that before (pathkeys? or index-only scans?), but sorry, I don't understand that.) IIUC, index-only scan will be used when the all the required columns are covered by an index. If there is an index on the whole-row reference of the parent, it will be translated into a ConvertRowtypeExpr of the child when an index on the child is created. I think so too. If the targetlist doesn't have ConvertRowtypeExpr, as your patch does, the planner won't be able to use such an index on the child table. But I couldn't create an index with a whole-row reference in it. So, I think this isn't possible right now. Actually, even if we could create such an index on the child table and the targetlist had the ConvertRowtypeExpr, the planner would still not be able to use an index-only scan with that index; because check_index_only would not consider that an index-only scan is possible for that index because that index is an expression index and that function currently does not consider that index expressions are able to be returned back in an index-only scan. That behavior of the planner might be improved in future, though. Pathkey points to an equivalence class, which contains equivalence members. A parent equivalence class member containing a whole-row reference gets translated into a child equivalence member containing a ConvertRowtypeExpr. I think so too. At places in planner we match equivalence members to the targetlist entries. This matching will fail unexpectedly when ConvertRowtypeExpr is removed from a child's targetlist. But again I couldn't reproduce a problem when such a mismatch arises. IIUC, I don't think the planner assumes that for an equivalence member there is an matching entry for that member in the targetlist; what I think the planner assumes is: an equivalence member is able to be computed from expressions in the targetlist. So, I think it is safe to have whole-row Vars instead of ConvertRowtypeExprs in the targetlist. Thanks for the explanation! Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita wrote: > (2018/07/09 20:43), Ashutosh Bapat wrote: >>> >>> >>> I don't have any numbers right now, so that is nothing but a concern. But >>> as >>> I said in a previous email, in the approach I proposed, we don't need to >>> spend extra cycles where partitioning is not involved. I think that is a >>> good thing in itself. No? >> >> >> At the cost of having targetlist being type inconsistent. I don't have >> any testcase either to show that that's a problem in practice. So, >> it's a trade-off of a concern vs concern. > > > As I said before, I don't see any issue in having such a targetlist, but I > might be missing something, so I'd like to discuss a bit more about that. > Could you tell me the logic/place in the PG code where you think the problem > might occur. (IIRC, you mentioned something about that before (pathkeys? or > index-only scans?), but sorry, I don't understand that.) IIUC, index-only scan will be used when the all the required columns are covered by an index. If there is an index on the whole-row reference of the parent, it will be translated into a ConvertRowtypeExpr of the child when an index on the child is created. If the targetlist doesn't have ConvertRowtypeExpr, as your patch does, the planner won't be able to use such an index on the child table. But I couldn't create an index with a whole-row reference in it. So, I think this isn't possible right now. But in future if we allow creating index on whole-row reference or Pathkey points to an equivalence class, which contains equivalence members. A parent equivalence class member containing a whole-row reference gets translated into a child equivalence member containing a ConvertRowtypeExpr. At places in planner we match equivalence members to the targetlist entries. This matching will fail unexpectedly when ConvertRowtypeExpr is removed from a child's targetlist. But again I couldn't reproduce a problem when such a mismatch arises. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/09 20:43), Ashutosh Bapat wrote: I don't have any numbers right now, so that is nothing but a concern. But as I said in a previous email, in the approach I proposed, we don't need to spend extra cycles where partitioning is not involved. I think that is a good thing in itself. No? At the cost of having targetlist being type inconsistent. I don't have any testcase either to show that that's a problem in practice. So, it's a trade-off of a concern vs concern. As I said before, I don't see any issue in having such a targetlist, but I might be missing something, so I'd like to discuss a bit more about that. Could you tell me the logic/place in the PG code where you think the problem might occur. (IIRC, you mentioned something about that before (pathkeys? or index-only scans?), but sorry, I don't understand that.) Apart from that, in your approach there are extra cycles spent in traversing the targetlist to add ConvertRowtypeExpr, albeit only when there is a whole-row expression in the targetlist, when creating plans. That's not there in my patch. Right. That's unfortunate, but I think that that would be still better than needing to spent extra cycles where partitioning is not involved. And ISTM that the traversing cost is not that large compared to the cost we pay before that for query planning for a partitionwise join. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
> > I don't have any numbers right now, so that is nothing but a concern. But as > I said in a previous email, in the approach I proposed, we don't need to > spend extra cycles where partitioning is not involved. I think that is a > good thing in itself. No? At the cost of having targetlist being type inconsistent. I don't have any testcase either to show that that's a problem in practice. So, it's a trade-off of a concern vs concern. Apart from that, in your approach there are extra cycles spent in traversing the targetlist to add ConvertRowtypeExpr, albeit only when there is a whole-row expression in the targetlist, when creating plans. That's not there in my patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/09 20:06), Ashutosh Bapat wrote: On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita wrote: As I said, we do spend cycles in that function testing whether a node is Aggref or not even when the query doesn't have aggregates or grouping OR spend cycles in testing whether a node is a PlaceHolderVar when the query doesn't produce any. So, I don't see any problem with spending a few cycles testing whether a node is ConvertRowtypeExpr or not when a ConvertRowtypeExpr is not in the query or command. That's not a huge performance trouble. I would be happy to change my mind, if you show me performance different with and without this patch in planning. I haven't seen any. I have to admit that the case in [1] wouldn't affect the performance, but my concern is that there might be some cases where the test affects performance. What are those cases? Can you please provide any numbers supporting your claim? I don't have any numbers right now, so that is nothing but a concern. But as I said in a previous email, in the approach I proposed, we don't need to spend extra cycles where partitioning is not involved. I think that is a good thing in itself. No? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita wrote: >> >> >> As I said, we do spend cycles in that function testing whether a node >> is Aggref or not even when the query doesn't have aggregates or >> grouping OR spend cycles in testing whether a node is a PlaceHolderVar >> when the query doesn't produce any. So, I don't see any problem with >> spending a few cycles testing whether a node is ConvertRowtypeExpr or >> not when a ConvertRowtypeExpr is not in the query or command. That's >> not a huge performance trouble. I would be happy to change my mind, if >> you show me performance different with and without this patch in >> planning. I haven't seen any. > > > I have to admit that the case in [1] wouldn't affect the performance, but my > concern is that there might be some cases where the test affects > performance. What are those cases? Can you please provide any numbers supporting your claim? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/06 20:20), Ashutosh Bapat wrote: On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita wrote: (2018/07/04 21:37), Ashutosh Bapat wrote: On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita wrote: Let me explain about that: 1) my patch won't apply that function to a child if its top parent is an appendrel built from a UNION ALL subquery, even though the child is a partition of a partitioned table pulled up from a leaf subquery into the parent query, like lpt_p1, and 2) my patch will apply that function to a child if its top parent is a partitioned table, whether or not the partitioned table is involved in a partitionwise join. By #1, we avoid the adjustment work at plan creation time, as explained above. It might be worth trying to be smarter about #2 (for example, in the case of a join of a partitioned table and a non-partitioned table, since we don't consider a partitionwise join for that join, it's better to not apply that function to partitions of the partitioned table, to avoid the adjustment work at plan creation time), but ISTM we don't have enough information to be smarter. That looks like a kludge to me rather than a proper fix. It's not clear to me as to when somebody can expect ConvertRowtypeExpr in the targetlist and when don't while creating paths and to an extent plans. For example, inside add_paths_to_append_rel() or in apply_scanjoin_target_to_paths() or for that matter any path creation or plan creation function, we will sometimes get targetlists with ConvertRowtypeExpr() and sometime not. How do we know which is when. That is known from a new flag "has_child_wholerow" added to RelOptInfo to indicate whether an other relation's reltarget has child wholerow Vars instead of ConvertRowtypeExprs. That flag is initialized to false and only set to true in build_childrel_tlist if it adds child wholerow Vars to the reltarget of a given other relation. Though, I don't think we need to be conscious about whether the reltarget has child wholerow Vars or ConvertRowtypeExprs when creating paths; we would just create paths based on the reltarget. What we need to be careful about is when creating an Append/MergeAppend plan; we have to adjust the subplan's tlist so child wholerow Vars in that tlist are converted to the corresponding ConvertRowtypeExprs, if that tlist has those Vars, which is also known from the new flag. I think the biggest issue in the original patch you proposed is that we spend extra cycles where partitioning is not involved, which is the biggest reason why I think the original patch isn't the right way to go. When there are no partitions involved, there won't be any ConvertRowtypeExprs there which means the function is_converted_whole_row_reference() would just return from the first line checking IsA() and nullness of node. Really? IIRC, with the original patch we would spend extra cycles in a non-partitioned inheritance processing [1]. As I said, we do spend cycles in that function testing whether a node is Aggref or not even when the query doesn't have aggregates or grouping OR spend cycles in testing whether a node is a PlaceHolderVar when the query doesn't produce any. So, I don't see any problem with spending a few cycles testing whether a node is ConvertRowtypeExpr or not when a ConvertRowtypeExpr is not in the query or command. That's not a huge performance trouble. I would be happy to change my mind, if you show me performance different with and without this patch in planning. I haven't seen any. I have to admit that the case in [1] wouldn't affect the performance, but my concern is that there might be some cases where the test affects performance. In contrast, in the approach I proposed, we wouldn't need to worry about that because the approach does not modify code that is not related to partitioning. I think that's a good thing. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita wrote: > (2018/07/04 21:37), Ashutosh Bapat wrote: >> >> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita >> wrote: >>> >>> (2018/07/04 19:04), Ashutosh Bapat wrote: On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita wrote: > > (2018/06/22 22:54), Ashutosh Bapat wrote: >> >> + if (enable_partitionwise_join&& >> rel->top_parent_is_partitioned) >> + { >> + build_childrel_tlist(root, rel, childrel, 1,); >> + } >> >> Why do we need rel->top_parent_is_partitioned? If a relation is >> partitioned (if (rel->part_scheme), it's either the top parent or is >> partition of some other partitioned table. In either case this >> condition will be true. > > > This would be needed to avoid unnecessarily applying > build_childrel_tlist > to > child rels of a partitioned table for which we don't consider > partitionwise > join. Consider: > > postgres=# create table lpt (c1 int, c2 text) partition by list (c1); > CREATE TABLE > postgres=# create table lpt_p1 partition of lpt for values in (1); > CREATE TABLE > postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); > CREATE TABLE > postgres=# create table test (c1 int, c2 text); > CREATE TABLE > postgres=# explain verbose select * from (select * from lpt union all > select > * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); > > >>> I might misunderstand your words, but in the above example the patch >>> doesn't >>> apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is >>> because we can avoid adjusting the tlists for the corresponding subplans >>> at >>> plan creation time so that whole-row Vars in the tlists are transformed >>> into >>> CREs. I think the overhead of the adjustment is not that big, but not >>> zero, >>> so it would be worth avoiding applying build_childrel_tlist to partitions >>> if >>> the top parent won't participate in a partitionwise-join at all. >> >> >> I don't think that answers my question. When we join lpt with test, >> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even >> when join between lpt and test is not going to use partition-wise >> join. Why? > > > Maybe my explanation including the example was not good. Sorry about that, > but my patch will *not* apply build_childrel_tlist to lpt_p1 and lpt_p2 > since the top parent of lpt_p1 and lpt_p2 is the UNION ALL subquery and > hence not a partitioned table (ie, we have > rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2). > >> As per your explanation, the condition "if >> (enable_partitionwise_join&& rel->top_parent_is_partitioned)" is >> used to avoid applying build_childrel_tlist() when partition-wise join >> won't be possible. But it's not covering all the cases. > > > Let me explain about that: 1) my patch won't apply that function to a child > if its top parent is an appendrel built from a UNION ALL subquery, even > though the child is a partition of a partitioned table pulled up from a leaf > subquery into the parent query, like lpt_p1, and 2) my patch will apply that > function to a child if its top parent is a partitioned table, whether or not > the partitioned table is involved in a partitionwise join. By #1, we avoid > the adjustment work at plan creation time, as explained above. It might be > worth trying to be smarter about #2 (for example, in the case of a join of a > partitioned table and a non-partitioned table, since we don't consider a > partitionwise join for that join, it's better to not apply that function to > partitions of the partitioned table, to avoid the adjustment work at plan > creation time), but ISTM we don't have enough information to be smarter. That looks like a kludge to me rather than a proper fix. It's not clear to me as to when somebody can expect ConvertRowtypeExpr in the targetlist and when don't while creating paths and to an extent plans. For example, inside add_paths_to_append_rel() or in apply_scanjoin_target_to_paths() or for that matter any path creation or plan creation function, we will sometimes get targetlists with ConvertRowtypeExpr() and sometime not. How do we know which is when. > A ConvertRowtypeExpr is used to fix this inconsistency. That's why I chose to use pull_var_clause() as a place to fix the problem and not fix ConvertRowtypeExpr in targetlist itself. >>> >>> >>> >>> I think the biggest issue in the original patch you proposed is that we >>> spend extra cycles where partitioning is not involved, which is the >>> biggest >>> reason why I think the original patch isn't the right way to go. >> >> >> When there are no partitions involved, there won't be any >> ConvertRowtypeExprs there which means the function >> is_converted_whole_row_reference() would just return from the first >> line checking IsA() and nullness of node. > > >
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/04 21:37), Ashutosh Bapat wrote: On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita wrote: (2018/07/04 19:04), Ashutosh Bapat wrote: On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita wrote: (2018/06/22 22:54), Ashutosh Bapat wrote: + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) + { + build_childrel_tlist(root, rel, childrel, 1,); + } Why do we need rel->top_parent_is_partitioned? If a relation is partitioned (if (rel->part_scheme), it's either the top parent or is partition of some other partitioned table. In either case this condition will be true. This would be needed to avoid unnecessarily applying build_childrel_tlist to child rels of a partitioned table for which we don't consider partitionwise join. Consider: postgres=# create table lpt (c1 int, c2 text) partition by list (c1); CREATE TABLE postgres=# create table lpt_p1 partition of lpt for values in (1); CREATE TABLE postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); CREATE TABLE postgres=# create table test (c1 int, c2 text); CREATE TABLE postgres=# explain verbose select * from (select * from lpt union all select * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); I might misunderstand your words, but in the above example the patch doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is because we can avoid adjusting the tlists for the corresponding subplans at plan creation time so that whole-row Vars in the tlists are transformed into CREs. I think the overhead of the adjustment is not that big, but not zero, so it would be worth avoiding applying build_childrel_tlist to partitions if the top parent won't participate in a partitionwise-join at all. I don't think that answers my question. When we join lpt with test, your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even when join between lpt and test is not going to use partition-wise join. Why? Maybe my explanation including the example was not good. Sorry about that, but my patch will *not* apply build_childrel_tlist to lpt_p1 and lpt_p2 since the top parent of lpt_p1 and lpt_p2 is the UNION ALL subquery and hence not a partitioned table (ie, we have rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2). As per your explanation, the condition "if (enable_partitionwise_join&& rel->top_parent_is_partitioned)" is used to avoid applying build_childrel_tlist() when partition-wise join won't be possible. But it's not covering all the cases. Let me explain about that: 1) my patch won't apply that function to a child if its top parent is an appendrel built from a UNION ALL subquery, even though the child is a partition of a partitioned table pulled up from a leaf subquery into the parent query, like lpt_p1, and 2) my patch will apply that function to a child if its top parent is a partitioned table, whether or not the partitioned table is involved in a partitionwise join. By #1, we avoid the adjustment work at plan creation time, as explained above. It might be worth trying to be smarter about #2 (for example, in the case of a join of a partitioned table and a non-partitioned table, since we don't consider a partitionwise join for that join, it's better to not apply that function to partitions of the partitioned table, to avoid the adjustment work at plan creation time), but ISTM we don't have enough information to be smarter. An in-between state will produce a hell lot of confusion for any further optimization. Whenever we change the code around partition-wise operations in future, we will have to check whether or not a given child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I have mentioned earlier, I am also not comfortable with the targetlist of child relations being type inconsistent with that of the parent, which is a fundamental rule in inheritance. Worst keep them inconsistent during path creation and make them consistent at the time of creating plans. A child's whole-row Var has datatype of the child where as that of parent has datatype of parent. I don't see any critical issue here. Could you elaborate a bit more on that point? I think breaking a fundamental rule like this itself is critical. But interestingly I am not able to find a case where it becomes a problem. But may be I haven't tried enough. And the question is if it's not required to have the targetlists type consistent, why even bother with ConvertRowtypeExpr addition there? We can use your approach of adding ConvertRowtypeExpr at the end in all the cases. I think that the tlist of a (not-the-topmost) child relation doesn't need to be type-consistent with that of the parent; it has only to include all Vars that are needed for higher joinquals and final output to the parent appendrel. (In other words, I think we can build the tlist in the same manner as we build tlists of base or join relations in the main join tree.) On
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/05 22:04), Ashutosh Bapat wrote: On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita wrote: Another thing I noticed is: actually, we don't produce an Append with child Gathers in apply_scanjoin_target_to_paths, which I thought we would do that in the case of scanjoin_target_parallel_safe=false, but I noticed I was wrong. Sorry for that. The reason is because in that case, even if we create new partial Append paths with child Gathers, we don't run generate_gatehr_paths for the newly created partial paths at the end of that function shown below, since the parent's consider_parallel flag is set to false in that case: Hmm. I don't think that's a great idea since such a plan might turn out cheaper esp. when there are very few children which could use parallel query and parallel append is possible at the top parent. Actually, I don't think we can create such a Parallel Append, because in the case where we have scanjoin_target_parallel_safe=false, child paths would not be parallel-safe anymore as those paths are rewritten to have parallel-unsafe targets in the recursion of apply_scanjoin_target_to_paths. anyway, that's what it is today. But I think, we shouldn't write code assuming that an Append will never see a Gather below it. We might see some plans like that in future and need to change your code. I agree that we might allow such an Append in future, but I'm not sure we should write code for that in PG11, because 1) that would make code complicated than necessary and 2) that might cause overlooking of unexpected behavior of the planner; if the planner created such an Append erroneously, we would run that by that code without noticing that. To reduce the maintenance burden and the overlooking risk, I think it's better to extend code for such a plan when we allow that plan for a partitionwise join. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita wrote: > > > In the case where scanjoin_target_parallel_safe=false, we actually will > consider such parallel paths using the existing partial paths for the parent > appendrel in the code path shown in a previous email (note: we would already > have done generate_partitionwise_join_paths or set_append_rel_pathlist for > the parent appendrel in query_planner.) For such a parallel path, we might > need to do a projection in the Gather node for generating the SRF-free > scan/join target, but I think that would be still efficient. So, I'm not > sure that we really need to create child Gathers and generate an Append with > those Gathers, as you mentioned. I don't think it will be Append of Gathers, but Append where one of the subplans is Gather. I don't think that possibility is completely eliminated. > > Another thing I noticed is: actually, we don't produce an Append with child > Gathers in apply_scanjoin_target_to_paths, which I thought we would do that > in the case of scanjoin_target_parallel_safe=false, but I noticed I was > wrong. Sorry for that. The reason is because in that case, even if we > create new partial Append paths with child Gathers, we don't run > generate_gatehr_paths for the newly created partial paths at the end of that > function shown below, since the parent's consider_parallel flag is set to > false in that case: > > /* > * Consider generating Gather or Gather Merge paths. We must only do > this > * if the relation is parallel safe, and we don't do it for child rels > to > * avoid creating multiple Gather nodes within the same plan. We must do > * this after all paths have been generated and before set_cheapest, > since > * one of the generated paths may turn out to be the cheapest one. > */ > if (rel->consider_parallel && !IS_OTHER_REL(rel)) > generate_gather_paths(root, rel, false); Hmm. I don't think that's a great idea since such a plan might turn out cheaper esp. when there are very few children which could use parallel query and parallel append is possible at the top parent. But anyway, that's what it is today. But I think, we shouldn't write code assuming that an Append will never see a Gather below it. We might see some plans like that in future and need to change your code. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/05 20:15), Etsuro Fujita wrote: (2018/07/04 19:11), Ashutosh Bapat wrote: On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita wrote: I don't produce a test case where the plan is an Append with Gather subplans, but I'm not sure that it's a good thing to allow us to consider such a plan because in that plan, each Gather would try to grab its own pool of workers. Am I missing something? If the overall join can not use parallelism, having individual child joins use parallel query might turn out to be efficient even if done one child at a time. Parallel append drastically reduces the cases where something like could be useful, but I don't think we can theoretically eliminate the need for such a plan. In the case where scanjoin_target_parallel_safe=false, we actually will consider such parallel paths using the existing partial paths for the parent appendrel in the code path shown in a previous email (note: we would already have done generate_partitionwise_join_paths or set_append_rel_pathlist for the parent appendrel in query_planner.) For such a parallel path, we might need to do a projection in the Gather node for generating the SRF-free scan/join target, but I think that would be still efficient. So, I'm not sure that we really need to create child Gathers and generate an Append with those Gathers, as you mentioned. Another thing I noticed is: actually, we don't produce an Append with child Gathers in apply_scanjoin_target_to_paths, which I thought we would do that in the case of scanjoin_target_parallel_safe=false, but I noticed I was wrong. Sorry for that. The reason is because in that case, even if we create new partial Append paths with child Gathers, we don't run generate_gatehr_paths for the newly created partial paths at the end of that function shown below, since the parent's consider_parallel flag is set to false in that case: /* * Consider generating Gather or Gather Merge paths. We must only do this * if the relation is parallel safe, and we don't do it for child rels to * avoid creating multiple Gather nodes within the same plan. We must do * this after all paths have been generated and before set_cheapest, since * one of the generated paths may turn out to be the cheapest one. */ if (rel->consider_parallel && !IS_OTHER_REL(rel)) generate_gather_paths(root, rel, false); One thing to add is: ISTM we need some cleanup for apply_scanjoin_target_to_paths. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/04 19:11), Ashutosh Bapat wrote: On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita wrote: I don't produce a test case where the plan is an Append with Gather subplans, but I'm not sure that it's a good thing to allow us to consider such a plan because in that plan, each Gather would try to grab its own pool of workers. Am I missing something? If the overall join can not use parallelism, having individual child joins use parallel query might turn out to be efficient even if done one child at a time. Parallel append drastically reduces the cases where something like could be useful, but I don't think we can theoretically eliminate the need for such a plan. In the case where scanjoin_target_parallel_safe=false, we actually will consider such parallel paths using the existing partial paths for the parent appendrel in the code path shown in a previous email (note: we would already have done generate_partitionwise_join_paths or set_append_rel_pathlist for the parent appendrel in query_planner.) For such a parallel path, we might need to do a projection in the Gather node for generating the SRF-free scan/join target, but I think that would be still efficient. So, I'm not sure that we really need to create child Gathers and generate an Append with those Gathers, as you mentioned. Another thing I noticed is: actually, we don't produce an Append with child Gathers in apply_scanjoin_target_to_paths, which I thought we would do that in the case of scanjoin_target_parallel_safe=false, but I noticed I was wrong. Sorry for that. The reason is because in that case, even if we create new partial Append paths with child Gathers, we don't run generate_gatehr_paths for the newly created partial paths at the end of that function shown below, since the parent's consider_parallel flag is set to false in that case: /* * Consider generating Gather or Gather Merge paths. We must only do this * if the relation is parallel safe, and we don't do it for child rels to * avoid creating multiple Gather nodes within the same plan. We must do * this after all paths have been generated and before set_cheapest, since * one of the generated paths may turn out to be the cheapest one. */ if (rel->consider_parallel && !IS_OTHER_REL(rel)) generate_gather_paths(root, rel, false); Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita wrote: > (2018/07/04 19:04), Ashutosh Bapat wrote: >> >> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita >> wrote: >>> >>> (2018/06/22 22:54), Ashutosh Bapat wrote: + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) + { + build_childrel_tlist(root, rel, childrel, 1,); + } Why do we need rel->top_parent_is_partitioned? If a relation is partitioned (if (rel->part_scheme), it's either the top parent or is partition of some other partitioned table. In either case this condition will be true. >>> >>> >>> >>> This would be needed to avoid unnecessarily applying build_childrel_tlist >>> to >>> child rels of a partitioned table for which we don't consider >>> partitionwise >>> join. Consider: >>> >>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1); >>> CREATE TABLE >>> postgres=# create table lpt_p1 partition of lpt for values in (1); >>> CREATE TABLE >>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); >>> CREATE TABLE >>> postgres=# create table test (c1 int, c2 text); >>> CREATE TABLE >>> postgres=# explain verbose select * from (select * from lpt union all >>> select >>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); >> >> >> --- plan clipped >> >>> >>> In this example, the top parent is not a partitioned table and we don't >>> need >>> to consider partitionwise join for that, so we don't need to apply that >>> to >>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). >> >> >> FWIW, the test is not sufficient. In the above example, a simple >> select * from lpt inner join test where lpt.c1 = test.c1 would not use >> partition-wise join technique. Why not to avoid build_childrel_tlist() >> in that case as well? > > > I might misunderstand your words, but in the above example the patch doesn't > apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is > because we can avoid adjusting the tlists for the corresponding subplans at > plan creation time so that whole-row Vars in the tlists are transformed into > CREs. I think the overhead of the adjustment is not that big, but not zero, > so it would be worth avoiding applying build_childrel_tlist to partitions if > the top parent won't participate in a partitionwise-join at all. I don't think that answers my question. When we join lpt with test, your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even when join between lpt and test is not going to use partition-wise join. Why? As per your explanation, the condition "if (enable_partitionwise_join && rel->top_parent_is_partitioned)" is used to avoid applying build_childrel_tlist() when partition-wise join won't be possible. But it's not covering all the cases. > >> Worst we can change the criteria to use >> partition-wise join in future e.g. above case would use partition-wise >> join by 1. merging lpt_p1 into corresponding partition of lpt so that >> ss is partitioned and 2. repartitioning test or joining test with each >> partition of lpt separately. In those cases the changes you are doing >> here need to be reverted and put somewhere else. There's already a >> patch to reverse the order of join and grouping out there. How would >> this work with that. > > > Interesting, but that seems like a matter of PG12+. Yes, and I don't think it's a good idea to change this fix for PG12+ again. > >> In general I think set_append_rel_size() or build_simple_rel() is not >> the place where we should decide whether the given relation will >> participate in a PWJ. Also we should not selectively add a >> ConvertRowtypeExpr on top of a whole-row Var of some a child relation. >> We should do it for all the child rels or shouldn't do it at all. > > > One thing I thought was to apply build_childrel_tlist for all the child rels > whether its top parent is a partitioned table or not, but as I mentioned > above, that would incur needless overhead for adjusting the tlists for the > child rels that don't involve in a partitionwise join such as lpt_p1 and > lpt_p2, which I think is not good. > >> An >> in-between state will produce a hell lot of confusion for any further >> optimization. Whenever we change the code around partition-wise >> operations in future, we will have to check whether or not a given >> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I >> have mentioned earlier, I am also not comfortable with the targetlist >> of child relations being type inconsistent with that of the parent, >> which is a fundamental rule in inheritance. Worst keep them >> inconsistent during path creation and make them consistent at the time >> of creating plans. A child's whole-row Var has datatype of the child >> where as that of parent has datatype of parent. > > > I don't see any critical issue here. Could you elaborate a bit more on that > point? I think breaking a
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/04 19:04), Ashutosh Bapat wrote: On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita wrote: (2018/06/22 22:54), Ashutosh Bapat wrote: + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) + { + build_childrel_tlist(root, rel, childrel, 1,); + } Why do we need rel->top_parent_is_partitioned? If a relation is partitioned (if (rel->part_scheme), it's either the top parent or is partition of some other partitioned table. In either case this condition will be true. This would be needed to avoid unnecessarily applying build_childrel_tlist to child rels of a partitioned table for which we don't consider partitionwise join. Consider: postgres=# create table lpt (c1 int, c2 text) partition by list (c1); CREATE TABLE postgres=# create table lpt_p1 partition of lpt for values in (1); CREATE TABLE postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); CREATE TABLE postgres=# create table test (c1 int, c2 text); CREATE TABLE postgres=# explain verbose select * from (select * from lpt union all select * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); --- plan clipped In this example, the top parent is not a partitioned table and we don't need to consider partitionwise join for that, so we don't need to apply that to the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). FWIW, the test is not sufficient. In the above example, a simple select * from lpt inner join test where lpt.c1 = test.c1 would not use partition-wise join technique. Why not to avoid build_childrel_tlist() in that case as well? I might misunderstand your words, but in the above example the patch doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2. The reason for that is because we can avoid adjusting the tlists for the corresponding subplans at plan creation time so that whole-row Vars in the tlists are transformed into CREs. I think the overhead of the adjustment is not that big, but not zero, so it would be worth avoiding applying build_childrel_tlist to partitions if the top parent won't participate in a partitionwise-join at all. Worst we can change the criteria to use partition-wise join in future e.g. above case would use partition-wise join by 1. merging lpt_p1 into corresponding partition of lpt so that ss is partitioned and 2. repartitioning test or joining test with each partition of lpt separately. In those cases the changes you are doing here need to be reverted and put somewhere else. There's already a patch to reverse the order of join and grouping out there. How would this work with that. Interesting, but that seems like a matter of PG12+. In general I think set_append_rel_size() or build_simple_rel() is not the place where we should decide whether the given relation will participate in a PWJ. Also we should not selectively add a ConvertRowtypeExpr on top of a whole-row Var of some a child relation. We should do it for all the child rels or shouldn't do it at all. One thing I thought was to apply build_childrel_tlist for all the child rels whether its top parent is a partitioned table or not, but as I mentioned above, that would incur needless overhead for adjusting the tlists for the child rels that don't involve in a partitionwise join such as lpt_p1 and lpt_p2, which I think is not good. An in-between state will produce a hell lot of confusion for any further optimization. Whenever we change the code around partition-wise operations in future, we will have to check whether or not a given child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I have mentioned earlier, I am also not comfortable with the targetlist of child relations being type inconsistent with that of the parent, which is a fundamental rule in inheritance. Worst keep them inconsistent during path creation and make them consistent at the time of creating plans. A child's whole-row Var has datatype of the child where as that of parent has datatype of parent. I don't see any critical issue here. Could you elaborate a bit more on that point? A ConvertRowtypeExpr is used to fix this inconsistency. That's why I chose to use pull_var_clause() as a place to fix the problem and not fix ConvertRowtypeExpr in targetlist itself. I think the biggest issue in the original patch you proposed is that we spend extra cycles where partitioning is not involved, which is the biggest reason why I think the original patch isn't the right way to go. I am also not comfortable in just avoiding ConvertRowtypeExprs in targetlist and leave them as is in the conditions and ECs etc. This means searching a ConvertRowtypeExpr e.g. for creating pathkeys in targetlist will fail at the time of path creation but will succeed at the time of plan creation. This is turning more invasive that my approach of fixing it through PVC. Sorry, I don't understand this. Could you show me places where the problem happens? + /* No work if the
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/02 18:46), Etsuro Fujita wrote: (2018/06/22 23:58), Robert Haas wrote: And, in general, it seems to me that we want to produce the right outputs at the lowest possible level of the plan tree. For instance, suppose that one of the relations being scanned is not parallel-safe but the others are. Then, we could end up with a plan like this: Append -> Seq Scan on temp_rela -> Gather -> Parallel Seq Scan on thing1 -> Gather -> Parallel Seq Scan on thing2 If a projection is required to convert the row type expression, we certainly want that to get pushed below the Gather, not to happen at the Gather level itself. IIUC, we currently don't consider such a plan for partition-wise join; we'll only consider gathering partial paths for the parent appendrel. While updating the patch, I noticed that I'm wrong here. IIUC, apply_scanjoin_target_to_paths would allow us to consider such an Append for a partitioned relation when scanjoin_target_parallel_safe=false, as it generates new Append paths by recursively adjusting all partitions for which we call generate_gather_paths in that case as shown below: /* * If the scan/join target is not parallel-safe, partial paths cannot * generate it. */ if (!scanjoin_target_parallel_safe) { /* * Since we can't generate the final scan/join target, this is our * last opportunity to use any partial paths that exist. We don't do * this if the case where the target is parallel-safe, since we will * be able to generate superior paths by doing it after the final * scan/join target has been applied. * * Note that this may invalidate rel->cheapest_total_path, so we must * not rely on it after this point without first calling set_cheapest. */ generate_gather_paths(root, rel, false); /* Can't use parallel query above this level. */ rel->partial_pathlist = NIL; rel->consider_parallel = false; } I don't produce a test case where the plan is an Append with Gather subplans, but I'm not sure that it's a good thing to allow us to consider such a plan because in that plan, each Gather would try to grab its own pool of workers. Am I missing something? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita wrote: > (2018/06/22 22:54), Ashutosh Bapat wrote: >> >> I have started reviewing the patch. > > > Thanks for the review! > >> + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) >> + { >> + build_childrel_tlist(root, rel, childrel, 1,); >> + } >> >> Why do we need rel->top_parent_is_partitioned? If a relation is >> partitioned (if (rel->part_scheme), it's either the top parent or is >> partition of some other partitioned table. In either case this >> condition will be true. > > > This would be needed to avoid unnecessarily applying build_childrel_tlist to > child rels of a partitioned table for which we don't consider partitionwise > join. Consider: > > postgres=# create table lpt (c1 int, c2 text) partition by list (c1); > CREATE TABLE > postgres=# create table lpt_p1 partition of lpt for values in (1); > CREATE TABLE > postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); > CREATE TABLE > postgres=# create table test (c1 int, c2 text); > CREATE TABLE > postgres=# explain verbose select * from (select * from lpt union all select > * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); --- plan clipped > > In this example, the top parent is not a partitioned table and we don't need > to consider partitionwise join for that, so we don't need to apply that to > the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). FWIW, the test is not sufficient. In the above example, a simple select * from lpt inner join test where lpt.c1 = test.c1 would not use partition-wise join technique. Why not to avoid build_childrel_tlist() in that case as well? Worst we can change the criteria to use partition-wise join in future e.g. above case would use partition-wise join by 1. merging lpt_p1 into corresponding partition of lpt so that ss is partitioned and 2. repartitioning test or joining test with each partition of lpt separately. In those cases the changes you are doing here need to be reverted and put somewhere else. There's already a patch to reverse the order of join and grouping out there. How would this work with that. In general I think set_append_rel_size() or build_simple_rel() is not the place where we should decide whether the given relation will participate in a PWJ. Also we should not selectively add a ConvertRowtypeExpr on top of a whole-row Var of some a child relation. We should do it for all the child rels or shouldn't do it at all. An in-between state will produce a hell lot of confusion for any further optimization. Whenever we change the code around partition-wise operations in future, we will have to check whether or not a given child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I have mentioned earlier, I am also not comfortable with the targetlist of child relations being type inconsistent with that of the parent, which is a fundamental rule in inheritance. Worst keep them inconsistent during path creation and make them consistent at the time of creating plans. A child's whole-row Var has datatype of the child where as that of parent has datatype of parent. A ConvertRowtypeExpr is used to fix this inconsistency. That's why I chose to use pull_var_clause() as a place to fix the problem and not fix ConvertRowtypeExpr in targetlist itself. I am also not comfortable in just avoiding ConvertRowtypeExprs in targetlist and leave them as is in the conditions and ECs etc. This means searching a ConvertRowtypeExpr e.g. for creating pathkeys in targetlist will fail at the time of path creation but will succeed at the time of plan creation. This is turning more invasive that my approach of fixing it through PVC. > >> + /* No work if the child plan is an Append or MergeAppend */ >> + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) >> + return; >> >> Why? Probably it's related to the answer to the first question, But I >> don't see the connection. Given that partition-wise join will be >> applicable level by level, we need to recurse in >> adjust_subplan_tlist(). > > > I don't think so; I think if the child plan is an Append or MergeAppend, the > tlist for each subplan of the Append or MergeAppend would have already been > adjusted while create_plan_recurse before we are called here. Ok. Thanks for the clarification. I think we should add a comment. > >> + /* The child plan should be able to do projection */ >> + Assert(is_projection_capable_plan(subplan)); >> + >> Again why? A MergeAppend's subplan could be a Sort plan, which will >> not be projection capable. > > > IIUC, since we are called here right after create_plan_recurse, the child > plan would be a scan or join unless it's neither Append nor MergeAppend. So > it should be projection-capable. Maybe I'm missing something, though. That's not true. add_paths_to_append_rel() adds sort paths on scan if necessary and creates merge append path. -- Best Wishes, Ashutosh Bapat
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/07/04 1:35), Andres Freund wrote: On 2018-06-22 10:58:28 -0400, Robert Haas wrote: On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita wrote: Here is a patch for that. I think this approach is going to run into trouble if the level at which we have to apply the ConvertRowTypeExpr happens not to be a projection-capable node. What's the plan forward here? I still think that this approach would be the right way to go, so I plan to polish the patch. This has been an open item for quite a while. Robert, are you in agreement with this approach on a high level? Who is going to drive this forward? I'm willing to do that if it's OK, but I'd like to get more feedback from Robert, Ashutosh or anyone else. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Hi Ashutosh, Etsuro, Robert, On 2018-06-22 10:58:28 -0400, Robert Haas wrote: > On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita > wrote: > > Here is a patch for that. > > > > * As I said upthread, the patch makes code much more simple; I removed all > > the changes to setrefs.c added by the partitionwise-join patch. I also > > simplified the logic for building a tlist for a child-join rel. The original > > PWJ computes attr_needed data even for child rels, and build the tlist for a > > child-join by passing to build_joinrel_tlist that data for input child rels > > for the child-join. But I think that's redundant, and it's more > > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist > > to get the child-join's tlist. So, I changed that way, which made > > unnecessary all the changes to build_joinrel_tlist and placeholder.c added > > by the PWJ patch, so I removed those as well. > > > > * The patch contains all of the regression tests in the original patch > > proposed by Ashutosh. > > I think this approach is going to run into trouble if the level at > which we have to apply the ConvertRowTypeExpr happens not to be a > projection-capable node. What's the plan forward here? This has been an open item for quite a while. Robert, are you in agreement with this approach on a high level? Who is going to drive this forward? Greetings, Andres Freund
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/06/22 23:58), Robert Haas wrote: I think this approach is going to run into trouble if the level at which we have to apply the ConvertRowTypeExpr happens not to be a projection-capable node. Actually, the level we have to do that would be a child rel of a partitioned table or a child join of a partitionwise join, so the plan node would be a scan or join plan unless the child rel or child join is itself partitioned, in which case the plan node would be Append/MergeAppend and the proposed patch recursively would apply that conversion to child plans for the Append/MergeAppend). And, in general, it seems to me that we want to produce the right outputs at the lowest possible level of the plan tree. For instance, suppose that one of the relations being scanned is not parallel-safe but the others are. Then, we could end up with a plan like this: Append -> Seq Scan on temp_rela -> Gather -> Parallel Seq Scan on thing1 -> Gather -> Parallel Seq Scan on thing2 If a projection is required to convert the row type expression, we certainly want that to get pushed below the Gather, not to happen at the Gather level itself. IIUC, we currently don't consider such a plan for partition-wise join; we'll only consider gathering partial paths for the parent appendrel. So, I'm not sure we need to take into account that when applying the ConvertRowtypeExpr. Maybe I'm missing something, though. We certainly don't want it to happen at the Append level, which can't even handle it. I think so too. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/06/22 22:54), Ashutosh Bapat wrote: I have started reviewing the patch. Thanks for the review! + if (enable_partitionwise_join&& rel->top_parent_is_partitioned) + { + build_childrel_tlist(root, rel, childrel, 1,); + } Why do we need rel->top_parent_is_partitioned? If a relation is partitioned (if (rel->part_scheme), it's either the top parent or is partition of some other partitioned table. In either case this condition will be true. This would be needed to avoid unnecessarily applying build_childrel_tlist to child rels of a partitioned table for which we don't consider partitionwise join. Consider: postgres=# create table lpt (c1 int, c2 text) partition by list (c1); CREATE TABLE postgres=# create table lpt_p1 partition of lpt for values in (1); CREATE TABLE postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text); CREATE TABLE postgres=# create table test (c1 int, c2 text); CREATE TABLE postgres=# explain verbose select * from (select * from lpt union all select * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1); QUERY PLAN Merge Join (cost=289.92..538.20 rows=16129 width=72) Output: lpt_p1.c1, lpt_p1.c2, test.c1, test.c2 Merge Cond: (test.c1 = lpt_p1.c1) -> Sort (cost=88.17..91.35 rows=1270 width=36) Output: test.c1, test.c2 Sort Key: test.c1 -> Seq Scan on public.test (cost=0.00..22.70 rows=1270 width=36) Output: test.c1, test.c2 -> Sort (cost=201.74..208.09 rows=2540 width=36) Output: lpt_p1.c1, lpt_p1.c2 Sort Key: lpt_p1.c1 -> Append (cost=0.00..58.10 rows=2540 width=36) -> Seq Scan on public.lpt_p1 (cost=0.00..22.70 rows=1270 width= 36) Output: lpt_p1.c1, lpt_p1.c2 -> Seq Scan on public.lpt_p2 (cost=0.00..22.70 rows=1270 width= 36) Output: lpt_p2.c1, lpt_p2.c2 (16 rows) In this example, the top parent is not a partitioned table and we don't need to consider partitionwise join for that, so we don't need to apply that to the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2). + /* No work if the child plan is an Append or MergeAppend */ + if (IsA(subplan, Append) || IsA(subplan, MergeAppend)) + return; Why? Probably it's related to the answer to the first question, But I don't see the connection. Given that partition-wise join will be applicable level by level, we need to recurse in adjust_subplan_tlist(). I don't think so; I think if the child plan is an Append or MergeAppend, the tlist for each subplan of the Append or MergeAppend would have already been adjusted while create_plan_recurse before we are called here. + /* The child plan should be able to do projection */ + Assert(is_projection_capable_plan(subplan)); + Again why? A MergeAppend's subplan could be a Sort plan, which will not be projection capable. IIUC, since we are called here right after create_plan_recurse, the child plan would be a scan or join unless it's neither Append nor MergeAppend. So it should be projection-capable. Maybe I'm missing something, though. This is not a full review. I will continue reviewing it further. Thanks again. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita wrote: > Here is a patch for that. > > * As I said upthread, the patch makes code much more simple; I removed all > the changes to setrefs.c added by the partitionwise-join patch. I also > simplified the logic for building a tlist for a child-join rel. The original > PWJ computes attr_needed data even for child rels, and build the tlist for a > child-join by passing to build_joinrel_tlist that data for input child rels > for the child-join. But I think that's redundant, and it's more > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist > to get the child-join's tlist. So, I changed that way, which made > unnecessary all the changes to build_joinrel_tlist and placeholder.c added > by the PWJ patch, so I removed those as well. > > * The patch contains all of the regression tests in the original patch > proposed by Ashutosh. I think this approach is going to run into trouble if the level at which we have to apply the ConvertRowTypeExpr happens not to be a projection-capable node. And, in general, it seems to me that we want to produce the right outputs at the lowest possible level of the plan tree. For instance, suppose that one of the relations being scanned is not parallel-safe but the others are. Then, we could end up with a plan like this: Append -> Seq Scan on temp_rela -> Gather -> Parallel Seq Scan on thing1 -> Gather -> Parallel Seq Scan on thing2 If a projection is required to convert the row type expression, we certainly want that to get pushed below the Gather, not to happen at the Gather level itself. We certainly don't want it to happen at the Append level, which can't even handle it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/06/15 20:56), Etsuro Fujita wrote: Actually, I've created a patch implementing that proposal. But I think that patch needs more work, so I'm planning to post it next week. Here is a patch for that. * As I said upthread, the patch makes code much more simple; I removed all the changes to setrefs.c added by the partitionwise-join patch. I also simplified the logic for building a tlist for a child-join rel. The original PWJ computes attr_needed data even for child rels, and build the tlist for a child-join by passing to build_joinrel_tlist that data for input child rels for the child-join. But I think that's redundant, and it's more straightforward to apply adjust_appendrel_attrs to the parent-join's tlist to get the child-join's tlist. So, I changed that way, which made unnecessary all the changes to build_joinrel_tlist and placeholder.c added by the PWJ patch, so I removed those as well. * The patch contains all of the regression tests in the original patch proposed by Ashutosh. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 8215,8222 ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false'); ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; --- 8215,8223 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false'); INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; ! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int) SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true'); + ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250); CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true'); ANALYZE fprt2; *** *** 8269,8294 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !QUERY PLAN ! - Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) -> Append -> Foreign Scan !Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) -> Foreign Scan !Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) (7 rows) ! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2; !t1 | t2 + (0,0,) | (0,0,) (150,150,0003) | (150,150,0003) (250,250,0005) | (250,250,0005) (400,400,0008) | (400,400,0008) ! (4 rows) -- join with lateral reference EXPLAIN (COSTS OFF) --- 8270,8305 -- with whole-row reference EXPLAIN (COSTS OFF) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !QUERY PLAN ! Sort Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2) -> Append -> Foreign Scan !Relations: (public.ftprt1_p1 t1) FULL JOIN (public.ftprt2_p1 t2) -> Foreign Scan !Relations: (public.ftprt1_p2 t1) FULL JOIN (public.ftprt2_p2 t2) (7 rows) ! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2; !wr | wr + (0,0,) | (0,0,) + (50,50,0001) | + (100,100,0002) | (150,150,0003) | (150,150,0003) + (200,200,0004) | (250,250,0005) | (250,250,0005) + (300,300,0006) | + (350,350,0007) | (400,400,0008) | (400,400,0008) ! (450,450,0009) | ! | (75,75,0001) ! | (225,225,0004) ! | (325,325,0006) !
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/06/14 22:37), Ashutosh Bapat wrote: On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita wrote: (2018/06/07 19:42), Ashutosh Bapat wrote: On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita wrote: Yeah, but I mean we modify set_append_rel_size so that we only map a parent wholerow Var in the parent tlist to a child wholerow Var in the child's tlist; parent wholerow Vars in the parent's other expressions such as conditions are transformed into CREs as-is. What happens to a PlaceHolderVar containing whole-row reference when that appears in a condition and/or targetlist. Whole-row Vars contained in such PHV's expressions will be mapped into ConvertRowtypeExprs with adjust_appendrel_attrs. I don't think the tlists for the children need to have their wholerows transformed into the corresponding ConvertRowtypeExprs *at this point*, so what I'd like to propose is to 1) map a parent wholerow Var simply to a child wholerow Var, instead (otherwise, the same as adjust_appendrel_attrs), when building the reltarget for a child in set_append_rel_size, 2) build the tlists for child joins using those children's wholerow Vars at path creation time, and 3) adjust those wholerow Vars in the tlists for subpaths in the chosen AppendPath so that those are transformed into the corresponding ConvertRowtypeExprs, at plan creation time (ie, in create_append_plan/create_merge_append_plan). IIUC, this would not require any changes to pull_var_clause as proposed by the patch. This would not require any changes to postgres_fdw as proposed by the patch, either. In addition, this would make unnecessary the code added to setrefs.c by the partitionwise-join patch. Maybe I'm missing something though. Not translating whole-row expressions to ConvertRowtypeExpr before creating paths can lead to a number of anomalies. For example, 1. if there's an index on the whole-row expression of child, translating parent's whole-row expression to child's whole-row expression as is will lead to using that index, when in reality the it can not be used since the condition/ORDER BY clause (which originally referred the parent's whole-row expression) require the child's whole-row reassembled as parent's whole-row. 2. Constraints on child'd whole-row expression, will be used to prune a child when they can not be used since the original condition was on parent' whole-row expression and not that of the child. 3. Equivalence classes will think that a child whole-row expression (without ConvertRowtypeExpr) is equivalent to an expression which is part of the same equivalence class as the parent' whole-row expression. Is that still possible when we only map a parent wholerow Var in the parent's tlist to a child wholerow Var in the child's tlist? Yes. 1 will affect the choice of index-only scans. We use pathkey comparisons at a number of places so tlist going out of sync with equivalence classes can affect a number of places. Sorry, I'm still not sure if #1 is really a problem. Could you show me an example for that? build_tlist_to_deparse() is used to deparse targetlist at the time of creating paths,as well as during the planning. According to your proposal we will build different tlists during path creation and plan creation. That doesn't look good. Maybe my explanation was not enough, but my proposal will guarantee that the FDW can build the same tlist at path creation time and plan creation time because the RelOptInfo's reltarget from which the tlist is created doesn't change at all. Apart from that your proposal of changing a child's targetlist at the time of plan creation to use CRE doesn't help since the original problem as described in [1] happens at the time of creating plans as described in [1]. I think my proposal will address the original issue. Actually, I've created a patch implementing that proposal. That's still WIP, but it works well even for these cases (and makes code much more simple, as mentioned above!) But I think that patch needs more work, so I'm planning to post it next week. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita wrote: > (2018/06/07 19:42), Ashutosh Bapat wrote: >> >> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita >> wrote: >>> >>> Since I'm not 100% sure that that is the right way to go, I've been >>> rethinking how to fix this issue. Yet another idea I came up with to fix >>> this is to redesign the handling of the tlists for children in the >>> partitioning case. Currently, we build the reltarget for a child by >>> applying adjust_appendrel_attrs to the reltarget for its parent in >>> set_append_rel_size, which maps a wholerow Var referring to the parent >>> rel >>> to a ConvertRowtypeExpr that translates a wholerow of the child rel into >>> a >>> wholerow of the parent rel's rowtype. This works well for the >>> non-partitioned inheritance case, but makes complicated the code for >>> handling the partitioning case especially when planning >>> partitionwise-joins. >>> And I think this would be the root cause of this issue. >> >> >> Although true, this is not only about targetlist. Even the whole-row >> expressions in the conditions, equivalence classes and other >> planner/optimizer data structures are translated to >> ConvertRowtypeExpr. > > > Yeah, but I mean we modify set_append_rel_size so that we only map a parent > wholerow Var in the parent tlist to a child wholerow Var in the child's > tlist; parent wholerow Vars in the parent's other expressions such as > conditions are transformed into CREs as-is. What happens to a PlaceHolderVar containing whole-row reference when that appears in a condition and/or targetlist. > > >>> I don't think the >>> tlists for the children need to have their wholerows transformed into the >>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to >>> propose is to 1) map a parent wholerow Var simply to a child wholerow >>> Var, >>> instead (otherwise, the same as adjust_appendrel_attrs), when building >>> the >>> reltarget for a child in set_append_rel_size, 2) build the tlists for >>> child >>> joins using those children's wholerow Vars at path creation time, and 3) >>> adjust those wholerow Vars in the tlists for subpaths in the chosen >>> AppendPath so that those are transformed into the corresponding >>> ConvertRowtypeExprs, at plan creation time (ie, in >>> create_append_plan/create_merge_append_plan). IIUC, this would not >>> require >>> any changes to pull_var_clause as proposed by the patch. This would not >>> require any changes to postgres_fdw as proposed by the patch, either. In >>> addition, this would make unnecessary the code added to setrefs.c by the >>> partitionwise-join patch. Maybe I'm missing something though. >> >> >> Not translating whole-row expressions to ConvertRowtypeExpr before >> creating paths can lead to a number of anomalies. For example, >> >> 1. if there's an index on the whole-row expression of child, >> translating parent's whole-row expression to child's whole-row >> expression as is will lead to using that index, when in reality the it >> can not be used since the condition/ORDER BY clause (which originally >> referred the parent's whole-row expression) require the child's >> whole-row reassembled as parent's whole-row. >> 2. Constraints on child'd whole-row expression, will be used to prune >> a child when they can not be used since the original condition was on >> parent' whole-row expression and not that of the child. >> 3. Equivalence classes will think that a child whole-row expression >> (without ConvertRowtypeExpr) is equivalent to an expression which is >> part of the same equivalence class as the parent' whole-row >> expression. > > > Is that still possible when we only map a parent wholerow Var in the > parent's tlist to a child wholerow Var in the child's tlist? Yes. 1 will affect the choice of index-only scans. We use pathkey comparisons at a number of places so tlist going out of sync with equivalence classes can affect a number of places. build_tlist_to_deparse() is used to deparse targetlist at the time of creating paths,as well as during the planning. According to your proposal we will build different tlists during path creation and plan creation. That doesn't look good. Apart from that your proposal of changing a child's targetlist at the time of plan creation to use CRE doesn't help since the original problem as described in [1] happens at the time of creating plans as described in [1]. The parent's whole-row reference has a different data type than child's whole-row reference. If we do not cover child's whole-row reference by CRE, the parent and child targetlists will go out of sync as far as the type of individual columns are concerned. That too doesn't look good to me. [1] https://www.postgresql.org/message-id/CAFjFpRfD5=lsoqg1euf-vmpm9koezbbgjo68d9wkubjxop_...@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/06/07 19:42), Ashutosh Bapat wrote: On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita wrote: Since I'm not 100% sure that that is the right way to go, I've been rethinking how to fix this issue. Yet another idea I came up with to fix this is to redesign the handling of the tlists for children in the partitioning case. Currently, we build the reltarget for a child by applying adjust_appendrel_attrs to the reltarget for its parent in set_append_rel_size, which maps a wholerow Var referring to the parent rel to a ConvertRowtypeExpr that translates a wholerow of the child rel into a wholerow of the parent rel's rowtype. This works well for the non-partitioned inheritance case, but makes complicated the code for handling the partitioning case especially when planning partitionwise-joins. And I think this would be the root cause of this issue. Although true, this is not only about targetlist. Even the whole-row expressions in the conditions, equivalence classes and other planner/optimizer data structures are translated to ConvertRowtypeExpr. Yeah, but I mean we modify set_append_rel_size so that we only map a parent wholerow Var in the parent tlist to a child wholerow Var in the child's tlist; parent wholerow Vars in the parent's other expressions such as conditions are transformed into CREs as-is. I don't think the tlists for the children need to have their wholerows transformed into the corresponding ConvertRowtypeExprs *at this point*, so what I'd like to propose is to 1) map a parent wholerow Var simply to a child wholerow Var, instead (otherwise, the same as adjust_appendrel_attrs), when building the reltarget for a child in set_append_rel_size, 2) build the tlists for child joins using those children's wholerow Vars at path creation time, and 3) adjust those wholerow Vars in the tlists for subpaths in the chosen AppendPath so that those are transformed into the corresponding ConvertRowtypeExprs, at plan creation time (ie, in create_append_plan/create_merge_append_plan). IIUC, this would not require any changes to pull_var_clause as proposed by the patch. This would not require any changes to postgres_fdw as proposed by the patch, either. In addition, this would make unnecessary the code added to setrefs.c by the partitionwise-join patch. Maybe I'm missing something though. Not translating whole-row expressions to ConvertRowtypeExpr before creating paths can lead to a number of anomalies. For example, 1. if there's an index on the whole-row expression of child, translating parent's whole-row expression to child's whole-row expression as is will lead to using that index, when in reality the it can not be used since the condition/ORDER BY clause (which originally referred the parent's whole-row expression) require the child's whole-row reassembled as parent's whole-row. 2. Constraints on child'd whole-row expression, will be used to prune a child when they can not be used since the original condition was on parent' whole-row expression and not that of the child. 3. Equivalence classes will think that a child whole-row expression (without ConvertRowtypeExpr) is equivalent to an expression which is part of the same equivalence class as the parent' whole-row expression. Is that still possible when we only map a parent wholerow Var in the parent's tlist to a child wholerow Var in the child's tlist? Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita wrote: > (2018/05/18 16:33), Etsuro Fujita wrote: >> >> Other than pull_var_clause things, >> the updated version looks good to me, so I'll mark this as Ready for >> Committer. > > > Since I'm not 100% sure that that is the right way to go, I've been > rethinking how to fix this issue. Yet another idea I came up with to fix > this is to redesign the handling of the tlists for children in the > partitioning case. Currently, we build the reltarget for a child by > applying adjust_appendrel_attrs to the reltarget for its parent in > set_append_rel_size, which maps a wholerow Var referring to the parent rel > to a ConvertRowtypeExpr that translates a wholerow of the child rel into a > wholerow of the parent rel's rowtype. This works well for the > non-partitioned inheritance case, but makes complicated the code for > handling the partitioning case especially when planning partitionwise-joins. > And I think this would be the root cause of this issue. Although true, this is not only about targetlist. Even the whole-row expressions in the conditions, equivalence classes and other planner/optimizer data structures are translated to ConvertRowtypeExpr. > I don't think the > tlists for the children need to have their wholerows transformed into the > corresponding ConvertRowtypeExprs *at this point*, so what I'd like to > propose is to 1) map a parent wholerow Var simply to a child wholerow Var, > instead (otherwise, the same as adjust_appendrel_attrs), when building the > reltarget for a child in set_append_rel_size, 2) build the tlists for child > joins using those children's wholerow Vars at path creation time, and 3) > adjust those wholerow Vars in the tlists for subpaths in the chosen > AppendPath so that those are transformed into the corresponding > ConvertRowtypeExprs, at plan creation time (ie, in > create_append_plan/create_merge_append_plan). IIUC, this would not require > any changes to pull_var_clause as proposed by the patch. This would not > require any changes to postgres_fdw as proposed by the patch, either. In > addition, this would make unnecessary the code added to setrefs.c by the > partitionwise-join patch. Maybe I'm missing something though. Not translating whole-row expressions to ConvertRowtypeExpr before creating paths can lead to a number of anomalies. For example, 1. if there's an index on the whole-row expression of child, translating parent's whole-row expression to child's whole-row expression as is will lead to using that index, when in reality the it can not be used since the condition/ORDER BY clause (which originally referred the parent's whole-row expression) require the child's whole-row reassembled as parent's whole-row. 2. Constraints on child'd whole-row expression, will be used to prune a child when they can not be used since the original condition was on parent' whole-row expression and not that of the child. 3. Equivalence classes will think that a child whole-row expression (without ConvertRowtypeExpr) is equivalent to an expression which is part of the same equivalence class as the parent' whole-row expression. Given these serious problems, I don't think we could afford not to cover a child whole-row reference in ConvertRowtypeExpr. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/05/18 16:33), Etsuro Fujita wrote: Other than pull_var_clause things, the updated version looks good to me, so I'll mark this as Ready for Committer. Since I'm not 100% sure that that is the right way to go, I've been rethinking how to fix this issue. Yet another idea I came up with to fix this is to redesign the handling of the tlists for children in the partitioning case. Currently, we build the reltarget for a child by applying adjust_appendrel_attrs to the reltarget for its parent in set_append_rel_size, which maps a wholerow Var referring to the parent rel to a ConvertRowtypeExpr that translates a wholerow of the child rel into a wholerow of the parent rel's rowtype. This works well for the non-partitioned inheritance case, but makes complicated the code for handling the partitioning case especially when planning partitionwise-joins. And I think this would be the root cause of this issue. I don't think the tlists for the children need to have their wholerows transformed into the corresponding ConvertRowtypeExprs *at this point*, so what I'd like to propose is to 1) map a parent wholerow Var simply to a child wholerow Var, instead (otherwise, the same as adjust_appendrel_attrs), when building the reltarget for a child in set_append_rel_size, 2) build the tlists for child joins using those children's wholerow Vars at path creation time, and 3) adjust those wholerow Vars in the tlists for subpaths in the chosen AppendPath so that those are transformed into the corresponding ConvertRowtypeExprs, at plan creation time (ie, in create_append_plan/create_merge_append_plan). IIUC, this would not require any changes to pull_var_clause as proposed by the patch. This would not require any changes to postgres_fdw as proposed by the patch, either. In addition, this would make unnecessary the code added to setrefs.c by the partitionwise-join patch. Maybe I'm missing something though. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/05/17 23:19), Ashutosh Bapat wrote: On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujitawrote: (2018/05/16 22:49), Ashutosh Bapat wrote: On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita wrote: However, considering that pull_var_clause is used in many places where partitioning is *not* involved, I still think it's better to avoid spending extra cycles needed only for partitioning in that function. Right. The changes done in inheritance_planner() imply that we can encounter a ConvertRowtypeExpr even in the expressions for a relation which is not a child relation in the translated query. It was a child in the original query but after translating it becomes a parent and has ConvertRowtypeExpr somewhere there. Sorry, I don't understand this. Could you show me such an example? Even without inheritance we can induce a ConvertRowtypeExpr on a base relation which is not "other" relation create table parent (a int, b int); create table child () inherits(parent); alter table child add constraint whole_par_const check ((child::parent).a = 1); There is no way to tell by looking at the parsed query whether pull_var_clause() from StoreRelCheck() will encounter a ConvertRowtypeExpr or not. We could check whether the tables involved are inherited or not, but that's more expensive than is_converted_whole_row_reference(). Yeah, ISTM that the inheritance test makes things worse. So, there is no way to know whether a given expression will have ConvertRowtypeExpr in it. That's my understanding. But if we can device such a test, I am happy to apply that test before or witin the call to pull_var_clause(). We don't need that expensive test if user has passed PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check the shape of expression tree. It would cause more asymmetry in pull_var_clause(), but we can live with it or change the order of tests for all the three options. I will differ that to a committer. Sounds more invasive. Considering where we are in the development cycle for PG11, I think it'd be better to address this by something like the approach I proposed. Anyway, +1 for asking the committer's opinion. Thanks. Let's ask that. - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: +extern bool +is_converted_whole_row_reference(Node *node) I think we should remove "extern" from the definition. Done. - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: I think we can fix this by adding another flag to indicate whether we deparsed the first live column of the relation, as in the "first" bool flag in deparseTargetList. Thanks. Fixed. Thanks for updating the patch set! Other than pull_var_clause things, the updated version looks good to me, so I'll mark this as Ready for Committer. As I said before, I think this issue should be addressed in advance of the PG11 release. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujitawrote: > (2018/05/16 22:49), Ashutosh Bapat wrote: >> >> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita >> wrote: >>> >>> However, considering that >>> pull_var_clause is used in many places where partitioning is *not* >>> involved, >>> I still think it's better to avoid spending extra cycles needed only for >>> partitioning in that function. >> >> >> Right. The changes done in inheritance_planner() imply that we can >> encounter a ConvertRowtypeExpr even in the expressions for a relation >> which is not a child relation in the translated query. It was a child >> in the original query but after translating it becomes a parent and >> has ConvertRowtypeExpr somewhere there. > > > Sorry, I don't understand this. Could you show me such an example? Even without inheritance we can induce a ConvertRowtypeExpr on a base relation which is not "other" relation create table parent (a int, b int); create table child () inherits(parent); alter table child add constraint whole_par_const check ((child::parent).a = 1); There is no way to tell by looking at the parsed query whether pull_var_clause() from StoreRelCheck() will encounter a ConvertRowtypeExpr or not. We could check whether the tables involved are inherited or not, but that's more expensive than is_converted_whole_row_reference(). > >> So, there is no way to know >> whether a given expression will have ConvertRowtypeExpr in it. That's >> my understanding. But if we can device such a test, I am happy to >> apply that test before or witin the call to pull_var_clause(). >> >> We don't need that expensive test if user has passed >> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check >> the shape of expression tree. It would cause more asymmetry in >> pull_var_clause(), but we can live with it or change the order of >> tests for all the three options. I will differ that to a committer. > > > Sounds more invasive. Considering where we are in the development cycle for > PG11, I think it'd be better to address this by something like the approach > I proposed. Anyway, +1 for asking the committer's opinion. Thanks. > > - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: > > +extern bool > +is_converted_whole_row_reference(Node *node) > > I think we should remove "extern" from the definition. Done. > > - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: > > > I think we can fix this by adding another flag to indicate whether we > deparsed the first live column of the relation, as in the "first" bool flag > in deparseTargetList. Thanks. Fixed. > > One more thing to add is: the patch adds support for deparsing a > ConvertRowtypeExpr that translates a wholerow of a childrel into a wholerow > of its parentrel's rowtype, so by modifying is_foreign_expr accordingly, we > could push down such CREs in JOIN ON conditions to the remote for example. > But that would be an improvement, not a fix, so I think we should leave that > for another patch for PG12. Right. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company expr_ref_error_pwj_v9.tar.gz Description: GNU Zip compressed data
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/05/16 22:49), Ashutosh Bapat wrote: On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujitawrote: However, considering that pull_var_clause is used in many places where partitioning is *not* involved, I still think it's better to avoid spending extra cycles needed only for partitioning in that function. Right. The changes done in inheritance_planner() imply that we can encounter a ConvertRowtypeExpr even in the expressions for a relation which is not a child relation in the translated query. It was a child in the original query but after translating it becomes a parent and has ConvertRowtypeExpr somewhere there. Sorry, I don't understand this. Could you show me such an example? So, there is no way to know whether a given expression will have ConvertRowtypeExpr in it. That's my understanding. But if we can device such a test, I am happy to apply that test before or witin the call to pull_var_clause(). We don't need that expensive test if user has passed PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check the shape of expression tree. It would cause more asymmetry in pull_var_clause(), but we can live with it or change the order of tests for all the three options. I will differ that to a committer. Sounds more invasive. Considering where we are in the development cycle for PG11, I think it'd be better to address this by something like the approach I proposed. Anyway, +1 for asking the committer's opinion. + /* Construct the conversion map. */ + parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0); I think it'd be better to pass -1, not 0, as the typmod argument for that function because that would be consistent with other places where we use that function with named rowtypes (eg, ruleutils.c). Done. Thanks! -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno) +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel, + int *relno, int *colno) -1 for that change; because 1) we use "var" for implying many things as in eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the name as-is. I think pull_var_clause is the only place where we do that and I don't like that very much. I also don't like is_subquery_var() name since it's too specific. We might want that kind of functionality to ship generic expressions and not just Vars in future. Usually we would be searching for columns in the subquery targetlist so the name change looks good to me. IIRC, the function was added one release back, so backpatching pain, if any, won't be much. But I don't think we should carry a misnomer for many releases to come. Let's differ this to a committer. OK Here's set of updated patches. Thanks for updating the patch set! I noticed followings. Sorry for not having noticed that before. - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch: +extern bool +is_converted_whole_row_reference(Node *node) I think we should remove "extern" from the definition. - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: + /* Construct ROW expression according to the conversion map. */ + appendStringInfoString(buf, "ROW("); + for (cnt = 0; cnt < parent_desc->natts; cnt++) + { + /* Ignore dropped columns. */ + if (attrMap[cnt] == 0) + continue; + + if (cnt > 0) + appendStringInfoString(buf, ", "); + deparseColumnRef(buf, child_var->varno, attrMap[cnt], +planner_rt_fetch(child_var->varno, root), +qualify_col); + } + appendStringInfoChar(buf, ')'); The result would start with ", " in the case where the cnt=0 column of the parent relation is a dropped one. Here is an example: postgres=# create table pt1 (a int, b int) partition by range (b); CREATE TABLE postgres=# alter table pt1 drop column a; ALTER TABLE postgres=# alter table pt1 add column a int; ALTER TABLE postgres=# create table loct11 (like pt1); CREATE TABLE postgres=# create foreign table pt1p1 partition of pt1 for values from (0) to (100) server loopback options (table_name 'loct11', use_remote_estimate 'true'); CREATE FOREIGN TABLE postgres=# insert into pt1 select i, i from generate_series(0, 99, 2) i; INSERT 0 50 postgres=# analyze pt1; ANALYZE postgres=# analyze pt1p1; ANALYZE postgres=# create table pt2 (a int, b int) partition by range (b); CREATE TABLE postgres=# create table loct21 (like pt2); CREATE TABLE postgres=# create foreign table pt2p1 partition of pt2 for values from (0) to (100) server loopback options (table_name 'loct21', use_remote_estimate 'true'); CREATE FOREIGN TABLE postgres=# insert into pt2 select i, i from generate_series(0, 99, 3) i; INSERT 0 34 postgres=# analyze pt2; ANALYZE postgres=# analyze pt2p1; ANALYZE postgres=# set enable_partitionwise_join to true; SET postgres=# explain verbose select pt1.* from pt1, pt2 where pt1.b = pt2.b for update
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujitawrote: > So we could really minimize the code change and avoid the additional overhead caused by the is_converted_whole_row_reference test added to pull_var_clause. I think the latter would be rather important, because PWJ is disabled by default. What do you think about that approach? >>> >>> >>> Partition-wise join and partition-wise aggregation are disabled by >>> default temporarily. We will change it in near future once the memory >>> consumption issue has been tackled. The features are useful and users >>> would want those to be enabled by default like other query >>> optimizations. So, we can't take a decision based on that. > > > Yeah, I really agree on that point! However, considering that > pull_var_clause is used in many places where partitioning is *not* involved, > I still think it's better to avoid spending extra cycles needed only for > partitioning in that function. Right. The changes done in inheritance_planner() imply that we can encounter a ConvertRowtypeExpr even in the expressions for a relation which is not a child relation in the translated query. It was a child in the original query but after translating it becomes a parent and has ConvertRowtypeExpr somewhere there. So, there is no way to know whether a given expression will have ConvertRowtypeExpr in it. That's my understanding. But if we can device such a test, I am happy to apply that test before or witin the call to pull_var_clause(). We don't need that expensive test if user has passed PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check the shape of expression tree. It would cause more asymmetry in pull_var_clause(), but we can live with it or change the order of tests for all the three options. I will differ that to a committer. > + /* Construct the conversion map. */ > + parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0); > > I think it'd be better to pass -1, not 0, as the typmod argument for that > function because that would be consistent with other places where we use > that function with named rowtypes (eg, ruleutils.c). Done. > > -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno) > +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel, > + int *relno, int *colno) > > -1 for that change; because 1) we use "var" for implying many things as in > eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the > name as-is. I think pull_var_clause is the only place where we do that and I don't like that very much. I also don't like is_subquery_var() name since it's too specific. We might want that kind of functionality to ship generic expressions and not just Vars in future. Usually we would be searching for columns in the subquery targetlist so the name change looks good to me. IIRC, the function was added one release back, so backpatching pain, if any, won't be much. But I don't think we should carry a misnomer for many releases to come. Let's differ this to a committer. Here's set of updated patches. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company expr_ref_error_pwj_v8.tar.gz Description: GNU Zip compressed data
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/05/14 15:34), Ashutosh Bapat wrote: On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapatwrote: On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita wrote: Here's the query. explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2; Thanks! Yet yet another case where pull_var_clause produces an error: postgres=# create table list_pt (a int, b text) partition by list (a); CREATE TABLE postgres=# create table list_ptp1 partition of list_pt for values in (1); CREATE TABLE postgres=# alter table list_ptp1 add constraint list_ptp1_check check (list_ptp1::list_pt::text != NULL); ERROR: ConvertRowtypeExpr found where not expected The patch kept the flags passed to pull_var_clause in src/backend/catalog/heap.c as-is, but I think we need to modify that accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger as well. With all the support that we have added in partitioning for v11, I think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places, which were left unchanged in the earlier versions of patches, which were written when all that support wasn't in v11. Actually, we'd get the same error without using anything in partitioning. Here is an example: postgres=# create table parent (a int, b text); CREATE TABLE postgres=# create table child (a int, b text); CREATE TABLE postgres=# alter table child inherit parent ; ALTER TABLE postgres=# alter table child add constraint child_check check (child::parent::text != NULL); ERROR: ConvertRowtypeExpr found where not expected Having said that, I started to think this approach would make code much complicated. Another idea I came up with to avoid that would be to use pull_var_clause as-is and then rewrite wholerows of partitions back to ConvertRowtypeExprs translating those wholerows to wholerows of their parents tables' rowtypes if necessary. That would be annoying and a little bit inefficient, but the places where we need to rewrite is limited: prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. Other reason why we can't use your approach is we can not decide whether ConvertRowtypeExpr is needed by just looking at the Var node. You might say, that we add a ConvertRowtypeExpr if the Var::varno references a child relation. But that's not true. For example a whole row reference embedded in ConvertRowtypeExpr added by query adjustments in inheritance_planner() is not a child's whole-row reference in the adjusted query. Sorry, I don't understand this fully. So, it's not clear to me when we add a ConvertRowtypeExpr and we don't. I think it'd be OK to rewrite so at least in prepare_sort_from_pathkeys and build_tlist_to_deparse. I am not sure if those are the only two places which require a fix. Right now it looks like those are the places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true in the future, esp. given the amount of work we expect to happen in the partitioning area. When that happens, fixing all those places in that way wouldn't be good. I have to admit that the approach I proposed is a band-aid fix. So we could really minimize the code change and avoid the additional overhead caused by the is_converted_whole_row_reference test added to pull_var_clause. I think the latter would be rather important, because PWJ is disabled by default. What do you think about that approach? Partition-wise join and partition-wise aggregation are disabled by default temporarily. We will change it in near future once the memory consumption issue has been tackled. The features are useful and users would want those to be enabled by default like other query optimizations. So, we can't take a decision based on that. Yeah, I really agree on that point! However, considering that pull_var_clause is used in many places where partitioning is *not* involved, I still think it's better to avoid spending extra cycles needed only for partitioning in that function. Here's set of updated patches. Thanks for updating the patch! Here are some other minor comments on patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: + /* Construct the conversion map. */ + parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0); I think it'd be better to pass -1, not 0, as the typmod argument for that function because that would be consistent with other places where we use that function with named rowtypes (eg, ruleutils.c). -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno) +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel, + int *relno, int *colno) -1 for that change; because 1) we use "var" for implying many things as in eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the name as-is. Other than that the patch set looks good to me. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapatwrote: > On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita > wrote: >> >> I guess you forgot to show the query. > > Sorry. Here's the query. > > explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b > where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2; > >> >> Yet yet another case where pull_var_clause produces an error: >> >> postgres=# create table list_pt (a int, b text) partition by list (a); >> CREATE TABLE >> postgres=# create table list_ptp1 partition of list_pt for values in (1); >> CREATE TABLE >> postgres=# alter table list_ptp1 add constraint list_ptp1_check check >> (list_ptp1::list_pt::text != NULL); >> ERROR: ConvertRowtypeExpr found where not expected >> >> The patch kept the flags passed to pull_var_clause in >> src/backend/catalog/heap.c as-is, but I think we need to modify that >> accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger >> as well. > > With all the support that we have added in partitioning for v11, I > think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places, > which were left unchanged in the earlier versions of patches, which > were written when all that support wasn't in v11. > >> >> Having said that, I started to think this approach would make code much >> complicated. Another idea I came up with to avoid that would be to use >> pull_var_clause as-is and then rewrite wholerows of partitions back to >> ConvertRowtypeExprs translating those wholerows to wholerows of their >> parents tables' rowtypes if necessary. That would be annoying and a little >> bit inefficient, but the places where we need to rewrite is limited: >> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. > > Other reason why we can't use your approach is we can not decide > whether ConvertRowtypeExpr is needed by just looking at the Var node. > You might say, that we add a ConvertRowtypeExpr if the Var::varno > references a child relation. But that's not true. For example a whole > row reference embedded in ConvertRowtypeExpr added by query > adjustments in inheritance_planner() is not a child's whole-row > reference in the adjusted query. So, it's not clear to me when we add > a ConvertRowtypeExpr and we don't. I am not sure if those are the only > two places which require a fix. Right now it looks like those are the > places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true > in the future, esp. given the amount of work we expect to happen in > the partitioning area. When that happens, fixing all those places in > that way wouldn't be good. > >> So we could >> really minimize the code change and avoid the additional overhead caused by >> the is_converted_whole_row_reference test added to pull_var_clause. I think >> the latter would be rather important, because PWJ is disabled by default. >> What do you think about that approach? > > Partition-wise join and partition-wise aggregation are disabled by > default temporarily. We will change it in near future once the memory > consumption issue has been tackled. The features are useful and users > would want those to be enabled by default like other query > optimizations. So, we can't take a decision based on that. Here's set of updated patches. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company expr_ref_error_pwj_v7.tar.gz Description: GNU Zip compressed data
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujitawrote: > > I guess you forgot to show the query. Sorry. Here's the query. explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2; > > Yet yet another case where pull_var_clause produces an error: > > postgres=# create table list_pt (a int, b text) partition by list (a); > CREATE TABLE > postgres=# create table list_ptp1 partition of list_pt for values in (1); > CREATE TABLE > postgres=# alter table list_ptp1 add constraint list_ptp1_check check > (list_ptp1::list_pt::text != NULL); > ERROR: ConvertRowtypeExpr found where not expected > > The patch kept the flags passed to pull_var_clause in > src/backend/catalog/heap.c as-is, but I think we need to modify that > accordingly. Probably, the flags passed to pull_var_clause in CreateTrigger > as well. With all the support that we have added in partitioning for v11, I think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places, which were left unchanged in the earlier versions of patches, which were written when all that support wasn't in v11. > > Having said that, I started to think this approach would make code much > complicated. Another idea I came up with to avoid that would be to use > pull_var_clause as-is and then rewrite wholerows of partitions back to > ConvertRowtypeExprs translating those wholerows to wholerows of their > parents tables' rowtypes if necessary. That would be annoying and a little > bit inefficient, but the places where we need to rewrite is limited: > prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC. Other reason why we can't use your approach is we can not decide whether ConvertRowtypeExpr is needed by just looking at the Var node. You might say, that we add a ConvertRowtypeExpr if the Var::varno references a child relation. But that's not true. For example a whole row reference embedded in ConvertRowtypeExpr added by query adjustments in inheritance_planner() is not a child's whole-row reference in the adjusted query. So, it's not clear to me when we add a ConvertRowtypeExpr and we don't. I am not sure if those are the only two places which require a fix. Right now it looks like those are the places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true in the future, esp. given the amount of work we expect to happen in the partitioning area. When that happens, fixing all those places in that way wouldn't be good. > So we could > really minimize the code change and avoid the additional overhead caused by > the is_converted_whole_row_reference test added to pull_var_clause. I think > the latter would be rather important, because PWJ is disabled by default. > What do you think about that approach? Partition-wise join and partition-wise aggregation are disabled by default temporarily. We will change it in near future once the memory consumption issue has been tackled. The features are useful and users would want those to be enabled by default like other query optimizations. So, we can't take a decision based on that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Thu, May 10, 2018 at 6:23 PM, Etsuro Fujitawrote: > (2018/05/10 13:04), Ashutosh Bapat wrote: >> >> On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita >> wrote: >>> >>> (2018/04/25 18:51), Ashutosh Bapat wrote: Actually I noticed that ConvertRowtypeExpr are used to cast a child's whole row reference expression (not just a Var node) into that of its parent and not. For example a cast like NULL::child::parent produces a ConvertRowtypeExpr encapsulating a NULL constant node and not a Var node. We are interested only in ConvertRowtypeExprs embedding Var nodes with Var::varattno = 0. I have changed this code to use function is_converted_whole_row_reference() instead of the above code with Assert. In order to use that function, I had to take it out of setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. >>> >>> >>> This change seems a bit confusing to me because the flag bits >>> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to >>> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a >>> given clause, but really, it only handles ConvertRowtypeExprs you >>> mentioned >>> above. To make that function easy to understand and use, I think it'd be >>> better to use the IsA(node, ConvertRowtypeExpr) test as in the first >>> version >>> of the patch, instead of is_converted_whole_row_reference, which would be >>> more consistent with other cases such as PlaceHolderVar. >> >> >> I agree that using is_converted_whole_row_reference() is not >> consistent with the other nodes that are handled by pull_var_clause(). >> I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's >> being done with those options. But using >> is_converted_whole_row_reference() is the correct thing to do since we >> are interested only in the whole-row references embedded in >> ConvertRowtypeExpr. There can be anything encapsulated in >> ConvertRowtypeExpr(), a non-shippable function for example. We don't >> want to try to push that down in postgres_fdw's case. Neither in other >> cases. > > > Yeah, but I think the IsA test would be sufficient to make things work > correctly because we can assume that there aren't any other nodes than Var, > PHV, and CRE translating a wholerow value of a child table into a wholerow > value of its parent table's rowtype in the expression clause to which we > apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES. Not really. Consider following case using the same tables fprt1 and fprt2 in test sql/postgres_fdw.sql create function row_as_is(a ftprt2_p1) returns ftprt2_p1 as $$begin return a; end;$$ language plpgsql; With the change you propose i.e. diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c index f972712..088da14 100644 --- a/src/backend/optimizer/util/var.c +++ b/src/backend/optimizer/util/var.c @@ -646,8 +646,9 @@ pull_var_clause_walker(Node *node, pull_var_clause_context *context) else elog(ERROR, "PlaceHolderVar found where not expected"); } - else if (is_converted_whole_row_reference(node)) + else if (IsA(node, ConvertRowtypeExpr)) { + Assert(is_converted_whole_row_reference(node)); if (context->flags & PVC_INCLUDE_CONVERTROWTYPES) { context->varlist = lappend(context->varlist, node); following query crashes at Assert(is_converted_whole_row_reference(node)); If we remove that assert, it would end up pushing down row_as_is(), which is a local user defined function, to the foreign server. That's not desirable since the foreign server may not have that user defined function. > > BTW, one thing I noticed about the new version of the patch is: there is yet > another case where pull_var_clause produces an error. Here is an example: > > postgres=# create table t1 (a int, b text); > CREATE TABLE > postgres=# create table t2 (a int, b text); > CREATE TABLE > postgres=# create foreign table ft1 (a int, b text) server loopback options > (table_name 't1'); > CREATE FOREIGN TABLE > postgres=# create foreign table ft2 (a int, b text) server loopback options > (table_name 't2'); > CREATE FOREIGN TABLE > postgres=# insert into ft1 values (1, 'foo'); > INSERT 0 1 > postgres=# insert into ft1 values (2, 'bar'); > INSERT 0 1 > postgres=# insert into ft2 values (1, 'test1'); > INSERT 0 1 > postgres=# insert into ft2 values (2, 'test2'); > INSERT 0 1 > postgres=# analyze ft1; > ANALYZE > postgres=# analyze ft2; > ANALYZE > postgres=# create table parent (a int, b text); > CREATE TABLE > postgres=# alter foreign table ft1 inherit parent; > ALTER FOREIGN TABLE > postgres=# explain verbose update parent set b = ft2.b from ft2 where > parent.a = ft2.a returning parent; > ERROR: ConvertRowtypeExpr found where not expected > > To produce this, apply the patch in [1] as
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/05/10 13:04), Ashutosh Bapat wrote: On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujitawrote: (2018/04/25 18:51), Ashutosh Bapat wrote: Actually I noticed that ConvertRowtypeExpr are used to cast a child's whole row reference expression (not just a Var node) into that of its parent and not. For example a cast like NULL::child::parent produces a ConvertRowtypeExpr encapsulating a NULL constant node and not a Var node. We are interested only in ConvertRowtypeExprs embedding Var nodes with Var::varattno = 0. I have changed this code to use function is_converted_whole_row_reference() instead of the above code with Assert. In order to use that function, I had to take it out of setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. This change seems a bit confusing to me because the flag bits "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a given clause, but really, it only handles ConvertRowtypeExprs you mentioned above. To make that function easy to understand and use, I think it'd be better to use the IsA(node, ConvertRowtypeExpr) test as in the first version of the patch, instead of is_converted_whole_row_reference, which would be more consistent with other cases such as PlaceHolderVar. I agree that using is_converted_whole_row_reference() is not consistent with the other nodes that are handled by pull_var_clause(). I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's being done with those options. But using is_converted_whole_row_reference() is the correct thing to do since we are interested only in the whole-row references embedded in ConvertRowtypeExpr. There can be anything encapsulated in ConvertRowtypeExpr(), a non-shippable function for example. We don't want to try to push that down in postgres_fdw's case. Neither in other cases. Yeah, but I think the IsA test would be sufficient to make things work correctly because we can assume that there aren't any other nodes than Var, PHV, and CRE translating a wholerow value of a child table into a wholerow value of its parent table's rowtype in the expression clause to which we apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES. BTW, one thing I noticed about the new version of the patch is: there is yet another case where pull_var_clause produces an error. Here is an example: postgres=# create table t1 (a int, b text); CREATE TABLE postgres=# create table t2 (a int, b text); CREATE TABLE postgres=# create foreign table ft1 (a int, b text) server loopback options (table_name 't1'); CREATE FOREIGN TABLE postgres=# create foreign table ft2 (a int, b text) server loopback options (table_name 't2'); CREATE FOREIGN TABLE postgres=# insert into ft1 values (1, 'foo'); INSERT 0 1 postgres=# insert into ft1 values (2, 'bar'); INSERT 0 1 postgres=# insert into ft2 values (1, 'test1'); INSERT 0 1 postgres=# insert into ft2 values (2, 'test2'); INSERT 0 1 postgres=# analyze ft1; ANALYZE postgres=# analyze ft2; ANALYZE postgres=# create table parent (a int, b text); CREATE TABLE postgres=# alter foreign table ft1 inherit parent; ALTER FOREIGN TABLE postgres=# explain verbose update parent set b = ft2.b from ft2 where parent.a = ft2.a returning parent; ERROR: ConvertRowtypeExpr found where not expected To produce this, apply the patch in [1] as well as the new version; without that patch in [1], the update query would crash the server (see [1] for details). To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to pull_var_clause in build_remote_returning in postgres_fdw.c as well. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/5AF43E02.3%40lab.ntt.co.jp
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujitawrote: > > (2018/04/25 18:51), Ashutosh Bapat wrote: >> Actually I noticed that ConvertRowtypeExpr are used to cast a child's >> whole row reference expression (not just a Var node) into that of its >> parent and not. For example a cast like NULL::child::parent produces a >> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var >> node. We are interested only in ConvertRowtypeExprs embedding Var >> nodes with Var::varattno = 0. I have changed this code to use function >> is_converted_whole_row_reference() instead of the above code with >> Assert. In order to use that function, I had to take it out of >> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. > > This change seems a bit confusing to me because the flag bits > "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to > pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a > given clause, but really, it only handles ConvertRowtypeExprs you mentioned > above. To make that function easy to understand and use, I think it'd be > better to use the IsA(node, ConvertRowtypeExpr) test as in the first version > of the patch, instead of is_converted_whole_row_reference, which would be > more consistent with other cases such as PlaceHolderVar. I agree that using is_converted_whole_row_reference() is not consistent with the other nodes that are handled by pull_var_clause(). I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's being done with those options. But using is_converted_whole_row_reference() is the correct thing to do since we are interested only in the whole-row references embedded in ConvertRowtypeExpr. There can be anything encapsulated in ConvertRowtypeExpr(), a non-shippable function for example. We don't want to try to push that down in postgres_fdw's case. Neither in other cases. So, it boils down to changing names of PVC_*_CONVERTROWTYPES to something more meaningful. How about PVC_*_CONVERTWHOLEROWS? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/05/08 16:20), Ashutosh Bapat wrote: On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujitawrote: Here are my review comments on patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: * This assertion in deparseConvertRowtypeExpr wouldn't always be true because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN TABLE ADD COLUMN: + Assert(parent_desc->natts == child_desc->natts); I think we should remove that assertion. Removed. * But I don't think that change is enough. Here is another oddity with that assertion removed. To fix this, I think we should skip the deparseColumnRef processing for dropped columns in the parent table. Done. Thanks for those changes! I have changed the test file a bit to test these scenarios. +1 * I think it would be better to do EXPLAIN with the VERBOSE option to the queries this patch added to the regression tests, to see if ConvertRowtypeExprs are correctly deparsed in the remote queries. The reason VERBOSE option was omitted for partition-wise join was to avoid repeated and excessive output for every child-join. I think that still applies. I'd like to leave that for the committer's judge. PFA patch-set with the fixes. Thanks for updating the patch! I also noticed that the new function deparseConvertRowtypeExpr is not quite following the naming convention of the other deparseFoo functions. Foo is usually the type of the node the parser would produced when the SQL string produced by that function is parsed. That doesn't hold for the SQL string produced by ConvertRowtypeExpr but then naming it as generic deparseRowExpr() wouldn't be correct either. And then there are functions like deparseColumnRef which may produce a variety of SQL strings which get parsed into different node types e.g. a whole-row reference would produce ROW(...) which gets parsed as a RowExpr. Please let me know if you have any suggestions for the name. To be honest, I don't have any strong opinion about that. But I like "deparseConvertRowtypeExpr" because that name seems to well represent the functionality, so I'd vote for that. BTW, one thing I'm a bit concerned about is this: (2018/04/25 18:51), Ashutosh Bapat wrote: > Actually I noticed that ConvertRowtypeExpr are used to cast a child's > whole row reference expression (not just a Var node) into that of its > parent and not. For example a cast like NULL::child::parent produces a > ConvertRowtypeExpr encapsulating a NULL constant node and not a Var > node. We are interested only in ConvertRowtypeExprs embedding Var > nodes with Var::varattno = 0. I have changed this code to use function > is_converted_whole_row_reference() instead of the above code with > Assert. In order to use that function, I had to take it out of > setrefs.c and put it in nodeFuncs.c which seemed to be a better fit. This change seems a bit confusing to me because the flag bits "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a given clause, but really, it only handles ConvertRowtypeExprs you mentioned above. To make that function easy to understand and use, I think it'd be better to use the IsA(node, ConvertRowtypeExpr) test as in the first version of the patch, instead of is_converted_whole_row_reference, which would be more consistent with other cases such as PlaceHolderVar. Best regards, Etsuro Fujita
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujitawrote: > (2018/04/27 14:40), Ashutosh Bapat wrote: >> >> Here's updated patch set. > > > Thanks for updating the patch! Here are my review comments on patch > 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: > > * This assertion in deparseConvertRowtypeExpr wouldn't always be true > because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN > TABLE ADD COLUMN: > > + Assert(parent_desc->natts == child_desc->natts); > > Here is such an example: > > I think we should remove that assertion. Removed. > > * But I don't think that change is enough. Here is another oddity with that > assertion removed. > > To fix this, I think we should skip the deparseColumnRef processing for > dropped columns in the parent table. Done. I have changed the test file a bit to test these scenarios. Thanks for testing. > > * I think it would be better to do EXPLAIN with the VERBOSE option to the > queries this patch added to the regression tests, to see if > ConvertRowtypeExprs are correctly deparsed in the remote queries. The reason VERBOSE option was omitted for partition-wise join was to avoid repeated and excessive output for every child-join. I think that still applies. PFA patch-set with the fixes. I also noticed that the new function deparseConvertRowtypeExpr is not quite following the naming convention of the other deparseFoo functions. Foo is usually the type of the node the parser would produced when the SQL string produced by that function is parsed. That doesn't hold for the SQL string produced by ConvertRowtypeExpr but then naming it as generic deparseRowExpr() wouldn't be correct either. And then there are functions like deparseColumnRef which may produce a variety of SQL strings which get parsed into different node types e.g. a whole-row reference would produce ROW(...) which gets parsed as a RowExpr. Please let me know if you have any suggestions for the name. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company expr_ref_error_pwj_v5.tar.gz Description: GNU Zip compressed data
Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
(2018/04/27 14:40), Ashutosh Bapat wrote: Here's updated patch set. Thanks for updating the patch! Here are my review comments on patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch: * This assertion in deparseConvertRowtypeExpr wouldn't always be true because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN TABLE ADD COLUMN: + Assert(parent_desc->natts == child_desc->natts); Here is such an example: postgres=# create table fprt1 (a int, b int, c varchar) partition by range (a); postgres=# create table fprt1_p1 (like fprt1); postgres=# create table fprt1_p2 (like fprt1); postgres=# create foreign table ftprt1_p1 (a int, b int, c varchar) server loopback options (table_name 'fprt1_p1', use_remote_estimate 'true'); postgres=# alter foreign table ftprt1_p1 drop column c; postgres=# alter foreign table ftprt1_p1 add column c varchar; postgres=# alter table fprt1 attach partition ftprt1_p1 for values from (0) to (250); postgres=# create foreign table ftprt1_p2 partition of fprt1 for values from (250) to (500) server loopback options (table_name 'fprt1_p2', use_remote_estimate 'true'); postgres=# insert into fprt1 select i, i, to_char(i/50, 'FM') from generate_series(0, 499, 2) i; postgres=# analyze fprt1; postgres=# analyze fprt1_p1; postgres=# analyze fprt1_p2; postgres=# create table fprt2 (a int, b int, c varchar) partition by range (b); postgres=# create table fprt2_p1 (like fprt2); postgres=# create table fprt2_p2 (like fprt2); postgres=# create foreign table ftprt2_p1 partition of fprt2 for values from (0) to (250) server loopback options (table_name 'fprt2_p1', use_remote_estimate 'true'); postgres=# create foreign table ftprt2_p2 partition of fprt2 for values from (250) to (500) server loopback options (table_name 'fprt2_p2', use_remote_estimate 'true'); postgres=# insert into fprt2 select i, i, to_char(i/50, 'FM') from generate_series(0, 499, 3) i; postgres=# analyze fprt2; postgres=# analyze fprt2_p1; postgres=# analyze fprt2_p2; postgres=# set enable_partitionwise_join to true; postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) where t1.a % 25 = 0; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. I think we should remove that assertion. * But I don't think that change is enough. Here is another oddity with that assertion removed. postgres=# alter table fprt1 detach partition ftprt1_p1; postgres=# alter table fprt1 detach partition ftprt1_p2; postgres=# alter table fprt1 drop column c; postgres=# alter table fprt1 add column c varchar; postgres=# alter table fprt1 attach partition ftprt1_p1 for values from (0) to (250); postgres=# alter table fprt1 attach partition ftprt1_p2 for values from (250) to (500); postgres=# set enable_partitionwise_join to true; postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) where t1.a % 25 = 0; ERROR: malformed record literal: "(300,300,"(300,300,0006)",0006)" DETAIL: Too many columns. CONTEXT: processing expression at position 1 in select list The reason for that would be: in the following, the patch doesn't take care of dropped columns in the parent table (ie, columns such that attrMap[cnt]=0). + /* Construct ROW expression according to the conversion map. */ + appendStringInfoString(buf, "ROW("); + for (cnt = 0; cnt < natts; cnt++) + { + if (cnt > 0) + appendStringInfoString(buf, ", "); + deparseColumnRef(buf, child_var->varno, attrMap[cnt], root, +qualify_col); + } + appendStringInfoChar(buf, ')'); To fix this, I think we should skip the deparseColumnRef processing for dropped columns in the parent table. * I think it would be better to do EXPLAIN with the VERBOSE option to the queries this patch added to the regression tests, to see if ConvertRowtypeExprs are correctly deparsed in the remote queries. That's all I have for now. Best regards, Etsuro Fujita