subject:"Re\: Expression errors with \"FOR UPDATE\" and postgres

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-10-02 Thread Etsuro Fujita


(2018/10/02 16:45), Michael Paquier wrote:

On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote:

I tried to close it as being committed, but couldn't do so, because I
can't find Fujita-san's name in the list of committers in the CF app's
drop down box that lists all committers.


Indeed, Fujita-san has been added to the list.  And I switched this CF
entry at the same time.


Thank you guys!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-10-02 Thread Michael Paquier

On Tue, Oct 02, 2018 at 04:11:54PM +0900, Amit Langote wrote:
> I tried to close it as being committed, but couldn't do so, because I
> can't find Fujita-san's name in the list of committers in the CF app's
> drop down box that lists all committers.

Indeed, Fujita-san has been added to the list.  And I switched this CF
entry at the same time.
--
Michael


signature.asc
Description: PGP signature

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-10-02 Thread Amit Langote

On 2018/08/31 21:40, Etsuro Fujita wrote:
> (2018/08/31 21:30), Jonathan S. Katz wrote:
>> Thank you for taking care of that and for committing the patch. I have
>> now closed this issues on the open items list.
> 
> Thanks!

I noticed that the CF entry for this was not closed.  As of this morning,
it's been moved to the next 2018-11 CF:

https://commitfest.postgresql.org/20/1554/

I tried to close it as being committed, but couldn't do so, because I
can't find Fujita-san's name in the list of committers in the CF app's
drop down box that lists all committers.

Thanks,
Amit

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-31 Thread Etsuro Fujita


(2018/08/31 21:30), Jonathan S. Katz wrote:

On 8/31/18 7:54 AM, Etsuro Fujita wrote:

(2018/08/30 20:25), Etsuro Fujita wrote:

(2018/08/29 18:40), Etsuro Fujita wrote:

(2018/08/29 0:21), Jonathan S. Katz wrote:

On Aug 24, 2018, at 8:38 AM, Etsuro
Fujita  wrote:
(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall
doesn't
work. I don't know the clear cause of that, so I'll investigate
that
tomorrow.


You may be able to tweak that by using https as origin point or
proper
git proxy settings?


Yeah, my proxy settings were not correct. With the help of my
colleagues Horiguchi-san and Yamada-san, I corrected them but still
can't clone the master repository. Running git with GIT_CURL_VERBOSE
shows that there is another issue in my terminal environment, so I'm
trying to resolve that.


Are there any updates on getting this patch committed?


That investigation has shown that the cause is my company firewall, not
my terminal environment; that firewall has to be configured to allow me
to access to that repository. So, I'm asking my company about that.


I got the approval from my boss today, so I think that I can have that
access soon.


I fixed typos in the commit message, which Alvaro pointed out
off-list, and revised the message a little bit.  Also, I reread the
patch and noticed that the latest version includes now-unnecessary
regression test cases; those were added to the previous version
(refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that
apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths()
work for cases where whole-row Vars are involved, because I modified
those functions, but the latest version doesn't touch those functions
anymore, so those test cases seems unnecessary.  So I removed those
(no other changes), and committed the patch.


Thank you for taking care of that and for committing the patch. I have
now closed this issues on the open items list.


Thanks!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-31 Thread Jonathan S. Katz

On 8/31/18 7:54 AM, Etsuro Fujita wrote:
> (2018/08/30 20:25), Etsuro Fujita wrote:
>> (2018/08/29 18:40), Etsuro Fujita wrote:
>>> (2018/08/29 0:21), Jonathan S. Katz wrote:
> On Aug 24, 2018, at 8:38 AM, Etsuro
> Fujita wrote:
> (2018/08/24 11:47), Michael Paquier wrote:
>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>> I tried this today, but doing git behind the corporate firewall
>>> doesn't
>>> work. I don't know the clear cause of that, so I'll investigate
>>> that
>>> tomorrow.
>>
>> You may be able to tweak that by using https as origin point or
>> proper
>> git proxy settings?
>
> Yeah, my proxy settings were not correct. With the help of my
> colleagues Horiguchi-san and Yamada-san, I corrected them but still
> can't clone the master repository. Running git with GIT_CURL_VERBOSE
> shows that there is another issue in my terminal environment, so I'm
> trying to resolve that.

 Are there any updates on getting this patch committed?
>>>
>>> That investigation has shown that the cause is my company firewall, not
>>> my terminal environment; that firewall has to be configured to allow me
>>> to access to that repository. So, I'm asking my company about that.
>>
>> I got the approval from my boss today, so I think that I can have that
>> access soon.
>
> I fixed typos in the commit message, which Alvaro pointed out
> off-list, and revised the message a little bit.  Also, I reread the
> patch and noticed that the latest version includes now-unnecessary
> regression test cases; those were added to the previous version
> (refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that
> apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths()
> work for cases where whole-row Vars are involved, because I modified
> those functions, but the latest version doesn't touch those functions
> anymore, so those test cases seems unnecessary.  So I removed those
> (no other changes), and committed the patch.

Thank you for taking care of that and for committing the patch. I have
now closed this issues on the open items list.

Jonathan




signature.asc
Description: OpenPGP digital signature

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-31 Thread Etsuro Fujita


(2018/08/30 20:25), Etsuro Fujita wrote:

(2018/08/29 18:40), Etsuro Fujita wrote:

(2018/08/29 0:21), Jonathan S. Katz wrote:

On Aug 24, 2018, at 8:38 AM, Etsuro
Fujita wrote:
(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall
doesn't
work. I don't know the clear cause of that, so I'll investigate that
tomorrow.


You may be able to tweak that by using https as origin point or proper
git proxy settings?


Yeah, my proxy settings were not correct. With the help of my
colleagues Horiguchi-san and Yamada-san, I corrected them but still
can't clone the master repository. Running git with GIT_CURL_VERBOSE
shows that there is another issue in my terminal environment, so I'm
trying to resolve that.


Are there any updates on getting this patch committed?


That investigation has shown that the cause is my company firewall, not
my terminal environment; that firewall has to be configured to allow me
to access to that repository. So, I'm asking my company about that.


I got the approval from my boss today, so I think that I can have that
access soon.


I fixed typos in the commit message, which Alvaro pointed out off-list, 
and revised the message a little bit.  Also, I reread the patch and 
noticed that the latest version includes now-unnecessary regression test 
cases; those were added to the previous version 
(refuse-pwj-when-wrvs-involved-2.patch in [1]) to check that 
apply_scanjoin_target_to_paths() and create_ordinary_grouping_paths() 
work for cases where whole-row Vars are involved, because I modified 
those functions, but the latest version doesn't touch those functions 
anymore, so those test cases seems unnecessary.  So I removed those (no 
other changes), and committed the patch.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B683F60.2070806%40lab.ntt.co.jp

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-30 Thread Etsuro Fujita


(2018/08/29 18:40), Etsuro Fujita wrote:

(2018/08/29 0:21), Jonathan S. Katz wrote:

On Aug 24, 2018, at 8:38 AM, Etsuro
Fujita wrote:
(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall
doesn't
work. I don't know the clear cause of that, so I'll investigate that
tomorrow.


You may be able to tweak that by using https as origin point or proper
git proxy settings?


Yeah, my proxy settings were not correct. With the help of my
colleagues Horiguchi-san and Yamada-san, I corrected them but still
can't clone the master repository. Running git with GIT_CURL_VERBOSE
shows that there is another issue in my terminal environment, so I'm
trying to resolve that.


Are there any updates on getting this patch committed?


That investigation has shown that the cause is my company firewall, not
my terminal environment; that firewall has to be configured to allow me
to access to that repository. So, I'm asking my company about that.


I got the approval from my boss today, so I think that I can have that 
access soon.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-29 Thread Etsuro Fujita


(2018/08/29 0:21), Jonathan S. Katz wrote:

On Aug 24, 2018, at 8:38 AM, Etsuro Fujita  wrote:
(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall doesn't
work.  I don't know the clear cause of that, so I'll investigate that
tomorrow.


You may be able to tweak that by using https as origin point or proper
git proxy settings?


Yeah, my proxy settings were not correct.  With the help of my colleagues 
Horiguchi-san and Yamada-san, I corrected them but still can't clone the master 
repository.  Running git with GIT_CURL_VERBOSE shows that there is another 
issue in my terminal environment, so I'm trying to resolve that.


Are there any updates on getting this patch committed?


That investigation has shown that the cause is my company firewall, not 
my terminal environment; that firewall has to be configured to allow me 
to access to that repository.  So, I'm asking my company about that.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-28 Thread Jonathan S. Katz


> On Aug 24, 2018, at 8:38 AM, Etsuro Fujita  
> wrote:
> 
> (2018/08/24 11:47), Michael Paquier wrote:
>> On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
>>> I tried this today, but doing git behind the corporate firewall doesn't
>>> work.  I don't know the clear cause of that, so I'll investigate that
>>> tomorrow.
>> 
>> You may be able to tweak that by using https as origin point or proper
>> git proxy settings?
> 
> Yeah, my proxy settings were not correct.  With the help of my colleagues 
> Horiguchi-san and Yamada-san, I corrected them but still can't clone the 
> master repository.  Running git with GIT_CURL_VERBOSE shows that there is 
> another issue in my terminal environment, so I'm trying to resolve that.

Are there any updates on getting this patch committed?

Thanks,

Jonathan


signature.asc
Description: Message signed with OpenPGP

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-24 Thread Etsuro Fujita


(2018/08/24 11:47), Michael Paquier wrote:

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:

I tried this today, but doing git behind the corporate firewall doesn't
work.  I don't know the clear cause of that, so I'll investigate that
tomorrow.


You may be able to tweak that by using https as origin point or proper
git proxy settings?


Yeah, my proxy settings were not correct.  With the help of my 
colleagues Horiguchi-san and Yamada-san, I corrected them but still 
can't clone the master repository.  Running git with GIT_CURL_VERBOSE 
shows that there is another issue in my terminal environment, so I'm 
trying to resolve that.


Thanks for the advice!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-23 Thread Michael Paquier

On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
> I tried this today, but doing git behind the corporate firewall doesn't
> work.  I don't know the clear cause of that, so I'll investigate that
> tomorrow.

You may be able to tweak that by using https as origin point or proper
git proxy settings?
--
Michael


signature.asc
Description: PGP signature

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-23 Thread Etsuro Fujita


(2018/08/22 20:08), Etsuro Fujita wrote:

(2018/08/16 12:00), Etsuro Fujita wrote:

(2018/08/15 23:40), Robert Haas wrote:

Given the comments from the RMT, and also on general principle, it
seems like we really need to get on with committing something here.
It's my understanding you plan to do that, since it's your patch.
When?


I plan to do that late next week as I'll go on leave until next Tuesday.


I added the commit message. Please find attached an updated version of
the patch. Does that make sense? If there are not objections, I'll push
this tomorrow.


I tried this today, but doing git behind the corporate firewall doesn't 
work.  I don't know the clear cause of that, so I'll investigate that 
tomorrow.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-22 Thread Etsuro Fujita


(2018/08/16 12:00), Etsuro Fujita wrote:

(2018/08/15 23:40), Robert Haas wrote:

Given the comments from the RMT, and also on general principle, it
seems like we really need to get on with committing something here.
It's my understanding you plan to do that, since it's your patch.
When?


I plan to do that late next week as I'll go on leave until next Tuesday.


I added the commit message.  Please find attached an updated version of 
the patch.  Does that make sense?  If there are not objections, I'll 
push this tomorrow.


Best regards,
Etsuro Fujita
>From e8e5fa32984a51e73831529b055bddbed03feaa1 Mon Sep 17 00:00:00 2001
From: Etsuro Fujita 
Date: Wed, 22 Aug 2018 19:59:01 +0900
Subject: [PATCH] Disable support for partitionwise joins in problematic
 cases.

Commit f49842d1ee31b976c681322f76025d7732e860f3, which added support for
partitionwise joins, built the child's tlist by applying
adjust_appendrel_attrs() to the parent's.  So in the case where the parent
tlist included a whole-row Var for the parent, the child tlist contained a
ConverRowtypeExpr.  That commit added code to handle that to the planner,
but some code paths still made the old assumption that a scan or join
rel's would only include Vars and PlaceHolderVars, causing the following
errors:

* When creating an explicit sort node for an input path for building a
  mergejoin plan for a child join, prepare_sort_from_pathkeys() throws the
  'could not find pathkey item to sort' error.
* When deparsing a relation participating a pushed down foreign child join
  as a subquery in contrib/postgres_fdw, get_relation_column_alias_ids()
  throws the 'unexpected expression in subquery output' error.
* When performing set_plan_references() on a local join plan created by
  contrib/postgres_fdw for EvalPlanQual support, fix_join_expr() throws
  the 'variable not found in subplan target lists' error.

To fix these, two approaches have been proposed: one by Ashutosh Bapat and
one by me.  While the former keeps building the child's tlist with a
ConvertRowtypeExpr, the latter builds it with a whole-row Var for the
child and tries to fix it up later.  But both approaches need more work,
so refuse to generate partitionwise join paths when whole-row Vars are
involved, instead.  We don't need to handle ConvertRowtypeExprs in the
child's tlists for now, so this commit also removes the changes to the
planner, such as setrefs.c.

Previously, partitionwise join computed attr_needed data for each child
separately, and built the child join's tlists using that data, which also
required an extra step for adding PlaceHolderVars to the child join's
tlists, but it would be more efficient to build the child join's tlist by
applying adjust_appendrel_attrs() to the parent join's.  So this commit
also changes it that way, and simplifies build_joinrel_tlist() and
placeholder.c as well as part of set_append_rel_size() to basically what
they were before partitionwise join went in.

Back-patch to PG11 where partitionwise join was introduced.

Report by Rajkumar Raghuwanshi.  Analysis by Ashutosh Bapat, who also
provided some of regression tests.  Patch by me, reviewed by Robert Haas.

Discussion: https://postgr.es/m/cakcux6ktu-8teflwtquuzbyfaza83vuzurd7c1yhc-yewyy...@mail.gmail.com
---
 contrib/postgres_fdw/expected/postgres_fdw.out|   71 ++---
 contrib/postgres_fdw/sql/postgres_fdw.sql |   14 ++-
 src/backend/nodes/outfuncs.c  |1 +
 src/backend/optimizer/path/allpaths.c |   78 +
 src/backend/optimizer/path/joinrels.c |7 +
 src/backend/optimizer/plan/setrefs.c  |   58 +-
 src/backend/optimizer/util/placeholder.c  |   58 --
 src/backend/optimizer/util/relnode.c  |  125 +++--
 src/include/nodes/relation.h  |8 +-
 src/test/regress/expected/partition_aggregate.out |   84 ++
 src/test/regress/expected/partition_join.out  |   93 +---
 src/test/regress/sql/partition_aggregate.sql  |   10 ++
 src/test/regress/sql/partition_join.sql   |   14 ++-
 13 files changed, 365 insertions(+), 256 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d912bd9..21a2ef5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8337,8 +8337,9 @@ ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
 ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
 INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
 INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
-CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
+CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
 	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ALTER

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-15 Thread Etsuro Fujita


(2018/08/15 23:40), Robert Haas wrote:

Given the comments from the RMT, and also on general principle, it
seems like we really need to get on with committing something here.
It's my understanding you plan to do that, since it's your patch.
When?


I plan to do that late next week as I'll go on leave until next Tuesday.

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-15 Thread Robert Haas

On Wed, Aug 15, 2018 at 7:35 AM, Etsuro Fujita
 wrote:
> Thanks for the comments, Robert!

Given the comments from the RMT, and also on general principle, it
seems like we really need to get on with committing something here.
It's my understanding you plan to do that, since it's your patch.
When?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-15 Thread Etsuro Fujita


(2018/08/15 13:04), Amit Langote wrote:

On 2018/08/15 12:25, Etsuro Fujita wrote:

(2018/08/15 0:51), Robert Haas wrote:

On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
   wrote:

One thing I noticed might be an improvement is to skip
build_joinrel_partition_info if the given joinrel will be to have
consider_partitionwise_join=false; in the previous patch, that function
created the joinrel's partition info such as part_scheme and part_rels if
the joinrel is considered as partitioned, independently of the flag
consider_partitionwise_join for it, but if that flag is false, we don't
generate PWJ paths for the joinrel, so we would not need to create that
partition info at all.  This would not only avoid unnecessary
processing in
that function, but also make unnecessary the changes I made to
try_partitionwise_join, generate_partitionwise_join_paths,
apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
updated the patch that way.  Please find attached an updated version of
the
patch.


I guess the question is whether there are (or might be in the future)
other dependencies on part_scheme.  For example, it looks like
partition pruning uses it.  I'm not sure whether partition pruning
supports a plan like:

Append
->   Nested Loop
->   Seq Scan on p1
->   Index Scan on q1



I'm not sure that either, but if a join relation doesn't have part_scheme
set, it means that that relation is considered as non-partitioned, as in
the case when enable_partitionwise_join is off, so there would be no PWJ
paths generated for it, to begin with.  So in that case, ISTM that we
don't need to worry about that at least for partition pruning.


Fwiw, partition pruning works only for base rels, which applies to both
planning-time pruning (pruning is performed only during base rel size
estimation) and run-time pruning (we'll add pruning info to the Append
plan only if the source AppendPath's parent rel is a base rel).


Thanks for that, Amit!

I looked into the question for the join or upper rel case, but couldn't 
find any places in the PG11 code where we assume that a join or upper 
rel has non-NULL part_scheme, as described below:


* Both try_partitionwise_join() and generate_partitionwise_join_paths() 
check whether a join rel to be considered has non-NULL part_scheme, 
through the IS_PARTITIONED_REL macro:


#define IS_PARTITIONED_REL(rel) \
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
 (rel)->part_rels && !(IS_DUMMY_REL(rel)))

If IS_PARTITIONED_REL, the former adds paths to child-joins, and the 
latter builds PWJ paths; but both don't assume non-NULL part_scheme.


* apply_scanjoin_target_to_paths() checks whether the topmost scan/join 
rel has non-NULL part_scheme, and if so, it recursively adjusts all 
partitions; it doesn't assume non-NULL part_scheme.


* create_ordinary_grouping_paths() also checks whether the topmost 
scan/join rel adjusted by apply_scanjoin_target_to_paths() has non-NULL 
part_scheme, and if so, it considers PWA paths; it doesn't assume 
non-NULL part_scheme either.


* add_paths_to_append_rel(), which is called from each of the above (ie, 
generate_partitionwise_join_paths(), apply_scanjoin_target_to_paths(), 
and create_partitionwise_grouping_paths() in 
create_ordinary_grouping_paths() for creating PWJ paths, adjusting PWJ 
paths, and creating PWA paths respectively) also checks whether a rel to 
be considered has non-NULL part_scheme, but never assumes that.  And 
actually, if the rel's part_scheme is NULL, add_paths_to_append_rel() 
wouldn't be called for the rel, because of the reason described above.


Thanks for the comments, Robert!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-14 Thread Amit Langote

On 2018/08/15 12:25, Etsuro Fujita wrote:
> (2018/08/15 0:51), Robert Haas wrote:
>> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
>>   wrote:
>>> One thing I noticed might be an improvement is to skip
>>> build_joinrel_partition_info if the given joinrel will be to have
>>> consider_partitionwise_join=false; in the previous patch, that function
>>> created the joinrel's partition info such as part_scheme and part_rels if
>>> the joinrel is considered as partitioned, independently of the flag
>>> consider_partitionwise_join for it, but if that flag is false, we don't
>>> generate PWJ paths for the joinrel, so we would not need to create that
>>> partition info at all.  This would not only avoid unnecessary
>>> processing in
>>> that function, but also make unnecessary the changes I made to
>>> try_partitionwise_join, generate_partitionwise_join_paths,
>>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
>>> updated the patch that way.  Please find attached an updated version of
>>> the
>>> patch.
>>
>> I guess the question is whether there are (or might be in the future)
>> other dependencies on part_scheme.  For example, it looks like
>> partition pruning uses it.  I'm not sure whether partition pruning
>> supports a plan like:
>>
>> Append
>> ->  Nested Loop
>>    ->  Seq Scan on p1
>>    ->  Index Scan on q1
>> 
> 
> I'm not sure that either, but if a join relation doesn't have part_scheme
> set, it means that that relation is considered as non-partitioned, as in
> the case when enable_partitionwise_join is off, so there would be no PWJ
> paths generated for it, to begin with.  So in that case, ISTM that we
> don't need to worry about that at least for partition pruning.

Fwiw, partition pruning works only for base rels, which applies to both
planning-time pruning (pruning is performed only during base rel size
estimation) and run-time pruning (we'll add pruning info to the Append
plan only if the source AppendPath's parent rel is a base rel).

Thanks,
Amit

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-14 Thread Etsuro Fujita


(2018/08/15 0:51), Robert Haas wrote:

On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
  wrote:

One thing I noticed might be an improvement is to skip
build_joinrel_partition_info if the given joinrel will be to have
consider_partitionwise_join=false; in the previous patch, that function
created the joinrel's partition info such as part_scheme and part_rels if
the joinrel is considered as partitioned, independently of the flag
consider_partitionwise_join for it, but if that flag is false, we don't
generate PWJ paths for the joinrel, so we would not need to create that
partition info at all.  This would not only avoid unnecessary processing in
that function, but also make unnecessary the changes I made to
try_partitionwise_join, generate_partitionwise_join_paths,
apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
updated the patch that way.  Please find attached an updated version of the
patch.


I guess the question is whether there are (or might be in the future)
other dependencies on part_scheme.  For example, it looks like
partition pruning uses it.  I'm not sure whether partition pruning
supports a plan like:

Append
->  Nested Loop
   ->  Seq Scan on p1
   ->  Index Scan on q1



I'm not sure that either, but if a join relation doesn't have 
part_scheme set, it means that that relation is considered as 
non-partitioned, as in the case when enable_partitionwise_join is off, 
so there would be no PWJ paths generated for it, to begin with.  So in 
that case, ISTM that we don't need to worry about that at least for 
partition pruning.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-14 Thread Jonathan S. Katz

> On Aug 14, 2018, at 11:51 AM, Robert Haas  wrote:
> 
> On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
>  wrote:
>> One thing I noticed might be an improvement is to skip
>> build_joinrel_partition_info if the given joinrel will be to have
>> consider_partitionwise_join=false; in the previous patch, that function
>> created the joinrel's partition info such as part_scheme and part_rels if
>> the joinrel is considered as partitioned, independently of the flag
>> consider_partitionwise_join for it, but if that flag is false, we don't
>> generate PWJ paths for the joinrel, so we would not need to create that
>> partition info at all.  This would not only avoid unnecessary processing in
>> that function, but also make unnecessary the changes I made to
>> try_partitionwise_join, generate_partitionwise_join_paths,
>> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
>> updated the patch that way.  Please find attached an updated version of the
>> patch.
> 
> I guess the question is whether there are (or might be in the future)
> other dependencies on part_scheme.  For example, it looks like
> partition pruning uses it.  I'm not sure whether partition pruning
> supports a plan like:
> 
> Append
> -> Nested Loop
>  -> Seq Scan on p1
>  -> Index Scan on q1
> 
> 
> If it doesn't, that's just an implementation restriction; somebody
> might want to fix things so it works, if it doesn't already.

While we (the RMT) are happy to see there has been progress on this thread,
11 Beta 3 was released containing this problem and the time to adequately
test this feature prior to GA release is rapidly dwindling.

The RMT would like to see this prioritized and resolved. At this point we have
considered the option of having partitionwise joins completely disabled in
PostgreSQL 11. This is preferably not the path we want to choose, but this
option is now on the table and we will issue an ultimatum soon if we do not see
further progress.

In the previous discussion, the generally accepted solution for PostgreSQL 11
seemed to be to disable the problematic case and any further work would be for
PostgreSQL 12 only. If something is different, please indicate why ASAP and
work to resolve. The RMT will also conduct some additional technical analysis
as well.

Sincerely,

Alvaro, Andres, Jonathan

signature.asc
Description: Message signed with OpenPGP

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-14 Thread Robert Haas

On Mon, Aug 13, 2018 at 12:32 PM, Etsuro Fujita
 wrote:
> One thing I noticed might be an improvement is to skip
> build_joinrel_partition_info if the given joinrel will be to have
> consider_partitionwise_join=false; in the previous patch, that function
> created the joinrel's partition info such as part_scheme and part_rels if
> the joinrel is considered as partitioned, independently of the flag
> consider_partitionwise_join for it, but if that flag is false, we don't
> generate PWJ paths for the joinrel, so we would not need to create that
> partition info at all.  This would not only avoid unnecessary processing in
> that function, but also make unnecessary the changes I made to
> try_partitionwise_join, generate_partitionwise_join_paths,
> apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So I
> updated the patch that way.  Please find attached an updated version of the
> patch.

I guess the question is whether there are (or might be in the future)
other dependencies on part_scheme.  For example, it looks like
partition pruning uses it.  I'm not sure whether partition pruning
supports a plan like:

Append
-> Nested Loop
  -> Seq Scan on p1
  -> Index Scan on q1


If it doesn't, that's just an implementation restriction; somebody
might want to fix things so it works, if it doesn't already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-13 Thread Etsuro Fujita


(2018/08/13 11:57), Robert Haas wrote:

On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita
  wrote:

In the above I used the test whether the relation's reloptkind is
RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a
multi-level partitioned table.  So I fixed that and added regression test
cases for that.  I also revised comments a bit.  Attached is an updated
version of the patch.


+   /* If so, consider partitionwise joins for that join. */
+   if (IS_PARTITIONED_REL(joinrel))
+   joinrel->consider_partitionwise_join = true;

Maybe this should assert that the inner and outer rels have
consider_partitionwise_join set.  There is an Assert quite a bit
earlier in the function that the parent join have it set, but I think
it might make sense to check the children have it set whenever we set
the flag.


Agreed.  Done.

One thing I noticed might be an improvement is to skip 
build_joinrel_partition_info if the given joinrel will be to have 
consider_partitionwise_join=false; in the previous patch, that function 
created the joinrel's partition info such as part_scheme and part_rels 
if the joinrel is considered as partitioned, independently of the flag 
consider_partitionwise_join for it, but if that flag is false, we don't 
generate PWJ paths for the joinrel, so we would not need to create that 
partition info at all.  This would not only avoid unnecessary processing 
in that function, but also make unnecessary the changes I made to 
try_partitionwise_join, generate_partitionwise_join_paths, 
apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So 
I updated the patch that way.  Please find attached an updated version 
of the patch.


Thanks for the review!

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8337,8344  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8337,8345 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8389,8416  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
   8 |   | 
  (5 rows)
  
! -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
!->  Append
!  ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
!  ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
! (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8390,8431 
   8 |   | 
  (5 rows)
  
! -- with whole-row reference; partitionwise join does not apply
  EXPLAIN (COSTS OFF)
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!QUERY PLAN   
! 
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
!->  Hash Full Join
!  Hash Cond: (t1.a = t2.b)
!  ->  Append
!->  Foreign Scan on

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-12 Thread Robert Haas

On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita
 wrote:
> In the above I used the test whether the relation's reloptkind is
> RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a
> multi-level partitioned table.  So I fixed that and added regression test
> cases for that.  I also revised comments a bit.  Attached is an updated
> version of the patch.

+   /* If so, consider partitionwise joins for that join. */
+   if (IS_PARTITIONED_REL(joinrel))
+   joinrel->consider_partitionwise_join = true;

Maybe this should assert that the inner and outer rels have
consider_partitionwise_join set.  There is an Assert quite a bit
earlier in the function that the parent join have it set, but I think
it might make sense to check the children have it set whenever we set
the flag.

Aside from that I don't really have any suggestions on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-06 Thread Etsuro Fujita


(2018/08/03 22:28), Etsuro Fujita wrote:

(2018/08/03 22:18), Etsuro Fujita wrote:

Here is a patch for refusing to generate PWJ paths when WRVs are
involved:

* We no longer need to handle WRVs, so I've simplified
build_joinrel_tlist() and setrefs.c to what they were before
partition-wise join went in, as in the previous patch.

* attr_needed data for each child is used for building child-joins'
tlists, but I think we can build those by applying
adjust_appendrel_attrs to the parent's tlists, without attr_needed. So,
I've also removed that as in the previous patch.


One thing to add: as for the latter, we don't need the changes to
placeholder.c either, so I've also simplified that to what they were
before partition-wise join went in.


*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
***
*** 3960,3965  create_ordinary_grouping_paths(PlannerInfo *root, 
RelOptInfo *input_rel,

--- 3960,3967 
 */
if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
input_rel->part_scheme && input_rel->part_rels &&
+   (input_rel->reloptkind == RELOPT_BASEREL ||
+input_rel->consider_partitionwise_join) &&
!IS_DUMMY_REL(input_rel))
{
/*
***
*** 6913,6919  apply_scanjoin_target_to_paths(PlannerInfo *root,
 * projection-capable, that might save a separate Result node, and 
it also

 * is important for partitionwise aggregate.
 */
!   if (rel->part_scheme && rel->part_rels)
{
int partition_idx;
List   *live_children = NIL;
--- 6915,6923 
 * projection-capable, that might save a separate Result node, and 
it also

 * is important for partitionwise aggregate.
 */
!   if (rel->part_scheme && rel->part_rels &&
!   (rel->reloptkind == RELOPT_BASEREL ||
!rel->consider_partitionwise_join))
{
int partition_idx;
List   *live_children = NIL;

In the above I used the test whether the relation's reloptkind is 
RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a 
multi-level partitioned table.  So I fixed that and added regression 
test cases for that.  I also revised comments a bit.  Attached is an 
updated version of the patch.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8337,8344  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8337,8345 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8389,8416  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
   8 |   | 
  (5 rows)
  
! -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
!->  Append
!  ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
!  ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
! (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8390,8431 
   8 |   | 
  (5 rows)
  
! -- with whole-row

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Robert Haas

On Fri, Aug 3, 2018 at 10:36 AM, Tom Lane  wrote:
> Anyway, what I'm basically suggesting is that we just disable support for
> PWJ in the problematic cases in v11.  As long as PWJ isn't even on by
> default, that's not much of a loss.  Obviously we'll want to fix it in the
> future, but the hour grows late for v11, and I think either of these
> patches would need quite a bit more work to be committable.

That's not my first choice, but I'm OK with it.

>> There are definitely some things not to like about this approach.  In
>> particular, I definitely agree that treating a converted whole-row
>> reference specially is not very nice.  It feels like it might be
>> substantially cleaner to be able to represent this idea as a single
>> node rather than a combination of ConvertRowtypeExpr and var with
>> varattno = 0.  Perhaps in the future we ought to consider either (a)
>> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
>> inventing a new WholeRowExpr node that stores two RTIs, one for the
>> relation that's generating the whole-row reference and the other for
>> the relation that is controlling the column ordering of the result or
>> (c) allowing a Var to represent a whole-row expression that has to
>> produce its outputs according to the ordering of some other RTE.  But
>> I don't think it's wise or necessary to whack that around just to fix
>> this bug; it is refactoring or improvement work best left to a future
>> release.
>
> I agree with all of that except your conclusion that it's unnecessary
> to do it to fix this bug.  I find that entirely unsupported and over-
> optimistic, especially in view of the number of iterations Ashutosh
> went through trying to fix the fallout from not making a clear
> distinction.

I don't think that the question of how a converted whole-row expr can
really make a difference between being able to fix this bug and not
being able to fix this bug.  It's just a representational choice.  If
we decide that a converted whole-row expr is represented as
ConvertRowTypeExpr on top of a Var, then we have to check for that.
If we revise the representation to use a Var node directly, or to use
a new WholeRowVar node, then we'll just need to check for those things
instead.  I think that the shape of the fix itself does not change.
Moreover, it's pretty much a 1:1 mapping.  It's not like, say,
replacing plans with paths in the whole upper half of the planner,
where the representation is different enough that things that would
have been very difficult to manage in the old representation become
relatively simple in the new one.  It's basically just a question of
how you spell it.  Writing a test for ConvertRowTypeExpr atop Var and
using it a bunch of places is not beautiful, and it's certainly going
to be marginally slower than a direct test for WholeRowVar or some
other representation that bundles the whole thing into one node, but I
think it should work just fine, and the representation can be revised
later in a separate patch once we agree on an approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Tom Lane

Robert Haas  writes:
> On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane  wrote:
>> As far as I can see from the example that started this thread,
>> postgres_fdw injects WRVs into a PWJ whether or not the query involves
>> FOR UPDATE; that's why this bug is reproducible in a query without FOR
>> UPDATE.  But we shouldn't need any EPQ support in that case.

> I might be missing something, but the original reproducer involves a
> FOR UPDATE clause, and the expected regression test output in
> postgres_fdw.out appears to show a whole-row var in the foreign scan's
> target list only in those examples where a locking clause is present.

Oh, sigh, never mind that --- I was thinking that only the first of
the two example queries involved FOR UPDATE, but they both do.
So no mystery there after all.

Anyway, what I'm basically suggesting is that we just disable support for
PWJ in the problematic cases in v11.  As long as PWJ isn't even on by
default, that's not much of a loss.  Obviously we'll want to fix it in the
future, but the hour grows late for v11, and I think either of these
patches would need quite a bit more work to be committable.

BTW, I forgot to respond to this:

>> I do not trust Ashutosh's patch because of the point you noted that it
>> will kick in on ConvertRowtypeExprs that are not related to partitioning.
> I don't remember making that point -- can you clarify?

I'm looking at
https://www.postgresql.org/message-id/CA%2BTgmoZSaKq-fYALn5jf6c_X3%3D%3DRb2s8eqLDwGpV%3DLNNhTXYwg%40mail.gmail.com
wherein you said

> There are definitely some things not to like about this approach.  In
> particular, I definitely agree that treating a converted whole-row
> reference specially is not very nice.  It feels like it might be
> substantially cleaner to be able to represent this idea as a single
> node rather than a combination of ConvertRowtypeExpr and var with
> varattno = 0.  Perhaps in the future we ought to consider either (a)
> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
> inventing a new WholeRowExpr node that stores two RTIs, one for the
> relation that's generating the whole-row reference and the other for
> the relation that is controlling the column ordering of the result or
> (c) allowing a Var to represent a whole-row expression that has to
> produce its outputs according to the ordering of some other RTE.  But
> I don't think it's wise or necessary to whack that around just to fix
> this bug; it is refactoring or improvement work best left to a future
> release.

I agree with all of that except your conclusion that it's unnecessary
to do it to fix this bug.  I find that entirely unsupported and over-
optimistic, especially in view of the number of iterations Ashutosh
went through trying to fix the fallout from not making a clear
distinction.

regards, tom lane

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Robert Haas

On Fri, Aug 3, 2018 at 10:07 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane  wrote:
>>> Now, that's a bit of a problem for postgres_fdw, because it seems to
>>> insist on injecting WRVs even when the query text does not require any.
>>> Why is that, and can't we get rid of it?
>
>> I don't quite know what you mean here -- postgres_fdw does use
>> whole-row vars for EPQ handling, which may be what you're thinking
>> about.
>
> As far as I can see from the example that started this thread,
> postgres_fdw injects WRVs into a PWJ whether or not the query involves
> FOR UPDATE; that's why this bug is reproducible in a query without FOR
> UPDATE.  But we shouldn't need any EPQ support in that case.

I might be missing something, but the original reproducer involves a
FOR UPDATE clause, and the expected regression test output in
postgres_fdw.out appears to show a whole-row var in the foreign scan's
target list only in those examples where a locking clause is present.

>> Honestly, I'm pretty impressed that we have added not one but two
>> members to the RelOptKind enum without as little collateral damage as
>> there has been.
>
> Color me a bit more skeptical about the bug density in that, given
> that enable_partitionwise_join is off by default; that means you're
> not getting a lot of testing.

You're hard to please.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Tom Lane

Robert Haas  writes:
> On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane  wrote:
>> Now, that's a bit of a problem for postgres_fdw, because it seems to
>> insist on injecting WRVs even when the query text does not require any.
>> Why is that, and can't we get rid of it?

> I don't quite know what you mean here -- postgres_fdw does use
> whole-row vars for EPQ handling, which may be what you're thinking
> about.

As far as I can see from the example that started this thread,
postgres_fdw injects WRVs into a PWJ whether or not the query involves
FOR UPDATE; that's why this bug is reproducible in a query without FOR
UPDATE.  But we shouldn't need any EPQ support in that case.

> Honestly, I'm pretty impressed that we have added not one but two
> members to the RelOptKind enum without as little collateral damage as
> there has been.

Color me a bit more skeptical about the bug density in that, given
that enable_partitionwise_join is off by default; that means you're
not getting a lot of testing.

regards, tom lane

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Robert Haas

On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane  wrote:
> I do not trust Ashutosh's patch because of the point you noted that it
> will kick in on ConvertRowtypeExprs that are not related to partitioning.

I don't remember making that point -- can you clarify?

> It's also got a rather fundamental conceptual (or at least documentation)
> error in that it says it's making pull_var_clause do certain things to
> all ConvertRowtypeExprs, but that's not what the code actually does.
> I think the tension around that has a lot to do with the fact that the
> patch went through so many revisions, and I don't have any faith that
> it's right even yet.  As you mentioned upthread, this might be insoluble
> without some representational change for converted whole-row Vars.

Insoluble is a strong word

> What I'm thinking might be a more appropriate thing, at least for
> getting v11 out the door, is to refuse to generate partitionwise
> joins when any whole-row vars are involved.  (Perhaps that's not
> enough to dodge all the problems, though?)

It's enough to dodge the problem being discussed on this thread.

> Now, that's a bit of a problem for postgres_fdw, because it seems to
> insist on injecting WRVs even when the query text does not require any.
> Why is that, and can't we get rid of it?  It's horrid for performance
> even without any of these other considerations.  But if we fail to get
> rid of that in time for v11, it would mean that postgres_fdw can't
> participate in PWJ, which isn't great but I think we could live with it
> for now.

I don't quite know what you mean here -- postgres_fdw does use
whole-row vars for EPQ handling, which may be what you're thinking
about.

> BTW, what exactly do we think the production status of PWJ is, anyway?
> I notice that upthread it was stated that enable_partitionwise_join
> defaults to off, as indeed it still does, and we'd turn it on later
> when we'd gotten rid of some memory-hogging problems.  If that hasn't
> happened yet (and I don't see any open item about considering enabling
> PWJ by default for v11), then I have exactly no hesitation about
> lobotomizing PWJ as hard as we need to to mask this problem for v11.
> I'd certainly prefer a solution along those lines to either of these
> patches.

Yeah, that hasn't been done yet.  Partition-wise join probably needs a
good bit of work in a number of areas to do all the things that people
will want it to do -- see the thread on advanced partition-matching,
which is an improvement but still doesn't cover every case that could
come up.  In terms of memory usage, I think that we need some
discussion of the approach that should be taken.  I see a couple of
possible things we could do, not necessarily mutually exclusive.

1. We could try to avoid translating all of the parent's data
structures for every child RelOptInfo, either by rejiggering things so
that the child doesn't need all of that data, or by making it able to
use the parent's copy of the data.

2. We could try to recycle memory more quickly.  For example, if we're
planning a partition-wise join of A-B-C-D, first plan paths for
A1-B1-C1-D1 (and all proper subsets of those rels), then throw away
most of the memory, then plan paths for A2-B2-C2-D2, etc.

3. We could generate paths for on "representative" children (e.g. the
biggest ones) and use that to estimate the cost of the partition-wise
plan.  If that plan is selected, then go back and figure out real
paths for the other children.

All of these things help in different ways, and Ashutosh had code for
some version of all of them at various points, but the problem is
quite difficult.  If you have three tables with 1000 partitions each,
then without partition-wise join you need 3000 (partitions) + 3
(baserels) + 3 (level-2 joinrels) + 1 (level-3 joinrel) RelOptInfos.
If you do a partition-wise join, then you get 3000 level-2
child-joinrels and 1000 level-3 child joinrels.  Those RelOptInfo
structures, and the paths attached to them, cannot help but take up
memory.  Perhaps that's OK, and we ought to just say "well, if you
want better plans, you're going to have to allow more memory for
planning".  If not, we have to decide how hard we want to work in
which areas and how good the result needs to be in order to have this
turned on by difficult.

Honestly, I'm pretty impressed that we have added not one but two
members to the RelOptKind enum without as little collateral damage as
there has been.  This issue is annoying and the discussion has gone on
longer than anyone would probably prefer, but it's really pretty minor
in the grand scheme of things, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Etsuro Fujita


(2018/08/03 22:18), Etsuro Fujita wrote:

Here is a patch for refusing to generate PWJ paths when WRVs are involved:

* We no longer need to handle WRVs, so I've simplified
build_joinrel_tlist() and setrefs.c to what they were before
partition-wise join went in, as in the previous patch.

* attr_needed data for each child is used for building child-joins'
tlists, but I think we can build those by applying
adjust_appendrel_attrs to the parent's tlists, without attr_needed. So,
I've also removed that as in the previous patch.


One thing to add: as for the latter, we don't need the changes to 
placeholder.c either, so I've also simplified that to what they were 
before partition-wise join went in.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-03 Thread Etsuro Fujita


(2018/08/03 5:25), Tom Lane wrote:

What I'm thinking might be a more appropriate thing, at least for
getting v11 out the door, is to refuse to generate partitionwise
joins when any whole-row vars are involved.


Agreed.


(Perhaps that's not
enough to dodge all the problems, though?)

Now, that's a bit of a problem for postgres_fdw, because it seems to
insist on injecting WRVs even when the query text does not require any.
Why is that, and can't we get rid of it?  It's horrid for performance
even without any of these other considerations.  But if we fail to get
rid of that in time for v11, it would mean that postgres_fdw can't
participate in PWJ, which isn't great but I think we could live with it
for now.


Sorry, I don't understand this.  Could you elaborate on that a bit more?


BTW, what exactly do we think the production status of PWJ is, anyway?
I notice that upthread it was stated that enable_partitionwise_join
defaults to off, as indeed it still does, and we'd turn it on later
when we'd gotten rid of some memory-hogging problems.  If that hasn't
happened yet (and I don't see any open item about considering enabling
PWJ by default for v11), then I have exactly no hesitation about
lobotomizing PWJ as hard as we need to to mask this problem for v11.


That hasn't happened yet; I think we left that for PG12, IIRC.

Here is a patch for refusing to generate PWJ paths when WRVs are involved:

* We no longer need to handle WRVs, so I've simplified 
build_joinrel_tlist() and setrefs.c to what they were before 
partition-wise join went in, as in the previous patch.


* attr_needed data for each child is used for building child-joins' 
tlists, but I think we can build those by applying 
adjust_appendrel_attrs to the parent's tlists, without attr_needed.  So, 
I've also removed that as in the previous patch.


Maybe I'm missing something, though.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8337,8344  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8337,8345 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8389,8416  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
   8 |   | 
  (5 rows)
  
! -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
!->  Append
!  ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
!  ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
! (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8390,8431 
   8 |   | 
  (5 rows)
  
! -- with whole-row reference; partitionwise join does not apply
  EXPLAIN (COSTS OFF)
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!QUERY PLAN   
! 
   Sort
 Sort Key: ((t1.*)::fprt1),

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-02 Thread Tom Lane

Robert Haas  writes:
> Does anyone else want to weigh in on this?  It seems to me that there
> are several people who are quite willing to complain about the fact
> that this hasn't been fixed, but not willing to express an opinion
> about the shape of the fix.  Either the RMT needs to take executive
> action, or we need some more votes.

[ reads thread ... ]

Well, my vote is that I've got zero faith in either of these patches.

I do not trust Ashutosh's patch because of the point you noted that it
will kick in on ConvertRowtypeExprs that are not related to partitioning.
It's also got a rather fundamental conceptual (or at least documentation)
error in that it says it's making pull_var_clause do certain things to
all ConvertRowtypeExprs, but that's not what the code actually does.
I think the tension around that has a lot to do with the fact that the
patch went through so many revisions, and I don't have any faith that
it's right even yet.  As you mentioned upthread, this might be insoluble
without some representational change for converted whole-row Vars.

As for Etsuro-san's patch, I don't like that for the same reasons you
gave.  Also, I note that back towards the beginning of July it was
said that that patch depends on the assumption of no Gather below
an Append.  That's broken *now*, not in some hypothetical future.
(See the nearby "FailedAssertion on partprune" thread, wherein we're
trying to fix the planner's handling of exactly such plans.)

What I'm thinking might be a more appropriate thing, at least for
getting v11 out the door, is to refuse to generate partitionwise
joins when any whole-row vars are involved.  (Perhaps that's not
enough to dodge all the problems, though?)

Now, that's a bit of a problem for postgres_fdw, because it seems to
insist on injecting WRVs even when the query text does not require any.
Why is that, and can't we get rid of it?  It's horrid for performance
even without any of these other considerations.  But if we fail to get
rid of that in time for v11, it would mean that postgres_fdw can't
participate in PWJ, which isn't great but I think we could live with it
for now.

BTW, what exactly do we think the production status of PWJ is, anyway?
I notice that upthread it was stated that enable_partitionwise_join
defaults to off, as indeed it still does, and we'd turn it on later
when we'd gotten rid of some memory-hogging problems.  If that hasn't
happened yet (and I don't see any open item about considering enabling
PWJ by default for v11), then I have exactly no hesitation about
lobotomizing PWJ as hard as we need to to mask this problem for v11.
I'd certainly prefer a solution along those lines to either of these
patches.

regards, tom lane

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-02 Thread Andres Freund

Hi,

On 2018-08-02 15:19:49 -0400, Robert Haas wrote:
> Does anyone else want to weigh in on this?  It seems to me that there
> are several people who are quite willing to complain about the fact
> that this hasn't been fixed, but not willing to express an opinion
> about the shape of the fix.  Either the RMT needs to take executive
> action, or we need some more votes.

(taking my RMT hat off, as it's not been coordinated)

Well, I think it's enough in the weeds that not too many people are
going to have an informed opinion on this, and I'm not sure uninformed
opinions help. But I do think the fact that one of the authors and the
committer are on one side, and Tom seems to tentatively agree, weighs
quite a bit here.  As far as I can tell, it's "just" Etsuro on the other
side - obviously his opinion counts, but it doesn't outweigh others.  I
think if you feel confident, you should be ok just commit the approach
you favor, and I read you as being confident that that's the better
approach.

Greetings,

Andres Freund

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-02 Thread Robert Haas

On Thu, Aug 2, 2018 at 7:20 AM, Etsuro Fujita
 wrote:
> The new approach seems to me more localized than what Ashutosh had proposed.
> One obvious advantage of the new approach is that we no longer need changes
> to postgres_fdw for it to work for partitionwise joins, which would apply
> for any other FDWs, making the FDW authors' life easy.

Well, I don't know what to say here expect that I don't agree.  I
think Ashutosh's approach is a pretty natural extension of the system
that we have now.  It involves needing to handle converted whole row
vars in some new places, most of which were handled in the original
patch, but postgres_fdw was missed.  Your approach involves changing
the meaning of the target list, but only in narrow corner cases.  I
completely disagree that we can say it's less invasive.  It may indeed
be less work for extension authors, though, though perhaps at the
expense of moving the conversion from the remote server to the local
one.

>> But in general, with your approach, any code that
>> looks at the tlist for a child-join has to know that the tlist is to
>> be used as-is *except that* ConvertRowtypeExpr may need to be
>> inserted.  I think that special case could be easy to overlook, and it
>> seems pretty arbitrary.
>
> I think we can check to see whether the conversion is needed, from the flags
> added to RelOptInfo.

Sure, I don't dispute that it can be made to work.  I just think it's
an ugly kludge.

> I have to admit that it's weird to adjust the child's targetlist in that
> case when putting the child under the parent's Append/MergeAppend, but IMHO
> I think that would be much localized.

Yeah, I just don't agree at all.

Does anyone else want to weigh in on this?  It seems to me that there
are several people who are quite willing to complain about the fact
that this hasn't been fixed, but not willing to express an opinion
about the shape of the fix.  Either the RMT needs to take executive
action, or we need some more votes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-02 Thread Etsuro Fujita


(2018/08/02 4:30), Robert Haas wrote:

On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita
  wrote:

I updated the patch that way.  Updated patch attached.  I fixed a bug and
did a bit of cleanups as well.


Looking this over from a technical point of view, I think it's better
than what you proposed before but I still don't agree with the
approach.  I don't think there is any getting around the fact that
converted whole-row vars are going to result in some warts.  Ashutosh
inserted most of the necessary warts in the original partition-wise
join patch, but missed some cases in postgres_fdw; his approach is,
basically, to add the matching warts there.  Your approach is to rip
out all of those warts and insert different ones.  You've simplified
build_tlist_index_other_vars() and add_placeholders_to_joinrel() and
build_joinrel_tlist() to basically what they were before
partition-wise join went in.  On the other hand, RelOptInfo grows
three new fields,


Sorry, I forgot to remove one of the fields that isn't needed anymore. 
RelOptInfo needs two new fields, in the new approach.



set_append_rel_size() ends up more complicated than
it was before when you include the node code added in the
build_childrel_tlist function it calls, build_child_joinrel() has a
different set of complications, and most importantly
create_append_path() and create_merge_append_path() now do surgery on
their input paths that knows specifically about the
converted-whole-row var case.  I would be glad to be rid of the stuff
that you're ripping out, but in terms of complexity, I don't think we
really come out ahead with the stuff you're adding.


The new approach seems to me more localized than what Ashutosh had 
proposed.  One obvious advantage of the new approach is that we no 
longer need changes to postgres_fdw for it to work for partitionwise 
joins, which would apply for any other FDWs, making the FDW authors' 
life easy.



I'm also still concerned about the design.  In your old approach, you
were creating the paths with with the wrong target list and then
fixing it when you turned the paths into a plan.  In your new
approach, you're still creating the paths with the wrong target list,
but now you're fixing it when you put an Append or MergeAppend path on
top of them.  I still think it's a bad idea to have anything other
than the correct paths in the target list from the beginning.


I don't think so; actually, the child's targetlist would not have to be 
the same as the parent's until the child gets in under the parent's 
Append or MergeAppend.  So, I modified the child's target list so as to 
make it easy to join the child relations.



For one
thing, what happens if no Append or MergeAppend is ever put on top of
them?  One way that can happen today is partition-wise aggregate,
although I haven't been able to demonstrate a bug, so maybe that's
covered somehow.


Right.  In the new approach, the targetlist for the topmost child 
scan/join is guaranteed to be the same as that for the topmost parent 
scan/join by the planner work that I added.



But in general, with your approach, any code that
looks at the tlist for a child-join has to know that the tlist is to
be used as-is *except that* ConvertRowtypeExpr may need to be
inserted.  I think that special case could be easy to overlook, and it
seems pretty arbitrary.


I think we can check to see whether the conversion is needed, from the 
flags added to RelOptInfo.



A tlist is a list of arbitrary expressions to
be produced; with this patch, we'd end up with the tlist being the
list of expressions to be produced in every case *except for*
child-joins containing whole-row-vars.  I just can't get behind a
strange exception like that.


I have to admit that it's weird to adjust the child's targetlist in that 
case when putting the child under the parent's Append/MergeAppend, but 
IMHO I think that would be much localized.


Thanks for the comments!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-02 Thread Etsuro Fujita


(2018/08/02 2:44), Robert Haas wrote:

Tom, Ashutosh, and I all seem to agree that we shouldn't try to
re-jigger things at create-plan time.  I think that a 3-1 consensus
against your proposal is sufficient to say we shouldn't go that way.


Agreed.


Now, here you've got a new approach which avoids that, which I have
not yet reviewed.  I'll try to spend some time on it this afternoon,
but really, I think it's too late for this.  This bug was reported in
February, and we're supposed to be pushing 11 final out the door in
not much more than a month.  Proposing a new approach in August is not
good.


Agreed.


Can't we just do what Ashutosh proposed for now and revisit
this for v12?


I think that may be possible.

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-01 Thread Robert Haas

On Wed, Aug 1, 2018 at 7:44 AM, Etsuro Fujita
 wrote:
> I updated the patch that way.  Updated patch attached.  I fixed a bug and
> did a bit of cleanups as well.

Looking this over from a technical point of view, I think it's better
than what you proposed before but I still don't agree with the
approach.  I don't think there is any getting around the fact that
converted whole-row vars are going to result in some warts.  Ashutosh
inserted most of the necessary warts in the original partition-wise
join patch, but missed some cases in postgres_fdw; his approach is,
basically, to add the matching warts there.  Your approach is to rip
out all of those warts and insert different ones.  You've simplified
build_tlist_index_other_vars() and add_placeholders_to_joinrel() and
build_joinrel_tlist() to basically what they were before
partition-wise join went in.  On the other hand, RelOptInfo grows
three new fields, set_append_rel_size() ends up more complicated than
it was before when you include the node code added in the
build_childrel_tlist function it calls, build_child_joinrel() has a
different set of complications, and most importantly
create_append_path() and create_merge_append_path() now do surgery on
their input paths that knows specifically about the
converted-whole-row var case.  I would be glad to be rid of the stuff
that you're ripping out, but in terms of complexity, I don't think we
really come out ahead with the stuff you're adding.

I'm also still concerned about the design.  In your old approach, you
were creating the paths with with the wrong target list and then
fixing it when you turned the paths into a plan.  In your new
approach, you're still creating the paths with the wrong target list,
but now you're fixing it when you put an Append or MergeAppend path on
top of them.  I still think it's a bad idea to have anything other
than the correct paths in the target list from the beginning.  For one
thing, what happens if no Append or MergeAppend is ever put on top of
them?  One way that can happen today is partition-wise aggregate,
although I haven't been able to demonstrate a bug, so maybe that's
covered somehow.  But in general, with your approach, any code that
looks at the tlist for a child-join has to know that the tlist is to
be used as-is *except that* ConvertRowtypeExpr may need to be
inserted.  I think that special case could be easy to overlook, and it
seems pretty arbitrary.  A tlist is a list of arbitrary expressions to
be produced; with this patch, we'd end up with the tlist being the
list of expressions to be produced in every case *except for*
child-joins containing whole-row-vars.  I just can't get behind a
strange exception like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-01 Thread Robert Haas

On Wed, Aug 1, 2018 at 7:46 AM, Etsuro Fujita
 wrote:
> I posted the updated patch [1].

Etsuro-san:

I really think we should just go with what Ashutosh had proposed.
Tom, Ashutosh, and I all seem to agree that we shouldn't try to
re-jigger things at create-plan time.  I think that a 3-1 consensus
against your proposal is sufficient to say we shouldn't go that way.

Now, here you've got a new approach which avoids that, which I have
not yet reviewed.  I'll try to spend some time on it this afternoon,
but really, I think it's too late for this.  This bug was reported in
February, and we're supposed to be pushing 11 final out the door in
not much more than a month.  Proposing a new approach in August is not
good.  Can't we just do what Ashutosh proposed for now and revisit
this for v12?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-01 Thread Etsuro Fujita


(2018/08/01 5:31), Alvaro Herrera wrote:

On 2018-Jul-31, Etsuro Fujita wrote:

(2018/07/31 4:06), Andres Freund wrote:

On 2018-07-20 08:38:09 -0400, Robert Haas wrote:

I'm going to study this some more now, but I really think this is
going in the wrong direction.


We're going to have to get somewhere on this topic soon. This thread has
been open for nearly half a year, and we're late in the beta phase now.
What can we do to get to an agreement soon?


I'd like to propose an updated patch as proposed in [1].


But there is no patch there, and neither there is agreement from the
other party that the approach described there is sound.


I posted the updated patch [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B619D23.401%40lab.ntt.co.jp

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-01 Thread Etsuro Fujita


(2018/07/26 21:11), Etsuro Fujita wrote:

(2018/07/26 5:27), Robert Haas wrote:

Well, I could have the wrong idea here, but I tend to think allowing
for ConvertRowTypeExpr elsewhere won't be that bad.


I still don't like that because in my opinion, changes needed for that
would not be localized, and that would make code complicated more than
necessary.

As I mentioned in a previous email, another idea to avoid that would be
to adjust tlists for children at path creation time, not plan creation
time; we could adjust the tlist for each of subpaths accumulated for an
Append/MergeAppend path in add_paths_to_append_rel called from
set_append_rel_pathlist or generate_partitionwise_join_paths, with
create_projection_path adding ConvertRowTypeExpr. It seems unlikely that
performing create_projection_path to such a subpath would change its
property of being the cheapest, so it would be safe to adjust the tlists
that way. This would not require making create_plan complicated anymore.
I might be missing something, though.


I updated the patch that way.  Updated patch attached.  I fixed a bug 
and did a bit of cleanups as well.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8337,8344  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8337,8345 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8391,8416  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
  
  -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
 ->  Append
   ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
   ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
  (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8392,8427 
  
  -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!QUERY PLAN   
! 
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
 ->  Append
   ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) FULL JOIN (public.ftprt2_p1 t2)
   ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) FULL JOIN (public.ftprt2_p2 t2)
  (7 rows)
  
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!wr   |   wr   
  +
   (0,0,) | (0,0,)
+  (50,50,0001)   | 
+  (100,100,0002) | 
   (150,150,0003) | (150,150,0003)
+  (200,200,0004) | 
   (250,250,0005) | (250,250,0005)
+  (300,300,0006) | 
+  (350,350,0007) | 
   (400,400,0008) | (400,400,0008)
!  (450,450,0009)

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-31 Thread Alvaro Herrera

On 2018-Jul-31, Etsuro Fujita wrote:

> (2018/07/31 4:06), Andres Freund wrote:
> > On 2018-07-20 08:38:09 -0400, Robert Haas wrote:
> > > I'm going to study this some more now, but I really think this is
> > > going in the wrong direction.
> > 
> > We're going to have to get somewhere on this topic soon. This thread has
> > been open for nearly half a year, and we're late in the beta phase now.
> > What can we do to get to an agreement soon?
> 
> I'd like to propose an updated patch as proposed in [1].

But there is no patch there, and neither there is agreement from the
other party that the approach described there is sound.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-30 Thread Etsuro Fujita


(2018/07/31 4:06), Andres Freund wrote:

On 2018-07-20 08:38:09 -0400, Robert Haas wrote:

I'm going to study this some more now, but I really think this is
going in the wrong direction.


We're going to have to get somewhere on this topic soon. This thread has
been open for nearly half a year, and we're late in the beta phase now.
What can we do to get to an agreement soon?


I'd like to propose an updated patch as proposed in [1].

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5B59BA55.30200%40lab.ntt.co.jp

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-30 Thread Andres Freund

Hi,

On 2018-07-20 08:38:09 -0400, Robert Haas wrote:
> I'm going to study this some more now, but I really think this is
> going in the wrong direction.

We're going to have to get somewhere on this topic soon. This thread has
been open for nearly half a year, and we're late in the beta phase now.
What can we do to get to an agreement soon?

Greetings,

Andres Freund

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-26 Thread Etsuro Fujita


(2018/07/26 5:27), Robert Haas wrote:

On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
  wrote:

Isn't that assumption fundamental to your whole approach?


I don't think so.  What I mean here is: currently the subplan would be a
scan/join node, but in future we might have eg, a Sort node atop the
scan/join node, so it would be better to update the patch to handle such a
case as well.


But how would you do that?


What I had in mind was to insert a Rusult node with 
inject_projection_plan and adjust the tlist of the Result, as done for 
adding sort columns to a tlist in prepare_sort_from_pathkeys.



I think that's a bad idea.  The target list affects lots
of things, like costing.  If we don't insert a ConvertRowTypeExpr into
the child's target list, the costing will be wrong to the extent that
ConvertRowTypeExpr has any cost, which it does.



Actually, this is not true at least currently, because
set_append_rel_size
doesn't do anything about the costing:



Why would it?  Append can't project, so the cost of any expressions
that appear in its target list is irrelevant.  What is affected is the
cost of the scans below the Append -- see e.g. cost_seqscan(), which
uses the data produced by set_pathtarget_cost_width().


By set_rel_size()?


Sorry, I don't understand what you mean by this.


I think the data used by such a costing function is computed by 
set_rel_size in set_append_rel_size, not set_pathtarget_cost_width; in 
the case of a plain partition, for example, set_rel_size would call 
set_plain_rel_size, and then set_plain_rel_size would eventually call 
set_rel_width, which sets reltarget->cost, which I think would be used 
by e.g., cost_seqscan.  cost_qual_eval_node, which is called in 
set_rel_width for computing the cost of ConvertRowTypeExpr, ignores that 
expression, so currently, we don't charge any cost for it to the 
partition's reltarget->cost, and to the cost of a scan below the Append.



I'm not sure that's a good idea, because I think we have a trade-off
relation; the more we make create_plan simple, the more we need to make
earlier states of the planner complicated.

And it looks to me like the partitionwise join code is making earlier (and
later) stages of the planner too complicated, to make create_plan simple.


I think that create_plan is *supposed* to be simple.  Its purpose is
to prune away data that's only needed during planning and add things
that can be computed at the last minute which are needed at execution
time.  Making it do anything else is, in my opinion, not good.


I agree on that point.


When considering paritionwise joining, it would make things complicated to
have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed
upthread, it deviates from the planner's assumption that a rel's targetlist
would only include Vars and PHVs.  So, I propose to include a child
whole-row Var in the targetlist instead, in which case, we need to fix the
Var after the fact, but can avoid making many other parts of the planner
complicated.


Well, I could have the wrong idea here, but I tend to think allowing
for ConvertRowTypeExpr elsewhere won't be that bad.


I still don't like that because in my opinion, changes needed for that 
would not be localized, and that would make code complicated more than 
necessary.


As I mentioned in a previous email, another idea to avoid that would be 
to adjust tlists for children at path creation time, not plan creation 
time; we could adjust the tlist for each of subpaths accumulated for an 
Append/MergeAppend path in add_paths_to_append_rel called from 
set_append_rel_pathlist or generate_partitionwise_join_paths, with 
create_projection_path adding ConvertRowTypeExpr.  It seems unlikely 
that performing create_projection_path to such a subpath would change 
its property of being the cheapest, so it would be safe to adjust the 
tlists that way.  This would not require making create_plan complicated 
anymore.  I might be missing something, though.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-25 Thread Ashutosh Bapat

On Thu, Jul 26, 2018 at 2:12 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
>>  wrote:
>>> I'm not sure that's a good idea, because I think we have a trade-off
>>> relation; the more we make create_plan simple, the more we need to make
>>> earlier states of the planner complicated.
>>>
>>> And it looks to me like the partitionwise join code is making earlier (and
>>> later) stages of the planner too complicated, to make create_plan simple.
>
>> I think that create_plan is *supposed* to be simple.  Its purpose is
>> to prune away data that's only needed during planning and add things
>> that can be computed at the last minute which are needed at execution
>> time.  Making it do anything else is, in my opinion, not good.
>
> I tend to agree with Robert on this.  In particular, if you put anything
> into create_plan or later that affects cost estimates, you're really doing
> it wrong, because changing those estimates might've changed the plan
> entirely.  (As I recall, we have a couple of cheats like that now,
> associated with dropping no-op projection and subquery scan nodes.
> But let's not add more.)

+1

>
> TBH, I think this entire discussion is proving what sheer folly it was
> to allow partitions to have rowtypes not identical to their parent.
> But I suppose we're stuck with that now.
>

Every table created has a different rowtype and when the partition's
rowtype is different from that of the partitioned table, we add a
ConvertRowtypeExpr node on top of the whole-row expression.

2153 if (appinfo->parent_reltype != appinfo->child_reltype)
2154 {
2155 ConvertRowtypeExpr *r =
makeNode(ConvertRowtypeExpr);

We may be able to set rowtype of a partition to be same as the rowtype
of a partitioned table when a partition table is created using an
ALTER TABLE syntax. But are you suggesting that we change the reltype
of a partition being attached? What happens when we detach a partition
and let the table live independent of the partitioned table? Do we
create a new reltype?

I am not even going into the problem when we try to attach a partition
with different tuple layout. I do believe that such usecases arise and
rewriting the table being attached is a viable option.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-25 Thread Tom Lane

Robert Haas  writes:
> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
>  wrote:
>> I'm not sure that's a good idea, because I think we have a trade-off
>> relation; the more we make create_plan simple, the more we need to make
>> earlier states of the planner complicated.
>> 
>> And it looks to me like the partitionwise join code is making earlier (and
>> later) stages of the planner too complicated, to make create_plan simple.

> I think that create_plan is *supposed* to be simple.  Its purpose is
> to prune away data that's only needed during planning and add things
> that can be computed at the last minute which are needed at execution
> time.  Making it do anything else is, in my opinion, not good.

I tend to agree with Robert on this.  In particular, if you put anything
into create_plan or later that affects cost estimates, you're really doing
it wrong, because changing those estimates might've changed the plan
entirely.  (As I recall, we have a couple of cheats like that now,
associated with dropping no-op projection and subquery scan nodes.
But let's not add more.)

TBH, I think this entire discussion is proving what sheer folly it was
to allow partitions to have rowtypes not identical to their parent.
But I suppose we're stuck with that now.

regards, tom lane

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-25 Thread Robert Haas

On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
 wrote:
>> Isn't that assumption fundamental to your whole approach?
>
> I don't think so.  What I mean here is: currently the subplan would be a
> scan/join node, but in future we might have eg, a Sort node atop the
> scan/join node, so it would be better to update the patch to handle such a
> case as well.

But how would you do that?

 I think that's a bad idea.  The target list affects lots
 of things, like costing.  If we don't insert a ConvertRowTypeExpr into
 the child's target list, the costing will be wrong to the extent that
 ConvertRowTypeExpr has any cost, which it does.
>>>
>>>
>>> Actually, this is not true at least currently, because
>>> set_append_rel_size
>>> doesn't do anything about the costing:
>>
>>
>> Why would it?  Append can't project, so the cost of any expressions
>> that appear in its target list is irrelevant.  What is affected is the
>> cost of the scans below the Append -- see e.g. cost_seqscan(), which
>> uses the data produced by set_pathtarget_cost_width().
>
> By set_rel_size()?

Sorry, I don't understand what you mean by this.

> I'm not sure that's a good idea, because I think we have a trade-off
> relation; the more we make create_plan simple, the more we need to make
> earlier states of the planner complicated.
>
> And it looks to me like the partitionwise join code is making earlier (and
> later) stages of the planner too complicated, to make create_plan simple.

I think that create_plan is *supposed* to be simple.  Its purpose is
to prune away data that's only needed during planning and add things
that can be computed at the last minute which are needed at execution
time.  Making it do anything else is, in my opinion, not good.

> When considering paritionwise joining, it would make things complicated to
> have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed
> upthread, it deviates from the planner's assumption that a rel's targetlist
> would only include Vars and PHVs.  So, I propose to include a child
> whole-row Var in the targetlist instead, in which case, we need to fix the
> Var after the fact, but can avoid making many other parts of the planner
> complicated.

Well, I could have the wrong idea here, but I tend to think allowing
for ConvertRowTypeExpr elsewhere won't be that bad.

Does anyone else want to weigh in on this issue?  Tom, perhaps?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-24 Thread Etsuro Fujita


(2018/07/24 3:09), Robert Haas wrote:

On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita
  wrote:

I have to admit that that is not a good idea.  So, I'll update the patch so
that we don't assume the projection capability of the subplan anymore, if we
go this way.


Isn't that assumption fundamental to your whole approach?


I don't think so.  What I mean here is: currently the subplan would be a 
scan/join node, but in future we might have eg, a Sort node atop the 
scan/join node, so it would be better to update the patch to handle such 
a case as well.



Also, even today, this would fail if the subplan happened to be
a Sort, and it's not very obvious why that couldn't happen.


You mean the MergeAppend case?  In that case, the patch will adjust the
subplan before adding a Sort above the subplan, so that could not happen.


That would only help if create_merge_append_plan() itself inserted a
Sort node.  It wouldn't help if the Sort node came from a child path
manufactured by create_sort_path().


As I mentioned above, I think we could handle such a case as well.


I think that's a bad idea.  The target list affects lots
of things, like costing.  If we don't insert a ConvertRowTypeExpr into
the child's target list, the costing will be wrong to the extent that
ConvertRowTypeExpr has any cost, which it does.


Actually, this is not true at least currently, because set_append_rel_size
doesn't do anything about the costing:


Why would it?  Append can't project, so the cost of any expressions
that appear in its target list is irrelevant.  What is affected is the
cost of the scans below the Append -- see e.g. cost_seqscan(), which
uses the data produced by set_pathtarget_cost_width().


By set_rel_size()?


Some createplan.c routines already change the tlists of their nodes. For
example, create_merge_append_plan would add sort columns to each of its
subplans if necessary.  I think it would be similar to that to add a
ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist.


You have a point, but I think that code is actually not a very good
idea, and I'd like to see us do less of that sort of thing, not more.
Any case in which we change the plan while creating it has many of the
same problems that I discussed in the previous email.  For example,
create_merge_append_path() has to know that a Sort node might get
inserted and set the costing accordingly.  If the callers guaranteed
that the necessary sort path had already been inserted, then we
wouldn't need that special handling.


I'm not sure that's a good idea, because I think we have a trade-off 
relation; the more we make create_plan simple, the more we need to make 
earlier states of the planner complicated.


And it looks to me like the partitionwise join code is making earlier 
(and later) stages of the planner too complicated, to make create_plan 
simple.



Also, that code is adding additional columns, computed from the
columns we have available, so that we can sort.  Those extra columns
then get discarded at the next level of the Plan tree.  What you're
trying to do is different.  Perhaps this is too harsh a judgement, but
it looks to me like you're using a deliberately-wrong representation
of the value that you ultimately want to produce and then patching it
up after the fact.


When considering paritionwise joining, it would make things complicated 
to have a ConvertRowtypeExpr in a partrel's targetlist, because as 
discussed upthread, it deviates from the planner's assumption that a 
rel's targetlist would only include Vars and PHVs.  So, I propose to 
include a child whole-row Var in the targetlist instead, in which case, 
we need to fix the Var after the fact, but can avoid making many other 
parts of the planner complicated.



That seems quite a bit worse than what the
existing code is doing.


One idea to address that concern would be to adjust the pathtarget of 
each path created in generate_partitionwise_join_paths accordingly.  I'm 
missing something, though.



I reviewed his patch, and thought that that would fix the issue, but this is
about the current design on the child tlist handling in partitionise join
rather than that patch: it deviates from the assumption that we had for the
scan/join planning till PG10 that "a rel's targetlist would only include
Vars and PlaceHolderVars", and turns out that there are a lot of places
where we need to modify code to suppress that deviation, as partly shown in
that patch.  So, ISTM that the current design in itself is not that
localized.


I used to think that was the assumption, too, but it seems to me that
Tom told me a while back that it was never really true, and that
assumption was in my head.  Unfortunately, I don't have a link to that
email handy.  Either way, I think the solution is to get the tlist
right from the start and cope with the consequences downstream, not
start with the wrong thing and try to fix it later.


I'm not fully convinced that's really the right solution,

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-23 Thread Robert Haas

On Mon, Jul 23, 2018 at 3:49 AM, Etsuro Fujita
 wrote:
> I have to admit that that is not a good idea.  So, I'll update the patch so
> that we don't assume the projection capability of the subplan anymore, if we
> go this way.

Isn't that assumption fundamental to your whole approach?

>> Also, even today, this would fail if the subplan happened to be
>> a Sort, and it's not very obvious why that couldn't happen.
>
> You mean the MergeAppend case?  In that case, the patch will adjust the
> subplan before adding a Sort above the subplan, so that could not happen.

That would only help if create_merge_append_plan() itself inserted a
Sort node.  It wouldn't help if the Sort node came from a child path
manufactured by create_sort_path().

>> I think that's a bad idea.  The target list affects lots
>> of things, like costing.  If we don't insert a ConvertRowTypeExpr into
>> the child's target list, the costing will be wrong to the extent that
>> ConvertRowTypeExpr has any cost, which it does.
>
> Actually, this is not true at least currently, because set_append_rel_size
> doesn't do anything about the costing:

Why would it?  Append can't project, so the cost of any expressions
that appear in its target list is irrelevant.  What is affected is the
cost of the scans below the Append -- see e.g. cost_seqscan(), which
uses the data produced by set_pathtarget_cost_width().

> Some createplan.c routines already change the tlists of their nodes. For
> example, create_merge_append_plan would add sort columns to each of its
> subplans if necessary.  I think it would be similar to that to add a
> ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist.

You have a point, but I think that code is actually not a very good
idea, and I'd like to see us do less of that sort of thing, not more.
Any case in which we change the plan while creating it has many of the
same problems that I discussed in the previous email.  For example,
create_merge_append_path() has to know that a Sort node might get
inserted and set the costing accordingly.  If the callers guaranteed
that the necessary sort path had already been inserted, then we
wouldn't need that special handling.

Also, that code is adding additional columns, computed from the
columns we have available, so that we can sort.  Those extra columns
then get discarded at the next level of the Plan tree.  What you're
trying to do is different.  Perhaps this is too harsh a judgement, but
it looks to me like you're using a deliberately-wrong representation
of the value that you ultimately want to produce and then patching it
up after the fact.  That seems quite a bit worse than what the
existing code is doing.

> I reviewed his patch, and thought that that would fix the issue, but this is
> about the current design on the child tlist handling in partitionise join
> rather than that patch: it deviates from the assumption that we had for the
> scan/join planning till PG10 that "a rel's targetlist would only include
> Vars and PlaceHolderVars", and turns out that there are a lot of places
> where we need to modify code to suppress that deviation, as partly shown in
> that patch.  So, ISTM that the current design in itself is not that
> localized.

I used to think that was the assumption, too, but it seems to me that
Tom told me a while back that it was never really true, and that
assumption was in my head.  Unfortunately, I don't have a link to that
email handy.  Either way, I think the solution is to get the tlist
right from the start and cope with the consequences downstream, not
start with the wrong thing and try to fix it later.  To see an example
of why I think that's a valuable approach, see
11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5, especially the changes in
the regression test outputs.  The old code discovered after it had
generated an Append that it had the wrong tlist, but since the Append
can't project, it had to add a Result node.  With the new code, we get
the children of the Append to produce the right output from the start,
and then the Append just needs to concatenate all that output, so no
Result node is needed.  As noted in the commit message, it also made
the costing more accurate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-23 Thread Ashutosh Bapat

On Fri, Jul 20, 2018 at 8:56 PM, Robert Haas  wrote:
[ ... clipped portion ...]
>
> In short, plan creation time is just the wrong time to change the
> plan.  It's just a time to translate the plan from the form needed by
> the planner to the form needed by the executor.  It would be fine if
> the change you were making were only cosmetic, but it's not.
>

Agree with all that, including the clipped paras.

> After looking over both patches, I think Ashutosh Bapat has basically
> the right idea.  What he is proposing is a localized fix that doesn't
> seem to make any changes to the way things work overall.  I do think
> his patches need to be fixed up a bit to avoid conflating
> ConvertRowtypeExpr with "converted whole-row reference."  The two are
> certainly not equivalent; the latter is a narrow special case.  Some
> of the comments could use some more wordsmithing, too, I think.  Apart
> from those gripes, though, I think it's solving the problem the
> correct way: don't build the wrong plan and try to fix it, just build
> the right plan from the beginning.

I will work on that if everyone agrees that that's the right way to
go. Fujita-san seems to have some arguments still. I have argued on
the same lines as yours but your explanation is better. I don't have
anything more to add. I will wait for that to be resolved.

>
> There are definitely some things not to like about this approach.  In
> particular, I definitely agree that treating a converted whole-row
> reference specially is not very nice.  It feels like it might be
> substantially cleaner to be able to represent this idea as a single
> node rather than a combination of ConvertRowtypeExpr and var with
> varattno = 0.  Perhaps in the future we ought to consider either (a)
> trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
> inventing a new WholeRowExpr node that stores two RTIs, one for the
> relation that's generating the whole-row reference and the other for
> the relation that is controlling the column ordering of the result or
> (c) allowing a Var to represent a whole-row expression that has to
> produce its outputs according to the ordering of some other RTE.

I never liked representing a whole-row expression as a Var worst with
varattno = 0. We should have always casted it as RowExpr(set of vars,
one var per column). This problem wouldn't arise then. Many other
problems wouldn't arise then, I think. I think we do that so that we
can convert a tuple from buffer into a datum and put it in place of
Var with varattno = 0. Given that I prefer (a) or (b) in all the
cases. If not that, then c. But I agree that we have to avoid
ConvertRowtypeExpr being used in this case.

> But
> I don't think it's wise or necessary to whack that around just to fix
> this bug; it is refactoring or improvement work best left to a future
> release.
>
> Also, it is definitely a real shame that we have to build attr_needed
> data for each child separately.  Right now, we've got to build a whole
> lot of things for each child individually, and that's slow and
> inefficient.  We're not going to scale nicely to large partitioning
> hierarchies unless we can figure out some way of minimizing the work
> we do for each partition.  So, the fact that Fujii-san's approach
> avoids needing to compute attr_needed in some cases is very appealing.
> However, in my opinion, it's nowhere near appealing enough to justify
> trying to do surgery on the target list at plan-creation time.  I
> think we need to leave optimizing this to a future effort.
> Partition-wise join/aggregate, and partitioning in general, need a lot
> of optimization in a lot of places, and fortunately there are people
> working on that, but our goal here should just be to fix things up
> well enough that we can ship it.

I agree.

> I don't see anything in Ashutosh's
> patch that is so ugly that we can't live with it for the time being.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-23 Thread Etsuro Fujita


(2018/07/21 0:26), Robert Haas wrote:

On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas  wrote:

This looks like a terrible design to me.  If somebody in future
invents a new plan type that is not projection-capable, then this
could fail an assertion here and there won't be any simple fix.  And
if you reach here and the child targetlists somehow haven't been
adjusted, then you won't even fail an assertion, you'll just crash (or
maybe return wrong answers).


To elaborate on this thought a bit, it is currently the case that all
scan and join types are projection-capable, and most likely we would
make any such node types added in the future projection-capable as
well, but it still doesn't seem like a good idea to me to have
tangentially-related parts of the system depend on that in subtle
ways.


I have to admit that that is not a good idea.  So, I'll update the patch 
so that we don't assume the projection capability of the subplan 
anymore, if we go this way.



Also, even today, this would fail if the subplan happened to be
a Sort, and it's not very obvious why that couldn't happen.


You mean the MergeAppend case?  In that case, the patch will adjust the 
subplan before adding a Sort above the subplan, so that could not happen.



More broadly, the central idea of this patch is that we can set the
target list for a child rel to something other than the target list
that we expect it will eventually produce, and then at plan creation
time fix it.


Yeah, the patch would fix the the final output to the appendrel parent 
at plan creation time if necessary, but it would produces the tlist of 
an intermediate child scan/join node as written in the child relation's 
reltarget at that time.



I think that's a bad idea.  The target list affects lots
of things, like costing.  If we don't insert a ConvertRowTypeExpr into
the child's target list, the costing will be wrong to the extent that
ConvertRowTypeExpr has any cost, which it does.


Actually, this is not true at least currently, because 
set_append_rel_size doesn't do anything about the costing:


 * NB: the resulting childrel->reltarget->exprs may contain 
arbitrary
 * expressions, which otherwise would not occur in a rel's 
targetlist.

 * Code that might be looking at an appendrel child must cope with
 * such.  (Normally, a rel's targetlist would only include Vars and
 * PlaceHolderVars.)  XXX we do not bother to update the cost 
or width
 * fields of childrel->reltarget; not clear if that would be 
useful.



Any other current or
future property that is computed based on the target list --
parallel-safety, width, security-level, whatever -- could also
potentially end up with a wrong value if the target list as written
does not match the target list as will actually be produced.  It's
true that, in all of these areas, the ConvertRowTypeExpr isn't likely
to have a lot of impact; it probably won't have a big effect on
costing and may not actually cause a problem for the other things that
I mentioned.  On the other hand, if it does cause a serious problem,
what options would we have for fixing it other than to back out your
entire patch and solve the problem some other way?  The idea that
derived properties won't be correct unless they are based on the real
target list is pretty fundamental, and I think deviating from the idea
that we get the correct target list first and then compute the derived
properties afterward is likely to cause real headaches for future
developers.

In short, plan creation time is just the wrong time to change the
plan.  It's just a time to translate the plan from the form needed by
the planner to the form needed by the executor.  It would be fine if
the change you were making were only cosmetic, but it's not.


Some createplan.c routines already change the tlists of their nodes. 
For example, create_merge_append_plan would add sort columns to each of 
its subplans if necessary.  I think it would be similar to that to add a 
ConvertRowtypeExpr above a child whole-row Var in the subplan's tlist.



After looking over both patches, I think Ashutosh Bapat has basically
the right idea.  What he is proposing is a localized fix that doesn't
seem to make any changes to the way things work overall.


I reviewed his patch, and thought that that would fix the issue, but 
this is about the current design on the child tlist handling in 
partitionise join rather than that patch: it deviates from the 
assumption that we had for the scan/join planning till PG10 that "a 
rel's targetlist would only include Vars and PlaceHolderVars", and turns 
out that there are a lot of places where we need to modify code to 
suppress that deviation, as partly shown in that patch.  So, ISTM that 
the current design in itself is not that localized.



Partition-wise join/aggregate, and partitioning in general, need a lot
of optimization in a lot of places, and fortunately there are people
working on that, but our goal

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-20 Thread Robert Haas

On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas  wrote:
> This looks like a terrible design to me.  If somebody in future
> invents a new plan type that is not projection-capable, then this
> could fail an assertion here and there won't be any simple fix.  And
> if you reach here and the child targetlists somehow haven't been
> adjusted, then you won't even fail an assertion, you'll just crash (or
> maybe return wrong answers).

To elaborate on this thought a bit, it is currently the case that all
scan and join types are projection-capable, and most likely we would
make any such node types added in the future projection-capable as
well, but it still doesn't seem like a good idea to me to have
tangentially-related parts of the system depend on that in subtle
ways.  Also, even today, this would fail if the subplan happened to be
a Sort, and it's not very obvious why that couldn't happen.  If it
can't happen today, it's not obvious why a future code change couldn't
introduce cases where it can happen.

More broadly, the central idea of this patch is that we can set the
target list for a child rel to something other than the target list
that we expect it will eventually produce, and then at plan creation
time fix it.  I think that's a bad idea.  The target list affects lots
of things, like costing.  If we don't insert a ConvertRowTypeExpr into
the child's target list, the costing will be wrong to the extent that
ConvertRowTypeExpr has any cost, which it does.  Any other current or
future property that is computed based on the target list --
parallel-safety, width, security-level, whatever -- could also
potentially end up with a wrong value if the target list as written
does not match the target list as will actually be produced.  It's
true that, in all of these areas, the ConvertRowTypeExpr isn't likely
to have a lot of impact; it probably won't have a big effect on
costing and may not actually cause a problem for the other things that
I mentioned.  On the other hand, if it does cause a serious problem,
what options would we have for fixing it other than to back out your
entire patch and solve the problem some other way?  The idea that
derived properties won't be correct unless they are based on the real
target list is pretty fundamental, and I think deviating from the idea
that we get the correct target list first and then compute the derived
properties afterward is likely to cause real headaches for future
developers.

In short, plan creation time is just the wrong time to change the
plan.  It's just a time to translate the plan from the form needed by
the planner to the form needed by the executor.  It would be fine if
the change you were making were only cosmetic, but it's not.

After looking over both patches, I think Ashutosh Bapat has basically
the right idea.  What he is proposing is a localized fix that doesn't
seem to make any changes to the way things work overall.  I do think
his patches need to be fixed up a bit to avoid conflating
ConvertRowtypeExpr with "converted whole-row reference."  The two are
certainly not equivalent; the latter is a narrow special case.  Some
of the comments could use some more wordsmithing, too, I think.  Apart
from those gripes, though, I think it's solving the problem the
correct way: don't build the wrong plan and try to fix it, just build
the right plan from the beginning.

There are definitely some things not to like about this approach.  In
particular, I definitely agree that treating a converted whole-row
reference specially is not very nice.  It feels like it might be
substantially cleaner to be able to represent this idea as a single
node rather than a combination of ConvertRowtypeExpr and var with
varattno = 0.  Perhaps in the future we ought to consider either (a)
trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
inventing a new WholeRowExpr node that stores two RTIs, one for the
relation that's generating the whole-row reference and the other for
the relation that is controlling the column ordering of the result or
(c) allowing a Var to represent a whole-row expression that has to
produce its outputs according to the ordering of some other RTE.  But
I don't think it's wise or necessary to whack that around just to fix
this bug; it is refactoring or improvement work best left to a future
release.

Also, it is definitely a real shame that we have to build attr_needed
data for each child separately.  Right now, we've got to build a whole
lot of things for each child individually, and that's slow and
inefficient.  We're not going to scale nicely to large partitioning
hierarchies unless we can figure out some way of minimizing the work
we do for each partition.  So, the fact that Fujii-san's approach
avoids needing to compute attr_needed in some cases is very appealing.
However, in my opinion, it's nowhere near appealing enough to justify
trying to do surgery on the target list at plan-creation time.  I
think we need to leave optimizing this to

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-20 Thread Robert Haas

On Wed, Jul 18, 2018 at 8:00 AM, Etsuro Fujita
 wrote:
> [ new patch ]

+ /*
+ * If the child plan is an Append or MergeAppend, the targetlists of its
+ * subplans would have already been adjusted before we get here, so need
+ * to work here.
+ */
+ if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
+ return;
+
+ /* The child plan should be a scan or join */
+ Assert(is_projection_capable_plan(subplan));

This looks like a terrible design to me.  If somebody in future
invents a new plan type that is not projection-capable, then this
could fail an assertion here and there won't be any simple fix.  And
if you reach here and the child targetlists somehow haven't been
adjusted, then you won't even fail an assertion, you'll just crash (or
maybe return wrong answers).

I'm going to study this some more now, but I really think this is
going in the wrong direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-18 Thread Etsuro Fujita


(2018/07/13 23:05), Ashutosh Bapat wrote:

On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita
  wrote:

In this example, the value of the whole-row reference to the child table
ptp1 for that record is ('foo',1), and that of the index expression for that
record is (1,'foo').  Those have different column orders, but the latter
could be mapped to the former by a technique like do_convert_tuple.



The expression in this case would look like ptp1::pt::ptp1 which won't
match targetlist expression ptp1. I am also doubtful that the planner
will be able to deduce that it need to apply an inverse function of
::pt and what exactly such an inverse function is. So index only scan
won't be picked.



we could support index-only scans
with such an index in the case where we have the whole-row reference in the
targetlist, not the index expression itself.


Can you please show an index only scan path being created in this case?


We currently don't consider index-only scan with index expressions, so I 
haven't thought in detail yet about how the planner would work.  But 
once we have that index-only scan, I think we could extend that to the 
case mentioned above, by adding this to the planner: if the index 
expression is of the form var::parenttype, consider that (not only the 
expression itself but) var can be returned from the index.  I think the 
expression like ptp1::pt::ptp1 would be useful to get the value of ptp1 
from the index at execution time.



There's a patch in an adjacent thread started by David Rowley to rip
out Append/MergeAppend when there is only one subplan. So, your
solution won't work there.



Thanks for sharing that information!  I skimmed the thread.  I haven't yet
caught up with all the discussions there, so I think I'm missing something,
but it looks like that we haven't yet reached any consensus on the way to
go.  In my opinion, I like the approach mentioned in [1].  And if we go that
way, my patch seems to fit into that, because in that approach the
Append/MergeAppend could be removed after adjusting the targetlists for its
subplans in create_append_plan/create_merge_append_plan.  Anyway, I'd like
to join in that work for PG12.


Whatever may be the outcome of that work, I think what we fix here
shouldn't require to reverted in a few months from now, just so that
that patch works.


I think we could add that optimization without reverting this change 
because the essential part of this change is to make create_plan adjust 
the tlists of the subplans based on the instruction stored into the 
subplans' RelOptInfos (ie, need_adjust_tlist in the second version of 
the patch).  I think this technique could be extended even to the case 
where we have that optimization.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-18 Thread Etsuro Fujita


(2018/07/04 11:00), Etsuro Fujita wrote:

(2018/07/04 1:35), Andres Freund wrote:

What's the plan forward here?


I still think that this approach would be the right way to go, so I plan
to polish the patch.


The approach would not require extra cycles where partitioning is not 
involved as discussed before, and makes code much simpler, which would 
make maintenance and the FDW author's life easy, so I still think that 
the approach would be the direction to go in.  So, I updated the patch. 
 Here are changes:


* I assumed that the child plan passed to adjust_subplan_tlist was a 
scan or join unless it was Append or MergeAppend, but I noticed I was 
wrong; if the topmost scan/join relation is partitioned, 
apply_scanjoin_target_to_paths might have pushed down ProjectSet or 
Result to each of partitions.  So, in that case the child's plan would 
have ProjectSet or Result atop the scan or join plan.  (I could not 
produce such an example, though.)  To address this, I modified 
build_childrel_tlist so that we just apply adjust_appendrel_attrs to the 
parent's targetlist to get the child's if the parent is the topmost 
scan/join relation.  So, we don't need to care about that in 
adjust_subplan_tlist.  I think this change would also increase the 
efficiency for that case.  I added regression test cases for that as well.


* I fixed an oversight in outfuncs.c.

* I renamed member variables I added to RelOptInfo before, and revised 
code/comments a bit.


Attached is an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/postgres-master/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/postgres-master/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8337,8344  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8337,8345 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8391,8416  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
  
  -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
 ->  Append
   ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
   ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
  (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8392,8427 
  
  -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!QUERY PLAN   
! 
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
 ->  Append
   ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) FULL JOIN (public.ftprt2_p1 t2)
   ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) FULL JOIN (public.ftprt2_p2 t2)
  (7 rows)
  
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-13 Thread Ashutosh Bapat

On Thu, Jul 12, 2018 at 4:32 PM, Etsuro Fujita
 wrote:
>
> In this example, the value of the whole-row reference to the child table
> ptp1 for that record is ('foo',1), and that of the index expression for that
> record is (1,'foo').  Those have different column orders, but the latter
> could be mapped to the former by a technique like do_convert_tuple.  So,
> what I suspect is: by producing the value of the whole-row reference from
> that of the index expression like that,

The expression in this case would look like ptp1::pt::ptp1 which won't
match targetlist expression ptp1. I am also doubtful that the planner
will be able to deduce that it need to apply an inverse function of
::pt and what exactly such an inverse function is. So index only scan
won't be picked.

> we could support index-only scans
> with such an index in the case where we have the whole-row reference in the
> targetlist, not the index expression itself.

Can you please show an index only scan path being created in this case?


>
>> If that time is long
>> enough, that's ok. In this case, I agree that if we haven't heard from
>> users till now that they need to create indexes on whole-row
>> expressions, there's chance that we will never implement this feature.
>
>
> I don't object to adding that feature to future versions of PG if required.
>
>So, I think it is safe to have whole-row
> Vars instead of ConvertRowtypeExprs in the targetlist.



 when it's looking for an expression, it finds a whole-row expression
 so it think it needs to add a ConvertRowtypeExpr on that. But when the
 plan is created, there is ConvertRowtypeExpr already, but there is no
 way to know that a new ConvertRowtypeExpr is not needed anymore. So,
 we may have two ConvertRowtypeExprs giving wrong results.
>>>
>>>
>>>
>>> Maybe I'm missing something, but I don't think that we need to worry
>>> about
>>> that, because in the approach I proposed, we only add CREs above
>>> whole-row
>>> Vars in the targetlists for subplans of an Append/MergeAppend for a
>>> partitioned relation at plan creation time.
>>
>>
>> There's a patch in an adjacent thread started by David Rowley to rip
>> out Append/MergeAppend when there is only one subplan. So, your
>> solution won't work there.
>
>
> Thanks for sharing that information!  I skimmed the thread.  I haven't yet
> caught up with all the discussions there, so I think I'm missing something,
> but it looks like that we haven't yet reached any consensus on the way to
> go.  In my opinion, I like the approach mentioned in [1].  And if we go that
> way, my patch seems to fit into that, because in that approach the
> Append/MergeAppend could be removed after adjusting the targetlists for its
> subplans in create_append_plan/create_merge_append_plan.  Anyway, I'd like
> to join in that work for PG12.

Whatever may be the outcome of that work, I think what we fix here
shouldn't require to reverted in a few months from now, just so that
that patch works.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-12 Thread Etsuro Fujita


(2018/07/12 13:38), Ashutosh Bapat wrote:

On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita
  wrote:

(2018/07/11 20:02), Ashutosh Bapat wrote:

On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
   wrote:

Actually, even if we could create such an index on the child table and
the
targetlist had the ConvertRowtypeExpr, the planner would still not be
able
to use an index-only scan with that index; because check_index_only would
not consider that an index-only scan is possible for that index because
that
index is an expression index and that function currently does not
consider
that index expressions are able to be returned back in an index-only
scan.
That behavior of the planner might be improved in future, though.




Right and when we do so, not having ConvertRowtypeExpr in the
targetlist will be a problem.



Yeah, but I don't think that that's unsolvable; because in that case the CRE
as an index expression could be converted back to the whole-row Var's
rowtype by adding another CRE to the index expression for that conversion, I
suspect that that special handling could allow us to support an index-only
scan even when having the whole-row Var instead of the CRE in the
targetlist.


I am not able to understand this. Can you please provide an example?


Sorry, my explanation was not good.  Let me try.  We can't create an 
inherited index on the whole-row reference in a partitioned table, but 
actually, we can directly create the corresponding index on a child table:


postgres=# create table pt (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table ptp1 (b text, a int check (a in (1)));
CREATE TABLE
postgres=# alter table pt attach partition ptp1 for values in (1);
ALTER TABLE
postgres=# create index ptp1_conv_wr_idx on ptp1 ((ptp1::pt));
CREATE INDEX
postgres=# insert into pt values (1, 'foo');
INSERT 0 1
postgres=# select *, ptp1 as wr, ptp1::pt as conv_wr from ptp1;
  b  | a |   wr| conv_wr
-+---+-+-
 foo | 1 | (foo,1) | (1,foo)
(1 row)

In this example, the value of the whole-row reference to the child table 
ptp1 for that record is ('foo',1), and that of the index expression for 
that record is (1,'foo').  Those have different column orders, but the 
latter could be mapped to the former by a technique like 
do_convert_tuple.  So, what I suspect is: by producing the value of the 
whole-row reference from that of the index expression like that, we 
could support index-only scans with such an index in the case where we 
have the whole-row reference in the targetlist, not the index expression 
itself.



(Having said that, I'm not 100% sure we need to solve that
problem when we improve the planner, because there doesn't seem to me to be
enough use-case to justify making the code complicated for that.)  Anyway, I
think that that would be a matter of future versions of PG.


I am afraid that a fix which just works only till we change the other
parts of the system is useful only till that time. At that time, it
needs to be replaced with a different fix.


I agree on that point.


If that time is long
enough, that's ok. In this case, I agree that if we haven't heard from
users till now that they need to create indexes on whole-row
expressions, there's chance that we will never implement this feature.


I don't object to adding that feature to future versions of PG if required.


   So, I think it is safe to have whole-row
Vars instead of ConvertRowtypeExprs in the targetlist.



when it's looking for an expression, it finds a whole-row expression
so it think it needs to add a ConvertRowtypeExpr on that. But when the
plan is created, there is ConvertRowtypeExpr already, but there is no
way to know that a new ConvertRowtypeExpr is not needed anymore. So,
we may have two ConvertRowtypeExprs giving wrong results.



Maybe I'm missing something, but I don't think that we need to worry about
that, because in the approach I proposed, we only add CREs above whole-row
Vars in the targetlists for subplans of an Append/MergeAppend for a
partitioned relation at plan creation time.


There's a patch in an adjacent thread started by David Rowley to rip
out Append/MergeAppend when there is only one subplan. So, your
solution won't work there.


Thanks for sharing that information!  I skimmed the thread.  I haven't 
yet caught up with all the discussions there, so I think I'm missing 
something, but it looks like that we haven't yet reached any consensus 
on the way to go.  In my opinion, I like the approach mentioned in [1]. 
 And if we go that way, my patch seems to fit into that, because in 
that approach the Append/MergeAppend could be removed after adjusting 
the targetlists for its subplans in 
create_append_plan/create_merge_append_plan.  Anyway, I'd like to join 
in that work for PG12.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/1867.1509032831%40sss.pgh.pa.us

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-11 Thread Ashutosh Bapat

On Thu, Jul 12, 2018 at 9:02 AM, Etsuro Fujita
 wrote:
> (2018/07/11 20:02), Ashutosh Bapat wrote:
>>
>> On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
>>   wrote:
>>>
>>> Actually, even if we could create such an index on the child table and
>>> the
>>> targetlist had the ConvertRowtypeExpr, the planner would still not be
>>> able
>>> to use an index-only scan with that index; because check_index_only would
>>> not consider that an index-only scan is possible for that index because
>>> that
>>> index is an expression index and that function currently does not
>>> consider
>>> that index expressions are able to be returned back in an index-only
>>> scan.
>>> That behavior of the planner might be improved in future, though.
>
>
>> Right and when we do so, not having ConvertRowtypeExpr in the
>> targetlist will be a problem.
>
>
> Yeah, but I don't think that that's unsolvable; because in that case the CRE
> as an index expression could be converted back to the whole-row Var's
> rowtype by adding another CRE to the index expression for that conversion, I
> suspect that that special handling could allow us to support an index-only
> scan even when having the whole-row Var instead of the CRE in the
> targetlist.

I am not able to understand this. Can you please provide an example?

> (Having said that, I'm not 100% sure we need to solve that
> problem when we improve the planner, because there doesn't seem to me to be
> enough use-case to justify making the code complicated for that.)  Anyway, I
> think that that would be a matter of future versions of PG.

I am afraid that a fix which just works only till we change the other
parts of the system is useful only till that time. At that time, it
needs to be replaced with a different fix. If that time is long
enough, that's ok. In this case, I agree that if we haven't heard from
users till now that they need to create indexes on whole-row
expressions, there's chance that we will never implement this feature.

>
 At places in planner we match equivalence members
 to the targetlist entries. This matching will fail unexpectedly when
 ConvertRowtypeExpr is removed from a child's targetlist. But again I
 couldn't reproduce a problem when such a mismatch arises.
>>>
>>>
>>>
>>> IIUC, I don't think the planner assumes that for an equivalence member
>>> there
>>> is an matching entry for that member in the targetlist; what I think the
>>> planner assumes is: an equivalence member is able to be computed from
>>> expressions in the targetlist.
>>
>>
>> This is true. However,
>>
>>>   So, I think it is safe to have whole-row
>>> Vars instead of ConvertRowtypeExprs in the targetlist.
>>
>>
>> when it's looking for an expression, it finds a whole-row expression
>> so it think it needs to add a ConvertRowtypeExpr on that. But when the
>> plan is created, there is ConvertRowtypeExpr already, but there is no
>> way to know that a new ConvertRowtypeExpr is not needed anymore. So,
>> we may have two ConvertRowtypeExprs giving wrong results.
>
>
> Maybe I'm missing something, but I don't think that we need to worry about
> that, because in the approach I proposed, we only add CREs above whole-row
> Vars in the targetlists for subplans of an Append/MergeAppend for a
> partitioned relation at plan creation time.

There's a patch in an adjacent thread started by David Rowley to rip
out Append/MergeAppend when there is only one subplan. So, your
solution won't work there.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-11 Thread Etsuro Fujita


(2018/07/11 20:02), Ashutosh Bapat wrote:

On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
  wrote:

Actually, even if we could create such an index on the child table and the
targetlist had the ConvertRowtypeExpr, the planner would still not be able
to use an index-only scan with that index; because check_index_only would
not consider that an index-only scan is possible for that index because that
index is an expression index and that function currently does not consider
that index expressions are able to be returned back in an index-only scan.
That behavior of the planner might be improved in future, though.



Right and when we do so, not having ConvertRowtypeExpr in the
targetlist will be a problem.


Yeah, but I don't think that that's unsolvable; because in that case the 
CRE as an index expression could be converted back to the whole-row 
Var's rowtype by adding another CRE to the index expression for that 
conversion, I suspect that that special handling could allow us to 
support an index-only scan even when having the whole-row Var instead of 
the CRE in the targetlist.  (Having said that, I'm not 100% sure we need 
to solve that problem when we improve the planner, because there doesn't 
seem to me to be enough use-case to justify making the code complicated 
for that.)  Anyway, I think that that would be a matter of future 
versions of PG.



At places in planner we match equivalence members
to the targetlist entries. This matching will fail unexpectedly when
ConvertRowtypeExpr is removed from a child's targetlist. But again I
couldn't reproduce a problem when such a mismatch arises.



IIUC, I don't think the planner assumes that for an equivalence member there
is an matching entry for that member in the targetlist; what I think the
planner assumes is: an equivalence member is able to be computed from
expressions in the targetlist.


This is true. However,


  So, I think it is safe to have whole-row
Vars instead of ConvertRowtypeExprs in the targetlist.


when it's looking for an expression, it finds a whole-row expression
so it think it needs to add a ConvertRowtypeExpr on that. But when the
plan is created, there is ConvertRowtypeExpr already, but there is no
way to know that a new ConvertRowtypeExpr is not needed anymore. So,
we may have two ConvertRowtypeExprs giving wrong results.


Maybe I'm missing something, but I don't think that we need to worry 
about that, because in the approach I proposed, we only add CREs above 
whole-row Vars in the targetlists for subplans of an Append/MergeAppend 
for a partitioned relation at plan creation time.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-11 Thread Ashutosh Bapat

On Wed, Jul 11, 2018 at 1:23 PM, Etsuro Fujita
 wrote:
>
>
> Actually, even if we could create such an index on the child table and the
> targetlist had the ConvertRowtypeExpr, the planner would still not be able
> to use an index-only scan with that index; because check_index_only would
> not consider that an index-only scan is possible for that index because that
> index is an expression index and that function currently does not consider
> that index expressions are able to be returned back in an index-only scan.
> That behavior of the planner might be improved in future, though.
>
>> Pathkey points to an equivalence class, which contains equivalence
>> members. A parent equivalence class member containing a whole-row
>> reference gets translated into a child equivalence member containing a
>> ConvertRowtypeExpr.

Right and when we do so, not having ConvertRowtypeExpr in the
targetlist will be a problem.

>
>
> I think so too.
>
>> At places in planner we match equivalence members
>> to the targetlist entries. This matching will fail unexpectedly when
>> ConvertRowtypeExpr is removed from a child's targetlist. But again I
>> couldn't reproduce a problem when such a mismatch arises.
>
>
> IIUC, I don't think the planner assumes that for an equivalence member there
> is an matching entry for that member in the targetlist; what I think the
> planner assumes is: an equivalence member is able to be computed from
> expressions in the targetlist.

This is true. However,

>  So, I think it is safe to have whole-row
> Vars instead of ConvertRowtypeExprs in the targetlist.

when it's looking for an expression, it finds a whole-row expression
so it think it needs to add a ConvertRowtypeExpr on that. But when the
plan is created, there is ConvertRowtypeExpr already, but there is no
way to know that a new ConvertRowtypeExpr is not needed anymore. So,
we may have two ConvertRowtypeExprs giving wrong results.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-11 Thread Etsuro Fujita


(2018/07/10 19:58), Ashutosh Bapat wrote:

On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita
  wrote:

(2018/07/09 20:43), Ashutosh Bapat wrote:

At the cost of having targetlist being type inconsistent. I don't have
any testcase either to show that that's a problem in practice.



As I said before, I don't see any issue in having such a targetlist, but I
might be missing something, so I'd like to discuss a bit more about that.
Could you tell me the logic/place in the PG code where you think the problem
might occur.  (IIRC, you mentioned something about that before (pathkeys? or
index-only scans?), but sorry, I don't understand that.)


IIUC, index-only scan will be used when the all the required columns
are covered by an index. If there is an index on the whole-row
reference of the parent, it will be translated into a
ConvertRowtypeExpr of the child when an index on the child is created.


I think so too.


If the targetlist doesn't have ConvertRowtypeExpr, as your patch does,
the planner won't be able to use such an index on the child table. But
I couldn't create an index with a whole-row reference in it. So, I
think this isn't possible right now.


Actually, even if we could create such an index on the child table and 
the targetlist had the ConvertRowtypeExpr, the planner would still not 
be able to use an index-only scan with that index; because 
check_index_only would not consider that an index-only scan is possible 
for that index because that index is an expression index and that 
function currently does not consider that index expressions are able to 
be returned back in an index-only scan.  That behavior of the planner 
might be improved in future, though.



Pathkey points to an equivalence class, which contains equivalence
members. A parent equivalence class member containing a whole-row
reference gets translated into a child equivalence member containing a
ConvertRowtypeExpr.


I think so too.


At places in planner we match equivalence members
to the targetlist entries. This matching will fail unexpectedly when
ConvertRowtypeExpr is removed from a child's targetlist. But again I
couldn't reproduce a problem when such a mismatch arises.


IIUC, I don't think the planner assumes that for an equivalence member 
there is an matching entry for that member in the targetlist; what I 
think the planner assumes is: an equivalence member is able to be 
computed from expressions in the targetlist.  So, I think it is safe to 
have whole-row Vars instead of ConvertRowtypeExprs in the targetlist.


Thanks for the explanation!

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-10 Thread Ashutosh Bapat

On Tue, Jul 10, 2018 at 9:08 AM, Etsuro Fujita
 wrote:
> (2018/07/09 20:43), Ashutosh Bapat wrote:
>>>
>>>
>>> I don't have any numbers right now, so that is nothing but a concern. But
>>> as
>>> I said in a previous email, in the approach I proposed, we don't need to
>>> spend extra cycles where partitioning is not involved.  I think that is a
>>> good thing in itself.  No?
>>
>>
>> At the cost of having targetlist being type inconsistent. I don't have
>> any testcase either to show that that's a problem in practice. So,
>> it's a trade-off of a concern vs concern.
>
>
> As I said before, I don't see any issue in having such a targetlist, but I
> might be missing something, so I'd like to discuss a bit more about that.
> Could you tell me the logic/place in the PG code where you think the problem
> might occur.  (IIRC, you mentioned something about that before (pathkeys? or
> index-only scans?), but sorry, I don't understand that.)

IIUC, index-only scan will be used when the all the required columns
are covered by an index. If there is an index on the whole-row
reference of the parent, it will be translated into a
ConvertRowtypeExpr of the child when an index on the child is created.
If the targetlist doesn't have ConvertRowtypeExpr, as your patch does,
the planner won't be able to use such an index on the child table. But
I couldn't create an index with a whole-row reference in it. So, I
think this isn't possible right now. But in future if we allow
creating index on whole-row reference or

Pathkey points to an equivalence class, which contains equivalence
members. A parent equivalence class member containing a whole-row
reference gets translated into a child equivalence member containing a
ConvertRowtypeExpr. At places in planner we match equivalence members
to the targetlist entries. This matching will fail unexpectedly when
ConvertRowtypeExpr is removed from a child's targetlist. But again I
couldn't reproduce a problem when such a mismatch arises.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-09 Thread Etsuro Fujita


(2018/07/09 20:43), Ashutosh Bapat wrote:


I don't have any numbers right now, so that is nothing but a concern. But as
I said in a previous email, in the approach I proposed, we don't need to
spend extra cycles where partitioning is not involved.  I think that is a
good thing in itself.  No?


At the cost of having targetlist being type inconsistent. I don't have
any testcase either to show that that's a problem in practice. So,
it's a trade-off of a concern vs concern.


As I said before, I don't see any issue in having such a targetlist, but 
I might be missing something, so I'd like to discuss a bit more about 
that.  Could you tell me the logic/place in the PG code where you think 
the problem might occur.  (IIRC, you mentioned something about that 
before (pathkeys? or index-only scans?), but sorry, I don't understand 
that.)



Apart from that, in your approach there are extra cycles spent in
traversing the targetlist to add ConvertRowtypeExpr, albeit only when
there is a whole-row expression in the targetlist, when creating
plans. That's not there in my patch.


Right.  That's unfortunate, but I think that that would be still better 
than needing to spent extra cycles where partitioning is not involved. 
And ISTM that the traversing cost is not that large compared to the cost 
we pay before that for query planning for a partitionwise join.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-09 Thread Ashutosh Bapat

>
> I don't have any numbers right now, so that is nothing but a concern. But as
> I said in a previous email, in the approach I proposed, we don't need to
> spend extra cycles where partitioning is not involved.  I think that is a
> good thing in itself.  No?

At the cost of having targetlist being type inconsistent. I don't have
any testcase either to show that that's a problem in practice. So,
it's a trade-off of a concern vs concern.

Apart from that, in your approach there are extra cycles spent in
traversing the targetlist to add ConvertRowtypeExpr, albeit only when
there is a whole-row expression in the targetlist, when creating
plans. That's not there in my patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-09 Thread Etsuro Fujita


(2018/07/09 20:06), Ashutosh Bapat wrote:

On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita
  wrote:

As I said, we do spend cycles in that function testing whether a node
is Aggref or not even when the query doesn't have aggregates or
grouping OR spend cycles in testing whether a node is a PlaceHolderVar
when the query doesn't produce any. So, I don't see any problem with
spending a few cycles testing whether a node is ConvertRowtypeExpr or
not when a ConvertRowtypeExpr is not in the query or command. That's
not a huge performance trouble. I would be happy to change my mind, if
you show me performance different with and without this patch in
planning. I haven't seen any.



I have to admit that the case in [1] wouldn't affect the performance, but my
concern is that there might be some cases where the test affects
performance.


What are those cases? Can you please provide any numbers supporting your claim?


I don't have any numbers right now, so that is nothing but a concern. 
But as I said in a previous email, in the approach I proposed, we don't 
need to spend extra cycles where partitioning is not involved.  I think 
that is a good thing in itself.  No?


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-09 Thread Ashutosh Bapat

On Mon, Jul 9, 2018 at 4:33 PM, Etsuro Fujita
 wrote:
>>
>>
>> As I said, we do spend cycles in that function testing whether a node
>> is Aggref or not even when the query doesn't have aggregates or
>> grouping OR spend cycles in testing whether a node is a PlaceHolderVar
>> when the query doesn't produce any. So, I don't see any problem with
>> spending a few cycles testing whether a node is ConvertRowtypeExpr or
>> not when a ConvertRowtypeExpr is not in the query or command. That's
>> not a huge performance trouble. I would be happy to change my mind, if
>> you show me performance different with and without this patch in
>> planning. I haven't seen any.
>
>
> I have to admit that the case in [1] wouldn't affect the performance, but my
> concern is that there might be some cases where the test affects
> performance.

What are those cases? Can you please provide any numbers supporting your claim?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-09 Thread Etsuro Fujita


(2018/07/06 20:20), Ashutosh Bapat wrote:

On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita
  wrote:

(2018/07/04 21:37), Ashutosh Bapat wrote:

On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
   wrote:



Let me explain about that: 1) my patch won't apply that function to a child
if its top parent is an appendrel built from a UNION ALL subquery, even
though the child is a partition of a partitioned table pulled up from a leaf
subquery into the parent query, like lpt_p1, and 2) my patch will apply that
function to a child if its top parent is a partitioned table, whether or not
the partitioned table is involved in a partitionwise join.  By #1, we avoid
the adjustment work at plan creation time, as explained above.  It might be
worth trying to be smarter about #2 (for example, in the case of a join of a
partitioned table and a non-partitioned table, since we don't consider a
partitionwise join for that join, it's better to not apply that function to
partitions of the partitioned table, to avoid the adjustment work at plan
creation time), but ISTM we don't have enough information to be smarter.


That looks like a kludge to me rather than a proper fix. It's not
clear to me as to when somebody can expect ConvertRowtypeExpr in the
targetlist and when don't while creating paths and to an extent plans.
For example, inside add_paths_to_append_rel() or in
apply_scanjoin_target_to_paths() or for that matter any path creation
or plan creation function, we will sometimes get targetlists with
ConvertRowtypeExpr() and sometime not. How do we know which is when.


That is known from a new flag "has_child_wholerow" added to RelOptInfo 
to indicate whether an other relation's reltarget has child wholerow 
Vars instead of ConvertRowtypeExprs.  That flag is initialized to false 
and only set to true in build_childrel_tlist if it adds child wholerow 
Vars to the reltarget of a given other relation.  Though, I don't think 
we need to be conscious about whether the reltarget has child wholerow 
Vars or ConvertRowtypeExprs when creating paths; we would just create 
paths based on the reltarget.  What we need to be careful about is when 
creating an Append/MergeAppend plan; we have to adjust the subplan's 
tlist so child wholerow Vars in that tlist are converted to the 
corresponding ConvertRowtypeExprs, if that tlist has those Vars, which 
is also known from the new flag.



I think the biggest issue in the original patch you proposed is that we
spend extra cycles where partitioning is not involved, which is the
biggest
reason why I think the original patch isn't the right way to go.



When there are no partitions involved, there won't be any
ConvertRowtypeExprs there which means the function
is_converted_whole_row_reference() would just return from the first
line checking IsA() and nullness of node.



Really?  IIRC, with the original patch we would spend extra cycles in a
non-partitioned inheritance processing [1].


As I said, we do spend cycles in that function testing whether a node
is Aggref or not even when the query doesn't have aggregates or
grouping OR spend cycles in testing whether a node is a PlaceHolderVar
when the query doesn't produce any. So, I don't see any problem with
spending a few cycles testing whether a node is ConvertRowtypeExpr or
not when a ConvertRowtypeExpr is not in the query or command. That's
not a huge performance trouble. I would be happy to change my mind, if
you show me performance different with and without this patch in
planning. I haven't seen any.


I have to admit that the case in [1] wouldn't affect the performance, 
but my concern is that there might be some cases where the test affects 
performance.  In contrast, in the approach I proposed, we wouldn't need 
to worry about that because the approach does not modify code that is 
not related to partitioning.  I think that's a good thing.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-06 Thread Ashutosh Bapat

On Fri, Jul 6, 2018 at 4:29 PM, Etsuro Fujita
 wrote:
> (2018/07/04 21:37), Ashutosh Bapat wrote:
>>
>> On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
>>   wrote:
>>>
>>> (2018/07/04 19:04), Ashutosh Bapat wrote:

 On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
wrote:
>
> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>
>> +   if (enable_partitionwise_join&&
>> rel->top_parent_is_partitioned)
>> +   {
>> +   build_childrel_tlist(root, rel, childrel, 1,);
>> +   }
>>
>> Why do we need rel->top_parent_is_partitioned? If a relation is
>> partitioned (if (rel->part_scheme), it's either the top parent or is
>> partition of some other partitioned table. In either case this
>> condition will be true.
>
>
> This would be needed to avoid unnecessarily applying
> build_childrel_tlist
> to
> child rels of a partitioned table for which we don't consider
> partitionwise
> join.  Consider:
>
> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
> CREATE TABLE
> postgres=# create table lpt_p1 partition of lpt for values in (1);
> CREATE TABLE
> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
> CREATE TABLE
> postgres=# create table test (c1 int, c2 text);
> CREATE TABLE
> postgres=# explain verbose select * from (select * from lpt union all
> select
> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>
>
>>> I might misunderstand your words, but in the above example the patch
>>> doesn't
>>> apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
>>> because we can avoid adjusting the tlists for the corresponding subplans
>>> at
>>> plan creation time so that whole-row Vars in the tlists are transformed
>>> into
>>> CREs.  I think the overhead of the adjustment is not that big, but not
>>> zero,
>>> so it would be worth avoiding applying build_childrel_tlist to partitions
>>> if
>>> the top parent won't participate in a partitionwise-join at all.
>>
>>
>> I don't think that answers my question. When we join lpt with test,
>> your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
>> when join between lpt and test is not going to use partition-wise
>> join. Why?
>
>
> Maybe my explanation including the example was not good.  Sorry about that,
> but my patch will *not* apply build_childrel_tlist to lpt_p1 and lpt_p2
> since the top parent of lpt_p1 and lpt_p2 is the UNION ALL subquery and
> hence not a partitioned table (ie, we have
> rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2).
>
>> As per your explanation, the condition "if
>> (enable_partitionwise_join&&   rel->top_parent_is_partitioned)" is
>> used to avoid applying build_childrel_tlist() when partition-wise join
>> won't be possible. But it's not covering all the cases.
>
>
> Let me explain about that: 1) my patch won't apply that function to a child
> if its top parent is an appendrel built from a UNION ALL subquery, even
> though the child is a partition of a partitioned table pulled up from a leaf
> subquery into the parent query, like lpt_p1, and 2) my patch will apply that
> function to a child if its top parent is a partitioned table, whether or not
> the partitioned table is involved in a partitionwise join.  By #1, we avoid
> the adjustment work at plan creation time, as explained above.  It might be
> worth trying to be smarter about #2 (for example, in the case of a join of a
> partitioned table and a non-partitioned table, since we don't consider a
> partitionwise join for that join, it's better to not apply that function to
> partitions of the partitioned table, to avoid the adjustment work at plan
> creation time), but ISTM we don't have enough information to be smarter.

That looks like a kludge to me rather than a proper fix. It's not
clear to me as to when somebody can expect ConvertRowtypeExpr in the
targetlist and when don't while creating paths and to an extent plans.
For example, inside add_paths_to_append_rel() or in
apply_scanjoin_target_to_paths() or for that matter any path creation
or plan creation function, we will sometimes get targetlists with
ConvertRowtypeExpr() and sometime not. How do we know which is when.


>
 A ConvertRowtypeExpr
 is used to fix this inconsistency. That's why I chose to use
 pull_var_clause() as a place to fix the problem and not fix
 ConvertRowtypeExpr in targetlist itself.
>>>
>>>
>>>
>>> I think the biggest issue in the original patch you proposed is that we
>>> spend extra cycles where partitioning is not involved, which is the
>>> biggest
>>> reason why I think the original patch isn't the right way to go.
>>
>>
>> When there are no partitions involved, there won't be any
>> ConvertRowtypeExprs there which means the function
>> is_converted_whole_row_reference() would just return from the first
>> line checking IsA() and nullness of node.
>
>
>

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-06 Thread Etsuro Fujita


(2018/07/04 21:37), Ashutosh Bapat wrote:

On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
  wrote:

(2018/07/04 19:04), Ashutosh Bapat wrote:

On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
   wrote:

(2018/06/22 22:54), Ashutosh Bapat wrote:

+   if (enable_partitionwise_join&&
rel->top_parent_is_partitioned)
+   {
+   build_childrel_tlist(root, rel, childrel, 1,);
+   }

Why do we need rel->top_parent_is_partitioned? If a relation is
partitioned (if (rel->part_scheme), it's either the top parent or is
partition of some other partitioned table. In either case this
condition will be true.



This would be needed to avoid unnecessarily applying build_childrel_tlist
to
child rels of a partitioned table for which we don't consider
partitionwise
join.  Consider:

postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
CREATE TABLE
postgres=# create table lpt_p1 partition of lpt for values in (1);
CREATE TABLE
postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
CREATE TABLE
postgres=# create table test (c1 int, c2 text);
CREATE TABLE
postgres=# explain verbose select * from (select * from lpt union all
select
* from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);



I might misunderstand your words, but in the above example the patch doesn't
apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
because we can avoid adjusting the tlists for the corresponding subplans at
plan creation time so that whole-row Vars in the tlists are transformed into
CREs.  I think the overhead of the adjustment is not that big, but not zero,
so it would be worth avoiding applying build_childrel_tlist to partitions if
the top parent won't participate in a partitionwise-join at all.


I don't think that answers my question. When we join lpt with test,
your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
when join between lpt and test is not going to use partition-wise
join. Why?


Maybe my explanation including the example was not good.  Sorry about 
that, but my patch will *not* apply build_childrel_tlist to lpt_p1 and 
lpt_p2 since the top parent of lpt_p1 and lpt_p2 is the UNION ALL 
subquery and hence not a partitioned table (ie, we have 
rel->top_parent_is_partitioned=false for lpt_p1 and lpt_p2).



As per your explanation, the condition "if
(enable_partitionwise_join&&   rel->top_parent_is_partitioned)" is
used to avoid applying build_childrel_tlist() when partition-wise join
won't be possible. But it's not covering all the cases.


Let me explain about that: 1) my patch won't apply that function to a 
child if its top parent is an appendrel built from a UNION ALL subquery, 
even though the child is a partition of a partitioned table pulled up 
from a leaf subquery into the parent query, like lpt_p1, and 2) my patch 
will apply that function to a child if its top parent is a partitioned 
table, whether or not the partitioned table is involved in a 
partitionwise join.  By #1, we avoid the adjustment work at plan 
creation time, as explained above.  It might be worth trying to be 
smarter about #2 (for example, in the case of a join of a partitioned 
table and a non-partitioned table, since we don't consider a 
partitionwise join for that join, it's better to not apply that function 
to partitions of the partitioned table, to avoid the adjustment work at 
plan creation time), but ISTM we don't have enough information to be 
smarter.



An
in-between state will produce a hell lot of confusion for any further
optimization. Whenever we change the code around partition-wise
operations in future, we will have to check whether or not a given
child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
have mentioned earlier, I am also not comfortable with the targetlist
of child relations being type inconsistent with that of the parent,
which is a fundamental rule in inheritance. Worst keep them
inconsistent during path creation and make them consistent at the time
of creating plans. A child's whole-row Var has datatype of the child
where as that of parent has datatype of parent.



I don't see any critical issue here.  Could you elaborate a bit more on that
point?


I think breaking a fundamental rule like this itself is critical. But
interestingly I am not able to find a case where it becomes a problem.
But may be I haven't tried enough. And the question is if it's not
required to have the targetlists type consistent, why even bother with
ConvertRowtypeExpr addition there? We can use your approach of adding
ConvertRowtypeExpr at the end in all the cases.


I think that the tlist of a (not-the-topmost) child relation doesn't 
need to be type-consistent with that of the parent; it has only to 
include all Vars that are needed for higher joinquals and final output 
to the parent appendrel.  (In other words, I think we can build the 
tlist in the same manner as we build tlists of base or join relations in 
the main join tree.)  On

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-05 Thread Etsuro Fujita


(2018/07/05 22:04), Ashutosh Bapat wrote:

On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita
  wrote:

Another thing I noticed is: actually, we don't produce an Append with child
Gathers in apply_scanjoin_target_to_paths, which I thought we would do that
in the case of scanjoin_target_parallel_safe=false, but I noticed I was
wrong.  Sorry for that.  The reason is because in that case, even if we
create new partial Append paths with child Gathers, we don't run
generate_gatehr_paths for the newly created partial paths at the end of that
function shown below, since the parent's consider_parallel flag is set to
false in that case:



Hmm. I don't think that's a great idea since such a plan might turn
out cheaper esp. when there are very few children which could use
parallel query and parallel append is possible at the top parent.


Actually, I don't think we can create such a Parallel Append, because in 
the case where we have scanjoin_target_parallel_safe=false, child paths 
would not be parallel-safe anymore as those paths are rewritten to have 
parallel-unsafe targets in the recursion of apply_scanjoin_target_to_paths.



anyway, that's what it is today. But I think, we shouldn't write code
assuming that an Append will never see a Gather below it. We might see
some plans like that in future and need to change your code.


I agree that we might allow such an Append in future, but I'm not sure 
we should write code for that in PG11, because 1) that would make code 
complicated than necessary and 2) that might cause overlooking of 
unexpected behavior of the planner; if the planner created such an 
Append erroneously, we would run that by that code without noticing 
that.  To reduce the maintenance burden and the overlooking risk, I 
think it's better to extend code for such a plan when we allow that plan 
for a partitionwise join.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-05 Thread Ashutosh Bapat

On Thu, Jul 5, 2018 at 4:45 PM, Etsuro Fujita
 wrote:
>
>
> In the case where scanjoin_target_parallel_safe=false, we actually will
> consider such parallel paths using the existing partial paths for the parent
> appendrel in the code path shown in a previous email (note: we would already
> have done generate_partitionwise_join_paths or set_append_rel_pathlist for
> the parent appendrel in query_planner.)  For such a parallel path, we might
> need to do a projection in the Gather node for generating the SRF-free
> scan/join target, but I think that would be still efficient.  So, I'm not
> sure that we really need to create child Gathers and generate an Append with
> those Gathers, as you mentioned.

I don't think it will be Append of Gathers, but Append where one of
the subplans is Gather. I don't think that possibility is completely
eliminated.

>
> Another thing I noticed is: actually, we don't produce an Append with child
> Gathers in apply_scanjoin_target_to_paths, which I thought we would do that
> in the case of scanjoin_target_parallel_safe=false, but I noticed I was
> wrong.  Sorry for that.  The reason is because in that case, even if we
> create new partial Append paths with child Gathers, we don't run
> generate_gatehr_paths for the newly created partial paths at the end of that
> function shown below, since the parent's consider_parallel flag is set to
> false in that case:
>
> /*
>  * Consider generating Gather or Gather Merge paths.  We must only do
> this
>  * if the relation is parallel safe, and we don't do it for child rels
> to
>  * avoid creating multiple Gather nodes within the same plan. We must do
>  * this after all paths have been generated and before set_cheapest,
> since
>  * one of the generated paths may turn out to be the cheapest one.
>  */
> if (rel->consider_parallel && !IS_OTHER_REL(rel))
> generate_gather_paths(root, rel, false);

Hmm. I don't think that's a great idea since such a plan might turn
out cheaper esp. when there are very few children which could use
parallel query and parallel append is possible at the top parent. But
anyway, that's what it is today. But I think, we shouldn't write code
assuming that an Append will never see a Gather below it. We might see
some plans like that in future and need to change your code.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-05 Thread Etsuro Fujita


(2018/07/05 20:15), Etsuro Fujita wrote:

(2018/07/04 19:11), Ashutosh Bapat wrote:

On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita
 wrote:


I don't produce a test case where the plan is an Append with Gather
subplans, but I'm not sure that it's a good thing to allow us to
consider
such a plan because in that plan, each Gather would try to grab its
own pool
of workers. Am I missing something?


If the overall join can not use parallelism, having individual child
joins use parallel query might turn out to be efficient even if done
one child at a time. Parallel append drastically reduces the cases
where something like could be useful, but I don't think we can
theoretically eliminate the need for such a plan.


In the case where scanjoin_target_parallel_safe=false, we actually will
consider such parallel paths using the existing partial paths for the
parent appendrel in the code path shown in a previous email (note: we
would already have done generate_partitionwise_join_paths or
set_append_rel_pathlist for the parent appendrel in query_planner.) For
such a parallel path, we might need to do a projection in the Gather
node for generating the SRF-free scan/join target, but I think that
would be still efficient. So, I'm not sure that we really need to create
child Gathers and generate an Append with those Gathers, as you mentioned.

Another thing I noticed is: actually, we don't produce an Append with
child Gathers in apply_scanjoin_target_to_paths, which I thought we
would do that in the case of scanjoin_target_parallel_safe=false, but I
noticed I was wrong. Sorry for that. The reason is because in that case,
even if we create new partial Append paths with child Gathers, we don't
run generate_gatehr_paths for the newly created partial paths at the end
of that function shown below, since the parent's consider_parallel flag
is set to false in that case:

/*
* Consider generating Gather or Gather Merge paths. We must only do this
* if the relation is parallel safe, and we don't do it for child rels to
* avoid creating multiple Gather nodes within the same plan. We must do
* this after all paths have been generated and before set_cheapest, since
* one of the generated paths may turn out to be the cheapest one.
*/
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_gather_paths(root, rel, false);


One thing to add is: ISTM we need some cleanup for 
apply_scanjoin_target_to_paths.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-05 Thread Etsuro Fujita


(2018/07/04 19:11), Ashutosh Bapat wrote:

On Wed, Jul 4, 2018 at 3:36 PM, Etsuro Fujita
  wrote:


I don't produce a test case where the plan is an Append with Gather
subplans, but I'm not sure that it's a good thing to allow us to consider
such a plan because in that plan, each Gather would try to grab its own pool
of workers.  Am I missing something?


If the overall join can not use parallelism, having individual child
joins use parallel query might turn out to be efficient even if done
one child at a time. Parallel append drastically reduces the cases
where something like could be useful, but I don't think we can
theoretically eliminate the need for such a plan.


In the case where scanjoin_target_parallel_safe=false, we actually will 
consider such parallel paths using the existing partial paths for the 
parent appendrel in the code path shown in a previous email (note: we 
would already have done generate_partitionwise_join_paths or 
set_append_rel_pathlist for the parent appendrel in query_planner.)  For 
such a parallel path, we might need to do a projection in the Gather 
node for generating the SRF-free scan/join target, but I think that 
would be still efficient.  So, I'm not sure that we really need to 
create child Gathers and generate an Append with those Gathers, as you 
mentioned.


Another thing I noticed is: actually, we don't produce an Append with 
child Gathers in apply_scanjoin_target_to_paths, which I thought we 
would do that in the case of scanjoin_target_parallel_safe=false, but I 
noticed I was wrong.  Sorry for that.  The reason is because in that 
case, even if we create new partial Append paths with child Gathers, we 
don't run generate_gatehr_paths for the newly created partial paths at 
the end of that function shown below, since the parent's 
consider_parallel flag is set to false in that case:


/*
 * Consider generating Gather or Gather Merge paths.  We must only 
do this
 * if the relation is parallel safe, and we don't do it for child 
rels to
 * avoid creating multiple Gather nodes within the same plan. We 
must do
 * this after all paths have been generated and before 
set_cheapest, since

 * one of the generated paths may turn out to be the cheapest one.
 */
if (rel->consider_parallel && !IS_OTHER_REL(rel))
generate_gather_paths(root, rel, false);

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-04 Thread Ashutosh Bapat

On Wed, Jul 4, 2018 at 5:36 PM, Etsuro Fujita
 wrote:
> (2018/07/04 19:04), Ashutosh Bapat wrote:
>>
>> On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
>>   wrote:
>>>
>>> (2018/06/22 22:54), Ashutosh Bapat wrote:

 +   if (enable_partitionwise_join&&
 rel->top_parent_is_partitioned)
 +   {
 +   build_childrel_tlist(root, rel, childrel, 1,);
 +   }

 Why do we need rel->top_parent_is_partitioned? If a relation is
 partitioned (if (rel->part_scheme), it's either the top parent or is
 partition of some other partitioned table. In either case this
 condition will be true.
>>>
>>>
>>>
>>> This would be needed to avoid unnecessarily applying build_childrel_tlist
>>> to
>>> child rels of a partitioned table for which we don't consider
>>> partitionwise
>>> join.  Consider:
>>>
>>> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
>>> CREATE TABLE
>>> postgres=# create table lpt_p1 partition of lpt for values in (1);
>>> CREATE TABLE
>>> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
>>> CREATE TABLE
>>> postgres=# create table test (c1 int, c2 text);
>>> CREATE TABLE
>>> postgres=# explain verbose select * from (select * from lpt union all
>>> select
>>> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);
>>
>>
>> --- plan clipped
>>
>>>
>>> In this example, the top parent is not a partitioned table and we don't
>>> need
>>> to consider partitionwise join for that, so we don't need to apply that
>>> to
>>> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).
>>
>>
>> FWIW, the test is not sufficient. In the above example, a simple
>> select * from lpt inner join test where lpt.c1 = test.c1 would not use
>> partition-wise join technique. Why not to avoid build_childrel_tlist()
>> in that case as well?
>
>
> I might misunderstand your words, but in the above example the patch doesn't
> apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for that is
> because we can avoid adjusting the tlists for the corresponding subplans at
> plan creation time so that whole-row Vars in the tlists are transformed into
> CREs.  I think the overhead of the adjustment is not that big, but not zero,
> so it would be worth avoiding applying build_childrel_tlist to partitions if
> the top parent won't participate in a partitionwise-join at all.

I don't think that answers my question. When we join lpt with test,
your patch will apply build_childrel_tlist() to lpt_p1 and lpt_p2 even
when join between lpt and test is not going to use partition-wise
join. Why? As per your explanation, the condition "if
(enable_partitionwise_join &&  rel->top_parent_is_partitioned)" is
used to avoid applying build_childrel_tlist() when partition-wise join
won't be possible. But it's not covering all the cases.

>
>> Worst we can change the criteria to use
>> partition-wise join in future e.g. above case would use partition-wise
>> join by 1. merging lpt_p1 into corresponding partition of lpt so that
>> ss is partitioned and 2. repartitioning test or joining test with each
>> partition of lpt separately. In those cases the changes you are doing
>> here need to be reverted and put somewhere else. There's already a
>> patch to reverse the order of join and grouping out there. How would
>> this work with that.
>
>
> Interesting, but that seems like a matter of PG12+.

Yes, and I don't think it's a good idea to change this fix for PG12+ again.

>
>> In general I think set_append_rel_size() or build_simple_rel() is not
>> the place where we should decide whether the given relation will
>> participate in a PWJ. Also we should not selectively add a
>> ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
>> We should do it for all the child rels or shouldn't do it at all.
>
>
> One thing I thought was to apply build_childrel_tlist for all the child rels
> whether its top parent is a partitioned table or not, but as I mentioned
> above, that would incur needless overhead for adjusting the tlists for the
> child rels that don't involve in a partitionwise join such as lpt_p1 and
> lpt_p2, which I think is not good.
>
>> An
>> in-between state will produce a hell lot of confusion for any further
>> optimization. Whenever we change the code around partition-wise
>> operations in future, we will have to check whether or not a given
>> child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
>> have mentioned earlier, I am also not comfortable with the targetlist
>> of child relations being type inconsistent with that of the parent,
>> which is a fundamental rule in inheritance. Worst keep them
>> inconsistent during path creation and make them consistent at the time
>> of creating plans. A child's whole-row Var has datatype of the child
>> where as that of parent has datatype of parent.
>
>
> I don't see any critical issue here.  Could you elaborate a bit more on that
> point?

I think breaking a

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-04 Thread Etsuro Fujita


(2018/07/04 19:04), Ashutosh Bapat wrote:

On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
  wrote:

(2018/06/22 22:54), Ashutosh Bapat wrote:

+   if (enable_partitionwise_join&&   rel->top_parent_is_partitioned)
+   {
+   build_childrel_tlist(root, rel, childrel, 1,);
+   }

Why do we need rel->top_parent_is_partitioned? If a relation is
partitioned (if (rel->part_scheme), it's either the top parent or is
partition of some other partitioned table. In either case this
condition will be true.



This would be needed to avoid unnecessarily applying build_childrel_tlist to
child rels of a partitioned table for which we don't consider partitionwise
join.  Consider:

postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
CREATE TABLE
postgres=# create table lpt_p1 partition of lpt for values in (1);
CREATE TABLE
postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
CREATE TABLE
postgres=# create table test (c1 int, c2 text);
CREATE TABLE
postgres=# explain verbose select * from (select * from lpt union all select
* from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);


--- plan clipped



In this example, the top parent is not a partitioned table and we don't need
to consider partitionwise join for that, so we don't need to apply that to
the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).


FWIW, the test is not sufficient. In the above example, a simple
select * from lpt inner join test where lpt.c1 = test.c1 would not use
partition-wise join technique. Why not to avoid build_childrel_tlist()
in that case as well?


I might misunderstand your words, but in the above example the patch 
doesn't apply build_childrel_tlist to lpt_p1 and lpt_p2.  The reason for 
that is because we can avoid adjusting the tlists for the corresponding 
subplans at plan creation time so that whole-row Vars in the tlists are 
transformed into CREs.  I think the overhead of the adjustment is not 
that big, but not zero, so it would be worth avoiding applying 
build_childrel_tlist to partitions if the top parent won't participate 
in a partitionwise-join at all.



Worst we can change the criteria to use
partition-wise join in future e.g. above case would use partition-wise
join by 1. merging lpt_p1 into corresponding partition of lpt so that
ss is partitioned and 2. repartitioning test or joining test with each
partition of lpt separately. In those cases the changes you are doing
here need to be reverted and put somewhere else. There's already a
patch to reverse the order of join and grouping out there. How would
this work with that.


Interesting, but that seems like a matter of PG12+.


In general I think set_append_rel_size() or build_simple_rel() is not
the place where we should decide whether the given relation will
participate in a PWJ. Also we should not selectively add a
ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
We should do it for all the child rels or shouldn't do it at all.


One thing I thought was to apply build_childrel_tlist for all the child 
rels whether its top parent is a partitioned table or not, but as I 
mentioned above, that would incur needless overhead for adjusting the 
tlists for the child rels that don't involve in a partitionwise join 
such as lpt_p1 and lpt_p2, which I think is not good.



An
in-between state will produce a hell lot of confusion for any further
optimization. Whenever we change the code around partition-wise
operations in future, we will have to check whether or not a given
child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
have mentioned earlier, I am also not comfortable with the targetlist
of child relations being type inconsistent with that of the parent,
which is a fundamental rule in inheritance. Worst keep them
inconsistent during path creation and make them consistent at the time
of creating plans. A child's whole-row Var has datatype of the child
where as that of parent has datatype of parent.


I don't see any critical issue here.  Could you elaborate a bit more on 
that point?



A ConvertRowtypeExpr
is used to fix this inconsistency. That's why I chose to use
pull_var_clause() as a place to fix the problem and not fix
ConvertRowtypeExpr in targetlist itself.


I think the biggest issue in the original patch you proposed is that we 
spend extra cycles where partitioning is not involved, which is the 
biggest reason why I think the original patch isn't the right way to go.



I am also not comfortable in just avoiding ConvertRowtypeExprs in
targetlist and leave them as is in the conditions and ECs etc. This
means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
targetlist will fail at the time of path creation but will succeed at
the time of plan creation.

This is turning more invasive that my approach of fixing it through PVC.


Sorry, I don't understand this.  Could you show me places where the 
problem happens?



+   /* No work if the

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-04 Thread Etsuro Fujita


(2018/07/02 18:46), Etsuro Fujita wrote:

(2018/06/22 23:58), Robert Haas wrote:

And, in general, it seems to me that we want
to produce the right outputs at the lowest possible level of the plan
tree. For instance, suppose that one of the relations being scanned
is not parallel-safe but the others are. Then, we could end up with a
plan like this:

Append
-> Seq Scan on temp_rela
-> Gather
-> Parallel Seq Scan on thing1
-> Gather
-> Parallel Seq Scan on thing2

If a projection is required to convert the row type expression, we
certainly want that to get pushed below the Gather, not to happen at
the Gather level itself.


IIUC, we currently don't consider such a plan for partition-wise join;
we'll only consider gathering partial paths for the parent appendrel.


While updating the patch, I noticed that I'm wrong here.  IIUC, 
apply_scanjoin_target_to_paths would allow us to consider such an Append 
for a partitioned relation when scanjoin_target_parallel_safe=false, as 
it generates new Append paths by recursively adjusting all partitions 
for which we call generate_gather_paths in that case as shown below:


/*
 * If the scan/join target is not parallel-safe, partial paths cannot
 * generate it.
 */
if (!scanjoin_target_parallel_safe)
{
/*
 * Since we can't generate the final scan/join target, this is our
 * last opportunity to use any partial paths that exist.  We 
don't do
 * this if the case where the target is parallel-safe, since we 
will

 * be able to generate superior paths by doing it after the final
 * scan/join target has been applied.
 *
 * Note that this may invalidate rel->cheapest_total_path, so 
we must
 * not rely on it after this point without first calling 
set_cheapest.

 */
generate_gather_paths(root, rel, false);

/* Can't use parallel query above this level. */
rel->partial_pathlist = NIL;
rel->consider_parallel = false;
}

I don't produce a test case where the plan is an Append with Gather 
subplans, but I'm not sure that it's a good thing to allow us to 
consider such a plan because in that plan, each Gather would try to grab 
its own pool of workers.  Am I missing something?


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-04 Thread Ashutosh Bapat

On Fri, Jun 29, 2018 at 6:21 PM, Etsuro Fujita
 wrote:
> (2018/06/22 22:54), Ashutosh Bapat wrote:
>>
>> I have started reviewing the patch.
>
>
> Thanks for the review!
>
>> +   if (enable_partitionwise_join&&  rel->top_parent_is_partitioned)
>> +   {
>> +   build_childrel_tlist(root, rel, childrel, 1,);
>> +   }
>>
>> Why do we need rel->top_parent_is_partitioned? If a relation is
>> partitioned (if (rel->part_scheme), it's either the top parent or is
>> partition of some other partitioned table. In either case this
>> condition will be true.
>
>
> This would be needed to avoid unnecessarily applying build_childrel_tlist to
> child rels of a partitioned table for which we don't consider partitionwise
> join.  Consider:
>
> postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
> CREATE TABLE
> postgres=# create table lpt_p1 partition of lpt for values in (1);
> CREATE TABLE
> postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
> CREATE TABLE
> postgres=# create table test (c1 int, c2 text);
> CREATE TABLE
> postgres=# explain verbose select * from (select * from lpt union all select
> * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

--- plan clipped

>
> In this example, the top parent is not a partitioned table and we don't need
> to consider partitionwise join for that, so we don't need to apply that to
> the child rel lpt_p1 of the partitioned table lpt (and the table lpt_p2).

FWIW, the test is not sufficient. In the above example, a simple
select * from lpt inner join test where lpt.c1 = test.c1 would not use
partition-wise join technique. Why not to avoid build_childrel_tlist()
in that case as well? Worst we can change the criteria to use
partition-wise join in future e.g. above case would use partition-wise
join by 1. merging lpt_p1 into corresponding partition of lpt so that
ss is partitioned and 2. repartitioning test or joining test with each
partition of lpt separately. In those cases the changes you are doing
here need to be reverted and put somewhere else. There's already a
patch to reverse the order of join and grouping out there. How would
this work with that.

In general I think set_append_rel_size() or build_simple_rel() is not
the place where we should decide whether the given relation will
participate in a PWJ. Also we should not selectively add a
ConvertRowtypeExpr on top of a whole-row Var of some a child relation.
We should do it for all the child rels or shouldn't do it at all. An
in-between state will produce a hell lot of confusion for any further
optimization. Whenever we change the code around partition-wise
operations in future, we will have to check whether or not a given
child rel has its whole-row Var embedded in ConvertRowtypeExpr. As I
have mentioned earlier, I am also not comfortable with the targetlist
of child relations being type inconsistent with that of the parent,
which is a fundamental rule in inheritance. Worst keep them
inconsistent during path creation and make them consistent at the time
of creating plans. A child's whole-row Var has datatype of the child
where as that of parent has datatype of parent. A ConvertRowtypeExpr
is used to fix this inconsistency. That's why I chose to use
pull_var_clause() as a place to fix the problem and not fix
ConvertRowtypeExpr in targetlist itself.

I am also not comfortable in just avoiding ConvertRowtypeExprs in
targetlist and leave them as is in the conditions and ECs etc. This
means searching a ConvertRowtypeExpr e.g. for creating pathkeys in
targetlist will fail at the time of path creation but will succeed at
the time of plan creation.

This is turning more invasive that my approach of fixing it through PVC.

>
>> +   /* No work if the child plan is an Append or MergeAppend */
>> +   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
>> +   return;
>>
>> Why? Probably it's related to the answer to the first question, But I
>> don't see the connection. Given that partition-wise join will be
>> applicable level by level, we need to recurse in
>> adjust_subplan_tlist().
>
>
> I don't think so; I think if the child plan is an Append or MergeAppend, the
> tlist for each subplan of the Append or MergeAppend would have already been
> adjusted while create_plan_recurse before we are called here.

Ok. Thanks for the clarification. I think we should add a comment.

>
>> +   /* The child plan should be able to do projection */
>> +   Assert(is_projection_capable_plan(subplan));
>> +
>> Again why? A MergeAppend's subplan could be a Sort plan, which will
>> not be projection capable.
>
>
> IIUC, since we are called here right after create_plan_recurse, the child
> plan would be a scan or join unless it's neither Append nor MergeAppend.  So
> it should be projection-capable.  Maybe I'm missing something, though.

That's not true. add_paths_to_append_rel() adds sort paths on scan if
necessary and creates merge append path.

-- 
Best Wishes,
Ashutosh Bapat

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-03 Thread Etsuro Fujita


(2018/07/04 1:35), Andres Freund wrote:

On 2018-06-22 10:58:28 -0400, Robert Haas wrote:

On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita
  wrote:

Here is a patch for that.



I think this approach is going to run into trouble if the level at
which we have to apply the ConvertRowTypeExpr happens not to be a
projection-capable node.


What's the plan forward here?


I still think that this approach would be the right way to go, so I plan 
to polish the patch.



This has been an open item for quite a
while. Robert, are you in agreement with this approach on a high level?
Who is going to drive this forward?


I'm willing to do that if it's OK, but I'd like to get more feedback 
from Robert, Ashutosh or anyone else.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-03 Thread Andres Freund

Hi Ashutosh, Etsuro, Robert,

On 2018-06-22 10:58:28 -0400, Robert Haas wrote:
> On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita
>  wrote:
> > Here is a patch for that.
> >
> > * As I said upthread, the patch makes code much more simple; I removed all
> > the changes to setrefs.c added by the partitionwise-join patch.  I also
> > simplified the logic for building a tlist for a child-join rel. The original
> > PWJ computes attr_needed data even for child rels, and build the tlist for a
> > child-join by passing to build_joinrel_tlist that data for input child rels
> > for the child-join.  But I think that's redundant, and it's more
> > straightforward to apply adjust_appendrel_attrs to the parent-join's tlist
> > to get the child-join's tlist.  So, I changed that way, which made
> > unnecessary all the changes to build_joinrel_tlist and placeholder.c added
> > by the PWJ patch, so I removed those as well.
> >
> > * The patch contains all of the regression tests in the original patch
> > proposed by Ashutosh.
> 
> I think this approach is going to run into trouble if the level at
> which we have to apply the ConvertRowTypeExpr happens not to be a
> projection-capable node.

What's the plan forward here? This has been an open item for quite a
while. Robert, are you in agreement with this approach on a high level?
Who is going to drive this forward?

Greetings,

Andres Freund

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-02 Thread Etsuro Fujita


(2018/06/22 23:58), Robert Haas wrote:

I think this approach is going to run into trouble if the level at
which we have to apply the ConvertRowTypeExpr happens not to be a
projection-capable node.


Actually, the level we have to do that would be a child rel of a 
partitioned table or a child join of a partitionwise join, so the plan 
node would be a scan or join plan unless the child rel or child join is 
itself partitioned, in which case the plan node would be 
Append/MergeAppend and the proposed patch recursively would apply that 
conversion to child plans for the Append/MergeAppend).



And, in general, it seems to me that we want
to produce the right outputs at the lowest possible level of the plan
tree.  For instance, suppose that one of the relations being scanned
is not parallel-safe but the others are.  Then, we could end up with a
plan like this:

Append
->  Seq Scan on temp_rela
->  Gather
   ->  Parallel Seq Scan on thing1
->  Gather
   ->  Parallel Seq Scan on thing2

If a projection is required to convert the row type expression, we
certainly want that to get pushed below the Gather, not to happen at
the Gather level itself.


IIUC, we currently don't consider such a plan for partition-wise join; 
we'll only consider gathering partial paths for the parent appendrel. 
So, I'm not sure we need to take into account that when applying the 
ConvertRowtypeExpr.  Maybe I'm missing something, though.



We certainly don't want it to happen at the
Append level, which can't even handle it.


I think so too.

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-29 Thread Etsuro Fujita


(2018/06/22 22:54), Ashutosh Bapat wrote:

I have started reviewing the patch.


Thanks for the review!


+   if (enable_partitionwise_join&&  rel->top_parent_is_partitioned)
+   {
+   build_childrel_tlist(root, rel, childrel, 1,);
+   }

Why do we need rel->top_parent_is_partitioned? If a relation is
partitioned (if (rel->part_scheme), it's either the top parent or is
partition of some other partitioned table. In either case this
condition will be true.


This would be needed to avoid unnecessarily applying 
build_childrel_tlist to child rels of a partitioned table for which we 
don't consider partitionwise join.  Consider:


postgres=# create table lpt (c1 int, c2 text) partition by list (c1);
CREATE TABLE
postgres=# create table lpt_p1 partition of lpt for values in (1);
CREATE TABLE
postgres=# create table lpt_p2 (c1 int check (c1 in (2)), c2 text);
CREATE TABLE
postgres=# create table test (c1 int, c2 text);
CREATE TABLE
postgres=# explain verbose select * from (select * from lpt union all 
select * from lpt_p2) ss(c1, c2) inner join test on (ss.c1 = test.c1);

 QUERY PLAN



 Merge Join  (cost=289.92..538.20 rows=16129 width=72)
   Output: lpt_p1.c1, lpt_p1.c2, test.c1, test.c2
   Merge Cond: (test.c1 = lpt_p1.c1)
   ->  Sort  (cost=88.17..91.35 rows=1270 width=36)
 Output: test.c1, test.c2
 Sort Key: test.c1
 ->  Seq Scan on public.test  (cost=0.00..22.70 rows=1270 width=36)
   Output: test.c1, test.c2
   ->  Sort  (cost=201.74..208.09 rows=2540 width=36)
 Output: lpt_p1.c1, lpt_p1.c2
 Sort Key: lpt_p1.c1
 ->  Append  (cost=0.00..58.10 rows=2540 width=36)
   ->  Seq Scan on public.lpt_p1  (cost=0.00..22.70 
rows=1270 width=

36)
 Output: lpt_p1.c1, lpt_p1.c2
   ->  Seq Scan on public.lpt_p2  (cost=0.00..22.70 
rows=1270 width=

36)
 Output: lpt_p2.c1, lpt_p2.c2
(16 rows)

In this example, the top parent is not a partitioned table and we don't 
need to consider partitionwise join for that, so we don't need to apply 
that to the child rel lpt_p1 of the partitioned table lpt (and the table 
lpt_p2).



+   /* No work if the child plan is an Append or MergeAppend */
+   if (IsA(subplan, Append) || IsA(subplan, MergeAppend))
+   return;

Why? Probably it's related to the answer to the first question, But I
don't see the connection. Given that partition-wise join will be
applicable level by level, we need to recurse in
adjust_subplan_tlist().


I don't think so; I think if the child plan is an Append or MergeAppend, 
the tlist for each subplan of the Append or MergeAppend would have 
already been adjusted while create_plan_recurse before we are called here.



+   /* The child plan should be able to do projection */
+   Assert(is_projection_capable_plan(subplan));
+
Again why? A MergeAppend's subplan could be a Sort plan, which will
not be projection capable.


IIUC, since we are called here right after create_plan_recurse, the 
child plan would be a scan or join unless it's neither Append nor 
MergeAppend.  So it should be projection-capable.  Maybe I'm missing 
something, though.



This is not a full review. I will continue reviewing it further.


Thanks again.

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-22 Thread Robert Haas

On Tue, Jun 19, 2018 at 8:46 AM, Etsuro Fujita
 wrote:
> Here is a patch for that.
>
> * As I said upthread, the patch makes code much more simple; I removed all
> the changes to setrefs.c added by the partitionwise-join patch.  I also
> simplified the logic for building a tlist for a child-join rel. The original
> PWJ computes attr_needed data even for child rels, and build the tlist for a
> child-join by passing to build_joinrel_tlist that data for input child rels
> for the child-join.  But I think that's redundant, and it's more
> straightforward to apply adjust_appendrel_attrs to the parent-join's tlist
> to get the child-join's tlist.  So, I changed that way, which made
> unnecessary all the changes to build_joinrel_tlist and placeholder.c added
> by the PWJ patch, so I removed those as well.
>
> * The patch contains all of the regression tests in the original patch
> proposed by Ashutosh.

I think this approach is going to run into trouble if the level at
which we have to apply the ConvertRowTypeExpr happens not to be a
projection-capable node.  And, in general, it seems to me that we want
to produce the right outputs at the lowest possible level of the plan
tree.  For instance, suppose that one of the relations being scanned
is not parallel-safe but the others are.  Then, we could end up with a
plan like this:

Append
-> Seq Scan on temp_rela
-> Gather
  -> Parallel Seq Scan on thing1
-> Gather
  -> Parallel Seq Scan on thing2

If a projection is required to convert the row type expression, we
certainly want that to get pushed below the Gather, not to happen at
the Gather level itself.  We certainly don't want it to happen at the
Append level, which can't even handle it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-19 Thread Etsuro Fujita


(2018/06/15 20:56), Etsuro Fujita wrote:

Actually, I've
created a patch implementing that proposal.



But I think that patch needs more work, so I'm
planning to post it next week.


Here is a patch for that.

* As I said upthread, the patch makes code much more simple; I removed 
all the changes to setrefs.c added by the partitionwise-join patch.  I 
also simplified the logic for building a tlist for a child-join rel. 
The original PWJ computes attr_needed data even for child rels, and 
build the tlist for a child-join by passing to build_joinrel_tlist that 
data for input child rels for the child-join.  But I think that's 
redundant, and it's more straightforward to apply adjust_appendrel_attrs 
to the parent-join's tlist to get the child-join's tlist.  So, I changed 
that way, which made unnecessary all the changes to build_joinrel_tlist 
and placeholder.c added by the PWJ patch, so I removed those as well.


* The patch contains all of the regression tests in the original patch 
proposed by Ashutosh.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8215,8222  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8215,8223 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8269,8294  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
  
  -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
 ->  Append
   ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
   ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
  (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8270,8305 
  
  -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!QUERY PLAN   
! 
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
 ->  Append
   ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) FULL JOIN (public.ftprt2_p1 t2)
   ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) FULL JOIN (public.ftprt2_p2 t2)
  (7 rows)
  
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!wr   |   wr   
  +
   (0,0,) | (0,0,)
+  (50,50,0001)   | 
+  (100,100,0002) | 
   (150,150,0003) | (150,150,0003)
+  (200,200,0004) | 
   (250,250,0005) | (250,250,0005)
+  (300,300,0006) | 
+  (350,350,0007) | 
   (400,400,0008) | (400,400,0008)
!  (450,450,0009) | 
! | (75,75,0001)
! | (225,225,0004)
! | (325,325,0006)
!

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-15 Thread Etsuro Fujita


(2018/06/14 22:37), Ashutosh Bapat wrote:

On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita
  wrote:

(2018/06/07 19:42), Ashutosh Bapat wrote:

On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
   wrote:

Yeah, but I mean we modify set_append_rel_size so that we only map a parent
wholerow Var in the parent tlist to a child wholerow Var in the child's
tlist; parent wholerow Vars in the parent's other expressions such as
conditions are transformed into CREs as-is.


What happens to a PlaceHolderVar containing whole-row reference when
that appears in a condition and/or targetlist.


Whole-row Vars contained in such PHV's expressions will be mapped into 
ConvertRowtypeExprs with adjust_appendrel_attrs.



   I don't think the
tlists for the children need to have their wholerows transformed into the
corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
propose is to 1) map a parent wholerow Var simply to a child wholerow
Var,
instead (otherwise, the same as adjust_appendrel_attrs), when building
the
reltarget for a child in set_append_rel_size, 2) build the tlists for
child
joins using those children's wholerow Vars at path creation time, and 3)
adjust those wholerow Vars in the tlists for subpaths in the chosen
AppendPath so that those are transformed into the corresponding
ConvertRowtypeExprs, at plan creation time (ie, in
create_append_plan/create_merge_append_plan).  IIUC, this would not
require
any changes to pull_var_clause as proposed by the patch.  This would not
require any changes to postgres_fdw as proposed by the patch, either.  In
addition, this would make unnecessary the code added to setrefs.c by the
partitionwise-join patch.  Maybe I'm missing something though.



Not translating whole-row expressions to ConvertRowtypeExpr before
creating paths can lead to a number of anomalies. For example,

1. if there's an index on the whole-row expression of child,
translating parent's whole-row expression to child's whole-row
expression as is will lead to using that index, when in reality the it
can not be used since the condition/ORDER BY clause (which originally
referred the parent's whole-row expression) require the child's
whole-row reassembled as parent's whole-row.
2. Constraints on child'd whole-row expression, will be used to prune
a child when they can not be used since the original condition was on
parent' whole-row expression and not that of the child.
3. Equivalence classes will think that a child whole-row expression
(without ConvertRowtypeExpr) is equivalent to an expression which is
part of the same equivalence class as the parent' whole-row
expression.



Is that still possible when we only map a parent wholerow Var in the
parent's tlist to a child wholerow Var in the child's tlist?


Yes. 1 will affect the choice of index-only scans. We use pathkey
comparisons at a number of places so tlist going out of sync with
equivalence classes can affect a number of places.


Sorry, I'm still not sure if #1 is really a problem.  Could you show me 
an example for that?



build_tlist_to_deparse() is used to deparse targetlist at the time of
creating paths,as well as during the planning. According to your
proposal we will build different tlists during path creation and plan
creation. That doesn't look good.


Maybe my explanation was not enough, but my proposal will guarantee that 
the FDW can build the same tlist at path creation time and plan creation 
time because the RelOptInfo's reltarget from which the tlist is created 
doesn't change at all.



Apart from that your proposal of
changing a child's targetlist at the time of plan creation to use CRE
doesn't help since the original problem as described in [1] happens at
the time of creating plans as described in [1].


I think my proposal will address the original issue.  Actually, I've 
created a patch implementing that proposal.  That's still WIP, but it 
works well even for these cases (and makes code much more simple, as 
mentioned above!)  But I think that patch needs more work, so I'm 
planning to post it next week.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-14 Thread Ashutosh Bapat

On Mon, Jun 11, 2018 at 4:05 PM, Etsuro Fujita
 wrote:
> (2018/06/07 19:42), Ashutosh Bapat wrote:
>>
>> On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
>>   wrote:
>>>
>>> Since I'm not 100% sure that that is the right way to go, I've been
>>> rethinking how to fix this issue.  Yet another idea I came up with to fix
>>> this is to redesign the handling of the tlists for children in the
>>> partitioning case.  Currently, we build the reltarget for a child by
>>> applying adjust_appendrel_attrs to the reltarget for its parent in
>>> set_append_rel_size, which maps a wholerow Var referring to the parent
>>> rel
>>> to a ConvertRowtypeExpr that translates a wholerow of the child rel into
>>> a
>>> wholerow of the parent rel's rowtype.  This works well for the
>>> non-partitioned inheritance case, but makes complicated the code for
>>> handling the partitioning case especially when planning
>>> partitionwise-joins.
>>> And I think this would be the root cause of this issue.
>>
>>
>> Although true, this is not only about targetlist. Even the whole-row
>> expressions in the conditions, equivalence classes and other
>> planner/optimizer data structures are translated to
>> ConvertRowtypeExpr.
>
>
> Yeah, but I mean we modify set_append_rel_size so that we only map a parent
> wholerow Var in the parent tlist to a child wholerow Var in the child's
> tlist; parent wholerow Vars in the parent's other expressions such as
> conditions are transformed into CREs as-is.

What happens to a PlaceHolderVar containing whole-row reference when
that appears in a condition and/or targetlist.

>
>
>>>   I don't think the
>>> tlists for the children need to have their wholerows transformed into the
>>> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
>>> propose is to 1) map a parent wholerow Var simply to a child wholerow
>>> Var,
>>> instead (otherwise, the same as adjust_appendrel_attrs), when building
>>> the
>>> reltarget for a child in set_append_rel_size, 2) build the tlists for
>>> child
>>> joins using those children's wholerow Vars at path creation time, and 3)
>>> adjust those wholerow Vars in the tlists for subpaths in the chosen
>>> AppendPath so that those are transformed into the corresponding
>>> ConvertRowtypeExprs, at plan creation time (ie, in
>>> create_append_plan/create_merge_append_plan).  IIUC, this would not
>>> require
>>> any changes to pull_var_clause as proposed by the patch.  This would not
>>> require any changes to postgres_fdw as proposed by the patch, either.  In
>>> addition, this would make unnecessary the code added to setrefs.c by the
>>> partitionwise-join patch.  Maybe I'm missing something though.
>>
>>
>> Not translating whole-row expressions to ConvertRowtypeExpr before
>> creating paths can lead to a number of anomalies. For example,
>>
>> 1. if there's an index on the whole-row expression of child,
>> translating parent's whole-row expression to child's whole-row
>> expression as is will lead to using that index, when in reality the it
>> can not be used since the condition/ORDER BY clause (which originally
>> referred the parent's whole-row expression) require the child's
>> whole-row reassembled as parent's whole-row.
>> 2. Constraints on child'd whole-row expression, will be used to prune
>> a child when they can not be used since the original condition was on
>> parent' whole-row expression and not that of the child.
>> 3. Equivalence classes will think that a child whole-row expression
>> (without ConvertRowtypeExpr) is equivalent to an expression which is
>> part of the same equivalence class as the parent' whole-row
>> expression.
>
>
> Is that still possible when we only map a parent wholerow Var in the
> parent's tlist to a child wholerow Var in the child's tlist?

Yes. 1 will affect the choice of index-only scans. We use pathkey
comparisons at a number of places so tlist going out of sync with
equivalence classes can affect a number of places.

build_tlist_to_deparse() is used to deparse targetlist at the time of
creating paths,as well as during the planning. According to your
proposal we will build different tlists during path creation and plan
creation. That doesn't look good. Apart from that your proposal of
changing a child's targetlist at the time of plan creation to use CRE
doesn't help since the original problem as described in [1] happens at
the time of creating plans as described in [1].

The parent's whole-row reference has a different data type than
child's whole-row reference. If we do not cover child's whole-row
reference by CRE, the parent and child targetlists will go out of sync
as far as the type of individual columns are concerned. That too
doesn't look good to me.

[1] 
https://www.postgresql.org/message-id/CAFjFpRfD5=lsoqg1euf-vmpm9koezbbgjo68d9wkubjxop_...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-11 Thread Etsuro Fujita


(2018/06/07 19:42), Ashutosh Bapat wrote:

On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
  wrote:

Since I'm not 100% sure that that is the right way to go, I've been
rethinking how to fix this issue.  Yet another idea I came up with to fix
this is to redesign the handling of the tlists for children in the
partitioning case.  Currently, we build the reltarget for a child by
applying adjust_appendrel_attrs to the reltarget for its parent in
set_append_rel_size, which maps a wholerow Var referring to the parent rel
to a ConvertRowtypeExpr that translates a wholerow of the child rel into a
wholerow of the parent rel's rowtype.  This works well for the
non-partitioned inheritance case, but makes complicated the code for
handling the partitioning case especially when planning partitionwise-joins.
And I think this would be the root cause of this issue.


Although true, this is not only about targetlist. Even the whole-row
expressions in the conditions, equivalence classes and other
planner/optimizer data structures are translated to
ConvertRowtypeExpr.


Yeah, but I mean we modify set_append_rel_size so that we only map a 
parent wholerow Var in the parent tlist to a child wholerow Var in the 
child's tlist; parent wholerow Vars in the parent's other expressions 
such as conditions are transformed into CREs as-is.



  I don't think the
tlists for the children need to have their wholerows transformed into the
corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
propose is to 1) map a parent wholerow Var simply to a child wholerow Var,
instead (otherwise, the same as adjust_appendrel_attrs), when building the
reltarget for a child in set_append_rel_size, 2) build the tlists for child
joins using those children's wholerow Vars at path creation time, and 3)
adjust those wholerow Vars in the tlists for subpaths in the chosen
AppendPath so that those are transformed into the corresponding
ConvertRowtypeExprs, at plan creation time (ie, in
create_append_plan/create_merge_append_plan).  IIUC, this would not require
any changes to pull_var_clause as proposed by the patch.  This would not
require any changes to postgres_fdw as proposed by the patch, either.  In
addition, this would make unnecessary the code added to setrefs.c by the
partitionwise-join patch.  Maybe I'm missing something though.


Not translating whole-row expressions to ConvertRowtypeExpr before
creating paths can lead to a number of anomalies. For example,

1. if there's an index on the whole-row expression of child,
translating parent's whole-row expression to child's whole-row
expression as is will lead to using that index, when in reality the it
can not be used since the condition/ORDER BY clause (which originally
referred the parent's whole-row expression) require the child's
whole-row reassembled as parent's whole-row.
2. Constraints on child'd whole-row expression, will be used to prune
a child when they can not be used since the original condition was on
parent' whole-row expression and not that of the child.
3. Equivalence classes will think that a child whole-row expression
(without ConvertRowtypeExpr) is equivalent to an expression which is
part of the same equivalence class as the parent' whole-row
expression.


Is that still possible when we only map a parent wholerow Var in the 
parent's tlist to a child wholerow Var in the child's tlist?


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-07 Thread Ashutosh Bapat

On Wed, Jun 6, 2018 at 5:00 PM, Etsuro Fujita
 wrote:
> (2018/05/18 16:33), Etsuro Fujita wrote:
>>
>> Other than pull_var_clause things,
>> the updated version looks good to me, so I'll mark this as Ready for
>> Committer.
>
>
> Since I'm not 100% sure that that is the right way to go, I've been
> rethinking how to fix this issue.  Yet another idea I came up with to fix
> this is to redesign the handling of the tlists for children in the
> partitioning case.  Currently, we build the reltarget for a child by
> applying adjust_appendrel_attrs to the reltarget for its parent in
> set_append_rel_size, which maps a wholerow Var referring to the parent rel
> to a ConvertRowtypeExpr that translates a wholerow of the child rel into a
> wholerow of the parent rel's rowtype.  This works well for the
> non-partitioned inheritance case, but makes complicated the code for
> handling the partitioning case especially when planning partitionwise-joins.
> And I think this would be the root cause of this issue.

Although true, this is not only about targetlist. Even the whole-row
expressions in the conditions, equivalence classes and other
planner/optimizer data structures are translated to
ConvertRowtypeExpr.

>  I don't think the
> tlists for the children need to have their wholerows transformed into the
> corresponding ConvertRowtypeExprs *at this point*, so what I'd like to
> propose is to 1) map a parent wholerow Var simply to a child wholerow Var,
> instead (otherwise, the same as adjust_appendrel_attrs), when building the
> reltarget for a child in set_append_rel_size, 2) build the tlists for child
> joins using those children's wholerow Vars at path creation time, and 3)
> adjust those wholerow Vars in the tlists for subpaths in the chosen
> AppendPath so that those are transformed into the corresponding
> ConvertRowtypeExprs, at plan creation time (ie, in
> create_append_plan/create_merge_append_plan).  IIUC, this would not require
> any changes to pull_var_clause as proposed by the patch.  This would not
> require any changes to postgres_fdw as proposed by the patch, either.  In
> addition, this would make unnecessary the code added to setrefs.c by the
> partitionwise-join patch.  Maybe I'm missing something though.

Not translating whole-row expressions to ConvertRowtypeExpr before
creating paths can lead to a number of anomalies. For example,

1. if there's an index on the whole-row expression of child,
translating parent's whole-row expression to child's whole-row
expression as is will lead to using that index, when in reality the it
can not be used since the condition/ORDER BY clause (which originally
referred the parent's whole-row expression) require the child's
whole-row reassembled as parent's whole-row.
2. Constraints on child'd whole-row expression, will be used to prune
a child when they can not be used since the original condition was on
parent' whole-row expression and not that of the child.
3. Equivalence classes will think that a child whole-row expression
(without ConvertRowtypeExpr) is equivalent to an expression which is
part of the same equivalence class as the parent' whole-row
expression.

Given these serious problems, I don't think we could afford not to
cover a child whole-row reference in ConvertRowtypeExpr.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-06-06 Thread Etsuro Fujita


(2018/05/18 16:33), Etsuro Fujita wrote:

Other than pull_var_clause things,
the updated version looks good to me, so I'll mark this as Ready for
Committer.


Since I'm not 100% sure that that is the right way to go, I've been 
rethinking how to fix this issue.  Yet another idea I came up with to 
fix this is to redesign the handling of the tlists for children in the 
partitioning case.  Currently, we build the reltarget for a child by 
applying adjust_appendrel_attrs to the reltarget for its parent in 
set_append_rel_size, which maps a wholerow Var referring to the parent 
rel to a ConvertRowtypeExpr that translates a wholerow of the child rel 
into a wholerow of the parent rel's rowtype.  This works well for the 
non-partitioned inheritance case, but makes complicated the code for 
handling the partitioning case especially when planning 
partitionwise-joins.  And I think this would be the root cause of this 
issue.  I don't think the tlists for the children need to have their 
wholerows transformed into the corresponding ConvertRowtypeExprs *at 
this point*, so what I'd like to propose is to 1) map a parent wholerow 
Var simply to a child wholerow Var, instead (otherwise, the same as 
adjust_appendrel_attrs), when building the reltarget for a child in 
set_append_rel_size, 2) build the tlists for child joins using those 
children's wholerow Vars at path creation time, and 3) adjust those 
wholerow Vars in the tlists for subpaths in the chosen AppendPath so 
that those are transformed into the corresponding ConvertRowtypeExprs, 
at plan creation time (ie, in 
create_append_plan/create_merge_append_plan).  IIUC, this would not 
require any changes to pull_var_clause as proposed by the patch.  This 
would not require any changes to postgres_fdw as proposed by the patch, 
either.  In addition, this would make unnecessary the code added to 
setrefs.c by the partitionwise-join patch.  Maybe I'm missing something 
though.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-18 Thread Etsuro Fujita


(2018/05/17 23:19), Ashutosh Bapat wrote:

On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita
  wrote:

(2018/05/16 22:49), Ashutosh Bapat wrote:

On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
   wrote:

However, considering that
pull_var_clause is used in many places where partitioning is *not*
involved,
I still think it's better to avoid spending extra cycles needed only for
partitioning in that function.



Right. The changes done in inheritance_planner() imply that we can
encounter a ConvertRowtypeExpr even in the expressions for a relation
which is not a child relation in the translated query. It was a child
in the original query but after translating it becomes a parent and
has ConvertRowtypeExpr somewhere there.



Sorry, I don't understand this.  Could you show me such an example?


Even without inheritance we can induce a ConvertRowtypeExpr on a base
relation which is not "other" relation

create table parent (a int, b int);
create table child () inherits(parent);
alter table child add constraint whole_par_const check ((child::parent).a = 1);

There is no way to tell by looking at the parsed query whether
pull_var_clause() from StoreRelCheck() will encounter a
ConvertRowtypeExpr or not. We could check whether the tables involved
are inherited or not, but that's more expensive than
is_converted_whole_row_reference().


Yeah, ISTM that the inheritance test makes things worse.


So, there is no way to know
whether a given expression will have ConvertRowtypeExpr in it. That's
my understanding. But if we can device such a test, I am happy to
apply that test before or witin the call to pull_var_clause().

We don't need that expensive test if user has passed
PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
the shape of expression tree. It would cause more asymmetry in
pull_var_clause(), but we can live with it or change the order of
tests for all the three options. I will differ that to a committer.



Sounds more invasive.  Considering where we are in the development cycle for
PG11, I think it'd be better to address this by something like the approach
I proposed.  Anyway, +1 for asking the committer's opinion.


Thanks.


Let's ask that.


- On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

+extern bool
+is_converted_whole_row_reference(Node *node)

I think we should remove "extern" from the definition.


Done.



- On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:


I think we can fix this by adding another flag to indicate whether we
deparsed the first live column of the relation, as in the "first" bool flag
in deparseTargetList.


Thanks. Fixed.


Thanks for updating the patch set!  Other than pull_var_clause things, 
the updated version looks good to me, so I'll mark this as Ready for 
Committer.  As I said before, I think this issue should be addressed in 
advance of the PG11 release.


Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-17 Thread Ashutosh Bapat

On Thu, May 17, 2018 at 4:50 PM, Etsuro Fujita
 wrote:
> (2018/05/16 22:49), Ashutosh Bapat wrote:
>>
>> On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
>>   wrote:
>>>
>>> However, considering that
>>> pull_var_clause is used in many places where partitioning is *not*
>>> involved,
>>> I still think it's better to avoid spending extra cycles needed only for
>>> partitioning in that function.
>>
>>
>> Right. The changes done in inheritance_planner() imply that we can
>> encounter a ConvertRowtypeExpr even in the expressions for a relation
>> which is not a child relation in the translated query. It was a child
>> in the original query but after translating it becomes a parent and
>> has ConvertRowtypeExpr somewhere there.
>
>
> Sorry, I don't understand this.  Could you show me such an example?

Even without inheritance we can induce a ConvertRowtypeExpr on a base
relation which is not "other" relation

create table parent (a int, b int);
create table child () inherits(parent);
alter table child add constraint whole_par_const check ((child::parent).a = 1);

There is no way to tell by looking at the parsed query whether
pull_var_clause() from StoreRelCheck() will encounter a
ConvertRowtypeExpr or not. We could check whether the tables involved
are inherited or not, but that's more expensive than
is_converted_whole_row_reference().

>
>> So, there is no way to know
>> whether a given expression will have ConvertRowtypeExpr in it. That's
>> my understanding. But if we can device such a test, I am happy to
>> apply that test before or witin the call to pull_var_clause().
>>
>> We don't need that expensive test if user has passed
>> PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
>> the shape of expression tree. It would cause more asymmetry in
>> pull_var_clause(), but we can live with it or change the order of
>> tests for all the three options. I will differ that to a committer.
>
>
> Sounds more invasive.  Considering where we are in the development cycle for
> PG11, I think it'd be better to address this by something like the approach
> I proposed.  Anyway, +1 for asking the committer's opinion.

Thanks.

>
> - On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:
>
> +extern bool
> +is_converted_whole_row_reference(Node *node)
>
> I think we should remove "extern" from the definition.

Done.

>
> - On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>
>
> I think we can fix this by adding another flag to indicate whether we
> deparsed the first live column of the relation, as in the "first" bool flag
> in deparseTargetList.

Thanks. Fixed.

>
> One more thing to add is: the patch adds support for deparsing a
> ConvertRowtypeExpr that translates a wholerow of a childrel into a wholerow
> of its parentrel's rowtype, so by modifying is_foreign_expr accordingly, we
> could push down such CREs in JOIN ON conditions to the remote for example.
> But that would be an improvement, not a fix, so I think we should leave that
> for another patch for PG12.

Right.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


expr_ref_error_pwj_v9.tar.gz
Description: GNU Zip compressed data

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-17 Thread Etsuro Fujita


(2018/05/16 22:49), Ashutosh Bapat wrote:

On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
  wrote:

However, considering that
pull_var_clause is used in many places where partitioning is *not* involved,
I still think it's better to avoid spending extra cycles needed only for
partitioning in that function.


Right. The changes done in inheritance_planner() imply that we can
encounter a ConvertRowtypeExpr even in the expressions for a relation
which is not a child relation in the translated query. It was a child
in the original query but after translating it becomes a parent and
has ConvertRowtypeExpr somewhere there.


Sorry, I don't understand this.  Could you show me such an example?


So, there is no way to know
whether a given expression will have ConvertRowtypeExpr in it. That's
my understanding. But if we can device such a test, I am happy to
apply that test before or witin the call to pull_var_clause().

We don't need that expensive test if user has passed
PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
the shape of expression tree. It would cause more asymmetry in
pull_var_clause(), but we can live with it or change the order of
tests for all the three options. I will differ that to a committer.


Sounds more invasive.  Considering where we are in the development cycle 
for PG11, I think it'd be better to address this by something like the 
approach I proposed.  Anyway, +1 for asking the committer's opinion.



+   /* Construct the conversion map. */
+   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);

I think it'd be better to pass -1, not 0, as the typmod argument for that
function because that would be consistent with other places where we use
that function with named rowtypes (eg, ruleutils.c).


Done.


Thanks!


-is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
+is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
+   int *relno, int *colno)

-1 for that change; because 1) we use "var" for implying many things as in
eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the
name as-is.


I think pull_var_clause is the only place where we do that and I don't
like that very much. I also don't like is_subquery_var() name since
it's too specific. We might want that kind of functionality to ship
generic expressions and not just Vars in future. Usually we would be
searching for columns in the subquery targetlist so the name change
looks good to me. IIRC, the function was added one release back, so
backpatching pain, if any, won't be much. But I don't think we should
carry a misnomer for many releases to come. Let's differ this to a
committer.


OK


Here's set of updated patches.


Thanks for updating the patch set!  I noticed followings.  Sorry for not 
having noticed that before.


- On patch 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

+extern bool
+is_converted_whole_row_reference(Node *node)

I think we should remove "extern" from the definition.

- On patch 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:

+   /* Construct ROW expression according to the conversion map. */
+   appendStringInfoString(buf, "ROW(");
+   for (cnt = 0; cnt < parent_desc->natts; cnt++)
+   {
+   /* Ignore dropped columns. */
+   if (attrMap[cnt] == 0)
+   continue;
+
+   if (cnt > 0)
+   appendStringInfoString(buf, ", ");
+   deparseColumnRef(buf, child_var->varno, attrMap[cnt],
+planner_rt_fetch(child_var->varno, root),
+qualify_col);
+   }
+   appendStringInfoChar(buf, ')');

The result would start with ", " in the case where the cnt=0 column of 
the parent relation is a dropped one.  Here is an example:


postgres=# create table pt1 (a int, b int) partition by range (b);
CREATE TABLE
postgres=# alter table pt1 drop column a;
ALTER TABLE
postgres=# alter table pt1 add column a int;
ALTER TABLE
postgres=# create table loct11 (like pt1);
CREATE TABLE
postgres=# create foreign table pt1p1 partition of pt1 for values from 
(0) to (100) server loopback options (table_name 'loct11', 
use_remote_estimate 'true');

CREATE FOREIGN TABLE
postgres=# insert into pt1 select i, i from generate_series(0, 99, 2) i;
INSERT 0 50
postgres=# analyze pt1;
ANALYZE
postgres=# analyze pt1p1;
ANALYZE
postgres=# create table pt2 (a int, b int) partition by range (b);
CREATE TABLE
postgres=# create table loct21 (like pt2);
CREATE TABLE
postgres=# create foreign table pt2p1 partition of pt2 for values from 
(0) to (100) server loopback options (table_name 'loct21', 
use_remote_estimate 'true');

CREATE FOREIGN TABLE
postgres=# insert into pt2 select i, i from generate_series(0, 99, 3) i;
INSERT 0 34
postgres=# analyze pt2;
ANALYZE
postgres=# analyze pt2p1;
ANALYZE
postgres=# set enable_partitionwise_join to true;
SET
postgres=# explain verbose select pt1.* from pt1, pt2 where pt1.b = 
pt2.b for update

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-16 Thread Ashutosh Bapat

On Wed, May 16, 2018 at 4:01 PM, Etsuro Fujita
 wrote:
>
 So we could
 really minimize the code change and avoid the additional overhead caused
 by
 the is_converted_whole_row_reference test added to pull_var_clause.  I
 think
 the latter would be rather important, because PWJ is disabled by
 default.
 What do you think about that approach?
>>>
>>>
>>> Partition-wise join and partition-wise aggregation are disabled by
>>> default temporarily. We will change it in near future once the memory
>>> consumption issue has been tackled. The features are useful and users
>>> would want those to be enabled by default like other query
>>> optimizations. So, we can't take a decision based on that.
>
>
> Yeah, I really agree on that point!  However, considering that
> pull_var_clause is used in many places where partitioning is *not* involved,
> I still think it's better to avoid spending extra cycles needed only for
> partitioning in that function.

Right. The changes done in inheritance_planner() imply that we can
encounter a ConvertRowtypeExpr even in the expressions for a relation
which is not a child relation in the translated query. It was a child
in the original query but after translating it becomes a parent and
has ConvertRowtypeExpr somewhere there. So, there is no way to know
whether a given expression will have ConvertRowtypeExpr in it. That's
my understanding. But if we can device such a test, I am happy to
apply that test before or witin the call to pull_var_clause().

We don't need that expensive test if user has passed
PVC_RECURSE_CONVERTROWTYPE. So we could test that first and then check
the shape of expression tree. It would cause more asymmetry in
pull_var_clause(), but we can live with it or change the order of
tests for all the three options. I will differ that to a committer.

> +   /* Construct the conversion map. */
> +   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);
>
> I think it'd be better to pass -1, not 0, as the typmod argument for that
> function because that would be consistent with other places where we use
> that function with named rowtypes (eg, ruleutils.c).

Done.

>
> -is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
> +is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
> +   int *relno, int *colno)
>
> -1 for that change; because 1) we use "var" for implying many things as in
> eg, pull_var_clause and 2) I think it'd make back-patching easy to keep the
> name as-is.

I think pull_var_clause is the only place where we do that and I don't
like that very much. I also don't like is_subquery_var() name since
it's too specific. We might want that kind of functionality to ship
generic expressions and not just Vars in future. Usually we would be
searching for columns in the subquery targetlist so the name change
looks good to me. IIRC, the function was added one release back, so
backpatching pain, if any, won't be much. But I don't think we should
carry a misnomer for many releases to come. Let's differ this to a
committer.

Here's set of updated patches.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

expr_ref_error_pwj_v8.tar.gz
Description: GNU Zip compressed data

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-16 Thread Etsuro Fujita


(2018/05/14 15:34), Ashutosh Bapat wrote:

On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat
  wrote:

On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
  wrote:
Here's the query.

explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;


Thanks!


Yet yet another case where pull_var_clause produces an error:

postgres=# create table list_pt (a int, b text) partition by list (a);
CREATE TABLE
postgres=# create table list_ptp1 partition of list_pt for values in (1);
CREATE TABLE
postgres=# alter table list_ptp1 add constraint list_ptp1_check check
(list_ptp1::list_pt::text != NULL);
ERROR:  ConvertRowtypeExpr found where not expected

The patch kept the flags passed to pull_var_clause in
src/backend/catalog/heap.c as-is, but I think we need to modify that
accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
as well.


With all the support that we have added in partitioning for v11, I
think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
which were left unchanged in the earlier versions of patches, which
were written when all that support wasn't in v11.


Actually, we'd get the same error without using anything in 
partitioning.  Here is an example:


postgres=# create table parent (a int, b text);
CREATE TABLE
postgres=# create table child (a int, b text);
CREATE TABLE
postgres=# alter table child inherit parent ;
ALTER TABLE
postgres=# alter table child add constraint child_check check 
(child::parent::text != NULL);

ERROR:  ConvertRowtypeExpr found where not expected


Having said that, I started to think this approach would make code much
complicated.  Another idea I came up with to avoid that would be to use
pull_var_clause as-is and then rewrite wholerows of partitions back to
ConvertRowtypeExprs translating those wholerows to wholerows of their
parents tables' rowtypes if necessary.  That would be annoying and a little
bit inefficient, but the places where we need to rewrite is limited:
prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.


Other reason why we can't use your approach is we can not decide
whether ConvertRowtypeExpr is needed by just looking at the Var node.
You might say, that we add a ConvertRowtypeExpr if the Var::varno
references a child relation. But that's not true. For example a whole
row reference embedded in ConvertRowtypeExpr added by query
adjustments in inheritance_planner() is not a child's whole-row
reference in the adjusted query.


Sorry, I don't understand this fully.


So, it's not clear to me when we add
a ConvertRowtypeExpr and we don't.


I think it'd be OK to rewrite so at least in prepare_sort_from_pathkeys 
and build_tlist_to_deparse.



I am not sure if those are the only
two places which require a fix. Right now it looks like those are the
places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
in the future, esp. given the amount of work we expect to happen in
the partitioning area. When that happens, fixing all those places in
that way wouldn't be good.


I have to admit that the approach I proposed is a band-aid fix.


So we could
really minimize the code change and avoid the additional overhead caused by
the is_converted_whole_row_reference test added to pull_var_clause.  I think
the latter would be rather important, because PWJ is disabled by default.
What do you think about that approach?


Partition-wise join and partition-wise aggregation are disabled by
default temporarily. We will change it in near future once the memory
consumption issue has been tackled. The features are useful and users
would want those to be enabled by default like other query
optimizations. So, we can't take a decision based on that.


Yeah, I really agree on that point!  However, considering that 
pull_var_clause is used in many places where partitioning is *not* 
involved, I still think it's better to avoid spending extra cycles 
needed only for partitioning in that function.



Here's set of updated patches.


Thanks for updating the patch!

Here are some other minor comments on patch 
0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:


+   /* Construct the conversion map. */
+   parent_desc = lookup_rowtype_tupdesc(cre->resulttype, 0);

I think it'd be better to pass -1, not 0, as the typmod argument for 
that function because that would be consistent with other places where 
we use that function with named rowtypes (eg, ruleutils.c).


-is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno)
+is_subquery_column(Node *node, Index varno, RelOptInfo *foreignrel,
+   int *relno, int *colno)

-1 for that change; because 1) we use "var" for implying many things as 
in eg, pull_var_clause and 2) I think it'd make back-patching easy to 
keep the name as-is.


Other than that the patch set looks good to me.

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-14 Thread Ashutosh Bapat

On Mon, May 14, 2018 at 10:21 AM, Ashutosh Bapat
 wrote:
> On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
>  wrote:
>>
>> I guess you forgot to show the query.
>
> Sorry. Here's the query.
>
> explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
> where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;
>
>>
>> Yet yet another case where pull_var_clause produces an error:
>>
>> postgres=# create table list_pt (a int, b text) partition by list (a);
>> CREATE TABLE
>> postgres=# create table list_ptp1 partition of list_pt for values in (1);
>> CREATE TABLE
>> postgres=# alter table list_ptp1 add constraint list_ptp1_check check
>> (list_ptp1::list_pt::text != NULL);
>> ERROR:  ConvertRowtypeExpr found where not expected
>>
>> The patch kept the flags passed to pull_var_clause in
>> src/backend/catalog/heap.c as-is, but I think we need to modify that
>> accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
>> as well.
>
> With all the support that we have added in partitioning for v11, I
> think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
> which were left unchanged in the earlier versions of patches, which
> were written when all that support wasn't in v11.
>
>>
>> Having said that, I started to think this approach would make code much
>> complicated.  Another idea I came up with to avoid that would be to use
>> pull_var_clause as-is and then rewrite wholerows of partitions back to
>> ConvertRowtypeExprs translating those wholerows to wholerows of their
>> parents tables' rowtypes if necessary.  That would be annoying and a little
>> bit inefficient, but the places where we need to rewrite is limited:
>> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.
>
> Other reason why we can't use your approach is we can not decide
> whether ConvertRowtypeExpr is needed by just looking at the Var node.
> You might say, that we add a ConvertRowtypeExpr if the Var::varno
> references a child relation. But that's not true. For example a whole
> row reference embedded in ConvertRowtypeExpr added by query
> adjustments in inheritance_planner() is not a child's whole-row
> reference in the adjusted query. So, it's not clear to me when we add
> a ConvertRowtypeExpr and we don't. I am not sure if those are the only
> two places which require a fix. Right now it looks like those are the
> places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
> in the future, esp. given the amount of work we expect to happen in
> the partitioning area. When that happens, fixing all those places in
> that way wouldn't be good.
>
>> So we could
>> really minimize the code change and avoid the additional overhead caused by
>> the is_converted_whole_row_reference test added to pull_var_clause.  I think
>> the latter would be rather important, because PWJ is disabled by default.
>> What do you think about that approach?
>
> Partition-wise join and partition-wise aggregation are disabled by
> default temporarily. We will change it in near future once the memory
> consumption issue has been tackled. The features are useful and users
> would want those to be enabled by default like other query
> optimizations. So, we can't take a decision based on that.

Here's set of updated patches.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


expr_ref_error_pwj_v7.tar.gz
Description: GNU Zip compressed data

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-13 Thread Ashutosh Bapat

On Fri, May 11, 2018 at 6:31 PM, Etsuro Fujita
 wrote:
>
> I guess you forgot to show the query.

Sorry. Here's the query.

explain verbose select * from fprt1 t1 join fprt2 t2 on t1.a = t2.b
where t1 = row_as_is(row(t2.a, t2.b, t2.c)::ftprt2_p1)::fprt2;

>
> Yet yet another case where pull_var_clause produces an error:
>
> postgres=# create table list_pt (a int, b text) partition by list (a);
> CREATE TABLE
> postgres=# create table list_ptp1 partition of list_pt for values in (1);
> CREATE TABLE
> postgres=# alter table list_ptp1 add constraint list_ptp1_check check
> (list_ptp1::list_pt::text != NULL);
> ERROR:  ConvertRowtypeExpr found where not expected
>
> The patch kept the flags passed to pull_var_clause in
> src/backend/catalog/heap.c as-is, but I think we need to modify that
> accordingly.  Probably, the flags passed to pull_var_clause in CreateTrigger
> as well.

With all the support that we have added in partitioning for v11, I
think we need to add PVC_RECURSE_CONVERTROWTYPE at all the places,
which were left unchanged in the earlier versions of patches, which
were written when all that support wasn't in v11.

>
> Having said that, I started to think this approach would make code much
> complicated.  Another idea I came up with to avoid that would be to use
> pull_var_clause as-is and then rewrite wholerows of partitions back to
> ConvertRowtypeExprs translating those wholerows to wholerows of their
> parents tables' rowtypes if necessary.  That would be annoying and a little
> bit inefficient, but the places where we need to rewrite is limited:
> prepare_sort_from_pathkeys and build_tlist_to_deparse, IIUC.

Other reason why we can't use your approach is we can not decide
whether ConvertRowtypeExpr is needed by just looking at the Var node.
You might say, that we add a ConvertRowtypeExpr if the Var::varno
references a child relation. But that's not true. For example a whole
row reference embedded in ConvertRowtypeExpr added by query
adjustments in inheritance_planner() is not a child's whole-row
reference in the adjusted query. So, it's not clear to me when we add
a ConvertRowtypeExpr and we don't. I am not sure if those are the only
two places which require a fix. Right now it looks like those are the
places which need PVC_INCLUDE_CONVERTROWTYPE, but that may not be true
in the future, esp. given the amount of work we expect to happen in
the partitioning area. When that happens, fixing all those places in
that way wouldn't be good.

> So we could
> really minimize the code change and avoid the additional overhead caused by
> the is_converted_whole_row_reference test added to pull_var_clause.  I think
> the latter would be rather important, because PWJ is disabled by default.
> What do you think about that approach?

Partition-wise join and partition-wise aggregation are disabled by
default temporarily. We will change it in near future once the memory
consumption issue has been tackled. The features are useful and users
would want those to be enabled by default like other query
optimizations. So, we can't take a decision based on that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-11 Thread Ashutosh Bapat

On Thu, May 10, 2018 at 6:23 PM, Etsuro Fujita
 wrote:
> (2018/05/10 13:04), Ashutosh Bapat wrote:
>>
>> On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita
>>   wrote:
>>>
>>> (2018/04/25 18:51), Ashutosh Bapat wrote:

 Actually I noticed that ConvertRowtypeExpr are used to cast a child's
 whole row reference expression (not just a Var node) into that of its
 parent and not. For example a cast like NULL::child::parent produces a
 ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
 node. We are interested only in ConvertRowtypeExprs embedding Var
 nodes with Var::varattno = 0. I have changed this code to use function
 is_converted_whole_row_reference() instead of the above code with
 Assert. In order to use that function, I had to take it out of
 setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.
>>>
>>>
>>> This change seems a bit confusing to me because the flag bits
>>> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to
>>> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a
>>> given clause, but really, it only handles ConvertRowtypeExprs you
>>> mentioned
>>> above.  To make that function easy to understand and use, I think it'd be
>>> better to use the IsA(node, ConvertRowtypeExpr) test as in the first
>>> version
>>> of the patch, instead of is_converted_whole_row_reference, which would be
>>> more consistent with other cases such as PlaceHolderVar.
>>
>>
>> I agree that using is_converted_whole_row_reference() is not
>> consistent with the other nodes that are handled by pull_var_clause().
>> I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's
>> being done with those options. But using
>> is_converted_whole_row_reference() is the correct thing to do since we
>> are interested only in the whole-row references embedded in
>> ConvertRowtypeExpr. There can be anything encapsulated in
>> ConvertRowtypeExpr(), a non-shippable function for example. We don't
>> want to try to push that down in postgres_fdw's case. Neither in other
>> cases.
>
>
> Yeah, but I think the IsA test would be sufficient to make things work
> correctly because we can assume that there aren't any other nodes than Var,
> PHV, and CRE translating a wholerow value of a child table into a wholerow
> value of its parent table's rowtype in the expression clause to which we
> apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES.

Not really.
Consider following case using the same tables fprt1 and fprt2 in test
sql/postgres_fdw.sql
create function row_as_is(a ftprt2_p1) returns ftprt2_p1 as $$begin
return a; end;$$ language plpgsql;

With the change you propose i.e.

diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index f972712..088da14 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -646,8 +646,9 @@ pull_var_clause_walker(Node *node,
pull_var_clause_context *context)
else
elog(ERROR, "PlaceHolderVar found where not expected");
}
-   else if (is_converted_whole_row_reference(node))
+   else if (IsA(node, ConvertRowtypeExpr))
{
+   Assert(is_converted_whole_row_reference(node));
if (context->flags & PVC_INCLUDE_CONVERTROWTYPES)
{
context->varlist = lappend(context->varlist, node);


following query crashes at Assert(is_converted_whole_row_reference(node));

If we remove that assert, it would end up pushing down row_as_is(),
which is a local user defined function, to the foreign server. That's
not desirable since the foreign server may not have that user defined
function.

>
> BTW, one thing I noticed about the new version of the patch is: there is yet
> another case where pull_var_clause produces an error.  Here is an example:
>
> postgres=# create table t1 (a int, b text);
> CREATE TABLE
> postgres=# create table t2 (a int, b text);
> CREATE TABLE
> postgres=# create foreign table ft1 (a int, b text) server loopback options
> (table_name 't1');
> CREATE FOREIGN TABLE
> postgres=# create foreign table ft2 (a int, b text) server loopback options
> (table_name 't2');
> CREATE FOREIGN TABLE
> postgres=# insert into ft1 values (1, 'foo');
> INSERT 0 1
> postgres=# insert into ft1 values (2, 'bar');
> INSERT 0 1
> postgres=# insert into ft2 values (1, 'test1');
> INSERT 0 1
> postgres=# insert into ft2 values (2, 'test2');
> INSERT 0 1
> postgres=# analyze ft1;
> ANALYZE
> postgres=# analyze ft2;
> ANALYZE
> postgres=# create table parent (a int, b text);
> CREATE TABLE
> postgres=# alter foreign table ft1 inherit parent;
> ALTER FOREIGN TABLE
> postgres=# explain verbose update parent set b = ft2.b from ft2 where
> parent.a = ft2.a returning parent;
> ERROR:  ConvertRowtypeExpr found where not expected
>
> To produce this, apply the patch in [1] as

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-10 Thread Etsuro Fujita


(2018/05/10 13:04), Ashutosh Bapat wrote:

On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita
  wrote:

(2018/04/25 18:51), Ashutosh Bapat wrote:

Actually I noticed that ConvertRowtypeExpr are used to cast a child's
whole row reference expression (not just a Var node) into that of its
parent and not. For example a cast like NULL::child::parent produces a
ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
node. We are interested only in ConvertRowtypeExprs embedding Var
nodes with Var::varattno = 0. I have changed this code to use function
is_converted_whole_row_reference() instead of the above code with
Assert. In order to use that function, I had to take it out of
setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.


This change seems a bit confusing to me because the flag bits
"PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to
pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a
given clause, but really, it only handles ConvertRowtypeExprs you mentioned
above.  To make that function easy to understand and use, I think it'd be
better to use the IsA(node, ConvertRowtypeExpr) test as in the first version
of the patch, instead of is_converted_whole_row_reference, which would be
more consistent with other cases such as PlaceHolderVar.


I agree that using is_converted_whole_row_reference() is not
consistent with the other nodes that are handled by pull_var_clause().
I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's
being done with those options. But using
is_converted_whole_row_reference() is the correct thing to do since we
are interested only in the whole-row references embedded in
ConvertRowtypeExpr. There can be anything encapsulated in
ConvertRowtypeExpr(), a non-shippable function for example. We don't
want to try to push that down in postgres_fdw's case. Neither in other
cases.


Yeah, but I think the IsA test would be sufficient to make things work 
correctly because we can assume that there aren't any other nodes than 
Var, PHV, and CRE translating a wholerow value of a child table into a 
wholerow value of its parent table's rowtype in the expression clause to 
which we apply pull_var_clause with PVC_INCLUDE_CONVERTROWTYPES.


BTW, one thing I noticed about the new version of the patch is: there is 
yet another case where pull_var_clause produces an error.  Here is an 
example:


postgres=# create table t1 (a int, b text);
CREATE TABLE
postgres=# create table t2 (a int, b text);
CREATE TABLE
postgres=# create foreign table ft1 (a int, b text) server loopback 
options (table_name 't1');

CREATE FOREIGN TABLE
postgres=# create foreign table ft2 (a int, b text) server loopback 
options (table_name 't2');

CREATE FOREIGN TABLE
postgres=# insert into ft1 values (1, 'foo');
INSERT 0 1
postgres=# insert into ft1 values (2, 'bar');
INSERT 0 1
postgres=# insert into ft2 values (1, 'test1');
INSERT 0 1
postgres=# insert into ft2 values (2, 'test2');
INSERT 0 1
postgres=# analyze ft1;
ANALYZE
postgres=# analyze ft2;
ANALYZE
postgres=# create table parent (a int, b text);
CREATE TABLE
postgres=# alter foreign table ft1 inherit parent;
ALTER FOREIGN TABLE
postgres=# explain verbose update parent set b = ft2.b from ft2 where 
parent.a = ft2.a returning parent;

ERROR:  ConvertRowtypeExpr found where not expected

To produce this, apply the patch in [1] as well as the new version; 
without that patch in [1], the update query would crash the server (see 
[1] for details).  To fix this, I think we need to pass 
PVC_RECURSE_CONVERTROWTYPES to pull_var_clause in build_remote_returning 
in postgres_fdw.c as well.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5AF43E02.3%40lab.ntt.co.jp

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-09 Thread Ashutosh Bapat

On Wed, May 9, 2018 at 5:20 PM, Etsuro Fujita
 wrote:
>
> (2018/04/25 18:51), Ashutosh Bapat wrote:
>> Actually I noticed that ConvertRowtypeExpr are used to cast a child's
>> whole row reference expression (not just a Var node) into that of its
>> parent and not. For example a cast like NULL::child::parent produces a
>> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
>> node. We are interested only in ConvertRowtypeExprs embedding Var
>> nodes with Var::varattno = 0. I have changed this code to use function
>> is_converted_whole_row_reference() instead of the above code with
>> Assert. In order to use that function, I had to take it out of
>> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.
>
> This change seems a bit confusing to me because the flag bits
> "PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed to
> pull_var_clause look as if it handles any ConvertRowtypeExpr nodes from a
> given clause, but really, it only handles ConvertRowtypeExprs you mentioned
> above.  To make that function easy to understand and use, I think it'd be
> better to use the IsA(node, ConvertRowtypeExpr) test as in the first version
> of the patch, instead of is_converted_whole_row_reference, which would be
> more consistent with other cases such as PlaceHolderVar.

I agree that using is_converted_whole_row_reference() is not
consistent with the other nodes that are handled by pull_var_clause().
I also agree that PVC_*_CONVERTROWTYPES doesn't reflect exactly what's
being done with those options. But using
is_converted_whole_row_reference() is the correct thing to do since we
are interested only in the whole-row references embedded in
ConvertRowtypeExpr. There can be anything encapsulated in
ConvertRowtypeExpr(), a non-shippable function for example. We don't
want to try to push that down in postgres_fdw's case. Neither in other
cases. So, it boils down to changing names of PVC_*_CONVERTROWTYPES to
something more meaningful. How about PVC_*_CONVERTWHOLEROWS?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-09 Thread Etsuro Fujita

(2018/05/08 16:20), Ashutosh Bapat wrote:

On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujita
  wrote:

Here are my review comments on patch
0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:

* This assertion in deparseConvertRowtypeExpr wouldn't always be true
because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN
TABLE ADD COLUMN:

+   Assert(parent_desc->natts == child_desc->natts);

I think we should remove that assertion.

Removed.

* But I don't think that change is enough.  Here is another oddity with that
assertion removed.

To fix this, I think we should skip the deparseColumnRef processing for
dropped columns in the parent table.

Done.

Thanks for those changes!

I have changed the test file a bit to test these scenarios.

+1

* I think it would be better to do EXPLAIN with the VERBOSE option to the
queries this patch added to the regression tests, to see if
ConvertRowtypeExprs are correctly deparsed in the remote queries.

The reason VERBOSE option was omitted for partition-wise join was to
avoid repeated and excessive output for every child-join. I think that
still applies.

I'd like to leave that for the committer's judge.

PFA patch-set with the fixes.

Thanks for updating the patch!

I also noticed that the new function deparseConvertRowtypeExpr is not
quite following the naming convention of the other deparseFoo
functions. Foo is usually the type of the node the parser would
produced when the SQL string produced by that function is parsed. That
doesn't hold for the SQL string produced by ConvertRowtypeExpr but
then naming it as generic deparseRowExpr() wouldn't be correct either.
And then there are functions like deparseColumnRef which may produce a
variety of SQL strings which get parsed into different node types e.g.
a whole-row reference would produce ROW(...) which gets parsed as a
RowExpr. Please let me know if you have any suggestions for the name.

To be honest, I don't have any strong opinion about that.  But I like 
"deparseConvertRowtypeExpr" because that name seems to well represent 
the functionality, so I'd vote for that.

BTW, one thing I'm a bit concerned about is this:

(2018/04/25 18:51), Ashutosh Bapat wrote:
> Actually I noticed that ConvertRowtypeExpr are used to cast a child's
> whole row reference expression (not just a Var node) into that of its
> parent and not. For example a cast like NULL::child::parent produces a
> ConvertRowtypeExpr encapsulating a NULL constant node and not a Var
> node. We are interested only in ConvertRowtypeExprs embedding Var
> nodes with Var::varattno = 0. I have changed this code to use function
> is_converted_whole_row_reference() instead of the above code with
> Assert. In order to use that function, I had to take it out of
> setrefs.c and put it in nodeFuncs.c which seemed to be a better fit.

This change seems a bit confusing to me because the flag bits 
"PVC_INCLUDE_CONVERTROWTYPES" and "PVC_RECURSE_CONVERTROWTYPES" passed 
to pull_var_clause look as if it handles any ConvertRowtypeExpr nodes 
from a given clause, but really, it only handles ConvertRowtypeExprs you 
mentioned above.  To make that function easy to understand and use, I 
think it'd be better to use the IsA(node, ConvertRowtypeExpr) test as in 
the first version of the patch, instead of 
is_converted_whole_row_reference, which would be more consistent with 
other cases such as PlaceHolderVar.

Best regards,
Etsuro Fujita

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-08 Thread Ashutosh Bapat

On Tue, May 1, 2018 at 5:00 PM, Etsuro Fujita
 wrote:
> (2018/04/27 14:40), Ashutosh Bapat wrote:
>>
>> Here's updated patch set.
>
>
> Thanks for updating the patch!  Here are my review comments on patch
> 0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:
>
> * This assertion in deparseConvertRowtypeExpr wouldn't always be true
> because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN
> TABLE ADD COLUMN:
>
> +   Assert(parent_desc->natts == child_desc->natts);
>
> Here is such an example:

>
> I think we should remove that assertion.

Removed.

>
> * But I don't think that change is enough.  Here is another oddity with that
> assertion removed.

>
> To fix this, I think we should skip the deparseColumnRef processing for
> dropped columns in the parent table.

Done.

I have changed the test file a bit to test these scenarios.

Thanks for testing.

>
> * I think it would be better to do EXPLAIN with the VERBOSE option to the
> queries this patch added to the regression tests, to see if
> ConvertRowtypeExprs are correctly deparsed in the remote queries.

The reason VERBOSE option was omitted for partition-wise join was to
avoid repeated and excessive output for every child-join. I think that
still applies.

PFA patch-set with the fixes.

I also noticed that the new function deparseConvertRowtypeExpr is not
quite following the naming convention of the other deparseFoo
functions. Foo is usually the type of the node the parser would
produced when the SQL string produced by that function is parsed. That
doesn't hold for the SQL string produced by ConvertRowtypeExpr but
then naming it as generic deparseRowExpr() wouldn't be correct either.
And then there are functions like deparseColumnRef which may produce a
variety of SQL strings which get parsed into different node types e.g.
a whole-row reference would produce ROW(...) which gets parsed as a
RowExpr. Please let me know if you have any suggestions for the name.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

expr_ref_error_pwj_v5.tar.gz
Description: GNU Zip compressed data

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-05-01 Thread Etsuro Fujita


(2018/04/27 14:40), Ashutosh Bapat wrote:

Here's updated patch set.


Thanks for updating the patch!  Here are my review comments on patch 
0003-postgres_fdw-child-join-with-ConvertRowtypeExprs-cau.patch:


* This assertion in deparseConvertRowtypeExpr wouldn't always be true 
because of cases like ALTER FOREIGN TABLE DROP COLUMN plus ALTER FOREIGN 
TABLE ADD COLUMN:


+   Assert(parent_desc->natts == child_desc->natts);

Here is such an example:

postgres=# create table fprt1 (a int, b int, c varchar) partition by 
range (a);

postgres=# create table fprt1_p1 (like fprt1);
postgres=# create table fprt1_p2 (like fprt1);
postgres=# create foreign table ftprt1_p1 (a int, b int, c varchar) 
server loopback options (table_name 'fprt1_p1', use_remote_estimate 'true');

postgres=# alter foreign table ftprt1_p1 drop column c;
postgres=# alter foreign table ftprt1_p1 add column c varchar;
postgres=# alter table fprt1 attach partition ftprt1_p1 for values from 
(0) to (250);
postgres=# create foreign table ftprt1_p2 partition of fprt1 for values 
from (250) to (500) server loopback options (table_name 'fprt1_p2', 
use_remote_estimate 'true');
postgres=# insert into fprt1 select i, i, to_char(i/50, 'FM') from 
generate_series(0, 499, 2) i;

postgres=# analyze fprt1;
postgres=# analyze fprt1_p1;
postgres=# analyze fprt1_p2;
postgres=# create table fprt2 (a int, b int, c varchar) partition by 
range (b);

postgres=# create table fprt2_p1 (like fprt2);
postgres=# create table fprt2_p2 (like fprt2);
postgres=# create foreign table ftprt2_p1 partition of fprt2 for values 
from (0) to (250) server loopback options (table_name 'fprt2_p1', 
use_remote_estimate 'true');
postgres=# create foreign table ftprt2_p2 partition of fprt2 for values 
from (250) to (500) server loopback options (table_name 'fprt2_p2', 
use_remote_estimate 'true');
postgres=# insert into fprt2 select i, i, to_char(i/50, 'FM') from 
generate_series(0, 499, 3) i;

postgres=# analyze fprt2;
postgres=# analyze fprt2_p1;
postgres=# analyze fprt2_p2;
postgres=# set enable_partitionwise_join to true;
postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) 
where t1.a % 25 = 0;

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I think we should remove that assertion.

* But I don't think that change is enough.  Here is another oddity with 
that assertion removed.


postgres=# alter table fprt1 detach partition ftprt1_p1;
postgres=# alter table fprt1 detach partition ftprt1_p2;
postgres=# alter table fprt1 drop column c;
postgres=# alter table fprt1 add column c varchar;
postgres=# alter table fprt1 attach partition ftprt1_p1 for values from 
(0) to (250);
postgres=# alter table fprt1 attach partition ftprt1_p2 for values from 
(250) to (500);

postgres=# set enable_partitionwise_join to true;
postgres=# select t1, t2 from fprt1 t1 join fprt2 t2 on (t1.a = t2.b) 
where t1.a % 25 = 0;

ERROR:  malformed record literal: "(300,300,"(300,300,0006)",0006)"
DETAIL:  Too many columns.
CONTEXT:  processing expression at position 1 in select list

The reason for that would be: in the following, the patch doesn't take 
care of dropped columns in the parent table (ie, columns such that 
attrMap[cnt]=0).


+   /* Construct ROW expression according to the conversion map. */
+   appendStringInfoString(buf, "ROW(");
+   for (cnt = 0; cnt < natts; cnt++)
+   {
+   if (cnt > 0)
+   appendStringInfoString(buf, ", ");
+   deparseColumnRef(buf, child_var->varno, attrMap[cnt], root,
+qualify_col);
+   }
+   appendStringInfoChar(buf, ')');

To fix this, I think we should skip the deparseColumnRef processing for 
dropped columns in the parent table.


* I think it would be better to do EXPLAIN with the VERBOSE option to 
the queries this patch added to the regression tests, to see if 
ConvertRowtypeExprs are correctly deparsed in the remote queries.


That's all I have for now.

Best regards,
Etsuro Fujita

1 2 >

1 - 100 of 109 matches

Mail list logo