It is more or less well known that the planner doesn't perform well with
more than a few hundred partitions, even when only a handful of partitions
are ultimately included in the plan.  The situation improved a bit in PG
11, where we replaced the older method of pruning partitions one-by-one
using constraint exclusion with a much faster method that finds relevant
partitions by using partitioning metadata.  However, we could only use it
for SELECT queries, because UPDATE/DELETE are handled by a completely
different code path, whose structure doesn't allow it to call the new
pruning module's functionality.  Actually, not being able to use the new
pruning is not the only problem for UPDATE/DELETE; more on that further
below.
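
To make the difference between the two pruning styles concrete, here is a
toy sketch in C.  It is not PostgreSQL code: ToyPartition, toy_hash,
prune_one_by_one and prune_with_metadata are names made up for this
illustration, and the hash is a stand-in (real hash partitioning uses
per-type hash support functions, and partprune.c works from the partition
bound metadata).  The only point is that the one-by-one method does work
proportional to the number of partitions, while the metadata-based method
computes the matching partition directly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of a hash partition: it accepts keys for which
 * toy_hash(key) % modulus == remainder.  Illustration only.
 */
typedef struct ToyPartition
{
    int modulus;
    int remainder;
} ToyPartition;

/* Stand-in hash function (assumption; not what PostgreSQL uses). */
static unsigned int
toy_hash(int key)
{
    return (unsigned int) key * 2654435761u;
}

/*
 * Constraint-exclusion style: test every partition's constraint in turn.
 * Returns the index of the matching partition, or -1 if none matches, and
 * reports how many partitions were examined via *checks.
 */
int
prune_one_by_one(const ToyPartition *parts, int nparts, int key, int *checks)
{
    int found = -1;

    *checks = 0;
    for (int i = 0; i < nparts; i++)
    {
        (*checks)++;
        if (toy_hash(key) % (unsigned int) parts[i].modulus ==
            (unsigned int) parts[i].remainder)
            found = i;
    }
    return found;
}

/*
 * Metadata style: compute the matching partition directly, assuming
 * partition i was created with (modulus nparts, remainder i).
 */
int
prune_with_metadata(int nparts, int key)
{
    assert(nparts > 0);
    return (int) (toy_hash(key) % (unsigned int) nparts);
}
```

Both functions pick the same partition for a given key, but the first one
touches all nparts entries to get there.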

While the situation improved with the new pruning where it could be used,
there are still overheads in the way the planner handles partitions.  As
things stand today, it will spend cycles and allocate memory for partitions
even before pruning is performed, meaning most of that effort could be for
partitions that were better left untouched.  Currently, the planner will
lock and heap_open *all* partitions, create range table entries and
AppendRelInfos for them, and finally initialize RelOptInfos for them, even
touching the disk file of each partition in the process, in an earlier
phase of planning.  All of that processing is in vain for partitions that
are pruned, because they won't be included in the final plan.  This problem
grows worse as the number of partitions grows into the thousands and
beyond, because the range table grows too big.

That could be fixed by delaying all that per-partition activity to a point
where pruning has already been performed, so that we know which partitions
to open and create planning data structures for, such as somewhere
downstream of query_planner.  But before we can do that, we must do
something about the fact that UPDATE/DELETE won't be able to cope with
it, because the code path that currently handles the planning of
UPDATE/DELETE on partitioned tables (inheritance_planner called from
subquery_planner) relies on AppendRelInfos for all partitions having been
initialized by an earlier planning phase.  Delaying it to query_planner
would be too late, because inheritance_planner calls query_planner for
each partition, not for the parent.  That is, if query_planner, which is
downstream of inheritance_planner, were in charge of determining which
partitions to open, the latter wouldn't know which partitions to call the
former for. :)

That would be fixed if this ordering dependency no longer existed, which
is what I propose to do with the attached patch 0001.  I've tried to
describe how the patch manages to do that in its commit message, but I'll
summarize here.  As things stand today, inheritance_planner modifies the
query for each leaf partition to make the partition act as the query's
result relation instead of the original partitioned table and calls
grouping_planner on the query.  That means anything that's joined to the
partitioned table appears instead to be joined to the partition, and join
paths are generated accordingly.  Also, the resulting path's targetlist is
adjusted to be suitable for the result partition.  Upon studying how this
works, I concluded that the same result can be achieved if we call
grouping_planner only once and repeat the portions of query_planner's and
grouping_planner's processing that generate the join paths and appropriate
target list, respectively, for each partition.  That way, we can rely on
query_planner determining result partitions for us, which in turn relies
on the faster partprune.c based method of pruning.  That speeds things up
in two ways: pruning itself is faster, and we no longer repeat common
processing for each partition.
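
As a back-of-the-envelope illustration of why that helps, here is a toy
cost model in C.  Everything in it is an assumption made for illustration:
the ToyCosts struct, the function names, and the idea that planning work
splits cleanly into a "common" part and a "per-partition" part.  It is not
derived from measurements or from the patch itself.

```c
#include <assert.h>

/* Hypothetical work units; not measured from PostgreSQL. */
typedef struct ToyCosts
{
    long common;    /* planning work that is the same for every partition */
    long per_part;  /* join planning + target list for one result partition */
} ToyCosts;

/*
 * Existing flow (modeled): the common work is repeated once per leaf
 * partition, and per-partition work is done for the partitions that
 * survive pruning.
 */
long
cost_existing_flow(ToyCosts c, int nparts, int nsurvivors)
{
    assert(nparts >= 0 && nsurvivors >= 0 && nsurvivors <= nparts);
    return (long) nparts * c.common + (long) nsurvivors * c.per_part;
}

/*
 * Proposed flow (modeled): the common work is done once, and only the
 * per-partition work is repeated for surviving partitions.
 */
long
cost_proposed_flow(ToyCosts c, int nsurvivors)
{
    assert(nsurvivors >= 0);
    return c.common + (long) nsurvivors * c.per_part;
}
```

With, say, common = 10 and per_part = 3, planning for 1024 partitions of
which one survives costs 10243 units in the first model and 13 in the
second.  When no partitions are pruned, the per-partition term dominates in
both models and the gap narrows.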


With 0001 in place, there is nothing that requires that partitions be
opened by an earlier planning phase, so I propose patch 0002, which
refactors the opening of partitions and the creation of planner data
structures for them such that both are now performed after pruning.
However, it doesn't do anything about the fact that partitions are all
still locked in the earlier phase.

With various overheads gone thanks to 0001 and 0002, locking of all
partitions via find_all_inheritors can be seen as the single largest
bottleneck, which 0003 tries to address.  I've kept it as a separate patch,
because I'll need to think a bit more before I can say that it's actually
safe to defer locking to late planning, mainly due to the concern about the
change in the order of locking from the current method.  I'm attaching it
here, because I also want to show the performance improvement we can expect
with it.


I measured the gain in performance due to each patch on a modest virtual
machine.  Details of the measurement and results follow.

* Benchmark scripts

update.sql
update ht set a = 0 where b = 1;

select.sql
select * from ht where b = 1;

* Table:

create table ht (a int, b int) partition by hash (b)
create table ht_1 partition of ht for values with (modulus N, remainder 0)
..
create table ht_N partition of ht for values with (modulus N, remainder N-1)
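
For anyone who wants to reproduce the setup, the DDL above can be generated
for a given N.  Below is a minimal sketch in C; format_ht_ddl is a
hypothetical helper written for this mail, not part of pgbench or of the
patches.  It emits the same statements as the template above, with
terminating semicolons added so the output can be fed to psql.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the DDL for table "ht" with nparts hash partitions into buf.
 * Returns the number of bytes the full text needs (excluding the
 * terminating NUL), like snprintf, so callers can detect truncation.
 */
int
format_ht_ddl(char *buf, size_t bufsize, int nparts)
{
    size_t used = 0;
    int    n;

    assert(buf != NULL && bufsize > 0 && nparts > 0);

    n = snprintf(buf, bufsize,
                 "create table ht (a int, b int) partition by hash (b);\n");
    used += (size_t) n;

    for (int i = 0; i < nparts; i++)
    {
        /* Once the buffer is full, keep counting without writing. */
        n = snprintf(used < bufsize ? buf + used : NULL,
                     used < bufsize ? bufsize - used : 0,
                     "create table ht_%d partition of ht "
                     "for values with (modulus %d, remainder %d);\n",
                     i + 1, nparts, i);
        used += (size_t) n;
    }
    return (int) used;
}
```

A caller would size the buffer from a first call (or pick a generous fixed
size), then write the result to a file and run it with psql -f.  The
nparts = 0 case in the results below is just a regular table and is not
covered by this helper.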

* Rounded tps with update.sql and select.sql against regular table (nparts
= 0) and partitioned table with various partition counts:

pgbench -n -T 60 -f update.sql

nparts  master    0001   0002   0003
======  ======    ====   ====   ====
0         2856    2893   2862   2816
8          507    1115   1447   1872
16         260     765   1173   1892
32         119     483    922   1884
64          59     282    615   1881
128         29     153    378   1835
256         14      79    210   1803
512          5      40    113   1728
1024         2      17     57   1616
2048         0*      9     30   1471
4096         0+      4     15   1236
8192         0=      2      7    975

* 0.46
+ 0.0064
= 0 (OOM on a virtual machine with 4GB RAM)

As can be seen here, 0001 is a big help for update queries.
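
To put numbers on that, the ratios can be computed straight from the table.
The sketch below is just arithmetic on the rounded tps values above; the
TpsRow struct and function names are made up for illustration, and rows
where master rounds to 0 are left out to avoid dividing by a rounded zero.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the update.sql results above (rounded tps). */
typedef struct TpsRow
{
    int    nparts;
    double master;
    double p0001;
    double p0002;
    double p0003;
} TpsRow;

/* Rows copied from the update.sql table; master > 0 only. */
static const TpsRow update_rows[] = {
    {8, 507, 1115, 1447, 1872},
    {16, 260, 765, 1173, 1892},
    {32, 119, 483, 922, 1884},
    {64, 59, 282, 615, 1881},
    {128, 29, 153, 378, 1835},
    {256, 14, 79, 210, 1803},
    {512, 5, 40, 113, 1728},
    {1024, 2, 17, 57, 1616},
};

#define N_UPDATE_ROWS (sizeof(update_rows) / sizeof(update_rows[0]))

/* Speedup of a patched tps figure over master; master must be positive. */
double
tps_speedup(double master_tps, double patched_tps)
{
    assert(master_tps > 0.0);
    return patched_tps / master_tps;
}

/* Look up a row by partition count; returns NULL if it is not listed. */
const TpsRow *
find_update_row(int nparts)
{
    for (size_t i = 0; i < N_UPDATE_ROWS; i++)
        if (update_rows[i].nparts == nparts)
            return &update_rows[i];
    return NULL;
}
```

For 1024 partitions this gives 17/2 = 8.5x with 0001 alone and 1616/2 =
808x with all three patches.  Since master is rounded to 2 tps there, these
ratios are rough.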

pgbench -n -T 60 -f select.sql

For a select query that doesn't contain a join and needs to scan only one
partition:

nparts  master    0001   0002   0003
======  ======    ====   ====   ====
0         2290    2329   2319   2268
8         1058    1077   1414   1788
16         711     729   1124   1789
32         450     475    879   1773
64         265     272    603   1765
128        146     149    371   1685
256         76      77    214   1678
512         39      39    112   1636
1024        16      17     59   1525
2048         8       9     29   1416
4096         4       4     15   1195
8192         2       2      7    932

Actually, here we get almost the same numbers with 0001 as with master,
because 0001 changes nothing for SELECT queries.  We start seeing an
improvement with 0002, the patch to delay opening partitions.

Thanks,
Amit

From 060bd2445ea9cba9adadd73505689d6f06583ee8 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Fri, 24 Aug 2018 12:39:36 +0900
Subject: [PATCH 1/3] Overhaul partitioned table update/delete planning

The current method, inheritance_planner, applies grouping_planner and
hence query_planner to the query repeatedly with each leaf partition
replacing the root parent as the query's result relation. One big
drawback of this approach is that it cannot use partprune.c to
perform partition pruning on the partitioned result relation, because
partprune.c can only be invoked if query_planner sees the partitioned
relation itself in the query.  That is not true with the existing
method, because as mentioned above, query_planner is invoked with the
partitioned relation replaced with individual leaf partitions.

While most of the work in each repetition of grouping_planner (and
query_planner) is the same, a couple of things may differ from
partition to partition: 1. join planning may produce different Paths
for joining against different result partitions, 2. grouping_planner
may produce different top-level target lists for different
partitions, based on their TupleDescs.

This commit rearranges things so that only the planning steps that
affect 1 and 2 above are repeated for partitions that are selected by
query_planner by applying partprune.c based pruning to the original
partitioned result rel.

That makes things faster because 1. partprune.c based pruning is
used instead of constraint exclusion for each partition, 2.
grouping_planner (and query_planner) is invoked only once instead of
once for every partition, thus saving cycles and memory.

This still doesn't help much if no partitions are pruned, because
we still repeat join planning and make copies of the query for
each partition, but for common cases where only a handful of
partitions remain after pruning, this makes things significantly
faster.
---
 doc/src/sgml/ddl.sgml                        |  15 +-
 src/backend/optimizer/path/allpaths.c        |  97 ++++++-
 src/backend/optimizer/plan/planmain.c        |   4 +-
 src/backend/optimizer/plan/planner.c         | 378 ++++++++++++++++++++-------
 src/backend/optimizer/prep/prepunion.c       |  28 +-
 src/backend/optimizer/util/plancat.c         |  30 ---
 src/test/regress/expected/partition_join.out |   4 +-
 7 files changed, 416 insertions(+), 140 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b5ed1b7939..53c479fbb8 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3933,16 +3933,6 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
     <xref linkend="guc-enable-partition-pruning"/> setting.
    </para>
 
-   <note>
-    <para>
-     Currently, pruning of partitions during the planning of an
-     <command>UPDATE</command> or <command>DELETE</command> command is
-     implemented using the constraint exclusion method (however, it is
-     controlled by the <literal>enable_partition_pruning</literal> rather than
-     <literal>constraint_exclusion</literal>) &mdash; see the following section
-     for details and caveats that apply.
-    </para>
-
     <para>
      Execution-time partition pruning currently occurs for the
      <literal>Append</literal> and <literal>MergeAppend</literal> node types.
@@ -3964,9 +3954,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
 
    <para>
     <firstterm>Constraint exclusion</firstterm> is a query optimization
-    technique similar to partition pruning.  While it is primarily used
-    for partitioning implemented using the legacy inheritance method, it can be
-    used for other purposes, including with declarative partitioning.
+    technique similar to partition pruning.  It is primarily used
+    for partitioning implemented using the legacy inheritance method.
    </para>
 
    <para>
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0e80aeb65c..5937c0436a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -36,6 +36,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
@@ -119,6 +120,9 @@ static void set_namedtuplestore_pathlist(PlannerInfo *root, RelOptInfo *rel,
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
                                           RangeTblEntry *rte);
 static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
+static RelOptInfo *partitionwise_make_rel_from_joinlist(PlannerInfo *root,
+                                               RelOptInfo *parent,
+                                               List *joinlist);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                                                   pushdown_safety_info *safetyInfo);
 static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
@@ -181,13 +185,30 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 
        /*
         * Generate access paths for the entire join tree.
+        *
+        * If we're doing this for an UPDATE or DELETE query whose target is a
+        * partitioned table, we must do the join planning against each of its
+        * leaf partitions instead.
         */
-       rel = make_rel_from_joinlist(root, joinlist);
+       if (root->parse->resultRelation &&
+               root->parse->commandType != CMD_INSERT &&
+               root->simple_rel_array[root->parse->resultRelation] &&
+               root->simple_rel_array[root->parse->resultRelation]->part_scheme)
+       {
+               RelOptInfo *rootrel = root->simple_rel_array[root->parse->resultRelation];
 
-       /*
-        * The result should join all and only the query's base rels.
-        */
-       Assert(bms_equal(rel->relids, root->all_baserels));
+               rel = partitionwise_make_rel_from_joinlist(root, rootrel, joinlist);
+       }
+       else
+       {
+               rel = make_rel_from_joinlist(root, joinlist);
+
+               /*
+                * The result should join all and only the query's base rels.
+                */
+               Assert(bms_equal(rel->relids, root->all_baserels));
+
+       }
 
        return rel;
 }
@@ -2591,6 +2612,72 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 }
 
 /*
+ * partitionwise_make_rel_from_joinlist
+ *             performs join planning against each of the leaf partitions contained
+ *             in the partition tree whose root relation is 'parent'
+ *
+ * Recursively called for each partitioned table contained in a given
+ * partition tree.
+ */
+static RelOptInfo *
+partitionwise_make_rel_from_joinlist(PlannerInfo *root,
+                                     RelOptInfo *parent,
+                                     List *joinlist)
+{
+       int             i;
+
+       Assert(root->parse->resultRelation != 0);
+       Assert(parent->part_scheme != NULL);
+
+       for (i = 0; i < parent->nparts; i++)
+       {
+               RelOptInfo *partrel = parent->part_rels[i];
+               AppendRelInfo *appinfo;
+               List       *translated_joinlist;
+               List       *saved_join_info_list = list_copy(root->join_info_list);
+
+               /* Ignore pruned partitions. */
+               if (IS_DUMMY_REL(partrel))
+                       continue;
+
+               /*
+                * Hack to make the join planning code believe that 'partrel' can
+                * be joined against.
+                */
+               partrel->reloptkind = RELOPT_BASEREL;
+
+               /*
+                * Replace references to the parent rel in expressions relevant to join
+                * planning.
+                */
+               appinfo = root->append_rel_array[partrel->relid];
+               translated_joinlist = (List *)
+                                               adjust_appendrel_attrs(root, (Node *) joinlist,
+                                                                      1, &appinfo);
+               root->join_info_list = (List *)
+                                               adjust_appendrel_attrs(root,
+                                                                      (Node *) root->join_info_list,
+                                                                      1, &appinfo);
+               /* Reset join planning data structures for a new partition. */
+               root->join_rel_list = NIL;
+               root->join_rel_hash = NULL;
+
+               /* Recurse if the partition is itself a partitioned table. */
+               if (partrel->part_scheme != NULL)
+                       partrel = partitionwise_make_rel_from_joinlist(root, partrel,
+                                                                      translated_joinlist);
+               else
+                       /* Perform the join planning and save the resulting relation. */
+                       parent->part_rels[i] =
+                                               make_rel_from_joinlist(root, translated_joinlist);
+
+               root->join_info_list = saved_join_info_list;
+       }
+
+       return parent;
+}
+
+/*
  * make_rel_from_joinlist
  *       Build access paths using a "joinlist" to guide the join path search.
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc70c4..3f0d80eaa6 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -266,7 +266,9 @@ query_planner(PlannerInfo *root, List *tlist,
 
        /* Check that we got at least one usable path */
        if (!final_rel || !final_rel->cheapest_total_path ||
-               final_rel->cheapest_total_path->param_info != NULL)
+               final_rel->cheapest_total_path->param_info != NULL ||
+               (final_rel->relid == root->parse->resultRelation &&
+                root->parse->commandType == CMD_INSERT))
                elog(ERROR, "failed to construct the join relation");
 
        return final_rel;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 96bf0601a8..076dbd3d62 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -238,6 +238,16 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
                                         List *targetList,
                                         List *groupClause);
 
+static void partitionwise_adjust_scanjoin_target(PlannerInfo *root,
+                                                 RelOptInfo *parent,
+                                                 List **partition_subroots,
+                                                 List **partitioned_rels,
+                                                 List **resultRelations,
+                                                 List **subpaths,
+                                                 List **WCOLists,
+                                                 List **returningLists,
+                                                 List **rowMarks);
+
 
 /*****************************************************************************
  *
@@ -959,7 +969,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
         * needs special processing, else go straight to grouping_planner.
         */
        if (parse->resultRelation &&
-               rt_fetch(parse->resultRelation, parse->rtable)->inh)
+               rt_fetch(parse->resultRelation, parse->rtable)->inh &&
+               rt_fetch(parse->resultRelation, parse->rtable)->relkind !=
+                                RELKIND_PARTITIONED_TABLE)
                inheritance_planner(root);
        else
                grouping_planner(root, false, tuple_fraction);
@@ -1688,6 +1700,14 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
        RelOptInfo *current_rel;
        RelOptInfo *final_rel;
        ListCell   *lc;
+       List       *orig_parse_tlist = list_copy(parse->targetList);
+       List       *partition_subroots = NIL;
+       List       *partitioned_rels = NIL;
+       List       *partition_resultRelations = NIL;
+       List       *partition_subpaths = NIL;
+       List       *partition_WCOLists = NIL;
+       List       *partition_returningLists = NIL;
+       List       *partition_rowMarks = NIL;
 
        /* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
        if (parse->limitCount || parse->limitOffset)
@@ -2018,13 +2038,44 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
                        scanjoin_targets_contain_srfs = NIL;
                }
 
-               /* Apply scan/join target. */
-               scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1
-                       && equal(scanjoin_target->exprs, current_rel->reltarget->exprs);
-               apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
-                                              scanjoin_targets_contain_srfs,
-                                              scanjoin_target_parallel_safe,
-                                              scanjoin_target_same_exprs);
+               /*
+                * For an UPDATE/DELETE query whose target is partitioned table, we
+                * must generate the targetlist for each of its leaf partitions and
+                * apply that.
+                */
+               if (current_rel->reloptkind == RELOPT_BASEREL &&
+                       current_rel->part_scheme &&
+                       current_rel->relid == root->parse->resultRelation &&
+                       parse->commandType != CMD_INSERT)
+               {
+                       /*
+                        * scanjoin_target shouldn't have changed from final_target,
+                        * because UPDATE/DELETE doesn't support various features that
+                        * would've required modifications that are performed above.
+                        * That's important because we'll generate final_target freshly
+                        * for each partition in partitionwise_adjust_scanjoin_target.
+                        */
+                       Assert(scanjoin_target == final_target);
+                       root->parse->targetList = orig_parse_tlist;
+                       partitionwise_adjust_scanjoin_target(root, current_rel,
+                                                            &partition_subroots,
+                                                            &partitioned_rels,
+                                                            &partition_resultRelations,
+                                                            &partition_subpaths,
+                                                            &partition_WCOLists,
+                                                            &partition_returningLists,
+                                                            &partition_rowMarks);
+               }
+               else
+               {
+                       /* Apply scan/join target. */
+                       scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1
+                               && equal(scanjoin_target->exprs, current_rel->reltarget->exprs);
+                       apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
+                                                      scanjoin_targets_contain_srfs,
+                                                      scanjoin_target_parallel_safe,
+                                                      scanjoin_target_same_exprs);
+               }
 
                /*
                 * Save the various upper-rel PathTargets we just computed into
@@ -2136,93 +2187,119 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
        final_rel->useridiscurrent = current_rel->useridiscurrent;
        final_rel->fdwroutine = current_rel->fdwroutine;
 
-       /*
-        * Generate paths for the final_rel.  Insert all surviving paths, with
-        * LockRows, Limit, and/or ModifyTable steps added if needed.
-        */
-       foreach(lc, current_rel->pathlist)
+       if (current_rel->reloptkind == RELOPT_BASEREL &&
+               current_rel->relid == root->parse->resultRelation &&
+               current_rel->part_scheme &&
+               parse->commandType != CMD_INSERT)
        {
-               Path       *path = (Path *) lfirst(lc);
-
-               /*
-                * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
-                * (Note: we intentionally test parse->rowMarks not root->rowMarks
-                * here.  If there are only non-locking rowmarks, they should be
-                * handled by the ModifyTable node instead.  However, root->rowMarks
-                * is what goes into the LockRows node.)
-                */
-               if (parse->rowMarks)
-               {
-                       path = (Path *) create_lockrows_path(root, final_rel, path,
-                                                            root->rowMarks,
-                                                            SS_assign_special_param(root));
-               }
-
-               /*
-                * If there is a LIMIT/OFFSET clause, add the LIMIT node.
-                */
-               if (limit_needed(parse))
-               {
-                       path = (Path *) create_limit_path(root, final_rel, path,
-                                                         parse->limitOffset,
-                                                         parse->limitCount,
-                                                         offset_est, count_est);
-               }
-
-               /*
-                * If this is an INSERT/UPDATE/DELETE, and we're not being called from
-                * inheritance_planner, add the ModifyTable node.
-                */
-               if (parse->commandType != CMD_SELECT && !inheritance_update)
-               {
-                       List       *withCheckOptionLists;
-                       List       *returningLists;
-                       List       *rowMarks;
-
-                       /*
-                        * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if
-                        * needed.
-                        */
-                       if (parse->withCheckOptions)
-                               withCheckOptionLists = list_make1(parse->withCheckOptions);
-                       else
-                               withCheckOptionLists = NIL;
-
-                       if (parse->returningList)
-                               returningLists = list_make1(parse->returningList);
-                       else
-                               returningLists = NIL;
-
-                       /*
-                        * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node
-                        * will have dealt with fetching non-locked marked rows, else we
-                        * need to have ModifyTable do that.
-                        */
-                       if (parse->rowMarks)
-                               rowMarks = NIL;
-                       else
-                               rowMarks = root->rowMarks;
-
-                       path = (Path *)
+               Path *path = (Path *)
                                create_modifytable_path(root, final_rel,
                                                        parse->commandType,
                                                        parse->canSetTag,
                                                        parse->resultRelation,
-                                                       NIL,
-                                                       false,
-                                                       list_make1_int(parse->resultRelation),
-                                                       list_make1(path),
-                                                       list_make1(root),
-                                                       withCheckOptionLists,
-                                                       returningLists,
-                                                       rowMarks,
-                                                       parse->onConflict,
+                                                       partitioned_rels,
+                                                       root->partColsUpdated,
+                                                       partition_resultRelations,
+                                                       partition_subpaths,
+                                                       partition_subroots,
+                                                       partition_WCOLists,
+                                                       partition_returningLists,
+                                                       partition_rowMarks,
+                                                       NULL,
                                                        SS_assign_special_param(root));
-               }
-
-               /* And shove it into final_rel */
                add_path(final_rel, path);
        }
+       else
+       {
+               /*
+                * Generate paths for the final_rel.  Insert all surviving paths, with
+                * LockRows, Limit, and/or ModifyTable steps added if needed.
+                */
+               foreach(lc, current_rel->pathlist)
+               {
+                       Path       *path = (Path *) lfirst(lc);
+
+                       /*
+                        * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows
+                        * node. (Note: we intentionally test parse->rowMarks not
+                        * root->rowMarks here.  If there are only non-locking rowmarks,
+                        * they should be handled by the ModifyTable node instead.
+                        * However, root->rowMarks is what goes into the LockRows node.)
+                        */
+                       if (parse->rowMarks)
+                       {
+                               path = (Path *)
+                                       create_lockrows_path(root, final_rel, path,
+                                                            root->rowMarks,
+                                                            SS_assign_special_param(root));
+                       }
+
+                       /*
+                        * If there is a LIMIT/OFFSET clause, add the LIMIT node.
+                        */
+                       if (limit_needed(parse))
+                       {
+                               path = (Path *) create_limit_path(root, final_rel, path,
+                                                                 parse->limitOffset,
+                                                                 parse->limitCount,
+                                                                 offset_est, count_est);
+                       }
+
+                       /*
+                        * If this is an INSERT/UPDATE/DELETE, and we're not being called
+                        * from inheritance_planner, add the ModifyTable node.
+                        */
+                       if (parse->commandType != CMD_SELECT && !inheritance_update)
+                       {
+                               List       *withCheckOptionLists;
+                               List       *returningLists;
+                               List       *rowMarks;
+
+                               /*
+                                * Set up the WITH CHECK OPTION and RETURNING lists-of-lists,
+                                * if needed.
+                                */
+                               if (parse->withCheckOptions)
+                                       withCheckOptionLists = list_make1(parse->withCheckOptions);
+                               else
+                                       withCheckOptionLists = NIL;
+
+                               if (parse->returningList)
+                                       returningLists = list_make1(parse->returningList);
+                               else
+                                       returningLists = NIL;
+
+                               /*
+                                * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows
+                                * node will have dealt with fetching non-locked marked rows,
+                                * else we need to have ModifyTable do that.
+                                */
+                               if (parse->rowMarks)
+                                       rowMarks = NIL;
+                               else
+                                       rowMarks = root->rowMarks;
+
+                               path = (Path *)
+                                       create_modifytable_path(root, final_rel,
+                                                               parse->commandType,
+                                                               parse->canSetTag,
+                                                               parse->resultRelation,
+                                                                               
        NIL,
+                                                                               
        false,
+                                                                               
        list_make1_int(parse->resultRelation),
+                                                                               
        list_make1(path),
+                                                                               
        list_make1(root),
+                                                                               
        withCheckOptionLists,
+                                                                               
        returningLists,
+                                                                               
        rowMarks,
+                                                                               
        parse->onConflict,
+                                                                               
        SS_assign_special_param(root));
+                       }
+
+                       /* And shove it into final_rel */
+                       add_path(final_rel, path);
+               }
+       }
 
        /*
         * Generate partial paths for final_rel, too, if outer query levels 
might
@@ -2259,6 +2336,129 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 }
 
 /*
+ * partitionwise_adjust_scanjoin_target
+ *             adjusts the query's targetlist for each partition in the
+ *             partition tree whose root is 'parent' and applies it to their
+ *             paths via apply_scanjoin_target_to_paths
+ *
+ * Its output also consists of various pieces of information that will go
+ * into the ModifyTable node that will be created for this query.
+ */
+static void
+partitionwise_adjust_scanjoin_target(PlannerInfo *root,
+                                                                        RelOptInfo *parent,
+                                                                        List **subroots,
+                                                                        List **partitioned_rels,
+                                                                        List **resultRelations,
+                                                                        List **subpaths,
+                                                                        List **WCOLists,
+                                                                        List **returningLists,
+                                                                        List **rowMarks)
+{
+       Query  *parse = root->parse;
+       int             i;
+
+       *partitioned_rels = lappend(*partitioned_rels,
+                                                               
list_make1_int(parent->relid));
+
+       for (i = 0; i < parent->nparts; i++)
+       {
+               RelOptInfo *child_rel = parent->part_rels[i];
+               AppendRelInfo *appinfo;
+               int                     relid;
+               List       *tlist;
+               PathTarget *scanjoin_target;
+               bool            scanjoin_target_parallel_safe;
+               bool            scanjoin_target_same_exprs;
+               PlannerInfo *partition_subroot;
+               Query      *partition_parse;
+
+               /* Ignore pruned partitions. */
+               if (IS_DUMMY_REL(child_rel))
+                       continue;
+
+               /*
+                * Extract the original relid of partition to fetch its 
AppendRelInfo.
+                * We must find it like this, because
+                * partitionwise_make_rel_from_joinlist replaces the original 
rel
+                * with one generated by join planning which may be different.
+                */
+               relid = -1;
+               while ((relid = bms_next_member(child_rel->relids, relid)) > 0)
+                       if (root->append_rel_array[relid] &&
+                               root->append_rel_array[relid]->parent_relid ==
+                               parent->relid)
+                               break;
+
+               appinfo = root->append_rel_array[relid];
+
+               /* Translate Query structure for this partition. */
+               partition_parse = (Query *)
+                                               adjust_appendrel_attrs(root,
+                                                                               
           (Node *) parse,
+                                                                               
           1, &appinfo);
+
+               /* Recurse if partition is itself a partitioned table. */
+               if (child_rel->part_scheme)
+               {
+                       root->parse = partition_parse;
+                       partitionwise_adjust_scanjoin_target(root, child_rel,
+                                                                               
                 subroots,
+                                                                               
                 partitioned_rels,
+                                                                               
                 resultRelations,
+                                                                               
                 subpaths,
+                                                                               
                 WCOLists,
+                                                                               
                 returningLists,
+                                                                               
                 rowMarks);
+                       /* Restore the Query for processing the next partition. 
*/
+                       root->parse = parse;
+               }
+               else
+               {
+                       /*
+                        * Generate a separate PlannerInfo for this partition.  
We'll need
+                        * it when generating the ModifyTable subplan for this 
partition.
+                        */
+                       partition_subroot = makeNode(PlannerInfo);
+                       *subroots = lappend(*subroots, partition_subroot);
+                       memcpy(partition_subroot, root, sizeof(PlannerInfo));
+                       partition_subroot->parse = partition_parse;
+
+                       /*
+                        * Preprocess the translated targetlist and save it in 
the
+                        * partition's PlannerInfo for the perusal of later 
planning
+                        * steps.
+                        */
+                       tlist = preprocess_targetlist(partition_subroot);
+                       partition_subroot->processed_tlist = tlist;
+
+                       /* Apply scan/join target. */
+                       scanjoin_target = create_pathtarget(root, tlist);
+                       scanjoin_target_same_exprs = 
equal(scanjoin_target->exprs,
+                                                                               
           child_rel->reltarget->exprs);
+                       scanjoin_target_parallel_safe =
+                               is_parallel_safe(root, (Node *) 
scanjoin_target->exprs);
+                       apply_scanjoin_target_to_paths(root, child_rel,
+                                                                               
   list_make1(scanjoin_target),
+                                                                               
   NIL,
+                                                                               
   scanjoin_target_parallel_safe,
+                                                                               
   scanjoin_target_same_exprs);
+
+                       /* Collect information that will go into the 
ModifyTable */
+                       *resultRelations = lappend_int(*resultRelations, relid);
+                       *subpaths = lappend(*subpaths, 
child_rel->cheapest_total_path);
+                       if (partition_parse->withCheckOptions)
+                               *WCOLists = lappend(*WCOLists, 
partition_parse->withCheckOptions);
+                       if (partition_parse->returningList)
+                               *returningLists = lappend(*returningLists,
+                                                                               
  partition_parse->returningList);
+                       if (partition_parse->rowMarks)
+                               *rowMarks = lappend(*rowMarks, 
partition_parse->rowMarks);
+               }
+       }
+}
+
+/*
  * Do preprocessing for groupingSets clause and related data.  This handles the
  * preliminary steps of expanding the grouping sets, organizing them into lists
  * of rollups, and preparing annotations which will later be filled in with
@@ -6964,7 +7164,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
                }
 
                /* Build new paths for this relation by appending child paths. 
*/
-               if (live_children != NIL)
+               if (live_children != NIL &&
+                       !(rel->reloptkind == RELOPT_BASEREL &&
+                         rel->relid == root->parse->resultRelation))
                        add_paths_to_append_rel(root, rel, live_children);
        }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 690b6bbab7..f4c485cdc9 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -2265,8 +2265,34 @@ adjust_appendrel_attrs_mutator(Node *node,
                                                                                
          context->appinfos);
                return (Node *) phv;
        }
+
+       if (IsA(node, SpecialJoinInfo))
+       {
+               SpecialJoinInfo *oldinfo = (SpecialJoinInfo *) node;
+               SpecialJoinInfo *newinfo = makeNode(SpecialJoinInfo);
+
+               memcpy(newinfo, oldinfo, sizeof(SpecialJoinInfo));
+               newinfo->min_lefthand = 
adjust_child_relids(oldinfo->min_lefthand,
+                                                                               
                        context->nappinfos,
+                                                                               
                        context->appinfos);
+               newinfo->min_righthand = 
adjust_child_relids(oldinfo->min_righthand,
+                                                                               
                         context->nappinfos,
+                                                                               
                         context->appinfos);
+               newinfo->syn_lefthand = 
adjust_child_relids(oldinfo->syn_lefthand,
+                                                                               
                        context->nappinfos,
+                                                                               
                        context->appinfos);
+               newinfo->syn_righthand = 
adjust_child_relids(oldinfo->syn_righthand,
+                                                                               
                         context->nappinfos,
+                                                                               
                         context->appinfos);
+               newinfo->semi_rhs_exprs =
+                                       (List *) expression_tree_mutator((Node 
*)
+                                                                               
                         oldinfo->semi_rhs_exprs,
+                                                                               
                         adjust_appendrel_attrs_mutator,
+                                                                               
                         (void *) context);
+               return (Node *) newinfo;
+       }
+
        /* Shouldn't need to handle planner auxiliary nodes here */
-       Assert(!IsA(node, SpecialJoinInfo));
        Assert(!IsA(node, AppendRelInfo));
        Assert(!IsA(node, PlaceHolderInfo));
        Assert(!IsA(node, MinMaxAggInfo));
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8369e3ad62..8d67f21f42 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1265,36 +1265,6 @@ get_relation_constraints(PlannerInfo *root,
                }
        }
 
-       /*
-        * Append partition predicates, if any.
-        *
-        * For selects, partition pruning uses the parent table's partition 
bound
-        * descriptor, instead of constraint exclusion which is driven by the
-        * individual partition's partition constraint.
-        */
-       if (enable_partition_pruning && root->parse->commandType != CMD_SELECT)
-       {
-               List       *pcqual = RelationGetPartitionQual(relation);
-
-               if (pcqual)
-               {
-                       /*
-                        * Run the partition quals through const-simplification 
similar to
-                        * check constraints.  We skip canonicalize_qual, 
though, because
-                        * partition quals should be in canonical form already; 
also,
-                        * since the qual is in implicit-AND format, we'd have 
to
-                        * explicitly convert it to explicit-AND format and 
back again.
-                        */
-                       pcqual = (List *) eval_const_expressions(root, (Node *) 
pcqual);
-
-                       /* Fix Vars to have the desired varno */
-                       if (varno != 1)
-                               ChangeVarNodes((Node *) pcqual, 1, varno, 0);
-
-                       result = list_concat(result, pcqual);
-               }
-       }
-
        heap_close(relation, NoLock);
 
        return result;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 7d04d12c6e..9074182512 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1752,7 +1752,7 @@ WHERE EXISTS (
                Filter: (c IS NULL)
          ->  Nested Loop
                ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss_1
+               ->  Subquery Scan on ss
                      ->  Limit
                            ->  Seq Scan on int8_tbl int8_tbl_1
    ->  Nested Loop Semi Join
@@ -1760,7 +1760,7 @@ WHERE EXISTS (
                Filter: (c IS NULL)
          ->  Nested Loop
                ->  Seq Scan on int4_tbl
-               ->  Subquery Scan on ss_2
+               ->  Subquery Scan on ss
                      ->  Limit
                            ->  Seq Scan on int8_tbl int8_tbl_2
 (28 rows)
-- 
2.11.0

From bed30ca4b5ddd258a7593d24aeffd7db2a6e70c9 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 16 May 2018 14:35:40 +0900
Subject: [PATCH 2/3] Lazy creation of partition objects for planning

With the current approach, *all* partitions are opened and range
table entries are created for them in the planner's prep phase, which
is much sooner than when partition pruning is performed.  This means
that query_planner ends up spending cycles and memory creating
RelOptInfos and AppendRelInfos for many partitions that potentially
won't be included in the plan.

To avoid that, add partition range table entries and other planning
data structures for only partitions that remain after applying
partition pruning.

Some code, such as that of partitionwise join, relies on the fact
that even though partitions may have been pruned, they would still
have a RelOptInfo, albeit marked dummy, to handle the outer join case
where the pruned partition appears on the nullable side of the join.
So this commit also teaches the partitionwise join code to allocate
dummy RelOptInfos for pruned partitions.

There are a couple of regression test diffs caused by the fact that
we no longer allocate a duplicate RT entry for a partitioned table
in its role as child and also that the individual partition RT
entries are now created in the order in which their parents are
processed, whereas previously they'd be added to the range table
in the order of depth-first expansion of the tree.
---
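Note (illustrative only, not part of the patch): the lazy creation
described above can be pictured with the standalone sketch below.  It
grows a pointer array by the number of surviving partitions, zeroes the
newly added tail, fills in only the surviving entries, and leaves pruned
slots NULL so that consumers have to skip them.  PartRel, PlannerArrays,
add_live_partitions and count_live_partitions are hypothetical names
made up for this note, and plain realloc/memset stand in for
repalloc/MemSet; the real code is add_rel_partitions_to_query in the
diff below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for RelOptInfo; not the real PostgreSQL struct. */
typedef struct PartRel
{
    int         relid;          /* slot of this entry in simple_rel_array */
    int         part_index;     /* position in the parent's partition list */
} PartRel;

/* Hypothetical stand-in for the PlannerInfo arrays that get expanded. */
typedef struct PlannerArrays
{
    PartRel   **simple_rel_array;
    int         simple_rel_array_size;
} PlannerArrays;

/*
 * Grow the array by 'nlive' slots, zero the new tail, and create entries
 * only for the partition indexes listed in 'live'.  Slots of part_rels that
 * are not listed stay NULL, which is how a pruned partition is represented.
 * Returns 0 on success, -1 on allocation failure (a sketch: no cleanup).
 */
int
add_live_partitions(PlannerArrays *pa, PartRel **part_rels,
                    const int *live, int nlive)
{
    int         old_size = pa->simple_rel_array_size;
    int         new_size = old_size + nlive;
    PartRel   **grown;
    int         i;

    if (nlive == 0)
        return 0;               /* everything was pruned; nothing to add */

    grown = realloc(pa->simple_rel_array, sizeof(PartRel *) * new_size);
    if (grown == NULL)
        return -1;

    /* Set the contents of the just-allocated tail to NULL. */
    memset(grown + old_size, 0, sizeof(PartRel *) * nlive);
    pa->simple_rel_array = grown;
    pa->simple_rel_array_size = new_size;

    for (i = 0; i < nlive; i++)
    {
        PartRel    *rel = malloc(sizeof(PartRel));

        if (rel == NULL)
            return -1;
        rel->relid = old_size + i;
        rel->part_index = live[i];
        grown[rel->relid] = rel;
        part_rels[live[i]] = rel;
    }
    return 0;
}

/* Consumers must tolerate NULL slots left behind by pruning. */
int
count_live_partitions(PartRel **part_rels, int nparts)
{
    int         i;
    int         n = 0;

    for (i = 0; i < nparts; i++)
    {
        if (part_rels[i] == NULL)
            continue;           /* pruned: never built */
        n++;
    }
    return n;
}
```

With four partitions of which only indexes 1 and 3 survive pruning, the
array grows by two slots and part_rels[0] and part_rels[2] remain NULL,
which is why several loops in this patch gain a NULL check.
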
 src/backend/optimizer/path/allpaths.c             |  60 +++--
 src/backend/optimizer/path/joinrels.c             |   5 +
 src/backend/optimizer/plan/initsplan.c            |  60 +++++
 src/backend/optimizer/plan/planmain.c             |  30 ---
 src/backend/optimizer/plan/planner.c              |   8 +-
 src/backend/optimizer/prep/prepunion.c            | 314 +++++++++-------------
 src/backend/optimizer/util/plancat.c              |  12 +-
 src/backend/optimizer/util/relnode.c              | 169 ++++++++++--
 src/backend/partitioning/partprune.c              | 100 ++++---
 src/include/nodes/relation.h                      |   4 +
 src/include/optimizer/pathnode.h                  |   6 +
 src/include/optimizer/plancat.h                   |   2 +-
 src/include/optimizer/planmain.h                  |   3 +
 src/include/optimizer/prep.h                      |  10 +
 src/include/partitioning/partprune.h              |   2 +-
 src/test/regress/expected/join.out                |  22 +-
 src/test/regress/expected/partition_aggregate.out |   4 +-
 17 files changed, 486 insertions(+), 325 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5937c0436a..d6d1e26209 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -151,6 +151,7 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 {
        RelOptInfo *rel;
        Index           rti;
+       double          total_pages;
 
        /*
         * Construct the all_baserels Relids set.
@@ -181,6 +182,35 @@ make_one_rel(PlannerInfo *root, List *joinlist)
         * then generate access paths.
         */
        set_base_rel_sizes(root);
+
+       /*
+        * We should now have size estimates for every actual table involved in
+        * the query, and we also know which if any have been deleted from the
+        * query by join removal; so we can compute total_table_pages.
+        *
+        * Note that appendrels are not double-counted here, even though we 
don't
+        * bother to distinguish RelOptInfos for appendrel parents, because the
+        * parents will still have size zero.
+        *
+        * XXX if a table is self-joined, we will count it once per appearance,
+        * which perhaps is the wrong thing ... but that's not completely clear,
+        * and detecting self-joins here is difficult, so ignore it for now.
+        */
+       total_pages = 0;
+       for (rti = 1; rti < root->simple_rel_array_size; rti++)
+       {
+               RelOptInfo *brel = root->simple_rel_array[rti];
+
+               if (brel == NULL)
+                       continue;
+
+               Assert(brel->relid == rti); /* sanity check on array */
+
+               if (IS_SIMPLE_REL(brel))
+                       total_pages += (double) brel->pages;
+       }
+       root->total_table_pages = total_pages;
+
        set_base_rel_pathlists(root);
 
        /*
@@ -896,8 +926,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
        double     *parent_attrsizes;
        int                     nattrs;
        ListCell   *l;
-       Relids          live_children = NULL;
-       bool            did_pruning = false;
 
        /* Guard against stack overflow due to overly deep inheritance tree. */
        check_stack_depth();
@@ -913,21 +941,14 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
         * partitioned table's list will contain all such indexes.
         */
        if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+       {
                rel->partitioned_child_rels = list_make1_int(rti);
 
-       /*
-        * If the partitioned relation has any baserestrictinfo quals then we
-        * attempt to use these quals to prune away partitions that cannot
-        * possibly contain any tuples matching these quals.  In this case we'll
-        * store the relids of all partitions which could possibly contain a
-        * matching tuple, and skip anything else in the loop below.
-        */
-       if (enable_partition_pruning &&
-               rte->relkind == RELKIND_PARTITIONED_TABLE &&
-               rel->baserestrictinfo != NIL)
-       {
-               live_children = prune_append_rel_partitions(rel);
-               did_pruning = true;
+               /*
+                * And do pruning.  Note that this adds AppendRelInfos for only
+                * the partitions that are not pruned.
+                */
+               prune_append_rel_partitions(root, rel);
        }
 
        /*
@@ -1178,13 +1199,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
                        continue;
                }
 
-               if (did_pruning && !bms_is_member(appinfo->child_relid, 
live_children))
-               {
-                       /* This partition was pruned; skip it. */
-                       set_dummy_rel_pathlist(childrel);
-                       continue;
-               }
-
                if (relation_excluded_by_constraints(root, childrel, childRTE))
                {
                        /*
@@ -2637,7 +2651,7 @@ partitionwise_make_rel_from_joinlist(PlannerInfo *root,
                List       *saved_join_info_list = 
list_copy(root->join_info_list);
 
                /* Ignore pruned partitions. */
-               if (IS_DUMMY_REL(partrel))
+               if (partrel == NULL || IS_DUMMY_REL(partrel))
                        continue;
 
                /*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7008e1318e..af9c4ac8fd 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1369,6 +1369,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
                AppendRelInfo **appinfos;
                int                     nappinfos;
 
+               if (child_rel1 == NULL)
+                       child_rel1 = build_dummy_partition_rel(root, rel1, 
cnt_parts);
+               if (child_rel2 == NULL)
+                       child_rel2 = build_dummy_partition_rel(root, rel2, 
cnt_parts);
+
                /* We should never try to join two overlapping sets of rels. */
                Assert(!bms_overlap(child_rel1->relids, child_rel2->relids));
                child_joinrelids = bms_union(child_rel1->relids, 
child_rel2->relids);
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01335db511..d85f782d50 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -132,6 +132,66 @@ add_base_rels_to_query(PlannerInfo *root, Node *jtnode)
                         (int) nodeTag(jtnode));
 }
 
+/*
+ * add_rel_partitions_to_query
+ *             create range table entries and "otherrel" RelOptInfos for the
+ *             partitions of 'rel' specified by the caller
+ *
+ * To store the objects thus created, various arrays in 'root' are expanded
+ * by repalloc'ing them.
+ */
+void
+add_rel_partitions_to_query(PlannerInfo *root, RelOptInfo *rel,
+                                                       bool scan_all_parts,
+                                                       Bitmapset *partindexes)
+{
+       int             new_size;
+       int             num_added_parts;
+       int             i;
+
+       Assert(partindexes != NULL || scan_all_parts);
+
+       /* Expand the PlannerInfo arrays to hold new partition objects. */
+       num_added_parts = scan_all_parts ? rel->nparts :
+                                               bms_num_members(partindexes);
+       new_size = root->simple_rel_array_size + num_added_parts;
+       root->simple_rte_array = (RangeTblEntry **)
+                                                       repalloc(root->simple_rte_array,
+                                                                        sizeof(RangeTblEntry *) * new_size);
+       root->simple_rel_array = (RelOptInfo **)
+                                                       repalloc(root->simple_rel_array,
+                                                                        sizeof(RelOptInfo *) * new_size);
+       if (root->append_rel_array)
+               root->append_rel_array = (AppendRelInfo **)
+                                                       repalloc(root->append_rel_array,
+                                                                        sizeof(AppendRelInfo *) * new_size);
+       else
+               root->append_rel_array = (AppendRelInfo **)
+                                                       palloc0(sizeof(AppendRelInfo *) *
+                                                                       new_size);
+
+       /* Set the contents of just allocated memory to 0. */
+       MemSet(root->simple_rte_array + root->simple_rel_array_size,
+                  0, sizeof(RangeTblEntry *) * num_added_parts);
+       MemSet(root->simple_rel_array + root->simple_rel_array_size,
+                  0, sizeof(RelOptInfo *) * num_added_parts);
+       MemSet(root->append_rel_array + root->simple_rel_array_size,
+                  0, sizeof(AppendRelInfo *) * num_added_parts);
+       root->simple_rel_array_size = new_size;
+
+       /* And add the partitions. */
+       if (scan_all_parts)
+               for (i = 0; i < rel->nparts; i++)
+                       rel->part_rels[i] = build_partition_rel(root, rel,
+                                                                               
                        rel->part_oids[i]);
+       else
+       {
+               i = -1;
+               while ((i = bms_next_member(partindexes, i)) >= 0)
+                       rel->part_rels[i] = build_partition_rel(root, rel,
+                                                                               
                        rel->part_oids[i]);
+       }
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3f0d80eaa6..1bd3f0e350 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -57,8 +57,6 @@ query_planner(PlannerInfo *root, List *tlist,
        Query      *parse = root->parse;
        List       *joinlist;
        RelOptInfo *final_rel;
-       Index           rti;
-       double          total_pages;
 
        /*
         * If the query has an empty join tree, then it's something easy like
@@ -232,34 +230,6 @@ query_planner(PlannerInfo *root, List *tlist,
        extract_restriction_or_clauses(root);
 
        /*
-        * We should now have size estimates for every actual table involved in
-        * the query, and we also know which if any have been deleted from the
-        * query by join removal; so we can compute total_table_pages.
-        *
-        * Note that appendrels are not double-counted here, even though we 
don't
-        * bother to distinguish RelOptInfos for appendrel parents, because the
-        * parents will still have size zero.
-        *
-        * XXX if a table is self-joined, we will count it once per appearance,
-        * which perhaps is the wrong thing ... but that's not completely clear,
-        * and detecting self-joins here is difficult, so ignore it for now.
-        */
-       total_pages = 0;
-       for (rti = 1; rti < root->simple_rel_array_size; rti++)
-       {
-               RelOptInfo *brel = root->simple_rel_array[rti];
-
-               if (brel == NULL)
-                       continue;
-
-               Assert(brel->relid == rti); /* sanity check on array */
-
-               if (IS_SIMPLE_REL(brel))
-                       total_pages += (double) brel->pages;
-       }
-       root->total_table_pages = total_pages;
-
-       /*
         * Ready to do the primary planning.
         */
        final_rel = make_one_rel(root, joinlist);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 076dbd3d62..88db46a6e5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2374,7 +2374,7 @@ partitionwise_adjust_scanjoin_target(PlannerInfo *root,
                Query      *partition_parse;
 
                /* Ignore pruned partitions. */
-               if (IS_DUMMY_REL(child_rel))
+               if (child_rel == NULL || IS_DUMMY_REL(child_rel))
                        continue;
 
                /*
@@ -7134,6 +7134,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
                        int                     nappinfos;
                        List       *child_scanjoin_targets = NIL;
 
+                       if (child_rel == NULL)
+                               continue;
+
                        /* Translate scan/join targets for this child. */
                        appinfos = find_appinfos_by_relids(root, 
child_rel->relids,
                                                                                
           &nappinfos);
@@ -7237,6 +7240,9 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
                RelOptInfo *child_grouped_rel;
                RelOptInfo *child_partially_grouped_rel;
 
+               if (child_input_rel == NULL)
+                       continue;
+
                /* Input child rel must have a path */
                Assert(child_input_rel->pathlist != NIL);
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index f4c485cdc9..279f686fb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -49,6 +49,8 @@
 #include "parser/parse_coerce.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
+#include "utils/lsyscache.h"
+#include "utils/partcache.h"
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
 #include "utils/syscache.h"
@@ -101,21 +103,10 @@ static List *generate_append_tlist(List *colTypes, List 
*colCollations,
 static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
 static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
                                                 Index rti);
-static void expand_partitioned_rtentry(PlannerInfo *root,
-                                                  RangeTblEntry *parentrte,
-                                                  Index parentRTindex, 
Relation parentrel,
-                                                  PlanRowMark *top_parentrc, 
LOCKMODE lockmode,
-                                                  List **appinfos);
-static void expand_single_inheritance_child(PlannerInfo *root,
-                                                               RangeTblEntry 
*parentrte,
-                                                               Index 
parentRTindex, Relation parentrel,
-                                                               PlanRowMark 
*top_parentrc, Relation childrel,
-                                                               List 
**appinfos, RangeTblEntry **childrte_p,
-                                                               Index 
*childRTindex_p);
-static void make_inh_translation_list(Relation oldrelation,
-                                                 Relation newrelation,
-                                                 Index newvarno,
-                                                 List **translated_vars);
+static void make_inh_translation_list(TupleDesc old_tupdesc,
+                                                 TupleDesc new_tupdesc,
+                                                 RangeTblEntry *oldrte, 
RangeTblEntry *newrte,
+                                                 Index newvarno, List 
**translated_vars);
 static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
                                        List *translated_vars);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
@@ -1522,6 +1513,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry 
*rte, Index rti)
        LOCKMODE        lockmode;
        List       *inhOIDs;
        ListCell   *l;
+       List       *appinfos = NIL;
 
        /* Does RT entry allow inheritance? */
        if (!rte->inh)
@@ -1585,173 +1577,58 @@ expand_inherited_rtentry(PlannerInfo *root, 
RangeTblEntry *rte, Index rti)
        if (oldrc)
                oldrc->isParent = true;
 
+       /* Partitioned tables are expanded elsewhere. */
+       if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+       {
+               list_free(inhOIDs);
+               return;
+       }
+
        /*
         * Must open the parent relation to examine its tupdesc.  We need not 
lock
         * it; we assume the rewriter already did.
         */
        oldrelation = heap_open(parentOID, NoLock);
 
-       /* Scan the inheritance set and expand it */
-       if (RelationGetPartitionDesc(oldrelation) != NULL)
+       foreach(l, inhOIDs)
        {
-               Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
+               Oid                     childOID = lfirst_oid(l);
+               Index           childRTindex = 0;
+               RangeTblEntry *childrte = NULL;
+               AppendRelInfo *appinfo = NULL;
 
-               /*
-                * If this table has partitions, recursively expand them in the 
order
-                * in which they appear in the PartitionDesc.  While at it, also
-                * extract the partition key columns of all the partitioned 
tables.
-                */
-               expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
-                                                                  lockmode, 
&root->append_rel_list);
+               add_inheritance_child_to_query(root, rte, rti,
+                                                                          
oldrelation->rd_rel->reltype,
+                                                                          
RelationGetDescr(oldrelation),
+                                                                          
oldrc, childOID, NoLock,
+                                                                          
&appinfo, &childrte,
+                                                                          
&childRTindex);
+               Assert(childRTindex > 1);
+               Assert(childrte != NULL);
+               Assert(appinfo != NULL);
+               appinfos = lappend(appinfos, appinfo);
        }
+
+       /*
+        * If all the children were temp tables, pretend it's a
+        * non-inheritance situation; we don't need Append node in that case.
+        * The duplicate RTE we added for the parent table is harmless, so we
+        * don't bother to get rid of it; ditto for the useless PlanRowMark
+        * node.
+        */
+       if (list_length(appinfos) < 2)
+               rte->inh = false;
        else
-       {
-               List       *appinfos = NIL;
-               RangeTblEntry *childrte;
-               Index           childRTindex;
-
-               /*
-                * This table has no partitions.  Expand any plain inheritance
-                * children in the order the OIDs were returned by
-                * find_all_inheritors.
-                */
-               foreach(l, inhOIDs)
-               {
-                       Oid                     childOID = lfirst_oid(l);
-                       Relation        newrelation;
-
-                       /* Open rel if needed; we already have required locks */
-                       if (childOID != parentOID)
-                               newrelation = heap_open(childOID, NoLock);
-                       else
-                               newrelation = oldrelation;
-
-                       /*
-                        * It is possible that the parent table has children 
that are temp
-                        * tables of other backends.  We cannot safely access 
such tables
-                        * (because of buffering issues), and the best thing to 
do seems
-                        * to be to silently ignore them.
-                        */
-                       if (childOID != parentOID && 
RELATION_IS_OTHER_TEMP(newrelation))
-                       {
-                               heap_close(newrelation, lockmode);
-                               continue;
-                       }
-
-                       expand_single_inheritance_child(root, rte, rti, 
oldrelation, oldrc,
-                                                                               
        newrelation,
-                                                                               
        &appinfos, &childrte,
-                                                                               
        &childRTindex);
-
-                       /* Close child relations, but keep locks */
-                       if (childOID != parentOID)
-                               heap_close(newrelation, NoLock);
-               }
-
-               /*
-                * If all the children were temp tables, pretend it's a
-                * non-inheritance situation; we don't need Append node in that 
case.
-                * The duplicate RTE we added for the parent table is harmless, 
so we
-                * don't bother to get rid of it; ditto for the useless 
PlanRowMark
-                * node.
-                */
-               if (list_length(appinfos) < 2)
-                       rte->inh = false;
-               else
-                       root->append_rel_list = 
list_concat(root->append_rel_list,
-                                                                               
                appinfos);
-
-       }
+               root->append_rel_list = list_concat(root->append_rel_list,
+                                                                               
        appinfos);
 
        heap_close(oldrelation, NoLock);
 }
 
 /*
- * expand_partitioned_rtentry
- *             Recursively expand an RTE for a partitioned table.
- *
- * Note that RelationGetPartitionDispatchInfo will expand partitions in the
- * same order as this code.
- */
-static void
-expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
-                                                  Index parentRTindex, 
Relation parentrel,
-                                                  PlanRowMark *top_parentrc, 
LOCKMODE lockmode,
-                                                  List **appinfos)
-{
-       int                     i;
-       RangeTblEntry *childrte;
-       Index           childRTindex;
-       PartitionDesc partdesc = RelationGetPartitionDesc(parentrel);
-
-       check_stack_depth();
-
-       /* A partitioned table should always have a partition descriptor. */
-       Assert(partdesc);
-
-       Assert(parentrte->inh);
-
-       /*
-        * Note down whether any partition key cols are being updated. Though 
it's
-        * the root partitioned table's updatedCols we are interested in, we
-        * instead use parentrte to get the updatedCols. This is convenient
-        * because parentrte already has the root partrel's updatedCols 
translated
-        * to match the attribute ordering of parentrel.
-        */
-       if (!root->partColsUpdated)
-               root->partColsUpdated =
-                       has_partition_attrs(parentrel, parentrte->updatedCols, 
NULL);
-
-       /* First expand the partitioned table itself. */
-       expand_single_inheritance_child(root, parentrte, parentRTindex, 
parentrel,
-                                                                       
top_parentrc, parentrel,
-                                                                       
appinfos, &childrte, &childRTindex);
-
-       /*
-        * If the partitioned table has no partitions, treat this as the
-        * non-inheritance case.
-        */
-       if (partdesc->nparts == 0)
-       {
-               parentrte->inh = false;
-               return;
-       }
-
-       for (i = 0; i < partdesc->nparts; i++)
-       {
-               Oid                     childOID = partdesc->oids[i];
-               Relation        childrel;
-
-               /* Open rel; we already have required locks */
-               childrel = heap_open(childOID, NoLock);
-
-               /*
-                * Temporary partitions belonging to other sessions should have 
been
-                * disallowed at definition, but for paranoia's sake, let's 
double
-                * check.
-                */
-               if (RELATION_IS_OTHER_TEMP(childrel))
-                       elog(ERROR, "temporary relation from another session 
found as partition");
-
-               expand_single_inheritance_child(root, parentrte, parentRTindex,
-                                                                               
parentrel, top_parentrc, childrel,
-                                                                               
appinfos, &childrte, &childRTindex);
-
-               /* If this child is itself partitioned, recurse */
-               if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-                       expand_partitioned_rtentry(root, childrte, childRTindex,
-                                                                          
childrel, top_parentrc, lockmode,
-                                                                          
appinfos);
-
-               /* Close child relation, but keep locks */
-               heap_close(childrel, NoLock);
-       }
-}
-
-/*
- * expand_single_inheritance_child
+ * add_inheritance_child_to_query
  *             Build a RangeTblEntry and an AppendRelInfo, if appropriate, plus
- *             maybe a PlanRowMark.
+ *             maybe a PlanRowMark for a child relation.
  *
  * We now expand the partition hierarchy level by level, creating a
  * corresponding hierarchy of AppendRelInfos and RelOptInfos, where each
@@ -1769,19 +1646,70 @@ expand_partitioned_rtentry(PlannerInfo *root, 
RangeTblEntry *parentrte,
  * The child RangeTblEntry and its RTI are returned in "childrte_p" and
  * "childRTindex_p" resp.
  */
-static void
-expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
-                                                               Index 
parentRTindex, Relation parentrel,
-                                                               PlanRowMark 
*top_parentrc, Relation childrel,
-                                                               List 
**appinfos, RangeTblEntry **childrte_p,
-                                                               Index 
*childRTindex_p)
+void
+add_inheritance_child_to_query(PlannerInfo *root, RangeTblEntry *parentrte,
+                                                          Index parentRTindex, 
Oid parentRelType,
+                                                          TupleDesc parentDesc,
+                                                          PlanRowMark 
*top_parentrc,
+                                                          Oid childOID, int 
lockmode,
+                                                          AppendRelInfo 
**appinfo_p,
+                                                          RangeTblEntry 
**childrte_p,
+                                                          Index 
*childRTindex_p)
 {
        Query      *parse = root->parse;
-       Oid                     parentOID = RelationGetRelid(parentrel);
-       Oid                     childOID = RelationGetRelid(childrel);
+       Oid                     parentOID = parentrte->relid;
        RangeTblEntry *childrte;
        Index           childRTindex;
        AppendRelInfo *appinfo;
+       Relation        childrel = NULL;
+       char            child_relkind;
+       Oid                     child_reltype;
+       TupleDesc       childDesc;
+
+       *appinfo_p = NULL;
+       *childrte_p = NULL;
+       *childRTindex_p = 0;
+
+       /* Open rel if needed; we already have required locks */
+       if (childOID != parentOID)
+       {
+               childrel = heap_open(childOID, lockmode);
+
+               /*
+                * Temporary partitions belonging to other sessions should have 
been
+                * disallowed at definition, but for paranoia's sake, let's 
double
+                * check.
+                */
+               if (RELATION_IS_OTHER_TEMP(childrel))
+               {
+                       if (childrel->rd_rel->relispartition)
+                               elog(ERROR, "temporary relation from another 
session found as partition");
+                       heap_close(childrel, lockmode);
+                       return;
+               }
+
+               child_relkind = childrel->rd_rel->relkind;
+
+               /*
+                * No point in adding to the query a partitioned table that has 
no
+                * partitions.
+                */
+               if (child_relkind == RELKIND_PARTITIONED_TABLE &&
+                       RelationGetPartitionDesc(childrel)->nparts == 0)
+               {
+                       heap_close(childrel, lockmode);
+                       return;
+               }
+
+               child_reltype = childrel->rd_rel->reltype;
+               childDesc = RelationGetDescr(childrel);
+       }
+       else
+       {
+               child_relkind = parentrte->relkind;
+               child_reltype = parentRelType;
+               childDesc = parentDesc;
+       }
 
        /*
         * Build an RTE for the child, and attach to query's rangetable list. We
@@ -1798,7 +1726,7 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte,
        childrte = copyObject(parentrte);
        *childrte_p = childrte;
        childrte->relid = childOID;
-       childrte->relkind = childrel->rd_rel->relkind;
+       childrte->relkind = child_relkind;
        /* A partitioned child will need to be expanded further. */
        if (childOID != parentOID &&
                childrte->relkind == RELKIND_PARTITIONED_TABLE)
@@ -1823,12 +1751,13 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte,
                appinfo = makeNode(AppendRelInfo);
                appinfo->parent_relid = parentRTindex;
                appinfo->child_relid = childRTindex;
-               appinfo->parent_reltype = parentrel->rd_rel->reltype;
-               appinfo->child_reltype = childrel->rd_rel->reltype;
-               make_inh_translation_list(parentrel, childrel, childRTindex,
+               appinfo->parent_reltype = parentRelType;
+               appinfo->child_reltype = child_reltype;
+               make_inh_translation_list(parentDesc, childDesc,
+                                                                 parentrte, 
childrte, childRTindex,
                                                                  
&appinfo->translated_vars);
                appinfo->parent_reloid = parentOID;
-               *appinfos = lappend(*appinfos, appinfo);
+               *appinfo_p = appinfo;
 
                /*
                 * Translate the column permissions bitmaps to the child's 
attnums (we
@@ -1879,6 +1808,13 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte,
 
                root->rowMarks = lappend(root->rowMarks, childrc);
        }
+
+       /* Close child relations, but keep locks */
+       if (childOID != parentOID)
+       {
+               Assert(childrel != NULL);
+               heap_close(childrel, NoLock);
+       }
 }
 
 /*
@@ -1889,14 +1825,12 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte,
  * For paranoia's sake, we match type/collation as well as attribute name.
  */
 static void
-make_inh_translation_list(Relation oldrelation, Relation newrelation,
-                                                 Index newvarno,
-                                                 List **translated_vars)
+make_inh_translation_list(TupleDesc old_tupdesc, TupleDesc new_tupdesc,
+                                                 RangeTblEntry *oldrte, 
RangeTblEntry *newrte,
+                                                 Index newvarno, List 
**translated_vars)
 {
        List       *vars = NIL;
-       TupleDesc       old_tupdesc = RelationGetDescr(oldrelation);
-       TupleDesc       new_tupdesc = RelationGetDescr(newrelation);
-       Oid                     new_relid = RelationGetRelid(newrelation);
+       Oid                     new_relid = newrte->relid;
        int                     oldnatts = old_tupdesc->natts;
        int                     newnatts = new_tupdesc->natts;
        int                     old_attno;
@@ -1926,7 +1860,7 @@ make_inh_translation_list(Relation oldrelation, Relation 
newrelation,
                 * When we are generating the "translation list" for the parent 
table
                 * of an inheritance set, no need to search for matches.
                 */
-               if (oldrelation == newrelation)
+               if (oldrte->relid == newrte->relid)
                {
                        vars = lappend(vars, makeVar(newvarno,
                                                                                
 (AttrNumber) (old_attno + 1),
@@ -1955,7 +1889,7 @@ make_inh_translation_list(Relation oldrelation, Relation 
newrelation,
                        newtup = SearchSysCacheAttName(new_relid, attname);
                        if (!newtup)
                                elog(ERROR, "could not find inherited attribute 
\"%s\" of relation \"%s\"",
-                                        attname, 
RelationGetRelationName(newrelation));
+                                        attname, get_rel_name(newrte->relid));
                        new_attno = ((Form_pg_attribute) 
GETSTRUCT(newtup))->attnum - 1;
                        ReleaseSysCache(newtup);
 
@@ -1965,10 +1899,10 @@ make_inh_translation_list(Relation oldrelation, 
Relation newrelation,
                /* Found it, check type and collation match */
                if (atttypid != att->atttypid || atttypmod != att->atttypmod)
                        elog(ERROR, "attribute \"%s\" of relation \"%s\" does 
not match parent's type",
-                                attname, RelationGetRelationName(newrelation));
+                                attname, get_rel_name(newrte->relid));
                if (attcollation != att->attcollation)
                        elog(ERROR, "attribute \"%s\" of relation \"%s\" does 
not match parent's collation",
-                                attname, RelationGetRelationName(newrelation));
+                                attname, get_rel_name(newrte->relid));
 
                vars = lappend(vars, makeVar(newvarno,
                                                                         
(AttrNumber) (new_attno + 1),
@@ -2121,7 +2055,7 @@ adjust_appendrel_attrs_mutator(Node *node,
                        }
                }
 
-               if (var->varlevelsup == 0 && appinfo)
+               if (var->varlevelsup == 0 && appinfo && 
appinfo->translated_vars)
                {
                        var->varno = appinfo->child_relid;
                        var->varnoold = appinfo->child_relid;
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index 8d67f21f42..100dfd8e0c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -106,7 +106,7 @@ static void set_baserel_partition_key_exprs(Relation 
relation,
  */
 void
 get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
-                                 RelOptInfo *rel)
+                                 Bitmapset *updatedCols, RelOptInfo *rel)
 {
        Index           varno = rel->relid;
        Relation        relation;
@@ -449,7 +449,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, 
bool inhparent,
         * inheritance parents may be partitioned.
         */
        if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       {
                set_relation_partition_info(root, rel, relation);
+               if (!root->partColsUpdated)
+                       root->partColsUpdated =
+                               has_partition_attrs(relation, updatedCols, 
NULL);
+       }
+
+       rel->tupdesc = RelationGetDescr(relation);
+       rel->reltype = RelationGetForm(relation)->reltype;
 
        heap_close(relation, NoLock);
 
@@ -1883,6 +1891,8 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo 
*rel,
        rel->nparts = partdesc->nparts;
        set_baserel_partition_key_exprs(relation, rel);
        rel->partition_qual = RelationGetPartitionQual(relation);
+       rel->part_oids = (Oid *) palloc(rel->nparts * sizeof(Oid));
+       memcpy(rel->part_oids, partdesc->oids, rel->nparts * sizeof(Oid));
 }
 
 /*
diff --git a/src/backend/optimizer/util/relnode.c 
b/src/backend/optimizer/util/relnode.c
index c69740eda6..b267f07c18 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
 
 #include <limits.h>
 
+#include "catalog/pg_class.h"
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
@@ -27,6 +28,7 @@
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
 #include "partitioning/partbounds.h"
+#include "storage/lockdefs.h"
 #include "utils/hsearch.h"
 
 
@@ -137,6 +139,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo 
*parent)
 
        /* Rel should not exist already */
        Assert(relid > 0 && relid < root->simple_rel_array_size);
+
        if (root->simple_rel_array[relid] != NULL)
                elog(ERROR, "rel %d already exists", relid);
 
@@ -218,7 +221,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo 
*parent)
        {
                case RTE_RELATION:
                        /* Table --- retrieve statistics from the system 
catalogs */
-                       get_relation_info(root, rte->relid, rte->inh, rel);
+                       get_relation_info(root, rte->relid, rte->inh, 
rte->updatedCols,
+                                                         rel);
                        break;
                case RTE_SUBQUERY:
                case RTE_FUNCTION:
@@ -268,41 +272,30 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo 
*parent)
        if (rte->inh)
        {
                ListCell   *l;
-               int                     nparts = rel->nparts;
-               int                     cnt_parts = 0;
 
-               if (nparts > 0)
+               /*
+                * For partitioned tables, we just allocate space for 
RelOptInfo
+                * pointers for all partitions and copy the partition OIDs from 
the
+                * relcache.  Actual RelOptInfo is built for a partition only 
if it is
+                * not pruned.
+                */
+               if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+               {
                        rel->part_rels = (RelOptInfo **)
-                               palloc(sizeof(RelOptInfo *) * nparts);
+                               palloc0(sizeof(RelOptInfo *) * rel->nparts);
+                       return rel;
+               }
 
                foreach(l, root->append_rel_list)
                {
                        AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
-                       RelOptInfo *childrel;
 
                        /* append_rel_list contains all append rels; ignore 
others */
                        if (appinfo->parent_relid != relid)
                                continue;
 
-                       childrel = build_simple_rel(root, appinfo->child_relid,
-                                                                               
rel);
-
-                       /* Nothing more to do for an unpartitioned table. */
-                       if (!rel->part_scheme)
-                               continue;
-
-                       /*
-                        * The order of partition OIDs in append_rel_list is 
the same as
-                        * the order in the PartitionDesc, so the order of 
part_rels will
-                        * also match the PartitionDesc.  See 
expand_partitioned_rtentry.
-                        */
-                       Assert(cnt_parts < nparts);
-                       rel->part_rels[cnt_parts] = childrel;
-                       cnt_parts++;
+                       (void) build_simple_rel(root, appinfo->child_relid, 
rel);
                }
-
-               /* We should have seen all the child partitions. */
-               Assert(cnt_parts == nparts);
        }
 
        return rel;
@@ -1768,3 +1761,131 @@ build_joinrel_partition_info(RelOptInfo *joinrel, 
RelOptInfo *outer_rel,
                joinrel->nullable_partexprs[cnt] = nullable_partexpr;
        }
 }
+
+/*
+ * build_dummy_partition_rel
+ *             Build a RelOptInfo and AppendRelInfo for a pruned partition
+ *
+ * This does not result in opening the relation or a range table entry being
+ * created.  Also, the RelOptInfo thus created is not stored anywhere else
+ * besides the parent's part_rels array.
+ *
+ * The only reason this exists is because partition-wise join, in some cases,
+ * needs a RelOptInfo to represent an empty relation that's on the nullable
+ * side of an outer join, so that a Path representing the outer join can be
+ * created.
+ */
+RelOptInfo *
+build_dummy_partition_rel(PlannerInfo *root, RelOptInfo *parent, int partidx)
+{
+       RelOptInfo *rel;
+
+       Assert(parent->part_rels[partidx] == NULL);
+
+       /* Create minimally valid-looking RelOptInfo with parent's relid. */
+       rel = makeNode(RelOptInfo);
+       rel->reloptkind = RELOPT_OTHER_MEMBER_REL;
+       rel->relid = parent->relid;
+       rel->relids = bms_copy(parent->relids);
+       if (parent->top_parent_relids)
+               rel->top_parent_relids = parent->top_parent_relids;
+       else
+               rel->top_parent_relids = bms_copy(parent->relids);
+       rel->reltarget = copy_pathtarget(parent->reltarget);
+       parent->part_rels[partidx] = rel;
+       mark_dummy_rel(rel);
+
+       /*
+        * Now we'll need a (noop) AppendRelInfo for parent, because we're 
setting
+        * the dummy partition's relid to be same as the parent's.
+        */
+       if (root->append_rel_array[parent->relid] == NULL)
+       {
+               AppendRelInfo *appinfo = makeNode(AppendRelInfo);
+
+               appinfo->parent_relid = parent->relid;
+               appinfo->child_relid = parent->relid;
+               appinfo->parent_reltype = parent->reltype;
+               appinfo->child_reltype = parent->reltype;
+               /* leave translated_vars as NIL to mean no translation needed 
*/
+               appinfo->parent_reloid = 
root->simple_rte_array[parent->relid]->relid;
+               root->append_rel_array[parent->relid] = appinfo;
+       }
+
+       return rel;
+}
+
+/*
+ * build_partition_rel
+ *             This adds a valid partition to the query by adding it to the
+ *             range table and creating planner data structures for it
+ */
+RelOptInfo *
+build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid)
+{
+       RangeTblEntry *parentrte = root->simple_rte_array[parent->relid];
+       RelOptInfo *result;
+       Index           partRTindex = 0;
+       RangeTblEntry *partrte = NULL;
+       AppendRelInfo *appinfo = NULL;
+       PlanRowMark *rootrc = NULL;
+
+       /* Locate the root partitioned table and fetch its PlanRowMark, if any. 
*/
+       if (root->rowMarks)
+       {
+               Index           rootRTindex = 0;
+
+               /*
+                * The root partitioned table itself might be a child of UNION 
ALL
+                * parent, so we must resort to finding the root parent like 
this.
+                */
+               rootRTindex = parent->relid;
+               if (root->append_rel_array[rootRTindex])
+               {
+                       AppendRelInfo *tmp = 
root->append_rel_array[rootRTindex];
+
+                       /*
+                        * Keep moving up until we reach the parent rel that's 
not a
+                        * partitioned table.  The one before that one would be 
the root
+                        * parent.
+                        */
+                       while (root->simple_rel_array[rootRTindex]->part_scheme)
+                       {
+                               tmp = root->append_rel_array[tmp->parent_relid];
+                               if (tmp == NULL)
+                                       break;
+                               rootRTindex = tmp->parent_relid;
+                       }
+               }
+
+               rootrc = get_plan_rowmark(root->rowMarks, rootRTindex);
+       }
+
+       /*
+        * expand_inherited_rtentry already locked all partitions, so pass
+        * NoLock for lockmode.
+        */
+       add_inheritance_child_to_query(root, parentrte, parent->relid,
+                                                                  
parent->reltype, parent->tupdesc,
+                                                                  rootrc, 
partoid, NoLock,
+                                                                  &appinfo, 
&partrte, &partRTindex);
+
+       /* Partition turned out to be a partitioned table with 0 partitions. */
+       if (partrte == NULL)
+               return NULL;
+
+       Assert(appinfo != NULL);
+       root->append_rel_list = lappend(root->append_rel_list, appinfo);
+       root->simple_rte_array[partRTindex] = partrte;
+       root->append_rel_array[partRTindex] = appinfo;
+
+       /* Build the RelOptInfo. */
+       result = build_simple_rel(root, partRTindex, parent);
+
+       /* Set the information created by create_lateral_join_info(). */
+       result->direct_lateral_relids = parent->direct_lateral_relids;
+       result->lateral_relids = parent->lateral_relids;
+       result->lateral_referencers = parent->lateral_referencers;
+
+       return result;
+}
diff --git a/src/backend/partitioning/partprune.c 
b/src/backend/partitioning/partprune.c
index b5c1c7d4dd..331e2717b2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -45,7 +45,9 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
+#include "optimizer/cost.h"
 #include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
@@ -443,9 +445,18 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, 
RelOptInfo *parentrel,
                for (i = 0; i < nparts; i++)
                {
                        RelOptInfo *partrel = subpart->part_rels[i];
-                       int                     subplanidx = 
relid_subplan_map[partrel->relid] - 1;
-                       int                     subpartidx = 
relid_subpart_map[partrel->relid] - 1;
+                       int                     subplanidx;
+                       int                     subpartidx;
 
+                       if (partrel == NULL)
+                       {
+                               subplan_map[i] = -1;
+                               subpart_map[i] = -1;
+                               continue;
+                       }
+
+                       subplanidx = relid_subplan_map[partrel->relid] - 1;
+                       subpartidx = relid_subpart_map[partrel->relid] - 1;
                        subplan_map[i] = subplanidx;
                        subpart_map[i] = subpartidx;
                        if (subplanidx >= 0)
@@ -548,61 +559,68 @@ gen_partprune_steps(RelOptInfo *rel, List *clauses, bool 
*contradictory)
  *
  * Callers must ensure that 'rel' is a partitioned table.
  */
-Relids
-prune_append_rel_partitions(RelOptInfo *rel)
+void
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
 {
-       Relids          result;
        List       *clauses = rel->baserestrictinfo;
        List       *pruning_steps;
-       bool            contradictory;
+       bool            contradictory,
+                               scan_all_parts = false;
        PartitionPruneContext context;
-       Bitmapset  *partindexes;
-       int                     i;
+       Bitmapset  *partindexes = NULL;
 
-       Assert(clauses != NIL);
        Assert(rel->part_scheme != NULL);
 
        /* If there are no partitions, return the empty set */
        if (rel->nparts == 0)
-               return NULL;
+               return;
 
-       /*
-        * Process clauses.  If the clauses are found to be contradictory, we 
can
-        * return the empty set.
-        */
-       pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);
-       if (contradictory)
-               return NULL;
-
-       /* Set up PartitionPruneContext */
-       context.strategy = rel->part_scheme->strategy;
-       context.partnatts = rel->part_scheme->partnatts;
-       context.nparts = rel->nparts;
-       context.boundinfo = rel->boundinfo;
-       context.partcollation = rel->part_scheme->partcollation;
-       context.partsupfunc = rel->part_scheme->partsupfunc;
-       context.stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
+       if (enable_partition_pruning && clauses != NIL)
+       {
+               /*
+                * Process clauses.  If the clauses are found to be 
contradictory, we
+                * can return the empty set.
+                */
+               pruning_steps = gen_partprune_steps(rel, clauses, 
&contradictory);
+               if (!contradictory)
+               {
+                       context.strategy = rel->part_scheme->strategy;
+                       context.partnatts = rel->part_scheme->partnatts;
+                       context.nparts = rel->nparts;
+                       context.boundinfo = rel->boundinfo;
+                       context.partcollation = rel->part_scheme->partcollation;
+                       context.partsupfunc = rel->part_scheme->partsupfunc;
+                       context.stepcmpfuncs = (FmgrInfo *)
+                                                                               
palloc0(sizeof(FmgrInfo) *
                                                                                
                context.partnatts *
                                                                                
                list_length(pruning_steps));
-       context.ppccontext = CurrentMemoryContext;
+                       context.ppccontext = CurrentMemoryContext;
 
-       /* These are not valid when being called from the planner */
-       context.partrel = NULL;
-       context.planstate = NULL;
-       context.exprstates = NULL;
-       context.exprhasexecparam = NULL;
-       context.evalexecparams = false;
+                       /* These are not valid when being called from the 
planner */
+                       context.partrel = NULL;
+                       context.planstate = NULL;
+                       context.exprstates = NULL;
+                       context.exprhasexecparam = NULL;
+                       context.evalexecparams = false;
 
-       /* Actual pruning happens here. */
-       partindexes = get_matching_partitions(&context, pruning_steps);
+                       /* Actual pruning happens here. */
+                       partindexes = get_matching_partitions(&context, 
pruning_steps);
 
-       /* Add selected partitions' RT indexes to result. */
-       i = -1;
-       result = NULL;
-       while ((i = bms_next_member(partindexes, i)) >= 0)
-               result = bms_add_member(result, rel->part_rels[i]->relid);
+                       /* No need to add partitions if all were pruned. */
+                       if (bms_is_empty(partindexes))
+                               return;
+               }
+               else
+                       scan_all_parts = true;
+       }
+       else
+               scan_all_parts = true;
 
-       return result;
+       /*
+        * Build selected partitions' range table entries, RelOptInfos, and
+        * AppendRelInfos.
+        */
+       add_rel_partitions_to_query(root, rel, scan_all_parts, partindexes);
 }
 
 /*
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 41caf873fb..1e8371d814 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
 #define RELATION_H
 
 #include "access/sdir.h"
+#include "access/tupdesc.h"
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "nodes/params.h"
@@ -695,11 +696,14 @@ typedef struct RelOptInfo
        int                     nparts;                 /* number of partitions 
*/
        struct PartitionBoundInfoData *boundinfo;       /* Partition bounds */
        List       *partition_qual; /* partition constraint */
+       Oid                *part_oids;          /* partition OIDs */
        struct RelOptInfo **part_rels;  /* Array of RelOptInfos of partitions,
                                                                         * 
stored in the same order of bounds */
        List      **partexprs;          /* Non-nullable partition key 
expressions. */
        List      **nullable_partexprs; /* Nullable partition key expressions. 
*/
        List       *partitioned_child_rels; /* List of RT indexes. */
+       TupleDesc       tupdesc;
+       Oid                     reltype;
 } RelOptInfo;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22650..4f567765a4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -297,5 +297,11 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
                                         RelOptInfo *outer_rel, RelOptInfo 
*inner_rel,
                                         RelOptInfo *parent_joinrel, List 
*restrictlist,
                                         SpecialJoinInfo *sjinfo, JoinType 
jointype);
+extern RelOptInfo *build_dummy_partition_rel(PlannerInfo *root,
+                                                                       
RelOptInfo *parent,
+                                                                       int 
partidx);
+extern RelOptInfo *build_partition_rel(PlannerInfo *root,
+                                                                          
RelOptInfo *parent,
+                                                                          Oid 
partoid);
 
 #endif                                                 /* PATHNODE_H */
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 7d53cbbb87..edaf2a3b4f 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -26,7 +26,7 @@ extern PGDLLIMPORT get_relation_info_hook_type 
get_relation_info_hook;
 
 
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
-                                 bool inhparent, RelOptInfo *rel);
+                                 bool inhparent, Bitmapset *updatedCols, 
RelOptInfo *rel);
 
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab0280d2..1916a33467 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,9 @@ extern int    from_collapse_limit;
 extern int     join_collapse_limit;
 
 extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
+extern void add_rel_partitions_to_query(PlannerInfo *root, RelOptInfo *rel,
+                                                       bool scan_all_parts,
+                                                       Bitmapset *partindexes);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
                                           Relids where_needed, bool 
create_new_ph);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 38608770a2..ca66f75544 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,16 @@ extern RelOptInfo *plan_set_operations(PlannerInfo *root);
 
 extern void expand_inherited_tables(PlannerInfo *root);
 
+extern void add_inheritance_child_to_query(PlannerInfo *root,
+                                                               RangeTblEntry 
*parentrte,
+                                                               Index 
parentRTindex, Oid parentRelType,
+                                                               TupleDesc 
parentDesc,
+                                                               PlanRowMark 
*top_parentrc,
+                                                               Oid childOID, 
int lockmode,
+                                                               AppendRelInfo 
**appinfo_p,
+                                                               RangeTblEntry 
**childrte_p,
+                                                               Index 
*childRTindex_p);
+
 extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                                           int nappinfos, AppendRelInfo 
**appinfos);
 
diff --git a/src/include/partitioning/partprune.h 
b/src/include/partitioning/partprune.h
index b95c346bab..55a324583b 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -79,7 +79,7 @@ extern PartitionPruneInfo 
*make_partition_pruneinfo(PlannerInfo *root,
                                                 List *subpaths,
                                                 List *partitioned_rels,
                                                 List *prunequal);
-extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern void prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
                                                List *pruning_steps);
 
diff --git a/src/test/regress/expected/join.out 
b/src/test/regress/expected/join.out
index dc6262be43..5f931591a6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5533,29 +5533,29 @@ select t1.b, ss.phv from join_ut1 t1 left join lateral
               (select t2.a as t2a, t3.a t3a, least(t1.a, t2.a, t3.a) phv
                                          from join_pt1 t2 join join_ut1 t3 on 
t2.a = t3.b) ss
               on t1.a = ss.t2a order by t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
+                             QUERY PLAN                             
+--------------------------------------------------------------------
  Sort
-   Output: t1.b, (LEAST(t1.a, t2.a, t3.a)), t1.a
+   Output: t1.b, (LEAST(t1.a, t2_1.a, t3.a)), t1.a
    Sort Key: t1.a
    ->  Nested Loop Left Join
-         Output: t1.b, (LEAST(t1.a, t2.a, t3.a)), t1.a
+         Output: t1.b, (LEAST(t1.a, t2_1.a, t3.a)), t1.a
          ->  Seq Scan on public.join_ut1 t1
                Output: t1.a, t1.b, t1.c
          ->  Hash Join
-               Output: t2.a, LEAST(t1.a, t2.a, t3.a)
-               Hash Cond: (t3.b = t2.a)
+               Output: t2_1.a, LEAST(t1.a, t2_1.a, t3.a)
+               Hash Cond: (t3.b = t2_1.a)
                ->  Seq Scan on public.join_ut1 t3
                      Output: t3.a, t3.b, t3.c
                ->  Hash
-                     Output: t2.a
+                     Output: t2_1.a
                      ->  Append
-                           ->  Seq Scan on public.join_pt1p1p1 t2
-                                 Output: t2.a
-                                 Filter: (t1.a = t2.a)
-                           ->  Seq Scan on public.join_pt1p2 t2_1
+                           ->  Seq Scan on public.join_pt1p1p1 t2_1
                                  Output: t2_1.a
                                  Filter: (t1.a = t2_1.a)
+                           ->  Seq Scan on public.join_pt1p2 t2
+                                 Output: t2.a
+                                 Filter: (t1.a = t2.a)
 (21 rows)
 
 select t1.b, ss.phv from join_ut1 t1 left join lateral
diff --git a/src/test/regress/expected/partition_aggregate.out 
b/src/test/regress/expected/partition_aggregate.out
index d286050c9a..d1ce6ad423 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -144,7 +144,7 @@ SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
            QUERY PLAN           
 --------------------------------
  HashAggregate
-   Group Key: pagg_tab.c
+   Group Key: c
    ->  Result
          One-Time Filter: false
 (4 rows)
@@ -159,7 +159,7 @@ SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
            QUERY PLAN           
 --------------------------------
  GroupAggregate
-   Group Key: pagg_tab.c
+   Group Key: c
    ->  Result
          One-Time Filter: false
 (4 rows)
-- 
2.11.0
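
The partprune.c hunk above makes make_partitionedrel_pruneinfo() tolerate NULL entries in part_rels, since pruned partitions no longer get a RelOptInfo. A minimal standalone sketch of that mapping step follows. SketchRel, sketch_build_maps and sketch_maps_self_check are hypothetical names invented for illustration; only the "stored value minus one, -1 when absent" convention and the NULL check are taken from the hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RelOptInfo: only the field this sketch reads. */
typedef struct SketchRel
{
	int			relid;
} SketchRel;

/*
 * Fill subplan_map[] and subpart_map[] for nparts partitions.
 *
 * part_rels[i] == NULL means the partition was pruned and has no rel, so
 * both maps get -1.  The relid_* arrays hold "index + 1", with 0 meaning
 * "no entry", so subtracting 1 yields -1 for absent entries as well.
 */
void
sketch_build_maps(SketchRel **part_rels, int nparts,
				  const int *relid_subplan_map,
				  const int *relid_subpart_map,
				  int *subplan_map, int *subpart_map)
{
	assert(nparts >= 0);

	for (int i = 0; i < nparts; i++)
	{
		SketchRel  *partrel = part_rels[i];

		if (partrel == NULL)
		{
			subplan_map[i] = -1;
			subpart_map[i] = -1;
			continue;
		}

		subplan_map[i] = relid_subplan_map[partrel->relid] - 1;
		subpart_map[i] = relid_subpart_map[partrel->relid] - 1;
	}
}

/* Usage example: partition 1 is pruned, partitions 0 and 2 survive. */
void
sketch_maps_self_check(void)
{
	SketchRel	leaf = {3};		/* has a subplan, is not a sub-partitioned rel */
	SketchRel	subparted = {5};	/* sub-partitioned, no subplan of its own */
	SketchRel  *part_rels[3] = {&leaf, NULL, &subparted};
	int			relid_subplan_map[6] = {0, 0, 0, 1, 0, 0};
	int			relid_subpart_map[6] = {0, 0, 0, 0, 0, 1};
	int			subplan_map[3];
	int			subpart_map[3];

	sketch_build_maps(part_rels, 3, relid_subplan_map, relid_subpart_map,
					  subplan_map, subpart_map);

	assert(subplan_map[0] == 0 && subpart_map[0] == -1);
	assert(subplan_map[1] == -1 && subpart_map[1] == -1);
	assert(subplan_map[2] == -1 && subpart_map[2] == 0);
}
```

In this sketch a pruned partition and a partition with no entry in either relid map end up with the same -1 values, which is why the early "continue" only has to avoid the NULL dereference.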

From 57b8cadddce13952a0a62d37c51dd02c7a436ebc Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Thu, 23 Aug 2018 17:30:18 +0900
Subject: [PATCH 3/3] Only lock partitions that will be scanned by a query

---
 src/backend/optimizer/prep/prepunion.c |  8 +++-----
 src/backend/optimizer/util/relnode.c   | 17 ++++++++++-------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/src/backend/optimizer/prep/prepunion.c 
b/src/backend/optimizer/prep/prepunion.c
index 279f686fb0..6a2adb5f4d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1555,14 +1555,15 @@ expand_inherited_rtentry(PlannerInfo *root, 
RangeTblEntry *rte, Index rti)
                lockmode = AccessShareLock;
 
        /* Scan for all members of inheritance set, acquire needed locks */
-       inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+       if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+               inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
 
        /*
         * Check that there's at least one descendant, else treat as no-child
         * case.  This could happen despite above has_subclass() check, if table
         * once had a child but no longer does.
         */
-       if (list_length(inhOIDs) < 2)
+       if (rte->relkind != RELKIND_PARTITIONED_TABLE && list_length(inhOIDs) < 
2)
        {
                /* Clear flag before returning */
                rte->inh = false;
@@ -1579,10 +1580,7 @@ expand_inherited_rtentry(PlannerInfo *root, 
RangeTblEntry *rte, Index rti)
 
        /* Partitioned tables are expanded elsewhere. */
        if (rte->relkind == RELKIND_PARTITIONED_TABLE)
-       {
-               list_free(inhOIDs);
                return;
-       }
 
        /*
         * Must open the parent relation to examine its tupdesc.  We need not 
lock
diff --git a/src/backend/optimizer/util/relnode.c 
b/src/backend/optimizer/util/relnode.c
index b267f07c18..f9bde0c058 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1825,16 +1825,16 @@ build_partition_rel(PlannerInfo *root, RelOptInfo 
*parent, Oid partoid)
 {
        RangeTblEntry *parentrte = root->simple_rte_array[parent->relid];
        RelOptInfo *result;
+       Index           rootRTindex = 0;
        Index           partRTindex = 0;
        RangeTblEntry *partrte = NULL;
        AppendRelInfo *appinfo = NULL;
        PlanRowMark *rootrc = NULL;
+       int                     lockmode;
 
        /* Locate the root partitioned table and fetch its PlanRowMark, if any. 
*/
        if (root->rowMarks)
        {
-               Index           rootRTindex = 0;
-
                /*
                 * The root partitioned table itself might be a child of UNION 
ALL
                 * parent, so we must resort to finding the root parent like 
this.
@@ -1861,13 +1861,16 @@ build_partition_rel(PlannerInfo *root, RelOptInfo 
*parent, Oid partoid)
                rootrc = get_plan_rowmark(root->rowMarks, rootRTindex);
        }
 
-       /*
-        * expand_inherited_rtentry already locked all partitions, so pass
-        * NoLock for lockmode.
-        */
+       /* Determine the correct lockmode to use. */
+       if (rootRTindex == root->parse->resultRelation)
+               lockmode = RowExclusiveLock;
+       else if (rootrc && RowMarkRequiresRowShareLock(rootrc->markType))
+               lockmode = RowShareLock;
+       else
+               lockmode = AccessShareLock;
        add_inheritance_child_to_query(root, parentrte, parent->relid,
                                                                   
parent->reltype, parent->tupdesc,
-                                                                  rootrc, 
partoid, NoLock,
+                                                                  rootrc, 
partoid, lockmode,
                                                                   &appinfo, 
&partrte, &partRTindex);
 
        /* Partition turned out to be a partitioned table with 0 partitions. */
-- 
2.11.0
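
The lock mode choice that patch 3/3 adds to build_partition_rel() can be read as a three-way decision. Below is a rough standalone sketch of it, not the patch's code. SketchLockMode, SketchRowMark and sketch_choose_lockmode are hypothetical names; the requires_row_share flag stands in for RowMarkRequiresRowShareLock(rootrc->markType). The sketch also treats index 0 as "no result relation" and guards on it; that guard is an assumption of the sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three lock levels the patch picks from. */
typedef enum SketchLockMode
{
	SKETCH_ACCESS_SHARE_LOCK,
	SKETCH_ROW_SHARE_LOCK,
	SKETCH_ROW_EXCLUSIVE_LOCK
} SketchLockMode;

/* Hypothetical stand-in for PlanRowMark: one flag is all the sketch needs. */
typedef struct SketchRowMark
{
	bool		requires_row_share;
} SketchRowMark;

/*
 * Pick the lock to take on a partition that survived pruning:
 *   - partition of the query's result relation -> row exclusive
 *   - root parent has a row mark needing a row share lock -> row share
 *   - otherwise -> access share
 *
 * Assumption of this sketch: result_relation == 0 means "none", so it can
 * never match root_rt_index.
 */
SketchLockMode
sketch_choose_lockmode(unsigned root_rt_index, unsigned result_relation,
					   const SketchRowMark *rootrc)
{
	if (result_relation != 0 && root_rt_index == result_relation)
		return SKETCH_ROW_EXCLUSIVE_LOCK;
	if (rootrc != NULL && rootrc->requires_row_share)
		return SKETCH_ROW_SHARE_LOCK;
	return SKETCH_ACCESS_SHARE_LOCK;
}

/* Usage example covering each branch. */
void
sketch_lockmode_self_check(void)
{
	SketchRowMark for_update = {true};
	SketchRowMark reference_only = {false};

	assert(sketch_choose_lockmode(1, 1, NULL) == SKETCH_ROW_EXCLUSIVE_LOCK);
	assert(sketch_choose_lockmode(2, 0, &for_update) == SKETCH_ROW_SHARE_LOCK);
	assert(sketch_choose_lockmode(2, 0, &reference_only) == SKETCH_ACCESS_SHARE_LOCK);
	assert(sketch_choose_lockmode(0, 0, NULL) == SKETCH_ACCESS_SHARE_LOCK);
}
```

The order of the checks matters: the result-relation test comes first, so a partition of the UPDATE/DELETE target gets the stronger lock even when a row mark is also present.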
