The patch-set in [1] supports partition-wise join when the partition bounds and
partition keys of the joining tables exactly match. The last two patches in the
last few patch-sets in that thread implement more
advanced partition matching code. In order to avoid mixing reviews for advanced
partition matching and the basic partition-wise join implementation, I am
starting a new thread to discuss the same. I am attaching the last two
patches from that patch set here.

The new partition matching algorithm handles following cases when a
given partition
on one side has at most one matching partition matching on the other side.

1. When the ranges of the joining tables do not match exactly E.g. partition
table t1 has partitions t1p1 (0 - 100), t1p2 (150 - 200) and partition table t2
has partitions t2p1 (0 - 50), t2p2 (100 - 175). In this case (t1p1, t2p1) and
(t1p2, t2p2) form the matching partition pairs, which can be joined. While
matching the pairs, we also compute the partition bounds for the resulting
join. An INNER join between t1 and t2 will have ranges (0 - 50) since no row
with 50 <= key < 100 from t1p1 is going to find a matching row in t2p1 and (150
- 175) since no row with 100 <= key < 150 from t2p2 is going to find a matching
row in t1p2 and no row with 175 <= key < 200 in t1p2 is going to find a
matching row in t1p2. A t1 LEFT join t2 on the other hand will have ranges same
as the outer relation i.e. t1, (0 - 100), (150 - 200) since all rows from t1
will be part of the join. Thus depending upon the type of join the partition
bounds of the resultant join relation change. Similarly for list partitioned
table, when the lists do not match exactly, the algorithm finds matching pairs
of partitions and the lists of resultant join relation. E.g.  t1 has
partitions t1p1 ('a',
'b', 'c'), t1p2 ('e', 'f') and t2 has partitions t2p1 ('a', 'b'), t2p2 ('d',
'e', 'f'). In this case (t1p1, t2p1) and (t2p1, t2p2) form the matching
pairs which are joined. Inner join will have bounds ('a','b'), ('e', 'f') and
t1 LEFT JOIN t2 will have bounds same as t1.

2. When one or both side have at least one partition that does not have
matching partition on the other side. E.g. t1 has partitions t1p1 ('a','b'),
t1p2 ('c','d') and t2 has only one partition t2p1 ('a','b') OR t1 has
partitions t1p1 (0 - 100), t1p2 (100 - 200) and t2 has only one partition t2p1
(0 - 100). In this case as well different types of joins will have different
partition bounds for the result using similar rules described above.

3. A combination of 1 and 2 e.g. t1 has partitions t1p1('a','b','c'),
t1p2('d','e','f') and t2 has a single partition t2p1 ('a','b', 'z').

Algorithm
---------
The pairs of matching partitions and the partition bounds of the join are
calculated by an algorithm similar to merge join.

In such a join, it can be observed that every partition on either side,
contributes to at most one partition of the resultant join relation. Thus for
every partition on either side, we keep track of the partition of resultant
join (if any), which it contributes to.  If multiple partitions from any of the
joining relations map to a single partition of the resultant join, we need to
gang those partitions together before joining the partition/s from the other
side. Since we do not have infrastructure for ganging multiple arbitrary
RelOptInfos together in a parent RelOptInfo, we do not support such a
partitionw-wise join right now. We stop merging the bounds immediately when we
detect such a case.

For list partitioned tables, we compare list values from both the sides,
starting with the lowest. If the two list values being compared match,
corresponding partitions from both sides form a pair of partitions to be
joined. We record this mapping and also include the list value in join bounds.
If the two list values do not match and the lower of those two comes from the
outer side of the join, we include it in the join bounds. We advance to the
next list value on side with the lower list value continuing the process of
merging till list values on at least one side are exhausted. If the remaining
values are from the outer side, we include those in the join partition bounds.
Every list value included in the join bounds, and its originating partition/s
are associated with appropriate partition of the resultant join. For more
details please see partition_list_bounds_merge() in the attached patch.

In case of range partitioned tables, we compare the ranges of the partitions in
increasing order of their bounds. If two ranges being compared overlap,
corresponding partitions from both sides form a pair of partitions to be
joined. We record this mapping and also include the merged range in the bounds
of resultant join. The overlapping ranges are merged based on the type of join
as described above. If either of the ranges completely precedes the other, and
it's on the outer side, we include that range in the bounds of resultant join.
We advance to the next range on the side with lower upper bound till ranges on
at least one side are exhausted.  If the remaining ranges are from the outer
side, we include those in the partition bounds of resultant join. While
including a range in the partition bounds of the resultant join if its lower
bound precedes the upper bound of the last included range, it indicates that
multiple partitions on that side map to one partition on the other side, so we
bail out. Notice that in this method, we always include the ranges in the
partition bounds of the resultant join in the increasing order of their bounds.
Every range included in the join's partition bounds and it's corresponding
partition/s from joining relations are associated with appropriate partition of
the resultant join. For more details please see partition_range_bounds_merge()
in the attached patch.

The partitions from both sides (one partition from each side) which map to the
same partition of the resultant join are joined to form child-joins. The case
when an outer partition may not have a matching partition from the inner side
will be discussed in the next section. Except for the above algorithm to find
the pairs of matching partitions and calculating bounds of the resultant join,
the rest of the partition-wise join algorithm remains the same.

Unsupported case: When a partition from outer side doesn't have matching
partition on the inner side.
--------------------------------------------------------------------------
Consider a join t1 LEFT JOIN t2 where t1 has partitions t1p1 (0 - 100), t1p2
(100 - 200) and t2 has a single partition t2p1(0 - 100). The rows in t1p2 won't
have a matching row in t2 since there is no partition matching t1p2. The result
of the join will have rows in t1p2 with columns from t2 NULLed. In order to
execute this join as a partition-wise join, we need a dummy relation in place
of the missing partition, which we can join with t1p2. We need this placeholder
dummy relation (its targetlist, relids etc.), so that rest of the planner can
work with the resulting child-join.

We notice the missing partitions only while planning the join (during the
execution of make_one_rel()), by which time we have frozen the number of base
relations.  Introducing a base relation during join planning is not supported
in current planner. Similarly, a partition can be missing from a partitioned
join relation, in which case we have to add a dummy join relation. This might
need adding corresponding base relations as well. I have not spent time looking
for what it takes to support these cases. For now the patch does not support
partition-wise join in such cases.

TODOs
-----------
1. Add tests for advanced partition matching algorithm
2. Improve code quality, commenting, function names etc.

[1] 
https://www.postgresql.org/message-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=p1juky...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From 7a9d903deb0475b4f8b0742fbe904a7cf0ce69c1 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH 10/11] Modify bound comparision functions to accept members
 of PartitionKey

Functions partition_bound_cmp(), partition_rbound_cmp() and
partition_rbound_datum_cmp() are required to merge partition bounds
from joining relations. While doing so, we do not have access to the
PartitionKey of either relations. So, modify these functions to accept
only required members of PartitionKey so that the functions can be
reused for merging bounds.

Ashutosh Bapat.
---
 src/backend/catalog/partition.c |   76 ++++++++++++++++++++++-----------------
 1 file changed, 44 insertions(+), 32 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 96a64ce..d42e1b5 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -126,15 +126,17 @@ static List *generate_partition_qual(Relation rel);
 
 static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
 					 List *datums, bool lower);
-static int32 partition_rbound_cmp(PartitionKey key,
-					 Datum *datums1, PartitionRangeDatumKind *kind1,
-					 bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
-						   Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+					 Oid *partcollation, Datum *datums1,
+					 PartitionRangeDatumKind *kind1, bool lower1,
+					 PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(int partnatts, FmgrInfo *partsupfunc,
+						   Oid *partcollation, Datum *rb_datums,
+						   PartitionRangeDatumKind *rb_kind,
 						   Datum *tuple_datums);
 
-static int32 partition_bound_cmp(PartitionKey key,
-					PartitionBoundInfo boundinfo,
+static int32 partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc,
+					Oid *partcollation, PartitionBoundInfo boundinfo,
 					int offset, void *probe, bool probe_is_bound);
 static int partition_bound_bsearch(PartitionKey key,
 						PartitionBoundInfo boundinfo,
@@ -719,8 +721,9 @@ check_new_partition_bound(char *relname, Relation parent,
 				 * First check if the resulting range would be empty with
 				 * specified lower and upper bounds
 				 */
-				if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
-										 upper) >= 0)
+				if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+										 key->partcollation, lower->datums,
+										 lower->kind, true, upper) >= 0)
 				{
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -771,9 +774,11 @@ check_new_partition_bound(char *relname, Relation parent,
 						{
 							int32		cmpval;
 
-							cmpval = partition_bound_cmp(key, boundinfo,
-														 offset + 1, upper,
-														 true);
+							cmpval = partition_bound_cmp(key->partnatts,
+														 key->partsupfunc,
+														 key->partcollation,
+														 boundinfo, offset + 1,
+														 upper, true);
 							if (cmpval < 0)
 							{
 								/*
@@ -2138,7 +2143,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
 	PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
 	PartitionKey key = (PartitionKey) arg;
 
-	return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+	return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+								key->partcollation, b1->datums, b1->kind,
+								b1->lower, b2);
 }
 
 /*
@@ -2155,7 +2162,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
  * two contiguous partitions.
  */
 static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
 					 Datum *datums1, PartitionRangeDatumKind *kind1,
 					 bool lower1, PartitionRangeBound *b2)
 {
@@ -2165,7 +2172,7 @@ partition_rbound_cmp(PartitionKey key,
 	PartitionRangeDatumKind *kind2 = b2->kind;
 	bool		lower2 = b2->lower;
 
-	for (i = 0; i < key->partnatts; i++)
+	for (i = 0; i < partnatts; i++)
 	{
 		/*
 		 * First, handle cases where the column is unbounded, which should not
@@ -2186,8 +2193,8 @@ partition_rbound_cmp(PartitionKey key,
 			 */
 			break;
 
-		cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
-												 key->partcollation[i],
+		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+												 partcollation[i],
 												 datums1[i],
 												 datums2[i]));
 		if (cmpval != 0)
@@ -2213,22 +2220,23 @@ partition_rbound_cmp(PartitionKey key,
  * is <, =, or > partition key of tuple (tuple_datums)
  */
 static int32
-partition_rbound_datum_cmp(PartitionKey key,
-						   Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
+partition_rbound_datum_cmp(int partnatts, FmgrInfo *partsupfunc,
+						   Oid *partcollation, Datum *rb_datums,
+						   PartitionRangeDatumKind *rb_kind,
 						   Datum *tuple_datums)
 {
 	int			i;
 	int32		cmpval = -1;
 
-	for (i = 0; i < key->partnatts; i++)
+	for (i = 0; i < partnatts; i++)
 	{
 		if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
 			return -1;
 		else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
 			return 1;
 
-		cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
-												 key->partcollation[i],
+		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+												 partcollation[i],
 												 rb_datums[i],
 												 tuple_datums[i]));
 		if (cmpval != 0)
@@ -2245,17 +2253,18 @@ partition_rbound_datum_cmp(PartitionKey key,
  * specified in *probe.
  */
 static int32
-partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
-					int offset, void *probe, bool probe_is_bound)
+partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
+					PartitionBoundInfo boundinfo, int offset, void *probe,
+					bool probe_is_bound)
 {
 	Datum	   *bound_datums = boundinfo->datums[offset];
 	int32		cmpval = -1;
 
-	switch (key->strategy)
+	switch (boundinfo->strategy)
 	{
 		case PARTITION_STRATEGY_LIST:
-			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
-													 key->partcollation[0],
+			cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+													 partcollation[0],
 													 bound_datums[0],
 													 *(Datum *) probe));
 			break;
@@ -2273,12 +2282,14 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
 					 */
 					bool		lower = boundinfo->indexes[offset] < 0;
 
-					cmpval = partition_rbound_cmp(key,
-												  bound_datums, kind, lower,
+					cmpval = partition_rbound_cmp(partnatts, partsupfunc,
+												  partcollation, bound_datums,
+												  kind, lower,
 												  (PartitionRangeBound *) probe);
 				}
 				else
-					cmpval = partition_rbound_datum_cmp(key,
+					cmpval = partition_rbound_datum_cmp(partnatts, partsupfunc,
+														partcollation,
 														bound_datums, kind,
 														(Datum *) probe);
 				break;
@@ -2286,7 +2297,7 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
 
 		default:
 			elog(ERROR, "unexpected partition strategy: %d",
-				 (int) key->strategy);
+				 (int) boundinfo->strategy);
 	}
 
 	return cmpval;
@@ -2320,7 +2331,8 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
 		int32		cmpval;
 
 		mid = (lo + hi + 1) / 2;
-		cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
+		cmpval = partition_bound_cmp(key->partnatts, key->partsupfunc,
+									 key->partcollation, boundinfo, mid, probe,
 									 probe_is_bound);
 		if (cmpval <= 0)
 		{
-- 
1.7.9.5

From a650bbfa5510aa8db87d36be9def50d265779a3e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.ba...@enterprisedb.com>
Date: Wed, 9 Aug 2017 12:30:34 +0530
Subject: [PATCH 11/11] WIP Partition-wise join for 1:1, 1:0, 0:1 partition
 matching.

Earlier version of partition-wise join implementation allowed
partition-wise join between two relations with exactly same partition
bounds. This commit allows partition-wise join to be applied under
following conditions

1. the partition bounds of joining relations are such that rows from
given partition on one side join can join with rows from maximum one
partition on the other side i.e. bounds of a given partition on one
side match/overlap with those of maximum one partition on the other
side. If the mapping happens to be m:n where m > 1 or n > 1, we have
to gang multiple partition relations together into a single relation.
This means that we have to add simple relations during join
processing, something which is not supported right now.  ALso, in such
a case, different pairs of joining relations can produce different
partition bounds for the same join relation, which again is not
supported right now.

2. For every partition on outer side that can contribute to the result
of an OUTER side, there exists at least one (taken along with item 1,
it means exactly one)  matching partition on the inner side. To
support partition-wise join when the inner matching partition doesn't
exist, we have to add a dummy base relation corresponding to the
non-existent inner partition. We don't have support add base relations
during join processing.

This commit is not complete yet.

Ashutosh Bapat.
---
 src/backend/catalog/partition.c       | 1231 +++++++++++++++++++++++++++++++++
 src/backend/optimizer/path/joinrels.c |   77 ++-
 src/backend/optimizer/util/relnode.c  |   42 +-
 src/include/catalog/partition.h       |    6 +
 4 files changed, 1325 insertions(+), 31 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d42e1b5..eb35fab 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -141,6 +141,38 @@ static int32 partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc,
 static int partition_bound_bsearch(PartitionKey key,
 						PartitionBoundInfo boundinfo,
 						void *probe, bool probe_is_bound, bool *is_equal);
+static PartitionBoundInfo partition_range_bounds_merge(int partnatts,
+							 FmgrInfo *supfuncs, Oid *collations,
+							 PartitionBoundInfo boundinfo1, int nparts1,
+							 PartitionBoundInfo boundinfo2, int nparts2,
+							 JoinType jointype, List **parts1, List **parts2);
+static PartitionBoundInfo partition_list_bounds_merge(int partnatts,
+							FmgrInfo *partsupfunc, Oid *collations,
+							PartitionBoundInfo boundinfo1, int nparts1,
+							PartitionBoundInfo boundinfo2, int nparts2,
+							JoinType jointype, List **parts1, List **parts2);
+static void generate_matching_part_pairs(int *mergemap1, int npart1,
+							 int *mergemap2, int nparts2, JoinType jointype,
+							 int nparts, List **parts1, List **parts2);
+static PartitionBoundInfo build_merged_partition_bounds(char strategy,
+							  List *merged_datums, List *merged_indexes,
+							  List *merged_contents, int null_index);
+static int map_and_merge_partitions(int *partmap1, int *mergemap1, int index1,
+						 int *partmap2, int *mergemap2, int index2,
+						 int *next_index);
+static int32 partition_range_bound_cmp(int partnatts, FmgrInfo *partsupfunc,
+						  Oid *collations, PartitionRangeBound *bound1,
+						  PartitionRangeBound *bound2);
+static int partition_range_cmp(int partnatts, FmgrInfo *supfuncs,
+						   Oid *collations, PartitionRangeBound *lower_bound1,
+						   PartitionRangeBound *upper_bound1,
+						   PartitionRangeBound *lower_bound2,
+						   PartitionRangeBound *upper_bound2, bool *overlap);
+static bool partition_range_merge_next_lb(int partnatts, FmgrInfo *supfuncs,
+							  Oid *collations, Datum *next_lb_datums,
+							  PartitionRangeDatumKind *next_lb_kind,
+							  List **merged_datums, List **merged_kinds,
+							  List **merged_indexes);
 
 /*
  * RelationBuildPartitionDesc
@@ -2348,3 +2380,1202 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
 
 	return lo;
 }
+
+/*
+ * Merge the given partition bounds.
+ *
+ * If given partition bounds can not be merged, return NULL.
+ *
+ * The function also returns two lists of partition indexes one for each of the
+ * joining relations. Both the lists contain the same number of elements. The
+ * partition indexes at the same positions in the list indicate partitions from
+ * each side to be joined and their position corresponds to the index of
+ * partition to which the results of the child-join belong in the partitioned
+ * join.
+ */
+extern PartitionBoundInfo
+partition_bounds_merge(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
+					   PartitionBoundInfo boundinfo1, int nparts1,
+					   PartitionBoundInfo boundinfo2, int nparts2,
+					   JoinType jointype, List **parts1, List **parts2)
+{
+	PartitionBoundInfo merged_bounds;
+	char		strategy;
+
+	/* Bail out if partitioning strategies are different. */
+	if (boundinfo1->strategy != boundinfo2->strategy)
+		return NULL;
+
+	*parts1 = NIL;
+	*parts2 = NIL;
+	strategy = boundinfo1->strategy;
+	if (strategy == PARTITION_STRATEGY_LIST)
+		merged_bounds = partition_list_bounds_merge(partnatts, partsupfunc,
+													partcollation, boundinfo1,
+													nparts1, boundinfo2,
+													nparts2, jointype, parts1,
+													parts2);
+	else if (strategy == PARTITION_STRATEGY_RANGE)
+		merged_bounds = partition_range_bounds_merge(partnatts, partsupfunc,
+													 partcollation, boundinfo1,
+													 nparts1, boundinfo2,
+													 nparts2, jointype, parts1,
+													 parts2);
+	else
+		elog(ERROR, "unexpected partition strategy: %d", strategy);
+
+	Assert(merged_bounds || (*parts1 == NIL && *parts2 == NIL));
+	return merged_bounds;
+}
+
+/*
+ * partition_get_range_bounds
+ *
+ * Given the index of lower bound in datums array, return lower and upper
+ * bounds and the index of the partition with that lower bound.
+ */
+static int
+partition_get_range_bounds(PartitionBoundInfo bi, int lb_index,
+						   PartitionRangeBound *lower,
+						   PartitionRangeBound *upper)
+{
+	int			part_index;
+
+	/* A lower bound should have at least one more bound after it. */
+	Assert(lb_index < bi->ndatums - 1);
+
+	/* The lower bound should correspond to a valid partition. */
+	part_index = bi->indexes[lb_index + 1];
+	Assert(part_index >= 0);
+
+	lower->kind = bi->kind[lb_index];
+	lower->datums = bi->datums[lb_index];
+	lower->lower = true;
+	upper->kind = bi->kind[lb_index + 1];
+	upper->datums = bi->datums[lb_index + 1];
+	upper->lower = false;
+
+	return part_index;
+}
+
+/*
+ * partition_range_get_next_lb_index
+ *
+ * Given the index of lower bound in datums array return the
+ * index of lower bound of the next partition. When the given index corresponds
+ * to the last partition, return -1.
+ */
+static int
+partition_range_get_next_lb_index(PartitionBoundInfo bi, int lb_index)
+{
+	/* A lower bound should have at least one more bound after it. */
+	Assert(lb_index < bi->ndatums - 1);
+
+	/* The lower bound should correspond to a valid partition. */
+	Assert(bi->indexes[lb_index + 1] >= 0);
+
+	/*
+	 * If there are no bounds left beyond the upper bound, we have reached the
+	 * last partition.
+	 */
+	if (lb_index + 2 < bi->ndatums)
+	{
+		/*
+		 * If the bound next to the upper bound corresponds to no partition,
+		 * that's the next lower bound of the next partition. Otherwise, the
+		 * current upper bound is the lower bound of the next partition.
+		 */
+		if (bi->indexes[lb_index + 2] < 0)
+			return lb_index + 2;
+		else
+			return lb_index + 1;
+	}
+	else
+		return -1;
+}
+
+static int32
+partition_range_bound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *collations,
+						  PartitionRangeBound *bound1,
+						  PartitionRangeBound *bound2)
+{
+	return partition_rbound_cmp(partnatts, partsupfunc, collations,
+								bound1->datums, bound1->kind, bound1->lower,
+								bound2);
+}
+
+/*
+ * partition_range_cmp
+ *
+ * Compare the bounds of two range partitions and return <, = or > 0, if the
+ * first partition's upper bound is lower than, equal to or higher than the
+ * second partition's upper bound resp.
+ *
+ * Also, set overlaps to true, if the ranges overlap, otherwise set it to
+ * false.
+ */
+static int
+partition_range_cmp(int partnatts, FmgrInfo *supfuncs, Oid *collations,
+						   PartitionRangeBound *lower_bound1,
+						   PartitionRangeBound *upper_bound1,
+						   PartitionRangeBound *lower_bound2,
+						   PartitionRangeBound *upper_bound2, bool *overlap)
+{
+	/*
+	 * Compare upper bound of the first partition with the lower bound of the
+	 * second and vice-versa. If lower bound is higher than the upper bound,
+	 * the partitions are not overlapping. All other cases indicate overlapping
+	 * partitions.
+	 * TODO: Add a testcase which has lower and upper bound matching exactly.
+	 * Lower bound is inclusive and upper bound is exclusive, so even if the
+	 * datums match, the bounds do not match exactly.
+	 */
+	if (partition_range_bound_cmp(partnatts, supfuncs, collations,
+								  lower_bound1, upper_bound2) > 0)
+	{
+		*overlap = false;
+		return 1;
+	}
+	else if (partition_range_bound_cmp(partnatts, supfuncs, collations,
+									   lower_bound2, upper_bound1) > 0)
+	{
+		*overlap = false;
+		return -1;
+	}
+	else
+	{
+		*overlap = true;
+		return partition_range_bound_cmp(partnatts, supfuncs, collations,
+										 upper_bound1, upper_bound2);
+	}
+}
+
+/*
+ * partition_range_merge
+ *
+ * Merge the partition bounds of given two partitions such that the join
+ * between the given two partitions fits merged bounds.
+ *
+ * "merged_upper" will be set to one of the given upper bounds and
+ * "merged_lower" will be set to one of the given lower bounds.
+ */
+static void
+partition_range_merge(int partnatts, FmgrInfo *supfuncs,
+							 Oid *collations, JoinType jointype,
+							 PartitionRangeBound *left_lb,
+							 PartitionRangeBound *left_ub,
+							 PartitionRangeBound *right_lb,
+							 PartitionRangeBound *right_ub,
+							 PartitionRangeBound **merged_lb,
+							 PartitionRangeBound **merged_ub)
+{
+	/*
+	 * An outer join will have all the rows from the outer side, so merged
+	 * bounds will be same as the outer bounds. An inner join will have rows
+	 * that fit both the bounds, thus lower merged bound will be higher of two
+	 * lower bounds and upper merged bound will be lower of the two upper
+	 * bounds.
+	 */
+	switch (jointype)
+	{
+		case JOIN_LEFT:
+		case JOIN_ANTI:
+			*merged_ub = left_ub;
+			*merged_lb = left_lb;
+			break;
+
+		case JOIN_RIGHT:
+			*merged_ub = right_ub;
+			*merged_lb = right_lb;
+			break;
+
+		case JOIN_INNER:
+		case JOIN_SEMI:
+			if (partition_range_bound_cmp(partnatts, supfuncs, collations,
+										  left_ub, right_ub) < 0)
+				*merged_ub = left_ub;
+			else
+				*merged_ub = right_ub;
+
+			if (partition_range_bound_cmp(partnatts, supfuncs, collations,
+										  left_lb, right_lb) > 0)
+				*merged_lb = left_lb;
+			else
+				*merged_lb = right_lb;
+			break;
+
+		case JOIN_FULL:
+			if (partition_range_bound_cmp(partnatts, supfuncs, collations,
+										  left_ub, right_ub) > 0)
+				*merged_ub = left_ub;
+			else
+				*merged_ub = right_ub;
+
+			if (partition_range_bound_cmp(partnatts, supfuncs, collations,
+										  left_lb, right_lb) < 0)
+				*merged_lb = left_lb;
+			else
+				*merged_lb = right_lb;
+			break;
+
+		default:
+			elog(ERROR, "Unexpected join type %d", jointype);
+	}
+
+	return;
+}
+
+/*
+ * Add the lower bound of the next range to the list of bounds, if the lower
+ * bound is higher or equal to the previous upper bound. If successful return
+ * true, otherwise false.
+ */
+static bool
+partition_range_merge_next_lb(int partnatts, FmgrInfo *supfuncs,
+							  Oid *collations, Datum *next_lb_datums,
+							  PartitionRangeDatumKind *next_lb_kind,
+							  List **merged_datums, List **merged_kinds,
+							  List **merged_indexes)
+{
+	int			cmpval;
+
+	if (!*merged_datums)
+	{
+		Assert(!*merged_kinds && !*merged_indexes);
+		cmpval = 1;
+	}
+	else
+	{
+		PartitionRangeBound	prev_ub;
+
+		prev_ub.datums = llast(*merged_datums);
+		prev_ub.kind = llast(*merged_kinds);
+		prev_ub.lower = false;
+
+		/*
+		 * TODO: explain why do we pass lower to be false for the next lower
+		 * bound.
+		 */
+		cmpval = partition_rbound_cmp(partnatts, supfuncs, collations,
+									  next_lb_datums, next_lb_kind, false,
+									  &prev_ub);
+	}
+
+	/*
+	 * The lower bound is lower than the last upper bound, thus does not fit
+	 * the bounds created so far and hence can not be merged with the existing
+	 * bounds.
+	 */
+	if (cmpval < 0)
+		return false;
+
+	/*
+	 * Add bounds of the new merged partition. If the next lower bound is
+	 * higher than the last upper bound, add new range with index
+	 * corresponding to the lower bound as -1. If the merged lower bound
+	 * is same as the last merged upper bound, the last upper bound will be
+	 * reused as the lower bound of the next range.
+	 */
+	if (cmpval > 0)
+	{
+		*merged_datums = lappend(*merged_datums, next_lb_datums);
+		*merged_kinds = lappend(*merged_kinds, next_lb_kind);
+		*merged_indexes = lappend_int(*merged_indexes, -1);
+	}
+
+	return true;
+}
+
+/*
+ * Merge given two range partition bounds.
+ *
+ * Work horse function for partition_bounds_merge() for range partitioned
+ * tables.
+ *
+ * TODO: for an anti-join, the caller is supposed to send the outer relation as
+ * left relation. May be we should rename left and right as inner and outer. We
+ * don't need to handle RIGHT joins in this function, so renaming them as outer
+ * and inner is fine.
+ */
+static PartitionBoundInfo
+partition_range_bounds_merge(int partnatts, FmgrInfo *supfuncs, Oid *collations,
+							 PartitionBoundInfo left_bi, int left_nparts,
+							 PartitionBoundInfo right_bi, int right_nparts,
+							 JoinType jointype, List **left_parts, List **right_parts)
+{
+	int		   *left_pmap;
+	int		   *left_mmap;
+	int		   *right_pmap;
+	int		   *right_mmap;
+	int			cnt1;
+	int			cnt2;
+	int			left_part;
+	int			right_part;
+	PartitionBoundInfo merged_bounds = NULL;
+	bool		merged = true;	/* By default we ranges are merge-able. */
+	int			left_lb_index;
+	int			right_lb_index;
+	int			next_index;
+	int			cmpval;
+	List	   *merged_datums = NIL;
+	List	   *merged_indexes = NIL;
+	List	   *merged_kinds = NIL;
+
+	Assert(left_bi->strategy == right_bi->strategy &&
+		   left_bi->strategy == PARTITION_STRATEGY_RANGE);
+
+	*left_parts = NIL;
+	*right_parts = NIL;
+
+	/* Allocate and initialize partition maps. */
+	left_pmap = (int *) palloc(sizeof(int) * left_nparts);
+	left_mmap = (int *) palloc(sizeof(int) * left_nparts);
+	right_pmap = (int *) palloc(sizeof(int) * right_nparts);
+	right_mmap = (int *) palloc(sizeof(int) * right_nparts);
+
+	for (cnt1 = 0; cnt1 < left_nparts; cnt1++)
+	{
+		left_pmap[cnt1] = -1;
+		left_mmap[cnt1] = -1;
+	}
+	for (cnt2 = 0; cnt2 < right_nparts; cnt2++)
+	{
+		right_pmap[cnt2] = -1;
+		right_mmap[cnt2] = -1;
+	}
+
+	left_lb_index = 0;
+	right_lb_index = 0;
+	next_index = 0;
+	while (left_lb_index >= 0 && right_lb_index >= 0)
+	{
+		PartitionRangeBound left_lb;
+		PartitionRangeBound left_ub;
+		PartitionRangeBound right_lb;
+		PartitionRangeBound right_ub;
+		PartitionRangeBound *merged_lb = NULL;
+		PartitionRangeBound *merged_ub = NULL;
+		int			merged_index = -1;
+		bool		overlap;
+
+		/* Get the range bounds of the next partition. */
+		left_part = partition_get_range_bounds(left_bi, left_lb_index,
+											   &left_lb, &left_ub);
+		right_part = partition_get_range_bounds(right_bi, right_lb_index,
+												&right_lb, &right_ub);
+
+		cmpval = partition_range_cmp(partnatts, supfuncs, collations,
+									 &left_lb, &left_ub, &right_lb, &right_ub,
+									 &overlap);
+
+		if (overlap)
+		{
+			/* Overlapping ranges, try merging. */
+			partition_range_merge(partnatts, supfuncs, collations, jointype,
+								  &left_lb, &left_ub, &right_lb, &right_ub,
+								  &merged_lb, &merged_ub);
+			merged_index = map_and_merge_partitions(left_pmap, left_mmap,
+													left_part, right_pmap,
+													right_mmap, right_part,
+													&next_index);
+
+			if (merged_index < 0)
+			{
+				merged = false;
+				break;
+			}
+		}
+
+		if (cmpval == 0)
+		{
+			Assert(overlap);
+
+			/* Move to the next pair of partitions. */
+			left_lb_index = partition_range_get_next_lb_index(left_bi,
+															  left_lb_index);
+			right_lb_index = partition_range_get_next_lb_index(right_bi,
+															   right_lb_index);
+		}
+		else if (cmpval < 0)
+		{
+			/*
+			 * If the partition on the left was not mapped to any partition on
+			 * the right. Any row from that partition will not have a matching
+			 * row from the other relation. So the partition will be part of
+			 * the join if it's an anti-join or the left side is the outer
+			 * side.
+			 */
+			if (jointype == JOIN_INNER || jointype == JOIN_SEMI ||
+				jointype == JOIN_RIGHT)
+			{
+				/* Nothing to do. */
+			}
+			else if (jointype == JOIN_LEFT || jointype == JOIN_FULL ||
+					 jointype == JOIN_ANTI)
+			{
+				if (left_mmap[left_part] < 0)
+				{
+					left_mmap[left_part] = next_index++;
+					merged_index = left_mmap[left_part];
+					merged_lb = &left_lb;
+					merged_ub = &left_ub;
+				}
+			}
+			else
+			{
+				/* Don't know what to do with other join types. Bail out. */
+				merged = false;
+			}
+
+			/* Move to the next partition on the left side. */
+			left_lb_index = partition_range_get_next_lb_index(left_bi,
+															  left_lb_index);
+		}
+		else
+		{
+			Assert(cmpval > 0);
+
+			/*
+			 * If the partition on the right was not mapped to any partition on
+			 * the left. Any row from that partition will not have a matching
+			 * row from the other relation. So the partition will be part of
+			 * the join if the right side is the outer side.
+			 */
+			if (jointype == JOIN_INNER || jointype == JOIN_SEMI ||
+				jointype == JOIN_LEFT || jointype == JOIN_ANTI)
+				merged_index = -1;
+			else if (jointype == JOIN_RIGHT || jointype == JOIN_FULL)
+			{
+				if (right_mmap[right_part] < 0)
+				{
+					right_mmap[right_part] = next_index++;
+					merged_index = right_mmap[right_part];
+					merged_lb = &right_lb;
+					merged_ub = &right_ub;
+				}
+			}
+			else
+			{
+				/* Don't know what to do with other join types. Bail out. */
+				merged = false;
+			}
+
+			/* Move to the next partition on the right side. */
+			right_lb_index = partition_range_get_next_lb_index(right_bi,
+															   right_lb_index);
+		}
+
+		if (!merged)
+			break;
+
+		/* A skipped partition is not added to merged bounds. */
+		if (merged_index < 0)
+			continue;
+
+		/*
+		 * We have a valid partition index for the next partition of join. The
+		 * partition should have valid range.
+		 */
+		Assert(merged_lb && merged_ub);
+
+		/* Try merging merged lower bound with the last upper bound. */
+		merged = partition_range_merge_next_lb(partnatts, supfuncs,
+											   collations, merged_lb->datums,
+											   merged_lb->kind, &merged_datums,
+											   &merged_kinds, &merged_indexes);
+		if (merged)
+		{
+			/* Add upper bound with the merged partition index. */
+			merged_datums = lappend(merged_datums, merged_ub->datums);
+			merged_kinds = lappend(merged_kinds, merged_ub->kind);
+			merged_indexes = lappend_int(merged_indexes, merged_index);
+		}
+		else
+			break;
+	}
+
+	/*
+	 * We will run the above loop till we exhaust ranges of at least one side
+	 * unless we failed to merge the ranges.
+	 */
+	Assert (!merged || (left_lb_index < 0 || right_lb_index < 0));
+
+	/*
+	 * Handle any remaining partition bounds.  If remaining partitions fall on
+	 * the inner side of the join, none of the rows in those partition are
+	 * going to be joined with any row on the outer side and hence those
+	 * partitions will not be part of the join result. Hence only consider the
+	 * remaining partitions on the outer side of the join.
+	 */
+	if (merged &&
+		((left_lb_index >= 0 &&
+		  (jointype == JOIN_LEFT || jointype == JOIN_FULL ||
+		   jointype == JOIN_LEFT)) ||
+		 (right_lb_index >= 0 &&
+		  (jointype == JOIN_RIGHT || jointype == JOIN_FULL))))
+	{
+		int			bound_index = -1;
+		PartitionBoundInfo rem_bi = NULL;
+		int			*mmap = NULL;
+
+		if (left_lb_index >= 0)
+		{
+			Assert(jointype == JOIN_LEFT || jointype == JOIN_FULL ||
+				   jointype == JOIN_ANTI);
+			bound_index = left_lb_index;
+			rem_bi = left_bi;
+			mmap = left_mmap;
+		}
+		else if (right_lb_index >= 0)
+		{
+			Assert(jointype == JOIN_RIGHT || jointype == JOIN_FULL);
+			bound_index = right_lb_index;
+			rem_bi = right_bi;
+			mmap = right_mmap;
+		}
+
+		Assert(bound_index >= 0 && rem_bi && mmap);
+
+		/*
+		 * Merge lower bound of the next range with the upper bound of last
+		 * range.
+		 */
+		merged = partition_range_merge_next_lb(partnatts, supfuncs, collations,
+											   rem_bi->datums[bound_index],
+											   rem_bi->kind[bound_index],
+											   &merged_datums, &merged_kinds,
+											   &merged_indexes);
+
+		/*
+		 * Rest of the bounds correspond to valid ranges so add them after
+		 * remapping their partitions as required.
+		 */
+		for (bound_index++; merged && bound_index < rem_bi->ndatums;
+			 bound_index++)
+		{
+			Datum	   *datums = rem_bi->datums[bound_index];
+			int			index = rem_bi->indexes[bound_index];
+			int			part_index;
+
+			if (index < 0)
+				part_index = index;
+			else
+			{
+				if (mmap[index] < 0)
+					mmap[index] = next_index++;
+				part_index = mmap[index];
+			}
+
+			merged_indexes = lappend_int(merged_indexes, part_index);
+			merged_datums = lappend(merged_datums, datums);
+			merged_kinds = lappend(merged_kinds,
+								   rem_bi->kind[bound_index]);
+		}
+	}
+
+	/* Create PartitionBoundInfo */
+	if (merged)
+	{
+		/* Use maps to match partition from the joining relations. */
+		generate_matching_part_pairs(left_mmap, left_nparts, right_mmap,
+									 right_nparts, jointype, next_index,
+									 left_parts, right_parts);
+
+		/* Craft a PartitionBoundInfo to return. */
+		if (*left_parts && *right_parts)
+		{
+			Assert(list_length(*left_parts) == list_length(*right_parts));
+			Assert(list_length(*left_parts) == next_index);
+			merged_bounds = build_merged_partition_bounds(left_bi->strategy,
+														  merged_datums,
+														  merged_indexes,
+														  merged_kinds,
+														  -1);
+		}
+	}
+
+	list_free(merged_datums);
+	list_free(merged_indexes);
+	pfree(left_pmap);
+	pfree(right_pmap);
+	pfree(left_mmap);
+	pfree(right_mmap);
+
+	/* Free any memory we used in this function. */
+	return merged_bounds;
+}
+
+/*
+ * partition_bounds_merge()'s arm for list partitioned tables.
+ *
+ * The function builds the maps of matching partitions from either relation. It
+ * builds the list of partition key values that may appear in the join result
+ * alongwith the list of indexes of partitions of join to which those values
+ * belong. It then crafts a PartitionBoundInfo structure representing the
+ * partition bounds of the join result.
+ */
+static PartitionBoundInfo
+partition_list_bounds_merge(int partnatts, FmgrInfo *partsupfunc,
+							Oid *partcollation, PartitionBoundInfo left_bi,
+							int left_nparts, PartitionBoundInfo right_bi,
+							int right_nparts, JoinType jointype, List **left_parts,
+							List **right_parts)
+{
+	int		   *left_pmap;	/* left to right partition map */
+	int		   *left_mmap;	/* left to merged partition map */
+	int		   *right_pmap;	/* right to left partition map */
+	int		   *right_mmap;	/* right to merged partition map */
+	int			cntl;
+	int			cntr;
+	bool		merged = true;
+	List	   *merged_datums = NIL;
+	List	   *merged_indexes = NIL;
+	int			next_index = 0;
+	int			null_index;
+	PartitionBoundInfo merged_bounds = NULL;
+	int		   *left_indexes = left_bi->indexes;
+	int		   *right_indexes = right_bi->indexes;
+	int			left_ni = left_bi->null_index;
+	int			right_ni = right_bi->null_index;
+
+	Assert(left_bi->strategy == right_bi->strategy &&
+		   left_bi->strategy == PARTITION_STRATEGY_LIST);
+
+	/* List partitions do not require unbounded ranges. */
+	Assert(!left_bi->kind && !right_bi->kind);
+
+	/* Allocate and initialize partition maps. */
+	left_pmap = (int *) palloc(sizeof(int) * left_nparts);
+	left_mmap = (int *) palloc(sizeof(int) * left_nparts);
+	right_pmap = (int *) palloc(sizeof(int) * right_nparts);
+	right_mmap = (int *) palloc(sizeof(int) * right_nparts);
+
+	/* Initialize partition maps. */
+	for (cntl = 0; cntl < left_nparts; cntl++)
+	{
+		left_pmap[cntl] = -1;
+		left_mmap[cntl] = -1;
+	}
+	for (cntr = 0; cntr < right_nparts; cntr++)
+	{
+		right_pmap[cntr] = -1;
+		right_mmap[cntr] = -1;
+	}
+
+	cntl = cntr = 0;
+	while (cntl < left_bi->ndatums && cntr < right_bi->ndatums)
+	{
+		Datum	   *ldatums = left_bi->datums[cntl];
+		Datum	   *rdatums = right_bi->datums[cntr];
+		int			l_index = left_indexes[cntl];
+		int			r_index = right_indexes[cntr];
+		int			cmpval;
+		int			merged_index;
+		Datum	   *merged_datum;
+
+		/* Every list datum should map to a valid partition index. */
+		Assert(l_index >= 0 && r_index >= 0);
+
+		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+												 partcollation[0], ldatums[0],
+												 rdatums[0]));
+		if (cmpval == 0)
+		{
+			/*
+			 * Try matching partitions containing the matching datums. If
+			 * successful, add the datum to the merged bounds with index of
+			 * merged partition containing it.
+			 */
+			merged_datum = ldatums;
+			merged_index = map_and_merge_partitions(left_pmap, left_mmap, l_index,
+													right_pmap, right_mmap, r_index,
+													&next_index);
+
+			if (merged_index < 0)
+			{
+				merged = false;
+				break;
+			}
+
+			/* Move to the next pair of bounds. */
+			cntl++;
+			cntr++;
+		}
+		else if (cmpval < 0)
+		{
+			/*
+			 * This list datum is present in the left side but not the right
+			 * side. So it will appear in the join when the left side is outer
+			 * side.
+			 */
+			if (jointype == JOIN_INNER || jointype == JOIN_RIGHT ||
+				jointype == JOIN_SEMI)
+				merged_index = -1;
+			else if (jointype == JOIN_LEFT || jointype == JOIN_FULL ||
+					 jointype == JOIN_ANTI)
+			{
+				if (left_mmap[l_index] < 0)
+					left_mmap[l_index] = next_index++;
+				merged_index = left_mmap[l_index];
+				merged_datum = ldatums;
+			}
+			else
+			{
+				/* Don't know what to do with other join types. */
+				merged = false;
+				break;
+			}
+
+			/* Move to the next datum on the left side. */
+			cntl++;
+		}
+		else
+		{
+			Assert(cmpval > 0);
+
+			/*
+			 * This list datum is present in the right side but not the left
+			 * side. So it will appear in the join when the right side is outer
+			 * side.
+			 */
+			if (jointype == JOIN_INNER || jointype == JOIN_LEFT ||
+				jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+				merged_index = -1;
+			else if (jointype == JOIN_RIGHT || jointype == JOIN_FULL)
+			{
+				/*
+				 * Every list value on the outer side will appear in the
+				 * join.  Find the merged partition to which this value
+				 * belongs.
+				 */
+				if (right_mmap[r_index] < 0)
+					right_mmap[r_index] = next_index++;
+				merged_index = right_mmap[r_index];
+				merged_datum = rdatums;
+			}
+			else
+			{
+				/* Don't know what to do with other join types. */
+				merged = false;
+				break;
+			}
+
+			/* Move to the next datum on the right side. */
+			cntr++;
+		}
+
+		/*
+		 * Add the datum with appropriate index in the list of datums, if the
+		 * rows containing that datum are deemed to be part of the join.
+		 */
+		if (merged_index >= 0)
+		{
+			merged_indexes = lappend_int(merged_indexes, merged_index);
+			merged_datums = lappend(merged_datums, merged_datum);
+		}
+	}
+
+	/*
+	 * If merge is unsuccessful, bail out without any further processing.
+	 * That leaks the memory allocated in this function. So, try not to leak
+	 * memory.
+	 */
+	if (!merged)
+		goto merge_failed;
+
+	/* We should have exhausted datums on at least one side. */
+	Assert(cntr >= right_bi->ndatums || cntl >= left_bi->ndatums);
+
+	/*
+	 * Add any remaining list values on the outer side, assigning partition
+	 * indexes if required.
+	 */
+	if (jointype == JOIN_LEFT || jointype == JOIN_FULL || jointype == JOIN_ANTI)
+	{
+		for (;cntl < left_bi->ndatums; cntl++)
+		{
+			Datum	   *ldatums = left_bi->datums[cntl];
+			int			l_index = left_indexes[cntl];
+
+			if (left_mmap[l_index] < 0)
+				left_mmap[l_index] = next_index++;
+			merged_indexes = lappend_int(merged_indexes, left_mmap[l_index]);
+			merged_datums = lappend(merged_datums, ldatums);
+		}
+	}
+
+	if (jointype == JOIN_RIGHT || jointype == JOIN_FULL)
+	{
+		for (;cntr < right_bi->ndatums; cntr++)
+		{
+			Datum	   *rdatums = right_bi->datums[cntr];
+			int			r_index = right_indexes[cntr];
+
+			if (right_mmap[r_index] < 0)
+				right_mmap[r_index] = next_index++;
+			merged_indexes = lappend_int(merged_indexes, right_mmap[r_index]);
+			merged_datums = lappend(merged_datums, rdatums);
+		}
+	}
+
+	/*
+	 * Merge NULL partitions if any. Find the index of merged partition to
+	 * which the NULL values belong in the join result. We can eliminate a NULL
+	 * partition when it appears only in the inner relation.
+	 */
+	if (!partition_bound_accepts_nulls(left_bi) &&
+		!partition_bound_accepts_nulls(right_bi))
+		null_index = -1;
+	else if (partition_bound_accepts_nulls(left_bi) &&
+			 !partition_bound_accepts_nulls(right_bi))
+	{
+		if (jointype == JOIN_LEFT || jointype == JOIN_FULL ||
+			jointype == JOIN_ANTI)
+		{
+			if (left_mmap[left_ni] < 0)
+				left_mmap[left_ni] = next_index++;
+			null_index = left_mmap[left_ni];
+		}
+		else
+			null_index = -1;
+	}
+	else if (!partition_bound_accepts_nulls(left_bi) &&
+			 partition_bound_accepts_nulls(right_bi))
+	{
+		if (jointype == JOIN_RIGHT || jointype == JOIN_FULL)
+		{
+			if (right_mmap[right_ni] < 0)
+				right_mmap[right_ni] = next_index++;
+			null_index = right_mmap[right_ni];
+		}
+		else
+			null_index = -1;
+	}
+	else
+	{
+		/* Both the relations have NULL partitions, try merging them. */
+		null_index = map_and_merge_partitions(left_pmap, left_mmap,
+											  left_ni, right_pmap,
+											  right_mmap, right_ni,
+											  &next_index);
+		if (null_index < 0)
+			merged = false;
+	}
+
+	/* If successful build the output structures. */
+	if (merged)
+	{
+
+		/* Use maps to match partition from the joining relations. */
+		generate_matching_part_pairs(left_mmap, left_nparts, right_mmap,
+									 right_nparts, jointype, next_index,
+									 left_parts, right_parts);
+		/* Craft a PartitionBoundInfo to return. */
+		if (*left_parts && *right_parts)
+		{
+			Assert(list_length(*left_parts) == list_length(*right_parts));
+			Assert(list_length(*left_parts) == next_index);
+			merged_bounds = build_merged_partition_bounds(left_bi->strategy,
+														  merged_datums,
+														  merged_indexes, NIL,
+														  null_index);
+		}
+	}
+
+merge_failed:
+	list_free(merged_datums);
+	list_free(merged_indexes);
+	pfree(left_pmap);
+	pfree(right_pmap);
+	pfree(left_mmap);
+	pfree(right_mmap);
+
+	return merged_bounds;
+}
+
+/*
+ * map_and_merge_partitions
+ *
+ * If the two given partitions (given by index1 and index2 resp.) are already
+ * mapped to each other return the index of corresponding partition in the
+ * merged set of partitions.  If they do not have a merged partition associated
+ * with them, assign a new merged partition index.  If the partitions are
+ * already mapped and their mapped partitions are different from each other,
+ * they can not be merged, so return -1.
+ *
+ * partmap1[i] gives the partition of relation 2 which matches ith partition of
+ * relation 1. Similarly for partmap2.
+ *
+ * mergemap1[i] gives the partition in the merged set to which ith partition of
+ * relation 1 maps to. Similarly for mergemap2.
+ *
+ * index1 and index2 are the indexes of matching partition from respective
+ * relations.
+ *
+ * *next_index is used and incremented when the given partitions require a new
+ * merged partition.
+ */
+static int
+map_and_merge_partitions(int *partmap1, int *mergemap1, int index1,
+						 int *partmap2, int *mergemap2, int index2,
+						 int *next_index)
+{
+	int			merged_index;
+
+	/*
+	 * If both the partitions are not mapped to each other, update the
+	 * maps.
+	 */
+	if (partmap1[index1] < 0 && partmap2[index2] < 0)
+	{
+		partmap1[index1] = index2;
+		partmap2[index2] = index1;
+	}
+
+	/*
+	 * If the given to partitions map to each other, find the corresponding
+	 * merged partition index .
+	 */
+	if (partmap1[index1] == index2 && partmap2[index2] == index1)
+	{
+		/*
+		 * If both the partitions are mapped to the same merged partition, get
+		 * the index of merged partition.
+		 */
+		if (mergemap1[index1] == mergemap2[index2])
+		{
+			merged_index = mergemap1[index1];
+
+			/*
+			 * If the given two partitions do not have a merged partition
+			 * associated with them, allocate a new merged partition.
+			 */
+			if (merged_index < 0)
+			{
+				merged_index = *next_index;
+				*next_index = *next_index + 1;
+				mergemap1[index1] = merged_index;
+				mergemap2[index2] = merged_index;
+			}
+		}
+
+		/*
+		 * If partition from one relation was mapped to a merged partition but
+		 * not the partition from the other relation, map the same merged
+		 * partition to the partition from other relation, since matching
+		 * partitions map to the same merged partition.
+		 */
+		else if (mergemap1[index1] >= 0 && mergemap2[index2] < 0)
+		{
+			mergemap2[index2] = mergemap1[index1];
+			merged_index = mergemap1[index1];
+		}
+		else if (mergemap1[index1] < 0 && mergemap2[index2] >= 0)
+		{
+			mergemap1[index1] = mergemap2[index2];
+			merged_index = mergemap2[index2];
+		}
+		else
+		{
+			Assert(mergemap1[index1] != mergemap2[index2] &&
+				   mergemap1[index1] >= 0 && mergemap2[index2] >= 0);
+
+			/*
+			 * Both the partitions map to different merged partitions. This
+			 * means that multiple partitions from one relation matches to one
+			 * partition from the other relation. Partition-wise join does not
+			 * handle this case right now, since it requires ganging multiple
+			 * partitions together (into one RelOptInfo).
+			 */
+			merged_index = -1;
+		}
+	}
+	else
+	{
+		/*
+		 * Multiple partitions from one relation map to one partition from the
+		 * other relation. Partition-wise join does not handle this case right
+		 * now, since it requires ganging multiple partitions together (into
+		 * one RelOptInfo).
+		 */
+		merged_index = -1;
+	}
+
+	return merged_index;
+}
+
+/*
+ * generate_matching_part_pairs
+ *
+ * Given the merged partition to which partition on either side of join map,
+ * produce the list pairs of partitions which when joined produce the merged
+ * partitions in the order of merged partition indexes.
+ *
+ * If successful, the pairs of partitions are returned as two separate lists
+ * one for each side. Otherwise, those lists will be set to NIL.
+ *
+ * TODO: rename the sides as outer and inner. You may not need to support
+ * JOIN_RIGHT, since we won't see that type here.
+ */
+static void
+generate_matching_part_pairs(int *mergemap1, int nparts1, int *mergemap2,
+							 int nparts2, JoinType jointype, int nparts,
+							 List **parts1, List **parts2)
+{
+	bool		merged = true;
+	int		  **matching_parts;
+	int			cnt1;
+	int			cnt2;
+
+	matching_parts = (int **) palloc(sizeof(int *) * 2);
+	matching_parts[0] = (int *) palloc(sizeof(int) * nparts);
+	matching_parts[1] = (int *) palloc(sizeof(int) * nparts);
+
+	*parts1 = NIL;
+	*parts2 = NIL;
+
+	for (cnt1 = 0; cnt1 < nparts; cnt1++)
+	{
+		matching_parts[0][cnt1] = -1;
+		matching_parts[1][cnt1] = -1;
+	}
+
+	/* Set pairs of matching partitions. */
+	for (cnt1 = 0; cnt1 < nparts1; cnt1++)
+	{
+		if (mergemap1[cnt1] >= 0)
+		{
+			Assert(mergemap1[cnt1] < nparts);
+			matching_parts[0][mergemap1[cnt1]] = cnt1;
+		}
+	}
+	for (cnt2 = 0; cnt2 < nparts2; cnt2++)
+	{
+		if (mergemap2[cnt2] >= 0)
+		{
+			Assert(mergemap2[cnt2] < nparts);
+			matching_parts[1][mergemap2[cnt2]] = cnt2;
+		}
+	}
+
+	/*
+	 * If we have a partition missing on an inner side, we need to add a dummy
+	 * relation which joins with the outer partition. If the inner relation
+	 * happens to be a base relation, it will require adding a dummy child
+	 * base relation during join processing. Right now, we freeze the base
+	 * relation arrays like PlannerInfo::simple_rte_array after planning for
+	 * base relations. Adding a new (dummy) base relation would require some
+	 * changes to that. So, right now, we do not implement partition-wise join
+	 * in such cases.
+	 */
+	for (cnt1 = 0; cnt1 < nparts; cnt1++)
+	{
+		int			part1 = matching_parts[0][cnt1];
+		int			part2 = matching_parts[1][cnt1];
+
+		/* At least one of the partitions should exist. */
+		Assert(part1 >= 0 || part2 >= 0);
+
+		switch (jointype)
+		{
+			case JOIN_INNER:
+			case JOIN_SEMI:
+
+				/*
+				 * An inner or semi join can not return any row when the
+				 * matching partition on either side is missing. We should
+				 * have eliminated all such cases while merging the bounds.
+				 */
+				Assert(part1 >= 0 && part2 >= 0);
+				break;
+
+			case JOIN_LEFT:
+			case JOIN_ANTI:
+				Assert(part1 >= 0);
+				if (part2 < 0)
+					merged = false;
+				break;
+
+			case JOIN_RIGHT:
+				Assert(part2 >= 0);
+				if (part1 < 0)
+					merged = false;
+				break;
+
+			case JOIN_FULL:
+				if (part1 < 0 || part2 < 0)
+					merged = false;
+				break;
+
+			default:
+				/* We do not know what to do in this case. Bail out. */
+				merged = false;
+		}
+
+		if (!merged)
+			break;
+
+		*parts1 = lappend_int(*parts1, part1);
+		*parts2 = lappend_int(*parts2, part2);
+	}
+
+	pfree(matching_parts[0]);
+	pfree(matching_parts[1]);
+	pfree(matching_parts);
+
+	if (!merged)
+	{
+		list_free(*parts1);
+		list_free(*parts2);
+		*parts1 = NIL;
+		*parts2 = NIL;
+	}
+}
+
+static PartitionBoundInfo
+build_merged_partition_bounds(char strategy, List *merged_datums,
+							  List *merged_indexes, List *merged_kinds,
+							  int null_index)
+{
+	int			cnt;
+	PartitionBoundInfo merged_bounds;
+	ListCell   *lc;
+
+	/* We expect the same number of elements in datums and indexes lists. */
+	Assert(list_length(merged_datums) == list_length(merged_indexes));
+
+	merged_bounds = (PartitionBoundInfo) palloc(sizeof(PartitionBoundInfoData));
+	merged_bounds->strategy = strategy;
+	merged_bounds->ndatums = list_length(merged_datums);
+
+	if (strategy == PARTITION_STRATEGY_RANGE)
+	{
+		Assert(list_length(merged_datums) == list_length(merged_kinds));
+		merged_bounds->kind = (PartitionRangeDatumKind **) palloc(sizeof(PartitionRangeDatumKind *) *
+															   list_length(merged_kinds));
+		cnt = 0;
+		foreach(lc, merged_kinds)
+			merged_bounds->kind[cnt++] = lfirst(lc);
+
+		/* There are ndatums+1 indexes in case of range partitions */
+		merged_indexes = lappend_int(merged_indexes, -1);
+	}
+	else
+		merged_bounds->kind = NULL;
+
+	cnt = 0;
+	merged_bounds->datums = (Datum **) palloc(sizeof(Datum *) *
+											  list_length(merged_datums));
+	foreach(lc, merged_datums)
+		merged_bounds->datums[cnt++] = lfirst(lc);
+
+	merged_bounds->indexes = (int *) palloc(sizeof(int) *
+											list_length(merged_indexes));
+	cnt = 0;
+	foreach(lc, merged_indexes)
+		merged_bounds->indexes[cnt++] = lfirst_int(lc);
+
+	merged_bounds->null_index = null_index;
+
+	return merged_bounds;
+}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a97a895..fb8d752 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1308,8 +1308,13 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 						RelOptInfo *joinrel, SpecialJoinInfo *parent_sjinfo,
 						List *parent_restrictlist)
 {
-	int			nparts;
 	int			cnt_parts;
+	PartitionScheme part_scheme;
+	PartitionBoundInfo join_boundinfo;
+	List	   *parts1;
+	List	   *parts2;
+	ListCell   *lc1;
+	ListCell   *lc2;
 
 	/* Guard against stack overflow due to overly deep partition hierarchy. */
 	check_stack_depth();
@@ -1353,39 +1358,54 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 	 */
 	Assert(joinrel->part_scheme == rel1->part_scheme &&
 		   joinrel->part_scheme == rel2->part_scheme);
+	part_scheme = joinrel->part_scheme;
 
 	/*
-	 * Since we allow partition-wise join only when the partition bounds of
-	 * the joining relations exactly match, the partition bounds of the join
-	 * should match those of the joining relations.
+	 * Get the list of matching partitions from both sides of the join. While
+	 * doing so, we also build the partition bounds of the join relation,
+	 * which should match the bounds calculated for other pairs. TODO: why
+	 * should every pair result in the same partition bounds?
 	 */
-	Assert(partition_bounds_equal(joinrel->part_scheme->partnatts,
-								  joinrel->part_scheme->parttyplen,
-								  joinrel->part_scheme->parttypbyval,
-								  joinrel->boundinfo, rel1->boundinfo));
-	Assert(partition_bounds_equal(joinrel->part_scheme->partnatts,
-								  joinrel->part_scheme->parttyplen,
-								  joinrel->part_scheme->parttypbyval,
-								  joinrel->boundinfo, rel2->boundinfo));
-
-	nparts = joinrel->nparts;
-
+	join_boundinfo = partition_bounds_merge(part_scheme->partnatts,
+											part_scheme->partsupfunc,
+											part_scheme->parttypcoll,
+											rel1->boundinfo, rel1->nparts,
+											rel2->boundinfo, rel2->nparts,
+											parent_sjinfo->jointype,
+											&parts1, &parts2);
+
+	Assert(join_boundinfo);
+	Assert(partition_bounds_equal(part_scheme->partnatts,
+								  part_scheme->parttyplen,
+								  part_scheme->parttypbyval, join_boundinfo,
+								  joinrel->boundinfo));
 	elog(DEBUG3, "join between relations %s and %s is considered for partition-wise join.",
 		 bmsToString(rel1->relids), bmsToString(rel2->relids));
 
-	/* Allocate space to hold child-joins RelOptInfos, if not already done. */
+	/*
+	 * Every pair of joining relations should result in the same number of
+	 * child-joins.
+	 */
+	Assert(joinrel->nparts == list_length(parts1));
+	Assert(joinrel->nparts == list_length(parts2));
+
+	/* Allocate space for hold child-joins RelOptInfos, if not already done. */
 	if (!joinrel->part_rels)
-		joinrel->part_rels = (RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
+		joinrel->part_rels = (RelOptInfo **) palloc0(sizeof(RelOptInfo *) *
+													 joinrel->nparts);
 
 	/*
 	 * Create child-join relations for this partitioned join, if those don't
 	 * exist. Add paths to child-joins for a pair of child relations
 	 * corresponding to the given pair of parent relations.
 	 */
-	for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
+	cnt_parts = 0;
+	forboth(lc1, parts1, lc2, parts2)
 	{
-		RelOptInfo *child_rel1 = rel1->part_rels[cnt_parts];
-		RelOptInfo *child_rel2 = rel2->part_rels[cnt_parts];
+		int			part1 = lfirst_int(lc1);
+		int			part2 = lfirst_int(lc2);
+		RelOptInfo *child_rel1;
+		RelOptInfo *child_rel2;
 		SpecialJoinInfo *child_sjinfo;
 		List	   *child_restrictlist;
 		RelOptInfo *child_joinrel;
@@ -1393,6 +1413,10 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		AppendRelInfo **appinfos;
 		int			nappinfos;
 
+		Assert(part1 >= 0 && part2 >= 0);
+		child_rel1 = rel1->part_rels[part1];
+		child_rel2 = rel2->part_rels[part2];
+
 		/* We should never try to join two overlapping sets of rels. */
 		Assert(!bms_overlap(child_rel1->relids, child_rel2->relids));
 		child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids);
@@ -1425,6 +1449,15 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 			joinrel->part_rels[cnt_parts] = child_joinrel;
 		}
 
+		/*
+		 * For every pair of joining relations, the set of matching partitions
+		 * would change. However, the base relation partitions constituting
+		 * the given child should remain same for all the joining pairs. Since
+		 * the order in which children are stored in the array of child-joins,
+		 * depends upon partition bounds of the join, which are same for all
+		 * the joining pairs, every joining pair yields the child-joins in the
+		 * same order.
+		 */
 		Assert(bms_equal(child_joinrel->relids, child_joinrelids));
 
 		populate_joinrel_with_paths(root, child_rel1, child_rel2,
@@ -1437,7 +1470,11 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		 */
 		try_partition_wise_join(root, child_rel1, child_rel2, child_joinrel,
 								child_sjinfo, child_restrictlist);
+
+		cnt_parts++;
 	}
+
+	Assert(cnt_parts == joinrel->nparts);
 }
 
 /*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 72b6832..c58b00c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1623,6 +1623,9 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	int			partnatts;
 	int			cnt;
 	PartitionScheme part_scheme;
+	PartitionBoundInfo join_boundinfo;
+	List	   *parts1;
+	List	   *parts2;
 
 	/* Nothing to do if partition-wise join technique is disabled. */
 	if (!enable_partition_wise_join)
@@ -1663,17 +1666,26 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 		   REL_HAS_ALL_PART_PROPS(inner_rel));
 
 	/*
-	 * For now, our partition matching algorithm can match partitions only
-	 * when the partition bounds of the joining relations are exactly same.
-	 * So, bail out otherwise.
+	 * Every pair of joining relations would yield the same partition bounds
+	 * for a given join (TODO: why?) so we compute the bounds only the first
+	 * time. Then for every pair we find the pairs of matching partitions from
+	 * the joining relations and join those. TODO: Needs a better explanation
+	 * of why is this true.  TODO: Also there is no reason to have
+	 * part_indexes1 and part_indexes2 pulled here just to be freed up later.
+	 * So, we might want to do something better.
 	 */
-	if (outer_rel->nparts != inner_rel->nparts ||
-		!partition_bounds_equal(part_scheme->partnatts,
-								part_scheme->parttyplen,
-								part_scheme->parttypbyval,
-								outer_rel->boundinfo, inner_rel->boundinfo))
+	join_boundinfo = partition_bounds_merge(part_scheme->partnatts,
+											part_scheme->partsupfunc,
+											part_scheme->parttypcoll,
+											outer_rel->boundinfo,
+											outer_rel->nparts,
+											inner_rel->boundinfo,
+											inner_rel->nparts,
+											jointype, &parts1, &parts2);
+	if (!join_boundinfo)
 	{
 		Assert(!IS_PARTITIONED_REL(joinrel));
+		Assert(!parts1 && !parts2);
 		return;
 	}
 
@@ -1686,13 +1698,16 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 		   !joinrel->nullable_partexprs && !joinrel->part_rels &&
 		   !joinrel->boundinfo);
 
+	Assert(list_length(parts1) == list_length(parts2));
+
 	/*
 	 * Join relation is partitioned using same partitioning scheme as the
-	 * joining relations and has same bounds.
+	 * joining relations. It will have as many partitions as the pairs of
+	 * matching partitions we found.
 	 */
 	joinrel->part_scheme = part_scheme;
-	joinrel->boundinfo = outer_rel->boundinfo;
-	joinrel->nparts = outer_rel->nparts;
+	joinrel->nparts = list_length(parts1);
+	joinrel->boundinfo = join_boundinfo;
 	partnatts = joinrel->part_scheme->partnatts;
 
 	/*
@@ -1813,4 +1828,9 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 			joinrel->nullable_partexprs[cnt] = nullable_partexprs;
 		}
 	}
+
+	/* TODO: OR we could actually create the child-join relations here.*/
+	list_free(parts1);
+	list_free(parts2);
+
 }
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2283c67..056a4f9 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -99,4 +99,10 @@ extern int get_partition_for_tuple(PartitionDispatch *pd,
 						EState *estate,
 						PartitionDispatchData **failed_at,
 						TupleTableSlot **failed_slot);
+extern PartitionBoundInfo partition_bounds_merge(int partnatts,
+					   FmgrInfo *supfuncs, Oid *collations,
+					   PartitionBoundInfo boundinfo1, int nparts1,
+					   PartitionBoundInfo boundinfo2, int nparts2,
+					   JoinType jointype, List **parts1, List **parts2);
+
 #endif							/* PARTITION_H */
-- 
1.7.9.5

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to