Re: [HACKERS] PoC: full merge join on comparison clause

Alexander Kuzmenkov Tue, 10 Jul 2018 09:37:43 -0700

I tried to fix the things you mentioned and improve the comments. Amongother changes, there is now a description of how merge join works withinequalities at the top of nodeMergejoin.c. It also explains why we onlysupport one inequality clause.


Some particular points:


On 07/06/2018 04:01 PM, Ashutosh Bapat wrote:

-        StrategyNumber opstrategy = mergestrategies[iClause];
+        StrategyNumber sort_strategy = mergestrategies[iClause];
-        int            op_strategy;
+        int            join_strategy;
I don't see a reason why should we change the name of variable here. These are
operator strategies and there's no need to change their names. The name change
is introducing unnecessary diffs.

These variables have different meaning but their names differ only withan underscore. When I had to change this function, I made mistakesbecause of this. I'd keep the descriptive names to avoid furtherconfusion. Should this be a separate patch?

I think we should write a wrapper around MJCompare which interprets the result 
rather
than changing MJCompare itself. OR at least change the name of MJCompare.

Renamed the function to MJTestTuples to reflect that it decides whetherwe join tuples or advance either side.

-         * for get_mergejoin_opfamilies().
+         * for get_equality_opfamilies().

I think we should leave mergejoin word in there or at least indicate that these
are btree opfamilies so that we don't confuse it with hash equality operator
families.

Renamed these to get_btree_equality_opfamilies() andget_btree_comparison_opfamilies().

+        if (parent->mj_Ineq_Present)
+            elog(ERROR, "inequality mergejoin clause must be the last one");
+

IIUC, this never happens. If it really happens, we have created a path which
can not be used practically. That should never happen. It will help to add a
comment here clarifying that situation.

This is just a cross-check for the planner. Added a comment. We shouldprobably use a separate error code for internal errors as opposed touser errors, but I'm not sure if we have one, I see just elog(ERROR)being used everywhere.


+    bool        have_inequality = have_inequality_mergeclause(mergeclauses);

There will be many paths created with different ordering of pathkeys. So,
instead of calling have_inequality_mergeclause() for each of those paths, it's
better to save this status in the path itself when creating the path.

I removed this function altogether, because we can just check the lastmerge clause. When we cost the path, we already have a propermergejoinable list of clauses, so if there is an inequality clause, it'sthe last one.

          /* Make a pathkey list with this guy first */
          if (l != list_head(all_pathkeys))
+        {
+            if (have_inequality && l == list_tail(all_pathkeys))
+                /* Inequality merge clause must be the last, we can't
move it */
+                break;
+

I am kind of baffled by this change. IIUC the way we create different orderings
of pathkeys here, we are just rotating the pathkeys in circular order. This
means there is exactly one ordering of pathkeys where the pathkey corresponding
to the inequality clause is the last one.

This code does not rotate the pathkeys circularly, but puts each of themin the first position, and keeps the rest in the original order.Say, if we have three equality pathkeys, and one inequality pathkey atthe end (let's denote them as E1, E2, E3, IE), the permutations it trieswill be like this:

E1 E2 E3 IE
E2 E1 E3 IE
E3 E1 E2 IE
Does this sound right?

      /* Might have no mergeclauses */
      if (nClauses == 0)
          return NIL;

+    {
+        List *ineq_clauses = find_inequality_clauses(mergeclauses);
+
+        if (list_length(ineq_clauses) > 1)
+            return NIL;

Without this patch, when there is an inequality clause with FULL JOIN, we will
not create a merge join path because select_mergejoin_clauses() will set
mergejoin_allowed to false. This means that we won't call
sort_inner_and_outer(). I think this patch also has to do the same i.e. when
there are more than one inequality clauses, select_mergejoin_clauses() should
set mergejoin_allowed to false in case of a FULL JOIN since merge join
machinary won't be able to handle that case.

If we do that, we could arrange extra.mergeclause_list such that the inequality
clause is always at the end thus finding inequality clause would be easy.

I changed select_mergejoin_clauses() to filter multiple inequalityclauses and disable join if needed. Now we can use extra inequalities asjoin filter, if it's not full join. I didn't reorderextra.mergeclause_list there, because this order is ignored later.select_outer_pathkeys_for_merge() chooses the order of pathkeys usingsome heuristics, and then find_mergeclauses_for_outer_pathkeys()reorders the clauses accordingly.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 6171be01494422a3cad5b5cfea6f70d2437fd8bc Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmen...@postgrespro.ru>
Date: Tue, 10 Apr 2018 12:31:21 +0300
Subject: [PATCH 2/2] Inequality merge join.

Perform merge joins on inequality clause. The current merge join
algorithm requires minimal modification to support one inequality clause
at the final position. This has performance benefits in some cases, and also
allows to perform full joins on inequality, which was not possible
before.
This commit modifies the merge join path generation logic and cost functions to
account for inequality clauses, and adds some tests.
---
 src/backend/executor/nodeMergejoin.c         | 135 +++++++++++++--
 src/backend/optimizer/path/costsize.c        |  27 ++-
 src/backend/optimizer/path/joinpath.c        |  27 ++-
 src/backend/optimizer/path/pathkeys.c        | 218 ++++++++++++++++++++---
 src/backend/optimizer/plan/initsplan.c       |  19 ++
 src/backend/utils/adt/selfuncs.c             |  40 +++--
 src/backend/utils/cache/lsyscache.c          |  40 +++++
 src/include/nodes/execnodes.h                |   5 +
 src/include/optimizer/paths.h                |   3 +-
 src/include/utils/lsyscache.h                |   1 +
 src/test/regress/expected/join.out           | 250 +++++++++++++++++++++++----
 src/test/regress/expected/partition_join.out |   4 +
 src/test/regress/sql/join.sql                |  66 ++++++-
 src/test/regress/sql/partition_join.sql      |   5 +
 14 files changed, 745 insertions(+), 95 deletions(-)

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 7298e1c..fea47cb 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -89,6 +89,44 @@
  *		proceed to another state.  This state is stored in the node's
  *		execution state information and is preserved across calls to
  *		ExecMergeJoin. -cim 10/31/89
+ *
+ * 		This algorithm can work almost as-is when the last join clause
+ * 		is not an equality clause, but a comparison one. This introduces
+ * 		an additional restriction to the ordering of the inputs: when
+ * 		moving to the next outer tuple, the beginning of the matching
+ * 		stretch of inner tuples must not change. For example, if the
+ * 		join operator is ">=", inputs must be in ascending order.
+ * 		Consider this example:
+ * 			outer  >= innner
+ * 				1		0 - first match for outer = 1, 2
+ * 				2		1 - last match for outer = 1
+ * 						2 - last match for outer = 2
+ *
+ * 		And if the inputs were sorted in descending order:
+ * 			outer  >= inner
+ * 				2		2 - first match for outer = 2
+ * 				1		1 - first match for outer = 1
+ * 						0 - last match for outer = 1, 2
+ *
+ * 		It can be seen that the beginning of the matching interval of
+ * 		inner tuples changes when we move to the next outer tuple.
+ * 		Supporting this, i.e. testing and advancing the marked tuple,
+ * 		would complicate the join algorithm. Instead of that, we have
+ * 		the planner ensure that the inputs are suitably ordered, and
+ * 		recheck this on initialization.
+ *
+ * 		In other words, we can easily support joining inner tuples that
+ * 		are	effectively "less", or "ordered before", the outer tuple in the
+ * 		given input ordering. It is enough to modify the tuple test
+ * 		function so that it chooses to join the inner tuples that compare
+ * 		"less", if so required by the respective join clause.
+ *
+ * 		If the inequality clause is not the last one, or if there are several
+ * 		of them, this algorithm doesn't work, because it is not possible to
+ * 		sort the inputs in such a way that given an outer tuple, the matching
+ * 		inner tuples form a contiguous interval. The planner takes care to
+ * 		select and order the clauses appropriately, and we recheck this at
+ * 		initialization.
  */
 #include "postgres.h"
 
@@ -157,20 +195,26 @@ typedef enum
  * MJExamineQuals
  *
  * This deconstructs the list of mergejoinable expressions, which is given
- * to us by the planner in the form of a list of "leftexpr = rightexpr"
+ * to us by the planner in the form of a list of "leftexpr operator rightexpr"
  * expression trees in the order matching the sort columns of the inputs.
- * We build an array of MergeJoinClause structs containing the information
- * we will need at runtime.  Each struct essentially tells us how to compare
- * the two expressions from the original clause.
+ * The "operator" here may be a btree equality or comparison. We build an
+ * array of MergeJoinClause structs containing the information we will need
+ * at runtime. Each struct essentially tells us how to compare the two
+ * expressions from the original clause. We record additional information
+ * about the inequality clause directly to MergeJoinState, because we can
+ * have at most one.
  *
  * In addition to the expressions themselves, the planner passes the btree
  * opfamily OID, collation OID, btree strategy number (BTLessStrategyNumber or
  * BTGreaterStrategyNumber), and nulls-first flag that identify the intended
- * sort ordering for each merge key.  The mergejoinable operator is an
- * equality operator in the opfamily, and the two inputs are guaranteed to be
+ * sort ordering for each merge key.  The mergejoinable operator is a
+ * comparison operator in the opfamily, and the two inputs are guaranteed to be
  * ordered in either increasing or decreasing (respectively) order according
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
+ * For inequality merge clause, we check that the sort ordering is compatible
+ * with the clause operator, and determine whether to join tuples that compare
+ * "equal". We always join tuples that compare "less".
  */
 static void
 MJExamineQuals(List *mergeclauses,
@@ -201,6 +245,13 @@ MJExamineQuals(List *mergeclauses,
 		Oid			op_righttype;
 		Oid			sortfunc;
 
+		/*
+		 * Check that there is no planner error and we have no more than
+		 * one inequality clause.
+		 */
+		if (parent->mj_Ineq_Present)
+			elog(ERROR, "inequality mergejoin clause must be the last one");
+
 		if (!IsA(qual, OpExpr))
 			elog(ERROR, "mergejoin clause is not an OpExpr");
 
@@ -228,9 +279,38 @@ MJExamineQuals(List *mergeclauses,
 								   &join_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (join_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * If it's an inequality clause, determine whether we join tuples that
+		 * compare "equal". Also check for compatibility with the sort direction.
+		 */
+		if (join_strategy != BTEqualStrategyNumber)
+		{
+			parent->mj_Ineq_Present = true;
+			switch (join_strategy)
+			{
+				case BTLessEqualStrategyNumber:
+					parent->mj_Ineq_JoinEqual = true;
+					/* fall through */
+				case BTLessStrategyNumber:
+					if (sort_strategy != BTGreaterStrategyNumber)
+						elog(ERROR, "join strategy %d is not compatible with sort strategy %d",
+							 join_strategy, sort_strategy);
+					break;
+
+				case BTGreaterEqualStrategyNumber:
+					parent->mj_Ineq_JoinEqual = true;
+					/* fall through */
+				case BTGreaterStrategyNumber:
+					if (sort_strategy != BTLessStrategyNumber)
+						elog(ERROR, "join strategy %d is not compatible with sort strategy %d",
+							 join_strategy, sort_strategy);
+					break;
+
+				default:
+					elog(ERROR, "unsupported join strategy %d", join_strategy);
+			}
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -416,6 +496,19 @@ MJTestTuples(MergeJoinState *mergestate)
 	{
 		MergeJoinClause clause = &mergestate->mj_Clauses[i];
 		int			sort_result;
+		bool join_equal = true;
+		bool join_lesser = false;
+
+		if (mergestate->mj_Ineq_Present && i == mergestate->mj_NumClauses - 1)
+		{
+			/*
+			 * If the last merge clause is an inequality, check whether
+			 * we have to join the inner tuples that compare as "equal".
+			 * We always join tuples that compare as "less".
+			 */
+			join_equal = mergestate->mj_Ineq_JoinEqual;
+			join_lesser = true;
+		}
 
 		/*
 		 * Special case for NULL-vs-NULL, else use standard comparison.
@@ -426,21 +519,35 @@ MJTestTuples(MergeJoinState *mergestate)
 			continue;
 		}
 
+		/* Left is outer. */
 		sort_result = ApplySortComparator(clause->ldatum, clause->lisnull,
 										  clause->rdatum, clause->risnull,
 										  &clause->ssup);
 
-		result = sort_result == 0 ? MJCR_Join
-					: sort_result < 0 ? MJCR_NextOuter : MJCR_NextInner;
+		if (sort_result < 0) /* outer "less" than inner */
+			result = MJCR_NextOuter;
+		else if (sort_result == 0) /* outer "equals" inner */
+		{
+			if (join_equal)
+				result = MJCR_Join;
+			else
+				result = MJCR_NextOuter;
+		}
+		else /* outer "greater" than equal */
+		{
+			if (join_lesser)
+				result = MJCR_Join;
+			else
+				result = MJCR_NextInner;
+		}
 
 		if (result != MJCR_Join)
 			break;
 	}
 
 	/*
-	 * If we had any NULL-vs-NULL inputs, we do not want to report that the
-	 * tuples are equal.  Instead, if result is still 0, change it to +1. This
-	 * will result in advancing the inner side of the join.
+	 * If we had any NULL-vs-NULL inputs, we do not want to join the tuples.
+	 * Instead, advance the inner side of the join.
 	 *
 	 * Likewise, if there was a constant-false joinqual, do not report
 	 * equality.  We have to check this as part of the mergequals, else the
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a2a7e0c..16993e4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2851,6 +2851,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	double		mergejointuples,
 				rescannedtuples;
 	double		rescanratio;
+	bool		have_inequality;
 
 	/* Protect some assumptions below that rowcounts aren't zero or NaN */
 	if (inner_path_rows <= 0 || isnan(inner_path_rows))
@@ -2904,6 +2905,13 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 		path->skip_mark_restore = false;
 
 	/*
+	 * Check whether one of join clauses is an inequality. It can only
+	 * be the last one, as required by our merge join algorithm.
+	 */
+	have_inequality = list_tail(mergeclauses) != NULL
+		&& !((RestrictInfo *) lfirst(list_tail(mergeclauses)))->is_mj_equality;
+
+	/*
 	 * Get approx # tuples passing the mergequals.  We use approx_tuple_count
 	 * here because we need an estimate done with JOIN_INNER semantics.
 	 */
@@ -2932,18 +2940,25 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
 	 * when we should not.  Can we do better without expensive selectivity
 	 * computations?
 	 *
+	 * Also, if merge clauses contain inequality, n_i matches all m_k where i <= k.
+	 * From that we derive: rescanned tuples = (m1 - 1) * n1 + (m2 - 1) * (n1 + n2)
+	 * + ... =  m1 * n1 + m2 * (n1 + n2) + ... - n1 - (n1 + n2) - ...
+	 * In the limit case of n_i = 1, n1 + (n1 + n2) + ... = sum(n_i) ^ 2 / 2.
+	 * Therefore, rescanned tuples = size of join - (inner_rows) ^ 2 / 2.
+	 *
 	 * The whole issue is moot if we are working from a unique-ified outer
 	 * input, or if we know we don't need to mark/restore at all.
 	 */
-	if (IsA(outer_path, UniquePath) ||path->skip_mark_restore)
+	if (have_inequality)
+		rescannedtuples = mergejointuples - inner_path_rows * inner_path_rows / 2.;
+	else if (IsA(outer_path, UniquePath) ||path->skip_mark_restore)
 		rescannedtuples = 0;
 	else
-	{
 		rescannedtuples = mergejointuples - inner_path_rows;
-		/* Must clamp because of possible underestimate */
-		if (rescannedtuples < 0)
-			rescannedtuples = 0;
-	}
+
+	/* Must clamp because of possible underestimate */
+	if (rescannedtuples < 0)
+		rescannedtuples = 0;
 	/* We'll inflate various costs this much to account for rescanning */
 	rescanratio = 1.0 + (rescannedtuples / inner_path_rows);
 
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..1dcbff1 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -892,6 +892,7 @@ sort_inner_and_outer(PlannerInfo *root,
 	Path	   *cheapest_safe_inner = NULL;
 	List	   *all_pathkeys;
 	ListCell   *l;
+	bool		have_inequality = false;
 
 	/*
 	 * We only consider the cheapest-total-cost input paths, since we are
@@ -992,7 +993,7 @@ sort_inner_and_outer(PlannerInfo *root,
 	 */
 	all_pathkeys = select_outer_pathkeys_for_merge(root,
 												   extra->mergeclause_list,
-												   joinrel);
+												   joinrel, &have_inequality);
 
 	foreach(l, all_pathkeys)
 	{
@@ -1004,9 +1005,15 @@ sort_inner_and_outer(PlannerInfo *root,
 
 		/* Make a pathkey list with this guy first */
 		if (l != list_head(all_pathkeys))
+		{
+			if (have_inequality && l == list_tail(all_pathkeys))
+				/* Inequality merge clause must be the last, we can't move it */
+				break;
+
 			outerkeys = lcons(front_pathkey,
 							  list_delete_ptr(list_copy(all_pathkeys),
 											  front_pathkey));
+		}
 		else
 			outerkeys = all_pathkeys;	/* no work at first one... */
 
@@ -1924,6 +1931,8 @@ hash_inner_and_outer(PlannerInfo *root,
  * We examine each restrictinfo clause known for the join to see
  * if it is mergejoinable and involves vars from the two sub-relations
  * currently of interest.
+ *
+ * We also allow no more than one inequality clause.
  */
 static List *
 select_mergejoin_clauses(PlannerInfo *root,
@@ -1937,6 +1946,7 @@ select_mergejoin_clauses(PlannerInfo *root,
 	List	   *result_list = NIL;
 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
 	bool		have_nonmergeable_joinclause = false;
+	bool		have_inequality = false;
 	ListCell   *l;
 
 	foreach(l, restrictlist)
@@ -2005,6 +2015,21 @@ select_mergejoin_clauses(PlannerInfo *root,
 			continue;			/* can't handle redundant eclasses */
 		}
 
+		/*
+		 * Check that there is at most one inequality clause. We don't care
+		 * about the order of the clauses here, this is handled by
+		 * select_outer_pathkeys_for_merge().
+		 */
+		if (!restrictinfo->is_mj_equality)
+		{
+			if (have_inequality)
+			{
+				have_nonmergeable_joinclause = true;
+				continue;
+			}
+			have_inequality = true;
+		}
+
 		result_list = lappend(result_list, restrictinfo);
 	}
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 0fa6f91..8935d2b 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -990,6 +990,44 @@ update_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
 }
 
 /*
+ * Determine the sort order required by an inequality merge clause.
+ */
+static int
+get_merge_sort_strategy(RestrictInfo *rinfo)
+{
+	Oid opfamily = linitial_oid(rinfo->mergeopfamilies);
+	Oid opno;
+	int join_strategy;
+	Oid lefttype;
+	Oid righttype;
+	bool sort_ascending;
+
+	Assert(IsA(rinfo->clause, OpExpr));
+	opno = ((OpExpr *) rinfo->clause)->opno;
+	get_op_opfamily_properties(opno, opfamily,
+							   false /* ordering_op */ , &join_strategy,
+							   &lefttype, &righttype);
+	switch (join_strategy)
+	{
+		case BTLessEqualStrategyNumber:
+		case BTLessStrategyNumber:
+			sort_ascending = false;
+			break;
+		case BTGreaterEqualStrategyNumber:
+		case BTGreaterStrategyNumber:
+			sort_ascending = true;
+			break;
+		default:
+			elog(ERROR, "unknown merge join clause strategy %d\n", join_strategy);
+	}
+
+	if (!rinfo->outer_is_left)
+		sort_ascending = !sort_ascending;
+
+	return sort_ascending ? BTLessStrategyNumber : BTGreaterStrategyNumber;
+}
+
+/*
  * find_mergeclauses_for_outer_pathkeys
  *	  This routine attempts to find a list of mergeclauses that can be
  *	  used with a specified ordering for the join's outer relation.
@@ -1028,6 +1066,7 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
 		PathKey    *pathkey = (PathKey *) lfirst(i);
 		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
 		List	   *matched_restrictinfos = NIL;
+		RestrictInfo *matched_ineq = NULL;
 		ListCell   *j;
 
 		/*----------
@@ -1065,6 +1104,10 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
 		 * has to delete duplicates when it constructs the inner pathkeys
 		 * list, and we also have to deal with such cases specially in
 		 * create_mergejoin_plan().
+		 *
+		 * For inequality merge clauses, make sure that the direction of
+		 * pathkey is compatible with the merge clause operator. Also, allow
+		 * no more than one inequality clause.
 		 *----------
 		 */
 		foreach(j, restrictinfos)
@@ -1074,11 +1117,31 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
 
 			clause_ec = rinfo->outer_is_left ?
 				rinfo->left_ec : rinfo->right_ec;
-			if (clause_ec == pathkey_ec)
+
+			if (clause_ec != pathkey_ec)
+				continue;
+
+			if (rinfo->is_mj_equality)
 				matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
+			else if (pathkey->pk_strategy == get_merge_sort_strategy(rinfo))
+			{
+				if (matched_ineq)
+					break; /* can't match more than one inequality clause */
+
+				matched_ineq = rinfo;
+			}
 		}
 
 		/*
+		 * If we did find usable mergeclause(s) for this sort-key position,
+		 * add them to result list. If present, add inequality clause to
+		 * the final position.
+		 */
+		mergeclauses = list_concat(mergeclauses, matched_restrictinfos);
+		if (matched_ineq)
+			mergeclauses = lappend(mergeclauses, matched_ineq);
+
+		/*
 		 * If we didn't find a mergeclause, we're done --- any additional
 		 * sort-key positions in the pathkeys are useless.  (But we can still
 		 * mergejoin if we found at least one mergeclause.)
@@ -1087,10 +1150,11 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
 			break;
 
 		/*
-		 * If we did find usable mergeclause(s) for this sort-key position,
-		 * add them to result list.
+		 * If we already have an inequality clause, we can't add any more
+		 * clauses after it.
 		 */
-		mergeclauses = list_concat(mergeclauses, matched_restrictinfos);
+		if (matched_ineq)
+			break;
 	}
 
 	return mergeclauses;
@@ -1110,6 +1174,7 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
  * select_mergejoin_clauses())
  *
  * Returns a pathkeys list that can be applied to the outer relation.
+ * If there is an inequality clause, *have_inequality is set to true.
  *
  * Since we assume here that a sort is required, there is no particular use
  * in matching any available ordering of the outerrel.  (joinpath.c has an
@@ -1123,20 +1188,45 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
 List *
 select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
-								RelOptInfo *joinrel)
+								RelOptInfo *joinrel,
+								bool *have_inequality)
 {
-	List	   *pathkeys = NIL;
+	List	   *eq_pathkeys = NIL;
 	int			nClauses = list_length(mergeclauses);
 	EquivalenceClass **ecs;
 	int		   *scores;
 	int			necs;
 	ListCell   *lc;
 	int			j;
+	PathKey	   *ineq_pathkey = NULL;
+	int ineq_strategy = BTLessStrategyNumber;
+	RestrictInfo *ineq_clause = NULL;
+	int ineq_ec_index = -1;
+
+	*have_inequality = false;
 
 	/* Might have no mergeclauses */
 	if (nClauses == 0)
 		return NIL;
 
+	/* Check if we have an inequality clause. */
+	foreach (lc, mergeclauses)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+		Assert(rinfo->mergeopfamilies);
+		if (!rinfo->is_mj_equality)
+		{
+			/*
+			 * Found an inequality clause, determine which sort strategy
+			 * it requires.
+			 */
+			ineq_clause = rinfo;
+			*have_inequality = true;
+			ineq_strategy = get_merge_sort_strategy(ineq_clause);
+			break;
+		}
+	}
+
 	/*
 	 * Make arrays of the ECs used by the mergeclauses (dropping any
 	 * duplicates) and their "popularity" scores.
@@ -1160,12 +1250,20 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 		else
 			oeclass = rinfo->right_ec;
 
-		/* reject duplicates */
+		/* Find the outer EC in the array. */
 		for (j = 0; j < necs; j++)
 		{
 			if (ecs[j] == oeclass)
 				break;
 		}
+		/*
+		 * Remember the index of the EC that corresponds to the
+		 * inequality clause.
+		 */
+		if (rinfo == ineq_clause)
+			ineq_ec_index = j;
+
+		/* If we have already processed this EC, no need to do it again. */
 		if (j < necs)
 			continue;
 
@@ -1181,15 +1279,27 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 				score++;
 		}
 
+		/* Add the EC to the arrays. */
 		ecs[necs] = oeclass;
 		scores[necs] = score;
 		necs++;
 	}
 
+	/* If there is an inequality clause, its EC index must be valid. */
+	Assert(ineq_clause == NULL || (ineq_ec_index >= 0 && ineq_ec_index < necs));
+
 	/*
 	 * Find out if we have all the ECs mentioned in query_pathkeys; if so we
 	 * can generate a sort order that's also useful for final output. There is
 	 * no percentage in a partial match, though, so we have to have 'em all.
+	 *
+	 * Moreover, the pathkey that corresponds to the inequality merge clause
+	 * must have a particular sort direction, so we check this too.
+	 *
+	 * If the inequality pathkey is included in root pathkeys, and some pathkeys
+	 * required by merge clauses are not included, we will not be able to produce
+	 * the required ordering, because these omitted pathkeys will have to go
+	 * before the inequality pathkey.
 	 */
 	if (root->query_pathkeys)
 	{
@@ -1205,13 +1315,36 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 			}
 			if (j >= necs)
 				break;			/* didn't find match */
+
+			if (j == ineq_ec_index)
+			{
+				if (lc != list_tail(root->query_pathkeys))
+					break; /* The inequality pathkey must be the last one. */
+
+				if (query_pathkey->pk_strategy != ineq_strategy)
+					break; /* The inequality pathkey has wrong  direction. */
+
+				/*
+				 * root->query_pathkeys shouldn't be redundant, so this pathkey
+				 * must be the first one we see for this equivalence class.
+				 */
+				Assert(ineq_pathkey == 0);
+
+				/*
+				 * The inequality pathkey will be processed separately, so store
+				 * it to ineq_pathkey.
+				 */
+				ineq_pathkey = query_pathkey;
+			}
 		}
-		/* if we got to the end of the list, we have them all */
+
+		/*
+		 * If we got to the end of the list, we have all the root pathkeys.
+		 * Copy them to the resulting list, skipping the inequality pathkey.
+		 * Mark the corresponding ECs as already emitted.
+		 */
 		if (lc == NULL)
 		{
-			/* copy query_pathkeys as starting point for our output */
-			pathkeys = list_copy(root->query_pathkeys);
-			/* mark their ECs as already-emitted */
 			foreach(lc, root->query_pathkeys)
 			{
 				PathKey    *query_pathkey = (PathKey *) lfirst(lc);
@@ -1225,14 +1358,18 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 						break;
 					}
 				}
+
+				if (query_pathkey != ineq_pathkey)
+					eq_pathkeys = lappend(eq_pathkeys, query_pathkey);
 			}
 		}
 	}
 
 	/*
-	 * Add remaining ECs to the list in popularity order, using a default sort
-	 * ordering.  (We could use qsort() here, but the list length is usually
-	 * so small it's not worth it.)
+	 * Add remaining ECs to the list in popularity order. (We could use qsort()
+	 * here, but the list length is usually so small it's not worth it.) Use
+	 * a default sort ordering for the equality clauses, and the ordering we
+	 * computed earlier for the inequality clause.
 	 */
 	for (;;)
 	{
@@ -1240,6 +1377,7 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 		int			best_score;
 		EquivalenceClass *ec;
 		PathKey    *pathkey;
+		int 		strategy;
 
 		best_j = 0;
 		best_score = scores[0];
@@ -1255,20 +1393,35 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
 			break;				/* all done */
 		ec = ecs[best_j];
 		scores[best_j] = -1;
+		strategy = best_j == ineq_ec_index ? ineq_strategy : BTLessStrategyNumber;
 		pathkey = make_canonical_pathkey(root,
 										 ec,
 										 linitial_oid(ec->ec_opfamilies),
-										 BTLessStrategyNumber,
-										 false);
+										 strategy,
+										 strategy == BTGreaterStrategyNumber);
 		/* can't be redundant because no duplicate ECs */
-		Assert(!pathkey_is_redundant(pathkey, pathkeys));
-		pathkeys = lappend(pathkeys, pathkey);
+		Assert(!pathkey_is_redundant(pathkey, eq_pathkeys));
+
+		/*
+		 * The equality pathkeys are added to the list, and the inequality one is
+		 * recorded separately.
+		 */
+		if (best_j == ineq_ec_index)
+		{
+			Assert(ineq_pathkey == NULL);
+			ineq_pathkey = pathkey;
+		}
+		else
+			eq_pathkeys = lappend(eq_pathkeys, pathkey);
 	}
 
 	pfree(ecs);
 	pfree(scores);
 
-	return pathkeys;
+	if (ineq_pathkey)
+		return lappend(eq_pathkeys, ineq_pathkey);
+
+	return eq_pathkeys;
 }
 
 /*
@@ -1489,6 +1642,10 @@ trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
  * one of the directions happens to match an ORDER BY key, in which case
  * that direction should be preferred, in hopes of avoiding a final sort step.
  * right_merge_direction() implements this heuristic.
+ *
+ * Note that a merge join on an inequality clause can be performed only for
+ * a particular ordering of inputs, so we keep both sort directions if such
+ * clause is present.
  */
 static int
 pathkeys_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
@@ -1500,12 +1657,9 @@ pathkeys_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
 	{
 		PathKey    *pathkey = (PathKey *) lfirst(i);
 		bool		matched = false;
+		bool		right_direction = right_merge_direction(root, pathkey);
 		ListCell   *j;
 
-		/* If "wrong" direction, not useful for merging */
-		if (!right_merge_direction(root, pathkey))
-			break;
-
 		/*
 		 * First look into the EquivalenceClass of the pathkey, to see if
 		 * there are any members not yet joined to the rel.  If so, it's
@@ -1513,7 +1667,16 @@ pathkeys_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
 		 */
 		if (rel->has_eclass_joins &&
 			eclass_useful_for_merging(root, pathkey->pk_eclass, rel))
+		{
+			/*
+			 * If "wrong" direction, not useful for merging on an equality
+			 * clause.
+			 */
+			if (!right_direction)
+				return useful;
+
 			matched = true;
+		}
 		else
 		{
 			/*
@@ -1529,8 +1692,13 @@ pathkeys_useful_for_merging(PlannerInfo *root, RelOptInfo *rel, List *pathkeys)
 					continue;
 				update_mergeclause_eclasses(root, restrictinfo);
 
-				if (pathkey->pk_eclass == restrictinfo->left_ec ||
-					pathkey->pk_eclass == restrictinfo->right_ec)
+				/*
+				 * Consider pathkey useful if it has the "right" direction,
+				 * or if the correspoinding join clause is an inequality.
+				 */
+				if ((pathkey->pk_eclass == restrictinfo->left_ec
+					|| pathkey->pk_eclass == restrictinfo->right_ec)
+					&& (right_direction || !restrictinfo->is_mj_equality))
 				{
 					matched = true;
 					break;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e7aa058..929bc60 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2018,6 +2018,20 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
 			initialize_mergeclause_eclasses(root, restrictinfo);
 		}
 	}
+	else if (restrictinfo->mergeopfamilies)
+	{
+		/* Not an equality clause, but maybe still mergejoinable? */
+		initialize_mergeclause_eclasses(root, restrictinfo);
+
+		if (maybe_outer_join
+			&& jointype == JOIN_FULL
+			&& restrictinfo->can_join)
+		{
+			root->full_join_clauses = lappend(root->full_join_clauses,
+							  restrictinfo);
+			return;
+		}
+	}
 
 	/* No EC special case applies, so push it into the clause lists */
 	distribute_restrictinfo_to_rels(root, restrictinfo);
@@ -2630,6 +2644,11 @@ check_mergejoinable(RestrictInfo *restrictinfo)
 			restrictinfo->mergeopfamilies = get_btree_equality_opfamilies(opno);
 			restrictinfo->is_mj_equality = true;
 		}
+		else
+		{
+			restrictinfo->mergeopfamilies = get_btree_comparison_opfamilies(opno);
+			restrictinfo->is_mj_equality = false;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4b08cdb..2e4d404 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3043,7 +3043,6 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
 							   &op_strategy,
 							   &op_lefttype,
 							   &op_righttype);
-	Assert(op_strategy == BTEqualStrategyNumber);
 
 	/*
 	 * Look up the various operators we need.  If we don't find them all, it
@@ -3226,18 +3225,39 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
 	if (selec != DEFAULT_INEQ_SEL)
 		*rightstart = selec;
 
-	/*
-	 * Only one of the two "start" fractions can really be more than zero;
-	 * believe the larger estimate and reset the other one to exactly 0.0. If
-	 * we get exactly equal estimates (as can easily happen with self-joins),
-	 * believe neither.
-	 */
-	if (*leftstart < *rightstart)
+	if (op_strategy == BTLessStrategyNumber
+		|| op_strategy == BTLessEqualStrategyNumber)
+	{
+		/*
+		 * If the left variable must be less than right, its first tuple
+		 * will already produce the first join pair.
+		 */
 		*leftstart = 0.0;
-	else if (*leftstart > *rightstart)
+	}
+	else if (op_strategy == BTGreaterStrategyNumber
+			 || op_strategy == BTGreaterEqualStrategyNumber)
+	{
+		/*
+		 * Similarly for the right variable and greater operator.
+		 */
 		*rightstart = 0.0;
+	}
 	else
-		*leftstart = *rightstart = 0.0;
+	{
+		Assert(op_strategy == BTEqualStrategyNumber);
+		/*
+		 * Only one of the two "start" fractions can really be more than zero;
+		 * believe the larger estimate and reset the other one to exactly 0.0. If
+		 * we get exactly equal estimates (as can easily happen with self-joins),
+		 * believe neither.
+		 */
+		if (*leftstart < *rightstart)
+			*leftstart = 0.0;
+		else if (*leftstart > *rightstart)
+			*rightstart = 0.0;
+		else
+			*leftstart = *rightstart = 0.0;
+	}
 
 	/*
 	 * If the sort order is nulls-first, we're going to have to skip over any
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index a8a175c..fcff959 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -389,6 +389,46 @@ get_btree_equality_opfamilies(Oid opno)
 }
 
 /*
+ * get_btree_comparison_opfamilies
+ *		Given an operator, returns a list of operator families in which it
+ * 		represents btree comparison.
+ *
+ * Also see the comment for get_btree_equality_opfamilies().
+ */
+List *
+get_btree_comparison_opfamilies(Oid opno)
+{
+	List	   *result = NIL;
+	CatCList   *catlist;
+	int			i;
+
+	/*
+	 * Search pg_amop to see if the target operator is registered as the "<"
+	 * or ">" operator of any btree opfamily.
+	 */
+	catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno));
+
+	for (i = 0; i < catlist->n_members; i++)
+	{
+		HeapTuple	tuple = &catlist->members[i]->tuple;
+		Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple);
+
+		if (aform->amopmethod == BTREE_AM_OID
+			&& (aform->amopstrategy == BTLessStrategyNumber
+				|| aform->amopstrategy == BTLessEqualStrategyNumber
+				|| aform->amopstrategy == BTGreaterStrategyNumber
+				|| aform->amopstrategy == BTGreaterEqualStrategyNumber))
+		{
+			result = lappend_oid(result, aform->amopfamily);
+		}
+	}
+
+	ReleaseSysCacheList(catlist);
+
+	return result;
+}
+
+/*
  * get_compatible_hash_operators
  *		Get the OID(s) of hash equality operator(s) compatible with the given
  *		operator, but operating on its LHS and/or RHS datatype.
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index da7f52c..a6b91e5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1727,6 +1727,9 @@ typedef struct NestLoopState
  *		NullInnerTupleSlot prepared null tuple for left outer joins
  *		OuterEContext	   workspace for computing outer tuple's join values
  *		InnerEContext	   workspace for computing inner tuple's join values
+ *		Ineq_Present	   true if the last merge clause is inequalty
+ *		Ineq_JoinEqual	   true if should join values that test "equal" on the
+ * 							inequality clause
  * ----------------
  */
 /* private in nodeMergejoin.c: */
@@ -1737,6 +1740,8 @@ typedef struct MergeJoinState
 	JoinState	js;				/* its first field is NodeTag */
 	int			mj_NumClauses;
 	MergeJoinClause mj_Clauses; /* array of length mj_NumClauses */
+	bool		mj_Ineq_Present;
+	bool		mj_Ineq_JoinEqual;
 	int			mj_JoinState;
 	bool		mj_SkipMarkRestore;
 	bool		mj_ExtraMarks;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..daad894 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -223,7 +223,8 @@ extern List *find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
 									 List *restrictinfos);
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
-								RelOptInfo *joinrel);
+								RelOptInfo *joinrel,
+								bool *have_inequality);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index e8684ad..f7d11af 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -75,6 +75,7 @@ extern bool get_ordering_op_properties(Oid opno,
 extern Oid	get_equality_op_for_ordering_op(Oid opno, bool *reverse);
 extern Oid	get_ordering_op_for_equality_op(Oid opno, bool use_lhs_type);
 extern List *get_btree_equality_opfamilies(Oid opno);
+extern List *get_btree_comparison_opfamilies(Oid opno);
 extern bool get_compatible_hash_operators(Oid opno,
 							  Oid *lhs_opno, Oid *rhs_opno);
 extern bool get_op_hash_functions(Oid opno,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262b..c862809 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1700,18 +1700,19 @@ SELECT '' AS "xxx", *
 -- Non-equi-joins
 --
 SELECT '' AS "xxx", *
-  FROM J1_TBL JOIN J2_TBL ON (J1_TBL.i <= J2_TBL.k);
+  FROM J1_TBL JOIN J2_TBL ON (J1_TBL.i <= J2_TBL.k)
+  ORDER BY J1_TBL.i, J2_TBL.k;
  xxx | i | j |   t   | i | k 
 -----+---+---+-------+---+---
-     | 1 | 4 | one   | 2 | 2
-     | 2 | 3 | two   | 2 | 2
+     | 0 |   | zero  |   | 0
      | 0 |   | zero  | 2 | 2
+     | 0 |   | zero  | 2 | 4
+     | 1 | 4 | one   | 2 | 2
      | 1 | 4 | one   | 2 | 4
+     | 2 | 3 | two   | 2 | 2
      | 2 | 3 | two   | 2 | 4
      | 3 | 2 | three | 2 | 4
      | 4 | 1 | four  | 2 | 4
-     | 0 |   | zero  | 2 | 4
-     | 0 |   | zero  |   | 0
 (9 rows)
 
 --
@@ -1846,6 +1847,171 @@ SELECT '' AS "xxx", *
 (1 row)
 
 --
+-- Full merge join
+--
+set enable_hashjoin to 0;
+-- simple
+select * from j1_tbl full join j2_tbl on j1_tbl.i < j2_tbl.k;
+ i | j |   t   | i | k  
+---+---+-------+---+----
+   | 0 | zero  |   |   
+   |   | null  |   |   
+   |   |       | 0 |   
+   |   |       |   |   
+ 8 | 8 | eight |   |   
+ 7 | 7 | seven |   |   
+ 6 | 6 | six   |   |   
+ 5 | 0 | five  |   |   
+ 4 | 1 | four  |   |   
+ 3 | 2 | three | 2 |  4
+ 2 | 3 | two   | 2 |  4
+ 1 | 4 | one   | 2 |  4
+ 1 | 4 | one   | 2 |  2
+ 0 |   | zero  | 2 |  4
+ 0 |   | zero  | 2 |  2
+   |   |       |   |  0
+   |   |       | 1 | -1
+   |   |       | 3 | -3
+   |   |       | 5 | -5
+   |   |       | 5 | -5
+(20 rows)
+
+-- multiple clauses
+select * from j1_tbl full join j2_tbl on j1_tbl.i > j2_tbl.i and j1_tbl.i = j2_tbl.k;
+ i | j |   t   | i | k  
+---+---+-------+---+----
+   |   |       | 5 | -5
+   |   |       | 5 | -5
+   |   |       | 3 | -3
+   |   |       | 1 | -1
+   |   |       |   |  0
+ 0 |   | zero  |   |   
+ 1 | 4 | one   |   |   
+ 2 | 3 | two   |   |   
+   |   |       | 2 |  2
+ 3 | 2 | three |   |   
+ 4 | 1 | four  | 2 |  4
+   |   |       | 0 |   
+   |   |       |   |   
+ 5 | 0 | five  |   |   
+ 6 | 6 | six   |   |   
+ 7 | 7 | seven |   |   
+ 8 | 8 | eight |   |   
+   |   | null  |   |   
+   | 0 | zero  |   |   
+(19 rows)
+
+-- multiple inequality clauses
+select * from j1_tbl full join j2_tbl on j1_tbl.i > j2_tbl.i and j1_tbl.i < j2_tbl.k;
+ERROR:  FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
+-- outer pathkeys for multiple inequality clauses
+explain (costs off)
+	select * from (select * from j1_tbl order by i) j1_tbl
+	full join j2_tbl
+	on j1_tbl.i > j2_tbl.i and j1_tbl.i > j2_tbl.k;
+ERROR:  FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
+-- suitable root pathkeys
+explain (costs off)
+	select * from j1_tbl join j2_tbl
+	on j1_tbl.i < j2_tbl.i and j1_tbl.i = j2_tbl.k
+	order by j2_tbl.k, j2_tbl.i;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Merge Join
+   Merge Cond: ((j2_tbl.k = j1_tbl.i) AND (j2_tbl.i > j1_tbl.i))
+   ->  Sort
+         Sort Key: j2_tbl.k, j2_tbl.i
+         ->  Seq Scan on j2_tbl
+   ->  Sort
+         Sort Key: j1_tbl.i
+         ->  Seq Scan on j1_tbl
+(8 rows)
+
+-- unsuitable root pathkeys
+explain (costs off)
+	select * from j1_tbl join j2_tbl
+	on j1_tbl.i > j2_tbl.i and j1_tbl.i = j2_tbl.k
+	order by j2_tbl.k, j2_tbl.i;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Sort
+   Sort Key: j1_tbl.i, j2_tbl.i
+   ->  Merge Join
+         Merge Cond: ((j1_tbl.i = j2_tbl.k) AND (j1_tbl.i > j2_tbl.i))
+         ->  Sort
+               Sort Key: j1_tbl.i
+               ->  Seq Scan on j1_tbl
+         ->  Sort
+               Sort Key: j2_tbl.k, j2_tbl.i
+               ->  Seq Scan on j2_tbl
+(10 rows)
+
+-- suitable outer pathkeys
+explain (costs off)
+	select * from j1_tbl
+	full join (select * from j2_tbl order by k, i) j2_tbl
+	on j1_tbl.i < j2_tbl.i and j1_tbl.i = j2_tbl.k;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Merge Full Join
+   Merge Cond: ((j2_tbl.k = j1_tbl.i) AND (j2_tbl.i > j1_tbl.i))
+   ->  Sort
+         Sort Key: j2_tbl.k, j2_tbl.i
+         ->  Seq Scan on j2_tbl
+   ->  Sort
+         Sort Key: j1_tbl.i
+         ->  Seq Scan on j1_tbl
+(8 rows)
+
+-- unsuitable outer pathkeys
+explain (costs off)
+	select * from j1_tbl
+	full join (select * from j2_tbl order by k, i) j2_tbl
+	on j1_tbl.i > j2_tbl.i and j1_tbl.i = j2_tbl.k;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Merge Full Join
+   Merge Cond: ((j1_tbl.i = j2_tbl.k) AND (j1_tbl.i > j2_tbl.i))
+   ->  Sort
+         Sort Key: j1_tbl.i
+         ->  Seq Scan on j1_tbl
+   ->  Materialize
+         ->  Sort
+               Sort Key: j2_tbl.k, j2_tbl.i
+               ->  Seq Scan on j2_tbl
+(9 rows)
+
+-- using an index
+set enable_seqscan to off;
+create index idx_j1_tbl_i on j1_tbl(i);
+analyze j1_tbl;
+explain (costs off) select * from j1_tbl full join j2_tbl on j1_tbl.i > j2_tbl.k;
+                  QUERY PLAN                   
+-----------------------------------------------
+ Merge Full Join
+   Merge Cond: (j1_tbl.i > j2_tbl.k)
+   ->  Index Scan using idx_j1_tbl_i on j1_tbl
+   ->  Sort
+         Sort Key: j2_tbl.k
+         ->  Seq Scan on j2_tbl
+(6 rows)
+
+explain (costs off) select * from j1_tbl full join j2_tbl on j1_tbl.i < j2_tbl.k;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Merge Full Join
+   Merge Cond: (j1_tbl.i < j2_tbl.k)
+   ->  Index Scan Backward using idx_j1_tbl_i on j1_tbl
+   ->  Sort
+         Sort Key: j2_tbl.k DESC
+         ->  Seq Scan on j2_tbl
+(6 rows)
+
+drop index idx_j1_tbl_i;
+analyze j1_tbl;
+reset enable_seqscan;
+reset enable_hashjoin;
+--
 -- semijoin selectivity for <>
 --
 explain (costs off)
@@ -5265,43 +5431,51 @@ select c.*,a.*,ss1.q1,ss2.q1,ss3.* from
     lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
   ) on c.q2 = ss2.q1,
   lateral (select * from int4_tbl i where ss2.y > f1) ss3;
-                                               QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
- Nested Loop
+                                                  QUERY PLAN                                                   
+---------------------------------------------------------------------------------------------------------------
+ Merge Join
    Output: c.q1, c.q2, a.q1, a.q2, b.q1, d.q1, i.f1
-   Join Filter: ((COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2)) > i.f1)
-   ->  Hash Right Join
+   Merge Cond: (i.f1 < (COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2)))
+   ->  Sort
+         Output: i.f1
+         Sort Key: i.f1 DESC
+         ->  Seq Scan on public.int4_tbl i
+               Output: i.f1
+   ->  Sort
          Output: c.q1, c.q2, a.q1, a.q2, b.q1, d.q1, (COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2))
-         Hash Cond: (d.q1 = c.q2)
-         ->  Nested Loop
-               Output: a.q1, a.q2, b.q1, d.q1, (COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2))
-               ->  Hash Right Join
-                     Output: a.q1, a.q2, b.q1, (COALESCE(b.q2, (b2.f1)::bigint))
-                     Hash Cond: (b.q1 = a.q2)
-                     ->  Nested Loop
-                           Output: b.q1, COALESCE(b.q2, (b2.f1)::bigint)
-                           Join Filter: (b.q1 < b2.f1)
-                           ->  Seq Scan on public.int8_tbl b
-                                 Output: b.q1, b.q2
-                           ->  Materialize
-                                 Output: b2.f1
-                                 ->  Seq Scan on public.int4_tbl b2
-                                       Output: b2.f1
-                     ->  Hash
-                           Output: a.q1, a.q2
+         Sort Key: (COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2)) DESC
+         ->  Hash Right Join
+               Output: c.q1, c.q2, a.q1, a.q2, b.q1, d.q1, (COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2))
+               Hash Cond: (d.q1 = c.q2)
+               ->  Nested Loop
+                     Output: a.q1, a.q2, b.q1, d.q1, (COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2))
+                     ->  Hash Left Join
+                           Output: a.q1, a.q2, b.q1, (COALESCE(b.q2, (b2.f1)::bigint))
+                           Hash Cond: (a.q2 = b.q1)
                            ->  Seq Scan on public.int8_tbl a
                                  Output: a.q1, a.q2
-               ->  Seq Scan on public.int8_tbl d
-                     Output: d.q1, COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2)
-         ->  Hash
-               Output: c.q1, c.q2
-               ->  Seq Scan on public.int8_tbl c
+                           ->  Hash
+                                 Output: b.q1, (COALESCE(b.q2, (b2.f1)::bigint))
+                                 ->  Merge Join
+                                       Output: b.q1, COALESCE(b.q2, (b2.f1)::bigint)
+                                       Merge Cond: (b.q1 < b2.f1)
+                                       ->  Sort
+                                             Output: b.q1, b.q2
+                                             Sort Key: b.q1 DESC
+                                             ->  Seq Scan on public.int8_tbl b
+                                                   Output: b.q1, b.q2
+                                       ->  Sort
+                                             Output: b2.f1
+                                             Sort Key: b2.f1 DESC
+                                             ->  Seq Scan on public.int4_tbl b2
+                                                   Output: b2.f1
+                     ->  Seq Scan on public.int8_tbl d
+                           Output: d.q1, COALESCE((COALESCE(b.q2, (b2.f1)::bigint)), d.q2)
+               ->  Hash
                      Output: c.q1, c.q2
-   ->  Materialize
-         Output: i.f1
-         ->  Seq Scan on public.int4_tbl i
-               Output: i.f1
-(34 rows)
+                     ->  Seq Scan on public.int8_tbl c
+                           Output: c.q1, c.q2
+(42 rows)
 
 -- check processing of postponed quals (bug #9041)
 explain (verbose, costs off)
@@ -5608,6 +5782,7 @@ rollback;
 --
 -- test planner's ability to mark joins as unique
 --
+set enable_mergejoin to 0;
 create table j1 (id int primary key);
 create table j2 (id int primary key);
 create table j3 (id int);
@@ -5877,6 +6052,7 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
 set enable_nestloop to 0;
 set enable_hashjoin to 0;
 set enable_sort to 0;
+set enable_mergejoin to 1;
 -- create an index that will be preferred over the PK to perform the join
 create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
 explain (costs off) select * from j1 j1
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 8b3798e..fef7537 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4,6 +4,8 @@
 --
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join to true;
+-- Disable merge joins to get predictable plans
+SET enable_mergejoin TO off;
 --
 -- partitioned by a single column
 --
@@ -862,6 +864,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
 -- test merge joins
 SET enable_hashjoin TO off;
 SET enable_nestloop TO off;
+SET enable_mergejoin TO on;
 EXPLAIN (COSTS OFF)
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
                            QUERY PLAN                           
@@ -1044,6 +1047,7 @@ SELECT t1.a, t2.b FROM (SELECT * FROM prt1 WHERE a < 450) t1 LEFT JOIN (SELECT *
 
 RESET enable_hashjoin;
 RESET enable_nestloop;
+SET enable_mergejoin TO off;
 --
 -- partitioned by multiple columns
 --
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d3ba2a1..2e45038 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -157,7 +157,8 @@ SELECT '' AS "xxx", *
 --
 
 SELECT '' AS "xxx", *
-  FROM J1_TBL JOIN J2_TBL ON (J1_TBL.i <= J2_TBL.k);
+  FROM J1_TBL JOIN J2_TBL ON (J1_TBL.i <= J2_TBL.k)
+  ORDER BY J1_TBL.i, J2_TBL.k;
 
 
 --
@@ -194,6 +195,66 @@ SELECT '' AS "xxx", *
   FROM J1_TBL LEFT JOIN J2_TBL USING (i) WHERE (i = 1);
 
 --
+-- Full merge join
+--
+
+set enable_hashjoin to 0;
+
+-- simple
+select * from j1_tbl full join j2_tbl on j1_tbl.i < j2_tbl.k;
+
+-- multiple clauses
+select * from j1_tbl full join j2_tbl on j1_tbl.i > j2_tbl.i and j1_tbl.i = j2_tbl.k;
+
+-- multiple inequality clauses
+select * from j1_tbl full join j2_tbl on j1_tbl.i > j2_tbl.i and j1_tbl.i < j2_tbl.k;
+
+-- outer pathkeys for multiple inequality clauses
+explain (costs off)
+	select * from (select * from j1_tbl order by i) j1_tbl
+	full join j2_tbl
+	on j1_tbl.i > j2_tbl.i and j1_tbl.i > j2_tbl.k;
+
+-- suitable root pathkeys
+explain (costs off)
+	select * from j1_tbl join j2_tbl
+	on j1_tbl.i < j2_tbl.i and j1_tbl.i = j2_tbl.k
+	order by j2_tbl.k, j2_tbl.i;
+
+-- unsuitable root pathkeys
+explain (costs off)
+	select * from j1_tbl join j2_tbl
+	on j1_tbl.i > j2_tbl.i and j1_tbl.i = j2_tbl.k
+	order by j2_tbl.k, j2_tbl.i;
+
+-- suitable outer pathkeys
+explain (costs off)
+	select * from j1_tbl
+	full join (select * from j2_tbl order by k, i) j2_tbl
+	on j1_tbl.i < j2_tbl.i and j1_tbl.i = j2_tbl.k;
+
+-- unsuitable outer pathkeys
+explain (costs off)
+	select * from j1_tbl
+	full join (select * from j2_tbl order by k, i) j2_tbl
+	on j1_tbl.i > j2_tbl.i and j1_tbl.i = j2_tbl.k;
+
+-- using an index
+set enable_seqscan to off;
+create index idx_j1_tbl_i on j1_tbl(i);
+analyze j1_tbl;
+
+explain (costs off) select * from j1_tbl full join j2_tbl on j1_tbl.i > j2_tbl.k;
+explain (costs off) select * from j1_tbl full join j2_tbl on j1_tbl.i < j2_tbl.k;
+
+drop index idx_j1_tbl_i;
+analyze j1_tbl;
+
+reset enable_seqscan;
+
+reset enable_hashjoin;
+
+--
 -- semijoin selectivity for <>
 --
 explain (costs off)
@@ -1874,6 +1935,8 @@ rollback;
 -- test planner's ability to mark joins as unique
 --
 
+set enable_mergejoin to 0;
+
 create table j1 (id int primary key);
 create table j2 (id int primary key);
 create table j3 (id int);
@@ -1974,6 +2037,7 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
 set enable_nestloop to 0;
 set enable_hashjoin to 0;
 set enable_sort to 0;
+set enable_mergejoin to 1;
 
 -- create an index that will be preferred over the PK to perform the join
 create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 5d5de59..323b1c7 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -6,6 +6,9 @@
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join to true;
 
+-- Disable merge joins to get predictable plans
+SET enable_mergejoin TO off;
+
 --
 -- partitioned by a single column
 --
@@ -146,6 +149,7 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
 -- test merge joins
 SET enable_hashjoin TO off;
 SET enable_nestloop TO off;
+SET enable_mergejoin TO on;
 
 EXPLAIN (COSTS OFF)
 SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
@@ -162,6 +166,7 @@ SELECT t1.a, t2.b FROM (SELECT * FROM prt1 WHERE a < 450) t1 LEFT JOIN (SELECT *
 
 RESET enable_hashjoin;
 RESET enable_nestloop;
+SET enable_mergejoin TO off;
 
 --
 -- partitioned by multiple columns
-- 
2.7.4

>From 0c947653325033a08b4065a3667eab76d40d0bcf Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmen...@postgrespro.ru>
Date: Tue, 10 Apr 2018 12:29:47 +0300
Subject: [PATCH 1/2] Preparatory refactoring.

Separate the concepts of "mergejoinable clause" and "equivalence clause". The
former are used to perform merge joins, and the latter -- to build the
equivalence classes. Previously, the only mergejoinable clauses were equality
ones, marked by non-NIL `mergeopfamilies` list. Now that we are going to support
merge joins on comparison clauses, `mergeopfamilies` is set for both equality
and comparison clauses, and in addition to that, the equality clauses have the
flag `is_mj_equality` set to true.

Also rename some things in nodeMergejoin.c to better reflect their purpose.
---
 src/backend/executor/nodeMergejoin.c      | 113 +++++++++++++++++-------------
 src/backend/nodes/copyfuncs.c             |   1 +
 src/backend/nodes/outfuncs.c              |   1 +
 src/backend/optimizer/path/equivclass.c   |  11 +--
 src/backend/optimizer/path/indxpath.c     |   4 +-
 src/backend/optimizer/path/joinrels.c     |   2 +-
 src/backend/optimizer/path/pathkeys.c     |   2 +-
 src/backend/optimizer/plan/analyzejoins.c |  28 ++++----
 src/backend/optimizer/plan/initsplan.c    |  21 ++++--
 src/backend/optimizer/util/restrictinfo.c |   1 +
 src/backend/utils/cache/lsyscache.c       |  20 +++---
 src/include/executor/execdebug.h          |   2 +-
 src/include/nodes/relation.h              |  10 ++-
 src/include/utils/lsyscache.h             |   4 +-
 14 files changed, 125 insertions(+), 95 deletions(-)

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 5e52b90..7298e1c 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -41,7 +41,7 @@
  *
  *		Therefore, rather than directly executing the merge join clauses,
  *		we evaluate the left and right key expressions separately and then
- *		compare the columns one at a time (see MJCompare).  The planner
+ *		compare the columns one at a time (see MJTestTuples).  The planner
  *		passes us enough information about the sort ordering of the inputs
  *		to allow us to determine how to make the comparison.  We may use the
  *		appropriate btree comparison function, since Postgres' only notion
@@ -172,31 +172,31 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause)
+		palloc0(nClauses * sizeof(MergeJoinClauseData));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,26 +207,28 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args),
+									 (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args),
+									 (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else					/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
+		if (join_strategy != BTEqualStrategyNumber)	/* should not happen */
 			elog(ERROR, "cannot merge using non-equality operator %u",
 				 qual->opno);
 
@@ -265,8 +267,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,20 +378,27 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple test result */
+typedef enum
+{
+	MJCR_NextInner,
+	MJCR_NextOuter,
+	MJCR_Join
+} MJTestResult;
+
 /*
- * MJCompare
+ * MJTestTuples
  *
- * Compare the mergejoinable values of the current two input tuples
- * and return 0 if they are equal (ie, the mergejoin equalities all
- * succeed), >0 if outer > inner, <0 if outer < inner.
+ * Decide whether to join current inner and outer tuples, or to advance
+ * either pointer.
  *
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, respectively.
  */
-static int
-MJCompare(MergeJoinState *mergestate)
+static MJTestResult
+MJTestTuples(MergeJoinState *mergestate)
 {
-	int			result = 0;
+	MJTestResult result = MJCR_Join;
 	bool		nulleqnull = false;
 	ExprContext *econtext = mergestate->js.ps.ps_ExprContext;
 	int			i;
@@ -408,6 +415,7 @@ MJCompare(MergeJoinState *mergestate)
 	for (i = 0; i < mergestate->mj_NumClauses; i++)
 	{
 		MergeJoinClause clause = &mergestate->mj_Clauses[i];
+		int			sort_result;
 
 		/*
 		 * Special case for NULL-vs-NULL, else use standard comparison.
@@ -418,11 +426,14 @@ MJCompare(MergeJoinState *mergestate)
 			continue;
 		}
 
-		result = ApplySortComparator(clause->ldatum, clause->lisnull,
-									 clause->rdatum, clause->risnull,
-									 &clause->ssup);
+		sort_result = ApplySortComparator(clause->ldatum, clause->lisnull,
+										  clause->rdatum, clause->risnull,
+										  &clause->ssup);
+
+		result = sort_result == 0 ? MJCR_Join
+					: sort_result < 0 ? MJCR_NextOuter : MJCR_NextInner;
 
-		if (result != 0)
+		if (result != MJCR_Join)
 			break;
 	}
 
@@ -435,9 +446,9 @@ MJCompare(MergeJoinState *mergestate)
 	 * equality.  We have to check this as part of the mergequals, else the
 	 * rescan logic will do the wrong thing.
 	 */
-	if (result == 0 &&
+	if (result == MJCR_Join &&
 		(nulleqnull || mergestate->mj_ConstFalseJoin))
-		result = 1;
+		result = MJCR_NextInner;
 
 	MemoryContextSwitchTo(oldContext);
 
@@ -603,7 +614,7 @@ ExecMergeJoin(PlanState *pstate)
 	ExprState  *joinqual;
 	ExprState  *otherqual;
 	bool		qualResult;
-	int			compareResult;
+	MJTestResult testResult;
 	PlanState  *innerPlan;
 	TupleTableSlot *innerTupleSlot;
 	PlanState  *outerPlan;
@@ -888,14 +899,14 @@ ExecMergeJoin(PlanState *pstate)
 						 * If they do not match then advance to next outer
 						 * tuple.
 						 */
-						compareResult = MJCompare(node);
-						MJ_DEBUG_COMPARE(compareResult);
+						testResult = MJTestTuples(node);
+						MJ_DEBUG_COMPARE(testResult);
 
-						if (compareResult == 0)
+						if (testResult == MJCR_Join)
 							node->mj_JoinState = EXEC_MJ_JOINTUPLES;
 						else
 						{
-							Assert(compareResult < 0);
+							Assert(testResult == MJCR_NextOuter);
 							node->mj_JoinState = EXEC_MJ_NEXTOUTER;
 						}
 						break;
@@ -1045,10 +1056,10 @@ ExecMergeJoin(PlanState *pstate)
 				innerTupleSlot = node->mj_MarkedTupleSlot;
 				(void) MJEvalInnerValues(node, innerTupleSlot);
 
-				compareResult = MJCompare(node);
-				MJ_DEBUG_COMPARE(compareResult);
+				testResult = MJTestTuples(node);
+				MJ_DEBUG_COMPARE(testResult);
 
-				if (compareResult == 0)
+				if (testResult == MJCR_Join)
 				{
 					/*
 					 * the merge clause matched so now we restore the inner
@@ -1106,7 +1117,7 @@ ExecMergeJoin(PlanState *pstate)
 					 *	no more inners, no more matches are possible.
 					 * ----------------
 					 */
-					Assert(compareResult > 0);
+					Assert(testResult == MJCR_NextInner);
 					innerTupleSlot = node->mj_InnerTupleSlot;
 
 					/* reload comparison data for current inner */
@@ -1179,10 +1190,10 @@ ExecMergeJoin(PlanState *pstate)
 				 * satisfy the mergeclauses.  If they do, then we update the
 				 * marked tuple position and go join them.
 				 */
-				compareResult = MJCompare(node);
-				MJ_DEBUG_COMPARE(compareResult);
+				testResult = MJTestTuples(node);
+				MJ_DEBUG_COMPARE(testResult);
 
-				if (compareResult == 0)
+				if (testResult == MJCR_Join)
 				{
 					if (!node->mj_SkipMarkRestore)
 						ExecMarkPos(innerPlan);
@@ -1191,11 +1202,13 @@ ExecMergeJoin(PlanState *pstate)
 
 					node->mj_JoinState = EXEC_MJ_JOINTUPLES;
 				}
-				else if (compareResult < 0)
+				else if (testResult == MJCR_NextOuter)
 					node->mj_JoinState = EXEC_MJ_SKIPOUTER_ADVANCE;
 				else
-					/* compareResult > 0 */
+				{
+					Assert(testResult == MJCR_NextInner);
 					node->mj_JoinState = EXEC_MJ_SKIPINNER_ADVANCE;
+				}
 				break;
 
 				/*
@@ -1593,12 +1606,12 @@ ExecInitMergeJoin(MergeJoin *node, EState *estate, int eflags)
 	 * preprocess the merge clauses
 	 */
 	mergestate->mj_NumClauses = list_length(node->mergeclauses);
-	mergestate->mj_Clauses = MJExamineQuals(node->mergeclauses,
-											node->mergeFamilies,
-											node->mergeCollations,
-											node->mergeStrategies,
-											node->mergeNullsFirst,
-											(PlanState *) mergestate);
+	MJExamineQuals(node->mergeclauses,
+				   node->mergeFamilies,
+				   node->mergeCollations,
+				   node->mergeStrategies,
+				   node->mergeNullsFirst,
+				   mergestate);
 
 	/*
 	 * initialize join state
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1c12075..24e924a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2239,6 +2239,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_SCALAR_FIELD(norm_selec);
 	COPY_SCALAR_FIELD(outer_selec);
 	COPY_NODE_FIELD(mergeopfamilies);
+	COPY_SCALAR_FIELD(is_mj_equality);
 	/* EquivalenceClasses are never copied, so shallow-copy the pointers */
 	COPY_SCALAR_FIELD(left_ec);
 	COPY_SCALAR_FIELD(right_ec);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 979d523..fbb203c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2534,6 +2534,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_FLOAT_FIELD(norm_selec, "%.4f");
 	WRITE_FLOAT_FIELD(outer_selec, "%.4f");
 	WRITE_NODE_FIELD(mergeopfamilies);
+	WRITE_BOOL_FIELD(is_mj_equality);
 	/* don't write left_ec, leads to infinite recursion in plan tree dump */
 	/* don't write right_ec, leads to infinite recursion in plan tree dump */
 	WRITE_NODE_FIELD(left_em);
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index b22b36e..75c8074 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -68,9 +68,9 @@ static bool reconsider_full_join_clause(PlannerInfo *root,
 
 /*
  * process_equivalence
- *	  The given clause has a mergejoinable operator and can be applied without
- *	  any delay by an outer join, so its two sides can be considered equal
- *	  anywhere they are both computable; moreover that equality can be
+ *	  The given clause has a mergejoinable equality operator and can be applied
+ * 	  without any delay by an outer join, so its two sides can be considered
+ * 	  equal anywhere they are both computable; moreover that equality can be
  *	  extended transitively.  Record this knowledge in the EquivalenceClass
  *	  data structure, if applicable.  Returns true if successful, false if not
  *	  (in which case caller should treat the clause as ordinary, not an
@@ -233,6 +233,7 @@ process_equivalence(PlannerInfo *root,
 	op_input_types(opno, &item1_type, &item2_type);
 
 	opfamilies = restrictinfo->mergeopfamilies;
+	Assert(restrictinfo->is_mj_equality);
 
 	/*
 	 * Sweep through the existing EquivalenceClasses looking for matches to
@@ -273,7 +274,7 @@ process_equivalence(PlannerInfo *root,
 		/*
 		 * A "match" requires matching sets of btree opfamilies.  Use of
 		 * equal() for this test has implications discussed in the comments
-		 * for get_mergejoin_opfamilies().
+		 * for get_btree_equality_opfamilies().
 		 */
 		if (!equal(opfamilies, cur_ec->ec_opfamilies))
 			continue;
@@ -2082,7 +2083,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 				 * to test for member matches first.
 				 */
 				if (opfamilies == NIL)	/* compute if we didn't already */
-					opfamilies = get_mergejoin_opfamilies(eqop);
+					opfamilies = get_btree_equality_opfamilies(eqop);
 				if (equal(opfamilies, ec->ec_opfamilies))
 					return ec;
 				/* Otherwise, done with this EC, move on to the next */
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..3e05b45 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3008,8 +3008,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 		 * mergeopfamilies will be if it has a mergejoinable operator and
 		 * doesn't contain volatile functions.
 		 */
-		if (restrictinfo->mergeopfamilies == NIL)
-			continue;			/* not mergejoinable */
+		if (!restrictinfo->is_mj_equality)
+			continue;			/* not a mergejoinable equality */
 
 		/*
 		 * The clause certainly doesn't refer to anything but the given rel.
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7008e13..dc9f4c3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1452,7 +1452,7 @@ have_partkey_equi_join(RelOptInfo *joinrel,
 			continue;
 
 		/* Skip clauses which are not equality conditions. */
-		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
+		if (!rinfo->is_mj_equality && !OidIsValid(rinfo->hashjoinoperator))
 			continue;
 
 		opexpr = (OpExpr *) rinfo->clause;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ec66cb9..0fa6f91 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -199,7 +199,7 @@ make_pathkey_from_sortinfo(PlannerInfo *root,
 	if (!OidIsValid(equality_op))	/* shouldn't happen */
 		elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
 			 BTEqualStrategyNumber, opcintype, opcintype, opfamily);
-	opfamilies = get_mergejoin_opfamilies(equality_op);
+	opfamilies = get_btree_equality_opfamilies(equality_op);
 	if (!opfamilies)			/* certainly should find some */
 		elog(ERROR, "could not find opfamilies for equality operator %u",
 			 equality_op);
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..1fcfb2c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -238,11 +238,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	}
 
 	/*
-	 * Search for mergejoinable clauses that constrain the inner rel against
-	 * either the outer rel or a pseudoconstant.  If an operator is
-	 * mergejoinable then it behaves like equality for some btree opclass, so
-	 * it's what we want.  The mergejoinability test also eliminates clauses
-	 * containing volatile functions, which we couldn't depend on.
+	 * Search for mergejoinable equality clauses that constrain the inner rel
+	 * against either the outer rel or a pseudoconstant. Mergejoinable equality
+	 * clauses are based on equality operators for some btree opclass, and don't
+	 * contain volatile functions, so it's what we want.
 	 */
 	foreach(l, innerrel->joininfo)
 	{
@@ -267,10 +266,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 			continue;			/* else, ignore; not useful here */
 		}
 
-		/* Ignore if it's not a mergejoinable clause */
+		/* Ignore if it's not a mergejoinable equality clause */
 		if (!restrictinfo->can_join ||
-			restrictinfo->mergeopfamilies == NIL)
-			continue;			/* not mergejoinable */
+			!restrictinfo->is_mj_equality)
+			continue;
 
 		/*
 		 * Check if clause has the form "outer op inner" or "inner op outer",
@@ -1087,11 +1086,10 @@ is_innerrel_unique_for(PlannerInfo *root,
 	ListCell   *lc;
 
 	/*
-	 * Search for mergejoinable clauses that constrain the inner rel against
-	 * the outer rel.  If an operator is mergejoinable then it behaves like
-	 * equality for some btree opclass, so it's what we want.  The
-	 * mergejoinability test also eliminates clauses containing volatile
-	 * functions, which we couldn't depend on.
+	 * Search for mergejoinable equality clauses that constrain the inner rel
+	 * against either the outer rel. Mergejoinable equality clauses are based
+	 * on equality operators for some btree opclass, and don't contain volatile
+	 * functions, so it's what we want.
 	 */
 	foreach(lc, restrictlist)
 	{
@@ -1105,9 +1103,9 @@ is_innerrel_unique_for(PlannerInfo *root,
 			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
 			continue;
 
-		/* Ignore if it's not a mergejoinable clause */
+		/* Ignore if it's not a mergejoinable equality clause */
 		if (!restrictinfo->can_join ||
-			restrictinfo->mergeopfamilies == NIL)
+			!restrictinfo->is_mj_equality)
 			continue;			/* not mergejoinable */
 
 		/*
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01335db..e7aa058 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -1552,8 +1552,8 @@ compute_semijoin_info(SpecialJoinInfo *sjinfo, List *clause)
 		if (all_btree)
 		{
 			/* oprcanmerge is considered a hint... */
-			if (!op_mergejoinable(opno, opinputtype) ||
-				get_mergejoin_opfamilies(opno) == NIL)
+			if (!op_mergejoinable_equality(opno, opinputtype) ||
+				get_btree_equality_opfamilies(opno) == NIL)
 				all_btree = false;
 		}
 		if (all_hash)
@@ -1964,15 +1964,17 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
 	 * process_equivalence is successful, it will take care of that;
 	 * otherwise, we have to call initialize_mergeclause_eclasses to do it.
 	 */
-	if (restrictinfo->mergeopfamilies)
+	if (restrictinfo->is_mj_equality)
 	{
+		Assert(restrictinfo->mergeopfamilies != NIL);
+
 		if (maybe_equivalence)
 		{
 			if (check_equivalence_delay(root, restrictinfo) &&
 				process_equivalence(root, &restrictinfo, below_outer_join))
 				return;
 			/* EC rejected it, so set left_ec/right_ec the hard way ... */
-			if (restrictinfo->mergeopfamilies)	/* EC might have changed this */
+			if (restrictinfo->is_mj_equality)	/* EC might have changed this */
 				initialize_mergeclause_eclasses(root, restrictinfo);
 			/* ... and fall through to distribute_restrictinfo_to_rels */
 		}
@@ -2621,9 +2623,14 @@ check_mergejoinable(RestrictInfo *restrictinfo)
 	opno = ((OpExpr *) clause)->opno;
 	leftarg = linitial(((OpExpr *) clause)->args);
 
-	if (op_mergejoinable(opno, exprType(leftarg)) &&
-		!contain_volatile_functions((Node *) clause))
-		restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno);
+	if (!contain_volatile_functions((Node *) clause))
+	{
+		if (op_mergejoinable_equality(opno, exprType(leftarg)))
+		{
+			restrictinfo->mergeopfamilies = get_btree_equality_opfamilies(opno);
+			restrictinfo->is_mj_equality = true;
+		}
+	}
 
 	/*
 	 * Note: op_mergejoinable is just a hint; if we fail to find the operator
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index edf5a48..a928d6c 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -186,6 +186,7 @@ make_restrictinfo_internal(Expr *clause,
 	restrictinfo->outer_selec = -1;
 
 	restrictinfo->mergeopfamilies = NIL;
+	restrictinfo->is_mj_equality = false;
 
 	restrictinfo->left_ec = NULL;
 	restrictinfo->right_ec = NULL;
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index bba595a..a8a175c 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -341,9 +341,9 @@ get_ordering_op_for_equality_op(Oid opno, bool use_lhs_type)
 }
 
 /*
- * get_mergejoin_opfamilies
- *		Given a putatively mergejoinable operator, return a list of the OIDs
- *		of the btree opfamilies in which it represents equality.
+ * get_btree_equality_opfamilies
+ *		Given an operator, return a list of the OIDs of the btree opfamilies
+ * 		in which it represents equality.
  *
  * It is possible (though at present unusual) for an operator to be equality
  * in more than one opfamily, hence the result is a list.  This also lets us
@@ -360,7 +360,7 @@ get_ordering_op_for_equality_op(Oid opno, bool use_lhs_type)
  * or cycles here to guarantee the ordering in that case.
  */
 List *
-get_mergejoin_opfamilies(Oid opno)
+get_btree_equality_opfamilies(Oid opno)
 {
 	List	   *result = NIL;
 	CatCList   *catlist;
@@ -1164,11 +1164,11 @@ op_input_types(Oid opno, Oid *lefttype, Oid *righttype)
 }
 
 /*
- * op_mergejoinable
+ * op_mergejoinable_equality
  *
- * Returns true if the operator is potentially mergejoinable.  (The planner
- * will fail to find any mergejoin plans unless there are suitable btree
- * opfamily entries for this operator and associated sortops.  The pg_operator
+ * Returns true if the operator is a potentially mergejoinable equality operator.
+ * (The planner will fail to find any mergejoin plans unless there are suitable
+ * btree opfamily entries for this operator and associated sortops. The pg_operator
  * flag is just a hint to tell the planner whether to bother looking.)
  *
  * In some cases (currently only array_eq and record_eq), mergejoinability
@@ -1177,7 +1177,7 @@ op_input_types(Oid opno, Oid *lefttype, Oid *righttype)
  * is needed to check this --- by convention, pass the left input's data type.
  */
 bool
-op_mergejoinable(Oid opno, Oid inputtype)
+op_mergejoinable_equality(Oid opno, Oid inputtype)
 {
 	bool		result = false;
 	HeapTuple	tp;
@@ -1234,7 +1234,7 @@ op_hashjoinable(Oid opno, Oid inputtype)
 	HeapTuple	tp;
 	TypeCacheEntry *typentry;
 
-	/* As in op_mergejoinable, let the typcache handle the hard cases */
+	/* As in op_mergejoinable_equality, let the typcache handle the hard cases */
 	/* Eventually we'll need a similar case for record_eq ... */
 	if (opno == ARRAY_EQ_OP)
 	{
diff --git a/src/include/executor/execdebug.h b/src/include/executor/execdebug.h
index 236b2cc..2d5760e 100644
--- a/src/include/executor/execdebug.h
+++ b/src/include/executor/execdebug.h
@@ -105,7 +105,7 @@
 #define MJ_debugtup(slot)				debugtup(slot, NULL)
 #define MJ_dump(state)					ExecMergeTupleDump(state)
 #define MJ_DEBUG_COMPARE(res) \
-  MJ1_printf("  MJCompare() returns %d\n", (res))
+  MJ1_printf("  MJTestTuples() returns %d\n", (res))
 #define MJ_DEBUG_QUAL(clause, res) \
   MJ2_printf("  ExecQual(%s, econtext) returns %s\n", \
 			 CppAsString(clause), T_OR_F(res))
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7cae3fc..5f49c7b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1931,8 +1931,16 @@ typedef struct RestrictInfo
 	Selectivity outer_selec;	/* selectivity for outer join semantics; -1 if
 								 * not yet set */
 
-	/* valid if clause is mergejoinable, else NIL */
+	/*
+	 * The following two fields are used for clauses on which it is possible to
+	 * perform a merge join.
+	 * If mergeopfamilies is not NIL, the clause is mergejoinable. Its operator
+	 * may be either equality or comparison in some btree opfamilies. These
+	 * opfamilies are stored in mergeopfamilies, and for equality clauses,
+	 * is_mj_equality is set to true.
+	 */
 	List	   *mergeopfamilies;	/* opfamilies containing clause operator */
+	bool		is_mj_equality;		/* is this a mergejoinable equality clause? */
 
 	/* cache space for mergeclause processing; NULL if not yet set */
 	EquivalenceClass *left_ec;	/* EquivalenceClass containing lefthand */
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index e55ea40..e8684ad 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -74,7 +74,7 @@ extern bool get_ordering_op_properties(Oid opno,
 						   Oid *opfamily, Oid *opcintype, int16 *strategy);
 extern Oid	get_equality_op_for_ordering_op(Oid opno, bool *reverse);
 extern Oid	get_ordering_op_for_equality_op(Oid opno, bool use_lhs_type);
-extern List *get_mergejoin_opfamilies(Oid opno);
+extern List *get_btree_equality_opfamilies(Oid opno);
 extern bool get_compatible_hash_operators(Oid opno,
 							  Oid *lhs_opno, Oid *rhs_opno);
 extern bool get_op_hash_functions(Oid opno,
@@ -99,7 +99,7 @@ extern RegProcedure get_opcode(Oid opno);
 extern char *get_opname(Oid opno);
 extern Oid	get_op_rettype(Oid opno);
 extern void op_input_types(Oid opno, Oid *lefttype, Oid *righttype);
-extern bool op_mergejoinable(Oid opno, Oid inputtype);
+extern bool op_mergejoinable_equality(Oid opno, Oid inputtype);
 extern bool op_hashjoinable(Oid opno, Oid inputtype);
 extern bool op_strict(Oid opno);
 extern char op_volatile(Oid opno);
-- 
2.7.4

Re: [HACKERS] PoC: full merge join on comparison clause

Reply via email to