Re: [HACKERS] PoC: full merge join on comparison clause
On 11/19/18 04:46, Tom Lane wrote: In short, proceeding like the above when we can't find another plan type for a full join seems like it fixes a far wider variety of cases. The possibility that maybe we could do some of those cases a bit faster isn't sufficiently attractive to me to justify also putting in a mechanism like this patch proposes. We only rarely see complaints at all about can't-do-a-full-join problems, and I do not think this patch would fix enough of those complaints to be worthwhile. I agree, the automated UNION substitutions seems to be a better approach. I'll mark this patch as rejected then. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] PoC: full merge join on comparison clause
Alexander Kuzmenkov writes: > [ Inequality-merge-join-v10.patch ] Just thinking about this patch a bit ... I wonder why you were so quick to reject the UNION approach at the outset. This patch is pretty messy, and it complicates a lot of stuff that is quite fundamental to the planner, and you still end up that the only functionality gain is now we can handle full joins whose conditions include a single btree inequality clause. Nor are we doing that remarkably efficiently ... it's pretty much impossible to do it efficiently, in fact, since if the inputs have M and N rows respectively then the output will have something like (M*N)/2 rows. So it seems to me that if we're going to put sweat into this area at all, our ambition ought to be "we'll successfully perform a FULL JOIN with any join clause whatsoever, though it might take O(M*N) time". Now as far as I can tell, the UNION substitution you proposed is completely valid, although it'd be better to phrase the second step as an antijoin. That is, I believe select * from t1 full join t2 on (anything) is exactly equal to select t1.*, t2.* from t1 left join t2 on (anything) union all select t1.*, t2.* from t2 anti join t1 on (anything) There is one fly in the ointment, which is that we will have to run the join clause twice, so it can't contain volatile functions --- but the merge join approach wouldn't handle that case either. Having to read the inputs twice is not good, but we could put them into CTEs, which fixes any problems with volatility below the join and at least alleviates the performance problem. Since we can't currently do any meaningful qual pushdown through full joins, the optimization-fence aspect of a CTE doesn't seem like an issue either. In short, proceeding like the above when we can't find another plan type for a full join seems like it fixes a far wider variety of cases. The possibility that maybe we could do some of those cases a bit faster isn't sufficiently attractive to me to justify also putting in a mechanism like this patch proposes. We only rarely see complaints at all about can't-do-a-full-join problems, and I do not think this patch would fix enough of those complaints to be worthwhile. regards, tom lane
Re: [HACKERS] PoC: full merge join on comparison clause
El 18/07/18 a las 16:58, Ashutosh Bapat escribió: Thanks for the commit messages. I would use word "in-equality" instead of "comparison" since equality is also a comparison. Fixed. Comparing this with the original code, I think, is_mj_equality should be true if restrictinfo->mergeopfamilies is not NIL. My mistake, fixed. With this work the meaning of oprcanmerge (See pg_operator catalog and also CREATE OPERATOR syntax) changes. Every btree operator can now be used to perform a merge join. oprcanmerge however only indicates whether an operator is an equality or not. Have you thought about that? Do we require to re-define oprcanmerge? For now we can test with old oprcanmerge meaning, not to bump the catalog version. Merge join needs only BTORDER_PROC function, which is required for btree opfamilies. This means that it should be always possible to merge join on operators that correspond to standard btree strategies. We could set oprcanmerge to true for all built-in btree comparison operators, and leave the possibility to disable it for custom operators. I think, it should be possible to use this technique with more than one inequality clauses as long as all the operators require the input to be ordered in the same direction and the clauses are ANDed. In that case the for a given outer tuple the matching inner tuples form a contiguous interval. Consider a table "t(a int, b int)", the value of each column can be 1, 2, 3, 4 and the table contains all possible combinations. If merge condition is "a < 2 and b < 2", for each of the four possible sorting directions, the result set won't be contiguous. Generally speaking, this happens when we have several groups with the same value of first column, and the first column matches the join condition. But inside each group, for some rows the second column doesn't match. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From c817ac1f93b83bcf43afac4af2dbaed37403a4a2 Mon Sep 17 00:00:00 2001 From: Alexander Kuzmenkov Date: Tue, 10 Apr 2018 12:31:21 +0300 Subject: [PATCH 2/2] Inequality merge join. Perform merge joins on inequality clause. The current merge join algorithm requires minimal modification to support one inequality clause at the final position. This has performance benefits in some cases, and also allows to perform full joins on inequality, which was not possible before. This commit modifies the merge join path generation logic and cost functions to account for inequality clauses, and adds some tests. --- src/backend/executor/nodeMergejoin.c | 136 +-- src/backend/optimizer/path/costsize.c| 27 ++- src/backend/optimizer/path/joinpath.c| 27 ++- src/backend/optimizer/path/pathkeys.c| 218 --- src/backend/optimizer/plan/initsplan.c | 28 ++- src/backend/utils/adt/selfuncs.c | 40 +++-- src/backend/utils/cache/lsyscache.c | 40 + src/include/nodes/execnodes.h| 5 + src/include/optimizer/paths.h| 3 +- src/include/utils/lsyscache.h| 1 + src/test/regress/expected/join.out | 250 +++ src/test/regress/expected/partition_join.out | 4 + src/test/regress/sql/join.sql| 66 ++- src/test/regress/sql/partition_join.sql | 5 + 14 files changed, 750 insertions(+), 100 deletions(-) diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index 7298e1c..d6e5556 100644 --- a/src/backend/executor/nodeMergejoin.c +++ b/src/backend/executor/nodeMergejoin.c @@ -89,6 +89,45 @@ * proceed to another state. This state is stored in the node's * execution state information and is preserved across calls to * ExecMergeJoin. -cim 10/31/89 + * + * This algorithm can work almost as-is when the last join clause + * is an inequality. This introduces an additional restriction to + * the ordering of the inputs: when moving to the next outer tuple, + * the beginning of the matching interval of inner tuples must not + * change. For example, if the join operator is ">=", inputs must + * be in ascending order. + * + * Consider this example: + * outer >= innner + * 1 0 - first match for outer = 1, 2 + * 2 1 - last match for outer = 1 + * 2 - last match for outer = 2 + * + * And if the inputs were sorted in descending order: + * outer >= inner + * 2 2 - first match for outer = 2 + * 1 1 - first match for outer = 1 + * 0 - last match for outer = 1, 2 + * + * It can be seen that the beginning of the matching interval of + * inner tuples changes when we move to the next outer tuple. + * Supporting this, i.e. testing and advancing the marked tuple, + * would complicate the join algorithm. Instead of that, we have + * the planner ensure that the inputs are suitably ordered, and + * recheck this on
Re: [HACKERS] PoC: full merge join on comparison clause
On Tue, Jul 10, 2018 at 10:06 PM, Alexander Kuzmenkov wrote: > I tried to fix the things you mentioned and improve the comments. Among > other changes, there is now a description of how merge join works with > inequalities at the top of nodeMergejoin.c. It also explains why we only > support one inequality clause. Thanks for the commit messages. I would use word "in-equality" instead of "comparison" since equality is also a comparison. > > Some particular points: > > On 07/06/2018 04:01 PM, Ashutosh Bapat wrote: >> >> -StrategyNumber opstrategy = mergestrategies[iClause]; >> +StrategyNumber sort_strategy = mergestrategies[iClause]; >> -intop_strategy; >> +intjoin_strategy; >> I don't see a reason why should we change the name of variable here. These >> are >> operator strategies and there's no need to change their names. The name >> change >> is introducing unnecessary diffs. > > > These variables have different meaning but their names differ only with an > underscore. When I had to change this function, I made mistakes because of > this. I'd keep the descriptive names to avoid further confusion. Should this > be a separate patch? No, 0001 suffice. But I am still not sure that the variable name change is worth the trouble. Anyway, will leave this for a committer to judge. > > This is just a cross-check for the planner. Added a comment. We should > probably use a separate error code for internal errors as opposed to user > errors, but I'm not sure if we have one, I see just elog(ERROR) being used > everywhere. elog(ERROR) is fine. Thanks for the comments. -if (op_mergejoinable(opno, exprType(leftarg)) && -!contain_volatile_functions((Node *) clause)) -restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno); +if (!contain_volatile_functions((Node *) clause)) +{ +if (op_mergejoinable_equality(opno, exprType(leftarg))) Why is this condition split. Also why is the change in the order of conditions? +{ +restrictinfo->mergeopfamilies = get_btree_equality_opfamilies(opno); +restrictinfo->is_mj_equality = true; Comparing this with the original code, I think, is_mj_equality should be true if restrictinfo->mergeopfamilies is not NIL. There is no way that a clause can act as an equality clause when there are no families in which the operator is an equality operator. If restrictinfo->mergeopfamilies can not be NIL here, probably we should add an Assert and a bit of explanation as to why is_mj_equality is true. With this work the meaning of oprcanmerge (See pg_operator catalog and also CREATE OPERATOR syntax) changes. Every btree operator can now be used to perform a merge join. oprcanmerge however only indicates whether an operator is an equality or not. Have you thought about that? Do we require to re-define oprcanmerge? + * + * If the inequality clause is not the last one, or if there are several + * of them, this algorithm doesn't work, because it is not possible to + * sort the inputs in such a way that given an outer tuple, the matching + * inner tuples form a contiguous interval. I think, it should be possible to use this technique with more than one inequality clauses as long as all the operators require the input to be ordered in the same direction and the clauses are ANDed. In that case the for a given outer tuple the matching inner tuples form a contiguous interval. I think it's better to straighten out these things before diving further into the 2nd patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] PoC: full merge join on comparison clause
I tried to fix the things you mentioned and improve the comments. Among other changes, there is now a description of how merge join works with inequalities at the top of nodeMergejoin.c. It also explains why we only support one inequality clause. Some particular points: On 07/06/2018 04:01 PM, Ashutosh Bapat wrote: -StrategyNumber opstrategy = mergestrategies[iClause]; +StrategyNumber sort_strategy = mergestrategies[iClause]; -intop_strategy; +intjoin_strategy; I don't see a reason why should we change the name of variable here. These are operator strategies and there's no need to change their names. The name change is introducing unnecessary diffs. These variables have different meaning but their names differ only with an underscore. When I had to change this function, I made mistakes because of this. I'd keep the descriptive names to avoid further confusion. Should this be a separate patch? I think we should write a wrapper around MJCompare which interprets the result rather than changing MJCompare itself. OR at least change the name of MJCompare. Renamed the function to MJTestTuples to reflect that it decides whether we join tuples or advance either side. - * for get_mergejoin_opfamilies(). + * for get_equality_opfamilies(). I think we should leave mergejoin word in there or at least indicate that these are btree opfamilies so that we don't confuse it with hash equality operator families. Renamed these to get_btree_equality_opfamilies() and get_btree_comparison_opfamilies(). +if (parent->mj_Ineq_Present) +elog(ERROR, "inequality mergejoin clause must be the last one"); + IIUC, this never happens. If it really happens, we have created a path which can not be used practically. That should never happen. It will help to add a comment here clarifying that situation. This is just a cross-check for the planner. Added a comment. We should probably use a separate error code for internal errors as opposed to user errors, but I'm not sure if we have one, I see just elog(ERROR) being used everywhere. +boolhave_inequality = have_inequality_mergeclause(mergeclauses); There will be many paths created with different ordering of pathkeys. So, instead of calling have_inequality_mergeclause() for each of those paths, it's better to save this status in the path itself when creating the path. I removed this function altogether, because we can just check the last merge clause. When we cost the path, we already have a proper mergejoinable list of clauses, so if there is an inequality clause, it's the last one. /* Make a pathkey list with this guy first */ if (l != list_head(all_pathkeys)) +{ +if (have_inequality && l == list_tail(all_pathkeys)) +/* Inequality merge clause must be the last, we can't move it */ +break; + I am kind of baffled by this change. IIUC the way we create different orderings of pathkeys here, we are just rotating the pathkeys in circular order. This means there is exactly one ordering of pathkeys where the pathkey corresponding to the inequality clause is the last one. This code does not rotate the pathkeys circularly, but puts each of them in the first position, and keeps the rest in the original order. Say, if we have three equality pathkeys, and one inequality pathkey at the end (let's denote them as E1, E2, E3, IE), the permutations it tries will be like this: E1 E2 E3 IE E2 E1 E3 IE E3 E1 E2 IE Does this sound right? /* Might have no mergeclauses */ if (nClauses == 0) return NIL; +{ +List *ineq_clauses = find_inequality_clauses(mergeclauses); + +if (list_length(ineq_clauses) > 1) +return NIL; Without this patch, when there is an inequality clause with FULL JOIN, we will not create a merge join path because select_mergejoin_clauses() will set mergejoin_allowed to false. This means that we won't call sort_inner_and_outer(). I think this patch also has to do the same i.e. when there are more than one inequality clauses, select_mergejoin_clauses() should set mergejoin_allowed to false in case of a FULL JOIN since merge join machinary won't be able to handle that case. If we do that, we could arrange extra.mergeclause_list such that the inequality clause is always at the end thus finding inequality clause would be easy. I changed select_mergejoin_clauses() to filter multiple inequality clauses and disable join if needed. Now we can use extra inequalities as join filter, if it's not full join. I didn't reorder extra.mergeclause_list there, because this order is ignored later. select_outer_pathkeys_for_merge() chooses the order of pathkeys using some heuristics, and then find_mergeclauses_for_outer_pathkeys() reorders the clauses accordingly. -- Alexander Kuzmenkov Postgres Professional:
Re: [HACKERS] PoC: full merge join on comparison clause
On Tue, Jul 10, 2018 at 12:05 AM, Alexander Kuzmenkov wrote: > On 07/09/2018 04:12 PM, Ashutosh Bapat wrote: >> >> On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat >> wrote: >>> >>> I will continue reviewing the patches. >>> >> Here are some more review comments > > > > Ashutosh, > > Many thanks for the review, I'm glad that we are continuing with this patch. > I'm working on your comments now, will post the updated version this week. While updating the patches, please consider adding some comments as to why only single inequality clause supported. I didn't see comments in the patch explaining that. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] PoC: full merge join on comparison clause
On 07/09/2018 04:12 PM, Ashutosh Bapat wrote: On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat wrote: I will continue reviewing the patches. Here are some more review comments Ashutosh, Many thanks for the review, I'm glad that we are continuing with this patch. I'm working on your comments now, will post the updated version this week. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] PoC: full merge join on comparison clause
On Fri, Jul 6, 2018 at 6:31 PM, Ashutosh Bapat wrote: > > I will continue reviewing the patches. > Here are some more review comments - * sort ordering for each merge key. The mergejoinable operator is an - * equality operator in the opfamily, and the two inputs are guaranteed to be + * sort ordering for each merge key. The mergejoinable operator is a + * comparison operator in the opfamily, and the two inputs are guaranteed to be I think this prologue has to change substantially. At the beginning of the prologue it explicitly mentions clauses like leftexpr = rightexpr. That needs to be changed. * ordered in either increasing or decreasing (respectively) order according It looks like the order of inputs is constrained by the in-equality operator. That too needs to be specified here. * This allows us to obtain the needed comparison function from the opfamily. @@ -200,6 +200,9 @@ MJExamineQuals(List *mergeclauses, Oidop_righttype; Oidsortfunc; +if (parent->mj_Ineq_Present) +elog(ERROR, "inequality mergejoin clause must be the last one"); + IIUC, this never happens. If it really happens, we have created a path which can not be used practically. That should never happen. It will help to add a comment here clarifying that situation. +boolhave_inequality = have_inequality_mergeclause(mergeclauses); There will be many paths created with different ordering of pathkeys. So, instead of calling have_inequality_mergeclause() for each of those paths, it's better to save this status in the path itself when creating the path. /* Make a pathkey list with this guy first */ if (l != list_head(all_pathkeys)) +{ +if (have_inequality && l == list_tail(all_pathkeys)) +/* Inequality merge clause must be the last, we can't move it */ +break; + I am kind of baffled by this change. IIUC the way we create different orderings of pathkeys here, we are just rotating the pathkeys in circular order. This means there is exactly one ordering of pathkeys where the pathkey corresponding to the inequality clause is the last one. It's only that ordering which will be retained and all other ordering will be discarded. Instead of that, I think we should keep the pathkey corresponding to the inequality clause at the end (or track in separately) and create different orderings of pathkeys by rotating other pathkeys. This will allow us to cost the orderings as intended by this fucntion. /* Might have no mergeclauses */ if (nClauses == 0) return NIL; +{ +List *ineq_clauses = find_inequality_clauses(mergeclauses); + +if (list_length(ineq_clauses) > 1) +return NIL; Without this patch, when there is an inequality clause with FULL JOIN, we will not create a merge join path because select_mergejoin_clauses() will set mergejoin_allowed to false. This means that we won't call sort_inner_and_outer(). I think this patch also has to do the same i.e. when there are more than one inequality clauses, select_mergejoin_clauses() should set mergejoin_allowed to false in case of a FULL JOIN since merge join machinary won't be able to handle that case. If we do that, we could arrange extra.mergeclause_list such that the inequality clause is always at the end thus finding inequality clause would be easy. Again, this is not full review, but I am diving deeper into the patch-set and understanding it better. Sorry. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] PoC: full merge join on comparison clause
Hi, I have started reviewing these patches. I haven't grasped the design yet. But here are some comments on the first patch. -clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); +parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); crosses 80 characters. -StrategyNumber opstrategy = mergestrategies[iClause]; +StrategyNumber sort_strategy = mergestrategies[iClause]; -intop_strategy; +intjoin_strategy; I don't see a reason why should we change the name of variable here. These are operator strategies and there's no need to change their names. The name change is introducing unnecessary diffs. +clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent); +clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent); cross 80 characters. /* @@ -378,20 +375,29 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot) return result; } +/* Tuple comparison result */ +typedef enum +{ +MJCR_NextInner = 1, +MJCR_NextOuter = -1, +MJCR_Join = 0 +} MJCompareResult; + /* * MJCompare * - * Compare the mergejoinable values of the current two input tuples - * and return 0 if they are equal (ie, the mergejoin equalities all - * succeed), >0 if outer > inner, <0 if outer < inner. + * Compare the mergejoinable values of the current two input tuples. + * If they are equal, i.e., the mergejoin equalities all succeed, + * return MJCR_Join, if outer > inner, MJCR_NextInner, and else + * MJCR_NextOuter. * * MJEvalOuterValues and MJEvalInnerValues must already have been called * for the current outer and inner tuples, respectively. */ -static int +static MJCompareResult MJCompare(MergeJoinState *mergestate) { I am not sure about this change as well. MJCompare()'s job is to compare given keys in the two tuples and return the comparison result. The result was used as it is to decide which side to advance in an equality based merge join. But for inequality based merge join the result needs to be interpreted further. I think we should write a wrapper around MJCompare which interprets the result rather than changing MJCompare itself. OR at least change the name of MJCompare. The first option is better in case we use MJCompare for purposes other than merge join in future. I am not sure what those could be, but say a merge based union or something like that. /* * Sweep through the existing EquivalenceClasses looking for matches to @@ -273,7 +274,7 @@ process_equivalence(PlannerInfo *root, /* * A "match" requires matching sets of btree opfamilies. Use of * equal() for this test has implications discussed in the comments - * for get_mergejoin_opfamilies(). + * for get_equality_opfamilies(). I think we should leave mergejoin word in there or at least indicate that these are btree opfamilies so that we don't confuse it with hash equality operator families. It will be good if you can write something about why these changes are required in the file. If you are using git format-patch, you could write a commit message that gets added to the patch. That way, it leaves there for anybody to review. I am having a difficult time reading the next patch. There are various changes in the second patch, which I don't understand the reason behind. I think some comments will help, in as commit message or in the code. I will continue reviewing the patches. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] PoC: full merge join on comparison clause
On 05.03.2018 08:30, Ashutosh Bapat wrote: But creating such patches using git format-patch (with -v as some suggest) really helps in general. Thanks for the advice. I heard about this workflow, but never used it myself. Perhaps it's time to try it. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] PoC: full merge join on comparison clause
On Fri, Mar 2, 2018 at 8:02 PM, Alexander Kuzmenkovwrote: > On 22.02.2018 21:42, Alexander Kuzmenkov wrote: >> >> >> Some basic joins work, but I couldn't properly test all the corner cases >> with different orderings, because they depend on a bug in vanilla merge >> joins [1]. >> > > The bug was fixed, so here is the rebased patch. The planner part of the > patch is stable now and can be reviewed, too. > Both the patches are named 01. Their names tell the order in which they need to be applied, so it's ok for these patches. But creating such patches using git format-patch (with -v as some suggest) really helps in general. All you need to do is prepare commits in your repository, one per patch, including changes in each patch in separate commits and then run git format-patch on that repository. I use git format-patch @{upstream}, but there are other ways also. Then you can use git rebase to rebase your patches periodically. If you are already doing that, sorry for the noise. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] PoC: full merge join on comparison clause
On 22.02.2018 21:42, Alexander Kuzmenkov wrote: Some basic joins work, but I couldn't properly test all the corner cases with different orderings, because they depend on a bug in vanilla merge joins [1]. The bug was fixed, so here is the rebased patch. The planner part of the patch is stable now and can be reviewed, too. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index f50205ec8a..861327b928 100644 --- a/src/backend/executor/nodeMergejoin.c +++ b/src/backend/executor/nodeMergejoin.c @@ -166,8 +166,8 @@ typedef enum * In addition to the expressions themselves, the planner passes the btree * opfamily OID, collation OID, btree strategy number (BTLessStrategyNumber or * BTGreaterStrategyNumber), and nulls-first flag that identify the intended - * sort ordering for each merge key. The mergejoinable operator is an - * equality operator in the opfamily, and the two inputs are guaranteed to be + * sort ordering for each merge key. The mergejoinable operator is a + * comparison operator in the opfamily, and the two inputs are guaranteed to be * ordered in either increasing or decreasing (respectively) order according * to the opfamily and collation, with nulls at the indicated end of the range. * This allows us to obtain the needed comparison function from the opfamily. @@ -200,6 +200,9 @@ MJExamineQuals(List *mergeclauses, Oid op_righttype; Oid sortfunc; + if (parent->mj_Ineq_Present) + elog(ERROR, "inequality mergejoin clause must be the last one"); + if (!IsA(qual, OpExpr)) elog(ERROR, "mergejoin clause is not an OpExpr"); @@ -225,9 +228,40 @@ MJExamineQuals(List *mergeclauses, _strategy, _lefttype, _righttype); - if (join_strategy != BTEqualStrategyNumber) /* should not happen */ - elog(ERROR, "cannot merge using non-equality operator %u", - qual->opno); + + /* + * Determine whether we accept lesser and/or equal tuples of the inner + * relation. + */ + if (join_strategy != BTEqualStrategyNumber) + { + parent->mj_Ineq_Present = true; + switch (join_strategy) + { +case BTLessEqualStrategyNumber: + parent->mj_Ineq_JoinEqual = true; + /* fall through */ +case BTLessStrategyNumber: + parent->mj_Ineq_JoinLesser = true; + if (sort_strategy != BTGreaterStrategyNumber) + elog(ERROR, "join strategy %d is not compatible with sort strategy %d", + join_strategy, sort_strategy); + break; + +case BTGreaterEqualStrategyNumber: + parent->mj_Ineq_JoinEqual = true; + /* fall through */ +case BTGreaterStrategyNumber: + parent->mj_Ineq_JoinLesser = true; + if (sort_strategy != BTLessStrategyNumber) + elog(ERROR, "join strategy %d is not compatible with sort strategy %d", + join_strategy, sort_strategy); + break; + +default: + elog(ERROR, "unsupported join strategy %d", join_strategy); + } + } /* * sortsupport routine must know if abbreviation optimization is @@ -415,6 +449,19 @@ MJCompare(MergeJoinState *mergestate) { MergeJoinClause clause = >mj_Clauses[i]; int sort_result; + bool join_equal = true; + bool join_lesser = false; + + if (mergestate->mj_Ineq_Present && i == mergestate->mj_NumClauses - 1) + { + /* + * If the last merge clause is an inequality, check whether + * we have to join the inner tuples that are less than outer + * and/or equal to outer. + */ + join_equal = mergestate->mj_Ineq_JoinEqual; + join_lesser = mergestate->mj_Ineq_JoinLesser; + } /* * Special case for NULL-vs-NULL, else use standard comparison. @@ -429,8 +476,22 @@ MJCompare(MergeJoinState *mergestate) clause->rdatum, clause->risnull, >ssup); - result = sort_result == 0 ? MJCR_Join - : sort_result < 0 ? MJCR_NextOuter : MJCR_NextInner; + if (sort_result < 0) + result = MJCR_NextOuter; + else if (sort_result == 0) + { + if (join_equal) +result = MJCR_Join; + else +result = MJCR_NextOuter; + } + else /* sort_result > 0 */ + { + if (join_lesser) +result = MJCR_Join; + else +result = MJCR_NextInner; + } if (result != MJCR_Join) break; diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index d8db0b29e1..9d3f177622 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2797,6 +2797,24 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace, } /* + * Check whether there is an inequality clause in the list + */ +static bool +have_inequality_mergeclause(List *mergeclauses) +{ + ListCell *lc; + + foreach(lc, mergeclauses) + { + RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); + Assert(rinfo->mergeopfamilies != NIL); + if (!rinfo->is_mj_equality) + return true; + } + return
Re: [HACKERS] PoC: full merge join on comparison clause
Here are some updates on this patch. I split it into two parts. The preparatory part contains some mechanical changes to prepare for the main part. Most importantly, a new field is added, `RestrictInfo.is_mj_equality`. It is a marker of mergejoinable equality clauses, and `RestrictInfo.mergeopfamilies` is a more general marker of clauses that are mergejoinable but not necessarily equality. The usages are changed accordingly. The main part consists of executor and planner changes required to support inequality merge joins. The executor changes are as described in the original post. The planner part has changed significantly since the last version. It used to apply some shady hacks to ensure we have the required sort orders of inner and outer paths. Now I think I found a reasonable way to generate the pathkeys we need. When we sort outer relation in `sort_inner_and_outer()`, the pathkeys are generated by `select_outer_pathkeys_for_merge()`. When we use the pathkeys we already have for the outer relation in `match_unsorted_outer()`, mergeclauses are selected by `find_mergeclauses_for_pathkeys()`. I changed these functions to select the right pathkey direction for merge clauses, and also ensure that we only have a single inequality merge clause and it is the last one. Also, to use the index paths, I changed `pathkeys_useful_for_merging()` to keep both pathkey directions for inequality merge clauses. Some basic joins work, but I couldn't properly test all the corner cases with different orderings, because they depend on a bug in vanilla merge joins [1]. To sum up, the preparatory and executor changes are stable, and the planner part is WIP. 1. https://www.postgresql.org/message-id/flat/5dad9160-4632-0e47-e120-8e2082000...@postgrespro.ru -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index f50205ec8a..861327b928 100644 --- a/src/backend/executor/nodeMergejoin.c +++ b/src/backend/executor/nodeMergejoin.c @@ -166,8 +166,8 @@ typedef enum * In addition to the expressions themselves, the planner passes the btree * opfamily OID, collation OID, btree strategy number (BTLessStrategyNumber or * BTGreaterStrategyNumber), and nulls-first flag that identify the intended - * sort ordering for each merge key. The mergejoinable operator is an - * equality operator in the opfamily, and the two inputs are guaranteed to be + * sort ordering for each merge key. The mergejoinable operator is a + * comparison operator in the opfamily, and the two inputs are guaranteed to be * ordered in either increasing or decreasing (respectively) order according * to the opfamily and collation, with nulls at the indicated end of the range. * This allows us to obtain the needed comparison function from the opfamily. @@ -200,6 +200,9 @@ MJExamineQuals(List *mergeclauses, Oid op_righttype; Oid sortfunc; + if (parent->mj_Ineq_Present) + elog(ERROR, "inequality mergejoin clause must be the last one"); + if (!IsA(qual, OpExpr)) elog(ERROR, "mergejoin clause is not an OpExpr"); @@ -225,9 +228,40 @@ MJExamineQuals(List *mergeclauses, _strategy, _lefttype, _righttype); - if (join_strategy != BTEqualStrategyNumber) /* should not happen */ - elog(ERROR, "cannot merge using non-equality operator %u", - qual->opno); + + /* + * Determine whether we accept lesser and/or equal tuples of the inner + * relation. + */ + if (join_strategy != BTEqualStrategyNumber) + { + parent->mj_Ineq_Present = true; + switch (join_strategy) + { +case BTLessEqualStrategyNumber: + parent->mj_Ineq_JoinEqual = true; + /* fall through */ +case BTLessStrategyNumber: + parent->mj_Ineq_JoinLesser = true; + if (sort_strategy != BTGreaterStrategyNumber) + elog(ERROR, "join strategy %d is not compatible with sort strategy %d", + join_strategy, sort_strategy); + break; + +case BTGreaterEqualStrategyNumber: + parent->mj_Ineq_JoinEqual = true; + /* fall through */ +case BTGreaterStrategyNumber: + parent->mj_Ineq_JoinLesser = true; + if (sort_strategy != BTLessStrategyNumber) + elog(ERROR, "join strategy %d is not compatible with sort strategy %d", + join_strategy, sort_strategy); + break; + +default: + elog(ERROR, "unsupported join strategy %d", join_strategy); + } + } /* * sortsupport routine must know if abbreviation optimization is @@ -415,6 +449,19 @@ MJCompare(MergeJoinState *mergestate) { MergeJoinClause clause = >mj_Clauses[i]; int sort_result; + bool join_equal = true; + bool join_lesser = false; + + if (mergestate->mj_Ineq_Present && i == mergestate->mj_NumClauses - 1) + { + /* + * If the last merge clause is an inequality, check whether + * we have to join the inner tuples that are less
Re: [HACKERS] PoC: full merge join on comparison clause
Greetings Alexander, * Alexander Kuzmenkov (a.kuzmen...@postgrespro.ru) wrote: > Here is the patch rebased to a852cfe9. Thanks for updating it. This would definitely be nice to have. Ashutosh, thanks for your previous review, would you have a chance to look at it again? Would be great to at least get this to ready for committer before the end of this CF. Thanks! Stephen signature.asc Description: PGP signature
Re: [HACKERS] PoC: full merge join on comparison clause
Here is the patch rebased to a852cfe9. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index ef9e1ee471..c842ed2968 100644 --- a/src/backend/executor/nodeMergejoin.c +++ b/src/backend/executor/nodeMergejoin.c @@ -172,31 +172,32 @@ typedef enum * to the opfamily and collation, with nulls at the indicated end of the range. * This allows us to obtain the needed comparison function from the opfamily. */ -static MergeJoinClause +static void MJExamineQuals(List *mergeclauses, Oid *mergefamilies, Oid *mergecollations, int *mergestrategies, bool *mergenullsfirst, - PlanState *parent) + MergeJoinState *parent) { - MergeJoinClause clauses; int nClauses = list_length(mergeclauses); int iClause; ListCell *cl; - clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); + parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); + parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool)); + parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool)); iClause = 0; foreach(cl, mergeclauses) { OpExpr *qual = (OpExpr *) lfirst(cl); - MergeJoinClause clause = [iClause]; + MergeJoinClause clause = >mj_Clauses[iClause]; Oid opfamily = mergefamilies[iClause]; Oid collation = mergecollations[iClause]; - StrategyNumber opstrategy = mergestrategies[iClause]; + StrategyNumber sort_op_strategy = mergestrategies[iClause]; bool nulls_first = mergenullsfirst[iClause]; - int op_strategy; + int join_op_strategy; Oid op_lefttype; Oid op_righttype; Oid sortfunc; @@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses, /* * Prepare the input expressions for execution. */ - clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent); - clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent); + clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent); + clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent); /* Set up sort support data */ clause->ssup.ssup_cxt = CurrentMemoryContext; clause->ssup.ssup_collation = collation; - if (opstrategy == BTLessStrategyNumber) + if (sort_op_strategy == BTLessStrategyNumber) clause->ssup.ssup_reverse = false; - else if (opstrategy == BTGreaterStrategyNumber) + else if (sort_op_strategy == BTGreaterStrategyNumber) clause->ssup.ssup_reverse = true; else /* planner screwed up */ - elog(ERROR, "unsupported mergejoin strategy %d", opstrategy); + elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy); clause->ssup.ssup_nulls_first = nulls_first; /* Extract the operator's declared left/right datatypes */ get_op_opfamily_properties(qual->opno, opfamily, false, - _strategy, + _op_strategy, _lefttype, _righttype); - if (op_strategy != BTEqualStrategyNumber) /* should not happen */ - elog(ERROR, "cannot merge using non-equality operator %u", - qual->opno); + + /* + * Determine whether we accept lesser and/or equal tuples of the inner + * relation. + */ + switch (join_op_strategy) + { + case BTEqualStrategyNumber: +parent->mj_UseEqual[iClause] = true; +break; + + case BTLessEqualStrategyNumber: +parent->mj_UseEqual[iClause] = true; +/* fall through */ + + case BTLessStrategyNumber: +parent->mj_UseLesser[iClause] = true; +break; + + case BTGreaterEqualStrategyNumber: +parent->mj_UseEqual[iClause] = true; +/* fall through */ + + case BTGreaterStrategyNumber: +parent->mj_UseLesser[iClause] = true; +break; + + default: +elog(ERROR, "unsupported join strategy %d", join_op_strategy); + } /* * sortsupport routine must know if abbreviation optimization is @@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses, iClause++; } - - return clauses; } /* @@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot) return result; } +/* Tuple comparison result */ +typedef enum +{ + MJCR_NextInner = 1, + MJCR_NextOuter = -1, + MJCR_Join = 0 +} MJCompareResult; + /* * MJCompare * @@ -388,10 +422,10 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot) * MJEvalOuterValues and MJEvalInnerValues must already have been called * for the current outer and inner tuples, respectively. */ -static int +static MJCompareResult MJCompare(MergeJoinState *mergestate) { - int result = 0; + MJCompareResult result = MJCR_Join; bool nulleqnull = false; ExprContext *econtext = mergestate->js.ps.ps_ExprContext; int i; @@ -408,6 +442,7 @@ MJCompare(MergeJoinState *mergestate) for (i = 0; i < mergestate->mj_NumClauses; i++) { MergeJoinClause clause = >mj_Clauses[i]; + int