Hi! Thank you for your detailed review, your changes have greatly helped
to improve this patch.
On 06.07.2023 13:20, Andrey Lepikhov wrote:
On 6/7/2023 03:06, Alena Rybakina wrote:
I corrected this constant in the patch.
The patch don't apply cleanly: it contains some trailing spaces.
I fixed it.
Also, quick glance into the code shows some weak points;
1. transformBoolExprOr should have input type BoolExpr.
Agreed.
2. You can avoid the switch operator at the beginning of the function,
because you only need one option.
Agreed.
3. Stale comments: RestrictIinfos definitely not exists at this point.
Yes, unfortunately, I missed this from the previous version when I tried
to perform such a transformation at the index creation stage.
4. I don't know, you really need to copy the expr or not, but it is
better to do as late, as possible.
Yes, I agree with you, copying "expr" is not necessary in this patch
5. You assume, that leftop is non-constant and rightop - constant. Why?
Agreed, It was too presumptuous on my part and I agree with your changes.
6.I doubt about equivalence operator. Someone can invent a custom '='
operator with another semantics, than usual. May be better to check
mergejoinability?
Yes, I agree with you, and I haven't thought about it before. But I
haven't found any functions to arrange this in PostgreSQL, but using
mergejoinability turns out to be more beautiful here.
7. I don't know how to confidently identify constant expressions at
this level. So, I guess, You can only merge here expressions like
"F(X)=Const", not an 'F(X)=ConstExpression'.
I see, you can find solution for this case, thank you for this, and I
think it's reliable enough.
On 07.07.2023 05:43, Andrey Lepikhov wrote:
On 6/7/2023 03:06, Alena Rybakina wrote:
The test was performed on the same benchmark database generated by 2
billion values.
I corrected this constant in the patch.
In attempt to resolve some issues had mentioned in my previous letter
I used op_mergejoinable to detect mergejoinability of a clause.
Constant side of the expression is detected by call of
eval_const_expressions() and check each side on the Const type of node.
See 'diff to diff' in attachment.
I notices you remove condition for checking equal operation.
strcmp(strVal(linitial((arg)->name)), "=") == 0
Firstly, it is noticed me not correct, but a simple example convinced me
otherwise:
postgres=# explain analyze select x from a where x=1 or x>5 or x<3 or x=2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=0.00..2291.00 rows=97899 width=4) (actual
time=0.038..104.168 rows=99000 loops=1)
Filter: ((x > '5'::numeric) OR (x < '3'::numeric) OR (x = ANY
('{1,2}'::numeric[])))
Rows Removed by Filter: 1000
Planning Time: 9.938 ms
Execution Time: 113.457 ms
(5 rows)
It surprises me that such check I can write such similar way:
eval_const_expressions(NULL, orqual).
Yes, I see we can remove this code:
bare_orarg = transformExprRecurse(pstate, (Node *)arg);
bare_orarg = coerce_to_boolean(pstate, bare_orarg, "OR");
because we will provide similar manipulation in this:
foreach(l, gentry->consts)
{
Node *rexpr = (Node *) lfirst(l);
rexpr = coerce_to_common_type(pstate, rexpr,
scalar_type,
"IN");
aexprs = lappend(aexprs, rexpr);
}
--
Regards,
Alena Rybakina
Postgres Professional
From 9134099ef9808ca27494bc131558d04b24d9bdeb Mon Sep 17 00:00:00 2001
From: Alena Rybakina <a.rybak...@postgrespro.ru>
Date: Thu, 29 Jun 2023 17:46:58 +0300
Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR
(X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are
still working with a tree expression. Firstly, we do not try to make a
transformation for "non-or" expressions or inequalities and the creation of a
relation with "or" expressions occurs according to the same scenario.
Secondly, we do not make transformations if there are less than 500 or
expressions. Thirdly, it is worth considering that we consider "or"
expressions only at the current level.
---
src/backend/parser/parse_expr.c | 269 ++++++++++++++++++++++++++++++-
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 269 insertions(+), 1 deletion(-)
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..961ca3e482c 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,273 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
static Node *make_nulltest_from_distinct(ParseState *pstate,
A_Expr *distincta, Node *arg);
+typedef struct OrClauseGroupEntry
+{
+ Node *node;
+ List *consts;
+ Oid scalar_type;
+ Oid opno;
+ Expr *expr;
+} OrClauseGroupEntry;
+
+static int const_transform_or_limit = 500;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, BoolExpr *expr_orig)
+{
+ List *or_list = NIL;
+ List *groups_list = NIL;
+ ListCell *lc;
+ Node *result;
+ BoolExpr *expr;
+ bool change_apply = false;
+
+ /* If this is not an 'OR' expression, skip the transformation */
+ if (expr_orig->boolop != OR_EXPR ||
+ list_length(expr_orig->args) < const_transform_or_limit)
+ return transformBoolExpr(pstate, (BoolExpr *) expr_orig);
+
+ foreach(lc, expr_orig->args)
+ {
+ Node *arg = lfirst(lc);
+ Node *orqual;
+ Node *const_expr;
+ Node *nconst_expr;
+ ListCell *lc_groups;
+ OrClauseGroupEntry *gentry;
+ bool const_is_left = true;
+
+ /*
+ * The first step: checking that the expression consists only of equality.
+ * We can only do this here, while arg is still row data type, namely A_Expr.
+ * After applying transformExprRecurce, we already know that it will be OpExr type,
+ * but checking the expression for equality is already becoming impossible for us.
+ * Sometimes we have the chance to devide expression into the groups on
+ * equality and inequality. This is possible if all list items are not at the
+ * same level of a single BoolExpr expression, otherwise all of them cannot be converted.
+ */
+
+ orqual = transformExprRecurse(pstate, (Node *) arg);
+ orqual = coerce_to_boolean(pstate, orqual, "OR");
+ orqual = eval_const_expressions(NULL, orqual);
+
+ if (!IsA(orqual, OpExpr))
+ {
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
+
+ /* Detect constant side of the clause */
+ if (IsA(get_leftop(orqual), Const))
+ const_is_left = true;
+ else if (IsA(get_rightop(orqual), Const))
+ const_is_left = false;
+ else
+ {
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
+
+ /* Get pointers to constant and expression sides of the qual */
+ nconst_expr = (const_is_left) ? get_rightop(orqual) :
+ get_leftop(orqual);
+ const_expr = (const_is_left) ? get_leftop(orqual) :
+ get_rightop(orqual);
+
+ if (!op_mergejoinable(((OpExpr *) orqual)->opno, exprType(nconst_expr)))
+ {
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
+
+ /*
+ * The next step: transform all the inputs, and detect whether
+ * any contain Vars.
+ */
+ if (!contain_vars_of_level(orqual, 0))
+ {
+ /* Again, it's not the expr we can transform */
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
+
+ ;
+
+ /*
+ * At this point we definitely have a transformable clause.
+ * Classify it and add into specific group of clauses, or create new
+ * group.
+ * TODO: to manage complexity in the case of many different clauses
+ * (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something
+ * like a hash table (htab key ???).
+ */
+ foreach(lc_groups, groups_list)
+ {
+ OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups);
+
+ Assert(v->node != NULL);
+
+ if (equal(v->node, nconst_expr))
+ {
+ v->consts = lappend(v->consts, const_expr);
+ nconst_expr = NULL;
+ break;
+ }
+ }
+
+ if (nconst_expr == NULL)
+ /*
+ * The clause classified successfully and added into existed
+ * clause group.
+ */
+ continue;
+
+ /* New clause group needed */
+ gentry = palloc(sizeof(OrClauseGroupEntry));
+ gentry->node = nconst_expr;
+ gentry->consts = list_make1(const_expr);
+ gentry->opno = ((OpExpr *)orqual)->opno;
+ gentry->expr = (Expr *)orqual;
+ groups_list = lappend(groups_list, (void *) gentry);
+ }
+
+ if (groups_list == NIL)
+ {
+ /*
+ * No any transformations possible with this rinfo, just add itself
+ * to the list and go further.
+ */
+ list_free(or_list);
+
+ return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+ }
+ else
+ {
+ ListCell *lc_args;
+
+ /* Let's convert each group of clauses to an IN operation. */
+
+ /*
+ * Go through the list of groups and convert each, where number of
+ * consts more than 1. trivial groups move to OR-list again
+ */
+
+ foreach(lc_args, groups_list)
+ {
+ OrClauseGroupEntry *gentry = (OrClauseGroupEntry *) lfirst(lc_args);
+ List *allexprs;
+ Oid scalar_type;
+ Oid array_type;
+
+ Assert(list_length(gentry->consts) > 0);
+
+ if (list_length(gentry->consts) == 1)
+ {
+ /*
+ * Only one element in the class. Return rinfo into the BoolExpr
+ * args list unchanged.
+ */
+ list_free(gentry->consts);
+ or_list = lappend(or_list, gentry->expr);
+ continue;
+ }
+
+ /*
+ * Do the transformation. It's been a long way ;)
+ *
+ * First of all, try to select a common type for the array elements. Note that
+ * since the LHS' type is first in the list, it will be preferred when
+ * there is doubt (eg, when all the RHS items are unknown literals).
+ *
+ * Note: use list_concat here not lcons, to avoid damaging rnonvars.
+ *
+ * As a source of insides, use make_scalar_array_op()
+ */
+ allexprs = list_concat(list_make1(gentry->node), gentry->consts);
+ scalar_type = select_common_type(NULL, allexprs, NULL, NULL);
+
+ if (OidIsValid(scalar_type) && scalar_type != RECORDOID)
+ array_type = get_array_type(scalar_type);
+ else
+ array_type = InvalidOid;
+
+ if (array_type != InvalidOid)
+ {
+ /*
+ * OK: coerce all the right-hand non-Var inputs to the common type
+ * and build an ArrayExpr for them.
+ */
+ List *aexprs;
+ ArrayExpr *newa;
+ ScalarArrayOpExpr *saopexpr;
+ ListCell *l;
+
+ aexprs = NIL;
+
+ foreach(l, gentry->consts)
+ {
+ Node *rexpr = (Node *) lfirst(l);
+
+ rexpr = coerce_to_common_type(pstate, rexpr,
+ scalar_type,
+ "IN");
+ aexprs = lappend(aexprs, rexpr);
+ }
+
+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = get_array_type(scalar_type);
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1; /* Position of the new clause is undefined */
+
+ saopexpr = (ScalarArrayOpExpr *)make_scalar_array_op(pstate,
+ list_make1(makeString((char *) "=")),
+ true,
+ gentry->node,
+ (Node *) newa,
+ -1); /* Position of the new clause is undefined */
+
+ /*
+ * TODO: here we can try to coerce the array to a Const and find
+ * hash func instead of linear search (see 50e17ad281b).
+ * convert_saop_to_hashed_saop((Node *) saopexpr);
+ * We don't have to do this anymore, do we?
+ */
+
+ or_list = lappend(or_list, (void *) saopexpr);
+ change_apply = true;
+ }
+ else
+ {
+ list_free(gentry->consts);
+ or_list = lappend(or_list, gentry->expr);
+ continue;
+ }
+ }
+
+ list_free_deep(groups_list);
+ }
+
+ if (change_apply)
+ {
+ /* One more trick: assemble correct clause */
+ expr = list_length(or_list) > 1 ? makeBoolExpr(OR_EXPR, list_copy(or_list), -1) : linitial(or_list);
+ result = (Node *)expr;
+ }
+ /*
+ * There was no reasons to create a new expresion, so
+ * run the original BoolExpr conversion with using
+ * transformBoolExpr function
+ */
+ else
+ result = transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+
+ list_free(or_list);
+
+ return result;
+}
/*
* transformExpr -
@@ -208,7 +475,7 @@ transformExprRecurse(ParseState *pstate, Node *expr)
}
case T_BoolExpr:
- result = transformBoolExpr(pstate, (BoolExpr *) expr);
+ result = (Node *)transformBoolExprOr(pstate, (BoolExpr *)expr);
break;
case T_FuncCall:
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e941fb6c82f..c3abb725c8c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1631,6 +1631,7 @@ NumericVar
OM_uint32
OP
OSAPerGroupState
+OrClauseGroupEntry
OSAPerQueryState
OSInfo
OSSLCipher
--
2.34.1