On Thu, Jul 18, 2013 at 1:54 PM, Gurjeet Singh <gurj...@singh.im> wrote:

> On Thu, Jul 18, 2013 at 10:19 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> >> > Because simpler code is less likely to have bugs and is easier to >> > maintain. >> >> I agree with that point, but one should also remember Polya's Inventor's >> Paradox: the more general problem may be easier to solve. That is, if >> done right, code that fully flattens an AND tree might actually be >> simpler than code that does just a subset of the transformation. The >> current patch fails to meet this expectation, > > > The current patch does completely flatten any type of tree > (left/right-deep or bushy) without recursing, and right-deep and bushy tree > processing is what Robert is recommending to defer to recursive processing. > Maybe I haven't considered a case where it doesn't flatten the tree; do you > have an example in mind. > > >> but maybe you just haven't >> thought about it the right way. >> > I tried to eliminate the 'pending' list, but I don't see a way around it. We need temporary storage somewhere to store the branches encountered on the right; in recursion case the call stack was serving that purpose. > >> My concerns about this patch have little to do with that, though, and >> much more to do with the likelihood that it breaks some other piece of >> code that is expecting AND/OR to be strictly binary operators, which >> is what they've always been in parsetrees that haven't reached the >> planner. It doesn't appear to me that you've done any research on that >> point whatsoever > > > No, I haven't, and I might not be able to research it for a few more weeks. > There are about 30 files (including optimizer and executor) that match case-insensitive search for BoolExpr, and I scanned those for the usage of the member 'args'. All the instances where BoolExpr.args is being accessed, it's being treated as a null-terminated list. There's one exception that I could find, and it was in context of NOT expression: not_clause() in clauses.c. > > >> you have not even updated the comment for BoolExpr >> (in primnodes.h) that this patch falsifies. >> > > I will fix that. > I think this line in that comment already covers the fact that in some "special" cases a BoolExpr may have more than 2 arguments. "There are also a few special cases where more arguments can appear before optimization." I have updated the comment nevertheless, and removed another comment in parse_expr.c that claimed to be the only place where a BoolExpr with more than 2 args is generated. I have isolated the code for right-deep and bushy tree processing via the macro PROCESS_BUSHY_TREES. Also, I have shortened some variable names while retaining their meaning. Please find the updated patch attached (based on master). Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com <http://www.enterprisedb.com>

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 81c9338..eb35d70 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -41,8 +41,7 @@ bool Transform_null_equals = false; static Node *transformExprRecurse(ParseState *pstate, Node *expr); static Node *transformParamRef(ParseState *pstate, ParamRef *pref); static Node *transformAExprOp(ParseState *pstate, A_Expr *a); -static Node *transformAExprAnd(ParseState *pstate, A_Expr *a); -static Node *transformAExprOr(ParseState *pstate, A_Expr *a); +static Node *transformAExprAndOr(ParseState *pstate, A_Expr *a); static Node *transformAExprNot(ParseState *pstate, A_Expr *a); static Node *transformAExprOpAny(ParseState *pstate, A_Expr *a); static Node *transformAExprOpAll(ParseState *pstate, A_Expr *a); @@ -224,10 +223,10 @@ transformExprRecurse(ParseState *pstate, Node *expr) result = transformAExprOp(pstate, a); break; case AEXPR_AND: - result = transformAExprAnd(pstate, a); + result = transformAExprAndOr(pstate, a); break; case AEXPR_OR: - result = transformAExprOr(pstate, a); + result = transformAExprAndOr(pstate, a); break; case AEXPR_NOT: result = transformAExprNot(pstate, a); @@ -918,32 +917,102 @@ transformAExprOp(ParseState *pstate, A_Expr *a) return result; } +/* + * Transform the AND/OR trees non-recursively. + * + * The parser turns a list of consecutive AND expressions into a left-deep tree. + * + * a AND b AND c + * + * AND + * / \ + * AND c + * / \ + * a b + * + * For very long lists, it gets deep enough that processing it recursively causes + * check_stack_depth() to raise error and abort the query. Hence, it is necessary + * that we process these trees iteratively. + */ static Node * -transformAExprAnd(ParseState *pstate, A_Expr *a) +transformAExprAndOr(ParseState *pstate, A_Expr *a) { - Node *lexpr = transformExprRecurse(pstate, a->lexpr); - Node *rexpr = transformExprRecurse(pstate, a->rexpr); +#define PROCESS_BUSHY_TREES 1 + List *exprs = NIL; +#if PROCESS_BUSHY_TREES + List *pending = NIL; +#endif + Node *expr; + A_Expr *root = a; + A_Expr_Kind root_kind = a->kind; + char *root_name = (root_kind == AEXPR_AND ? "AND" : "OR"); + BoolExprType root_expr_type = (root_kind == AEXPR_AND ? AND_EXPR : OR_EXPR); + + do { + Node *tmp; - lexpr = coerce_to_boolean(pstate, lexpr, "AND"); - rexpr = coerce_to_boolean(pstate, rexpr, "AND"); + /* + * Follow the left links to walk the left-deep tree, and process all the + * right branches. We traverse down the left branch as long as we find + * an unbroken chain of node types that match the root node. Then we + * stop and process the rest of the tree in normal recursive manner. +#if PROCESS_BUSHY_TREES + * + * If a right branch is also the same kind of tree as the root, then + * append it to the 'pending' list. The pending list is also processed + * in this function call iteratively rather than recursively. This + * allows us to process bushy and even right-deep trees, not just + * left-deep trees. +#endif + */ + tmp = (Node*)a; + do { + a = (A_Expr*) tmp; - return (Node *) makeBoolExpr(AND_EXPR, - list_make2(lexpr, rexpr), - a->location); -} +#if PROCESS_BUSHY_TREES + if (IsA(a->rexpr, A_Expr) && ((A_Expr*)a->rexpr)->kind == root_kind) + { + pending = lappend(pending, a->rexpr); + } + else +#endif + { + expr = transformExprRecurse(pstate, a->rexpr); + expr = coerce_to_boolean(pstate, expr, root_name); + /* + * Use lcons instead of lappend to retain the order of + * processing of right branches close to the way it was in the + * case of recursive processing. + */ + exprs = lcons(expr, exprs); + } -static Node * -transformAExprOr(ParseState *pstate, A_Expr *a) -{ - Node *lexpr = transformExprRecurse(pstate, a->lexpr); - Node *rexpr = transformExprRecurse(pstate, a->rexpr); + tmp = a->lexpr; + } while (IsA(tmp, A_Expr) && ((A_Expr*)tmp)->kind == root_kind); - lexpr = coerce_to_boolean(pstate, lexpr, "OR"); - rexpr = coerce_to_boolean(pstate, rexpr, "OR"); + /* Now process the node in left branch that broke our chain. */ + expr = transformExprRecurse(pstate, a->lexpr); + expr = coerce_to_boolean(pstate, expr, root_name); + exprs = lcons(expr, exprs); - return (Node *) makeBoolExpr(OR_EXPR, - list_make2(lexpr, rexpr), - a->location); +#if PROCESS_BUSHY_TREES + /* + * Now that we're done processing the edge of the left-deep tree, pop + * the first element from the front of the pending list and process any + * interesting nodes we found earlier. + */ + if (list_length(pending) > 0) + { + a = (A_Expr*) linitial(pending); + pending = list_delete_first(pending); + } + else +#endif + a = NULL; + + } while(a != NULL); + + return (Node *) makeBoolExpr(root_expr_type, exprs, root->location); } static Node * @@ -2428,10 +2497,6 @@ make_row_comparison_op(ParseState *pstate, List *opname, /* * For = and <> cases, we just combine the pairwise operators with AND or * OR respectively. - * - * Note: this is presently the only place where the parser generates - * BoolExpr with more than two arguments. Should be OK since the rest of - * the system thinks BoolExpr is N-argument anyway. */ if (rctype == ROWCOMPARE_EQ) return (Node *) makeBoolExpr(AND_EXPR, opexprs, location); diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index 9cce60b..5b5c22b 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -458,12 +458,9 @@ typedef struct ScalarArrayOpExpr * BoolExpr - expression node for the basic Boolean operators AND, OR, NOT * * Notice the arguments are given as a List. For NOT, of course the list - * must always have exactly one element. For AND and OR, the executor can - * handle any number of arguments. The parser generally treats AND and OR - * as binary and so it typically only produces two-element lists, but the - * optimizer will flatten trees of AND and OR nodes to produce longer lists - * when possible. There are also a few special cases where more arguments - * can appear before optimization. + * must always have exactly one element. For AND and OR, there are two or + * more arguments in the list. The optimizer and the executor are capable + * of processing these flat lists. */ typedef enum BoolExprType { diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 6c51d0d..c3188a7 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -2117,7 +2117,7 @@ shoe_ready| SELECT rsh.shoename, int4smaller(rsh.sh_avail, rsl.sl_avail) AS total_avail FROM shoe rsh, shoelace rsl - WHERE (((rsl.sl_color = rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm)) AND (rsl.sl_len_cm <= rsh.slmaxlen_cm)); + WHERE ((rsl.sl_color = rsh.slcolor) AND (rsl.sl_len_cm >= rsh.slminlen_cm) AND (rsl.sl_len_cm <= rsh.slmaxlen_cm)); shoelace| SELECT s.sl_name, s.sl_avail, s.sl_color,

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers