Hi Richard,

I took a look at the v2, here are some comments:

* This feature needs tests, especially for the cases where opfamilies or data types or collations don't match, and other non-obvious cases where it shouldn't work.


* Deducing an inequality to a constant is not always helpful. If we know that a = b and a = const1 and b < const2, we will deduce b = const1 from the EC, and adding b < const2 doesn't improve selectivity and only makes the cost estimate worse. One situation where it does make sense is when we can detect contradictions in these clauses, e.g. if we know that const1 > const2 and therefore know that the above selection clause is always false. Looking at the regression test changes, I see that v2 doesn't do that. I think the handling of deduced inequalities shoud be modeled on the flow of generate_base_implied_equalities -> process_implied_equality -> distribute_qual_to_rels. This would allow us to correctly handle a deduced const1 < const2 clause and turn it into a gating Result qual.


* There are functions named generate_implied_quals, gen_implied_quals and gen_implied_qual. The names are confusingly similar, we could use something like generate_implied_quals_for_clause and generate_one_implied_qual for the latter two.


@@ gen_implied_quals
    else if (clause && IsA(clause, ScalarArrayOpExpr))
    {
* When can the clause be NULL?


@@ gen_implied_quals
    item1 = canonicalize_ec_expression(item1,
                                       exprType((Node *) item1),
                                       collation);
    item2 = canonicalize_ec_expression(item2,
                                       exprType((Node *) item2),
                                       collation);
* Why do we do this? If the collation or type of the original expression isn't right and we have to add RelabelType, the resulting expression won't match the original clause and we won't be able to substitute it with other equavalence members. So we can just check that the type and collation match.


* In gen_implied_quals, I'd rename item1 => left, item2 => right, em1 => orig_em, em2 => other_em, and same for list cells and types. As it is now, em1 can actually match both item1 and item2, and em2 is not related to either of them, so it took me some effort to understand what's going on.


* In gen_implied_qual, why do we search the clause for a matching subexpression? Reading the thread, I thought that we can only do the substitution for OpExprs of the same opfamilies as the generating EC. This code looks like it can do the substitution an at arbitrary depth, so we might change an argument of some unsuitable function, and the result will not be correct. What we should probably do is that after we matched one side of OpExpr to one EC member, we just replace it with another suitable member and add the resulting clause.


@@ gen_implied_qual
    check_mergejoinable(new_rinfo);
    check_hashjoinable(new_rinfo);
...
    /*
     * If the clause has a mergejoinable operator, set the EquivalenceClass
     * links. Otherwise, a mergejoinable operator with NULL left_ec/right_ec
     * will cause update_mergeclause_eclasses fails at assertion.
     */
    if (new_rinfo->mergeopfamilies)
        initialize_mergeclause_eclasses(root, new_rinfo);
* It's not an equality clause, so it is not going to be mergejoinable nor hashjoinable and we can skip these checks.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply via email to