Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from disjunctive CompoundPredicates.
Might be worth commenting on how this handles > 2 disjuncts. I.e. you need to 
apply the rule from the bottom-up.


Line 49:       if (child1Conjuncts.contains(conjunct)) {
This is n^2 so will get very slow if we have long lists of disjuncts, e.g. 
machine-generated queries. This is fine for small n but I think we should 
seriously consider switching to HashMaps for child1Conjuncts and 
commonConjuncts for n > some threshold. The core logic I think is the same if 
you factor it out into a function that operates on Collection<Expr>.


Line 72:     if (!child0Conjuncts.isEmpty()) {
At this point both child0Conjuncts and child1Conjuncts are non-empty because we 
would have returned early otherwise. So I think these branches are unnecessary.


PS1, Line 84: newDisjuncts.size() > 1
See above - this should always be true.


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
Slightly tangential, but should we apply De Morgan's first to normalise the 
disjunct into a conjunct? That would help if we had something like:

  (!(int_col=5 or tinyint_col > 9) and double_col = 7) or (!(int_col = 5) and 
!(tinyint_col > 9) and double_col = 8)

I guess in general it would be nice to do some basic normalisation of 
expressions in other cases, e.g. convert !(tinyint_col > 9) to tinyint_col <= 
9, convert 9 < tinyint_col to tinyint_col <= 9, etc.

I think that would make the common conjunct extraction a bit more robust. You 
can rewrite the query but avoiding that is the motivation for this in the first 
place, so it would be nice if it worked in some more simple cases.


Line 153:         "(int_col < 10 and id between 5 and 6) or " +
Does common conjunct extraction work if you have a BETWEEN expression and the 
equivalent expression after the rewrite? Should work if we apply the extraction 
after the BETWEEN rewrite, but would be nice to test.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to