Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15278 )

Change subject: IMPALA-7784: Use unescaped string in partition pruning + fix 
duplicatedly unescaping strings
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15278/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15278/3//COMMIT_MSG@17
PS3, Line 17: Tests
HdfsPartitionTest seems to also include related tests.


http://gerrit.cloudera.org:8080/#/c/15278/3//COMMIT_MSG@18
PS3, Line 18:  - Add tests for partition pruning on unescaped strings.
This may be out of scope, but interop tests with Hive could be also useful, 
e.g. can we read each other's partitions and views (with predicates with 
escaped strings).


http://gerrit.cloudera.org:8080/#/c/15278/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/15278/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@42
PS3, Line 42:   private final boolean needsUnescaping_;
optional: I think that the ideal solution would be to have an enum with the 
possible sources of strings, e.g. from double/single quoted SQL, HMS view, or 
the result of an evaluated const expression. This would allow us to check 
whether the assumptions about the source are true + do only the necessary 
conversions.


http://gerrit.cloudera.org:8080/#/c/15278/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@71
PS3, Line 71:   // TODO: should we consider needsUnescaping_ in hashing?
tl;dr: I think that we are ok

hashCode() should match with equals, and equals() is implemented in Expr, and 
is based on matches(), which calls localEquals() in the end:

https://github.com/apache/impala/blob/e0a98df3fabb84cae8c355d046c1aaa14e5bab25/fe/src/main/java/org/apache/impala/analysis/Expr.java#L887

https://github.com/apache/impala/blob/e0a98df3fabb84cae8c355d046c1aaa14e5bab25/fe/src/main/java/org/apache/impala/analysis/Expr.java#L866

I think that we generally violate the a.equals(b)==true -> 
a.hashCode()=b.hashCode() rule, as two Exprs can be equal even if they have 
different types, as equals() ignores implicit casts, but their hashCodes() will 
be totally different. This doesn't seem to be a problem, as I didn't see any 
Map<Expr>

If we only consider StringLiterals, then I think we are ok: localEquals() needs 
all properties (including value_) to be equal, which means that their hashCodes 
will be the same. Adding additional members to hashCode() would just "improve" 
hashing if value_ is the same but other variables differ, which is not a use 
case IMO.

LiteralExpr also shouldn't cause problems, as all subclasses override 
localEquals() and expect the type to be the same.


http://gerrit.cloudera.org:8080/#/c/15278/3/testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test:

http://gerrit.cloudera.org:8080/#/c/15278/3/testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test@25
PS3, Line 25: select * from functional.part_strings_with_quotes where p = "\""
Can you add conjunct with an expression? e.g. AND p=concat("", '"').
The plan should be the same with expression rewrite, as the analyzer should 
discover that the const string literals are the same.



--
To view, visit http://gerrit.cloudera.org:8080/15278
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8070f16a74f9aeade294504f2834abb8b3b38f
Gerrit-Change-Number: 15278
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Mon, 24 Feb 2020 15:12:55 +0000
Gerrit-HasComments: Yes

Reply via email to