Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15278 )
Change subject: IMPALA-7784: Use unescaped string in partition pruning + fix duplicatedly unescaping strings ...................................................................... Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/15278/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15278/3//COMMIT_MSG@17 PS3, Line 17: Tests HdfsPartitionTest seems to also include related tests. http://gerrit.cloudera.org:8080/#/c/15278/3//COMMIT_MSG@18 PS3, Line 18: - Add tests for partition pruning on unescaped strings. This may be out of scope, but interop tests with Hive could be also useful, e.g. can we read each other's partitions and views (with predicates with escaped strings). http://gerrit.cloudera.org:8080/#/c/15278/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java: http://gerrit.cloudera.org:8080/#/c/15278/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@42 PS3, Line 42: private final boolean needsUnescaping_; optional: I think that the ideal solution would be to have an enum with the possible sources of strings, e.g. from double/single quoted SQL, HMS view, or the result of an evaluated const expression. This would allow us to check whether the assumptions about the source are true + do only the necessary conversions. http://gerrit.cloudera.org:8080/#/c/15278/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@71 PS3, Line 71: // TODO: should we consider needsUnescaping_ in hashing? tl;dr: I think that we are ok hashCode() should match with equals, and equals() is implemented in Expr, and is based on matches(), which calls localEquals() in the end: https://github.com/apache/impala/blob/e0a98df3fabb84cae8c355d046c1aaa14e5bab25/fe/src/main/java/org/apache/impala/analysis/Expr.java#L887 https://github.com/apache/impala/blob/e0a98df3fabb84cae8c355d046c1aaa14e5bab25/fe/src/main/java/org/apache/impala/analysis/Expr.java#L866 I think that we generally violate the a.equals(b)==true -> a.hashCode()=b.hashCode() rule, as two Exprs can be equal even if they have different types, as equals() ignores implicit casts, but their hashCodes() will be totally different. This doesn't seem to be a problem, as I didn't see any Map<Expr> If we only consider StringLiterals, then I think we are ok: localEquals() needs all properties (including value_) to be equal, which means that their hashCodes will be the same. Adding additional members to hashCode() would just "improve" hashing if value_ is the same but other variables differ, which is not a use case IMO. LiteralExpr also shouldn't cause problems, as all subclasses override localEquals() and expect the type to be the same. http://gerrit.cloudera.org:8080/#/c/15278/3/testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test File testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test: http://gerrit.cloudera.org:8080/#/c/15278/3/testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test@25 PS3, Line 25: select * from functional.part_strings_with_quotes where p = "\"" Can you add conjunct with an expression? e.g. AND p=concat("", '"'). The plan should be the same with expression rewrite, as the analyzer should discover that the const string literals are the same. -- To view, visit http://gerrit.cloudera.org:8080/15278 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea8070f16a74f9aeade294504f2834abb8b3b38f Gerrit-Change-Number: 15278 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Mon, 24 Feb 2020 15:12:55 +0000 Gerrit-HasComments: Yes
