Taras Bobrovytsky has uploaded this change for review.
( http://gerrit.cloudera.org:8080/10910 )
Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................
IMPALA-2422: Fix escaping in the LIKE clause
There are two stages to processing a LIKE clause. First, we determine
whether the expression can be converted to a simpler function, such as
StartsWith() or EndsWith(). If not, we use a regex library to compute
the result.
The logic that determines whether a simpler function can be used did
not take escape characters into account. For example, "abc\%" was
incorrectly converted to StartsWith("abc\"), even though the escaped %
is a literal character rather than a wildcard.
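The escape-aware check described above can be sketched roughly as follows.
This is an illustrative Python sketch, not the code in like-predicate.cc:
the function name classify_like_pattern, the returned labels, and the
handling of a trailing lone escape character are all assumptions made for
the example.

```python
def classify_like_pattern(pattern, escape="\\"):
    """Classify a LIKE pattern as a simple match or a regex fallback.

    Hypothetical helper: returns (kind, literal), where kind is one of
    "equals", "starts_with", "ends_with", "contains", or "regex".
    """
    # Tokenize first, so an escaped % or _ is recorded as a literal.
    tokens = []  # list of (is_wildcard, char)
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            tokens.append((False, pattern[i + 1]))
            i += 2
        else:
            # Assumption: a trailing lone escape is kept as a literal.
            tokens.append((ch in "%_", ch))
            i += 1

    # Strip unescaped % wildcards from both ends.
    start, end = 0, len(tokens)
    while start < end and tokens[start] == (True, "%"):
        start += 1
    while end > start and tokens[end - 1] == (True, "%"):
        end -= 1

    middle = tokens[start:end]
    if any(is_wild for is_wild, _ in middle):
        return ("regex", None)  # wildcard in the middle: needs the regex path

    literal = "".join(c for _, c in middle)
    leading, trailing = start > 0, end < len(tokens)
    if leading and trailing:
        return ("contains", literal)
    if trailing:
        return ("starts_with", literal)
    if leading:
        return ("ends_with", literal)
    return ("equals", literal)
```

With this sketch, the pattern abc\% is classified as an exact match on
"abc%" rather than a prefix match, while abc% is still a prefix match on
"abc".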
A second problem was that strings were always unescaped in the
frontend. The RE2 regex function also unescapes the regex before
proceeding, so regexes were unescaped twice, which made some patterns
ambiguous. For example, "abc\%" and "abc\\%" were both unescaped in the
frontend to the same pattern, "abc\%", which was then sent to the
backend. The backend could not tell whether this pattern was an exact
match or a prefix match. To fix this problem, we no longer unescape
regex patterns in the frontend.
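The ambiguity can be illustrated with a small Python sketch. The function
unescape_once is hypothetical and models only one assumed rule (an escaped
backslash collapses to a single backslash, other sequences are left
alone); the frontend's actual unescaping rules may differ.

```python
def unescape_once(s, escape="\\"):
    # Hypothetical model of one unescaping pass: collapse a doubled
    # escape character into one and leave every other sequence untouched.
    out = []
    i = 0
    while i < len(s):
        if s[i] == escape and i + 1 < len(s) and s[i + 1] == escape:
            out.append(escape)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


# Pattern text abc\% : the % is escaped, so this is an exact match on "abc%".
exact_match = "abc\\%"
# Pattern text abc\\% : escaped backslash followed by a wildcard, so this
# is a prefix match on "abc\".
prefix_match = "abc\\\\%"

# After one unescaping pass, both patterns become the same text, abc\%,
# so whoever receives it can no longer tell which one was written.
collapsed = {unescape_once(exact_match), unescape_once(prefix_match)}
```

Under these assumptions, collapsed contains a single string, which is why
a later stage that unescapes again cannot recover the original intent.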
Testing:
-Added expr tests.
Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
---
M be/src/exprs/expr-test.cc
M be/src/exprs/like-predicate.cc
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
5 files changed, 77 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10910/1
--
To view, visit http://gerrit.cloudera.org:8080/10910
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky <[email protected]>