Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23932
to look at the new patch set (#6).
Change subject: IMPALA-12374: Optimize trailing/leading % in LIKE
......................................................................
IMPALA-12374: Optimize trailing/leading % in LIKE
When converting a LIKE pattern with a leading %, a trailing %, or both
to a regular expression, trim the corresponding '.*' and use a partial
match in re2 (with anchors as necessary), instead of a full match that
includes the leading or trailing '.*'.
Note that this optimization only concerns more complex patterns,
e.g. '%a%b%'.
Patterns where the trimmed pattern is a fixed string already use more
optimized checks, like a string search, e.g. '%abc%'.
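The trimming described above can be sketched as follows. This is a
minimal illustration, not the code in like-predicate.cc: the helper
names LikeToPartialRegex and LikePartialMatch are hypothetical, and
std::regex_search stands in for a re2 partial match so the example has
no external dependency. It also assumes no LIKE escape character and no
newlines in the input.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not Impala's implementation): converts a LIKE pattern
// into a regex intended for partial (search) matching. Leading and trailing
// '%' are dropped instead of becoming ".*"; an anchor is added on each side
// that had no '%'. Simplification: LIKE escape characters are not handled.
std::string LikeToPartialRegex(const std::string& like) {
  std::size_t begin = 0;
  std::size_t end = like.size();
  bool leading = false;
  bool trailing = false;
  while (begin < end && like[begin] == '%') { leading = true; ++begin; }
  while (end > begin && like[end - 1] == '%') { trailing = true; --end; }
  // A pattern made only of '%' matches everything, so it needs no anchors.
  if (begin == end && leading) trailing = true;

  static const std::string kSpecial = ".^$|()[]{}*+?\\";
  std::string re;
  if (!leading) re += '^';
  for (std::size_t i = begin; i < end; ++i) {
    const char c = like[i];
    if (c == '%') {
      re += ".*";   // inner '%' still matches any run of characters
    } else if (c == '_') {
      re += '.';    // '_' matches exactly one character
    } else {
      if (kSpecial.find(c) != std::string::npos) re += '\\';
      re += c;      // literal character, escaped if it is a regex operator
    }
  }
  if (!trailing) re += '$';
  return re;
}

// Hypothetical helper: evaluates LIKE through a partial match on the trimmed
// regex. std::regex_search is used here in place of a re2 partial match.
bool LikePartialMatch(const std::string& value, const std::string& like) {
  const std::regex re(LikeToPartialRegex(like));
  return std::regex_search(value, re);
}
```

With this sketch, '%a%b%' becomes 'a.*b' with no anchors, 'a%b%' becomes
'^a.*b', and '%a%b' becomes 'a.*b$', so the matcher no longer has to
consume the part of the string covered by the trimmed %.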
This optimization can make LIKE matching faster, especially if the
trimmed % covers a long part of the string matched.
The performance gain is highest with both a leading and a trailing %,
and lowest with only a trailing %.
In expr-benchmark.cc, a new function BenchmarkLikeRegexp was added to
compare LIKE and regexp_like, especially in the relevant cases.
In these tests, a string of 100 characters is used to match the
trailing/leading % wildcard.
Before the change, the performance of the test cases was:
Function                   iters/ms   10%ile   50%ile   90%ile
--------------------------------------------------------------
like                                    10.7     10.8     10.9
regex                                   10.7     10.8     10.9
leading like                            18.8       19     19.1
leading regex                           68.4     69.4     69.9
trailing like                           16.2     16.3     16.6
trailing regex                          18.6     18.9     19.1
trailing leading like                   9.56      9.6     9.77
trailing leading regex                  63.5     64.3     65.1
After the change, the performance of LIKE and regexp_like is about the
same in the relevant cases.
Change-Id: I37b472e056f791035d25633f17ad8a6e841cdd18
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/like-predicate.h
3 files changed, 102 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/23932/6
--
To view, visit http://gerrit.cloudera.org:8080/23932
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37b472e056f791035d25633f17ad8a6e841cdd18
Gerrit-Change-Number: 23932
Gerrit-PatchSet: 6
Gerrit-Owner: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>