Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/23932

to look at the new patch set (#9).

Change subject: IMPALA-12374: Optimize trailing/leading % in LIKE
......................................................................

IMPALA-12374: Optimize trailing/leading % in LIKE

When converting a LIKE pattern that contains a trailing %, a leading %,
or both to a regular expression, use a partial match in re2 (with
anchors as necessary) on the pattern with '.*' trimmed, instead of a
full match with a trailing or leading '.*'.
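
A minimal sketch of why the two forms are equivalent. It uses std::regex
as a stand-in for re2 (std::regex_match for a full match,
std::regex_search for a partial match) and assumes inputs without
newlines. The helper names FullMatch, PartialMatch and
CheckTrimmedPatterns are hypothetical and are not taken from
like-predicate.cc.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the whole input must match the pattern, as with a
// LIKE pattern whose % wildcards are all kept as '.*'.
bool FullMatch(const std::string& input, const std::string& pattern) {
  return std::regex_match(input, std::regex(pattern));
}

// Hypothetical helper: the pattern may match anywhere in the input unless
// it carries '^' or '$' anchors.
bool PartialMatch(const std::string& input, const std::string& pattern) {
  return std::regex_search(input, std::regex(pattern));
}

// Checks that trimming '.*' and switching to a partial match gives the
// same answer as the untrimmed full match.
void CheckTrimmedPatterns() {
  // LIKE '%a%b%': both % trimmed, no anchors needed.
  assert(FullMatch("xxaxxbxx", ".*a.*b.*") && PartialMatch("xxaxxbxx", "a.*b"));
  assert(!FullMatch("xxbxxaxx", ".*a.*b.*") && !PartialMatch("xxbxxaxx", "a.*b"));
  // LIKE 'a%b%': trailing % trimmed, '^' keeps the start anchored.
  assert(FullMatch("axxbxx", "a.*b.*") && PartialMatch("axxbxx", "^a.*b"));
  assert(!FullMatch("xaxxbxx", "a.*b.*") && !PartialMatch("xaxxbxx", "^a.*b"));
  // LIKE '%a%b': leading % trimmed, '$' keeps the end anchored.
  assert(FullMatch("xxaxxb", ".*a.*b") && PartialMatch("xxaxxb", "a.*b$"));
  assert(!FullMatch("xxaxxbx", ".*a.*b") && !PartialMatch("xxaxxbx", "a.*b$"));
}
```

The partial match lets the regex engine stop scanning once the trimmed
pattern has matched, instead of consuming the rest of the input for a
'.*' that can never fail.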

Note that this optimization only concerns more complex patterns,
e.g. '%a%b%'.
Patterns whose trimmed form is a fixed string, e.g. '%abc%', already
use more optimized checks, such as a substring search.
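
For contrast, the fixed-string case can be sketched as a plain substring
search with no regex involved. ContainsSubstring is a hypothetical name
used for illustration only, not a function from the Impala code base.

```cpp
#include <string>

// Hypothetical helper: evaluates LIKE '%needle%' when needle contains no
// wildcards, using a substring search instead of a regular expression.
bool ContainsSubstring(const std::string& input, const std::string& needle) {
  return input.find(needle) != std::string::npos;
}
```

A pattern such as '%a%b%' cannot be reduced this way, because the inner
% still needs wildcard matching.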

This optimization can make LIKE matching faster, especially if the
trimmed % covers a long part of the string matched.
The performance gain is highest with both leading and trailing %,
and the lowest with only a trailing %.

In expr-benchmark.cc, a new function BenchmarkLikeRegexp was added to
compare LIKE and regexp_like, focusing on the relevant cases.
In these tests, a string of 100 characters is used to match the
trailing/leading % wildcard.

Before the change, the test cases perform as follows:

                 Function  iters/ms   10%ile   50%ile   90%ile
--------------------------------------------------------------
                    like               10.7     10.8     10.9
                   regex               10.7     10.8     10.9
            leading like               18.8       19     19.1
           leading regex               68.4     69.4     69.9
           trailing like               16.2     16.3     16.6
          trailing regex               18.6     18.9     19.1
   trailing leading like               9.56      9.6     9.77
  trailing leading regex               63.5     64.3     65.1

After the change, the performance of LIKE and regexp_like is about the
same in the relevant cases:

                 Function  iters/ms   10%ile   50%ile   90%ile
--------------------------------------------------------------
                    like               10.7     10.8     10.9
                   regex               10.7     10.8     10.9
            leading like               67.9     68.7     69.3
           leading regex               67.4     68.3     69.1
           trailing like               18.5     18.9       19
          trailing regex               18.7     18.9     19.1
   trailing leading like               63.1     63.9     64.6
  trailing leading regex               63.5     63.9     64.8

Change-Id: I37b472e056f791035d25633f17ad8a6e841cdd18
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/like-predicate.h
3 files changed, 114 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/23932/9
--
To view, visit http://gerrit.cloudera.org:8080/23932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37b472e056f791035d25633f17ad8a6e841cdd18
Gerrit-Change-Number: 23932
Gerrit-PatchSet: 9
Gerrit-Owner: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Balazs Hevele <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>

Reply via email to