Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15913

to look at the new patch set (#5).

Change subject: [perf] Check range predicate first while evaluating Bloom 
filter predicate
......................................................................

[perf] Check range predicate first while evaluating Bloom filter predicate

A range predicate can be specified along with a Bloom filter predicate
on the same column. It's cheaper to check the value against the range
predicate first and exit early if it is out of bounds than to compute
the hash and then look the value up in the Bloom filter.
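
The ordering described above can be sketched roughly as follows. This is only
an illustration, not the code in column_predicate.h: ToyBloomFilter,
RangeAndBloomPredicate and CountMatches are hypothetical names, the hash
functions are placeholders, and the inclusive-lower/exclusive-upper bounds
are an assumption of the sketch. The filter counts its lookups so the effect
of the early exit is observable.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy Bloom filter for illustration only (hypothetical, not Kudu's filter).
// It counts lookups so the early exit in the predicate can be observed.
class ToyBloomFilter {
 public:
  void Insert(int64_t v) {
    bits_.set(Hash1(v));
    bits_.set(Hash2(v));
  }

  // May return false positives, never false negatives.
  bool MayContain(int64_t v) const {
    ++lookups_;
    return bits_.test(Hash1(v)) && bits_.test(Hash2(v));
  }

  size_t lookups() const { return lookups_; }

 private:
  static constexpr size_t kBits = 1024;

  // Placeholder hash functions; a real filter would use stronger hashing.
  static size_t Hash1(int64_t v) {
    return std::hash<int64_t>{}(v) % kBits;
  }
  static size_t Hash2(int64_t v) {
    const uint64_t h = static_cast<uint64_t>(std::hash<int64_t>{}(v));
    return static_cast<size_t>(((h * 0x9E3779B97F4A7C15ULL) >> 32) % kBits);
  }

  std::bitset<kBits> bits_;
  mutable size_t lookups_ = 0;
};

// Hypothetical combined predicate: range bounds plus a Bloom filter on the
// same column. Bounds are assumed to be [lower, upper) in this sketch.
struct RangeAndBloomPredicate {
  int64_t lower;
  int64_t upper;
  const ToyBloomFilter* bloom;

  bool Evaluate(int64_t v) const {
    // Cheap comparison first: out-of-range values never reach the filter.
    if (v < lower || v >= upper) return false;
    // Only in-range values pay for hashing and the filter lookup.
    return bloom->MayContain(v);
  }
};

// Counts the values accepted by the combined predicate.
size_t CountMatches(const RangeAndBloomPredicate& pred,
                    const std::vector<int64_t>& values) {
  size_t n = 0;
  for (int64_t v : values) {
    if (pred.Evaluate(v)) ++n;
  }
  return n;
}
```

With a range that excludes every scanned value, the filter's lookup counter
stays at zero; with a range that includes every value, each value costs one
comparison more than a Bloom-only check.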

This case is common when Impala pushes down a Bloom filter
predicate, as it'll likely be accompanied by a min-max filter (i.e. a range
predicate) on the same column.

Tests:
Added a test case that scans 1M column values with both a range predicate
and a Bloom filter predicate: once with a range predicate that yields
no rows, and once with a range predicate that yields all rows.

Modified the test case to run against 100M rows on a 48-core
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz with 94GB of memory.

Across iterations, observed an improvement of 20-30% when the range predicate
yields no rows, since that avoids the hash computation and Bloom filter lookup.
Did not see any noticeable regression for the case where all values are within
the range bounds.

Without perf change:
 Counting rows with a range predicate less than the min value: real 0.953s user 0.001s sys 0.000s
 Counting rows with a range predicate that includes all values: real 0.767s user 0.001s sys 0.000s

 Counting rows with a range predicate less than the min value: real 0.899s user 0.000s sys 0.000s
 Counting rows with a range predicate that includes all values: real 0.775s user 0.000s sys 0.001s

 Counting rows with a range predicate less than the min value: real 0.983s user 0.000s sys 0.000s
 Counting rows with a range predicate that includes all values: real 0.832s user 0.001s sys 0.000s

With perf change:
 Counting rows with a range predicate less than the min value: real 0.725s user 0.001s sys 0.000s
 Counting rows with a range predicate that includes all values: real 0.847s user 0.000s sys 0.000s

 Counting rows with a range predicate less than the min value: real 0.664s user 0.000s sys 0.000s
 Counting rows with a range predicate that includes all values: real 0.794s user 0.001s sys 0.000s

 Counting rows with a range predicate less than the min value: real 0.706s user 0.001s sys 0.000s
 Counting rows with a range predicate that includes all values: real 0.774s user 0.000s sys 0.000s
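
As a rough cross-check of the 20-30% figure, the mean "real" times from the
three runs above can be compared. The helper names below (Mean,
RelativeImprovement) and the constant names are hypothetical; the numbers are
copied from the output above. Three runs per case is a small sample, so this
is only indicative.

```cpp
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings, in seconds.
double Mean(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Fraction by which `after` is faster than `before`, based on mean times.
// Positive means an improvement, negative means a slowdown.
double RelativeImprovement(const std::vector<double>& before,
                           const std::vector<double>& after) {
  const double b = Mean(before);
  return (b - Mean(after)) / b;
}

// "real" times for the range predicate that is less than the min value.
const std::vector<double> kNoRowsBefore = {0.953, 0.899, 0.983};
const std::vector<double> kNoRowsAfter = {0.725, 0.664, 0.706};

// "real" times for the range predicate that includes all values.
const std::vector<double> kAllRowsBefore = {0.767, 0.775, 0.832};
const std::vector<double> kAllRowsAfter = {0.847, 0.794, 0.774};
```

On these numbers, RelativeImprovement(kNoRowsBefore, kNoRowsAfter) comes out
at about 0.26, i.e. roughly 26% faster when no rows qualify. For the
all-values case the means differ by under 2%, which is within the run-to-run
variation seen above.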

Change-Id: I8451d6ddfe1fbdf307b3e9f2cc23a8d06e655ba3
---
M src/kudu/client/predicate-test.cc
M src/kudu/common/column_predicate.h
2 files changed, 113 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/13/15913/5
--
To view, visit http://gerrit.cloudera.org:8080/15913
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8451d6ddfe1fbdf307b3e9f2cc23a8d06e655ba3
Gerrit-Change-Number: 15913
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
