Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15913

to look at the new patch set (#5).

Change subject: [perf] Check range predicate first while evaluating Bloom filter predicate
......................................................................

[perf] Check range predicate first while evaluating Bloom filter predicate

Range predicates can be specified along with a Bloom filter predicate on the same column. Checking the range predicate first, and exiting early when the column value is out of bounds, is cheaper than computing a hash and then looking the value up in the Bloom filter. This case is common when Impala pushes down a Bloom filter predicate, because it is likely to be accompanied by a min-max filter (i.e. a range predicate) on the same column.

Tests:
Added a test case that scans 1M column values with both a range predicate and a Bloom filter predicate. The scan runs twice: once with a range predicate that yields no rows, and once with a range predicate that yields all rows.

Modified the test case to run against 100M rows on a 48-core Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz with 94GB of memory. Across iterations, observed a 20-30% improvement when the range predicate yields no rows, because hash computation and the Bloom filter lookup are skipped. No noticeable regression was observed when the values are within the range bounds.
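The evaluation order described above can be illustrated with a minimal C++ sketch. It assumes a single int64 column, a half-open range [lower, upper), and a toy Bloom filter. SimpleBloomFilter, RangeAndBloomPredicate and CountMatches are hypothetical names used only for illustration. This is not the actual ColumnPredicate code in src/kudu/common/column_predicate.h, and it is not Kudu's Bloom filter implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified Bloom filter over int64 values (not Kudu's).
// Uses double hashing on top of a splitmix64-style mixing function.
class SimpleBloomFilter {
 public:
  SimpleBloomFilter(size_t num_bits, int num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {}

  void Insert(int64_t v) {
    uint64_t h = Mix(static_cast<uint64_t>(v));
    for (int i = 0; i < num_hashes_; ++i) bits_[Index(h, i)] = true;
  }

  // Never returns false for an inserted value; may return true for others.
  bool MayContain(int64_t v) const {
    uint64_t h = Mix(static_cast<uint64_t>(v));  // the comparatively costly hash step
    for (int i = 0; i < num_hashes_; ++i) {
      if (!bits_[Index(h, i)]) return false;
    }
    return true;
  }

 private:
  static uint64_t Mix(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  size_t Index(uint64_t h, int i) const {
    uint64_t h1 = h & 0xffffffffULL;
    uint64_t h2 = (h >> 32) | 1;  // odd step for double hashing
    return static_cast<size_t>((h1 + static_cast<uint64_t>(i) * h2) % bits_.size());
  }

  std::vector<bool> bits_;
  int num_hashes_;
};

// Hypothetical combined predicate on one column: range [lower, upper)
// plus a Bloom filter. An unset bound means that side is unbounded.
struct RangeAndBloomPredicate {
  std::optional<int64_t> lower;  // inclusive lower bound (assumption)
  std::optional<int64_t> upper;  // exclusive upper bound (assumption)
  const SimpleBloomFilter* bloom;
  mutable size_t bloom_probes = 0;  // counts Bloom filter lookups, for illustration

  bool Evaluate(int64_t v) const {
    // Cheap comparisons first: out-of-range values exit early
    // without hashing or probing the Bloom filter.
    if (lower && v < *lower) return false;
    if (upper && v >= *upper) return false;
    ++bloom_probes;
    return bloom->MayContain(v);
  }
};

// Counts the values in `column` that satisfy `pred`.
size_t CountMatches(const std::vector<int64_t>& column,
                    const RangeAndBloomPredicate& pred) {
  size_t matches = 0;
  for (int64_t v : column) {
    if (pred.Evaluate(v)) ++matches;
  }
  return matches;
}
```

Suppose the filter is populated with the values 0..999. A predicate with upper bound -1 rejects every value through comparisons alone, so bloom_probes stays 0. The range [0, 1000) instead sends every value to the Bloom filter. These two setups roughly mirror the "no rows" and "all rows" benchmark cases.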
Without perf change:
Counting rows with a range predicate less than the min value: real 0.953s user 0.001s sys 0.000s
Counting rows with a range predicate that includes all values: real 0.767s user 0.001s sys 0.000s
Counting rows with a range predicate less than the min value: real 0.899s user 0.000s sys 0.000s
Counting rows with a range predicate that includes all values: real 0.775s user 0.000s sys 0.001s
Counting rows with a range predicate less than the min value: real 0.983s user 0.000s sys 0.000s
Counting rows with a range predicate that includes all values: real 0.832s user 0.001s sys 0.000s

With perf change:
Counting rows with a range predicate less than the min value: real 0.725s user 0.001s sys 0.000s
Counting rows with a range predicate that includes all values: real 0.847s user 0.000s sys 0.000s
Counting rows with a range predicate less than the min value: real 0.664s user 0.000s sys 0.000s
Counting rows with a range predicate that includes all values: real 0.794s user 0.001s sys 0.000s
Counting rows with a range predicate less than the min value: real 0.706s user 0.001s sys 0.000s
Counting rows with a range predicate that includes all values: real 0.774s user 0.000s sys 0.000s

Change-Id: I8451d6ddfe1fbdf307b3e9f2cc23a8d06e655ba3
---
M src/kudu/client/predicate-test.cc
M src/kudu/common/column_predicate.h
2 files changed, 113 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/13/15913/5
--
To view, visit http://gerrit.cloudera.org:8080/15913

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8451d6ddfe1fbdf307b3e9f2cc23a8d06e655ba3
Gerrit-Change-Number: 15913
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)