[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Todd Lipcon (Code Review) Thu, 09 Apr 2020 20:49:46 -0700

Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )


Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> This is my understanding, I'll check further within Kudu team.
Right, sortedness in kudu is based on the (compound) primary key. So, min-max 
predicates can help in the following ways:
(1) eliminate range partitions -- eg a min/max on a timestamp could would help 
a lot if the table is timestamp-partitioned by day, whereas a bloom filter on 
timestamp is probably not useful since cardinality is very high

(2) convert to primary key-based range scans -- if we have inequality/equality 
predicates on a prefix of the primary key, we can convert those predicates into 
a range scan on the table using b-tree style indexes. Again if you had a 
min-max on timestamp and timestamp were the leading portion of your key, it's a 
big win. Or, if you had a min-max on timestamp, a composite primary key of 
(entity_id, timestamp), and an equality predicate on entity_id, it would still 
do the PK range scan conversion for a big win.

(3) normal row-by-row evaluation - this is still useful since, particularly for 
primitive columns, it's very fast to evaluate (SIMD-accelerated in latest 
release, etc). It's likely much faster to evaluate than a bloom filter which 
will probably have some frequency of L1 cache misses if not L2/LLC, plus the 
hash calculation and mixing itself which likely takes a couple cycles. So, 
you're right that it may not eliminate much data in some queries, but the cost 
might be small enough that it's worth doing even if it eliminates 10-20%.

To validate this I loaded 800M rows into a table using 'kudu perf loadgen' and 
then compared scanning with

$ :./build/latest/bin/kudu perf table_scan localhost 
default.loadgen_auto_0b01a9eba6d54f2ba7fa5b59e3a597c5 --columns=int_val 
--num-threads=10 --predicates ["AND", [">=", "int_val", 0]]

vs the same without the predicate. In this case the predicate matches all rows, 
but there's no special short-circuit logic, so it actually evaluates it 
everywhere. Wall clock times:

with the predicate:             1.7023 +- 0.0266 seconds time elapsed  ( +-  
1.56% )
without the predicate:             1.6514 +- 0.0370 seconds time elapsed  ( +-  
2.24% )

Also running perf record -a, I can see that about 5% of the total CPU was used 
on the tserver in ApplyPredicatePrimitive.

Admittedly, this overhead will go up on a percentage basis after some other 
in-flight work we have is finished (eg more CPU-efficient integer encoding 
bankim's working on) but I think we also have more room to optimize 
ApplyPredicatePrimitive by using AVX2 instead of just SSE4 instructions when 
available.

As for the overhead of sending to the coordinator and broadcasting, isn't this 
a _tiny_ amount of data? ie it's only two values, which for primitives is going 
to be a max of 16 bytes each, and for varlen, you could feasibly truncate to a 
fixed size bound even in the case of a long string.

So, I'm generally in favor of sending and evaluating both. In the best case 
it's a huge win (eliminating range partitions) and in the worst case it's 
probably not a major cost.

One further thought here as a future extension: we could provide an API in the 
Kudu scanner that says "this predicate is optional: drop it if you think it's 
not filtering much". For the case of these propagated predicates feeding into a 
join, dropping them at runtime if they are ineffective is worthwhile (I think 
Impala already does something like this in its scanner path, right?)



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:49:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Reply via email to