Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )
Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu ...................................................................... Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124 PS3, Line 124: BLOOM_MIN_MAX = 2 > This is my understanding, I'll check further within Kudu team. Right, sortedness in kudu is based on the (compound) primary key. So, min-max predicates can help in the following ways: (1) eliminate range partitions -- eg a min/max on a timestamp could would help a lot if the table is timestamp-partitioned by day, whereas a bloom filter on timestamp is probably not useful since cardinality is very high (2) convert to primary key-based range scans -- if we have inequality/equality predicates on a prefix of the primary key, we can convert those predicates into a range scan on the table using b-tree style indexes. Again if you had a min-max on timestamp and timestamp were the leading portion of your key, it's a big win. Or, if you had a min-max on timestamp, a composite primary key of (entity_id, timestamp), and an equality predicate on entity_id, it would still do the PK range scan conversion for a big win. (3) normal row-by-row evaluation - this is still useful since, particularly for primitive columns, it's very fast to evaluate (SIMD-accelerated in latest release, etc). It's likely much faster to evaluate than a bloom filter which will probably have some frequency of L1 cache misses if not L2/LLC, plus the hash calculation and mixing itself which likely takes a couple cycles. So, you're right that it may not eliminate much data in some queries, but the cost might be small enough that it's worth doing even if it eliminates 10-20%. To validate this I loaded 800M rows into a table using 'kudu perf loadgen' and then compared scanning with $ :./build/latest/bin/kudu perf table_scan localhost default.loadgen_auto_0b01a9eba6d54f2ba7fa5b59e3a597c5 --columns=int_val --num-threads=10 --predicates ["AND", [">=", "int_val", 0]] vs the same without the predicate. In this case the predicate matches all rows, but there's no special short-circuit logic, so it actually evaluates it everywhere. Wall clock times: with the predicate: 1.7023 +- 0.0266 seconds time elapsed ( +- 1.56% ) without the predicate: 1.6514 +- 0.0370 seconds time elapsed ( +- 2.24% ) Also running perf record -a, I can see that about 5% of the total CPU was used on the tserver in ApplyPredicatePrimitive. Admittedly, this overhead will go up on a percentage basis after some other in-flight work we have is finished (eg more CPU-efficient integer encoding bankim's working on) but I think we also have more room to optimize ApplyPredicatePrimitive by using AVX2 instead of just SSE4 instructions when available. As for the overhead of sending to the coordinator and broadcasting, isn't this a _tiny_ amount of data? ie it's only two values, which for primitives is going to be a max of 16 bytes each, and for varlen, you could feasibly truncate to a fixed size bound even in the case of a long string. So, I'm generally in favor of sending and evaluating both. In the best case it's a huge win (eliminating range partitions) and in the worst case it's probably not a major cost. One further thought here as a future extension: we could provide an API in the Kudu scanner that says "this predicate is optional: drop it if you think it's not filtering much". For the case of these propagated predicates feeding into a join, dropping them at runtime if they are ineffective is worthwhile (I think Impala already does something like this in its scanner path, right?) -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Fri, 10 Apr 2020 03:49:11 +0000 Gerrit-HasComments: Yes
