Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )
Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter ...................................................................... Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/generic_iterators.cc File src/kudu/common/generic_iterators.cc: http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/generic_iterators.cc@1290 PS2, Line 1290: num_rows_before Shouldn't we be measuring this before the call to MaterializeColumn(), which might evaluate some predicates? Otherwise, wouldn't dictionary/RLE encoded cells not be accounted for, even though they still consume CPU to evaluate the BF predicate? http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h File src/kudu/common/predicate_effectiveness.h: http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@30 PS2, Line 30: // Per column predicate effectiveness context. : // Though all column predicate types are tracked, effectiveness of only Bloom filter predicate : // is currently enforced. I don't agree with making this generic for all predicates. It seems like effectiveness/disabling of a predicate only makes sense in the context of a bloom filter because applications that use bloom filters typically accept false positives. That isn't the case for any other kind of predicate, where exact evaluation of the predicate is required. That makes me think this should be a bit more specialized. Did you consider making the IteratorPredicateEffectivenessContext only track "disableable" predicates like bloom filters? eg by storing a {predicate_idx => predicate context} map instead of a vector, or storing a vector of is_disableable bools? http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@38 PS2, Line 38: : type(type), rows_read(0), : rows_rejected(0), enabled(true) { micro-nit: would prefer putting these on a new line (at least I tend to favor narrower code where convenient). http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@83 PS2, Line 83: CheckEffectiveness nit: it isn't really clear from this naming that this also disables the predicates. Maybe rename to DisableIneffectivePredicates() or somesuch? Also, why bother returning the disabled predicates? Seems like they're just used for logging -- can we push the logging into this class, eg by storing a pointer to the predicate in the effectiveness context instead of just the type? http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@88 PS2, Line 88: co-ordinate nit: coordinate -- To view, visit http://gerrit.cloudera.org:8080/16036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd Gerrit-Change-Number: 16036 Gerrit-PatchSet: 2 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Wed, 10 Jun 2020 02:12:41 +0000 Gerrit-HasComments: Yes
