ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )
Change subject: Implement BloomFilter Predicate in server side. ...................................................................... Patch Set 11: (15 comments) http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc File src/kudu/common/column_predicate-test.cc: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881 PS9, Line 881: Equali > Should this be 'Equality'? Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230 PS9, Line 230: > Is this used anywhere? If it's test only please indicate that in the doc. It currently not be used, I add this function for potential use, but it seems not needed now. I remove it. Done. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376 PS9, Line 376: size = slice->size(); : data = slice->data(); : } : Slice cell_slice(reinterpret_cast<const uint8_t*>(data), size); : for (const auto& bf : bloom_filters_) { : BloomKeyProbe probe(cell_slice, bf.hash_algorithm()); : if (!BloomFilter(bf.bloom_data(), bf.nhash()).MayContainKey(probe)) { : return false; : } : } > This entire portion could be hoisted outside the for loop, which saves reco Yeah, I correct it. Done. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386 PS9, Line 386: // Check optional lower and upper bound. > Once this for loop is simplified to just be this call, it should further si Because BloomKeyProbe is bf related, so this for loop can not be simplified to be one call and in that condition change it to std::all_of may not needed ? http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115 PS9, Line 115: CHECK(!bfs->empty()); > I think this should be calling Simplify(), or if there's a reason not to pl There is only one condition that InBloomFilter can simplify is that it contains simplifiable bound. And IMO that kind of InBloomFilter should be formed by InBloomFilter and Range merge. When we merge the simplify() is already called. But as I offer this kind of interface to form that InBloomFilter(with bound) directly I should add simplify() call here. Done. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385 PS9, Line 385: }), values_.end()); > Pretty sure this can be simplified to a straight copy, eg: Yeah. Done. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387 PS9, Line 387: Simplify(); > This range portion of this code doesn't need to be duplicated if you move I Got it. Done. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588 PS9, Line 588: CHECK(predicate_type_ == PredicateType::InBloomFilter); > Simplify() should already handle this, so I think you can reduce this to ju Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644 PS9, Line 644: } > can this use copy_if like above? I think it's more elegant that way. Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648 PS9, Line 648: namespace { > likewise, I think this is already handled by the Simplify() call. Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661 PS9, Line 661: if (!sel->IsRowSelected(i)) continue; > likewise I think this can be simplified with the fallthrough trick. Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895 PS9, Line 895: > This could be simplified by moving it above Range and allowing it to fall t Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899 PS9, Line 899: > This is nice, consider doing the same simplification for Range. Done http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto File src/kudu/common/common.proto: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347 PS9, Line 347: > I suggested this back in PS5 but I think there was some confusion. I think Because fs.proto already contains an enum which has UNKNOWN in the same namespace so here I use UNKNOWN_HASH instead. And I find that struct HashBucketSchema doesn't contain a hash algorithm member, it seems that we just ignore the hash algorithm from HashBucketSchemaPB so I think the renaming of UNKNOWN will not cause error. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc@604 PS9, Line 604: if (!bf.has_nhash() > This should also be checking for BloomFilter::UNKNOWN when that variant is Done -- To view, visit http://gerrit.cloudera.org:8080/11100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c Gerrit-Change-Number: 11100 Gerrit-PatchSet: 11 Gerrit-Owner: ZhangYao <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: ZhangYao <[email protected]> Gerrit-Comment-Date: Thu, 11 Oct 2018 05:55:02 +0000 Gerrit-HasComments: Yes
