ZhangYao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 11:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881
PS9, Line 881: Equali
> Should this be 'Equality'?
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230
PS9, Line 230:
> Is this used anywhere?  If it's test only please indicate that in the doc.
It currently not be used, I add this function for potential use, but it seems 
not needed now. I remove it. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376
PS9, Line 376:       size = slice->size();
             :       data = slice->data();
             :     }
             :     Slice cell_slice(reinterpret_cast<const uint8_t*>(data), 
size);
             :     for (const auto& bf : bloom_filters_) {
             :       BloomKeyProbe probe(cell_slice, bf.hash_algorithm());
             :       if (!BloomFilter(bf.bloom_data(), 
bf.nhash()).MayContainKey(probe)) {
             :         return false;
             :       }
             :     }
> This entire portion could be hoisted outside the for loop, which saves reco
Yeah, I correct it. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386
PS9, Line 386:     // Check optional lower and upper bound.
> Once this for loop is simplified to just be this call, it should further si
Because BloomKeyProbe is bf related, so this for loop can not be simplified to 
be one call and in that condition change it to std::all_of may not needed ?


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115
PS9, Line 115:   CHECK(!bfs->empty());
> I think this should be calling Simplify(), or if there's a reason not to pl
There is only one condition that InBloomFilter can simplify is that it contains 
simplifiable bound. And IMO that kind of InBloomFilter should be formed by 
InBloomFilter and Range merge. When we merge the simplify() is already called. 
But as I offer this kind of interface to form that InBloomFilter(with bound) 
directly I should add simplify() call here. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385
PS9, Line 385:                                    }), values_.end());
> Pretty sure this can be simplified to a straight copy, eg:
Yeah. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387
PS9, Line 387:       Simplify();
> This range portion of this code doesn't need to be duplicated if you move I
Got it. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588
PS9, Line 588:   CHECK(predicate_type_ == PredicateType::InBloomFilter);
> Simplify() should already handle this, so I think you can reduce this to ju
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644
PS9, Line 644:   }
> can this use copy_if like above?  I think it's more elegant that way.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648
PS9, Line 648: namespace {
> likewise, I think this is already handled by the Simplify() call.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661
PS9, Line 661:       if (!sel->IsRowSelected(i)) continue;
> likewise I think this can be simplified with the fallthrough trick.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895
PS9, Line 895:
> This could be simplified by moving it above Range and allowing it to fall t
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899
PS9, Line 899:
> This is nice, consider doing the same simplification for Range.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347
PS9, Line 347:
> I suggested this back in PS5 but I think there was some confusion.  I think
Because fs.proto already contains an enum which has UNKNOWN in the same 
namespace so here I use UNKNOWN_HASH instead. And I find that struct 
HashBucketSchema doesn't contain a hash algorithm member, it seems that we just 
ignore the hash algorithm from HashBucketSchemaPB so I think the renaming of 
UNKNOWN will not cause error.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc@604
PS9, Line 604:         if (!bf.has_nhash()
> This should also be checking for BloomFilter::UNKNOWN when that variant is
Done



--
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 11
Gerrit-Owner: ZhangYao <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: ZhangYao <[email protected]>
Gerrit-Comment-Date: Thu, 11 Oct 2018 05:55:02 +0000
Gerrit-HasComments: Yes

Reply via email to