[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-19 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Reviewed-on: http://gerrit.cloudera.org:8080/15034
Reviewed-by: Adar Dembo 
Tested-by: Adar Dembo 
Reviewed-by: helifu 
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util-test.cc
M src/kudu/util/hash_util.h
15 files changed, 488 insertions(+), 373 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved; Verified
  helifu: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 14
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-13 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 13: Code-Review+2

> Leaving open in case Lifu (or others) have more comments.

Thanks. It looks good to me apart from the impact of the evaluation (it's very 
CPU intensive).

It looks good to me


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 14 Feb 2020 03:11:51 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-13 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 13: Verified+1 Code-Review+2

Leaving open in case Lifu (or others) have more comments.


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 14 Feb 2020 01:51:16 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-13 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 13:

Could the reviewers take a look at the latest patchset please?


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 13 Feb 2020 19:20:08 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-12 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 13:

> Patch Set 13: Verified-1
>
> Build Failed
>
> http://jenkins.kudu.apache.org/job/kudu-gerrit/20374/ : FAILURE

Unrelated test failure due to a RPC timeout.

 
/home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/location_assignment-itest.cc:187
 Failed
 Bad status: Timed out: ListTabletServers RPC failed: ListTabletServers RPC to 
127.1.75.62:41813 timed out after 0.736s (SENT)


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 12 Feb 2020 19:49:18 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-09 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 13:

> > In general, the bloom filter is used as run-time filter,
 > especially in a hash-join case. For a computer engine, such as
 > impala, it will wait 1 second(default) to produce the bloom filter
 > and then push it down. So, I have another suggestion: maybe we can
 > push it down in the middle of the scan while the filter is not
 > arrived within the specified interval. It should be in the next
 > patch^_^
 > >
 > > computer engine -> kudu client -> kudu server
 > >[1][2][3]
 > >throw away
 > >[0]
 >
 > That's interesting: are you suggesting we need the ability to add a
 > bloom filter predicate to an ongoing scan, rather than just when
 > starting a new scan? There's a lot of existing machinery that
 > treats predicates as just another aspect of scan configuration
 > (along with e.g. projections), to be applied to a new scanner but
 > immutable after that. I think it'd be a fair amount of work to
 > change that.
 >
 > As an alternative, would it be possible to delay the onset of a
 > Kudu scan from Impala until the bloom filter can be constructed? Or
 > will doing that stall the entire query pipeline?

Nope, I mean that adding a bloom filter predicate to an ongoing scan is a new 
independent feature, and we can implement it in the next patch.
No, it's impossible to delay the onset of a kudu scan. For example, if the time 
to wait for the bloom filter is greater than the time to scan all of the data, 
the waiting would not be the best option.


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 10 Feb 2020 01:49:09 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-07 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 13:

> In general, the bloom filter is used as run-time filter, especially in a 
> hash-join case. For a computer engine, such as impala, it will wait 1 
> second(default) to produce the bloom filter and then push it down. So, I have 
> another suggestion: maybe we can push it down in the middle of the scan while 
> the filter is not arrived within the specified interval. It should be in the 
> next patch^_^
>
> computer engine -> kudu client -> kudu server
>[1][2][3]
>throw away
>[0]

That's interesting: are you suggesting we need the ability to add a bloom 
filter predicate to an ongoing scan, rather than just when starting a new scan? 
There's a lot of existing machinery that treats predicates as just another 
aspect of scan configuration (along with e.g. projections), to be applied to a 
new scanner but immutable after that. I think it'd be a fair amount of work to 
change that.

As an alternative, would it be possible to delay the onset of a Kudu scan from 
Impala until the bloom filter can be constructed? Or will doing that stall the 
entire query pipeline?


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Sat, 08 Feb 2020 05:53:43 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-07 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#13).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util-test.cc
M src/kudu/util/hash_util.h
15 files changed, 488 insertions(+), 373 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/13
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-07 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.h@379
PS12, Line 379: std::vector bloom_filters_;
> I have discuss with helifu online a long time ago. A column contains more t
Clients supplying multiple Bloom filters would be very rare in my opinion too.  
As Zhang mentioned, different hash algorithms can be used for the supplied 
Bloom filters which theoretically improves the false positive probability rate 
as the element in the table must be found in ALL the Bloom filters which 
effectively translates to AND operation.

Providing multiple Bloom filters keeps the client more future-proof. I don't 
have a strong opinion on accepting multiple Bloom filters in the predicate. I'm 
basically keeping that part same as the original implementation.


http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/column_predicate.cc@113
PS11, Line 113: std::
> remove "std::"
Done


http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.cc@119
PS12, Line 119: pred.Simplify();
> I think we need to check the size(GetSpaceUsed) of bloom filter because the
The RPC transfer layer in such a case will return an error. I don't think 
adding such RPC payload size validation in the bloom filter predicate seems 
right.


http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/util/hash_util.h
File src/kudu/util/hash_util.h:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/util/hash_util.h@122
PS12, Line 122:   case FAST_HASH:
> Why do you select FAST_HASH? Does it work faster than murmur or cityhash? A
See this change: https://gerrit.cloudera.org/c/14934/

I took a look at results published by smhasher and compared the results with 
hashing functions with no quality problems and ones Kudu implements currently 
namely Murmur2, Cityhash, FastHash(Impala uses it), new ones Spooky, t1ha2, 
wyhash.

https://github.com/rurban/smhasher#smhasher

Columns: https://github.com/rurban/smhasher#columns

Linked doc: https://drive.google.com/open?id=16jXDZWFADpt9MJ0pXmqOIweMaIZBi1F2

- Murmur2 has a bunch of quality problems as noted.
- CityHash can be twice as fast as Fasthash but has quality problems except the 
one where only lower 32-bits of 64-bit hash function are considered. However 
size of CityHash is nearly 2x. Not sure I understand the "size" of the object 
here but the lower the better. So might affect fitting in cache?
- Spooky is among the fastest but has among the largest size.
- t1ha2_atonce (odd-name) is one of the fastest and lower on size. One caveat 
is implementation looks architecture specific and will give high performance 
only on little-endian machines. Not sure we support any big-endian 
architectures? Itt might be tricky to import and use.
- wyhash is fast but has larger size.

So overall FastHash and Cityhash look good choices, with Fasthash edging out 
Cityhash due to its size?



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Sat, 08 Feb 2020 00:02:23 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-06 Thread ZhangYao (Code Review)
ZhangYao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.h@379
PS12, Line 379: std::vector bloom_filters_;
> In my understanding, there should be at most one bloom filter in a predicat
I have discuss with helifu online a long time ago. A column contains more than 
one bloomfilter is rare. At the very beginning, I use std::vector because it 
can support more than one hash algorithm and sometimes it will be useful. Will 
we support more hash algorithms in the future?


http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/util/hash_util.h
File src/kudu/util/hash_util.h:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/util/hash_util.h@122
PS12, Line 122:   case FAST_HASH:
Why do you select FAST_HASH? Does it work faster than murmur or cityhash? And 
do we have FAST_HASH in java client?



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 06 Feb 2020 15:00:58 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-06 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 12:

(3 comments)

Good job, Bankim!
In general, the bloom filter is used as run-time filter, especially in a 
hash-join case. For a computer engine, such as impala, it will wait 1 
second(default) to produce the bloom filter and then push it down. So, I have 
another suggestion: maybe we can push it down in the middle of the scan while 
the filter is not arrived within the specified interval. It should be in the 
next patch^_^

computer engine -> kudu client -> kudu server
   [1][2][3]
   throw away
   [0]

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.h@379
PS12, Line 379: std::vector bloom_filters_;
In my understanding, there should be at most one bloom filter in a predicate. 
Even if in a particular case, someone wants to push down some bloom filters on 
the same column at the same time, these bloom filters should have the same 
"ndv" and "log_space_bytes" for the current tablet/table, and all of them could 
be merged into one(AND operation, OR operation is not supported yet.). But in 
fact, nobody will do that since the bloom filter will take lots of CPU cycles 
while evaluating.
So, could you please share your thoughts?


http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/column_predicate.cc@113
PS11, Line 113: std::
remove "std::"


http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/15034/12/src/kudu/common/column_predicate.cc@119
PS12, Line 119: pred.Simplify();
I think we need to check the size(GetSpaceUsed) of bloom filter because the 
default value of 'rpc_max_message_size' is 50Mib.



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 06 Feb 2020 12:31:40 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 05 Feb 2020 22:48:41 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-05 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#12).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util-test.cc
M src/kudu/util/hash_util.h
15 files changed, 488 insertions(+), 373 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/12
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-05 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc@623
PS11, Line 623:   auto allocator = 
arena->NewObject(arena);
> Oh, this should be "auto*" then; I assumed 'allocator' was a stack allocate
Done



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 11
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 05 Feb 2020 22:35:35 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-03 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc@623
PS11, Line 623:   auto allocator = 
arena->NewObject(arena);
> "allocator" is allocated in the arena like on heap and isn't deleted on exi
Oh, this should be "auto*" then; I assumed 'allocator' was a stack allocated 
object.



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 11
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 03 Feb 2020 19:35:41 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-02-03 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc@623
PS11, Line 623:   auto allocator = 
arena->NewObject(arena);
> Looking at the code again, I'm a little confused as to how this works, beca
"allocator" is allocated in the arena like on heap and isn't deleted on exiting 
the scope.

So the "allocator" like the bloom filters allocated in the arena will be valid 
till the arena is destructed.



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 11
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 03 Feb 2020 18:29:16 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-31 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc@623
PS11, Line 623:   auto allocator = 
arena->NewObject(arena);
> IIUC, it's OK to "orphan" the allocator object (previously we would faithfu
Looking at the code again, I'm a little confused as to how this works, because 
the BBF destructor leads to a call to buffer_allocator_->FreeBuffer(). If 
'allocator' here has popped off the stack, why doesn't the FreeBuffer() call 
corrupt memory or crash the process?



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 11
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 31 Jan 2020 23:17:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-31 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/11/src/kudu/common/wire_protocol.cc@623
PS11, Line 623:   auto allocator = 
arena->NewObject(arena);
IIUC, it's OK to "orphan" the allocator object (previously we would faithfully 
preserve it alongside the BlockBloomFilters) because:
1. The arena used to allocate the bfs is preserved for the lifetime of the 
scan, which de facto exceeds the lifetime of the bfs themselves.
2. We don't need to make any further allocations in the bfs themselves, so it's 
OK if their allocator pointers are garbage.

That last point gives me pause; it seems somewhat error prone to have live 
objects associated to garbage data. Perhaps we could mutate the bfs at this 
point to nullify their allocator pointers, so that any future allocations 
trigger a DCHECK?



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 11
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 31 Jan 2020 23:15:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-31 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#11).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util-test.cc
M src/kudu/util/hash_util.h
15 files changed, 488 insertions(+), 373 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/11
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 11
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 8: Verified+1 Code-Review+2

Looks good to me, though not merging yet since I think you wanted to collect 
additional reviews?


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 8
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 21 Jan 2020 22:33:04 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-21 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 8
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-21 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 8:

> Patch Set 8: Verified-1
>
> Build Failed
>
> http://jenkins.kudu.apache.org/job/kudu-gerrit/20217/ : FAILURE

 Bad status: Timed out: ListTabletServers RPC failed: ListTabletServers RPC to 
127.1.100.62:38687 timed out after 0.430s (SENT)

Unrelated test failure: 
http://jenkins.kudu.apache.org/job/kudu-gerrit/20217/BUILD_TYPE=RELEASE/testReport/junit/(root)/TsLocationAssignmentITest/Basic_3/


--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 8
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 21 Jan 2020 22:20:09 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-21 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#8).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util-test.cc
M src/kudu/util/hash_util.h
15 files changed, 514 insertions(+), 390 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/8
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 8
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-21 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15034/7/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/15034/7/src/kudu/util/block_bloom_filter.cc@74
PS7, Line 74:   DCHECK(directory_ == nullptr) <<
: "Close() should have been called before the object is 
destroyed.";
> If the destructor is going to call Close(), doesn't seem like we need this
Done


http://gerrit.cloudera.org:8080/#/c/15034/7/src/kudu/util/block_bloom_filter.cc@220
PS7, Line 220: void BlockBloomFilter::Insert(const Slice& key) noexcept {
 :   Insert(HashUtil::ComputeHash32(key, hash_algorithm_, 
hash_seed_));
 : }
> This and Find() could be inlined in the header.
Done



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 21 Jan 2020 21:42:08 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-17 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15034/7/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/15034/7/src/kudu/util/block_bloom_filter.cc@74
PS7, Line 74:   DCHECK(directory_ == nullptr) <<
: "Close() should have been called before the object is 
destroyed.";
If the destructor is going to call Close(), doesn't seem like we need this 
DHECK anymore.


http://gerrit.cloudera.org:8080/#/c/15034/7/src/kudu/util/block_bloom_filter.cc@220
PS7, Line 220: void BlockBloomFilter::Insert(const Slice& key) noexcept {
 :   Insert(HashUtil::ComputeHash32(key, hash_algorithm_, 
hash_seed_));
 : }
This and Find() could be inlined in the header.



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Sat, 18 Jan 2020 00:28:34 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-17 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#7).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util.h
14 files changed, 480 insertions(+), 384 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/7
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-17 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#6).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate to use the
BlockBloomFilter for the BloomFilter predicate on the server side.

Earlier implementation was still experimental and didn't provide public client
APIs that actually use this BloomFilter predicate so taken the liberty to make
incompatible wire protocol changes.

Updated BlockBloomFilter to take hash_algorithm and hash_seed.
This make serializing and deserializing the BlockBloomFilter convenient and
removes the need of BloomFilterInner in ColumnPredicate.
Added overloaded Insert()/Find() functions to BlockBloomFilter that take Slice
parameter and hashes the key before insertion/lookup.

Most of the change involves refactoring the implementation including the
unit tests.

Currently only FAST_HASH algorithm is supported since 32-bit versions of
MURMUR2 and CITY_HASH are not currently implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util.h
14 files changed, 503 insertions(+), 368 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/6
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-17 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h@152
PS4, Line 152:   std::vector&& bfs,
> I see. That runs afoul of the Google Style Guide's rules on rvalue referenc
Agreeing a bit reluctantly.

Though one benefit I see is that it makes writing unit tests easier and avoids 
creating copies :)


http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/common/wire_protocol.cc@633
PS5, Line 633: if (!bf_src.has_log_space_bytes() || 
!bf_src.has_bloom_data() ||
 : !bf_src.has_hash_algorithm() || 
bf_src.hash_algorithm() == UNKNOWN_HASH ||
 : !bf_src.has_always_false()) {
 :   return Status::InvalidArgument("Invalid in bloom 
filter predicate on column: "
 :  "missing bloom filter 
details", col.name());
 : }
> Should this be moved into BlockBloomFilter::InitFromPB?
Thanks for bringing this up.

BlockBloomFilter doesn't contain the hash_algorithm and hash_seed fields and 
hence these checks were outside in the caller here.
I refactored the code to include hash_algorithm and hash_seed inside 
BlockBloomFilter. This way BlockBloomFilter can be completely initialized from 
the protobuf message and makes it clean. Also removes the need of 
BloomFilterInner which looked like a crutch.

Now that hash_algorithm and hash_seed are known to BlockBloomFilter, added 
Insert() and Find() convenience functions in BlockBloomFilter that takes a 
Slice and hashes prior to insertion/lookup.

To keep BlockBloomFilter usable from Impala, direct insertion and lookup using 
32-bit hash values is still made available. Also there is a slight performance 
optimization in case hash of the key is already known.

There is possible risk of caller using direct 32-bit hash values but specifies 
different hash_algorithm to the BlockBloomFilter. To reduce that risk, made the 
constructor explicitly specify the hash function and seed.


http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/CMakeLists.txt@21
PS5, Line 21:
> Can you fix the indentation for the continuation lines? Check out how the o
Done


http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/block_bloom_filter.cc@113
PS5, Line 113: memcpy(directory_, bf_src.bloom_data().data(), 
bf_src.bloom_data().size());
> Shouldn't we just do this copy to faithfully respect the state of bf_src, r
Done



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 17 Jan 2020 23:15:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h@152
PS4, Line 152:   std::vector&& bfs,
> I understand but using rvalue reference as type for bfs forces caller to pa
I see. That runs afoul of the Google Style Guide's rules on rvalue references 
though: https://google.github.io/styleguide/cppguide.html#Rvalue_references

If you grep across the Kudu codebase, all of the rvalue reference argument 
types are either for move constructors, move assignment operators, or perfect 
forwarding.


http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/common/wire_protocol.cc@633
PS5, Line 633: if (!bf_src.has_log_space_bytes() || 
!bf_src.has_bloom_data() ||
 : !bf_src.has_hash_algorithm() || 
bf_src.hash_algorithm() == UNKNOWN_HASH ||
 : !bf_src.has_always_false()) {
 :   return Status::InvalidArgument("Invalid in bloom 
filter predicate on column: "
 :  "missing bloom filter 
details", col.name());
 : }
Should this be moved into BlockBloomFilter::InitFromPB?


http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/CMakeLists.txt@21
PS5, Line 21:
Can you fix the indentation for the continuation lines? Check out how the other 
PROTOBUF_GENERATE_CPP do it in this file.


http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/15034/5/src/kudu/util/block_bloom_filter.cc@113
PS5, Line 113: memcpy(directory_, bf_src.bloom_data().data(), 
bf_src.bloom_data().size());
Shouldn't we just do this copy to faithfully respect the state of bf_src, 
regardless of the value of always_false?



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 17 Jan 2020 00:31:31 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-16 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#5).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate
to use the BlockBloomFilter for the BloomFilter predicate
on the server side.

Earlier implementation was still experimental and didn't provide public
client APIs that actually use this BloomFilter predicate so taken the
liberty to make incompatible wire protocol changes.

Most of the change involves refactoring the implementation
including the unit tests.

Currently only FAST_HASH algorithm is supported since
32-bit versions of MURMUR2 and CITY_HASH are not currently
implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter.proto
M src/kudu/util/hash.proto
M src/kudu/util/hash_util.h
13 files changed, 476 insertions(+), 314 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/5
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-16 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h@152
PS4, Line 152:   std::vector&& bfs,
> Here and below you don't need the rvalue reference; could just pass std::ve
I understand but using rvalue reference as type for bfs forces caller to pass 
rvalue(using std::move or directly) and avoid the costly copy of a vector by 
mistake. Hence this is deliberate.


http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.h@78
PS3, Line 78:   // Initialize the internal data structures using the supplied 
arguments.
:   // Useful for de-serializing the BlockBloomFilter.
:   Status Init(int log_space_bytes, const void* src_data, size_t 
src_len, bool always_false);
> Good point.
Thanks for the tip. Done.


http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.h@121
PS3, Line 121:   // Returns amount of space used in log2 bytes.
 :   int GetLogSpaceBytes() const {
 : return log_num_buckets_ + kLogBucketByteSize;
 :   }
> Same as above.
Done


http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.cc@108
PS3, Line 108:
> Ths Init variant calls InitInternal (memset on L91) and then calls memcpy (
Intent was to move the memset() outside of InitInternal() but missed removing 
that line from InitInternal(). Fixed.



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 17 Jan 2020 00:08:51 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15034 )

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/15034/4/src/kudu/common/column_predicate.h@152
PS4, Line 152:   std::vector&& bfs,
Here and below you don't need the rvalue reference; could just pass 
std::vector<...> as-is and still move it.


http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.h@78
PS3, Line 78:   // Initialize the internal data structures using the supplied 
arguments.
:   // Useful for de-serializing the BlockBloomFilter.
:   Status Init(int log_space_bytes, const void* src_data, size_t 
src_len, bool always_false);
> Passing the entire kudu::ColumnPredicatePB_BlockBloomFilter would require i
Good point.

What do you think about moving the PB representation of BlockBloomFilter into 
kudu_util? There's some precedence for that (see hash.proto).


http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/15034/3/src/kudu/util/block_bloom_filter.cc@108
PS3, Line 108:
> Done
Ths Init variant calls InitInternal (memset on L91) and then calls memcpy 
(L115). Likewise, the other Init variant calls InitInternal (memset) and does 
another memset on L98.

So it doesn't seem like we've gained here; if anything, we're duplicating work 
more now. Am I missing something?



--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 16 Jan 2020 04:35:30 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

2020-01-15 Thread Bankim Bhavsar (Code Review)
Hello Kudu Jenkins, helifu, Yao Xu, Adar Dembo, ZhangYao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15034

to look at the new patch set (#4).

Change subject: KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on 
server side
..

KUDU-2483 Integrate BlockBloomFilter with ColumnPredicate on server side

This change switches the implementation of the ColumnPredicate
to use the BlockBloomFilter for the BloomFilter predicate
on the server side.

Earlier implementation was still experimental and didn't provide public
client APIs that actually use this BloomFilter predicate so taken the
liberty to make incompatible wire protocol changes.

Most of the change involves refactoring the implementation
including the unit tests.

Currently only FAST_HASH algorithm is supported since
32-bit versions of MURMUR2 and CITY_HASH are not currently
implemented.

Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/block_bloom_filter.cc
M src/kudu/util/block_bloom_filter.h
M src/kudu/util/hash.proto
M src/kudu/util/hash_util.h
11 files changed, 441 insertions(+), 308 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/15034/4
--
To view, visit http://gerrit.cloudera.org:8080/15034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ecfd67e9c5fbe459c5b4aed91e0be2a194d433a
Gerrit-Change-Number: 15034
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yao Xu 
Gerrit-Reviewer: ZhangYao 
Gerrit-Reviewer: helifu