Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13882 )

Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC
......................................................................


Patch Set 27: Code-Review+2

(19 comments)

LGTM. Please address the comments below.

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/benchmarks/bloom-filter-benchmark.cc
File be/src/benchmarks/bloom-filter-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/benchmarks/bloom-filter-benchmark.cc@290
PS27, Line 290:  std::shared_ptr
No need for shared_ptr


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1022
PS25, Line 1022: {
> Thanks for pointing this out.
Makes sense.


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1072
PS25, Line 1072:       
bloom_filter_directory.swap(state->bloom_filter_directory());
               :       DCHECK(rpc_params.bloom_filter().always_false()
> Thanks for the suggestion!
I guess it can simply be

*rpc_params.mutable_bloom_filter() = state->bloom_filter();


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1074
PS25, Line 1074:     || rpc_params.bloo
> Thanks for the suggestion.
Missed the part about the scope of state. Think it's fine to keep it as-is. 
Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1098
PS25, Line 1098:
> Since 'state' is defined inside of the critical section as mentioned above,
Makes sense.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/coordinator.cc@1096
PS27, Line 1096: reinterpret_cast<const char*>(&(bloom_filter_directory[0])),
               :             static_cast<unsigned 
long>(bloom_filter_directory.size())
Please see comments at BloomFilter::AddDirectorySidecar(). We can pass in the 
string directly instead.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/runtime-filter-bank.cc@187
PS27, Line 187: DCHECK(
nit: DCHECK_EQ


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h@35
PS25, Line 35: #include "gen-cpp/ImpalaInternalService.h"
> Thanks for this suggestion.
Looks like it's more involved than expected. Please feel free to defer it to 
the follow-up patch which removes ImpalaInternalService altogether.


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h@70
PS25, Line 70: class TSessionState;
             : class TQueryOptions;
> Thanks for pointing this out!
Sounds good.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@a68
PS27, Line 68:
             :
             :
Why not keep most of this header comment which explains what this test does and 
talks about the output arguments (e.g. success, protobuf, directory)


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@103
PS27, Line 103: *success = directory_y.compare(directory_y2) == 0 ? true : 
false;
*success = directory_y.compare(directory_y2) == 0;


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@365
PS27, Line 365: std::shared_ptr<
No need for shared_ptr


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@367
PS27, Line 367:
nit: unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@395
PS27, Line 395:  TBloomFilter
BloomFilterPB


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.h@95
PS27, Line 95: const uint8_t* directory_in,
             :       size_t directory_in_size)
Can this interface take a string only instead for the directory ? Can we use 
directory.size() to get the size in Init() or are there cases in which passing 
a string won't work ?


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@79
PS27, Line 79: protobuf
rpc_params


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@80
PS27, Line 80: const char* directory,
             :     unsigned long directory_size
See comments below. This could be const string& directory. I believe 
faststring::append() also has an interface for taking string as input.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@85
PS27, Line 85:   unique_ptr<kudu::faststring> sidecar_buf = 
make_unique<kudu::faststring>();
             :   sidecar_buf->append(directory, directory_size);
             :   unique_ptr<kudu::rpc::RpcSidecar> rpc_sidecar =
             :       
kudu::rpc::RpcSidecar::FromFaststring(std::move(sidecar_buf))
Would like to point out an alternative would be to use a slice instead of a 
faststring. This avoids the need to copy into the faststring buffer but 
directory needs to be around until the RPC is done. On the other hand, keeping 
the code as-is leaves a cleaner interface so this may actually be better. If 
you choose to use the option of using a slice, please make sure to highlight 
this side-effect (i.e. directory needs to be alive until the RPC is done).

    kudu::Slice dir_slice(directory);
    kudu::rpc::RpcSidecar::FromSlice(dir_slice);


http://gerrit.cloudera.org:8080/#/c/13882/27/common/protobuf/data_stream_service.proto
File common/protobuf/data_stream_service.proto:

http://gerrit.cloudera.org:8080/#/c/13882/27/common/protobuf/data_stream_service.proto@91
PS27, Line 91:   // The sidecar index associated with the directory of a Bloom 
filter.
             :   // A directory is a list of buckets representing the Bloom 
Filter contents,
             :   // laid out contiguously in one string for efficiency of 
(de)serialisation.
             :   // See BloomFilter::Bucket and BloomFilter::directory_.
             :   optional int32 directory_sidecar_idx = 4;
Since BloomFilterPB is also used as the representation of the aggregated bloom 
filter in the coordinator, does it make more sense to store the 
directory_sidecar_idx in UpdateFilterParamsPB instead ?



--
To view, visit http://gerrit.cloudera.org:8080/13882
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a
Gerrit-Change-Number: 13882
Gerrit-PatchSet: 27
Gerrit-Owner: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Nov 2019 23:52:47 +0000
Gerrit-HasComments: Yes

Reply via email to