Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13882 )
Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC ...................................................................... Patch Set 27: Code-Review+2 (19 comments) LGTM. Please address the comments below. http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/benchmarks/bloom-filter-benchmark.cc File be/src/benchmarks/bloom-filter-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/benchmarks/bloom-filter-benchmark.cc@290 PS27, Line 290: std::shared_ptr No need for shared_ptr http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1022 PS25, Line 1022: { > Thanks for pointing this out. Makes sense. http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1072 PS25, Line 1072: bloom_filter_directory.swap(state->bloom_filter_directory()); : DCHECK(rpc_params.bloom_filter().always_false() > Thanks for the suggestion! I guess it can simply be *rpc_params.mutable_bloom_filter() = state->bloom_filter(); http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1074 PS25, Line 1074: || rpc_params.bloo > Thanks for the suggestion. Missed the part about the scope of state. Think it's fine to keep it as-is. Thanks for the explanation. http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1098 PS25, Line 1098: > Since 'state' is defined inside of the critical section as mentioned above, Makes sense. http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/coordinator.cc@1096 PS27, Line 1096: reinterpret_cast<const char*>(&(bloom_filter_directory[0])), : static_cast<unsigned long>(bloom_filter_directory.size()) Please see comments at BloomFilter::AddDirectorySidecar(). We can pass in the string directly instead. http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/runtime-filter-bank.cc@187 PS27, Line 187: DCHECK( nit: DCHECK_EQ http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h@35 PS25, Line 35: #include "gen-cpp/ImpalaInternalService.h" > Thanks for this suggestion. Looks like it's more involved than expected. Please feel free to defer it to the follow-up patch which removes ImpalaInternalService altogether. http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h@70 PS25, Line 70: class TSessionState; : class TQueryOptions; > Thanks for pointing this out! Sounds good. http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc File be/src/util/bloom-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@a68 PS27, Line 68: : : Why not keep most of this header comment which explains what this test does and talks about the output arguments (e.g. success, protobuf, directory) http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@103 PS27, Line 103: *success = directory_y.compare(directory_y2) == 0 ? true : false; *success = directory_y.compare(directory_y2) == 0; http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@365 PS27, Line 365: std::shared_ptr< No need for shared_ptr http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@367 PS27, Line 367: nit: unnecessary blank line http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@395 PS27, Line 395: TBloomFilter BloomFilterPB http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.h@95 PS27, Line 95: const uint8_t* directory_in, : size_t directory_in_size) Can this interface take a string only instead for the directory ? Can we use directory.size() to get the size in Init() or are there cases in which passing a string won't work ? http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@79 PS27, Line 79: protobuf rpc_params http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@80 PS27, Line 80: const char* directory, : unsigned long directory_size See comments below. This could be const string& directory. I believe faststring::append() also has an interface for taking string as input. http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@85 PS27, Line 85: unique_ptr<kudu::faststring> sidecar_buf = make_unique<kudu::faststring>(); : sidecar_buf->append(directory, directory_size); : unique_ptr<kudu::rpc::RpcSidecar> rpc_sidecar = : kudu::rpc::RpcSidecar::FromFaststring(std::move(sidecar_buf)) Would like to point out an alternative would be to use a slice instead of a faststring. This avoids the need to copy into the faststring buffer but directory needs to be around until the RPC is done. On the other hand, keeping the code as-is leaves a cleaner interface so this may actually be better. If you choose to use the option of using a slice, please make sure to highlight this side-effect (i.e. directory needs to be alive until the RPC is done). kudu::Slice dir_slice(directory); kudu::rpc::RpcSidecar::FromSlice(dir_slice); http://gerrit.cloudera.org:8080/#/c/13882/27/common/protobuf/data_stream_service.proto File common/protobuf/data_stream_service.proto: http://gerrit.cloudera.org:8080/#/c/13882/27/common/protobuf/data_stream_service.proto@91 PS27, Line 91: // The sidecar index associated with the directory of a Bloom filter. : // A directory is a list of buckets representing the Bloom Filter contents, : // laid out contiguously in one string for efficiency of (de)serialisation. : // See BloomFilter::Bucket and BloomFilter::directory_. : optional int32 directory_sidecar_idx = 4; Since BloomFilterPB is also used as the representation of the aggregated bloom filter in the coordinator, does it make more sense to store the directory_sidecar_idx in UpdateFilterParamsPB instead ? -- To view, visit http://gerrit.cloudera.org:8080/13882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a Gerrit-Change-Number: 13882 Gerrit-PatchSet: 27 Gerrit-Owner: Fang-Yu Rao <fangyu....@cloudera.com> Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Fri, 01 Nov 2019 23:52:47 +0000 Gerrit-HasComments: Yes