Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/13882 )
Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC ...................................................................... Patch Set 19: (26 comments) http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/rpc/thrift-server-test.cc File be/src/rpc/thrift-server-test.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/rpc/thrift-server-test.cc@22 PS19, Line 22: #include "gen-cpp/ImpalaInternalService.h" Why is this necessary? http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator-backend-state.cc@572 PS19, Line 572: rpc_status.message().ToString() I think you can just call ToString() directly on the kudu::Status without calling .message() http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator-backend-state.cc@573 PS19, Line 573: controller.Reset(); Not needed, since we don't reuse the controller http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator-backend-state.cc@576 PS19, Line 576: res.status().error_msgs(0) We'll want to print the status code and other error messages. Easiest way is to just construct a regular Status object from the StatusPB, i.e. replace this with something like "Status(res.status()).GetDetail()" That's of course less efficient, but this shouldn't happen very often http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.h@44 PS19, Line 44: class RpcController; not used http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1070 PS19, Line 1070: if (rpc_params.has_bloom_filter()) { : if (!rpc_params.bloom_filter().always_false() : && !rpc_params.bloom_filter().always_true()) { combine these with '&&' http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1107 PS19, Line 1107: } else if (!params->bloom_filter().always_false()) { probably more straightforward to flip the conditions, i.e. put the always_false() block here and put this block in the 'else' http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1118 PS19, Line 1118: return; I don't think we want to just return here - we still need to hit the "if (pending_count_ == 0..." below http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1122 PS19, Line 1122: // There are two cases in which 'bloom_filter_.always_false()' : // evaluates to true: (1) 'bloom_filter_' has never been initialized : // by any incoming Bloom filter previously, (2) 'bloom_filter_' has : // been initialized to an always false Bloom filter by an incoming : // Bloom filter previously. We note that : // 'bloom_filter_.has_log_bufferpool_space()' evaluates to false when : // 'bloom_filter_' has never been initialized by any incoming Bloom : // filter and to true otherwise. : // In either case, we will always perform the swap operation. I think this whole comment is unnecessary http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1141 PS19, Line 1141: // We want to move the payload from the request rather than copy it and : // take double the memory cost. this can be removed http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1157 PS19, Line 1157: non_const_filter = &const_cast<BloomFilterPB&>(params->bloom_filter()); might as well put this is in the 'if' for clarity http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1158 PS19, Line 1158: // Even if the current incoming Bloom filter is an always false filter, : // we still need to perform the swap operation to properly initialize : // the field of 'log_bufferpool_space_' if 'bloom_filter_' has not been : // initialized by any incoming Bloom filter yet. Otherwise we will hit a : // DCHECK error in 'buffer-allocator.cc' that requires that the buffer : // length of the directory associated with the Bloom filter to be : // published is greater than some lower bound when attempting to call : // BloomFilter::Init() in RuntimeFilterBank::PublishGlobalFilter(). Unnecessarily long comment. Lets just put a comment inside the 'if' that says something like 'bloom_filter_ hasn't been initialized yet, so do so now' http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/coordinator.cc@1167 PS19, Line 1167: // We want to move the payload from the request rather than copy it and : // take double the memory cost. This can be removed. It was referring to swapping the directory to avoid a large copy, but the directory isn't being swapped here. http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/data-stream-test.cc File be/src/runtime/data-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/data-stream-test.cc@127 PS19, Line 127: nit: extra whitespace. You can just remove all of it and do '{}' http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/fragment-instance-state.h@48 PS19, Line 48: using kudu::rpc::RpcContext; don't use 'using' in headers http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/krpc-data-stream-recvr.cc File be/src/runtime/krpc-data-stream-recvr.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/krpc-data-stream-recvr.cc@469 PS19, Line 469: LOG(INFO) << "KrpcDataStreamRecvr::SenderQueue::AddBatch(), " : "rpc_context->GetTransferSize(): " << rpc_context->GetTransferSize(); I'm guessing you did this for debugging? It should be removed http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/krpc-data-stream-recvr.cc@569 PS19, Line 569: LOG(INFO) << "KrpcDataStreamRecvr::SenderQueue::TakeOverEarlySender, " same here http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/query-state.h@45 PS19, Line 45: using kudu::rpc::RpcContext; don't use using in header files http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/runtime-filter-bank.h@156 PS19, Line 156: num_inflight_rpcs_ nit: when referring to variables in comments, we put single quotes around them http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/runtime/runtime-filter-bank.cc@121 PS19, Line 121: DCHECK_EQ(res->status().status_code(), TErrorCode::OK); Brief comment about why this is correct, i.e. "UpdateFilter should never set an error status" http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/data-stream-service.h File be/src/service/data-stream-service.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/data-stream-service.h@23 PS19, Line 23: //#include "kudu/rpc/rpc_controller.h" remove http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/data-stream-service.h@71 PS19, Line 71: :: nit: remove this, here and elsewhere, to be consistent with the rest of this file http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/data-stream-service.cc File be/src/service/data-stream-service.cc: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/data-stream-service.cc@137 PS19, Line 137: LOG(ERROR) I think this isn't necessarily an error, eg. it can happen if the query was cancelled, so lets do LOG(INFO) http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/service/impala-server.h@60 PS19, Line 60: using kudu::rpc::RpcContext; don't use 'using' in header files http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/util/bloom-filter.h@43 PS19, Line 43: class Coordinator; not used http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/13882/19/be/src/util/min-max-filter.h@22 PS19, Line 22: #include "gen-cpp/control_service.pb.h" shouldn't this be 'data_stream_service'? -- To view, visit http://gerrit.cloudera.org:8080/13882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a Gerrit-Change-Number: 13882 Gerrit-PatchSet: 19 Gerrit-Owner: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Tue, 17 Sep 2019 21:50:43 +0000 Gerrit-HasComments: Yes
