Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/13882 )
Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC ...................................................................... Patch Set 18: (31 comments) Thanks so much for all of your work on this! Its getting pretty close, most of these comments are cosmetic One thing: instead of me going through and marking any formatting issues, please run clang-format and take its suggestions. Let me know if you need help with this. Keep in mind that clang-format is just a suggestion and you should: - keep formatting consistent with the rest of the surrounding file or with things that you know are impala's guidelines, even if clang-format disagrees - avoid changing lines that otherwise wouldn't be affected by your patch just to fix the formatting, eg. you don't need to re-arrange includes that were already present even if clang-format says to http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.h@42 PS18, Line 42: namespace kudu { : namespace rpc { : class RpcController; : } // namespace rpc : } // namespace kudu Is this needed? rpc_controller.h is being imported above already http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc@551 PS18, Line 551: DCHECK DCHECK_EQ? http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc@570 PS18, Line 570: kudu::Status rpc_status; : rpc_status = proxy->PublishFilter(rpc_params, &res, &controller); This can all be one line http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc@572 PS18, Line 572: if (!rpc_status.ok()) { In addition to rpc_status, which just reflects whether the rpc actually got to the other machine and was responded to, there's also PublishFilterResultPB::status, which reflects whether the operation itself was successful, so we should add another 'if' that checks that and logs if its an error. http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@994 PS18, Line 994: FilterState* state; I don't think there's any reason to move this here http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1013 PS18, Line 1013: // Check if the filter has already been sent, which could happen in four cases: I think we need another case in this comment for 'failed to get bloom filter sidecar' http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1070 PS18, Line 1070: bs->PublishFilter(rpc_params, controller); Could you add a TODO: make this asynchronous http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1083 PS18, Line 1083: void Coordinator::SetupSidecarForBloomFilter(PublishFilterParamsPB* rpc_params, This function is unnecessary, you can just call AddSidecar directly http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1089 PS18, Line 1089: ( extra parentheses http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1115 PS18, Line 1115: // If the incoming Bloom filter is neither an always true filter nor an : // always false filter, then it must be the case that a non-empty sidecar slice : // has been received. Refer to BloomFilter::ToProtobuf() for further details. Comment is outdated. Thinking through the cases here, I think that this all works correctly, but its a little weird. In particular, in the case where params->bloom_filter() is always_false: - if bloom_filter_ is also always_false, then we'll TryConsume(0), do a swap() that doesn't actually change the value of bloom_filter_ since its still always_false, and then re-instantiate bloom_filter_directory_ to an empty string - if bloom_filter_ is not always_false, then we'll call Or(), which will just return immediately without doing anything This is how it worked before, so it would be fine to leave it, but I think that we can make this clearer by changing the 'else' on line 1109 to be 'else (!params->bloom_filter().always_false)' and skip all the extra work in that case. Then we can make the 'if' here a DCHECK(params->bloom_filter().has_directory_sidecar_idx()), and people won't have to reason through what the code below would do in the case that sidecar_slice doesn't get filled in. You'll of course also want to add appropriate comments (and you should think it through and make sure my reasoning is correct) http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.h@147 PS18, Line 147: /// Lock protecting those BloomFilter's in 'bloom_filters_' from being Lets move this somewhere further down, so that runtime_filter_lock_ is next to the things it protects. Also rename 'num_krpcs_counter_lock_' to 'num_inflight_rpcs_lock_', change the comment above it to just say that it protects 'num_inflight_rpcs_' and also that it shouldn't be taken at the same time as 'runtime_filter_lock_'. Then put a comment above num_inflight_rpcs_ explaining what its used for (as mentioned elsewhere, you can just use your existing comment from the cc file) http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.h@194 PS18, Line 194: void UpdateFilterCompleteCb(const kudu::rpc::RpcController* rpc_controller); brief comment http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@41 PS18, Line 41: #include "service/control-service.h" I don't think you need this (or probably a couple of the other imports you added here) http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@112 PS18, Line 112: void RuntimeFilterBank::UpdateFilterCompleteCb( I think we'll also want to pass in the UpdateFilterResultPB* here by inclusing it in the bind() below, so that we can check it's 'status'. I think in the current code in data-stream-service that it can only be OK, so its fine to DCHECK on it. http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@190 PS18, Line 190: krpc_address, krpc_address.hostname This is wrong. See https://gerrit.cloudera.org/#/c/13818/ http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@199 PS18, Line 199: // Use 'num_inflight_rpcs_' to keep track of the number of current in-flight : // KRPC calls to prevent the memory pointed to by 'bloom_filter' being : // deallocated in RuntimeFilterBank::Close() before all KRPC calls have : // been completed. Lets move this to above the definition of num_inflight_rpcs_ in the header http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@237 PS18, Line 237: if (params->bloom_filter().has_directory_sidecar_idx()) { Maybe add en else{} that just DCHECKs that params->bloom_filter().always_false is true, since that's the only reason this shouldn't be set. http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@242 PS18, Line 242: LOG(ERROR) Probably WARNING, since we're not considering this a reason to fail the query http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter-test.cc File be/src/util/bloom-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter-test.cc@368 PS18, Line 368: (NULL nullptr http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@30 PS18, Line 30: #include "gen-cpp/control_service.pb.h" I don't think that you need this http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@38 PS18, Line 38: class Slice; This isn't used http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@95 PS18, Line 95: Status Init(const BloomFilterPB& protobuf, const std::string& directory); We can eliminate this version, since its basically exactly the same as the one below. Just have the code that calls this instead call .data() and .size() on the string when passing it in. http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@106 PS18, Line 106: /// Also uses a sidecar buffer to instantiate an outbound sidecar, which in turn : /// is passed to 'controller' to get a sidecar index which will be used on the client : /// side to retrieve the contents stored in the sidecar buffer. I think this can be simplified to something more like "Also sets a sidecar on 'controller' containing the bloom filter's directory" http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@125 PS18, Line 125: /// 'directory_in' is the string associated with the BloomFilterPB 'in' and : /// 'directory_out' is the string associated with the BloomFilterPB 'out'. This can be eliminated, its fairly obvious from the names http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@223 PS18, Line 223: /// If the first argument is NULL, it is interpreted as a complete filter which : /// contains all elements. : /// It also uses a sidecar buffer to instantiate an outbound sidecar, which in turn : /// is passed to 'controller' to get a sidecar index which will be used on the : /// receiving side to retrieve the contents stored in the sidecar buffer. This can be eliminated (its covered by the comment for the public ToProtobuf above) http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@229 PS18, Line 229: void AddSidecar(BloomFilterPB* protobuf, : kudu::rpc::RpcController* controller) const; I don't think this is actually used anywhere http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@231 PS18, Line 231: AddSidecar It might be more clear to name this something like AddDirectorySidecar(), and also add a brief comment saying what it does, including noting any requirements this function has (eg. it looks like you require that the filter is neither always_true nor always_false) And of course, when you remove the 'friend' below it'll need to be made public http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@247 PS18, Line 247: friend class Coordinator; Remove this. As a general rule, we avoid using 'friend' because it messes up encapsulation. The main exception is for tests, where encapsulation doesn't matter as much. http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc@92 PS18, Line 92: unsigned long directory_size) { Look like this function requires that the filter is not always_false. You should add a DCHECK for that, and also for any other requirements, eg. that protobuf != nullptr http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc@249 PS18, Line 249: ( I think you got an unnecessary pair of parentheses here and around directory_out on this line, and also in several places on the lines below. Makes it a better hard to read what's going on http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc@257 PS18, Line 257: __m128i* simd_out = reinterpret_cast<__m128i*>(&((*(directory_out))[0])); Might as well leave this where it was, to keep the comment next to the loop its referring to -- To view, visit http://gerrit.cloudera.org:8080/13882 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a Gerrit-Change-Number: 13882 Gerrit-PatchSet: 18 Gerrit-Owner: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Mon, 09 Sep 2019 19:15:22 +0000 Gerrit-HasComments: Yes
