Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14538 )
Change subject: IMPALA-4400: aggregate runtime filters locally ...................................................................... Patch Set 15: (15 comments) http://gerrit.cloudera.org:8080/#/c/14538/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14538/15//COMMIT_MSG@18 PS15, Line 18: can done typo http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/exec/kudu-scan-node-base.cc File be/src/exec/kudu-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/exec/kudu-scan-node-base.cc@110 PS15, Line 110: if (filter_ctxs_.size() > 0) WaitForRuntimeFilters(); Maybe outside the scope of this patch, but it looks like the hdfs scanners don't call this until GetNext(), which is nice since it means that all of the setup can happen before we wait. Might want to do the same thing for Kudu. http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h@136 PS15, Line 136: bool HasFragmentIdx(int fragment_idx); const http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h@140 PS15, Line 140: bool HasFragmentIdx(const std::unordered_set<int>& fragment_idxs); const http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.h@161 PS15, Line 161: RuntimeFilterBank* filter_bank() { return filter_bank_.get(); } const http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.cc@a679 PS15, Line 679: Looks like maybe this was the only use of 'fragment_map_' anywhere and its unnecessary now? http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h File be/src/runtime/runtime-filter-bank.h: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@68 PS15, Line 68: AllocateScratchBloomFilter() Not your change, but this is wrong, since the sentence refers to both bloom and min-max filters. Maybe change it to say "AllocateScratch*Filter()" http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@79 PS15, Line 79: PublishGlobalFilte : /// () typo http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@184 PS15, Line 184: locked separately typo http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-state.h@63 PS15, Line 63: /// A collection of items that are part of the global state of a query and shared across Not your change, but this comment seems wrong/confusing to me. RuntimeState isn't shared across a query, its per-fragment instance, right? http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/scheduling/query-schedule.cc File be/src/scheduling/query-schedule.cc: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/scheduling/query-schedule.cc@246 PS15, Line 246: for (const FInstanceExecParams& instance_params: instance_exec_params) { Looks like you're relying on the fact that all instances for a particular host will be contiguous within instance_exec_params. Might want to document that assumption where instance_exec_params is declared, or even make it a map from host to finstances. http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/bloom-filter.cc@251 PS15, Line 251: if (other.AlwaysFalse()) return; Doesn't look like you're handling the always_true case, maybe add a DCHECK? http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.h@74 PS15, Line 74: virtual void SetAlwaysTrue() = 0; protected? http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.cc@324 PS15, Line 324: // Sets 'always_true_' to true and clears the values of 'min_', 'max_', 'min_buffer_', Seems pretty clear from the code, so probably unnecessary, or maybe leave it in the header. http://gerrit.cloudera.org:8080/#/c/14538/15/common/protobuf/data_stream_service.proto File common/protobuf/data_stream_service.proto: http://gerrit.cloudera.org:8080/#/c/14538/15/common/protobuf/data_stream_service.proto@136 PS15, Line 136: // optional int32 dst_fragment_idx = 3; Any reason not to just completely remove this and renumber? Its completely internal so we don't really care about backwards compatibility. -- To view, visit http://gerrit.cloudera.org:8080/14538 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc Gerrit-Change-Number: 14538 Gerrit-PatchSet: 15 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Mon, 16 Dec 2019 19:36:30 +0000 Gerrit-HasComments: Yes
