Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
......................................................................


Patch Set 15:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/14538/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14538/15//COMMIT_MSG@18
PS15, Line 18: can done
typo


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/exec/kudu-scan-node-base.cc@110
PS15, Line 110:   if (filter_ctxs_.size() > 0) WaitForRuntimeFilters();
Maybe outside the scope of this patch, but it looks like the hdfs scanners 
don't call this until GetNext(), which is nice since it means that all of the 
setup can happen before we wait. Might want to do the same thing for Kudu.


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h@136
PS15, Line 136:   bool HasFragmentIdx(int fragment_idx);
const


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h@140
PS15, Line 140:   bool HasFragmentIdx(const std::unordered_set<int>& 
fragment_idxs);
const


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.h@161
PS15, Line 161:   RuntimeFilterBank* filter_bank() { return filter_bank_.get(); 
}
const


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.cc@a679
PS15, Line 679:
Looks like maybe this was the only use of 'fragment_map_' anywhere and its 
unnecessary now?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@68
PS15, Line 68: AllocateScratchBloomFilter()
Not your change, but this is wrong, since the sentence refers to both bloom and 
min-max filters. Maybe change it to say "AllocateScratch*Filter()"


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@79
PS15, Line 79: PublishGlobalFilte
             : /// ()
typo


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@184
PS15, Line 184: locked separately
typo


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-state.h@63
PS15, Line 63: /// A collection of items that are part of the global state of a 
query and shared across
Not your change, but this comment seems wrong/confusing to me. RuntimeState 
isn't shared across a query, its per-fragment instance, right?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/scheduling/query-schedule.cc
File be/src/scheduling/query-schedule.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/scheduling/query-schedule.cc@246
PS15, Line 246:   for (const FInstanceExecParams& instance_params: 
instance_exec_params) {
Looks like you're relying on the fact that all instances for a particular host 
will be contiguous within instance_exec_params. Might want to document that 
assumption where instance_exec_params is declared, or even make it a map from 
host to finstances.


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/bloom-filter.cc@251
PS15, Line 251:   if (other.AlwaysFalse()) return;
Doesn't look like you're handling the always_true case, maybe add a DCHECK?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.h@74
PS15, Line 74:   virtual void SetAlwaysTrue() = 0;
protected?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.cc@324
PS15, Line 324: // Sets 'always_true_' to true and clears the values of 'min_', 
'max_', 'min_buffer_',
Seems pretty clear from the code, so probably unnecessary, or maybe leave it in 
the header.


http://gerrit.cloudera.org:8080/#/c/14538/15/common/protobuf/data_stream_service.proto
File common/protobuf/data_stream_service.proto:

http://gerrit.cloudera.org:8080/#/c/14538/15/common/protobuf/data_stream_service.proto@136
PS15, Line 136:   // optional int32 dst_fragment_idx = 3;
Any reason not to just completely remove this and renumber? Its completely 
internal so we don't really care about backwards compatibility.



--
To view, visit http://gerrit.cloudera.org:8080/14538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Gerrit-Change-Number: 14538
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 16 Dec 2019 19:36:30 +0000
Gerrit-HasComments: Yes

Reply via email to