Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20612 )

Change subject: IMPALA-3825: Delegate runtime filter aggregation to some 
executors
......................................................................


Patch Set 3:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/20612/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20612/3//COMMIT_MSG@43
PS3, Line 43: This patch currently targets the boom filter produced by 
partitioned
bloom filter, not boom filter


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/coordinator.cc@488
PS3, Line 488:       int agg_idx = agg_info.aggregator_idx_to_report(i);
aggregator_idx_to_report producing an agg_idx is confusing me. There's 
something about the names I'm missing.


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.h@71
PS3, Line 71:   bool need_subaggregegation = false;
nit: should be need_subaggregation


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.h@227
PS3, Line 227:     // Pointer to runtime filter that hold the merge resut of 
all remote updates.
nit: result


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.h@235
PS3, Line 235:     inline int AllRemainingProducers() { return pending_remotes 
+ pending_producers; }
These should all probably have function comments.


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc@196
PS3, Line 196:       DCHECK_EQ(res->status().status_code(), TErrorCode::OK);
This feels weird to me because you've already guaranteed that it's false at 
line 188. Maybe simplify with

  if (res->status().status_code() != TErrorCode::OK) {
    // ... never set an error status
    DCHECK(is_remote_update);
    ...
  }


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc@217
PS3, Line 217:     // Late RPC might come while filter bank is closing.
It'd be helpful to VLOG_RPC that the message was ignored.


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc@252
PS3, Line 252:       VLOG(3) << "filter_id=" << params.filter_id()
You use VLOG(3) often enough that it might warrant its own define in 
logging.h to name this logging scenario. VLOG_FILTER?
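
To make the suggestion concrete, here is a minimal sketch of what such a define might look like. It assumes logging.h names verbosity levels with macros in the style of VLOG_RPC. VLOG_FILTER, VLOG_FILTER_IS_ON and LogFilterUpdate are hypothetical names, and the VLOG / VLOG_IS_ON stand-ins replace glog only so the snippet compiles on its own.

```cpp
#include <cassert>
#include <sstream>

// Stand-ins for glog so this sketch is self-contained. In the real code
// VLOG and VLOG_IS_ON would come from glog, not from these definitions.
static int g_verbosity = 3;
static std::ostringstream g_log_sink;
#define VLOG_IS_ON(level) ((level) <= g_verbosity)
#define VLOG(level) if (!VLOG_IS_ON(level)) {} else g_log_sink

// Hypothetical additions to logging.h: one name for the runtime filter
// logging scenario instead of a bare VLOG(3) at every call site.
#define VLOG_FILTER VLOG(3)
#define VLOG_FILTER_IS_ON VLOG_IS_ON(3)

// Hypothetical call site using the named level.
void LogFilterUpdate(int filter_id) {
  VLOG_FILTER << "filter_id=" << filter_id << " update received\n";
}
```

With a named level, changing the verbosity of all filter logging later becomes a one-line edit.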


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc@256
PS3, Line 256:       produced_filter.pending_remotes = 0;
Wouldn't this + line 286 cause pending_remotes to be negative?
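
A rough sketch of the guard I have in mind, assuming line 286 decrements pending_remotes after line 256 has already zeroed it (I may be misreading the control flow). ProducedFilterSketch, OnRemoteUpdate and the stop_waiting flag are hypothetical stand-ins, not the patch's code.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-filter bookkeeping. Only the two
// counters named in the patch are modeled.
struct ProducedFilterSketch {
  int pending_producers = 0;
  int pending_remotes = 0;
  int AllRemainingProducers() const {
    return pending_remotes + pending_producers;
  }
};

// Applies one remote update. When the update makes further remotes
// irrelevant, the counter is zeroed and the function returns before the
// regular decrement can run, so the counter never drops below zero.
void OnRemoteUpdate(ProducedFilterSketch& f, bool stop_waiting) {
  if (f.pending_remotes <= 0) return;  // late or duplicate update: ignore
  if (stop_waiting) {
    f.pending_remotes = 0;
    return;
  }
  --f.pending_remotes;
  assert(f.pending_remotes >= 0);
}
```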


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc@281
PS3, Line 281:         target->Or(params.bloom_filter(), sidecar_slice);
Is there a case where this produces a filter that's always true and we would 
want to stop waiting for other remotes? If FalsePositiveProb == 1.0, then 
presumably we should just use an ALWAYS_TRUE_FILTER.
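
To illustrate the idea with a toy model: the sketch below uses a plain bit array rather than Impala's BloomFilter, and ToyFilter, Saturated and MergeAndCheckSaturated are hypothetical names. It only shows the shape of the check, where a merge that sets every bit lets the caller stop waiting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy filter: a plain bit array. This is not Impala's BloomFilter.
struct ToyFilter {
  std::vector<uint64_t> words;

  // Bitwise OR of another filter of the same size into this one.
  void Or(const ToyFilter& other) {
    assert(words.size() == other.words.size());
    for (size_t i = 0; i < words.size(); ++i) words[i] |= other.words[i];
  }

  // True when every bit is set, so every lookup would pass.
  bool Saturated() const {
    if (words.empty()) return false;
    for (uint64_t w : words) {
      if (w != ~uint64_t{0}) return false;
    }
    return true;
  }
};

// Merges one update and reports whether the result is effectively always
// true, in which case waiting for the remaining remotes adds nothing.
bool MergeAndCheckSaturated(ToyFilter& target, const ToyFilter& update) {
  target.Or(update);
  return target.Saturated();
}
```

A real filter would more likely compare an estimated false positive probability against a threshold than test for full saturation.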


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter-bank.cc@690
PS3, Line 690:   bool try_wait_aggregation = !cancelled_;
Can cancelled_ be updated externally? That seems like a problem, because I 
don't see a lock or atomic.
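
If it turns out cancelled_ is written from another thread without a lock held, one option is to make it atomic. A minimal sketch, with FilterBankSketch, Cancel and ShouldWaitForAggregation as hypothetical stand-ins for the real class and call sites:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the filter bank. Only the cancellation flag is
// modeled.
class FilterBankSketch {
 public:
  // May be called from any thread.
  void Cancel() { cancelled_.store(true, std::memory_order_release); }

  // Mirrors `bool try_wait_aggregation = !cancelled_;` with an atomic read.
  bool ShouldWaitForAggregation() const {
    return !cancelled_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<bool> cancelled_{false};
};
```

If the flag is already read and written only under an existing lock, a comment saying so next to the declaration would be enough.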


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter.h
File be/src/runtime/runtime-filter.h:

http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/runtime/runtime-filter.h@151
PS3, Line 151:   bool IsReportToPeerRpc() const {
These would benefit from function comments.


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/scheduling/scheduler.cc@288
PS3, Line 288:   // Walk the instances and pick two random krpc backend for 
intermediate runtime
Why 2?


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/scheduling/scheduler.cc@291
PS3, Line 291:   vector<vector<pair<int, int>>> instance_groups;
I'm not entirely clear what these pairs represent.


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/scheduling/scheduler.cc@298
PS3, Line 298:       if (i == 0
Please add a comment describing what this is preventing.


http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/20612/3/be/src/service/data-stream-service.cc@139
PS3, Line 139:     LOG(INFO) << err_msg;
When would this happen? Should this be a warning?


http://gerrit.cloudera.org:8080/#/c/20612/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20612/3/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@472
PS3, Line 472:         consumedGlobalRuntimeFiltersMemReservationBytes_ += 
f.getFilterSize();
This is probably more than needed, but it seems a lot more complicated to 
have per-executor estimates based on where filters will be aggregated.

Will executors process only 1 incoming filter at a time? Seems like 
intermediate aggregators should only have 2 filters in-memory at a time, the 
aggregating filter and an incoming filter to aggregate. I guess I'm not clear 
on the queue size for incoming filters though.


http://gerrit.cloudera.org:8080/#/c/20612/3/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/20612/3/tests/query_test/test_runtime_filters.py@60
PS3, Line 60:       extra_exec_options={
I'm surprised there wasn't already a way to do this.



--
To view, visit http://gerrit.cloudera.org:8080/20612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11d38ed0f223d6e5b32a19ebe725af7738ee4ab0
Gerrit-Change-Number: 20612
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Oct 2023 21:54:00 +0000
Gerrit-HasComments: Yes
