Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19807 )

Change subject: IMPALA-12091: Control scan parallelism by its processing cost
......................................................................


Patch Set 3:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/data-sink.h@48
PS3, Line 48: /// DataSink contains the runtime state and there can be up to 
effective_dop instances of
ExceNode::effective_dop()


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/exec-node.h@132
PS3, Line 132:   int effective_dop() const { return effective_dop_; }
Would rather not see the term "effective" applied during execution. At this 
point, the query is planned and performance is consequence of that plan.


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/scan-node.cc@298
PS3, Line 298:     max_row_batches = max(2, max_row_batches / 
parent->plan_node().effective_dop());
This is too disconnected from the max_row_batches computation above, which does 
not consider effective_dop. I don't see how 
max_queued_row_batches_per_scanner_thread would be enforced.


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@58
PS3, Line 58: static const int MAX_INSTANCES_PER_NODE = 128;
Please add comment and consider making this configurable.


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@275
PS3, Line 275:       // TODO: Fragment with IsExceedMaxFsWriters equals true is 
not checked for now
Let's address this case in this PR


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@283
PS3, Line 283:           fragment_state->fragment.display_name));
Print values with message


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@289
PS3, Line 289:     int idx = 1;
largest_inst_idx


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@290
PS3, Line 290:     for (int i = 1; i < fragment_state->instance_states.size(); 
i++) {
Add a comment to explain what the following blocks are checking. It's not clear 
how the (128) limit is being avoided here.


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@313
PS3, Line 313:     int effective_inst_per_host = 
ceil((float)effective_instance_count / num_host);
Rename planned_inst_per_host


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/scheduling/scheduler.cc@322
PS3, Line 322:               << " num_host=" << num_host;
Can we suggest anything to correct if this occurs?


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/service/query-options.h@293
PS3, Line 293:   QUERY_OPT_FN(processing_cost_max_threads, 
PROCESSING_COST_MAX_THREADS,                 \
Rename max_fragment_threads_per_executor or something more tangible to the user.


http://gerrit.cloudera.org:8080/#/c/19807/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19807/3/common/thrift/ImpalaService.thrift@793
PS3, Line 793:   PROCESSING_COST_MAX_THREADS = 156;
Typo 156


http://gerrit.cloudera.org:8080/#/c/19807/3/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19807/3/common/thrift/Query.thrift@632
PS3, Line 632:   157: optional i32 processing_cost_max_threads = 128;
Can we consolidate there 128 constants somewhere?


http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@165
PS3, Line 165:   public static final int PROCESSING_COST_THREADS_UPPERBOUND = 
128;
max_fragment_threads_per_instance


http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@660
PS3, Line 660:         0;
Is 0 expected here and handled later?


http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java@56
PS3, Line 56:   private final static Logger LOG = 
LoggerFactory.getLogger(ScanNode.class);
Unused?


http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/test/resources/llama-site-3-groups.xml
File fe/src/test/resources/llama-site-3-groups.xml:

http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/test/resources/llama-site-3-groups.xml@41
PS3, Line 41:     <value>94371840</value>
Why is this change necessary?



--
To view, visit http://gerrit.cloudera.org:8080/19807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If948e45455275d9a61a6cd5d6a30a8b98a7c729a
Gerrit-Change-Number: 19807
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Wed, 03 May 2023 18:44:40 +0000
Gerrit-HasComments: Yes

Reply via email to