Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 43:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@209
PS43, Line 209: 3. PROCESSING_COST_MAX_INSTANCES
Should use THREADS instead of INSTANCES in naming if this is sizing local to an 
impalad instance.


http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213:    option. Setting 0 means MT_DOP will be used to count the 
maximum
Need to rework these options so that the core count for the instance is the 
default limit and that can be overridden higher or lower. There isn't much 
point of preserving mt_dop values when compute_processing_cost is set as the 
sizing is using a different model.


http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@244
PS43, Line 244:
Add
Co-authored-by: Riza..


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@270
PS43, Line 270:     if (ContainsUnionNode(fragment_state->fragment.plan)
Add comment/TODO to explain why these cases are skipped.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@552
PS43, Line 552:   int max_instances = 
input_fragment_state.instance_states.size();
This assignment is always overwritten. Is the value meaningful?


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc@1071
PS43, Line 1071:       case TImpalaQueryOptions::PROCESSING_COST_MAX_INSTANCES: 
{
Change INSTANCES to THREADS


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/service/query-options.cc@1077
PS43, Line 1077:                                    "Valid values are in [0, 
64].",
64 may be tool low of a limit. 96 and 128 core CPUs are becoming more common.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@203
PS43, Line 203:     "Valid value is between (0.0, 10.0]. Default value is 1.");
Suggest >0 lower limit.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@205
PS43, Line 205: DEFINE_bool_hidden(processing_cost_equal_expr_weight, true,
Why is the default true here? Also, are the width and datatype considered 
separately?


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210: DEFINE_int64_hidden(min_processing_cost_per_thread, 1000000,
This may not be aggressive enough of a default. Is this number backed by 
measured results?


http://gerrit.cloudera.org:8080/#/c/19033/38/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/38/common/thrift/Frontend.thrift@769
PS38, Line 769:   // The optional max_mem_limit to determine which executor 
group set to run for a query.
This comment seems out of place?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@85
PS43, Line 85:   private List<ProcessingCost> processingCosts_;
Make this a local variable instead.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java
File fe/src/main/java/org/apache/impala/planner/CoreRequirement.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@59
PS43, Line 59:       total_ = counts_.parallelStream().mapToInt(v -> v).sum();
Is this List really bid enough for parallelStream?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@89
PS43, Line 89:   public static CoreRequirement sum(List<CoreRequirement> cores) 
{
Exposing different aggregate values here might blur the usage of this object. 
Do these interfaces all need to be public?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263:    * Estimate per-row cost as 1 / num rows per batch.
Cost seems too low here. Likely more fixed per-row cost.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@384
PS43, Line 384:     super.computeRowConsumptionAndProductionToCost();
Why does fragment_ have transient state here (instance count)? Maybe 
computeRowConsumptionAndProductionToCost should set the result too?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java@415
PS43, Line 415:   private static ProcessingCost 
computeMinimalProcessCost(PlanFragment root) {
This won't reflect actual processing. Is it still needed somewhere?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java@350
PS43, Line 350:     if (mtDop >= 1) {
Min work computation and max threads/instance should override here.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ScanNode.java@369
PS43, Line 369:     return getAvgRowSize() / PlanNode.ROWBATCH_MAX_MEM_USAGE;
This should be more constant per row.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java@24
PS43, Line 24: public class SumProcessingCost extends ProcessingCost {
Why separate class here? Could sum within ProcessingCost class.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 43
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 07 Feb 2023 15:43:52 +0000
Gerrit-HasComments: Yes

Reply via email to