Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage ...................................................................... Patch Set 43: (11 comments) Hi Kurt, thank you for your review. I'm replying some for further discussion. http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213 PS43, Line 213: option. Setting 0 means MT_DOP will be used to count the maximum > Need to rework these options so that the core count for the instance is the We can rework this to be used as a starting count. However, I do think that a hard cap config is still required to prevent excessively high number of instances being scheduled due to stats inaccuracy. We can use the min_processing_cost_per_thread flag to derive the max cap once we figure out a good default value to do so. http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@552 PS43, Line 552: int max_instances = input_fragment_state.instance_states.size(); > This assignment is always overwritten. Is the value meaningful? Not always. This is the correct assignment if IsExceedMaxFsWriter return false and COMPUTE_PROCESSING_COST=0. http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@203 PS43, Line 203: "Valid value is between (0.0, 10.0]. Default value is 1."); > Suggest >0 lower limit. The '(' sign before 0.0 meant to be an exclusion, while ']' sign after 10.0 meant as inclusion. Will spell it out in next patch set. http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@205 PS43, Line 205: DEFINE_bool_hidden(processing_cost_equal_expr_weight, true, > Why is the default true here? Also, are the width and datatype considered s This default to true because I have not really explore if evaluation costing from IMPALA-2805 is a good fit for CPU costing algorithm as well. It was originally added to determine filtering priority order. I can try test setting it false and see how it goes. http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210 PS43, Line 210: DEFINE_int64_hidden(min_processing_cost_per_thread, 1000000, > This may not be aggressive enough of a default. Is this number backed by me The end goal of this flag it to cap maximum number of instances by requiring each instance to at least work this amount of ProcessingCost unit. With this value in tpcds_3000_parquet, I can see some fragment still has high max cap (high hundreds and low thousands), while num executors is only 10 and MT_DOP=12. Any scale down (lower than MT_DOP) that happen was mostly driven by the consume-produce rate comparison. So I'd argue to increase this value to further lower the high max cap. On the other hand, ProcessingCost is not exposed to user anywhere other than EXPLAIN_LEVEL=VERBOSE. This is will confuse user when they need to tune it. I'm thinking of changing the flag based on input row cardinality instead. I think that will be easier to reason and tune since the input row cardinality is shown in query profile. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java File fe/src/main/java/org/apache/impala/planner/CoreRequirement.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@59 PS43, Line 59: total_ = counts_.parallelStream().mapToInt(v -> v).sum(); > Is this List really bid enough for parallelStream? Will replace it with regular stream. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@89 PS43, Line 89: public static CoreRequirement sum(List<CoreRequirement> cores) { > Exposing different aggregate values here might blur the usage of this objec Will change to protected. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263 PS43, Line 263: * Estimate per-row cost as 1 / num rows per batch. > Cost seems too low here. Likely more fixed per-row cost. Any formula suggestion to replace this? If it is a fixed per-row cost, then average row size does not matter? http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@384 PS43, Line 384: super.computeRowConsumptionAndProductionToCost(); > Why does fragment_ have transient state here (instance count)? Maybe comput The super implementation is in DataSink.java. DataSink subclasses generally does not bound number of instance except for HdfsTableSink and shared JoinBuildSink https://github.com/apache/impala/blob/84cb5e8510e0a837c1daaf48bd5e7c1127da8dab/fe/src/main/java/org/apache/impala/planner/PlanFragment.java#L408 This is the reason why computeRowConsumptionAndProductionToCost is extended here and in JoinBuildSink.java to set a fixed instance count. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java@415 PS43, Line 415: private static ProcessingCost computeMinimalProcessCost(PlanFragment root) { > This won't reflect actual processing. Is it still needed somewhere? This is printed to LOG to help with my initial research. We can delete this now. http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java File fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java: http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java@24 PS43, Line 24: public class SumProcessingCost extends ProcessingCost { > Why separate class here? Could sum within ProcessingCost class. This is a separate class to maintain the lineage (cost1_ and cost2_). This makes it easy to debug or find anomaly in costing algorithm when pretty-printed in EXPLAIN_LEVEL=VERBOSE. -- To view, visit http://gerrit.cloudera.org:8080/19033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a Gerrit-Change-Number: 19033 Gerrit-PatchSet: 43 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Tue, 07 Feb 2023 18:14:37 +0000 Gerrit-HasComments: Yes
