Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 43:

(11 comments)

Hi Kurt, thank you for your review. I'm replying some for further discussion.

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213:    option. Setting 0 means MT_DOP will be used to count the 
maximum
> Need to rework these options so that the core count for the instance is the
We can rework this to be used as a starting count. However, I do think that a 
hard cap config is still required to prevent excessively high number of 
instances being scheduled due to stats inaccuracy.

We can use the min_processing_cost_per_thread flag to derive the max cap once 
we figure out a good default value to do so.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@552
PS43, Line 552:   int max_instances = 
input_fragment_state.instance_states.size();
> This assignment is always overwritten. Is the value meaningful?
Not always. This is the correct assignment if IsExceedMaxFsWriter return false 
and COMPUTE_PROCESSING_COST=0.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@203
PS43, Line 203:     "Valid value is between (0.0, 10.0]. Default value is 1.");
> Suggest >0 lower limit.
The '(' sign before 0.0 meant to be an exclusion, while ']' sign after 10.0 
meant as inclusion. Will spell it out in next patch set.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@205
PS43, Line 205: DEFINE_bool_hidden(processing_cost_equal_expr_weight, true,
> Why is the default true here? Also, are the width and datatype considered s
This default to true because I have not really explore if evaluation costing 
from IMPALA-2805 is a good fit for CPU costing algorithm as well. It was 
originally added to determine filtering priority order. I can try test setting 
it false and see how it goes.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210: DEFINE_int64_hidden(min_processing_cost_per_thread, 1000000,
> This may not be aggressive enough of a default. Is this number backed by me
The end goal of this flag it to cap maximum number of instances by requiring 
each instance to at least work this amount of ProcessingCost unit. With this 
value in tpcds_3000_parquet, I can see some fragment still has high max cap 
(high hundreds and low thousands), while num executors is only 10 and 
MT_DOP=12. Any scale down (lower than MT_DOP) that happen was mostly driven by 
the consume-produce rate comparison. So I'd argue to increase this value to 
further lower the high max cap.

On the other hand, ProcessingCost is not exposed to user anywhere other than 
EXPLAIN_LEVEL=VERBOSE. This is will confuse user when they need to tune it. I'm 
thinking of changing the flag based on input row cardinality instead. I think 
that will be easier to reason and tune since the input row cardinality is shown 
in query profile.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java
File fe/src/main/java/org/apache/impala/planner/CoreRequirement.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@59
PS43, Line 59:       total_ = counts_.parallelStream().mapToInt(v -> v).sum();
> Is this List really bid enough for parallelStream?
Will replace it with regular stream.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/CoreRequirement.java@89
PS43, Line 89:   public static CoreRequirement sum(List<CoreRequirement> cores) 
{
> Exposing different aggregate values here might blur the usage of this objec
Will change to protected.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263:    * Estimate per-row cost as 1 / num rows per batch.
> Cost seems too low here. Likely more fixed per-row cost.
Any formula suggestion to replace this? If it is a fixed per-row cost, then 
average row size does not matter?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@384
PS43, Line 384:     super.computeRowConsumptionAndProductionToCost();
> Why does fragment_ have transient state here (instance count)? Maybe comput
The super implementation is in DataSink.java.
DataSink subclasses generally does not bound number of instance except for 
HdfsTableSink and shared JoinBuildSink
https://github.com/apache/impala/blob/84cb5e8510e0a837c1daaf48bd5e7c1127da8dab/fe/src/main/java/org/apache/impala/planner/PlanFragment.java#L408

This is the reason why computeRowConsumptionAndProductionToCost is extended 
here and in JoinBuildSink.java to set a fixed instance count.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/Planner.java@415
PS43, Line 415:   private static ProcessingCost 
computeMinimalProcessCost(PlanFragment root) {
> This won't reflect actual processing. Is it still needed somewhere?
This is printed to LOG to help with my initial research. We can delete this now.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/SumProcessingCost.java@24
PS43, Line 24: public class SumProcessingCost extends ProcessingCost {
> Why separate class here? Could sum within ProcessingCost class.
This is a separate class to maintain the lineage (cost1_ and cost2_). This 
makes it easy to debug or find anomaly in costing algorithm when pretty-printed 
in EXPLAIN_LEVEL=VERBOSE.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 43
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 07 Feb 2023 18:14:37 +0000
Gerrit-HasComments: Yes

Reply via email to