Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................


Patch Set 51:

(26 comments)

Looks very good!

I will look at the code corresponding to section II, III and IV this weekend.

Can you please also confirm that segments are still modeled as a list within a 
fragment?  How hard is it to model as a tree?  Personally I think it is very 
important that all operators can participate in EDoP adjustment to make this 
feature sound.

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@151
PS51, Line 151: fragment plan
nit plan fragment, which is blocking since it has 3 blocking PlanNode:
12:AGGREGATE, 06:SORT, and 08:TOP-N.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171
PS51, Line 171: 1. (11:EXCHANGE, 12:AGGREGATE)
              : 2. 06:SORT
              : 3. (07:ANALYTIC, 08:TOP-N)
              : 4. DataStreamSink of F03
indentation and with some additional info as follows.

Blocking segment 1:  (11:EXCHANGE, 12:AGGREGATE)
Blocking segment 2: 06:SORT
Blocking segment 3:  (07:ANALYTIC, 08:TOP-N)
Non-blocking segment 4: DataStreamSink of F03

I also wonder if segment 4 should be a blocking one since by the definition 
above any DataSink is blocking.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@177
PS51, Line 177: PC(segment 1) = 426337+34548320
              : PC(segment 2) = 2159270
              : PC(segment 3) = 23751970+900
              : PC(segment 4) = 22
indentation


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@182
PS51, Line 182: These per-segment costs stored in SegmentCost tree rooted at
nit a


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@183
PS51, Line 183: . In this example, post-order traversal of
              : rootSegment_ will show their associated cost as:
              : [34974657, 2159270, 23752870, 22]
nit. , and are [34974657, 2159270, 23752870, 22] respectively after the 
post-order traversal.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@186
PS51, Line 186: F03 is also a blocking fragment since it has 3 blocking 
PlanNode:
              : 12:AGGREGATE, 06:SORT, and 08:TOP-N.
remove, as the info is described above.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@189
PS51, Line 189: A rootSegment_ is also called an Output ProcessingCost
I think "Output ProcessingCost" should be really called "Total Processing 
cost", since it takes some cost for a fragment to output rows (not cost).


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@196
PS51, Line 196: effective parallelism
nit. effective degree of parallelism (EDoP). We can use EDoP in the rest of the 
text.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@199
PS51, Line 199: algorithms will
              : try to adjust
the costing algorithm attempts to adjust the number of


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@201
PS51, Line 201: Output
see the previous comment.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205
PS51, Line 205: .
Assume that segments are modeled as a list in a plan fragment (true?) in this 
patch, we should append the following here:

, since segments are modeled as a list in a plan fragment .


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@209
PS51, Line 209: the effective parallelism
EDoP


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@261
PS51, Line 261: several
nit suggest to remove since a query plan with a sink node, which is blocking 
node,  at the root maps to one blocking subtree.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@263
PS51, Line 263: leaves. All other fragments in the subtree are
              : non-blocking
the intermediate or leaf nodes.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@268
PS51, Line 268: CoreRequirement
By reading this para, it seems CoreCount is a better name.  Usually a 
requirement in SQL compiler means something not solid, such as ANY, NOT SINGLE, 
etc.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@269
PS51, Line 269: certain
nit remove


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@286
PS51, Line 286: effective parallelism
EDoP


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: executor group to determine if it fits to run in that executor 
group set
set


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: executor group
nit. remove


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@292
PS51, Line 292: this CPU costing algorithm
the entire computation of EDoP.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@300
PS51, Line 300: reducing
computing


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301
PS51, Line 301: Currently, there is no option to
              :    set a threshold for the minimum number of instances.
I strongly suggest that we introduce PROCESSING_COST_MIN_THREADS in this patch 
to safeguard the worst situations that the computed EDoP is 1. Note that the 
processing cost is just an estimate and can be off in a big deal.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@304
PS51, Line 304: 3. PROCESSING_COST_ALLOW_THREAD_INCREMENT
              :    Control whether the costing algorithm is allowed to increase 
the
              :    instance count of a fragment beyond 
PROCESSING_COST_MAX_THREADS due
              :    to the large estimated workload. It is suggested to keep 
this to
              :    false until the min_processing_per_thread backend flag has 
been
              :    finely tuned.
It seems that we can remove this control since one can set 
PROCESSING_COST_MAX_THREADS to max number of available cores to get all 
possible # of cores.

Allowing this control can introduce complexity and confusions.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@320
PS51, Line 320: equal
use_equal


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@326
PS51, Line 326: Minimum processing load that a fragment instance need to work on
              :    before planner consider increasing instance count. Used to 
adjust
              :    fragment instance count based on estimated workload rather 
than the
              :    MT_DOP setting.
Minimum processing load in bytes that a fragment instance needs to work on
   before planner considers increasing instance count based on the processing 
cost rather than the MT_DOP setting. The decision is per fragment.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@335
PS51, Line 335: Q3
              : CoreRequirement={total=12 trace=F00:12}
              : ProcessingCost=87361819
              :
              : Q12
              : CoreRequirement={total=12 trace=F00:12}
              : ProcessingCost=62731290
              :
              : Q15
              : CoreRequirement={total=15 trace=N07:3+F00:12}
              : ProcessingCost=69759065
indentation



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 51
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Thu, 23 Feb 2023 16:49:33 +0000
Gerrit-HasComments: Yes

Reply via email to