Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/19033 )
Change subject: IMPALA-11604 Planner changes for CPU usage ...................................................................... Patch Set 51: (26 comments) Looks very good! I will look at the code corresponding to section II, III and IV this weekend. Can you please also confirm that segments are still modeled as a list within a fragment? How hard is it to model as a tree? Personally I think it is very important that all operators can participate in EDoP adjustment to make this feature sound. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@151 PS51, Line 151: fragment plan nit plan fragment, which is blocking since it has 3 blocking PlanNode: 12:AGGREGATE, 06:SORT, and 08:TOP-N. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171 PS51, Line 171: 1. (11:EXCHANGE, 12:AGGREGATE) : 2. 06:SORT : 3. (07:ANALYTIC, 08:TOP-N) : 4. DataStreamSink of F03 indentation and with some additional info as follows. Blocking segment 1: (11:EXCHANGE, 12:AGGREGATE) Blocking segment 2: 06:SORT Blocking segment 3: (07:ANALYTIC, 08:TOP-N) Non-blocking segment 4: DataStreamSink of F03 I also wonder if segment 4 should be a blocking one since by the definition above any DataSink is blocking. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@177 PS51, Line 177: PC(segment 1) = 426337+34548320 : PC(segment 2) = 2159270 : PC(segment 3) = 23751970+900 : PC(segment 4) = 22 indentation http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@182 PS51, Line 182: These per-segment costs stored in SegmentCost tree rooted at nit a http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@183 PS51, Line 183: . In this example, post-order traversal of : rootSegment_ will show their associated cost as: : [34974657, 2159270, 23752870, 22] nit. , and are [34974657, 2159270, 23752870, 22] respectively after the post-order traversal. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@186 PS51, Line 186: F03 is also a blocking fragment since it has 3 blocking PlanNode: : 12:AGGREGATE, 06:SORT, and 08:TOP-N. remove, as the info is described above. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@189 PS51, Line 189: A rootSegment_ is also called an Output ProcessingCost I think "Output ProcessingCost" should be really called "Total Processing cost", since it takes some cost for a fragment to output rows (not cost). http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@196 PS51, Line 196: effective parallelism nit. effective degree of parallelism (EDoP). We can use EDoP in the rest of the text. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@199 PS51, Line 199: algorithms will : try to adjust the costing algorithm attempts to adjust the number of http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@201 PS51, Line 201: Output see the previous comment. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205 PS51, Line 205: . Assume that segments are modeled as a list in a plan fragment (true?) in this patch, we should append the following here: , since segments are modeled as a list in a plan fragment . http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@209 PS51, Line 209: the effective parallelism EDoP http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@261 PS51, Line 261: several nit suggest to remove since a query plan with a sink node, which is blocking node, at the root maps to one blocking subtree. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@263 PS51, Line 263: leaves. All other fragments in the subtree are : non-blocking the intermediate or leaf nodes. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@268 PS51, Line 268: CoreRequirement By reading this para, it seems CoreCount is a better name. Usually a requirement in SQL compiler means something not solid, such as ANY, NOT SINGLE, etc. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@269 PS51, Line 269: certain nit remove http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@286 PS51, Line 286: effective parallelism EDoP http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288 PS51, Line 288: executor group to determine if it fits to run in that executor group set set http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288 PS51, Line 288: executor group nit. remove http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@292 PS51, Line 292: this CPU costing algorithm the entire computation of EDoP. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@300 PS51, Line 300: reducing computing http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301 PS51, Line 301: Currently, there is no option to : set a threshold for the minimum number of instances. I strongly suggest that we introduce PROCESSING_COST_MIN_THREADS in this patch to safeguard the worst situations that the computed EDoP is 1. Note that the processing cost is just an estimate and can be off in a big deal. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@304 PS51, Line 304: 3. PROCESSING_COST_ALLOW_THREAD_INCREMENT : Control whether the costing algorithm is allowed to increase the : instance count of a fragment beyond PROCESSING_COST_MAX_THREADS due : to the large estimated workload. It is suggested to keep this to : false until the min_processing_per_thread backend flag has been : finely tuned. It seems that we can remove this control since one can set PROCESSING_COST_MAX_THREADS to max number of available cores to get all possible # of cores. Allowing this control can introduce complexity and confusions. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@320 PS51, Line 320: equal use_equal http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@326 PS51, Line 326: Minimum processing load that a fragment instance need to work on : before planner consider increasing instance count. Used to adjust : fragment instance count based on estimated workload rather than the : MT_DOP setting. Minimum processing load in bytes that a fragment instance needs to work on before planner considers increasing instance count based on the processing cost rather than the MT_DOP setting. The decision is per fragment. http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@335 PS51, Line 335: Q3 : CoreRequirement={total=12 trace=F00:12} : ProcessingCost=87361819 : : Q12 : CoreRequirement={total=12 trace=F00:12} : ProcessingCost=62731290 : : Q15 : CoreRequirement={total=15 trace=N07:3+F00:12} : ProcessingCost=69759065 indentation -- To view, visit http://gerrit.cloudera.org:8080/19033 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a Gerrit-Change-Number: 19033 Gerrit-PatchSet: 51 Gerrit-Owner: Qifan Chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Thu, 23 Feb 2023 16:49:33 +0000 Gerrit-HasComments: Yes
