Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 )
Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21257/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21257/11//COMMIT_MSG@19
PS11, Line 19: rather
             : than sum of it (48)
> So if the configuration is correct, then cpuask will be always <= the number
> of cores?
Yes, CpuAsk should be <= the total cores of the selected executor group set. The exception is when CpuAsk is larger than even the biggest executor group set. In that case, we assign the query to the largest executor group set anyway, since the biggest one is treated as the "catch-all" group.

> I think that what needs a bit of explanation is "12 cores oversubscribed by
> 4x".
Let's assume that an executor host has exactly 48 CPU cores. What I want to say with the example is that if an executor is assigned 48 fragment instances that can all run in parallel without blocking each other, then 48 slots, rather than 12, should be assigned on that executor node.

Note that admission control slots inherently control how many queries can run concurrently in a cluster. If this query is given 12 slots, then 4 copies of the same query can run concurrently. But if the query is given 48 slots, only 1 can run at a time.


http://gerrit.cloudera.org:8080/#/c/21257/11/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/21257/11/tests/custom_cluster/test_executor_groups.py@1245
PS11, Line 1245: # CoreCount={total=16 trace=F15:3+F01:1+F14:3+F03:1+F13:3+F05:1+F12:3+F07:1},
Right, tpcds_cpu_cost/tpcds-q01.test is confusing here because the test has stats injection. I will add a new Planner test without stats injection.

> Does the cost not matter in this case, so we give a parallel fragment a full
> slot even if we estimate it to process just a couple of rows?
The role of processing cost stops after the Planner has selected the parallelism of each fragment. Naively, the planner/scheduler could just total the number of fragment instances assigned to each node as the slot requirement, but that would be wasteful if only a subset of the fragments can run in parallel while the others are blocked waiting. Therefore, the planner/scheduler selects only the largest subset of fragments that do not block each other, and sums the instances of that subset as the slot requirement (see the sketch below).

> Another thing I don't get is that if F15 (a builder for a broadcast join) is
> included, then why F02 is not included, which has the source scan node of F15
> and should run in parallel? (+it has a much higher cost than F15)
In the new planner test that I will submit next, we will see that the parallelism of F02 (3) is lower than the parallelism of its child fragments F15 + F01 (3 + 1). F02 and F01 are blocking fragments because they have an AGGREGATE node in them. They also cannot begin work until F15, the join builder, is complete. The relevant part of the plan is quoted after the sketch below.
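To make the slot arithmetic in the comments above concrete, here is a rough Python sketch (an illustration only, not the actual planner/scheduler code; the fragment names and counts come from the CoreCount trace and the hypothetical 48-core example above):

# Illustration only (not Impala's actual planner/scheduler code): summing the
# largest non-blocking subset of fragment instances gives the per-host slot
# count, and that slot count caps how many queries can run concurrently.

# Per-host instance counts of the fragments that can all run in parallel
# without blocking each other, taken from the CoreCount trace quoted above.
non_blocking_subset = {
    "F15": 3, "F01": 1, "F14": 3, "F03": 1,
    "F13": 3, "F05": 1, "F12": 3, "F07": 1,
}
slots_per_host = sum(non_blocking_subset.values())
print(slots_per_host)  # 16, matching CoreCount={total=16 ...}

# Concurrency effect, assuming a 48-core executor host exposing 48 admission
# control slots (the hypothetical example above):
host_slots = 48
print(host_slots // 12)  # 4 -> a query asking 12 slots admits 4 copies at once
print(host_slots // 48)  # 1 -> a query asking 48 slots admits only 1 at a time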
F02:PLAN FRAGMENT [HASH(sr_customer_sk,sr_store_sk)] hosts=3 instances=3 (adjusted from 384)
Per-Instance Resources: mem-estimate=10.49MB mem-reservation=1.94MB thread-reservation=1
max-parallelism=3 segment-costs=[327730, 110283] cpu-comparison-result=4 [max(3 (self) vs 4 (sum children))]
17:AGGREGATE [FINALIZE]
|  output: sum:merge(SR_RETURN_AMT)
|  group by: sr_customer_sk, sr_store_sk
|  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=2 row-size=24B cardinality=53.52K cost=315877
|  in pipelines: 17(GETNEXT), 00(OPEN)
|
16:EXCHANGE [HASH(sr_customer_sk,sr_store_sk)]
|  mem-estimate=502.09KB mem-reservation=0B thread-reservation=0
|  tuple-ids=2 row-size=24B cardinality=53.52K cost=11853
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 (adjusted from 384)
Per-Host Shared Resources: mem-estimate=4.00MB mem-reservation=4.00MB thread-reservation=0 runtime-filters-memory=4.00MB
Per-Instance Resources: mem-estimate=26.33MB mem-reservation=2.12MB thread-reservation=1
max-parallelism=3 segment-costs=[351629, 110283] cpu-comparison-result=4 [max(3 (self) vs 4 (sum children))]
03:AGGREGATE [STREAMING]
|  output: sum(SR_RETURN_AMT)
|  group by: sr_customer_sk, sr_store_sk
|  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=2 row-size=24B cardinality=53.52K cost=315877
|  in pipelines: 00(GETNEXT)
|
02:HASH JOIN [INNER JOIN, BROADCAST]
|  hash-table-id=04
|  hash predicates: sr_returned_date_sk = d_date_sk
|  fk/pk conjuncts: sr_returned_date_sk = d_date_sk
|  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
|  tuple-ids=0,1 row-size=24B cardinality=53.52K cost=23423
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--F15:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
|  |  Per-Instance Resources: mem-estimate=2.95MB mem-reservation=2.94MB thread-reservation=1 runtime-filters-memory=1.00MB
|  |  max-parallelism=3 segment-costs=[520]
|  JOIN BUILD
...
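To read the cpu-comparison-result annotation on F02 above, here is a minimal sketch of the comparison (again an illustration only, not the planner's Java code; the child parallelism values follow the F15 + F01 discussion above):

# Illustration only: F02's cpu-comparison-result=4 comes from comparing its own
# parallelism against the summed parallelism of the child side.
f02_self_parallelism = 3                    # F02's own max-parallelism
child_parallelism = {"F15": 3, "F01": 1}    # child fragments per the discussion above

cpu_comparison_result = max(f02_self_parallelism, sum(child_parallelism.values()))
print(cpu_comparison_result)  # 4 -> the child side dominates, which is why F02's
                              # own instances do not appear in the CoreCount trace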
--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 16 Apr 2024 15:45:07 +0000
Gerrit-HasComments: Yes