[jira] [Commented] (IMPALA-12988) Calculate an unbounded version of CpuAsk

2024-04-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839697#comment-17839697
 ] 

ASF subversion and git services commented on IMPALA-12988:
--

Commit c8149d14127968eb9a4d26a623fa6cc82762216e in impala's branch 
refs/heads/branch-4.4.0 from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c8149d141 ]

IMPALA-12988: Calculate an unbounded version of CpuAsk

Planner calculates CpuAsk through a recursive call beginning at
Planner.computeBlockingAwareCores(), which is called after
Planner.computeEffectiveParallelism(). It does blocking operator
analysis over the selected degree of parallelism that was decided during
computeEffectiveParallelism() traversal. That selected degree of
parallelism, however, is already bounded by min and max parallelism
config, derived from PROCESSING_COST_MIN_THREADS and
MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly.

This patch calculates an unbounded version of CpuAsk that is not bounded
by min and max parallelism config. It is purely based on the fragment's
ProcessingCost and query plan relationship constraint (for example, the
number of JOIN BUILDER fragments should equal the number of destination
JOIN fragments for partitioned join).

Frontend will receive both bounded and unbounded CpuAsk values from
TQueryExecRequest on each executor group set selection round. The
unbounded CpuAsk is then scaled down once using a nth root based
sublinear-function, controlled by the total cpu count of the smallest
executor group set and the bounded CpuAsk number. Another linear scaling
is then applied on both bounded and unbounded CpuAsk using
QUERY_CPU_COUNT_DIVISOR option. Frontend then compare the unbounded
CpuAsk after scaling against CpuMax to avoid assigning a query to a
small executor group set too soon. The last executor group set stays as
the "catch-all" executor group set.

After this patch, setting COMPUTE_PROCESSING_COST=True will show
following changes in query profile:
- The "max-parallelism" fields in the query plan will all be set to
  maximum parallelism based on ProcessingCost.
- The CpuAsk counter is changed to show the unbounded CpuAsk after
  scaling.
- A new counter CpuAskBounded shows the bounded CpuAsk after scaling. If
  QUERY_CPU_COUNT_DIVISOR=1 and PLANNER_CPU_ASK slot counting strategy
  is selected, this CpuAskBounded is also the minimum total admission
  slots given to the query.
- A new counter MaxParallelism shows the unbounded CpuAsk before
  scaling.
- The EffectiveParallelism counter remains unchanged,
  showing bounded CpuAsk before scaling.

Testing:
- Update and pass FE test TpcdsCpuCostPlannerTest and
  PlannerTest#testProcessingCost.
- Pass EE test tests/query_test/test_tpcds_queries.py
- Pass custom cluster test tests/custom_cluster/test_executor_groups.py

Change-Id: I5441e31088f90761062af35862be4ce09d116923
Reviewed-on: http://gerrit.cloudera.org:8080/21277
Reviewed-by: Kurt Deschler 
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 


> Calculate an unbounded version of CpuAsk
> 
>
> Key: IMPALA-12988
> URL: https://issues.apache.org/jira/browse/IMPALA-12988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
> Fix For: Impala 4.4.0
>
>
> CpuAsk is calculated through recursive call beginning at 
> Planner.computeBlockingAwareCores(), which called after 
> Planner.computeEffectiveParallelism(). It does blocking operator analysis 
> over selected degree of parallelism that decided during 
> computeEffectiveParallelism() traversal. That selected degree of parallelism, 
> however, is already bounded by min and max parallelism config, derived from 
> PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options 
> accordingly.
> It is beneficial to have another version of CpuAsk that is not bounded by min 
> and max parallelism config. It should purely based on the fragment's 
> ProcessingCost and query plan relationship constraint (ie., num JOIN BUILDER 
> fragment should be equal as num JOIN fragment for partitioned join). During 
> executor group set selection, Frontend should use the unbounded CpuAsk number 
> to avoid assigning query to small executor group set prematurely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-12988) Calculate an unbounded version of CpuAsk

2024-04-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839196#comment-17839196
 ] 

ASF subversion and git services commented on IMPALA-12988:
--

Commit d437334e5304823836b9ceb5ffda9945dd7cb183 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d437334e5 ]

IMPALA-12988: Calculate an unbounded version of CpuAsk

Planner calculates CpuAsk through a recursive call beginning at
Planner.computeBlockingAwareCores(), which is called after
Planner.computeEffectiveParallelism(). It does blocking operator
analysis over the selected degree of parallelism that was decided during
computeEffectiveParallelism() traversal. That selected degree of
parallelism, however, is already bounded by min and max parallelism
config, derived from PROCESSING_COST_MIN_THREADS and
MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly.

This patch calculates an unbounded version of CpuAsk that is not bounded
by min and max parallelism config. It is purely based on the fragment's
ProcessingCost and query plan relationship constraint (for example, the
number of JOIN BUILDER fragments should equal the number of destination
JOIN fragments for partitioned join).

Frontend will receive both bounded and unbounded CpuAsk values from
TQueryExecRequest on each executor group set selection round. The
unbounded CpuAsk is then scaled down once using a nth root based
sublinear-function, controlled by the total cpu count of the smallest
executor group set and the bounded CpuAsk number. Another linear scaling
is then applied on both bounded and unbounded CpuAsk using
QUERY_CPU_COUNT_DIVISOR option. Frontend then compare the unbounded
CpuAsk after scaling against CpuMax to avoid assigning a query to a
small executor group set too soon. The last executor group set stays as
the "catch-all" executor group set.

After this patch, setting COMPUTE_PROCESSING_COST=True will show
following changes in query profile:
- The "max-parallelism" fields in the query plan will all be set to
  maximum parallelism based on ProcessingCost.
- The CpuAsk counter is changed to show the unbounded CpuAsk after
  scaling.
- A new counter CpuAskBounded shows the bounded CpuAsk after scaling. If
  QUERY_CPU_COUNT_DIVISOR=1 and PLANNER_CPU_ASK slot counting strategy
  is selected, this CpuAskBounded is also the minimum total admission
  slots given to the query.
- A new counter MaxParallelism shows the unbounded CpuAsk before
  scaling.
- The EffectiveParallelism counter remains unchanged,
  showing bounded CpuAsk before scaling.

Testing:
- Update and pass FE test TpcdsCpuCostPlannerTest and
  PlannerTest#testProcessingCost.
- Pass EE test tests/query_test/test_tpcds_queries.py
- Pass custom cluster test tests/custom_cluster/test_executor_groups.py

Change-Id: I5441e31088f90761062af35862be4ce09d116923
Reviewed-on: http://gerrit.cloudera.org:8080/21277
Reviewed-by: Kurt Deschler 
Reviewed-by: Abhishek Rawat 
Tested-by: Impala Public Jenkins 


> Calculate an unbounded version of CpuAsk
> 
>
> Key: IMPALA-12988
> URL: https://issues.apache.org/jira/browse/IMPALA-12988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Riza Suminto
>Assignee: Riza Suminto
>Priority: Major
>
> CpuAsk is calculated through recursive call beginning at 
> Planner.computeBlockingAwareCores(), which called after 
> Planner.computeEffectiveParallelism(). It does blocking operator analysis 
> over selected degree of parallelism that decided during 
> computeEffectiveParallelism() traversal. That selected degree of parallelism, 
> however, is already bounded by min and max parallelism config, derived from 
> PROCESSING_COST_MIN_THREADS and MAX_FRAGMENT_INSTANCES_PER_NODE options 
> accordingly.
> It is beneficial to have another version of CpuAsk that is not bounded by min 
> and max parallelism config. It should purely based on the fragment's 
> ProcessingCost and query plan relationship constraint (ie., num JOIN BUILDER 
> fragment should be equal as num JOIN fragment for partitioned join). During 
> executor group set selection, Frontend should use the unbounded CpuAsk number 
> to avoid assigning query to small executor group set prematurely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org