[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58: Code-Review-2

Putting -2 for now until I manage to split the patch.


--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 07 Mar 2023 03:10:31 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 07 Mar 2023 02:38:03 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 22:37:47 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58: -Code-Review

Temporarily removing the +2 votes.
I'll test whether splitting this patch is feasible.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 21:14:08 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Kurt Deschler (Code Review)

Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58:

Changes look good, but please split the effective parallelism work into a 
separate patch so that we can manage the two separately and the authors are 
correctly represented.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 19:02:58 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58: Code-Review+2

(1 comment)

Thank you Qifan and Wenzhe!
I mentioned some follow-up work in the commit message.
Carrying the +2.

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS54, Line 30:   long cardinality, float exprsCost, float materializationCost) {
> For VARCHAR, we can use some kind of average width stats, if available.  Fo
Filed IMPALA-11972 for this.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 17:30:44 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#58) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure to
allow the weighted total amount of data to process to be used as a new
factor in the definition and selection of an executor group. At the
core of the CPU costing model, we define ProcessingCost as the cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows
globally across all of its instances. The costing algorithm then tries
to adjust the number of instances for each fragment by considering their
production-consumption ratio, and then finally returns a number
representing an ideal CPU core count required for a query to run
efficiently. A more detailed explanation of the CPU costing algorithm
can be found in the four steps below.

I. Compute ProcessingCost for each plan node and data sink.

ProcessingCost of a PlanNode/DataSink is a weighted amount of data
processed by that node/sink. The basic ProcessingCost is computed with a
general formula as follows.

  ProcessingCost is a pair: PC(D, N), where D = I * (C + M)

  where D is the weighted amount of data processed.
        I is the input cardinality.
        C is the expression evaluation cost per row.
          Set to the total weight of expression evaluation in the node/sink.
        M is the materialization cost per row.
          Only used by scan and exchange nodes. Otherwise, 0.
        N is the number of instances.
          Defaults to D / MIN_COST_PER_THREAD (1 million), but
          is fixed for certain nodes/sinks and adjustable in step III.
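A minimal Python sketch of the general formula, tracking only the pair (D, N). The names `PC`, `general_cost`, and `default_instances` are hypothetical (they are not Impala's Java classes), and rounding D / MIN_COST_PER_THREAD up to at least one instance is an assumption, since the text does not say how fractional results are handled.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Value stated in the text: 1 million cost units per thread.
MIN_COST_PER_THREAD = 1_000_000


@dataclass(frozen=True)
class PC:
    """Hypothetical ProcessingCost pair: weighted data D and instance count N."""
    d: float
    n: int


def default_instances(d: float) -> int:
    # Assumption: round up and never return fewer than one instance.
    return max(1, math.ceil(d / MIN_COST_PER_THREAD))


def general_cost(i: float, c: float, m: float = 0.0,
                 fixed_n: Optional[int] = None) -> PC:
    """D = I * (C + M); N is fixed for some nodes/sinks, else derived from D."""
    d = i * (c + m)
    n = fixed_n if fixed_n is not None else default_instances(d)
    return PC(d, n)


# 5 million input rows, two expressions of weight 1, no materialization cost.
print(general_cost(5_000_000, 2))  # PC(d=10000000.0, n=10)
```

Keeping N as a separate field mirrors the PC(D, N) notation, so a node with a fixed instance count can override N without changing D.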

In this patch, the weight of each expression evaluation is set to a
constant of 1. A description of the computation for each kind of
PlanNode/DataSink is given below.

01. AggregationNode:
Each AggregateInfo has its C computed as the sum of its grouping
expressions and aggregate expressions, and is assigned its own
ProcessingCost. These ProcessingCosts are then summed to form the
AggregationNode's ProcessingCost;
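A sketch of the AggregationNode rule, assuming each AggregateInfo supplies its own input cardinality together with its grouping and aggregate expression counts (the tuple layout and function name are hypothetical). With every expression weighted 1 and M = 0, each AggregateInfo contributes I * (grouping exprs + aggregate exprs).

```python
from typing import Iterable, Tuple

# Hypothetical per-AggregateInfo record:
# (input_cardinality, num_grouping_exprs, num_aggregate_exprs)
AggInfo = Tuple[int, int, int]


def aggregation_cost(agg_infos: Iterable[AggInfo]) -> int:
    """Sum the individual ProcessingCost D of every AggregateInfo."""
    total = 0
    for input_card, num_grouping, num_aggregate in agg_infos:
        c = num_grouping + num_aggregate  # each expression has weight 1
        total += input_card * c           # D = I * C, since M = 0 here
    return total


# Two AggregateInfos with different inputs: 1000 * 5 + 500 * 2.
print(aggregation_cost([(1000, 2, 3), (500, 1, 1)]))  # 6000
```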

02. AnalyticEvalNode:
C is the sum of the evaluation costs for analytic functions;

03. CardinalityCheckNode:
Use the general formula, I = 1;

04. DataSourceScanNode:
Follow the formula from the superclass ScanNode;

05. EmptySetNode:
  I = 0;

06. ExchangeNode:
  M = (average serialized row size) / 1024

In broadcast mode, the general formula is modified as follows:
  D = D * number of receivers;
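The ExchangeNode rule can be sketched as follows. The function name is hypothetical, and passing C in as a parameter that defaults to 0 is an assumption, because the text only specifies M and the broadcast multiplier.

```python
def exchange_cost(input_card: float, avg_serialized_row_size: float,
                  c: float = 0.0, broadcast: bool = False,
                  num_receivers: int = 1) -> float:
    """D for an ExchangeNode: I * (C + M), scaled by receivers in broadcast mode."""
    m = avg_serialized_row_size / 1024  # M = (average serialized row size) / 1024
    d = input_card * (c + m)
    if broadcast:
        d *= num_receivers  # every receiver gets a full copy of the rows
    return d


print(exchange_cost(1_000_000, 512))                                   # 500000.0
print(exchange_cost(1_000_000, 512, broadcast=True, num_receivers=4))  # 2000000.0
```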

07. HashJoinNode:
  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

Here, I0 and I1 are the input cardinalities of the probe and build
sides, respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost.
Otherwise, ProcessingCost is equal to the probe cost.
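A sketch of the HashJoinNode rule that sums only the D component of each PC (both probe terms share the same N). The function names and integer expression weights are illustrative assumptions.

```python
from typing import Tuple


def hash_join_costs(i0: int, i1: int, output_card: int,
                    c_equi: int, c_other: int) -> Tuple[int, int]:
    """Return (probe D, build D) for a HashJoinNode."""
    probe = i0 * c_equi + output_card * c_other  # sum of the two probe PCs
    build = i1 * c_equi
    return probe, build


def join_node_cost(probe: int, build: int, has_separate_build: bool) -> int:
    # With a separate build, the build cost belongs to the JoinBuildSink instead.
    return probe if has_separate_build else probe + build


probe, build = hash_join_costs(1_000_000, 10_000, 500_000, c_equi=1, c_other=2)
print(join_node_cost(probe, build, has_separate_build=False))  # 2010000
print(join_node_cost(probe, build, has_separate_build=True))   # 2000000
```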

08. HBaseScanNode, HdfsScanNode, and KuduScanNode:
Follow the formula from the superclass ScanNode;

09. Nested loop join node:
When the right child is not a SingularRowSrcNode:

  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

When the right child is a SingularRowSrcNode:

  probe cost = PC(I0, N)
  build cost = PC(I0 * I1, N)

Here, I0 and I1 are the input cardinalities of the probe and build
sides, respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost.
Otherwise, ProcessingCost is equal to the probe cost.
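The nested loop join rule differs from the hash join only in the SingularRowSrcNode case. A hypothetical sketch, tracking only D and combining probe and build as for a node without a separate build:

```python
from typing import Tuple


def nested_loop_join_costs(i0: int, i1: int, output_card: int,
                           c_equi: int, c_other: int,
                           right_is_singular_row_src: bool) -> Tuple[int, int]:
    """Return (probe D, build D) for a nested loop join node."""
    if right_is_singular_row_src:
        # Build cost grows with both inputs: I0 * I1.
        return i0, i0 * i1
    return i0 * c_equi + output_card * c_other, i1 * c_equi


probe, build = nested_loop_join_costs(1000, 3, 0, 0, 0,
                                      right_is_singular_row_src=True)
print(probe + build)  # 4000 (no separate build: probe + build)
```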

10. ScanNode:
  M = (average row size) / 1024;
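A worked ScanNode example, assuming 2 million rows, one predicate of weight 1, and a 256-byte average row (all illustrative numbers). The instance count assumes D / MIN_COST_PER_THREAD is rounded up with a minimum of one; the text does not specify the rounding, and any fixed instance count is ignored here.

```python
import math

MIN_COST_PER_THREAD = 1_000_000  # value stated in the general formula


def scan_cost(input_card: float, c: float, avg_row_size: float) -> float:
    """D for a ScanNode, with M = (average row size) / 1024."""
    m = avg_row_size / 1024
    return input_card * (c + m)


d = scan_cost(2_000_000, c=1, avg_row_size=256)  # M = 0.25, so D = 2e6 * 1.25
n = max(1, math.ceil(d / MIN_COST_PER_THREAD))   # assumed: round up, at least 1
print(d, n)  # 2500000.0 3
```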

11. SelectNode:
Use the general formula;

12. SingularRowSrcNode:
Since this node is involved once per input row of the nested loop join,
its contribution is computed as part of the nested loop join's cost;

13. SortNode:
C is the evaluation cost for the sort expression;

14. SubplanNode:
C is 1. I is the product of the cardinalities of the left and right
children;

15. Union node:
C is the cost of result expression evaluation from all non-pass-through
children;

16. Unnest node:
I is the cardinality of the containing SubplanNode and C is 1.
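The SubplanNode and Unnest node rules reduce to simple products with C = 1. A hypothetical sketch:

```python
def subplan_cost(left_card: int, right_card: int) -> int:
    """SubplanNode: C = 1 and I = left cardinality * right cardinality."""
    return left_card * right_card * 1


def unnest_cost(containing_subplan_card: int) -> int:
    """Unnest node: C = 1 and I = cardinality of the containing SubplanNode."""
    return containing_subplan_card * 1


print(subplan_cost(10_000, 5))  # 50000
print(unnest_cost(50_000))      # 50000
```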

17. DataStreamSink:
  M = 1 / num rows per batch.
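For DataStreamSink, M = 1 / rows per batch means that, when C is 0, D equals the number of row batches sent rather than the number of rows. A sketch with a hypothetical function name; the batch size below is illustrative, not a claim about Impala's defaults.

```python
def data_stream_sink_cost(input_card: float, rows_per_batch: int,
                          c: float = 0.0) -> float:
    """D = I * (C + M) with M = 1 / rows per batch."""
    m = 1 / rows_per_batch
    return input_card * (c + m)


# 1,024,000 rows in batches of 1024 rows is 1000 batches.
print(data_stream_sink_cost(1_024_000, rows_per_batch=1024))  # 1000.0
```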

18. JoinBuildSink:
ProcessingCost is the build cost of its associated JoinNode.

19. PlanRootSink:
If result spooling is enabled, C is the cost of output expression
evaluation. Otherwise, ProcessingCost is zero.

20. TableSink:
C is the cost of output expression evaluation.
TableSink subclasses (including HBaseTableSink, HdfsTableSink, and
KuduTableSink) follow the same formula;
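The PlanRootSink and TableSink rules can be sketched as below, assuming the output expression cost is simply the number of output expressions, each weighted 1 as stated earlier. The function names are hypothetical.

```python
def plan_root_sink_cost(input_card: int, num_output_exprs: int,
                        result_spooling: bool) -> int:
    """PlanRootSink: I * C when result spooling is enabled, otherwise 0."""
    return input_card * num_output_exprs if result_spooling else 0


def table_sink_cost(input_card: int, num_output_exprs: int) -> int:
    """TableSink and its subclasses: C is the output expression cost."""
    return input_card * num_output_exprs


print(plan_root_sink_cost(1000, 3, result_spooling=True))   # 3000
print(plan_root_sink_cost(1000, 3, result_spooling=False))  # 0
print(table_sink_cost(1000, 3))                             # 3000
```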

II. Compute the total ProcessingCost of a

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 58:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9113/ 
DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 58
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 17:32:07 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-06 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 57: Code-Review+2

(2 comments)

+2 to include Wenzhe's +1.

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS54, Line 30:   long cardinality, float exprsCost, float materializationCost) {
> Agree that row width might factor in the PC for some operator. Is fact, I a
For VARCHAR, we can use some kind of average-width stats, if available. For 
fixed-width columns, we just use the width. In both cases, the unit should be 
bytes, at least in the first draft.

The idea of including a width in costing is to make the outcome more precise and 
less error-prone.

I am okay with making the change in the next iteration. Since it is very 
important, maybe create a new JIRA and refer to it in the commit message.
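The row-width proposal in this comment could look like the following sketch: fixed-width columns contribute their width, VARCHAR-like columns contribute an average width from stats when available, and everything is measured in bytes. The `Column` type and the `DEFAULT_VAR_WIDTH` fallback for columns without stats are hypothetical; the comment does not say what to do when stats are missing.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fallback (bytes) for variable-width columns without stats.
DEFAULT_VAR_WIDTH = 16


@dataclass
class Column:
    name: str
    fixed_width: Optional[int] = None  # bytes, for fixed-width types
    avg_width: Optional[float] = None  # bytes, from stats for VARCHAR-like types


def estimated_row_width(columns: List[Column]) -> float:
    """Sum per-column widths in bytes."""
    total = 0.0
    for col in columns:
        if col.fixed_width is not None:
            total += col.fixed_width      # fixed-width: just use the width
        elif col.avg_width is not None:
            total += col.avg_width        # VARCHAR: average width from stats
        else:
            total += DEFAULT_VAR_WIDTH    # no stats: hypothetical fallback
    return total


cols = [Column("id", fixed_width=8), Column("qty", fixed_width=4),
        Column("comment", avg_width=20.5)]
print(estimated_row_width(cols))  # 32.5
```

The resulting byte count could then feed an existing per-row term such as M = (average row size) / 1024.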


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270:
> Let say,
I see.  Thanks for the examples.  I agree the use of > is fine.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 57
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 06 Mar 2023 14:20:40 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-04 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 57:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12541/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 57
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sun, 05 Mar 2023 05:36:12 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-04 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 57:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12539/ : Initial code 
review checks failed. See linked job for details on the failure.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 57
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 04 Mar 2023 14:26:58 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-04 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 56:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12538/ : Initial code 
review checks failed. See linked job for details on the failure.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 56
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 04 Mar 2023 11:58:40 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-03 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 55:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12536/ : Initial code 
review checks failed. See linked job for details on the failure.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 55
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 04 Mar 2023 06:55:56 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-03 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 57: Code-Review+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 57
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 04 Mar 2023 00:29:07 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 57:

(5 comments)

Done. Sorry for duplicating test_query_assignment_with_two_exec_groups.

http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@744
PS55, Line 744: fair-scheduler-3-groups.xml
> nit: should we rename the file since we add one more group?
Done


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@747
PS55, Line 747: three set
> nit: add one more group
Done


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@750
PS55, Line 750:
> nit: should we rename the file since we add one more group?
Done


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@753
PS55, Line 753:  small and large.
> Could we keep consistent with group names in llama-site-2-groups.xml and fa
Reverted this. Looks like I accidentally edited this.
test_query_assignment_with_two_exec_groups will continue to test with 2 
executor groups only.


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@793
PS55, Line 793: _DIR = os.path.j
> nit: three group sets: tiny,
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 57
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Sat, 04 Mar 2023 00:14:19 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#57) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage


[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#56) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage


[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-03 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 55:

(5 comments)

Thanks for addressing all of my comments. Just a few comments on the new test. 
The rest looks good to me.

http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@744
PS55, Line 744: fair-scheduler-2-groups.xml
nit: should we rename the file since we add one more group?


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@747
PS55, Line 747: two sets:
nit: add one more group


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@750
PS55, Line 750: llama-site-2-groups.xml
nit: should we rename the file since we add one more group?


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@753
PS55, Line 753: root.small:1,root:medium:2,root.large:3
Could we keep the group names consistent with those in llama-site-2-groups.xml and 
fair-scheduler-2-groups.xml?


http://gerrit.cloudera.org:8080/#/c/19033/55/tests/custom_cluster/test_executor_groups.py@793
PS55, Line 793: two group sets:
nit: three group sets: tiny,



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 55
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 03 Mar 2023 23:41:24 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. (
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

Patch Set 55:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG@295
PS54, Line 295: .
> Need to mention the default value. Maybe set it to MT_DOP?
Added clarification.

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG@328
PS54, Line 328: processing_cost_use_equal_expr_weight=false.
:
: Q3
: CoreCount={total=12 trace=F00:12}
:
: Q12
: CoreCount={total=12 trace=F00:12}
:
> indentation
Done

http://gerrit.cloudera.org:8080/#/c/19033/54/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/54/common/thrift/ImpalaService.thrift@768
PS54, Line 768: It is recommend to not set i
> What's effect if setting this option as 128 for executors with 8 cores? Sho
Query parallelism could scale too high and not fit any executor group set.
Added a warning against setting it to a value greater than the number of
physical cores on an executor node.

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS54, Line 30: long cardinality, float exprsCost, float materializationCost) {
> I think we should factor in the row width in computing PC, as row width can
I agree that row width might factor into the PC for some operators. In fact,
the materialization cost parameter here was added to accommodate PCs where row
width should factor in. Currently, the PCs of ScanNode, ExchangeNode, and
DataStreamSink have row width factored in through this materialization
parameter.

There is also the question of whether fixed-length columns should be treated
the same as variable-length columns, or whether some operators can ignore row
width because code generated by CodeGen handles it very efficiently.

Ultimately, I think further research is needed to determine which query
operators should care about row width. I hope it is OK to merge the current
costing infrastructure first and improve it in the next iteration.

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/CoreCount.java
File fe/src/main/java/org/apache/impala/planner/CoreCount.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/CoreCount.java@31
PS54, Line 31: , computed
> CPU cores, computed from the CPU cost,
Done

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@203
PS54, Line 203: costs per row a
> nit: costs per row are
Done

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/Planner.java@480
PS54, Line 480: max
> max?
Yes! Thank you for catching this.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270:
> Not sure I follow. Maybe an example?
Let's say:
nodeStepCount = 3
getNumInstancesExpected = 6
maxInstance = 9
Then the consumer is still allowed to scale up its number of instances by 3
(6 + 3 = 9).

However, let's say:
nodeStepCount = 3
getNumInstancesExpected = 6
maxInstance = 8
Then the consumer is already at its highest count, since it is not possible to
scale up by 3 without exceeding maxInstance = 8.
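The two cases above can be checked with a small predicate. This is a minimal sketch, assuming the scale-up test is simply whether one more step of nodeStepCount instances stays within maxInstance; the function name `can_scale_up` is hypothetical, and the exact comparison in ProcessingCost.java may be written differently.

```python
def can_scale_up(num_instances_expected: int, node_step_count: int,
                 max_instance: int) -> bool:
    """Hypothetical check: can the consumer grow by one full step?"""
    # Scaling is only allowed in whole steps of node_step_count instances,
    # and the result must not exceed max_instance.
    return num_instances_expected + node_step_count <= max_instance


# First example: 6 + 3 = 9 fits within maxInstance = 9.
print(can_scale_up(6, 3, 9))  # True
# Second example: 6 + 3 = 9 exceeds maxInstance = 8.
print(can_scale_up(6, 3, 8))  # False
```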

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java
File fe/src/main/java/org/apache/impala/planner/SegmentCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@78
PS52, Line 78:
> Yeah a revisit will help. It sounds like we need to deal with the unknown c
I'm not yet sure what the best strategy is for dealing with missing
cardinality. I suspect that, in the current state, the worst case will adjust
to the maximum possible instance count (still bounded by num_cores_per_executor
from the executor group set and by PROCESSING_COST_MIN_THREADS).
I plan to revisit this in the next iteration.

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#55) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure to
allow the weighted total amount of data to process to be used as a new
factor in the definition and selection of an executor group. At the
basis of the CPU costing model, we define ProcessingCost as a cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows
globally across all of its instances. The costing algorithm then tries
to adjust the number of instances for each fragment by considering their
production-consumption ratio, and then finally returns a number
representing an ideal CPU core count required for a query to run
efficiently. A more detailed explanation of the CPU costing algorithm
can be found in the four steps below.

I. Compute ProcessingCost for each plan node and data sink.

ProcessingCost of a PlanNode/DataSink is a weighted amount of data
processed by that node/sink. The basic ProcessingCost is computed with a
general formula as follows.

  ProcessingCost is a pair: PC(D, N), where D = I * (C + M)

  where D is the weighted amount of data processed
        I is the input cardinality
        C is the expression evaluation cost per row.
          Set to the total weight of expression evaluation in the node/sink.
        M is the materialization cost per row.
          Only used by scan nodes, exchange nodes, and DataStreamSink.
          Otherwise, 0.
        N is the number of instances.
          Defaults to D / MIN_COST_PER_THREAD (1 million), but
          is fixed for certain nodes/sinks and adjustable in step III.
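As a worked illustration of the general formula, the sketch below computes D = I * (C + M) and a default N. It assumes the default instance count is D / MIN_COST_PER_THREAD rounded up and floored at one instance; that rounding and the function name `processing_cost` are assumptions of this sketch, not the patch's code.

```python
import math

# Value taken from the text: MIN_COST_PER_THREAD is 1 million.
MIN_COST_PER_THREAD = 1_000_000


def processing_cost(input_cardinality, expr_cost, materialization_cost=0.0):
    """Return (D, N) for PC(D, N) with D = I * (C + M).

    Rounding N up and flooring it at 1 are assumptions of this sketch.
    """
    d = input_cardinality * (expr_cost + materialization_cost)
    n = max(1, math.ceil(d / MIN_COST_PER_THREAD))
    return d, n


# 10 million rows, C = 2 expressions of weight 1, M = 0.5 per row.
print(processing_cost(10_000_000, 2, 0.5))  # (25000000.0, 25)
```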

In this patch, the weight of each expression evaluation is set to a
constant of 1. A description of the computation for each kind of
PlanNode/DataSink is given below.

01. AggregationNode:
For each AggregateInfo, C is the sum of its grouping expression and
aggregate expression costs, and each AggregateInfo is assigned its own
ProcessingCost. These ProcessingCosts are then summed to form the
AggregationNode's ProcessingCost;
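A minimal sketch of the AggregationNode rule, assuming each AggregateInfo is described by its own input cardinality and expression counts, with every expression weighted 1 as stated above. The tuple layout and the name `aggregation_cost` are hypothetical, and only D is computed.

```python
def aggregation_cost(agg_infos):
    """Sum one ProcessingCost (D only) per AggregateInfo.

    agg_infos: list of (input_cardinality, num_grouping_exprs,
    num_aggregate_exprs) tuples; this layout is hypothetical.
    """
    total = 0
    for cardinality, num_grouping, num_aggregate in agg_infos:
        # C is the sum of grouping and aggregate expression weights (1 each).
        c = num_grouping + num_aggregate
        total += cardinality * c
    return total


# Two AggregateInfos with different input cardinalities.
print(aggregation_cost([(1000, 2, 1), (10, 1, 1)]))  # 3020
```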

02. AnalyticEvalNode:
C is the sum of the evaluation costs for analytic functions;

03. CardinalityCheckNode:
Use the general formula with I = 1;

04. DataSourceScanNode:
Follow the formula from the superclass ScanNode;

05. EmptySetNode:
  I = 0;

06. ExchangeNode:
  M = (average serialized row size) / 1024

A modification of the general formula when in broadcast mode:
  D = D * number of receivers;
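The ExchangeNode adjustment can be sketched as follows, assuming the average serialized row size is given in bytes and `num_receivers` is the number of receiving instances in broadcast mode. The function and parameter names are hypothetical.

```python
def exchange_cost(input_cardinality, expr_cost, avg_serialized_row_size,
                  broadcast=False, num_receivers=1):
    """Return D for an ExchangeNode under the formula in item 06."""
    m = avg_serialized_row_size / 1024  # materialization cost per row
    d = input_cardinality * (expr_cost + m)
    if broadcast:
        # Every row is sent to every receiver, so D scales with receivers.
        d *= num_receivers
    return d


print(exchange_cost(1000, 0, 2048))                                   # 2000.0
print(exchange_cost(1000, 0, 2048, broadcast=True, num_receivers=3))  # 6000.0
```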

07. HashJoinNode:
  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

Here, I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build, ProcessingCost
is the sum of the probe cost and the build cost. Otherwise, ProcessingCost is
equal to the probe cost.
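The rule for combining probe and build cost can be expressed directly. This sketch works on D values only, treats C(equiJoin predicate) and C(otherJoin predicate) as plain numbers, and uses a hypothetical function name.

```python
def hash_join_cost(probe_card, build_card, output_card,
                   equi_join_cost, other_join_cost, has_separate_build):
    """Return (node D, build D) for a HashJoinNode (instance counts omitted)."""
    probe = probe_card * equi_join_cost + output_card * other_join_cost
    build = build_card * equi_join_cost
    # With a separate build, the build cost belongs to the JoinBuildSink
    # (item 18), so the join node only carries the probe cost.
    node = probe if has_separate_build else probe + build
    return node, build


print(hash_join_cost(1000, 100, 500, 1, 2, has_separate_build=False))  # (2100, 100)
print(hash_join_cost(1000, 100, 500, 1, 2, has_separate_build=True))   # (2000, 100)
```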

08. HbaseScanNode, HdfsScanNode, and KuduScanNode:
Follow the formula from the superclass ScanNode;

09. Nested loop join node:
When the right child is not a SingularRowSrcNode:

  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

When the right child is a SingularRowSrcNode:

  probe cost = PC(I0, N)
  build cost = PC(I0 * I1, N)

Here, I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build, ProcessingCost
is the sum of the probe cost and the build cost. Otherwise, ProcessingCost is
equal to the probe cost.
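A sketch of the two nested loop join cases, again on D values only and with hypothetical names. When the right child is a SingularRowSrcNode, the predicate costs are not used.

```python
def nested_loop_join_cost(probe_card, build_card, output_card,
                          equi_join_cost, other_join_cost,
                          right_is_singular_row_src, has_separate_build):
    """Return (node D, build D) for a nested loop join (instance counts omitted)."""
    if right_is_singular_row_src:
        probe = probe_card
        build = probe_card * build_card
    else:
        probe = probe_card * equi_join_cost + output_card * other_join_cost
        build = build_card * equi_join_cost
    # Same combination rule as the hash join: a separate build moves the
    # build cost out of the join node.
    node = probe if has_separate_build else probe + build
    return node, build


print(nested_loop_join_cost(10, 5, 0, 1, 1, True, False))    # (60, 50)
print(nested_loop_join_cost(100, 20, 50, 1, 2, False, True))  # (200, 20)
```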

10. ScanNode:
  M = (average row size) / 1024;

11. SelectNode:
Use the general formula;

12. SingularRowSrcNode:
Since this node is evaluated once per input row of the nested loop join, its
contribution is computed as part of the nested loop join;

13. SortNode:
C is the evaluation cost for the sort expression;

14. SubplanNode:
C is 1. I is the product of the cardinalities of the left and right
children;

15. Union node:
C is the cost of result expression evaluation from all non-pass-through
children;

16. Unnest node:
I is the cardinality of the containing SubplanNode and C is 1.

17. DataStreamSink:
  M = 1 / num rows per batch.

18. JoinBuildSink:
ProcessingCost is the build cost of its associated JoinNode.

19. PlanRootSink:
If result spooling is enabled, C is the cost of output expression
evaluation. Otherwise, ProcessingCost is zero.

20. TableSink:
C is the cost of output expression evaluation.
TableSink subclasses (including HBaseTableSink, HdfsTableSink, and
KuduTableSink) follow the same formula;

II. Compute the total ProcessingCost of a

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-03 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 54:

(8 comments)

Looks great!

Thanks a lot for the changes; some of them are significant.

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG@295
PS54, Line 295: .
Need to mention the default value. Maybe set it to MT_DOP?

Should we also mention PROCESSING_COST_MAX_THREADS, which could default to the
max # of cores?


http://gerrit.cloudera.org:8080/#/c/19033/54//COMMIT_MSG@328
PS54, Line 328: Q3
  :   CoreCount={total=12 trace=F00:12}
  :
  : Q12
  :   CoreCount={total=12 trace=F00:12}
  :
  : Q15
  :   CoreCount={total=15 trace=N07:3+F00:12}
indentation


http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@30
PS54, Line 30:   long cardinality, float exprsCost, float materializationCost) {
I think we should factor in the row width in computing PC, as row width can 
vary a lot. Without considering the width, the computed PC may not be right.

PC = cardinality * width * (expr cost + materialization cost).
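To illustrate the suggestion, the sketch below compares the current form with the proposed width-weighted form for two inputs of equal cardinality but different row widths. The function names are hypothetical, and width is assumed to be in bytes.

```python
def pc_current(cardinality, exprs_cost, materialization_cost):
    # Current shape: cardinality * (expr cost + materialization cost).
    return cardinality * (exprs_cost + materialization_cost)


def pc_width_weighted(cardinality, width, exprs_cost, materialization_cost):
    # Proposed shape: cardinality * width * (expr cost + materialization cost).
    return cardinality * width * (exprs_cost + materialization_cost)


narrow = pc_width_weighted(1000, 8, 1, 0)
wide = pc_width_weighted(1000, 800, 1, 0)
print(pc_current(1000, 1, 0), narrow, wide)  # 1000 8000 800000
```

Under the current form both inputs cost 1000 regardless of width; under the proposed form the 800-byte rows cost 100 times more than the 8-byte rows.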


http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/CoreCount.java
File fe/src/main/java/org/apache/impala/planner/CoreCount.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/CoreCount.java@31
PS54, Line 31: requirement
CPU cores, computed from the CPU cost,


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270:
> I think '>" is the correct sign here?
Not sure I follow.  Maybe an example?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java
File fe/src/main/java/org/apache/impala/planner/SegmentCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@78
PS52, Line 78:
> When I first work on this, I intentionally made ProcessingCost to not accep
Yeah, a revisit will help. It sounds like we need to deal with the unknown cost
(-1 in cardinality) as well. Maybe we should not adjust in such situations?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@102
PS52, Line 102:
> I decide to remove traverseBlockingAwareCost() method.
I see.

I just wonder whether the algorithm can be modified to not check the state of
the children. In other words, each child could supply an expected core count
that is available when the parent node is being processed.


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@118
PS52, Line 118:
> In traverseBlockingAwareCores(), I changed the loop to iterate all children
Sounds like a conservative strategy. Done.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 54
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 03 Mar 2023 14:54:41 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-02 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 54:

(4 comments)

Looks good to me.

http://gerrit.cloudera.org:8080/#/c/19033/54/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/54/common/thrift/ImpalaService.thrift@768
PS54, Line 768: Valid values are in [1, 128]
What is the effect of setting this option to 128 for executors with 8 cores?
Should we recommend not setting it to a value greater than the number of
physical cores?


http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@203
PS54, Line 203: cost per row is
nit: costs per row are


http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/54/fe/src/main/java/org/apache/impala/planner/Planner.java@480
PS54, Line 480: min
max?


http://gerrit.cloudera.org:8080/#/c/19033/54/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19033/54/tests/custom_cluster/test_executor_groups.py@842
PS54, Line 842: Expect to run the query on the small group
Could you define three executor groups? The default divisor would use the
medium-size group, divisor 2 the small group, and divisor 0.2 the large group.
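Under one plausible reading of this request, dividing the query's CPU core count by the divisor before matching it against executor group sets yields the desired mapping. This is a sketch under that assumption; the function name, the group sizes, and the fallback to the largest group set are hypothetical. Only the requirement that the divisor be greater than 0.0 comes from the commit message, and the 1.0 default from backend-gflag-util.cc.

```python
import math


def pick_group_set(core_count, divisor, group_sets):
    """Pick the smallest group set that fits core_count / divisor.

    group_sets: list of (name, num_cores), sorted by num_cores ascending.
    """
    if divisor <= 0:
        raise ValueError("divisor must be greater than 0.0")
    needed = math.ceil(core_count / divisor)
    for name, num_cores in group_sets:
        if needed <= num_cores:
            return name
    return group_sets[-1][0]  # hypothetical fallback: largest group set


groups = [("small", 8), ("medium", 16), ("large", 64)]
for d in (1.0, 2.0, 0.2):
    print(d, pick_group_set(10, d, groups))
# 1.0 medium
# 2.0 small
# 0.2 large
```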



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 54
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 03 Mar 2023 05:25:35 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 54:

(42 comments)

Patch set 54 addresses comments from patch sets 51 and 52.

I decided to remove some code that is no longer very useful, to narrow down
the review. I will add it back in the future if it is deemed important to
have.

PROCESSING_COST_ALLOW_THREAD_INCREMENT and PROCESSING_COST_MAX_THREADS are now
removed. PROCESSING_COST_MIN_THREADS, MT_DOP, min_processing_per_thread, and
num_cores_per_executor from the selected executor group set now determine the
minimum and maximum per-fragment parallelism.
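A rough sketch of how a per-fragment instance count could be bounded by these settings, assuming a single executor, a lower bound of PROCESSING_COST_MIN_THREADS, and an upper bound of num_cores_per_executor from the selected group set. How MT_DOP enters the bounds is not modeled, and the function name is hypothetical.

```python
import math


def fragment_instances(processing_cost, min_processing_per_thread,
                       min_threads, num_cores_per_executor):
    """Bound the instance count implied by processing_cost (single executor)."""
    wanted = math.ceil(processing_cost / min_processing_per_thread)
    # Lower bound from the minimum thread setting, upper bound from the
    # core count of the selected executor group set.
    return max(min_threads, min(wanted, num_cores_per_executor))


print(fragment_instances(50_000_000, 10_000_000, 1, 8))   # 5
print(fragment_instances(500_000_000, 10_000_000, 1, 8))  # 8
print(fragment_instances(1_000, 10_000_000, 2, 8))        # 2
```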

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301
PS51, Line 301:  executor group with total
  :available CPU X. Note that setting with a fractional
> Ack. Will take a look at this.
Repurposed PROCESSING_COST_MAX_THREADS into PROCESSING_COST_MIN_THREADS.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@304
PS51, Line 304:> 0.0. The default value is 1.
  :
  : 2. processing_cost_use_equal_expr_weight
  :If true, all expression evaluations are weighted equally to 1 during
  :the plan node's processing cost calculation. If false, expression
  :cost from IMP
> I'll look if I can core core limit from IMPALA-11617.
PS54 passes num_cores_per_executor from the selected executor group set as the
max thread count.
The final EDoP can still exceed the number of cores available in the selected
group set, in which case the Frontend will need to replan with the next larger
group set.
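The replanning described above amounts to walking the group sets from smallest to largest until the planned EDoP fits. This is a minimal sketch: `plan_edop` is a hypothetical stand-in for a planning pass that caps each fragment at the group set's core count, and returning None when nothing fits is an assumption, not the Frontend's actual behavior.

```python
def select_group_set(group_sets, plan_edop):
    """Return the first group set whose cores cover the planned EDoP.

    group_sets: list of (name, num_cores), sorted by num_cores ascending.
    plan_edop: callable(num_cores) -> EDoP when planning for that group set.
    """
    for name, num_cores in group_sets:
        if plan_edop(num_cores) <= num_cores:
            return name
        # EDoP exceeded the available cores: replan with the next larger set.
    return None


def two_fragment_edop(cores):
    # Hypothetical query: two fragments wanting 12 and 3 threads, each capped
    # at the group set's core count. Their sum can still exceed that count.
    return min(12, cores) + min(3, cores)


groups = [("small", 8), ("medium", 16), ("large", 32)]
print(select_group_set(groups, two_fragment_edop))  # medium
```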


http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/scheduling/scheduler.h@432
PS52, Line 432: eturn true if 'pl
> Could we change this new function, ContainsNode and following two new funct
Done


http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc@200
PS52, Line 200: query_cpu_count_divisor, 1.0
> Is it possible to add test cases with divisor as 0.5 and 2.0?
Added TestExecutorGroups::test_query_cpu_count_divisor.


http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc@209
PS52, Line 209: eighted equally to 1 during the
> add TODO comments for tune the expression cost
Done


http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc@212
PS52, Line 212:
> This flag variable can be set as any positive number. Add what's the effect
Explained the impact in flag description.


http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/ImpalaService.thrift@771
PS52, Line 771:
> nit: processing cost
This query option is now deleted.


http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/Planner.thrift@90
PS52, Line 90: need
> nit: need to
Done


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@724
PS52, Line 724: alCost(getGroupi
> Do we need this unused parameter? Don't see @Override annotation for this f
Done


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@320
PS52, Line 320: alCost(getSortEx
> unused parameter
Done


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@511
PS52, Line 511: processingCost_ = ProcessingCost.zero();
  : for (AggregateInfo aggInfo : aggInfos_) {
> We can tweak a little to avoid this list and two for loops.
Done


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java@31
PS52, Line 31: countSupplier mus
> don't see multiplier in this class

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-02 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 54:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/12525/ : Initial code 
review checks failed. See linked job for details on the failure.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 54
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 03 Mar 2023 00:16:30 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#54) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-03-02 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 53:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12523/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 53
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 02 Mar 2023 23:58:06 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 53:

Patch set 53 updates the expected output of the planner tests after switching
the processing_cost_use_equal_expr_weight flag back to true.

It does not address any comments from PS52 yet.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 53
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 02 Mar 2023 23:40:20 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#53) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure to
allow the weighted total amount of data to process to be used as a new
factor in the definition and selection of an executor group. At the
basis of the CPU costing model, we define ProcessingCost as a cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows
globally across all of its instances. The costing algorithm then tries
to adjust the number of instances for each fragment by considering their
production-consumption ratio, and then finally returns a number
representing an ideal CPU core count required for a query to run
efficiently. A more detailed explanation of the CPU costing algorithm
can be found in the four steps below.

I. Compute ProcessingCost for each plan node and data sink.

ProcessingCost of a PlanNode/DataSink is a weighted amount of data
processed by that node/sink. The basic ProcessingCost is computed with a
general formula as follows.

  ProcessingCost is a pair: PC(D, N), where D = I * (C + M)

  where D is the weighted amount of data processed
I is the input cardinality
C is the expression evaluation cost per row.
  Set to total weight of expression evaluation in node/sink.
M is a materialization cost per row.
  Only used by scan and exchange node. Otherwise, 0.
N is the number of instances.
  Default to D / MIN_COST_PER_THREAD (1 million), but
  is fixed for a certain node/sink and adjustable in step III.

In this patch, the weight of each expression evaluation is set to a
constant of 1. A description of the computation for each kind of
PlanNode/DataSink is given below.

01. AggregationNode:
Each AggregateInfo has its C as a sum of grouping expression and
aggregate expression and then assigned a single ProcessingCost
individually. These ProcessingCosts then summed to be the Aggregation
node's ProcessingCost;

02. AnalyticEvalNode:
C is the sum of the evaluation costs for analytic functions;

03. CardinalityCheckNode:
Use the general formula, I = 1;

04. DataSourceScanNode:
Follow the formula from the superclass ScanNode;

05. EmptySetNode:
  I = 0;

06. ExchangeNode:
  M = (average serialized row size) / 1024

A modification of the general formula when in broadcast mode:
  D = D * number of receivers;

07. HashJoinNode:
  probe cost = PC(I0 * C(equiJoin predicate),  N)  +
   PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equi-join predicate), N)

With I0 and I1 as input cardinality of the probe and build side
accordingly. If the plan node does not have a separate build, ProcessingCost
is the sum of probe cost and build cost. Otherwise, ProcessingCost is
equal to probeCost.

08. HbaseScanNode, HdfsScanNode, and KuduScanNode:
Follow the formula from the superclass ScanNode;

09. Nested loop join node:
When the right child is not a SingularRowSrcNode:

  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

When the right child is a SingularRowSrcNode:

  probe cost = PC(I0, N)
  build cost = PC(I0 * I1, N)

where I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost. Otherwise,
ProcessingCost is equal to the probe cost.

10. ScanNode:
  M = (average row size) / 1024;

11. SelectNode:
Use the general formula;

12. SingularRowSrcNode:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. SortNode:
C is the evaluation cost for the sort expression;

14. SubplanNode:
C is 1. I is the product of the cardinalities of the left and right
children;

15. Union node:
C is the cost of result expression evaluation from all non-pass-through
children;

16. Unnest node:
I is the cardinality of the containing SubplanNode and C is 1.

17. DataStreamSink:
  M = 1 / num rows per batch.

18. JoinBuildSink:
ProcessingCost is the build cost of its associated JoinNode.

19. PlanRootSink:
If result spooling is enabled, C is the cost of output expression
evaluation. Otherwise, ProcessingCost is zero.

20. TableSink:
C is the cost of output expression evaluation.
TableSink subclasses (including HBaseTableSink, HdfsTableSink, and
KuduTableSink) follow the same formula;
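Several of the per-node rules above reduce to one-line formulas over D. The sketch below collects them in one hypothetical helper class (NodeCostRulesSketch and all method names are illustrative). It models only D, not N, and assumes every expression weighs 1, that each AggregateInfo uses the node's input cardinality as I, and that C = 0 for DataStreamSink, since the text only gives M for that sink.

```java
// Hypothetical sketch of several per-node rules from the list above, computing D only.
// Method names and parameters are illustrative; every expression has weight 1.
final class NodeCostRulesSketch {
  // AggregationNode: one cost per AggregateInfo, summed. Each array entry is
  // (#grouping exprs + #aggregate exprs) for one AggregateInfo; M = 0.
  static double aggregation(double inputCard, int[] exprCountsPerAggInfo) {
    double total = 0;
    for (int c : exprCountsPerAggInfo) total += inputCard * c;
    return total;
  }

  // ExchangeNode: M = avg serialized row size / 1024; broadcast multiplies D by receivers.
  static double exchange(double inputCard, double exprCost, double avgSerializedRowSize,
      boolean broadcast, int numReceivers) {
    double d = inputCard * (exprCost + avgSerializedRowSize / 1024.0);
    return broadcast ? d * numReceivers : d;
  }

  // HashJoinNode, and nested loop join whose right child is not a SingularRowSrcNode.
  static double joinProbe(double i0, double outputCard, double equiCost, double otherCost) {
    return i0 * equiCost + outputCard * otherCost;
  }

  static double joinBuild(double i1, double equiCost) {
    return i1 * equiCost;
  }

  // Nested loop join whose right child is a SingularRowSrcNode.
  static double singularProbe(double i0) { return i0; }

  static double singularBuild(double i0, double i1) { return i0 * i1; }

  // With a separate build, the build cost is carried by the JoinBuildSink instead.
  static double joinNode(double probe, double build, boolean hasSeparateBuild) {
    return hasSeparateBuild ? probe : probe + build;
  }

  // DataStreamSink: M = 1 / rows per batch (C assumed to be 0 here).
  static double dataStreamSink(double inputCard, int rowsPerBatch) {
    if (rowsPerBatch <= 0) throw new IllegalArgumentException("rowsPerBatch must be > 0");
    return inputCard * (1.0 / rowsPerBatch);
  }

  // PlanRootSink: charged only when result spooling is enabled.
  static double planRootSink(double inputCard, double outputExprCost, boolean spooling) {
    return spooling ? inputCard * outputExprCost : 0.0;
  }

  // TableSink and its subclasses.
  static double tableSink(double inputCard, double outputExprCost) {
    return inputCard * outputExprCost;
  }
}
```

For example, a broadcast exchange of 2048 rows averaging 512 serialized bytes each, with no expression cost and 4 receivers, gets D = 2048 * 0.5 * 4 = 4096. With C = 0, the DataStreamSink cost reduces to the number of row batches sent.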

II. Compute the total ProcessingCost of a

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-27 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 52:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc@200
PS52, Line 200: query_cpu_count_divisor, 1.0
Is it possible to add test cases with divisor as 0.5 and 2.0?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java@31
PS52, Line 31: Multiple supplier
don't see multiplier in this class


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/BroadcastProcessingCost.java@62
PS52, Line 62: numInstanceSupplier_
call getNumInstanceExpected()?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/CoreCount.java
File fe/src/main/java/org/apache/impala/planner/CoreCount.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/CoreCount.java@58
PS52, Line 58: ids_ = ids;
Add Preconditions.check to make sure two lists have same length
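One way to express the suggested length check, as a hypothetical sketch. It assumes CoreCount holds two parallel lists (ids and per-id core counts), which the comment implies but does not spell out. A plain IllegalArgumentException stands in for Guava's Preconditions.checkArgument, which throws the same exception type, so the block compiles without Guava.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: parallel lists of ids and core counts that must stay the same length.
final class CoreCountSketch {
  private final List<Integer> ids_;
  private final List<Integer> counts_;

  CoreCountSketch(List<Integer> ids, List<Integer> counts) {
    // Equivalent to Preconditions.checkArgument(ids.size() == counts.size(), ...).
    if (ids.size() != counts.size()) {
      throw new IllegalArgumentException(
          "ids and counts differ in length: " + ids.size() + " vs " + counts.size());
    }
    ids_ = Collections.unmodifiableList(new ArrayList<>(ids));
    counts_ = Collections.unmodifiableList(new ArrayList<>(counts));
  }

  // Sum of all core counts.
  int total() {
    int sum = 0;
    for (int c : counts_) sum += c;
    return sum;
  }

  int size() {
    return ids_.size();
  }
}
```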


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/DataSink.java
File fe/src/main/java/org/apache/impala/planner/DataSink.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/DataSink.java@75
PS52, Line 75: && explainLevel.ordinal() >= TExplainLevel.VERBOSE.ordinal()) {
 : // Print processing cost.
 : output.append(processingCost_.getExplainString(detailPrefix, false));
can be moved after line #67


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/DataSink.java@147
PS52, Line 147: fragment_.getPlanRoot().getCardinality()
Can this be replaced with getNumRowsProduced()? Are the numbers of rows
consumed and produced the same?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@245
PS52, Line 245: SerDe
too short, hard to read


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@24
PS52, Line 24: MultipleProcessingCost
> nit.  Maybe ScaledProcessingCost? MultipleProcessingCost sounds like a list
Or MultiplyProcessingCost?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@28
PS52, Line 28: multiple
nit: multiplier?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@39
PS52, Line 39: numInstances
add Preconditions.check for numInstance greater than 0


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java
File fe/src/main/java/org/apache/impala/planner/SegmentCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@72
PS52, Line 72: nodes_.size()
if nodes_size() equal 0?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@83
PS52, Line 83: appendSinkCost
> appendSink()
or setSink()?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/UnionNode.java@155
PS52, Line 155: cost
nit: costs


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/service/Frontend.java@273
PS52, Line 273: Certain queries such as EXPLAIN that do not populate
  :   // TExecRequest.query_exec_request field
nit: the sentence seems incomplete. Set the value to -1?


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-27 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 52:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/scheduling/scheduler.h@432
PS52, Line 432: ContainsUnionNode
Could this new function, ContainsNode, and the following two new functions be
made static? They don't seem to use any member variables.


http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc@209
PS52, Line 209: expression cost from IMPALA-2805
add a TODO comment to tune the expression cost


http://gerrit.cloudera.org:8080/#/c/19033/52/be/src/util/backend-gflag-util.cc@212
PS52, Line 212: // TODO: Benchmark and tune this config with an optimal value.
This flag can be set to any positive number. Please describe the effect of
setting the value too high or too low.


http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/ImpalaService.thrift@771
PS52, Line 771: costing
nit: processing cost


http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/Planner.thrift
File common/thrift/Planner.thrift:

http://gerrit.cloudera.org:8080/#/c/19033/52/common/thrift/Planner.thrift@90
PS52, Line 90: need
nit: need to


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@724
PS52, Line 724: int numInstances
Do we need this unused parameter? I don't see an @Override annotation on this
function.


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@320
PS52, Line 320: int numInstances
unused parameter


http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@511
PS52, Line 511: List processingCosts =
  : Lists.newArrayListWithCapacity(aggInfos_.size());
We can tweak this a little to avoid the list and the two for loops.



--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 52
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 27 Feb 2023 20:12:39 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-27 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. (
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

Patch Set 52:

(25 comments)

The implementation of the algorithm looks good!

It would be great if the specific checks on JOIN and UNION could be generalized
at the PlanNode level.

I also wonder whether caching the costs/cores used to compute the total is
necessary.

The adjustment based on the producer-consumer ratio is fine and could be
improved further in the future under the guidance of individual plan nodes.

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171
PS51, Line 171:
: In bottom-up direction, there exist four segments in F03:
: Blocking segment 1: (11:EXCHANGE, 12:AGGREGATE)
: Blocking segment 2: 06
> Done. I rather keep calling segment 4 as non-blocking segment since only Jo
Done

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205
PS51, Line 205:
> No, SegmentCost is modelled as tree.
Done

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@24
PS52, Line 24: MultipleProcessingCost
nit. Maybe ScaledProcessingCost? MultipleProcessingCost sounds like a list of
processing costs to me.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@32
PS52, Line 32: > 0
>=0.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/MultipleProcessingCost.java@61
PS52, Line 61: MultCost
ScaledCost()?

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@151
PS52, Line 151: ProcessingCost, List
nit. May need to add a comment for the first and the second. I also wonder if
we need to cache the list, once we have the total cost (the sum over the
segment subtree).

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@155
PS52, Line 155: CoreCount, List
same as above.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@254
PS52, Line 254: consProdRatio
consumerProducerRatio

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@270
PS52, Line 270: >
>=

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java
File fe/src/main/java/org/apache/impala/planner/SegmentCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@48
PS52, Line 48: SegmentCost
Maybe CostingSegment?

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@54
PS52, Line 54: // The ProcessingCost of this fragment segment.
nit. which is the sum of the processing cost of all nodes in nodes_.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@78
PS52, Line 78: 1
I wonder why 0 is being excluded. A node consuming or producing 0 rows is
perfectly fine.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@79
PS52, Line 79: 1
same as above.

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@83
PS52, Line 83: appendSinkCost
appendSink()

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@102
PS52, Line 102: subtreeCostBuilder
I wonder if we need to remember the costs used to compute the max cost for the
tree rooted at the current segment. Once the post-order traversal is done, the
sum is known. These sums can be used to compute the # of cores.
Maybe I missed something?

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@103
PS52, Line 103: segmentCost
nit. maxCost

http://gerrit.cloudera.org:8080/#/c/19033/52/fe/src/main/java/org/apache/impala/planner/SegmentCost.java@105
PS52, Line 105: .
gather the costs of the children first and find the max cost among them.

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-24 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 52:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12438/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 52
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 24 Feb 2023 20:57:30 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-24 Thread Riza Suminto (Code Review)

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 52:

(25 comments)

ps52 is a rebase and addresses some comments from ps51.

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@151
PS51, Line 151:
> nit plan fragment, which is blocking since it has 3 blocking PlanNode:
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171
PS51, Line 171:
  : In bottom-up direction, there exist four segments in F03:
  :   Blocking segment 1: (11:EXCHANGE, 12:AGGREGATE)
  :   Blocking segment 2: 06
> indentation and with some additional info as follows.
Done. I'd rather keep calling segment 4 a non-blocking segment, since only
JoinBuildSink is considered a blocking DataSink.

This is related to the definition of a blocking fragment. Every fragment has a
DataSink, but a fragment with neither a blocking PlanNode nor a blocking
DataSink is not a blocking fragment.
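The blocking-fragment definition above can be expressed as a small predicate. This is a hypothetical sketch: NodeKind and SinkKind are stand-ins for the real PlanNode and DataSink classes, and treating exactly AGGREGATE, SORT, and TOP_N as blocking simply mirrors the F03 example rather than the planner's full rule.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the blocking-fragment rule: blocking node OR blocking sink.
final class BlockingFragmentSketch {
  enum NodeKind { SCAN, EXCHANGE, AGGREGATE, SORT, ANALYTIC, TOP_N }

  enum SinkKind { DATA_STREAM_SINK, JOIN_BUILD_SINK, PLAN_ROOT_SINK }

  // Assumed blocking node kinds for this example (the blocking nodes of F03).
  static final Set<NodeKind> BLOCKING_NODES =
      EnumSet.of(NodeKind.AGGREGATE, NodeKind.SORT, NodeKind.TOP_N);

  // Only JoinBuildSink counts as a blocking DataSink.
  static boolean isBlockingSink(SinkKind sink) {
    return sink == SinkKind.JOIN_BUILD_SINK;
  }

  // Every fragment has a sink, but it is blocking only if it has a blocking
  // node or a blocking sink.
  static boolean isBlockingFragment(List<NodeKind> nodes, SinkKind sink) {
    for (NodeKind n : nodes) {
      if (BLOCKING_NODES.contains(n)) return true;
    }
    return isBlockingSink(sink);
  }
}
```

Under these assumptions, a scan-only fragment feeding a DataStreamSink is non-blocking, while the same fragment feeding a JoinBuildSink is blocking.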


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@177
PS51, Line 177:
  : Therefore we have:
  :   PC(segment 1) = 426337+34548320
  :   PC(segment 2) =
> indentation
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@182
PS51, Line 182:   PC(segment 4) = 22
> nit a
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@183
PS51, Line 183:
  : These per-segment costs stored in a SegmentCost tree rooted at
  : PlanFragment.rootSegment_, and ar
> nit. , and are [34974657, 2159270, 23752870, 22] respectively after the pos
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@189
PS51, Line 189: PlanFragment.collectSegmentCostHelper().
> I think "Output ProcessingCost" should be really called "Total Processing c
Removed this paragraph and now refer to the cost directly as "the last
segment's ProcessingCost".

I only consider the last segment rather than the total over all segments in a
fragment to anticipate a burst exchange scenario. For example, a fragment that
only aggregates may spend a long time in aggregation. But once it is ready to
send rows upward, the receiver fragment above it should have a similar EDoP to
keep up with the sender.
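A hypothetical sketch of the choice described above, using the F03 per-segment costs [34974657, 2159270, 23752870, 22] from the commit message. It assumes the segments form a chain, so the last cost in post-order belongs to the segment closest to the sink; the class and method names are illustrative.

```java
// Hypothetical sketch contrasting the "last segment" cost with the total over all segments.
final class OutputCostSketch {
  // Segment costs in post-order; in a chain, the last one feeds the fragment's sink.
  static long lastSegmentCost(long[] postOrderCosts) {
    if (postOrderCosts.length == 0) throw new IllegalArgumentException("no segments");
    return postOrderCosts[postOrderCosts.length - 1];
  }

  // Sum over every segment of the fragment.
  static long totalCost(long[] postOrderCosts) {
    long sum = 0;
    for (long c : postOrderCosts) sum += c;
    return sum;
  }
}
```

For F03 this gives 22 for the last segment versus 60886819 in total. Following the reasoning above, the receiver's EDoP would be sized from the former, since it describes the phase in which rows are actually sent upward.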


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@196
PS51, Line 196: hat fragment by compa
> nit. effective degree of parallelism (EDoP). We can use EDoP in the rest of
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@199
PS51, Line 199:  UnionNode, or
  : ScanNode will
> the costing algorithm attempts to adjust the number of
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@201
PS51, Line 201:
> see the previous comment.
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205
PS51, Line 205:
> Assume that segments are modeled as a list in a plan fragment (true?) in th
No, SegmentCost is modelled as a tree.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@209
PS51, Line 209:  in a similar post-order
> EDoP
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@261
PS51, Line 261:
> nit suggest to remove since a query plan with a sink node, which is blockin
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@263
PS51, Line 263: f a query or the query itself. Each blocking
  : subtree will
> the intermediate or leaf nodes.
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@268
PS51, Line 268: ample is [4, 4,
> By reading this para, it seems CoreCount is a better name.  Usually a requi
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@269
PS51, Line 269:
> nit remove
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@286
PS51, Line 286: control the entire com
> EDoP
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: m or not.
> nit. remove
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288:Control whether to enable this CPU costing algorithm or not.
> set
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@292
PS51, Line 292:  instances (threads) that
> the entire computation of EDoP.
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@300
PS51, Line 300: ount of
> computing
Done


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301
PS51, Line 301:  is suggested to keep this to
  :false until the min_processing_per_thread backend fl
> I strongly suggest that we introduce PROCESSING_COST_MIN_THREADS in this pa
Ack. Will take a look at this.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@304
PS51, Line 304:
  : This patch also adds three backend flags to tune the algorithm.
  : 1. query_cpu_count_divisor
  :Divide the CPU requirement of a

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-24 Thread Riza Suminto (Code Review)

Riza Suminto has uploaded a new patch set (#52) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure to
allow the weighted total amount of data to process to be used as a new
factor in the definition and selection of an executor group. At the
core of the CPU costing model, we define ProcessingCost as the cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows
globally across all of its instances. The costing algorithm then tries
to adjust the number of instances for each fragment by considering their
production-consumption ratio, and then finally returns a number
representing an ideal CPU core count required for a query to run
efficiently. A more detailed explanation of the CPU costing algorithm
can be found in the four steps below.

I. Compute ProcessingCost for each plan node and data sink.

ProcessingCost of a PlanNode/DataSink is a weighted amount of data
processed by that node/sink. The basic ProcessingCost is computed with a
general formula as follows.

  ProcessingCost is a pair: PC(D, N), where D = I * (C + M)

  where D is the weighted amount of data processed,
        I is the input cardinality,
        C is the expression evaluation cost per row, set to the total
          weight of expression evaluation in the node/sink,
        M is the materialization cost per row, used only by scan and
          exchange nodes (0 otherwise), and
        N is the number of instances. N defaults to
          D / MIN_COST_PER_THREAD (1 million), but is fixed for certain
          nodes/sinks and adjustable in step III.

In this patch, the weight of each expression evaluation is set to a
constant of 1. A description of the computation for each kind of
PlanNode/DataSink is given below.

01. AggregationNode:
For each AggregateInfo, C is the sum of the costs of its grouping
expressions and aggregate expressions, and each AggregateInfo is assigned
its own ProcessingCost. These ProcessingCosts are then summed to give the
AggregationNode's ProcessingCost;

02. AnalyticEvalNode:
C is the sum of the evaluation costs for analytic functions;

03. CardinalityCheckNode:
Use the general formula with I = 1;

04. DataSourceScanNode:
Follow the formula from the superclass ScanNode;

05. EmptySetNode:
  I = 0;

06. ExchangeNode:
  M = (average serialized row size) / 1024;

In broadcast mode, the general formula is modified as follows:
  D = D * number of receivers;

07. HashJoinNode:
  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

where I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost. Otherwise,
ProcessingCost is equal to the probe cost.

08. HbaseScanNode, HdfsScanNode, and KuduScanNode:
Follow the formula from the superclass ScanNode;

09. Nested loop join node:
When the right child is not a SingularRowSrcNode:

  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

When the right child is a SingularRowSrcNode:

  probe cost = PC(I0, N)
  build cost = PC(I0 * I1, N)

where I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost. Otherwise,
ProcessingCost is equal to the probe cost.

10. ScanNode:
  M = (average row size) / 1024;

11. SelectNode:
Use the general formula;

12. SingularRowSrcNode:
Since the node is involved once per input in nested loop join, the
contribution of this node is computed in nested loop join;

13. SortNode:
C is the evaluation cost for the sort expression;

14. SubplanNode:
C is 1. I is the product of the cardinalities of the left and right
children;

15. Union node:
C is the cost of result expression evaluation from all non-pass-through
children;

16. Unnest node:
I is the cardinality of the containing SubplanNode and C is 1.

17. DataStreamSink:
  M = 1 / num rows per batch.

18. JoinBuildSink:
ProcessingCost is the build cost of its associated JoinNode.

19. PlanRootSink:
If result spooling is enabled, C is the cost of output expression
evaluation. Otherwise, ProcessingCost is zero.

20. TableSink:
C is the cost of output expression evaluation.
TableSink subclasses (including HBaseTableSink, HdfsTableSink, and
KuduTableSink) follow the same formula;

II. Compute the total ProcessingCost of a

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-23 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 51:

(26 comments)

Looks very good!

I will look at the code corresponding to section II, III and IV this weekend.

Can you please also confirm that segments are still modeled as a list within a
fragment? How hard would it be to model them as a tree? Personally, I think it
is very important that all operators can participate in EDoP adjustment to make
this feature sound.

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@151
PS51, Line 151: fragment plan
nit: plan fragment, which is blocking since it has 3 blocking PlanNodes:
12:AGGREGATE, 06:SORT, and 08:TOP-N.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@171
PS51, Line 171: 1. (11:EXCHANGE, 12:AGGREGATE)
  : 2. 06:SORT
  : 3. (07:ANALYTIC, 08:TOP-N)
  : 4. DataStreamSink of F03
indentation and with some additional info as follows.

Blocking segment 1:  (11:EXCHANGE, 12:AGGREGATE)
Blocking segment 2: 06:SORT
Blocking segment 3:  (07:ANALYTIC, 08:TOP-N)
Non-blocking segment 4: DataStreamSink of F03

I also wonder if segment 4 should be a blocking one since by the definition 
above any DataSink is blocking.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@177
PS51, Line 177: PC(segment 1) = 426337+34548320
  : PC(segment 2) = 2159270
  : PC(segment 3) = 23751970+900
  : PC(segment 4) = 22
indentation


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@182
PS51, Line 182: These per-segment costs stored in SegmentCost tree rooted at
nit a


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@183
PS51, Line 183: . In this example, post-order traversal of
  : rootSegment_ will show their associated cost as:
  : [34974657, 2159270, 23752870, 22]
nit. , and are [34974657, 2159270, 23752870, 22] respectively after the 
post-order traversal.
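The post-order traversal can be checked with a small sketch. Segment is a hypothetical stand-in for SegmentCost, and the tree shape (a simple chain with the DataStreamSink segment at the root) is an assumption based on the bottom-up segment list; each segment's cost is the sum of the node costs listed for it in the commit message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a segment tree whose per-segment cost is the sum of its node costs.
final class SegmentTreeSketch {
  static final class Segment {
    final long[] nodeCosts;
    final List<Segment> children = new ArrayList<>();

    Segment(long... nodeCosts) {
      this.nodeCosts = nodeCosts;
    }

    Segment addChild(Segment child) {
      children.add(child);
      return this;
    }

    long cost() {
      long sum = 0;
      for (long c : nodeCosts) sum += c;
      return sum;
    }
  }

  // Visit children first, then record this segment's cost.
  static void postOrder(Segment seg, List<Long> out) {
    for (Segment child : seg.children) postOrder(child, out);
    out.add(seg.cost());
  }

  static List<Long> f03Example() {
    Segment s1 = new Segment(426337L, 34548320L);          // (11:EXCHANGE, 12:AGGREGATE)
    Segment s2 = new Segment(2159270L).addChild(s1);       // 06:SORT
    Segment s3 = new Segment(23751970L, 900L).addChild(s2); // (07:ANALYTIC, 08:TOP-N)
    Segment s4 = new Segment(22L).addChild(s3);            // DataStreamSink of F03
    List<Long> out = new ArrayList<>();
    postOrder(s4, out);
    return out;
  }
}
```

Running f03Example() reproduces the list [34974657, 2159270, 23752870, 22], confirming that 426337 + 34548320 = 34974657 and 23751970 + 900 = 23752870.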


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@186
PS51, Line 186: F03 is also a blocking fragment since it has 3 blocking PlanNode:
  : 12:AGGREGATE, 06:SORT, and 08:TOP-N.
remove, as the info is described above.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@189
PS51, Line 189: A rootSegment_ is also called an Output ProcessingCost
I think "Output ProcessingCost" should really be called "Total Processing
cost", since it takes some cost for a fragment to output rows (not cost).


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@196
PS51, Line 196: effective parallelism
nit. effective degree of parallelism (EDoP). We can use EDoP in the rest of the 
text.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@199
PS51, Line 199: algorithms will
  : try to adjust
the costing algorithm attempts to adjust the number of


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@201
PS51, Line 201: Output
see the previous comment.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@205
PS51, Line 205: .
Assuming that segments are modeled as a list in a plan fragment in this patch
(true?), we should append the following here:

, since segments are modeled as a list in a plan fragment.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@209
PS51, Line 209: the effective parallelism
EDoP


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@261
PS51, Line 261: several
nit: suggest removing, since a query plan with a sink node (a blocking node) at
the root maps to one blocking subtree.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@263
PS51, Line 263: leaves. All other fragments in the subtree are
  : non-blocking
the intermediate or leaf nodes.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@268
PS51, Line 268: CoreRequirement
By reading this para, it seems CoreCount is a better name.  Usually a 
requirement in SQL compiler means something not solid, such as ANY, NOT SINGLE, 
etc.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@269
PS51, Line 269: certain
nit remove


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@286
PS51, Line 286: effective parallelism
EDoP


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: executor group to determine if it fits to run in that executor group set
set


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@288
PS51, Line 288: executor group
nit. remove


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@292
PS51, Line 292: this CPU costing algorithm
the entire computation of EDoP.


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@300
PS51, Line 300: reducing
computing


http://gerrit.cloudera.org:8080/#/c/19033/51//COMMIT_MSG@301
PS51, Line 301: Currently, there is no option to

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-22 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 51:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12421/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 51
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 22 Feb 2023 18:23:20 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-22 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 50:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12420/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 50
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 22 Feb 2023 18:17:29 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-22 Thread Riza Suminto (Code Review)

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 51:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@325
PS48, Line 325: sing_per_th
> Yes, turn back to false after tweaking the numbers.
Done


http://gerrit.cloudera.org:8080/#/c/19033/49/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/49/be/src/util/backend-gflag-util.cc@213
PS49, Line 213: processing load
> what's unit of processing load? Bytes?
It is in processing cost units. Clarified the description in ps50.


http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@20
PS49, Line 20: com.google.common.base.Preconditions;
> Does not look like the right class to import.
Done


http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@139
PS49, Line 139: l int finalProducerParal
> why need to define this variable?
I didn't have this at first and used maxProducerParallelism directly, but I got an
error from my IDE: "Local variable maxProducerParallelism defined in an
enclosing scope must be final or effectively final". Added the final modifier
in ps50.
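For context on the IDE error: Java lambdas can only capture local variables that are final or effectively final, so a variable that is reassigned (for example, while computing a running maximum) has to be copied into a final local before a lambda can use it. A minimal sketch of that pattern follows; the method and its logic are hypothetical and are not taken from ProcessingCost.java.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of capturing a reassigned local in a lambda. */
final class EffectivelyFinalSketch {
  /** Counts how many producers run at the maximum producer parallelism. */
  static long countAtMaxParallelism(List<Integer> producerParallelisms) {
    int maxProducerParallelism = 1;
    for (int p : producerParallelisms) {
      // Reassignment here means the variable is NOT effectively final.
      maxProducerParallelism = Math.max(maxProducerParallelism, p);
    }
    // Capturing maxProducerParallelism in the lambda below would not compile,
    // so copy its value into a final local that the lambda can capture.
    final int finalProducerParallelism = maxProducerParallelism;
    return producerParallelisms.stream()
        .filter(p -> p == finalProducerParallelism)
        .count();
  }

  public static void main(String[] args) {
    System.out.println(countAtMaxParallelism(Arrays.asList(4, 12, 12, 3)));
  }
}
```

Declaring the copy `final` is optional (it is effectively final anyway), but it documents the intent, which matches the change described above.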



--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 51
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 22 Feb 2023 18:03:21 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-22 Thread Riza Suminto (Code Review)

Riza Suminto has uploaded a new patch set (#51) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure to
allow the weighted total amount of data to process to be used as a new
factor in the definition and selection of an executor group. At the
core of the CPU costing model, we define ProcessingCost as a cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows
globally across all of its instances. The costing algorithm then tries
to adjust the number of instances for each fragment by considering their
production-consumption ratio, and then finally returns a number
representing an ideal CPU core count required for a query to run
efficiently. A more detailed explanation of the CPU costing algorithm
can be found in the four steps below.

I. Compute ProcessingCost for each plan node and data sink.

ProcessingCost of a PlanNode/DataSink is a weighted amount of data
processed by that node/sink. The basic ProcessingCost is computed with a
general formula as follows.

  ProcessingCost is a pair: PC(D, N), where D = I * (C + M)

  where D is the weighted amount of data processed;
        I is the input cardinality;
        C is the expression evaluation cost per row, set to the
          total weight of expression evaluation in the node/sink;
        M is the materialization cost per row, only used by scan
          and exchange nodes (otherwise 0);
        N is the number of instances. It defaults to
          D / MIN_COST_PER_THREAD (1 million), but is fixed for
          certain nodes/sinks and adjustable in step III.
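The general formula can be sketched in Java as below. This is a minimal illustration, not the code in ProcessingCost.java: the class and method names are hypothetical, expression weights are assumed to be 1, and rounding N up with a floor of one instance is an assumption (the text only states D / MIN_COST_PER_THREAD).

```java
/** Hypothetical sketch of the general formula PC(D, N) with D = I * (C + M). */
final class ProcessingCostSketch {
  // Weighted data per instance used for the default N (1 million, per the text).
  static final long MIN_COST_PER_THREAD = 1_000_000L;

  /** D = I * (C + M): weighted amount of data processed by a node or sink. */
  static double totalCost(long inputCardinality, double exprCostPerRow,
      double materializationCostPerRow) {
    return inputCardinality * (exprCostPerRow + materializationCostPerRow);
  }

  /** Default N = D / MIN_COST_PER_THREAD, rounded up and at least 1 (assumed rounding). */
  static long defaultInstances(double totalCost) {
    return Math.max(1L, (long) Math.ceil(totalCost / MIN_COST_PER_THREAD));
  }

  public static void main(String[] args) {
    // An operator with 5M input rows, two predicates of weight 1, and M = 0.
    double d = totalCost(5_000_000L, 2.0, 0.0);
    System.out.println("D = " + (long) d + ", N = " + defaultInstances(d));
  }
}
```

For 5 million input rows, C = 2, and M = 0, this gives D = 10 million and a default N of 10.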

In this patch, the weight of each expression evaluation is set to a
constant of 1. A description of the computation for each kind of
PlanNode/DataSink is given below.

01. AggregationNode:
For each AggregateInfo, C is the sum of its grouping expression and
aggregate expression costs, and each AggregateInfo is assigned its own
ProcessingCost. These ProcessingCosts are then summed to form the
AggregationNode's ProcessingCost;
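A minimal sketch of how the per-AggregateInfo costs could be summed. AggInfo here is a hypothetical stand-in holding only an input cardinality and expression counts; it is not Impala's AggregateInfo class, and every expression is assumed to have weight 1.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one cost per AggregateInfo, summed into the node cost. */
final class AggregationCostSketch {
  /** Illustrative stand-in for an AggregateInfo. */
  static final class AggInfo {
    final long inputCardinality;
    final int numGroupingExprs;
    final int numAggregateExprs;

    AggInfo(long inputCardinality, int numGroupingExprs, int numAggregateExprs) {
      this.inputCardinality = inputCardinality;
      this.numGroupingExprs = numGroupingExprs;
      this.numAggregateExprs = numAggregateExprs;
    }
  }

  /** D = I * C, where C = number of grouping + aggregate expressions (weight 1 each). */
  static long aggInfoCost(AggInfo info) {
    return info.inputCardinality * (info.numGroupingExprs + info.numAggregateExprs);
  }

  /** The node's D is the sum of the per-AggregateInfo costs. */
  static long nodeCost(List<AggInfo> infos) {
    return infos.stream().mapToLong(AggregationCostSketch::aggInfoCost).sum();
  }

  public static void main(String[] args) {
    // A hypothetical node with two AggregateInfos: 3,000,000 + 400,000.
    List<AggInfo> infos = Arrays.asList(
        new AggInfo(1_000_000L, 2, 1), new AggInfo(200_000L, 1, 1));
    System.out.println(nodeCost(infos));
  }
}
```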

02. AnalyticEvalNode:
C is the sum of the evaluation costs for analytic functions;

03. CardinalityCheckNode:
Use the general formula, I = 1;

04. DataSourceScanNode:
Follow the formula from the superclass ScanNode;

05. EmptySetNode:
  I = 0;

06. ExchangeNode:
  M = (average serialized row size) / 1024

A modification of the general formula when in broadcast mode:
  D = D * number of receivers;
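The ExchangeNode rule (one cost unit per KB of average serialized row size, scaled by the receiver count in broadcast mode) can be illustrated as follows. The method names are hypothetical, and C is passed in explicitly because the text does not state an expression cost for the exchange.

```java
/** Hypothetical sketch of the ExchangeNode cost rule. */
final class ExchangeCostSketch {
  /** M = (average serialized row size in bytes) / 1024, i.e. 1 unit per KB. */
  static double materializationCostPerRow(double avgSerializedRowSize) {
    return avgSerializedRowSize / 1024.0;
  }

  /** D = I * (C + M); in broadcast mode, D is multiplied by the number of receivers. */
  static double exchangeCost(long inputCardinality, double exprCostPerRow,
      double avgSerializedRowSize, boolean isBroadcast, int numReceivers) {
    double d = inputCardinality
        * (exprCostPerRow + materializationCostPerRow(avgSerializedRowSize));
    return isBroadcast ? d * numReceivers : d;
  }

  public static void main(String[] args) {
    // 2M rows of 512 bytes: M = 0.5, so D = 1M; broadcasting to 4 receivers gives 4M.
    System.out.println(exchangeCost(2_000_000L, 0.0, 512.0, false, 4));
    System.out.println(exchangeCost(2_000_000L, 0.0, 512.0, true, 4));
  }
}
```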

07. HashJoinNode:
  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

Here, I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and build cost. Otherwise,
ProcessingCost is equal to the probe cost.
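A hypothetical sketch of the HashJoinNode rule. It tracks only the D component of each PC pair (N is omitted for brevity) and takes the predicate costs as integers, assuming a weight of 1 per expression.

```java
/** Hypothetical sketch of the HashJoinNode probe/build cost rule (D components only). */
final class HashJoinCostSketch {
  /** probe = I0 * C(equiJoin predicate) + output cardinality * C(otherJoin predicate). */
  static long probeCost(long probeCard, int equiJoinCost, long outputCard, int otherJoinCost) {
    return probeCard * equiJoinCost + outputCard * otherJoinCost;
  }

  /** build = I1 * C(equiJoin predicate). */
  static long buildCost(long buildCard, int equiJoinCost) {
    return buildCard * equiJoinCost;
  }

  /** Without a separate build the node pays both; with one, the build sink pays the build. */
  static long nodeCost(long probe, long build, boolean hasSeparateBuild) {
    return hasSeparateBuild ? probe : probe + build;
  }

  public static void main(String[] args) {
    long probe = probeCost(10_000_000L, 1, 4_000_000L, 1); // 14,000,000
    long build = buildCost(500_000L, 1);                    // 500,000
    System.out.println(nodeCost(probe, build, false));      // probe + build
    System.out.println(nodeCost(probe, build, true));       // probe only
  }
}
```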

08. HBaseScanNode, HdfsScanNode, and KuduScanNode:
Follow the formula from the superclass ScanNode;

09. Nested loop join node:
When the right child is not a SingularRowSrcNode:

  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

When the right child is a SingularRowSrcNode:

  probe cost = PC(I0, N)
  build cost = PC(I0 * I1, N)

Here, I0 and I1 are the input cardinalities of the probe and build sides,
respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and build cost. Otherwise,
ProcessingCost is equal to the probe cost.
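The two nested loop join cases can be sketched the same way. The method signature is hypothetical, only the D components are computed, and the result is returned as {probe, build}.

```java
/** Hypothetical sketch of the nested loop join cost cases (D components only). */
final class NestedLoopJoinCostSketch {
  /** Returns {probeD, buildD} for the case selected by rightIsSingularRowSrc. */
  static long[] costs(long i0, long i1, boolean rightIsSingularRowSrc,
      int equiJoinCost, long outputCard, int otherJoinCost) {
    if (rightIsSingularRowSrc) {
      // probe = I0, build = I0 * I1 (the right side is costed once per probe input).
      return new long[] {i0, i0 * i1};
    }
    // Same shape as the hash join formulas.
    return new long[] {i0 * equiJoinCost + outputCard * otherJoinCost, i1 * equiJoinCost};
  }

  public static void main(String[] args) {
    long[] singular = costs(1_000L, 50L, true, 1, 0L, 0);
    long[] regular = costs(1_000L, 50L, false, 1, 200L, 1);
    System.out.println(singular[0] + " " + singular[1]); // probe 1000, build 50000
    System.out.println(regular[0] + " " + regular[1]);   // probe 1200, build 50
  }
}
```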

10. ScanNode:
  M = (average row size) / 1024;

11. SelectNode:
Use the general formula;

12. SingularRowSrcNode:
Since this node is involved once per input of the nested loop join, its
contribution is computed as part of the nested loop join;

13. SortNode:
C is the evaluation cost for the sort expression;

14. SubplanNode:
C is 1. I is the product of the cardinalities of the left and right
children;

15. Union node:
C is the cost of result expression evaluation from all non-pass-through
children;

16. Unnest node:
I is the cardinality of the containing SubplanNode and C is 1.

17. DataStreamSink:
  M = 1 / num rows per batch.
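With M = 1 / (rows per batch), a DataStreamSink pays one cost unit per full RowBatch. The sketch below uses a batch size of 1024 rows and C = 0; both values are assumptions for illustration, and the method names are hypothetical.

```java
/** Hypothetical sketch of the DataStreamSink cost rule. */
final class DataStreamSinkCostSketch {
  /** M = 1 / (rows per batch): one cost unit per RowBatch. */
  static double costPerRow(int rowsPerBatch) {
    return 1.0 / rowsPerBatch;
  }

  /** D = I * (C + M). */
  static double sinkCost(long inputCardinality, double exprCostPerRow, int rowsPerBatch) {
    return inputCardinality * (exprCostPerRow + costPerRow(rowsPerBatch));
  }

  public static void main(String[] args) {
    // Assumed 1024-row batches: 10,240,000 rows form 10,000 batches, so D = 10,000.
    System.out.println(sinkCost(10_240_000L, 0.0, 1024));
  }
}
```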

18. JoinBuildSink:
ProcessingCost is the build cost of its associated JoinNode.

19. PlanRootSink:
If result spooling is enabled, C is the cost of output expression
evaluation. Otherwise, ProcessingCost is zero.

20. TableSink:
C is the cost of output expression evaluation.
TableSink subclasses (including HBaseTableSink, HdfsTableSink, and
KuduTableSink) follow the same formula;

II. Compute the total ProcessingCost of a

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-22 Thread Riza Suminto (Code Review)

Riza Suminto has uploaded a new patch set (#50) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-21 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 49:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@325
PS48, Line 325: sing_per_th
> My thought as well. Should we revert the default back to True?
Yes, turn back to false after tweaking the numbers.


http://gerrit.cloudera.org:8080/#/c/19033/49/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/49/be/src/util/backend-gflag-util.cc@213
PS49, Line 213: processing load
what's unit of processing load? Bytes?


http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@139
PS49, Line 139: finalProducerParallelism
why need to define this variable?



--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 49
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 21 Feb 2023 22:34:33 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-16 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 49:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12390/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 49
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 17 Feb 2023 00:33:24 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-16 Thread Riza Suminto (Code Review)

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 49:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213: overlapping between fragment execution and blocking operators. We
> The upper bound for each fragment should be the number of threads or someth
min_processing_per_thread=10M seems to be a good upper bound on my local
machine.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@140
PS47, Line 140: a subtree of PlanNodes/DataSink in the fragment with a DataSink or
Added SegmentCost class for segment abstraction.
Also added TPCDS-Q49 into tpcds-processing-cost.test to test against a union
fragment.


http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@325
PS48, Line 325: sing_per_th
> As the comments in https://github.com/apache/impala/blob/master/fe/src/main
My thought as well. Should we revert the default back to True?


http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@346
PS48, Line 346:
> Could you attach the bench mark which show effective parallelism improvemen
Our single-node benchmark script mainly measures query latency. I don't expect
faster query latency with this patch, since the default combination of all
new query options and backend flags will actually reduce parallelism in some
fragments rather than increase it. As long as latency does not regress
severely compared to regular MT_DOP mode, I take it as a good outcome.

The improvement is probably best expressed as a reduction in memory and thread
count.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210:
> Would rather keep this as cost as cost of a row is a highly variable metric
Changed into min_processing_per_thread in ps49.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263:   // Returns the total estimated size (in bytes) of the row batch queues by
> This assume the total cost for a row batch is 1. Is it right estimation?
Changed in ps49 to model the cost as 1 per 1KB of average serialized row size.
That seems good enough to increase DataStreamSink and ExchangeNode cost.


http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/Planner.java@470
PS48, Line 470: ot = postOrderFra
> nit: this result seems not used now. Add "TODO" comment
Done


http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/ProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/19033/49/fe/src/main/java/org/apache/impala/planner/ProcessingCost.java@20
PS49, Line 20: com.google.cloud.hadoop.repackaged.gcs.com.google.common.math.LongMath
Does not look like the right class to import.


http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/ScanNode.java@359
PS48, Line 359:
> In ExchangeNode.estimateProcessingCostPerRow(), the cost per row is calcula
Changed in ps49 to model the cost as 1 per 1KB of average row size.



--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 49
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 17 Feb 2023 00:32:47 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-16 Thread Riza Suminto (Code Review)

Riza Suminto has uploaded a new patch set (#49) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-15 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/Planner.java@470
PS48, Line 470: blockingAwareCost
nit: this result seems not used now. Add "TODO" comment



--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 15 Feb 2023 21:00:49 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-15 Thread Wenzhe Zhou (Code Review)

Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@325
PS48, Line 325: IMPALA-2805
As the comments in
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Expr.java#L79-L81
note, the relative costs defined in IMPALA-2805 may not be accurate. We may need
to tune the numbers a little.


http://gerrit.cloudera.org:8080/#/c/19033/48//COMMIT_MSG@346
PS48, Line 346: Testing:
Could you attach the benchmark that shows the effective parallelism improvement?


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263: return deferredBatchQueueSize;
> I intended this to be a serialization/deserialization cost per row.
This assume the total cost for a row batch is 1. Is it right estimation?


http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/48/fe/src/main/java/org/apache/impala/planner/ScanNode.java@359
PS48, Line 359: 1.0f / getRowBatchSize(queryOptions);
In ExchangeNode.estimateProcessingCostPerRow(), the cost per row is calculated
as 1 / (getRowBatchSize(queryOptions) / avg-row-size). Should we do the same?



--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 15 Feb 2023 19:39:52 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. (
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

Patch Set 48:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: The costing algorithm splits a query fragment into several segments
              : divided by blocking PlanNode/DataSink boundary. Each fragment segment is
              : a subtree of PlanNodes/DataSink in the fragment with a DataSink or
> A list implies the linear structure among blocking segments. Not sure it ca
That is actually a good point, thank you.
Within a plan fragment, there are two possible branching points: Join and Union.
For Join, the build is in a separate fragment, making the structure linear within
the fragment. But for Union, it is more complicated. I'll think more about this.

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 100]
> Yes, AGGREGATE of 12 is blocking.
There is significance in treating a DataSink the same as a PlanNode. When
comparing the produce-consume rate with the fragment above it, we take the Output
Processing cost (segment 4 here) of the Producer fragment and compare it against
the Consumer fragment cost.

In this example, the whole F03 seems to be slow because of its 3 blocking
operators. However, above TOP-N, row transmission is fast since nothing but
the DataStreamSink is active, serializing and transmitting RowBatches. Simply
merging the cost of the DataStreamSink will cause the Consumer fragment (the
fragment above F03) to think that F03 is slow and transmits rows little by little
at a steady rate. This can lead the Consumer fragment to mistakenly lower its
parallelism, thinking that it can consume faster than the Producer below it can
send rows. In truth, F03 may spend a long time until TOP-N completes and then
quickly transmit all N rows upward.

The importance of this equal treatment is more apparent in the relationship
between Pre Aggregation and Final Aggregation fragments, such as F00 below this F03:

12:AGGREGATE [FINALIZE]
|
11:EXCHANGE [HASH(i_class)]
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
05:AGGREGATE [STREAMING]

05:AGGREGATE may be slow to pre-aggregate. But once it completes, the row
transmission by the DataStreamSink of F00 is fast. Merging the DataStreamSink cost
of F00 into 05:AGGREGATE can mistakenly lower the parallelism of F03.
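The argument above can be illustrated with a toy model, assuming a fragment is a linear list of segment costs split at blocking operators (which, as noted earlier in this message, may not hold for Union). All names and numbers are hypothetical, and the rule "a producer that looks slower than its consumer leads the consumer to lower its parallelism" is a simplification of the comparison described above, not the patch's actual logic.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy model: merged fragment cost vs. output-segment cost. */
final class SegmentCostSketch {
  /** Merged view: the whole fragment is treated as a single cost. */
  static long mergedCost(List<Long> segmentCosts) {
    long sum = 0;
    for (long c : segmentCosts) {
      sum += c;
    }
    return sum;
  }

  /** Segmented view: only the last segment (ending at the DataStreamSink) feeds the consumer. */
  static long outputSegmentCost(List<Long> segmentCosts) {
    return segmentCosts.get(segmentCosts.size() - 1);
  }

  /** Simplified rule: a consumer lowers its parallelism if its producer looks slower. */
  static boolean producerLooksSlower(long producerCost, long consumerCost) {
    return producerCost > consumerCost;
  }

  public static void main(String[] args) {
    // An F03-like fragment: three blocking segments, then a cheap DataStreamSink segment.
    List<Long> f03 = Arrays.asList(40_000_000L, 30_000_000L, 20_000_000L, 100_000L);
    long consumerCost = 1_000_000L;
    System.out.println(producerLooksSlower(mergedCost(f03), consumerCost));        // merged view
    System.out.println(producerLooksSlower(outputSegmentCost(f03), consumerCost)); // segmented view
  }
}
```

In this toy model, the merged view makes the consumer think it is waiting on its producer, while the output-segment view shows the consumer itself is the bottleneck once the producer starts sending rows.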

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263: return deferredBatchQueueSize;
> Still doesn't seem right to divide cost by row batch size. The compute cost
I intended this to be a serialization/deserialization cost per row.

--
To view, visit http://gerrit.cloudera.org:8080/19033

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Kurt Deschler
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Comment-Date: Tue, 14 Feb 2023 21:23:35 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-14 Thread Kurt Deschler (Code Review)

Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/43//COMMIT_MSG@213
PS43, Line 213: Effective parallelism of a query is the maximum upper bound of CPU core
> We can rework this to be used as a starting count. However, I do think that
The upper bound for each fragment should be the number of threads or something
close to that. We shouldn't cap it otherwise, unless we are seeing specific
operators that can't scale linearly; in that case, the operator costing can
bound it further.


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/scheduling/scheduler.cc@552
PS43, Line 552:   *state->GetFragmentScheduleState(fragment_state->exchange_input_fragments[0]);
> Not always. This is the correct assignment if IsExceedMaxFsWriter return fa
Done


http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/19033/43/be/src/util/backend-gflag-util.cc@210
PS43, Line 210:
> Renamed this to min_input_rows_per_thread. It is now relied on number of in
Would rather keep this as a cost, since the cost of a row is a highly variable metric.


http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

http://gerrit.cloudera.org:8080/#/c/19033/43/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java@263
PS43, Line 263: return deferredBatchQueueSize;
> Changed the estimate cost per row to 1 / row batch size.
Still doesn't seem right to divide cost by row batch size. The compute cost per
row should be fairly constant. Are you trying to express the network bandwidth
and latency? Latency can probably be assumed to be amortized by the row batch
and ignored, while bandwidth cost will be constant per row. We would need some
factor to connect cost units to wall time for bandwidth calculations.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:56:35 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-14 Thread Impala Public Jenkins (Code Review)

Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12375/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:44:29 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

2023-02-14 Thread Qifan Chen (Code Review)

Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: The costing algorithm splits a query fragment into several segments
  : divided by blocking PlanNode/DataSink boundary. Each fragment segment is
  : a subtree of PlanNodes/DataSink in the fragment with a DataSink or
> I will rearrange this paragraph and clarify a bit more about "segment" ment
A list implies a linear structure among blocking segments. I'm not sure it can
model the graph relationship among them well.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@154
PS47, Line 154: 100]
> To me, it should be:
Yes, AGGREGATE of 12 is blocking.

A sink is like the bottom part of an exchange: rows flow into it. Therefore, it
should not be a blocking segment by itself. Its cost can be included in that of
the tree sending data to it.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@258
PS47, Line 258:  |
> F06 and F05 has JoinBuildSink (a special kind of DataSink that is blocking)
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:43:19 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..


Patch Set 48:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@138
PS47, Line 138: The costing algorithm splits a query fragment into several segments
  : divided by blocking PlanNode/DataSink boundary. Each fragment segment is
  : a subtree of PlanNodes/DataSink in the fragment with a DataSink or
> I will rearrange this paragraph and clarify a bit more about "segment" ment
Rearranged this paragraph in ps48.


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@142
PS47, Line 142: . P
> nit a query
Done


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@149
PS47, Line 149:
> A fragment without any blocking nodes is called a non-blocking fragment.
Done


http://gerrit.cloudera.org:8080/#/c/19033/47//COMMIT_MSG@269
PS47, Line 269:
> Right, "previous" sounds better. Will update this.
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32dc770dfffcdd0be2ba789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 48
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 14 Feb 2023 19:32:12 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage

Riza Suminto has uploaded a new patch set (#48) to the change originally 
created by Qifan Chen. ( http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
..

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure that
allows the weighted total amount of data to process to be used as a new
factor in the definition and selection of an executor group. At the
core of the CPU costing model, we define ProcessingCost as the cost for a
distinct PlanNode / DataSink / PlanFragment to process its input rows
globally across all of its instances. The costing algorithm then tries
to adjust the number of instances for each fragment by considering their
production-consumption ratio, and finally returns a number
representing an ideal CPU core count required for a query to run
efficiently. A more detailed explanation of the CPU costing algorithm
can be found in the four steps below.

I. Compute ProcessingCost for each plan node and data sink.

ProcessingCost of a PlanNode/DataSink is a weighted amount of data
processed by that node/sink. The basic ProcessingCost is computed with a
general formula as follows.

  ProcessingCost is a pair: PC(D, N), where D = I * (C + M)

  where D is the weighted amount of data processed
        I is the input cardinality
        C is the expression evaluation cost per row.
          Set to the total weight of expression evaluation in the node/sink.
        M is a materialization cost per row.
          Only used by scan nodes, exchange nodes, and DataStreamSink.
          Otherwise, 0.
        N is the number of instances.
          Defaults to D / MIN_COST_PER_THREAD (1 million), but
          is fixed for certain nodes/sinks and adjustable in step III.
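To make the general formula concrete, here is a minimal Java sketch of computing D and the default N for a single node. The class and method names are hypothetical and are not Impala's ProcessingCost API; rounding N up and keeping at least one instance are assumptions of this sketch, not something the commit message states.

```java
/**
 * Hypothetical sketch of the general formula PC(D, N) with D = I * (C + M).
 * Names are illustrative only; this is not Impala's ProcessingCost class.
 */
final class GeneralCostSketch {
  // MIN_COST_PER_THREAD as stated in the commit message (1 million).
  static final double MIN_COST_PER_THREAD = 1_000_000.0;

  /** D = I * (C + M): the weighted amount of data processed. */
  static double weightedData(double inputCardinality, double exprCostPerRow,
      double materializationCostPerRow) {
    return inputCardinality * (exprCostPerRow + materializationCostPerRow);
  }

  /**
   * Default N = D / MIN_COST_PER_THREAD. Rounding up and clamping to at least
   * one instance are assumptions made by this sketch.
   */
  static int defaultInstances(double weightedData) {
    long n = (long) Math.ceil(weightedData / MIN_COST_PER_THREAD);
    return (int) Math.max(1L, n);
  }

  public static void main(String[] args) {
    // 3 million input rows, two expressions of weight 1 each, M = 0.
    double d = weightedData(3_000_000, 2.0, 0.0);
    System.out.println("D = " + d + ", N = " + defaultInstances(d));
  }
}
```

With every expression weighted 1, C is effectively an expression count, so a node evaluating two expressions over 3 million rows gets D = 6 million and a default of 6 instances under these assumptions.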

In this patch, the weight of each expression evaluation is set to a
constant of 1. A description of the computation for each kind of
PlanNode/DataSink is given below.

01. AggregationNode:
For each AggregateInfo, C is the sum of the costs of its grouping
expressions and aggregate expressions, and each AggregateInfo is
assigned its own ProcessingCost. These ProcessingCosts are then summed
to form the AggregationNode's ProcessingCost;
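The per-AggregateInfo summation can be sketched as follows. AggInfo is a hypothetical stand-in for Impala's AggregateInfo, only the D component of each ProcessingCost is modeled, and counting each top-level grouping/aggregate expression as weight 1 is a simplification of the "weight of 1 per expression evaluation" rule.

```java
import java.util.List;

/**
 * Hypothetical sketch of the AggregationNode rule: one cost per AggregateInfo,
 * summed into the node's cost. Only the D component is modeled.
 */
final class AggregationCostSketch {
  /** Illustrative stand-in for an AggregateInfo; not Impala's class. */
  static final class AggInfo {
    final double inputCardinality;
    final int numGroupingExprs;
    final int numAggregateExprs;

    AggInfo(double inputCardinality, int numGroupingExprs, int numAggregateExprs) {
      this.inputCardinality = inputCardinality;
      this.numGroupingExprs = numGroupingExprs;
      this.numAggregateExprs = numAggregateExprs;
    }

    /** C: each expression counted with weight 1 (a simplification). */
    double exprCostPerRow() {
      return numGroupingExprs + numAggregateExprs;
    }

    /** D = I * C, since M = 0 for aggregation. */
    double weightedData() {
      return inputCardinality * exprCostPerRow();
    }
  }

  /** The node's D is the sum of the per-AggregateInfo D values. */
  static double nodeWeightedData(List<AggInfo> aggInfos) {
    double total = 0.0;
    for (AggInfo info : aggInfos) {
      total += info.weightedData();
    }
    return total;
  }

  public static void main(String[] args) {
    List<AggInfo> infos = List.of(new AggInfo(1000, 1, 2), new AggInfo(1000, 1, 1));
    System.out.println(nodeWeightedData(infos));
  }
}
```

Here two AggregateInfos over 1000 rows contribute 3000 and 2000 respectively, giving a node D of 5000.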

02. AnalyticEvalNode:
C is the sum of the evaluation costs for analytic functions;

03. CardinalityCheckNode:
Use the general formula with I = 1;

04. DataSourceScanNode:
Follow the formula from the superclass ScanNode;

05. EmptySetNode:
  I = 0;

06. ExchangeNode:
  M = 1 / row batch size.

A modification of the general formula when in broadcast mode:
  D = D * number of receivers;
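The ExchangeNode rule combines the general formula with M = 1 / row batch size and a broadcast multiplier. A minimal sketch follows; the method name and parameters are hypothetical, and the batch size of 1024 in the example is only an illustrative value.

```java
/**
 * Hypothetical sketch of the ExchangeNode formula: M = 1 / row batch size,
 * and D is multiplied by the number of receivers in broadcast mode.
 */
final class ExchangeCostSketch {
  static double weightedData(double inputCardinality, double exprCostPerRow,
      int rowBatchSize, boolean isBroadcast, int numReceivers) {
    if (rowBatchSize <= 0) {
      throw new IllegalArgumentException("rowBatchSize must be positive");
    }
    double m = 1.0 / rowBatchSize;                       // materialization cost per row
    double d = inputCardinality * (exprCostPerRow + m);  // general formula
    return isBroadcast ? d * numReceivers : d;           // broadcast modification
  }

  public static void main(String[] args) {
    // 1,024,000 rows, C = 0, illustrative batch size of 1024, 3 receivers.
    System.out.println(weightedData(1_024_000, 0.0, 1024, false, 3));
    System.out.println(weightedData(1_024_000, 0.0, 1024, true, 3));
  }
}
```

Under these inputs the non-broadcast exchange has D = 1000, and broadcasting to three receivers triples it to 3000.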

07. HashJoinNode:
  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

where I0 and I1 are the input cardinalities of the probe and build
sides, respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost.
Otherwise, ProcessingCost is equal to the probe cost.
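The HashJoinNode rule can be sketched by tracking only the D part of each PC(D, N) term. The names are hypothetical. When the join has a separate build, the build cost is not added to the node; per item 18 it is charged to the JoinBuildSink instead.

```java
/**
 * Hypothetical sketch of the HashJoinNode formulas, tracking only the D part
 * of each PC(D, N) term.
 */
final class HashJoinCostSketch {
  /** I0 * C(equiJoin) + output cardinality * C(otherJoin). */
  static double probeCost(double probeCardinality, double equiJoinCost,
      double outputCardinality, double otherJoinCost) {
    return probeCardinality * equiJoinCost + outputCardinality * otherJoinCost;
  }

  /** I1 * C(equiJoin). */
  static double buildCost(double buildCardinality, double equiJoinCost) {
    return buildCardinality * equiJoinCost;
  }

  /** Probe + build without a separate build; otherwise probe only. */
  static double nodeCost(double i0, double i1, double equiJoinCost,
      double outputCardinality, double otherJoinCost, boolean hasSeparateBuild) {
    double probe = probeCost(i0, equiJoinCost, outputCardinality, otherJoinCost);
    return hasSeparateBuild ? probe : probe + buildCost(i1, equiJoinCost);
  }

  public static void main(String[] args) {
    // I0 = 10000, I1 = 2000, C(equiJoin) = 1, output = 5000, C(otherJoin) = 2.
    System.out.println(nodeCost(10_000, 2_000, 1.0, 5_000, 2.0, false));
    System.out.println(nodeCost(10_000, 2_000, 1.0, 5_000, 2.0, true));
  }
}
```

With these example inputs, the probe side contributes 20000 and the build side 2000, so the node's D is 22000 without a separate build and 20000 with one.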

08. HbaseScanNode, HdfsScanNode, and KuduScanNode:
Follow the formula from the superclass ScanNode;

09. Nested loop join node:
When the right child is not a SingularRowSrcNode:

  probe cost = PC(I0 * C(equiJoin predicate), N) +
               PC(output cardinality * C(otherJoin predicate), N)
  build cost = PC(I1 * C(equiJoin predicate), N)

When the right child is a SingularRowSrcNode:

  probe cost = PC(I0, N)
  build cost = PC(I0 * I1, N)

where I0 and I1 are the input cardinalities of the probe and build
sides, respectively. If the plan node does not have a separate build,
ProcessingCost is the sum of the probe cost and the build cost.
Otherwise, ProcessingCost is equal to the probe cost.
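The two nested loop join cases differ only in how the probe and build D values are derived. The sketch below, with hypothetical names and only D modeled, returns both values and then applies the same separate-build rule as the hash join.

```java
/**
 * Hypothetical sketch of the nested loop join formulas, tracking only the D
 * part of each PC(D, N) term.
 */
final class NestedLoopJoinCostSketch {
  /** Returns {probe D, build D} for the two cases described above. */
  static double[] costs(double i0, double i1, boolean rightIsSingularRowSrc,
      double equiJoinCost, double outputCardinality, double otherJoinCost) {
    if (rightIsSingularRowSrc) {
      // probe = I0, build = I0 * I1
      return new double[] {i0, i0 * i1};
    }
    double probe = i0 * equiJoinCost + outputCardinality * otherJoinCost;
    double build = i1 * equiJoinCost;
    return new double[] {probe, build};
  }

  /** Probe + build without a separate build; otherwise probe only. */
  static double nodeCost(double[] probeAndBuild, boolean hasSeparateBuild) {
    return hasSeparateBuild ? probeAndBuild[0] : probeAndBuild[0] + probeAndBuild[1];
  }

  public static void main(String[] args) {
    // Right child is a SingularRowSrcNode: I0 = 100, I1 = 10.
    double[] pb = costs(100, 10, true, 0.0, 0.0, 0.0);
    System.out.println(nodeCost(pb, false));
  }
}
```

In the SingularRowSrcNode case with I0 = 100 and I1 = 10, the build term I0 * I1 = 1000 dominates the probe term of 100.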

10. ScanNode:
  M = 1 / row batch size;

11. SelectNode:
Use the general formula;

12. SingularRowSrcNode:
Since this node is involved once per input in a nested loop join, its
contribution is computed as part of the nested loop join;

13. SortNode:
C is the evaluation cost for the sort expression;

14. SubplanNode:
C is 1. I is the product of the cardinalities of the left and right
children;

15. Union node:
C is the cost of result expression evaluation from all non-pass-through
children;

16. Unnest node:
I is the cardinality of the containing SubplanNode and C is 1.

17. DataStreamSink:
  M = 1 / num rows per batch.

18. JoinBuildSink:
ProcessingCost is the build cost of its associated JoinNode.

19. PlanRootSink:
If result spooling is enabled, C is the cost of output expression
evaluation. Otherwise, ProcessingCost is zero.

20. TableSink:
C is the cost of output expression evaluation.
TableSink subclasses (including HBaseTableSink, HdfsTableSink, and
KuduTableSink) follow the same formula;

II. Compute the total ProcessingCost of a fragment.

The costing

[Impala-ASF-CR] IMPALA-11604 Planner changes for CPU usage