Qifan Chen has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/19033 )

Change subject: IMPALA-11604 Planner changes for CPU usage
......................................................................

IMPALA-11604 Planner changes for CPU usage

This patch augments IMPALA-10992 by establishing an infrastructure
to allow the weighted total amount of data to process per instance
to be used as a new factor in the definition and selection of an
executor group. In this patch the weight component is set to 1.

The weighted amount of data processed is the sum of that in every
fragment in the query to reflect the worse case of minimal parallel
execution among fragments. A fragment's total is the sum of that of
every node in the fragment to reflect these nodes are executed in a
single thread. For each node, the weighted amount of data processed
is computed with a general formula as follows.

  D = (I * C * W) / N

  where D is the weighted amount of data processed
        I is input cardinality
        C is expression evaluation cost per row, set to 1
        W is average row size
        N is number of instances

A description of the computation for each kind of plan node is
given below.

1. Aggregation node:
    C and W are the sum of the costs and partial row widths for each
    AggregateInfo object.

2. AnalyticEval node:
    C is sum of the evaluation costs for analytic functions, partition
    by equal and order by equal predicate;

3. CardinalityCheck node:
    Both C and I are 1;

4. DataSource scan node:
    C is computed from a subset of the selection predicates excluding
    data source accepted predicates;

5. EmptySet node:
    I is 0;

6. Exchange node:
    A modification of the general formula when in broadcast mode:
    D = (I * C * W / N) * number of receivers;

7. Hash join node:
    C is sum of the evaluation cost for equi-join predicate and for
    other join predicate, for both probe and build side;

8. Hbase scan node:
    N is 1

9. Hdfs and Kudu scan node:
    N is mt_dop when query option mt_dop >= 1, otherwise
    N is number of nodes * max scan threads;

10. Nested loop join node:
    When the right child is not a SingularRowSrc node, C is sum of
    the evaluation cost for equi-join predicate and for other join
    predicate, for both probe and build side.

    When the right child is a SingularRowSrc node, the cost for build
    side is multiplied by the number of probes;

11. Select node:
    Use the general formula;

12. SingularRowSrc node:
    Since the node is involved once per input in nested loop join, the
    contribution of this node is computed in nested loop join;

13. Sort node:
    C is the evalation cost for the sort expression and W is the width
    of the intermediate tuple being sorted;

14. Subplan node:
    C is 1. I is the multiplication of the cardinality of the left and
    the right child;

15. Union node:
    C is the cost of materializing rows from all non pass-through
    children. W is the width of all non pass-through children;

16. Unnest node:
    I is the cardinality of the containing subplan node and C is 1.

Testing:
  [TBD]

Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
---
M common/thrift/Frontend.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SelectNode.java
M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/SubplanNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/planner/UnnestNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
A fe/src/main/java/org/apache/impala/util/AutoScaleUtil.java
M fe/src/main/java/org/apache/impala/util/ExprUtil.java
31 files changed, 499 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/19033/15
--
To view, visit http://gerrit.cloudera.org:8080/19033
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If32dc770dfffcdd0be2b5555a789a7720952c68a
Gerrit-Change-Number: 19033
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>

Reply via email to