Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
......................................................................


Patch Set 1:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@573
PS1, Line 573:       // The cost is a function of both the input size and the 
output size (NDV). For estimating
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@574
PS1, Line 574:       // cost for any pre-aggregation we need to account for the 
likelihood of duplicate keys
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@575
PS1, Line 575:       // across fragments.  Calculate an overall "intermediate" 
output cardinality that attempts
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@576
PS1, Line 576:       // to account for the dups. Cap it at the input 
cardinality because an aggregation cannot
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@580
PS1, Line 580:           
fragment_.getPerInstanceNdvForCpuCosting(inputCardinality, 
aggInfo.getGroupingExprs());
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@40
PS1, Line 40:     totalCost_ = Math.ceil(Math.max(cardinality_, 0) * 
(exprsCost_ + materializationCost_));
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@105
PS1, Line 105:     long outputSize = (long) 
(exchNode_.getAvgDeserializedRowSize(exchNode_) * outputCardinality);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@123
PS1, Line 123:         ProcessingCost.basicCost(getLabel() + "(" + 
exchNode_.getDisplayLabel() + ")", totalCost);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@58
PS1, Line 58:   // Coefficients for estimating hash join CPU processing cost.  
Derived from benchmarking.
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@345
PS1, Line 345:     double totalProbeCost =
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@353
PS1, Line 353:     ProcessingCost probeProcessingCost = 
ProcessingCost.basicCost(getDisplayLabel(),totalProbeCost);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@359
PS1, Line 359:     // In the end this probably doesn't matter to the CPU 
resource allocation since the build
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@360
PS1, Line 360:     // fragment count is fixed for broadcast(at num hosts) 
regardless of the cost computed here.
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@362
PS1, Line 362:     double totalBuildCost = getChild(1).getFilteredCardinality() 
* BUILD_INPUT_COST_COEFFICIENT;
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2213
PS1, Line 2213:    *  bytes read (for non-columnar) plus an additional per row 
cost for each predicate evaluated.
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@167
PS1, Line 167:     if (formats.contains(HdfsFileFormat.PARQUET) || 
formats.contains(HdfsFileFormat.ICEBERG)) {
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@170
PS1, Line 170:           (estBytesInserted * 
PARQUET_BYTES_INSERTED_COST_COEFFICIENT) + PARQUET_FIXED_INSERT_COST;
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@175
PS1, Line 175:           (estBytesInserted * 
DEFAULT_BYTES_INSERTED_COST_COEFFICIENT) + DEFAULT_FIXED_INSERT_COST;
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@621
PS1, Line 621:     * in getPerInstanceNdv() ignores dups and can underestimate 
badly especially for low NDV.
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@631
PS1, Line 631:     long perInstanceInputCardinality = (long) 
Math.ceil(((double) inputCardinality / numInstances));
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@639
PS1, Line 639:       // Estimate the per instance NDV for this expr accounting 
for the probability of duplicates
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@641
PS1, Line 641:           1.0 - Math.pow(((double)(exprGlobalNdv - 1) / 
exprGlobalNdv),perInstanceInputCardinality);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1070
PS1, Line 1070:       final int maxThreadPerNode, final int parentParallelism, 
TQueryOptions queryOptions) {
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078
PS1, Line 1078:         minThreadPerNode, maxThreadPerNode, parentParallelism, 
nodeStepCount, queryOptions);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1127
PS1, Line 1127:   private boolean adjustToMaxParallelism(final int 
minThreadPerNode, final int maxThreadPerNode,
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1173
PS1, Line 1173:                 Math.max(maxScannerThreads, 
scanNode.computeMaxScannerThreadsForCPC(queryOptions));
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1208
PS1, Line 1208:           maxScannerThreads = 
scanNodes.get(0).computeMaxScannerThreadsForCPC(queryOptions);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java@481
PS1, Line 481:         fragment.traverseEffectiveParallelism(minThreadPerNode, 
maxThreadPerNode, -1, queryOptions);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java@494
PS1, Line 494:    * Only valid after {@link #computeEffectiveParallelism(List, 
int, int, TQueryOptions)} has
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java@561
PS1, Line 561:     computeEffectiveParallelism(postOrderFragments, 
rootAnalyzer.getMinParallelismPerNode(),
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@466
PS1, Line 466:     // (<= 10 bytes) the sort time is proportional to NlogN of 
the row count.  For larger row
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@467
PS1, Line 467:     // widths the time is best predicted as a linear function of 
the row count and byte count
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@468
PS1, Line 468:     // (likely because the time is dominated by materialization 
costs and not the actual key sorting
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@469
PS1, Line 469:     // cost). Benchmark coefficients were derived from cases 
with moderately low NDV and with
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@470
PS1, Line 470:     // just one or two sorting runs and minimal or no spilling.  
These are close to best case
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/UnionNode.java@166
PS1, Line 166:         totalMaterializedCardinality = 
checkedAdd(totalMaterializedCardinality, child.cardinality_);
line too long (100 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 1
Gerrit-Owner: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Tue, 09 Apr 2024 21:12:50 +0000
Gerrit-HasComments: Yes

Reply via email to