Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 )
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator ...................................................................... Patch Set 1: (36 comments) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@573 PS1, Line 573: // The cost is a function of both the input size and the output size (NDV). For estimating line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@574 PS1, Line 574: // cost for any pre-aggregation we need to account for the likelihood of duplicate keys line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@575 PS1, Line 575: // across fragments. Calculate an overall "intermediate" output cardinality that attempts line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@576 PS1, Line 576: // to account for the dups. Cap it at the input cardinality because an aggregation cannot line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@580 PS1, Line 580: fragment_.getPerInstanceNdvForCpuCosting(inputCardinality, aggInfo.getGroupingExprs()); line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java File fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java@40 PS1, Line 40: totalCost_ = Math.ceil(Math.max(cardinality_, 0) * (exprsCost_ + materializationCost_)); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@105 PS1, Line 105: long outputSize = (long) (exchNode_.getAvgDeserializedRowSize(exchNode_) * outputCardinality); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@123 PS1, Line 123: ProcessingCost.basicCost(getLabel() + "(" + exchNode_.getDisplayLabel() + ")", totalCost); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@58 PS1, Line 58: // Coefficients for estimating hash join CPU processing cost. Derived from benchmarking. line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@345 PS1, Line 345: double totalProbeCost = line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@353 PS1, Line 353: ProcessingCost probeProcessingCost = ProcessingCost.basicCost(getDisplayLabel(),totalProbeCost); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@359 PS1, Line 359: // In the end this probably doesn't matter to the CPU resource allocation since the build line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@360 PS1, Line 360: // fragment count is fixed for broadcast(at num hosts) regardless of the cost computed here. line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@362 PS1, Line 362: double totalBuildCost = getChild(1).getFilteredCardinality() * BUILD_INPUT_COST_COEFFICIENT; line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2213 PS1, Line 2213: * bytes read (for non-columnar) plus an additional per row cost for each predicate evaluated. line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@167 PS1, Line 167: if (formats.contains(HdfsFileFormat.PARQUET) || formats.contains(HdfsFileFormat.ICEBERG)) { line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@170 PS1, Line 170: (estBytesInserted * PARQUET_BYTES_INSERTED_COST_COEFFICIENT) + PARQUET_FIXED_INSERT_COST; line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@175 PS1, Line 175: (estBytesInserted * DEFAULT_BYTES_INSERTED_COST_COEFFICIENT) + DEFAULT_FIXED_INSERT_COST; line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@621 PS1, Line 621: * in getPerInstanceNdv() ignores dups and can underestimate badly especially for low NDV. line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@631 PS1, Line 631: long perInstanceInputCardinality = (long) Math.ceil(((double) inputCardinality / numInstances)); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@639 PS1, Line 639: // Estimate the per instance NDV for this expr accounting for the probability of duplicates line too long (97 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@641 PS1, Line 641: 1.0 - Math.pow(((double)(exprGlobalNdv - 1) / exprGlobalNdv),perInstanceInputCardinality); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1070 PS1, Line 1070: final int maxThreadPerNode, final int parentParallelism, TQueryOptions queryOptions) { line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1078 PS1, Line 1078: minThreadPerNode, maxThreadPerNode, parentParallelism, nodeStepCount, queryOptions); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1127 PS1, Line 1127: private boolean adjustToMaxParallelism(final int minThreadPerNode, final int maxThreadPerNode, line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1173 PS1, Line 1173: Math.max(maxScannerThreads, scanNode.computeMaxScannerThreadsForCPC(queryOptions)); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1208 PS1, Line 1208: maxScannerThreads = scanNodes.get(0).computeMaxScannerThreadsForCPC(queryOptions); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java@481 PS1, Line 481: fragment.traverseEffectiveParallelism(minThreadPerNode, maxThreadPerNode, -1, queryOptions); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java@494 PS1, Line 494: * Only valid after {@link #computeEffectiveParallelism(List, int, int, TQueryOptions)} has line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/Planner.java@561 PS1, Line 561: computeEffectiveParallelism(postOrderFragments, rootAnalyzer.getMinParallelismPerNode(), line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@466 PS1, Line 466: // (<= 10 bytes) the sort time is proportional to NlogN of the row count. For larger row line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@467 PS1, Line 467: // widths the time is best predicted as a linear function of the row count and byte count line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@468 PS1, Line 468: // (likely because the time is dominated by materialization costs and not the actual key sorting line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@469 PS1, Line 469: // cost). Benchmark coefficients were derived from cases with moderately low NDV and with line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@470 PS1, Line 470: // just one or two sorting runs and minimal or no spilling. These are close to best case line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: http://gerrit.cloudera.org:8080/#/c/21279/1/fe/src/main/java/org/apache/impala/planner/UnionNode.java@166 PS1, Line 166: totalMaterializedCardinality = checkedAdd(totalMaterializedCardinality, child.cardinality_); line too long (100 > 90) -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 1 Gerrit-Owner: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Tue, 09 Apr 2024 21:12:50 +0000 Gerrit-HasComments: Yes
