Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 )
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator ...................................................................... Patch Set 18: Code-Review+1 (7 comments) http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@745 PS17, Line 745: int numAggExprs = getMaterializedAggregateExprs().size(); > AFAICT getMaterializedAggregateExprs().size() should return the count of th Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@760 PS17, Line 760: LOG.trace("Total CPU cost estimate: " + totalCost > Understood. Do we have any strong conventions or standards here. Just looki I don't think there is a consensus in Impala. I think the SLF4J community would recommend using parameterized messages, they're probably slightly more optimal about string building. But this is fine. http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@638 PS17, Line 638: double lhsNetworkCost = (lhsHasCompatPartition) ? 0.0 : > Restored the original formatting. Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@353 PS17, Line 353: // TODO: For broadcast join builds we're underestimating cost here because we're using > I'll enter a ticket for that. It's not a big effort but also not trivial. I Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@209 PS17, Line 209: // Coefficients for estimating scan CPU processing cost. Derived from benchmarking. > Done Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@648 PS17, Line 648: // TODO: Should we use AggregationNode.DEFAULT_SKEW_FACTOR when calculating > I think average case behavior is probably more appropriate for most cases o Ack http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@656 PS17, Line 656: exprGlobalNdv = inputCardinality; > Agree. I've removed this TODO. Ack -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 18 Gerrit-Owner: David Rorke <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Thu, 18 Apr 2024 16:04:21 +0000 Gerrit-HasComments: Yes
