Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 )
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator ...................................................................... Patch Set 17: (7 comments) http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@745 PS17, Line 745: int numAggExprs = getMaterializedAggregateExprs().size(); Why switch to getMaterializedAggregateExprs? http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@760 PS17, Line 760: LOG.trace("Total CPU cost estimate: " + totalCost nit: This could use SLF4J parameterized messages: https://www.slf4j.org/faq.html#logging_performance. Although that involves implicitly constructing an argument array, so your version is still faster. http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@638 PS17, Line 638: double lhsNetworkCost = (lhsHasCompatPartition) ? nit: any good reason to reformat this file? http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@353 PS17, Line 353: // TODO: For broadcast join builds we're underestimating cost here because we're using It would be nice to add tickets for TODOs. Is this one a significant effort? http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@209 PS17, Line 209: // Coefficiencts for estimating scan CPU processing cost. Derived from benchmarking. typo: should be Coefficients http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@648 PS17, Line 648: // TODO: Should we use AggregationNode.DEFAULT_SKEW_FACTOR when calculating I guess do we care more about average or worst-case behavior here? Have any other ideas for making a decision? http://gerrit.cloudera.org:8080/#/c/21279/17/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@656 PS17, Line 656: // TODO: Can we do better than this? This TODO doesn't feel very actionable. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 17 Gerrit-Owner: David Rorke <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Wed, 17 Apr 2024 23:15:00 +0000 Gerrit-HasComments: Yes
