Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21005 )

Change subject: IMPALA-12790: Fix overestimation in ScanNode.getInputCardinality
......................................................................

Patch Set 1:

(3 comments)

Thanks for quickly fixing this! I am not sure about increasing estimates up to the limit, otherwise LGTM.

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java@335
PS1, Line 335: getInputCardinality
Probably a name like capInputCardinalityWithLimit would be clearer?

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java@59
PS1, Line 59: // Stats is missing, so numRows might be a result of extrapolation that is
I am not 100% sure here. Some clients may add a large limit for the sake of "protection", to avoid fetching an excessive number of rows that they couldn't process anyway. So I think there can be cases where the limit is unrealistically large and the extrapolated value makes more sense.

The planner generally seems to only cap the estimate with the limit, not use the limit in place of the estimated stats:

  explain select * from functional_parquet.alltypes limit 20000;

The plan uses the estimated 11K as cardinality instead of the 20K from the limit.
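
To make the distinction above concrete (capping an estimate with the limit versus raising it to the limit), here is a minimal sketch. The class and method names are hypothetical and this is not the planner's actual code; it also assumes that a negative value stands for "unknown estimate" or "no limit".

```java
// Hypothetical illustration only; not Impala's planner API.
final class CardinalityCapSketch {
  // Assumption for this sketch: a negative value means "unknown" / "no limit".
  static long capWithLimit(long estimatedRows, long limit) {
    if (limit < 0) return estimatedRows;   // no limit: keep the estimate as-is
    if (estimatedRows < 0) return limit;   // no estimate: the limit is the only bound
    // The limit can only lower the estimate, never raise it.
    return Math.min(estimatedRows, limit);
  }

  public static void main(String[] args) {
    // Estimate of 11K with "limit 20000": the estimate is kept.
    System.out.println(capWithLimit(11_000L, 20_000L)); // prints 11000
    // Estimate of 11K with "limit 100": the limit caps the estimate.
    System.out.println(capWithLimit(11_000L, 100L));    // prints 100
  }
}
```

With this behavior, an unrealistically large "protective" limit leaves the extrapolated estimate unchanged.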

http://gerrit.cloudera.org:8080/#/c/21005/1/testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
File testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test:

http://gerrit.cloudera.org:8080/#/c/21005/1/testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test@545
PS1, Line 545: has not stats compute
nit, here and in next 3 queries: "has no stats"

--
To view, visit http://gerrit.cloudera.org:8080/21005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc5b39a7684fb8748185349d0b80baf8dcd6b126
Gerrit-Change-Number: 21005
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 06 Feb 2024 17:12:54 +0000
Gerrit-HasComments: Yes
