Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/21005 )

Change subject: IMPALA-12790: Fix overestimation in ScanNode.getInputCardinality
......................................................................


Patch Set 1:

(3 comments)

Thanks for quickly fixing this!
I am not sure about increasing the estimates up to the limit; otherwise LGTM.

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java@335
PS1, Line 335: getInputCardinality
Probably a name like capInputCardinalityWithLimit would be clearer?


http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java
File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java:

http://gerrit.cloudera.org:8080/#/c/21005/1/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java@59
PS1, Line 59:         // Stats is missing, so numRows might be a result of extrapolation that is
I am not 100% sure here. Some clients may add a large limit purely as
"protection", to avoid fetching an excessive number of rows that they could
not process anyway. So I think there can be cases where the limit is
unrealistically large and the extrapolated value makes more sense.

The planner generally seems to use the limit only as a cap, not as a
replacement for the estimated stats:

explain select * from functional_parquet.alltypes limit 20000;

The plan uses the estimated 11K as the cardinality instead of the 20K from the limit.
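
To make the distinction concrete, here is a minimal sketch of the two
behaviours. It is not the actual planner or MaxRowsProcessedVisitor code: the
class and method names are hypothetical, -1 is assumed to mean "unknown" or
"no limit", and 11,000 is only a round stand-in for the "11K" estimate above.

```java
// Hypothetical sketch, not Impala code: contrasts capping an estimate with a
// LIMIT against replacing the estimate with the LIMIT.
final class LimitCapSketch {
  private LimitCapSketch() {}

  // Assumed convention for this sketch: -1 means "unknown" / "no limit".
  static final long UNKNOWN = -1;

  /** Uses the limit only as an upper bound on the estimated row count. */
  static long capWithLimit(long estimatedRows, long limit) {
    if (limit < 0) return estimatedRows;   // no limit: keep the estimate
    if (estimatedRows < 0) return limit;   // no estimate: limit is the only bound
    return Math.min(estimatedRows, limit);
  }

  /** Uses the limit instead of the estimate whenever a limit is present. */
  static long preferLimit(long estimatedRows, long limit) {
    return limit >= 0 ? limit : estimatedRows;
  }

  public static void main(String[] args) {
    long estimated = 11_000L;  // stand-in for the extrapolated "11K"
    long limit = 20_000L;      // LIMIT from the example query
    System.out.println("cap with limit: " + capWithLimit(estimated, limit));
    System.out.println("prefer limit:   " + preferLimit(estimated, limit));
  }
}
```

With a "protective" limit much larger than the extrapolated row count, only
the second variant raises the estimate, which is the case I am unsure about.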


http://gerrit.cloudera.org:8080/#/c/21005/1/testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
File testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test:

http://gerrit.cloudera.org:8080/#/c/21005/1/testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test@545
PS1, Line 545: has not stats compute
nit, here and in next 3 queries: "has no stats"



--
To view, visit http://gerrit.cloudera.org:8080/21005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icc5b39a7684fb8748185349d0b80baf8dcd6b126
Gerrit-Change-Number: 21005
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 06 Feb 2024 17:12:54 +0000
Gerrit-HasComments: Yes
