Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits ...................................................................... Patch Set 2: (4 comments) I have some serious concerns about the behaviour with predicates, not necessarily that it wouldn't be useful in some circumstances, but that it's changing the meaning of the LIMIT clause too much, even considering that it's under a query option. Otherwise I had more minor comments on the rest of the code. http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@184 PS2, Line 184: // Tracks the simple LIMIT status of this query block Can you explain what the two values are? http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16723/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1560 PS2, Line 1560: // check eligibility of simple limit optimization: It seems like this would most naturally fit as part of HdfsPartition pruning, which is invoked above. Would be good to move this logic into a separate function regardless, but it seems like it could fit in that step. http://gerrit.cloudera.org:8080/#/c/16723/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test: http://gerrit.cloudera.org:8080/#/c/16723/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@24 PS2, Line 24: # limit with a WHERE clause Add a case where the where clause has a subquery, too? http://gerrit.cloudera.org:8080/#/c/16723/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@26 PS2, Line 26: where bigint_col < 100 Trying to apply this with non-partition predicates doesn't seem safe. I get there's a use case for limiting the input data scanned even when it changes results but this seems very different and surprising behavior for the LIMIT clause compared to the other optimisation. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 16 Nov 2020 17:26:15 +0000 Gerrit-HasComments: Yes