Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16942 )
Change subject: IMPALA-10296: Fix analytic limit pushdown when predicates are present ...................................................................... Patch Set 11: (12 comments) Sending comments based on 1st pass of the planner changes. http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@9 PS11, Line 9: when nit: remove http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@22 PS11, Line 22: limit did you mean partition limit >= order by limit ? http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@32 PS11, Line 32: was nit: 'is' http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@40 PS11, Line 40: This patch implements tie handling in the backend (I took most I had previously wondered about how the planner and backend work for this functionality would be combined but it now starts falling into place with the handling of the duplicates :-) http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@68 PS11, Line 68: The for nit: 'The elapsed time for..' ? http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java: http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@507 PS11, Line 507: upper top-n. Since we do a distributed top-n, there are 3 top-n's in the plan and the 'upper top-n' may be confusing. Here it refers to the outermost top-n or final top-n or something similar. http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@511 PS11, Line 511: doesn't matter nit: worth clarifying that it doesn't matter for the purpose of the pushdown decision. http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@530 PS11, Line 530: include all of the rows in the final This does not literally mean all rows in the final partition right ? Should it be all eligible rows or all relevant rows ? (based on the P+N value) http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@531 PS11, Line 531: was nit: 'we' http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@603 PS11, Line 603: if (analyticLimit < limit) return falseStatus; One special case where this could work is if each partition had a maximum of analyticLimit rows. Then we know we are not excluding rows as we iterate through the partitions until we reach the final limit. Of course, this knowledge is not readily available or may not be relied upon due to ndv estimates but in practice I suspect this may not be uncommon. For this patch, it makes sense to be conservative in applying the optimization. http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/SortNode.java@83 PS11, Line 83: row.s nit: 'rows' http://gerrit.cloudera.org:8080/#/c/16942/11/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test File testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test: http://gerrit.cloudera.org:8080/#/c/16942/11/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test@1139 PS11, Line 1139: # rank() predicate is not pushed down because TOPN_BYTES_LIMIT prevents conversion The plan shows the lower top-n which indicates the rank was pushed down. Perhaps the top_bytes_limit should be even smaller if this is supposed to be a negative test. -- To view, visit http://gerrit.cloudera.org:8080/16942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I801d7799b0d649c73d2dd1703729a9b58a662509 Gerrit-Change-Number: 16942 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Sun, 17 Jan 2021 00:36:51 +0000 Gerrit-HasComments: Yes
