Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 )
Change subject: IMPALA-5004: Switch to sorting node for large TopN queries ...................................................................... Patch Set 5: (8 comments) Addressed comments and made a few more changes: * Changed query-options.cc to use ParseMemValue to parse the value of topn_bytes_limit since it is expressed in bytes * Re-organized the tests slightly into three distinct tests: (1) test with the default value of topn_bytes_limit, (2) test with a small value of topn_bytes_limit, and (3) test with topn_bytes_limit = 0 which disables the changes * A value of 0 or -1 for topn_bytes_limit now disables the changes http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@236 PS4, Line 236: th > the Done http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@238 PS4, Line 238: * average tuple serialized size</code>. 'cardinality' is the cardinality of the TopN > offset is not mentioned so either include it or remove it since the arg des The "estimated # of rows in memory" refers to the sum of the cardinality and offset. I don't explicitly mention the offset, because one can infer from the method body that "estimated # of rows in memory" = "cardinality + offset". If you still think the formula is redundant, I can remove it. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@240 PS4, Line 240: > Even though this style of commenting is widely used, Impala tries not to us Done http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@325 PS4, Line 325: rdinality = > as pointed out in the header for SortInfo, the abstractions are not quite r Moved the tuple handling into estimateTopNMaterializedSize. I couldn't make it static since it needs to reference the tuple descriptors of the current SortInfo. estimateTopNMaterializedSize is called by the SingleNodePlanner and the SortNode now. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@326 PS4, Line 326: th.min(root.cardinalit > this looks correct and consistent with the avgRowSize_ computation for Plan Done http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@334 PS4, Line 334: then it is likely we are sor > if we get this wrong, perhaps due to a mistaken many-to-many join cardinali Yeah, but that doesn't seem ideal. So I the most recent patch allows settings topn_bytes_limit to 0 or -1 which disables the decision entirely. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java@260 PS4, Line 260: nodeResourceProfile_ = ResourceProfile.noReservation( > where'd the return go? pls look into why tests did not get bothered by this I didn't run the full suite of tests, only the topn functional-query tests, which don't validate memory estimates by default. http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test File testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test: http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test@1 PS4, Line 1: > seems to apply to the second test, not the first, so got confused by this. Done -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Thu, 25 Oct 2018 02:13:17 +0000 Gerrit-HasComments: Yes
