Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 )
Change subject: IMPALA-5004: Switch to sorting node for large TopN queries ...................................................................... Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@236 PS4, Line 236: is the http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@238 PS4, Line 238: * average tuple serialized size</code>. offset is not mentioned so either include it or remove it since the arg descriptions and the one-liner definition make things pretty clear already. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java@240 PS4, Line 240: @param Even though this style of commenting is widely used, Impala tries not to use it (yes, there are exceptions in the code base, but its on the rare side). If a file already uses them, then pls stick with the style of that file, but in this case, lets stick with the default style so remove the '@param'. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@325 PS4, Line 325: estimateTopNMaterializedSize as pointed out in the header for SortInfo, the abstractions are not quite right. however, this call-site looks fairly complicated. compueMemLayout does not redo if its already been called, so perhaps simplify this to let SortInfo deal with its descriptors and calling computeMemLayout? You can have a static method do what estimateTopNMaterializedSize does now so that it can be used in SortInfo as well as from the SortNode. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@326 PS4, Line 326: getAvgSerializedSize() this looks correct and consistent with the avgRowSize_ computation for PlanNodes. http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@334 PS4, Line 334: estimatedTopNMaterializedSize if we get this wrong, perhaps due to a mistaken many-to-many join cardinality estimate in some descendant child, what is the easiest way to turn this decision off? set topn_bytes_limit to max long? http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/11698/4/fe/src/main/java/org/apache/impala/planner/SortNode.java@260 PS4, Line 260: nodeResourceProfile_ = ResourceProfile.noReservation( where'd the return go? pls look into why tests did not get bothered by this... the profile could have been overwritten down below, in which case we'd get possibly different estimates. http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test File testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test: http://gerrit.cloudera.org:8080/#/c/11698/4/testdata/workloads/functional-planner/queries/PlannerTest/topn-default-limit-bytes.test@1 PS4, Line 1: triggers sort seems to apply to the second test, not the first, so got confused by this. -- To view, visit http://gerrit.cloudera.org:8080/11698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9 Gerrit-Change-Number: 11698 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Paul Rogers <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Wed, 24 Oct 2018 17:19:41 +0000 Gerrit-HasComments: Yes
