Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11698 )
Change subject: IMPALA-5004: Switch to sorting node for large TopN queries
......................................................................

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11698/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/11698/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@313
PS3, Line 313: estimatedTopNMaterializedSize < ctx_.getQueryOptions().topn_bytes_limit;
> This code makes the sort/top-n decision based on memory. The number of byte

@Paul, thanks for taking a look! I agree with a lot of your concerns. I've raised a few of my own on the JIRA: IMPALA-5004.

I agree with your comparison of TopN vs. Sort. In general, the tradeoffs between the two operators don't seem trivial. TopN processes fewer bytes but cannot spill, while Sort processes significantly more data but can spill. Furthermore, depending on the size of the limit, sometimes TopN is faster than Sort and sometimes Sort is faster than TopN.

I agree that if we do decide to add this parameter, it would be best to set it dynamically based on memory requirements rather than depend on users to set it.

--
To view, visit http://gerrit.cloudera.org:8080/11698
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Gerrit-Change-Number: 11698
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Paul Rogers <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:18:10 +0000
Gerrit-HasComments: Yes
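
For reference, the comparison at PS3, line 313 can be illustrated with a minimal sketch. This is not the code in the patch: the class `TopNOrSortChooser`, the enum `SortStrategy`, and the parameters `limit`, `offset`, and `avgRowSizeBytes` are hypothetical, and the size estimate of (limit + offset) rows times an average row size is an assumption made only for illustration. Only the strict less-than comparison against `topn_bytes_limit` comes from the line under review.

```java
// Hypothetical sketch of a size-based TopN vs. Sort choice.
// Names here are illustrative and are not taken from the Impala code base.
final class TopNOrSortChooser {

    enum SortStrategy { TOP_N, SORT_WITH_LIMIT }

    // Assumed estimate: TopN must hold (limit + offset) rows in memory.
    // addExact/multiplyExact throw ArithmeticException on long overflow
    // instead of silently wrapping to a small or negative value.
    static long estimateMaterializedBytes(long limit, long offset, long avgRowSizeBytes) {
        long rows = Math.addExact(limit, offset);
        return Math.multiplyExact(rows, avgRowSizeBytes);
    }

    // Mirrors the reviewed comparison: use TopN only while the estimate
    // is strictly below the byte limit; otherwise fall back to Sort,
    // which processes more data but is able to spill.
    static SortStrategy choose(long limit, long offset, long avgRowSizeBytes,
                               long topnBytesLimit) {
        long estimated = estimateMaterializedBytes(limit, offset, avgRowSizeBytes);
        return estimated < topnBytesLimit
            ? SortStrategy.TOP_N
            : SortStrategy.SORT_WITH_LIMIT;
    }

    public static void main(String[] args) {
        long limitBytes = 1024L; // stand-in value for topn_bytes_limit
        System.out.println(choose(10, 0, 100, limitBytes));
        System.out.println(choose(1_000_000, 0, 100, limitBytes));
    }
}
```

With a fixed byte limit, the choice depends only on the estimated size, which is the concern raised in the review: it does not account for whether TopN or Sort would actually be faster for a given limit.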
