Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5498: Support for partial sorts ......................................................................
Patch Set 4: (11 comments) http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/exec/partial-sort-node.cc File be/src/exec/partial-sort-node.cc: Line 32: sorter_(NULL), > This doesn't need to handle offset? Done Line 127: > Since this only requires a partial sort, you can exit early when the cumula As noted elsewhere, removing limit-related code. Line 140: *eos = input_eos_; > I guess it is not clear to me why allowing a limit without an offset is use Right, so 'offset' never really makes sense with partial-sort-node, since as you say there's not a total ordering and you're not guaranteed that one instance of partial-sort-node with an offset can pick up where another instance left off with a limit. A limit is potentially useful just for cutting off the number of rows returned in a single partial-sort-node, but with the initial use case for this there's no way to end up with a partial-sort-node with a limit, so in the interest of keeping things simple and not having untested code paths, I think I'll just remove the limit code. http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/exec/partial-sort-node.h File be/src/exec/partial-sort-node.h: Line 58: virtual Status QueryMaintenance(RuntimeState* state); > Just for general niceness, can you also update be/src/exec/sort-node.h's pr Done http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 1384: expr_mem_pool, &sort_tuple_expr_evals_)); > Need to update this calculation for partial sorts. Done Line 1434: if (sorted_runs_.size() == 1) { > Unnecessary - this is unconditionally dereferenced Done http://gerrit.cloudera.org:8080/#/c/7267/3/be/src/runtime/sorter.h File be/src/runtime/sorter.h: Line 96: Sorter(const TupleRowComparator& compare_less_than, > Wasn't the plan to apply the partition sort memory limit to the sorter itse This will be included in a follow up patch. http://gerrit.cloudera.org:8080/#/c/7267/3/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: PS3, Line 349: e { > Should this be "first" since the order is specified elsewhere. My problem with that wording is that it could be interpreted as "take the first N elements of input and then sort them" Maybe: "Return the first N sorted elements"? Line 360: struct TSortNode { > Please ignore my earlier comments about offset handling (although a comment Added to partial-sort-node.h http://gerrit.cloudera.org:8080/#/c/7267/3/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: Line 58: private final long PARTIAL_SORT_MEM_LIMIT = 128 * 1024 * 1024; > It might be clearer to specify this in bytes, then just make sure that its Done Line 286: // The memory limit cannot be less than the size of the required blocks. > This looks good to me. Done -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zach Amsden <[email protected]> Gerrit-HasComments: Yes
