Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5498: Support for partial sorts in Kudu INSERTs ......................................................................
IMPALA-5498: Support for partial sorts in Kudu INSERTs Impala currently supports total sorts (the entire set of data is sorted) and top-n sorts (only the highest/lowest n elements are sorted). This patch adds the ability to do partial sorts, where the data is divided up into some number of subsets, each of which is sorted individually. It accomplishes this by adding a new exec node, PartialSortNode. When PartialSortNode::GetNext() is called, it retrieves input up to the query memory limit, uses the existing Sorter class to sort it, and outputs it. This is faster than a total sort with SortNode as it avoids the need to spill if the input is larger than the memory limit. Future work will look into setting a more restrictive memory limit on the PartialSortNode. (IMPALA-5669) In the planner, the SortNode plan node is used, with an enum value indicating if it is a total or partial sort. This also adds a new counter 'RunSize' to the runtime profile which tracks the min, max, and avg size of the generated runs, in tuples. As a first use case, partial sort is used where a total sort was used previously for inserts/upserts into Kudu tables only. Future work can extend this to other table sinks. (IMPALA-5649) Testing: - E2E test with a large INSERT into a Kudu table with a mem limit. Checks that no spills occurred. - Updated planner tests. - Existing E2E tests and stress test verify correctness of INSERT. - Perf tests on the 10 node cluster: inserting tpch_100.lineitem into a Kudu table with mem_limit=3gb: Previously: 5 runs are spilled, sort took 7m33s Now: no spills, sort takes 6m19s, for ~18% speedup Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Reviewed-on: http://gerrit.cloudera.org:8080/7267 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc A be/src/exec/partial-sort-node.cc A be/src/exec/partial-sort-node.h M be/src/exec/sort-node.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/util/runtime-profile-counters.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 16 files changed, 459 insertions(+), 70 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zach Amsden <[email protected]>
