Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
......................................................................

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_queries.py which sets a low mem_limit such that the
test would fail if reclamation is not implemented and pass otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before
feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;

                   With Reclaim            Without Reclaim
                   Query 1     Query 2     Query 1     Query 2
MaxTuplePoolMem    3.96 KB     3.43 KB     110.2 MB    708.8 MB
Time (mean)        2s 218ms    6s 391ms    2s 021ms    6s 406ms
Time (stdev)       74.38ms     67.45ms     102.71ms    70.44ms
Reclaims           910         5861        N/A         N/A

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes. Cluster perf testing will be done
later.

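
The reclamation rule described above can be illustrated with a small
Python model. This is only a sketch under stated assumptions, not the
C++ code in topn-node.cc: the names TuplePool, TopN, rows_removed and
run are hypothetical, the pool is modelled as a byte counter that can
only grow until it is dropped, and the rows are plain (key, string)
pairs. The only behaviour taken from the commit message is the trigger:
once more than 2 * N rows have been removed from the priority queue,
the surviving rows are copied into a fresh pool and the old pool is
released.

```python
import heapq


class TuplePool:
    """Hypothetical stand-in for a tuple pool: it only grows until dropped."""

    def __init__(self):
        self.bytes_allocated = 0

    def copy_in(self, value):
        # Account for the string bytes "allocated" for a materialised row.
        self.bytes_allocated += len(value.encode("utf-8"))
        return value


class TopN:
    """Keeps the n rows with the smallest keys (ORDER BY key LIMIT n)."""

    def __init__(self, n, reclaim=True):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.reclaim = reclaim
        # Entries are (-key, seq, payload), so heap[0] holds the largest kept key.
        self.heap = []
        self.pool = TuplePool()
        self.rows_removed = 0      # rows evicted since the last reclamation
        self.reclamations = 0      # analogous to the TuplePoolReclamations counter
        self.max_pool_bytes = 0
        self._seq = 0

    def insert(self, key, payload):
        if len(self.heap) == self.n and key >= -self.heap[0][0]:
            return  # row cannot enter the top n, so nothing is materialised
        self._seq += 1
        entry = (-key, self._seq, self.pool.copy_in(payload))
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, entry)
        else:
            # Evict the current worst row; its bytes stay in the old pool.
            heapq.heapreplace(self.heap, entry)
            self.rows_removed += 1
        self.max_pool_bytes = max(self.max_pool_bytes, self.pool.bytes_allocated)
        if self.reclaim and self.rows_removed > 2 * self.n:
            self._reclaim()

    def _reclaim(self):
        # Re-materialise the surviving rows into a fresh pool, then drop the old one.
        new_pool = TuplePool()
        self.heap = [(k, s, new_pool.copy_in(p)) for k, s, p in self.heap]
        self.pool = new_pool
        self.rows_removed = 0
        self.reclamations += 1

    def result(self):
        # Return the kept rows in ascending key order.
        return [(-k, p) for k, _, p in sorted(self.heap, reverse=True)]


def run(reclaim):
    # Worst case for the pool: keys arrive in descending order, so every
    # row after the first n displaces a row already in the queue.
    node = TopN(n=10, reclaim=reclaim)
    for key in range(1000, 0, -1):
        node.insert(key, "x" * 100)
    return node
```

Calling run(True) and run(False) and comparing max_pool_bytes shows the
same effect as the MaxTuplePoolMem row in the table, at toy scale: both
runs return the same ten rows, but the peak pool size stays bounded by
roughly 3 * N rows with reclamation and grows with the input without it.
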
Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Reviewed-on: http://gerrit.cloudera.org:8080/7400
Reviewed-by: Matthew Jacobs <m...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_queries.py
4 files changed, 120 insertions(+), 22 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/7400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>