Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5520: TopN node periodically reclaims old allocations
......................................................................


IMPALA-5520: TopN node periodically reclaims old allocations

Currently the TopN node retains old string allocations in its tuple pool,
including those of rows already removed from the priority queue, so that
memory is held longer than necessary. With this commit, the TopN node
periodically re-materialises the rows stored in the priority queue and
reclaims the old allocations. This is done whenever the number of rows
removed from the priority queue exceeds 2N, where N = limit + offset.
Moreover, a new counter, "TuplePoolReclamations", is added to the TopN
node to track the number of times the tuple pool is reclaimed.
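A minimal C++ sketch of the reclamation idea, for illustration only and not Impala's implementation: `StringPool`, `Row`, `TopN` and `Reclaim()` are hypothetical stand-ins for the tuple pool, the tuples and the TopN node. Evicted rows leave their bytes in the pool; once more than 2N rows have been evicted, the live rows are copied into a fresh pool and the old pool is freed as a whole. The sketch checks the threshold after every eviction, which may be finer-grained than what the real node does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tuple pool: an append-only arena of string
// copies. Individual strings are never freed; only the whole pool is.
struct StringPool {
  std::deque<std::string> chunks;  // push_back/swap keep element addresses stable
  std::size_t bytes = 0;

  std::string_view Copy(std::string_view s) {
    chunks.emplace_back(s);
    bytes += s.size();
    return chunks.back();
  }
};

struct Row {
  std::int64_t key;
  std::string_view payload;  // points into the pool that owns the bytes
};

// Keeps the n rows with the smallest keys (ORDER BY key ASC LIMIT n).
class TopN {
 public:
  explicit TopN(std::size_t n) : n_(n) {}

  void Insert(std::int64_t key, std::string_view payload) {
    if (n_ == 0) return;
    if (heap_.size() < n_) {
      heap_.push_back({key, pool_.Copy(payload)});
      std::push_heap(heap_.begin(), heap_.end(), KeyLess);
    } else if (key < heap_.front().key) {
      // Evict the current worst row; its bytes stay in pool_ as garbage.
      std::pop_heap(heap_.begin(), heap_.end(), KeyLess);
      heap_.back() = {key, pool_.Copy(payload)};
      std::push_heap(heap_.begin(), heap_.end(), KeyLess);
      // Assumed policy: reclaim once more than 2N rows have been evicted.
      if (++rows_evicted_ > 2 * n_) Reclaim();
    }
  }

  // Rows in ascending key order; the views may dangle after the next Insert.
  std::vector<Row> Sorted() const {
    std::vector<Row> out(heap_);
    std::sort(out.begin(), out.end(), KeyLess);
    return out;
  }

  std::size_t pool_bytes() const { return pool_.bytes; }
  std::size_t reclamations() const { return reclamations_; }

 private:
  // With std::*_heap this gives a max-heap on key: front() is evicted next.
  static bool KeyLess(const Row& a, const Row& b) { return a.key < b.key; }

  // Re-materialise every live row into a fresh pool, then free the old one.
  void Reclaim() {
    assert(heap_.size() == n_);
    StringPool fresh;
    // Keys are unchanged, so the heap order stays valid.
    for (Row& r : heap_) r.payload = fresh.Copy(r.payload);
    pool_.chunks.swap(fresh.chunks);  // swap() does not move the strings
    std::swap(pool_.bytes, fresh.bytes);
    rows_evicted_ = 0;
    ++reclamations_;
  }  // `fresh` now owns the old strings and frees them here

  std::size_t n_;
  StringPool pool_;
  std::vector<Row> heap_;
  std::size_t rows_evicted_ = 0;
  std::size_t reclamations_ = 0;
};
```

Under this policy each reclamation copies N live rows only after more than 2N evictions, so copying costs less than half a row copy per eviction, and the pool holds at most about 3N rows' worth of string data. For example, inserting 1000 rows with 100-byte payloads in descending key order into `TopN top(10)` triggers 47 reclamations and leaves 1300 bytes in the pool, whereas a pool that is never reclaimed would hold all 100000 bytes.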

Testing:
Added a test to test_queries.py that sets a mem_limit low enough that
the query fails if reclamation is not implemented and passes otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: the input to the last TopN node arrives in
reverse sort order, so nearly every row evicts one from the priority queue):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;
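Why reverse order is the worst case can be checked with a small C++ sketch; `CountEvictions` and `Sequence` are hypothetical helpers for illustration only. For `ORDER BY key ASC LIMIT n` with distinct keys, input arriving in descending order makes every row after the first n evict one (rows - n evictions, each leaving dead bytes in the pool), while ascending input evicts nothing. Duplicate keys, such as repeated `l_orderkey` values in `tpch.lineitem`, can lower the count, since this sketch only evicts on a strictly smaller key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Counts how many rows an ORDER BY key ASC LIMIT n operator evicts for a
// given arrival order of keys.
std::size_t CountEvictions(const std::vector<std::int64_t>& keys, std::size_t n) {
  if (n == 0) return 0;
  std::priority_queue<std::int64_t> heap;  // max-heap: top() is the worst kept key
  std::size_t evicted = 0;
  for (std::int64_t k : keys) {
    if (heap.size() < n) {
      heap.push(k);
    } else if (k < heap.top()) {
      heap.pop();  // the evicted row's allocations become garbage
      heap.push(k);
      ++evicted;
    }
  }
  return evicted;
}

// Builds first, first +/- 1, ..., last (inclusive), stepping toward last.
std::vector<std::int64_t> Sequence(std::int64_t first, std::int64_t last) {
  std::vector<std::int64_t> out;
  const std::int64_t step = first <= last ? 1 : -1;
  for (std::int64_t k = first;; k += step) {
    out.push_back(k);
    if (k == last) break;
  }
  return out;
}
```

For instance, `CountEvictions(Sequence(1000, 1), 10)` counts 990 evictions, while `CountEvictions(Sequence(1, 1000), 10)` counts none.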

                       With Reclaim           Without Reclaim
                   Query 1     Query 2      Query 1     Query 2
MaxTuplePoolMem    3.96 KB     3.43 KB      110.2 MB    708.8 MB
Time (mean)        2s 218ms    6s 391ms     2s 021ms    6s 406ms
Time (stdev)       74.38ms     67.45ms      102.71ms    70.44ms
Reclaims           910         5861         N/A         N/A

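We notice that the memory footprint is orders of magnitude lower
(MaxTuplePoolMem drops from 110.2 MB to 3.96 KB for Query 1 and from
708.8 MB to 3.43 KB for Query 2), while query runtimes stay similar:
Query 2 is unchanged within noise and Query 1 is about 0.2s (roughly
10%) slower on average. Cluster perf testing will be done later.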

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Reviewed-on: http://gerrit.cloudera.org:8080/7400
Reviewed-by: Matthew Jacobs <m...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M tests/query_test/test_queries.py
4 files changed, 120 insertions(+), 22 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7400

Gerrit-MessageType: merged
Gerrit-Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
