Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9943

to look at the new patch set (#3).

Change subject: IMPALA-5706: Parallelise read I/O in sorter
......................................................................

IMPALA-5706: Parallelise read I/O in sorter

This patch contains multiple changes aimed at optimizing the
spilling sort mechanism:
  - Use double-buffering when merging the sorted runs. As a result,
    while one page of a run is being processed, the next one can be
    loaded from disk in the background.
  - Remove the hard-coded upper limit on the number of buffers that
    can be used for merging the sorted runs. Instead, this number is
    calculated from the memory available through the buffer pool.
  - The already sorted runs are distributed evenly between the last
    intermediate merge and the final merge, to avoid a heavy
    intermediate merge being followed by a light final merge.
  - Right before starting the merging phase, Sorter tries to allocate
    additional memory through the buffer pool.
  - An output run is no longer allocated for the final merge.
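
The even distribution of runs between the last intermediate merge and
the final merge can be illustrated with a small sketch. This is only
one reading of the description above, not the code in sorter.cc: the
function name RunsForLastIntermediateMerge and its parameters are
hypothetical, and it assumes that exactly one more intermediate merge
is needed before the final merge.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, NOT Impala's implementation.
// Returns how many runs the last intermediate merge should consume so that
// the intermediate merge and the final merge see a similar number of inputs.
// Assumption: max_fan_in >= 2 and
// max_fan_in < num_runs <= 2 * max_fan_in - 1, i.e. exactly one more
// intermediate merge is required.
int RunsForLastIntermediateMerge(int num_runs, int max_fan_in) {
  assert(max_fan_in >= 2);
  assert(num_runs > max_fan_in && num_runs <= 2 * max_fan_in - 1);
  // Merging k runs leaves num_runs - k + 1 runs for the final merge, so at
  // least this many runs must be merged for the final merge to fit.
  int min_needed = num_runs - max_fan_in + 1;
  // Balanced split: k inputs now, num_runs - k + 1 inputs in the final merge.
  int balanced = (num_runs + 1) / 2;
  return std::min(max_fan_in, std::max(min_needed, balanced));
}
```

Under these assumptions, 10 runs with a maximum of 8 runs per merge
give an intermediate merge of 5 runs followed by a final merge of 6
inputs, rather than an intermediate merge of 8 runs followed by a
final merge of only 3.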

Performance measurements were made during manual testing to verify
that this is in fact an optimization:
  - When doing a sort on top of a join with a restricted amount of
    memory, more memory is successfully allocated for merging than
    was available during the initial sort runs. This results in
    shallower merging trees (more runs are grabbed for a merge).
  - Manual tests showed that when a single final merge is performed,
    this change slightly decreases the execution time for sorting.

Further testing should be done to cover double-buffering scenarios, as
my manual testing so far didn't show any performance gain when
intermediate merges were performed. This is most probably because
double-buffering decreases the number of runs in a single merge, and I
have to hit an I/O-heavy scenario to overcome this.
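
The trade-off described above can be sketched with a simplified model.
The names MaxRunsPerMerge and MergePasses are hypothetical, and the
model assumes one buffer per input run without double-buffering, two
with it, and that every pass merges all remaining runs in groups of
at most fan_in. The real Sorter may account for buffers differently.

```cpp
#include <cassert>

// Hypothetical model, NOT Impala's implementation.
// Number of runs that can take part in one merge, assuming each input run
// needs one buffer, or two buffers when double-buffering is enabled.
int MaxRunsPerMerge(int available_buffers, bool double_buffering) {
  int buffers_per_run = double_buffering ? 2 : 1;
  return available_buffers / buffers_per_run;
}

// Number of merge passes needed to reduce num_runs to a single run when
// each pass merges groups of at most fan_in runs.
int MergePasses(int num_runs, int fan_in) {
  assert(fan_in >= 2);
  int passes = 0;
  while (num_runs > 1) {
    num_runs = (num_runs + fan_in - 1) / fan_in;  // ceil(num_runs / fan_in)
    ++passes;
  }
  return passes;
}
```

In this model, 16 buffers allow 16 runs per merge without
double-buffering but only 8 with it, so merging 100 runs takes 2
passes in the first case and 3 in the second. The extra pass has to
be paid back by the overlap of I/O and processing.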

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
---
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj-no-deny-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-sorts-exhaustive.test
M testdata/workloads/tpch/queries/sort-reservation-usage.test
M tests/custom_cluster/test_mem_reservations.py
M tests/query_test/test_sort.py
14 files changed, 319 insertions(+), 245 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/9943/3
--
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
