Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#19). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. ...................................................................... IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---------------+---------+--------------+-----------------------+-------------------------+ | Insert table | #Rows | Avg | no limit | 512 MB limit | | | | SortDataSize +--------+--------------+---------+---------------+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---------------+---------+--------------+--------+--------------+---------+---------------+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---------------+---------+--------------+--------+--------------+---------+---------------+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---------------+---------+--------------+--------+--------------+---------+---------------+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---------------+---------+--------------+--------+--------------+---------+---------------+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 228 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/19 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>