[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Reviewed-on: http://gerrit.cloudera.org:8080/15963 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 224 insertions(+), 10 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 22 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 21: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 21 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 17:13:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 21: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 21 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 12:16:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 20: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 12:16:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 21: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6172/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 21 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 12:16:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 20: Patch set 19 fail the same test, test_multiple_sort_run_bytes_limits. Looks like admission controller does not respect buffer_pool_limit as much as mem_limit. Patch set 20 change the test cases to use mem_limit instead of buffer_pool_limit, just as Tim initially suggest. Some of the sort_run_bytes_limit parameter also adjusted to keep the assertions true. Fang-Yu help me verify that this Patch set 20 can pass ubuntu-16.04-dockerised-tests by rerunning it in this jenkins job: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2814/ -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 02:47:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6692/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 23:01:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#20). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 224 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/20 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/ -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 19:34:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 14:29:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: The dockerised tests have memory-based admission control enabled so it's more likely that the query was given a memory limit by the admission controller and that it behaved differently because of that (rather than concurrency being a factor). You could explicitly set a higher mem_limit for that test if you want more control. I think you're probably achieving the same thing by setting buffer_pool_limit. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 22:47:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6683/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 22:08:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: Patch set 18 failed the first test case of test_multiple_sort_run_bytes_limits on ubuntu-16.04-dockerised-tests setup. I assume it is due to memory pressure from other running queries that force sorter to spill, causing assertion error (there should be 0 spilled runs). Patch set 19 tries to fix it by marking the test to execute serially and raise the buffer_pool_limit for the first test case from 2GB to 3GB. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 21:48:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#19). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 228 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/19 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 18: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6154/ -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 18 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 20:51:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 18 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 15:42:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 18: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6154/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 18 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 15:42:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Jul 2020 15:39:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6658/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 20 Jul 2020 19:02:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#17). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 224 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/17 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 17 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6483/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 02 Jul 2020 04:17:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#16). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 224 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/16 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 15: > Patch Set 15: Code-Review+1 > > Thanks for the changes! I can update to +2 if no one else has comments. > > One more thing to think about is potentially finding a good default for > sort_run_bytes_limit. I agree with doing it in another commit, but it could > be good to know where we will track this, e.g. in IMPALA-6692 or a new jira. Thank you, Csaba. Yes, once this get merged, I can file a follow up JIRA to track what is the best default value for sort_run_bytes_limit. I'm also interested to refine how we buffer OutboundRowBatch in krpc-data-stream-sender. Even with this IMPALA-6692 solution, with around 512mb sort_run_bytes_limit, we will still have 2-3s back pressure anytime we do intermediate sort. We can utilize that time in data stream sender to do more buffering and serialization of next RowBatch while waiting for the pipeline to unblock. Since IMPALA-5444 and IMPALA-9692 (part 3) get in first, it looks like this patch will hit merge conflict. I can prepare a rebase and submit a new patchset later today. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Jul 2020 19:35:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 15: Code-Review+1 Thanks for the changes! I can update to +2 if no one else has comments. One more thing to think about is potentially finding a good default for sort_run_bytes_limit. I agree with doing it in another commit, but it could be good to know where we will track this, e.g. in IMPALA-6692 or a new jira. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Jul 2020 11:26:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6465/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jun 2020 14:45:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 15: (8 comments) Thank you, Csaba! http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@10 PS14, Line 10: batches are added to current unsorted run or memory limit is hit, > missing 'are'? Done http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@11 PS14, Line 11: sorter will immediately start the run. If the latter case happens, > nit: happens Done http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@13 PS14, Line 13: new unsorted run object, and continue to add the next row batches, and > missing 'to' Done http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@15 PS14, Line 15: > nit: tries Done http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@23 PS14, Line 23: > nit: speeds up Done http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@26 PS14, Line 26: 'sort_ > nit: exceeded Done http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@32 PS14, Line 32: ssion > nit: fit Done http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc@952 PS14, Line 952: 3 > I didn't catch this one - I think that it should be also VLOG(3) Done -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jun 2020 14:18:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#15). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches are added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happens, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speeds up the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fit into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 225 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/15 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 14: Code-Review+1 (8 comments) http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@10 PS14, Line 10: batches added to current unsorted run or memory limit is hit, sorter missing 'are'? http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@11 PS14, Line 11: will immediately start the run. If the latter case happen, sorter will nit: happens http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@13 PS14, Line 13: run object, and continue add the next row batches, and so on. missing 'to' http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@15 PS14, Line 15: try nit: tries http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@23 PS14, Line 23: speedup nit: speeds up http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@26 PS14, Line 26: exceed nit: exceeded http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@32 PS14, Line 32: fitted nit: fit http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc@952 PS14, Line 952: 2 I didn't catch this one - I think that it should be also VLOG(3) -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 30 Jun 2020 08:19:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6440/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 27 Jun 2020 02:11:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6439/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 27 Jun 2020 02:08:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 14: (13 comments) http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h File be/src/exec/sort-node.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77 PS12, Line 77: will > nit: "will go"? Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89 PS12, Line 89: > Ok, in that case, we should just abandon estimate in case of cardinality is This is now changed to obtain full input estimate from planner. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: > I will look at possibility to access that average size data in the backend. I figure out how to get more precise estimate from planner. It is close enough to the actual input size that I observe in my test. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92 PS12, Line 92: & > I think that VLOG(3) is enough. Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h File be/src/runtime/sorter.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: t > "the" would be better Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: /// 'estimated_input_size' is the total rows in bytes that are estimated to get added > nit: missing "are" Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: /// into this sorter. This is used to decide if sorter needs to proactively spill for > nit: needs Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: ively sp > nit: spill Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223 PS12, Line 223: do > nit: "do an"? Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816 PS12, Line 816: PrettyPrinter::PrintBytes(estimated_input_size), > I think that VLOG(3) is enough here - this should happen if the cardinality Done http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907 PS12, Line 907: PrettyPrinter::PrintBytes(GetSortRunBytesLimit())); > Same as line 816. Done http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645 PS12, Line 645: a join > typo: backends Done http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74 PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of :a query is, the more sort runs are likely to be produced and spilled. :Case 1 : 0 SpilledRuns, because all rows fit within the maximum reservation. : sort_run_bytes_limit is not enforced. :Case 2 : 3 SpilledRuns, because the first run hit reservation limit, and the : next 2 runs are capped to 150m. :Case 3 : 4 SpilledRuns, because sort node estimate that spill is inevitable. : So all runs are capped to 130m, including the first one.""" > I will look at that 'query_result.runtime_profile'. Done -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 27 Jun 2020 01:50:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#14). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happen, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue add the next row batches, and so on. This algorithm try to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speedup the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceed sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fitted into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 225 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/14 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#13). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happen, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue add the next row batches, and so on. This algorithm try to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speedup the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceed sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fitted into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 225 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/13 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize > I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain t I will look at possibility to access that average size data in the backend. But just to make sure I get it right. For row that contain varlen data, the GetRowSize() will most likely underestimate the size, since it only takes account for the pointer, but not the string length itself? So that, in turn, will cause return value of this ComputeInputSizeEstimate() to be underestimate as well. But isn't this input size underestimation better than overestimation? In case of underestimation, the worse situation is that we don't enforce sort_run_bytes_limit for the first run (hoping that all will fit in memory), turns out wrong and spill, but we then enforce sort_run_bytes_limit for the next runs. Overestimation is worse, because we unnecessarily spill from beginning when the input can possibly fit in the memory. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 20:09:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize > So what is the nature of varlen column? Is each row possibly will have diff I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain the size of the tuple that holds a row - but in case of string and varchar it contains a pointer (+length), so there is additional data in some buffer. The column stats contain AvgSize and MaxSize - these are constants for fixed sized types, but we calculate them for strings during COMPUTE STATS, so we can get a more or less accurate estimation for the total amount of memory consumed. I don't know from the top of my head how to access this data in the backend. Strings are very common, so many queries contain varlen slots. I am not sure if it is a good idea to create an optimization specifically for queries without strings. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 18:11:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (3 comments) Thank you Csaba for your feedback! I have couple follow up questions. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89 PS12, Line 89: cardinality > What is the default value of this? Can it be -1 (unknown)? The result seems Ok, in that case, we should just abandon estimate in case of cardinality is -1. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize > I think that this doesn't contain varlen data, so it can greatly underestim So what is the nature of varlen column? Is each row possibly will have different sizes with large variations? And what is GetRowSize() return in that case? Thinking if we should abandon the estimate entirely for input rows having varlen data. http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74 PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of :a query is, the more sort runs are likely to be produced. :Case 1 : 1 run produced, because all rows fit within the maximum reservation. : sort_run_bytes_limit is not enforced. :Case 2 : 3 run produced, because the first run hit reservation limit, and the : next 2 runs are capped to 150m. :Case 3 : 4 run produced, because sort node estimate that spill is inevitable. : So all runs are capped to 130m, including the first one.""" > Isn't there something in query_result.runtime_profile that could be used to I will look at that 'query_result.runtime_profile'. Otherwise, I will change this test to run_test_case and verify the profile via regex. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 16:15:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (13 comments) Sorry for the many grammar comments, I was also the victim of this in the past :) My only real concern is about the case when the cardinality is unknown. My preference would be to try to allow spilling in that case. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h File be/src/exec/sort-node.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77 PS12, Line 77: going nit: "will go"? http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89 PS12, Line 89: cardinality What is the default value of this? Can it be -1 (unknown)? The result seems pretty wrong in that case. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize I think that this doesn't contain varlen data, so it can greatly underestimate the input size if there are strings. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92 PS12, Line 92: 2) I think that VLOG(3) is enough. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h File be/src/runtime/sorter.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: a "the" would be better http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: /// 'estimated_input_size' is a total rows in bytes that estimated to get added into nit: missing "are" http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: /// this sorter. This is used to decide if sorter need to proactively spilling for nit: needs http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: spilling nit: spill http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223 PS12, Line 223: run nit: "do an"? http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816 PS12, Line 816: VLOG(2) << Substitute( I think that VLOG(3) is enough here - this should happen if the cardinality estimation was wrong, which may make WARNING logical, but this seems unavoidable for many queries, so I wouldn't spam the warning log. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907 PS12, Line 907: VLOG(2) << Substitute( Same as line 816. http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645 PS12, Line 645: backeds typo: backends http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74 PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of :a query is, the more sort runs are likely to be produced. :Case 1 : 1 run produced, because all rows fit within the maximum reservation. : sort_run_bytes_limit is not enforced. :Case 2 : 3 run produced, because the first run hit reservation limit, and the : next 2 runs are capped to 150m. :Case 3 : 4 run produced, because sort node estimate that spill is inevitable. : So all runs are capped to 130m, including the first one.""" Isn't there something in query_result.runtime_profile that could be used to check some of these statements? E.g. I think we can check that no spilling occurred for case 1, but it did occur for case 2 and 3 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 25 Jun 2020 15:00:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6338/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 16 Jun 2020 15:50:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 12: (5 comments) Patch Set 12 add num_backends info in TPlanFragmentInstanceCtx. Using this info, sort node can make a better estimation of how many input data that might be received by each backend, and, if necessary, enforce sort_run_bytes_limit right from the beginning. http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc@199 PS10, Line 199: << " nulls " << (tsort_info. > We usually simply use "Status status = ... ". Done http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc@76 PS11, Line 76: SCOPED_TIMER(runtime_profile_->total_time_counter()); : RETURN_IF_ERROR(ExecNode::Prepare(state)); > I'm not sure my calculation is right here. My assumption is that estimated_ Done. We now piggyback num_backends info through TPlanFragmentInstanceCtx. http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h File be/src/runtime/sorter.h: http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@160 PS10, Line 160: > typo Done http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@211 PS10, Line 211: kely go > nit: specifies Done http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@217 PS10, Line 217: /// Get value of sort_run_bytes_limit query option. If user specifies value between > nit: cases Done -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 16 Jun 2020 15:14:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#12). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happen, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue add the next row batches, and so on. This algorithm try to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speedup the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceed sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fitted into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_sort.py 13 files changed, 211 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/12 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc@199 PS10, Line 199: const ::impala::Status& add_statu We usually simply use "Status status = ... ". Status only has a single pointer to an error message (which is nullptr if it is OK), so there's is no win in using a ptr/reference. http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h File be/src/runtime/sorter.h: http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@160 PS10, Line 160: sot_rub typo http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@211 PS10, Line 211: specify nit: specifies http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@217 PS10, Line 217: /// There are two case where it is necessary to run intermediate run. nit: cases -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 15 Jun 2020 09:54:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7 PS9, Line 7: IMPALA-6692 > This is one step towards solving the problem. The back pressure problem st Resolving this, and continue discussion in latest patch. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > Patch set 10 add flag to either enforce sort_run_bytes_limit or not. It wil Resolving this, and continue discussion in latest patch. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40 PS9, Line 40: intermediary sort. > I later did 256 MB limit and it achieved a little faster query time. Loweri Done http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc@76 PS11, Line 76: int64_t estimated_input_size = children_node->tnode_->estimated_stats.cardinality : * children_node->row_descriptor_->GetRowSize(); I'm not sure my calculation is right here. My assumption is that estimated_stats.cardinality here is per backend, but it seems like it is a query wide. In my experiment today, I did insert to tpcds_300_parquet.web_sales using 5 executor backends. Each SORT_NODE in each backend is expected to process 5.67GB of row batches, which is more beneficial to fit everything in memory. Setting buffer_pool_limit between 8GB to 28GB still cap and spill the first sort run, while setting it as 29GB or above can successfully waive sort_run_bytes_limit for the first sort run (allowing it to use full memory). Maybe I should divide by the number of executor backend involved here. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 13 Jun 2020 00:58:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6231/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 08 Jun 2020 02:27:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#11). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happen, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue add the next row batches, and so on. This algorithm try to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speedup the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceed sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fitted into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_sort.py 11 files changed, 141 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/11 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 11 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6225/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 21:35:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > The following estimates vs actual from a partitioned parquet insert of cata Patch set 10 add flag to either enforce sort_run_bytes_limit or not. It will not enforce sort_run_bytes_limit for the first run unless planner estimates hint that spill is inevitable. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47 PS9, Line 47: > This is the maximum time of a single AddBatchTime call. I decide to keep the counter name as it is, because there is already another counter for quicksort total time. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879 PS9, Line 879: int num_processed = 0; > Ahh, Sorter::enable_spilling_ is a bit misleading - that actually is whethe Done, it is now summarized by enforce_sort_run_bytes_limit_ flag. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880 PS9, Line 880: int cur_batch_index = 0; : while (cur_batch_index < batch->num_rows()) { : RETURN_IF_ERROR(AddBatchNoSpill(batch, cur_batch_index, &nu > Ack Done http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894 PS9, Line 894: } > Ack Move the timer and counter to sort-node.cc http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904 PS9, Line 904: RETURN_IF_ERROR( > Ack I decide to silently cap it to minimal 32 MB as your later suggestion. It feels more natural so that we can keep -1 and 0 to represent 'no limit' and not invalidate range between 1 byte to 32 MB. http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531 PS9, Line 531: intermediate > Ack Done http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533 PS9, Line 533: SORT_RUN_BYTES_LIMIT = 103 > Ack Done http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79 PS9, Line 79: options = [('-1', '2g'), ('100m', '2g'), ('400m', '400m'), ('120m', '400m')] > Ack Done -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 21:00:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#10). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happen, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue add the next row batches, and so on. This algorithm try to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speedup the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceed sort_run_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. To reduce regression for cases where total input size of sort node might be fully fitted into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that spill is inevitably will happen. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_sort.py 11 files changed, 141 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/10 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > > Since the patch can have immediate benefit in some well known cases like The following estimates vs actual from a partitioned parquet insert of catalog sales. Multiple runs with differing sort limits. All with 10 TB scale factor on 20 nodes with an explicit 32 GB memory limit set: Sort Limit.Rows. Est Rows. Peak Mem. Est Peak Mem None 14.40B.15.59B. 25.60 GB 2.43 GB 8GB 14.40B 15.59B 8.00 GB2.43 GB 2GB 14.40B 15.59B 2.00 GB2.43 GB 512MB. 14.40B 15.59B 678.65 MB2.43 GB 256MB 14.40B 15.59B 1.51 GB2.43 GB So cardinality estimate is pretty good in this case, estimated peak mem is very low for the reason you gave (sort estimate doesn't assume the full input will be kept in memory). -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 19:45:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > Since the patch can have immediate benefit in some well known cases like > partitioned inserts, we might consider putting it in with the query option > but without additional safeguards against regressions (but documenting the > cases where it's safe to use) and then explore some of these other approaches > in a follow up phase. I'm fine with this, I just want to make sure we're set up with some freedom to change the behaviour in future. As far as estimates go, there's a lot of room for improvement. The approach to estimates for SortNode is an outlier, since it doesn't actually estimate the ideal memory to keep it in-memory, unlike everything else. I'd be a little concerned that the estimate of the memory required to keep the full sort input in memory could be too high, since it depends on getting the cardinality and row size right for the full output of the plan tree. Agree that the estimates would likely be very accurate for plain ETL cases where the source table has stats computed. We could probably finesse the estimate logic a bit, e.g. use the full data size estimate up to a few GB, then switch to the two-phase estimate. David, do you have some examples of numbers for the estimates vs actual that you mentioned? -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 16:46:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > Estimates could work well on very simple queries, but even a query like a s I agree that we shouldn't use estimates in cases where they're likely to be unreliable. Maybe we should separate the discussion into what we think is needed if we want to make this the default behavior vs what's good enough if it's enabled only with a non-default query option. The potential for regression is obviously more of a concern if we're going to enable by default. A few other thoughts: * If we're looking for heuristics at planning or runtime to indicate whether this is likely to help we should also consider whether the sort has partitioned inputs. It's not clear whether this approach helps if the input to the sort isn't partitioned. * Regarding Tim's suggestion for making the sort more incremental but without spilling more aggressively - I agree this would be interesting to explore. It's not obvious how much of the benefit we're seeing in benchmarks of the current patch is caused by the earlier spilling vs just reducing burstiness through more incremental runs. * Since the patch can have immediate benefit in some well known cases like partitioned inserts, we might consider putting it in with the query option but without additional safeguards against regressions (but documenting the cases where it's safe to use) and then explore some of these other approaches in a follow up phase. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 16:05:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > From a quick look at the planner estimate code it seems we're estimating on Estimates could work well on very simple queries, but even a query like a scanning a huge table + a some predicate can be tricky - should we pessimize by always spilling, even if predicate might turn out to be very selective? The sort node is generally at the "end" of the query, so in complex queries the estimates can be easily orders of magnitude off at that point. + stats can be missing/not up to date For these reasons I think that a simple adaptive behavior or a more sophisticated solution mentioned by Tim (doing quicksort in smaller runs from the start but postpone spilling if possible) would be much more reliable. Using estimates in specific cases (e.g. full table scan without predicates) could be a further optimization. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 09:21:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879 PS9, Line 879: if (cur_batch_index < batch->num_rows() > I think it is implied by line 872? Ahh, Sorter::enable_spilling_ is a bit misleading - that actually is whether it's in the partial sort mode or not. You have Sorter::enable_spilling_ = true for full sorts, even if spilling is disabled at the query level. Just as an example of what I mean, if you run a sort query that requires 500mb with, say, scratch_limit=0, then currently that will succeed so long as it can get the memory. But I think if you set sort_bytes_limit=100mb, then it would try to spill and fail, even thought there's memory available. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 05 Jun 2020 00:38:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > I'm not necessarily opposed to this approach (enforcing limit only after sp >From a quick look at the planner estimate code it seems we're estimating only >what needs to be kept in memory assuming an external 2-phase sort (so assuming >we'll spill). So maybe we should just look at the full input size and enforce >the sort_bytes_limit from the outset if the full input size is > the memory >limit. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 04 Jun 2020 22:50:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (4 comments) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40 PS9, Line 40: is varied between unspecified (not limited) vs 512 MB. The > Did you do any experiments with lower values? Wondering if that helps furth I later did 256 MB limit and it achieved a little faster query time. Lowering to 128 MB make the query time worse. In David's large scale experiment though, there is not much difference between 256 MB vs 512 MB. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879 PS9, Line 879: if (cur_batch_index < batch->num_rows() > I have some concern with how this will interact with spilling being disable I think it is implied by line 872? DCHECK(enable_spilling_); http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531 PS9, Line 531: intermediate > intermediate sort isn't that clear - I guess "intermediate sort runs" Ack http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533 PS9, Line 533: SORT_BYTES_LIMIT = 103 > I think we might actually want to make this SORT_RUN_BYTES_LIMIT or somethi Ack -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 04 Jun 2020 23:00:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (4 comments) I'm thinking through the high level of this. I think the biggest obstacle to using this (or turning on by default) is that it can make queries spill that didn't spill before, or spill more than before, and that can cause issues. I guess if the sort is already spilling the different is smaller. It might be more of a clear win if it achieved the more incremental sorting behaviour without the extra spilling, e.g. by having sorted in-memory runs (as opposed to sorted spilled runs and unsorted in-memory runs). I don't know that I necessarily want to expand the scope of this commit, but would be interested on your thoughts on that. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40 PS9, Line 40: is varied between unspecified (not limited) vs 512 MB. The Did you do any experiments with lower values? Wondering if that helps further. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879 PS9, Line 879: if (cur_batch_index < batch->num_rows() I have some concern with how this will interact with spilling being disabled (scratch_limit=0 or disable_unsafe_spills=true, with some missing stats). In those cases StartSpilling() returns an error. Other operators only spill when the alternative is failing the query with OOM, so it's fine to just fail the query. But here we could hit that even if there's plenty of memory. I think it would make sense to also check if spilling is enabled. http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531 PS9, Line 531: intermediate intermediate sort isn't that clear - I guess "intermediate sort runs" http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533 PS9, Line 533: SORT_BYTES_LIMIT = 103 I think we might actually want to make this SORT_RUN_BYTES_LIMIT or something along those lines. This name kinda boxes us into it being an upper bound on the total memory used by the sort node, whereas we might want to use more memory to optimise this further. One way I can see this evolving is that, if we had some free memory to play with when hitting this limit, we could defer spilling and keep some of the runs in memory. Maybe even incrementally do the quicksort work on previous runs in AddBatch() to reduce the blocking time further. If we did that, SORT_BYTES_LIMIT would be pretty misleading. Another case I thought about was the interaction with spilling being disabled - i left a comment about that in sorter.cc. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 04 Jun 2020 22:37:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > Great idea! I will try to implement it that way. I'm not necessarily opposed to this approach (enforcing limit only after spilling starts) but if we had confidence in the memory estimate it seems like we could enforce the limit from the start if the estimate is > the memory limit (we're very likely to spill). Unfortunately in some of the queries I'm looking at our estimates are lower than the actual peak consumed (with no limit) by an order of magnitude even though the queries end up spilling heavily. So we'll have to look into why those estimates are so bad, but for now maybe we should file a follow up JIRA to go back and make the application of the limit consider the estimate once we've improved the estimates (and consider a TODO in the code). -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 04 Jun 2020 22:25:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7 PS9, Line 7: IMPALA-6692 > Should this solve the issue in general, or this is just one step in that di This is one step towards solving the problem. The back pressure problem still exist in some degree, but should be decreased using this method. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > I was thinking about ways to minimize the regression this can cause in quer Great idea! I will try to implement it that way. I think there are two points that I need to pay attention in regards to this idea. First, the size of the first sort run, where sort_bytes_limit is not enforced yet, can be couple times bigger than the rest of other runs. I'm not sure how the merge phase will behave in that case. I suppose it may be work just fine, since in many case the very last run will have different size (smaller) as well. Second, when sort_bytes_limit is going to be enforced, I need to make sure to lower memory reservation for that SORT NODE to at least sort_bytes_limit. The reservation reduction might need to be done gradually as the sorted run pages are unpinned. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47 PS9, Line 47: AddBatchTime > Does this mean the maximum time of a single AddBatchTime call, or the total This is the maximum time of a single AddBatchTime call. AddBatchTime is implemented as SummaryStatsCounter showing min, max, average, and sample count. In regards to your proposed idea earlier, I probably need to rethink about this counter as well. The purpose of this counter was to measure the effect of selected sort_bytes_limit value towards the time a SORT NODE blocked doing quicksort. If we not enforcing sort_bytes_limit in the first run, the Max AddBatchTime will then always associated with the first quicksort. Maybe I need to refocus this counter to just measure the quicksort time. Rename 'AddBatchTime' to 'QuickSortTime'. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880 PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0 :&& buffer_pool_client_->GetUsedReservation() :>= state_->query_options().sort_bytes_limit) > nit: indentation looks a bit messy, but I am not sure what is the rule here Ack http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894 PS9, Line 894: timer.Stop(); > I don't know if it matters, but this means that the counter won't be increa Ack Will rethink a better way to wrap the timer. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904 PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", &sort_bytes_limit)); > I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem Ack http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79 PS9, Line 79: query, exec_option, table_format=table_format) > nit: +2 indentation Ack -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 04 Jun 2020 16:13:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: (7 comments) The change looks good to me in general, I only have some nits and improvement suggestions. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7 PS9, Line 7: IMPALA-6692 Should this solve the issue in general, or this is just one step in that direction? The 3-6 second AddBatchTime still seems enough to fill the buffers and cause back pressure. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, I was thinking about ways to minimize the regression this can cause in queries that spill more because of the sort_bytes_limit, e.g. where the memory need of the sort is between sort_bytes_limit and mem limit. A compromise could be to start using sort_bytes_limit only after the sorter had to spill at least once. This would mean no spilling if it is not needed at all, and the "big sort + spilling" could only occur once for an instance, so the penalty would not scale with the size of the data set. This may potentially allow us to set a safe default for sort_bytes_limit that rarely leads to regression but still helps in huge queries. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47 PS9, Line 47: AddBatchTime Does this mean the maximum time of a single AddBatchTime call, or the total time of fragment instance that spent to most time in the function? http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880 PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0 :&& buffer_pool_client_->GetUsedReservation() :>= state_->query_options().sort_bytes_limit) nit: indentation looks a bit messy, but I am not sure what is the rule here Moving the check to a function like bool ReachedSortBytesLimit() could help. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894 PS9, Line 894: timer.Stop(); I don't know if it matters, but this means that the counter won't be increased if the function returns with an error, while it is possible that a huge amount of time was spent here. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904 PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", &sort_bytes_limit)); I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem with small values is that it would greatly increase the number of runs, which can lead to doing the merge phase of the sort in multiple levels, as all runs during a merge needs a buffer, see Sorter::MaxRunsInNextMerge(). It seems also ok to enforce this in Sorter without notifying the user. http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79 PS9, Line 79: query, exec_option, table_format=table_format) nit: +2 indentation -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 04 Jun 2020 14:31:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6209/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 03 Jun 2020 23:43:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6206/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 03 Jun 2020 23:11:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#9). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. Sorter node works by adding row batches to a sort run. After all batches added to current unsorted run or memory limit is hit, sorter will immediately start the run. If the latter case happen, sorter will spill the sorted run to disk after sort complete, create new unsorted run object, and continue add the next row batches, and so on. This algorithm try to fit as much rows into memory before start sorting. However, in the case of partitioned sort with large number of row batches, fitting too much rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from exchange node. One slow sorter node can block exchange node from sending row batches to other sorter node that is free. This patch speedup the decision to start the sort without waiting it to hit memory limit first by capping the intermediary quicksort run to lower memory limit, determined by query option 'sort_bytes_limit'. If the total used reservation of quicksort has exceed sort_bytes_limit, current unsorted_run_ will be wrapped up, sorted, and then spilled. Thus, overlapping the next sort run with spill from previous sort run. We also add new summary counter 'AddBatchTime' to get summary of how much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no sort_bytes_limit | 512 MB sort_bytes_limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M tests/query_test/test_sort.py 8 files changed, 49 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/9 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto