Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 )
Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. ...................................................................... Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7 PS9, Line 7: IMPALA-6692 > Should this solve the issue in general, or this is just one step in that di This is one step towards solving the problem. The back pressure problem still exist in some degree, but should be decreased using this method. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23 PS9, Line 23: This patch speedup the decision to start the sort without waiting it : to hit memory limit first by capping the intermediary quicksort run to : lower memory limit, > I was thinking about ways to minimize the regression this can cause in quer Great idea! I will try to implement it that way. I think there are two points that I need to pay attention in regards to this idea. First, the size of the first sort run, where sort_bytes_limit is not enforced yet, can be couple times bigger than the rest of other runs. I'm not sure how the merge phase will behave in that case. I suppose it may be work just fine, since in many case the very last run will have different size (smaller) as well. Second, when sort_bytes_limit is going to be enforced, I need to make sure to lower memory reservation for that SORT NODE to at least sort_bytes_limit. The reservation reduction might need to be done gradually as the sorted run pages are unpinned. http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47 PS9, Line 47: AddBatchTime > Does this mean the maximum time of a single AddBatchTime call, or the total This is the maximum time of a single AddBatchTime call. AddBatchTime is implemented as SummaryStatsCounter showing min, max, average, and sample count. In regards to your proposed idea earlier, I probably need to rethink about this counter as well. The purpose of this counter was to measure the effect of selected sort_bytes_limit value towards the time a SORT NODE blocked doing quicksort. If we not enforcing sort_bytes_limit in the first run, the Max AddBatchTime will then always associated with the first quicksort. Maybe I need to refocus this counter to just measure the quicksort time. Rename 'AddBatchTime' to 'QuickSortTime'. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880 PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0 : && buffer_pool_client_->GetUsedReservation() : >= state_->query_options().sort_bytes_limit) > nit: indentation looks a bit messy, but I am not sure what is the rule here Ack http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894 PS9, Line 894: timer.Stop(); > I don't know if it matters, but this means that the counter won't be increa Ack Will rethink a better way to wrap the timer. http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904 PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", &sort_bytes_limit)); > I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem Ack http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79 PS9, Line 79: query, exec_option, table_format=table_format) > nit: +2 indentation Ack -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 04 Jun 2020 16:13:37 +0000 Gerrit-HasComments: Yes
