Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
> Should this solve the issue in general, or this is just one step in that di
This is one step towards solving the problem.  The back pressure problem still 
exist in some degree, but should be decreased using this method.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
            : to hit memory limit first by capping the intermediary quicksort 
run to
            : lower memory limit,
> I was thinking about ways to minimize the regression this can cause in quer
Great idea! I will try to implement it that way.
I think there are two points that I need to pay attention in regards to this 
idea.

First, the size of the first sort run, where sort_bytes_limit is not enforced 
yet, can be couple times bigger than the rest of other runs. I'm not sure how 
the merge phase will behave in that case. I suppose it may be work just fine, 
since in many case the very last run will have different size (smaller) as well.

Second, when sort_bytes_limit is going to be enforced, I need to make sure to 
lower memory reservation for that SORT NODE to at least sort_bytes_limit. The 
reservation reduction might need to be done gradually as the sorted run pages 
are unpinned.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47: AddBatchTime
> Does this mean the maximum time of a single AddBatchTime call, or the total
This is the maximum time of a single AddBatchTime call.
AddBatchTime is implemented as SummaryStatsCounter showing min, max, average, 
and sample count.

In regards to your proposed idea earlier, I probably need to rethink about this 
counter as well.
The purpose of this counter was to measure the effect of selected 
sort_bytes_limit value towards the time a SORT NODE blocked doing quicksort.
If we not enforcing sort_bytes_limit in the first run, the Max AddBatchTime 
will then always associated with the first quicksort.

Maybe I need to refocus this counter to just measure the quicksort time. Rename 
'AddBatchTime' to 'QuickSortTime'.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880:         || (state_->query_options().sort_bytes_limit > 0
             :                && buffer_pool_client_->GetUsedReservation()
             :                    >= state_->query_options().sort_bytes_limit)
> nit: indentation looks a bit messy, but I am not sure what is the rule here
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894:   timer.Stop();
> I don't know if it matters, but this means that the counter won't be increa
Ack

Will rethink a better way to wrap the timer.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904:         RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", 
&sort_bytes_limit));
> I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79:         query, exec_option, table_format=table_format)
> nit: +2 indentation
Ack



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 04 Jun 2020 16:13:37 +0000
Gerrit-HasComments: Yes

Reply via email to