Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 9:

(7 comments)

The change looks good to me in general, I only have some nits and improvement 
suggestions.

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
Should this solve the issue in general, or this is just one step in that 
direction? The 3-6 second AddBatchTime still seems enough to fill the buffers 
and cause back pressure.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
            : to hit memory limit first by capping the intermediary quicksort 
run to
            : lower memory limit,
I was thinking about ways to minimize the regression this can cause in queries 
that spill more because of the sort_bytes_limit, e.g. where the memory need of 
the sort is between sort_bytes_limit and mem limit.

A compromise could be to start using sort_bytes_limit only after the sorter had 
to spill at least once. This would mean no spilling if it is not needed at all, 
and the "big sort + spilling" could only occur once for an instance, so the 
penalty would not scale with the size of the data set.

This may potentially allow us to set a safe default for sort_bytes_limit that 
rarely leads to regression but still helps in huge queries.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47: AddBatchTime
Does this mean the maximum time of a single AddBatchTime call, or the total 
time of fragment instance that spent to most time in the function?


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880:         || (state_->query_options().sort_bytes_limit > 0
             :                && buffer_pool_client_->GetUsedReservation()
             :                    >= state_->query_options().sort_bytes_limit)
nit: indentation looks a bit messy, but I am not sure what is the rule here

Moving the check to a function like bool ReachedSortBytesLimit() could help.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894:   timer.Stop();
I don't know if it matters, but this means that the counter won't be increased 
if the function returns with an error, while it is possible that a huge amount 
of time was spent here.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904:         RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", 
&sort_bytes_limit));
I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem 
with small values is that it would greatly increase the number of runs, which 
can lead to doing the merge phase of the sort in multiple levels, as all runs 
during a merge needs a buffer, see Sorter::MaxRunsInNextMerge(). It seems also 
ok to enforce this in Sorter without notifying the user.


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79:         query, exec_option, table_format=table_format)
nit: +2 indentation



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 14:31:00 +0000
Gerrit-HasComments: Yes

Reply via email to