Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 9:

(4 comments)

I'm thinking through the high level of this. I think the biggest obstacle to 
using this (or turning on by default) is that it can make queries spill that 
didn't spill before, or spill more than before, and that can cause issues.

I guess if the sort is already spilling the different is smaller.

It might be more of a clear win if it achieved the more incremental sorting 
behaviour without the extra spilling, e.g. by having sorted in-memory runs (as 
opposed to sorted spilled runs and unsorted in-memory runs). I don't know that 
I necessarily want to expand the scope of this commit, but would be interested 
on your thoughts on that.

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40:   is varied between unspecified (not limited) vs 512 MB. The
Did you do any experiments with lower values? Wondering if that helps further.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879:     if (cur_batch_index < batch->num_rows()
I have some concern with how this will interact with spilling being disabled 
(scratch_limit=0 or disable_unsafe_spills=true, with some missing stats). In 
those cases StartSpilling() returns an error.

Other operators only spill when the alternative is failing the query with OOM, 
so it's fine to just fail the query. But here we could hit that even if there's 
plenty of memory. I think it would make sense to also check if spilling is 
enabled.


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531
PS9, Line 531: intermediate
intermediate sort isn't that clear - I guess "intermediate sort runs"


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533
PS9, Line 533:   SORT_BYTES_LIMIT = 103
I think we might actually want to make this SORT_RUN_BYTES_LIMIT or something 
along those lines. This name kinda boxes us into it being an upper bound on the 
total memory used by the sort node, whereas we might want to use more memory to 
optimise this further.

One way I can see this evolving is that, if we had some free memory to play 
with when hitting this limit, we could defer spilling and keep some of the runs 
in memory. Maybe even incrementally do the quicksort work on previous runs in 
AddBatch() to reduce the blocking time further.

If we did that, SORT_BYTES_LIMIT would be pretty misleading.

Another case I thought about was the interaction with spilling being disabled - 
i left a comment about that in sorter.cc.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:37:04 +0000
Gerrit-HasComments: Yes

Reply via email to