Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 12:

(13 comments)

Sorry for the many grammar comments, I was also the victim of this in the past 
:)

My only real concern is about the case when the cardinality is unknown. My 
preference would be to try to allow spilling in that case.

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77
PS12, Line 77: going
nit: "will go"?


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89: cardinality
What is the default value of this? Can it be -1 (unknown)? The result seems 
pretty wrong in that case.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
I think that this doesn't contain varlen data, so it can greatly underestimate 
the input size if there are strings.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92
PS12, Line 92: 2)
I think that VLOG(3) is enough.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101: a
"the" would be better


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101:   /// 'estimated_input_size' is a total rows in bytes that 
estimated to get added into
nit: missing "are"


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102:   /// this sorter. This is used to decide if sorter need to 
proactively spilling for
nit: needs


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102: spilling
nit: spill


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223
PS12, Line 223: run
nit: "do an"?


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816
PS12, Line 816:       VLOG(2) << Substitute(
I think that VLOG(3) is enough here - this should happen if the cardinality 
estimation was wrong, which may make WARNING logical, but this seems 
unavoidable for many queries, so I wouldn't spam the warning log.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907
PS12, Line 907:         VLOG(2) << Substitute(
Same as line 816.


http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645
PS12, Line 645: backeds
typo: backends


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74:     """The first sort run is given a privilege to ignore 
sort_run_bytes_limit, except
             :        when estimate hints that spill is inevitable. The lower 
sort_run_bytes_limit of
             :        a query is, the more sort runs are likely to be produced.
             :        Case 1 : 1 run produced, because all rows fit within the 
maximum reservation.
             :                 sort_run_bytes_limit is not enforced.
             :        Case 2 : 3 run produced, because the first run hit 
reservation limit, and the
             :                 next 2 runs are capped to 150m.
             :        Case 3 : 4 run produced, because sort node estimate that 
spill is inevitable.
             :                 So all runs are capped to 130m, including the 
first one."""
Isn't there something in query_result.runtime_profile that could be used to 
check some of these statements? E.g. I think we can check that no spilling 
occurred for case 1, but it did occur for case 2 and 3



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 15:00:47 +0000
Gerrit-HasComments: Yes

Reply via email to