Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 )
Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. ...................................................................... Patch Set 12: (13 comments) Sorry for the many grammar comments, I was also the victim of this in the past :) My only real concern is about the case when the cardinality is unknown. My preference would be to try to allow spilling in that case. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h File be/src/exec/sort-node.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77 PS12, Line 77: going nit: "will go"? http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc File be/src/exec/sort-node.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89 PS12, Line 89: cardinality What is the default value of this? Can it be -1 (unknown)? The result seems pretty wrong in that case. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90 PS12, Line 90: GetRowSize I think that this doesn't contain varlen data, so it can greatly underestimate the input size if there are strings. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92 PS12, Line 92: 2) I think that VLOG(3) is enough. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h File be/src/runtime/sorter.h: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: a "the" would be better http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101 PS12, Line 101: /// 'estimated_input_size' is a total rows in bytes that estimated to get added into nit: missing "are" http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: /// this sorter. This is used to decide if sorter need to proactively spilling for nit: needs http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102 PS12, Line 102: spilling nit: spill http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223 PS12, Line 223: run nit: "do an"? http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816 PS12, Line 816: VLOG(2) << Substitute( I think that VLOG(3) is enough here - this should happen if the cardinality estimation was wrong, which may make WARNING logical, but this seems unavoidable for many queries, so I wouldn't spam the warning log. http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907 PS12, Line 907: VLOG(2) << Substitute( Same as line 816. http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645 PS12, Line 645: backeds typo: backends http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74 PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except : when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of : a query is, the more sort runs are likely to be produced. : Case 1 : 1 run produced, because all rows fit within the maximum reservation. : sort_run_bytes_limit is not enforced. : Case 2 : 3 run produced, because the first run hit reservation limit, and the : next 2 runs are capped to 150m. : Case 3 : 4 run produced, because sort node estimate that spill is inevitable. : So all runs are capped to 130m, including the first one.""" Isn't there something in query_result.runtime_profile that could be used to check some of these statements? E.g. I think we can check that no spilling occurred for case 1, but it did occur for case 2 and 3 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 25 Jun 2020 15:00:47 +0000 Gerrit-HasComments: Yes
