Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21777 )

Change subject: IMPALA-13334: Fix test_sort.py DCHECK hit when 
max_sort_run_size>0
......................................................................


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/21777/3/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/21777/3/be/src/service/query-options-test.cc@274
PS3, Line 274:
             :   for (const auto& test_case : case_set) {
> Ok, I can accept Daniels argument.
I think this test can stay. I notice this is gone in ps6 in favor of set.test.
I also suggest moving this to under SetSpecialOptions test to add 
max_sort_run_size=1 testcase.


http://gerrit.cloudera.org:8080/#/c/21777/6/testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test
File testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test:

http://gerrit.cloudera.org:8080/#/c/21777/6/testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test@9
PS6, Line 9: 2m
"2MB" to differentiate against 2 millions.


http://gerrit.cloudera.org:8080/#/c/21777/5/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/21777/5/tests/query_test/test_sort.py@168
PS5, Line 168:     if exec_option['max_sort_run_size'] == 2:
             :       exec_option['mem_limit'] = "144m"
> AFAIK if memory estimation cannot be calculated, then every query passes ad
Got it, thank you for the explanation.


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@115
PS6, Line 115:     options = [('2g', '100m', '0'), ('400m', '130m', '5')]
I'd suggest to rewrite into this for clarity:

options = [('2g', '100m', '0'), ('400m', '130m', '4')]
if (exec_option['max_sort_run_size'] == 0):
  # max_sort_run_size > 0 will spill more in Case 2.
  options = [('2g', '100m', '0'), ('400m', '130m', '5')]

or

# max_sort_run_size > 0 will spill more in Case 2.
options = [('2g', '100m', '0'),
           ('400m', '130m', ('5' if exec_option['max_sort_run_size'] > 0 else 
'4'))]


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@167
PS6, Line 167: # With 2 pages per run the varlen data is more fragmented and 
requires a higher limit.
# With max_sort_run_size=2 (2 pages per run) the varlen data is more fragmented 
and requires a higher limit to maintain "TotalMergesPerformed: 1" assertion.


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@243
PS6, Line 243: The larger the run size, the sooner the sorter can give up 
memory to the next node.
Move this comment just after L245. And maybe reword to this:

"If max_sort_run_size > 0, the larger the run size, the sooner the sorter can 
give up memory to the next node."


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@246
PS6, Line 246: buffer_limit = '27m'
Add comment:

# Increase buffer_limit to maintain such that query never spill.


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@250
PS6, Line 250: buffer_limit
"buffer_pool_limit", for consistency.


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@250
PS6, Line 250: $LIMIT
"$BUFFER_POOL_LIMIT", for consistency.


http://gerrit.cloudera.org:8080/#/c/21777/6/tests/query_test/test_sort.py@402
PS6, Line 402: # With 2 pages per run the var-len data is more fragmented, 
therefore it spills more.
# With max_sort_run_size=2 (2 pages per run) the var-len data is more 
fragmented, therefore it spills more.



--
To view, visit http://gerrit.cloudera.org:8080/21777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I943d8edcc87df168448a174d6c9c6b46fe960eae
Gerrit-Change-Number: 21777
Gerrit-PatchSet: 6
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 29 Oct 2024 15:10:53 +0000
Gerrit-HasComments: Yes

Reply via email to