Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9943 )

Change subject: IMPALA-5706: Parallelise read I/O in sorter
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1730
PS7, Line 1730:     return (sorted_runs_.size() + 1) / 2;
If the goal is to minimize the number of "extra merges" per row, then it is 
optimal for the final merge to always merge as many runs as possible, so this 
line could return
sorted_runs_.size() - max_runs_per_intermediate_merge

An example where this would result in fewer merges:
max_runs_per_intermediate_merge: 3
number of runs: 7
(the numbers are the number of original runs merged into a run)

1 1 1 1 1 1 1

1 1 1 1 3 - current logic decides to merge (5+1)/2=3 runs

1 3 3 - ready for final merge after merging 6 runs

1 1 1 1 1 1 1

1 1 1 1 3 - new logic would decide to merge 5-3=2 runs

1 1 3 2 - ready for final merge after merging 5 runs

On the other hand, merging more runs in the final merge means that the buffers 
will be released later, so I am not completely sure that maximizing it is a 
good idea.
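
To make the comparison easier to check, here is a minimal sketch that counts 
how many runs go through intermediate merges under each rule. It is not the 
sorter.cc code: MergePolicy and RunsMergedBeforeFinal are hypothetical names, 
and the model assumes that the final merge can take 
max_runs_per_intermediate_merge + 1 runs, which is what the 7-run example 
implies. It also counts runs rather than rows, so it treats every run as the 
same size.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical policy names; neither is an identifier from sorter.cc.
enum class MergePolicy {
  kHalf,           // (sorted_runs_.size() + 1) / 2, as in PS7
  kSaturateFinal,  // sorted_runs_.size() - max_runs_per_intermediate_merge
};

// Returns the number of runs fed into intermediate merges before the
// remaining runs fit into the final merge.
// Assumption: the final merge can take max_runs_per_intermediate_merge + 1
// runs. This is inferred from the example, not taken from the sorter.
int RunsMergedBeforeFinal(int num_runs, int max_runs_per_intermediate_merge,
                          MergePolicy policy) {
  assert(max_runs_per_intermediate_merge >= 2);
  const int final_merge_capacity = max_runs_per_intermediate_merge + 1;
  int merged = 0;
  while (num_runs > final_merge_capacity) {
    const int wanted = policy == MergePolicy::kHalf
        ? (num_runs + 1) / 2
        : num_runs - max_runs_per_intermediate_merge;
    // An intermediate merge never takes more than the configured maximum.
    const int k = std::min(wanted, max_runs_per_intermediate_merge);
    merged += k;
    num_runs -= k - 1;  // k input runs are replaced by one output run
  }
  return merged;
}
```

With 7 runs and a maximum of 3, RunsMergedBeforeFinal returns 6 for kHalf and 
5 for kSaturateFinal, matching the two traces above.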



--
To view, visit http://gerrit.cloudera.org:8080/9943

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 07 May 2018 15:59:02 +0000
Gerrit-HasComments: Yes
