Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9943 )
Change subject: IMPALA-5706: Parallelise read I/O in sorter
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1730
PS7, Line 1730:   return (sorted_runs_.size() + 1) / 2;
If the goal is to minimize the number of "extra merges" per row, then it is
optimal for the final merge to always merge as many runs as possible, so this
line could return

  sorted_runs_.size() - max_runs_per_intermediate_merge

An example where this would result in fewer merges:

max_runs_per_intermediate_merge: 3
number of runs: 7
(the numbers are the number of original runs merged into a run)

1 1 1 1 1 1 1
1 1 1 1 3 - current logic decides to merge (5+1)/2=3 runs
1 3 3 - ready for final merge after merging 6 runs

1 1 1 1 1 1 1
1 1 1 1 3 - new logic would decide to merge 5-3=2 runs
1 1 3 2 - ready for final merge after merging 5 runs

On the other hand, merging more runs in the final merge means that the buffers
will be released later, so I am not completely sure that maximizing it is a
good idea.


-- 
To view, visit http://gerrit.cloudera.org:8080/9943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Gerrit-Change-Number: 9943
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 07 May 2018 15:59:02 +0000
Gerrit-HasComments: Yes
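
The 7-run example in the comment can be checked with a small simulation. This is only a sketch and not Impala's Sorter: `MergeOutcome`, `HalfOfRuns`, `LeaveMaxForFinal`, `Simulate` and the `max_final` parameter are hypothetical names. It assumes that runs are taken from the front of the queue, that the merged run is appended at the back, that the policy's result is capped at max_runs_per_intermediate_merge, and that the final merge can take 4 runs (inferred from the "1 1 3 2" row being ready for the final merge).

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical model. Each entry is the number of original runs contained in
// that run, matching the notation used in the review comment.
struct MergeOutcome {
  std::deque<int> runs;      // runs left over for the final merge
  int original_runs_merged;  // original runs passed through intermediate merges
};

// Policy at PS7 line 1730: merge about half of the runs.
int HalfOfRuns(int num_runs, int /*max_intermediate*/) {
  return (num_runs + 1) / 2;
}

// Suggested policy: leave as many runs as possible for the final merge.
int LeaveMaxForFinal(int num_runs, int max_intermediate) {
  return num_runs - max_intermediate;
}

// Runs intermediate merges until at most max_final runs remain.
// Assumption: the policy result is clamped to [2, max_intermediate].
MergeOutcome Simulate(int num_runs, int max_intermediate, int max_final,
                      int (*policy)(int, int)) {
  assert(max_intermediate >= 2 && max_final >= 1);
  MergeOutcome out{std::deque<int>(num_runs, 1), 0};
  while (static_cast<int>(out.runs.size()) > max_final) {
    const int n = static_cast<int>(out.runs.size());
    int k = std::min(std::max(policy(n, max_intermediate), 2), max_intermediate);
    k = std::min(k, n);
    int merged_size = 0;
    for (int i = 0; i < k; ++i) {  // take k runs from the front
      merged_size += out.runs.front();
      out.runs.pop_front();
    }
    out.runs.push_back(merged_size);  // merged run goes to the back
    out.original_runs_merged += merged_size;
  }
  return out;
}
```

Under these assumptions, `Simulate(7, 3, 4, HalfOfRuns)` ends with runs 1 3 3 after merging 6 original runs, and `Simulate(7, 3, 4, LeaveMaxForFinal)` ends with runs 1 1 3 2 after merging 5. The sketch does not model buffer lifetime, which is the trade-off raised at the end of the comment.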
