Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9943 )
Change subject: IMPALA-5706: Parallelise read I/O in sorter ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/9943/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9943/3//COMMIT_MSG@14 PS3, Line 14: - Remove the hard-coded maximum limit of buffers that can be used : for merging the sorted runs. Instead this number is calculated : based on the available memeory through buffer pool. Does this also apply to the final merge? If yes, does this mean that the parent node (which calls Open/GetNext on the sort node) will have less memory to do its job than before this change? My concern is that if the sort node reserves all the memory it can during Open, and the parent node wants to reserve memory after it calls its child node's Open, then it will not have enough memory left. http://gerrit.cloudera.org:8080/#/c/9943/3//COMMIT_MSG@26 PS3, Line 26: - In case doing a sort on top of a join when working with restricted : amount of memory then additional memory is successfully allocated : for merging than was available during the initial sort runs. I found this sentence hard to understand - can you rephrase it in a more detailed way? As I understand it, when the initial sort runs are finished, then the child node (join in this case) can be closed, so its released memory can be used by the merge runs. It would be nice to write a small time line of the reservations during the life of a sort node, e.g. Open: 1. child is opened, it can reserve memory 2. sort node reserves memory for initial sort runs 3. initial runs 4. child is closed, its reservation is released 5. sort node can try to increase its reservation 6. intermediate merges GetNext: 7. final merge Close: 8. reservation released http://gerrit.cloudera.org:8080/#/c/9943/3//COMMIT_MSG@32 PS3, Line 32: this change slighlty decrease the execution time for sorting. type: missing s http://gerrit.cloudera.org:8080/#/c/9943/3//COMMIT_MSG@32 PS3, Line 32: slighlty typo: slightly http://gerrit.cloudera.org:8080/#/c/9943/3/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/9943/3/be/src/runtime/sorter.cc@1037 PS3, Line 1037: DCHECK(next_page->is_pinned()); I think that the original DCHECK is still useful, but not for the next page, but for the page after that if there is one. http://gerrit.cloudera.org:8080/#/c/9943/3/be/src/runtime/sorter.cc@1039 PS3, Line 1039: 'next_page' is already pinn I would add to this comment that all pages must be pinned. -- To view, visit http://gerrit.cloudera.org:8080/9943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9 Gerrit-Change-Number: 9943 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 20 Apr 2018 16:00:23 +0000 Gerrit-HasComments: Yes
