Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 )
Change subject: IMPALA-2758: Change BufferedTupleStream::GetRows to returning multi batches ...................................................................... Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8226/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8226/1//COMMIT_MSG@10 PS1, Line 10: which returns multiple batches instead of one. > The callers iterates through the build rows for every row in null_probe_row I'm ok with the stream staying pinned, the main goal was to remove the GetRows() API, which uses RowBatch in a weird way. I think keeping the stream unpinned to handle many nulls on the build side is just a special case of https://issues.apache.org/jira/browse/IMPALA-4857. I think that is harder to implement well - we'd want to consider doing some kind of blocked algorithm to reduce I/O. http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h File be/src/runtime/buffered-tuple-stream.h: http://gerrit.cloudera.org:8080/#/c/8226/1/be/src/runtime/buffered-tuple-stream.h@332 PS1, Line 332: Status GetNext( My intent was to remove the GetRows() API altogether and switch callers to calling GetNext() directly. Creating many row batches is somewhat better, but there's still a lot of unnecessary memory overhead with all of the tuple_ptrs_. -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 10 Oct 2017 20:34:32 +0000 Gerrit-HasComments: Yes
