Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 )
Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet ...................................................................... Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071 PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) { > (if we pushed predicate evaluation down into the column readers, that might Playing devil's advocate and really trying to understand the crux of why this new approach is simpler. You are right that the stream can free the previous I/O buffer immediately, but you are still accumulating the same amount of memory in the data_pool_, so the peak memory consumption seems no different than before. Overall you might even need one more page than before. We copy and accumulate data pages here, but only free the data page from the stream in the next call to ReadDataPage(), so for some period of time one page exists both in the stream and in the data_pool_. I'm not sure I follow why the the "synchronized release of I/O buffers" is difficult in the copy-in-transfer approach. The scratch batch already has all interesting I/O buffers attached, so we can release those after copying the var-len data of survivors. Returned batches will have no I/O buffers attached. Completely agree about the perf tradeoffs you mentioned. -- To view, visit http://gerrit.cloudera.org:8080/8085 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2 Gerrit-Change-Number: 8085 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 29 Sep 2017 00:35:38 +0000 Gerrit-HasComments: Yes
