Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8085 )
Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071
PS6, Line 1071: if (PageContainsTupleData(current_page_header_.data_page_header.encoding)) {
> Playing devil's advocate and really trying to understand the crux of why th

The basic problem is that I want to be able to reason about how many I/O buffers are needed to make progress in reading through a Parquet file. If each column can make progress independently, this is simple: you can reason about each column in isolation, and with the copying out you only need one buffer per column to make progress (see the first sketch appended below).

If you try to hold onto I/O buffers, you end up with dependencies between column readers. E.g. reader A may have materialised 512 values and hit the end of a buffer. Reader A can't free the previous buffer until those rows have been returned, and those rows can't be returned until all of the column values have been materialised for those rows. But maybe column B can only materialise 511 values before hitting the end of its buffer. So then you'd need to output 511 rows, then advance column B, then output 1 row before you can advance column A.

You can get out of that in two ways:

* Advance the different column readers by different amounts, materialising as many final rows as you can each round. I think you would need to try to advance all column readers, return however many rows you can at that point, then try to inch them forward again (see the second sketch appended below).

* Let each column reader advance to the next buffer before freeing the previous one, and hope that all column readers can materialise the requested number of values without running out of buffers. We'd need to make some assumptions to ensure that we can always materialise batch_size values given M I/O buffers. I don't think we can make any universally valid assumptions, so we'd probably have to add code to detect when those assumptions are invalidated, plus query options to get out of any resulting problems.

It doesn't seem worth spending complexity on either solution, especially since we'd need extra test coverage.

Data pages are much smaller (around 64KB) than the 8MB I/O buffers and aren't currently part of the reservation, so they're less of a problem for now - the memory overhead per column will be around max(2 * 64KB, 1024 * <avg value size>), which is tolerable (a worked example is appended below).

--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 29 Sep 2017 01:41:25 +0000
Gerrit-HasComments: Yes
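
First sketch: to make the copy-out reasoning concrete, here is a minimal C++ illustration. This is my own sketch, not Impala's actual code - IoBuffer, ColumnPageReader and CopyOutPage are invented names - but the point is the one argued above: once the page bytes are copied into memory the column owns, the I/O buffer can be returned immediately, so no column reader ever pins a buffer while waiting for other columns' rows to be returned.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Stand-in for one of the 8MB buffers handed out by the I/O manager.
    struct IoBuffer {
      const uint8_t* data;
      int64_t len;
      void Return() { /* real system: hand the buffer back to the I/O manager */ }
    };

    class ColumnPageReader {
     public:
      // Copy the current ~64KB data page out of 'io_buf', then release it.
      // All decoding reads from page_copy_, so this reader never pins an
      // I/O buffer while waiting for other columns' rows to be returned.
      void CopyOutPage(IoBuffer* io_buf, int64_t page_offset, int64_t page_len) {
        page_copy_.resize(page_len);
        std::memcpy(page_copy_.data(), io_buf->data + page_offset, page_len);
        io_buf->Return();  // At most one I/O buffer per column is ever live.
      }

     private:
      std::vector<uint8_t> page_copy_;  // Page bytes owned by this column.
    };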
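
Second sketch: a hypothetical version of the "inch forward" driver loop from the first bullet, again with invented names (Reader, MaterialiseUpTo, Consume, ProduceRows don't exist in Impala). Each round, every column materialises as far as its current buffer allows, only the minimum available count is emitted, and a reader frees its buffer once all of that buffer's values have been consumed. Running it with per-buffer capacities of 512 and 511 shows the uneven min-progress steps (511, 1, 510, 2) - the bookkeeping I'm arguing isn't worth the complexity.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    class Reader {
     public:
      explicit Reader(int values_per_buffer) : per_buf_(values_per_buffer) {}

      // Materialise up to 'n' total values, stopping early at the end of the
      // current buffer. Returns how many values are available to emit.
      int MaterialiseUpTo(int n) {
        int can_do = std::min(std::max(n - available_, 0), per_buf_ - pos_in_buf_);
        pos_in_buf_ += can_do;
        available_ += can_do;
        return available_;
      }

      // Emit 'n' values. Once every value from the current buffer has been
      // emitted, the buffer can be freed (modelled by resetting pos_in_buf_).
      void Consume(int n) {
        available_ -= n;
        if (pos_in_buf_ == per_buf_ && available_ == 0) pos_in_buf_ = 0;
      }

     private:
      int per_buf_;         // Values per buffer, e.g. 512 for A vs 511 for B.
      int pos_in_buf_ = 0;  // Values materialised from the current buffer.
      int available_ = 0;   // Materialised but not yet emitted.
    };

    int ProduceRows(std::vector<Reader>& readers, int batch_size) {
      int rows_output = 0;
      while (rows_output < batch_size) {
        int ready = batch_size - rows_output;
        for (Reader& r : readers) {
          ready = std::min(ready, r.MaterialiseUpTo(batch_size - rows_output));
        }
        if (ready == 0) break;                       // No reader can advance.
        for (Reader& r : readers) r.Consume(ready);  // Emit min-progress rows.
        rows_output += ready;                        // 511, then 1, then ...
      }
      return rows_output;
    }

    int main() {
      std::vector<Reader> readers = {Reader(512), Reader(511)};  // Columns A, B.
      printf("rows: %d\n", ProduceRows(readers, 1024));  // Prints "rows: 1024".
    }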
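
Worked example for the overhead bound, plugging my own illustrative numbers into max(2 * 64KB, 1024 * <avg value size>): with 8-byte values the per-column overhead is max(128KB, 8KB) = 128KB, and even with 1KB average string values it's only max(128KB, 1MB) = 1MB per column.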
