Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/6/be/src/exec/parquet-column-readers.cc@1071
PS6, Line 1071:       if 
(PageContainsTupleData(current_page_header_.data_page_header.encoding)) {
> (if we pushed predicate evaluation down into the column readers, that might
Playing devil's advocate and really trying to understand the crux of why this 
new approach is simpler.

You are right that the stream can free the previous I/O buffer immediately, but 
you are still accumulating the same amount of memory in the data_pool_, so the 
peak memory consumption seems no different than before.

Overall you might even need one more page than before. We copy and accumulate 
data pages here, but only free the data page from the stream in the next call 
to ReadDataPage(), so for some period of time one page exists both in the 
stream and in the data_pool_.

I'm not sure I follow why the the "synchronized release of I/O buffers" is 
difficult in the copy-in-transfer approach. The scratch batch already has all 
interesting I/O buffers attached, so we can release those after copying the 
var-len data of survivors. Returned batches will have no I/O buffers attached.

Completely agree about the perf tradeoffs you mentioned.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 29 Sep 2017 00:35:38 +0000
Gerrit-HasComments: Yes

Reply via email to