Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )
Change subject: IMPALA-5842: Write page index in Parquet files ...................................................................... Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@300 PS2, Line 300: std::vector<bool> null_pages_; > Tim: Yeah, maybe we should combine the code into a shared utility function, since I think the core logic is the same. IMO the sensible thing is to leave the max unset in that case. Hopefully we can amend the spec to allow that. I think technically now our choices are to drop the entire column index or write out a huge value. Dropping the index wouldn't be a total disaster, since this should be a rare edge case, but it is kind of crappy. http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@1195 PS2, Line 1195: // If the file contains more than one row_group, don't write the index. I missed this earlier - this seems weird. Do we ever write more than one row group? Do we test it? http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@1224 PS2, Line 1224: n > I just realized that the min/max values are only strings at this point, at I think I like the idea of determining asc/desc incrementally instead of doing an extra pass. Doing it in Merge() could make sense, but I think we would have to clearly document what that means, since Merge() would no longer be a commutative operation. -- To view, visit http://gerrit.cloudera.org:8080/9693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9 Gerrit-Change-Number: 9693 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 20 Mar 2018 22:21:36 +0000 Gerrit-HasComments: Yes
