Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )
Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@300
PS2, Line 300:   std::vector<bool> null_pages_;
> About the formula for prefix size: what would happen, if there is no common
The formula was only a conversation starter, but the situation is actually worse. At yesterday's Parquet sync meeting it turned out that having a different limit for each page is a bad idea. Here is why:

Let's assume we have the following pages:

Page 0:
  12345612
  12345613
  12345614
  min_value: 12345612
  max_value: 12345614

Page 1:
  12345615
  23456789
  min_value: 123
  max_value: 235

Note that the values are in ascending order, but Page 0's min_value is greater than Page 1's min_value, because Page 1's statistics are truncated to a shorter prefix than Page 0's.

I'll investigate the second part of your comment.

http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py@163
PS6, Line 163: :
> I was thinking about catching the exception in _validate_parquet_page_index
I added the column name. I might add the filename as well in a later commit.
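
To make the Page 0 / Page 1 example from the first comment concrete, here is a minimal sketch of prefix truncation. TruncateMin and TruncateMax are hypothetical helpers written for this illustration, not functions from the patch, and the per-page limits used in the comments (8 and 3 characters) are assumptions read off the example values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: a lower bound for `value` that keeps at most `limit`
// leading bytes. A prefix never compares greater than the full string.
std::string TruncateMin(const std::string& value, std::size_t limit) {
  return value.substr(0, limit);
}

// Hypothetical helper: an upper bound for `value` of at most `limit` bytes.
// The prefix is rounded up by incrementing its last byte, so the result
// compares greater than the original value.
std::string TruncateMax(const std::string& value, std::size_t limit) {
  assert(limit > 0);
  if (value.size() <= limit) return value;
  std::string result = value.substr(0, limit);
  // Trailing 0xFF bytes cannot be incremented; drop them and carry left.
  while (!result.empty() &&
         static_cast<unsigned char>(result.back()) == 0xFF) {
    result.pop_back();
  }
  if (result.empty()) return value;  // No shorter upper bound exists.
  result.back() =
      static_cast<char>(static_cast<unsigned char>(result.back()) + 1);
  return result;
}

// Different limits per page (assumed: 8 for Page 0, 3 for Page 1):
//   TruncateMin("12345612", 8) is "12345612"
//   TruncateMin("12345615", 3) is "123", which sorts before "12345612"
// A single limit for every page (assumed: 3) keeps the minimums ordered:
//   TruncateMin("12345612", 3) and TruncateMin("12345615", 3) are both "123"
```

With one shared limit, truncation maps ascending values to non-descending prefixes, so the ordering of the page minimums is preserved. With per-page limits, a shorter prefix on a later page can sort before a longer one on an earlier page, which is the inversion shown in the example.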
--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 30 Mar 2018 00:05:25 +0000
Gerrit-HasComments: Yes
