Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@300
PS2, Line 300:   std::vector<bool> null_pages_;
> About the formula for prefix size: what would happen, if there is no common
The formula was only a conversation starter, but the situation is actually 
worse. At yesterday's Parquet sync meeting it turned out that using a different 
truncation limit for each page is a bad idea. Here's why:

Let's assume we have the following pages:

 Page 0:
 12345612
 12345613
 12345614
 min_value: 12345612
 max_value: 12345614
 Page 1:
 12345615
 23456789
 min_value: 123
 max_value: 235

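Note that the values are in ascending order, but Page 0's min_value is greater 
than Page 1's min_value (comparing them as strings). Page 1's bounds were 
truncated to a 3-byte prefix while Page 0's were kept whole, so the page-level 
min values no longer follow the order of the data.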
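A small Python sketch can reproduce the example. It assumes the values are byte strings compared lexicographically (as for a STRING column), that a truncated min_value is a plain prefix, and that a truncated max_value is a prefix whose last byte is incremented, which matches the 235 above. The helpers truncate_min, truncate_max, page_bounds and is_ascending are hypothetical names for illustration, not the Impala writer's code.

```python
# Hypothetical sketch: truncate page min/max statistics to a prefix length and
# check whether the per-page min values are still in ascending order.
# Values are byte strings compared lexicographically, as for STRING columns.


def truncate_min(value: bytes, limit: int) -> bytes:
    # A prefix is always <= the original value, so it stays a valid lower bound.
    return value[:limit]


def truncate_max(value: bytes, limit: int) -> bytes:
    # Keep the prefix and increment its last byte so the result is >= value.
    # Simplified: assumes the last kept byte is not 0xff.
    if len(value) <= limit:
        return value
    prefix = bytearray(value[:limit])
    prefix[-1] += 1
    return bytes(prefix)


def page_bounds(pages, limits):
    """Return (min, max) for each page, truncated with that page's limit."""
    return [(truncate_min(min(p), lim), truncate_max(max(p), lim))
            for p, lim in zip(pages, limits)]


def is_ascending(values):
    return all(a <= b for a, b in zip(values, values[1:]))


pages = [
    [b"12345612", b"12345613", b"12345614"],  # Page 0
    [b"12345615", b"23456789"],               # Page 1
]

# Different limits per page: 8 bytes for Page 0, 3 bytes for Page 1.
per_page = page_bounds(pages, [8, 3])
print(per_page)                               # [(b'12345612', b'12345614'), (b'123', b'235')]
print(is_ascending([lo for lo, _ in per_page]))  # False

# One shared limit for every page.
shared = page_bounds(pages, [3, 3])
print(shared)                                 # [(b'123', b'124'), (b'123', b'235')]
print(is_ascending([lo for lo, _ in shared]))    # True
```

With one shared limit (3 bytes here), the truncated min values keep the order of the data: cutting two strings to the same prefix length can never reverse their lexicographic order.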


I'll investigate the second part of your comment.


http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py@163
PS6, Line 163:
             :
> I was thinking about catching the exception in _validate_parquet_page_index
I added the column name to the error message. I might add the file name as well 
in a later commit.



--
To view, visit http://gerrit.cloudera.org:8080/9693

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 30 Mar 2018 00:05:25 +0000
Gerrit-HasComments: Yes
