Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@300
PS2, Line 300:   std::vector<bool> null_pages_;
> About long strings: I think that it would be safer to start with a smaller
If all strings in a page have the same 1KB prefix, then performance and space 
consumption is still likely to be dominated by those. If we want to cap at a 
small value, we will face this for much less extreme workloads and not writing 
column stats becomes more of an issue. If you want to dive into this, please do 
comprehensive experiments to determine a good size.


http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py@163
PS6, Line 163:             for null_page in null_pages:
             :               assert null_page
> This could be "assert False not in null_pages". Creating a variable to stor
Python has any() and all(). However, it might actually be better to use 
enumerate() and print the index if a page is not null to make debugging easier.



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 27 Mar 2018 17:49:05 +0000
Gerrit-HasComments: Yes

Reply via email to