Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )
Change subject: IMPALA-5842: Write page index in Parquet files ...................................................................... Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@300 PS2, Line 300: std::vector<bool> null_pages_; > About long strings: I think that it would be safer to start with a smaller If all strings in a page have the same 1KB prefix, then performance and space consumption is still likely to be dominated by those. If we want to cap at a small value, we will face this for much less extreme workloads and not writing column stats becomes more of an issue. If you want to dive into this, please do comprehensive experiments to determine a good size. http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py File tests/query_test/test_parquet_page_index.py: http://gerrit.cloudera.org:8080/#/c/9693/6/tests/query_test/test_parquet_page_index.py@163 PS6, Line 163: for null_page in null_pages: : assert null_page > This could be "assert False not in null_pages". Creating a variable to stor Python has any() and all(). However, it might actually be better to use enumerate() and print the index if a page is not null to make debugging easier. -- To view, visit http://gerrit.cloudera.org:8080/9693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9 Gerrit-Change-Number: 9693 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Anonymous Coward #248 Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 27 Mar 2018 17:49:05 +0000 Gerrit-HasComments: Yes
