Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )
Change subject: IMPALA-5842: Write page index in Parquet files ...................................................................... Patch Set 12: (5 comments) Do we have tests for the edge cases? For example a table with columns of 63/64/65 length strings. http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@87 PS12, Line 87: const int MAX_STAT_VALUE_LENGTH = 64; If different value is used for page and column stats, then the constants should be placed close to each other and their name should reflect their role. http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@749 PS12, Line 749: Status s_min = TruncateMinValue(page_stats.min_value, MAX_STAT_VALUE_LENGTH, : &min_val); : Status s_max = TruncateMaxValue(page_stats.max_value, MAX_STAT_VALUE_LENGTH, : &max_val); I can't find TruncateMinValue and TruncateMaxValue in code. http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@1221 PS12, Line 1221: column.column_index_.__set_boundary_order(column.row_group_stats_base_->GetBoundaryOrder()); nit: long line http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py File tests/query_test/test_parquet_page_index.py: http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@171 PS12, Line 171: len(page_max_value) != len(column_max_value)): The != should be < - what happens if column_max_value (e.g. "bb") is shorter then page_max_value (e.g "aaaa")? I would also consider using the exact max length used for page stats, to make the tests as strict as possible. http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@172 PS12, Line 172: assert page_max_value[:-1] <= column_max_value[:len(page_max_value) - 1] This may not be true if the last two characters contain the max char value. -- To view, visit http://gerrit.cloudera.org:8080/9693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9 Gerrit-Change-Number: 9693 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Anonymous Coward #248 Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 24 Apr 2018 10:42:46 +0000 Gerrit-HasComments: Yes
