Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9693 )
Change subject: IMPALA-5842: Write page index in Parquet files ...................................................................... Patch Set 13: (5 comments) Thanks, Csaba! Yes, we have tests for the edge cases in string-util-test.cc. And also added a new python test function 'test_max_string_values' http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@87 PS12, Line 87: // Base class for column writers. This contains most of the logic except for > If different value is used for page and column stats, then the constants sh Moved it to HdfsParquetTableWriter and renamed it to PAGE_INDEX_TRUNCATE_LENGTH. http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@749 PS12, Line 749: Status s_max = TruncateMaxValue(page_stats.max_value, PAGE_INDEX_TRUNCATE_LENGTH, : &max_val); : if (!s_min.ok() || !s_max.ok()) valid_page_index_ = false; : column_index_. > I can't find TruncateMinValue and TruncateMaxValue in code. Sorry, I've uploaded it now. http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@1221 PS12, Line 1221: // We always set null_counts. > nit: long line Done http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py File tests/query_test/test_parquet_page_index.py: http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@171 PS12, Line 171: if (isinstance(page_max_value, basestring) and > The != should be < - what happens if column_max_value (e.g. "bb") is shorte Oops, you are absolutely right. I use the truncate length now. http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@172 PS12, Line 172: len(page_max_value) == PAGE_INDEX_TRUNCATE_LENGTH): > This may not be true if the last two characters contain the max char value. Fixed that as well. -- To view, visit http://gerrit.cloudera.org:8080/9693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9 Gerrit-Change-Number: 9693 Gerrit-PatchSet: 13 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Anonymous Coward #248 Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 25 Apr 2018 10:06:14 +0000 Gerrit-HasComments: Yes