Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 13:

(5 comments)

Thanks, Csaba!
Yes, we have tests for the edge cases in string-util-test.cc. And also added a 
new python test function 'test_max_string_values'

http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@87
PS12, Line 87: // Base class for column writers. This contains most of the 
logic except for
> If different value is used for page and column stats, then the constants sh
Moved it to HdfsParquetTableWriter and renamed it to PAGE_INDEX_TRUNCATE_LENGTH.


http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@749
PS12, Line 749:     Status s_max = TruncateMaxValue(page_stats.max_value, 
PAGE_INDEX_TRUNCATE_LENGTH,
              :         &max_val);
              :     if (!s_min.ok() || !s_max.ok()) valid_page_index_ = false;
              :     column_index_.
> I can't find TruncateMinValue and TruncateMaxValue in code.
Sorry, I've uploaded it now.


http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@1221
PS12, Line 1221:     // We always set null_counts.
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@171
PS12, Line 171:         if (isinstance(page_max_value, basestring) and
> The != should be < - what happens if column_max_value (e.g. "bb") is shorte
Oops, you are absolutely right.
I use the truncate length now.


http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@172
PS12, Line 172:             len(page_max_value) == PAGE_INDEX_TRUNCATE_LENGTH):
> This may not be true if the last two characters contain the max char value.
Fixed that as well.



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Apr 2018 10:06:14 +0000
Gerrit-HasComments: Yes

Reply via email to