Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 12:

(5 comments)

Do we have tests for the edge cases? For example a table with columns of 
63/64/65 length strings.

http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@87
PS12, Line 87: const int MAX_STAT_VALUE_LENGTH = 64;
If different value is used for page and column stats, then the constants should 
be placed close to each other and their name should reflect their role.


http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@749
PS12, Line 749:     Status s_min = TruncateMinValue(page_stats.min_value, 
MAX_STAT_VALUE_LENGTH,
              :         &min_val);
              :     Status s_max = TruncateMaxValue(page_stats.max_value, 
MAX_STAT_VALUE_LENGTH,
              :         &max_val);
I can't find TruncateMinValue and TruncateMaxValue in code.


http://gerrit.cloudera.org:8080/#/c/9693/12/be/src/exec/hdfs-parquet-table-writer.cc@1221
PS12, Line 1221:     
column.column_index_.__set_boundary_order(column.row_group_stats_base_->GetBoundaryOrder());
nit: long line


http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@171
PS12, Line 171:             len(page_max_value) != len(column_max_value)):
The != should be < - what happens if column_max_value (e.g. "bb") is shorter 
then page_max_value (e.g "aaaa")?

I would also consider using the exact max length used for page stats, to make 
the tests as strict as possible.


http://gerrit.cloudera.org:8080/#/c/9693/12/tests/query_test/test_parquet_page_index.py@172
PS12, Line 172:           assert page_max_value[:-1] <= 
column_max_value[:len(page_max_value) - 1]
This may not be true if the last two characters contain the max char value.



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 24 Apr 2018 10:42:46 +0000
Gerrit-HasComments: Yes

Reply via email to