Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/9693/10/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/10/be/src/exec/hdfs-parquet-table-writer.cc@301
PS10, Line 301:   std::vector<std::string> min_values_;
> Options 1 or 2 seem fine to me.
Chose option 1, but also added a memtracker.


http://gerrit.cloudera.org:8080/#/c/9693/10/be/src/exec/hdfs-parquet-table-writer.cc@735
PS10, Line 735:     min_values_.push_back(std::string(""));
> I don't know if we need the call to std::string() here, I think it should w
Right. Now the code looks a bit different, and I call push_back after the 'if'.


http://gerrit.cloudera.org:8080/#/c/9693/10/be/src/exec/hdfs-parquet-table-writer.cc@1227
PS10, Line 1227:   for (auto& column : columns_) {
> nit: can fit loop on one line.
Done


http://gerrit.cloudera.org:8080/#/c/9693/10/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

http://gerrit.cloudera.org:8080/#/c/9693/10/be/src/exec/parquet-column-stats.h@159
PS10, Line 159:   // If true, min/max values are ascending.
> Maybe briefly mention why they both start off true? And both can be true at
Added some explanation here and to GetBoundaryOrder().


http://gerrit.cloudera.org:8080/#/c/9693/10/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/9693/10/tests/query_test/test_parquet_page_index.py@37
PS10, Line 37: class TestHdfsParquetTableIndexWriter(ImpalaTestSuite):
> We've got a lot of good coverage in this test.
I agree. It would be best to have a fully-fledged parquet reader implementation 
in Python that we could easily use.
I only found https://pypi.org/project/parquet/ but it seems uncomplete.

Or, maybe we can hard-code the expected values for a concrete parquet file.


http://gerrit.cloudera.org:8080/#/c/9693/10/tests/query_test/test_parquet_page_index.py@177
PS10, Line 177: previouse_value
> typo in variable name
Done


http://gerrit.cloudera.org:8080/#/c/9693/10/tests/query_test/test_parquet_page_index.py@205
PS10, Line 205: falied
> nit: failed
Done


http://gerrit.cloudera.org:8080/#/c/9693/10/tests/query_test/test_parquet_page_index.py@205
PS10, Line 205: column_info_schema
> this variable isn't defined - did you mean column_info?
Done


http://gerrit.cloudera.org:8080/#/c/9693/10/tests/query_test/test_parquet_page_index.py@244
PS10, Line 244: chars_formats
> chars_formats is weird in that it's created by a different test - TestChars
I moved table creation to 'testdata/bin/load-dependent-tables.sql'.



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Apr 2018 15:50:20 +0000
Gerrit-HasComments: Yes

Reply via email to