Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 136:     if (row_group_stats_base_->has_values()
oops, i missed that bug


Line 339: 
> Yes, the dictionary encoders use it as the size of the encoded values. Shou
sounds good


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 139:       row_group_stats_base_->EncodeToThrift(&encoded);
why not just pass in metadata->statistics and then set the __isset flag by hand?


Line 652:     page_stats_base_->EncodeToThrift(&encoded);
avoid copy by passing in header.data_page_header.statistics


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 103:   /// Maximum statistics size. If the size of a single thrift 
Statistics struct for a page
qualify as 'parquet.Statistics' so it's clearer


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 127:     }
> The stats should follow the ordering semantics defined in parquet-format. T
i understand how that may not be the case today, but in order for them to be 
useful, they need to mirror the ordering semantics of the underlying type. 
meaning i still don't see why there should be stats-specific comparisons and 
why we want this todo.


http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 84:   // clang-format off
remove


Line 87:       || std::is_same<bool, T>::value
indent the subsequent lines belonging to the logical expr two more spaces 
(style guide)


Line 157:     // clang-format off
this is very verbose. why needed?


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes

Reply via email to