Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer ......................................................................
Patch Set 10: (9 comments) http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 136: if (row_group_stats_base_->has_values() oops, i missed that bug Line 339: > Yes, the dictionary encoders use it as the size of the encoded values. Shou sounds good http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 139: row_group_stats_base_->EncodeToThrift(&encoded); why not just pass in metadata->statistics and then set the __isset flag by hand? Line 652: page_stats_base_->EncodeToThrift(&encoded); avoid copy by passing in header.data_page_header.statistics http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: Line 103: /// Maximum statistics size. If the size of a single thrift Statistics struct for a page qualify as 'parquet.Statistics' so it's clearer http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 127: } > The stats should follow the ordering semantics defined in parquet-format. T i understand how that may not be the case today, but in order for them to be useful, they need to mirror the ordering semantics of the underlying type. meaning i still don't see why there should be stats-specific comparisons and why we want this todo. http://gerrit.cloudera.org:8080/#/c/5611/10/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 84: // clang-format off remove Line 87: || std::is_same<bool, T>::value indent the subsequent lines belonging to the logical expr two more spaces (style guide) Line 157: // clang-format off this is very verbose. why needed? -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: Yes
