Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................

Patch Set 7: Code-Review+1

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5611/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS6, Line 178: ProcessValue
> Marcel had suggested that name, but I'm good with either. Marcel, do you ha
That's fine then, no need to keep renaming it :)

Line 389:   virtual bool ProcessValue(void* value, int64_t* bytes_needed) {
> Done, though it has the same number of lines, but now uses two return state
It doesn't make a big difference in this case - we just tend to use the early-return pattern.

http://gerrit.cloudera.org:8080/#/c/5611/7/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 325:     self.execute_query("drop table %s" % qualified_table_name)
Not needed - it should be dropped with the unique_database

Line 434:   def test_write_statistics_multiple_row_groups(self, vector, unique_database):
Nice!

PS7, Line 446: num_lines
num_rows?

Line 447:     query = "create table %s like %s stored as parquet" % \
A while back someone who was more up-to-date on python suggested that it was better to use .format() instead of % for string formatting. E.g. https://docs.python.org/3.4/library/stdtypes.html#old-string-formatting

I don't feel strongly but thought I should mention it.

Line 465:       assert l.max < r.min
Maybe this should be <=? E.g. consider two row groups that only have one value for that column.

Line 467:     self.execute_query("drop table %s" % qualified_target_table)
Not needed - it should be dropped with the unique_database

Line 469:   def test_write_statistics_float_infinity(self, vector, unique_database):
Didn't think of this - good catch.

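To illustrate the two Python suggestions above (Line 447 and Line 465), here is a minimal standalone sketch. It is not the code from test_insert_parquet.py: RowGroupStats, row_groups_are_ordered, create_like_query and the table names are hypothetical stand-ins, and it assumes the test compares the statistics of adjacent row groups for data written in sorted order.

```python
from collections import namedtuple

# Hypothetical stand-in for the per-row-group column statistics read back by the test.
RowGroupStats = namedtuple("RowGroupStats", ["min", "max"])


def row_groups_are_ordered(stats):
    """Return True if each row group's max does not exceed the next group's min."""
    return all(l.max <= r.min for l, r in zip(stats, stats[1:]))


def create_like_query(target, source):
    """Build the CREATE TABLE ... LIKE statement with str.format() instead of %."""
    return "create table {0} like {1} stored as parquet".format(target, source)


# Two row groups that each hold a single value, and it is the same value.
# This is valid for sorted data, but a strict `<` comparison rejects it.
same_value = [RowGroupStats(5, 5), RowGroupStats(5, 5)]
assert row_groups_are_ordered(same_value)
assert not all(l.max < r.min for l, r in zip(same_value, same_value[1:]))

# Genuinely overlapping ranges are still rejected by the `<=` check.
overlapping = [RowGroupStats(1, 7), RowGroupStats(5, 9)]
assert not row_groups_are_ordered(overlapping)

# .format() yields the same string as the %-based version (table names are made up).
assert create_like_query("db.target", "db.source") == \
    "create table %s like %s stored as parquet" % ("db.target", "db.source")
```

The `<=` form only relaxes the check at the boundary between row groups, so it still catches ranges that actually overlap.
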
-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes
