Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 39: /// TIMESTAMP values are written in the in-memory format used by Impala, relative to UTC,
> Lars, if I remember correctly you found that Hive does not write statistics
It's concerning if Hive and parquet-mr define the ordering of these types 
inconsistently. I had assumed that parquet-mr would be the reference 
implementation: 
https://github.com/Parquet/parquet-mr/tree/master/parquet-column/src/main/java/parquet/column/statistics

I'm looking at the specification for statistics and it doesn't seem to 
actually specify how min/max are determined: 

https://github.com/Parquet/parquet-format/issues/59
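
To illustrate why an unspecified ordering matters, here is a minimal sketch 
(not code from this patch) that computes min/max over the same byte strings 
under two plausible orderings: bytes compared as unsigned values, and bytes 
compared as signed values. The helper names LessUnsigned, LessSigned and 
MinMax are hypothetical and exist only for this example; which ordering any 
particular writer uses is not something this sketch asserts.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helpers for illustration only; not part of Impala or parquet-mr.

// Lexicographic comparison treating every byte as unsigned (memcmp-style).
bool LessUnsigned(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
      [](char x, char y) {
        return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
      });
}

// Lexicographic comparison treating every byte as signed, so bytes >= 0x80
// sort before ASCII bytes.
bool LessSigned(const std::string& a, const std::string& b) {
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
      [](char x, char y) {
        return static_cast<signed char>(x) < static_cast<signed char>(y);
      });
}

// Returns {min, max} of 'values' under the given ordering.
// Assumes 'values' is non-empty.
template <typename Less>
std::pair<std::string, std::string> MinMax(
    const std::vector<std::string>& values, Less less) {
  auto mm = std::minmax_element(values.begin(), values.end(), less);
  return {*mm.first, *mm.second};
}
```

For the two values "a" (byte 0x61) and the UTF-8 encoding of "é" (bytes 0xC3 
0xA9), the unsigned ordering yields min = "a" and max = "é", while the signed 
ordering swaps them. A reader that assumes a different ordering than the 
writer used could therefore skip row groups that actually contain matching 
rows.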

Is there some other place where this might be specified? If not, I should open 
an issue against Parquet to get this clarified.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: Yes
