Zoltan Ivanfi has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5611/2/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 39: /// TIMESTAMP values are written in the in-memory format used by 
Impala, relative to UTC,
> That's concerning if Hive and parquet-mr are inconsistent in how they defin
It's not that Hive and parquet-mr do it differently, it's simply that there is 
no such type as our timestamp in Parquet (there is a separate timestamp_millis 
type), so parquet-mr doesn't handle this type at all, while Hive does. From 
Parquet's perspective it's just 96 bits without meaning, so probably the min 
and max should be calculated accordingly as well and not based on the base type.

On the other hand, if Hive already has some implementation, we should probably 
use the same logic. But if I remember correctly, Lars looked at Hive's min/max 
values for timestamps and found that they were not the min/max values in any 
way, neither as instants on a timeline, neither as the 96 bits in any byte 
order.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to