There are two separate type layers inside a Parquet file: the physical storage type and the logical annotation. A timestamp is stored as a physical INT64 carrying the TIMESTAMP_MILLIS annotation, so seeing INT64 in the writer log is expected. See here:
https://github.com/apache/parquet-format/blob/master/src/thrift/parquet.thrift#L105

On Mon, Jul 13, 2015 at 7:47 AM, Stefán Baxter <[email protected]> wrote:

> thank you.
>
> I had seen this. I was just expecting the list to say 'TIMESTAMP_MILLI' :)
> (that would up the confidence level for a newbie)
>
> Regards,
> -Stefan
>
> On Mon, Jul 13, 2015 at 2:44 PM, Kristine Hahn <[email protected]> wrote:
>
> > Expected, I think.
> >
> > https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
> > says that the timestamp type is mapped to the Parquet TIMESTAMP_MILLI,
> > which is a Unix timestamp (int64). Take a look at
> > https://drill.apache.org/docs/data-type-conversion/#to_timestamp and the
> > Timezone Limitations section.
> >
> > On Monday, July 13, 2015, Stefán Baxter <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I have a json file that contains a SQL timestamp.
> > >
> > > When I use it to create a Parquet file it seems to become a INT64:
> > >
> > > Jul 12, 2015 3:34:59 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore:
> > > written 153,728B for [occurred_at] INT64: 28,910 values, 231,288B raw,
> > > 153,681B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
> > >
> > > Is that to be expected or am I missing something that needs to be done
> > > for it to become a timestamp in Parquet?
> > >
> > > Regards,
> > > -Stefan
> >
> > --
> > Kristine Hahn
> > Sr. Technical Writer
> > 415-497-8107 @krishahn
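
For anyone who finds the physical/logical split abstract, here is a rough sketch of the idea in plain Python. It does not use any Parquet library: `ColumnType`, `interpret`, and the sample value are hypothetical names and data made up for illustration. The annotation name follows the spelling used in parquet.thrift (TIMESTAMP_MILLIS). The point is only that the stored value is the same 64-bit integer either way, and the annotation tells a reader how to interpret it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Union

# Hypothetical model of a column's type, not a real Parquet API.
@dataclass(frozen=True)
class ColumnType:
    physical: str                  # how the value is stored, e.g. "INT64"
    logical: Optional[str] = None  # optional hint on how to interpret it

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def interpret(value: int, col: ColumnType) -> Union[int, datetime]:
    """Return the value as a reader would see it for the given column type."""
    if col.physical == "INT64" and col.logical == "TIMESTAMP_MILLIS":
        # Annotated column: the integer counts milliseconds since the Unix epoch.
        return EPOCH + timedelta(milliseconds=value)
    # No annotation: the reader only knows it is a 64-bit integer.
    return value

raw = 1436715299000  # made-up sample value, in milliseconds since the epoch

plain = ColumnType("INT64")
annotated = ColumnType("INT64", "TIMESTAMP_MILLIS")

print(interpret(raw, plain))      # same stored value, shown as an integer
print(interpret(raw, annotated))  # same stored value, shown as a UTC datetime
```

This is why the writer log reports `INT64` for `occurred_at`: the log line shows the physical type, while the timestamp meaning lives in the annotation in the file schema.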
