A Parquet file records two different things for each column: the physical
storage type and the logical annotation.  A timestamp should be stored as a
physical INT64 with the TIMESTAMP_MILLIS annotation, so seeing INT64 in the
writer log is expected.  See here:

https://github.com/apache/parquet-format/blob/master/src/thrift/parquet.thrift#L105
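
To make the physical/logical split concrete, here is a minimal sketch in plain
Python (no Parquet library involved). It shows how a timestamp becomes a 64-bit
count of milliseconds since the Unix epoch, and how the same eight bytes are
read back as a timestamp once the reader knows the millisecond annotation
applies. The helper names, the sample value, and the choice of UTC are
assumptions made for illustration only; they are not taken from Drill or
Parquet.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_int64_millis(dt):
    """Logical timestamp -> physical value (milliseconds since the epoch)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_int64_millis(millis):
    """Physical value -> logical timestamp, applying the millisecond annotation."""
    return EPOCH + timedelta(milliseconds=millis)


# Hypothetical sample value; the time zone of the real data is unknown here.
occurred_at = datetime(2015, 7, 12, 15, 34, 59, tzinfo=timezone.utc)

millis = to_int64_millis(occurred_at)

# A plain-encoded INT64 occupies 8 little-endian bytes on disk.
raw = struct.pack("<q", millis)

# Without the annotation a reader sees only an integer ...
as_integer = struct.unpack("<q", raw)[0]
# ... with it, the same integer is interpreted as a timestamp.
as_timestamp = from_int64_millis(as_integer)

print(len(raw), as_integer, as_timestamp.isoformat())
```

The writer log only reports the physical type, which is why the column shows up
as INT64 there even when the annotation is present in the file schema.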

On Mon, Jul 13, 2015 at 7:47 AM, Stefán Baxter <[email protected]>
wrote:

> Thank you.
>
> I had seen this. I was just expecting the list to say 'TIMESTAMP_MILLIS' :)
> (that would raise the confidence level for a newbie)
>
> Regards,
>  -Stefan
>
> On Mon, Jul 13, 2015 at 2:44 PM, Kristine Hahn <[email protected]> wrote:
>
> > Expected, I think.
> >
> > https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types
> > says that the timestamp type is mapped to the Parquet TIMESTAMP_MILLIS,
> > which is a Unix timestamp (int64). Take a look at
> > https://drill.apache.org/docs/data-type-conversion/#to_timestamp and the
> > Timezone Limitations section.
> >
> > On Monday, July 13, 2015, Stefán Baxter <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I have a JSON file that contains a SQL timestamp.
> > >
> > > When I use it to create a Parquet file, it seems to become an INT64:
> > >
> > > Jul 12, 2015 3:34:59 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore:
> > > written 153,728B for [occurred_at] INT64: 28,910 values, 231,288B raw,
> > > 153,681B comp, 1 pages, encodings: [RLE, BIT_PACKED, PLAIN]
> > >
> > > Is that to be expected, or am I missing something that needs to be done
> > > for it to become a timestamp in Parquet?
> > >
> > > Regards,
> > >  -Stefan
> > >
> >
> >
> > --
> > Kristine Hahn
> > Sr. Technical Writer
> > 415-497-8107 @krishahn
> >
>
