I think this document contains most of the challenges around timestamp
definition and management.  It's a long read but has the details
behind much of what you have mentioned.
https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit?usp=sharing
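For concreteness, the core ambiguity the thread is about can be sketched in plain Python (this is an illustration of the semantics, not Impala code; the UTC-5 cluster zone is a made-up example): a naive timestamp on disk carries no zone, so the reader must pick one, and "interpret as UTC" vs "interpret as local" name two different instants.

```python
from datetime import datetime, timezone, timedelta

# A data file stores 2015-01-01 12:12:00 as a naive timestamp,
# with no time zone attached.
stored = datetime(2015, 1, 1, 12, 12, 0)

# Impala's default: interpret the naive value as UTC.
as_utc = stored.replace(tzinfo=timezone.utc)

# Historical Hive-style behavior: interpret it in the writer's
# local zone (here a hypothetical cluster zone of UTC-5).
cluster_tz = timezone(timedelta(hours=-5))
as_local = stored.replace(tzinfo=cluster_tz)

# Same bytes on disk, two different instants, 5 hours apart.
print(as_utc.timestamp())    # 1420114320.0
print(as_local.timestamp())  # 1420132320.0
```

Both readers see the same wall-clock digits; they disagree only on which point on the epoch timeline those digits mean, which is exactly where the cross-engine confusion comes from.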



On Wed, Dec 19, 2018 at 11:33 AM Boris Tyukin <bo...@boristyukin.com> wrote:
>
> Hello,
>
> I am trying to understand the reasons behind this decision by Impala devs.
>
> From Impala docs:
> http://impala.apache.org/docs/build/html/topics/impala_timestamp.html
>
> By default, Impala stores and interprets TIMESTAMP values in UTC time zone 
> when writing to data files, reading from data files, or converting to and 
> from system time values through functions.
>
> And there are two switches to change this behavior:
>
> use_local_tz_for_unix_timestamp_conversions
> convert_legacy_hive_parquet_utc_timestamps (performance killer that has just 
> been fixed in the latest Impala release which has not made to CDH yet)
>
> My question is: what was the thought process, and what were the reasons, for 
> doing this conversion from UTC in the first place and having Impala "assume" 
> that a timestamp is always UTC?
>
> This is not how Hive or Spark or anything else I've seen before does it. This 
> is really unusual and causes tons of confusion if you try to use the same 
> data set from Hive, Spark, and Impala, i.e. when Impala is not the only thing 
> on the cluster.
>
> And second, why is there no option NOT to convert the time in the first 
> place and just use the value that was intended to be stored? So if I stored 
> 2015-01-01 12:12:00, whatever time zone that is, I still want to see that 
> exact time in Impala, Hive, and Spark, and I do not need Impala converting 
> this time to my local cluster time.
>
> I am sure there is a reason for that; I am just struggling to understand it...
>
> Thanks,
> Boris