[ https://issues.apache.org/jira/browse/IMPALA-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated IMPALA-7730:
------------------------------------
    Attachment: orc.zip

> Improve ORC File Format Timezone issues
> ---------------------------------------
>
>                 Key: IMPALA-7730
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7730
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 3.0
>            Reporter: Philip Zeyliger
>            Priority: Major
>         Attachments: orc.zip
>
>
> As pointed out in https://gerrit.cloudera.org/#/c/11731 by [~csringhofer], 
> our support for the ORC file format doesn't follow the same timezone 
> conventions as the rest of Impala.
> {quote}
> tl;dr: ORC's timezone handling is likely broken in Impala, so we should
> patch it in the toolchain.
> The ORC library implements its own IANA timezone handling to convert stored
> timestamps from UTC to local time, and does something similar for min/max
> stats. The writer's timezone can also be stored in .orc files and used
> instead of the local timezone.
> Impala's timezone and the ORC library's timezone can differ for several
> reasons:
> * ORC's timezone is not overridden by the TZ environment variable or by the
> timezone query option.
> * ORC uses a simpler way to detect the local timezone, which may not work on
> some Linux distros (see TimezoneDatabase::LocalZoneName in Impala vs
> LOCAL_TIMEZONE in ORC).
> * .orc files can use any time zone as the writer's timezone, and we cannot be
> sure that it exists on the reader machine.
> My suggestion is to patch the ORC library in the toolchain and remove its
> timezone handling (e.g. by always using UTC, possibly controlled by a flag),
> since the current behaviour is likely broken and is certainly not
> consistent with the rest of Impala.
> I am not sure how timezones could be handled correctly in ORC + Impala. If
> someone plans to work on it, I would gladly help with the integration into
> Impala.
> {quote}
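To make the mismatch described above concrete, here is a minimal Python sketch. It is a simplified model of the behaviour the quote describes, not the ORC library's or Impala's actual code: the read_timestamp function, its force_utc parameter, and the small fixed-offset zone table (no DST, no IANA database) are all hypothetical. It shows one stored UTC instant producing different wall-clock values depending on which timezone the reader applies, a writer's timezone that is unknown on the reader machine, and the proposed option of always using UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical zone table with fixed UTC offsets (no DST, no IANA tz database),
# so the sketch runs anywhere. A real reader would consult an IANA database.
ZONES = {
    "UTC": timezone.utc,
    "UTC+01:00": timezone(timedelta(hours=1)),
    "UTC-08:00": timezone(timedelta(hours=-8)),
}


def read_timestamp(
    stored_utc_seconds: int,
    writer_tz: Optional[str] = None,
    reader_local_tz: str = "UTC",
    force_utc: bool = False,
) -> datetime:
    """Return the naive wall-clock value a reader would produce.

    Simplified, hypothetical model of the behaviour described in the issue:
    * the file stores an instant as seconds since the epoch in UTC;
    * the reader converts it to the writer's timezone if the file records one,
      otherwise to whatever the reader believes is the local timezone;
    * force_utc models the proposed flag that skips conversion entirely.
    """
    instant = datetime.fromtimestamp(stored_utc_seconds, tz=timezone.utc)
    if force_utc:
        # Proposed fix: never apply a timezone, return the UTC wall clock.
        return instant.replace(tzinfo=None)
    zone_name = writer_tz if writer_tz is not None else reader_local_tz
    zone = ZONES.get(zone_name)
    if zone is None:
        # The writer's zone may not exist in the reader machine's database.
        raise LookupError(f"unknown timezone: {zone_name}")
    return instant.astimezone(zone).replace(tzinfo=None)


if __name__ == "__main__":
    stored = 1_000_000_000  # 2001-09-09 01:46:40 UTC
    # Assume Impala is configured (e.g. via its timezone setting) for UTC+01:00,
    # while the library independently detected UTC-08:00 as the local zone.
    print(read_timestamp(stored, reader_local_tz="UTC+01:00"))  # 2001-09-09 02:46:40
    print(read_timestamp(stored, reader_local_tz="UTC-08:00"))  # 2001-09-08 17:46:40
    print(read_timestamp(stored, force_utc=True))               # 2001-09-09 01:46:40
```

The first two calls return different wall-clock values for the same stored instant, which is the kind of inconsistency the issue describes; with force_utc=True the result no longer depends on either timezone setting.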



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
