[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2021-06-25 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-21291:
---
Labels: compatibility timestamp  (was: )

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Labels: compatibility, timestamp
> Fix For: 3.1.2, 3.2.0, 4.0.0
>
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch, 
> HIVE-21291.5.patch, HIVE-21291.6.patch, HIVE-21291.7.patch, 
> HIVE-21291.7.patch, HIVE-21291.branch-3.1.patch, HIVE-21291.branch-3.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, text files and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local time zone in the file metadata fields that serve as 
> arbitrary key-value storage.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can still read the files without any problem, and to them 
> the timestamps keep the historical _Instant_ behaviour.
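
A minimal sketch of the write and read paths described in the plan above, in Python for brevity. It only illustrates the idea and is not Hive's implementation: the metadata key `writer.time.zone` and the functions `write_timestamp` and `read_timestamp` are hypothetical names, and a plain dict stands in for the file's key-value metadata.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical metadata key; the issue text does not name the real one.
WRITER_ZONE_KEY = "writer.time.zone"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def write_timestamp(wall_clock, session_zone):
    """Normalize a wall-clock timestamp to UTC and record the writer's zone.

    Returns (epoch_millis, metadata), where metadata stands in for the
    file-level key-value metadata.
    """
    # Interpret the naive wall-clock value in the session time zone.
    instant = wall_clock.replace(tzinfo=ZoneInfo(session_zone))
    # Store the instant as milliseconds since the epoch, i.e. normalized to UTC.
    millis = (instant - EPOCH) // timedelta(milliseconds=1)
    return millis, {WRITER_ZONE_KEY: session_zone}


def read_timestamp(millis, metadata, reader_zone):
    """Convert stored UTC millis back to a naive wall-clock timestamp.

    A metadata-aware reader converts to the saved writer zone
    (LocalDateTime semantics). When the key is absent, or ignored as a
    legacy reader would, the reader's own zone is used (Instant semantics).
    """
    zone = metadata.get(WRITER_ZONE_KEY, reader_zone)
    instant = EPOCH + timedelta(milliseconds=millis)
    return instant.astimezone(ZoneInfo(zone)).replace(tzinfo=None)


if __name__ == "__main__":
    written = datetime(2019, 4, 26, 12, 0)
    millis, meta = write_timestamp(written, "Europe/Budapest")

    # New reader in another zone: sees the value exactly as it was written.
    print(read_timestamp(millis, meta, "America/Los_Angeles"))

    # Legacy reader: ignores the metadata and converts to its own zone.
    print(read_timestamp(millis, {}, "America/Los_Angeles"))
```

With the metadata present, the reader's own time zone no longer affects the result, which is the _LocalDateTime_ behaviour the plan asks for. Passing an empty dict models a legacy component that never looks at the extra key and therefore still observes _Instant_ behaviour.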



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-26 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21291:
---
Fix Version/s: 3.1.2, 3.2.0




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-26 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.branch-3.1.patch



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-26 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.branch-3.patch



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-25 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21291:
---
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~klcopp].

Would you mind rebasing on top of branch-3 and branch-3.1 so we can backport 
to those branches too?



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-24 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.7.patch
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-24 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-23 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-23 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.7.patch
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-23 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-23 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.6.patch
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-18 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch, HIVE-21291.5.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-18 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.5.patch
Status: Patch Available  (was: Open)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch, HIVE-21291.5.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-09 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-09 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.4.patch
Status: Patch Available  (was: Open)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-08 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.4.patch
Status: Patch Available  (was: In Progress)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-08 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-08 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.3.patch
Status: Patch Available  (was: Open)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-08 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-05 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.2.patch
Status: Patch Available  (was: Open)

Patch 2: Fixed a bug and some golden files. Still waiting on more info re: 
Kafka and HBase.

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-05 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Status: Open  (was: Patch Available)

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-04 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.1.patch
Status: Patch Available  (was: In Progress)

This is a WIP because the AvroSerDe seems to be used by KafkaSerDe and 
HBaseSerDe to deserialize files and/or structs.

This would mean structs deserialized by Avro in HBase 
[(example)|https://blog.cloudera.com/blog/2016/05/how-to-improve-apache-hbase-performance-via-data-serialization-with-apache-avro/]
 would have Instant semantics, which is backwards compatible, if not optimal.

Kafka integration might cause further interoperability problems, depending 
on which components read from the Kafka clusters that Hive reads from and 
writes to. Still waiting on an expert opinion.

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch
>
>





[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-02-19 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated HIVE-21291:
-
Description: 
This sub-task is for implementing the Avro-specific parts of the following plan:

h1. Problem

Historically, the semantics of the TIMESTAMP type in Hive depended on the file 
format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
_Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text 
SerDe had _LocalDateTime_ semantics.

The Hive community wanted to get rid of this inconsistency and have 
_LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
leads to the desired new semantics, it also leads to incorrect results when new 
Hive versions read timestamps written by old Hive versions or when old Hive 
versions or any other component not aware of this change (including legacy 
Impala and Spark versions) read timestamps written by new Hive versions.

h1. Solution

To work around this issue, Hive *should restore the practice of normalizing to 
UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. 
In itself, this would restore the historical _Instant_ semantics, which is 
undesirable. In order to achieve the desired _LocalDateTime_ semantics in spite 
of normalizing to UTC, newer Hive versions should record the session-local 
local time zone in the file metadata fields serving arbitrary key-value storage 
purposes.

When reading back files with this time zone metadata, newer Hive versions (or 
any other new component aware of this extra metadata) can achieve 
_LocalDateTime_ semantics by *converting from UTC to the saved time zone 
(instead of to the local time zone)*. Legacy components that are unaware of the 
new metadata can still read the files without any problem; to them, the 
timestamps will show the historical _Instant_ behaviour.
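
The write and read paths described above can be illustrated with a minimal sketch. It is not the Hive implementation: the class name, the method names and the metadata key {{example.writer.time.zone}} are hypothetical, and a plain {{Map}} stands in for the file's key-value metadata. It only assumes the standard {{java.time}} API.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and the metadata key are assumptions.
class TimestampRoundTrip {
    // Hypothetical key under which the writer's zone is stored.
    static final String WRITER_ZONE_KEY = "example.writer.time.zone";

    // Write path: normalize the local timestamp to UTC using the session-local
    // zone, and record that zone in the file's key-value metadata.
    static long write(LocalDateTime local, ZoneId sessionZone,
                      Map<String, String> fileMetadata) {
        fileMetadata.put(WRITER_ZONE_KEY, sessionZone.getId());
        return local.atZone(sessionZone).toInstant().toEpochMilli();
    }

    // Read path: convert from UTC to the saved zone when it is present.
    // Without the metadata, fall back to the reader's own zone, which is
    // what a legacy reader does (Instant semantics).
    static LocalDateTime read(long utcMillis, Map<String, String> fileMetadata,
                              ZoneId readerZone) {
        String saved = fileMetadata.get(WRITER_ZONE_KEY);
        ZoneId zone = (saved != null) ? ZoneId.of(saved) : readerZone;
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), zone);
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        LocalDateTime ts = LocalDateTime.of(2019, 2, 19, 12, 0);
        long stored = write(ts, ZoneId.of("Europe/Budapest"), meta);

        ZoneId reader = ZoneId.of("America/Los_Angeles");
        // Metadata-aware reader: same wall-clock value as written.
        System.out.println(read(stored, meta, reader));            // 2019-02-19T12:00
        // Reader that ignores the metadata: shifted to its own zone.
        System.out.println(read(stored, new HashMap<>(), reader)); // 2019-02-19T03:00
    }
}
```

In this sketch the stored value is identical for both readers; only the zone used for the conversion back from UTC differs, which is why readers unaware of the metadata keep working unchanged.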

  was:
This sub-task is for implementing the Avro-specific parts of the following plan:

h1. Problem

Historically, the semantics of the TIMESTAMP type in Hive depended on the file 
format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
_Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text 
SerDe had _LocalDateTime_ semantics.

The Hive community wanted to get rid of this inconsistency and have 
_LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
leads to the desired new semantics, it also leads to incorrect results when new 
Hive versions read timestamps written by old Hive versions or when old Hive 
versions or any other component not aware of this change (including legacy 
Impala and Spark versions) read timestamps written by new Hive versions.

h1. Solution

To work around this issue, Hive *should restore the practice of normalizing to 
UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. 
In itself, this would restore the historical _Instant_ semantics, which is 
undesirable. In order to achieve the desired _LocalDateTime_ semantics in spite 
of normalizing to UTC, newer Hive versions should record the session-local 
time zone in the file metadata fields that serve as arbitrary key-value 
storage.

 When reading back files with this time zone metadata, newer Hive versions (or 
any other new component aware of this extra metadata) can achieve 
_LocalDateTime_ semantics by *converting from UTC to the saved time zone 
(instead of to the local time zone)*. Legacy components that are unaware of the 
new metadata can read the files without any problem and the timestamps will 
show the historical Instant behaviour to them.


> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Priority: Major
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to 

[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-02-19 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated HIVE-21291:
-
Description: 
This sub-task is for implementing the Avro-specific parts of the following plan:

h1. Problem

Historically, the semantics of the TIMESTAMP type in Hive depended on the file 
format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
_Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text 
SerDe had _LocalDateTime_ semantics.

The Hive community wanted to get rid of this inconsistency and have 
_LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
leads to the desired new semantics, it also leads to incorrect results when new 
Hive versions read timestamps written by old Hive versions or when old Hive 
versions or any other component not aware of this change (including legacy 
Impala and Spark versions) read timestamps written by new Hive versions.

h1. Solution

To work around this issue, Hive *should restore the practice of normalizing to 
UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. 
In itself, this would restore the historical _Instant_ semantics, which is 
undesirable. In order to achieve the desired _LocalDateTime_ semantics in spite 
of normalizing to UTC, newer Hive versions should record the session-local 
time zone in the file metadata fields that serve as arbitrary key-value 
storage.

When reading back files with this time zone metadata, newer Hive versions (or 
any other new component aware of this extra metadata) can achieve 
_LocalDateTime_ semantics by *converting from UTC to the saved time zone 
(instead of to the local time zone)*. Legacy components that are unaware of the 
new metadata can still read the files without any problem; to them, the 
timestamps will show the historical _Instant_ behaviour.

  was:
This sub-task is for implementing the Avro-specific parts of the following plan:

h1. Problem

Historically, the semantics of the TIMESTAMP type in Hive depended on the file 
format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
_Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text 
SerDe had _LocalDateTime_ semantics.

The Hive community wanted to get rid of this inconsistency and have 
_LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
leads to the desired new semantics, it also leads to incorrect results when new 
Hive versions read timestamps written by old Hive versions or when old Hive 
versions or any other component not aware of this change (including legacy 
Impala and Spark versions) read timestamps written by new Hive versions.

h1. Solution

To work around this issue, Hive *should restore the practice of normalizing to 
UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. 
In itself, this would restore the historical _Instant_ semantics, which is 
undesirable. In order to achieve the desired _LocalDateTime_ semantics in spite 
of normalizing to UTC, newer Hive versions should record the session-local 
time zone in the file metadata fields that serve as arbitrary key-value 
storage.



> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Priority: Major
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a