[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-28 Thread Karen Coppage (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803690#comment-16803690
 ] 

Karen Coppage commented on HIVE-21290:
--

Thanks, Jesus! branch-3 and 3.1 files are attached.

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Fix For: 4.0.0, 3.2.0, 3.1.2
>
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch, HIVE-21290.branch-3.1.patch, 
> HIVE-21290.branch-3.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-27 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802892#comment-16802892
 ] 

Jesus Camacho Rodriguez commented on HIVE-21290:


I had not pushed new binary files... I have pushed an addendum in a new commit.

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802101#comment-16802101
 ] 

Jesus Camacho Rodriguez commented on HIVE-21290:


+1

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801623#comment-16801623
 ] 

Hive QA commented on HIVE-21290:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 4s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
38s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
28s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} common: The patch generated 0 new + 3 unchanged - 2 
fixed = 3 total (was 5) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
45s{color} | {color:red} ql: The patch generated 22 new + 195 unchanged - 20 
fixed = 217 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 
10s{color} | {color:red} root: The patch generated 22 new + 198 unchanged - 22 
fixed = 220 total (was 220) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16685/dev-support/hive-personality.sh
 |
| git revision | master / 80998ad |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus/diff-checkstyle-root.txt
 |
| modules | C: common ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, 

[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801622#comment-16801622
 ] 

Hive QA commented on HIVE-21290:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963705/HIVE-21290.5.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15842 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16685/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16685/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963705 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801027#comment-16801027
 ] 

Hive QA commented on HIVE-21290:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
37s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
18s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
51s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} common: The patch generated 0 new + 3 unchanged - 2 
fixed = 3 total (was 5) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
44s{color} | {color:red} ql: The patch generated 25 new + 195 unchanged - 20 
fixed = 220 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 
10s{color} | {color:red} root: The patch generated 25 new + 198 unchanged - 22 
fixed = 223 total (was 220) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  1m  
4s{color} | {color:red} ql generated 2 new + 98 unchanged - 2 fixed = 100 total 
(was 100) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  8m 
47s{color} | {color:red} root generated 2 new + 399 unchanged - 2 fixed = 401 
total (was 401) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16673/dev-support/hive-personality.sh
 |
| git revision | master / c279634 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-checkstyle-root.txt
 |
| javadoc | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-javadoc-javadoc-ql.txt
 |
| javadoc | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-javadoc-javadoc-root.txt
 |
| modules | C: common ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> 

[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-25 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801024#comment-16801024
 ] 

Hive QA commented on HIVE-21290:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963644/HIVE-21290.4.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testJulianDay
 (batchId=299)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16673/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16673/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16673/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963644 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, HIVE-21290.4.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-22 Thread Karen Coppage (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799004#comment-16799004
 ] 

Karen Coppage commented on HIVE-21290:
--

Patch 3 checkstyle issues are because these changes follow indentation of 
existing code.

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-22 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798984#comment-16798984
 ] 

Hive QA commented on HIVE-21290:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
39s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
18s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
0s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
43s{color} | {color:red} ql: The patch generated 23 new + 193 unchanged - 22 
fixed = 216 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
3s{color} | {color:red} root: The patch generated 23 new + 194 unchanged - 22 
fixed = 217 total (was 216) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16631/dev-support/hive-personality.sh
 |
| git revision | master / 2fa22bf |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16631/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16631/yetus/diff-checkstyle-root.txt
 |
| modules | C: common ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16631/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps 

[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-22 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798977#comment-16798977
 ] 

Hive QA commented on HIVE-21290:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963386/HIVE-21290.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15837 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16631/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16631/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16631/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963386 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-21 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798278#comment-16798278
 ] 

Hive QA commented on HIVE-21290:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963240/HIVE-21290.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 15834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_analyze] 
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_external_time] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_stats] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_0] 
(batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=61)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_0]
 (batchId=118)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16613/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16613/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16613/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963240 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-21 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798277#comment-16798277
 ] 

Hive QA commented on HIVE-21290:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
38s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
16s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
44s{color} | {color:red} ql: The patch generated 25 new + 193 unchanged - 22 
fixed = 218 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m  
2s{color} | {color:red} root: The patch generated 25 new + 194 unchanged - 22 
fixed = 219 total (was 216) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16613/dev-support/hive-personality.sh
 |
| git revision | master / 38682a4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16613/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16613/yetus/diff-checkstyle-root.txt
 |
| modules | C: common ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16613/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and 

[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797798#comment-16797798
 ] 

Hive QA commented on HIVE-21290:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963135/HIVE-21290.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16602/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16602/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16602/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-21 03:54:32.141
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16602/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-21 03:54:32.145
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 25b14be HIVE-21460: ACID: Load data followed by a select * query 
results in incorrect results (Vaibhav Gumashta, reviewed by Gopal V)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 25b14be HIVE-21460: ACID: Load data followed by a select * query 
results in incorrect results (Vaibhav Gumashta, reviewed by Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-21 03:54:33.336
+ rm -rf ../yetus_PreCommit-HIVE-Build-16602
+ mkdir ../yetus_PreCommit-HIVE-Build-16602
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16602
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16602/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: cannot apply binary patch to 
'data/files/parquet_historical_timestamp_legacy.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 
'data/files/parquet_historical_timestamp_legacy.parq' without full index line
error: data/files/parquet_historical_timestamp_legacy.parq: patch does not apply
error: cannot apply binary patch to 
'data/files/parquet_historical_timestamp_new.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 
'data/files/parquet_historical_timestamp_new.parq' without full index line
error: data/files/parquet_historical_timestamp_new.parq: patch does not apply
error: src/java/org/apache/hadoop/hive/common/type/Timestamp.java: does not 
exist in index
error: cannot apply binary patch to 
'files/parquet_historical_timestamp_legacy.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 
'files/parquet_historical_timestamp_legacy.parq' without full index line
error: files/parquet_historical_timestamp_legacy.parq: patch does not apply
error: cannot apply binary patch to 
'files/parquet_historical_timestamp_new.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 
'files/parquet_historical_timestamp_new.parq' without full index line
error: files/parquet_historical_timestamp_new.parq: patch does not apply
error: 
src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java: does 
not exist in index
error: 
src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java:
 does not exist in index
error: 
src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java: 
does not exist in index
error: 
src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java:
 does not exist in index
error: 

[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-20 Thread Karen Coppage (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797158#comment-16797158
 ] 

Karen Coppage commented on HIVE-21290:
--

Patch 1 notes:
* Timestamps are converted from JVM time zone, not session ("set time zone...") 
time zone, this is for backwards compatibility reasons.
*  The writer time zone has to be passed through all the vectorized readers so 
that 
org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory.TypesFromInt96PageReader#convert
 can correctly convert int96 to Timestamp.
* ^ It might be a better idea to pass the entire reader metadata (Map with ~5 elements) instead of extracting skipConversion (boolean) and 
writerTimezone (ZoneId) and passing these through all those constructors. Any 
input is welcome.


> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)