[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803690#comment-16803690 ]

Karen Coppage commented on HIVE-21290:
--------------------------------------

Thanks, Jesus! branch-3 and 3.1 files are attached.

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Fix For: 4.0.0, 3.2.0, 3.1.2
>
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch,
> HIVE-21290.4.patch, HIVE-21290.5.patch, HIVE-21290.branch-3.1.patch,
> HIVE-21290.branch-3.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this
> leads to the desired new semantics, it also leads to incorrect results when
> new Hive versions read timestamps written by old Hive versions or when old
> Hive versions or any other component not aware of this change (including
> legacy Impala and Spark versions) read timestamps written by new Hive
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary
> SerDe. In itself, this would restore the historical _Instant_ semantics,
> which is undesirable. In order to achieve the desired _LocalDateTime_
> semantics in spite of normalizing to UTC, newer Hive versions should record
> the session-local local time zone in the file metadata fields serving
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or
> any other new component aware of this extra metadata) can achieve
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone
> (instead of to the local time zone)*. Legacy components that are unaware of
> the new metadata can read the files without any problem and the timestamps
> will show the historical Instant behaviour to them.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
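The write/read scheme in the Solution section of the issue description can be sketched with plain java.time types. This is only an illustration of the idea, not the code in the attached patches: the class name, the method names and the metadata key "writer.time.zone" are all hypothetical, and an ordinary Map stands in for the Parquet key-value file metadata.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: these are not Hive's actual classes or metadata keys.
final class TimestampRoundTripSketch {

    // Hypothetical key under which the writer's zone is stored in the
    // file's arbitrary key-value metadata.
    static final String WRITER_ZONE_KEY = "writer.time.zone";

    /** Writer: normalize the local timestamp to UTC and record the session zone. */
    static Instant write(LocalDateTime ts, ZoneId sessionZone, Map<String, String> fileMetadata) {
        fileMetadata.put(WRITER_ZONE_KEY, sessionZone.getId());
        // atZone resolves DST gaps and overlaps using java.time's default rules.
        return ts.atZone(sessionZone).toInstant();
    }

    /** Metadata-aware reader: convert from UTC to the saved zone, not the reader's own zone. */
    static LocalDateTime readZoneAware(Instant stored, Map<String, String> fileMetadata) {
        ZoneId savedZone = ZoneId.of(fileMetadata.get(WRITER_ZONE_KEY));
        return LocalDateTime.ofInstant(stored, savedZone);
    }

    /** Legacy reader: ignores the metadata and converts from UTC to its own local zone. */
    static LocalDateTime readLegacy(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    public static void main(String[] args) {
        Map<String, String> fileMetadata = new HashMap<>();
        LocalDateTime original = LocalDateTime.of(2019, 1, 15, 10, 0);

        Instant stored = write(original, ZoneId.of("Europe/Budapest"), fileMetadata);

        System.out.println("stored (UTC):      " + stored);
        System.out.println("zone-aware reader: " + readZoneAware(stored, fileMetadata));
        System.out.println("legacy reader:     " + readLegacy(stored, ZoneId.of("America/Los_Angeles")));
    }
}
```

In this sketch the metadata-aware reader gets back the wall-clock value that was written, whatever zone the reader itself runs in (_LocalDateTime_ semantics), while the legacy reader's result shifts with its own zone (_Instant_ semantics).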
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802892#comment-16802892 ]

Jesus Camacho Rodriguez commented on HIVE-21290:
------------------------------------------------

I had not pushed new binary files... I have pushed an addendum in a new commit.

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch,
> HIVE-21290.4.patch, HIVE-21290.5.patch
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802101#comment-16802101 ]

Jesus Camacho Rodriguez commented on HIVE-21290:
------------------------------------------------

+1

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch,
> HIVE-21290.4.patch, HIVE-21290.5.patch
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801623#comment-16801623 ]

Hive QA commented on HIVE-21290:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 4s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 38s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 28s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} common: The patch generated 0 new + 3 unchanged - 2 fixed = 3 total (was 5) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 45s{color} | {color:red} ql: The patch generated 22 new + 195 unchanged - 20 fixed = 217 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 2m 10s{color} | {color:red} root: The patch generated 22 new + 198 unchanged - 22 fixed = 220 total (was 220) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 14s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16685/dev-support/hive-personality.sh |
| git revision | master / 80998ad |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus/diff-checkstyle-root.txt |
| modules | C: common ql . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch,
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801622#comment-16801622 ]

Hive QA commented on HIVE-21290:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963705/HIVE-21290.5.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15842 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16685/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16685/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963705 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch,
> HIVE-21290.4.patch, HIVE-21290.5.patch
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801027#comment-16801027 ]

Hive QA commented on HIVE-21290:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 37s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 18s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 51s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} common: The patch generated 0 new + 3 unchanged - 2 fixed = 3 total (was 5) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s{color} | {color:red} ql: The patch generated 25 new + 195 unchanged - 20 fixed = 220 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 2m 10s{color} | {color:red} root: The patch generated 25 new + 198 unchanged - 22 fixed = 223 total (was 220) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 4s{color} | {color:red} ql generated 2 new + 98 unchanged - 2 fixed = 100 total (was 100) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 47s{color} | {color:red} root generated 2 new + 399 unchanged - 2 fixed = 401 total (was 401) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m 20s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16673/dev-support/hive-personality.sh |
| git revision | master / c279634 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-checkstyle-root.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-javadoc-javadoc-ql.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus/diff-javadoc-javadoc-root.txt |
| modules | C: common ql . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16673/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
>
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801024#comment-16801024 ]

Hive QA commented on HIVE-21290:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963644/HIVE-21290.4.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15839 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testJulianDay (batchId=299)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16673/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16673/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16673/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963644 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, HIVE-21290.4.patch
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799004#comment-16799004 ]

Karen Coppage commented on HIVE-21290:
--------------------------------------

Patch 3 checkstyle issues are because these changes follow indentation of existing code.

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798984#comment-16798984 ]

Hive QA commented on HIVE-21290:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 39s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 18s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 0s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 43s{color} | {color:red} ql: The patch generated 23 new + 193 unchanged - 22 fixed = 216 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 2m 3s{color} | {color:red} root: The patch generated 23 new + 194 unchanged - 22 fixed = 217 total (was 216) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 19s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16631/dev-support/hive-personality.sh |
| git revision | master / 2fa22bf |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16631/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16631/yetus/diff-checkstyle-root.txt |
| modules | C: common ql . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16631/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.

> Restore historical way of handling timestamps in Parquet while keeping the
> new semantics at the same time
> --------------------------------------------------------------------------
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch,
> HIVE-21290.2.patch, HIVE-21290.3.patch
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798977#comment-16798977 ]

Hive QA commented on HIVE-21290:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963386/HIVE-21290.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15837 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16631/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16631/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16631/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963386 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, HIVE-21290.2.patch, HIVE-21290.3.patch
>
> This sub-task is for implementing the Parquet-specific parts of the following plan:
>
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this leads to the desired new semantics, it also leads to incorrect results when new Hive versions read timestamps written by old Hive versions or when old Hive versions or any other component not aware of this change (including legacy Impala and Spark versions) read timestamps written by new Hive versions.
>
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. In itself, this would restore the historical _Instant_ semantics, which is undesirable. In order to achieve the desired _LocalDateTime_ semantics in spite of normalizing to UTC, newer Hive versions should record the session-local local time zone in the file metadata fields serving arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or any other new component aware of this extra metadata) can achieve _LocalDateTime_ semantics by *converting from UTC to the saved time zone (instead of to the local time zone)*. Legacy components that are unaware of the new metadata can read the files without any problem and the timestamps will show the historical Instant behaviour to them.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
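The write/read scheme from the quoted Solution section can be illustrated with a small java.time sketch. This is not code from the patch: the class name, the method names and the metadata key `writer.time.zone` are hypothetical, and falling back to the reader's zone when no metadata is present is an assumption made for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

final class TimestampRoundTrip {
    // Hypothetical metadata key, for illustration only.
    static final String WRITER_ZONE_KEY = "writer.time.zone";

    /** Writer: normalize the wall-clock value to UTC and record the writer's zone. */
    static Instant write(LocalDateTime wallClock, ZoneId writerZone,
                         Map<String, String> fileMetadata) {
        fileMetadata.put(WRITER_ZONE_KEY, writerZone.getId());
        return wallClock.atZone(writerZone).toInstant();
    }

    /** Metadata-aware reader: convert from UTC to the saved zone. */
    static LocalDateTime readAware(Instant stored, Map<String, String> fileMetadata,
                                   ZoneId readerZone) {
        String saved = fileMetadata.get(WRITER_ZONE_KEY);
        // Assumption: without metadata, behave like a legacy reader.
        ZoneId zone = (saved != null) ? ZoneId.of(saved) : readerZone;
        return LocalDateTime.ofInstant(stored, zone);
    }

    /** Legacy reader: ignores the metadata and converts to its own local zone. */
    static LocalDateTime readLegacy(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        ZoneId writer = ZoneId.of("Europe/Budapest");
        ZoneId reader = ZoneId.of("America/Los_Angeles");
        Instant stored = write(LocalDateTime.of(2019, 3, 21, 10, 0), writer, metadata);
        System.out.println(readAware(stored, metadata, reader));  // LocalDateTime semantics
        System.out.println(readLegacy(stored, reader));           // Instant semantics
    }
}
```

In this sketch the metadata-aware reader gets back the wall-clock value that was written regardless of its own zone, while the legacy reader sees the same instant shifted into its local zone, which is the historical behaviour the description refers to.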
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798278#comment-16798278 ]

Hive QA commented on HIVE-21290:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963240/HIVE-21290.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 15834 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_analyze] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_external_time] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_stats] (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_0] (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] (batchId=61)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[parquet_vectorization_0] (batchId=118)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16613/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16613/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16613/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12963240 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798277#comment-16798277 ]

Hive QA commented on HIVE-21290:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 38s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 16s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 44s{color} | {color:red} ql: The patch generated 25 new + 193 unchanged - 22 fixed = 218 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 2m 2s{color} | {color:red} root: The patch generated 25 new + 194 unchanged - 22 fixed = 219 total (was 216) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 11s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16613/dev-support/hive-personality.sh |
| git revision | master / 38682a4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16613/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16613/yetus/diff-checkstyle-root.txt |
| modules | C: common ql . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16613/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797798#comment-16797798 ]

Hive QA commented on HIVE-21290:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963135/HIVE-21290.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16602/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16602/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16602/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-21 03:54:32.141
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16602/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-21 03:54:32.145
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 25b14be HIVE-21460: ACID: Load data followed by a select * query results in incorrect results (Vaibhav Gumashta, reviewed by Gopal V)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 25b14be HIVE-21460: ACID: Load data followed by a select * query results in incorrect results (Vaibhav Gumashta, reviewed by Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-21 03:54:33.336
+ rm -rf ../yetus_PreCommit-HIVE-Build-16602
+ mkdir ../yetus_PreCommit-HIVE-Build-16602
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16602
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16602/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: cannot apply binary patch to 'data/files/parquet_historical_timestamp_legacy.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 'data/files/parquet_historical_timestamp_legacy.parq' without full index line
error: data/files/parquet_historical_timestamp_legacy.parq: patch does not apply
error: cannot apply binary patch to 'data/files/parquet_historical_timestamp_new.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 'data/files/parquet_historical_timestamp_new.parq' without full index line
error: data/files/parquet_historical_timestamp_new.parq: patch does not apply
error: src/java/org/apache/hadoop/hive/common/type/Timestamp.java: does not exist in index
error: cannot apply binary patch to 'files/parquet_historical_timestamp_legacy.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 'files/parquet_historical_timestamp_legacy.parq' without full index line
error: files/parquet_historical_timestamp_legacy.parq: patch does not apply
error: cannot apply binary patch to 'files/parquet_historical_timestamp_new.parq' without full index line
Falling back to three-way merge...
error: cannot apply binary patch to 'files/parquet_historical_timestamp_new.parq' without full index line
error: files/parquet_historical_timestamp_new.parq: patch does not apply
error: src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java: does not exist in index
error: src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java: does not exist in index
error: src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java: does not exist in index
error: src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java: does not exist in index
error:
[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time
[ https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797158#comment-16797158 ]

Karen Coppage commented on HIVE-21290:

Patch 1 notes:
* Timestamps are converted from the JVM time zone, not the session ("set time zone...") time zone. This is for backwards compatibility.
* The writer time zone has to be passed through all the vectorized readers so that org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory.TypesFromInt96PageReader#convert can correctly convert int96 to Timestamp.
* Regarding the previous point: it might be a better idea to pass the entire reader metadata (a Map with ~5 elements) instead of extracting skipConversion (boolean) and writerTimezone (ZoneId) and passing these through all those constructors.

Any input is welcome.
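The constructor trade-off raised in the Patch 1 notes can be sketched in Java as follows. This is only an illustration of the two options, not the patch's code: the class names and the metadata keys are hypothetical, and falling back to the JVM default zone when the writer zone is absent is an assumption based on the first note.

```java
import java.time.ZoneId;
import java.util.Map;

final class ReaderMetadataSketch {
    // Hypothetical metadata keys, for illustration only.
    static final String SKIP_CONVERSION_KEY = "skip.conversion";
    static final String WRITER_ZONE_KEY = "writer.time.zone";

    /** Option A: each extracted value is threaded through every constructor. */
    static final class ColumnReaderA {
        final boolean skipConversion;
        final ZoneId writerTimezone;

        ColumnReaderA(boolean skipConversion, ZoneId writerTimezone) {
            this.skipConversion = skipConversion;
            this.writerTimezone = writerTimezone;
        }
    }

    /** Option B: the whole metadata map is passed and values are derived where needed. */
    static final class ColumnReaderB {
        final boolean skipConversion;
        final ZoneId writerTimezone;

        ColumnReaderB(Map<String, String> readerMetadata) {
            // Boolean.parseBoolean(null) is false, so a missing key means "convert".
            this.skipConversion =
                Boolean.parseBoolean(readerMetadata.get(SKIP_CONVERSION_KEY));
            String zone = readerMetadata.get(WRITER_ZONE_KEY);
            // Assumption: fall back to the JVM time zone when no writer zone is recorded.
            this.writerTimezone = (zone != null) ? ZoneId.of(zone) : ZoneId.systemDefault();
        }
    }
}
```

With option B, adding another metadata-derived setting later would not change any constructor signature, at the cost of each reader depending on the key names.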