[jira] [Commented] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221722#comment-16221722 ] Colin Ma commented on HIVE-17381: - [~vihangk1], the patch for branch-2 is uploaded, please help to merge this, thanks. > When we enable Parquet Writer Version V2, hive throws an exception: > Unsupported encoding: DELTA_BYTE_ARRAY. > --- > > Key: HIVE-17381 > URL: https://issues.apache.org/jira/browse/HIVE-17381 > Project: Hive > Issue Type: Sub-task >Reporter: Ke Jia >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-17381-branch-2.patch, HIVE-17381.001.patch > > > when we set "hive.vectorized.execution.enabled=true" and > "parquet.writer.version=v2" simultaneously, hive throws the following > exception: > Caused by: java.io.IOException: java.io.IOException: > java.lang.UnsupportedOperationException: Unsupported encoding: > DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.UnsupportedOperationException: > Unsupported encoding: DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) > 
... 16 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
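For context, the failure above hinges on just two settings. The sketch below is a minimal, hypothetical illustration using the plain Hadoop Configuration API (the class name and the API choice are assumptions, not part of the issue): a table written under parquet.writer.version=v2 uses V2 encodings such as DELTA_BYTE_ARRAY, which the vectorized Parquet reader of this era cannot decode.
{code}
import org.apache.hadoop.conf.Configuration;

public class DeltaByteArrayRepro {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The two settings the issue description names as the failing combination:
    conf.setBoolean("hive.vectorized.execution.enabled", true); // vectorized read path
    conf.set("parquet.writer.version", "v2");                   // Parquet V2 encodings
    // Data written with the V2 writer uses encodings such as DELTA_BYTE_ARRAY;
    // scanning it through the vectorized reader then raises
    // java.lang.UnsupportedOperationException: Unsupported encoding.
  }
}
{code}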
[jira] [Updated] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Ma updated HIVE-17381: Attachment: HIVE-17381-branch-2.patch > When we enable Parquet Writer Version V2, hive throws an exception: > Unsupported encoding: DELTA_BYTE_ARRAY. > --- > > Key: HIVE-17381 > URL: https://issues.apache.org/jira/browse/HIVE-17381 > Project: Hive > Issue Type: Sub-task >Reporter: Ke Jia >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-17381-branch-2.patch, HIVE-17381.001.patch > > > when we set "hive.vectorized.execution.enabled=true" and > "parquet.writer.version=v2" simultaneously, hive throws the following > exception: > Caused by: java.io.IOException: java.io.IOException: > java.lang.UnsupportedOperationException: Unsupported encoding: > DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.UnsupportedOperationException: > Unsupported encoding: DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) > ... 16 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17743) Add InterfaceAudience and InterfaceStability annotations for Thrift generated APIs
[ https://issues.apache.org/jira/browse/HIVE-17743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221717#comment-16221717 ] Sahil Takiar commented on HIVE-17743: - Thanks for taking care of this [~kgyrtkirk]. I could have sworn I built hive before pushing the commit, regardless I'll be sure to use that git command in the future. Thanks for the tip! > Add InterfaceAudience and InterfaceStability annotations for Thrift generated > APIs > -- > > Key: HIVE-17743 > URL: https://issues.apache.org/jira/browse/HIVE-17743 > Project: Hive > Issue Type: Sub-task > Components: Thrift API >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 3.0.0 > > Attachments: HIVE-17743.1.patch, HIVE-17743.2.patch > > > The Thrift generated files don't have {{InterfaceAudience}} or > {{InterfaceStability}} annotations on them, mainly because all the files are > auto-generated. > We should add some code that auto-tags all the Java Thrift generated files > with these annotations. This way even when they are re-generated, they still > contain the annotations. > We should be able to do this using the > {{com.google.code.maven-replacer-plugin}} similar to what we do in > {{standalone-metastore/pom.xml}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
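The end state the issue describes, with the replacer plugin re-applying annotations after each regeneration, would look roughly like the sketch below (the class shown is illustrative, and whether a given API is tagged Public/Stable is decided per interface):
{code}
import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;

// Illustrative sketch: the maven-replacer-plugin rewrites each generated
// source file so these annotations survive a Thrift regeneration.
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TGetTablesReq {
  // ... Thrift-generated fields, getters and setters ...
}
{code}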
[jira] [Commented] (HIVE-17912) org.apache.hadoop.hive.metastore.security.DBTokenStore - Parameterize Logging
[ https://issues.apache.org/jira/browse/HIVE-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221708#comment-16221708 ] Hive QA commented on HIVE-17912: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894168/HIVE-17912.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11327 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222) org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=229) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7495/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7495/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7495/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12894168 - PreCommit-HIVE-Build > org.apache.hadoop.hive.metastore.security.DBTokenStore - Parameterize Logging > - > > Key: HIVE-17912 > URL: https://issues.apache.org/jira/browse/HIVE-17912 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: HIVE-17912.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
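For readers unfamiliar with the change under test, "parameterize logging" refers to SLF4J's {} placeholders. A small, self-contained sketch of the before/after (names are illustrative, not the actual DBTokenStore code):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ParameterizedLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ParameterizedLoggingSketch.class);

  void demo(String tokenId, Exception e) {
    // Before: the string concatenation is evaluated even when DEBUG is disabled.
    LOG.debug("Failed to find token: " + tokenId);
    // After: the message is only assembled if DEBUG is enabled, and a trailing
    // Throwable argument is logged with its full stack trace.
    LOG.debug("Failed to find token: {}", tokenId);
    LOG.warn("Failure storing token: {}", tokenId, e);
  }
}
{code}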
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221671#comment-16221671 ] Hive QA commented on HIVE-14731: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894040/HIVE-14731.addendum.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7494/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7494/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7494/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-10-27 03:43:34.732 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-7494/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-10-27 03:43:34.734 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 88bd58e HIVE-17764 : alter view fails when hive.metastore.disallow.incompatible.col.type.changes set to true (Janaki Lahorani, reviewed by Andrew Sherman and Vihang Karajgaonkar) (addendum) + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/plan/AlterWMTriggerDesc.java Removing ql/src/java/org/apache/hadoop/hive/ql/plan/CreateWMTriggerDesc.java Removing ql/src/java/org/apache/hadoop/hive/ql/plan/DropWMTriggerDesc.java Removing standalone-metastore/src/gen/org/ Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMAlterTriggerRequest.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMAlterTriggerResponse.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMCreateTriggerRequest.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMCreateTriggerResponse.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMDropTriggerRequest.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMDropTriggerResponse.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMGetTriggersForResourePlanRequest.java Removing standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/WMGetTriggersForResourePlanResponse.java + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. 
+ git reset --hard origin/master HEAD is now at 88bd58e HIVE-17764 : alter view fails when hive.metastore.disallow.incompatible.col.type.changes set to true (Janaki Lahorani, reviewed by Andrew Sherman and Vihang Karajgaonkar) (addendum) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-10-27 03:43:39.284 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: ql/src/test/results/clientpositive/spark/subquery_multi.q.out:234 error: ql/src/test/results/clientpositive/spark/subquery_multi.q.out: patch does not apply The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12894040 - PreCommit-HIVE-Build > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments:
[jira] [Commented] (HIVE-17884) Implement create, alter and drop workload management triggers.
[ https://issues.apache.org/jira/browse/HIVE-17884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221669#comment-16221669 ] Hive QA commented on HIVE-17884: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894107/HIVE-17884.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 11327 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=155) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=101) org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222) org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch (batchId=270) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7493/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7493/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7493/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12894107 - PreCommit-HIVE-Build > Implement create, alter and drop workload management triggers. > -- > > Key: HIVE-17884 > URL: https://issues.apache.org/jira/browse/HIVE-17884 > Project: Hive > Issue Type: Sub-task >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: HIVE-17884.01.patch, HIVE-17884.02.patch > > > Implement triggers for workload management: > The commands to be implemented: > CREATE TRIGGER `resourceplan_name`.`trigger_name` WHEN condition DO action; > condition is a boolean expression: variable operator value types with 'AND' > and 'OR' support. > action is currently: KILL or MOVE TO pool; > ALTER TRIGGER `plan_name`.`trigger_name` WHEN condition DO action; > DROP TRIGGER `plan_name`.`trigger_name`; > Also add WM_TRIGGERS to information schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17911) org.apache.hadoop.hive.metastore.ObjectStore - Tune Up
[ https://issues.apache.org/jira/browse/HIVE-17911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221667#comment-16221667 ] BELUGA BEHR commented on HIVE-17911: Need to investigate the following: {code} 2017-10-26T14:49:56,936 ERROR [main] exec.DDLTask: Failed org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Trying to define foreign key but there are no primary keys or unique keys for referenced table) at org.apache.hadoop.hive.ql.metadata.Hive.addForeignKey(Hive.java:4677) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.DDLTask.addConstraints(DDLTask.java:4360) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:413) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:206) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2276) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1906) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1623) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1362) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1352) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {code} > org.apache.hadoop.hive.metastore.ObjectStore - Tune Up > -- > > Key: HIVE-17911 > URL: https://issues.apache.org/jira/browse/HIVE-17911 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-17911.1.patch > > > # Remove unused variables > # Add logging parameterization > # Use CollectionUtils.isEmpty/isNotEmpty to simplify and unify collection > empty check (and always use null check) > # Minor tweaks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
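On the CollectionUtils point in the description, the change replaces hand-written null-and-empty checks with a single null-safe call. A minimal sketch (method and parameter names are hypothetical):
{code}
import java.util.Collection;
import org.apache.commons.collections.CollectionUtils;

public class EmptyCheckSketch {
  // Before: the null check and the empty check are written out by hand.
  static boolean hasPartitionsOld(Collection<String> parts) {
    return parts != null && !parts.isEmpty();
  }

  // After: one null-safe call expresses the same condition.
  static boolean hasPartitions(Collection<String> parts) {
    return CollectionUtils.isNotEmpty(parts);
  }
}
{code}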
[jira] [Updated] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns
[ https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-17874: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > Parquet vectorization fails on tables with complex columns when there are no > projected columns > -- > > Key: HIVE-17874 > URL: https://issues.apache.org/jira/browse/HIVE-17874 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Fix For: 3.0.0 > > Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch, > HIVE-17874.02.patch, HIVE-17874.03.patch, HIVE-17874.04.patch, > HIVE-17874.05.patch, HIVE-17874.06.patch > > > When a parquet table contains an unsupported type like {{Map}}, {{LIST}} or > {{UNION}} simple queries like {{select count(*) from table}} fails with > {{unsupported type exception}} even though vectorized reader doesn't really > need read the complex type into batches. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
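The description's point is that vectorization should only validate columns the query actually reads. A self-contained sketch of that check (all names are hypothetical; this is not the actual vectorized Parquet reader code):
{code}
public class ProjectionCheckSketch {
  static boolean isComplex(String type) {
    return type.startsWith("map") || type.startsWith("array")
        || type.startsWith("struct") || type.startsWith("uniontype");
  }

  // Only projected columns need a vectorized reader, so an unsupported complex
  // type in an unprojected column must not fail the query.
  static void checkProjectedColumns(String[] columnTypes, boolean[] projected) {
    for (int i = 0; i < columnTypes.length; i++) {
      if (!projected[i]) {
        continue; // e.g. SELECT count(*): nothing is projected, nothing to check
      }
      if (isComplex(columnTypes[i])) {
        throw new UnsupportedOperationException("Unsupported type: " + columnTypes[i]);
      }
    }
  }

  public static void main(String[] args) {
    // A table with a map column passes when the map is not projected.
    checkProjectedColumns(new String[] {"int", "map<string,string>"},
                          new boolean[] {false, false});
    System.out.println("count(*)-style scan: ok");
  }
}
{code}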
[jira] [Updated] (HIVE-17841) implement applying the resource plan
[ https://issues.apache.org/jira/browse/HIVE-17841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17841: Attachment: HIVE-17841.03.patch Addressed the CR feedback. > implement applying the resource plan > > > Key: HIVE-17841 > URL: https://issues.apache.org/jira/browse/HIVE-17841 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17841.01.patch, HIVE-17841.02.patch, > HIVE-17841.03.patch, HIVE-17841.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17888) Display the reason for query cancellation
[ https://issues.apache.org/jira/browse/HIVE-17888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221605#comment-16221605 ] Sergey Shelukhin commented on HIVE-17888: - Nit: is it possible to rename reason in OperationState etc to elaborate what the reason is for? Can be fixed on commit +1 pending tests > Display the reason for query cancellation > - > > Key: HIVE-17888 > URL: https://issues.apache.org/jira/browse/HIVE-17888 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17888.1.patch > > > For user convenience and easy debugging, if a trigger kills a query return > the reason for the killing the query. Currently the query kill will only > display the following which is not very useful > {code} > Error: Query was cancelled (state=01000,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17833) Publish split generation counters
[ https://issues.apache.org/jira/browse/HIVE-17833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221604#comment-16221604 ] Sergey Shelukhin commented on HIVE-17833: - Ok I rescind my +1 > Publish split generation counters > - > > Key: HIVE-17833 > URL: https://issues.apache.org/jira/browse/HIVE-17833 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17833.1.patch, HIVE-17833.2.patch, > HIVE-17833.3.patch > > > With TEZ-3856, tez counters are exposed via input initializers which can be > used to publish split generation counters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17918) NPE during semijoin reduction optimization when LLAP caching disabled
[ https://issues.apache.org/jira/browse/HIVE-17918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221600#comment-16221600 ] Sergey Shelukhin commented on HIVE-17918: - There shouldn't ideally be an overload that has the default value, and if there is one the default for "ignore config" should definitely be false... Other than that makes sense > NPE during semijoin reduction optimization when LLAP caching disabled > - > > Key: HIVE-17918 > URL: https://issues.apache.org/jira/browse/HIVE-17918 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17918.1.patch > > > DynamicValue (used by semijoin reduction optimization) relies on the > ObjectCache. If LLAP cache is disabled then the DynamicValue is broken in > LLAP: > {noformat} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:283) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:237) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:254) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:928) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > ... 
18 more > Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value > for RS_25_household_demographics_hd_demo_sk_min > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:130) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColumnBetweenDynamicValue.evaluate(FilterLongColumnBetweenDynamicValue.java:80) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:39) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:41) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:112) > at > org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:959) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:137) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:828) > ... 19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:61) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:50) > at >
[jira] [Commented] (HIVE-17595) Correct DAG for updating the last.repl.id for a database during bootstrap load
[ https://issues.apache.org/jira/browse/HIVE-17595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221599#comment-16221599 ] Hive QA commented on HIVE-17595: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894113/HIVE-17595.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11323 tests executed *Failed tests:* {noformat} TestHBaseCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=95) [hbase_ppd_key_range.q,hbasestats.q,hbase_custom_key2.q,hbase_viewjoins.q,hbase_pushdown.q] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_convert_join] (batchId=52) org.apache.hadoop.hive.cli.TestHBaseCliDriver.org.apache.hadoop.hive.cli.TestHBaseCliDriver (batchId=94) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_custom_key3] (batchId=94) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_null_first_col] (batchId=94) org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205) org.apache.hadoop.hive.ql.exec.TestUtilities.testGetTasksHaveNoRepeats (batchId=281) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7492/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7492/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7492/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12894113 - PreCommit-HIVE-Build > Correct DAG for updating the last.repl.id for a database during bootstrap load > -- > > Key: HIVE-17595 > URL: https://issues.apache.org/jira/browse/HIVE-17595 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: anishek > Fix For: 3.0.0 > > Attachments: HIVE-17595.0.patch, HIVE-17595.1.patch > > > We update the last.repl.id as a database property. This is done after all the > bootstrap tasks to load the relevant data are done and is the last task to be > run. however we are currently not setting up the DAG correctly for this task. > This is getting added as the root task for now where as it should be the last > task to be run in a DAG. This becomes more important after the inclusion of > HIVE-17426 since this will lead to parallel execution and incorrect DAG's > will lead to incorrect results/state of the system. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
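The DAG-ordering point in the description can be made concrete with a toy model (this is not Hive's Task API; all names are illustrative): registering the last.repl.id update as a root task lets it run before, or concurrently with, the load tasks, whereas chaining it after the final load task guarantees it runs last.
{code}
import java.util.ArrayList;
import java.util.List;

class BootstrapTask {
  final String name;
  final List<BootstrapTask> children = new ArrayList<>();
  BootstrapTask(String name) { this.name = name; }
  BootstrapTask then(BootstrapTask child) { children.add(child); return child; }
}

public class ReplDagSketch {
  public static void main(String[] args) {
    BootstrapTask loadTables = new BootstrapTask("load-tables");
    BootstrapTask loadFunctions = new BootstrapTask("load-functions");
    BootstrapTask updateReplId = new BootstrapTask("update-last.repl.id");
    // Correct DAG: the property update is a descendant of the final load task,
    // so parallel execution (HIVE-17426) can never run it too early.
    loadTables.then(loadFunctions).then(updateReplId);
  }
}
{code}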
[jira] [Commented] (HIVE-12408) SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions
[ https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221591#comment-16221591 ] Akira Ajisaka commented on HIVE-12408: -- Thank you for the information! > SQLStdAuthorizer should not require external table creator to be owner of > directory, in addition to rw permissions > -- > > Key: HIVE-12408 > URL: https://issues.apache.org/jira/browse/HIVE-12408 > Project: Hive > Issue Type: Bug > Components: Authorization, Security, SQLStandardAuthorization >Affects Versions: 0.14.0 > Environment: HDP 2.2 + Kerberos >Reporter: Hari Sekhon >Assignee: Akira Ajisaka >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch > > > When trying to create an external table via beeline in Hive using the > SQLStdAuthorizer it expects the table creator to be the owner of the > directory path and ignores the group rwx permission that is granted to the > user. > {code}Error: Error while compiling statement: FAILED: > HiveAccessControlException Permission denied: Principal [name=hari, > type=USER] does not have following privileges for operation CREATETABLE > [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, > name=/etl/path/to/hdfs/dir]] (state=42000,code=4){code} > All it should be checking is read access to that directory. > The directory owner requirement breaks the ability of more than one user to > create external table definitions to a given location. For example this is a > flume landing directory with json data, and the /etl tree is owned by the > flume user. Even chowning the tree to another user would still break access > to other users who are able to read the directory in hdfs but would still be > unable to create external tables on top of it. > This looks like a remnant of the owner only access model in SQLStdAuth and is > a separate issue to HIVE-11864 / HIVE-12324. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17918) NPE during semijoin reduction optimization when LLAP caching disabled
[ https://issues.apache.org/jira/browse/HIVE-17918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17918: -- Attachment: HIVE-17918.1.patch Patch to add a new method to ObjectCacheFactory which returns the LLAP cache even if the cache is disabled. [~sershe] let me know if this is ok. > NPE during semijoin reduction optimization when LLAP caching disabled > - > > Key: HIVE-17918 > URL: https://issues.apache.org/jira/browse/HIVE-17918 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-17918.1.patch > > > DynamicValue (used by semijoin reduction optimization) relies on the > ObjectCache. If LLAP cache is disabled then the DynamicValue is broken in > LLAP: > {noformat} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:283) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:237) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:254) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:928) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > ... 
18 more > Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value > for RS_25_household_demographics_hd_demo_sk_min > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:130) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColumnBetweenDynamicValue.evaluate(FilterLongColumnBetweenDynamicValue.java:80) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:39) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:41) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:112) > at > org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:959) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:137) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:828) > ... 19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:61) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:50) > at > org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40) >
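The fix described in the comment above amounts to a three-way decision. A self-contained sketch of that decision logic (all names are illustrative, not the actual ObjectCacheFactory patch):
{code}
public class ObjectCacheChoiceSketch {
  enum CacheKind { LLAP, MR }

  // DynamicValue lookups would pass ignoreConfiguredDisable=true so they still
  // get the LLAP object cache when LLAP caching is switched off; other callers
  // keep honoring the configuration.
  static CacheKind chooseCache(boolean inLlap, boolean cacheEnabled,
                               boolean ignoreConfiguredDisable) {
    if (inLlap && (cacheEnabled || ignoreConfiguredDisable)) {
      return CacheKind.LLAP;
    }
    return CacheKind.MR; // using this inside an LLAP daemon caused the NPE above
  }

  public static void main(String[] args) {
    System.out.println(chooseCache(true, false, true));  // LLAP
    System.out.println(chooseCache(true, false, false)); // MR
  }
}
{code}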
[jira] [Assigned] (HIVE-17918) NPE during semijoin reduction optimization when LLAP caching disabled
[ https://issues.apache.org/jira/browse/HIVE-17918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere reassigned HIVE-17918: - > NPE during semijoin reduction optimization when LLAP caching disabled > - > > Key: HIVE-17918 > URL: https://issues.apache.org/jira/browse/HIVE-17918 > Project: Hive > Issue Type: Bug >Reporter: Jason Dere >Assignee: Jason Dere > > DynamicValue (used by semijoin reduction optimization) relies on the > ObjectCache. If LLAP cache is disabled then the DynamicValue is broken in > LLAP: > {noformat} > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:283) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:237) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:254) > ... 15 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:928) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > ... 
18 more > Caused by: java.lang.IllegalStateException: Failed to retrieve dynamic value > for RS_25_household_demographics_hd_demo_sk_min > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:130) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterLongColumnBetweenDynamicValue.evaluate(FilterLongColumnBetweenDynamicValue.java:80) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:39) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.FilterExprAndExpr.evaluate(FilterExprAndExpr.java:41) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:112) > at > org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:959) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:907) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:137) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:828) > ... 19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:61) > at > org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:50) > at > org.apache.hadoop.hive.ql.exec.ObjectCacheWrapper.retrieve(ObjectCacheWrapper.java:40) > at > org.apache.hadoop.hive.ql.plan.DynamicValue.getValue(DynamicValue.java:123) > ... 27 more > Caused by: java.lang.NullPointerException > at >
[jira] [Commented] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221581#comment-16221581 ] Colin Ma commented on HIVE-17381: - hi, [~vihangk1], got it, I'll update the patch for branch-2 soon. > When we enable Parquet Writer Version V2, hive throws an exception: > Unsupported encoding: DELTA_BYTE_ARRAY. > --- > > Key: HIVE-17381 > URL: https://issues.apache.org/jira/browse/HIVE-17381 > Project: Hive > Issue Type: Sub-task >Reporter: Ke Jia >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-17381.001.patch > > > when we set "hive.vectorized.execution.enabled=true" and > "parquet.writer.version=v2" simultaneously, hive throws the following > exception: > Caused by: java.io.IOException: java.io.IOException: > java.lang.UnsupportedOperationException: Unsupported encoding: > DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.UnsupportedOperationException: > Unsupported encoding: DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) > ... 
16 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221577#comment-16221577 ] Vihang Karajgaonkar commented on HIVE-17381: Hi [~colinma] Can you provide a patch for branch-2? I think there were some conflicts when I tried to port this to branch-2. > When we enable Parquet Writer Version V2, hive throws an exception: > Unsupported encoding: DELTA_BYTE_ARRAY. > --- > > Key: HIVE-17381 > URL: https://issues.apache.org/jira/browse/HIVE-17381 > Project: Hive > Issue Type: Sub-task >Reporter: Ke Jia >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-17381.001.patch > > > when we set "hive.vectorized.execution.enabled=true" and > "parquet.writer.version=v2" simultaneously, hive throws the following > exception: > Caused by: java.io.IOException: java.io.IOException: > java.lang.UnsupportedOperationException: Unsupported encoding: > DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.UnsupportedOperationException: > Unsupported encoding: DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) > ... 16 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17433: Attachment: HIVE-17433.08.patch > Vectorization: Support Decimal64 in Hive Query Engine > - > > Key: HIVE-17433 > URL: https://issues.apache.org/jira/browse/HIVE-17433 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, > HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch, > HIVE-17433.08.patch > > > Provide partial support for Decimal64 within Hive. By partial I mean that > our current decimal has a large surface area of features (rounding, multiply, > divide, remainder, power, big precision, and many more) but only a small > number has been identified as being performance hotspots. > Those are small precision decimals with precision <= 18 that fit within a > 64-bit long we are calling Decimal64 . Just as we optimize row-mode > execution engine hotspots by selectively adding new vectorization code, we > can treat the current decimal as the full featured one and add additional > Decimal64 optimization where query benchmarks really show it help. > This change creates a Decimal64ColumnVector. > This change currently detects small decimal with Hive for Vectorized text > input format and uses some new Decimal64 vectorized classes for comparison, > addition, and later perhaps a few GroupBy aggregations like sum, avg, min, > max. > The patch also supports a new annotation that can mark a > VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64). So, > in separate work those other formats such as ORC, PARQUET, etc can be done in > later JIRAs so they participate in the Decimal64 performance optimization. > The idea is when you annotate your input format with: > @VectorizedInputFormatSupports(supports = {DECIMAL_64}) > the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of > DecimalColumnVector. Upon an input format seeing Decimal64ColumnVector being > used, the input format can fill that column vector with decimal64 longs > instead of HiveDecimalWritable objects of DecimalColumnVector. > There will be a Hive environment variable > hive.vectorized.input.format.supports.enabled that has a string list of > supported features. The default will start as "decimal_64". It can be > turned off to allow for performance comparisons and testing. > The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY > key, value > Will have a vectorized explain plan looking like: > ... > Filter Operator > Filter Vectorization: > className: VectorFilterOperator > native: true > predicateExpression: > FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children: > Decimal64ColSubtractDecimal64Scalar(col 0, val 1000, > outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean > predicate: ((key - 100) < 200) (type: boolean) > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17433: Attachment: (was: HIVE-17433.08.patch) > Vectorization: Support Decimal64 in Hive Query Engine > - > > Key: HIVE-17433 > URL: https://issues.apache.org/jira/browse/HIVE-17433 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, > HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch, > HIVE-17433.08.patch > > > Provide partial support for Decimal64 within Hive. By partial I mean that > our current decimal has a large surface area of features (rounding, multiply, > divide, remainder, power, big precision, and many more) but only a small > number has been identified as being performance hotspots. > Those are small precision decimals with precision <= 18 that fit within a > 64-bit long we are calling Decimal64 . Just as we optimize row-mode > execution engine hotspots by selectively adding new vectorization code, we > can treat the current decimal as the full featured one and add additional > Decimal64 optimization where query benchmarks really show it help. > This change creates a Decimal64ColumnVector. > This change currently detects small decimal with Hive for Vectorized text > input format and uses some new Decimal64 vectorized classes for comparison, > addition, and later perhaps a few GroupBy aggregations like sum, avg, min, > max. > The patch also supports a new annotation that can mark a > VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64). So, > in separate work those other formats such as ORC, PARQUET, etc can be done in > later JIRAs so they participate in the Decimal64 performance optimization. > The idea is when you annotate your input format with: > @VectorizedInputFormatSupports(supports = {DECIMAL_64}) > the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of > DecimalColumnVector. Upon an input format seeing Decimal64ColumnVector being > used, the input format can fill that column vector with decimal64 longs > instead of HiveDecimalWritable objects of DecimalColumnVector. > There will be a Hive environment variable > hive.vectorized.input.format.supports.enabled that has a string list of > supported features. The default will start as "decimal_64". It can be > turned off to allow for performance comparisons and testing. > The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY > key, value > Will have a vectorized explain plan looking like: > ... > Filter Operator > Filter Vectorization: > className: VectorFilterOperator > native: true > predicateExpression: > FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children: > Decimal64ColSubtractDecimal64Scalar(col 0, val 1000, > outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean > predicate: ((key - 100) < 200) (type: boolean) > ... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
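The representation the description relies on is simply a decimal's unscaled value held in a long, which works for any precision <= 18. A self-contained illustration using plain BigDecimal (not Hive's actual Decimal64ColumnVector code):
{code}
import java.math.BigDecimal;

public class Decimal64Sketch {
  public static void main(String[] args) {
    // A decimal(8,5) value fits in a signed 64-bit long as its unscaled value;
    // the column's fixed scale is carried separately in metadata.
    BigDecimal value = new BigDecimal("123.45678");
    long decimal64 = value.unscaledValue().longValueExact();    // 12345678
    System.out.println(BigDecimal.valueOf(decimal64, 5));       // 123.45678
    // Same-scale arithmetic becomes plain long arithmetic, which is what the
    // Decimal64 vectorized expressions exploit.
    long one = new BigDecimal("1.00000").unscaledValue().longValueExact();
    System.out.println(BigDecimal.valueOf(decimal64 + one, 5)); // 124.45678
  }
}
{code}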
[jira] [Updated] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17458: -- Attachment: HIVE-17458.10.patch > VectorizedOrcAcidRowBatchReader doesn't handle 'original' files > --- > > Key: HIVE-17458 > URL: https://issues.apache.org/jira/browse/HIVE-17458 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-17458.01.patch, HIVE-17458.02.patch, > HIVE-17458.03.patch, HIVE-17458.04.patch, HIVE-17458.05.patch, > HIVE-17458.06.patch, HIVE-17458.07.patch, HIVE-17458.07.patch, > HIVE-17458.08.patch, HIVE-17458.09.patch, HIVE-17458.10.patch > > > VectorizedOrcAcidRowBatchReader will not be used for original files. This > will likely look like a perf regression when converting a table from non-acid > to acid until it runs through a major compaction. > With Load Data support, if large files are added via Load Data, the read ops > will not vectorize until major compaction. > There is no reason why this should be the case. Just like > OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other > files in the logical tranche/bucket and calculate the offset for the RowBatch > of the split. (Presumably getRecordReader().getRowNumber() works the same in > vector mode). > In this case we don't even need OrcSplit.isOriginal() - the reader can infer > it from file path... which in particular simplifies > OrcInputFormat.determineSplitStrategies() -- This message was sent by Atlassian JIRA (v6.4.14#64029)
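For illustration, a sketch of the offset calculation described above, under the assumption that files in a logical tranche/bucket are visited in sorted order and that per-file row counts are available (e.g., from the ORC footer); the class and helper wiring are hypothetical, not the patch's code, analogous to what OrcRawRecordMerger computes.
{code}
import java.util.List;
import java.util.function.ToLongFunction;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class OriginalFileOffsetSketch {
  // Starting synthetic row offset for a split over an 'original' file:
  // the sum of the row counts of the files that precede it in the bucket.
  static long rowOffsetOf(List<FileStatus> bucketFiles,   // assumed sorted by name
                          Path splitFile,
                          ToLongFunction<Path> rowCountOf /* e.g. ORC footer */) {
    long offset = 0;
    for (FileStatus f : bucketFiles) {
      if (f.getPath().equals(splitFile)) {
        return offset;                       // rows in files before this one
      }
      offset += rowCountOf.applyAsLong(f.getPath());
    }
    throw new IllegalArgumentException("Split file not found in bucket: " + splitFile);
  }
}
{code}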
[jira] [Commented] (HIVE-17901) org.apache.hadoop.hive.ql.exec.Utilities - Use Logging Parameterization and More
[ https://issues.apache.org/jira/browse/HIVE-17901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221557#comment-16221557 ] Hive QA commented on HIVE-17901: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894004/HIVE-17901.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11327 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=172) org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222) org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=229) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7491/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7491/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7491/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12894004 - PreCommit-HIVE-Build > org.apache.hadoop.hive.ql.exec.Utilities - Use Logging Parameterization and > More > > > Key: HIVE-17901 > URL: https://issues.apache.org/jira/browse/HIVE-17901 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-17901.1.patch > > > {{org.apache.hadoop.hive.ql.exec.Utilities}} > # Remove unused imports > # Remove unused variables > # Modify logging to use logging parameterization > # Other small tweaks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
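The logging-parameterization item refers to the SLF4J placeholder style, where the message is only formatted when the log level is enabled, avoiding eager string concatenation on hot paths. A minimal before/after sketch (the class and method shown are illustrative, not taken from the patch):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingExample {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingExample.class);

  void report(String path, long rows) {
    // Before: the concatenation runs even when INFO is disabled.
    // LOG.info("Read " + rows + " rows from " + path);

    // After: {} placeholders defer formatting until the level check passes.
    LOG.info("Read {} rows from {}", rows, path);
  }
}
{code}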
[jira] [Commented] (HIVE-17833) Publish split generation counters
[ https://issues.apache.org/jira/browse/HIVE-17833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221555#comment-16221555 ] Prasanth Jayachandran commented on HIVE-17833: -- [~sershe] Commented in wrong jira? Can you please transfer your vote to HIVE-17888 :) > Publish split generation counters > - > > Key: HIVE-17833 > URL: https://issues.apache.org/jira/browse/HIVE-17833 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17833.1.patch, HIVE-17833.2.patch, > HIVE-17833.3.patch > > > With TEZ-3856, tez counters are exposed via input initializers which can be > used to publish split generation counters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
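For context, a hedged sketch of counter accumulation with Tez's counter classes; the counter group and names are invented here, and the hook by which an input initializer reports them back to the AM (the part added by TEZ-3856) is assumed rather than shown.
{code}
import org.apache.tez.common.counters.TezCounters;

public class SplitCounterSketch {
  // Split generation accumulates its metrics into a TezCounters instance
  // that the input initializer can then publish with the DAG counters.
  static TezCounters countSplits(int numSplits, long serializedSplitsSize) {
    TezCounters counters = new TezCounters();
    counters.findCounter("HiveSplitGeneration", "NUM_SPLITS").increment(numSplits);
    counters.findCounter("HiveSplitGeneration", "SERIALIZED_SPLITS_SIZE")
        .increment(serializedSplitsSize);
    return counters;   // reporting call back to the AM is assumed, see TEZ-3856
  }
}
{code}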
[jira] [Commented] (HIVE-17902) add a notions of default pool and unmanaged mapping part 1
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221554#comment-16221554 ] Prasanth Jayachandran commented on HIVE-17902: -- minor comment about default plan name and nullable. Looks good otherwise. > add a notions of default pool and unmanaged mapping part 1 > -- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17833) Publish split generation counters
[ https://issues.apache.org/jira/browse/HIVE-17833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221546#comment-16221546 ] Sergey Shelukhin commented on HIVE-17833: - Nit: is it possible to rename reason in OperationState etc. to elaborate what the reason is for? Can be fixed on commit. +1 pending tests > Publish split generation counters > - > > Key: HIVE-17833 > URL: https://issues.apache.org/jira/browse/HIVE-17833 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17833.1.patch, HIVE-17833.2.patch, > HIVE-17833.3.patch > > > With TEZ-3856, tez counters are exposed via input initializers which can be > used to publish split generation counters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17766) Support non-equi LEFT SEMI JOIN
[ https://issues.apache.org/jira/browse/HIVE-17766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17766: --- Attachment: HIVE-17766.01.patch > Support non-equi LEFT SEMI JOIN > --- > > Key: HIVE-17766 > URL: https://issues.apache.org/jira/browse/HIVE-17766 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17766.01.patch, HIVE-17766.patch > > > Currently we get an error like {noformat}Non equality condition not supported > in Semi-Join{noformat} > This is required to generate better plans for EXISTS/IN correlated subqueries, > where such queries are transformed into LEFT SEMI JOIN. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17902) add a notions of default pool and unmanaged mapping part 1
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221529#comment-16221529 ] Sergey Shelukhin commented on HIVE-17902: - [~prasanth_j] [~harishjp] can you take a look? thanks > add a notions of default pool and unmanaged mapping part 1 > -- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221531#comment-16221531 ] Colin Ma commented on HIVE-17381: - Yes, [~vihangk1], please help merge this to branch-2 as well, thanks. > When we enable Parquet Writer Version V2, hive throws an exception: > Unsupported encoding: DELTA_BYTE_ARRAY. > --- > > Key: HIVE-17381 > URL: https://issues.apache.org/jira/browse/HIVE-17381 > Project: Hive > Issue Type: Sub-task >Reporter: Ke Jia >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-17381.001.patch > > > when we set "hive.vectorized.execution.enabled=true" and > "parquet.writer.version=v2" simultaneously, hive throws the following > exception: > Caused by: java.io.IOException: java.io.IOException: > java.lang.UnsupportedOperationException: Unsupported encoding: > DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.UnsupportedOperationException: > Unsupported encoding: DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) > 
16 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17902) add a notions of default pool and unmanaged mapping part 1
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17902: Status: Patch Available (was: Open) > add a notions of default pool and unmanaged mapping part 1 > -- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17902) add a notions of default pool and unmanaged mapping part 1
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17902: Attachment: (was: HIVE-17902.nogen.patch) > add a notions of default pool and unmanaged mapping part 1 > -- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17902) add a notions of default pool and unmanaged mapping part 1
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221527#comment-16221527 ] Sergey Shelukhin commented on HIVE-17902: - I'm also modifying the upgrade scripts in place since they were never released. > add a notions of default pool and unmanaged mapping part 1 > -- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.nogen.patch, HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17902) add a notions of default pool and unmanaged mapping part 1
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17902: Summary: add a notions of default pool and unmanaged mapping part 1 (was: add a notions of default pool and unmanaged mapping) > add a notions of default pool and unmanaged mapping part 1 > -- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.nogen.patch, HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info
[ https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221525#comment-16221525 ] Colin Ma commented on HIVE-16672: - [~vihangk1], thanks for merging it to branch-2. > Parquet vectorization doesn't work for tables with partition info > - > > Key: HIVE-16672 > URL: https://issues.apache.org/jira/browse/HIVE-16672 > Project: Hive > Issue Type: Sub-task >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16672-branch2.3.patch, HIVE-16672.001.patch, > HIVE-16672.002.patch > > > VectorizedParquetRecordReader doesn't check and update partition cols; this > should be fixed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
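For illustration, a sketch of what updating a partition column could look like in a vectorized reader: a partition column holds one value for the entire split, so the vector can be marked repeating and a single slot filled instead of leaving the column unset. The method is hypothetical, not the patch's code.
{code}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;

public class PartitionColumnSketch {
  static void setPartitionColumn(LongColumnVector col, long partitionValue) {
    col.isRepeating = true;    // every row in the batch reads vector[0]
    col.noNulls = true;
    col.isNull[0] = false;
    col.vector[0] = partitionValue;
  }
}
{code}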
[jira] [Updated] (HIVE-17902) add a notions of default pool and unmanaged mapping
[ https://issues.apache.org/jira/browse/HIVE-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17902: Attachment: HIVE-17902.patch HIVE-17902.nogen.patch The patch. One cannot create pools or mappings right now, so the set-default-pool command won't work and the mappings command was not updated; both will be done with/after the addition of those features. > add a notions of default pool and unmanaged mapping > --- > > Key: HIVE-17902 > URL: https://issues.apache.org/jira/browse/HIVE-17902 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17902.nogen.patch, HIVE-17902.patch > > > This is needed to map queries between WM and non-WM execution -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak
[ https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221520#comment-16221520 ] Colin Ma commented on HIVE-16765: - [~vihangk1], thanks for merging it to branch-2. > ParquetFileReader should be closed to avoid resource leak > - > > Key: HIVE-16765 > URL: https://issues.apache.org/jira/browse/HIVE-16765 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.3.0, 3.0.0 >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16765-branch-2.3.patch, HIVE-16765.001.patch > > > ParquetFileReader should be closed to avoid resource leak -- This message was sent by Atlassian JIRA (v6.4.14#64029)
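Since ParquetFileReader implements Closeable, the straightforward pattern for this kind of fix is try-with-resources, which releases the underlying file stream even on error paths. A minimal sketch (the open() variant shown is an assumption; exact signatures vary by Parquet version):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;

public class CloseReaderSketch {
  static void readFooter(Configuration conf, Path file) throws java.io.IOException {
    try (ParquetFileReader reader = ParquetFileReader.open(conf, file)) {
      System.out.println(reader.getFooter().getBlocks().size() + " row groups");
    } // reader.close() runs here, on both success and exception
  }
}
{code}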
[jira] [Commented] (HIVE-12408) SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions
[ https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221517#comment-16221517 ] Thejas M Nair commented on HIVE-12408: -- [~ajisakaa] Please find instructions to request access to edit the wiki here - https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit cc [~leftylev] > SQLStdAuthorizer should not require external table creator to be owner of > directory, in addition to rw permissions > -- > > Key: HIVE-12408 > URL: https://issues.apache.org/jira/browse/HIVE-12408 > Project: Hive > Issue Type: Bug > Components: Authorization, Security, SQLStandardAuthorization >Affects Versions: 0.14.0 > Environment: HDP 2.2 + Kerberos >Reporter: Hari Sekhon >Assignee: Akira Ajisaka >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch > > > When trying to create an external table via beeline in Hive using the > SQLStdAuthorizer it expects the table creator to be the owner of the > directory path and ignores the group rwx permission that is granted to the > user. > {code}Error: Error while compiling statement: FAILED: > HiveAccessControlException Permission denied: Principal [name=hari, > type=USER] does not have following privileges for operation CREATETABLE > [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, > name=/etl/path/to/hdfs/dir]] (state=42000,code=4){code} > All it should be checking is read access to that directory. > The directory owner requirement breaks the ability of more than one user to > create external table definitions to a given location. For example, this is a > flume landing directory with json data, and the /etl tree is owned by the > flume user. Even chowning the tree to another user would still break access > to other users who are able to read the directory in hdfs but would still be > unable to create external tables on top of it. > This looks like a remnant of the owner-only access model in SQLStdAuth and is > a separate issue to HIVE-11864 / HIVE-12324. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
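For illustration, a hedged sketch of the check the report argues for, using Hadoop's FileSystem.access() to test rw permission instead of requiring directory ownership, so group/other permissions are honored; this is not the authorizer's actual code.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class ExternalTableLocationCheck {
  static boolean hasReadWriteAccess(FileSystem fs, Path dir) {
    try {
      // Throws AccessControlException (an IOException) when access is denied.
      fs.access(dir, FsAction.READ_WRITE);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
{code}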
[jira] [Commented] (HIVE-17834) Fix flaky triggers test
[ https://issues.apache.org/jira/browse/HIVE-17834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221512#comment-16221512 ] Sergey Shelukhin commented on HIVE-17834: - +1 > Fix flaky triggers test > --- > > Key: HIVE-17834 > URL: https://issues.apache.org/jira/browse/HIVE-17834 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17834.1.patch, HIVE-17834.2.patch, > HIVE-17834.3.patch > > > https://issues.apache.org/jira/browse/HIVE-12631?focusedCommentId=16209803=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16209803 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-12745) Hive Timestamp value change after joining two tables
[ https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221513#comment-16221513 ] Jesus Camacho Rodriguez commented on HIVE-12745: [~srajat], [~397090770], have you been able to reproduce this consistently? I am trying to reproduce it in another environment, but so far no luck. > Hive Timestamp value change after joining two tables > > > Key: HIVE-12745 > URL: https://issues.apache.org/jira/browse/HIVE-12745 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: wyp >Assignee: Dmitry Tolpeko >Priority: Critical > Labels: timestamp > > I have two Hive tables: test and test1: > {code} > CREATE TABLE `test`( `t` timestamp) > CREATE TABLE `test1`( `t` timestamp) > {code} > each holds a t value of Timestamp data type; the contents of the two > tables are as follows: > {code} > hive> select * from test1; > OK > 1970-01-01 00:00:00 > 1970-03-02 00:00:00 > Time taken: 0.091 seconds, Fetched: 2 row(s) > hive> select * from test; > OK > 1970-01-01 00:00:00 > 1970-01-02 00:00:00 > Time taken: 0.085 seconds, Fetched: 2 row(s) > {code} > However, when joining these two tables, the returned timestamp values change: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1969-12-31 23:00:00 1970-01-01 00:00:00 > 1970-01-01 23:00:00 1970-01-01 00:00:00 > 1969-12-31 23:00:00 1970-03-02 00:00:00 > 1970-01-01 23:00:00 1970-03-02 00:00:00 > Time taken: 54.347 seconds, Fetched: 4 row(s) > {code} > and the result changes every time: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1970-01-01 00:00:00 1970-01-01 00:00:00 > 1970-01-02 00:00:00 1970-01-01 00:00:00 > 1970-01-01 00:00:00 1970-03-02 00:00:00 > 1970-01-02 00:00:00 1970-03-02 00:00:00 > Time taken: 26.308 seconds, Fetched: 4 row(s) > {code} > Any suggestion? Thanks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-12192) Hive should carry out timestamp computations in UTC
[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221507#comment-16221507 ] Jesus Camacho Rodriguez edited comment on HIVE-12192 at 10/27/17 12:28 AM: --- Very much wip but I had been working on this so I attach a draft to see what ptest gives. The idea would be to store the timestamp in ORC as UTC too, but a couple of constructors for the reader/writer of timestamp values need to be extended so we can specify the timezone ourselves instead of taking the system timezone automatically. was (Author: jcamachorodriguez): Very much wip but I had been working on this so I attach a draft to see what ptest gives. The idea would be to store the timestamp in ORC as UTC too, but a couple of constructors for the reader/writer of timestamp values need to be extended so we can specify the timezone ourselves instead of taking the system timezone. > Hive should carry out timestamp computations in UTC > --- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue >Assignee: Jesus Camacho Rodriguez > Labels: timestamp > Attachments: HIVE-12192.patch > > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
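The DST gap in the HIVE-12192 description is easy to reproduce standalone; the following runnable snippet shows java.sql.Timestamp silently shifting a nonexistent local time, matching the example in the issue:
{code}
import java.sql.Timestamp;
import java.util.TimeZone;

public class DstGapDemo {
  public static void main(String[] args) {
    // In America/Los_Angeles, 2015-03-08 02:10 does not exist:
    // clocks jump from 02:00 PST straight to 03:00 PDT.
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
    Timestamp ts = Timestamp.valueOf("2015-03-08 02:10:00.101");
    System.out.println(ts);  // prints 2015-03-08 03:10:00.101
  }
}
{code}
Anchoring the underlying representation to UTC, as the patch proposes, removes this dependence on the session zone because UTC has no DST transitions.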
[jira] [Updated] (HIVE-12192) Hive should carry out timestamp computations in UTC
[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12192: --- Attachment: HIVE-12192.patch > Hive should carry out timestamp computations in UTC > --- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue >Assignee: Jesus Camacho Rodriguez > Labels: timestamp > Attachments: HIVE-12192.patch > > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12192) Hive should carry out timestamp computations in UTC
[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12192: --- Status: Patch Available (was: In Progress) Very much wip but I had been working on this so I attach a draft to see what ptest gives. The idea would be to store the timestamp in ORC as UTC too, but a couple of constructors for the reader/writer of timestamp values need to be extended so we can specify the timezone ourselves instead of taking the system timezone. > Hive should carry out timestamp computations in UTC > --- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue >Assignee: Jesus Camacho Rodriguez > Labels: timestamp > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-12192) Hive should carry out timestamp computations in UTC
[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-12192: -- Assignee: Jesus Camacho Rodriguez > Hive should carry out timestamp computations in UTC > --- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue >Assignee: Jesus Camacho Rodriguez > Labels: timestamp > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-12192) Hive should carry out timestamp computations in UTC
[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-12192 started by Jesus Camacho Rodriguez. -- > Hive should carry out timestamp computations in UTC > --- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue >Assignee: Jesus Camacho Rodriguez > Labels: timestamp > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17874) Parquet vectorization fails on tables with complex columns when there are no projected columns
[ https://issues.apache.org/jira/browse/HIVE-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221491#comment-16221491 ] Vihang Karajgaonkar commented on HIVE-17874: Patch merged to master. > Parquet vectorization fails on tables with complex columns when there are no > projected columns > -- > > Key: HIVE-17874 > URL: https://issues.apache.org/jira/browse/HIVE-17874 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-17874.01-branch-2.patch, HIVE-17874.01.patch, > HIVE-17874.02.patch, HIVE-17874.03.patch, HIVE-17874.04.patch, > HIVE-17874.05.patch, HIVE-17874.06.patch > > > When a parquet table contains an unsupported type like {{Map}}, {{LIST}} or > {{UNION}}, simple queries like {{select count(*) from table}} fail with an > {{unsupported type exception}} even though the vectorized reader doesn't really > need to read the complex types into batches. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
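For illustration, a hypothetical sketch of the fix direction (not the actual patch): restrict the vectorized-type check to the columns a query actually projects, so a query that projects nothing, like select count(*), passes even when the table contains complex types.
{code}
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

public class ProjectionCheckSketch {
  static void checkProjected(List<TypeInfo> columnTypes, int[] projectedColumns) {
    for (int colIx : projectedColumns) {   // empty array for count(*)
      TypeInfo t = columnTypes.get(colIx);
      if (t.getCategory() != Category.PRIMITIVE) {
        throw new UnsupportedOperationException("Unsupported type: " + t.getTypeName());
      }
    }
  }
}
{code}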
[jira] [Commented] (HIVE-8937) fix description of hive.security.authorization.sqlstd.confwhitelist.* params
[ https://issues.apache.org/jira/browse/HIVE-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221492#comment-16221492 ] Akira Ajisaka commented on HIVE-8937: - Thanks! > fix description of hive.security.authorization.sqlstd.confwhitelist.* params > > > Key: HIVE-8937 > URL: https://issues.apache.org/jira/browse/HIVE-8937 > Project: Hive > Issue Type: Bug > Components: Documentation >Affects Versions: 0.14.0 >Reporter: Thejas M Nair >Assignee: Akira Ajisaka > Fix For: 3.0.0 > > Attachments: HIVE-8937.001.patch, HIVE-8937.002.patch > > > hive.security.authorization.sqlstd.confwhitelist.* param description in > HiveConf is incorrect. The expected value is a regex, not comma separated > regexes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-12408) SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions
[ https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221488#comment-16221488 ] Akira Ajisaka commented on HIVE-12408: -- Thank you, [~thejas]! In addition, I'd like to update the confluence wiki: https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-PrivilegesRequiredforHiveOperations Would you give me write access to the wiki? > SQLStdAuthorizer should not require external table creator to be owner of > directory, in addition to rw permissions > -- > > Key: HIVE-12408 > URL: https://issues.apache.org/jira/browse/HIVE-12408 > Project: Hive > Issue Type: Bug > Components: Authorization, Security, SQLStandardAuthorization >Affects Versions: 0.14.0 > Environment: HDP 2.2 + Kerberos >Reporter: Hari Sekhon >Assignee: Akira Ajisaka >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch > > > When trying to create an external table via beeline in Hive using the > SQLStdAuthorizer it expects the table creator to be the owner of the > directory path and ignores the group rwx permission that is granted to the > user. > {code}Error: Error while compiling statement: FAILED: > HiveAccessControlException Permission denied: Principal [name=hari, > type=USER] does not have following privileges for operation CREATETABLE > [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, > name=/etl/path/to/hdfs/dir]] (state=42000,code=4){code} > All it should be checking is read access to that directory. > The directory owner requirement breaks the ability of more than one user to > create external table definitions to a given location. For example, this is a > flume landing directory with json data, and the /etl tree is owned by the > flume user. Even chowning the tree to another user would still break access > to other users who are able to read the directory in hdfs but would still be > unable to create external tables on top of it. > This looks like a remnant of the owner-only access model in SQLStdAuth and is > a separate issue to HIVE-11864 / HIVE-12324. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info
[ https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221486#comment-16221486 ] Vihang Karajgaonkar commented on HIVE-16672: I cherry-picked the branch-2.3 patch to branch-2. It was a clean cherry-pick with no conflicts. I also ran the qtest from this patch locally. Merged this to branch-2 as well. > Parquet vectorization doesn't work for tables with partition info > - > > Key: HIVE-16672 > URL: https://issues.apache.org/jira/browse/HIVE-16672 > Project: Hive > Issue Type: Sub-task >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16672-branch2.3.patch, HIVE-16672.001.patch, > HIVE-16672.002.patch > > > VectorizedParquetRecordReader doesn't check and update partition cols; this > should be fixed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info
[ https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-16672: --- Fix Version/s: 2.4.0 > Parquet vectorization doesn't work for tables with partition info > - > > Key: HIVE-16672 > URL: https://issues.apache.org/jira/browse/HIVE-16672 > Project: Hive > Issue Type: Sub-task >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16672-branch2.3.patch, HIVE-16672.001.patch, > HIVE-16672.002.patch > > > VectorizedParquetRecordReader doesn't check and update partition cols; this > should be fixed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17764) alter view fails when hive.metastore.disallow.incompatible.col.type.changes set to true
[ https://issues.apache.org/jira/browse/HIVE-17764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-17764: --- Attachment: HIVE-17764-addendum.patch > alter view fails when hive.metastore.disallow.incompatible.col.type.changes > set to true > --- > > Key: HIVE-17764 > URL: https://issues.apache.org/jira/browse/HIVE-17764 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: Janaki Lahorani >Assignee: Janaki Lahorani > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17764-addendum.patch, HIVE-17764-branch-2.01.patch, > HIVE17764.1.patch, HIVE17764.2.patch > > > A view is a virtual structure that derives the type information from the > table(s) the view is based on. If the view definition is altered, the > corresponding column types should be updated. Whether the change is > compatible with the previous structure of the view is irrelevant. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16672) Parquet vectorization doesn't work for tables with partition info
[ https://issues.apache.org/jira/browse/HIVE-16672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221484#comment-16221484 ] Vihang Karajgaonkar commented on HIVE-16672: Hi [~colinma], does this patch need to go in branch-2 as well? Currently I don't see it in branch-2, so Hive 2.4 will not have this patch. > Parquet vectorization doesn't work for tables with partition info > - > > Key: HIVE-16672 > URL: https://issues.apache.org/jira/browse/HIVE-16672 > Project: Hive > Issue Type: Sub-task >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0 > > Attachments: HIVE-16672-branch2.3.patch, HIVE-16672.001.patch, > HIVE-16672.002.patch > > > VectorizedParquetRecordReader doesn't check and update partition cols; this > should be fixed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17381) When we enable Parquet Writer Version V2, hive throws an exception: Unsupported encoding: DELTA_BYTE_ARRAY.
[ https://issues.apache.org/jira/browse/HIVE-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221482#comment-16221482 ] Vihang Karajgaonkar commented on HIVE-17381: Hi [~colinma], does this patch need to go in branch-2 as well? > When we enable Parquet Writer Version V2, hive throws an exception: > Unsupported encoding: DELTA_BYTE_ARRAY. > --- > > Key: HIVE-17381 > URL: https://issues.apache.org/jira/browse/HIVE-17381 > Project: Hive > Issue Type: Sub-task >Reporter: Ke Jia >Assignee: Colin Ma > Fix For: 3.0.0 > > Attachments: HIVE-17381.001.patch > > > when we set "hive.vectorized.execution.enabled=true" and > "parquet.writer.version=v2" simultaneously, hive throws the following > exception: > Caused by: java.io.IOException: java.io.IOException: > java.lang.UnsupportedOperationException: Unsupported encoding: > DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:232) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:254) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.UnsupportedOperationException: > Unsupported encoding: DELTA_BYTE_ARRAY > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) > ... 
16 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak
[ https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-16765: --- Fix Version/s: 2.4.0 > ParquetFileReader should be closed to avoid resource leak > - > > Key: HIVE-16765 > URL: https://issues.apache.org/jira/browse/HIVE-16765 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.3.0, 3.0.0 >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16765-branch-2.3.patch, HIVE-16765.001.patch > > > ParquetFileReader should be closed to avoid resource leak -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak
[ https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221479#comment-16221479 ] Vihang Karajgaonkar commented on HIVE-16765: Looks like this patch was not committed to branch-2. I merged it to branch-2 as well. > ParquetFileReader should be closed to avoid resource leak > - > > Key: HIVE-16765 > URL: https://issues.apache.org/jira/browse/HIVE-16765 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.3.0, 3.0.0 >Reporter: Colin Ma >Assignee: Colin Ma >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16765-branch-2.3.patch, HIVE-16765.001.patch > > > ParquetFileReader should be closed to avoid resource leak -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12745) Hive Timestamp value change after joining two tables
[ https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12745: --- Priority: Critical (was: Major) > Hive Timestamp value change after joining two tables > > > Key: HIVE-12745 > URL: https://issues.apache.org/jira/browse/HIVE-12745 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: wyp >Assignee: Dmitry Tolpeko >Priority: Critical > Labels: timestamp > > I have two Hive tables: test and test1: > {code} > CREATE TABLE `test`( `t` timestamp) > CREATE TABLE `test1`( `t` timestamp) > {code} > each holds a t value of Timestamp data type; the contents of the two > tables are as follows: > {code} > hive> select * from test1; > OK > 1970-01-01 00:00:00 > 1970-03-02 00:00:00 > Time taken: 0.091 seconds, Fetched: 2 row(s) > hive> select * from test; > OK > 1970-01-01 00:00:00 > 1970-01-02 00:00:00 > Time taken: 0.085 seconds, Fetched: 2 row(s) > {code} > However, when joining these two tables, the returned timestamp values change: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1969-12-31 23:00:00 1970-01-01 00:00:00 > 1970-01-01 23:00:00 1970-01-01 00:00:00 > 1969-12-31 23:00:00 1970-03-02 00:00:00 > 1970-01-01 23:00:00 1970-03-02 00:00:00 > Time taken: 54.347 seconds, Fetched: 4 row(s) > {code} > and the result changes every time: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1970-01-01 00:00:00 1970-01-01 00:00:00 > 1970-01-02 00:00:00 1970-01-01 00:00:00 > 1970-01-01 00:00:00 1970-03-02 00:00:00 > 1970-01-02 00:00:00 1970-03-02 00:00:00 > Time taken: 26.308 seconds, Fetched: 4 row(s) > {code} > Any suggestion? Thanks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12745) Hive Timestamp value change after joining two tables
[ https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12745: --- Target Version/s: 3.0.0 > Hive Timestamp value change after joining two tables > > > Key: HIVE-12745 > URL: https://issues.apache.org/jira/browse/HIVE-12745 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: wyp >Assignee: Dmitry Tolpeko >Priority: Critical > Labels: timestamp > > I have two Hive tables: test and test1: > {code} > CREATE TABLE `test`( `t` timestamp) > CREATE TABLE `test1`( `t` timestamp) > {code} > each holds a t value of Timestamp data type; the contents of the two > tables are as follows: > {code} > hive> select * from test1; > OK > 1970-01-01 00:00:00 > 1970-03-02 00:00:00 > Time taken: 0.091 seconds, Fetched: 2 row(s) > hive> select * from test; > OK > 1970-01-01 00:00:00 > 1970-01-02 00:00:00 > Time taken: 0.085 seconds, Fetched: 2 row(s) > {code} > However, when joining these two tables, the returned timestamp values change: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1969-12-31 23:00:00 1970-01-01 00:00:00 > 1970-01-01 23:00:00 1970-01-01 00:00:00 > 1969-12-31 23:00:00 1970-03-02 00:00:00 > 1970-01-01 23:00:00 1970-03-02 00:00:00 > Time taken: 54.347 seconds, Fetched: 4 row(s) > {code} > and the result changes every time: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1970-01-01 00:00:00 1970-01-01 00:00:00 > 1970-01-02 00:00:00 1970-01-01 00:00:00 > 1970-01-01 00:00:00 1970-03-02 00:00:00 > 1970-01-02 00:00:00 1970-03-02 00:00:00 > Time taken: 26.308 seconds, Fetched: 4 row(s) > {code} > Any suggestion? Thanks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15552) unable to coalesce DATE and TIMESTAMP types
[ https://issues.apache.org/jira/browse/HIVE-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15552: --- Target Version/s: 3.0.0 > unable to coalesce DATE and TIMESTAMP types > --- > > Key: HIVE-15552 > URL: https://issues.apache.org/jira/browse/HIVE-15552 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: N Campbell >Priority: Critical > Labels: timestamp > > The COALESCE expression does not accept mixing DATE and TIMESTAMP types > select tdt.rnum, coalesce(tdt.cdt, cast(tdt.cdt as timestamp)) from > certtext.tdt > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Argument type mismatch 'cdt': The expressions after COALESCE should all have > the same type: "date" is expected but "timestamp" is found > SQLState: 42000 > ErrorCode: 4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15552) unable to coalesce DATE and TIMESTAMP types
[ https://issues.apache.org/jira/browse/HIVE-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15552: --- Priority: Critical (was: Minor) > unable to coalesce DATE and TIMESTAMP types > --- > > Key: HIVE-15552 > URL: https://issues.apache.org/jira/browse/HIVE-15552 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: N Campbell >Priority: Critical > Labels: timestamp > > The COALESCE expression does not accept mixing DATE and TIMESTAMP types > select tdt.rnum, coalesce(tdt.cdt, cast(tdt.cdt as timestamp)) from > certtext.tdt > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Argument type mismatch 'cdt': The expressions after COALESCE should all have > the same type: "date" is expected but "timestamp" is found > SQLState: 42000 > ErrorCode: 4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17413) predicate involving CAST affects value returned by the SELECT statement
[ https://issues.apache.org/jira/browse/HIVE-17413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17413: --- Target Version/s: 3.0.0 > predicate involving CAST affects value returned by the SELECT statement > --- > > Key: HIVE-17413 > URL: https://issues.apache.org/jira/browse/HIVE-17413 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Jim Hopper >Priority: Critical > Labels: timestamp > > steps to reproduce: > {code} > create table t stored as orc as > select cast('2017-08-29 00:01:26' as timestamp) as ts; > {code} > {code} > select ts from t; > {code} > {code} > ts > 2017-08-29 00:01:26 > {code} > {code} > select ts from t where cast(ts as date) = '2017-08-29'; > {code} > {code} > ts > 2017-08-29 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15157) Partition Table With timestamp type on S3 storage --> Error in getting fields from serde.Invalid Field null
[ https://issues.apache.org/jira/browse/HIVE-15157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15157: --- Target Version/s: 3.0.0 > Partition Table With timestamp type on S3 storage --> Error in getting fields > from serde.Invalid Field null > --- > > Key: HIVE-15157 > URL: https://issues.apache.org/jira/browse/HIVE-15157 > Project: Hive > Issue Type: Bug > Components: Clients >Affects Versions: 2.1.0 > Environment: JDK 1.8 101 >Reporter: thauvin damien >Priority: Critical > Labels: timestamp > > Hello > I get the following error when I try to perform: > hive> DESCRIBE formatted table partition (tsbucket='2016-10-28 16%3A00%3A00'); > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from > serde.Invalid Field null > Here is the description of the issue. > --External Hive table with dynamic partitioning enabled on AWS S3 storage. > --Partitioned table with timestamp type. > When I perform "show partitions table;" everything is fine: > hive> show partitions table; > OK > tsbucket=2016-10-01 11%3A00%3A00 > tsbucket=2016-10-28 16%3A00%3A00 > And when I perform "describe FORMATTED table;" everything is fine. > Is this a bug? > The stacktrace from hive.log: > 2016-11-08T10:30:20,868 ERROR [ac3e0d48-22c5-4d04-a788-aeb004ea94f3 > main([])]: exec.DDLTask (DDLTask.java:failed(574)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Error in getting fields > from serde.Invalid Field null > at > org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3414) > at > org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:3109) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:408) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: MetaException(message:Invalid Field null) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getFieldsFromDeserializer(MetaStoreUtils.java:1336) > at > org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3409) > ... 21 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221467#comment-16221467 ] Eugene Koifman commented on HIVE-17458: --- HIVE-12631 is a prerequisite > VectorizedOrcAcidRowBatchReader doesn't handle 'original' files > --- > > Key: HIVE-17458 > URL: https://issues.apache.org/jira/browse/HIVE-17458 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-17458.01.patch, HIVE-17458.02.patch, > HIVE-17458.03.patch, HIVE-17458.04.patch, HIVE-17458.05.patch, > HIVE-17458.06.patch, HIVE-17458.07.patch, HIVE-17458.07.patch, > HIVE-17458.08.patch, HIVE-17458.09.patch > > > VectorizedOrcAcidRowBatchReader will not be used for original files. This > will likely look like a perf regression when converting a table from non-acid > to acid until it runs through a major compaction. > With Load Data support, if large files are added via Load Data, the read ops > will not vectorize until major compaction. > There is no reason why this should be the case. Just like > OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other > files in the logical tranche/bucket and calculate the offset for the RowBatch > of the split. (Presumably getRecordReader().getRowNumber() works the same in > vector mode). > In this case we don't even need OrcSplit.isOriginal() - the reader can infer > it from file path... which in particular simplifies > OrcInputFormat.determineSplitStrategies() -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15157) Partition Table With timestamp type on S3 storage --> Error in getting fields from serde.Invalid Field null
[ https://issues.apache.org/jira/browse/HIVE-15157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15157: --- Priority: Critical (was: Major) > Partition Table With timestamp type on S3 storage --> Error in getting fields > from serde.Invalid Field null > --- > > Key: HIVE-15157 > URL: https://issues.apache.org/jira/browse/HIVE-15157 > Project: Hive > Issue Type: Bug > Components: Clients >Affects Versions: 2.1.0 > Environment: JDK 1.8 101 >Reporter: thauvin damien >Priority: Critical > Labels: timestamp > > Hello > I get the error below when I try to perform: > hive> DESCRIBE formatted table partition (tsbucket='2016-10-28 16%3A00%3A00'); > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from > serde.Invalid Field null > Here is the description of the issue. > --External Hive table with dynamic partitioning enabled on AWS S3 storage. > --Partitioned table with timestamp type. > When I perform "show partitions table;" everything is fine: > hive> show partitions table; > OK > tsbucket=2016-10-01 11%3A00%3A00 > tsbucket=2016-10-28 16%3A00%3A00 > And when I perform "describe FORMATTED table;" everything is fine. > Is this a bug? > The stack trace from hive.log: > 2016-11-08T10:30:20,868 ERROR [ac3e0d48-22c5-4d04-a788-aeb004ea94f3 > main([])]: exec.DDLTask (DDLTask.java:failed(574)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Error in getting fields > from serde.Invalid Field null > at > org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3414) > at > org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:3109) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:408) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: MetaException(message:Invalid Field null) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getFieldsFromDeserializer(MetaStoreUtils.java:1336) > at > org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3409) > ... 21 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-11812) datediff sometimes returns incorrect results when called with dates
[ https://issues.apache.org/jira/browse/HIVE-11812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221453#comment-16221453 ] Jesus Camacho Rodriguez edited comment on HIVE-11812 at 10/26/17 11:45 PM: --- [~jdere], [~mmccline], [~chetna], was this solved with HIVE-15338? was (Author: jcamachorodriguez): [~jdere], [~mmccline], was this solved with HIVE-15338? > datediff sometimes returns incorrect results when called with dates > --- > > Key: HIVE-11812 > URL: https://issues.apache.org/jira/browse/HIVE-11812 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.0.0 >Reporter: Nicholas Brenwald >Assignee: Chetna Chaudhari >Priority: Minor > Labels: timestamp > Attachments: HIVE-11812.1.patch > > > DATEDIFF returns an incorrect result when one of the arguments is a date > type. > The Hive Language Manual provides the following signature for datediff: > {code} > int datediff(string enddate, string startdate) > {code} > I think datediff should either throw an error (if date types are not > supported), or return the correct result. > To reproduce, create a table: > {code} > create table t (c1 string, c2 date); > {code} > Assuming you have a table x containing some data, populate table t with 1 row: > {code} > insert into t select '2015-09-15', '2015-09-15' from x limit 1; > {code} > Then run the following 12 test queries: > {code} > select datediff(c1, '2015-09-14') from t; > select datediff(c1, '2015-09-15') from t; > select datediff(c1, '2015-09-16') from t; > select datediff('2015-09-14', c1) from t; > select datediff('2015-09-15', c1) from t; > select datediff('2015-09-16', c1) from t; > select datediff(c2, '2015-09-14') from t; > select datediff(c2, '2015-09-15') from t; > select datediff(c2, '2015-09-16') from t; > select datediff('2015-09-14', c2) from t; > select datediff('2015-09-15', c2) from t; > select datediff('2015-09-16', c2) from t; > {code} > The below table summarises the result. All results for column c1 (which is a > string) are correct, but when using c2 (which is a date), two of the results > are incorrect. > || Test || Expected Result || Actual Result || Passed / Failed || > |datediff(c1, '2015-09-14')| 1 | 1| Passed | > |datediff(c1, '2015-09-15')| 0 | 0| Passed | > |datediff(c1, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c1) | -1 | -1| Passed | > |datediff('2015-09-15', c1)| 0 | 0| Passed | > |datediff('2015-09-16', c1)| 1 | 1| Passed | > |datediff(c2, '2015-09-14')| 1 | 0| {color:red}Failed{color} | > |datediff(c2, '2015-09-15')| 0 | 0| Passed | > |datediff(c2, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c2) | -1 | 0 | {color:red}Failed{color} | > |datediff('2015-09-15', c2)| 0 | 0| Passed | > |datediff('2015-09-16', c2)| 1 | 1| Passed | -- This message was sent by Atlassian JIRA (v6.4.14#64029)
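Until the fix lands, one workaround consistent with the result table above is to cast the date column to string, so both arguments take the (correct) string code path; a sketch against the table t from the reproduction:

{code:sql}
select datediff(cast(c2 as string), '2015-09-14') from t;  -- 1, as expected
select datediff('2015-09-14', cast(c2 as string)) from t;  -- -1, as expected
{code}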
[jira] [Commented] (HIVE-11812) datediff sometimes returns incorrect results when called with dates
[ https://issues.apache.org/jira/browse/HIVE-11812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221463#comment-16221463 ] Matt McCline commented on HIVE-11812: - [~jcamachorodriguez] [~jdere] Yes, I believe so. > datediff sometimes returns incorrect results when called with dates > --- > > Key: HIVE-11812 > URL: https://issues.apache.org/jira/browse/HIVE-11812 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.0.0 >Reporter: Nicholas Brenwald >Assignee: Chetna Chaudhari >Priority: Minor > Labels: timestamp > Attachments: HIVE-11812.1.patch > > > DATEDIFF returns an incorrect result when one of the arguments is a date > type. > The Hive Language Manual provides the following signature for datediff: > {code} > int datediff(string enddate, string startdate) > {code} > I think datediff should either throw an error (if date types are not > supported), or return the correct result. > To reproduce, create a table: > {code} > create table t (c1 string, c2 date); > {code} > Assuming you have a table x containing some data, populate table t with 1 row: > {code} > insert into t select '2015-09-15', '2015-09-15' from x limit 1; > {code} > Then run the following 12 test queries: > {code} > select datediff(c1, '2015-09-14') from t; > select datediff(c1, '2015-09-15') from t; > select datediff(c1, '2015-09-16') from t; > select datediff('2015-09-14', c1) from t; > select datediff('2015-09-15', c1) from t; > select datediff('2015-09-16', c1) from t; > select datediff(c2, '2015-09-14') from t; > select datediff(c2, '2015-09-15') from t; > select datediff(c2, '2015-09-16') from t; > select datediff('2015-09-14', c2) from t; > select datediff('2015-09-15', c2) from t; > select datediff('2015-09-16', c2) from t; > {code} > The below table summarises the result. All results for column c1 (which is a > string) are correct, but when using c2 (which is a date), two of the results > are incorrect. > || Test || Expected Result || Actual Result || Passed / Failed || > |datediff(c1, '2015-09-14')| 1 | 1| Passed | > |datediff(c1, '2015-09-15')| 0 | 0| Passed | > |datediff(c1, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c1) | -1 | -1| Passed | > |datediff('2015-09-15', c1)| 0 | 0| Passed | > |datediff('2015-09-16', c1)| 1 | 1| Passed | > |datediff(c2, '2015-09-14')| 1 | 0| {color:red}Failed{color} | > |datediff(c2, '2015-09-15')| 0 | 0| Passed | > |datediff(c2, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c2) | -1 | 0 | {color:red}Failed{color} | > |datediff('2015-09-15', c2)| 0 | 0| Passed | > |datediff('2015-09-16', c2)| 1 | 1| Passed | -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12408) SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions
[ https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-12408: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the patch [~ajisakaa]! Committed to master branch. > SQLStdAuthorizer should not require external table creator to be owner of > directory, in addition to rw permissions > -- > > Key: HIVE-12408 > URL: https://issues.apache.org/jira/browse/HIVE-12408 > Project: Hive > Issue Type: Bug > Components: Authorization, Security, SQLStandardAuthorization >Affects Versions: 0.14.0 > Environment: HDP 2.2 + Kerberos >Reporter: Hari Sekhon >Assignee: Akira Ajisaka >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch > > > When trying to create an external table via beeline in Hive using the > SQLStdAuthorizer, it expects the table creator to be the owner of the > directory path and ignores the group rwx permission that is granted to the > user. > {code}Error: Error while compiling statement: FAILED: > HiveAccessControlException Permission denied: Principal [name=hari, > type=USER] does not have following privileges for operation CREATETABLE > [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, > name=/etl/path/to/hdfs/dir]] (state=42000,code=4){code} > All it should be checking is read access to that directory. > The directory owner requirement breaks the ability of more than one user to > create external table definitions for a given location. For example, this is a > flume landing directory with json data, and the /etl tree is owned by the > flume user. Even chowning the tree to another user would still break access > for other users who are able to read the directory in hdfs but would still > be unable to create external tables on top of it. > This looks like a remnant of the owner-only access model in SQLStdAuth and is > a separate issue from HIVE-11864 / HIVE-12324. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
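To make the fixed scenario concrete, a hedged sketch (the path is taken from the report; the schema and the user's permissions are assumptions):

{code:sql}
-- Run as a user who has group rw access on the directory but does not own it.
-- Before this fix, SQLStdAuth rejected the statement with the
-- HiveAccessControlException quoted above; with it, rw access suffices.
CREATE EXTERNAL TABLE etl_events (payload string)
LOCATION '/etl/path/to/hdfs/dir';
{code}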
[jira] [Updated] (HIVE-11127) Document time zone handling for current_date and current_timestamp
[ https://issues.apache.org/jira/browse/HIVE-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11127: --- Component/s: Documentation > Document time zone handling for current_date and current_timestamp > -- > > Key: HIVE-11127 > URL: https://issues.apache.org/jira/browse/HIVE-11127 > Project: Hive > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.2.0 >Reporter: Punya Biswal > Labels: timestamp > > The new {{current_date}} and {{current_timestamp}} functions introduced in > HIVE-5472 emit dates/timestamps in the user's local timezone. This behavior > should be documented on [the > wiki|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-11127) Document time zone handling for current_date and current_timestamp
[ https://issues.apache.org/jira/browse/HIVE-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11127: --- Labels: timestamp (was: ) > Document time zone handling for current_date and current_timestamp > -- > > Key: HIVE-11127 > URL: https://issues.apache.org/jira/browse/HIVE-11127 > Project: Hive > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.2.0 >Reporter: Punya Biswal > Labels: timestamp > > The new {{current_date}} and {{current_timestamp}} functions introduced in > HIVE-5472 emit dates/timestamps in the user's local timezone. This behavior > should be documented on [the > wiki|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
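To make the behaviour that needs documenting concrete, a small illustration (the printed values are invented; the point is that they reflect the session's local zone, here assumed to be America/Los_Angeles):

{code:sql}
select current_date, current_timestamp;
-- e.g. 2017-10-26   2017-10-26 16:45:12.345   (local wall-clock time)
-- the same query run at the same instant in a UTC session would print
--      2017-10-26   2017-10-26 23:45:12.345
{code}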
[jira] [Updated] (HIVE-11377) currrent_timestamp in ISO-SQL is a timezone bearing type but Hive uses timezoneless types
[ https://issues.apache.org/jira/browse/HIVE-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11377: --- Labels: timestamp (was: ) > currrent_timestamp in ISO-SQL is a timezone bearing type but Hive uses > timezoneless types > - > > Key: HIVE-11377 > URL: https://issues.apache.org/jira/browse/HIVE-11377 > Project: Hive > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 1.2.0 >Reporter: N Campbell >Priority: Minor > Labels: timestamp > > Hive 1.2.x has added the niladic function current_timestamp. When ISO SQL > introduced time zone bearing types, it defined two forms of niladic > functions: current_timestamp/time return a time zone bearing type and > localtimestamp/time a non-time zone bearing type. > This implementation is not described in the current documentation. > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Timestamps -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-11812) datediff sometimes returns incorrect results when called with dates
[ https://issues.apache.org/jira/browse/HIVE-11812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221453#comment-16221453 ] Jesus Camacho Rodriguez commented on HIVE-11812: [~jdere], [~mmccline], was this solved with HIVE-15338? > datediff sometimes returns incorrect results when called with dates > --- > > Key: HIVE-11812 > URL: https://issues.apache.org/jira/browse/HIVE-11812 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.0.0 >Reporter: Nicholas Brenwald >Assignee: Chetna Chaudhari >Priority: Minor > Labels: timestamp > Attachments: HIVE-11812.1.patch > > > DATEDIFF returns an incorrect result when one of the arguments is a date > type. > The Hive Language Manual provides the following signature for datediff: > {code} > int datediff(string enddate, string startdate) > {code} > I think datediff should either throw an error (if date types are not > supported), or return the correct result. > To reproduce, create a table: > {code} > create table t (c1 string, c2 date); > {code} > Assuming you have a table x containing some data, populate table t with 1 row: > {code} > insert into t select '2015-09-15', '2015-09-15' from x limit 1; > {code} > Then run the following 12 test queries: > {code} > select datediff(c1, '2015-09-14') from t; > select datediff(c1, '2015-09-15') from t; > select datediff(c1, '2015-09-16') from t; > select datediff('2015-09-14', c1) from t; > select datediff('2015-09-15', c1) from t; > select datediff('2015-09-16', c1) from t; > select datediff(c2, '2015-09-14') from t; > select datediff(c2, '2015-09-15') from t; > select datediff(c2, '2015-09-16') from t; > select datediff('2015-09-14', c2) from t; > select datediff('2015-09-15', c2) from t; > select datediff('2015-09-16', c2) from t; > {code} > The below table summarises the result. All results for column c1 (which is a > string) are correct, but when using c2 (which is a date), two of the results > are incorrect. > || Test || Expected Result || Actual Result || Passed / Failed || > |datediff(c1, '2015-09-14')| 1 | 1| Passed | > |datediff(c1, '2015-09-15')| 0 | 0| Passed | > |datediff(c1, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c1) | -1 | -1| Passed | > |datediff('2015-09-15', c1)| 0 | 0| Passed | > |datediff('2015-09-16', c1)| 1 | 1| Passed | > |datediff(c2, '2015-09-14')| 1 | 0| {color:red}Failed{color} | > |datediff(c2, '2015-09-15')| 0 | 0| Passed | > |datediff(c2, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c2) | -1 | 0 | {color:red}Failed{color} | > |datediff('2015-09-15', c2)| 0 | 0| Passed | > |datediff('2015-09-16', c2)| 1 | 1| Passed | -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12193) Add a consistent unix_timestamp function.
[ https://issues.apache.org/jira/browse/HIVE-12193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12193: --- Labels: timestamp (was: ) > Add a consistent unix_timestamp function. > - > > Key: HIVE-12193 > URL: https://issues.apache.org/jira/browse/HIVE-12193 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.1.1 >Reporter: Ryan Blue > Labels: timestamp > > The {{unix_timestamp}} function returns values with respect to the SQL > session time zone (which is the default JVM time zone). This varies depending > on server time zone. This is required by the documentation for the function > and would be difficult to change. > For users that want consistent results across zones, Hive should include a > {{utc_timestamp}} method that is zone independent and gives a result assuming > the timestamp without time zone passed in is in UTC. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
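The proposed {{utc_timestamp}} function does not exist yet; the sketch below only illustrates the zone dependence that motivates it (the epoch values assume the stated session zones):

{code:sql}
select unix_timestamp('2015-01-01 00:00:00');
-- 1420070400 when the session/JVM zone is UTC
-- 1420099200 when it is America/Los_Angeles (UTC-8 on that date)
{code}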
[jira] [Updated] (HIVE-12195) Unknown zones should cause an error instead of silently failing
[ https://issues.apache.org/jira/browse/HIVE-12195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12195: --- Labels: timestamp (was: ) > Unknown zones should cause an error instead of silently failing > --- > > Key: HIVE-12195 > URL: https://issues.apache.org/jira/browse/HIVE-12195 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue >Assignee: Shinichi Yamashita > Labels: timestamp > Attachments: HIVE-12195.1.patch, HIVE-12195.2.patch, > HIVE-12195.3.patch, HIVE-12195.4.patch > > > Using an unknown time zone with the {{from_utc_timestamp}} or > {{to_utc_timestamp}} methods returns the time unadjusted instead of throwing > an error: > {code} > hive> select from_utc_timestamp('2015-04-11 12:24:34.535', 'panda'); > OK > 2015-04-11 12:24:34.535 > {code} > This should be an error because users may attempt to adjust to valid but > unknown zones, like PDT or MDT. This would produce incorrect results with no > warning or error. > *Update*: A good work-around is to add a table that maps known zones to > offset zone identifiers, like {{GMT-07:00}}. The table is small enough to > always be a broadcast join and results can be filtered (e.g. {{offset_zone IS > NOT NULL}}) so that only valid zones are passed to {{from_utc_timestamp}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
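A sketch of the mapping-table workaround described in the update above (the table and column names are hypothetical):

{code:sql}
-- Small lookup table mapping zone abbreviations to offset zone identifiers.
create table zone_map (zone string, offset_zone string);
insert into zone_map values
  ('PST', 'GMT-08:00'),
  ('PDT', 'GMT-07:00'),
  ('MDT', 'GMT-06:00');

-- The join plus filter guarantees from_utc_timestamp only ever receives a
-- zone identifier that is known to be valid.
select from_utc_timestamp(e.utc_ts, z.offset_zone)
from events e
join zone_map z on e.zone_abbrev = z.zone
where z.offset_zone is not null;
{code}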
[jira] [Updated] (HIVE-12745) Hive Timestamp value change after joining two tables
[ https://issues.apache.org/jira/browse/HIVE-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12745: --- Labels: timestamp (was: ) > Hive Timestamp value change after joining two tables > > > Key: HIVE-12745 > URL: https://issues.apache.org/jira/browse/HIVE-12745 > Project: Hive > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: wyp >Assignee: Dmitry Tolpeko > Labels: timestamp > > I have two Hive tables, test and test1: > {code} > CREATE TABLE `test`( `t` timestamp) > CREATE TABLE `test1`( `t` timestamp) > {code} > Each holds a column t of Timestamp datatype; the contents of the two > tables are as follows: > {code} > hive> select * from test1; > OK > 1970-01-01 00:00:00 > 1970-03-02 00:00:00 > Time taken: 0.091 seconds, Fetched: 2 row(s) > hive> select * from test; > OK > 1970-01-01 00:00:00 > 1970-01-02 00:00:00 > Time taken: 0.085 seconds, Fetched: 2 row(s) > {code} > However, when joining these two tables, the returned timestamp values changed: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1969-12-31 23:00:00 1970-01-01 00:00:00 > 1970-01-01 23:00:00 1970-01-01 00:00:00 > 1969-12-31 23:00:00 1970-03-02 00:00:00 > 1970-01-01 23:00:00 1970-03-02 00:00:00 > Time taken: 54.347 seconds, Fetched: 4 row(s) > {code} > and I found the result changes on every run: > {code} > hive> select test.t, test1.t from test, test1; > OK > 1970-01-01 00:00:00 1970-01-01 00:00:00 > 1970-01-02 00:00:00 1970-01-01 00:00:00 > 1970-01-01 00:00:00 1970-03-02 00:00:00 > 1970-01-02 00:00:00 1970-03-02 00:00:00 > Time taken: 26.308 seconds, Fetched: 4 row(s) > {code} > Any suggestions? Thanks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12191) Hive timestamp problems
[ https://issues.apache.org/jira/browse/HIVE-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12191: --- Labels: timestamp (was: ) > Hive timestamp problems > --- > > Key: HIVE-12191 > URL: https://issues.apache.org/jira/browse/HIVE-12191 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.1 >Reporter: Ryan Blue > Labels: timestamp > > This is an umbrella JIRA for problems found with Hive's timestamp (without > time zone) implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12192) Hive should carry out timestamp computations in UTC
[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12192: --- Labels: timestamp (was: ) > Hive should carry out timestamp computations in UTC > --- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Ryan Blue > Labels: timestamp > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12194) Daylight savings zones are not recognized (PDT, MDT)
[ https://issues.apache.org/jira/browse/HIVE-12194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12194: --- Labels: timestamp (was: ) > Daylight savings zones are not recognized (PDT, MDT) > > > Key: HIVE-12194 > URL: https://issues.apache.org/jira/browse/HIVE-12194 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 1.1.1 >Reporter: Ryan Blue > Labels: timestamp > > When I call the {{from_utc_timestamp}} function (or {{to_utc_timestamp}}) > using my current time zone, the result is incorrect: > {code} > // CURRENT SERVER TIME ZONE IS PDT > hive> select to_utc_timestamp('2015-10-13 09:15:34.101', 'PDT'); > 2015-10-13 09:15:34.101 // NOT CHANGED! > hive> select to_utc_timestamp('2015-10-13 09:15:34.101', 'PST'); > 2015-10-13 16:15:34.101 // CORRECT VALUE FOR PST > {code} > *UPDATE*: It appears that happens because the daylight savings zones are not > recognized. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
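Until the abbreviations are handled, a workaround sketch: region-based zone identifiers are recognized and DST-aware, so they return the expected offset:

{code:sql}
select to_utc_timestamp('2015-10-13 09:15:34.101', 'America/Los_Angeles');
-- 2015-10-13 16:15:34.101  (America/Los_Angeles is UTC-7 in October)
{code}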
[jira] [Updated] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST
[ https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14305: --- Labels: timestamp (was: ) > To/From UTC timestamp may return incorrect result because of DST > > > Key: HIVE-14305 > URL: https://issues.apache.org/jira/browse/HIVE-14305 > Project: Hive > Issue Type: Sub-task >Reporter: Rui Li >Assignee: Rui Li > Labels: timestamp > > If the machine's local timezone involves DST, the UDFs return incorrect > results. > For example: > {code} > select to_utc_timestamp('2005-04-03 02:01:00','UTC'); > {code} > returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 > 02:01:00}}. > {code} > select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai'); > {code} > returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 > 02:01:00}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-14657) datediff function produce different results with timestamp and string combination
[ https://issues.apache.org/jira/browse/HIVE-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-14657. Resolution: Duplicate Closing as duplicate of HIVE-11812. > datediff function produce different results with timestamp and string > combination > - > > Key: HIVE-14657 > URL: https://issues.apache.org/jira/browse/HIVE-14657 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.13.0 >Reporter: Anup >Assignee: Chetna Chaudhari >Priority: Minor > Labels: timestamp > > When we use the datediff function with a string and timestamp combination, it produces > different results. > See the queries below: > select datediff("2016-08-18 16:48:12", "2016-07-18 12:54:54") from test2; > 31 > select datediff("2016-08-18 16:48:12", date) from test2; > 30 > select datediff("2016-08-18 16:48:12", cast(date as string)) from test2; > 31 > hive> desc test2; > OK > date timestamp > hive> select * from test2; > OK > 2016-07-18 12:54:54 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-14657) datediff function produce different results with timestamp and string combination
[ https://issues.apache.org/jira/browse/HIVE-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14657: --- Labels: timestamp (was: ) > datediff function produce different results with timestamp and string > combination > - > > Key: HIVE-14657 > URL: https://issues.apache.org/jira/browse/HIVE-14657 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.13.0 >Reporter: Anup >Assignee: Chetna Chaudhari >Priority: Minor > Labels: timestamp > > When we use the datediff function with a string and timestamp combination, it produces > different results. > See the queries below: > select datediff("2016-08-18 16:48:12", "2016-07-18 12:54:54") from test2; > 31 > select datediff("2016-08-18 16:48:12", date) from test2; > 30 > select datediff("2016-08-18 16:48:12", cast(date as string)) from test2; > 31 > hive> desc test2; > OK > date timestamp > hive> select * from test2; > OK > 2016-07-18 12:54:54 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-11812) datediff sometimes returns incorrect results when called with dates
[ https://issues.apache.org/jira/browse/HIVE-11812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11812: --- Labels: timestamp (was: ) > datediff sometimes returns incorrect results when called with dates > --- > > Key: HIVE-11812 > URL: https://issues.apache.org/jira/browse/HIVE-11812 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.0.0 >Reporter: Nicholas Brenwald >Assignee: Chetna Chaudhari >Priority: Minor > Labels: timestamp > Attachments: HIVE-11812.1.patch > > > DATEDIFF returns an incorrect result when one of the arguments is a date > type. > The Hive Language Manual provides the following signature for datediff: > {code} > int datediff(string enddate, string startdate) > {code} > I think datediff should either throw an error (if date types are not > supported), or return the correct result. > To reproduce, create a table: > {code} > create table t (c1 string, c2 date); > {code} > Assuming you have a table x containing some data, populate table t with 1 row: > {code} > insert into t select '2015-09-15', '2015-09-15' from x limit 1; > {code} > Then run the following 12 test queries: > {code} > select datediff(c1, '2015-09-14') from t; > select datediff(c1, '2015-09-15') from t; > select datediff(c1, '2015-09-16') from t; > select datediff('2015-09-14', c1) from t; > select datediff('2015-09-15', c1) from t; > select datediff('2015-09-16', c1) from t; > select datediff(c2, '2015-09-14') from t; > select datediff(c2, '2015-09-15') from t; > select datediff(c2, '2015-09-16') from t; > select datediff('2015-09-14', c2) from t; > select datediff('2015-09-15', c2) from t; > select datediff('2015-09-16', c2) from t; > {code} > The below table summarises the result. All results for column c1 (which is a > string) are correct, but when using c2 (which is a date), two of the results > are incorrect. > || Test || Expected Result || Actual Result || Passed / Failed || > |datediff(c1, '2015-09-14')| 1 | 1| Passed | > |datediff(c1, '2015-09-15')| 0 | 0| Passed | > |datediff(c1, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c1) | -1 | -1| Passed | > |datediff('2015-09-15', c1)| 0 | 0| Passed | > |datediff('2015-09-16', c1)| 1 | 1| Passed | > |datediff(c2, '2015-09-14')| 1 | 0| {color:red}Failed{color} | > |datediff(c2, '2015-09-15')| 0 | 0| Passed | > |datediff(c2, '2015-09-16') | -1 | -1| Passed | > |datediff('2015-09-14', c2) | -1 | 0 | {color:red}Failed{color} | > |datediff('2015-09-15', c2)| 0 | 0| Passed | > |datediff('2015-09-16', c2)| 1 | 1| Passed | -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15157) Partition Table With timestamp type on S3 storage --> Error in getting fields from serde.Invalid Field null
[ https://issues.apache.org/jira/browse/HIVE-15157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15157: --- Labels: timestamp (was: ) > Partition Table With timestamp type on S3 storage --> Error in getting fields > from serde.Invalid Field null > --- > > Key: HIVE-15157 > URL: https://issues.apache.org/jira/browse/HIVE-15157 > Project: Hive > Issue Type: Bug > Components: Clients >Affects Versions: 2.1.0 > Environment: JDK 1.8 101 >Reporter: thauvin damien > Labels: timestamp > > Hello > I get the error below when I try to perform: > hive> DESCRIBE formatted table partition (tsbucket='2016-10-28 16%3A00%3A00'); > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from > serde.Invalid Field null > Here is the description of the issue. > --External Hive table with dynamic partitioning enabled on AWS S3 storage. > --Partitioned table with timestamp type. > When I perform "show partitions table;" everything is fine: > hive> show partitions table; > OK > tsbucket=2016-10-01 11%3A00%3A00 > tsbucket=2016-10-28 16%3A00%3A00 > And when I perform "describe FORMATTED table;" everything is fine. > Is this a bug? > The stack trace from hive.log: > 2016-11-08T10:30:20,868 ERROR [ac3e0d48-22c5-4d04-a788-aeb004ea94f3 > main([])]: exec.DDLTask (DDLTask.java:failed(574)) - > org.apache.hadoop.hive.ql.metadata.HiveException: Error in getting fields > from serde.Invalid Field null > at > org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3414) > at > org.apache.hadoop.hive.ql.exec.DDLTask.describeTable(DDLTask.java:3109) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:408) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: MetaException(message:Invalid Field null) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getFieldsFromDeserializer(MetaStoreUtils.java:1336) > at > org.apache.hadoop.hive.ql.metadata.Hive.getFieldsFromDeserializer(Hive.java:3409) > ... 21 more -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17916) remove ConfVars.HIVE_VECTORIZATION_ROW_IDENTIFIER_ENABLED
[ https://issues.apache.org/jira/browse/HIVE-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17916: - > remove ConfVars.HIVE_VECTORIZATION_ROW_IDENTIFIER_ENABLED > - > > Key: HIVE-17916 > URL: https://issues.apache.org/jira/browse/HIVE-17916 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Teddy Choi > > follow up from HIVE-12631. Filing so it doesn't get lost. > There is this code in UpdateDeleteSemanticAnalyzer > {noformat} > // TODO: remove when this is enabled everywhere > HiveConf.setBoolVar(conf, > ConfVars.HIVE_VECTORIZATION_ROW_IDENTIFIER_ENABLED, true); > {noformat} > The 1st update/delete statement on a session will enable this and it will be > enabled for all future queries which makes this flag useless/misleading. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
[ https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15634: --- Labels: timestamp (was: ) > Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization > > > Key: HIVE-15634 > URL: https://issues.apache.org/jira/browse/HIVE-15634 > Project: Hive > Issue Type: Bug > Components: Druid integration >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: slim bouguerra >Priority: Critical > Labels: timestamp > > {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change > timezone for Tez tasks. However, when Fetch optimizer kicks in because we can > push the full query to Druid, we obtain different values for the timestamp > than when jobs are executed. This probably has to do with the timezone on the > client side. How should we handle this issue? > For instance, this can be observed with the following query: > {code:sql} > set hive.fetch.task.conversion=more; > SELECT DISTINCT `__time` > FROM store_sales_sold_time_subset > WHERE `__time` < '1999-11-10 00:00:00'; > OK > 1999-10-31 19:00:00 > 1999-11-01 19:00:00 > 1999-11-02 19:00:00 > 1999-11-03 19:00:00 > 1999-11-04 19:00:00 > 1999-11-05 19:00:00 > 1999-11-06 19:00:00 > 1999-11-07 19:00:00 > 1999-11-08 19:00:00 > set hive.fetch.task.conversion=none; > SELECT DISTINCT `__time` > FROM store_sales_sold_time_subset > WHERE `__time` < '1999-11-10 00:00:00'; > OK > 1999-11-01 00:00:00 > 1999-11-02 00:00:00 > 1999-11-03 00:00:00 > 1999-11-04 00:00:00 > 1999-11-05 00:00:00 > 1999-11-06 00:00:00 > 1999-11-07 00:00:00 > 1999-11-08 00:00:00 > 1999-11-09 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15552) unable to coalesce DATE and TIMESTAMP types
[ https://issues.apache.org/jira/browse/HIVE-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15552: --- Labels: timestamp (was: ) > unable to coalesce DATE and TIMESTAMP types > --- > > Key: HIVE-15552 > URL: https://issues.apache.org/jira/browse/HIVE-15552 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: N Campbell >Priority: Minor > Labels: timestamp > > COALESCE expression does not expect DATE and TIMESTAMP types > select tdt.rnum, coalesce(tdt.cdt, cast(tdt.cdt as timestamp)) from > certtext.tdt > Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 > Argument type mismatch 'cdt': The expressions after COALESCE should all have > the same type: "date" is expected but "timestamp" is found > SQLState: 42000 > ErrorCode: 4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
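A workaround sketch for the query in the report: casting the DATE branch up to TIMESTAMP gives every COALESCE branch the same type, which the type checker accepts:

{code:sql}
select tdt.rnum, coalesce(cast(tdt.cdt as timestamp), cast(tdt.cdt as timestamp)) from certtext.tdt;
{code}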
[jira] [Resolved] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
[ https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-15634. Resolution: Not A Problem This should not be an issue anymore, as timestamp with local time zone type is not dependent on system time zone. Closing as not a problem. > Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization > > > Key: HIVE-15634 > URL: https://issues.apache.org/jira/browse/HIVE-15634 > Project: Hive > Issue Type: Bug > Components: Druid integration >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: slim bouguerra >Priority: Critical > Labels: timestamp > > {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change > timezone for Tez tasks. However, when Fetch optimizer kicks in because we can > push the full query to Druid, we obtain different values for the timestamp > than when jobs are executed. This probably has to do with the timezone on the > client side. How should we handle this issue? > For instance, this can be observed with the following query: > {code:sql} > set hive.fetch.task.conversion=more; > SELECT DISTINCT `__time` > FROM store_sales_sold_time_subset > WHERE `__time` < '1999-11-10 00:00:00'; > OK > 1999-10-31 19:00:00 > 1999-11-01 19:00:00 > 1999-11-02 19:00:00 > 1999-11-03 19:00:00 > 1999-11-04 19:00:00 > 1999-11-05 19:00:00 > 1999-11-06 19:00:00 > 1999-11-07 19:00:00 > 1999-11-08 19:00:00 > set hive.fetch.task.conversion=none; > SELECT DISTINCT `__time` > FROM store_sales_sold_time_subset > WHERE `__time` < '1999-11-10 00:00:00'; > OK > 1999-11-01 00:00:00 > 1999-11-02 00:00:00 > 1999-11-03 00:00:00 > 1999-11-04 00:00:00 > 1999-11-05 00:00:00 > 1999-11-06 00:00:00 > 1999-11-07 00:00:00 > 1999-11-08 00:00:00 > 1999-11-09 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization
[ https://issues.apache.org/jira/browse/HIVE-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-15634: --- Target Version/s: (was: 3.0.0) > Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization > > > Key: HIVE-15634 > URL: https://issues.apache.org/jira/browse/HIVE-15634 > Project: Hive > Issue Type: Bug > Components: Druid integration >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: slim bouguerra >Priority: Critical > Labels: timestamp > > {{SET hive.tez.java.opts=-Duser.timezone="UTC";}} can be used to change > timezone for Tez tasks. However, when Fetch optimizer kicks in because we can > push the full query to Druid, we obtain different values for the timestamp > than when jobs are executed. This probably has to do with the timezone on the > client side. How should we handle this issue? > For instance, this can be observed with the following query: > {code:sql} > set hive.fetch.task.conversion=more; > SELECT DISTINCT `__time` > FROM store_sales_sold_time_subset > WHERE `__time` < '1999-11-10 00:00:00'; > OK > 1999-10-31 19:00:00 > 1999-11-01 19:00:00 > 1999-11-02 19:00:00 > 1999-11-03 19:00:00 > 1999-11-04 19:00:00 > 1999-11-05 19:00:00 > 1999-11-06 19:00:00 > 1999-11-07 19:00:00 > 1999-11-08 19:00:00 > set hive.fetch.task.conversion=none; > SELECT DISTINCT `__time` > FROM store_sales_sold_time_subset > WHERE `__time` < '1999-11-10 00:00:00'; > OK > 1999-11-01 00:00:00 > 1999-11-02 00:00:00 > 1999-11-03 00:00:00 > 1999-11-04 00:00:00 > 1999-11-05 00:00:00 > 1999-11-06 00:00:00 > 1999-11-07 00:00:00 > 1999-11-08 00:00:00 > 1999-11-09 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16056) Hive Changing Future Timestamp Values column values when any clause or filter applied
[ https://issues.apache.org/jira/browse/HIVE-16056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16056: --- Labels: timestamp (was: ) > Hive Changing Future Timestamp Values column values when any clause or filter > applied > - > > Key: HIVE-16056 > URL: https://issues.apache.org/jira/browse/HIVE-16056 > Project: Hive > Issue Type: Bug > Components: Beeline, Database/Schema >Affects Versions: 1.2.1 >Reporter: Sunil Kumar > Labels: timestamp > > Hi, > We are observing inconsistent behavior in Hive for timestamp column values. > When we apply a clause such as ORDER BY or DISTINCT on the same or another column > in the Hive query, it prints different results for timestamp values in years > after 2300. > Steps: > 1. Create a Hive table > create table cutomer_sample(id int, arrival_time timestamp, dob date) > stored as ORC; > 2. Populate some data with future timestamp values > insert into table cutomer_sample values (1,'2015-01-01 > 00:00:00.0','2015-01-01'), (2,'2018-01-01 00:00:00.0','2018-01-01') , > (3,'2099-01-01 00:00:00.0','2099-01-01'), (4,'2100-01-01 > 00:00:00.0','2100-01-01'),(5,'2500-01-01 > 00:00:00.0','2500-01-01'),(6,'2200-01-01 > 00:00:00.0','2200-01-01'),(7,'2300-01-01 > 00:00:00.0','2300-01-01'),(8,'2400-01-01 00:00:00.0','2400-01-01'); > 3. Select all data without any clause > select * from cutomer_sample; > Output: > select * from cutomer_sample; > ++--+-+--+ > | cutomer_sample.id | cutomer_sample.arrival_time | cutomer_sample.dob | > ++--+-+--+ > | 1 | 2015-01-01 00:00:00.0| 2015-01-01 | > | 2 | 2018-01-01 00:00:00.0| 2018-01-01 | > | 3 | 2099-01-01 00:00:00.0| 2099-01-01 | > | 4 | 2100-01-01 00:00:00.0| 2100-01-01 | > | 5 | 2500-01-01 00:00:00.0| 2500-01-01 | > | 6 | 2200-01-01 00:00:00.0| 2200-01-01 | > | 7 | 2300-01-01 00:00:00.0| 2300-01-01 | > | 8 | 2400-01-01 00:00:00.0| 2400-01-01 | > ++--+-+--+ > 4. Apply order by on the timestamp column > select * from cutomer_sample order by arrival_time ; > +++-+--+ > | cutomer_sample.id | cutomer_sample.arrival_time | cutomer_sample.dob | > +++-+--+ > | 7 | 1715-06-13 00:25:26.290448384 | 2300-01-01 | > | 8 | 1815-06-13 00:25:26.290448384 | 2400-01-01 | > | 5 | 1915-06-14 00:48:46.290448384 | 2500-01-01 | > | 1 | 2015-01-01 00:00:00.0 | 2015-01-01 | > | 2 | 2018-01-01 00:00:00.0 | 2018-01-01 | > | 3 | 2099-01-01 00:00:00.0 | 2099-01-01 | > | 4 | 2100-01-01 00:00:00.0 | 2100-01-01 | > | 6 | 2200-01-01 00:00:00.0 | 2200-01-01 | > +++-+--+ > You can see the timestamp values changed for years after 2300. > > 5. Apply order by on some other column; the behavior is the same: > +++-+--+ > | cutomer_sample.id | cutomer_sample.arrival_time | cutomer_sample.dob | > +++-+--+ > | 1 | 2015-01-01 00:00:00.0 | 2015-01-01 | > | 2 | 2018-01-01 00:00:00.0 | 2018-01-01 | > | 3 | 2099-01-01 00:00:00.0 | 2099-01-01 | > | 4 | 2100-01-01 00:00:00.0 | 2100-01-01 | > | 6 | 2200-01-01 00:00:00.0 | 2200-01-01 | > | 7 | 1715-06-13 00:25:26.290448384 | 2300-01-01 | > | 8 | 1815-06-13 00:25:26.290448384 | 2400-01-01 | > | 5 | 1915-06-14 00:48:46.290448384 | 2500-01-01 | > +++-+--+ -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16262) Inconsistent result when casting integer to timestamp
[ https://issues.apache.org/jira/browse/HIVE-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16262: --- Labels: timestamp (was: ) > Inconsistent result when casting integer to timestamp > - > > Key: HIVE-16262 > URL: https://issues.apache.org/jira/browse/HIVE-16262 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Labels: timestamp > > As reported by [~jcamachorodriguez]: > {code} > To give a concrete example, consider the following query: > select cast(0 as timestamp) from src limit 1; > The result if Hive is running in Santa Clara is: > 1969-12-31 16:00:00 > While the result if Hive is running in London is: > 1970-01-01 00:00:00 > This is not the behavior defined by the standard for TIMESTAMP type. The > result should be consistent, in this case the correct result is: > 1970-01-01 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16262) Inconsistent result when casting integer to timestamp
[ https://issues.apache.org/jira/browse/HIVE-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-16262. Resolution: Not A Problem Not clear whether this is a problem, it would depend on what you consider the epoch to be (in UTC? in local time zone?). Closing as not an issue. > Inconsistent result when casting integer to timestamp > - > > Key: HIVE-16262 > URL: https://issues.apache.org/jira/browse/HIVE-16262 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > As reported by [~jcamachorodriguez]: > {code} > To give a concrete example, consider the following query: > select cast(0 as timestamp) from src limit 1; > The result if Hive is running in Santa Clara is: > 1969-12-31 16:00:00 > While the result if Hive is running in London is: > 1970-01-01 00:00:00 > This is not the behavior defined by the standard for TIMESTAMP type. The > result should be consistent, in this case the correct result is: > 1970-01-01 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-10728) deprecate unix_timestamp(void) and make it deterministic
[ https://issues.apache.org/jira/browse/HIVE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10728: --- Labels: TODOC1.3 timestamp (was: TODOC1.3) > deprecate unix_timestamp(void) and make it deterministic > > > Key: HIVE-10728 > URL: https://issues.apache.org/jira/browse/HIVE-10728 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC1.3, timestamp > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-10728.01.patch, HIVE-10728.02.patch, > HIVE-10728.03.patch, HIVE-10728.patch > > > We have a proper current_timestamp function that is not evaluated at runtime. > Behavior of unix_timestamp(void) is both surprising, and is preventing some > optimizations on the other overload since the function becomes > non-deterministic. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
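A sketch of the migration this deprecation implies (assuming, per the description, that current_timestamp is fixed when the query starts and that unix_timestamp accepts a timestamp argument):

{code:sql}
select unix_timestamp();                  -- deprecated niladic form, non-deterministic
select unix_timestamp(current_timestamp); -- deterministic equivalent
{code}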
[jira] [Updated] (HIVE-17869) unix_timestamp(string date, string pattern) UDF does not verify date is valid
[ https://issues.apache.org/jira/browse/HIVE-17869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17869: --- Labels: timestamp (was: ) > unix_timestamp(string date, string pattern) UDF does not verify date is valid > - > > Key: HIVE-17869 > URL: https://issues.apache.org/jira/browse/HIVE-17869 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.2.1 >Reporter: Brian Goerlitz > Labels: timestamp > > unix_timestamp(string date, string pattern) returns a value in situations > which would be expected to return 0 (fail): > {noformat} > hive> -- Date does not exist > > select unix_timestamp('2017/02/29', 'yyyy/MM/dd'); > OK > 1488326400 > Time taken: 0.317 seconds, Fetched: 1 row(s) > hive> -- Date does not exist > > select from_unixtime(unix_timestamp('2017/02/29', 'yyyy/MM/dd')); > OK > 2017-03-01 00:00:00 > Time taken: 0.28 seconds, Fetched: 1 row(s) > hive> -- Date in wrong format > > select unix_timestamp('2017/02/29', 'MM/dd/yyyy'); > OK > -55950393600 > Time taken: 0.303 seconds, Fetched: 1 row(s) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
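Until the UDF validates its input, one defensive idiom is a round-trip check; a sketch (the yyyy/MM/dd pattern matches the reproduction above):

{code:sql}
-- A value that does not survive pattern -> epoch -> pattern unchanged was not
-- valid for that pattern; lenient parsing rolled it to another day.
select case
         when from_unixtime(unix_timestamp('2017/02/29', 'yyyy/MM/dd'), 'yyyy/MM/dd') = '2017/02/29'
         then unix_timestamp('2017/02/29', 'yyyy/MM/dd')
       end;
-- NULL here, because 2017/02/29 round-trips to 2017/03/01
{code}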
[jira] [Updated] (HIVE-17278) Incorrect output timestamp from from_utc_timestamp()/to_utc_timestamp when local timezone has DST
[ https://issues.apache.org/jira/browse/HIVE-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17278: --- Labels: timestamp (was: ) > Incorrect output timestamp from from_utc_timestamp()/to_utc_timestamp when > local timezone has DST > - > > Key: HIVE-17278 > URL: https://issues.apache.org/jira/browse/HIVE-17278 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 2.0.0 >Reporter: Leela Krishna > Labels: timestamp > > HIVE-12706 is resolved but there is still a bug in this - > from_utc_timestamp() is interpreting a GMT timestamp with DST. > HS2 on PST timezone: > GMT timestamp PST timestamp PST2GMT > 2012-03-11 01:30:15.332 2012-03-10 17:30:15.332 2012-03-11 01:30:15.332 > 2012-03-11 02:30:15.332 2012-03-10 19:30:15.332 2012-03-11 03:30:15.332 > (<--- We got 1 hour more on GMT) > PST timestamp is generated using from_utc_timestamp('2012-03-11 02:30:15.332', > 'PST') > PST2GMT timestamp is generated using > to_utc_timestamp(from_utc_timestamp('2012-03-11 02:30:15.332', 'PST'), 'PST') -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17413) predicate involving CAST affects value returned by the SELECT statement
[ https://issues.apache.org/jira/browse/HIVE-17413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17413: --- Labels: timestamp (was: ) > predicate involving CAST affects value returned by the SELECT statement > --- > > Key: HIVE-17413 > URL: https://issues.apache.org/jira/browse/HIVE-17413 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Jim Hopper >Priority: Critical > Labels: timestamp > > steps to reproduce: > {code} > create table t stored as orc as > select cast('2017-08-29 00:01:26' as timestamp) as ts; > {code} > {code} > select ts from t; > {code} > {code} > ts > 2017-08-29 00:01:26 > {code} > {code} > select ts from t where cast(ts as date) = '2017-08-29'; > {code} > {code} > ts > 2017-08-29 00:00:00 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
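Until this is fixed, a workaround sketch: a half-open range on the raw timestamp avoids the CAST in the predicate, and the selected value comes back unaltered:

{code:sql}
select ts from t
where ts >= cast('2017-08-29 00:00:00' as timestamp)
  and ts <  cast('2017-08-30 00:00:00' as timestamp);
{code}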
[jira] [Updated] (HIVE-16434) Add support for parsing additional timestamp formats
[ https://issues.apache.org/jira/browse/HIVE-16434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16434: --- Labels: timestamp (was: ) > Add support for parsing additional timestamp formats > > > Key: HIVE-16434 > URL: https://issues.apache.org/jira/browse/HIVE-16434 > Project: Hive > Issue Type: Bug > Components: File Formats, Query Planning >Reporter: Ashutosh Chauhan > Labels: timestamp > > Will be useful to handle additional timestamp formats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17888) Display the reason for query cancellation
[ https://issues.apache.org/jira/browse/HIVE-17888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17888: - Status: Patch Available (was: Open) > Display the reason for query cancellation > - > > Key: HIVE-17888 > URL: https://issues.apache.org/jira/browse/HIVE-17888 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17888.1.patch > > > For user convenience and easy debugging, if a trigger kills a query return > the reason for the killing the query. Currently the query kill will only > display the following which is not very useful > {code} > Error: Query was cancelled (state=01000,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17888) Display the reason for query cancellation
[ https://issues.apache.org/jira/browse/HIVE-17888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17888: - Attachment: HIVE-17888.1.patch > Display the reason for query cancellation > - > > Key: HIVE-17888 > URL: https://issues.apache.org/jira/browse/HIVE-17888 > Project: Hive > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17888.1.patch > > > For user convenience and easy debugging, if a trigger kills a query return > the reason for the killing the query. Currently the query kill will only > display the following which is not very useful > {code} > Error: Query was cancelled (state=01000,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12408) SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions
[ https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-12408: - Summary: SQLStdAuthorizer should not require external table creator to be owner of directory, in addition to rw permissions (was: SQLStdAuthorizer expects external table creator to be owner of directory, does not respect rwx group permission. ) > SQLStdAuthorizer should not require external table creator to be owner of > directory, in addition to rw permissions > -- > > Key: HIVE-12408 > URL: https://issues.apache.org/jira/browse/HIVE-12408 > Project: Hive > Issue Type: Bug > Components: Authorization, Security, SQLStandardAuthorization >Affects Versions: 0.14.0 > Environment: HDP 2.2 + Kerberos >Reporter: Hari Sekhon >Assignee: Akira Ajisaka >Priority: Critical > Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch > > > When trying to create an external table via beeline in Hive using the > SQLStdAuthorizer, it expects the table creator to be the owner of the > directory path and ignores the group rwx permission that is granted to the > user. > {code}Error: Error while compiling statement: FAILED: > HiveAccessControlException Permission denied: Principal [name=hari, > type=USER] does not have following privileges for operation CREATETABLE > [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, > name=/etl/path/to/hdfs/dir]] (state=42000,code=4){code} > All it should be checking is read access to that directory. > The directory owner requirement breaks the ability of more than one user to > create external table definitions for a given location. For example, this is a > flume landing directory with json data, and the /etl tree is owned by the > flume user. Even chowning the tree to another user would still break access > for other users who are able to read the directory in hdfs but would still > be unable to create external tables on top of it. > This looks like a remnant of the owner-only access model in SQLStdAuth and is > a separate issue from HIVE-11864 / HIVE-12324. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12408) SQLStdAuthorizer expects external table creator to be owner of directory, does not respect rwx group permission.
[ https://issues.apache.org/jira/browse/HIVE-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-12408: - Summary: SQLStdAuthorizer expects external table creator to be owner of directory, does not respect rwx group permission. (was: SQLStdAuthorizer expects external table creator to be owner of directory, does not respect rwx group permission. Only one user could ever create an external table definition to dir!) > SQLStdAuthorizer expects external table creator to be owner of directory, > does not respect rwx group permission. > - > > Key: HIVE-12408 > URL: https://issues.apache.org/jira/browse/HIVE-12408 > Project: Hive > Issue Type: Bug > Components: Authorization, Security, SQLStandardAuthorization >Affects Versions: 0.14.0 > Environment: HDP 2.2 + Kerberos >Reporter: Hari Sekhon >Assignee: Akira Ajisaka >Priority: Critical > Attachments: HIVE-12408.001.patch, HIVE-12408.002.patch > > > When trying to create an external table via beeline in Hive using the > SQLStdAuthorizer, it expects the table creator to be the owner of the > directory path and ignores the group rwx permission that is granted to the > user.
> {code}
> Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=hari, type=USER] does not have following privileges for operation CREATETABLE [[INSERT, DELETE, OBJECT OWNERSHIP] on Object [type=DFS_URI, name=/etl/path/to/hdfs/dir]] (state=42000,code=4)
> {code}
> All it should be checking is read access to that directory. > The directory-owner requirement breaks the ability of more than one user to > create external table definitions for a given location. For example, this is a > Flume landing directory with JSON data, and the /etl tree is owned by the > flume user. Even chowning the tree to another user would still break access > for other users who are able to read the directory in HDFS but would still be > unable to create external tables on top of it. > This looks like a remnant of the owner-only access model in SQLStdAuth and is > a separate issue to HIVE-11864 / HIVE-12324. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
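The check the reporter argues for amounts to verifying POSIX-style read access on the directory (via owner, group, or other bits) rather than requiring ownership. Hadoop's FileSystem.access() (available since Hadoop 2.6) performs exactly that check, honouring permission bits and HDFS ACLs for the calling user. The helper below is only an illustrative sketch of such a check, not the actual SQLStdAuthorizer fix in the attached patches.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.AccessControlException;

public class ExternalTableDirCheck {

  /** True if the current user can list and read the directory. */
  static boolean canReadDir(Configuration conf, Path dir) throws IOException {
    FileSystem fs = dir.getFileSystem(conf);
    try {
      // Checks owner, group, and other bits plus HDFS ACLs -- unlike an
      // ownership test, group rwx is enough to pass.
      fs.access(dir, FsAction.READ_EXECUTE);
      return true;
    } catch (AccessControlException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path dir = new Path("/etl/path/to/hdfs/dir"); // path from the report
    System.out.println("readable: " + canReadDir(conf, dir));
  }
}
{code}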
[jira] [Updated] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`
[ https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17791: Resolution: Fixed Fix Version/s: 2.2.1 2.4.0 Status: Resolved (was: Patch Available) > Temp dirs under the staging directory should honour `inheritPerms` > -- > > Key: HIVE-17791 > URL: https://issues.apache.org/jira/browse/HIVE-17791 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 2.2.0, 2.4.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Fix For: 2.4.0, 2.2.1 > > Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch > > > For [~cdrome]: > CLI creates two levels of staging directories but calls setPermissions on the > top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}. > The top-level directory, > {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}} > is created the first time {{Context.getExternalTmpPath}} is called. > The child directory, > {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}} > is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
>
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
>
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
>
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
>
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll > rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to > wait till the issues around {{StorageBasedAuthProvider}}, directory > permissions, etc. are sorted out. > (Note to self: YHIVE-857) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17791) Temp dirs under the staging directory should honour `inheritPerms`
[ https://issues.apache.org/jira/browse/HIVE-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221299#comment-16221299 ] Mithun Radhakrishnan commented on HIVE-17791: - Committed to {{branch-2}}, and {{branch-2.2}}. Thanks, [~cdrome]. > Temp dirs under the staging directory should honour `inheritPerms` > -- > > Key: HIVE-17791 > URL: https://issues.apache.org/jira/browse/HIVE-17791 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 2.2.0, 2.4.0 >Reporter: Mithun Radhakrishnan >Assignee: Chris Drome > Attachments: HIVE-17791.1-branch-2.patch, HIVE-17791.2-branch-2.patch > > > For [~cdrome]: > CLI creates two levels of staging directories but calls setPermissions on the > top-level directory only if {{hive.warehouse.subdir.inherit.perms=true}}. > The top-level directory, > {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1}} > is created the first time {{Context.getExternalTmpPath}} is called. > The child directory, > {{/user/cdrome/hive/words_text_dist/dt=c/.hive-staging_hive_2016-07-15_08-44-22_082_5534649671389063929-1/_tmp.-ext-1}} > is created when {{TezTask.execute}} is called at line 164:
> {code:java}
> DAG dag = build(jobConf, work, scratchDir, appJarLr, additionalLr, ctx);
> {code}
> This calls {{DagUtils.createVertex}}, which calls {{Utilities.createTmpDirs}}:
> {code:java}
> private static void createTmpDirs(Configuration conf,
>     List<Operator<? extends OperatorDesc>> ops) throws IOException {
>
>   while (!ops.isEmpty()) {
>     Operator<? extends OperatorDesc> op = ops.remove(0);
>
>     if (op instanceof FileSinkOperator) {
>       FileSinkDesc fdesc = ((FileSinkOperator) op).getConf();
>       Path tempDir = fdesc.getDirName();
>
>       if (tempDir != null) {
>         Path tempPath = Utilities.toTempPath(tempDir);
>         FileSystem fs = tempPath.getFileSystem(conf);
>         fs.mkdirs(tempPath); // <-- HERE!
>       }
>     }
>
>     if (op.getChildOperators() != null) {
>       ops.addAll(op.getChildOperators());
>     }
>   }
> }
> {code}
> It turns out that {{inheritPerms}} is no longer part of {{master}}. I'll > rebase this for {{branch-2}}, and {{branch-2.2}}. {{master}} will have to > wait till the issues around {{StorageBasedAuthProvider}}, directory > permissions, etc. are sorted out. > (Note to self: YHIVE-857) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
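For context, honouring {{inheritPerms}} at the {{fs.mkdirs(tempPath)}} call marked "HERE!" would mean copying the permission bits of the nearest existing ancestor onto each newly created temp directory. The helper below sketches that idea with standard Hadoop FileSystem calls; it is a sketch only, not the code in the attached branch-2 patches.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class InheritPermsSketch {

  /** mkdirs that copies permissions from the nearest existing ancestor. */
  static void mkdirsInheritingPerms(FileSystem fs, Path tempPath)
      throws IOException {
    // Find the closest ancestor that already exists; its permission is
    // what the new directory levels should inherit.
    Path ancestor = tempPath.getParent();
    while (ancestor != null && !fs.exists(ancestor)) {
      ancestor = ancestor.getParent();
    }
    fs.mkdirs(tempPath);
    if (ancestor != null) {
      FsPermission perm = fs.getFileStatus(ancestor).getPermission();
      // Apply the inherited permission to every level mkdirs created.
      for (Path p = tempPath; p != null && !p.equals(ancestor); p = p.getParent()) {
        fs.setPermission(p, perm);
      }
    }
  }
}
{code}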
[jira] [Commented] (HIVE-17911) org.apache.hadoop.hive.metastore.ObjectStore - Tune Up
[ https://issues.apache.org/jira/browse/HIVE-17911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221298#comment-16221298 ] Hive QA commented on HIVE-17911: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12894166/HIVE-17911.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 11325 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7490/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7490/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7490/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12894166 - PreCommit-HIVE-Build > org.apache.hadoop.hive.metastore.ObjectStore - Tune Up > -- > > Key: HIVE-17911 > URL: https://issues.apache.org/jira/browse/HIVE-17911 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-17911.1.patch > > > # Remove unused variables > # Add logging parameterization > # Use CollectionUtils.isEmpty/isNotEmpty to simplify and unify collection > emptiness checks (always including the null check) > # Minor tweaks -- This message was sent by Atlassian JIRA (v6.4.14#64029)
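Two of the tune-ups listed in the description are easy to illustrate: SLF4J logging parameterization and null-safe emptiness checks. The snippet below assumes SLF4J and Apache commons-collections4 (which commons-collections version the patch targets is an assumption here); it shows the pattern, not code taken from the patch itself.

{code:java}
import java.util.List;

import org.apache.commons.collections4.CollectionUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ObjectStoreTuneUpExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(ObjectStoreTuneUpExample.class);

  void process(List<String> parts) {
    // Before: "Found " + parts.size() + " partitions" -- the string is
    // built even when DEBUG logging is off.
    // After: parameterized logging defers formatting until it is needed.
    LOG.debug("Found {} partitions", parts == null ? 0 : parts.size());

    // Before: if (parts != null && !parts.isEmpty()) { ... }
    // After: one null-safe call.
    if (CollectionUtils.isNotEmpty(parts)) {
      // ... handle the partitions ...
    }
  }
}
{code}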
[jira] [Updated] (HIVE-17433) Vectorization: Support Decimal64 in Hive Query Engine
[ https://issues.apache.org/jira/browse/HIVE-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-17433: Attachment: HIVE-17433.08.patch > Vectorization: Support Decimal64 in Hive Query Engine > - > > Key: HIVE-17433 > URL: https://issues.apache.org/jira/browse/HIVE-17433 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-17433.03.patch, HIVE-17433.04.patch, > HIVE-17433.05.patch, HIVE-17433.06.patch, HIVE-17433.07.patch, > HIVE-17433.08.patch > > > Provide partial support for Decimal64 within Hive. By partial I mean that > our current decimal has a large surface area of features (rounding, multiply, > divide, remainder, power, big precision, and many more), but only a small > number have been identified as performance hotspots. > Those are small-precision decimals with precision <= 18 that fit within a > 64-bit long, which we call Decimal64. Just as we optimize row-mode > execution engine hotspots by selectively adding new vectorization code, we > can treat the current decimal as the full-featured one and add additional > Decimal64 optimization where query benchmarks really show it helps. > This change creates a Decimal64ColumnVector. > This change currently detects small decimals in Hive's vectorized text > input format and uses some new Decimal64 vectorized classes for comparison, > addition, and later perhaps a few GroupBy aggregations like sum, avg, min, > max. > The patch also supports a new annotation that can mark a > VectorizedInputFormat as supporting Decimal64 (it is called DECIMAL_64). So, > in separate work, other formats such as ORC, PARQUET, etc. can be done in > later JIRAs so they participate in the Decimal64 performance optimization. > The idea is that when you annotate your input format with: > @VectorizedInputFormatSupports(supports = {DECIMAL_64}) > the Vectorizer in Hive will plan usage of Decimal64ColumnVector instead of > DecimalColumnVector. Upon seeing Decimal64ColumnVector being used, the input > format can fill that column vector with decimal64 longs > instead of the HiveDecimalWritable objects of DecimalColumnVector. > There will be a Hive configuration property > hive.vectorized.input.format.supports.enabled that holds a string list of > supported features. The default will start as "decimal_64". It can be > turned off to allow for performance comparisons and testing. > The query SELECT * FROM DECIMAL_6_1_txt where key - 100BD < 200BD ORDER BY > key, value will have a vectorized explain plan looking like:
> ...
> Filter Operator
>   Filter Vectorization:
>     className: VectorFilterOperator
>     native: true
>     predicateExpression: FilterDecimal64ColLessDecimal64Scalar(col 2, val 2000)(children: Decimal64ColSubtractDecimal64Scalar(col 0, val 1000, outputDecimal64AbsMax 999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
>     predicate: ((key - 100) < 200) (type: boolean)
> ...
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
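The core trick: any decimal with precision <= 18 fits in a signed 64-bit long as its unscaled value, so subtraction and comparison on a decimal(11,5) column reduce to plain long arithmetic per row. The sketch below is a self-contained illustration of that per-row math; the real vectorized expression classes operate on whole column vectors at a time, and the constants here are illustrative, not taken from the patch.

{code:java}
import java.math.BigDecimal;

// Decimal64 idea in miniature: store decimal(11,5) values as unscaled longs
// so that (key - 100BD) < 200BD becomes two long operations per row.
public class Decimal64Sketch {
  static final int SCALE = 5;
  static final long ONE = 100_000L; // 10^SCALE

  // Parse a decimal literal into its unscaled long at SCALE digits.
  static long toDecimal64(String s) {
    return new BigDecimal(s).movePointRight(SCALE).longValueExact();
  }

  static String fromDecimal64(long v) {
    return BigDecimal.valueOf(v, SCALE).toPlainString();
  }

  public static void main(String[] args) {
    long hundred = 100L * ONE;    // 100BD as a decimal64 long
    long twoHundred = 200L * ONE; // 200BD as a decimal64 long
    long[] keys = { toDecimal64("150.5"), toDecimal64("350.00001") };
    for (long key : keys) {
      // Decimal64ColSubtractDecimal64Scalar followed by
      // FilterDecimal64ColLessDecimal64Scalar reduces to exactly this:
      boolean selected = (key - hundred) < twoHundred;
      System.out.println(fromDecimal64(key) + " -> " + selected);
    }
  }
}
{code}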