[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I
[ https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372141#comment-15372141 ]

Hive QA commented on HIVE-14111:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817215/HIVE-14111.06.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10294 tests executed

*Failed tests:*
{noformat}
TestMiniTezCliDriver-tez_self_join.q-filter_join_breaktask.q-vector_decimal_precision.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/476/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/476/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-476/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817215 - PreCommit-HIVE-MASTER-Build

> better concurrency handling for TezSessionState - part I
> --------------------------------------------------------
>
>          Key: HIVE-14111
>          URL: https://issues.apache.org/jira/browse/HIVE-14111
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Sergey Shelukhin
>     Assignee: Sergey Shelukhin
>  Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372058#comment-15372058 ]

Hive QA commented on HIVE-14007:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817200/HIVE-14007.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/475/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/475/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-475/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output:
+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-475/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at a61c351 HIVE-14200: Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther Hagleitner)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at a61c351 HIVE-14200: Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817200 - PreCommit-HIVE-MASTER-Build

> Replace ORC module with ORC release
> -----------------------------------
>
>              Key: HIVE-14007
>              URL: https://issues.apache.org/jira/browse/HIVE-14007
>          Project: Hive
>       Issue Type: Bug
>       Components: ORC
> Affects Versions: 2.2.0
>         Reporter: Owen O'Malley
>         Assignee: Owen O'Malley
>      Fix For: 2.2.0
>  Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
> This completes moving the core ORC reader & writer to the ORC project.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly
[ https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372057#comment-15372057 ]

Hive QA commented on HIVE-12646:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817193/HIVE-12646.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10308 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/474/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/474/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-474/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817193 - PreCommit-HIVE-MASTER-Build

> beeline and HIVE CLI do not parse ; in quote properly
> -----------------------------------------------------
>
>          Key: HIVE-12646
>          URL: https://issues.apache.org/jira/browse/HIVE-12646
>      Project: Hive
>   Issue Type: Bug
>   Components: CLI, Clients
>     Reporter: Yongzhi Chen
>     Assignee: Sahil Takiar
>  Attachments: HIVE-12646.2.patch, HIVE-12646.3.patch, HIVE-12646.patch
>
> Beeline and the CLI have to escape ; inside quotes, while most other shells do not. For example, in Beeline:
> {noformat}
> 0: jdbc:hive2://localhost:1> select ';' from tlb1;
> select ';' from tlb1;
> 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
> 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
> Error: Error while compiling statement: FAILED: ParseException line 1:8 cannot recognize input near '' '
> {noformat}
> while in the mysql shell:
> {noformat}
> mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
> +------+
> | ;foo |
> | ;foo |
> | ;foo |
> +------+
> 3 rows in set (0.00 sec)
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
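The escaping the description refers to can be illustrated with a short sketch. Before this fix, the semicolon inside the quoted literal has to be backslash-escaped so Beeline does not treat it as a statement terminator. This is an illustrative, hypothetical session; the table name tlb1 is taken from the report above and is assumed to exist.

```sql
-- Hypothetical workaround session in Beeline prior to the HIVE-12646 fix.
-- The backslash keeps the client-side splitter from ending the statement
-- at the quoted semicolon.
select '\;' from tlb1;
```

With the patch applied, the unescaped form `select ';' from tlb1;` should parse the same way it does in other SQL shells.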
[jira] [Commented] (HIVE-14212) hbase_queries result out of date on branch-2.1
[ https://issues.apache.org/jira/browse/HIVE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372032#comment-15372032 ]

Pengcheng Xiong commented on HIVE-14212:
----------------------------------------

After checking master, I found that it is not consistent with branch-2.1. Thus, +1. Btw, I think updating q-file outputs does not require a +1. :)

> hbase_queries result out of date on branch-2.1
> ----------------------------------------------
>
>          Key: HIVE-14212
>          URL: https://issues.apache.org/jira/browse/HIVE-14212
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Sergey Shelukhin
>     Assignee: Sergey Shelukhin
>     Priority: Trivial
>  Attachments: HIVE-14212-branch-2.1.patch
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-14196:
-----------------------------------------

Attachment: HIVE-14196.3.patch

Minor change

> Disable LLAP IO when complex types are involved
> -----------------------------------------------
>
>          Key: HIVE-14196
>          URL: https://issues.apache.org/jira/browse/HIVE-14196
>      Project: Hive
>   Issue Type: Sub-task
> Affects Versions: 2.1.0, 2.2.0
>     Reporter: Prasanth Jayachandran
>     Assignee: Prasanth Jayachandran
>  Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch, HIVE-14196.3.patch
>
> Let's exclude the vector_complex_* tests added for llap, which are currently broken and fail in all test runs. We can re-enable them with the HIVE-14089 patch.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-14196:
-----------------------------------------

Attachment: HIVE-14196.3.patch

Added the check at compilation as well.

> Disable LLAP IO when complex types are involved
>
>          Key: HIVE-14196
>          URL: https://issues.apache.org/jira/browse/HIVE-14196
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14212) hbase_queries result out of date on branch-2.1
[ https://issues.apache.org/jira/browse/HIVE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-14212:
------------------------------------

Status: Patch Available (was: Open)

> hbase_queries result out of date on branch-2.1
>
>          Key: HIVE-14212
>          URL: https://issues.apache.org/jira/browse/HIVE-14212
>      Project: Hive
>   Issue Type: Bug
>     Reporter: Sergey Shelukhin
>     Assignee: Sergey Shelukhin
>     Priority: Trivial
>  Attachments: HIVE-14212-branch-2.1.patch
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
[ https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371988#comment-15371988 ]

Sahil Takiar commented on HIVE-14137:
-------------------------------------

Re-basing again

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
> -----------------------------------------------------------------------------------
>
>          Key: HIVE-14137
>          URL: https://issues.apache.org/jira/browse/HIVE-14137
>      Project: Hive
>   Issue Type: Bug
>   Components: Spark
>     Reporter: Sahil Takiar
>     Assignee: Sahil Takiar
>  Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.5.patch, HIVE-14137.6.patch, HIVE-14137.patch
>
> The following queries:
> {code}
> -- Setup
> drop table if exists empty1;
> create table empty1 (col1 bigint) stored as parquet tblproperties ('parquet.compress'='snappy');
> drop table if exists empty2;
> create table empty2 (col1 bigint, col2 bigint) stored as parquet tblproperties ('parquet.compress'='snappy');
> drop table if exists empty3;
> create table empty3 (col1 bigint) stored as parquet tblproperties ('parquet.compress'='snappy');
>
> -- All empty HDFS directories.
> -- Fails with [08S01]: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
>
> -- Two empty HDFS directories.
> -- Create an empty file in HDFS.
> insert into empty1 select * from empty1 where false;
>
> -- Same query fails with [08S01]: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
>
> -- One empty HDFS directory.
> -- Create an empty file in HDFS.
> insert into empty2 select * from empty2 where false;
>
> -- Same query succeeds.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> {code}
> Will result in the following exception:
> {code}
> org.apache.hadoop.fs.FileAlreadyExistsException: /tmp/hive/hive/1f3837aa-9407-4780-92b1-42a66d205139/hive_2016-06-24_15-45-23_206_79177714958655528-2/-mr-10004/0/emptyFile for client 172.26.14.151 already exists
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2561)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:593)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:111)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:393)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>     at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1902)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1738)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1663)
> {code}
[jira] [Updated] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
[ https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-14137:
--------------------------------

Attachment: HIVE-14137.6.patch

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
>
>          Key: HIVE-14137
>          URL: https://issues.apache.org/jira/browse/HIVE-14137
>
[jira] [Updated] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Friedrich updated HIVE-14210:
------------------------------------

Attachment: HIVE-14210.1.patch

> SSLFactory truststore reloader threads leaking in HiveServer2
> -------------------------------------------------------------
>
>          Key: HIVE-14210
>          URL: https://issues.apache.org/jira/browse/HIVE-14210
>      Project: Hive
>   Issue Type: Bug
>   Components: Hive, HiveServer2
> Affects Versions: 1.2.1, 2.0.0, 2.1.0
>     Reporter: Thomas Friedrich
>     Assignee: Thomas Friedrich
>  Attachments: HIVE-14210.1.patch, HIVE-14210.patch
>
> We found an issue in a customer environment where the HS2 crashed after a few days and the Java core dump contained several thousands of truststore reloader threads:
> {noformat}
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 tid=0x7f680d2e3000 nid=0x98fd waiting on condition [0x7f67e482c000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:225)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We found the issue to be caused by a bug in Hadoop where the TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in Hadoop and the timeline server is running. I opened YARN-5309, which has more details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 and MAPREDUCE-6621, which fixed issues with calling jobclient.close(). Both fixes are included in Hadoop 2.6.4.
> However, since we also need to pick up YARN-5309, we need to wait for a new release of Hadoop.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
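The ExecDriver change described above is a resource-cleanup pattern: close the JobClient once the submitted job is done or failed, so the SSLFactory it holds (and the "Truststore reloader thread" it spawns) gets torn down. A minimal sketch of that pattern, with a stand-in Closeable, since the real org.apache.hadoop.mapred.JobClient needs a live cluster and is only assumed here:

```java
import java.io.Closeable;

public class JobClientCleanup {
    // Hypothetical stand-in for org.apache.hadoop.mapred.JobClient; in the real
    // class, close() (fixed by MAPREDUCE-6618/6621) releases the SSLFactory and
    // stops its truststore reloader thread.
    static class StubJobClient implements Closeable {
        boolean closed = false;
        void submitAndWait() { /* submit the job and wait for completion */ }
        @Override
        public void close() { closed = true; }
    }

    public static void main(String[] args) {
        StubJobClient jc = new StubJobClient();
        try {
            jc.submitAndWait();
        } finally {
            // The call the patch adds: runs whether the job succeeded or failed,
            // so no reloader thread is left behind per submitted job.
            jc.close();
        }
        System.out.println(jc.closed ? "closed" : "leaked");
    }
}
```

The finally block is the essential part: without it, an exception during job submission would skip the close and leak one reloader thread per query, which matches the thousands-of-threads core dump described above.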
[jira] [Commented] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371977#comment-15371977 ]

Sergey Shelukhin commented on HIVE-14210:
-----------------------------------------

nit: can you add braces to the if? Otherwise +1

cc [~thejas] [~vgumashta]

> SSLFactory truststore reloader threads leaking in HiveServer2
>
>          Key: HIVE-14210
>          URL: https://issues.apache.org/jira/browse/HIVE-14210
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371944#comment-15371944 ]

Thomas Friedrich edited comment on HIVE-14210 at 7/12/16 12:12 AM:
-------------------------------------------------------------------

Provided patch for ExecDriver.java to call jobclient.close()

was (Author: tfriedr):
Patch for ExecDriver.java

> SSLFactory truststore reloader threads leaking in HiveServer2
>
>          Key: HIVE-14210
>          URL: https://issues.apache.org/jira/browse/HIVE-14210
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Friedrich reassigned HIVE-14210:
---------------------------------------

Assignee: Thomas Friedrich

> SSLFactory truststore reloader threads leaking in HiveServer2
>
>          Key: HIVE-14210
>          URL: https://issues.apache.org/jira/browse/HIVE-14210
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Friedrich updated HIVE-14210:
------------------------------------

Attachment: HIVE-14210.patch

Patch for ExecDriver.java

> SSLFactory truststore reloader threads leaking in HiveServer2
>
>          Key: HIVE-14210
>          URL: https://issues.apache.org/jira/browse/HIVE-14210
>

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
[ https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371945#comment-15371945 ]

Hive QA commented on HIVE-14137:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817191/HIVE-14137.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/473/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/473/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-473/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output:
+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-473/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   c790391..a61c351  master -> origin/master
+ git reset --hard HEAD
HEAD is now at c790391 HIVE-14151: Use of USE_DEPRECATED_CLI environment variable does not work (Vihang Karajgaonkar, reviewed by Sergio Pena)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at a61c351 HIVE-14200: Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817191 - PreCommit-HIVE-MASTER-Build

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables
>
>          Key: HIVE-14137
>          URL: https://issues.apache.org/jira/browse/HIVE-14137
>      Project: Hive
>   Issue Type: Bug
>   Components: Spark
>     Reporter: Sahil Takiar
>     Assignee: Sahil Takiar
>  Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.5.patch, HIVE-14137.patch
>
[jira] [Commented] (HIVE-14204) Optimize loading dynamic partitions
[ https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371942#comment-15371942 ] Hive QA commented on HIVE-14204: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12817168/HIVE-14204.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 114 failed/errored test(s), 10310 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_vectorization_missing_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_empty_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_dynamic_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_decimal org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapreduce2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_merge1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge11 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge4 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_diff_fs org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_incompat1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_incompat3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf_matchpath org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf_streaming org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_part org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_part
[jira] [Commented] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
[ https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371939#comment-15371939 ] Eugene Koifman commented on HIVE-14211: --- HIVE-13369 added a fix for the autoCommit=true mode. Multi-statement txns require a more complicated change. > AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files > etc > - > > Key: HIVE-14211 > URL: https://issues.apache.org/jira/browse/HIVE-14211 > Project: Hive > Issue Type: Sub-task > Components: Transactions > Affects Versions: 1.0.0 > Reporter: Eugene Koifman > Assignee: Eugene Koifman > Priority: Blocker > > The JavaDoc on getAcidState() reads, in part: > "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." > which is correct, but there is nothing in the code that does this. > And if we detect a situation where txn X must be excluded but there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto-commit mode, but with multi-statement txns it's possible. > Suppose some long-running txn starts and locks in a snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). > == > Here is a more concrete example. Let's say the files for table A are as follows, created in the order listed: > delta_4_4 > delta_5_5 > delta_4_5 > base_5 > delta_16_16 > delta_17_17 > base_17 (for example, user ran major compaction) > Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16>. > Assume that all txns <= 20 commit. > The reader can't use base_17 because it has the result of txn 16. So it should choose base_5 as "TxnBase bestBase" in _getChildState()_. > Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot for such a reader. > The issue is if the Cleaner process is running at the same time. It will see everything with txnid < 17 as obsolete. Then it will check lock manager state and decide to delete (as there may not be any locks in the LM for table A). The order in which the files are deleted is undefined right now. It may delete delta_16_16 and delta_17_17 first, and right at this moment the read request with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by some multi-stmt txn that started some time ago; it acquires locks after the Cleaner checks LM state and calls getAcidState()). This request will choose base_5 but it won't see delta_16_16 and delta_17_17, and thus return the snapshot w/o the modifications made by those txns. > [This is not possible currently since we only support autoCommit=true. The reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) locks in the snapshot. The Cleaner won't delete anything for a given compaction (partition) if there are locks on it. Thus for the duration of the transaction, nothing will be deleted, so it's safe to use base_5.] > This is a subtle race condition but possible. > 1. So the safest thing to do to ensure correctness is to use the latest base_x as the "best", check against exceptions in ValidTxnList, and throw an exception if there is an exception <= x. > 2. A better option is to keep 2 exception lists: aborted and open, and only throw if there is an open txn <= x. Compaction throws away data from aborted txns, so there is no harm in using a base with aborted txns in its range. > 3. You could make each txn record the lowest open txn id at its start and prevent the Cleaner from cleaning any delta with an id range that includes this open txn id for any txn that is still running. This has a drawback of potentially delaying GC of old files for arbitrarily long periods, so it should be a user config choice. The implementation is not trivial. > I would go with 1 now and do 2/3 together with the multi-statement txn work. > Side note: if 2 deltas have overlapping ID ranges, then one must be a subset of the other -- This message was sent by Atlassian JIRA (v6.3.4#6332)
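Option 1 above can be sketched as a small standalone check. This is not Hive's actual AcidUtils code: BaseChooser, chooseBase, and the simplified ValidTxnList representation (a high-water mark plus an exception set) are hypothetical names used to illustrate the idea of always taking the latest base_x and failing fast if the reader's snapshot excludes any txn <= x, rather than silently falling back to an older base and racing the Cleaner.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of option 1: prefer the newest base_x, but reject it
// when the reader's ValidTxnList excludes some txn id <= x, since a major
// compaction bakes that excluded transaction's data into the base.
final class BaseChooser {
    // highWaterMark mirrors the HWM in ValidTxnList(HWM:exceptions);
    // this simplified check only needs the exception set.
    static long chooseBase(List<Long> baseTxnIds, long highWaterMark, Set<Long> excludedTxns) {
        long best = baseTxnIds.stream().max(Long::compare)
                .orElseThrow(() -> new IllegalStateException("no base directory"));
        for (long excluded : excludedTxns) {
            if (excluded <= best) {
                // e.g. base_17 with ExceptionList=<16>: txn 16's data is baked in
                throw new IllegalStateException(
                        "base_" + best + " may contain excluded txn " + excluded);
            }
        }
        return best;
    }
}
```

With the example above (base_5 and base_17, ValidTxnList(20:16)), the check rejects base_17 and makes the reader fail, which is the "safest thing to do" tradeoff option 1 describes.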
[jira] [Updated] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
[ https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14211: -- Target Version/s: 1.3.0, 2.2.0 (was: 1.3.0, 2.2.0, 2.1.1) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
[ https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14211: -- Target Version/s: (was: 1.3.0, 2.2.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc
[ https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14211: -- Issue Type: Sub-task (was: Bug) Parent: HIVE-9675 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention to ValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13369: -- Description: The JavaDoc on getAcidState() reads, in part: "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." which is correct but there is nothing in the code that does this. And if we detect a situation where txn X must be excluded but and there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto commit mode, but with multi statement txns it's possible. Suppose some long running txn starts and lock in snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). == Here is a more concrete example. Let's say the file for table A are as follows and created in the order listed. delta_4_4 delta_5_5 delta_4_5 base_5 delta_16_16 delta_17_17 base_17 (for example user ran major compaction) let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16> Assume that all txns <= 20 commit. Reader can't use base_17 because it has result of txn16. So it should chose base_5 "TxnBase bestBase" in _getChildState()_. Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and delta_17_17 in _Directory_ object. This would represent acceptable snapshot for such reader. The issue is if at the same time the Cleaner process is running. It will see everything with txnid<17 as obsolete. Then it will check lock manger state and decide to delete (as there may not be any locks in LM for table A). The order in which the files are deleted is undefined right now. 
It may delete delta_16_16 and delta_17_17 first and right at this moment the read request with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some multi-stmt txn that started some time ago. It acquires locks after the Cleaner checks LM state and calls getAcidState(). This request will choose base_5 but it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o modifications made by those txns. [This is not possible currently since we only support autoCommit=true. The reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) locks in the snapshot. The cleaner won't delete anything for a given compaction (partition) if there are locks on it. Thus for duration of the transaction, nothing will be deleted so it's safe to use base_5] This is a subtle race condition but possible. 1. So the safest thing to do to ensure correctness is to use the latest base_x as the "best" and check against exceptions in ValidTxnList and throw an exception if there is an exception <=x. 2. A better option is to keep 2 exception lists: aborted and open and only throw if there is an open txn <=x. Compaction throws away data from aborted txns and thus there is no harm using base with aborted txns in its range. 3. You could make each txn record the lowest open txn id at its start and prevent the cleaner from cleaning anything delta with id range that includes this open txn id for any txn that is still running. This has a drawback of potentially delaying GC of old files for arbitrarily long periods. So this should be a user config choice. The implementation is not trivial. I would go with 1 now and do 2/3 together with multi-statement txn work. Side note: if 2 deltas have overlapping ID range, then 1 must be a subset of the other was: The JavaDoc on getAcidState() reads, in part: "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." 
which is correct but there is nothing in the code that does this. And if we detect a situation where txn X must be excluded but and there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto commit mode, but with multi statement txns it's possible. Suppose some long running txn starts and lock in snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). == Here is a more concrete example. Let's say the file for table A are as follows and created in the order listed. delta_4_4 delta_5_5 delta_4_5 base_5 delta_16_16 delta_17_17 base_17 (for example user ran major compaction) let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16> Assume that all txns <= 20 commit. Reader can't use base_17 because it has result of txn16. So it should chose base_5 "TxnBase
[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13369: -- Description: The JavaDoc on getAcidState() reads, in part: "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." which is correct but there is nothing in the code that does this. And if we detect a situation where txn X must be excluded but and there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto commit mode, but with multi statement txns it's possible. Suppose some long running txn starts and lock in snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). == Here is a more concrete example. Let's say the file for table A are as follows and created in the order listed. delta_4_4 delta_5_5 delta_4_5 base_5 delta_16_16 delta_17_17 base_17 (for example user ran major compaction) let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16> Assume that all txns <= 20 commit. Reader can't use base_17 because it has result of txn16. So it should chose base_5 "TxnBase bestBase" in _getChildState()_. Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and delta_17_17 in _Directory_ object. This would represent acceptable snapshot for such reader. The issue is if at the same time the Cleaner process is running. It will see everything with txnid<17 as obsolete. Then it will check lock manger state and decide to delete (as there may not be any locks in LM for table A). The order in which the files are deleted is undefined right now. 
It may delete delta_16_16 and delta_17_17 first and right at this moment the read request with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by some multi-stmt txn that started some time ago. It acquires locks after the Cleaner checks LM state and calls getAcidState(). This request will choose base_5 but it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o modifications made by those txns. [This is not possible currently since we only support autoCommit=true. The reason is the a query (0) opens txn (if appropriate), (1) acquires locks (2) locks in the snapshot. The cleaner won't delete anything for a given compaction (partition) if there are locks on it. Thus for duration of the transaction, nothing will be deleted so it's safe to use base_5] This is a subtle race condition but possible. 1. So the safest thing to do to ensure correctness is to use the latest base_x as the "best" and check against exceptions in ValidTxnList and throw an exception if there is an exception <=x. 2. A better option is to keep 2 exception lists: aborted and open and only throw if there is an open txn <=x. Compaction throws away data from aborted txns and thus there is no harm using base with aborted txns in its range. 3. You could make each txn record the lowest open txn id at its start and prevent the cleaner from cleaning anything delta with id range that includes this open txn id for any txn that is still running. This has a drawback of potentially delaying GC of old files for arbitrarily long periods. So this should be a user config choice. The implementation is not trivial. I would go with 1 now and do 2/3 together with multi-statement txn work. Side note: if 2 deltas have overlapping ID range, then 1 must be a subset of the other was: The JavaDoc on getAcidState() reads, in part: "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." 
which is correct but there is nothing in the code that does this. And if we detect a situation where txn X must be excluded but and there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto commit mode, but with multi statement txns it's possible. Suppose some long running txn starts and lock in snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). == Here is a more concrete example. Let's say the file for table A are as follows and created in the order listed. delta_4_4 delta_5_5 delta_4_5 base_5 delta_16_16 delta_17_17 base_17 (for example user ran major compaction) let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16> Assume that all txns <= 20 commit. Reader can't use base_17 because it has result of txn16. So it should chose base_5 "TxnBase
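Option 1 from the description above can be sketched as follows. This is an illustrative sketch only, with hypothetical names, not the actual AcidUtils/getAcidState() code: the latest base_x is usable only if no excluded txn id falls at or below x, because compaction folds every txn <= x into the base and discards the history needed to exclude one of them.

```java
// Hypothetical sketch of option 1 (names are illustrative, not the real
// AcidUtils API): a base_x is only safe for a reader whose ValidTxnList has
// no exception txn id <= x.
public class BestBaseSketch {
    static boolean isBaseUsable(long baseTxn, long[] exceptions) {
        for (long e : exceptions) {
            if (e <= baseTxn) {
                return false;  // base may contain a txn the reader must exclude
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] exceptions = {16};  // ValidTxnList(20:16): HWM=20, exceptions=<16>
        System.out.println(isBaseUsable(17, exceptions));  // false: base_17 folded in txn 16
        System.out.println(isBaseUsable(5, exceptions));   // true: base_5 predates txn 16
    }
}
```

With the directory layout from the example, this check rejects base_17 and forces the reader back to base_5, matching the behavior option 1 requires.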
[jira] [Commented] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType
[ https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371921#comment-15371921 ] Owen O'Malley commented on HIVE-14004: -- Ok, I'm looking at this bug now too, since this seems like the important part of HIVE-13974. > Minor compaction produces ArrayIndexOutOfBoundsException: 7 in > SchemaEvolution.getFileType > -- > > Key: HIVE-14004 > URL: https://issues.apache.org/jira/browse/HIVE-14004 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Matt McCline > Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, > HIVE-14004.03.patch > > > Easiest way to repro is to add TestTxnCommands2 > {noformat} > @Test > public void testCompactWithDelete() throws Exception { > int[][] tableData = {{1,2},{3,4}}; > runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + > makeValuesClause(tableData)); > runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'"); > Worker t = new Worker(); > t.setThreadId((int) t.getId()); > t.setHiveConf(hiveConf); > AtomicBoolean stop = new AtomicBoolean(); > AtomicBoolean looped = new AtomicBoolean(); > stop.set(true); > t.init(stop, looped); > t.run(); > runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4"); > runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = > 2"); > runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'"); > t.run(); > } > {noformat} > to TestTxnCommands2 and run it. > Test won't fail but if you look > in target/tmp/log/hive.log for the following exception (from Minor > compaction). 
> {noformat} > 2016-06-09T18:36:39,071 WARN [Thread-190[]]: mapred.LocalJobRunner > (LocalJobRunner.java:run(560)) - job_local1233973168_0005 > java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) > ~[hadoop-mapreduce-client-common-2.6.1.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) > [hadoop-mapreduce-client-common-2.6.1.jar:?] > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:208) > ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:63) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:207) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:508) > ~[classes/:?] 
> at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) > ~[classes/:?] > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) > ~[classes/:?] > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > ~[hadoop-mapreduce-client-core-2.6.1.jar:?] > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) > ~[hadoop-mapreduce-client-core-2.6.1.jar:?] > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > ~[hadoop-mapreduce-client-core-2.6.1.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > ~[hadoop-mapreduce-client-common-2.6.1.jar:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[?:1.7.0_71] > at
[jira] [Commented] (HIVE-14200) Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0
[ https://issues.apache.org/jira/browse/HIVE-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371882#comment-15371882 ] Gopal V commented on HIVE-14200: Pushed to master, thanks [~hagleitn] > Tez: disable auto-reducer parallelism when reducer-count * > min.partition.factor < 1.0 > - > > Key: HIVE-14200 > URL: https://issues.apache.org/jira/browse/HIVE-14200 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Gopal V > Fix For: 2.2.0 > > Attachments: HIVE-14200.1.patch, HIVE-14200.2.patch, > HIVE-14200.3.patch > > > The min/max factors offer no real improvement when the fractions are > meaningless, for example when 0.25 * 2 is applied as the min. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
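The guard in this issue's title can be sketched as below. The helper name is hypothetical, not the actual Hive optimizer code: auto-reducer parallelism only makes sense when scaling the reducer count down by the min partition factor still leaves at least one reducer's worth of work.

```java
// Hypothetical helper mirroring the guard in the issue title (not the actual
// Hive optimizer code): disable auto-reducer parallelism when
// reducerCount * minPartitionFactor < 1.0, since the min/max window is
// meaningless below a single reducer.
public class AutoReducerGuard {
    static boolean autoParallelismUseful(int reducerCount, float minPartitionFactor) {
        return reducerCount * minPartitionFactor >= 1.0f;
    }

    public static void main(String[] args) {
        // The example from the issue: a min factor of 0.25 applied to 2 reducers
        // yields 0.5, so the fraction is meaningless and ARP should be disabled.
        System.out.println(autoParallelismUseful(2, 0.25f));  // false
        System.out.println(autoParallelismUseful(8, 0.25f));  // true
    }
}
```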
[jira] [Commented] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None
[ https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371884#comment-15371884 ] Lefty Leverenz commented on HIVE-13159: --- Thanks Shannon! > TxnHandler should support datanucleus.connectionPoolingType = None > -- > > Key: HIVE-13159 > URL: https://issues.apache.org/jira/browse/HIVE-13159 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Sergey Shelukhin >Assignee: Alan Gates > Fix For: 2.2.0 > > Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch > > > Right now, one has to choose bonecp or dbcp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14200) Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0
[ https://issues.apache.org/jira/browse/HIVE-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-14200: --- Resolution: Fixed Fix Version/s: 2.2.0 Release Note: Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther Hagleitner) Status: Resolved (was: Patch Available) > Tez: disable auto-reducer parallelism when reducer-count * > min.partition.factor < 1.0 > - > > Key: HIVE-14200 > URL: https://issues.apache.org/jira/browse/HIVE-14200 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Gopal V > Fix For: 2.2.0 > > Attachments: HIVE-14200.1.patch, HIVE-14200.2.patch, > HIVE-14200.3.patch > > > The min/max factors offer no real improvement when the fractions are > meaningless, for example when 0.25 * 2 is applied as the min. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None
[ https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shannon Ladymon updated HIVE-13159: --- Labels: (was: TODOC2.2) > TxnHandler should support datanucleus.connectionPoolingType = None > -- > > Key: HIVE-13159 > URL: https://issues.apache.org/jira/browse/HIVE-13159 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Sergey Shelukhin >Assignee: Alan Gates > Fix For: 2.2.0 > > Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch > > > Right now, one has to choose bonecp or dbcp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None
[ https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371879#comment-15371879 ] Shannon Ladymon commented on HIVE-13159: Doc done. > TxnHandler should support datanucleus.connectionPoolingType = None > -- > > Key: HIVE-13159 > URL: https://issues.apache.org/jira/browse/HIVE-13159 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Sergey Shelukhin >Assignee: Alan Gates > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch > > > Right now, one has to choose bonecp or dbcp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()
[ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371868#comment-15371868 ] Ashutosh Chauhan commented on HIVE-13704: - +1 > Don't call DistCp.execute() instead of DistCp.run() > --- > > Key: HIVE-13704 > URL: https://issues.apache.org/jira/browse/HIVE-13704 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.3.0, 2.0.0 >Reporter: Harsh J >Assignee: Sergio Peña >Priority: Critical > Attachments: HIVE-13704.1.patch > > > HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} > method runs added logic that drives the state of {{SimpleCopyListing}} which > runs in the driver, and of {{CopyCommitter}} which runs in the job runtime. > When Hive ends up running DistCp for copy work (Between non matching FS or > between encrypted/non-encrypted zones, for sizes above a configured value) > this state not being set causes wrong paths to appear on the target (subdirs > named after the file, instead of just the file). > Hive should call DistCp's Tool {{run}} method and not the {{execute}} method > directly, to not skip the target exists flag that the {{setTargetPathExists}} > call would set: > https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
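The bug pattern above can be shown with a self-contained toy, using stand-in classes rather than the real Hadoop/DistCp API: run() performs required setup (here, recording whether the target exists) before delegating to execute(), so calling execute() directly skips that state and the later copy logic misbehaves.

```java
// Self-contained sketch of the Tool-pattern bug described above. FakeDistCp is
// a stand-in, not the real org.apache.hadoop.tools.DistCp: its run() sets the
// target-exists flag before delegating to execute(), mirroring how DistCp's
// run() calls setTargetPathExists(); invoking execute() directly skips it.
public class ToolRunSketch {
    static class FakeDistCp {
        boolean targetPathExists = false;  // state the commit logic depends on

        int run(String[] args) {
            setTargetPathExists(args[args.length - 1]);  // setup done only in run()
            return execute(args);
        }

        int execute(String[] args) {
            // Uses targetPathExists; produces the wrong target layout (here,
            // a failure code) if the flag was never set.
            return targetPathExists ? 0 : -1;
        }

        private void setTargetPathExists(String target) {
            targetPathExists = true;  // stand-in for a real FS existence check
        }
    }

    public static void main(String[] args) {
        FakeDistCp viaRun = new FakeDistCp();
        FakeDistCp viaExecute = new FakeDistCp();
        System.out.println(viaRun.run(new String[]{"src", "dst"}));          // 0
        System.out.println(viaExecute.execute(new String[]{"src", "dst"}));  // -1
    }
}
```

This is why the fix routes the call through the Tool run() entry point instead of execute(): the setup is part of run()'s contract.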
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371856#comment-15371856 ] Shannon Ladymon commented on HIVE-14007: [~owen.omalley], why are some ORC parameters (*hive.orc.splits.include.file.footer, hive.orc.cache.stripe.details.size, hive.orc.compute.splits.num.threads, hive.exec.orc.split.strategy, hive.merge.orcfile.stripe.level, hive.exec.orc.base.delta.ratio*) not being removed from HiveConf? Are these not duplicates? > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371794#comment-15371794 ] Matt McCline commented on HIVE-13974: - *No it is not an excuse*. I'll defer my cussing. > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14144) Permanent functions are showing up in show functions, but describe says it doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371753#comment-15371753 ] Hive QA commented on HIVE-14144: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12817085/HIVE-14144.01-branch-2.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 233 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file TestJdbcNonKrbSASLWithMiniKdc - did not produce a TEST-*.xml file TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file TestJdbcWithMiniKdc - did not produce a TEST-*.xml file TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371746#comment-15371746 ] Owen O'Malley edited comment on HIVE-13974 at 7/11/16 10:09 PM: {quote} No, the semantics of sameCategoryAndAttributes is different than equals. {quote} *Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a negative patch of 2MB "going out"?). In any case, do not add the new method. ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would you like a back port of that patch? {quote} There are 3 kinds of schema not 2. {quote} Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' schema is the one that the user asked for. I don't think we need anything else. {quote} About ORC-54 -- it is not practical right now in terms of time. {quote} ORC-54 is closer to going in. It has unit tests and I believe handles this as a sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch. {quote} Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for {quote} Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to maintain two versions of ORC with a forked code base is a bad thing. {quote} Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. {quote} That is not an excuse. Unit tests are MUCH more likely to be correct because the errors aren't hidden under layers of the execution engine. Being difficult to get right is why not having unit tests is unacceptable. was (Author: owen.omalley): {quote} No, the semantics of sameCategoryAndAttributes is different than equals. {quote} *Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as part of ORC-53. 
Hive will get that as soon as HIVE-14007 goes in (or is a negative patch of 2MB "going out"?). In any case, do not add the new method. ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would you like a back port of that patch? {quote} There are 3 kinds of schema not 2. {quote} Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' schema is the one that the user asked for. I don't think we need anything else. {quote} About ORC-54 -- it is not practical right now in terms of time. {quote} ORC-54 is closer to going in. It has unit tests and I believe handles this as a sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch. {quote} Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to maintain two versions of ORC with a forked code base is a bad thing. {quote} Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. {quote} That is not an excuse. Unit tests are MUCH more likely to be correct because the errors aren't hidden under layers of the execution engine. > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371746#comment-15371746 ] Owen O'Malley commented on HIVE-13974: -- {quote} No, the semantics of sameCategoryAndAttributes is different than equals. {quote} *Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a negative patch of 2MB "going out"?). In any case, do not add the new method. ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would you like a back port of that patch? {quote} There are 3 kinds of schema not 2. {quote} Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' schema is the one that the user asked for. I don't think we need anything else. {quote} About ORC-54 -- it is not practical right now in terms of time. {quote} ORC-54 is closer to going in. It has unit tests and I believe handles this as a sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch. {quote} Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to maintain two versions of ORC with a forked code base is a bad thing. {quote} Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. {quote} That is not an excuse. Unit tests are MUCH more likely to be correct because the errors aren't hidden under layers of the execution engine. 
> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
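The id-shift that makes id-based equality misfire (the TypeDescription.equals concern discussed above) can be seen with a toy pre-order numbering. This is illustrative only, not the real TypeDescription implementation: adding a field to an interior struct renumbers every column after it, so an unchanged trailing column still ends up with a different (id, maximumId) pair.

```java
// Toy pre-order id assignment (not the real TypeDescription code) showing why
// comparing (id, maximumId) misfires under schema evolution: for
// struct<a:struct<...n fields...>, b:string>, the id of the unchanged trailing
// string column depends on how many fields the inner struct has.
public class TypeIdSketch {
    /** Pre-order id of the trailing string column for a given inner-struct size. */
    static int trailingStringId(int innerStructFieldCount) {
        // id 0 = outer struct, id 1 = inner struct, ids 2..(1+n) = its fields,
        // so the trailing string gets id 2 + n.
        return 2 + innerStructFieldCount;
    }

    public static void main(String[] args) {
        System.out.println(trailingStringId(1));  // 3 before evolution
        System.out.println(trailingStringId(2));  // 4 after adding one inner field
    }
}
```

The trailing column is still a plain string in both schemas, which is why an id-based comparison wrongly suggests a conversion is needed.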
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Owen O'Malley edited comment on HIVE-13974 at 7/11/16 10:07 PM: [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. Is it the one being returned by the input file format, is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema? I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get Erie out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] was (Author: mmccline): {quote} No, the semantics of sameCategoryAndAttributes is different than equals. {quote} *Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a negative patch of 2MB "going out"?). In any case, do not add the new method. 
ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would you like a back port of that patch? {quote} There are 3 kinds of schema not 2. {quote} Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' schema is the one that the user asked for. I don't think we need anything else. {quote} About ORC-54 -- it is not practical right now in terms of time. {quote} ORC-54 is closer to going in. It has unit tests and I believe handles this as a sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch. {quote} Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. {quote} Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to maintain two versions of ORC with a forked code base is a bad thing. {code} Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. {code} That is not an excuse. Unit tests are MUCH more likely to be correct because the errors aren't hidden under layers of the execution engine. > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371732#comment-15371732 ] Owen O'Malley commented on HIVE-13974: -- Sorry, I seem to have edited your comment instead of leaving a new comment. Sorry! > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Owen O'Malley edited comment on HIVE-13974 at 7/11/16 10:02 PM: {quote} No, the semantics of sameCategoryAndAttributes is different than equals. {quote} *Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a negative patch of 2MB "going out"?). In any case, do not add the new method. ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would you like a back port of that patch? {quote} There are 3 kinds of schema not 2. {quote} Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' schema is the one that the user asked for. I don't think we need anything else. {quote} About ORC-54 -- it is not practical right now in terms of time. {quote} ORC-54 is closer to going in. It has unit tests and I believe handles this as a sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch. {quote} Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. {quote} Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to maintain two versions of ORC with a forked code base is a bad thing. {code} Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. {code} That is not an excuse. Unit tests are MUCH more likely to be correct because the errors aren't hidden under layers of the execution engine. was (Author: mmccline): [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. 
It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. Is it the one being returned by the input file format (and the one that the needed column environment variable and PPD apply to), is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- There really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
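The id/maximumId problem discussed above can be illustrated with a toy model. The sketch below is NOT the real ORC `TypeDescription` API (the class names `TypeNode` and `InteriorStructDemo` are made up for illustration); it only reproduces ORC's pre-order type numbering to show why an id-based equality check misfires when a column is added to an interior STRUCT, while a category-style check (in the spirit of `sameCategoryAndAttributes`) still matches:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TypeNode {
    final String category;                       // e.g. "struct", "int", "string"
    final List<TypeNode> children = new ArrayList<>();
    int id;                                      // pre-order id, assigned below
    int maximumId;                               // largest id in this subtree

    TypeNode(String category, TypeNode... kids) {
        this.category = category;
        children.addAll(Arrays.asList(kids));
    }

    // Assign pre-order ids, the way ORC numbers its type tree.
    int assignIds(int next) {
        this.id = next++;
        for (TypeNode child : children) {
            next = child.assignIds(next);
        }
        this.maximumId = next - 1;
        return next;
    }

    // An id-based equals: breaks when an interior STRUCT grows.
    boolean sameIds(TypeNode other) {
        return id == other.id && maximumId == other.maximumId;
    }

    // A category-only check, standing in for sameCategoryAndAttributes.
    boolean sameCategory(TypeNode other) {
        return category.equals(other.category);
    }
}

public class InteriorStructDemo {
    // file schema: struct<a:struct<x:int>, b:string>
    static TypeNode fileColumnB() {
        TypeNode b = new TypeNode("string");
        new TypeNode("struct",
                new TypeNode("struct", new TypeNode("int")), b).assignIds(0);
        return b;
    }

    // reader schema: struct<a:struct<x:int,y:int>, b:string>
    static TypeNode readerColumnB() {
        TypeNode b = new TypeNode("string");
        new TypeNode("struct",
                new TypeNode("struct", new TypeNode("int"), new TypeNode("int")),
                b).assignIds(0);
        return b;
    }

    public static void main(String[] args) {
        TypeNode fileB = fileColumnB();
        TypeNode readerB = readerColumnB();
        // b is a STRING in both schemas, yet adding y to the interior struct
        // shifted its pre-order id from 3 to 4, so an id-based comparison
        // reports a mismatch and downstream code tries to
        // "convert a STRING to a STRING".
        System.out.println("file b id=" + fileB.id + ", reader b id=" + readerB.id);
        System.out.println("sameIds=" + fileB.sameIds(readerB)
                + ", sameCategory=" + fileB.sameCategory(readerB));
    }
}
```

Under this model, column `b` keeps its category but not its ids, which is exactly the "interior STRUCT column with a different number of columns" case described in the comment.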
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Matt McCline edited comment on HIVE-13974 at 7/11/16 9:59 PM: -- [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. Is it the one being returned by the input file format (and the one that the needed column environment variable and PPD apply to), is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- There really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] was (Author: mmccline): [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". 
There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. Is it the one being returned by the input file format (and the one that the needed column environment variable and PPD apply to), is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get our release out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14209) Add some logging info for session and operation management
[ https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371708#comment-15371708 ] Chaoyu Tang commented on HIVE-14209: +1 > Add some logging info for session and operation management > -- > > Key: HIVE-14209 > URL: https://issues.apache.org/jira/browse/HIVE-14209 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-14209.1.patch > > > It's hard to track the session and operation open and close in multiple user > env. Add some logging info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14207) Strip HiveConf hidden params in webui conf
[ https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371704#comment-15371704 ] Thejas M Nair commented on HIVE-14207: -- +1 pending tests > Strip HiveConf hidden params in webui conf > -- > > Key: HIVE-14207 > URL: https://issues.apache.org/jira/browse/HIVE-14207 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-14207.2.patch, HIVE-14207.3.patch, HIVE-14207.patch > > > HIVE-12338 introduced a new web ui, which has a page that displays the > current HiveConf being used by HS2. However, before it displays that config, > it does not strip entries from it which are considered "hidden" conf > parameters, thus exposing those values from a web-ui for HS2. We need to add > stripping to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14209) Add some logging info for session and operation management
[ https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14209: Status: Patch Available (was: Open) Patch-1: trivial change to add op handler and session handler in the message. > Add some logging info for session and operation management > -- > > Key: HIVE-14209 > URL: https://issues.apache.org/jira/browse/HIVE-14209 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-14209.1.patch > > > It's hard to track the session and operation open and close in multiple user > env. Add some logging info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
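The patch description above ("add op handler and session handler in the message") suggests the shape of the change. A minimal self-contained sketch follows; the class name `OperationLogging` and the message wording are illustrative assumptions, not HiveServer2's actual classes or log format:

```java
public class OperationLogging {
    // Include both the session handle and the operation handle in the
    // message so open/close events can be correlated per user session
    // in a multi-user deployment.
    static String openedMessage(String sessionHandle, String opHandle) {
        return "Operation " + opHandle + " opened for session " + sessionHandle;
    }

    static String closedMessage(String sessionHandle, String opHandle) {
        return "Operation " + opHandle + " closed for session " + sessionHandle;
    }

    public static void main(String[] args) {
        // In HiveServer2 these strings would go through the class logger;
        // plain println keeps the sketch self-contained.
        System.out.println(openedMessage("SessionHandle[af2...]", "OperationHandle[91c...]"));
        System.out.println(closedMessage("SessionHandle[af2...]", "OperationHandle[91c...]"));
    }
}
```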
[jira] [Updated] (HIVE-14209) Add some logging info for session and operation management
[ https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14209: Attachment: HIVE-14209.1.patch > Add some logging info for session and operation management > -- > > Key: HIVE-14209 > URL: https://issues.apache.org/jira/browse/HIVE-14209 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-14209.1.patch > > > It's hard to track the session and operation open and close in multiple user > env. Add some logging info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14152) datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling downgrade
[ https://issues.apache.org/jira/browse/HIVE-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371695#comment-15371695 ] Sushanth Sowmyan commented on HIVE-14152: - +1. > datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling > downgrade > -- > > Key: HIVE-14152 > URL: https://issues.apache.org/jira/browse/HIVE-14152 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Daniel Dai >Assignee: Thejas M Nair > Attachments: HIVE-14152.1.patch, HIVE-14152.2.patch, > HIVE-14152.3.patch > > > We see the following issue when downgrading metastore: > 1. Run some query using new tables > 2. Downgrade metastore > 3. Restart metastore will complain the new table does not exist > In particular, constaints tables does not exist in branch-1. If we run Hive 2 > and create a constraint, then downgrade metastore to Hive 1, datanucleus will > complain: > {code} > javax.jdo.JDOFatalUserException: Error starting up DataNucleus : a class > "org.apache.hadoop.hive.metastore.model.MConstraint" was listed as being > persisted previously in this datastore, yet the class wasnt found. Perhaps it > is used by a different DataNucleus-enabled application in this datastore, or > you have changed your class names. 
> at > org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:528) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) > at java.security.AccessController.doPrivileged(Native Method) > at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) > at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:406) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:299) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:266) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:60) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:650) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:628) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:677) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:77) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:83) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5905) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5900) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6159) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6084) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at
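The fix named in the HIVE-14152 title is a metastore-side DataNucleus setting. A hive-site.xml fragment might look like the following; the property name and the 'Ignored' value come from the issue title itself, but the exact set of accepted values depends on your DataNucleus version, so treat this as a sketch rather than a verified configuration:

```xml
<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <!-- 'Ignored' makes DataNucleus skip, rather than fail on, model classes
       (like MConstraint) that a newer metastore persisted but this older
       metastore does not know about, allowing a rolling downgrade -->
  <value>Ignored</value>
</property>
```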
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Matt McCline edited comment on HIVE-13974 at 7/11/16 9:38 PM: -- [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. Is it the one being returned by the input file format (and the one that the needed column environment variable and PPD apply to), is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get our release out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] was (Author: mmccline): [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. 
It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. It is the one being returned by the input file format (and the one that the needed column environment variable and PPD apply to), is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get our release out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Matt McCline edited comment on HIVE-13974 at 7/11/16 9:25 PM: -- [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. It is the one being returned by the input file format (and the one that the needed column environment variable and PPD apply to), is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get our release out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] was (Author: mmccline): [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. 
It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. It is the one being returned by the input file format, is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get our release out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Matt McCline edited comment on HIVE-13974 at 7/11/16 9:24 PM: -- [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. It is the one being returned by the input file format, is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get our release out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] was (Author: mmccline): [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. 
It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. It is the one being returned by the input file format, is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get Erie out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. FYI [~hagleitn] [~ekoifman] > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651 ] Matt McCline commented on HIVE-13974: - [~owen.omalley] Thanks for looking at this. No, the semantics of sameCategoryAndAttributes is different than equals. The TypeDescription.equals method compares (type) id and maximumId which does not work when there is an interior STRUCT column with a different number of columns. It makes it seem like a type conversion is needed when one is not needed and other parts of the code throw exceptions complaining "no need to convert a STRING to a STRING". There are 3 kinds of schema not 2. Part of the problem I'm trying to solve is the ambiguity at different parts of the code as to which schema is being used. It is the one being returned by the input file format, is it the schema being fed back to the ORC raw merger that included ACID columns, or is it the unconverted file schema. I don't care what the first 2 schemas are called as long as the names are distinct. Maybe the names could be reader, internalReader, and file. About ORC-54 -- it is not practical right now in terms of time. We have got to get Erie out the door. We have so little runway left. I've had 10+ JIRAs for weeks. Whenever I knock some down more appear. Also, there really needs to be a parallel HIVE JIRA for it and we must make sure name mapping is fully supported for HIVE. Given how *difficult* Schema Evolution has been I simply don't believe it will *just work* with ORC only unit tests. 
FYI [~hagleitn] [~ekoifman] > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14074) RELOAD FUNCTION should update dropped functions
[ https://issues.apache.org/jira/browse/HIVE-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdullah Yousufi updated HIVE-14074: Attachment: HIVE-14074.03.patch > RELOAD FUNCTION should update dropped functions > --- > > Key: HIVE-14074 > URL: https://issues.apache.org/jira/browse/HIVE-14074 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi > Fix For: 2.2.0 > > Attachments: HIVE-14074.01.patch, HIVE-14074.02.patch, > HIVE-14074.03.patch > > > Due to HIVE-2573, functions are stored in a per-session registry and only > loaded in from the metastore when hs2 or hive cli is started. Running RELOAD > FUNCTION in the current session is a way to force a reload of the functions, > so that changes that occurred in other running sessions will be reflected in > the current session, without having to restart the current session. However, > while functions that are created in other sessions will now appear in the > current session, functions that have been dropped are not removed from the > current session's registry. It seems inconsistent that created functions are > updated while dropped functions are not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14207) Strip HiveConf hidden params in webui conf
[ https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-14207: Attachment: HIVE-14207.3.patch Updated patch. .3.patch now introduces new method to find a free port that is guaranteed to not be a port number that we specify. > Strip HiveConf hidden params in webui conf > -- > > Key: HIVE-14207 > URL: https://issues.apache.org/jira/browse/HIVE-14207 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-14207.2.patch, HIVE-14207.3.patch, HIVE-14207.patch > > > HIVE-12338 introduced a new web ui, which has a page that displays the > current HiveConf being used by HS2. However, before it displays that config, > it does not strip entries from it which are considered "hidden" conf > parameters, thus exposing those values from a web-ui for HS2. We need to add > stripping to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
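The ".3.patch" note above mentions a new method to find a free port that is guaranteed not to be a specified port number. I have not seen the patch's actual method, but a common self-contained way to do this in Java is to bind to port 0 (letting the OS pick an ephemeral port) and retry on a collision, as this sketch shows:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
    // Bind to port 0 so the OS picks a free ephemeral port; retry when the
    // chosen port collides with the one we must avoid. Returns -1 if no
    // acceptable port could be found after the given number of attempts.
    public static int findFreePort(int portToAvoid) {
        for (int attempt = 0; attempt < 50; attempt++) {
            try (ServerSocket socket = new ServerSocket(0)) {
                int port = socket.getLocalPort();
                if (port != portToAvoid) {
                    return port;
                }
            } catch (IOException e) {
                // transient bind failure: fall through and retry
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("free port: " + findFreePort(-1));
    }
}
```

Note the usual caveat with this pattern: the port is released when the `ServerSocket` closes, so another process can grab it before the test re-binds it; for a test harness that race window is generally accepted.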
[jira] [Commented] (HIVE-14158) deal with derived column names
[ https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371627#comment-15371627 ] Ashutosh Chauhan commented on HIVE-14158: - We should avoid calling genOPTree() in case there is no masking/row filtering. Looks good other than that. Although failure authorization_create_temp_table looks relevant. > deal with derived column names > -- > > Key: HIVE-14158 > URL: https://issues.apache.org/jira/browse/HIVE-14158 > Project: Hive > Issue Type: Sub-task > Components: Security >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, > HIVE-14158.03.patch, HIVE-14158.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()
[ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-13704: --- Attachment: HIVE-13704.1.patch [~ashutoshc] Could you review this small patch? I just started using run() again. I ran a test with the old code and the issue was happening as stated in this patch. When I changed to run(), the problem went away. Btw, I reproduced the issue using: {{LOAD DATA INPATH '/tmp/dummytext.txt' OVERWRITE INTO TABLE dummytext;}} dummytext was in an encryption zone, and when I ran it with the execute() method, the final destination for the file was: {{/user/hive/warehouse/dummytext/dummytext.txt/dummytext.txt}}. It was creating a new subdirectory inside the table location. > Don't call DistCp.execute() instead of DistCp.run() > --- > > Key: HIVE-13704 > URL: https://issues.apache.org/jira/browse/HIVE-13704 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.3.0, 2.0.0 >Reporter: Harsh J >Assignee: Sergio Peña >Priority: Critical > Attachments: HIVE-13704.1.patch > > > HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} > method runs added logic that drives the state of {{SimpleCopyListing}} which > runs in the driver, and of {{CopyCommitter}} which runs in the job runtime. > When Hive ends up running DistCp for copy work (Between non matching FS or > between encrypted/non-encrypted zones, for sizes above a configured value) > this state not being set causes wrong paths to appear on the target (subdirs > named after the file, instead of just the file). > Hive should call DistCp's Tool {{run}} method and not the {{execute}} method > directly, to not skip the target exists flag that the {{setTargetPathExists}} > call would set: > https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
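The run()-vs-execute() failure mode described in the issue can be modeled without Hadoop. The sketch below is a deliberately simplified mock (MockDistCp is NOT the real org.apache.hadoop.tools.DistCp, and the real class has far more setup in run()); it only captures the one relevant fact from the description: run() initializes a target-exists flag that execute() alone never sets, which yields the nested /table/file.txt/file.txt path:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MockDistCp {
    // Stand-in for the filesystem: the table directory already exists.
    private final Set<String> existingDirs =
            new HashSet<>(Arrays.asList("/user/hive/warehouse/dummytext"));
    private final String targetPath;
    private boolean targetPathExists = false;   // state only run() initializes

    MockDistCp(String targetPath) {
        this.targetPath = targetPath;
    }

    // run() = setup then copy, mirroring how the Tool entry point performs
    // a setTargetPathExists()-style check before the actual transfer.
    String run(String sourceFile) {
        targetPathExists = existingDirs.contains(targetPath);
        return execute(sourceFile);
    }

    // execute() trusts flags that only run() sets; called directly, it
    // treats the existing target as a new directory named after the file.
    String execute(String sourceFile) {
        String name = sourceFile.substring(sourceFile.lastIndexOf('/') + 1);
        return targetPathExists
                ? targetPath + "/" + name                   // correct placement
                : targetPath + "/" + name + "/" + name;     // nested subdir bug
    }
}

public class DistCpRunDemo {
    public static void main(String[] args) {
        String table = "/user/hive/warehouse/dummytext";
        System.out.println("via execute(): "
                + new MockDistCp(table).execute("/tmp/dummytext.txt"));
        System.out.println("via run():     "
                + new MockDistCp(table).run("/tmp/dummytext.txt"));
    }
}
```

Calling execute() directly reproduces the subdirectory-named-after-the-file path from the bug report, while going through run() lands the file where it belongs, which matches the patch's switch back to run().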
[jira] [Updated] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()
[ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-13704: --- Status: Patch Available (was: Open) > Don't call DistCp.execute() instead of DistCp.run() > --- > > Key: HIVE-13704 > URL: https://issues.apache.org/jira/browse/HIVE-13704 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0, 1.3.0 >Reporter: Harsh J >Assignee: Sergio Peña >Priority: Critical > Attachments: HIVE-13704.1.patch > > > HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} > method runs added logic that drives the state of {{SimpleCopyListing}} which > runs in the driver, and of {{CopyCommitter}} which runs in the job runtime. > When Hive ends up running DistCp for copy work (Between non matching FS or > between encrypted/non-encrypted zones, for sizes above a configured value) > this state not being set causes wrong paths to appear on the target (subdirs > named after the file, instead of just the file). > Hive should call DistCp's Tool {{run}} method and not the {{execute}} method > directly, to not skip the target exists flag that the {{setTargetPathExists}} > call would set: > https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()
[ https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña reassigned HIVE-13704: -- Assignee: Sergio Peña > Don't call DistCp.execute() instead of DistCp.run() > --- > > Key: HIVE-13704 > URL: https://issues.apache.org/jira/browse/HIVE-13704 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.3.0, 2.0.0 >Reporter: Harsh J >Assignee: Sergio Peña >Priority: Critical > > HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} > method runs added logic that drives the state of {{SimpleCopyListing}} which > runs in the driver, and of {{CopyCommitter}} which runs in the job runtime. > When Hive ends up running DistCp for copy work (Between non matching FS or > between encrypted/non-encrypted zones, for sizes above a configured value) > this state not being set causes wrong paths to appear on the target (subdirs > named after the file, instead of just the file). > Hive should call DistCp's Tool {{run}} method and not the {{execute}} method > directly, to not skip the target exists flag that the {{setTargetPathExists}} > call would set: > https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
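The call-path difference Harsh describes can be sketched with a minimal stand-in. This is a hypothetical illustration, not the real DistCp class: the point it demonstrates is only that run() performs driver-side setup (analogous to setTargetPathExists()) that a direct execute() call skips.

```java
// Minimal stand-in for the Tool.run()-vs-execute() distinction described above.
// NOT the real DistCp; class and field names here are hypothetical.
class CopyToolSketch {
    boolean targetPathExists = false;   // analogous to the flag setTargetPathExists() sets
    // Analogous to Tool.run(): perform driver-side setup, then delegate.
    int run(String[] args) {
        targetPathExists = true;        // setup that a direct execute() call never performs
        return execute(args);
    }
    // Analogous to DistCp.execute(): does the copy, but assumes setup already ran.
    int execute(String[] args) {
        // Without the setup, the copy lands in a subdirectory named after the
        // file (e.g. .../dummytext.txt/dummytext.txt) instead of the file itself.
        return targetPathExists ? 0 : 1;
    }
}
```

Calling through run() succeeds (returns 0) while calling execute() directly leaves the setup flag unset, which is the shape of the bug the patch reverts.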
[jira] [Updated] (HIVE-14191) bump a new api version for ThriftJDBCBinarySerde changes
[ https://issues.apache.org/jira/browse/HIVE-14191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ziyang Zhao updated HIVE-14191: --- Attachment: HIVE-14191.2.patch Created a new api version and generated thrift files > bump a new api version for ThriftJDBCBinarySerde changes > > > Key: HIVE-14191 > URL: https://issues.apache.org/jira/browse/HIVE-14191 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, JDBC >Affects Versions: 2.1.0 >Reporter: Ziyang Zhao >Assignee: Ziyang Zhao > Attachments: HIVE-14191.1.patch, HIVE-14191.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371565#comment-15371565 ] Prasanth Jayachandran commented on HIVE-14196: -- Also, when this patch was initially committed, before HIVE-13617, the compilation stage would say no inputs were supported. Now that we have a non-vector reader for LLAP, compilation says all inputs are supported, but it may fail at runtime if it finds any complex types (this limitation will soon be removed with a proper fix). > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371561#comment-15371561 ] Sergey Shelukhin commented on HIVE-14196: - Well that causes incorrect explain. > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14207) Strip HiveConf hidden params in webui conf
[ https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371555#comment-15371555 ] Sushanth Sowmyan commented on HIVE-14207: - Sounds good. > Strip HiveConf hidden params in webui conf > -- > > Key: HIVE-14207 > URL: https://issues.apache.org/jira/browse/HIVE-14207 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-14207.2.patch, HIVE-14207.patch > > > HIVE-12338 introduced a new web ui, which has a page that displays the > current HiveConf being used by HS2. However, before it displays that config, > it does not strip entries from it which are considered "hidden" conf > parameters, thus exposing those values from a web-ui for HS2. We need to add > stripping to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371556#comment-15371556 ] Prasanth Jayachandran commented on HIVE-14196: -- Exactly. It's handled at runtime, when the record reader is created, instead of at the compilation stage. > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
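The guard being discussed (skip the LLAP IO path when any column has a complex type) could, as a rough sketch, look like the following. This is hypothetical code, not Hive's actual implementation; the method and class names are assumptions, and real Hive inspects TypeInfo objects rather than type-name strings.

```java
import java.util.List;

class LlapIoGuardSketch {
    // Hypothetical guard: reject LLAP IO when any column type string denotes
    // a complex type (struct/map/array/union). Real Hive checks type objects,
    // not names; string matching is used here only to keep the sketch small.
    static boolean canUseLlapIo(List<String> colTypes) {
        for (String t : colTypes) {
            String lt = t.toLowerCase();
            if (lt.startsWith("struct<") || lt.startsWith("map<")
                    || lt.startsWith("array<") || lt.startsWith("uniontype<")) {
                return false;
            }
        }
        return true;
    }
}
```

Performing this check when the record reader is created (rather than at compile time) is exactly what keeps the explain output unchanged, which is the trade-off Sergey and Prasanth are weighing above.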
[jira] [Reopened] (HIVE-13191) DummyTable map joins mix up columns between tables
[ https://issues.apache.org/jira/browse/HIVE-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reopened HIVE-13191: - [~jcamachorodriguez] I can still repro this on master, though only with MiniTezCliDriver. Seems like it works fine for CliDriver. > DummyTable map joins mix up columns between tables > -- > > Key: HIVE-13191 > URL: https://issues.apache.org/jira/browse/HIVE-13191 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Pengcheng Xiong > Attachments: tez.q > > > {code} > SELECT > a.key, > a.a_one, > b.b_one, > a.a_zero, > b.b_zero > FROM > ( > SELECT > 11 key, > 0 confuse_you, > 1 a_one, > 0 a_zero > ) a > LEFT JOIN > ( > SELECT > 11 key, > 0 confuse_you, > 1 b_one, > 0 b_zero > ) b > ON a.key = b.key > ; > 11 1 0 0 1 > {code} > This should be 11, 1, 1, 0, 0 instead. > Disabling map-joins & using shuffle-joins returns the right result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14135) beeline output not formatted correctly for large column widths
[ https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371544#comment-15371544 ] Vihang Karajgaonkar commented on HIVE-14135: Updated the patch with a testcase to handle columns with large widths. Changed the default column width from 15 to 50 characters. > beeline output not formatted correctly for large column widths > -- > > Key: HIVE-14135 > URL: https://issues.apache.org/jira/browse/HIVE-14135 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, > longKeyValues.txt, output_after.txt, output_before.txt > > > If the column width is too large then beeline uses the maximum column width > when normalizing all the column widths. In order to reproduce the issue, run > set -v; > One of the configuration variables is classpath, which can have an extremely > large width (41k characters in my environment). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths
[ https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-14135: --- Attachment: HIVE-14135.2.patch > beeline output not formatted correctly for large column widths > -- > > Key: HIVE-14135 > URL: https://issues.apache.org/jira/browse/HIVE-14135 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, > longKeyValues.txt, output_after.txt, output_before.txt > > > If the column width is too large then beeline uses the maximum column width > when normalizing all the column widths. In order to reproduce the issue, run > set -v; > One of the configuration variables is classpath, which can have an extremely > large width (41k characters in my environment). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths
[ https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-14135: --- Status: Patch Available (was: Open) > beeline output not formatted correctly for large column widths > -- > > Key: HIVE-14135 > URL: https://issues.apache.org/jira/browse/HIVE-14135 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, > longKeyValues.txt, output_after.txt, output_before.txt > > > If the column width is too large then beeline uses the maximum column width > when normalizing all the column widths. In order to reproduce the issue, run > set -v; > One of the configuration variables is classpath, which can have an extremely > large width (41k characters in my environment). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths
[ https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-14135: --- Status: Open (was: Patch Available) > beeline output not formatted correctly for large column widths > -- > > Key: HIVE-14135 > URL: https://issues.apache.org/jira/browse/HIVE-14135 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, > longKeyValues.txt, output_after.txt, output_before.txt > > > If the column width is too large then beeline uses the maximum column width > when normalizing all the column widths. In order to reproduce the issue, run > set -v; > One of the configuration variables is classpath, which can have an extremely > large width (41k characters in my environment). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
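The width-normalization behavior described above (one oversized value, such as a 41k-character classpath, forcing every column to the maximum width) suggests a cap. The following is a hypothetical sketch, not beeline's actual code; the helper name and the 50-character default are taken from the comment above.

```java
class ColumnWidthSketch {
    // Hypothetical sketch of the fix described in the comments: cap each
    // column's display width at a maximum (default 50 per the patch), so one
    // huge value can't dominate the whole table layout.
    static int[] normalize(int[] observedWidths, int maxWidth) {
        int[] capped = new int[observedWidths.length];
        for (int i = 0; i < observedWidths.length; i++) {
            capped[i] = Math.min(observedWidths[i], maxWidth);
        }
        return capped;
    }
}
```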
[jira] [Commented] (HIVE-14159) sorting of tuple array using multiple field[s]
[ https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371536#comment-15371536 ] Hive QA commented on HIVE-14159: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12817103/HIVE-14159.4.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10318 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/469/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/469/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-469/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12817103 - PreCommit-HIVE-MASTER-Build > sorting of tuple array using multiple field[s] > -- > > Key: HIVE-14159 > URL: https://issues.apache.org/jira/browse/HIVE-14159 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Simanchal Das >Assignee: Simanchal Das > Labels: patch > Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, > HIVE-14159.3.patch, HIVE-14159.4.patch > > > Problem Statement: > When we are working with complex data structures like Avro, most of the time > we encounter arrays containing multiple tuples, where each tuple has a struct > schema. > Suppose the struct schema is like below: > {noformat} > { > "name": "employee", > "type": [{ > "type": "record", > "name": "Employee", > "namespace": "com.company.Employee", > "fields": [{ > "name": "empId", > "type": "int" > }, { > "name": "empName", > "type": "string" > }, { > "name": "age", > "type": "int" > }, { > "name": "salary", > "type": "double" > }] > }] > } > {noformat} > Then, while running our Hive query, the complex array looks like an array of > Employee objects. > {noformat} > Example: > //(array>) > > Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)] > {noformat} > When implementing day-to-day business use cases, we encounter problems like > sorting a tuple array by specific field[s] like empId, name, salary, etc. in > ASC or DESC order. > Proposal: > I have developed a udf 'sort_array_by' which will sort a tuple array by one > or more fields in ASC or DESC order provided by the user; the default is > ascending order. 
> {noformat} > Example: > 1.Select > sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC"); > output: > array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)] > > 2.Select > sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC"); > output: > array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)] > 3.Select > sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC"); > output: > array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
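The proposed semantics (sort an array of tuples by one or more fields, ascending by default) can be illustrated in plain Java. This is a sketch, not the UDF's implementation: the Employee class mirrors the Avro record from the description, and the sortBy helper and its signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortArrayBySketch {
    // Mirrors the Avro record (empId, empName, age, salary) from the description.
    static class Employee {
        final int empId; final String empName; final int age; final double salary;
        Employee(int empId, String empName, int age, double salary) {
            this.empId = empId; this.empName = empName;
            this.age = age; this.salary = salary;
        }
    }

    // Hypothetical helper: sort by a (possibly chained) field comparator,
    // ascending unless descending is requested — analogous to the UDF's
    // "field[, field...], ASC|DESC" arguments.
    static List<Employee> sortBy(List<Employee> in, boolean descending,
                                 Comparator<Employee> fields) {
        List<Employee> out = new ArrayList<>(in);
        out.sort(descending ? fields.reversed() : fields);
        return out;
    }
}
```

A multi-field sort like example 2 above would chain comparators, e.g. `Comparator.comparing((SortArrayBySketch.Employee e) -> e.empName).thenComparingDouble(e -> e.salary)`.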
[jira] [Updated] (HIVE-9928) Empty buckets are not created on non-HDFS file system
[ https://issues.apache.org/jira/browse/HIVE-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9928: --- Resolution: Duplicate Assignee: Ankit Kamboj Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Pushed to master & branch-2.1 > Empty buckets are not created on non-HDFS file system > - > > Key: HIVE-9928 > URL: https://issues.apache.org/jira/browse/HIVE-9928 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Ankit Kamboj >Assignee: Ankit Kamboj > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-9928.1.patch > > > Bucketing should create empty buckets on the destination file system. There > is a problem in that logic: it uses path.toUri().getPath().toString() to > find the relevant path, but this chain of methods always resolves to a > relative path, which ends up creating the empty buckets in HDFS rather than > on the actual destination FS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14175) Fix creating buckets without scheme information
[ https://issues.apache.org/jira/browse/HIVE-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14175: Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Pushed to master & branch-2.1 > Fix creating buckets without scheme information > --- > > Key: HIVE-14175 > URL: https://issues.apache.org/jira/browse/HIVE-14175 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.1, 2.1.0 >Reporter: Thomas Poepping >Assignee: Thomas Poepping > Labels: patch > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14175.2.patch, HIVE-14175.patch, HIVE-14175.patch > > > If a table is created on a non-default filesystem (i.e. non-hdfs), the empty > files will be created with incorrect scheme information. This patch extracts > the scheme and authority information for the new paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
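The scheme/authority extraction these two issues describe can be sketched with java.net.URI. This is a hypothetical helper, not Hive's actual code: the point is that building the empty-bucket path from only `getPath()` drops the scheme and authority (so the file defaults to HDFS), while carrying them through targets the real destination filesystem.

```java
import java.net.URI;

class BucketPathSketch {
    // Hypothetical helper: build an empty-bucket file path that preserves the
    // table location's scheme and authority (e.g. s3a://mybucket), instead of
    // the scheme-less relative path that path.toUri().getPath() would yield.
    static URI emptyBucketPath(URI tableLocation, String bucketFileName) {
        return URI.create(tableLocation.getScheme() + "://"
                + tableLocation.getAuthority()
                + tableLocation.getPath() + "/" + bucketFileName);
    }
}
```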
[jira] [Commented] (HIVE-14172) LLAP: force evict blocks by size to handle memory fragmentation
[ https://issues.apache.org/jira/browse/HIVE-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371527#comment-15371527 ] Sergey Shelukhin commented on HIVE-14172: - Hmm, I thought I commented here. Probably commented on some wrong JIRA 0_o [~gopalv] ping? Failures are either known or caused by NN being in safe mode. > LLAP: force evict blocks by size to handle memory fragmentation > --- > > Key: HIVE-14172 > URL: https://issues.apache.org/jira/browse/HIVE-14172 > Project: Hive > Issue Type: Bug >Reporter: Nita Dembla >Assignee: Sergey Shelukhin > Attachments: HIVE-14172.01.patch, HIVE-14172.patch > > > In the long run, we should replace buddy allocator with a better scheme. For > now do a workaround for fragmentation that cannot be easily resolved. It's > still not perfect but works for practical ORC cases, where we have the > default size and smaller blocks, rather than large allocations having trouble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14152) datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling downgrade
[ https://issues.apache.org/jira/browse/HIVE-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371514#comment-15371514 ] Thejas M Nair commented on HIVE-14152: -- The test failures are unrelated, they happen in runs with other jiras as well. > datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling > downgrade > -- > > Key: HIVE-14152 > URL: https://issues.apache.org/jira/browse/HIVE-14152 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Daniel Dai >Assignee: Thejas M Nair > Attachments: HIVE-14152.1.patch, HIVE-14152.2.patch, > HIVE-14152.3.patch > > > We see the following issue when downgrading metastore: > 1. Run some query using new tables > 2. Downgrade metastore > 3. Restart metastore; it will complain that the new table does not exist > In particular, the constraints tables do not exist in branch-1. If we run Hive 2 > and create a constraint, then downgrade metastore to Hive 1, datanucleus will > complain: > {code} > javax.jdo.JDOFatalUserException: Error starting up DataNucleus : a class > "org.apache.hadoop.hive.metastore.model.MConstraint" was listed as being > persisted previously in this datastore, yet the class wasnt found. Perhaps it > is used by a different DataNucleus-enabled application in this datastore, or > you have changed your class names. 
> at > org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:528) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) > at java.security.AccessController.doPrivileged(Native Method) > at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) > at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:406) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:299) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:266) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:650) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:628) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:677) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:77) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:83) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5905) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5900) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6159) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6084) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at
[jira] [Updated] (HIVE-11402) HS2 - add an option to disallow parallel query execution within a single Session
[ https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11402: Attachment: HIVE-11402.03.patch The handling for async ops, as well as some refactoring. Thanks for the pointer! > HS2 - add an option to disallow parallel query execution within a single > Session > > > Key: HIVE-11402 > URL: https://issues.apache.org/jira/browse/HIVE-11402 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Thejas M Nair >Assignee: Sergey Shelukhin > Attachments: HIVE-11402.01.patch, HIVE-11402.02.patch, > HIVE-11402.03.patch, HIVE-11402.patch > > > HiveServer2 currently allows concurrent queries to be run in a single > session. However, every HS2 session has an associated SessionState object, > and the use of SessionState in many places assumes that only one thread is > using it, i.e. it is not thread-safe. > There are many places where SessionState thread safety needs to be > addressed, and until then we should serialize all query execution for a > single HS2 session. -This problem can become more visible with HIVE-4239 now > allowing parallel query compilation.- > Note that running queries in parallel for a single session is not > straightforward with jdbc, you need to spawn another thread as the > Statement.execute calls are blocking. I believe ODBC has a non-blocking query > execution API, and Hue is another well known application that shares sessions > for all queries that a user runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14152) datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling downgrade
[ https://issues.apache.org/jira/browse/HIVE-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371509#comment-15371509 ] Thejas M Nair commented on HIVE-14152: -- [~sushanth] Can you please review this change ? > datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling > downgrade > -- > > Key: HIVE-14152 > URL: https://issues.apache.org/jira/browse/HIVE-14152 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Daniel Dai >Assignee: Thejas M Nair > Attachments: HIVE-14152.1.patch, HIVE-14152.2.patch, > HIVE-14152.3.patch > > > We see the following issue when downgrading metastore: > 1. Run some query using new tables > 2. Downgrade metastore > 3. Restart metastore; it will complain that the new table does not exist > In particular, the constraints tables do not exist in branch-1. If we run Hive 2 > and create a constraint, then downgrade metastore to Hive 1, datanucleus will > complain: > {code} > javax.jdo.JDOFatalUserException: Error starting up DataNucleus : a class > "org.apache.hadoop.hive.metastore.model.MConstraint" was listed as being > persisted previously in this datastore, yet the class wasnt found. Perhaps it > is used by a different DataNucleus-enabled application in this datastore, or > you have changed your class names. 
> at > org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:528) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) > at java.security.AccessController.doPrivileged(Native Method) > at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) > at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:406) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:299) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:266) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:650) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:628) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:677) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:77) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:83) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5905) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5900) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6159) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6084) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at
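The change proposed in the issue title would amount to a metastore configuration entry along these lines. This is a sketch: the property name and the 'Ignored' value are taken verbatim from the JIRA summary, and the exact place it is set (hive-site.xml vs. patched defaults) depends on the attached patch.

```xml
<!-- hive-site.xml sketch (assumption: applied as described in the JIRA title).
     With 'Ignored', DataNucleus skips the auto-start check, so an older
     metastore no longer fails on model classes (e.g. MConstraint) that only
     a newer release persists. -->
<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>Ignored</value>
</property>
```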
[jira] [Updated] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications
[ https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Sharma updated HIVE-13966: Attachment: HIVE-13966.1.patch Attaching the initial patch. > DbNotificationListener: can loose DDL operation notifications > - > > Key: HIVE-13966 > URL: https://issues.apache.org/jira/browse/HIVE-13966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Nachiket Vaidya >Assignee: Rahul Sharma >Priority: Critical > Attachments: HIVE-13966.1.patch > > > The code for each API in HiveMetaStore.java is like this: > 1. openTransaction() > 2. -- operation-- > 3. commit() or rollback() based on result of the operation. > 4. add entry to notification log (unconditionally) > If the operation fails (in step 2), we still add an entry to the notification > log. Found this issue in testing. > It is still OK, as this is a false positive. > If the operation is successful and adding to the notification log fails, the > user will get a MetaException. It will not roll back the operation, as it is > already committed. We need to handle this case so that we will not have false > negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13966) DbNotificationListener: can lose DDL operation notifications
[ https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Sharma updated HIVE-13966: Status: Patch Available (was: In Progress) > DbNotificationListener: can lose DDL operation notifications > - > > Key: HIVE-13966 > URL: https://issues.apache.org/jira/browse/HIVE-13966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Nachiket Vaidya >Assignee: Rahul Sharma >Priority: Critical > Attachments: HIVE-13966.1.patch > > > The code for each API in HiveMetaStore.java is like this: > 1. openTransaction() > 2. -- operation-- > 3. commit() or rollback() based on result of the operation. > 4. add entry to notification log (unconditionally) > If the operation fails (in step 2), we still add an entry to the notification > log. Found this issue in testing. > This is still ok, as it is only a false positive. > If the operation is successful but adding to the notification log fails, the > user will get a MetaException. It will not roll back the operation, as it is > already committed. We need to handle this case so that we will not have false > negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
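The four-step pattern quoted in the description is small enough to model directly. The sketch below is a toy Python model (all class and method names here are hypothetical, not Hive's actual HiveMetaStore code): it shows how notifying unconditionally after commit yields false positives, and how making the log write conditional on the operation committing avoids them.

```python
class MetaStoreSketch:
    """Toy model of the commit-then-notify ordering described above."""

    def __init__(self):
        self.tables = set()
        self.notification_log = []

    def _commit(self, table):
        # stand-in for steps 1-3: openTransaction / operation / commit-or-rollback
        if table is None:
            raise ValueError("operation failed")
        self.tables.add(table)

    def create_table_buggy(self, table):
        try:
            self._commit(table)
            ok = True
        except ValueError:
            ok = False
        # step 4 runs unconditionally: a failed operation still emits an
        # event (false positive), and a failure here after a successful
        # commit cannot roll the operation back (false negative)
        self.notification_log.append(("CREATE_TABLE", table))
        return ok

    def create_table_fixed(self, table):
        try:
            self._commit(table)
        except ValueError:
            return False  # nothing committed, nothing logged
        # in a real fix the log entry would be written inside the same DB
        # transaction as the operation, so both commit or both roll back
        self.notification_log.append(("CREATE_TABLE", table))
        return True
```

A failed DDL through the buggy path still leaves an event in the log; the fixed path leaves the log untouched.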
[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file
[ https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-13369: -- Description: The JavaDoc on getAcidState() reads, in part: "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." which is correct, but there is nothing in the code that does this. And if we detect a situation where txn X must be excluded but there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto-commit mode, but with multi-statement txns it's possible. Suppose some long-running txn starts and locks in a snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). == Here is a more concrete example. Let's say the files for table A are as follows, created in the order listed: delta_4_4 delta_5_5 delta_4_5 base_5 delta_16_16 delta_17_17 base_17 (for example, the user ran major compaction). Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16>. Assume that all txns <= 20 commit. The reader can't use base_17 because it has the result of txn 16. So it should choose base_5 as "TxnBase bestBase" in _getChildState()_. Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and delta_17_17 in the _Directory_ object. This would represent an acceptable snapshot for such a reader. The issue is if the Cleaner process is running at the same time. It will see everything with txnid < 17 as obsolete. Then it will check lock manager state and decide to delete (as there may not be any locks in the LM for table A). The order in which the files are deleted is undefined right now. 
It may delete delta_16_16 and delta_17_17 first, and right at this moment the read request with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by some multi-stmt txn that started some time ago; it acquires locks after the Cleaner checks LM state) and calls getAcidState(). This request will choose base_5, but it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o the modifications made by those txns. [This is not possible currently since we only support autoCommit=true. The reason is that a query (1) acquires locks (2) locks in the snapshot. The Cleaner won't delete anything for a given compaction (partition) if there are locks on it. Thus for the duration of the transaction nothing will be deleted, so it's safe to use base_5.] This is a subtle race condition, but possible. 1. So the safest thing to do to ensure correctness is to use the latest base_x as the "best" and check against the exceptions in ValidTxnList, throwing an exception if there is an exception <= x. 2. A better option is to keep 2 exception lists, aborted and open, and only throw if there is an open txn <= x. Compaction throws away data from aborted txns, so there is no harm in using a base with aborted txns in its range. 3. You could make each txn record the lowest open txn id at its start and prevent the cleaner from cleaning any delta whose id range includes this open txn id for any txn that is still running. This has the drawback of potentially delaying GC of old files for arbitrarily long periods, so it should be a user config choice. The implementation is not trivial. I would go with 1 now and do 2/3 together with the multi-statement txn work. Side note: if 2 deltas have overlapping ID ranges, then 1 must be a subset of the other was: The JavaDoc on getAcidState() reads, in part: "Note that because major compactions don't preserve the history, we can't use a base directory that includes a transaction id that we must exclude." 
which is correct but there is nothing in the code that does this. And if we detect a situation where txn X must be excluded but and there are deltas that contain X, we'll have to abort the txn. This can't (reasonably) happen with auto commit mode, but with multi statement txns it's possible. Suppose some long running txn starts and lock in snapshot at 17 (HWM). An hour later it decides to access some partition for which all txns < 20 (for example) have already been compacted (i.e. GC'd). == Here is a more concrete example. Let's say the file for table A are as follows and created in the order listed. delta_4_4 delta_5_5 delta_4_5 base_5 delta_16_16 delta_17_17 base_17 (for example user ran major compaction) let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 and ExceptionList=<16> Assume that all txns <= 20 commit. Reader can't use base_17 because it has result of txn16. So it should chose base_5 "TxnBase bestBase" in _getChildState()_. Then
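Option 1 from the discussion above (take the latest base, but refuse it when the exception list reaches into its range) can be sketched in a few lines. This is an illustrative Python model, not the real AcidUtils.getAcidState(); ValidTxnList(20:16) is represented as hwm=20, exceptions={16}.

```python
def choose_best_base(bases, hwm, exceptions):
    """bases: txn ids x of the available base_x directories.

    Pick the newest base visible under the high-water mark, then apply
    option 1: if any excluded txn id falls at or below x, the base may
    already have that txn compacted in, so fail loudly rather than
    silently return a wrong snapshot.
    """
    visible = sorted(b for b in bases if b <= hwm)
    if not visible:
        return None
    best = visible[-1]
    bad = [e for e in exceptions if e <= best]
    if bad:
        raise RuntimeError("base_%d may contain excluded txns %s" % (best, bad))
    return best
```

With the example above (base_5 and base_17 on disk, snapshot ValidTxnList(20:16)), option 1 rejects base_17 outright instead of risking the Cleaner race that falling back to base_5 opens up.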
[jira] [Updated] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work
[ https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14151: --- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) > Use of USE_DEPRECATED_CLI environment variable does not work > > > Key: HIVE-14151 > URL: https://issues.apache.org/jira/browse/HIVE-14151 > Project: Hive > Issue Type: Bug > Components: CLI >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Fix For: 2.2.0 > > Attachments: HIVE-14151.1.patch > > > According to > https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline > if we set USE_DEPRECATED_CLI=false it should use beeline for hiveCli. But it > doesn't seem to work. > In order to reproduce this issue: > {noformat} > $ echo $USE_DEPRECATED_CLI > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > $ > $ export USE_DEPRECATED_CLI=false > $ echo $USE_DEPRECATED_CLI > false > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work
[ https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371451#comment-15371451 ] Vihang Karajgaonkar commented on HIVE-14151: Hi [~spena] I have tested the change manually; it should work fine. There are no tests that run cli.sh anyway, so I guess you can go ahead and commit it if it looks good to you. Thanks! > Use of USE_DEPRECATED_CLI environment variable does not work > > > Key: HIVE-14151 > URL: https://issues.apache.org/jira/browse/HIVE-14151 > Project: Hive > Issue Type: Bug > Components: CLI >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14151.1.patch > > > According to > https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline > if we set USE_DEPRECATED_CLI=false it should use beeline for hiveCli. But it > doesn't seem to work. > In order to reproduce this issue: > {noformat} > $ echo $USE_DEPRECATED_CLI > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > $ > $ export USE_DEPRECATED_CLI=false > $ echo $USE_DEPRECATED_CLI > false > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
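For reference, the intended dispatch is simple. Below is a hedged Python stand-in for the check cli.sh is supposed to perform (the return values are illustrative names, not what the script actually execs): the deprecated CLI is the default, and USE_DEPRECATED_CLI=false should switch to the Beeline-based implementation.

```python
import os

def select_cli(env=None):
    """Return which frontend the launcher should start.

    Illustrative model of the documented contract: the deprecated CLI
    unless USE_DEPRECATED_CLI is explicitly set to "false".
    """
    env = os.environ if env is None else env
    use_deprecated = env.get("USE_DEPRECATED_CLI", "true").strip().lower()
    return "deprecated-cli" if use_deprecated != "false" else "beeline-cli"
```

The bug report amounts to the real script never reaching the equivalent of the `!= "false"` branch when the variable is exported.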
[jira] [Comment Edited] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371447#comment-15371447 ] Sergey Shelukhin edited comment on HIVE-14196 at 7/11/16 7:19 PM: -- Hmm... seems like LLAP IO changes in out files are incorrect? I wonder if it's because it's handled at split generation stage, not compilation stage. was (Author: sershe): Hmm... seems like LLAP IO changes in out files are incorrect? > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371447#comment-15371447 ] Sergey Shelukhin commented on HIVE-14196: - Hmm... seems like LLAP IO changes in out files are incorrect? > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371431#comment-15371431 ] Sahil Takiar commented on HIVE-13930: - Hey everyone, Sergio and I uploaded a new Spark tarball that is built against Hadoop 2.6.0 (the previous version was built against Hadoop 2.4.0). This new version should work, although there may be some problems since it was built against 2.6.0 and not 2.7.2. Can someone re-trigger the Hive QA test to see if the {{TestSparkCliDriver}} tests are now passing? We couldn't compile against Hadoop 2.7.2 because Spark 1.6.0 doesn't provide an option for compiling against Hadoop 2.7+ (we are working on fixing this). In the future we want to remove the dependency on the Spark installation tarball; we are currently thinking of the best way to do so. > upgrade Hive to latest Hadoop version > - > > Key: HIVE-13930 > URL: https://issues.apache.org/jira/browse/HIVE-13930 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, > HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work
[ https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371417#comment-15371417 ] Sergio Peña commented on HIVE-14151: I don't think there is a test case that executes cli.sh, and tests were not executed either way. [~vihangk1] Should I commit this patch now? > Use of USE_DEPRECATED_CLI environment variable does not work > > > Key: HIVE-14151 > URL: https://issues.apache.org/jira/browse/HIVE-14151 > Project: Hive > Issue Type: Bug > Components: CLI >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14151.1.patch > > > According to > https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline > if we set USE_DEPRECATED_CLI=false it should use beeline for hiveCli. But it > doesn't seem to work. > In order to reproduce this issue: > {noformat} > $ echo $USE_DEPRECATED_CLI > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > $ > $ export USE_DEPRECATED_CLI=false > $ echo $USE_DEPRECATED_CLI > false > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14208) Outer MapJoin uses key of outer input and Converter
[ https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14208: --- Summary: Outer MapJoin uses key of outer input and Converter (was: Outer MapJoin uses key data of outer input and Converter) > Outer MapJoin uses key of outer input and Converter > --- > > Key: HIVE-14208 > URL: https://issues.apache.org/jira/browse/HIVE-14208 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Priority: Minor > > Consider a left outer MapJoin operator. OIs for the outputs are created from > the outer and inner sides' inputs. However, when there is a match in the > join, the data for the key is always taken from the outer side (as it is done > currently). Thus, we need to apply the Converter logic on the data to get the > correct type. > This issue is to explore whether a better solution would be to use the key > from the correct inputs of the join to eliminate the need for Converters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work
[ https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371412#comment-15371412 ] Sergio Peña commented on HIVE-14151: Looks good. +1 > Use of USE_DEPRECATED_CLI environment variable does not work > > > Key: HIVE-14151 > URL: https://issues.apache.org/jira/browse/HIVE-14151 > Project: Hive > Issue Type: Bug > Components: CLI >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-14151.1.patch > > > According to > https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline > if we set USE_DEPRECATED_CLI=false it should use beeline for hiveCli. But it > doesn't seem to work. > In order to reproduce this issue: > {noformat} > $ echo $USE_DEPRECATED_CLI > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > $ > $ export USE_DEPRECATED_CLI=false > $ echo $USE_DEPRECATED_CLI > false > $ ./hive > Hive-on-MR is deprecated in Hive 2 and may not be available in the future > versions. Consider using a different execution engine (i.e. tez, spark) or > using Hive 1.X releases. > hive> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371407#comment-15371407 ] Prasanth Jayachandran commented on HIVE-14196: -- The test failures seem unrelated, btw. > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14141) Fix for HIVE-14062 breaks indirect urls in beeline
[ https://issues.apache.org/jira/browse/HIVE-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14141: --- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) > Fix for HIVE-14062 breaks indirect urls in beeline > -- > > Key: HIVE-14141 > URL: https://issues.apache.org/jira/browse/HIVE-14141 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14141.1.patch > > > Looks like the patch for HIVE-14062 breaks indirect urls which uses > environment variables to get the url in beeline > In order to reproduce this issue: > {noformat} > $ export BEELINE_URL_DEFAULT="jdbc:hive2://localhost:1" > $ beeline -u default > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14141) Fix for HIVE-14062 breaks indirect urls in beeline
[ https://issues.apache.org/jira/browse/HIVE-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371403#comment-15371403 ] Sergio Peña commented on HIVE-14141: Thanks. Looks good to me +1 > Fix for HIVE-14062 breaks indirect urls in beeline > -- > > Key: HIVE-14141 > URL: https://issues.apache.org/jira/browse/HIVE-14141 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: HIVE-14141.1.patch > > > Looks like the patch for HIVE-14062 breaks indirect urls which uses > environment variables to get the url in beeline > In order to reproduce this issue: > {noformat} > $ export BEELINE_URL_DEFAULT="jdbc:hive2://localhost:1" > $ beeline -u default > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371400#comment-15371400 ] Prasanth Jayachandran commented on HIVE-14196: -- readAllColumns is just used in debug logging. Updated patch to return false immediately when first unsupported type is found. > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
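The "return false immediately when the first unsupported type is found" behavior described in this comment can be sketched as follows (illustrative Python, not the actual LLAP IO code; the set of complex categories mirrors Hive's complex type names):

```python
# hypothetical stand-in for the unsupported-category check described above
COMPLEX_CATEGORIES = {"struct", "map", "array", "uniontype"}

def supported_for_llap_io(column_types):
    """Scan declared column type strings; bail out on the first complex
    category instead of scanning the rest of the schema."""
    for type_string in column_types:
        # "struct<a:int>" -> category "struct"; "int" -> "int"
        category = type_string.split("<", 1)[0].strip().lower()
        if category in COMPLEX_CATEGORIES:
            return False  # disable LLAP IO for this query
    return True
```

Short-circuiting keeps the check cheap for wide schemas while still disabling the IO elevator whenever any complex column is involved.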
[jira] [Updated] (HIVE-14208) MapJoin uses key of outer input and Converter
[ https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14208: --- Summary: MapJoin uses key of outer input and Converter (was: Outer MapJoin uses key of outer input and Converter) > MapJoin uses key of outer input and Converter > - > > Key: HIVE-14208 > URL: https://issues.apache.org/jira/browse/HIVE-14208 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Priority: Minor > > Consider a left outer MapJoin operator. OIs for the outputs are created from > the outer and inner sides' inputs. However, when there is a match in the > join, the data for the key is always taken from the outer side (as it is done > currently). Thus, we need to apply the Converter logic on the data to get the > correct type. > This issue is to explore whether a better solution would be to use the key > from the correct inputs of the join to eliminate the need for Converters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14208) MapJoin uses key of outer input and Converter
[ https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371404#comment-15371404 ] Jesus Camacho Rodriguez commented on HIVE-14208: Cc [~ashutoshc] > MapJoin uses key of outer input and Converter > - > > Key: HIVE-14208 > URL: https://issues.apache.org/jira/browse/HIVE-14208 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Priority: Minor > > Consider a left outer MapJoin operator. OIs for the outputs are created from > the outer and inner sides' inputs. However, when there is a match in the > join, the data for the key is always taken from the outer side (as it is done > currently). Thus, we need to apply the Converter logic on the data to get the > correct type. > This issue is to explore whether a better solution would be to use the key > from the correct inputs of the join to eliminate the need for Converters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14208) Outer MapJoin uses key data of outer input and Converter
[ https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14208: --- Issue Type: Improvement (was: Bug) > Outer MapJoin uses key data of outer input and Converter > > > Key: HIVE-14208 > URL: https://issues.apache.org/jira/browse/HIVE-14208 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez > > > Consider a left outer MapJoin operator. OIs for the outputs are created from > the outer and inner sides' inputs. However, when there is a match in the > join, the data for the key is always taken from the outer side (as it is done > currently). Thus, we need to apply the Converter logic on the data to get the > correct type. > This issue is to explore whether a better solution would be to use the key > from the correct inputs of the join to eliminate the need for Converters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14208) Outer MapJoin uses key data of outer input and Converter
[ https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14208: --- Priority: Minor (was: Major) > Outer MapJoin uses key data of outer input and Converter > > > Key: HIVE-14208 > URL: https://issues.apache.org/jira/browse/HIVE-14208 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Priority: Minor > > Consider a left outer MapJoin operator. OIs for the outputs are created from > the outer and inner sides' inputs. However, when there is a match in the > join, the data for the key is always taken from the outer side (as it is done > currently). Thus, we need to apply the Converter logic on the data to get the > correct type. > This issue is to explore whether a better solution would be to use the key > from the correct inputs of the join to eliminate the need for Converters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14208) Outer MapJoin uses key data of outer input and Converter
[ https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-14208: --- Component/s: Query Processor > Outer MapJoin uses key data of outer input and Converter > > > Key: HIVE-14208 > URL: https://issues.apache.org/jira/browse/HIVE-14208 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Priority: Minor > > Consider a left outer MapJoin operator. OIs for the outputs are created from > the outer and inner sides' inputs. However, when there is a match in the > join, the data for the key is always taken from the outer side (as it is done > currently). Thus, we need to apply the Converter logic on the data to get the > correct type. > This issue is to explore whether a better solution would be to use the key > from the correct inputs of the join to eliminate the need for Converters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9928) Empty buckets are not created on non-HDFS file system
[ https://issues.apache.org/jira/browse/HIVE-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371397#comment-15371397 ] Rob Leidle commented on HIVE-9928: -- Can be closed as a duplicate of HIVE-14175. > Empty buckets are not created on non-HDFS file system > - > > Key: HIVE-9928 > URL: https://issues.apache.org/jira/browse/HIVE-9928 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Ankit Kamboj > Attachments: HIVE-9928.1.patch > > > Bucketing should create empty buckets on the destination file system. There > is a problem in that logic: it uses path.toUri().getPath().toString() to > find the relevant path, but this chain of methods always resolves to a relative > path, which ends up creating the empty buckets in HDFS rather than the actual > destination fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
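The mechanism in the description is easy to reproduce with plain URI handling. Below is a Python sketch (urllib stands in for Hadoop's Path.toUri().getPath(); `qualified_path` is a hypothetical helper illustrating the idea of the fix, not Hive's actual code): dropping scheme and authority leaves a bare path that the default filesystem, usually HDFS, will claim.

```python
from urllib.parse import urlparse

def path_only(uri):
    # mirrors path.toUri().getPath(): scheme and authority are dropped,
    # so a file created from this string lands on the default (HDFS) fs
    return urlparse(uri).path

def qualified_path(uri):
    # keep scheme://authority/path so empty buckets are created on the
    # table's actual filesystem; falls back to the bare path when the
    # input has no scheme
    p = urlparse(uri)
    return "%s://%s%s" % (p.scheme, p.netloc, p.path) if p.scheme else p.path
```

For an S3-backed table the first helper silently loses the destination filesystem, which is exactly the empty-bucket symptom reported here.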
[jira] [Updated] (HIVE-14196) Disable LLAP IO when complex types are involved
[ https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14196: - Attachment: HIVE-14196.2.patch Addressed review comments > Disable LLAP IO when complex types are involved > --- > > Key: HIVE-14196 > URL: https://issues.apache.org/jira/browse/HIVE-14196 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch > > > Let's exclude vector_complex_* tests added for llap which is currently broken > and fails in all test runs. We can re-enable it with HIVE-14089 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14207) Strip HiveConf hidden params in webui conf
[ https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371394#comment-15371394 ] Thejas M Nair commented on HIVE-14207: -- On 2.patch - I like the idea of using a custom port to enable the HS2 web UI instead of having to turn off the in.test config. However, the MetaStoreUtils.findFreePort() call has a non-zero probability of returning the default port as the available one. Can you also skip the default port if that's what is returned? > Strip HiveConf hidden params in webui conf > -- > > Key: HIVE-14207 > URL: https://issues.apache.org/jira/browse/HIVE-14207 > Project: Hive > Issue Type: Bug > Components: Web UI >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Attachments: HIVE-14207.2.patch, HIVE-14207.patch > > > HIVE-12338 introduced a new web ui, which has a page that displays the > current HiveConf being used by HS2. However, before it displays that config, > it does not strip entries from it which are considered "hidden" conf > parameters, thus exposing those values from a web-ui for HS2. We need to add > stripping to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
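The suggested guard amounts to retrying the bind until the OS hands back a port outside a skip set. An illustrative Python sketch (MetaStoreUtils.findFreePort itself takes no such argument; this only models the proposed behavior of skipping the default port when it happens to be returned):

```python
import socket

def find_free_port_avoiding(skip=frozenset()):
    """Ask the OS for a free port by binding to port 0, retrying if it
    happens to pick one we must avoid (e.g. the default HS2 web UI port)."""
    while True:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", 0))
            port = s.getsockname()[1]
        if port not in skip:
            return port
```

Since the OS almost never hands out the same ephemeral port twice in a row, the retry loop terminates quickly in practice.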
[jira] [Commented] (HIVE-14175) Fix creating buckets without scheme information
[ https://issues.apache.org/jira/browse/HIVE-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371391#comment-15371391 ] Thomas Poepping commented on HIVE-14175: Never mind, you pushed this [~ashutoshc]? thanks! > Fix creating buckets without scheme information > --- > > Key: HIVE-14175 > URL: https://issues.apache.org/jira/browse/HIVE-14175 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.1, 2.1.0 >Reporter: Thomas Poepping >Assignee: Thomas Poepping > Labels: patch > Attachments: HIVE-14175.2.patch, HIVE-14175.patch, HIVE-14175.patch > > > If a table is created on a non-default filesystem (i.e. non-hdfs), the empty > files will be created with incorrect scheme information. This patch extracts > the scheme and authority information for the new paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-13191) DummyTable map joins mix up columns between tables
[ https://issues.apache.org/jira/browse/HIVE-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-13191. Resolution: Duplicate Closed as duplicate of HIVE-14027. > DummyTable map joins mix up columns between tables > -- > > Key: HIVE-13191 > URL: https://issues.apache.org/jira/browse/HIVE-13191 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Pengcheng Xiong > Attachments: tez.q > > > {code}
> SELECT
>   a.key,
>   a.a_one,
>   b.b_one,
>   a.a_zero,
>   b.b_zero
> FROM
>   (SELECT 11 key, 0 confuse_you, 1 a_one, 0 a_zero) a
> LEFT JOIN
>   (SELECT 11 key, 0 confuse_you, 1 b_one, 0 b_zero) b
> ON a.key = b.key;
> 11 1 0 0 1
> {code}
> This should be 11, 1, 1, 0, 0 instead. > Disabling map-joins & using shuffle-joins returns the right result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371388#comment-15371388 ] Owen O'Malley commented on HIVE-13974: -- First pass comments on the ORC changes:
* You *must* include unit tests in the ORC module for changes there.
* Don't move checkAcidSchema around and certainly don't make it a public API. We should probably have ReaderImpl pass a boolean to the constructor of SchemaEvolution saying that the file is Acid. Using the column names is bad and we should probably move over to use the acid stats property as the check.
* SameCategoryAndAttributes is a duplication of TypeDescription.equals.
* We need to integrate this with ORC-54 too.
* I like pulling the include logic into SchemaEvolution.
* Please use 'reader' instead of 'logical' in the names in SchemaEvolution.
I'm still going through the SchemaEvolution changes. > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Blocker > Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, > HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, > HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, > HIVE-13974.09.patch, HIVE-13974.091.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
[ https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371355#comment-15371355 ] Hive QA commented on HIVE-14195:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817101/HIVE-14195.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10304 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/468/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/468/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-468/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12817101 - PreCommit-HIVE-MASTER-Build

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.2.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Minor
> Attachments: HIVE-14195.2.patch, HIVE-14195.patch
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw
> NoSuchObjectException when no function with funcName exists in the db.
> Instead, I need to search the MetaException message for
> 'NoSuchObjectException'.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
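The workaround the reporter describes (searching the MetaException message for 'NoSuchObjectException') can be wrapped so callers still get a typed exception. The sketch below uses plain Python stand-in exception classes, not the real Thrift-generated Hive classes, to show the pattern:

```python
# Sketch of the message-inspection workaround described above. The two
# exception classes are stand-ins, not Hive's Thrift-generated types.
class MetaException(Exception):
    pass

class NoSuchObjectException(Exception):
    pass

def get_function_or_raise(client_get_function, db_name, func_name):
    """Call getFunction() and translate the message-wrapped error back
    into the exception type the caller actually wants to catch."""
    try:
        return client_get_function(db_name, func_name)
    except MetaException as e:
        if "NoSuchObjectException" in str(e):
            raise NoSuchObjectException(func_name) from e
        raise

# A fake client call mimicking the behavior the report describes:
def fake_get_function(db, name):
    raise MetaException("NoSuchObjectException: function %s.%s not found" % (db, name))

try:
    get_function_or_raise(fake_get_function, "default", "missing_fn")
except NoSuchObjectException as e:
    print("caught:", e)
```

This is only the caller-side mitigation; the patch under test fixes the metastore client so that the proper exception is thrown in the first place.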
[jira] [Updated] (HIVE-14207) Strip HiveConf hidden params in webui conf
[ https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-14207:
Attachment: HIVE-14207.2.patch

Updated patch.

> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
> Issue Type: Bug
> Components: Web UI
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.patch
>
> HIVE-12338 introduced a new web ui, which has a page that displays the
> current HiveConf being used by HS2. However, before it displays that config,
> it does not strip entries from it which are considered "hidden" conf
> parameters, thus exposing those values from a web-ui for HS2. We need to add
> stripping to this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14207) Strip HiveConf hidden params in webui conf
[ https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-14207:
Status: Patch Available (was: Open)

> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
> Issue Type: Bug
> Components: Web UI
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.patch
>
> HIVE-12338 introduced a new web ui, which has a page that displays the
> current HiveConf being used by HS2. However, before it displays that config,
> it does not strip entries from it which are considered "hidden" conf
> parameters, thus exposing those values from a web-ui for HS2. We need to add
> stripping to this.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
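The fix the issue asks for, stripping hidden parameters before the config page renders, reduces to filtering the config map against a hidden-key list. This is a plain Python sketch, not the actual HS2 web UI code; the key names are illustrative examples of the kind of credentials the hidden list protects:

```python
# Minimal sketch of the stripping described above (not the actual HS2
# web UI code). The hidden set below is illustrative; in Hive the list
# is driven by configuration rather than hardcoded.
HIDDEN_CONF_KEYS = {
    "javax.jdo.option.ConnectionPassword",   # metastore DB password
    "hive.server2.keystore.password",        # HS2 SSL keystore password
}

def strip_hidden(conf, hidden_keys=frozenset(HIDDEN_CONF_KEYS)):
    """Return a copy of the config with hidden parameters removed,
    to be used for display instead of the raw HiveConf."""
    return {k: v for k, v in conf.items() if k not in hidden_keys}

conf = {
    "hive.execution.engine": "tez",
    "javax.jdo.option.ConnectionPassword": "s3cret",
}
print(strip_hidden(conf))  # {'hive.execution.engine': 'tez'}
```

Filtering a copy (rather than mutating the live HiveConf) matters here: the server keeps using the full config while only the rendered view is sanitized.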
[jira] [Commented] (HIVE-14188) LLAPIF: wrong user field is used from the token
[ https://issues.apache.org/jira/browse/HIVE-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371310#comment-15371310 ] Sergey Shelukhin commented on HIVE-14188:
-

Most of the failures are known issues; the timed-out test is due to the NN going into safemode. [~gopalv] ping?

> LLAPIF: wrong user field is used from the token
> ---
>
> Key: HIVE-14188
> URL: https://issues.apache.org/jira/browse/HIVE-14188
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-14188.patch, HIVE-14188.patch
>
> realUser is not set in all cases for delegation tokens; we should use
> the owner instead.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
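The one-line description above (realUser is not reliably populated on delegation tokens, so the owner field should be used) can be sketched abstractly. This is plain Python with a hypothetical token type, not the actual LLAP or Hadoop token classes:

```python
# Illustrative sketch (not the actual LLAP token code): prefer the
# token's owner field, since realUser may be unset.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenIdent:
    """Hypothetical stand-in for a delegation token identifier."""
    owner: str
    real_user: Optional[str] = None  # frequently not populated

def effective_user(token: TokenIdent) -> str:
    # Reading real_user here would return None for many tokens;
    # owner is the field that is always set.
    return token.owner

print(effective_user(TokenIdent(owner="hive_user")))  # hive_user
```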