[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372141#comment-15372141
 ] 

Hive QA commented on HIVE-14111:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817215/HIVE-14111.06.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10294 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-tez_self_join.q-filter_join_breaktask.q-vector_decimal_precision.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/476/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/476/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-476/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817215 - PreCommit-HIVE-MASTER-Build

> better concurrency handling for TezSessionState - part I
> 
>
> Key: HIVE-14111
> URL: https://issues.apache.org/jira/browse/HIVE-14111
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, 
> HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, 
> HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372058#comment-15372058
 ] 

Hive QA commented on HIVE-14007:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817200/HIVE-14007.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/475/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/475/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-475/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-475/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at a61c351 HIVE-14200: Tez: disable auto-reducer parallelism when 
reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther 
Hagleitner)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at a61c351 HIVE-14200: Tez: disable auto-reducer parallelism when 
reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther 
Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817200 - PreCommit-HIVE-MASTER-Build

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372057#comment-15372057
 ] 

Hive QA commented on HIVE-12646:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817193/HIVE-12646.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10308 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/474/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/474/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-474/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817193 - PreCommit-HIVE-MASTER-Build

> beeline and HIVE CLI do not parse ; in quote properly
> -
>
> Key: HIVE-12646
> URL: https://issues.apache.org/jira/browse/HIVE-12646
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Clients
>Reporter: Yongzhi Chen
>Assignee: Sahil Takiar
> Attachments: HIVE-12646.2.patch, HIVE-12646.3.patch, HIVE-12646.patch
>
>
> Beeline and the Hive CLI have to escape ';' inside quoted strings, while most 
> other shells need not. For example:
> in Beeline:
> {noformat}
> 0: jdbc:hive2://localhost:1> select ';' from tlb1;
> select ';' from tlb1;
> 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
> 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
> Error: Error while compiling statement: FAILED: ParseException line 1:8 
> cannot recognize input near '' '
> {noformat}
> while in mysql shell:
> {noformat}
> mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
> +--------------------+
> | CONCAT(';', 'foo') |
> +--------------------+
> | ;foo               |
> | ;foo               |
> | ;foo               |
> +--------------------+
> 3 rows in set (0.00 sec)
> {noformat}
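
The parse failure above happens because the client splits the input on every ';', including one inside a string literal, while mysql's shell only splits outside quotes. Below is a minimal, hypothetical sketch of quote-aware statement splitting, purely for illustration; it is not Beeline's actual code, and the class and method names are made up:

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: split a command line on ';' while ignoring
// semicolons that appear inside '...' or "..." string literals.
public final class StatementSplitter {
  public static List<String> split(String line) {
    List<String> stmts = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    char quote = 0;                      // 0 = not inside a quoted literal
    for (char c : line.toCharArray()) {
      if (quote != 0) {
        if (c == quote) quote = 0;       // closing quote
        cur.append(c);
      } else if (c == '\'' || c == '"') {
        quote = c;                       // opening quote
        cur.append(c);
      } else if (c == ';') {
        stmts.add(cur.toString());       // statement boundary outside quotes
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    if (cur.length() > 0) stmts.add(cur.toString());
    return stmts;
  }
}
{code}

With splitting like this, {{select ';' from tlb1;}} stays a single statement instead of being cut at the quoted semicolon (escaped quotes are deliberately not handled in this sketch).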



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14212) hbase_queries result out of date on branch-2.1

2016-07-11 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372032#comment-15372032
 ] 

Pengcheng Xiong commented on HIVE-14212:


After I checked master, I found that it was not consistent with 2.1. Thus, +1. 
Btw, I think updating q file outputs does not require a +1. :)

> hbase_queries result out of date on branch-2.1
> --
>
> Key: HIVE-14212
> URL: https://issues.apache.org/jira/browse/HIVE-14212
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Attachments: HIVE-14212-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14196:
-
Attachment: HIVE-14196.3.patch

Minor change

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch, 
> HIVE-14196.3.patch
>
>
> Let's exclude the vector_complex_* tests added for llap, which are currently 
> broken and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14196:
-
Attachment: HIVE-14196.3.patch

Added the check at compilation as well.
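
For illustration, a hypothetical sketch of what a compile-time guard of this kind could look like; this is not the actual HIVE-14196 patch, and the class, method, and call site are assumptions. The idea is simply to skip LLAP IO whenever the read schema contains a non-primitive column:

{code}
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

// Illustrative sketch only: detect complex (STRUCT/LIST/MAP/UNION) columns in
// the read schema so the caller can fall back to the non-LLAP IO path.
final class LlapIoChecks {
  static boolean hasComplexTypes(List<TypeInfo> columnTypes) {
    for (TypeInfo t : columnTypes) {
      if (t.getCategory() != Category.PRIMITIVE) {
        return true;   // complex type found: disable LLAP IO for this read
      }
    }
    return false;
  }
}
{code}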

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch, 
> HIVE-14196.3.patch
>
>
> Let's exclude the vector_complex_* tests added for llap, which are currently 
> broken and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14212) hbase_queries result out of date on branch-2.1

2016-07-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14212:

Status: Patch Available  (was: Open)

> hbase_queries result out of date on branch-2.1
> --
>
> Key: HIVE-14212
> URL: https://issues.apache.org/jira/browse/HIVE-14212
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Attachments: HIVE-14212-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables

2016-07-11 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371988#comment-15371988
 ] 

Sahil Takiar commented on HIVE-14137:
-

Re-basing again

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty 
> tables
> ---
>
> Key: HIVE-14137
> URL: https://issues.apache.org/jira/browse/HIVE-14137
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, 
> HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.5.patch, 
> HIVE-14137.6.patch, HIVE-14137.patch
>
>
> The following queries:
> {code}
> -- Setup
> drop table if exists empty1;
> create table empty1 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> drop table if exists empty2;
> create table empty2 (col1 bigint, col2 bigint) stored as parquet 
> tblproperties ('parquet.compress'='snappy');
> drop table if exists empty3;
> create table empty3 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> -- All empty HDFS directories.
> -- Fails with [08S01]: Error while processing statement: FAILED: Execution 
> Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- Two empty HDFS directories.
> -- Create an empty file in HDFS.
> insert into empty1 select * from empty1 where false;
> -- Same query fails with [08S01]: Error while processing statement: FAILED: 
> Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- One empty HDFS directory.
> -- Create an empty file in HDFS.
> insert into empty2 select * from empty2 where false;
> -- Same query succeeds.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> {code}
> Will result in the following exception:
> {code}
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /tmp/hive/hive/1f3837aa-9407-4780-92b1-42a66d205139/hive_2016-06-24_15-45-23_206_79177714958655528-2/-mr-10004/0/emptyFile
>  for client 172.26.14.151 already exists
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2561)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:593)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:111)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:393)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1902)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1738)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1663)
>   at 
> 

[jira] [Updated] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables

2016-07-11 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-14137:

Attachment: HIVE-14137.6.patch

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty 
> tables
> ---
>
> Key: HIVE-14137
> URL: https://issues.apache.org/jira/browse/HIVE-14137
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, 
> HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.5.patch, 
> HIVE-14137.6.patch, HIVE-14137.patch
>
>
> The following queries:
> {code}
> -- Setup
> drop table if exists empty1;
> create table empty1 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> drop table if exists empty2;
> create table empty2 (col1 bigint, col2 bigint) stored as parquet 
> tblproperties ('parquet.compress'='snappy');
> drop table if exists empty3;
> create table empty3 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> -- All empty HDFS directories.
> -- Fails with [08S01]: Error while processing statement: FAILED: Execution 
> Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- Two empty HDFS directories.
> -- Create an empty file in HDFS.
> insert into empty1 select * from empty1 where false;
> -- Same query fails with [08S01]: Error while processing statement: FAILED: 
> Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- One empty HDFS directory.
> -- Create an empty file in HDFS.
> insert into empty2 select * from empty2 where false;
> -- Same query succeeds.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> {code}
> Will result in the following exception:
> {code}
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /tmp/hive/hive/1f3837aa-9407-4780-92b1-42a66d205139/hive_2016-06-24_15-45-23_206_79177714958655528-2/-mr-10004/0/emptyFile
>  for client 172.26.14.151 already exists
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2561)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:593)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:111)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:393)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1902)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1738)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1663)
>   at 
> 

[jira] [Updated] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-11 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-14210:

Attachment: HIVE-14210.1.patch

> SSLFactory truststore reloader threads leaking in HiveServer2
> -
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Attachments: HIVE-14210.1.patch, HIVE-14210.patch
>
>
> We found an issue in a customer environment where the HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.
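
A minimal sketch of the first bullet above, for illustration only (this is not the attached patch; the wrapper class and method are made up): close the JobClient once the submitted job is done or failed, so the underlying clients, and with them the SSLFactory truststore reloader thread, are shut down.

{code}
import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Illustrative sketch only: ensure JobClient.close() runs after the MR job
// finishes, releasing the resources that otherwise leak reloader threads.
final class SubmitAndClose {
  static void runJob(JobConf conf) throws IOException {
    JobClient jc = new JobClient(conf);
    try {
      RunningJob rj = jc.submitJob(conf);
      rj.waitForCompletion();            // block until the job is done or failed
    } finally {
      jc.close();                        // the clean-up this JIRA relies on
    }
  }
}
{code}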



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371977#comment-15371977
 ] 

Sergey Shelukhin commented on HIVE-14210:
-

nit: can you add braces to the if? Otherwise +1

cc [~thejas] [~vgumashta]

> SSLFactory truststore reloader threads leaking in HiveServer2
> -
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Attachments: HIVE-14210.patch
>
>
> We found an issue in a customer environment where the HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-11 Thread Thomas Friedrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371944#comment-15371944
 ] 

Thomas Friedrich edited comment on HIVE-14210 at 7/12/16 12:12 AM:
---

Provided patch for ExecDriver.java to call jobclient.close()


was (Author: tfriedr):
Patch for ExecDriver.java

> SSLFactory truststore reloader threads leaking in HiveServer2
> -
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
> Attachments: HIVE-14210.patch
>
>
> We found an issue in a customer environment where the HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-11 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich reassigned HIVE-14210:
---

Assignee: Thomas Friedrich

> SSLFactory truststore reloader threads leaking in HiveServer2
> -
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Attachments: HIVE-14210.patch
>
>
> We found an issue in a customer environment where the HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-11 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-14210:

Attachment: HIVE-14210.patch

Patch for ExecDriver.java

> SSLFactory truststore reloader threads leaking in HiveServer2
> -
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
> Attachments: HIVE-14210.patch
>
>
> We found an issue in a customer environment where the HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371945#comment-15371945
 ] 

Hive QA commented on HIVE-14137:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817191/HIVE-14137.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/473/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/473/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-473/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-MASTER-Build-473/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   c790391..a61c351  master -> origin/master
+ git reset --hard HEAD
HEAD is now at c790391 HIVE-14151: Use of USE_DEPRECATED_CLI environment 
variable does not work (Vihang Karajgaonkar, reviewed by Sergio Pena)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at a61c351 HIVE-14200: Tez: disable auto-reducer parallelism when 
reducer-count * min.partition.factor < 1.0 (Gopal V, reviewed by Gunther 
Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817191 - PreCommit-HIVE-MASTER-Build

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty 
> tables
> ---
>
> Key: HIVE-14137
> URL: https://issues.apache.org/jira/browse/HIVE-14137
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, 
> HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.5.patch, HIVE-14137.patch
>
>
> The following queries:
> {code}
> -- Setup
> drop table if exists empty1;
> create table empty1 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> drop table if exists empty2;
> create table empty2 (col1 bigint, col2 bigint) stored as parquet 
> tblproperties ('parquet.compress'='snappy');
> drop table if exists empty3;
> create table empty3 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> -- All empty HDFS directories.
> -- Fails with [08S01]: Error while processing statement: FAILED: Execution 
> Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- Two empty HDFS directories.
> -- Create an empty file in HDFS.
> insert into empty1 select * from empty1 where false;
> -- Same query fails with [08S01]: Error while processing statement: FAILED: 
> Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- One empty 

[jira] [Commented] (HIVE-14204) Optimize loading dynamic partitions

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371942#comment-15371942
 ] 

Hive QA commented on HIVE-14204:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817168/HIVE-14204.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 114 failed/errored test(s), 10310 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_vectorization_missing_cols
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_empty_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_dynamic_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapreduce2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries_with_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadataonly1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_diff_fs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_incompat1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge_incompat3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf_matchpath
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf_streaming
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_nonvec_fetchwork_part

[jira] [Commented] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc

2016-07-11 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371939#comment-15371939
 ] 

Eugene Koifman commented on HIVE-14211:
---

HIVE-13369 added a fix for the autoCommit=true mode.  Multi-statement txns 
require a more complicated change.

> AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files 
> etc
> -
>
> Key: HIVE-14211
> URL: https://issues.apache.org/jira/browse/HIVE-14211
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it has the result of txn 16.  So it 
> should choose base_5 as the "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid < 17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in the LM for table A).  
> The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment the read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been locked 
> in by some multi-stmt txn that started some time ago).  It acquires locks 
> after the Cleaner checks LM state and calls getAcidState().  This request will 
> choose base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> (2) locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for the duration of the 
> transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other
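
A tiny sketch of option 1 above, for illustration only (not the actual AcidUtils/getAcidState() code; how the newest base_N is found is assumed to be done by the caller): take the latest base_N and fail fast if the reader's ValidTxnList excludes any txn <= N, since that base may already contain data the reader must not see.

{code}
import java.io.IOException;
import org.apache.hadoop.hive.common.ValidTxnList;

// Illustrative sketch of option 1 only.
final class BestBaseCheck {
  // bestBaseTxnId: the highest N among the base_N directories (computed elsewhere).
  static void validateBestBase(long bestBaseTxnId, ValidTxnList validTxnList)
      throws IOException {
    for (long excluded : validTxnList.getInvalidTransactions()) {
      if (excluded <= bestBaseTxnId) {
        throw new IOException("Cannot use base_" + bestBaseTxnId
            + ": it may contain excluded txn " + excluded);
      }
    }
  }
}
{code}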



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14211:
--
Target Version/s: 1.3.0, 2.2.0  (was: 1.3.0, 2.2.0, 2.1.1)

> AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files 
> etc
> -
>
> Key: HIVE-14211
> URL: https://issues.apache.org/jira/browse/HIVE-14211
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it has the result of txn 16.  So it 
> should choose base_5 as the "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid < 17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in the LM for table A).  
> The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment the read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been locked 
> in by some multi-stmt txn that started some time ago).  It acquires locks 
> after the Cleaner checks LM state and calls getAcidState().  This request will 
> choose base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> (2) locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for the duration of the 
> transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14211:
--
Target Version/s:   (was: 1.3.0, 2.2.0)

> AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files 
> etc
> -
>
> Key: HIVE-14211
> URL: https://issues.apache.org/jira/browse/HIVE-14211
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it has the result of txn 16.  So it 
> should choose base_5 as the "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid < 17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in the LM for table A).  
> The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment the read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been locked 
> in by some multi-stmt txn that started some time ago).  It acquires locks 
> after the Cleaner checks LM state and calls getAcidState().  This request will 
> choose base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> (2) locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for the duration of the 
> transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14211) AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files etc

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14211:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-9675

> AcidUtils.getAcidState()/Cleaner - make it consistent wrt multiple base files 
> etc
> -
>
> Key: HIVE-14211
> URL: https://issues.apache.org/jira/browse/HIVE-14211
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has the result of txn16.  So it should choose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable snapshot 
> for such a reader.
> The issue is if the Cleaner process is running at the same time.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manager state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first, and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by 
> some multi-stmt txn that started some time ago).  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for the duration of the 
> transaction, nothing will be deleted, so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists: aborted and open and only 
> throw if there is an open txn <=x.  Compaction throws away data from aborted 
> txns and thus there is no harm using base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has a drawback of 
> potentially delaying GC of old files for arbitrarily long periods.  So this 
> should be a user config choice.   The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
> the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto-commit mode, but with multi-statement txns it's possible.
Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the files for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has the result of txn16.  So it should choose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in the _Directory_ object.  This would represent an acceptable snapshot 
for such a reader.

The issue is if the Cleaner process is running at the same time.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manager state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first, and right at this moment the read request 
with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by some 
multi-stmt txn that started some time ago).  It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) 
locks in the snapshot.  The cleaner won't delete anything for a given 
compaction (partition) if there are locks on it.  Thus for the duration of the 
transaction, nothing will be deleted, so it's safe to use base_5]


This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning any delta with an id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase 

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto-commit mode, but with multi-statement txns it's possible.
Suppose some long running txn starts and locks in a snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the files for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has the result of txn16.  So it should choose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in the _Directory_ object.  This would represent an acceptable snapshot 
for such a reader.

The issue is if the Cleaner process is running at the same time.  It will see 
everything with txnid<17 as obsolete.  Then it will check lock manager state and 
decide to delete (as there may not be any locks in LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first, and right at this moment the read request 
with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by some 
multi-stmt txn that started some time ago).  It acquires locks after the Cleaner 
checks LM state and calls getAcidState(). This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17 and thus return the snapshot w/o 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, (2) 
locks in the snapshot.  The cleaner won't delete anything for a given 
compaction (partition) if there are locks on it.  Thus for the duration of the 
transaction, nothing will be deleted, so it's safe to use base_5]


This is a subtle race condition but possible.

1. So the safest thing to do to ensure correctness is to use the latest base_x 
as the "best" and check against exceptions in ValidTxnList and throw an 
exception if there is an exception <=x.

2. A better option is to keep 2 exception lists: aborted and open and only 
throw if there is an open txn <=x.  Compaction throws away data from aborted 
txns and thus there is no harm using base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the cleaner from cleaning any delta with an id range that includes 
this open txn id for any txn that is still running.  This has a drawback of 
potentially delaying GC of old files for arbitrarily long periods.  So this 
should be a user config choice.   The implementation is not trivial.

I would go with 1 now and do 2/3 together with multi-statement txn work.



Side note:  if 2 deltas have overlapping ID range, then 1 must be a subset of 
the other

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase 

[jira] [Commented] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371921#comment-15371921
 ] 

Owen O'Malley commented on HIVE-14004:
--

Ok, I'm looking at this bug now too, since this seems like the important part 
of HIVE-13974.

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in 
> SchemaEvolution.getFileType
> --
>
> Key: HIVE-14004
> URL: https://issues.apache.org/jira/browse/HIVE-14004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Matt McCline
> Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, 
> HIVE-14004.03.patch
>
>
> Easiest way to repro is to add the following test
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
> int[][] tableData = {{1,2},{3,4}};
> runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + 
> makeValuesClause(tableData));
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
> Worker t = new Worker();
> t.setThreadId((int) t.getId());
> t.setHiveConf(hiveConf);
> AtomicBoolean stop = new AtomicBoolean();
> AtomicBoolean looped = new AtomicBoolean();
> stop.set(true);
> t.init(stop, looped);
> t.run();
> runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
> runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 
> 2");
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
> t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but if you look 
> in target/tmp/log/hive.log you will see the following exception (from the Minor 
> compaction).
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
> ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
> [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:208) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:63)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:207)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:508)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
>  ~[classes/:?]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[?:1.7.0_71]
> at 

[jira] [Commented] (HIVE-14200) Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0

2016-07-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371882#comment-15371882
 ] 

Gopal V commented on HIVE-14200:


Pushed to master, thanks [~hagleitn]

> Tez: disable auto-reducer parallelism when reducer-count * 
> min.partition.factor < 1.0
> -
>
> Key: HIVE-14200
> URL: https://issues.apache.org/jira/browse/HIVE-14200
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.2.0
>
> Attachments: HIVE-14200.1.patch, HIVE-14200.2.patch, 
> HIVE-14200.3.patch
>
>
> The min/max factors offer no real improvement when the fractions are 
> meaningless, for example when 0.25 * 2  is applied as the min.
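
For illustration only (not the actual Hive/Tez code; the config name hive.tez.min.partition.factor is taken from HiveConf and the wiring is assumed), the guard described here boils down to:
{code}
// Hedged sketch: with 2 reducers and a min partition factor of 0.25 the lower
// bound is 2 * 0.25 = 0.5 < 1, so the min/max range is meaningless and
// auto-reducer parallelism should simply stay disabled for that vertex.
int reducers = 2;
float minPartitionFactor = 0.25f;
boolean useAutoReducerParallelism = reducers * minPartitionFactor >= 1.0f;
{code}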



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None

2016-07-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371884#comment-15371884
 ] 

Lefty Leverenz commented on HIVE-13159:
---

Thanks Shannon!

> TxnHandler should support datanucleus.connectionPoolingType = None
> --
>
> Key: HIVE-13159
> URL: https://issues.apache.org/jira/browse/HIVE-13159
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
> Fix For: 2.2.0
>
> Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch
>
>
> Right now, one has to choose bonecp or dbcp.
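
For reference, a rough sketch of what this change allows (the property name comes from the issue title; treat the exact wiring into TxnHandler as an assumption):
{code}
// Hedged sketch: opt out of DataNucleus connection pooling for the metastore.
HiveConf conf = new HiveConf();
conf.set("datanucleus.connectionPoolingType", "None"); // previously only bonecp or dbcp
{code}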



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14200) Tez: disable auto-reducer parallelism when reducer-count * min.partition.factor < 1.0

2016-07-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14200:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
 Release Note:  Tez: disable auto-reducer parallelism when reducer-count * 
min.partition.factor < 1.0 (Gopal V, reviewed by Gunther Hagleitner)
   Status: Resolved  (was: Patch Available)

> Tez: disable auto-reducer parallelism when reducer-count * 
> min.partition.factor < 1.0
> -
>
> Key: HIVE-14200
> URL: https://issues.apache.org/jira/browse/HIVE-14200
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.2.0
>
> Attachments: HIVE-14200.1.patch, HIVE-14200.2.patch, 
> HIVE-14200.3.patch
>
>
> The min/max factors offer no real improvement when the fractions are 
> meaningless, for example when 0.25 * 2  is applied as the min.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None

2016-07-11 Thread Shannon Ladymon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shannon Ladymon updated HIVE-13159:
---
Labels:   (was: TODOC2.2)

> TxnHandler should support datanucleus.connectionPoolingType = None
> --
>
> Key: HIVE-13159
> URL: https://issues.apache.org/jira/browse/HIVE-13159
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
> Fix For: 2.2.0
>
> Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch
>
>
> Right now, one has to choose bonecp or dbcp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13159) TxnHandler should support datanucleus.connectionPoolingType = None

2016-07-11 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371879#comment-15371879
 ] 

Shannon Ladymon commented on HIVE-13159:


Doc done.

> TxnHandler should support datanucleus.connectionPoolingType = None
> --
>
> Key: HIVE-13159
> URL: https://issues.apache.org/jira/browse/HIVE-13159
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Sergey Shelukhin
>Assignee: Alan Gates
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-13159.2.patch, HIVE-13159.3.patch, HIVE-13159.patch
>
>
> Right now, one has to choose bonecp or dbcp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371868#comment-15371868
 ] 

Ashutosh Chauhan commented on HIVE-13704:
-

+1

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs additional logic that drives the state of {{SimpleCopyListing}}, which 
> runs in the driver, and of {{CopyCommitter}}, which runs in the job runtime.
> When Hive ends up running DistCp for copy work (between non-matching filesystems or 
> between encrypted/non-encrypted zones, for sizes above a configured value), 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
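
A minimal sketch of the intended call pattern (argument handling here is illustrative, not the exact Hive shim code):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;

public class DistCpRunSketch {
  public static int copy(Configuration conf, String src, String dst) throws Exception {
    // Let the Tool entry point parse the args so setTargetPathExists() and the
    // rest of run()'s bookkeeping execute before the copy; calling execute()
    // directly skips that setup.
    DistCp distcp = new DistCp(conf, null);
    return distcp.run(new String[] { src, dst });
  }
}
{code}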



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-07-11 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371856#comment-15371856
 ] 

Shannon Ladymon commented on HIVE-14007:


[~owen.omalley], why are some ORC parameters 
(*hive.orc.splits.include.file.footer, hive.orc.cache.stripe.details.size, 
hive.orc.compute.splits.num.threads, hive.exec.orc.split.strategy, 
hive.merge.orcfile.stripe.level, hive.exec.orc.base.delta.ratio*) not being 
removed from HiveConf? Are these not duplicates?

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371794#comment-15371794
 ] 

Matt McCline commented on HIVE-13974:
-

*No it is not an excuse*.  I'll defer my cussing.

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14144) Permanent functions are showing up in show functions, but describe says it doesn't exist

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371753#comment-15371753
 ] 

Hive QA commented on HIVE-14144:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817085/HIVE-14144.01-branch-2.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 233 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestHs2HooksWithMiniKdc - did not produce a TEST-*.xml file
TestJdbcNonKrbSASLWithMiniKdc - did not produce a TEST-*.xml file
TestJdbcWithDBTokenStore - did not produce a TEST-*.xml file
TestJdbcWithMiniKdc - did not produce a TEST-*.xml file
TestJdbcWithMiniKdcCookie - did not produce a TEST-*.xml file
TestJdbcWithMiniKdcSQLAuthBinary - did not produce a TEST-*.xml file
TestJdbcWithMiniKdcSQLAuthHttp - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autoColumnStats_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binary_output_format
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnStatsUpdateForStatsOptimizer_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppr_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input42
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_orig_table_use_metadata
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_map_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_json_serde1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3

[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371746#comment-15371746
 ] 

Owen O'Malley edited comment on HIVE-13974 at 7/11/16 10:09 PM:


{quote}
No, the semantics of sameCategoryAndAttributes is different than equals.
{quote}

*Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as 
part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a 
negative patch of 2MB "going out"?). In any case, do not add the new method. 
ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would 
you like a back port of that patch?

{quote}
There are 3 kinds of schema not 2.
{quote}

Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' 
schema is the one that the user asked for. I don't think we need anything else.

{quote}
About ORC-54 -- it is not practical right now in terms of time.
{quote}

ORC-54 is closer to going in. It has unit tests and I believe handles this as a 
sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch.

{quote}
Also, there really needs to be a parallel HIVE JIRA for it and we must make 
sure name mapping is fully supported for
{quote}

Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to 
maintain two versions of ORC with a forked code base is a bad thing.

{quote}
Given how *difficult* Schema Evolution has been I simply don't believe it will 
*just work* with ORC only unit tests.
{quote}
That is not an excuse. Unit tests are MUCH more likely to be correct because 
the errors aren't hidden under layers of the execution engine. Being difficult 
to get right is why not having unit tests is unacceptable.


was (Author: owen.omalley):
{quote}
No, the semantics of sameCategoryAndAttributes is different than equals.
{quote}
*Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as 
part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a 
negative patch of 2MB "going out"?). In any case, do not add the new method. 
ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would 
you like a back port of that patch?

{quote}
There are 3 kinds of schema not 2.
{quote}

Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' 
schema is the one that the user asked for. I don't think we need anything else.

{quote}
About ORC-54 -- it is not practical right now in terms of time.
{quote}
ORC-54 is closer to going in. It has unit tests and I believe handles this as a 
sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch.

{quote}
Also, there really needs to be a parallel HIVE JIRA for it and we must make 
sure name mapping is fully supported for
Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to 
maintain two versions of ORC with a forked code base is a bad thing.

{quote}
Given how *difficult* Schema Evolution has been I simply don't believe it will 
*just work* with ORC only unit tests.
{quote}
That is not an excuse. Unit tests are MUCH more likely to be correct because 
the errors aren't hidden under layers of the execution engine.

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371746#comment-15371746
 ] 

Owen O'Malley commented on HIVE-13974:
--

{quote}
No, the semantics of sameCategoryAndAttributes is different than equals.
{quote}
*Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as 
part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a 
negative patch of 2MB "going out"?). In any case, do not add the new method. 
ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would 
you like a back port of that patch?

{quote}
There are 3 kinds of schema not 2.
{quote}

Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' 
schema is the one that the user asked for. I don't think we need anything else.

{quote}
About ORC-54 -- it is not practical right now in terms of time.
{quote}
ORC-54 is closer to going in. It has unit tests and I believe handles this as a 
sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch.

{quote}
Also, there really needs to be a parallel HIVE JIRA for it and we must make 
sure name mapping is fully supported for HIVE.
{quote}

Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to 
maintain two versions of ORC with a forked code base is a bad thing.

{quote}
Given how *difficult* Schema Evolution has been I simply don't believe it will 
*just work* with ORC only unit tests.
{quote}
That is not an excuse. Unit tests are MUCH more likely to be correct because 
the errors aren't hidden under layers of the execution engine.

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Owen O'Malley edited comment on HIVE-13974 at 7/11/16 10:07 PM:


[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".
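
A small illustration of the id shift (the schemas below are made up for this example):
{code}
import org.apache.orc.TypeDescription;

public class IdShiftExample {
  public static void main(String[] args) {
    // File schema: column b has id 3.  Reader schema: the interior struct
    // gained a field y, so b shifts to id 4 even though its category is the same.
    TypeDescription file = TypeDescription.fromString("struct<a:struct<x:int>,b:string>");
    TypeDescription reader = TypeDescription.fromString("struct<a:struct<x:int,y:int>,b:string>");
    System.out.println(file.getChildren().get(1).getId());    // 3
    System.out.println(reader.getChildren().get(1).getId());  // 4
  }
}
{code}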

There are 3 kinds of schema, not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
Is it the one being returned by the input file format, is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema?  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get Erie out the door.  We have so little runway left.  I've had 10+ JIRAs for 
weeks.  Whenever I knock some down more appear.  Also, there really needs to be 
a parallel HIVE JIRA for it and we must make sure name mapping is fully 
supported for HIVE.  Given how *difficult* Schema Evolution has been I simply 
don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]


was (Author: mmccline):
{quote}
No, the semantics of sameCategoryAndAttributes is different than equals.
{quote}
*Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as 
part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a 
negative patch of 2MB "going out"?). In any case, do not add the new method. 
ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would 
you like a back port of that patch?

{quote}
There are 3 kinds of schema not 2.
{quote}

Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' 
schema is the one that the user asked for. I don't think we need anything else.

{quote}
About ORC-54 -- it is not practical right now in terms of time. 
{quote}
ORC-54 is closer to going in. It has unit tests and I believe handles this as a 
sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch.

{quote}
Also, there really needs to be a parallel HIVE JIRA for it and we must make 
sure name mapping is fully supported for HIVE.
{quote}

Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to 
maintain two versions of ORC with a forked code base is a bad thing.

{code}
Given how *difficult* Schema Evolution has been I simply don't believe it will 
*just work* with ORC only unit tests.
{code}
That is not an excuse. Unit tests are MUCH more likely to be correct because 
the errors aren't hidden under layers of the execution engine.


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371732#comment-15371732
 ] 

Owen O'Malley commented on HIVE-13974:
--

Sorry, I seem to have edited your comment instead of leaving a new comment. 
Sorry!

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Owen O'Malley edited comment on HIVE-13974 at 7/11/16 10:02 PM:


{quote}
No, the semantics of sameCategoryAndAttributes is different than equals.
{quote}
*Sigh* Ok, I forgot that I had only fixed that on the ORC side of the world as 
part of ORC-53. Hive will get that as soon as HIVE-14007 goes in (or is a 
negative patch of 2MB "going out"?). In any case, do not add the new method. 
ORC-53's impact on orc-core is pretty small outside of TypeDescription. Would 
you like a back port of that patch?

{quote}
There are 3 kinds of schema not 2.
{quote}

Ugh. That seems unnecessary. The 'file' schema is pretty clear. The 'reader' 
schema is the one that the user asked for. I don't think we need anything else.

{quote}
About ORC-54 -- it is not practical right now in terms of time. 
{quote}
ORC-54 is closer to going in. It has unit tests and I believe handles this as a 
sub-case. I'm trying to figure out what we gain out of the HIVE-13974 patch.

{quote}
Also, there really needs to be a parallel HIVE JIRA for it and we must make 
sure name mapping is fully supported for HIVE.
{quote}

Uh no. The Hive ORC code is about to disappear with HIVE-14007. Continuing to 
maintain two versions of ORC with a forked code base is a bad thing.

{quote}
Given how *difficult* Schema Evolution has been I simply don't believe it will 
*just work* with ORC only unit tests.
{quote}
That is not an excuse. Unit tests are MUCH more likely to be correct because 
the errors aren't hidden under layers of the execution engine.



was (Author: mmccline):
[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
Is it the one being returned by the input file format (and the one that the 
needed column environment variable and PPD apply to), is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- There really needs to be a parallel HIVE JIRA for it and we 
must make sure name mapping is fully supported for HIVE.  Given how *difficult* 
Schema Evolution has been I simply don't believe it will *just work* with ORC 
only unit tests.

FYI [~hagleitn] [~ekoifman]


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Matt McCline edited comment on HIVE-13974 at 7/11/16 9:59 PM:
--

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
Is it the one being returned by the input file format (and the one that the 
needed column environment variable and PPD apply to), is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- There really needs to be a parallel HIVE JIRA for it and we 
must make sure name mapping is fully supported for HIVE.  Given how *difficult* 
Schema Evolution has been I simply don't believe it will *just work* with ORC 
only unit tests.

FYI [~hagleitn] [~ekoifman]



was (Author: mmccline):
[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
Is it the one being returned by the input file format (and the one that the 
needed column environment variable and PPD apply to), is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been I 
simply don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14209) Add some logging info for session and operation management

2016-07-11 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371708#comment-15371708
 ] 

Chaoyu Tang commented on HIVE-14209:


+1

> Add some logging info for session and operation management
> --
>
> Key: HIVE-14209
> URL: https://issues.apache.org/jira/browse/HIVE-14209
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-14209.1.patch
>
>
> It's hard to track session and operation opens and closes in a multi-user 
> env. Add some logging info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371704#comment-15371704
 ] 

Thejas M Nair commented on HIVE-14207:
--

+1 pending tests


> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.3.patch, HIVE-14207.patch
>
>
> HIVE-12338 introduced a new web ui, which has a page that displays the 
> current HiveConf being used by HS2. However, it does not strip entries that 
> are considered "hidden" conf parameters before displaying that config, thus 
> exposing those values through the HS2 web UI. We need to add 
> stripping to this.
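
As a rough sketch of the fix's shape (the helper below is hypothetical; the real patch presumably reuses HiveConf's hidden-variable list rather than this standalone method):
{code}
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: redact hidden entries before handing the config to the
// web UI page renderer.
public class ConfStripSketch {
  public static Map<String, String> stripHidden(Configuration conf, Set<String> hiddenNames) {
    Map<String, String> visible = new TreeMap<>();
    for (Map.Entry<String, String> entry : conf) {
      if (!hiddenNames.contains(entry.getKey())) {
        visible.put(entry.getKey(), entry.getValue());
      }
    }
    return visible;
  }
}
{code}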



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14209) Add some logging info for session and operation management

2016-07-11 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14209:

Status: Patch Available  (was: Open)

Patch-1: trivial change to add the op handler and session handler to the message.

> Add some logging info for session and operation management
> --
>
> Key: HIVE-14209
> URL: https://issues.apache.org/jira/browse/HIVE-14209
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-14209.1.patch
>
>
> It's hard to track session and operation opens and closes in a multi-user 
> env. Add some logging info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14209) Add some logging info for session and operation management

2016-07-11 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14209:

Attachment: HIVE-14209.1.patch

> Add some logging info for session and operation management
> --
>
> Key: HIVE-14209
> URL: https://issues.apache.org/jira/browse/HIVE-14209
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-14209.1.patch
>
>
> It's hard to track session and operation opens and closes in a multi-user 
> env. Add some logging info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14152) datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling downgrade

2016-07-11 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371695#comment-15371695
 ] 

Sushanth Sowmyan commented on HIVE-14152:
-

+1.

> datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling 
> downgrade 
> --
>
> Key: HIVE-14152
> URL: https://issues.apache.org/jira/browse/HIVE-14152
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Thejas M Nair
> Attachments: HIVE-14152.1.patch, HIVE-14152.2.patch, 
> HIVE-14152.3.patch
>
>
> We see the following issue when downgrading metastore:
> 1. Run some query using new tables
> 2. Downgrade metastore
> 3. Restarting the metastore will complain that the new table does not exist
> In particular, the constraint tables do not exist in branch-1. If we run Hive 2 
> and create a constraint, then downgrade the metastore to Hive 1, DataNucleus will 
> complain:
> {code}
> javax.jdo.JDOFatalUserException: Error starting up DataNucleus : a class 
> "org.apache.hadoop.hive.metastore.model.MConstraint" was listed as being 
> persisted previously in this datastore, yet the class wasnt found. Perhaps it 
> is used by a different DataNucleus-enabled application in this datastore, or 
> you have changed your class names.
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:528)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
>   at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:377)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:406)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:266)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:650)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:628)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:677)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:77)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:83)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5905)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5900)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6159)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6084)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at 

[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Matt McCline edited comment on HIVE-13974 at 7/11/16 9:38 PM:
--

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes are different from equals.  The 
TypeDescription.equals method compares the (type) id and maximumId, which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed, and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema, not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
Is it the one being returned by the input file format (and the one that the 
needed-column environment variable and PPD apply to), is it the schema being 
fed back to the ORC raw merger that includes ACID columns, or is it the 
unconverted file schema?  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down, more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it, and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been, I 
simply don't believe it will *just work* with ORC-only unit tests.

FYI [~hagleitn] [~ekoifman]



was (Author: mmccline):
[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
It is the one being returned by the input file format (and the one that the 
needed column environment variable and PPD apply to), is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been I 
simply don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Matt McCline edited comment on HIVE-13974 at 7/11/16 9:25 PM:
--

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
It is the one being returned by the input file format (and the one that the 
needed column environment variable and PPD apply to), is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been I 
simply don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]



was (Author: mmccline):
[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
It is the one being returned by the input file format, is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been I 
simply don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Matt McCline edited comment on HIVE-13974 at 7/11/16 9:24 PM:
--

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
It is the one being returned by the input file format, is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been I 
simply don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]



was (Author: mmccline):

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
It is the one being returned by the input file format, is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get Erie out the door.  We have so little runway left.  I've had 10+ JIRAs for 
weeks.  Whenever I knock some down more appear.  Also, there really needs to be 
a parallel HIVE JIRA for it and we must make sure name mapping is fully 
supported for HIVE.  Given how *difficult* Schema Evolution has been I simply 
don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371651#comment-15371651
 ] 

Matt McCline commented on HIVE-13974:
-


[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes is different than equals.  The 
TypeDescription.equals method compares (type) id and maximumId which does not 
work when there is an interior STRUCT column with a different number of 
columns.  It makes it seem like a type conversion is needed when one is not 
needed and other parts of the code throw exceptions complaining "no need to 
convert a STRING to a STRING".

There are 3 kinds of schema not 2.  Part of the problem I'm trying to solve is 
the ambiguity at different parts of the code as to which schema is being used.  
It is the one being returned by the input file format, is it the schema being 
fed back to the ORC raw merger that included ACID columns, or is it the 
unconverted file schema.  I don't care what the first 2 schemas are called as 
long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get Erie out the door.  We have so little runway left.  I've had 10+ JIRAs for 
weeks.  Whenever I knock some down more appear.  Also, there really needs to be 
a parallel HIVE JIRA for it and we must make sure name mapping is fully 
supported for HIVE.  Given how *difficult* Schema Evolution has been I simply 
don't believe it will *just work* with ORC only unit tests.

FYI [~hagleitn] [~ekoifman]
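
As an editorial illustration of the id-shift problem described in this comment (a hypothetical sketch using orc-core's TypeDescription builders, not part of any patch), adding a field to an interior STRUCT shifts the ids of every later column, so an id/maximumId-based equals() reports a mismatch even for an unchanged STRING column:

{code}
import org.apache.orc.TypeDescription;

// Sketch: the trailing STRING column is identical in both schemas, but its id differs
// once a column is added to the interior STRUCT, which is why an id-based comparison
// suggests a conversion that is not actually needed.
public class SchemaIdShiftSketch {
  public static void main(String[] args) {
    TypeDescription fileSchema = TypeDescription.createStruct()
        .addField("inner", TypeDescription.createStruct()
            .addField("x", TypeDescription.createInt()))
        .addField("s", TypeDescription.createString());

    TypeDescription readerSchema = TypeDescription.createStruct()
        .addField("inner", TypeDescription.createStruct()
            .addField("x", TypeDescription.createInt())
            .addField("y", TypeDescription.createInt()))   // column added to the interior STRUCT
        .addField("s", TypeDescription.createString());

    System.out.println("file 's' id   = " + fileSchema.getChildren().get(1).getId());
    System.out.println("reader 's' id = " + readerSchema.getChildren().get(1).getId());
  }
}
{code}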


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14074) RELOAD FUNCTION should update dropped functions

2016-07-11 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14074:

Attachment: HIVE-14074.03.patch

> RELOAD FUNCTION should update dropped functions
> ---
>
> Key: HIVE-14074
> URL: https://issues.apache.org/jira/browse/HIVE-14074
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Fix For: 2.2.0
>
> Attachments: HIVE-14074.01.patch, HIVE-14074.02.patch, 
> HIVE-14074.03.patch
>
>
> Due to HIVE-2573, functions are stored in a per-session registry and only 
> loaded in from the metastore when hs2 or hive cli is started. Running RELOAD 
> FUNCTION in the current session is a way to force a reload of the functions, 
> so that changes that occurred in other running sessions will be reflected in 
> the current session, without having to restart the current session. However, 
> while functions that are created in other sessions will now appear in the 
> current session, functions that have been dropped are not removed from the 
> current session's registry. It seems inconsistent that created functions are 
> updated while dropped functions are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-14207:

Attachment: HIVE-14207.3.patch

Updated patch. .3.patch now introduces a new method to find a free port that is 
guaranteed not to be a port number that we specify.
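
For illustration, a hypothetical sketch of such a method (not the actual patch): bind to port 0 so the OS picks an ephemeral port, and retry until the result is not one of the ports we have already chosen.

{code}
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: ask the OS for an ephemeral port, skipping ports we already use.
public class FreePortSketch {
  static int findFreePortExcluding(Set<Integer> exclude) throws IOException {
    while (true) {
      try (ServerSocket socket = new ServerSocket(0)) {   // port 0 = let the OS pick
        int port = socket.getLocalPort();
        if (!exclude.contains(port)) {
          return port;
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Set<Integer> taken = new HashSet<>(Arrays.asList(10000, 10002));  // placeholder ports
    System.out.println(findFreePortExcluding(taken));
  }
}
{code}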

> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.3.patch, HIVE-14207.patch
>
>
> HIVE-12338 introduced a new web ui, which has a page that displays the 
> current HiveConf being used by HS2. However, before it displays that config, 
> it does not strip entries from it which are considered "hidden" conf 
> parameters, thus exposing those values from a web-ui for HS2. We need to add 
> stripping to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14158) deal with derived column names

2016-07-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371627#comment-15371627
 ] 

Ashutosh Chauhan commented on HIVE-14158:
-

We should avoid calling genOPTree() when there is no masking/row filtering. Looks 
good other than that, although the authorization_create_temp_table failure looks 
relevant.

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch, HIVE-14158.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-13704:
---
Attachment: HIVE-13704.1.patch

[~ashutoshc] Could you review this small patch? I just switched back to using run(). 

I ran a test with the old code and the issue was happening as described in this 
jira. When I changed to run(), the problem went away.

Btw, I reproduced the issue using:
{{LOAD DATA INPATH '/tmp/dummytext.txt' OVERWRITE INTO TABLE dummytext;}}

dummytext was in an encryption zone, and when I ran it with the execute() 
method, the final destination for the file was 
{{/user/hive/warehouse/dummytext/dummytext.txt/dummytext.txt}}. It was creating 
a new subdirectory inside the table location.
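
For context, an editorial sketch (not the Hive shim code) of the difference being discussed: going through DistCp's Tool-style run() performs the option preprocessing, including setTargetPathExists(), that a direct execute() call skips. The paths below are placeholders.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.tools.DistCp;

// Hypothetical sketch: invoke DistCp through its Tool run() method so the option
// preprocessing (e.g. setTargetPathExists()) happens before the copy is executed.
public class DistCpRunSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistCp distCp = new DistCp(conf, null);       // options are parsed from the args inside run()
    String[] distCpArgs = {"/tmp/src/dummytext.txt", "/tmp/dst"};   // placeholder paths
    int rc = distCp.run(distCpArgs);              // run() drives the state that execute() alone skips
    System.exit(rc);
  }
}
{code}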

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-13704:
---
Status: Patch Available  (was: Open)

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reassigned HIVE-13704:
--

Assignee: Sergio Peña

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14191) bump a new api version for ThriftJDBCBinarySerde changes

2016-07-11 Thread Ziyang Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziyang Zhao updated HIVE-14191:
---
Attachment: HIVE-14191.2.patch

Created a new api version and generated thrift files

> bump a new api version for ThriftJDBCBinarySerde changes
> 
>
> Key: HIVE-14191
> URL: https://issues.apache.org/jira/browse/HIVE-14191
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC
>Affects Versions: 2.1.0
>Reporter: Ziyang Zhao
>Assignee: Ziyang Zhao
> Attachments: HIVE-14191.1.patch, HIVE-14191.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371565#comment-15371565
 ] 

Prasanth Jayachandran commented on HIVE-14196:
--

Also, when this patch was initially committed (before HIVE-13617), the compilation 
stage would say no inputs are supported. Now that we have a non-vector reader for llap, 
compilation says all inputs are supported, but it may fail at runtime if it finds any 
complex types (this limitation will soon be removed with the proper fix). 
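
To illustrate the runtime check being described, here is a hypothetical sketch (not the committed patch) using orc-core's TypeDescription; the class and method names are assumptions:

{code}
import org.apache.orc.TypeDescription;

// Hypothetical sketch: decide at record-reader creation time whether the row schema
// contains any complex-typed column, and fall back from LLAP IO if it does.
public class ComplexTypeCheckSketch {
  static boolean hasComplexColumns(TypeDescription rowSchema) {
    for (TypeDescription child : rowSchema.getChildren()) {
      switch (child.getCategory()) {
        case STRUCT:
        case LIST:
        case MAP:
        case UNION:
          return true;
        default:
          break;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    TypeDescription schema = TypeDescription.createStruct()
        .addField("id", TypeDescription.createInt())
        .addField("tags", TypeDescription.createList(TypeDescription.createString()));
    System.out.println("fall back from LLAP IO: " + hasComplexColumns(schema));
  }
}
{code}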

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude vector_complex_* tests added for llap which is currently broken 
> and fails in all test runs. We can re-enable it with HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371561#comment-15371561
 ] 

Sergey Shelukhin commented on HIVE-14196:
-

Well that causes incorrect explain.

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude vector_complex_* tests added for llap which is currently broken 
> and fails in all test runs. We can re-enable it with HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371555#comment-15371555
 ] 

Sushanth Sowmyan commented on HIVE-14207:
-

Sounds good.

> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.patch
>
>
> HIVE-12338 introduced a new web ui, which has a page that displays the 
> current HiveConf being used by HS2. However, before it displays that config, 
> it does not strip entries from it which are considered "hidden" conf 
> parameters, thus exposing those values from a web-ui for HS2. We need to add 
> stripping to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371556#comment-15371556
 ] 

Prasanth Jayachandran commented on HIVE-14196:
--

Exactly. It's handled at runtime, when the record reader is created, instead of at 
the compilation stage.

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude vector_complex_* tests added for llap which is currently broken 
> and fails in all test runs. We can re-enable it with HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-13191) DummyTable map joins mix up columns between tables

2016-07-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reopened HIVE-13191:
-

[~jcamachorodriguez] I can still repro this on master, though only with 
MiniTezCliDriver. It seems to work fine for CliDriver.

> DummyTable map joins mix up columns between tables
> --
>
> Key: HIVE-13191
> URL: https://issues.apache.org/jira/browse/HIVE-13191
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Pengcheng Xiong
> Attachments: tez.q
>
>
> {code}
> SELECT
>   a.key,
>   a.a_one,
>   b.b_one,
>   a.a_zero,
>   b.b_zero
> FROM
> (
> SELECT
>   11 key,
>   0 confuse_you,
>   1 a_one,
>   0 a_zero
> ) a
> LEFT JOIN
> (
> SELECT
>   11 key,
>   0 confuse_you,
>   1 b_one,
>   0 b_zero
> ) b
> ON a.key = b.key
> ;
> 11  1   0   0   1
> {code}
> This should be 11, 1, 1, 0, 0 instead. 
> Disabling map-joins & using shuffle-joins returns the right result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14135) beeline output not formatted correctly for large column widths

2016-07-11 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371544#comment-15371544
 ] 

Vihang Karajgaonkar commented on HIVE-14135:


Updated the patch with a test case to handle columns with large widths. It also changes 
the default column width from 15 to 50 characters.
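
As a rough, hypothetical sketch of the clamping idea (not BeeLine's actual code), the display width of each column can be capped at a configurable maximum so a single huge value, such as the classpath, does not inflate every column:

{code}
// Hypothetical sketch: compute a display width per column, capped at a maximum.
public class ColumnWidthSketch {
  static int displayWidth(String header, String longestValue, int maxColWidth) {
    int natural = Math.max(header.length(), longestValue.length());
    return Math.min(natural, maxColWidth);   // cap very wide columns (e.g. a 41k-char classpath)
  }

  public static void main(String[] args) {
    int maxColWidth = 50;   // the patch description mentions raising the default from 15 to 50
    System.out.println(displayWidth("key", "short value", maxColWidth));
    String hugeValue = new String(new char[41000]).replace('\0', 'x');
    System.out.println(displayWidth("value", hugeValue, maxColWidth));
  }
}
{code}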

> beeline output not formatted correctly for large column widths
> --
>
> Key: HIVE-14135
> URL: https://issues.apache.org/jira/browse/HIVE-14135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, 
> longKeyValues.txt, output_after.txt, output_before.txt
>
>
> If a column width is too large, beeline uses that maximum column width 
> when normalizing all the column widths. To reproduce the issue, run 
> set -v; 
> one of the configuration variables is the classpath, which can be extremely 
> wide (41k characters in my environment).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths

2016-07-11 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14135:
---
Attachment: HIVE-14135.2.patch

> beeline output not formatted correctly for large column widths
> --
>
> Key: HIVE-14135
> URL: https://issues.apache.org/jira/browse/HIVE-14135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, 
> longKeyValues.txt, output_after.txt, output_before.txt
>
>
> If a column width is too large, beeline uses that maximum column width 
> when normalizing all the column widths. To reproduce the issue, run 
> set -v; 
> one of the configuration variables is the classpath, which can be extremely 
> wide (41k characters in my environment).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths

2016-07-11 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14135:
---
Status: Patch Available  (was: Open)

> beeline output not formatted correctly for large column widths
> --
>
> Key: HIVE-14135
> URL: https://issues.apache.org/jira/browse/HIVE-14135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, 
> longKeyValues.txt, output_after.txt, output_before.txt
>
>
> If a column width is too large, beeline uses that maximum column width 
> when normalizing all the column widths. To reproduce the issue, run 
> set -v; 
> one of the configuration variables is the classpath, which can be extremely 
> wide (41k characters in my environment).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14135) beeline output not formatted correctly for large column widths

2016-07-11 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14135:
---
Status: Open  (was: Patch Available)

> beeline output not formatted correctly for large column widths
> --
>
> Key: HIVE-14135
> URL: https://issues.apache.org/jira/browse/HIVE-14135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, 
> longKeyValues.txt, output_after.txt, output_before.txt
>
>
> If a column width is too large, beeline uses that maximum column width 
> when normalizing all the column widths. To reproduce the issue, run 
> set -v; 
> one of the configuration variables is the classpath, which can be extremely 
> wide (41k characters in my environment).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14159) sorting of tuple array using multiple field[s]

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371536#comment-15371536
 ] 

Hive QA commented on HIVE-14159:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817103/HIVE-14159.4.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10318 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/469/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/469/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-469/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817103 - PreCommit-HIVE-MASTER-Build

> sorting of tuple array using multiple field[s]
> --
>
> Key: HIVE-14159
> URL: https://issues.apache.org/jira/browse/HIVE-14159
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>  Labels: patch
> Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch, 
> HIVE-14159.3.patch, HIVE-14159.4.patch
>
>
> Problem Statement:
> When we are working with complex data structures like Avro, most of the time 
> we encounter an array that contains multiple tuples, and each 
> tuple has a struct schema.
> Suppose here struct schema is like below:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>   "type": "record",
>   "name": "Employee",
>   "namespace": "com.company.Employee",
>   "fields": [{
>   "name": "empId",
>   "type": "int"
>   }, {
>   "name": "empName",
>   "type": "string"
>   }, {
>   "name": "age",
>   "type": "int"
>   }, {
>   "name": "salary",
>   "type": "double"
>   }]
>   }]
> }
> {noformat}
> Then while running our hive query complex array looks like array of employee 
> objects.
> {noformat}
> Example: 
>   //(array<Employee>)
>   
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we encounter problems like 
> sorting a tuple array by specific field[s] such as empId, name, salary, etc. 
> in ASC or DESC order.
> Proposal:
> I have developed a UDF 'sort_array_by' which will sort a tuple array by one 
> or more fields, in ASC or DESC order as provided by the user; the default is 
> ascending order.
> {noformat}
> Example:
>   1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>   output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>   
>   2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>   3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
>   output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9928) Empty buckets are not created on non-HDFS file system

2016-07-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9928:
---
   Resolution: Duplicate
 Assignee: Ankit Kamboj
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master & branch-2.1

> Empty buckets are not created on non-HDFS file system
> -
>
> Key: HIVE-9928
> URL: https://issues.apache.org/jira/browse/HIVE-9928
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Ankit Kamboj
>Assignee: Ankit Kamboj
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-9928.1.patch
>
>
> Bucketing should create empty buckets on the destination file system. There 
> is a problem in that logic: it uses path.toUri().getPath().toString() to 
> find the relevant path, but this chain of methods always resolves to a relative 
> path, which ends up creating the empty buckets in HDFS rather than on the actual 
> destination fs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14175) Fix creating buckets without scheme information

2016-07-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14175:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master & branch-2.1

> Fix creating buckets without scheme information
> ---
>
> Key: HIVE-14175
> URL: https://issues.apache.org/jira/browse/HIVE-14175
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>  Labels: patch
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14175.2.patch, HIVE-14175.patch, HIVE-14175.patch
>
>
> If a table is created on a non-default filesystem (i.e. non-hdfs), the empty 
> files will be created with incorrect scheme information. This patch extracts 
> the scheme and authority information for the new paths.
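
To illustrate the scheme/authority point, an editorial sketch (not the actual patch; the s3a location is a placeholder): Path.toUri().getPath() drops the scheme and authority, so they have to be carried over explicitly when building the paths for the empty bucket files.

{code}
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: toUri().getPath() loses the scheme/authority, so re-apply them
// from the table's destination path when constructing new (empty bucket) file paths.
public class BucketPathSketch {
  public static void main(String[] args) {
    Path tableDir = new Path("s3a://my-bucket/warehouse/t1");   // placeholder non-HDFS location

    String schemelessChild = tableDir.toUri().getPath() + "/000000_0";  // "/warehouse/t1/000000_0"
    Path wrong = new Path(schemelessChild);   // no scheme, so it lands on the default FS (HDFS)

    Path right = new Path(tableDir.toUri().getScheme(),
                          tableDir.toUri().getAuthority(),
                          schemelessChild);   // keeps s3a://my-bucket

    System.out.println("without scheme: " + wrong);
    System.out.println("with scheme:    " + right);
  }
}
{code}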



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14172) LLAP: force evict blocks by size to handle memory fragmentation

2016-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371527#comment-15371527
 ] 

Sergey Shelukhin commented on HIVE-14172:
-

Hmm, I thought I commented here. Probably commented on some wrong JIRA 0_o 
[~gopalv] ping? Failures are either known or caused by NN being in safe mode.

> LLAP: force evict blocks by size to handle memory fragmentation
> ---
>
> Key: HIVE-14172
> URL: https://issues.apache.org/jira/browse/HIVE-14172
> Project: Hive
>  Issue Type: Bug
>Reporter: Nita Dembla
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14172.01.patch, HIVE-14172.patch
>
>
> In the long run, we should replace the buddy allocator with a better scheme. For 
> now, do a workaround for fragmentation that cannot be easily resolved. It's 
> still not perfect, but it works for practical ORC cases, where we have the 
> default size and smaller blocks, rather than large allocations having trouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14152) datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling downgrade

2016-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371514#comment-15371514
 ] 

Thejas M Nair commented on HIVE-14152:
--

The test failures are unrelated; they happen in runs with other jiras as well.



> datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling 
> downgrade 
> --
>
> Key: HIVE-14152
> URL: https://issues.apache.org/jira/browse/HIVE-14152
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Thejas M Nair
> Attachments: HIVE-14152.1.patch, HIVE-14152.2.patch, 
> HIVE-14152.3.patch
>
>
> We see the following issue when downgrading the metastore:
> 1. Run some query using the new tables
> 2. Downgrade the metastore
> 3. Restarting the metastore will complain that the new tables do not exist
> In particular, the constraints tables do not exist in branch-1. If we run Hive 2 
> and create a constraint, then downgrade the metastore to Hive 1, datanucleus will 
> complain:
> {code}
> javax.jdo.JDOFatalUserException: Error starting up DataNucleus : a class 
> "org.apache.hadoop.hive.metastore.model.MConstraint" was listed as being 
> persisted previously in this datastore, yet the class wasnt found. Perhaps it 
> is used by a different DataNucleus-enabled application in this datastore, or 
> you have changed your class names.
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:528)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
>   at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:377)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:406)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:266)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:650)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:628)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:677)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:77)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:83)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5905)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5900)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6159)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6084)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 

[jira] [Updated] (HIVE-11402) HS2 - add an option to disallow parallel query execution within a single Session

2016-07-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11402:

Attachment: HIVE-11402.03.patch

Added handling for async ops, as well as some refactoring. Thanks for the pointer!

> HS2 - add an option to disallow parallel query execution within a single 
> Session
> 
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11402.01.patch, HIVE-11402.02.patch, 
> HIVE-11402.03.patch, HIVE-11402.patch
>
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e. it is not thread safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel in a single session is not 
> straightforward with jdbc; you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well-known application that shares sessions 
> for all queries that a user runs.
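
As an editorial sketch of the serialization idea (hypothetical, not the attached patch), one single-threaded executor per session makes its operations run one at a time no matter how many client threads submit them:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one single-threaded executor per session serializes its queries.
public class PerSessionSerializationSketch {
  private final ExecutorService sessionExecutor = Executors.newSingleThreadExecutor();

  public Future<?> submitStatement(final String statement) {
    return sessionExecutor.submit(new Runnable() {
      @Override
      public void run() {
        // compile and run the statement here; only one runs at a time for this session
        System.out.println("executing: " + statement);
      }
    });
  }

  public static void main(String[] args) throws Exception {
    PerSessionSerializationSketch session = new PerSessionSerializationSketch();
    Future<?> f1 = session.submitStatement("select 1");
    Future<?> f2 = session.submitStatement("select 2");   // queued until the first finishes
    f1.get();
    f2.get();
    session.sessionExecutor.shutdown();
  }
}
{code}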



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14152) datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling downgrade

2016-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371509#comment-15371509
 ] 

Thejas M Nair commented on HIVE-14152:
--

[~sushanth] Can you please review this change ?


> datanucleus.autoStartMechanismMode should set to 'Ignored' to allow rolling 
> downgrade 
> --
>
> Key: HIVE-14152
> URL: https://issues.apache.org/jira/browse/HIVE-14152
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Thejas M Nair
> Attachments: HIVE-14152.1.patch, HIVE-14152.2.patch, 
> HIVE-14152.3.patch
>
>
> We see the following issue when downgrading the metastore:
> 1. Run some query using the new tables
> 2. Downgrade the metastore
> 3. Restarting the metastore will complain that the new tables do not exist
> In particular, the constraints tables do not exist in branch-1. If we run Hive 2 
> and create a constraint, then downgrade the metastore to Hive 1, datanucleus will 
> complain:
> {code}
> javax.jdo.JDOFatalUserException: Error starting up DataNucleus : a class 
> "org.apache.hadoop.hive.metastore.model.MConstraint" was listed as being 
> persisted previously in this datastore, yet the class wasnt found. Perhaps it 
> is used by a different DataNucleus-enabled application in this datastore, or 
> you have changed your class names.
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:528)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
>   at 
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>   at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:377)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:406)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:266)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:650)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:628)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:677)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:484)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:77)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:83)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5905)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5900)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:6159)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:6084)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at 

[jira] [Updated] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications

2016-07-11 Thread Rahul Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Sharma updated HIVE-13966:

Attachment: HIVE-13966.1.patch

Attaching the initial patch.

> DbNotificationListener: can loose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Rahul Sharma
>Priority: Critical
> Attachments: HIVE-13966.1.patch
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation --
> 3. commit() or rollback() based on the result of the operation.
> 4. add an entry to the notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> That case is still acceptable, since it only produces a false positive.
> If the operation succeeds but adding to the notification log fails, the user 
> will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we do not have false 
> negatives.
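
A rough illustration of the pattern in the quoted steps (a sketch only, with hypothetical names; this is not the actual HiveMetaStore.java code). The notification write sits outside the commit/rollback decision, which is what produces the false positives and the unrecoverable failure after a successful commit:

{code}
// Illustrative sketch of the pattern described in the issue; all names are hypothetical.
public class NotificationLogSketch {

  interface TxnStore {
    void openTransaction();
    void commit();
    void rollback();
    void addNotificationLogEntry(String event);   // step 4 in the description
  }

  // Current (problematic) shape: step 4 runs unconditionally.
  static void createTableCurrent(TxnStore store, boolean operationSucceeded) {
    store.openTransaction();
    if (operationSucceeded) {
      store.commit();
    } else {
      store.rollback();
    }
    // Written even when the operation was rolled back -> false positive.
    // If this throws after a successful commit, the commit cannot be undone -> false negative.
    store.addNotificationLogEntry("CREATE_TABLE");
  }

  // One possible shape of a fix: make the notification entry part of the same transaction,
  // so it commits or rolls back together with the operation.
  static void createTableFixed(TxnStore store, boolean operationSucceeded) {
    store.openTransaction();
    if (operationSucceeded) {
      store.addNotificationLogEntry("CREATE_TABLE");  // a failure here rolls back with the operation
      store.commit();
    } else {
      store.rollback();
    }
  }
}
{code}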



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13966) DbNotificationListener: can lose DDL operation notifications

2016-07-11 Thread Rahul Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Sharma updated HIVE-13966:

Status: Patch Available  (was: In Progress)

> DbNotificationListener: can lose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Rahul Sharma
>Priority: Critical
> Attachments: HIVE-13966.1.patch
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation --
> 3. commit() or rollback() based on the result of the operation.
> 4. add an entry to the notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> That case is still acceptable, since it only produces a false positive.
> If the operation succeeds but adding to the notification log fails, the user 
> will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we do not have false 
> negatives.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Description: 
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct, but there is nothing in the code that actually enforces this.

If we detect a situation where txn X must be excluded but there are deltas that 
contain X, we'll have to abort the txn.  This can't (reasonably) happen in 
auto-commit mode, but with multi-statement txns it's possible.
Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the files for table A are as follows, 
created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>.
Assume that all txns <= 20 commit.

The reader can't use base_17 because it contains the result of txn 16.  So it 
should choose base_5 as "TxnBase bestBase" in _getChildState()_.
Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
delta_17_17 in the _Directory_ object.  This would represent an acceptable 
snapshot for such a reader.

The issue is that the Cleaner process may be running at the same time.  It will see 
everything with txnid < 17 as obsolete.  Then it will check lock manager state and 
decide to delete (as there may not be any locks in the LM for table A).  The order 
in which the files are deleted is undefined right now.  It may delete 
delta_16_16 and delta_17_17 first, and right at this moment the read request 
with ValidTxnList(20:16) arrives (such a snapshot may have been locked in by some 
multi-stmt txn that started some time ago; it acquires locks after the Cleaner 
checks the LM state) and calls getAcidState().  This request will choose base_5 but 
it won't see delta_16_16 and delta_17_17, and thus return the snapshot w/o the 
modifications made by those txns.
[This is not possible currently since we only support autoCommit=true.  The 
reason is that a query (1) acquires locks, (2) locks in the snapshot.  The 
Cleaner won't delete anything for a given compaction (partition) if there are 
locks on it.  Thus, for the duration of the transaction nothing will be deleted, 
so it's safe to use base_5.]


This is a subtle race condition, but it is possible.

1. The safest thing to do to ensure correctness is to use the latest base_x as 
the "best", check against the exceptions in ValidTxnList, and throw an 
exception if there is an exception <= x.

2. A better option is to keep 2 exception lists, aborted and open, and only 
throw if there is an open txn <= x.  Compaction throws away data from aborted 
txns, so there is no harm in using a base with aborted txns in its range.

3. You could make each txn record the lowest open txn id at its start and 
prevent the Cleaner from cleaning any delta with an id range that includes 
this open txn id for any txn that is still running.  This has the drawback of 
potentially delaying GC of old files for arbitrarily long periods, so this 
should be a user config choice.  The implementation is not trivial.

I would go with 1 now and do 2/3 together with the multi-statement txn work.



Side note: if 2 deltas have overlapping id ranges, then one must be a subset of 
the other.
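
A small sketch of option 1 under the assumptions of the example above (Snapshot here is a hypothetical stand-in for ValidTxnList, and this is not the actual AcidUtils code): always take the latest base_x and fail if the exception list contains a txn <= x.

{code}
// Illustrative sketch of option 1; Snapshot is a hypothetical stand-in for ValidTxnList.
import java.util.Collections;
import java.util.List;

public class BestBaseSketch {

  static final class Snapshot {
    final long highWatermark;
    final List<Long> exceptions;   // excluded txns below the HWM, e.g. <16> for "20:16"
    Snapshot(long highWatermark, List<Long> exceptions) {
      this.highWatermark = highWatermark;
      this.exceptions = exceptions;
    }
  }

  /** Returns the txn id x of the chosen base_x, given the txn ids of all base dirs. */
  static long chooseBestBase(List<Long> baseTxnIds, Snapshot snapshot) {
    long best = Collections.max(baseTxnIds);   // option 1: always take the latest base_x ...
    for (long excluded : snapshot.exceptions) {
      if (excluded <= best) {
        // ... and throw rather than silently serve a snapshot that includes an excluded txn.
        throw new IllegalStateException("base_" + best + " may contain excluded txn " + excluded);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // The example from the description: base_5 and base_17 exist, ValidTxnList(20:16).
    Snapshot snapshot = new Snapshot(20, List.of(16L));
    // base_17 includes txn 16, so option 1 throws instead of falling back to base_5.
    System.out.println(chooseBestBase(List.of(5L, 17L), snapshot));
  }
}
{code}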

  was:
The JavaDoc on getAcidState() reads, in part:

"Note that because major compactions don't
   preserve the history, we can't use a base directory that includes a
   transaction id that we must exclude."

which is correct but there is nothing in the code that does this.

And if we detect a situation where txn X must be excluded but and there are 
deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
happen with auto commit mode, but with multi statement txns it's possible.
Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An hour 
later it decides to access some partition for which all txns < 20 (for example) 
have already been compacted (i.e. GC'd).  

==
Here is a more concrete example.  Let's say the file for table A are as follows 
and created in the order listed.
delta_4_4
delta_5_5
delta_4_5
base_5
delta_16_16
delta_17_17
base_17  (for example user ran major compaction)

let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
and ExceptionList=<16>
Assume that all txns <= 20 commit.

Reader can't use base_17 because it has result of txn16.  So it should chose 
base_5 "TxnBase bestBase" in _getChildState()_.
Then 

[jira] [Updated] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work

2016-07-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14151:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> Use of USE_DEPRECATED_CLI environment variable does not work
> 
>
> Key: HIVE-14151
> URL: https://issues.apache.org/jira/browse/HIVE-14151
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-14151.1.patch
>
>
> According to 
> https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline
>  if we set USE_DEPRECATED_CLI=false it should use Beeline for the Hive CLI, but it 
> doesn't seem to work.
> In order to reproduce this issue:
> {noformat}
> $ echo $USE_DEPRECATED_CLI
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> $
> $ export USE_DEPRECATED_CLI=false
> $ echo $USE_DEPRECATED_CLI
> false
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work

2016-07-11 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371451#comment-15371451
 ] 

Vihang Karajgaonkar commented on HIVE-14151:


Hi [~spena], I have tested the change manually and it should work fine. There are 
no tests anyway that run cli.sh, so I guess you can go ahead and commit it if 
it looks good to you. Thanks!

> Use of USE_DEPRECATED_CLI environment variable does not work
> 
>
> Key: HIVE-14151
> URL: https://issues.apache.org/jira/browse/HIVE-14151
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14151.1.patch
>
>
> According to 
> https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline
>  if we set USE_DEPRECATED_CLI=false it should use Beeline for the Hive CLI, but it 
> doesn't seem to work.
> In order to reproduce this issue:
> {noformat}
> $ echo $USE_DEPRECATED_CLI
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> $
> $ export USE_DEPRECATED_CLI=false
> $ echo $USE_DEPRECATED_CLI
> false
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371447#comment-15371447
 ] 

Sergey Shelukhin edited comment on HIVE-14196 at 7/11/16 7:19 PM:
--

Hmm... seems like LLAP IO changes in out files are incorrect? I wonder if it's 
because it's handled at split generation stage, not compilation stage. 


was (Author: sershe):
Hmm... seems like LLAP IO changes in out files are incorrect?

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently broken 
> and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371447#comment-15371447
 ] 

Sergey Shelukhin commented on HIVE-14196:
-

Hmm... seems like LLAP IO changes in out files are incorrect?

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently broken 
> and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-07-11 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371431#comment-15371431
 ] 

Sahil Takiar commented on HIVE-13930:
-

Hey Everyone,

Sergio and I uploaded a new Spark tar-ball that is built against Hadoop 2.6.0 
(the previous version was built against Hadoop 2.4.0). This new version should 
work, although there is a chance there may be some problems since it was built 
against 2.6.0 and not 2.7.2. Can someone re-trigger the Hive QA test to see if 
the {{TestSparkCliDriver}} tests are now passing?

We couldn't compile against Hadoop 2.7.2 because Spark 1.6.0 doesn't provide an 
option of compiling against Hadoop 2.7+ (we are working on fixing this).

In the future, we want to remove the dependency on the Spark installation 
tar-ball; we are currently thinking about the best way to do so.

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work

2016-07-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371417#comment-15371417
 ] 

Sergio Peña commented on HIVE-14151:


I don't think there is a test case that executes cli.sh, and tests were not 
executed either way. 
[~vihangk1] Should I commit this patch now?

> Use of USE_DEPRECATED_CLI environment variable does not work
> 
>
> Key: HIVE-14151
> URL: https://issues.apache.org/jira/browse/HIVE-14151
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14151.1.patch
>
>
> According to 
> https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline
>  if we set USE_DEPRECATED_CLI=false it should use Beeline for the Hive CLI, but it 
> doesn't seem to work.
> In order to reproduce this issue:
> {noformat}
> $ echo $USE_DEPRECATED_CLI
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> $
> $ export USE_DEPRECATED_CLI=false
> $ echo $USE_DEPRECATED_CLI
> false
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14208) Outer MapJoin uses key of outer input and Converter

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14208:
---
Summary: Outer MapJoin uses key of outer input and Converter  (was: Outer 
MapJoin uses key data of outer input and Converter)

> Outer MapJoin uses key of outer input and Converter
> ---
>
> Key: HIVE-14208
> URL: https://issues.apache.org/jira/browse/HIVE-14208
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Priority: Minor
>
> Consider a left outer MapJoin operator. The OIs for the output are created from 
> the inputs of the outer and inner sides. However, when there is a match in the 
> join, the data for the key is always taken from the outer side (as is done 
> currently). Thus, we need to apply the Converter logic on the data to get the 
> correct type.
> This issue is to explore whether a better solution would be to use the key 
> from the correct input of the join to eliminate the need for Converters.
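
A rough illustration of the Converter step described above (a simplified sketch, not the actual MapJoin operator code): the key value produced under the outer side's ObjectInspector is converted to the type expected by the output OI.

{code}
// Simplified illustration of converting key data between ObjectInspectors;
// not the actual MapJoinOperator code.
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class KeyConverterSketch {
  public static void main(String[] args) {
    // Suppose the outer side produced the key as a string but the output OI expects an int.
    ObjectInspector outerKeyOI  = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    ObjectInspector outputKeyOI = PrimitiveObjectInspectorFactory.javaIntObjectInspector;

    // The Converter bridges the mismatch; taking the key from the input whose OI already
    // matches the output (as the issue suggests) would make this step unnecessary.
    Converter converter = ObjectInspectorConverters.getConverter(outerKeyOI, outputKeyOI);
    Object converted = converter.convert("11");
    System.out.println(converted);   // 11 as an Integer
  }
}
{code}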



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14151) Use of USE_DEPRECATED_CLI environment variable does not work

2016-07-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371412#comment-15371412
 ] 

Sergio Peña commented on HIVE-14151:


Looks good.
+1

> Use of USE_DEPRECATED_CLI environment variable does not work
> 
>
> Key: HIVE-14151
> URL: https://issues.apache.org/jira/browse/HIVE-14151
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14151.1.patch
>
>
> According to 
> https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline
>  if we set USE_DEPRECATED_CLI=false it should use Beeline for the Hive CLI, but it 
> doesn't seem to work.
> In order to reproduce this issue:
> {noformat}
> $ echo $USE_DEPRECATED_CLI
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> $
> $ export USE_DEPRECATED_CLI=false
> $ echo $USE_DEPRECATED_CLI
> false
> $ ./hive
> Hive-on-MR is deprecated in Hive 2 and may not be available in the future 
> versions. Consider using a different execution engine (i.e. tez, spark) or 
> using Hive 1.X releases.
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371407#comment-15371407
 ] 

Prasanth Jayachandran commented on HIVE-14196:
--

The test failures seem unrelated, btw.

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently broken 
> and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14141) Fix for HIVE-14062 breaks indirect urls in beeline

2016-07-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14141:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> Fix for HIVE-14062 breaks indirect urls in beeline
> --
>
> Key: HIVE-14141
> URL: https://issues.apache.org/jira/browse/HIVE-14141
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14141.1.patch
>
>
> Looks like the patch for HIVE-14062 breaks indirect URLs, which use 
> environment variables to get the URL in Beeline.
> In order to reproduce this issue:
> {noformat}
> $ export BEELINE_URL_DEFAULT="jdbc:hive2://localhost:1"
> $ beeline -u default
> {noformat}
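
For context, a minimal sketch of how the indirect URL in the repro is expected to resolve (behavior inferred from the repro above, not Beeline's actual implementation): a non-JDBC value passed to -u should be looked up as the BEELINE_URL_<NAME> environment variable, e.g. BEELINE_URL_DEFAULT for "-u default".

{code}
// Illustrative sketch of the indirect-URL lookup described in the repro;
// not Beeline's actual implementation.
public class BeelineUrlSketch {

  /** If the -u argument is not a JDBC URL, treat it as a name and read BEELINE_URL_<NAME>. */
  static String resolveUrl(String urlArg) {
    if (urlArg.startsWith("jdbc:")) {
      return urlArg;                                  // direct URL, use as-is
    }
    String envVar = "BEELINE_URL_" + urlArg.toUpperCase();
    String indirect = System.getenv(envVar);
    if (indirect == null) {
      throw new IllegalArgumentException("No URL found in environment variable " + envVar);
    }
    return indirect;
  }

  public static void main(String[] args) {
    // With BEELINE_URL_DEFAULT="jdbc:hive2://localhost:1" exported,
    // "beeline -u default" should connect to that URL.
    System.out.println(resolveUrl(args.length > 0 ? args[0] : "default"));
  }
}
{code}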



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14141) Fix for HIVE-14062 breaks indirect urls in beeline

2016-07-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371403#comment-15371403
 ] 

Sergio Peña commented on HIVE-14141:


Thanks. Looks good to me
+1

> Fix for HIVE-14062 breaks indirect urls in beeline
> --
>
> Key: HIVE-14141
> URL: https://issues.apache.org/jira/browse/HIVE-14141
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-14141.1.patch
>
>
> Looks like the patch for HIVE-14062 breaks indirect URLs, which use 
> environment variables to get the URL in Beeline.
> In order to reproduce this issue:
> {noformat}
> $ export BEELINE_URL_DEFAULT="jdbc:hive2://localhost:1"
> $ beeline -u default
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371400#comment-15371400
 ] 

Prasanth Jayachandran commented on HIVE-14196:
--

readAllColumns is just used in debug logging. Updated the patch to return false 
immediately when the first unsupported type is found.
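
A minimal sketch of that check (an assumed shape, not the actual patch): scan the column types and return false as soon as a non-primitive type is found, so LLAP IO is disabled for that query.

{code}
// Illustrative sketch of "return false on the first unsupported type";
// the real check lives in the LLAP IO path and works on the actual column schema.
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;

public class LlapIoTypeCheckSketch {

  /** Returns false as soon as a complex (non-primitive) column type is found. */
  static boolean canUseLlapIo(List<TypeInfo> columnTypes) {
    for (TypeInfo type : columnTypes) {
      if (type.getCategory() != Category.PRIMITIVE) {
        return false;   // struct/map/list/union -> fall back to the non-LLAP IO path
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<TypeInfo> columns = Arrays.asList(
        TypeInfoFactory.intTypeInfo,
        TypeInfoFactory.getListTypeInfo(TypeInfoFactory.stringTypeInfo));
    System.out.println(canUseLlapIo(columns));   // false: the array<string> column is complex
  }
}
{code}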

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently broken 
> and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14208) MapJoin uses key of outer input and Converter

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14208:
---
Summary: MapJoin uses key of outer input and Converter  (was: Outer MapJoin 
uses key of outer input and Converter)

> MapJoin uses key of outer input and Converter
> -
>
> Key: HIVE-14208
> URL: https://issues.apache.org/jira/browse/HIVE-14208
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Priority: Minor
>
> Consider a left outer MapJoin operator. The OIs for the output are created from 
> the inputs of the outer and inner sides. However, when there is a match in the 
> join, the data for the key is always taken from the outer side (as is done 
> currently). Thus, we need to apply the Converter logic on the data to get the 
> correct type.
> This issue is to explore whether a better solution would be to use the key 
> from the correct input of the join to eliminate the need for Converters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14208) MapJoin uses key of outer input and Converter

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371404#comment-15371404
 ] 

Jesus Camacho Rodriguez commented on HIVE-14208:


Cc [~ashutoshc]

> MapJoin uses key of outer input and Converter
> -
>
> Key: HIVE-14208
> URL: https://issues.apache.org/jira/browse/HIVE-14208
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Priority: Minor
>
> Consider a left outer MapJoin operator. The OIs for the output are created from 
> the inputs of the outer and inner sides. However, when there is a match in the 
> join, the data for the key is always taken from the outer side (as is done 
> currently). Thus, we need to apply the Converter logic on the data to get the 
> correct type.
> This issue is to explore whether a better solution would be to use the key 
> from the correct input of the join to eliminate the need for Converters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14208) Outer MapJoin uses key data of outer input and Converter

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14208:
---
Issue Type: Improvement  (was: Bug)

> Outer MapJoin uses key data of outer input and Converter
> 
>
> Key: HIVE-14208
> URL: https://issues.apache.org/jira/browse/HIVE-14208
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> Consider a left outer MapJoin operator. The OIs for the output are created from 
> the inputs of the outer and inner sides. However, when there is a match in the 
> join, the data for the key is always taken from the outer side (as is done 
> currently). Thus, we need to apply the Converter logic on the data to get the 
> correct type.
> This issue is to explore whether a better solution would be to use the key 
> from the correct input of the join to eliminate the need for Converters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14208) Outer MapJoin uses key data of outer input and Converter

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14208:
---
Priority: Minor  (was: Major)

> Outer MapJoin uses key data of outer input and Converter
> 
>
> Key: HIVE-14208
> URL: https://issues.apache.org/jira/browse/HIVE-14208
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Priority: Minor
>
> Consider a left outer MapJoin operator. The OIs for the output are created from 
> the inputs of the outer and inner sides. However, when there is a match in the 
> join, the data for the key is always taken from the outer side (as is done 
> currently). Thus, we need to apply the Converter logic on the data to get the 
> correct type.
> This issue is to explore whether a better solution would be to use the key 
> from the correct input of the join to eliminate the need for Converters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14208) Outer MapJoin uses key data of outer input and Converter

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14208:
---
Component/s: Query Processor

> Outer MapJoin uses key data of outer input and Converter
> 
>
> Key: HIVE-14208
> URL: https://issues.apache.org/jira/browse/HIVE-14208
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Priority: Minor
>
> Consider a left outer MapJoin operator. The OIs for the output are created from 
> the inputs of the outer and inner sides. However, when there is a match in the 
> join, the data for the key is always taken from the outer side (as is done 
> currently). Thus, we need to apply the Converter logic on the data to get the 
> correct type.
> This issue is to explore whether a better solution would be to use the key 
> from the correct input of the join to eliminate the need for Converters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9928) Empty buckets are not created on non-HDFS file system

2016-07-11 Thread Rob Leidle (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371397#comment-15371397
 ] 

Rob Leidle commented on HIVE-9928:
--

Can be closed as a duplicate of HIVE-14175.

> Empty buckets are not created on non-HDFS file system
> -
>
> Key: HIVE-9928
> URL: https://issues.apache.org/jira/browse/HIVE-9928
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Ankit Kamboj
> Attachments: HIVE-9928.1.patch
>
>
> Bucketing should create empty buckets on the destination file system. The 
> problem in that logic is that it uses path.toUri().getPath().toString() to 
> find the relevant path, but this chain of methods always resolves to a relative 
> path, which ends up creating the empty buckets in HDFS rather than on the actual 
> destination FS.
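
A small sketch of the failure mode (example values assumed): toUri().getPath() drops the scheme and authority, so a path rebuilt from it resolves against the default (HDFS) filesystem; carrying the scheme and authority along, as HIVE-14175 does, preserves the intended destination.

{code}
// Illustrative sketch of why toUri().getPath() loses the destination filesystem.
import org.apache.hadoop.fs.Path;

public class BucketPathSketch {
  public static void main(String[] args) {
    Path dest = new Path("s3a://my-bucket/warehouse/t1/000000_0");   // example non-HDFS path

    // What the buggy logic effectively keeps: only the path component survives, so the
    // empty bucket file ends up on the default FS (HDFS) instead of the destination FS.
    String pathOnly = dest.toUri().getPath();                        // "/warehouse/t1/000000_0"

    // Keeping the scheme and authority preserves the intended destination filesystem.
    Path qualified = new Path(dest.toUri().getScheme(), dest.toUri().getAuthority(),
        dest.toUri().getPath());                                     // s3a://my-bucket/warehouse/t1/000000_0

    System.out.println(pathOnly);
    System.out.println(qualified);
  }
}
{code}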



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14196) Disable LLAP IO when complex types are involved

2016-07-11 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14196:
-
Attachment: HIVE-14196.2.patch

Addressed review comments

> Disable LLAP IO when complex types are involved
> ---
>
> Key: HIVE-14196
> URL: https://issues.apache.org/jira/browse/HIVE-14196
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14196.1.patch, HIVE-14196.2.patch
>
>
> Let's exclude the vector_complex_* tests added for LLAP, which are currently broken 
> and fail in all test runs. We can re-enable them with the HIVE-14089 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371394#comment-15371394
 ] 

Thejas M Nair commented on HIVE-14207:
--

On 2.patch - I like the idea of using a custom port to enable the HS2 web UI instead 
of having to turn off the in.test config. However, the 
MetaStoreUtils.findFreePort() call has a non-zero probability of returning the default 
port as the available one. Can you also skip the default port if that's what is 
returned?


> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.patch
>
>
> HIVE-12338 introduced a new web UI, which has a page that displays the 
> current HiveConf being used by HS2. However, before it displays that config, 
> it does not strip entries that are considered "hidden" conf 
> parameters, thus exposing those values through the HS2 web UI. We need to add 
> stripping to this.
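
A minimal sketch of such stripping (assuming the comma-separated hive.conf.hidden.list property; this is not the actual patch): redact every hidden entry in a copy of the config before handing it to the web UI page.

{code}
// Illustrative sketch of stripping "hidden" entries before displaying a config;
// assumes the comma-separated hive.conf.hidden.list property, not the actual HS2 patch.
import org.apache.hadoop.conf.Configuration;

public class StripHiddenConfSketch {

  static Configuration stripHidden(Configuration conf) {
    Configuration copy = new Configuration(conf);     // never mutate the live HiveConf
    String hiddenList = conf.get("hive.conf.hidden.list", "");
    for (String key : hiddenList.split(",")) {
      key = key.trim();
      if (!key.isEmpty() && copy.get(key) != null) {
        copy.set(key, "");                            // redact the value, keep the key visible
      }
    }
    return copy;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("hive.conf.hidden.list", "javax.jdo.option.ConnectionPassword");
    conf.set("javax.jdo.option.ConnectionPassword", "secret");
    // Prints an empty string instead of the password.
    System.out.println("[" + stripHidden(conf).get("javax.jdo.option.ConnectionPassword") + "]");
  }
}
{code}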



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14175) Fix creating buckets without scheme information

2016-07-11 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371391#comment-15371391
 ] 

Thomas Poepping commented on HIVE-14175:


Never mind, you pushed this [~ashutoshc]? thanks!

> Fix creating buckets without scheme information
> ---
>
> Key: HIVE-14175
> URL: https://issues.apache.org/jira/browse/HIVE-14175
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
>  Labels: patch
> Attachments: HIVE-14175.2.patch, HIVE-14175.patch, HIVE-14175.patch
>
>
> If a table is created on a non-default filesystem (i.e. non-hdfs), the empty 
> files will be created with incorrect scheme information. This patch extracts 
> the scheme and authority information for the new paths.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13191) DummyTable map joins mix up columns between tables

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-13191.

Resolution: Duplicate

Closed as duplicate of HIVE-14027.

> DummyTable map joins mix up columns between tables
> --
>
> Key: HIVE-13191
> URL: https://issues.apache.org/jira/browse/HIVE-13191
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Pengcheng Xiong
> Attachments: tez.q
>
>
> {code}
> SELECT
>   a.key,
>   a.a_one,
>   b.b_one,
>   a.a_zero,
>   b.b_zero
> FROM
> (
> SELECT
>   11 key,
>   0 confuse_you,
>   1 a_one,
>   0 a_zero
> ) a
> LEFT JOIN
> (
> SELECT
>   11 key,
>   0 confuse_you,
>   1 b_one,
>   0 b_zero
> ) b
> ON a.key = b.key
> ;
> 11  1   0   0   1
> {code}
> This should be 11, 1, 1, 0, 0 instead. 
> Disabling map-joins & using shuffle-joins returns the right result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371388#comment-15371388
 ] 

Owen O'Malley commented on HIVE-13974:
--

First pass comments on the ORC changes:

* You *must* include unit tests in the ORC module for changes there.
* Don't move checkAcidSchema around and certainly don't make it a public API. 
We should probably have ReaderImpl pass a boolean to the constructor of 
SchemaEvolution saying that the file is Acid. Using the column names is bad and 
we should probably move over to use the acid stats property as the check.
* SameCategoryAndAttributes is a duplication of TypeDescription.equals.
* We need to integrate this with ORC-54 too.
* I like pulling the include logic into SchemaEvolution.
* Please use 'reader' instead of 'logical' in the names in SchemaEvolution.

I'm still going through the SchemaEvolution changes.
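
A purely illustrative sketch of the constructor suggestion in the second bullet above; the class shape and names are hypothetical and not the real ORC SchemaEvolution code.

{code}
// Hypothetical sketch of "pass a boolean to the SchemaEvolution constructor";
// names and signatures here are illustrative only, not the actual ORC classes.
public class SchemaEvolutionSketch {

  static final class SchemaEvolution {
    private final boolean isAcidFile;
    SchemaEvolution(Object fileSchema, Object readerSchema, boolean isAcidFile) {
      // The caller (e.g. ReaderImpl) decides whether the file is ACID, for instance
      // from a file-level property, instead of SchemaEvolution inspecting column names.
      this.isAcidFile = isAcidFile;
    }
    boolean isAcid() { return isAcidFile; }
  }

  public static void main(String[] args) {
    SchemaEvolution evolution = new SchemaEvolution(null, null, /* isAcidFile = */ true);
    System.out.println(evolution.isAcid());
  }
}
{code}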

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema, which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371355#comment-15371355
 ] 

Hive QA commented on HIVE-14195:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817101/HIVE-14195.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10304 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/468/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/468/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-468/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817101 - PreCommit-HIVE-MASTER-Build

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14195.2.patch, HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.
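
A short sketch of the caller-side workaround described above (assuming the standard HiveMetaStoreClient.getFunction(dbName, funcName) call; the handling details are illustrative).

{code}
// Illustrative sketch of the caller-side workaround described in the issue;
// the error-handling details are assumptions, not the actual fix.
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Function;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
import org.apache.thrift.TException;

public class GetFunctionSketch {

  static Function getFunctionOrNull(HiveMetaStoreClient client, String db, String name)
      throws TException {
    try {
      return client.getFunction(db, name);
    } catch (NoSuchObjectException e) {
      // What callers would prefer to rely on once the issue is fixed.
      return null;
    } catch (MetaException e) {
      // Current workaround: the missing function surfaces as a generic MetaException,
      // so the message has to be searched for the real cause.
      if (e.getMessage() != null && e.getMessage().contains("NoSuchObjectException")) {
        return null;
      }
      throw e;
    }
  }
}
{code}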



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-14207:

Attachment: HIVE-14207.2.patch

Updated patch.

> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.patch
>
>
> HIVE-12338 introduced a new web UI, which has a page that displays the 
> current HiveConf being used by HS2. However, before it displays that config, 
> it does not strip entries that are considered "hidden" conf 
> parameters, thus exposing those values through the HS2 web UI. We need to add 
> stripping to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-14207:

Status: Patch Available  (was: Open)

> Strip HiveConf hidden params in webui conf
> --
>
> Key: HIVE-14207
> URL: https://issues.apache.org/jira/browse/HIVE-14207
> Project: Hive
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14207.2.patch, HIVE-14207.patch
>
>
> HIVE-12338 introduced a new web UI, which has a page that displays the 
> current HiveConf being used by HS2. However, before it displays that config, 
> it does not strip entries that are considered "hidden" conf 
> parameters, thus exposing those values through the HS2 web UI. We need to add 
> stripping to this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14188) LLAPIF: wrong user field is used from the token

2016-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371310#comment-15371310
 ] 

Sergey Shelukhin commented on HIVE-14188:
-

Most of the failures are known issues; the timed-out test is due to the NN 
going into safemode. [~gopalv] ping?

> LLAPIF: wrong user field is used from the token
> ---
>
> Key: HIVE-14188
> URL: https://issues.apache.org/jira/browse/HIVE-14188
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14188.patch, HIVE-14188.patch
>
>
> realUser is not set in all cases for delegation tokens; we should use 
> the owner instead.
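
A small sketch of the described change (assuming a Hadoop AbstractDelegationTokenIdentifier-style token identifier; this is not the actual LLAP external client code): read the user from the owner field rather than from realUser.

{code}
// Illustrative sketch: prefer the owner field of a delegation token identifier,
// since realUser is not set in all cases. Not the actual LLAP code.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class TokenUserSketch {

  /** Picks the effective user name from a delegation token identifier. */
  static String userFromToken(AbstractDelegationTokenIdentifier id) {
    Text owner = id.getOwner();
    if (owner != null && owner.getLength() > 0) {
      return owner.toString();          // the field that is reliably populated
    }
    Text realUser = id.getRealUser();   // may be empty depending on how the token was issued
    return realUser == null ? null : realUser.toString();
  }
}
{code}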



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

