[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-10 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081694#comment-16081694
 ] 

liyunzhang_intel commented on HIVE-17018:
-

[~csun]: 
{quote}
Are you trying to explain that HoS is overly aggressive in turning JOINs to 
MAPJOINs when there are chained JOIN operators?
{quote}
I cannot say for sure. From the 
[code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364],
 I guess this is what the author intended. 

But judging from the definition of {{hive.auto.convert.join.noconditionaltask.size}}, I 
think this behavior is confusing:
{noformat}
hive.auto.convert.join.noconditionaltask.size means that if the sum of the sizes 
of n-1 of the tables/partitions in an n-way join is smaller than this value, the 
join will be converted to a map join.
{noformat}
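
For illustration, a minimal sketch of the semantics that definition describes 
(hypothetical names, not the actual SparkMapJoinOptimizer code):

{code}
// Hypothetical sketch of the documented rule: an n-way join should become a
// map join only if the n-1 smallest inputs together fit under the threshold.
static boolean shouldConvertToMapJoin(long[] inputSizes, long noConditionalTaskSize) {
  long[] sorted = inputSizes.clone();
  java.util.Arrays.sort(sorted);
  long smallTablesTotal = 0;
  for (int i = 0; i < sorted.length - 1; i++) { // everything except the biggest input
    smallTablesTotal += sorted[i];
  }
  return smallTablesTotal <= noConditionalTaskSize;
}
{code}

Under that reading, with B, C, D, and E at 10M each and a 20M threshold, at most 
two of the small tables can ever fit, which is why converting three of them looks wrong.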

This code was introduced by:
{noformat}
HIVE-8943: Fix memory limit check for combine nested mapjoins [Spark Branch] 
(Szehon via Xuefu)  git-svn-id: 
https://svn.apache.org/repos/asf/hive/branches/spark@1643058 
13f79535-47bb-0310-9956-ffa450edef68
{noformat}


> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
> We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than this value, the join will be converted to a map join. For 
> example, take A join B join C join D join E, where the big table is A (100M) 
> and the small tables are B (10M), C (10M), D (10M), and E (10M), and suppose 
> we set hive.auto.convert.join.noconditionaltask.size=20M. In the current 
> code, E, D, and B will be converted to map joins but C will not. In my 
> understanding, because hive.auto.convert.join.noconditionaltask.size can only 
> accommodate E and D, neither C nor B should be converted to a map join.
> Let's explain in more detail why B can still be converted to a map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
>  accumulates the sizes of all the map joins in the parent and child paths. The 
> search stops when it encounters a [UnionOperator or 
> ReduceSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
>  C is not converted to a map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the 
> RS before the join of C remains. When deciding whether B will be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 because it 
> encounters that 
> [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L409],
>  which causes {{(connectedMapJoinSize + totalSize) < maxSize}} to hold.
> [~xuefuz] or [~jxiang]: could you help determine whether this is a bug, as you 
> are more familiar with SparkJoinOptimizer?
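
To make the traversal described above concrete, here is a hypothetical 
simplification (a plain node type standing in for Hive's operator tree, not the 
actual getConnectedMapJoinSize code):

{code}
// Sketch: sum the small-table sizes of connected map joins, stopping the
// search at boundary operators (ReduceSink or Union), as described above.
final class Node {
  boolean isMapJoin;
  boolean isBoundary; // stands in for ReduceSinkOperator / UnionOperator
  long smallTableSize;
  java.util.List<Node> children = new java.util.ArrayList<>();
}

static long connectedMapJoinSize(Node n) {
  if (n.isBoundary) {
    return 0; // the search stops: map joins behind an RS are "forgotten"
  }
  long total = n.isMapJoin ? n.smallTableSize : 0;
  for (Node child : n.children) {
    total += connectedMapJoinSize(child);
  }
  return total;
}
{code}

Because C stays a common join, the RS in front of it survives, so the walk from 
B hits a boundary immediately and reports 0, letting B convert even though D and 
E already consumed the 20M budget.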



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Open  (was: Patch Available)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Patch Available  (was: Open)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Attachment: HIVE-16966.04.patch

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-10 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081648#comment-16081648
 ] 

Chao Sun commented on HIVE-17018:
-

Thanks for the examples [~kellyzly]. Are you trying to explain that HoS is 
overly aggressive in turning JOINs to MAPJOINs when there are chained JOIN 
operators? E.g., the above {{JOIN 8}} cannot be converted.

If so, I think this may be OK, since the two MAPJOINs are in different works 
(one in Map 1 and another in Reducer 2).

> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
> We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than this value, the join will be converted to a map join. For 
> example, take A join B join C join D join E, where the big table is A (100M) 
> and the small tables are B (10M), C (10M), D (10M), and E (10M), and suppose 
> we set hive.auto.convert.join.noconditionaltask.size=20M. In the current 
> code, E, D, and B will be converted to map joins but C will not. In my 
> understanding, because hive.auto.convert.join.noconditionaltask.size can only 
> accommodate E and D, neither C nor B should be converted to a map join.
> Let's explain in more detail why B can still be converted to a map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
>  accumulates the sizes of all the map joins in the parent and child paths. The 
> search stops when it encounters a [UnionOperator or 
> ReduceSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
>  C is not converted to a map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the 
> RS before the join of C remains. When deciding whether B will be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 because it 
> encounters that 
> [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L409],
>  which causes {{(connectedMapJoinSize + totalSize) < maxSize}} to hold.
> [~xuefuz] or [~jxiang]: could you help determine whether this is a bug, as you 
> are more familiar with SparkJoinOptimizer?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Patch Available  (was: Open)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Open  (was: Patch Available)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081661#comment-16081661
 ] 

Hive QA commented on HIVE-17066:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876541/HIVE-17066.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5949/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5949/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5949/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876541 - PreCommit-HIVE-Build

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081658#comment-16081658
 ] 

Eugene Koifman commented on HIVE-16177:
---

No related failures.
Committed patch 18 to master (3.0).
Thanks Sergey and Owen for the review.

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N files, 
> but this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this.
> This is because the compactor doesn't handle copy_N files either (it skips them).
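
The collision can be pictured with a running offset across a bucket's files (a 
hypothetical numbering scheme for illustration, not the committed patch):

{code}
// Sketch: number rows across all files of a bucket (01_0, 01_0_copy_1, ...)
// with one running counter instead of restarting at 0 per file, so the
// synthetic ROW__IDs {transactionid, bucketid, rowid} stay unique.
static long[] firstRowIdPerFile(int[] rowsPerFile) {
  long next = 0;
  long[] firstRowId = new long[rowsPerFile.length];
  for (int f = 0; f < rowsPerFile.length; f++) {
    firstRowId[f] = next; // rows in file f get ids next .. next + rowsPerFile[f] - 1
    next += rowsPerFile[f];
  }
  return firstRowId;
}
{code}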



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15898:
--
Attachment: HIVE-15898.06.patch

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, 
> HIVE-15898.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081621#comment-16081621
 ] 

Hive QA commented on HIVE-16177:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876537/HIVE-16177.18.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6]
 (batchId=7)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5948/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5948/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5948/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876537 - PreCommit-HIVE-Build

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N files, 
> but this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this.
> This is because the compactor doesn't handle copy_N files either (it skips them).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081614#comment-16081614
 ] 

Eugene Koifman commented on HIVE-16732:
---

failures with "error code = 10266" are expected since HIVE-16177 is not 
committed yet
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_orig_table] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table]
 (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table]
 (batchId=156)

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17010) Fix the overflow problem of Long type in SetSparkReducerParallelism

2017-07-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-17010:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed upstream. Thanks Liyun for the patch, and thanks Chao and Rui 
for the reviews.

> Fix the overflow problem of Long type in SetSparkReducerParallelism
> ---
>
> Key: HIVE-17010
> URL: https://issues.apache.org/jira/browse/HIVE-17010
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: 3.0.0
>
> Attachments: HIVE-17010.1.patch, HIVE-17010.2.patch, 
> HIVE-17010.3.patch
>
>
> We use 
> [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129]
>  to collect the numberOfBytes of the siblings of a specified RS. We use the 
> Long type, and overflow happens when the data is too big. When this happens, 
> the parallelism is decided by 
> [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184]
>  if spark.dynamic.allocation.enabled is true. sparkMemoryAndCores.getSecond 
> is a dynamic value which is decided by the Spark runtime; for example, its 
> value may be 5 or 15 at random, and it may even be 1. The main problem here 
> is the overflow of Long addition. You can reproduce the overflow problem 
> with the following code:
> {code}
> public static void main(String[] args) {
>   long a1= 9223372036854775807L;
>   long a2=1022672;
>   long res = a1+a2;
>   System.out.println(res);  //-9223372036853753137
>   BigInteger b1= BigInteger.valueOf(a1);
>   BigInteger b2 = BigInteger.valueOf(a2);
>   BigInteger bigRes = b1.add(b2);
>   System.out.println(bigRes); //9223372036855798479
> }
> {code}
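
For reference, one way to guard such an accumulation (a sketch of the general 
technique, not necessarily the committed fix):

{code}
// Sketch: accumulate sibling sizes with an overflow guard instead of a raw '+'.
// Math.addExact throws ArithmeticException on long overflow (Java 8+).
static long addSaturating(long total, long numberOfBytes) {
  try {
    return Math.addExact(total, numberOfBytes);
  } catch (ArithmeticException overflow) {
    return Long.MAX_VALUE; // saturate instead of wrapping around to a negative value
  }
}
{code}

Saturating at Long.MAX_VALUE keeps the estimate on the "data is huge" path 
instead of feeding a negative total into the parallelism computation.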



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17070) remove .orig files from src

2017-07-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081607#comment-16081607
 ] 

Eugene Koifman commented on HIVE-17070:
---

No related failures.
[~jdere], could you review please?

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-10 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081599#comment-16081599
 ] 

Bing Li commented on HIVE-4577:
---

Hi, [~vgumashta]
Yes, sure. I will rebase the patch with the latest master.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As designed, Hive supports hadoop dfs commands in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081568#comment-16081568
 ] 

Hive QA commented on HIVE-17071:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876536/HIVE-17071-branch-2.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5947/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5947/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5947/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-11 03:02:17.732
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5947/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z branch-2.3 ]]
+ [[ -d apache-github-branch-2.3-source ]]
+ [[ ! -d apache-github-branch-2.3-source/.git ]]
+ [[ ! -d apache-github-branch-2.3-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-11 03:02:17.761
+ cd apache-github-branch-2.3-source
+ git fetch origin
From https://github.com/apache/hive
   32fd02b..31cee7e  branch-2.3 -> origin/branch-2.3
   5431fad..6a63742  branch-2   -> origin/branch-2
 + 1efb4da...61867c7 branch-2.2 -> origin/branch-2.2  (forced update)
   52e0f8f..81853c1  hive-14535 -> origin/hive-14535
   a18e772..7580de9  master -> origin/master
   e2ecc92..fea9142  storage-branch-2.3 -> origin/storage-branch-2.3
 * [new tag] rel/storage-release-2.3.1 -> rel/storage-release-2.3.1
+ git reset --hard HEAD
HEAD is now at 32fd02b Revert "HIVE-12767: Implement table property to address 
Parquet int96 timestamp bug (Barna Zsombor Klara and Sergio Pena, reviewed by 
Ryan Blue)"
+ git clean -f -d
Removing 
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java.orig
+ git checkout branch-2.3
Already on 'branch-2.3'
Your branch is behind 'origin/branch-2.3' by 8 commits, and can be 
fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/branch-2.3
HEAD is now at 31cee7e HIVE-15144: JSON.org license is now CatX (Owen O'Malley, 
reviewed by Alan Gates)
+ git merge --ff-only origin/branch-2.3
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-11 03:02:26.431
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file pom.xml
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-branch-2.3-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-branch-2.3-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : 

[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081565#comment-16081565
 ] 

Hive QA commented on HIVE-16732:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876526/HIVE-16732.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10835 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_orig_table] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table]
 (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5946/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5946/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5946/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876526 - PreCommit-HIVE-Build

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17070) remove .orig files from src

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081508#comment-16081508
 ] 

Hive QA commented on HIVE-17070:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876523/HIVE-17070.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimal 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5945/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5945/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5945/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876523 - PreCommit-HIVE-Build

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations

2017-07-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17067:
-
Attachment: HIVE-17067.2.patch

Avoided running sysctl on every invocation; /system?refresh=true is now required 
to re-run the sysctl command. Minor fixes for Mac sysctl output. 
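
A minimal sketch of that caching behavior (hypothetical method and field names, 
not the patch itself):

{code}
// Sketch: cache the sysctl output and only shell out again when the request
// carries refresh=true, instead of running sysctl on every endpoint hit.
private volatile String cachedSysctl;

String systemConfigs(boolean refresh) throws java.io.IOException {
  if (cachedSysctl == null || refresh) {
    Process p = new ProcessBuilder("sysctl", "-a").redirectErrorStream(true).start();
    StringBuilder out = new StringBuilder();
    try (java.io.BufferedReader r = new java.io.BufferedReader(
        new java.io.InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append('\n');
      }
    }
    cachedSysctl = out.toString();
  }
  return cachedSysctl;
}
{code}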

> LLAP: Add http endpoint to provide system level configurations
> --
>
> Key: HIVE-17067
> URL: https://issues.apache.org/jira/browse/HIVE-17067
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17067.1.patch, HIVE-17067.2.patch
>
>
> Add an endpoint to get kernel and network configs via sysctl. Also 
> memory-related configs like the transparent huge pages config can be added. 
> "ulimit -a" can be added to the llap startup script as it needs a shell. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081457#comment-16081457
 ] 

Hive QA commented on HIVE-15898:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876527/HIVE-15898.05.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10835 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning 
(batchId=289)
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore
 (batchId=229)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=220)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5944/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5944/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5944/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876527 - PreCommit-HIVE-Build

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-10 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081444#comment-16081444
 ] 

Vineet Garg commented on HIVE-17066:


[~ashutoshc] Uploaded patch with updated golden files. Review board link is [RB 
LINK | https://reviews.apache.org/r/60757/]

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-07-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081441#comment-16081441
 ] 

Vaibhav Gumashta commented on HIVE-13989:
-

Thanks a lot [~cdrome]

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989-branch-1.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and run some code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL api's also expect the tradition user/group/other permission 
> in the form of ACL
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the APIs accounts for 
> setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
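
For illustration, a sketch of the inheritance the description calls for, using 
the real org.apache.hadoop.fs.permission ACL types (a hypothetical helper; see 
the patch's {{getDefaultAclEntries}} for the actual implementation):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;

// Sketch: a parent's DEFAULT entries should become both the ACCESS and the
// DEFAULT entries of a child directory, so DEFAULT rules keep propagating
// down the sub-tree instead of being dropped.
static List<AclEntry> childEntries(List<AclEntry> parentEntries) {
  List<AclEntry> result = new ArrayList<>();
  for (AclEntry e : parentEntries) {
    if (e.getScope() == AclEntryScope.DEFAULT) {
      result.add(new AclEntry.Builder()
          .setScope(AclEntryScope.ACCESS)   // effective rule on the child
          .setType(e.getType())
          .setName(e.getName())
          .setPermission(e.getPermission())
          .build());
      result.add(e);                        // keep the DEFAULT rule itself
    }
  }
  return result;
}
{code}

Reading the effective GROUP bits from these entries (the mask entry in 
particular), instead of from FsPermission.getGroupAction(), avoids the second 
problem described above.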
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> group::r-x
> mask::rwx
> other::---
> default:user::rwx
> default:user:hdfs:rwx
> default:group::r-x
> default:mask::rwx
> default:other::---
> {noformat}
> Note that the basic GROUP permission is set to {{rwx}} after setting the 
> ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for 
> the {{hdfs}} user.
> Run the following query to populate the table:
> {noformat}
> insert into acl_test partition (dt='a', ds='b') select a, b from words_text 
> where dt = 'c';
> {noformat}
> Note that words_text only has a single partition key.
> Now examine the ACLs for the resulting directories:
> {noformat}
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> group::r-x
> mask::rwx
> other::---
> default:user::rwx
> default:user:hdfs:rwx
> default:group::r-x
> 

[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Patch Available  (was: Open)

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Open  (was: Patch Available)

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: HIVE-17066.2.patch

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-07-10 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081414#comment-16081414
 ] 

Chris Drome commented on HIVE-13989:


[~vgumashta], yes, I will come back to this and verify whether there are still 
issues in trunk (this patch was originally written against 1.2).

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989-branch-1.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and run some code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL api's also expect the tradition user/group/other permission 
> in the form of ACL
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the APIs account for 
> setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> group::r-x
> mask::rwx
> other::---
> default:user::rwx
> default:user:hdfs:rwx
> default:group::r-x
> default:mask::rwx
> default:other::---
> {noformat}
> Note that the basic GROUP permission is set to {{rwx}} after setting the 
> ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for 
> the {{hdfs}} user.
> Run the following query to populate the table:
> {noformat}
> insert into acl_test partition (dt='a', ds='b') select a, b from words_text 
> where dt = 'c';
> {noformat}
> Note that words_text only has a single partition key.
> Now examine the ACLs for the resulting directories:
> {noformat}
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> 
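
For illustration, here is a minimal, hedged sketch of the two fixes the 
description above calls for -- propagate the parent's DEFAULT entries and take 
the group permission from the ACL entries instead of FsPermission. Class and 
helper names are assumed; this is not the HIVE-13989 patch.

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.AclStatus;
import org.apache.hadoop.fs.permission.FsAction;

public class AclInheritanceSketch {

  // DEFAULT entries propagate: they stay DEFAULT on a child directory and
  // also become its effective ACCESS entries, instead of being dropped.
  static List<AclEntry> childAclFrom(AclStatus parent) {
    List<AclEntry> child = new ArrayList<>();
    for (AclEntry e : parent.getEntries()) {
      if (e.getScope() == AclEntryScope.DEFAULT) {
        child.add(e);
        child.add(new AclEntry.Builder()
            .setScope(AclEntryScope.ACCESS)
            .setType(e.getType())
            .setName(e.getName())
            .setPermission(e.getPermission())
            .build());
      }
    }
    return child;
  }

  // With extended ACLs the FsPermission group bits hold the mask, so the
  // real group permission must come from the unnamed ACCESS GROUP entry.
  static FsAction groupActionFrom(AclStatus parent, FsAction fallback) {
    for (AclEntry e : parent.getEntries()) {
      if (e.getScope() == AclEntryScope.ACCESS
          && e.getType() == AclEntryType.GROUP
          && e.getName() == null) {
        return e.getPermission();
      }
    }
    return fallback; // no extended ACL: the permission bits are authoritative
  }
}
{code}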

[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16973:
---
Target Version/s: 3.0.0  (was: 2.3.0)

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, 
> HIVE-16973.004.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since I first did my testing 
> (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There are also some issues with fetching the delegation 
> token from Accumulo properly, which were addressed in ACCUMULO-4665.
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (dropping 1.6 support), as 1.6 is lacking in this regard. These changes 
> would otherwise get much more complicated with reflection -- Accumulo has 
> moved on past 1.6, so let's do the same in Hive.





[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081405#comment-16081405
 ] 

Pengcheng Xiong commented on HIVE-16973:


Hello, I am deferring this to Hive 3.0 as we are going to cut the next RC and 
it is not marked as a blocker. Please feel free to commit to the branch if this 
can be resolved before the release. Thanks!


> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, 
> HIVE-16973.004.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since I first did my testing 
> (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There are also some issues with fetching the delegation 
> token from Accumulo properly, which were addressed in ACCUMULO-4665.
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (dropping 1.6 support), as 1.6 is lacking in this regard. These changes 
> would otherwise get much more complicated with reflection -- Accumulo has 
> moved on past 1.6, so let's do the same in Hive.





[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081400#comment-16081400
 ] 

Hive QA commented on HIVE-16177:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876521/HIVE-16177.17.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.ql.TestTxnCommands.testNonAcidToAcidConversion01 
(batchId=282)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 
(batchId=269)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02
 (batchId=280)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02
 (batchId=277)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5943/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5943/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5943/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876521 - PreCommit-HIVE-Build

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> attached patch has a few changes to make Acid even recognize copy_N but this 
> is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081390#comment-16081390
 ] 

Eugene Koifman commented on HIVE-16177:
---

patch 18 (vs 16) adds a number of clarifying comments and contains very minor 
code changes: it creates a Comparator in AcidUtils to sort "original" files per 
Owen's suggestion, and makes "isLastFileForThisBucket" in 
OrcRawRecordMerger.OriginalReaderPair() easier to follow. A sketch of the 
ordering idea follows below.
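
A minimal sketch of the ordering idea, assuming the usual bucket-file naming 
(000001_0, 000001_0_copy_1, ...); the actual comparator in AcidUtils may 
differ:

{code}
import java.util.Comparator;

import org.apache.hadoop.fs.Path;

// Orders pre-acid ("original") bucket files so the base file sorts before
// its _copy_N siblings and copies sort numerically, giving each row a
// stable, unique synthetic ROW__ID across all files of a bucket.
public class OriginalFileComparatorSketch implements Comparator<Path> {
  private static final String COPY_MARKER = "_copy_";

  @Override
  public int compare(Path a, Path b) {
    int byName = baseName(a).compareTo(baseName(b));
    return byName != 0 ? byName : Integer.compare(copyN(a), copyN(b));
  }

  // the name without any _copy_N suffix, e.g. 000001_0
  private static String baseName(Path p) {
    String n = p.getName();
    int i = n.indexOf(COPY_MARKER);
    return i < 0 ? n : n.substring(0, i);
  }

  // 0 for the base file, N for a _copy_N file
  private static int copyN(Path p) {
    String n = p.getName();
    int i = n.indexOf(COPY_MARKER);
    return i < 0 ? 0 : Integer.parseInt(n.substring(i + COPY_MARKER.length()));
  }
}
{code}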

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> attached patch has a few changes to make Acid even recognize copy_N but this 
> is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-07-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081249#comment-16081249
 ] 

Ashutosh Chauhan commented on HIVE-16888:
-

[~bslim] Can you please also review the plan changes for the Druid test cases, 
as I am not sure about some of them?

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch, 
> HIVE-16888.06.patch, HIVE-16888.07.patch, HIVE-16888.08.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT





[jira] [Updated] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-17071:
---
Attachment: HIVE-17071-branch-2.3.patch

> Make hive 2.3 depend on storage-api-2.3
> ---
>
> Key: HIVE-17071
> URL: https://issues.apache.org/jira/browse/HIVE-17071
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
> Fix For: 2.3.0
>
> Attachments: HIVE-17071-branch-2.3.patch
>
>






[jira] [Assigned] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-17071:
--

Assignee: Pengcheng Xiong

> Make hive 2.3 depend on storage-api-2.3
> ---
>
> Key: HIVE-17071
> URL: https://issues.apache.org/jira/browse/HIVE-17071
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.3.0
>
> Attachments: HIVE-17071-branch-2.3.patch
>
>






[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.18.patch

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> attached patch has a few changes to make Acid even recognize copy_N but this 
> is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Updated] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-17071:
---
Status: Patch Available  (was: Open)

> Make hive 2.3 depend on storage-api-2.3
> ---
>
> Key: HIVE-17071
> URL: https://issues.apache.org/jira/browse/HIVE-17071
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
> Fix For: 2.3.0
>
> Attachments: HIVE-17071-branch-2.3.patch
>
>






[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081366#comment-16081366
 ] 

Ashutosh Chauhan commented on HIVE-17066:
-

Can you update golden files and create a RB for this?

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081337#comment-16081337
 ] 

Hive QA commented on HIVE-16996:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876501/HIVE-16966.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 313 failed/errored test(s), 10245 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_update_status]
 (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_column_stats]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status]
 (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[analyze_tbl_part] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_deep_filters]
 (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby2] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_limit] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_part] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_union] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal_native] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_const] (batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join0] 
(batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_join0] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[colstats_all_nulls] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_names_with_leading_and_trailing_spaces]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_pruner_multiple_children]
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl_dp] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_quoting] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_tbllvl] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_date] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[confirm_initial_tbl_stats]
 (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_2] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlated_join_keys] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_stats] 
(batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[describe_table] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[display_colstats_tbllvl] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exec_parallel_column_stats]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_full]
 (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_partial]
 (batchId=46)

[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-07-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081335#comment-16081335
 ] 

Vaibhav Gumashta commented on HIVE-13989:
-

[~cdrome] Thanks for the work so far. Looks like a fix we should definitely 
merge into master. Will you have time to address [~caritaou]'s review comments?

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989-branch-1.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and run code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL APIs also expect the traditional user/group/other permissions 
> in the form of ACL entries
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the APIs account for 
> setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> user:hdfs:rwx
> group::r-x
> mask::rwx
> other::---
> default:user::rwx
> default:user:hdfs:rwx
> default:group::r-x
> default:mask::rwx
> default:other::---
> {noformat}
> Note that the basic GROUP permission is set to {{rwx}} after setting the 
> ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for 
> the {{hdfs}} user.
> Run the following query to populate the table:
> {noformat}
> insert into acl_test partition (dt='a', ds='b') select a, b from words_text 
> where dt = 'c';
> {noformat}
> Note that words_text only has a single partition key.
> Now examine the ACLs for the resulting directories:
> {noformat}
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: hdfs
> user::rwx
> 

[jira] [Updated] (HIVE-17070) remove .orig files from src

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17070:
--
Status: Patch Available  (was: Open)

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig





[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2017-07-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081312#comment-16081312
 ] 

Pengcheng Xiong commented on HIVE-15144:


Pushed to master and cherry-picked to 2.3.

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, 
> HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15898:
--
Attachment: HIVE-15898.05.patch

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch
>
>






[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081330#comment-16081330
 ] 

Matt McCline commented on HIVE-16730:
-

+1 LGTM

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, 
> HIVE-16730.3.patch
>
>
> With the HIVE-16589 ("Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE for AVG") change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.





[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16732:
--
Attachment: HIVE-16732.02.patch

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q
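
A hedged sketch of the kind of check this implies; the property lookup and the 
placement in LoadSemanticAnalyzer.analyzeInternal() are assumptions, not the 
committed patch:

{code}
import org.apache.hadoop.hive.ql.metadata.Table;
import org.apache.hadoop.hive.ql.parse.SemanticException;

public class LoadDataAcidGuardSketch {

  // Assumed: the 'transactional' table property marks acid tables; the real
  // check may go through AcidUtils instead.
  static boolean isTransactionalTable(Table table) {
    return Boolean.parseBoolean(table.getProperty("transactional"));
  }

  // Conceptually this would run from LoadSemanticAnalyzer.analyzeInternal()
  // before any files are moved.
  static void checkLoadAllowed(Table table) throws SemanticException {
    if (isTransactionalTable(table)) {
      throw new SemanticException(
          "LOAD DATA is not supported on transactional table "
              + table.getTableName());
    }
  }
}
{code}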





[jira] [Assigned] (HIVE-17070) remove .orig files from src

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17070:
-


> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig





[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations

2017-07-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081294#comment-16081294
 ] 

Prasanth Jayachandran commented on HIVE-17067:
--

I will cache the results from sysctl and add an option to force a read 
(refresh) if required. A sketch of the idea follows below.
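
A minimal sketch of that caching approach (names and TTL assumed; not the 
actual patch):

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

public class SysctlCacheSketch {
  private final long ttlNanos = TimeUnit.MINUTES.toNanos(10);
  private String cached;
  private long loadedAt;

  // Serves the cached output unless it has expired or a refresh is forced,
  // so each web hit does not trigger a fork + exec.
  public synchronized String get(boolean forceRefresh) throws IOException {
    long now = System.nanoTime();
    if (forceRefresh || cached == null || now - loadedAt > ttlNanos) {
      cached = run("sysctl", "-a"); // the expensive fork + exec
      loadedAt = now;
    }
    return cached;
  }

  private static String run(String... cmd) throws IOException {
    Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
    StringBuilder out = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append('\n');
      }
    }
    return out.toString();
  }
}
{code}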

> LLAP: Add http endpoint to provide system level configurations
> --
>
> Key: HIVE-17067
> URL: https://issues.apache.org/jira/browse/HIVE-17067
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17067.1.patch
>
>
> Add an endpoint to get kernel and network configs via sysctl. Memory-related 
> configs like the transparent huge pages setting can also be added. "ulimit -a" 
> can be added to the llap startup script, as it needs a shell. 





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.17.patch

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> attached patch has a few changes to make Acid even recognize copy_N but this 
> is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.17.patch

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> attached patch has a few changes to make Acid even recognize copy_N but this 
> is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: (was: HIVE-16177.17.patch)

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> //we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0 thus generating 
> duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> attached patch has a few changes to make Acid even recognize copy_N but this 
> is just a pre-requisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Updated] (HIVE-17070) remove .orig files from src

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17070:
--
Attachment: HIVE-17070.patch

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig





[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations

2017-07-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081280#comment-16081280
 ] 

Gopal V commented on HIVE-17067:


I don't have a specific security concern about the information in sysctl (maybe 
the kernel version).

My concern is the fork + exec operation that happens here (as with YARN memory 
monitoring): each web hit to the endpoint triggers a fork + exec, which is a 
somewhat noisy operation.

Most of this does not change every second (unlike say SNMP counters).

> LLAP: Add http endpoint to provide system level configurations
> --
>
> Key: HIVE-17067
> URL: https://issues.apache.org/jira/browse/HIVE-17067
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17067.1.patch
>
>
> Add an endpoint to get kernel and network configs via sysctl. Memory-related 
> configs like the transparent huge pages setting can also be added. "ulimit -a" 
> can be added to the llap startup script, as it needs a shell. 





[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081273#comment-16081273
 ] 

Hive QA commented on HIVE-16730:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876497/HIVE-16730.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=109)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5940/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5940/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5940/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876497 - PreCommit-HIVE-Build

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, 
> HIVE-16730.3.patch
>
>
> With the HIVE-16589 ("Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE for AVG") change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.





[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081245#comment-16081245
 ] 

Prasanth Jayachandran commented on HIVE-17068:
--

[~masokan] Thanks for the pointer. Yes, both look the same. I will mark this 
JIRA as a duplicate.

> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.





[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17068:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Dup of HIVE-8838


> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.





[jira] [Updated] (HIVE-16911) Upgrade groovy version to 2.4.11

2017-07-10 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16911:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Yongzhi for reviewing.

> Upgrade groovy version to 2.4.11
> 
>
> Key: HIVE-16911
> URL: https://issues.apache.org/jira/browse/HIVE-16911
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 3.0.0
>
> Attachments: HIVE-16911.1.patch
>
>
> Hive currently uses groovy 2.4.4, which has a security issue 
> (https://access.redhat.com/security/cve/cve-2016-6814). Need to upgrade to 
> 2.4.8 or later. 





[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081238#comment-16081238
 ] 

Mariappan Asokan commented on HIVE-17068:
-

Hi Prasanth,
  I am wondering whether HIVE-8838 is related to this Jira.  Thanks.


> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.





[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17068:
-
Attachment: HIVE-17068.2.patch

Enable Parquet unit tests. 

> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.
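
For context, a hedged sketch of the mapred-era contract involved; the 
delegation target is an assumption, and the real MapredParquetOutputFormat 
change may look different:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.util.Progressable;

// HCatalog drives the old mapred API, so getRecordWriter() must return a
// working writer instead of throwing. Types are simplified here.
public abstract class ParquetRecordWriterSketch<K, V>
    implements OutputFormat<K, V> {

  @Override
  public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
      String name, Progressable progress) throws IOException {
    // delegate to a real Parquet writer for the target path (assumed helper)
    return createParquetWriter(job, name, progress);
  }

  protected abstract RecordWriter<K, V> createParquetWriter(
      JobConf job, String name, Progressable progress) throws IOException;
}
{code}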





[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Attachment: HIVE-16966.03.patch

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch
>
>






[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Patch Available  (was: Open)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch
>
>






[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Open  (was: Patch Available)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17069) Refactor OrcRawRecordMerger.ReaderPair

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17069:
-


> Refactor OrcRawRecordMerger.ReaderPair
> --
>
> Key: HIVE-17069
> URL: https://issues.apache.org/jira/browse/HIVE-17069
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> This should be done post-HIVE-16177 so as not to completely obscure the 
> functional changes.
> Make ReaderPair an interface.
> ReaderPairImpl will do what ReaderPair currently does, i.e. handle the 
> "normal" code path.
> OriginalReaderPair stays the same as now, but without the incomprehensible 
> override/variable-shadowing logic. Perhaps split it into two classes, one 
> for compaction and one for "normal" reads, with a common base class.
> Push discoverKeyBounds() into the appropriate implementation.
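A rough Java sketch of the proposed shape; the member names are illustrative only (the real interface would expose whatever ReaderPair exposes today), not the actual HIVE-17069 patch:

{noformat}
// Sketch only -- method names are illustrative, not the actual patch.
interface ReaderPair {
  boolean advance() throws java.io.IOException;   // step to the next record
}

// Handles the "normal" code path, i.e. what ReaderPair does today.
class ReaderPairImpl implements ReaderPair {
  public boolean advance() { return false; }
}

// Common base for reading "original" (pre-acid) files, with no
// override/variable-shadowing tricks; key-bound discovery is pushed down.
abstract class OriginalReaderPairBase implements ReaderPair {
  protected abstract void discoverKeyBounds();
}

class CompactionOriginalReaderPair extends OriginalReaderPairBase {
  public boolean advance() { return false; }
  protected void discoverKeyBounds() { /* compaction-specific bounds */ }
}

class NormalOriginalReaderPair extends OriginalReaderPairBase {
  public boolean advance() { return false; }
  protected void discoverKeyBounds() { /* query-read bounds */ }
}
{noformat}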



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-10 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16730:
--
Attachment: HIVE-16730.3.patch

This third patch fixes a bug in LazySimpleDeserializeRead: when parsing the 
last field of a struct column, it consumed not only that field's data but also 
the additional data that followed it. With this patch, LazySimpleDeserializeRead 
reads the last field of a struct column from that field's data only.
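The gist of the fix, shown on a generic delimited-field parser rather than the actual LazySimpleDeserializeRead code:

{noformat}
// Generic illustration of the bug and fix, not the actual Hive code.
class LastFieldParseSketch {
  // Parsing the last field of a struct must stop at the struct's end offset
  // (structEnd); the buggy variant scanned on toward row.length and swallowed
  // whatever bytes followed the struct.
  static String parseLastField(byte[] row, int fieldStart, int structEnd) {
    int end = fieldStart;
    while (end < structEnd && row[end] != (byte) ',') {  // ',' = field separator
      end++;
    }
    return new String(row, fieldStart, end - fieldStart,
        java.nio.charset.StandardCharsets.UTF_8);
  }
}
{noformat}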

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, 
> HIVE-16730.3.patch
>
>
> With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE  for AVG" change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations

2017-07-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081161#comment-16081161
 ] 

Prasanth Jayachandran commented on HIVE-17067:
--

Some configs already throw Permission Denied for non-root users. This exposes 
only what a non-root user can read via sysctl -a, although no explicit check 
for root access is done. Also, we don't support POST/updates to configs via 
the endpoint. [~gopalv] any thoughts on security here?
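For context, a minimal sketch of such a read-only endpoint (hypothetical port and path, not the actual patch), which shells out to sysctl as the non-root daemon user and rejects updates:

{noformat}
// Minimal sketch, not the actual patch: a read-only HTTP endpoint that runs
// "sysctl -a" as the (non-root) daemon user, so it can only expose what that
// user is allowed to read. No POST/update is supported.
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class SysctlEndpointSketch {
  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(15002), 0);
    server.createContext("/system", exchange -> {
      if (!"GET".equals(exchange.getRequestMethod())) {
        exchange.sendResponseHeaders(405, -1);   // reads only, no updates
        return;
      }
      Process p = new ProcessBuilder("sysctl", "-a")
          .redirectErrorStream(true).start();    // permission-denied lines stay in the output
      byte[] out = p.getInputStream().readAllBytes();
      exchange.sendResponseHeaders(200, out.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(out);
      }
    });
    server.start();
  }
}
{noformat}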


> LLAP: Add http endpoint to provide system level configurations
> --
>
> Key: HIVE-17067
> URL: https://issues.apache.org/jira/browse/HIVE-17067
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17067.1.patch
>
>
> Add an endpoint to get kernel and network configs via sysctl. Memory-related 
> configs like the transparent huge pages config can also be added. "ulimit -a" 
> can be added to the llap startup script, as it needs a shell. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations

2017-07-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081123#comment-16081123
 ] 

Siddharth Seth commented on HIVE-17067:
---

+1. Looks good. Does anything specific need to be looked at in terms of 
security?

> LLAP: Add http endpoint to provide system level configurations
> --
>
> Key: HIVE-17067
> URL: https://issues.apache.org/jira/browse/HIVE-17067
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17067.1.patch
>
>
> Add an endpoint to get kernel and network configs via sysctl. Memory-related 
> configs like the transparent huge pages config can also be added. "ulimit -a" 
> can be added to the llap startup script, as it needs a shell. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081094#comment-16081094
 ] 

Hive QA commented on HIVE-16973:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876479/HIVE-16973.004-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10582 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5939/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5939/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5939/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876479 - PreCommit-HIVE-Build

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, 
> HIVE-16973.004.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.
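To see why staying on 1.6 would force reflection, here is a generic illustration (not the actual HiveAccumuloHelper code) of the kind of shim cross-version support requires; everything is stringly-typed and only fails at runtime:

{noformat}
// Generic reflection-shim illustration, not the actual Hive/Accumulo code.
import java.lang.reflect.Method;

class ReflectionShimSketch {
  // Stand-in for a token type where only newer versions expose unwrap().
  static class WrappedToken {
    public Object unwrap() { return "delegation-token"; }
  }

  static Object unwrapToken(Object wrapped) throws Exception {
    // Look the method up by name: no compile-time checking, and a version
    // without unwrap() surfaces as NoSuchMethodException at runtime.
    Method m = wrapped.getClass().getMethod("unwrap");
    return m.invoke(wrapped);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(unwrapToken(new WrappedToken()));  // prints delegation-token
  }
}
{noformat}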



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size

2017-07-10 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16690:

Attachment: HIVE-16690.addendum.patch

Uploaded an addendum patch which avoids accessing uninitialized LLAP cluster 
info (which caused an NPE).
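A minimal sketch of the guard described, assuming hypothetical names (not the actual addendum patch):

{noformat}
// Hypothetical names; illustrates the null/initialization guard only.
class CartesianEdgeConfigSketch {
  static final int DEFAULT_PARALLELISM = 1;   // illustrative fallback value

  static class LlapClusterInfo {
    Integer numExecutors;                     // may be uninitialized
  }

  static int targetParallelism(LlapClusterInfo info) {
    // Before the addendum, uninitialized cluster info was dereferenced
    // directly, causing an NPE; fall back to the default instead.
    if (info == null || info.numExecutors == null || info.numExecutors <= 0) {
      return DEFAULT_PARALLELISM;
    }
    return info.numExecutors;
  }
}
{noformat}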

> Configure Tez cartesian product edge based on LLAP cluster size
> ---
>
> Key: HIVE-16690
> URL: https://issues.apache.org/jira/browse/HIVE-16690
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16690.1.patch, HIVE-16690.addendum.patch
>
>
> In HIVE-14731 we are using the default value for the target parallelism of 
> the fair cartesian product edge. Ideally this should be set according to the 
> cluster size. In the case of LLAP it's pretty easy to get the cluster size, 
> i.e., the number of executors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080902#comment-16080902
 ] 

Hive QA commented on HIVE-16973:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876465/HIVE-16973.004.branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5938/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5938/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5938/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-10 19:09:02.704
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5938/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 19:09:02.709
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 19:09:07.749
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java:52
error: 
accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java:
 patch does not apply
error: patch failed: 
accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java:125
error: 
accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876465 - PreCommit-HIVE-Build

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch, HIVE-16973.004.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the 

[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-16973:
--
Attachment: HIVE-16973.004-branch-2.patch

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, 
> HIVE-16973.004.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080892#comment-16080892
 ] 

Hive QA commented on HIVE-17068:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876460/HIVE-17068.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5937/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5937/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5937/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876460 - PreCommit-HIVE-Build

> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-16973:
--
Attachment: HIVE-16973.004.branch-2.patch

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch, HIVE-16973.004.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16688) Make sure Alter Table to set transaction=true acquires X lock

2017-07-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16688:
--
Priority: Critical  (was: Major)

> Make sure Alter Table to set transaction=true acquires X lock
> -
>
> Key: HIVE-16688
> URL: https://issues.apache.org/jira/browse/HIVE-16688
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
> Suppose we have a non-acid table with some data.
> An insert op starts (long running).
> An alter table runs to add (transactional=true).
> An update is run, which will read the list of "original" files and assign IDs 
> on the fly; these are written to a delta file.
> The long-running insert completes.
> Another update is run, which now sees a different set of "original" files and 
> will (most likely) assign different IDs.
> We need to make sure to mutex this.
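Conceptually, the mutex means the ALTER must take an exclusive (X) lock that waits out in-flight writers; a generic Java analogue of the required mutual exclusion (not Hive's lock manager):

{noformat}
// Generic analogue of the required mutual exclusion, not Hive's DbLockManager.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AlterTableMutexSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void runInsert(Runnable writeFiles) {
    lock.readLock().lock();            // inserts may run concurrently with each other
    try { writeFiles.run(); } finally { lock.readLock().unlock(); }
  }

  void alterToTransactional(Runnable convert) {
    lock.writeLock().lock();           // X lock: blocks until in-flight inserts finish
    try { convert.run(); } finally { lock.writeLock().unlock(); }
  }
}
{noformat}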



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080778#comment-16080778
 ] 

Hive QA commented on HIVE-4577:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12728375/HIVE-4577.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5936/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5936/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5936/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-10 18:08:49.193
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5936/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 18:08:49.197
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 18:08:49.895
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java: No 
such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12728375 - PreCommit-HIVE-Build

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> By design, hive supports hadoop dfs commands in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
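The root cause is tokenization: the CLI splits the dfs arguments on whitespace without honoring or stripping quotes. A quote-aware splitter along these lines (illustrative only, not the HIVE-4577 patch) reproduces hadoop's behavior:

{noformat}
// Illustrative quote-aware tokenizer, not the actual HIVE-4577 patch.
import java.util.ArrayList;
import java.util.List;

class DfsCommandTokenizerSketch {
  static List<String> tokenize(String cmd) {
    List<String> tokens = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    char quote = 0;                          // 0 = currently outside quotes
    for (char c : cmd.toCharArray()) {
      if (quote == 0 && (c == '"' || c == '\'')) {
        quote = c;                           // open quote: group, drop the quote char
      } else if (c == quote) {
        quote = 0;                           // close quote
      } else if (quote == 0 && Character.isWhitespace(c)) {
        if (cur.length() > 0) { tokens.add(cur.toString()); cur.setLength(0); }
      } else {
        cur.append(c);                       // keep spaces inside quotes
      }
    }
    if (cur.length() > 0) tokens.add(cur.toString());
    return tokens;
  }
}
{noformat}

With this, tokenize("-mkdir \"bei jing\"") yields ["-mkdir", "bei jing"], creating a single directory named bei jing rather than two directories with literal quote characters.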



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080771#comment-16080771
 ] 

Prasanth Jayachandran commented on HIVE-17068:
--

[~sushanth] can you please review this patch?

> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17068:
-
Status: Patch Available  (was: Open)

> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17068:
-
Attachment: HIVE-17068.1.patch

> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17068.1.patch
>
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-10 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080758#comment-16080758
 ] 

Vaibhav Gumashta commented on HIVE-4577:


[~libing] thanks a lot for the patch and apologies that this went out of sight. 
Would you like to rebase it one more time for master? I am +1 on the changes. 

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> By design, hive supports hadoop dfs commands in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces or quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-16973:
--
Attachment: HIVE-16973.003.branch-2.patch

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080734#comment-16080734
 ] 

Josh Elser commented on HIVE-16973:
---

bq. Spinning up my local install to verify the Kerberos portion wasn't affected.

Local tests with Kerberos show this is fine as well. I'll need to spend the 
time to add a qtest that does Accumulo with Kerberos to try to prevent some 
regressions (this will take a day or two though).

The v3 patch is the same as v2, but generated as a normal `git diff` patch 
instead of one formatted for email (e.g. via git-format-patch).

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, 
> HIVE-16973.003.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080733#comment-16080733
 ] 

Hive QA commented on HIVE-16973:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876456/HIVE-16973.003.branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5935/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5935/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5935/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-10 17:48:43.713
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5935/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 17:48:43.716
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 17:48:44.321
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloConnectionParameters.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/HiveAccumuloHelper.java:
 No such file or directory
error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java: 
No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/CompositeAccumuloRowIdFactory.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/DefaultAccumuloRowIdFactory.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestHiveAccumuloHelper.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableInputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableOutputFormat.java:
 No such file or directory
error: a/itests/qtest-accumulo/pom.xml: No such file or directory
error: a/pom.xml: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876456 - PreCommit-HIVE-Build

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
> 

[jira] [Assigned] (HIVE-17068) HCatalog: Add parquet support

2017-07-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-17068:



> HCatalog: Add parquet support
> -
>
> Key: HIVE-17068
> URL: https://issues.apache.org/jira/browse/HIVE-17068
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> MapredParquetOutputFormat has to support getRecordWriter() for parquet format 
> to be used from HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080716#comment-16080716
 ] 

Hive QA commented on HIVE-16973:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876450/HIVE-16973.002.branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5934/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5934/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5934/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-10 17:36:50.135
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5934/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 17:36:50.138
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should 
compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh 
Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-10 17:36:52.908
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloConnectionParameters.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/HiveAccumuloHelper.java:
 No such file or directory
error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java: 
No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/CompositeAccumuloRowIdFactory.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/DefaultAccumuloRowIdFactory.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestHiveAccumuloHelper.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableInputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableOutputFormat.java:
 No such file or directory
error: a/itests/qtest-accumulo/pom.xml: No such file or directory
error: a/pom.xml: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876450 - PreCommit-HIVE-Build

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
> 

[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-07-10 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-16973:
--
Attachment: HIVE-16973.002.branch-2.patch

.002: some more code consolidation/cleanup. Ran the unit tests and the 
Accumulo qtests locally.

Spinning up my local install to verify the Kerberos portion wasn't affected.

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code-paths changed since when I first did my 
> testing (or I just did poor testing) and the delegation token was never being 
> fetched/serialized. There also are some issues with fetching the delegation 
> token from Accumulo properly which were addressed in ACCUMULO-4665
> I believe it would also be best to just update the dependency to use Accumulo 
> 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080711#comment-16080711
 ] 

Hive QA commented on HIVE-16730:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876439/HIVE-16730.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=100)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5933/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5933/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5933/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876439 - PreCommit-HIVE-Build

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch
>
>
> With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE  for AVG" change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17021) Support replication of concatenate operation.

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080614#comment-16080614
 ] 

Hive QA commented on HIVE-17021:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876435/HIVE-17021.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10836 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5932/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5932/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5932/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876435 - PreCommit-HIVE-Build

> Support replication of concatenate operation.
> -
>
> Key: HIVE-17021
> URL: https://issues.apache.org/jira/browse/HIVE-17021
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17021.01.patch
>
>
> We need to handle cases like ALTER TABLE ... CONCATENATE that also change the 
> files on disk, and potentially treat them similar to INSERT OVERWRITE, as it 
> does something equivalent to a compaction.
> Note that a ConditionalTask might also be fired at the end of inserts at the 
> end of a tez task (or other exec engine) if appropriate HiveConf settings are 
> set, to automatically do this operation - these also need to be taken care of 
> for replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-10 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080588#comment-16080588
 ] 

Teddy Choi commented on HIVE-16730:
---

This second patch fixes a bug where columns were not skipped when the 
deserialize reader supports reading individual fields.
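Roughly, the fix ensures unprojected columns are still skipped on the read-field fast path (generic illustration, not the actual LazySimpleDeserializeRead code):

{noformat}
// Generic illustration of the fix, not the actual Hive deserializer code.
class ColumnSkipSketch {
  interface FieldReader {
    Object readField(int fieldIndex);
    void skipField(int fieldIndex);    // advances past the field's bytes
  }

  static Object[] readRow(FieldReader reader, boolean[] projected) {
    Object[] row = new Object[projected.length];
    for (int i = 0; i < projected.length; i++) {
      if (projected[i]) {
        row[i] = reader.readField(i);
      } else {
        reader.skipField(i);           // the buggy fast path omitted this skip,
      }                                // leaving the input stream misaligned
    }
    return row;
  }
}
{noformat}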

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch
>
>
> With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE  for AVG" change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types

2017-07-10 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16730:
--
Attachment: HIVE-16730.2.patch

> Vectorization: Schema Evolution for Text Vectorization / Complex Types
> --
>
> Key: HIVE-16730
> URL: https://issues.apache.org/jira/browse/HIVE-16730
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch
>
>
> With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes 
> PARTIAL2, FINAL, and COMPLETE  for AVG" change, the tests 
> schema_evol_text_vec_part_all_complex.q and 
> schema_evol_text_vecrow_part_all_complex.q fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17021) Support replication of concatenate operation.

2017-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080518#comment-16080518
 ] 

ASF GitHub Bot commented on HIVE-17021:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/202

HIVE-17021: Support replication of concatenate operation.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-17021

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #202


commit e68853deef64037260fb951da329011ef6e5a3d5
Author: Sankar Hariappan 
Date:   2017-07-10T15:24:45Z

HIVE-17021: Support replication of concatenate operation.




> Support replication of concatenate operation.
> -
>
> Key: HIVE-17021
> URL: https://issues.apache.org/jira/browse/HIVE-17021
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17021.01.patch
>
>
> We need to handle cases like ALTER TABLE ... CONCATENATE that also change the 
> files on disk, and potentially treat them similar to INSERT OVERWRITE, as it 
> does something equivalent to a compaction.
> Note that a ConditionalTask might also be fired at the end of inserts at the 
> end of a tez task (or other exec engine) if appropriate HiveConf settings are 
> set, to automatically do this operation - these also need to be taken care of 
> for replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17021) Support replication of concatenate operation.

2017-07-10 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17021:

Status: Patch Available  (was: Open)

> Support replication of concatenate operation.
> -
>
> Key: HIVE-17021
> URL: https://issues.apache.org/jira/browse/HIVE-17021
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17021.01.patch
>
>
> We need to handle cases like ALTER TABLE ... CONCATENATE that also change the 
> files on disk, and potentially treat them similarly to INSERT OVERWRITE, since 
> concatenation does something equivalent to a compaction.
> Note that a ConditionalTask may also be fired at the end of an insert in a Tez 
> task (or another exec engine) if the appropriate HiveConf settings are set, to 
> perform this operation automatically - such tasks also need to be taken care 
> of for replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work stopped] (HIVE-17021) Support replication of concatenate operation.

2017-07-10 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17021 stopped by Sankar Hariappan.
---
> Support replication of concatenate operation.
> -
>
> Key: HIVE-17021
> URL: https://issues.apache.org/jira/browse/HIVE-17021
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> We need to handle cases like ALTER TABLE ... CONCATENATE that also change the 
> files on disk, and potentially treat them similarly to INSERT OVERWRITE, since 
> concatenation does something equivalent to a compaction.
> Note that a ConditionalTask may also be fired at the end of an insert in a Tez 
> task (or another exec engine) if the appropriate HiveConf settings are set, to 
> perform this operation automatically - such tasks also need to be taken care 
> of for replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17021) Support replication of concatenate operation.

2017-07-10 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17021:

Attachment: HIVE-17021.01.patch

Added 01.patch with test cases to verify concatenate operations.
- A concatenate operation, whether triggered by ALTER TABLE or by a 
ConditionalTask, prepares the plan MergeOperator->MoveTask.
- MergeOperator merges all the files from the given input path and writes the 
merged output file to the temporary staging directory.
- MoveTask moves the merged file from the temporary directory to the final 
warehouse data location. It uses the loadTable and loadPartition methods to 
load data from the temp path into the warehouse, which is the same path used 
by the Insert Overwrite flow.
- Hence, CM recycling and firing of the insert event are already done by the 
existing code.
- This patch just adds test cases to verify it.
Request [~anishek]/[~daijy]/[~sushanth]/[~thejas] to review!
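
For the ConditionalTask path, a sketch of the session settings that fire the 
automatic merge after inserts (the values shown are illustrative, not 
recommendations):
{code:sql}
-- Merge small output files at the end of a job on the respective engines.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.tezfiles=true;
SET hive.merge.sparkfiles=true;
-- Fire the merge when the average output file size is below this (bytes).
SET hive.merge.smallfiles.avgsize=16000000;
{code}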

> Support replication of concatenate operation.
> -
>
> Key: HIVE-17021
> URL: https://issues.apache.org/jira/browse/HIVE-17021
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17021.01.patch
>
>
> We need to handle cases like ALTER TABLE ... CONCATENATE that also change the 
> files on disk, and potentially treat them similarly to INSERT OVERWRITE, since 
> concatenation does something equivalent to a compaction.
> Note that a ConditionalTask may also be fired at the end of an insert in a Tez 
> task (or another exec engine) if the appropriate HiveConf settings are set, to 
> perform this operation automatically - such tasks also need to be taken care 
> of for replication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080486#comment-16080486
 ] 

Hive QA commented on HIVE-17063:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876417/HIVE-17063.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10835 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_after_drop_partition]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5931/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5931/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5931/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876417 - PreCommit-HIVE-Build

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table will not clear the real data. As a result, 
> running insert overwrite on the same partition twice fails because the target 
> data to be moved already exists.
> This happened when we re-populated partition data on an external table.
> I see the target data fails to be cleared only when {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to clear 
> any target file that already exists when renaming {{immediately generated 
> data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
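
One possible mitigation while this is unfixed (a sketch; the absolute staging 
path below is hypothetical and must be writable by Hive) is to move the 
staging directory out of the target directory:
{code:sql}
-- The default is a relative ".hive-staging" created under the target directory.
SET hive.exec.stagingdir=/tmp/hive-staging/.hive-staging;
{code}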
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at 

[jira] [Commented] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error

2017-07-10 Thread Krishna Vaidyanath (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080472#comment-16080472
 ] 

Krishna Vaidyanath commented on HIVE-16983:
---

If we solve this, I propose we add the solution to the troubleshooting section. 

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>
> I've followed various published documentation on integrating Apache Hive 
> 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` 
> and 
> `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and 
> `hive/conf/hive-site.xml`.
> I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` 
> to work properly (it returns an s3 ls of that bucket), so I know my creds, 
> bucket access, and overall Hadoop setup are valid. 
> hdfs dfs -ls s3a://[bucket-name]/
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files
> ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files/my-csv.csv
> However, when I attempt to access the same s3 resources from hive, e.g. run 
> any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION 
> 's3a://[bucket-name]/files/'`, it fails. 
> for example:
> >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, 
> >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED 
> >BY ',' LOCATION 's3a://[bucket-name]/files/';
> I keep getting this error:
> >FAILED: Execution Error, return code 1 from 
> >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus 
> >on s3a://[bucket-name]/files: 
> >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
> >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> >C9CF3F9C50EF08D1), S3 Extended Request ID: 
> >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
> This makes no sense. I have access to the bucket as one can see in the hdfs 
> test. And I've added the proper creds to hive-site.xml. 
> Anyone have any idea what's missing from this equation?
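
A quick way to check whether the credential properties are reaching the s3a 
connector at all (a diagnostic sketch; the key values and table name are 
placeholders) is to set them on the Hive session itself:
{code:sql}
-- If this succeeds while the core-site.xml/hive-site.xml route fails,
-- the XML configuration is not being picked up by Hive's filesystem client.
SET fs.s3a.access.key=YOUR_ACCESS_KEY;
SET fs.s3a.secret.key=YOUR_SECRET_KEY;
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.s3a_probe (id string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://[bucket-name]/files/';
{code}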



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error

2017-07-10 Thread Krishna Vaidyanath (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080465#comment-16080465
 ] 

Krishna Vaidyanath commented on HIVE-16983:
---

I've been working on this case with Alex, and can tell you that the same 
permissions, credentials, etc. all work correctly and as expected with s3n:// 
and s3://. They do not work with s3a://, despite setting the properties in 
core-site.xml that the code and documentation expect. 

We went through the troubleshooting docs, but they did not provide any insight 
or guidance for fixing this problem. 

Vlad, you're on to something; we will test with Joda-Time 2.9.9 - we have been 
using 2.8.1. 



> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>
> I've followed various published documentation on integrating Apache Hive 
> 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` 
> and 
> `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and 
> `hive/conf/hive-site.xml`.
> I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` 
> to work properly (it returns an s3 ls of that bucket), so I know my creds, 
> bucket access, and overall Hadoop setup are valid. 
> hdfs dfs -ls s3a://[bucket-name]/
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files
> ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files/my-csv.csv
> However, when I attempt to access the same s3 resources from hive, e.g. run 
> any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION 
> 's3a://[bucket-name]/files/'`, it fails. 
> for example:
> >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, 
> >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED 
> >BY ',' LOCATION 's3a://[bucket-name]/files/';
> I keep getting this error:
> >FAILED: Execution Error, return code 1 from 
> >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus 
> >on s3a://[bucket-name]/files: 
> >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
> >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> >C9CF3F9C50EF08D1), S3 Extended Request ID: 
> >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
> This makes no sense. I have access to the bucket as one can see in the hdfs 
> test. And I've added the proper creds to hive-site.xml. 
> Anyone have any idea what's missing from this equation?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-10 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17063:
---
Status: Patch Available  (was: In Progress)

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.1, 1.2.2, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table will not clear the real data. As a result, 
> running insert overwrite on the same partition twice fails because the target 
> data to be moved already exists.
> This happened when we re-populated partition data on an external table.
> I see the target data fails to be cleared only when {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to clear 
> any target file that already exists when renaming {{immediately generated 
> data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> 

[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-10 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17063:
---
Attachment: HIVE-17063.2.patch

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table will not clear the real data. As a result, 
> running insert overwrite on the same partition twice fails because the target 
> data to be moved already exists.
> This happened when we re-populated partition data on an external table.
> I see the target data fails to be cleared only when {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to clear 
> any target file that already exists when renaming {{immediately generated 
> data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-10 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17063:
---
Status: In Progress  (was: Patch Available)

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.1, 1.2.2, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table will not clear the real data. As a result, 
> running insert overwrite on the same partition twice fails because the target 
> data to be moved already exists.
> This happened when we re-populated partition data on an external table.
> I see the target data fails to be cleared only when {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to clear 
> any target file that already exists when renaming {{immediately generated 
> data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   

[jira] [Comment Edited] (HIVE-14487) Add REBUILD statement for materialized views

2017-07-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438566#comment-15438566
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14487 at 7/10/17 12:33 PM:
--

[~ekoifman], thanks for the feedback.

That is a fair point and something I had not considered yet; we do not do 
anything special in HIVE-14249, which would lead to inconsistent/incorrect 
results if a user queries the materialized view while it is being rebuilt. I 
guess raising an error should be enough. Then we would need to keep the state 
of the materialized view in the metastore? Or do you have any other idea?

I can 1) create a follow-up for this, as HIVE-14249 has passed QA and is ready 
to go in, 2) I can add the new logic to HIVE-14249, or 3) I can remove the 
logic for REBUILD completely from HIVE-14249 and put it all together in a new 
patch. I am inclined to go with 3. What is your take?


was (Author: jcamachorodriguez):
[~ekoifman], thanks for the feedback.

That is a fair point and something I had not considered yet; we do not do 
anything special in HIVE-14487, which would lead to inconsistent/incorrect 
results if a user queries the materialized view while it is being rebuilt. I 
guess raising an error should be enough. Then we would need to keep the state 
of the materialized view in the metastore? Or do you have any other idea?

I can 1) create a follow-up for this, as HIVE-14487 has passed QA and is ready 
to go in, 2) I can add the new logic to HIVE-14487, or 3) I can remove the 
logic for REBUILD completely from HIVE-14487 and put it all together in a new 
patch. I am inclined to go with 3. What is your take?

> Add REBUILD statement for materialized views
> 
>
> Key: HIVE-14487
> URL: https://issues.apache.org/jira/browse/HIVE-14487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> Support for rebuilding existing materialized views. The statement is the 
> following:
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14487) Add REBUILD statement for materialized views

2017-07-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080263#comment-16080263
 ] 

Jesus Camacho Rodriguez commented on HIVE-14487:


[~asomani], since then I have not had the chance to work on this. We went with 
option 3 described above, so the _rebuild_ option was not added in 
HIVE-14249. I hope to find the time to add the _REBUILD_ option for the 3.0 
release; in the meantime, contributions are welcome. Thanks

> Add REBUILD statement for materialized views
> 
>
> Key: HIVE-14487
> URL: https://issues.apache.org/jira/browse/HIVE-14487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> Support for rebuilding existing materialized views. The statement is the 
> following:
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17035) Optimizer: Lineage transform() should be invoked after rest of the optimizers are invoked

2017-07-10 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080239#comment-16080239
 ] 

Rajesh Balamohan commented on HIVE-17035:
-

RB: https://reviews.apache.org/r/60743/

> Optimizer: Lineage transform() should be invoked after rest of the optimizers 
> are invoked
> -
>
> Key: HIVE-17035
> URL: https://issues.apache.org/jira/browse/HIVE-17035
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-17035.1.patch, HIVE-17035.2.patch, 
> HIVE-17035.3.patch, HIVE-17035.4.patch
>
>
> In a fairly large query with tens of left joins, creating the lineageInfo 
> alone took 1500+ seconds. This is because the table had lots of columns, and 
> in some processing {{ReduceSinkLineage}} ended up handling 7000+ value 
> columns, even though only 50 columns were projected in the query. 
> It would be good to invoke the lineage transform after the rest of the 
> optimizers in {{Optimizer}} have been invoked. This would avoid unwanted 
> processing and improve the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17035) Optimizer: Lineage transform() should be invoked after rest of the optimizers are invoked

2017-07-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080221#comment-16080221
 ] 

Hive QA commented on HIVE-17035:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876397/HIVE-17035.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 74 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[join] 
(batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extract] (batchId=3)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[reduce_deduplicate_extended]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_table_update]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_table_update]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_result_complex]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_union_multiinsert]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_partitioned]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_map_operators]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_reducers_power_two]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_map_operators]
 (batchId=86)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_reducers_power_two]
 (batchId=86)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[mapreduce_stack_trace]
 (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[mapreduce_stack_trace_turnoff]
 (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[minimr_broken_pipe]
 (batchId=91)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[add_part_multiple] 
(batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join26] 
(batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join8] 
(batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_smb_mapjoin_14]
 (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[date_udf] 
(batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[filter_join_breaktask2]
 (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby4_map] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby4_map_skew] 
(batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_cube1] 
(batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_multi_single_reducer]
 (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_rollup1] 
(batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[input_part2] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join27] 
(batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join30] 
(batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32] 
(batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32_lessSize] 
(batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join33] 
(batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join38] 
(batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join8] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_map_ppr] 
(batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_gby2] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_with_join]
 (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[pcr] (batchId=125)

[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-07-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080217#comment-16080217
 ] 

Jesus Camacho Rodriguez commented on HIVE-16751:


[~leftylev], I think there is no need for extra documentation for this one, 
since it just makes the execution more efficient and is transparent to the 
end user. Thanks!

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we assumed that all group by 
> columns in a Druid query were of STRING type; however, this will no longer be 
> true (the result of EXTRACT is an INT and the result of FLOOR a TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.
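
A sketch of the kind of query affected (the Druid-backed table name is 
hypothetical; EXTRACT/FLOOR are the expressions Calcite pushes down):
{code:sql}
-- With CALCITE-1758, these grouping keys reach Druid as extraction
-- functions, so they are no longer all STRINGs.
SELECT EXTRACT(MONTH FROM `__time`) AS mnth,
       FLOOR(`__time` TO DAY) AS dy,
       COUNT(*)
FROM druid_table
GROUP BY EXTRACT(MONTH FROM `__time`), FLOOR(`__time` TO DAY);
{code}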



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

