[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508829#comment-15508829
 ] 

Hive QA commented on HIVE-14797:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829373/HIVE-14797.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1248/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1248/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1248/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829373 - PreCommit-HIVE-Build

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.
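To see the skew mechanism concretely, here is a minimal Java sketch (an editor-added illustration, not Hive code; the class name and the non-negative modulo step are assumptions): with 31 reducers the combined hash `31*hash(a) + hash(b)` collapses to `hash(b) % 31`, so column 'a' no longer spreads rows across reducers.

{code}
// Editor's illustration of the arithmetic above (not Hive code).
public class SkewDemo {
  public static void main(String[] args) {
    int reducers = 31;
    for (int a = 0; a < 3; a++) {
      for (int b = 100; b < 103; b++) {
        // combined hash as described: 31 * hash(a) + hash(b)
        int hashCode = 31 * Integer.hashCode(a) + Integer.hashCode(b);
        // non-negative modulo, Hadoop-partitioner style (assumption for illustration)
        int reducerNo = (hashCode & Integer.MAX_VALUE) % reducers;
        // reducerNo depends only on b: (31*h(a) + h(b)) % 31 == h(b) % 31
        System.out.printf("a=%d b=%d -> reducer %d%n", a, b, reducerNo);
      }
    }
  }
}
{code}

Running it shows every value of 'a' landing on the same reducer for a given 'b', which is exactly the skew the reporter describes.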



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508805#comment-15508805
 ] 

Rui Li commented on HIVE-14797:
---

[~roncenzhao] your solution also seems OK, and it's simpler.
I'd like to hear [~xuefuz]'s opinion.

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508789#comment-15508789
 ] 

Rui Li edited comment on HIVE-14797 at 9/21/16 5:16 AM:


Hmm, a random prime won't work because we need to make sure the same rows always 
have the same hash code. I can think of another way:
{code}
1. If we have only one field, we can just return the field's hash code.
2. If we have multiple fields, we can compute hash code as: 
P1*hash(F1)+...+Pn*hash(Fn). Where hash(Fn) is the hash code of the nth field, 
and {P1,...,Pn} is a deterministic series of prime numbers, e.g. {17,19,...}. 
Seems BigInteger::nextProbablePrime() can help generate the series.
{code}


was (Author: lirui):
Hmm random prime won't work because we need to make sure same rows always have 
same hash code. I can think of another way:
{code}
1. If we have only one field, we can just return the field's hash code.
2. If we have multiple fields, we can compute hash code as: 
P1*hash(F1)+...+Pn*hash(Fn). Where hash(Fn) is the hash code of the nth field, 
and {P1,...,Pn} is a deterministic series of prime numbers, e.g. {17,19,...}. 
Seems {{BigInteger::nextProbablePrime()}} can help generate the series.
{code}

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508789#comment-15508789
 ] 

Rui Li edited comment on HIVE-14797 at 9/21/16 5:15 AM:


Hmm, a random prime won't work because we need to make sure the same rows always 
have the same hash code. I can think of another way:
{code}
1. If we have only one field, we can just return the field's hash code.
2. If we have multiple fields, we can compute hash code as: 
P1*hash(F1)+...+Pn*hash(Fn). Where hash(Fn) is the hash code of the nth field, 
and {P1,...,Pn} is a deterministic series of prime numbers, e.g. {17,19,...}. 
Seems {{BigInteger::nextProbablePrime()}} can help generate the series.
{code}


was (Author: lirui):
Hmm random prime won't work because we need to make sure same rows always have 
same hash code. I can think of another way:
1. If we have only one field, we can just return the field's hash code.
2. If we have multiple fields, we can compute hash code as: 
P1*hash(F1)+...+Pn*hash(Fn). Where hash(Fn) is the hash code of the nth field, 
and {P1,...,Pn} is a deterministic series of prime numbers, e.g. {17,19,...}. 
Seems {{BigInteger::nextProbablePrime()}} can help generate the series.

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508789#comment-15508789
 ] 

Rui Li commented on HIVE-14797:
---

Hmm, a random prime won't work because we need to make sure the same rows always 
have the same hash code. I can think of another way:
1. If we have only one field, we can just return the field's hash code.
2. If we have multiple fields, we can compute hash code as: 
P1*hash(F1)+...+Pn*hash(Fn). Where hash(Fn) is the hash code of the nth field, 
and {P1,...,Pn} is a deterministic series of prime numbers, e.g. {17,19,...}. 
Seems {{BigInteger::nextProbablePrime()}} can help generate the series.
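A rough Java sketch of this proposal follows (editor-added; the class and method names are hypothetical and this is not the attached patch): a deterministic prime series starting at 17 is generated with BigInteger.nextProbablePrime() and used as the per-field multipliers.

{code}
import java.math.BigInteger;

// Sketch of the proposal above: hash = P1*hash(F1) + ... + Pn*hash(Fn),
// with {P1,...,Pn} a deterministic prime series (17, 19, 23, ...).
public class PrimeSeriesHashSketch {
  static int bucketHash(Object... fields) {
    if (fields.length == 1) {
      return fields[0].hashCode();             // single field: return its hash directly
    }
    BigInteger prime = BigInteger.valueOf(17); // first prime of the series
    int hashCode = 0;
    for (Object field : fields) {
      hashCode += prime.intValue() * field.hashCode();
      prime = prime.nextProbablePrime();       // deterministic, so equal rows hash equally
    }
    return hashCode;
  }

  public static void main(String[] args) {
    System.out.println(bucketHash(1, "foo"));
    System.out.println(bucketHash(1, "foo"));  // same inputs -> same hash code
  }
}
{code}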

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread roncenzhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508756#comment-15508756
 ] 

roncenzhao commented on HIVE-14797:
---

Or we can use the following approach:
Let the seed have two options: 31 and 131. In `ReduceSinkOperator` we can get 
the reducer number, named `reduceNum`, and then choose the other seed value if 
`reduceNum` is equal to 31 or 131.
Is that OK?
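A hypothetical sketch of this suggestion (editor-added; the helper name is made up and this is not the attached patch): the multiplier seed is chosen so that it never equals the reducer count.

{code}
// Sketch of the seed-switching idea above.
public class SeedChoiceSketch {
  static int chooseSeed(int reduceNum) {
    // two candidate seeds; fall back to the other one when reduceNum collides
    return (reduceNum == 31) ? 131 : 31;
  }

  public static void main(String[] args) {
    System.out.println(chooseSeed(31));   // 131: avoids collapsing to hash(b) % 31
    System.out.println(chooseSeed(131));  // 31
    System.out.println(chooseSeed(40));   // 31 (default seed)
  }
}
{code}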

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508650#comment-15508650
 ] 

Hive QA commented on HIVE-14803:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829480/HIVE-14803.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_allchildsarenull]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_count_distinct]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[optimize_nullscan]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_25]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1247/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1247/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1247/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829480 - PreCommit-HIVE-Build

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number of 
> partitions is inserted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508643#comment-15508643
 ] 

Rui Li commented on HIVE-14797:
---

If the user specifies #reducers to be 31, we shouldn't change it. Is it possible 
to use random prime numbers to compute the hash code?

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508636#comment-15508636
 ] 

Ferdinand Xu commented on HIVE-14029:
-

Hi [~lirui] 
bq. Is there any other way we can track the read method? If not, guess we can 
just remove the class from Hive side.

I will investigate this in a separate JIRA. Thank you for pointing this out.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508625#comment-15508625
 ] 

Lefty Leverenz commented on HIVE-14793:
---

Should this be documented now, or wait and do it with the rest of HIVE-14744?

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, 
> HIVE-14793.03.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: HIVE-14029.2.patch

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508520#comment-15508520
 ] 

Hive QA commented on HIVE-14782:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829483/HIVE-14782.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 105 failed/errored test(s), 10554 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[acid_overwrite]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_partition_coltype_2columns]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_partition_coltype_invalidcolname]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_partition_coltype_invalidtype]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_as_select_not_exist]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_as_select_with_partition]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure7]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[analyze1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[analyze]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_corrupt]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi7]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[bad_sample_clause]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view7]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view8]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[delete_non_acid_table]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[desc_failure2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[dyn_part2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[dyn_part4]

[jira] [Updated] (HIVE-14412) Add a timezone-aware timestamp

2016-09-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-14412:
--
Attachment: (was: HIVE-14412.5.patch)

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, 
> HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it's by 
> itself unambiguous, ambiguity comes when we parse a string into timestamp, or 
> convert a timestamp to string, causing problems like HIVE-14305.
> To solve the issue, I think we should make timestamp aware of timezone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508499#comment-15508499
 ] 

Rui Li commented on HIVE-14412:
---

Most of the recent failures are because "TIME" is added as a new keyword, so it 
can't be used as a column name. So I have to rename the columns in the qtests. 
This may also require users to update their existing queries. Do you guys think 
this is acceptable?

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, 
> HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch, HIVE-14412.5.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it's by 
> itself unambiguous, ambiguity comes when we parse a string into timestamp, or 
> convert a timestamp to string, causing problems like HIVE-14305.
> To solve the issue, I think we should make timestamp aware of timezone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508437#comment-15508437
 ] 

Rui Li commented on HIVE-14029:
---

I agree to move this forward. HIVE-14240 can be done in parallel, if it doesn't 
depend on this one :)

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508419#comment-15508419
 ] 

Rui Li commented on HIVE-14714:
---

+1
Thanks for the update [~gszadovszky]. I'll commit this shortly if no one has 
any other comments.

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> ---
>
> Key: HIVE-14714
> URL: https://issues.apache.org/jira/browse/HIVE-14714
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
> Attachments: HIVE-14714.2.patch, HIVE-14714.3.patch, HIVE-14714.patch
>
>
> After executing a Hive command with Spark, finishing the Beeline session or
> even switching the engine causes an IOException. The following was produced by
> using Ctrl-D to finish the session, but "!quit" or even
> "set hive.execution.engine=mr;" causes the same issue.
> From HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote 
> driver, interrupting...
> 2016-09-06 16:15:12,291 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN  org.apache.hive.spark.client.SparkClientImpl: 
> [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
> at 
> java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:154)
> at java.io.BufferedReader.readLine(BufferedReader.java:317)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at 
> org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread roncenzhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

roncenzhao updated HIVE-14797:
--
Status: Patch Available  (was: Open)

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread roncenzhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508405#comment-15508405
 ] 

roncenzhao commented on HIVE-14797:
---

Yes, we cannot hard-code the number (31). But we cannot know which number will 
be set before the end of the job. 
So I think we can solve it easily in the following way:
In the method "Utilities.estimateReducers(xxx)", when the `reducers` value is 
divisible by 31, we add 1 to it.
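A minimal sketch of the adjustment described above (editor-added; hypothetical helper, not the attached patch): after estimating the reducer count, bump it by one whenever it is divisible by 31.

{code}
// Sketch of the reducer-count adjustment described above.
public class ReducerAdjustSketch {
  static int adjustReducers(int reducers) {
    // if the estimate is divisible by 31, add one so the reducer index
    // no longer collapses to hash(lastField) % 31
    return (reducers > 0 && reducers % 31 == 0) ? reducers + 1 : reducers;
  }

  public static void main(String[] args) {
    System.out.println(adjustReducers(31));  // 32
    System.out.println(adjustReducers(62));  // 63
    System.out.println(adjustReducers(40));  // 40 (unchanged)
  }
}
{code}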

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multiplying by 31, key by key, as 
> implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables called tbl1 and tbl2, and they have the same columns: a int, 
> b string. The values of column 'a' in both tables are not skewed, but the 
> values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As 
> a result, the job will be skewed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508403#comment-15508403
 ] 

Rui Li commented on HIVE-14240:
---

We have two kinds of tests for HoS - TestSparkCliDriver runs on local-cluster, 
and TestMiniSparkOnYarnCliDriver runs on a mini YARN cluster. I know 
local-cluster is not intended to be used outside Spark. So if local-cluster 
causes trouble for this task, I think it's acceptable to migrate the qtests in 
TestSparkCliDriver to TestMiniSparkOnYarnCliDriver.

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take Spark artifacts and shade Kryo inside them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver

2016-09-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14782:
-
Attachment: HIVE-14782.2.patch

All test cases just reference the src table, so we are reusing the init/cleanup 
scripts of the encrypted driver.

> Improve runtime of NegativeMinimrCliDriver
> --
>
> Key: HIVE-14782
> URL: https://issues.apache.org/jira/browse/HIVE-14782
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14782.1.patch, HIVE-14782.2.patch
>
>
> NegativeMinimrCliDriver is one of the slowest test batches. The actual test 
> takes only 3 minutes, whereas initialization of the test takes around 15 minutes. 
> Also remove the hadoop20.q tests from the NegativeMinimrCliDriver batch, as 
> hadoop20 is no longer supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-09-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14803:

Target Version/s: 2.2.0
  Status: Patch Available  (was: Open)

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number of 
> partitions is inserted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver

2016-09-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14782:
-
Description: 
NegativeMinimrCliDriver is one of the slowest test batches. The actual test takes 
only 3 minutes, whereas initialization of the test takes around 15 minutes. Also 
remove the hadoop20.q tests from the NegativeMinimrCliDriver batch, as hadoop20 is 
no longer supported.


  was:mapreduce_stack_trace_hadoop20.q runs as an isolated test which is no 
longer required as we no longer support hadoop 0.20.x


> Improve runtime of NegativeMinimrCliDriver
> --
>
> Key: HIVE-14782
> URL: https://issues.apache.org/jira/browse/HIVE-14782
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14782.1.patch
>
>
> NegativeMinimrCliDriver is one of the slowest test batches. The actual test 
> takes only 3 minutes, whereas initialization of the test takes around 15 minutes. 
> Also remove the hadoop20.q tests from the NegativeMinimrCliDriver batch, as 
> hadoop20 is no longer supported.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver

2016-09-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14782:
-
Summary: Improve runtime of NegativeMinimrCliDriver  (was: Remove 
mapreduce_stack_trace_hadoop20.q as we no longer have hadoop20)

> Improve runtime of NegativeMinimrCliDriver
> --
>
> Key: HIVE-14782
> URL: https://issues.apache.org/jira/browse/HIVE-14782
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14782.1.patch
>
>
> mapreduce_stack_trace_hadoop20.q runs as an isolated test which is no longer 
> required as we no longer support hadoop 0.20.x



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508363#comment-15508363
 ] 

liyunzhang_intel commented on HIVE-14240:
-

[~Ferd]: 
bq. In Pig, they don't require Spark distribution since they only test Spark 
standalone mode in their integration test.

In Pig on Spark, we don't need to download a Spark distribution to run unit tests 
because we currently only enable "local" (SPARK_MASTER) mode; we don't support 
standalone, yarn-client, or yarn-cluster modes yet. We just [copy all Spark 
dependency jars published from the mvn repository to the run-time 
classpath|https://github.com/apache/pig/blob/spark/bin/pig#L399] when running 
unit tests.

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take Spark artifacts and shade Kryo inside them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508353#comment-15508353
 ] 

Dapeng Sun edited comment on HIVE-14029 at 9/21/16 1:24 AM:


[~Ferd]
Yes, I used this command


was (Author: dapengsun):
Yes, I used this command

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508353#comment-15508353
 ] 

Dapeng Sun commented on HIVE-14029:
---

Yes, I used this command

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508350#comment-15508350
 ] 

Ferdinand Xu commented on HIVE-14029:
-

Hi [~spena], I think we should move this forward since HIVE-14240 needs further 
discussion and it doesn't block this ticket. We can upload the tgz to a 
stable location to upgrade the Spark version, and once we fix HIVE-14240 we 
can easily remove the tgz. [~lirui] [~stakiar] [~aihuaxu] any thoughts?

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508316#comment-15508316
 ] 

Ferdinand Xu edited comment on HIVE-14029 at 9/21/16 1:16 AM:
--

Hi [~stakiar], the tgz was built via the following commands:
{code}
sh ./dev/make-distribution.sh  --name hadoop2-without-hive --tgz -Phadoop-2.7 
-Pyarn -Pparquet-provided -Dhadoop.version=2.7.3
{code}
[~dapengsun], can you confirm it please?


was (Author: ferd):
Hi [~stakiar], the tgz was built via the following commands:
{code}
sh ./dev/make-distribution.sh  --name hadoop2-without-hive --tgz -Phadoop-2.7 
-Pyarn -Pparquet-provided -Dhadoop.version=2.7.3
{code}


> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508335#comment-15508335
 ] 

Ferdinand Xu commented on HIVE-14240:
-

Thanks [~stakiar] for your input. 
AFAIK, TestSparkCliDriver needs SparkSubmit to submit a job, which requires 
SPARK_HOME to point to a Spark distribution because it tests Spark on YARN. 
[~kellyzly] [~mohitsabharwal], please correct me if any of the following 
statements are wrong. In Pig, they don't require a Spark distribution since they 
only test Spark standalone mode in their integration tests.


> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take Spark artifacts and shade Kryo inside them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset

2016-09-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14803:

Attachment: HIVE-14803.1.patch

Observed a 12% improvement in runtime with a 100-partition dataset.

\cc [~ashutoshc]

> S3: Stats gathering for insert queries can be expensive for partitioned 
> dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14803.1.patch
>
>
> StatsTask's aggregateStats populates stats details for all partitions by 
> checking the file sizes, which turns out to be expensive when a large number of 
> partitions is inserted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508316#comment-15508316
 ] 

Ferdinand Xu commented on HIVE-14029:
-

Hi [~stakiar], the tgz was built via the following commands:
{code}
sh ./dev/make-distribution.sh  --name hadoop2-without-hive --tgz -Phadoop-2.7 
-Pyarn -Pparquet-provided -Dhadoop.version=2.7.3
{code}


> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit from those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508218#comment-15508218
 ] 

Sahil Takiar commented on HIVE-14240:
-

I looked into this today and tried to get something working, but I don't think 
it's possible without making some modifications to Spark.

* The HoS integration tests run with {{spark.master=local-cluster[2,2,1024]}}
** Basically, the {{TestSparkCliDriver}} JVM runs the SparkSubmit command (which 
spawns a new process), and the SparkSubmit process then creates 2 more 
processes (the Spark Executors, which do the actual work) with 2 cores and 1024 MB 
of memory each
** The {{local-cluster}} option is not present in the Spark docs because it is 
mainly used for integration testing within the Spark project itself; it 
basically provides a way of deploying a mini cluster locally
** The advantage of {{local-cluster}} is that it does not require Spark 
Masters or Workers to be running
*** Spark Workers are basically like NodeManagers, and a Spark Master is basically 
like HS2
* I looked through the Spark code that launches the actual Spark Executors, and it 
more or less requires a {{SPARK_HOME}} directory to be present (ref: 
https://github.com/apache/spark/blob/branch-2.0/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java)
** {{SPARK_HOME}} is supposed to point to a directory containing a Spark 
distribution

Thus, we would need to modify the {{AbstractCommandBuilder.java}} class in 
Spark so that it doesn't require {{SPARK_HOME}} to be set. However, I'm not 
sure how difficult this will be to do in Spark.

We could change the {{spark.master}} from {{local-cluster}} to {{local}}, in 
which case everything will be run locally. However, I think this removes some 
functionality from the HoS tests since running locally isn't the same as 
running against a real mini-cluster.
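For reference, a small editor-added Java sketch of the {{local-cluster}} master string described above (assuming only Spark's public SparkConf API; this is not how Hive actually wires its test configuration). The bracketed values encode workers, cores per worker, and memory per worker in MB:

{code}
import org.apache.spark.SparkConf;

// Illustrative only: sets the same kind of master string the HoS tests use.
public class LocalClusterConfDemo {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setMaster("local-cluster[2,2,1024]")  // 2 workers, 2 cores, 1024 MB each
        .setAppName("hos-itest-sketch");
    System.out.println(conf.get("spark.master"));
  }
}
{code}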

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take Spark artifacts and shade Kryo inside them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508218#comment-15508218
 ] 

Sahil Takiar edited comment on HIVE-14240 at 9/21/16 12:16 AM:
---

I looked into this today and tried to get something working, but I don't think 
it's possible without making some modifications to Spark.

* The HoS integration tests run with {{spark.master=local-cluster[2,2,1024]}}
** Basically, the {{TestSparkCliDriver}} JVM runs the SparkSubmit command (which 
spawns a new process); the SparkSubmit process then creates 2 more 
processes (the Spark Executors, which do the actual work) with 2 cores and 1024 MB 
of memory each
** The {{local-cluster}} option is not present in the Spark docs because it is 
mainly used for integration testing within the Spark project itself; it 
basically provides a way of deploying a mini cluster locally
** The advantage of the {{local-cluster}} is that it does not require Spark 
Masters or Workers to be running
*** Spark Workers are basically like NodeManagers, a Spark Master is basically 
like HS2
* Looked through the Spark code that launches actual Spark Executors and they 
more or less require a {{SPARK_HOME}} directory to be present (ref: 
https://github.com/apache/spark/blob/branch-2.0/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java)
** {{SPARK_HOME}} is supposed to point to a directory containing a Spark 
distribution

Thus, we would need to modify the {{AbstractCommandBuilder.java}} class in 
Spark so that it doesn't require {{SPARK_HOME}} to be set. However, I'm not 
sure how difficult this will be to do in Spark.

We could change the {{spark.master}} from {{local-cluster}} to {{local}}, in 
which case everything will be run locally. However, I think this removes some 
functionality from the HoS tests since running locally isn't the same as 
running against a real mini-cluster.


was (Author: stakiar):
I looked into this today and tried to get something working, but I don't think 
its possible without making some modifications to Spark.

* The HoS integration tests run with {{spark.master=local-cluster[2,2,1024]}}
** Basically, the {{TestSparkCliDriver}} JVM run the SparkSubmit command (which 
will spawn a new process), the SparkSubmit process will then create 2 more 
processes (the Spark Executors do the actual work) with 2 cores and 1024 Mb 
memory each
** The {{local-cluster}} option is not present in the Spark docs because it is 
mainly used for integration testing within the Spark project itself; it 
basically provides a way of deploying a mini cluster locally
** The advantage of the {{local-cluster}} is that it does not require Spark 
Masters or Workers to be running
*** Spark Workers are basically like NodeManagers, a Spark Master is basically 
like HS2
* Looked through the Spark code that launches actual Spark Executors and they 
more or less require a {{SPARK_HOME}} directory to be present (ref: 
https://github.com/apache/spark/blob/branch-2.0/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java)
** {{SPARK_HOME}} is suppose to point to a directory containing a Spark 
distribution

Thus, we would need to modify the {{AbstractCommandBuilder.java}} class in 
Spark so that it doesn't require {{SPARK_HOME}} to be set. However, I'm not 
sure how difficult this will be to do in Spark.

We could change the {{spark.master} from {{local-cluster}} to {{local}}, in 
which case everything will be run locally. However, I think this removes some 
functionality from the HoS tests since running locally isn't the same as 
running against a real mini-cluster.

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. 

[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508203#comment-15508203
 ] 

Daniel Dai commented on HIVE-14801:
---

+1

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508086#comment-15508086
 ] 

Hive QA commented on HIVE-14801:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829450/HIVE-14801.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1245/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1245/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1245/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829450 - PreCommit-HIVE-Build

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508063#comment-15508063
 ] 

Thejas M Nair commented on HIVE-14801:
--

[~sseth]

Looking at mvn outputs, yes, the difference looks larger -

Before -
Hive Integration - Unit Tests .. SUCCESS [ 35.974 s]
Total time: 01:28 min

After -
Hive Integration - Unit Tests .. SUCCESS [ 26.785 s]
Total time: 01:09 min

Though there seems to be more noise when the total time is considered.

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat

2016-09-20 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508057#comment-15508057
 ] 

Sushanth Sowmyan commented on HIVE-13853:
-

That is how I had started testing, but HS2 has some quirks around filter load time, 
because of which the filter has to be loaded explicitly at HS2 start time. Thus, this 
covers changes to HS2 startup as well, not simply the filter.

> Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
> -
>
> Key: HIVE-13853
> URL: https://issues.apache.org/jira/browse/HIVE-13853
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, WebHCat
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13853.2.patch, HIVE-13853.patch
>
>
> There is a possibility that there may be a CSRF-based attack on various 
> hadoop components, and thus, there is an effort to add a block for all 
> incoming http requests if they do not contain a X-XSRF-Header header. (See 
> HADOOP-12691 for motivation)
> This has potential to affect HS2 when running on thrift-over-http mode(if 
> cookie-based-auth is used), and webhcat.
> We introduce new flags to determine whether or not we're using the filter, 
> and if we are, we will automatically reject any http requests which do not 
> contain this header.
> To allow this to work, we also need to make changes to our JDBC driver to 
> automatically inject this header into any requests it makes. Also, any 
> client-side programs/api not using the JDBC driver directly will need to make 
> changes to add a X-XSRF-Header header to the request to make calls to 
> HS2/WebHCat if this filter is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved HIVE-14793.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Committed to master. Precommit not required, since this makes no difference on 
an already deployed precommit run. Tested offline.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, 
> HIVE-14793.03.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14793:
--
Attachment: HIVE-14793.03.patch

Updated.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, 
> HIVE-14793.03.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508003#comment-15508003
 ] 

Siddharth Seth commented on HIVE-14793:
---

Oops. I'll generate it again properly. The diff is because of line spacing. It 
should not have shown up.
Thanks for the review.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14800) Handle off by 3 in ORC split generation based on split strategy used

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507998#comment-15507998
 ] 

Siddharth Seth commented on HIVE-14800:
---

They are valid splits - however, it should be possible to make them consistent 
when splits are generated by ORC itself: either special-case BI or ETL so that 
both generate the same starting split for a file.

In terms of hashCode for consistent splits - that should be independent of the 
format.
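
As a purely illustrative sketch (hypothetical names, not the actual Hive or ORC code), a format-independent location hash could normalize away the header offset before hashing, so that the BI (offset 0) and ETL (offset 3) starting splits map to the same location:

{code}
// Hypothetical sketch: a split-location hash that treats a start offset of 0 and 3
// (the length of the "ORC" magic bytes) as the same starting split of a file.
import java.nio.charset.StandardCharsets;

public final class FormatIndependentSplitHash {
  private static final long ORC_HEADER_LEN = 3;

  private FormatIndependentSplitHash() {}

  public static int locationHash(String path, long start) {
    long effectiveStart = (start <= ORC_HEADER_LEN) ? 0L : start;
    int h = 17;
    for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
      h = 31 * h + b;
    }
    return 31 * h + (int) (effectiveStart ^ (effectiveStart >>> 32));
  }
}
{code}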

> Handle off by 3 in ORC split generation based on split strategy used
> 
>
> Key: HIVE-14800
> URL: https://issues.apache.org/jira/browse/HIVE-14800
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>
> BI will apparently generate splits starting at offset 0.
> ETL will skip the ORC header and generate a split starting at offset 3.
> There's a workaround in the HiveSplitGenerator to handle this for consistent 
> splits. Ideally, Orc split generation should take care of this.
> cc [~prasanth_j], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507954#comment-15507954
 ] 

Sergio Peña commented on HIVE-14793:


Patch looks good.
+1

However, I see the outputDir change in the patch is already on master.
https://github.com/apache/hive/blob/master/dev-support/jenkins-execute-build.sh#L45

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755
 ] 

Siddharth Seth edited comment on HIVE-14793 at 9/20/16 9:58 PM:


bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq. --outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir once it runs successfully.


was (Author: sseth):
bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq .outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir once it runs successfully.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755
 ] 

Siddharth Seth edited comment on HIVE-14793 at 9/20/16 9:57 PM:


bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq .outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir once it runs successfully.


was (Author: sseth):
bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq .--outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir once it runs successfully.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14793:
--
Attachment: HIVE-14793.02.patch

Updated patch, which removes the change to target/. [~spena] - could you please 
take another look?

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507855#comment-15507855
 ] 

Siddharth Seth commented on HIVE-14801:
---

[~thejas] - you may want to look at mvn test output, rather than the junit 
result file. The junit result file does not include the setup time. My guess is 
that the metastore is started during initialization, so the overall saving may be 
more than 5 seconds.

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507845#comment-15507845
 ] 

Thejas M Nair commented on HIVE-14801:
--

Overall around 5 seconds saved -
cc [~sseth]

Before -
{code}

<testsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="https://maven.apache.org/surefire/maven-surefire-plugin/xsd/surefire-test-report.xsd"
           name="org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation"
           time="14.252" tests="6" errors="0" skipped="0" failures="0">
  ...
</testsuite>
{code}

After - 
{code}

<testsuite xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="https://maven.apache.org/surefire/maven-surefire-plugin/xsd/surefire-test-report.xsd"
           name="org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation"
           time="9.412" tests="6" errors="0" skipped="0" failures="0">
  ...
</testsuite>
{code}


> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14801:
-
Attachment: HIVE-14801.2.patch

2.patch - Moving the setup into a one-time setup saves at least another second.
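
To illustrate the general idea only (not the actual patch; class and field names are made up), the test can point at an embedded metastore and build the client once per class:

{code}
// Illustrative sketch: embedded metastore plus one-time setup in a JUnit 4 test.
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.junit.BeforeClass;

public class EmbeddedMetastoreTestSketch {
  private static HiveMetaStoreClient client;

  @BeforeClass
  public static void setUpOnce() throws Exception {
    HiveConf conf = new HiveConf();
    // An empty metastore URI makes the client use an embedded, in-process metastore,
    // so no remote metastore process (and no port) is needed.
    conf.setVar(HiveConf.ConfVars.METASTOREURIS, "");
    client = new HiveMetaStoreClient(conf);
  }
}
{code}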

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507830#comment-15507830
 ] 

Siddharth Seth commented on HIVE-13853:
---

Can this be written as a unit test instead of a test that requires HS2 to be 
brought up? Test the functionality of the filter independently of where it is 
running.
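
For example (illustrative only, with a stand-in filter rather than the real class from the patch), the header check can be exercised with mocked servlet objects:

{code}
// Illustrative sketch: unit-test the X-XSRF-HEADER rule with mocks, no HS2 required.
// StandInXsrfFilter is a stand-in for the actual filter added by the patch.
import static org.mockito.Mockito.*;

import javax.servlet.FilterChain;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.junit.Test;

public class XsrfFilterUnitTestSketch {

  /** Stand-in: reject any request that does not carry the X-XSRF-HEADER header. */
  static class StandInXsrfFilter {
    void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
        throws Exception {
      if (((HttpServletRequest) req).getHeader("X-XSRF-HEADER") == null) {
        ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_BAD_REQUEST);
        return;
      }
      chain.doFilter(req, resp);
    }
  }

  @Test
  public void requestWithoutHeaderIsRejected() throws Exception {
    HttpServletRequest request = mock(HttpServletRequest.class);
    HttpServletResponse response = mock(HttpServletResponse.class);
    FilterChain chain = mock(FilterChain.class);
    when(request.getHeader("X-XSRF-HEADER")).thenReturn(null);

    new StandInXsrfFilter().doFilter(request, response, chain);

    verify(response).sendError(HttpServletResponse.SC_BAD_REQUEST);
    verify(chain, never()).doFilter(request, response);
  }
}
{code}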

> Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
> -
>
> Key: HIVE-13853
> URL: https://issues.apache.org/jira/browse/HIVE-13853
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, WebHCat
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13853.2.patch, HIVE-13853.patch
>
>
> There is a possibility that there may be a CSRF-based attack on various 
> hadoop components, and thus, there is an effort to add a block for all 
> incoming http requests if they do not contain a X-XSRF-Header header. (See 
> HADOOP-12691 for motivation)
> This has potential to affect HS2 when running on thrift-over-http mode(if 
> cookie-based-auth is used), and webhcat.
> We introduce new flags to determine whether or not we're using the filter, 
> and if we are, we will automatically reject any http requests which do not 
> contain this header.
> To allow this to work, we also need to make changes to our JDBC driver to 
> automatically inject this header into any requests it makes. Also, any 
> client-side programs/api not using the JDBC driver directly will need to make 
> changes to add a X-XSRF-Header header to the request to make calls to 
> HS2/WebHCat if this filter is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507799#comment-15507799
 ] 

Hive QA commented on HIVE-14713:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829432/HIVE-14713.2.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10626 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1244/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1244/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1244/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829432 - PreCommit-HIVE-Build

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch
>
>
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14801:
-
Status: Patch Available  (was: Open)

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14801:
-
Description: 
TestPartitionNameWhitelistValidation uses remote metastore. However, there can 
be multiple issues around startup of remote metastore, including race 
conditions in finding available port. In addition, all the initialization done 
at startup of remote metastore is likely to make the test case take more time.
This test case doesn't need remote metastore, so it should be moved to using 
embedded metastore.

  was:
TestPartitionNameWhitelistValidation uses remote metastore. However, there can 
be multiple issues around startup of remote metastore, including race 
conditions in finding available port. In addition, all the initialization done 
at startup of remote metastore is likely to make the test case take more time.



> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability

2016-09-20 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14801:
-
Attachment: HIVE-14801.1.patch

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch
>
>
> TestPartitionNameWhitelistValidation uses remote metastore. However, there 
> can be multiple issues around startup of remote metastore, including race 
> conditions in finding available port. In addition, all the initialization 
> done at startup of remote metastore is likely to make the test case take more 
> time.
> This test case doesn't need remote metastore, so it should be moved to using 
> embedded metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755
 ] 

Siddharth Seth edited comment on HIVE-14793 at 9/20/16 8:49 PM:


bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq .--outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir once it runs successfully.


was (Author: sseth):
bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq .--outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755
 ] 

Siddharth Seth commented on HIVE-14793:
---

bq. 1. Can we create a new function that checks and/or initializes environment 
variables? I think this would be useful for new devs when looking at what 
config variables can be used.
I don't think we should do this in the current jira. There's a lot more 
variables other than the ones added here.
Beyond that, this may not be a great approach since it separates the logic for 
processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only 
required while setting up the branch. I don't think a separate method to take 
care of this, along with initialization of other methods helps a lot.

bq .--outputDir is not necessary anymore.
Reverting the change and testing (this still leaves the outputDir, but I 
believe it's required by the ptest client). Will post a patch after testing 
with the reverted outputDir.

> Allow ptest branch to be specified, PROFILE override
> 
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14793.01.patch
>
>
> Post HIVE-14734 - the profile is automatically determined. Add an option to 
> override this via Jenkins. Also add an option to specify the branch from 
> which ptest is built (This is hardcoded to github.com/apache/hive)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14461) Investigate HBaseMinimrCliDriver tests

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507714#comment-15507714
 ] 

Siddharth Seth edited comment on HIVE-14461 at 9/20/16 8:32 PM:


Yes. The initial plan was to remove it, but it looks like the test does not exist in 
CliDriver - so I'm moving it over.


was (Author: sseth):
Yes.

> Investigate HBaseMinimrCliDriver tests
> --
>
> Key: HIVE-14461
> URL: https://issues.apache.org/jira/browse/HIVE-14461
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Siddharth Seth
> Attachments: HIVE-14461.01.patch
>
>
> during HIVE-1 I've encountered an odd thing:
> HBaseMinimrCliDriver only executes a single test... and that test is set using 
> the qfile selector... which looks out-of-place.
> The only test it executes doesn't follow the regular qtest file naming... and has 
> an extension 'm'.
> At least the file should be renamed... but I think the change wasn't 
> intentional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507714#comment-15507714
 ] 

Siddharth Seth commented on HIVE-14461:
---

Yes.

> Investigate HBaseMinimrCliDriver tests
> --
>
> Key: HIVE-14461
> URL: https://issues.apache.org/jira/browse/HIVE-14461
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Siddharth Seth
> Attachments: HIVE-14461.01.patch
>
>
> during HIVE-1 I've encountered an odd thing:
> HBaseMinimrCliDriver only executes a single test... and that test is set using 
> the qfile selector... which looks out-of-place.
> The only test it executes doesn't follow the regular qtest file naming... and has 
> an extension 'm'.
> At least the file should be renamed... but I think the change wasn't 
> intentional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507644#comment-15507644
 ] 

Sahil Takiar commented on HIVE-14240:
-

Hey [~Ferd] I haven't had time to look into this, although it shouldn't be 
particularly difficult (I would hope). I don't think this blocks HIVE-14029 but 
I'm trying to talk to some Spark committers to see what they think.

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take Spark artifacts and shade Kryo inside them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507641#comment-15507641
 ] 

Sahil Takiar commented on HIVE-14029:
-

[~Ferd] how was the 
http://blog.sundp.me/spark/spark-2.0.0-bin-hadoop2-without-hive.tgz built?

I don't think HIVE-14240 is a blocker for this assuming the tar-ball was built 
in a supported way, but I'm trying to contact some Spark committers to see if 
they have any input.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat

2016-09-20 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507612#comment-15507612
 ] 

Sushanth Sowmyan commented on HIVE-13853:
-

[~sseth], I've been looking at some other tests, and I came to a similar 
question. The runtime is not the actual problem of this test; the problem is the 
miniHS2 start, and we do need miniHS2 to verify that HS2 is able to apply this 
filter properly.

Things we could do to improve:

a) batch the miniHS2 tests together - however, this can be problematic as each 
test might do a different confOverlay in the beginning
b) look into why miniHS2 takes so long to start sometimes.

> Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
> -
>
> Key: HIVE-13853
> URL: https://issues.apache.org/jira/browse/HIVE-13853
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, WebHCat
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13853.2.patch, HIVE-13853.patch
>
>
> There is a possibility that there may be a CSRF-based attack on various 
> hadoop components, and thus, there is an effort to add a block for all 
> incoming http requests if they do not contain a X-XSRF-Header header. (See 
> HADOOP-12691 for motivation)
> This has potential to affect HS2 when running on thrift-over-http mode(if 
> cookie-based-auth is used), and webhcat.
> We introduce new flags to determine whether or not we're using the filter, 
> and if we are, we will automatically reject any http requests which do not 
> contain this header.
> To allow this to work, we also need to make changes to our JDBC driver to 
> automatically inject this header into any requests it makes. Also, any 
> client-side programs/api not using the JDBC driver directly will need to make 
> changes to add a X-XSRF-Header header to the request to make calls to 
> HS2/WebHCat if this filter is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-20 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507583#comment-15507583
 ] 

Illya Yalovyy commented on HIVE-14713:
--

I have updated Patch and CR with a fixed version.

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch
>
>
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-20 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14713:
-
Status: Open  (was: Patch Available)

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch
>
>
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-20 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14713:
-
Attachment: HIVE-14713.2.patch

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch
>
>
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests

2016-09-20 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14713:
-
Status: Patch Available  (was: Open)

> LDAP Authentication Provider should be covered with unit tests
> --
>
> Key: HIVE-14713
> URL: https://issues.apache.org/jira/browse/HIVE-14713
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch
>
>
> Currently LdapAuthenticationProviderImpl class is not covered with unit 
> tests. To make this class testable some minor refactoring will be required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14800) Handle off by 3 in ORC split generation based on split strategy used

2016-09-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507454#comment-15507454
 ] 

Gopal V commented on HIVE-14800:


They are both valid splits, so unlikely to need changes there - the only issue 
is with external consumption of those details, like using those values as 
hashcode inputs.

We could implement a hashcode directly out of ORC though.

> Handle off by 3 in ORC split generation based on split strategy used
> 
>
> Key: HIVE-14800
> URL: https://issues.apache.org/jira/browse/HIVE-14800
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>
> BI will apparently generate splits starting at offset 0.
> ETL will skip the ORC header and generate a split starting at offset 3.
> There's a workaround in the HiveSplitGenerator to handle this for consistent 
> splits. Ideally, Orc split generation should take care of this.
> cc [~prasanth_j], [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14624) LLAP: Use FQDN when submitting work to LLAP

2016-09-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14624:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master

> LLAP: Use FQDN when submitting work to LLAP 
> 
>
> Key: HIVE-14624
> URL: https://issues.apache.org/jira/browse/HIVE-14624
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, 
> HIVE-14624.03.patch, HIVE-14624.patch
>
>
> {code}
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> + socketAddress.getHostName());
> llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java:
> host = socketAddress.getHostName();
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:  
> public static String getHostName() {
> llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java:   
>return InetAddress.getLocalHost().getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> String name = address.getHostName();
> llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java:
> builder.setAmHost(address.getHostName());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: 
>nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), 
> localAddress.get().getPort());
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
> localAddress.get().getHostName(), vertex.getDagName(), 
> qIdProto.getDagIndex(),
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java:
>   new ExecutionContextImpl(localAddress.get().getHostName()), env,
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: 
>String hostName = MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java:
> .setBindAddress(addr.getHostName())
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java:
>   request.getContainerIdString(), executionContext.getHostName(), 
> vertex.getDagName(),
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>String displayName = "LlapDaemonCacheMetrics-" + 
> MetricsUtils.getHostName();
> llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: 
>displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName();
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java:
>   new LlapProtocolClientImpl(new Configuration(), 
> serverAddr.getHostName(),
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java:
> builder.setAmHost(getAddress().getHostName());
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java:
>   String displayName = "LlapTaskSchedulerMetrics-" + 
> MetricsUtils.getHostName();
> {code}
> In systems where the hostnames do not match FQDN, calling the 
> getCanonicalHostName() will allow for resolution of the hostname when 
> accessing from a different base domain.
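
As a quick, standalone illustration of the difference (plain JDK calls, unrelated to the patch itself):

{code}
// getHostName() may return a short host name, while getCanonicalHostName()
// performs a reverse lookup and returns the fully qualified domain name.
import java.net.InetAddress;

public class FqdnSketch {
  public static void main(String[] args) throws Exception {
    InetAddress addr = InetAddress.getLocalHost();
    System.out.println("host name:           " + addr.getHostName());
    System.out.println("canonical host name: " + addr.getCanonicalHostName());
  }
}
{code}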



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests

2016-09-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507343#comment-15507343
 ] 

Prasanth Jayachandran commented on HIVE-14461:
--

I don't see a test report for the HBaseMinimrCliDriver tests. 
https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/1214/testReport/org.apache.hadoop.hive.cli/

Are we removing these tests from HBaseMinimrCliDriver altogether and running 
them in HBaseCliDriver?

> Investigate HBaseMinimrCliDriver tests
> --
>
> Key: HIVE-14461
> URL: https://issues.apache.org/jira/browse/HIVE-14461
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Siddharth Seth
> Attachments: HIVE-14461.01.patch
>
>
> during HIVE-1 I've encountered an odd thing:
> HBaseMinimrCliDriver only executes a single test...and that test is set using 
> the qfile selector...which looks out-of-place.
> The only test it executes doesn't follow regular qtest file naming...and has 
> an extension 'm'.
> At least the file should be renamed...but I think the change wasn't 
> intentional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507337#comment-15507337
 ] 

Siddharth Seth commented on HIVE-14680:
---

I really think we're better off fixing this within ORC itself, instead of 
working around it in the split generator (which at some point will handle 
different file types). Can the ORC getSplits not deal with this in BI mode?

> retain consistent splits /during/ (as opposed to across) LLAP failures on top 
> of HIVE-14589
> ---
>
> Key: HIVE-14680
> URL: https://issues.apache.org/jira/browse/HIVE-14680
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, 
> HIVE-14680.03.patch, HIVE-14680.patch
>
>
> see HIVE-14589.
> Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) 
> is to return locations for all slots to HostAffinitySplitLocationProvider, 
> the missing slots being inactive locations (based solely on the last slot 
> actually present). For the splits mapped to these locations, fall back via 
> different hash functions, or some sort of probing.
> This still doesn't handle all the cases, namely when the last slots are gone 
> (consistent hashing is supposed to be good for this?); however for that we'd 
> need more involved coordination between nodes or a central updater to 
> indicate the number of nodes.
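A minimal sketch of the probing fallback idea, under the assumption that inactive slots are simply marked in a boolean list (illustrative only, not the HostAffinitySplitLocationProvider code):
{code}
import java.util.List;

public class SlotProbingSketch {
  // Pick a slot for a split; if the preferred slot is inactive, probe
  // linearly so the choice stays deterministic for that split while
  // leaving other splits' assignments untouched.
  static int chooseSlot(int splitHash, List<Boolean> slotActive) {
    int n = slotActive.size();
    int preferred = Math.floorMod(splitHash, n);
    for (int i = 0; i < n; i++) {
      int candidate = (preferred + i) % n;
      if (slotActive.get(candidate)) {
        return candidate;
      }
    }
    throw new IllegalStateException("no active slots");
  }
}
{code}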



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507333#comment-15507333
 ] 

Siddharth Seth commented on HIVE-7926:
--

bq. In the sentence: “The initial stage of the query is pushed into #LLAP, 
large shuffle is performed in their own containers” - What does "their own 
containers" refer to? Is there only one large shuffle, or multiple shuffles?
When executing a query, it's possible to launch separate containers (Java 
processes, fallback to regular Tez execution) to perform the large Shuffles. In 
many cases, running a Shuffle / Reduce within LLAP may not be beneficial (no 
caching gains, etc). That said - it's also possible to run these Shuffle/Reduce 
steps within LLAP itself, and that is the typical case for short running 
queries. Multiple shuffles are possible.
This point primarily talks about where a reduce will run - within the LLAP 
daemon itself, or as a separate container (process).

bq. In the sentence: "The node allows parallel execution for multiple query 
fragments from different queries and sessions” - what does "the node" refer to? 
A single LLAP node?
Yes - that refers to an LLAP instance. A single LLAP process can handle 
multiple fragments from different queries, or the same query.

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> neither it is a separate execution engine like MR or Tez. It can be used by 
> any Hive execution engine, if support is added; in future even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat

2016-09-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507317#comment-15507317
 ] 

Siddharth Seth commented on HIVE-13853:
---

The test added in this patch - TestXSRFFilter - runs for close to 20 minutes; 
about 2 minutes of that is actual run time, the rest is setup time. Is there 
any way to make this faster?

> Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
> -
>
> Key: HIVE-13853
> URL: https://issues.apache.org/jira/browse/HIVE-13853
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, WebHCat
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-13853.2.patch, HIVE-13853.patch
>
>
> There is a possibility that there may be a CSRF-based attack on various 
> hadoop components, and thus, there is an effort to add a block for all 
> incoming http requests if they do not contain a X-XSRF-Header header. (See 
> HADOOP-12691 for motivation)
> This has potential to affect HS2 when running on thrift-over-http mode(if 
> cookie-based-auth is used), and webhcat.
> We introduce new flags to determine whether or not we're using the filter, 
> and if we are, we will automatically reject any http requests which do not 
> contain this header.
> To allow this to work, we also need to make changes to our JDBC driver to 
> automatically inject this header into any requests it makes. Also, any 
> client-side programs/api not using the JDBC driver directly will need to make 
> changes to add a X-XSRF-Header header to the request to make calls to 
> HS2/WebHCat if this filter is enabled.
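For a client that talks to HS2's HTTP endpoint without the JDBC driver, the change amounts to setting one extra request header. A hedged sketch (the endpoint URL and header value below are assumptions for illustration):
{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class XsrfHeaderClient {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://hs2-host:10001/cliservice"); // assumed HS2 HTTP endpoint
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    // The filter only checks that the header is present, so the value
    // itself is arbitrary; "true" is used here as a placeholder.
    conn.setRequestProperty("X-XSRF-HEADER", "true");
    conn.setDoOutput(true);
    conn.getOutputStream().write(new byte[0]); // payload omitted for brevity
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}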



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14461) Investigate HBaseMinimrCliDriver tests

2016-09-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-14461:
-

Assignee: Siddharth Seth

> Investigate HBaseMinimrCliDriver tests
> --
>
> Key: HIVE-14461
> URL: https://issues.apache.org/jira/browse/HIVE-14461
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Siddharth Seth
> Attachments: HIVE-14461.01.patch
>
>
> during HIVE-1 I've encountered an odd thing:
> HBaseMinimrCliDriver only executes a single test...and that test is set using 
> the qfile selector...which looks out-of-place.
> The only test it executes doesn't follow regular qtest file naming...and has 
> an extension 'm'.
> At least the file should be renamed...but I think the change wasn't 
> intentional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14651) Add a local cluster for Tez and LLAP

2016-09-20 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14651:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews.

> Add a local cluster for Tez and LLAP
> 
>
> Key: HIVE-14651
> URL: https://issues.apache.org/jira/browse/HIVE-14651
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, 
> HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception

2016-09-20 Thread Anbu Cheeralan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anbu Cheeralan updated HIVE-14798:
--
Description: 
MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

Hive Logs:

2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: 
metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
2016-09-20T17:28:00,717 WARN  [HiveServer2-Background-Pool: Thread-92]: 
exec.DDLTask (:()) - Failed to run metacheck: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
 at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
at 
org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:432)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:418)
... 4 more

Here are the steps to recreate this issue:
use default;
DROP TABLE IF EXISTS repairtable;
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING);
MSCK REPAIR TABLE default.repairtable;

  was:
MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

Hive Logs:

2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: 
metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
2016-09-20T17:28:00,717 WARN  [HiveServer2-Background-Pool: Thread-92]: 
exec.DDLTask (:()) - Failed to run metacheck: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
at 

[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception

2016-09-20 Thread Anbu Cheeralan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anbu Cheeralan updated HIVE-14798:
--
Description: 
MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

Hive Logs:

2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: 
metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
2016-09-20T17:28:00,717 WARN  [HiveServer2-Background-Pool: Thread-92]: 
exec.DDLTask (:()) - Failed to run metacheck: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
 at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90)
at 
org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:432)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:418)
... 4 more

Here are the steps to recreate this issue:
use default
DROP TABLE IF EXISTS repairtable
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
MSCK REPAIR TABLE default.repairtable

  was:
MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
"FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask"

Hive Logs:

2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - 
java.lang.NullPointerException
2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (: ()) - Failed to run 
metacheck:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448

Here are the steps to recreate this issue:
use default
DROP TABLE IF EXISTS repairtable
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
MSCK REPAIR TABLE default.repairtable


> MSCK REPAIR TABLE throws null pointer exception
> 

[jira] [Resolved] (HIVE-14787) Ability to access DistributedCache from UDFs via Java API

2016-09-20 Thread Ilya Bystrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Bystrov resolved HIVE-14787.
-
Resolution: Invalid

> Ability to access DistributedCache from UDFs via Java API
> -
>
> Key: HIVE-14787
> URL: https://issues.apache.org/jira/browse/HIVE-14787
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: 1.1.0+cdh5.7.1
>Reporter: Ilya Bystrov
>
> I'm trying to create custom function
> {{create function geoip as 'some.package.UDFGeoIp' using jar 
> 'hdfs:///user/hive/ext/HiveGeoIP.jar', file 
> 'hdfs:///user/hive/ext/GeoIP.dat';}}
> According to https://issues.apache.org/jira/browse/HIVE-1016
> I should be able to access file via {{new File("./GeoIP.dat");}} (in 
> overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}})
> But this doesn't work.
> I use the following workaround, but it's ugly:
> {code}
> CodeSource codeSource = 
> GenericUDFGeoIP.class.getProtectionDomain().getCodeSource();
> File jarFile = new File(codeSource.getLocation().toURI().getPath());
> String jarDir = jarFile.getParentFile().getPath();
> File actualFile = new File(jarDir + "/GeoIP.dat");
> {code}
> UPDATE:
> It looks like I should use {{ClassLoader#getSystemResource(String resource)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14787) Ability to access DistributedCache from UDFs via Java API

2016-09-20 Thread Ilya Bystrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Bystrov updated HIVE-14787:

Description: 
I'm trying to create custom function

{{create function geoip as 'some.package.UDFGeoIp' using jar 
'hdfs:///user/hive/ext/HiveGeoIP.jar', file 'hdfs:///user/hive/ext/GeoIP.dat';}}

According to https://issues.apache.org/jira/browse/HIVE-1016
I should be able to access file via {{new File("./GeoIP.dat");}} (in overridden 
method {{GenericUDF#evaluate(DeferredObject[] arguments)}})
But this doesn't work.

I use the following workaround, but it's ugly:
{code}
CodeSource codeSource = 
GenericUDFGeoIP.class.getProtectionDomain().getCodeSource();
File jarFile = new File(codeSource.getLocation().toURI().getPath());
String jarDir = jarFile.getParentFile().getPath();
File actualFile = new File(jarDir + "/GeoIP.dat");
{code}

UPDATE:
It looks like I should use {{ClassLoader#getSystemResource(String resource)}}

  was:
I'm trying to create custom function

{{create function geoip as 'some.package.UDFGeoIp' using jar 
'hdfs:///user/hive/ext/HiveGeoIP.jar', file 'hdfs:///user/hive/ext/GeoIP.dat';}}

According to https://issues.apache.org/jira/browse/HIVE-1016
I should be able to access file via {{new File("./GeoIP.dat");}} (in overridden 
method {{GenericUDF#evaluate(DeferredObject[] arguments)}})
But this doesn't work.

I use the following workaround, but it's ugly:
{code}
CodeSource codeSource = 
GenericUDFGeoIP.class.getProtectionDomain().getCodeSource();
File jarFile = new File(codeSource.getLocation().toURI().getPath());
String jarDir = jarFile.getParentFile().getPath();
File actualFile = new File(jarDir + "/GeoIP.dat");
{code}


> Ability to access DistributedCache from UDFs via Java API
> -
>
> Key: HIVE-14787
> URL: https://issues.apache.org/jira/browse/HIVE-14787
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: 1.1.0+cdh5.7.1
>Reporter: Ilya Bystrov
>
> I'm trying to create custom function
> {{create function geoip as 'some.package.UDFGeoIp' using jar 
> 'hdfs:///user/hive/ext/HiveGeoIP.jar', file 
> 'hdfs:///user/hive/ext/GeoIP.dat';}}
> According to https://issues.apache.org/jira/browse/HIVE-1016
> I should be able to access file via {{new File("./GeoIP.dat");}} (in 
> overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}})
> But this doesn't work.
> I use the following workaround, but it's ugly:
> {code}
> CodeSource codeSource = 
> GenericUDFGeoIP.class.getProtectionDomain().getCodeSource();
> File jarFile = new File(codeSource.getLocation().toURI().getPath());
> String jarDir = jarFile.getParentFile().getPath();
> File actualFile = new File(jarDir + "/GeoIP.dat");
> {code}
> UPDATE:
> It looks like I should use {{ClassLoader#getSystemResource(String resource)}}
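A small sketch of the workaround hinted at in the UPDATE, assuming the added file keeps its name 'GeoIP.dat' on the classpath (illustrative only):
{code}
import java.io.File;
import java.net.URL;

public class ResourceLookup {
  static File locateGeoIp() {
    // Look the file up through the class loader instead of relying on the
    // task's working directory.
    URL url = ClassLoader.getSystemResource("GeoIP.dat");
    if (url == null) {
      throw new IllegalStateException("GeoIP.dat not found on the classpath");
    }
    return new File(url.getPath());
  }
}
{code}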



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507121#comment-15507121
 ] 

Hive QA commented on HIVE-14412:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829401/HIVE-14412.5.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 10563 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_like]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join43]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[serde_regex]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[serde_regex]
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testCliDriver[serde_regex]
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_timestamp]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[serde_regex2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[serde_regex3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[serde_regex]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[wrong_column_type]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1243/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1243/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1243/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 23 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829401 - PreCommit-HIVE-Build

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, 
> HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch, HIVE-14412.5.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it is by 
> itself unambiguous, ambiguity arises when we parse a string into a timestamp, or 
> convert a timestamp to a string, causing problems like HIVE-14305.
> To solve the issue, I think we should make the timestamp aware of its timezone.
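A hedged illustration of that ambiguity (plain JDK code, not Hive's parser): the same string maps to different instants depending on the JVM's default time zone.
{code}
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampAmbiguity {
  public static void main(String[] args) {
    String s = "2016-09-20 12:00:00";
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    long utcMillis = Timestamp.valueOf(s).getTime();
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
    long laMillis = Timestamp.valueOf(s).getTime();
    // Same wall-clock string, two different epoch values; the gap is the
    // zone offset (7 hours on this date).
    System.out.println(laMillis - utcMillis); // 25200000
  }
}
{code}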



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-20 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-9423:
-
Attachment: HIVE-9423.patch

THRIFT-2046 handles connection overflow by closing the connection without 
writing any data into it. See:
{code}
if (t instanceof RejectedExecutionException) {
  retryCount++;
  try {
if (remainTimeInMillis > 0) {
  //do a truncated 20 binary exponential backoff sleep
[..]
} else {
  client.close();
  wp = null;
  LOGGER.warn("Task has been rejected by ExecutorService " + 
retryCount
  + " times till timedout, reason: " + t);
  break;
}
  }
{code}

On the client side this generates a TTransportException in 
TIOStreamTransport.java with specific types, which helps us differentiate 
between the cases.

So in the proposed solution we could print a different error message when the 
connection pool is exhausted, when the connection is not available, etc.

What do you think about the proposed solution [~aihuaxu], [~ctang.ma], 
[~ngangam]?
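A rough sketch of what the differentiated client-side messages could look like, keyed off the TTransportException type (illustrative only, not the attached patch):
{code}
import org.apache.thrift.transport.TTransportException;

public class ConnectErrorMessages {
  static String describe(TTransportException e) {
    switch (e.getType()) {
      case TTransportException.END_OF_FILE:
        return "Connection closed by HiveServer2; the worker pool may be exhausted.";
      case TTransportException.NOT_OPEN:
        return "Connection to HiveServer2 is not open.";
      case TTransportException.TIMED_OUT:
        return "Connection to HiveServer2 timed out.";
      default:
        return "Transport error talking to HiveServer2: " + e.getMessage();
    }
  }
}
{code}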

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
> Attachments: HIVE-9423.patch
>
>
> An example of where it is needed: it has been reported that when the number of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool to have a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control will be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available and display graceful degradation under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew

2016-09-20 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506967#comment-15506967
 ] 

Xuefu Zhang commented on HIVE-14797:


This seems to make sense, but can we avoid hard-coding the number (31)?

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: roncenzhao
>Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
>
> HiveKey's hash code is generated by multipling by 31 key by key which is 
> implemented in method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], 
> bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The follow example will lead to data skew:
> I hava two table called tbl1 and tbl2 and they have the same column: a int, b 
> string. The values of column 'a' in both two tables are not skew, but values 
> of column 'b' in both two tables are skew.
> When my sql is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and 
> tbl1.b=tbl2.b" and the estimated reducer number is 31, it will lead to data 
> skew.
> As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. 
> When reducer number is 31 the reducer No. of each row is `hash(b)%31`. In the 
> result, the job will be skew.
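A quick arithmetic sketch of why the reducer count matters here (standalone illustration, not Hive code): when the reducer count equals the multiplier 31, the first key drops out of the modulo entirely.
{code}
public class SkewSketch {
  public static void main(String[] args) {
    int reducers = 31;
    int hashA = 123456; // hash of the non-skewed column 'a' (made-up value)
    int hashB = 789;    // hash of the skewed column 'b' (made-up value)
    int hiveKeyHash = 31 * hashA + hashB;
    // (31 * hashA + hashB) % 31 == hashB % 31, so only column 'b' decides
    // the reducer, and its skew becomes the job's skew.
    System.out.println(Math.floorMod(hiveKeyHash, reducers));
    System.out.println(Math.floorMod(hashB, reducers));
  }
}
{code}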



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506942#comment-15506942
 ] 

Hive QA commented on HIVE-14029:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12829398/HIVE-14029.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 10556 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket4]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket5]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[disable_merge_for_bucketing]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[list_bucket_dml_10]
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[reduce_deduplicate]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket4]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[disable_merge_for_bucketing]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1242/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1242/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1242/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 23 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829398 - PreCommit-HIVE-Build

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14775) Investigate IOException usage in Metrics APIs

2016-09-20 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14775:
---
Description: 
A large number of metrics APIs seem to declare to throw IOExceptions 
needlessly. (incrementCounter, decrementCounter etc.)
This is not only misleading but it fills up the code with unnecessary catch 
blocks never to be reached.

We should investigate whether these exceptions are thrown at all, and remove them 
if they are truly unused.

  was:
A large number of metrics APIs seems to declare to throw IOExceptions 
needlessly. (incrementCounter, decrementCounter etc.)
This is not only misleading but it fills up the code with unnecessary catch 
blocks never to be reached.

We should investigate if these exceptions are thrown at all, and remove them if 
 it is truly unused.


> Investigate IOException usage in Metrics APIs
> -
>
> Key: HIVE-14775
> URL: https://issues.apache.org/jira/browse/HIVE-14775
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>
> A large number of metrics APIs seem to declare to throw IOExceptions 
> needlessly. (incrementCounter, decrementCounter etc.)
> This is not only misleading but it fills up the code with unnecessary catch 
> blocks never to be reached.
> We should investigate whether these exceptions are thrown at all, and remove them 
> if they are truly unused.
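A hypothetical example of the pattern being described (the interface below is made up for illustration, not the real Metrics API):
{code}
import java.io.IOException;

interface Metrics {
  // Declared to throw IOException even though implementations never do.
  void incrementCounter(String name) throws IOException;
}

class MetricsCaller {
  void record(Metrics metrics) {
    try {
      metrics.incrementCounter("open_connections");
    } catch (IOException e) {
      // Never reached in practice; exists only to satisfy the declaration.
      throw new RuntimeException(e);
    }
  }
}
{code}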



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-20 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-9423:


Assignee: Peter Vary

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>Assignee: Peter Vary
>
> An example of where it is needed: it has been reported that when the number of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool to have a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control will be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available and display graceful degradation under overload.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506839#comment-15506839
 ] 

Rui Li commented on HIVE-14029:
---

My understanding is that Spark needs Hive libraries only for SparkSQL, which is not 
needed for HoS.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506833#comment-15506833
 ] 

Rui Li commented on HIVE-14029:
---

bq. some APIs are changed in Spark side
Is there any other way we can track the read method? If not, I guess we can just 
remove the class from the Hive side.
bq. Hive2 itests will use Spark2 assembly to run Hive2 tests. This means Hive2 
might not test Spark2 correctly due to the lack of Hive 1.2 libraries in it.
I'm not sure what problem Spark has without Hive libraries. We have been 
requiring that Spark is built without Hive; otherwise we'll have different Hive 
libraries in our classpath, which causes conflicts.
I don't think HIVE-14240 blocks this one. Actually, HIVE-14240 should be 
implemented for Spark 2.0, right?

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-9530) constant * column is null interpreted as constant * boolean

2016-09-20 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-9530 started by Miklos Csanady.

> constant * column is null interpreted as constant * boolean
> ---
>
> Key: HIVE-9530
> URL: https://issues.apache.org/jira/browse/HIVE-9530
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.14.0
>Reporter: N Campbell
>Assignee: Miklos Csanady
>Priority: Minor
>
> {code}
> select c1 from tversion where 1 * cnnull is null
> FAILED: SemanticException [Error 10014]: Line 1:30 Wrong arguments 'cnnull': 
> No matching method for class 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply with (int, boolean)
> create table if not exists TVERSION (
>   RNUM int,
>   C1 int,
>   CVER char(6),
>   CNNULL int,
>   CCNULL char(1)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
> STORED AS TEXTFILE ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception

2016-09-20 Thread Anbu Cheeralan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anbu Cheeralan updated HIVE-14798:
--
Description: 
MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
"FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask"

Hive Logs:

2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - 
java.lang.NullPointerException
2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (: ()) - Failed to run 
metacheck:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448

Here are the steps to recreate this issue:
use default
DROP TABLE IF EXISTS repairtable
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
MSCK REPAIR TABLE default.repairtable

  was:
MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
"FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask"

Hive Logs:

2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - 
java.lang.NullPointerException
2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (: ()) - Failed to run 
metacheck:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448

Here are the steps to recreate this issue:
use default
DROP TABLE IF EXISTS repairtable
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
MSCK REPAIR TABLE default.repairtable


> MSCK REPAIR TABLE throws null pointer exception
> ---
>
> Key: HIVE-14798
> URL: https://issues.apache.org/jira/browse/HIVE-14798
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Anbu Cheeralan
>
> MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1
> I have tested the same against external/internal tables created both in HDFS 
> and in Google Cloud.
> The error shown in beeline/sql client 
> "FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask"
> Hive Logs:
> 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - 
> java.lang.NullPointerException
> 2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (: ()) - Failed to run 
> metacheck:
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448
> Here are the steps to recreate this issue:
> use default
> DROP TABLE IF EXISTS repairtable
> CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
> MSCK REPAIR TABLE default.repairtable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506772#comment-15506772
 ] 

Sergio Peña commented on HIVE-14029:


Sure. [~stakiar] Let me know if the below statements are correct, and feel free 
to correct me.

- Spark2 uses a fork of Hive 1.2 due to issues with Apache Hive. They called 
this project {{spark-hive}}. Spark only uses the Hive 1.2 metastore/serde/udf jars 
from this forked project.
  They download this from 
https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 

- Spark2 assembly without hive will be built without any of the above 
dependencies.

- Hive2 itests will use Spark2 assembly to run Hive2 tests. This means Hive2 
might not test Spark2 correctly due to the lack of Hive 1.2 libraries in it.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception

2016-09-20 Thread Anbu Cheeralan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anbu Cheeralan updated HIVE-14798:
--
Description: 
MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
"FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask"

Hive Logs:

2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - 
java.lang.NullPointerException
2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (: ()) - Failed to run 
metacheck:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448

Here are the steps to recreate this issue:
use default
DROP TABLE IF EXISTS repairtable
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
MSCK REPAIR TABLE default.repairtable

  was:
MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1
I have tested the same against external/internal tables created both in HDFS 
and in Google Cloud.

The error shown in beeline/sql client 
"FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask"

Hive Logs:

2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (:()) - 
java.lang.NullPointerException
2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (:()) - Failed to run 
metacheck:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448

Here are the steps to recreate this issue:
use default
DROP TABLE IF EXISTS repairtable
CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
MSCK REPAIR TABLE default.repairtable


> MSCK REPAIR TABLE throws null pointer exception
> ---
>
> Key: HIVE-14798
> URL: https://issues.apache.org/jira/browse/HIVE-14798
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Anbu Cheeralan
>
> MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1
> I have tested the same against external/internal tables created both in HDFS 
> and in Google Cloud.
> The error shown in beeline/sql client 
> "FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask"
> Hive Logs:
> 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - 
> java.lang.NullPointerException
> 2016-09-14T04:08:02,434 WARN  [main]: exec.DDLTask (: ()) - Failed to run 
> metacheck:
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
> at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448
> Here are the steps to recreate this issue:
> use default
> DROP TABLE IF EXISTS repairtable
> CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING)
> MSCK REPAIR TABLE default.repairtable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-1555) JDBC Storage Handler

2016-09-20 Thread Dmitry Zagorulkin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506789#comment-15506789
 ] 

Dmitry Zagorulkin commented on HIVE-1555:
-

Actually, it works fine with MySQL and Postgres.
For Oracle it does not work.

See here: 
http://stackoverflow.com/questions/39594861/creating-sparksql-jdbc-federation-with-oracle-db-table-fails-with-strange-error#comment66500144_39594861

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Teddy Choi
> Attachments: JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9530) constant * column is null interpreted as constant * boolean

2016-09-20 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Csanady reassigned HIVE-9530:


Assignee: Miklos Csanady

> constant * column is null interpreted as constant * boolean
> ---
>
> Key: HIVE-9530
> URL: https://issues.apache.org/jira/browse/HIVE-9530
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.14.0
>Reporter: N Campbell
>Assignee: Miklos Csanady
>Priority: Minor
>
> {code}
> select c1 from tversion where 1 * cnnull is null
> FAILED: SemanticException [Error 10014]: Line 1:30 Wrong arguments 'cnnull': 
> No matching method for class 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply with (int, boolean)
> create table if not exists TVERSION (
>   RNUM int,
>   C1 int,
>   CVER char(6),
>   CNNULL int,
>   CCNULL char(1)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
> STORED AS TEXTFILE ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9530) constant * column is null interpreted as constant * boolean

2016-09-20 Thread Miklos Csanady (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506778#comment-15506778
 ] 

Miklos Csanady commented on HIVE-9530:
--

Use brackets to prevent this error.
Your command is interpreted like this:
{code}

select c1 from tversion where 1 * ( cnnull is null );
{code}

You should use this instead:
{code}

 select c1 from tversion where ( 1 * cnnull ) is null; 
 OK
 +-+
 | c1  |
 +-+
 +-+
{code}


> constant * column is null interpreted as constant * boolean
> ---
>
> Key: HIVE-9530
> URL: https://issues.apache.org/jira/browse/HIVE-9530
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.14.0
>Reporter: N Campbell
>Priority: Minor
>
> {code}
> select c1 from tversion where 1 * cnnull is null
> FAILED: SemanticException [Error 10014]: Line 1:30 Wrong arguments 'cnnull': 
> No matching method for class 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply with (int, boolean)
> create table if not exists TVERSION (
>   RNUM int,
>   C1 int,
>   CVER char(6),
>   CNNULL int,
>   CCNULL char(1)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
> STORED AS TEXTFILE ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14412) Add a timezone-aware timestamp

2016-09-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-14412:
--
Attachment: HIVE-14412.5.patch

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, 
> HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch, HIVE-14412.5.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it is by 
> itself unambiguous, ambiguity arises when we parse a string into a timestamp, or 
> convert a timestamp to a string, causing problems like HIVE-14305.
> To solve the issue, I think we should make the timestamp aware of its timezone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506745#comment-15506745
 ] 

Ferdinand Xu commented on HIVE-14240:
-

Hi [~stakiar], do you have any updates on this ticket? I am trying to move 
HIVE-14029 forward.

Thanks,
Ferd

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark Distribution (a tar-ball) 
> from CloudFront. It uses this distribution to run Spark locally. It runs a 
> few tests with Spark in embedded mode, and some tests against a local Spark 
> on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download 
> the tar-ball from a pre-defined location.
> This is problematic because the Spark Distribution shades all its 
> dependencies, including Hadoop dependencies. This can cause problems when 
> upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every 
> build, and simplify the build process for the itests module.
> The Hive itests should instead directly depend on Spark artifacts published 
> in Maven Central. It will require some effort to get this working. The 
> current Hive Spark Client uses a launch script in the Spark installation to 
> run Spark jobs. The script basically does some setup work and invokes 
> org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class 
> directly, which avoids the need to have a full Spark distribution available 
> locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution 
> to this would be to take the Spark artifacts and shade Kryo inside them.
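A hedged sketch of the "invoke SparkSubmit directly" option (the paths, master, and driver class below are illustrative assumptions, not the actual Hive configuration):
{code}
import java.util.Arrays;

public class DirectSparkSubmit {
  public static void main(String[] args) throws Exception {
    // Launch org.apache.spark.deploy.SparkSubmit from Maven-provided Spark
    // jars instead of going through bin/spark-submit from a distribution.
    ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
        "java",
        "-cp", "/path/to/spark-jars/*",
        "org.apache.spark.deploy.SparkSubmit",
        "--master", "yarn",
        "--class", "org.example.RemoteDriverStub", // stand-in for the real driver class
        "/path/to/app.jar"));
    pb.inheritIO();
    int exit = pb.start().waitFor();
    System.out.println("SparkSubmit exited with " + exit);
  }
}
{code}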



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506738#comment-15506738
 ] 

Ferdinand Xu commented on HIVE-14029:
-

Hi [~spena], I am not quite sure why the assembly tar.gz will cause issues for 
Hive, since it is included in the Hive itests only. Could you explain a little bit 
more? BTW, I will take a look at how to remove it from the itests.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12266) When client exists abnormally, it doesn't release ACID locks

2016-09-20 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506736#comment-15506736
 ] 

Chaoyu Tang commented on HIVE-12266:


[~wzheng], to my understanding, this JIRA can only address the issue for CLI, 
but not for BeeLine. The shutdown hook can only be invoked when the JVM 
running the Driver shuts down, and in the BeeLine case the client runs in a 
different JVM from HS2. Is that correct?
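
For reference, a simplified sketch of the shutdown-hook mechanism under discussion 
(the LockManager interface here is a hypothetical stand-in for Hive's real lock 
manager, not its actual API):

{code}
import java.util.ArrayList;
import java.util.List;

public class LockReleaseHookSketch {
    // Hypothetical stand-in for the lock manager used by the Driver.
    interface LockManager {
        void releaseAll(List<String> lockIds);
    }

    public static void registerReleaseHook(final LockManager lockMgr,
                                           final List<String> acquiredLocks) {
        // The hook only runs when THIS JVM exits, which is why it covers the
        // CLI case (Driver and shell share a JVM) but not BeeLine, where the
        // Driver lives in HS2's JVM and keeps running after the client dies.
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                lockMgr.releaseAll(new ArrayList<String>(acquiredLocks));
            }
        }));
    }
}
{code}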

> When client exists abnormally, it doesn't release ACID locks
> 
>
> Key: HIVE-12266
> URL: https://issues.apache.org/jira/browse/HIVE-12266
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12266.1.patch, HIVE-12266.2.patch, 
> HIVE-12266.3.patch, HIVE-12266.branch-1.patch
>
>
> If you start the Hive CLI (with locking enabled), run some command that acquires 
> locks, and ^C the shell before the command completes, the locks for the command 
> remain until they time out.
> I believe Beeline has the same issue.
> We need to add proper hooks to release locks when the command dies (as much as 
> possible).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506716#comment-15506716
 ] 

Sergio Peña commented on HIVE-14029:


[~Ferd] I think we should try to fix HIVE-14240 first to avoid dependency 
issues when Spark and Hive are running on the same machine. I talked with the 
Spark team a few times, and they think this assembly tar.gz will cause issues 
due to other Hive libraries Spark depends on, such as the Hive 1.2 metastore and 
the Hive 1.2 serde.

Would you like to start working on HIVE-14240? You can ask [~stakiar] if you're 
interested.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted

2016-09-20 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506729#comment-15506729
 ] 

Peter Vary commented on HIVE-9423:
--

I have investigated the issue, and here is what I found:
- There was an issue in the Thrift code where, if there were not enough executor 
threads, the TThreadPoolExecutor got stuck in an infinite loop, see THRIFT-2046. This 
issue is resolved in Thrift 0.9.2.
- Hive 1.x and 2.x use Thrift 0.9.3.

I have tested the behavior on Hive 2.2.0-SNAPSHOT with the following 
configuration:
- Add the following lines to hive-site.xml:
{code}
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>1</value>
</property>
<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>1</value>
</property>
{code}
- Start a metastore, and a HS2 instance
- Start 2 BeeLine, and connect to the HS2

The 1st BeeLine connected as expected, the 2nd BeeLine after the configured 
timeout period (default 20s) printed out the following:
{code}
Connecting to jdbc:hive2://localhost:1
16/09/20 16:23:57 [main]: WARN jdbc.HiveConnection: Failed to connect to 
localhost:1
HS2 may be unavailable, check server status
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:1: null (state=08S01,code=0)
Beeline version 2.2.0-SNAPSHOT by Apache Hive
beeline> 
{code}

This behavior is much better than the original problem (no HS2 restart is 
needed, and closing unused connections helps), but it is not a perfect 
solution, since there is no difference between a non-running HS2 and an HS2 
with an exhausted executor pool.

> HiveServer2: Implement some admission control mechanism for graceful 
> degradation when resources are exhausted
> -
>
> Key: HIVE-9423
> URL: https://issues.apache.org/jira/browse/HIVE-9423
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0
>Reporter: Vaibhav Gumashta
>
> An example of where it is needed: it has been reported that when the number of client 
> connections is greater than {{hive.server2.thrift.max.worker.threads}}, 
> HiveServer2 stops accepting new connections and ends up having to be 
> restarted. This should be handled more gracefully by the server and the JDBC 
> driver, so that the end user becomes aware of the problem and can take 
> appropriate steps (either close existing connections, bump up the config 
> value, or use multiple server instances with dynamic service discovery 
> enabled). Similarly, we should also review the behavior of the background thread 
> pool so that it has a well-defined behavior when the pool gets exhausted. 
> Ideally, implementing some form of general admission control would be a better 
> solution, so that we do not accept new work unless sufficient resources are 
> available and we degrade gracefully under overload.
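
To sketch what such an admission-control gate could look like (this is not existing 
HiveServer2 code, just a minimal illustration using a semaphore on incoming sessions), 
rejecting promptly with a diagnostic would at least distinguish an exhausted pool from 
a dead server:

{code}
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class AdmissionControlSketch {
    private final Semaphore slots;

    public AdmissionControlSketch(int maxConcurrentSessions) {
        this.slots = new Semaphore(maxConcurrentSessions, true);
    }

    /**
     * Try to admit a new session. Instead of letting the request hang until a
     * client-side timeout (the current symptom), fail fast with a message the
     * driver can surface to the user.
     */
    public void admitOrReject(long waitMillis) throws Exception {
        if (!slots.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            throw new Exception("HiveServer2 is overloaded: no free worker "
                + "threads; close idle connections or retry later");
        }
    }

    public void release() {
        slots.release();
    }
}
{code}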



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: HIVE-14029.1.patch

https://builds.apache.org/job/PreCommit-HIVE-Build/1239 failed. Retesting the 
patch.

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0

2016-09-20 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14029:

Attachment: (was: HIVE-14029.1.patch)

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
>  Issue Type: Bug
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14029.patch
>
>
> There are quite some new optimizations in Spark 2.0.0. We need to bump up 
> Spark to 2.0.0 to benefit those performance improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10701) Escape apostrophe not work properly

2016-09-20 Thread Miklos Csanady (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506651#comment-15506651
 ] 

Miklos Csanady edited comment on HIVE-10701 at 9/20/16 2:11 PM:


According to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-string
"Hive uses C-style escaping within the strings."

1: jdbc:hive2://> select 's''2' ;
OK
 +--+
 | _c0  |
 +--+
 | s2   |
 +--+
1 row selected (0.051 seconds)
1: jdbc:hive2://> select 's\'2' ;
OK
 +--+
 | _c0  |
 +--+
 | s'2  |
 +--+

IMHO the '' is not meant to be an escape-sequence, but two string literals.



was (Author: mcsanady):
According to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-string
"Hive uses C-style escaping within the strings."

1: jdbc:hive2://> select 's''2' ;
OK
+--+
| _c0  |
+--+
| s2   |
+--+
1 row selected (0.051 seconds)
1: jdbc:hive2://> select 's\'2' ;
OK
+--+
| _c0  |
+--+
| s'2  |
+--+

IMHO the '' is not meant to be an escape-sequence, but two string literals.


> Escape apostrophe not work properly
> ---
>
> Key: HIVE-10701
> URL: https://issues.apache.org/jira/browse/HIVE-10701
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.12.0, 0.13.0, 0.14.0
>Reporter: Tracy Y
>Assignee: Miklos Csanady
>Priority: Minor
>
> SELECT 'S''2' FROM table returns S2 instead of S'2.
> The apostrophe is supposed to be escaped by the single quote in front.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

