[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508829#comment-15508829 ] Hive QA commented on HIVE-14797: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829373/HIVE-14797.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1248/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1248/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1248/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12829373 - PreCommit-HIVE-Build

> reducer number estimating may lead to data skew
> ---
>
> Key: HIVE-14797
> URL: https://issues.apache.org/jira/browse/HIVE-14797
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: roncenzhao
> Assignee: roncenzhao
> Attachments: HIVE-14797.patch
>
> HiveKey's hash code is generated by multiplying by 31 field by field, as implemented in the method `ObjectInspectorUtils.getBucketHashCode()`:
> for (int i = 0; i < bucketFields.length; i++) {
>   int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], bucketFieldInspectors[i]);
>   hashCode = 31 * hashCode + fieldHash;
> }
> The following example will lead to data skew:
> I have two tables, tbl1 and tbl2, with the same columns: a int, b string. The values of column 'a' in both tables are not skewed, but the values of column 'b' in both tables are skewed.
> When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and tbl1.b=tbl2.b" and the estimated reducer number is 31, the job is skewed.
> As we know, the HiveKey's hash code is computed as `hash(a)*31 + hash(b)`. When the reducer number is 31, the reducer for each row is `hash(b)%31`, so the partitioning key effectively degenerates to column 'b' alone. As a result, the job is skewed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
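The degenerate case described in the report can be reproduced with a minimal standalone sketch. This is hypothetical demo code, not part of the attached patch: the loop from `getBucketHashCode()` is inlined for two pre-computed field hashes, and the class name is invented.

```java
public class SkewDemo {
    // Simplified two-field version of the loop in
    // ObjectInspectorUtils.getBucketHashCode(): result is 31*hash(a) + hash(b).
    static int bucketHashCode(int hashA, int hashB) {
        int hashCode = 0;
        hashCode = 31 * hashCode + hashA; // hashCode == hashA
        hashCode = 31 * hashCode + hashB; // hashCode == 31*hashA + hashB
        return hashCode;
    }

    public static void main(String[] args) {
        int reducers = 31;
        // With 31 reducers, (31*hashA + hashB) % 31 == hashB % 31 for every
        // hashA, so column 'a' never influences the reducer assignment.
        for (int hashA = 0; hashA < 5; hashA++) {
            int reducer = Math.floorMod(bucketHashCode(hashA, 7), reducers);
            System.out.println("hashA=" + hashA + " -> reducer " + reducer);
        }
    }
}
```

Every iteration lands on reducer 7 regardless of hashA, which is exactly the skew the reporter describes: all rows with the same hash(b) pile onto one reducer no matter how well column 'a' is distributed.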
[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508805#comment-15508805 ] Rui Li commented on HIVE-14797: --- [~roncenzhao] your solution also seems OK, and it's simpler. I'd like to hear [~xuefuz]'s opinion.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508789#comment-15508789 ] Rui Li commented on HIVE-14797: --- Hmm, a random prime won't work because we need to make sure the same rows always get the same hash code. I can think of another way:
1. If we have only one field, we can just return that field's hash code.
2. If we have multiple fields, we can compute the hash code as P1*hash(F1)+...+Pn*hash(Fn), where hash(Fn) is the hash code of the nth field and {P1,...,Pn} is a deterministic series of prime numbers, e.g. {17, 19, ...}. It seems {{BigInteger::nextProbablePrime()}} can help generate the series.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
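A sketch of the prime-series idea proposed in the comment above. This is illustrative only, not code from any attached patch; the class and method names are invented, and field hashes are assumed to be pre-computed ints.

```java
import java.math.BigInteger;

public class PrimeSeriesHash {
    // Deterministic prime series 17, 19, 23, ... generated with
    // BigInteger::nextProbablePrime(), as suggested in the comment.
    static int[] primes(int n) {
        int[] p = new int[n];
        BigInteger cur = BigInteger.valueOf(17);
        for (int i = 0; i < n; i++) {
            p[i] = cur.intValue();
            cur = cur.nextProbablePrime();
        }
        return p;
    }

    // Combine field hashes as P1*hash(F1) + ... + Pn*hash(Fn).
    static int bucketHashCode(int[] fieldHashes) {
        if (fieldHashes.length == 1) {
            return fieldHashes[0]; // single field: use its hash directly
        }
        int[] p = primes(fieldHashes.length);
        int hashCode = 0;
        for (int i = 0; i < fieldHashes.length; i++) {
            hashCode += p[i] * fieldHashes[i];
        }
        return hashCode;
    }
}
```

Because the prime series is deterministic, equal rows still get equal hash codes, and no single field's hash survives the modulo unchanged for common reducer counts such as 31.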
[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508756#comment-15508756 ] roncenzhao commented on HIVE-14797: --- Or we can do it the following way: let the seed have two options, 31 and 131. In `ReduceSinkOperator` we can get the reducer number, `reduceNum`, and then choose the other seed value if `reduceNum` is equal to 31 or 131. Is that OK?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
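As a sketch of the two-seed proposal above (hypothetical names, not from the attached patch), the seed selection might look like:

```java
public class SeedChooser {
    // Two candidate seeds, 31 and 131, as proposed in the comment: when the
    // reducer count collides with the default seed 31, switch to 131 so that
    // hashCode % reducers no longer cancels the multiplier.
    static int chooseSeed(int reduceNum) {
        return (reduceNum == 31) ? 131 : 31;
    }
}
```

Note this only handles the exact collision reduceNum == 31; multiples of the seed (62, 93, ...) exhibit a milder form of the same degeneration, which is why the later comments discuss divisibility instead of equality.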
[jira] [Commented] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508650#comment-15508650 ] Hive QA commented on HIVE-14803: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829480/HIVE-14803.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10556 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_allchildsarenull]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[vector_count_distinct]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[optimize_nullscan]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union_remove_25]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1247/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1247/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1247/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12829480 - PreCommit-HIVE-Build

> S3: Stats gathering for insert queries can be expensive for partitioned dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-14803.1.patch
>
> StatsTask's aggregateStats populates stats details for all partitions by checking the file sizes, which turns out to be expensive when a large number of partitions is inserted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508643#comment-15508643 ] Rui Li commented on HIVE-14797: --- If the user specifies #reducers to be 31, we shouldn't change it. Is it possible to use random prime numbers to compute the hash code?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508636#comment-15508636 ] Ferdinand Xu commented on HIVE-14029: --- Hi [~lirui]
bq. Is there any other way we can track the read method? If not, guess we can just remove the class from Hive side.
I will investigate this in a separate JIRA. Thank you for pointing this out.

> Update Spark version to 2.0.0
> ---
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
> Issue Type: Bug
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.2.patch, HIVE-14029.patch
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump Spark up to 2.0.0 to benefit from those performance improvements.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508625#comment-15508625 ] Lefty Leverenz commented on HIVE-14793: --- Should this be documented now, or wait and do it with the rest of HIVE-14744?

> Allow ptest branch to be specified, PROFILE override
> ---
>
> Key: HIVE-14793
> URL: https://issues.apache.org/jira/browse/HIVE-14793
> Project: Hive
> Issue Type: Sub-task
> Components: Hive, Testing Infrastructure
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Fix For: 2.2.0
>
> Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, HIVE-14793.03.patch
>
> Post HIVE-14734, the profile is automatically determined. Add an option to override this via Jenkins. Also add an option to specify the branch from which ptest is built (this is hardcoded to github.com/apache/hive).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-14029: Attachment: HIVE-14029.2.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver
[ https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508520#comment-15508520 ] Hive QA commented on HIVE-14782: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829483/HIVE-14782.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 105 failed/errored test(s), 10554 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[acid_overwrite]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_partition_coltype_2columns]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_partition_coltype_invalidcolname]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_partition_coltype_invalidtype]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_as_select_not_exist]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_as_select_with_partition]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[alter_view_failure7]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[analyze1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[analyze]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_corrupt]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_insert4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_multi7]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[archive_partspec5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[bad_sample_clause]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view5]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view7]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view8]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[delete_non_acid_table]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[desc_failure2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[describe_xpath4]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[dyn_part2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[dyn_part4]
[jira] [Updated] (HIVE-14412) Add a timezone-aware timestamp
[ https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-14412: --- Attachment: (was: HIVE-14412.5.patch)

> Add a timezone-aware timestamp
> ---
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
> Issue Type: Sub-task
> Components: Hive
> Reporter: Rui Li
> Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch
>
> Java's Timestamp stores the time elapsed since the epoch. While it is by itself unambiguous, ambiguity arises when we parse a string into a timestamp or convert a timestamp to a string, causing problems like HIVE-14305.
> To solve the issue, I think we should make the timestamp aware of its timezone.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp
[ https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508499#comment-15508499 ] Rui Li commented on HIVE-14412: --- Most of the recent failures are because "TIME" is added as a new keyword, so it can't be used as a column name. I therefore have to rename the columns in the qtests. This may also require users to update their existing queries. Do you think this is acceptable?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508437#comment-15508437 ] Rui Li commented on HIVE-14029: --- I agree to move this forward. HIVE-14240 can be done in parallel, if it doesn't depend on this one :)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14714) Finishing Hive on Spark causes "java.io.IOException: Stream closed"
[ https://issues.apache.org/jira/browse/HIVE-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508419#comment-15508419 ] Rui Li commented on HIVE-14714: --- +1 Thanks for the update [~gszadovszky]. I'll commit this shortly if no one has any other comments.

> Finishing Hive on Spark causes "java.io.IOException: Stream closed"
> ---
>
> Key: HIVE-14714
> URL: https://issues.apache.org/jira/browse/HIVE-14714
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 1.1.0
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Attachments: HIVE-14714.2.patch, HIVE-14714.3.patch, HIVE-14714.patch
>
> After executing a Hive command with Spark, finishing the beeline session or even switching the engine causes an IOException. The following used Ctrl-D to finish the session, but "!quit" or even "set hive.execution.engine=mr;" causes the issue.
> From the HS2 log:
> {code}
> 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Handler-Pool: Thread-106]: Timed out shutting down remote driver, interrupting...
> 2016-09-06 16:15:12,291 WARN org.apache.hive.spark.client.SparkClientImpl: [Driver]: Waiting thread interrupted, killing child process.
> 2016-09-06 16:15:12,296 WARN org.apache.hive.spark.client.SparkClientImpl: [stderr-redir-1]: Error in redirector thread.
> java.io.IOException: Stream closed
>     at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>     at java.io.BufferedReader.fill(BufferedReader.java:154)
>     at java.io.BufferedReader.readLine(BufferedReader.java:317)
>     at java.io.BufferedReader.readLine(BufferedReader.java:382)
>     at org.apache.hive.spark.client.SparkClientImpl$Redirector.run(SparkClientImpl.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] roncenzhao updated HIVE-14797: --- Status: Patch Available (was: Open)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508405#comment-15508405 ] roncenzhao commented on HIVE-14797: --- Yes, we cannot hard-code the number (31), and we cannot know which reducer count will be used before the job ends. So I think we can solve it simply as follows: in the method "Utilities.estimateReducers(xxx)", when the `reducers` value is divisible by 31, add 1 to it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
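The cancellation behind this issue is plain modular arithmetic: since (31*x + y) % 31 == y % 31, a reducer count of exactly 31 erases hash(a) from the reducer choice, while any count that is not a multiple of 31 restores it. Below is a minimal runnable sketch; the class name and the simplified {{reducerFor}} helper are invented for illustration, and Hive's real partitioner differs in details such as negative-hash handling:

```java
class SkewDemo {
    // HiveKey-style combined hash: hashCode = 31 * hash(a) + hash(b)
    static int reducerFor(int aHash, int bHash, int numReducers) {
        int hashCode = 31 * aHash + bHash;
        // keep it non-negative before taking the modulus, as a partitioner would
        return (hashCode & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // With 31 reducers, every row whose b-hash is 42 lands on reducer 42 % 31 = 11,
        // no matter what hash(a) is -- the skew described in the issue.
        for (int aHash : new int[] {7, 123, 999}) {
            System.out.println("31 reducers: a-hash=" + aHash
                + " -> reducer " + reducerFor(aHash, 42, 31));
        }
        // Nudging the reducer count off the multiple of 31 (the proposed +1 fix)
        // lets hash(a) influence the reducer choice again.
        for (int aHash : new int[] {7, 123, 999}) {
            System.out.println("32 reducers: a-hash=" + aHash
                + " -> reducer " + reducerFor(aHash, 42, 32));
        }
    }
}
```

This is why the patch adjusts the estimate in Utilities.estimateReducers rather than changing the hash function itself: the hash is fine, it is only the modulus coinciding with the multiplier that collapses the key space.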
[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508403#comment-15508403 ] Rui Li commented on HIVE-14240: --- We have two kinds of tests for HoS - TestSparkCliDriver runs on local-cluster, and TestMiniSparkOnYarnCliDriver runs on a mini YARN cluster. I know local-cluster is not intended to be used outside Spark. So if local-cluster causes trouble for this task, I think it's acceptable to migrate the qtests in TestSparkCliDriver to TestMiniSparkOnYarnCliDriver.

> HoS itests shouldn't depend on a Spark distribution
> ---
>
> Key: HIVE-14240
> URL: https://issues.apache.org/jira/browse/HIVE-14240
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Affects Versions: 2.0.0, 2.1.0, 2.0.1
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
>
> The HoS integration tests download a full Spark distribution (a tar-ball) from CloudFront and use it to run Spark locally. They run a few tests with Spark in embedded mode, and some tests against a local Spark on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download the tar-ball from a pre-defined location.
> This is problematic because the Spark distribution shades all its dependencies, including Hadoop dependencies. This can cause problems when upgrading the Hadoop version for Hive (ref: HIVE-13930).
> Removing it will also avoid having to download the tar-ball during every build, and will simplify the build process for the itests module.
> The Hive itests should instead depend directly on Spark artifacts published in Maven Central. It will require some effort to get this working. The current Hive Spark client uses a launch script in the Spark installation to run Spark jobs. The script basically does some setup work and invokes org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class directly, which avoids the need to have a full Spark distribution available locally (in fact this option already exists, but isn't tested).
> There may be other issues around classpath conflicts between Hive and Spark. For example, Hive and Spark require different versions of Kryo. One solution to this would be to take the Spark artifacts and shade Kryo inside them.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
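The Kryo-shading idea at the end of the description could be sketched with the Maven shade plugin's class-relocation feature. This is a hedged illustration, not Hive's actual build configuration: the shaded package name is invented, and where the repackaged Spark artifact would actually be built is exactly the open question of this ticket.

```xml
<!-- Illustrative only: relocate Kryo's classes inside a repackaged Spark
     artifact so they cannot clash with the Kryo version Hive itself uses. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.esotericsoftware.kryo</pattern>
            <!-- hypothetical target package -->
            <shadedPattern>org.apache.hive.shaded.kryo</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocation rewrites both the class files and the bytecode references inside the shaded jar, so the renamed Kryo is invisible to Hive's own dependency resolution.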
[jira] [Updated] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver
[ https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14782: - Attachment: HIVE-14782.2.patch All test cases just reference the src table, so the init/cleanup scripts of the encrypted driver are reused.

> Improve runtime of NegativeMinimrCliDriver
> --
>
> Key: HIVE-14782
> URL: https://issues.apache.org/jira/browse/HIVE-14782
> Project: Hive
> Issue Type: Sub-task
> Components: Test
> Affects Versions: 2.2.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-14782.1.patch, HIVE-14782.2.patch
>
> NegativeMinimrCliDriver is one of the slowest test batches. The actual test takes only 3 minutes, whereas test initialization takes around 15 minutes. Also remove the hadoop20.q tests from the NegativeMinimrCliDriver batch, as hadoop 0.20 is no longer supported.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-14803: Target Version/s: 2.2.0 Status: Patch Available (was: Open)

> S3: Stats gathering for insert queries can be expensive for partitioned dataset
> ---
>
> Key: HIVE-14803
> URL: https://issues.apache.org/jira/browse/HIVE-14803
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-14803.1.patch
>
> StatsTask's aggregateStats populates stats details for all partitions by checking the file sizes, which turns out to be expensive when a large number of partitions are inserted.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver
[ https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14782: - Description: NegativeMinimrCliDriver is one of the slowest test batches. The actual test takes only 3 minutes, whereas test initialization takes around 15 minutes. Also remove the hadoop20.q tests from the NegativeMinimrCliDriver batch, as hadoop 0.20 is no longer supported. was: mapreduce_stack_trace_hadoop20.q runs as an isolated test, which is no longer required as we no longer support hadoop 0.20.x

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14782) Improve runtime of NegativeMinimrCliDriver
[ https://issues.apache.org/jira/browse/HIVE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14782: - Summary: Improve runtime of NegativeMinimrCliDriver (was: Remove mapreduce_stack_trace_hadoop20.q as we no longer have hadoop20)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508363#comment-15508363 ] liyunzhang_intel commented on HIVE-14240: - [~Ferd]: bq. In Pig, they don't require Spark distribution since they only test Spark standalone mode in their integration test. In Pig on Spark, we don't need to download a Spark distribution to run unit tests, because we currently only enable "local" (SPARK_MASTER) mode; we don't support standalone, yarn-client, or yarn-cluster mode yet. We just [copy all spark dependency jars published from mvn repository to the run-time classpath|https://github.com/apache/pig/blob/spark/bin/pig#L399] when running unit tests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508353#comment-15508353 ] Dapeng Sun edited comment on HIVE-14029 at 9/21/16 1:24 AM: [~Ferd] Yes, I used this command

> Update Spark version to 2.0.0
> -
>
> Key: HIVE-14029
> URL: https://issues.apache.org/jira/browse/HIVE-14029
> Project: Hive
> Issue Type: Bug
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Attachments: HIVE-14029.1.patch, HIVE-14029.patch
>
> There are quite a few new optimizations in Spark 2.0.0. We need to bump Spark up to 2.0.0 to benefit from those performance improvements.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508353#comment-15508353 ] Dapeng Sun commented on HIVE-14029: --- Yes, I used this command

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508350#comment-15508350 ] Ferdinand Xu commented on HIVE-14029: - Hi [~spena], I think we should move this forward, since HIVE-14240 needs further discussion and doesn't block this ticket. We can upload the tgz to a stable location to upgrade the Spark version, and once HIVE-14240 is fixed, we can easily remove the tgz. [~lirui] [~stakiar] [~aihuaxu] any thoughts?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508316#comment-15508316 ] Ferdinand Xu edited comment on HIVE-14029 at 9/21/16 1:16 AM: -- Hi [~stakiar], the tgz was built via the following command:
{code}
sh ./dev/make-distribution.sh --name hadoop2-without-hive --tgz -Phadoop-2.7 -Pyarn -Pparquet-provided -Dhadoop.version=2.7.3
{code}
[~dapengsun], can you confirm it, please?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508335#comment-15508335 ] Ferdinand Xu commented on HIVE-14240: - Thanks [~stakiar] for your input. AFAIK, TestSparkCliDriver needs SparkSubmit to submit a job, which requires SPARK_HOME to point to a Spark distribution because it tests SparkOnYarn. [~kellyzly] [~mohitsabharwal], please correct the following if any of it is wrong. In Pig, they don't require Spark distribution since they only test Spark standalone mode in their integration test.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14803) S3: Stats gathering for insert queries can be expensive for partitioned dataset
[ https://issues.apache.org/jira/browse/HIVE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-14803: Attachment: HIVE-14803.1.patch Observed a 12% improvement in runtime with a 100-partition dataset. \cc [~ashutoshc]

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508316#comment-15508316 ] Ferdinand Xu commented on HIVE-14029: - Hi [~stakiar], the tgz was built via the following command:
{code}
sh ./dev/make-distribution.sh --name hadoop2-without-hive --tgz -Phadoop-2.7 -Pyarn -Pparquet-provided -Dhadoop.version=2.7.3
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508218#comment-15508218 ] Sahil Takiar commented on HIVE-14240: - I looked into this today and tried to get something working, but I don't think it's possible without making some modifications to Spark.
* The HoS integration tests run with {{spark.master=local-cluster[2,2,1024]}}
** Basically, the {{TestSparkCliDriver}} JVM runs the SparkSubmit command (which spawns a new process); the SparkSubmit process then creates 2 more processes (the Spark executors, which do the actual work) with 2 cores and 1024 MB of memory each
** The {{local-cluster}} option is not present in the Spark docs because it is mainly used for integration testing within the Spark project itself; it basically provides a way of deploying a mini cluster locally
** The advantage of {{local-cluster}} is that it does not require Spark Masters or Workers to be running
*** Spark Workers are basically like NodeManagers; a Spark Master is basically like HS2
* Looking through the Spark code that launches the actual Spark executors, it more or less requires a {{SPARK_HOME}} directory to be present (ref: https://github.com/apache/spark/blob/branch-2.0/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java)
** {{SPARK_HOME}} is supposed to point to a directory containing a Spark distribution

Thus, we would need to modify the {{AbstractCommandBuilder.java}} class in Spark so that it doesn't require {{SPARK_HOME}} to be set. However, I'm not sure how difficult this will be to do in Spark. We could change the {{spark.master}} from {{local-cluster}} to {{local}}, in which case everything will be run locally. However, I think this removes some functionality from the HoS tests, since running locally isn't the same as running against a real mini-cluster.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
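For readers unfamiliar with the {{local-cluster[2,2,1024]}} master URL discussed above: the three numbers encode the number of executor processes, the cores per executor, and the memory per executor in MB. The parser below is an illustrative sketch of what the URL encodes, not Spark's own code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LocalClusterMaster {
    // Parse a master URL of the form local-cluster[numExecutors,coresPerExecutor,memoryMB]
    static int[] parse(String master) {
        Matcher m = Pattern
            .compile("local-cluster\\[(\\d+),(\\d+),(\\d+)\\]")
            .matcher(master);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a local-cluster master: " + master);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),  // executor processes to spawn
            Integer.parseInt(m.group(2)),  // cores per executor
            Integer.parseInt(m.group(3))   // memory per executor, in MB
        };
    }

    public static void main(String[] args) {
        int[] spec = parse("local-cluster[2,2,1024]");
        System.out.println(spec[0] + " executors, " + spec[1]
            + " cores each, " + spec[2] + " MB each");
    }
}
```

So {{local-cluster[2,2,1024]}} spawns two executor processes with 2 cores and 1024 MB each, which is why the comment above describes SparkSubmit creating two worker processes.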
[jira] [Comment Edited] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508218#comment-15508218 ] Sahil Takiar edited comment on HIVE-14240 at 9/21/16 12:16 AM: --- I looked into this today and tried to get something working, but I don't think it's possible without making some modifications to Spark.
* The HoS integration tests run with {{spark.master=local-cluster[2,2,1024]}}
** Basically, the {{TestSparkCliDriver}} JVM runs the SparkSubmit command (which spawns a new process); the SparkSubmit process then creates 2 more processes (the Spark executors, which do the actual work) with 2 cores and 1024 MB of memory each
** The {{local-cluster}} option is not present in the Spark docs because it is mainly used for integration testing within the Spark project itself; it basically provides a way of deploying a mini cluster locally
** The advantage of {{local-cluster}} is that it does not require Spark Masters or Workers to be running
*** Spark Workers are basically like NodeManagers; a Spark Master is basically like HS2
* Looking through the Spark code that launches the actual Spark executors, it more or less requires a {{SPARK_HOME}} directory to be present (ref: https://github.com/apache/spark/blob/branch-2.0/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java)
** {{SPARK_HOME}} is supposed to point to a directory containing a Spark distribution

Thus, we would need to modify the {{AbstractCommandBuilder.java}} class in Spark so that it doesn't require {{SPARK_HOME}} to be set. However, I'm not sure how difficult this will be to do in Spark. We could change the {{spark.master}} from {{local-cluster}} to {{local}}, in which case everything will be run locally. However, I think this removes some functionality from the HoS tests, since running locally isn't the same as running against a real mini-cluster.
[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508203#comment-15508203 ] Daniel Dai commented on HIVE-14801: --- +1

> improve TestPartitionNameWhitelistValidation stability
> --
>
> Key: HIVE-14801
> URL: https://issues.apache.org/jira/browse/HIVE-14801
> Project: Hive
> Issue Type: Bug
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Attachments: HIVE-14801.1.patch, HIVE-14801.2.patch
>
> TestPartitionNameWhitelistValidation uses a remote metastore. However, there can be multiple issues around the startup of a remote metastore, including race conditions in finding an available port. In addition, all the initialization done at startup of a remote metastore is likely to make the test case take more time.
> This test case doesn't need a remote metastore, so it should be moved to using an embedded metastore.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508086#comment-15508086 ] Hive QA commented on HIVE-14801: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829450/HIVE-14801.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10556 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1245/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1245/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1245/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12829450 - PreCommit-HIVE-Build

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508063#comment-15508063 ] Thejas M Nair commented on HIVE-14801: -- [~sseth] Looking at the mvn outputs, yes, the difference looks larger.
Before - Hive Integration - Unit Tests .. SUCCESS [ 35.974 s] Total time: 01:28 min
After - Hive Integration - Unit Tests .. SUCCESS [ 26.785 s] Total time: 01:09 min
Though there seems to be more noise when total time is considered.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
[ https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508057#comment-15508057 ] Sushanth Sowmyan commented on HIVE-13853: - That is how I had started testing, but HS2 has some quirks on filter load time due to which it has to be loaded explicitly at HS2 start time. Thus, this covers changes to HS2 start as well, and not simply the filter. > Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat > - > > Key: HIVE-13853 > URL: https://issues.apache.org/jira/browse/HIVE-13853 > Project: Hive > Issue Type: Bug > Components: HiveServer2, WebHCat >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13853.2.patch, HIVE-13853.patch > > > There is a possibility that there may be a CSRF-based attack on various > hadoop components, and thus, there is an effort to add a block for all > incoming http requests if they do not contain a X-XSRF-Header header. (See > HADOOP-12691 for motivation) > This has potential to affect HS2 when running on thrift-over-http mode(if > cookie-based-auth is used), and webhcat. > We introduce new flags to determine whether or not we're using the filter, > and if we are, we will automatically reject any http requests which do not > contain this header. > To allow this to work, we also need to make changes to our JDBC driver to > automatically inject this header into any requests it makes. Also, any > client-side programs/api not using the JDBC driver directly will need to make > changes to add a X-XSRF-Header header to the request to make calls to > HS2/WebHCat if this filter is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-14793. --- Resolution: Fixed Fix Version/s: 2.2.0 Committed to master. Precommit not required, since this makes no difference on an already deployed precommit run. Tested offline. > Allow ptest branch to be specified, PROFILE override > > > Key: HIVE-14793 > URL: https://issues.apache.org/jira/browse/HIVE-14793 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14793.01.patch, HIVE-14793.02.patch, > HIVE-14793.03.patch > > > Post HIVE-14734 - the profile is automatically determined. Add an option to > override this via Jenkins. Also add an option to specify the branch from > which ptest is built (This is hardcoded to github.com/apache/hive) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14793: -- Attachment: HIVE-14793.03.patch Updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508003#comment-15508003 ] Siddharth Seth commented on HIVE-14793: --- Oops. I'll generate it again properly. The diff is because of line spacing. It should not have shown up. Thanks for the review. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14800) Handle off by 3 in ORC split generation based on split strategy used
[ https://issues.apache.org/jira/browse/HIVE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507998#comment-15507998 ] Siddharth Seth commented on HIVE-14800: --- They are valid splits - however, it should be possible to make them consistent when splits are generated by ORC itself. Either special case BI or ETL to generate the same split as the other for the starting split of a file. In terms of hashCode for consistent splits - that should be independent of the format. > Handle off by 3 in ORC split generation based on split strategy used > > > Key: HIVE-14800 > URL: https://issues.apache.org/jira/browse/HIVE-14800 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth > > BI will apparently generate splits starting at offset 0. > ETL will skip the ORC header and generate a split starting at offset 3. > There's a workaround in the HiveSplitGenerator to handle this for consistent > splits. Ideally, ORC split generation should take care of this. > cc [~prasanth_j], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
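[Editor's note] The off-by-3 above comes from the 3-byte "ORC" magic at the start of every ORC file: the BI strategy starts its first split at offset 0, while the ETL strategy starts past the magic at offset 3. A minimal sketch of the normalization idea being discussed - names and structure are illustrative, not Hive's actual HiveSplitGenerator API:

```java
// Illustrative sketch only: normalize split start offsets so that the BI
// strategy (start = 0) and the ETL strategy (start = 3, past the 3-byte
// "ORC" magic) produce the same key for the first split of a file.
class OrcSplitNormalizer {
    static final int ORC_MAGIC_LEN = 3; // the bytes 'O', 'R', 'C'

    // Treat any start offset inside the magic as the canonical
    // post-magic offset, so both strategies agree on the first split.
    static long normalizeStart(long start) {
        return start < ORC_MAGIC_LEN ? ORC_MAGIC_LEN : start;
    }

    // A format-independent, consistent key for a split of a file.
    static int splitHash(String path, long start) {
        long s = normalizeStart(start);
        return (path + ":" + s).hashCode();
    }

    public static void main(String[] args) {
        String f = "/warehouse/t/part-00000.orc";
        // Both strategies' first split now hash identically.
        System.out.println(splitHash(f, 0) == splitHash(f, 3)); // true
    }
}
```

Splits starting later in the file (offset >= 3) are unaffected by the normalization.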
[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507954#comment-15507954 ] Sergio Peña commented on HIVE-14793: Patch looks good. +1 However, I see the outputDir change in the patch is already on master. https://github.com/apache/hive/blob/master/dev-support/jenkins-execute-build.sh#L45 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755 ] Siddharth Seth edited comment on HIVE-14793 at 9/20/16 9:58 PM: bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq. --outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir once it runs successfully. was (Author: sseth): bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq .outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir once it runs successfully. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755 ] Siddharth Seth edited comment on HIVE-14793 at 9/20/16 9:57 PM: bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq .outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir once it runs successfully. was (Author: sseth): bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq .--outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir once it runs successfully. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14793: -- Attachment: HIVE-14793.02.patch Updated patch, which removes the change to target/. [~spena] - could you please take another look. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507855#comment-15507855 ] Siddharth Seth commented on HIVE-14801: --- [~thejas] - you may want to look at mvn test output, rather than the junit result file. The junit result file does not include the setup time. My guess is the metastore is started during initialization? It may be more than 5 seconds saved overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507845#comment-15507845 ] Thejas M Nair commented on HIVE-14801: -- Overall around 5 seconds saved - cc [~sseth] Before - {code} <testsuite name="org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation" time="14.252" tests="6" errors="0" skipped="0" failures="0"> {code} After - {code} <testsuite name="org.apache.hadoop.hive.metastore.TestPartitionNameWhitelistValidation" time="9.412" tests="6" errors="0" skipped="0" failures="0"> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14801: - Attachment: HIVE-14801.2.patch 2.patch - Moving the setup to a one time setup saves at least another second. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
[ https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507830#comment-15507830 ] Siddharth Seth commented on HIVE-13853: --- Can this be written as a unit test instead of a test which requires HS2 to be brought up? Test the functionality of the filter independent of where it is running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
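[Editor's note] The filter's core decision can indeed be exercised without starting MiniHS2. A hypothetical distillation of that check - the header name matches the one in the issue, but the class and method below are illustrative stand-ins, not the real servlet filter from the patch:

```java
import java.util.Map;

// Illustrative stand-in for the XSRF filter's decision logic, testable
// without bringing up HS2: reject any request missing the X-XSRF-HEADER
// header when the filter is enabled.
class XsrfCheck {
    static final String XSRF_HEADER = "X-XSRF-HEADER";

    static boolean accept(Map<String, String> headers, boolean filterEnabled) {
        if (!filterEnabled) {
            return true; // filter disabled: let every request through
        }
        return headers.containsKey(XSRF_HEADER);
    }

    public static void main(String[] args) {
        System.out.println(accept(Map.of(XSRF_HEADER, "true"), true));  // true
        System.out.println(accept(Map.of(), true));                     // false
        System.out.println(accept(Map.of(), false));                    // true
    }
}
```

A real unit test would wrap the same assertions around the actual filter with a mocked HttpServletRequest, leaving the MiniHS2 tests to cover only the HS2 startup wiring.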
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507799#comment-15507799 ] Hive QA commented on HIVE-14713: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829432/HIVE-14713.2.patch {color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10626 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1244/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1244/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1244/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12829432 - PreCommit-HIVE-Build > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14801: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14801: - Description: TestPartitionNameWhitelistValidation uses remote metastore. However, there can be multiple issues around startup of remote metastore, including race conditions in finding available port. In addition, all the initialization done at startup of remote metastore is likely to make the test case take more time. This test case doesn't need remote metastore, so it should be moved to using embedded metastore. was: TestPartitionNameWhitelistValidation uses remote metastore. However, there can be multiple issues around startup of remote metastore, including race conditions in finding available port. In addition, all the initialization done at startup of remote metastore is likely to make the test case take more time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14801) improve TestPartitionNameWhitelistValidation stability
[ https://issues.apache.org/jira/browse/HIVE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14801: - Attachment: HIVE-14801.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755 ] Siddharth Seth edited comment on HIVE-14793 at 9/20/16 8:49 PM: bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq .--outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir once it runs successfully. was (Author: sseth): bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq .--outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14793) Allow ptest branch to be specified, PROFILE override
[ https://issues.apache.org/jira/browse/HIVE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507755#comment-15507755 ] Siddharth Seth commented on HIVE-14793: --- bq. 1. Can we create a new function that checks and/or initializes environment variables? I think this would be useful for new devs when looking at what config variables can be used. I don't think we should do this in the current jira. There's a lot more variables other than the ones added here. Beyond that, this may not be a great approach since it separates the logic for processing variables into multiple places. e.g. PTEST_GIT_BRANCH is only required while setting up the branch. I don't think a separate method to take care of this, along with initialization of other methods helps a lot. bq .--outputDir is not necessary anymore. Reverting the change and testing (this still leaves the outputDir, but I believe it's required by the ptest client). Will post a patch after testing with the reverted outputDir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14461) Investigate HBaseMinimrCliDriver tests
[ https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507714#comment-15507714 ] Siddharth Seth edited comment on HIVE-14461 at 9/20/16 8:32 PM: Yes. Initial plan was to remove it, but looks like the test does not exist in CliDriver - so moving it over. was (Author: sseth): Yes. > Investigate HBaseMinimrCliDriver tests > -- > > Key: HIVE-14461 > URL: https://issues.apache.org/jira/browse/HIVE-14461 > Project: Hive > Issue Type: Sub-task > Components: Tests >Reporter: Zoltan Haindrich >Assignee: Siddharth Seth > Attachments: HIVE-14461.01.patch > > > during HIVE-1 i've encountered an odd thing: > HBaseMinimrCliDriver only executes a single test...and that test is set using > the qfile selector...which looks out-of-place. > The only test it executes doesn't follow regular qtest file naming...and has > an extension 'm' > At least the file should be renamed... but I think the change wasn't > intentional -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests
[ https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507714#comment-15507714 ] Siddharth Seth commented on HIVE-14461: --- Yes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507644#comment-15507644 ] Sahil Takiar commented on HIVE-14240: - Hey [~Ferd] I haven't had time to look into this, although it shouldn't be particularly difficult (I would hope). I don't think this blocks HIVE-14029 but I'm trying to talk to some Spark committers to see what they think. > HoS itests shouldn't depend on a Spark distribution > --- > > Key: HIVE-14240 > URL: https://issues.apache.org/jira/browse/HIVE-14240 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > The HoS integration tests download a full Spark Distribution (a tar-ball) > from CloudFront. It uses this distribution to run Spark locally. It runs a > few tests with Spark in embedded mode, and some tests against a local Spark > on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download > the tar-ball from a pre-defined location. > This is problematic because the Spark Distribution shades all its > dependencies, including Hadoop dependencies. This can cause problems when > upgrading the Hadoop version for Hive (ref: HIVE-13930). > Removing it will also avoid having to download the tar-ball during every > build, and simplify the build process for the itests module. > The Hive itests should instead directly depend on Spark artifacts published > in Maven Central. It will require some effort to get this working. The > current Hive Spark Client uses a launch script in the Spark installation to > run Spark jobs. The script basically does some setup work and invokes > org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class > directly, which avoids the need to have a full Spark distribution available > locally (in fact this option already exists, but isn't tested). > There may be other issues around classpath conflicts between Hive and Spark. 
> For example, Hive and Spark require different versions of Kryo. One solution > to this would be to take Spark artifacts and shade Kryo inside them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507641#comment-15507641 ] Sahil Takiar commented on HIVE-14029: - [~Ferd] how was the http://blog.sundp.me/spark/spark-2.0.0-bin-hadoop2-without-hive.tgz built? I don't think HIVE-14240 is a blocker for this assuming the tar-ball was built in a supported way, but I'm trying to contact some Spark committers to see if they have any input. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite a few new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit from those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
[ https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507612#comment-15507612 ] Sushanth Sowmyan commented on HIVE-13853: - [~sseth], I've been looking at some other tests, and I came to a similar question - the runtime of the test itself is not the actual problem; the problem is the miniHS2 start, which we do need in order to verify that HS2 filters this properly. Things we could do to improve: a) batch the miniHS2 tests together - however, this can be problematic as each test might apply a different confOverlay at the beginning; b) look into why miniHS2 sometimes takes so long to start. > Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat > - > > Key: HIVE-13853 > URL: https://issues.apache.org/jira/browse/HIVE-13853 > Project: Hive > Issue Type: Bug > Components: HiveServer2, WebHCat >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13853.2.patch, HIVE-13853.patch > > > There is a possibility that there may be a CSRF-based attack on various > hadoop components, and thus, there is an effort to add a block for all > incoming http requests if they do not contain an X-XSRF-Header header. (See > HADOOP-12691 for motivation) > This has potential to affect HS2 when running in thrift-over-http mode (if > cookie-based auth is used), and webhcat. > We introduce new flags to determine whether or not we're using the filter, > and if we are, we will automatically reject any http requests which do not > contain this header. > To allow this to work, we also need to make changes to our JDBC driver to > automatically inject this header into any requests it makes. Also, any > client-side programs/APIs not using the JDBC driver directly will need to make > changes to add an X-XSRF-Header header to the request to make calls to > HS2/WebHCat if this filter is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507583#comment-15507583 ] Illya Yalovyy commented on HIVE-14713: -- I have updated Patch and CR with a fixed version. > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-14713: - Status: Open (was: Patch Available) > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-14713: - Attachment: HIVE-14713.2.patch > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14713) LDAP Authentication Provider should be covered with unit tests
[ https://issues.apache.org/jira/browse/HIVE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-14713: - Status: Patch Available (was: Open) > LDAP Authentication Provider should be covered with unit tests > -- > > Key: HIVE-14713 > URL: https://issues.apache.org/jira/browse/HIVE-14713 > Project: Hive > Issue Type: Test > Components: Authentication, Tests >Affects Versions: 2.1.0 >Reporter: Illya Yalovyy >Assignee: Illya Yalovyy > Attachments: HIVE-14713.1.patch, HIVE-14713.2.patch > > > Currently LdapAuthenticationProviderImpl class is not covered with unit > tests. To make this class testable some minor refactoring will be required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14800) Handle off by 3 in ORC split generation based on split strategy used
[ https://issues.apache.org/jira/browse/HIVE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507454#comment-15507454 ] Gopal V commented on HIVE-14800: They are both valid splits, so unlikely to need changes there - the only issue is with external consumption of those details, like using those values as hashcode inputs. We could implement a hashcode directly out of ORC though. > Handle off by 3 in ORC split generation based on split strategy used > > > Key: HIVE-14800 > URL: https://issues.apache.org/jira/browse/HIVE-14800 > Project: Hive > Issue Type: Bug >Reporter: Siddharth Seth > > BI will apparently generate splits starting at offset 0. > ETL will skip the ORC header and generate a split starting at offset 3. > There's a workaround in the HiveSplitGenerator to handle this for consistent > splits. Ideally, ORC split generation should take care of this. > cc [~prasanth_j], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
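The "off by 3" comes from the 3-byte "ORC" magic at the start of every ORC file: the BI strategy starts a split at offset 0, while the ETL strategy starts past the magic at offset 3. A trivial sketch of the two starting offsets (illustrative constants and method names, not the actual split strategy classes):

```java
public class OrcSplitOffsetDemo {
    static final int ORC_MAGIC_LEN = "ORC".length(); // 3-byte file header magic

    // BI strategy: splits start at the beginning of the file.
    static long biSplitStart() {
        return 0L;
    }

    // ETL strategy: splits start past the "ORC" magic, hence the off-by-3
    // difference when the two values are consumed externally (e.g. as
    // hashcode inputs).
    static long etlSplitStart() {
        return ORC_MAGIC_LEN;
    }

    public static void main(String[] args) {
        System.out.println(etlSplitStart() - biSplitStart()); // 3
    }
}
```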
[jira] [Updated] (HIVE-14624) LLAP: Use FQDN when submitting work to LLAP
[ https://issues.apache.org/jira/browse/HIVE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14624: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master > LLAP: Use FQDN when submitting work to LLAP > > > Key: HIVE-14624 > URL: https://issues.apache.org/jira/browse/HIVE-14624 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-14624.01.patch, HIVE-14624.02.patch, > HIVE-14624.03.patch, HIVE-14624.patch > > > {code} > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > + socketAddress.getHostName()); > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java: > host = socketAddress.getHostName(); > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: > public static String getHostName() { > llap-common/src/java/org/apache/hadoop/hive/llap/metrics/MetricsUtils.java: >return InetAddress.getLocalHost().getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > String name = address.getHostName(); > llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java: > builder.setAmHost(address.getHostName()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java: >nodeId = LlapNodeId.getInstance(localAddress.get().getHostName(), > localAddress.get().getPort()); > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > localAddress.get().getHostName(), vertex.getDagName(), > qIdProto.getDagIndex(), > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java: > new ExecutionContextImpl(localAddress.get().getHostName()), env, > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapDaemon.java: >String hostName = MetricsUtils.getHostName(); > 
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapProtocolServerImpl.java: > .setBindAddress(addr.getHostName()) > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java: > request.getContainerIdString(), executionContext.getHostName(), > vertex.getDagName(), > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >String displayName = "LlapDaemonCacheMetrics-" + > MetricsUtils.getHostName(); > llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java: >displayName = "LlapDaemonIOMetrics-" + MetricsUtils.getHostName(); > llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestLlapDaemonProtocolServerImpl.java: > new LlapProtocolClientImpl(new Configuration(), > serverAddr.getHostName(), > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java: > builder.setAmHost(getAddress().getHostName()); > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java: > String displayName = "LlapTaskSchedulerMetrics-" + > MetricsUtils.getHostName(); > {code} > In systems where the hostnames do not match FQDN, calling the > getCanonicalHostName() will allow for resolution of the hostname when > accessing from a different base domain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
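The distinction driving the patch (getHostName() may return only the short host name, while getCanonicalHostName() attempts a full reverse-DNS lookup to obtain the FQDN) can be sketched as below; actual output depends on local DNS configuration, and the class name is just for illustration:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameDemo {
    // getCanonicalHostName() attempts a reverse-DNS lookup and returns the
    // FQDN when one resolves, falling back to the textual IP otherwise;
    // getHostName() may return only the short name (e.g. "node1").
    static String fqdnOf(InetAddress addr) {
        return addr.getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("short:     " + local.getHostName());
        System.out.println("canonical: " + fqdnOf(local));
    }
}
```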
[jira] [Commented] (HIVE-14461) Investigate HBaseMinimrCliDriver tests
[ https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507343#comment-15507343 ] Prasanth Jayachandran commented on HIVE-14461: -- I don't see a test report for the HBaseMinimrCliDriver tests. https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/1214/testReport/org.apache.hadoop.hive.cli/ Are we removing these tests altogether from HBaseMinimrCliDriver and running them in HbaseCliDriver? > Investigate HBaseMinimrCliDriver tests > -- > > Key: HIVE-14461 > URL: https://issues.apache.org/jira/browse/HIVE-14461 > Project: Hive > Issue Type: Sub-task > Components: Tests >Reporter: Zoltan Haindrich >Assignee: Siddharth Seth > Attachments: HIVE-14461.01.patch > > > during HIVE-1 I've encountered an odd thing: > HBaseMinimrCliDriver only executes a single test... and that test is set using > the qfile selector... which looks out of place. > The only test it executes doesn't follow the regular qtest file naming... and has > an extension of 'm' > At least the file should be renamed... but I think this change wasn't > intentional -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14680) retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
[ https://issues.apache.org/jira/browse/HIVE-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507337#comment-15507337 ] Siddharth Seth commented on HIVE-14680: --- Really think we're better off fixing this within Orc itself - instead of working around it in the split generator (which at some point will handle different file types). Can the ORC getSplits not deal with this in BI mode? > retain consistent splits /during/ (as opposed to across) LLAP failures on top > of HIVE-14589 > --- > > Key: HIVE-14680 > URL: https://issues.apache.org/jira/browse/HIVE-14680 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-14680.01.patch, HIVE-14680.02.patch, > HIVE-14680.03.patch, HIVE-14680.patch > > > see HIVE-14589. > Basic idea (spent about 7 minutes thinking about this based on RB comment ;)) > is to return locations for all slots to HostAffinitySplitLocationProvider, > the missing slots being inactive locations (based solely on the last slot > actually present). For the splits mapped to these locations, fall back via > different hash functions, or some sort of probing. > This still doesn't handle all the cases, namely when the last slots are gone > (consistent hashing is supposed to be good for this?); however for that we'd > need more involved coordination between nodes or a central updater to > indicate the number of nodes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
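The fallback described in the issue (map a split to its slot, then deterministically probe past inactive slots so all clients agree on the location) could be sketched roughly as follows; this is purely illustrative and is not the HostAffinitySplitLocationProvider implementation:

```java
import java.util.List;

public class SlotProbeDemo {
    // Maps a split hash to an active slot: start at hash mod slot-count and
    // probe forward past inactive slots. Every client that agrees on which
    // slots are inactive therefore picks the same location for a given split.
    static int pickSlot(int splitHash, List<Boolean> active) {
        int n = active.size();
        int start = Math.floorMod(splitHash, n);
        for (int i = 0; i < n; i++) {
            int slot = (start + i) % n;
            if (active.get(slot)) {
                return slot;
            }
        }
        return -1; // no active slots left
    }

    public static void main(String[] args) {
        // Slot 2 is the natural target for hash 5 with 3 slots...
        System.out.println(pickSlot(5, List.of(true, true, true)));  // 2
        // ...but with slot 2 inactive, probing wraps around to slot 0.
        System.out.println(pickSlot(5, List.of(true, true, false))); // 0
    }
}
```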
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507333#comment-15507333 ] Siddharth Seth commented on HIVE-7926: -- bq. In the sentence: “The initial stage of the query is pushed into #LLAP, large shuffle is performed in their own containers” - What does "their own containers" refer to? Is there only one large shuffle, or multiple shuffles? When executing a query, it's possible to launch separate containers (Java processes, fallback to regular Tez execution) to perform the large Shuffles. In many cases, running a Shuffle / Reduce within LLAP may not be beneficial (no caching gains, etc). That said - it's also possible to run these Shuffle/Reduce steps within LLAP itself, and that is the typical case for short running queries. Multiple shuffles are possible. This point primarily talks about where a reduce will run - within the LLAP daemon itself, or as a separate container (process). bq. In the sentence: "The node allows parallel execution for multiple query fragments from different queries and sessions” - what does "the node" refer to? A single LLAP node? Yes - that refers to an LLAP instance. A single LLAP process can handle multiple fragments from different queries, or the same query. > long-lived daemons for query fragment execution, I/O and caching > > > Key: HIVE-7926 > URL: https://issues.apache.org/jira/browse/HIVE-7926 > Project: Hive > Issue Type: New Feature >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: LLAPdesigndocument.pdf > > > We are proposing a new execution model for Hive that is a combination of > existing process-based tasks and long-lived daemons running on worker nodes. > These nodes can take care of efficient I/O, caching and query fragment > execution, while heavy lifting like most joins, ordering, etc. can be handled > by tasks. 
> The proposed model is not a 2-system solution for small and large queries; > nor is it a separate execution engine like MR or Tez. It can be used by > any Hive execution engine, if support is added; in the future even external > products (e.g. Pig) can use it. > The document with the high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
[ https://issues.apache.org/jira/browse/HIVE-13853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507317#comment-15507317 ] Siddharth Seth commented on HIVE-13853: --- The test added in this patch - TestXSRFFilter - runs for close to 20 minutes; about 2 minutes of that is actual run time, the rest is setup time. Is there any way to make this faster? > Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat > - > > Key: HIVE-13853 > URL: https://issues.apache.org/jira/browse/HIVE-13853 > Project: Hive > Issue Type: Bug > Components: HiveServer2, WebHCat >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13853.2.patch, HIVE-13853.patch > > > There is a possibility that there may be a CSRF-based attack on various > hadoop components, and thus, there is an effort to add a block for all > incoming http requests if they do not contain an X-XSRF-Header header. (See > HADOOP-12691 for motivation) > This has potential to affect HS2 when running in thrift-over-http mode (if > cookie-based auth is used), and webhcat. > We introduce new flags to determine whether or not we're using the filter, > and if we are, we will automatically reject any http requests which do not > contain this header. > To allow this to work, we also need to make changes to our JDBC driver to > automatically inject this header into any requests it makes. Also, any > client-side programs/APIs not using the JDBC driver directly will need to make > changes to add an X-XSRF-Header header to the request to make calls to > HS2/WebHCat if this filter is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
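The filter policy the description outlines (reject any incoming HTTP request that lacks the X-XSRF-Header header) can be sketched as below; this is an illustration of the check, not the actual servlet filter added by the patch:

```java
import java.util.Map;

public class XsrfFilterDemo {
    static final String XSRF_HEADER = "X-XSRF-HEADER";

    // A request is allowed only if its headers contain X-XSRF-Header,
    // matched case-insensitively, as HTTP header names are.
    static boolean isAllowed(Map<String, String> headers) {
        return headers.keySet().stream().anyMatch(h -> h.equalsIgnoreCase(XSRF_HEADER));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Map.of("X-XSRF-HEADER", "true")));      // true
        System.out.println(isAllowed(Map.of("Content-Type", "text/plain"))); // false
    }
}
```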
[jira] [Assigned] (HIVE-14461) Investigate HBaseMinimrCliDriver tests
[ https://issues.apache.org/jira/browse/HIVE-14461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-14461: - Assignee: Siddharth Seth > Investigate HBaseMinimrCliDriver tests > -- > > Key: HIVE-14461 > URL: https://issues.apache.org/jira/browse/HIVE-14461 > Project: Hive > Issue Type: Sub-task > Components: Tests >Reporter: Zoltan Haindrich >Assignee: Siddharth Seth > Attachments: HIVE-14461.01.patch > > > during HIVE-1 I've encountered an odd thing: > HBaseMinimrCliDriver only executes a single test... and that test is set using > the qfile selector... which looks out of place. > The only test it executes doesn't follow the regular qtest file naming... and has > an extension of 'm' > At least the file should be renamed... but I think this change wasn't > intentional -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14651) Add a local cluster for Tez and LLAP
[ https://issues.apache.org/jira/browse/HIVE-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14651: -- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the reviews. > Add a local cluster for Tez and LLAP > > > Key: HIVE-14651 > URL: https://issues.apache.org/jira/browse/HIVE-14651 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 2.2.0 > > Attachments: HIVE-14651.01.patch, HIVE-14651.02.patch, > HIVE-14651.03.patch, HIVE-14651.04.patch, HIVE-14651.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anbu Cheeralan updated HIVE-14798: -- Description: MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. The error shown in beeline/sql client Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Hive Logs: 2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException 2016-09-20T17:28:00,717 WARN [HiveServer2-Background-Pool: Thread-92]: exec.DDLTask (:()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109) at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235) at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90) at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011) at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:432) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:418) ... 4 more Here are the steps to recreate this issue: use default; DROP TABLE IF EXISTS repairtable; CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING); MSCK REPAIR TABLE default.repairtable; was: MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. 
The error shown in beeline/sql client Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Hive Logs: 2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException 2016-09-20T17:28:00,717 WARN [HiveServer2-Background-Pool: Thread-92]: exec.DDLTask (:()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309) at
[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anbu Cheeralan updated HIVE-14798: -- Description: MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. The error shown in beeline/sql client Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Hive Logs: 2016-09-20T17:28:00,717 ERROR [HiveServer2-Background-Pool: Thread-92]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException 2016-09-20T17:28:00,717 WARN [HiveServer2-Background-Pool: Thread-92]: exec.DDLTask (:()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109) at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1077) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235) at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:90) at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:299) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:312) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011) at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:432) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$1.call(HiveMetaStoreChecker.java:418) ... 4 more Here are the steps to recreate this issue: use default DROP TABLE IF EXISTS repairtable CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) MSCK REPAIR TABLE default.repairtable was: MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. 
The error shown in beeline/sql client "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask" Hive Logs: 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - java.lang.NullPointerException 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (: ()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 Here are the steps to recreate this issue: use default DROP TABLE IF EXISTS repairtable CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) MSCK REPAIR TABLE default.repairtable > MSCK REPAIR TABLE throws null pointer exception >
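The root frame of the stack trace (ConcurrentHashMap.putVal) suggests the mechanism: unlike HashMap, ConcurrentHashMap throws NullPointerException for null keys or values, so a null partition path or name reaching the checker's map would produce exactly this failure. A minimal demonstration of that behavior (the class and method names here are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    // ConcurrentHashMap.put() throws NullPointerException for a null key or
    // value, which is consistent with the putVal() frame at the bottom of
    // the stack trace above.
    static boolean putRejectsNull(String key, String value) {
        ConcurrentHashMap<String, String> m = new ConcurrentHashMap<>();
        try {
            m.put(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(putRejectsNull(null, "v")); // true: null key rejected
        System.out.println(putRejectsNull("k", null)); // true: null value rejected
        System.out.println(putRejectsNull("k", "v"));  // false: normal put succeeds
    }
}
```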
[jira] [Resolved] (HIVE-14787) Ability to access DistributedCache from UDFs via Java API
[ https://issues.apache.org/jira/browse/HIVE-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Bystrov resolved HIVE-14787. - Resolution: Invalid > Ability to access DistributedCache from UDFs via Java API > - > > Key: HIVE-14787 > URL: https://issues.apache.org/jira/browse/HIVE-14787 > Project: Hive > Issue Type: Bug > Components: Query Processor > Environment: 1.1.0+cdh5.7.1 >Reporter: Ilya Bystrov > > I'm trying to create custom function > {{create function geoip as 'some.package.UDFGeoIp' using jar > 'hdfs:///user/hive/ext/HiveGeoIP.jar', file > 'hdfs:///user/hive/ext/GeoIP.dat';}} > According to https://issues.apache.org/jira/browse/HIVE-1016 > I should be able to access file via {{new File("./GeoIP.dat");}} (in > overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}}) > But this doesn't work. > I use the following workaround, but it's ugly: > {code} > CodeSource codeSource = > GenericUDFGeoIP.class.getProtectionDomain().getCodeSource(); > File jarFile = new File(codeSource.getLocation().toURI().getPath()); > String jarDir = jarFile.getParentFile().getPath(); > File actualFile = new File(jarDir + "/GeoIP.dat"); > {code} > UPDATE: > It looks like I should use {{ClassLoader#getSystemResource(String resource)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14787) Ability to access DistributedCache from UDFs via Java API
[ https://issues.apache.org/jira/browse/HIVE-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Bystrov updated HIVE-14787: Description: I'm trying to create custom function {{create function geoip as 'some.package.UDFGeoIp' using jar 'hdfs:///user/hive/ext/HiveGeoIP.jar', file 'hdfs:///user/hive/ext/GeoIP.dat';}} According to https://issues.apache.org/jira/browse/HIVE-1016 I should be able to access file via {{new File("./GeoIP.dat");}} (in overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}}) But this doesn't work. I use the following workaround, but it's ugly: {code} CodeSource codeSource = GenericUDFGeoIP.class.getProtectionDomain().getCodeSource(); File jarFile = new File(codeSource.getLocation().toURI().getPath()); String jarDir = jarFile.getParentFile().getPath(); File actualFile = new File(jarDir + "/GeoIP.dat"); {code} UPDATE: It looks like I should use {{ClassLoader#getSystemResource(String resource)}} was: I'm trying to create custom function {{create function geoip as 'some.package.UDFGeoIp' using jar 'hdfs:///user/hive/ext/HiveGeoIP.jar', file 'hdfs:///user/hive/ext/GeoIP.dat';}} According to https://issues.apache.org/jira/browse/HIVE-1016 I should be able to access file via {{new File("./GeoIP.dat");}} (in overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}}) But this doesn't work. 
I use the following workaround, but it's ugly: {code} CodeSource codeSource = GenericUDFGeoIP.class.getProtectionDomain().getCodeSource(); File jarFile = new File(codeSource.getLocation().toURI().getPath()); String jarDir = jarFile.getParentFile().getPath(); File actualFile = new File(jarDir + "/GeoIP.dat"); {code} > Ability to access DistributedCache from UDFs via Java API > - > > Key: HIVE-14787 > URL: https://issues.apache.org/jira/browse/HIVE-14787 > Project: Hive > Issue Type: Bug > Components: Query Processor > Environment: 1.1.0+cdh5.7.1 >Reporter: Ilya Bystrov > > I'm trying to create custom function > {{create function geoip as 'some.package.UDFGeoIp' using jar > 'hdfs:///user/hive/ext/HiveGeoIP.jar', file > 'hdfs:///user/hive/ext/GeoIP.dat';}} > According to https://issues.apache.org/jira/browse/HIVE-1016 > I should be able to access file via {{new File("./GeoIP.dat");}} (in > overridden method {{GenericUDF#evaluate(DeferredObject[] arguments)}}) > But this doesn't work. > I use the following workaround, but it's ugly: > {code} > CodeSource codeSource = > GenericUDFGeoIP.class.getProtectionDomain().getCodeSource(); > File jarFile = new File(codeSource.getLocation().toURI().getPath()); > String jarDir = jarFile.getParentFile().getPath(); > File actualFile = new File(jarDir + "/GeoIP.dat"); > {code} > UPDATE: > It looks like I should use {{ClassLoader#getSystemResource(String resource)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
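The quoted workaround can be wrapped in a small helper. The sketch below is illustrative only (the class and method names are not Hive API); it resolves a file that was registered alongside the UDF's jar, returning null when the code source cannot be determined:

```java
import java.io.File;
import java.security.CodeSource;

public class SiblingResource {
    // Resolve a file shipped next to the jar that contains `clazz`
    // (e.g. GeoIP.dat registered with "using jar ..., file ..."),
    // mirroring the workaround quoted above. Returns null when the
    // class's code source is unknown (e.g. bootstrap classes).
    public static File siblingOf(Class<?> clazz, String name) {
        try {
            CodeSource cs = clazz.getProtectionDomain().getCodeSource();
            if (cs == null || cs.getLocation() == null) {
                return null;
            }
            File jarFile = new File(cs.getLocation().toURI().getPath());
            return new File(jarFile.getParentFile(), name);
        } catch (java.net.URISyntaxException e) {
            return null;
        }
    }
}
```

Per the UPDATE in the description, {{ClassLoader#getSystemResource(String)}} may be the cleaner route when the added file is visible on the task classpath.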
[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp
[ https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507121#comment-15507121 ] Hive QA commented on HIVE-14412: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829401/HIVE-14412.5.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 10563 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_like] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join43] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[serde_regex] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[serde_regex] org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testCliDriver[serde_regex] org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_timestamp] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_1] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_2] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_3] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_4] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_5] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[invalid_cast_from_binary_6] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[serde_regex2] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[serde_regex3] org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[serde_regex] 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[wrong_column_type] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching org.apache.hive.spark.client.TestSparkClient.testJobSubmission {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1243/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1243/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1243/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 23 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12829401 - PreCommit-HIVE-Build > Add a timezone-aware timestamp > -- > > Key: HIVE-14412 > URL: https://issues.apache.org/jira/browse/HIVE-14412 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, > HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch, HIVE-14412.5.patch > > > Java's Timestamp stores the time elapsed since the epoch. While it's by > itself unambiguous, ambiguity comes when we parse a string into timestamp, or > convert a timestamp to string, causing problems like HIVE-14305. > To solve the issue, I think we should make timestamp aware of timezone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-9423: - Attachment: HIVE-9423.patch THRIFT-2046 handles connection overflow by closing the connection without writing any data into it. See: {code} if (t instanceof RejectedExecutionException) { retryCount++; try { if (remainTimeInMillis > 0) { //do a truncated 20 binary exponential backoff sleep [..] } else { client.close(); wp = null; LOGGER.warn("Task has been rejected by ExecutorService " + retryCount + " times till timedout, reason: " + t); break; } } {code} On the client side this generates a TTransportException in TIOStreamTransport.java with specific types, which helps us differentiate between the cases. So in the proposed solution we could print different error messages when the connection pool is exhausted, when the connection is not available, etc. What do you think about the proposed solution [~aihuaxu], [~ctang.ma], [~ngangam]? > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > Attachments: HIVE-9423.patch > > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). 
Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
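For reference, the "truncated 20 binary exponential backoff sleep" that the snippet above elides can be sketched as follows (a sketch only — the method names are illustrative and not taken from the patch):

```java
import java.util.concurrent.ThreadLocalRandom;

public class TruncatedBackoff {
    // Truncated binary exponential backoff: the k-th retry picks a
    // uniformly random wait in [0, 2^min(k, cap)) slot units, so the
    // window doubles per retry until the exponent is capped (the
    // comment above caps it at 20).
    public static long backoffSlots(int retryCount, int cap) {
        int exp = Math.min(retryCount, cap);
        long window = 1L << exp; // 2^exp possible slots
        return ThreadLocalRandom.current().nextLong(window);
    }
}
```

Multiplying the returned slot count by a base sleep unit (for example a few milliseconds) gives the actual sleep before retrying the rejected task.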
[jira] [Commented] (HIVE-14797) reducer number estimating may lead to data skew
[ https://issues.apache.org/jira/browse/HIVE-14797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506967#comment-15506967 ] Xuefu Zhang commented on HIVE-14797: This seems to make sense, but can we avoid hard-coding the number (31)? > reducer number estimating may lead to data skew > --- > > Key: HIVE-14797 > URL: https://issues.apache.org/jira/browse/HIVE-14797 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: roncenzhao >Assignee: roncenzhao > Attachments: HIVE-14797.patch > > > HiveKey's hash code is generated by multiplying by 31 key by key, as > implemented in method `ObjectInspectorUtils.getBucketHashCode()`: > for (int i = 0; i < bucketFields.length; i++) { > int fieldHash = ObjectInspectorUtils.hashCode(bucketFields[i], > bucketFieldInspectors[i]); > hashCode = 31 * hashCode + fieldHash; > } > The following example leads to data skew: > I have two tables, tbl1 and tbl2, with the same columns: a int, b > string. The values of column 'a' in both tables are not skewed, but the values > of column 'b' in both tables are skewed. > When my SQL is "select * from tbl1 join tbl2 on tbl1.a=tbl2.a and > tbl1.b=tbl2.b" and the estimated reducer number is 31, it leads to data > skew. > As we know, the HiveKey's hash code is generated by `hash(a)*31 + hash(b)`. > When the reducer number is 31, the reducer No. of each row is `hash(b)%31`. As a > result, the job will be skewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
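The arithmetic behind the skew is easy to verify: for non-negative hashes, (31 * hash(a) + hash(b)) % 31 == hash(b) % 31, so column 'a' never influences the reducer choice when the reducer count is exactly 31. A minimal sketch modeled on the quoted snippet (method names are illustrative, not Hive's actual API):

```java
public class SkewDemo {
    // Combine per-column hashes with multiply-by-31, as in the quoted
    // ObjectInspectorUtils.getBucketHashCode() loop.
    public static int bucketHashCode(int[] fieldHashes) {
        int hashCode = 0;
        for (int fieldHash : fieldHashes) {
            hashCode = 31 * hashCode + fieldHash;
        }
        return hashCode;
    }

    // Reducer assignment: non-negative hash modulo the reducer count.
    public static int reducer(int hashCode, int numReducers) {
        return (hashCode & Integer.MAX_VALUE) % numReducers;
    }
}
```

With 31 reducers, rows sharing hash(b) land on the same reducer regardless of hash(a); with, say, 30 reducers both columns contribute. This suggests one mitigation in the spirit of the comment above: avoid an estimated reducer count equal to the hash multiplier 31 (or a multiple of it).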
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506942#comment-15506942 ] Hive QA commented on HIVE-14029: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12829398/HIVE-14029.1.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 10556 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char] org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket4] org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket5] org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[disable_merge_for_bucketing] org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[list_bucket_dml_10] org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[reduce_deduplicate] org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket4] org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[disable_merge_for_bucketing] org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission 
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1242/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1242/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1242/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 23 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12829398 - PreCommit-HIVE-Build > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14775) Investigate IOException usage in Metrics APIs
[ https://issues.apache.org/jira/browse/HIVE-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14775: --- Description: A large number of metrics APIs seem to declare to throw IOExceptions needlessly. (incrementCounter, decrementCounter etc.) This is not only misleading but it fills up the code with unnecessary catch blocks never to be reached. We should investigate if these exceptions are thrown at all, and remove them if it is truly unused. was: A large number of metrics APIs seems to declare to throw IOExceptions needlessly. (incrementCounter, decrementCounter etc.) This is not only misleading but it fills up the code with unnecessary catch blocks never to be reached. We should investigate if these exceptions are thrown at all, and remove them if it is truly unused. > Investigate IOException usage in Metrics APIs > - > > Key: HIVE-14775 > URL: https://issues.apache.org/jira/browse/HIVE-14775 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2, Metastore >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > A large number of metrics APIs seem to declare to throw IOExceptions > needlessly. (incrementCounter, decrementCounter etc.) > This is not only misleading but it fills up the code with unnecessary catch > blocks never to be reached. > We should investigate if these exceptions are thrown at all, and remove them > if it is truly unused. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-9423: Assignee: Peter Vary > HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta >Assignee: Peter Vary > > An example of where it is needed: it has been reported that when # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user gets aware of the problem and can take > appropriate steps (either close existing connections or bump of the config > value or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of background thread > pool to have a well defined behavior on the the pool getting exhausted. > Ideally implementing some form of general admission control will be a better > solution, so that we do not accept new work unless sufficient resources are > available and display graceful degradation under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506839#comment-15506839 ] Rui Li commented on HIVE-14029: --- My understanding is that Spark needs Hive libraries only for SparkSQL, which is not needed for HoS. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506833#comment-15506833 ] Rui Li commented on HIVE-14029: --- bq. some APIs are changed in Spark side Is there any other way we can track the read method? If not, guess we can just remove the class from Hive side. bq. Hive2 itests will use Spark2 assembly to run Hive2 tests. This means Hive2 might not test Spark2 correctly due to the lack of Hive 1.2 libraries in it. I'm not sure what problem spark has without hive libraries. We have been requiring that spark is built without hive. Otherwise we'll have different hive libraries in our classpath which causes conflicts. I don't think HIVE-14240 blocks this one. Actually HIVE-14240 should be implemented for Spark 2.0 right? > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-9530) constant * column is null interpreted as constant * boolean
[ https://issues.apache.org/jira/browse/HIVE-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-9530 started by Miklos Csanady. > constant * column is null interpreted as constant * boolean > --- > > Key: HIVE-9530 > URL: https://issues.apache.org/jira/browse/HIVE-9530 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Miklos Csanady >Priority: Minor > > {code} > select c1 from tversion where 1 * cnnull is null > FAILED: SemanticException [Error 10014]: Line 1:30 Wrong arguments 'cnnull': > No matching method for class > org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply with (int, boolean) > create table if not exists TVERSION ( > RNUM int, > C1 int, > CVER char(6), > CNNULL int, > CCNULL char(1) > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anbu Cheeralan updated HIVE-14798: -- Description: MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. The error shown in beeline/sql client "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask" Hive Logs: 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - java.lang.NullPointerException 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (: ()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 Here are the steps to recreate this issue: use default DROP TABLE IF EXISTS repairtable CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) MSCK REPAIR TABLE default.repairtable was: MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. 
The error shown in beeline/sql client "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask" Hive Logs: 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - java.lang.NullPointerException 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (: ()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 Here are the steps to recreate this issue: use default DROP TABLE IF EXISTS repairtable CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) MSCK REPAIR TABLE default.repairtable > MSCK REPAIR TABLE throws null pointer exception > --- > > Key: HIVE-14798 > URL: https://issues.apache.org/jira/browse/HIVE-14798 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.0 >Reporter: Anbu Cheeralan > > MSCK REPAIR TABLE statement throws null pointer exception in Hive 2.1 > I have tested the same against external/internal tables created both in HDFS > and in Google Cloud. 
> The error shown in beeline/sql client > "FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask" > Hive Logs: > 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - > java.lang.NullPointerException > 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (: ()) - Failed to run > metacheck: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) > at > org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 > Here are the steps to recreate this issue: > use default > DROP TABLE IF EXISTS repairtable > CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) > MSCK REPAIR TABLE default.repairtable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506772#comment-15506772 ] Sergio Peña commented on HIVE-14029: Sure. [~stakiar] Let me know if the below statements are correct, and feel free to correct me. - Spark2 uses a fork of Hive 1.2 due to issues with Apache Hive. They called this project {{spark-hive}}. Spark only uses Hive 1.2 metastore/serde/udf jars from this forked project. They download this from https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 - Spark2 assembly without hive will be built without any of the above dependencies. - Hive2 itests will use Spark2 assembly to run Hive2 tests. This means Hive2 might not test Spark2 correctly due to the lack of Hive 1.2 libraries in it. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14798) MSCK REPAIR TABLE throws null pointer exception
[ https://issues.apache.org/jira/browse/HIVE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anbu Cheeralan updated HIVE-14798: -- Description: MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. The error shown in beeline/sql client "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask" Hive Logs: 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - java.lang.NullPointerException 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (: ()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 Here are the steps to recreate this issue: use default DROP TABLE IF EXISTS repairtable CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) MSCK REPAIR TABLE default.repairtable was: MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1 I have tested the same against external/internal tables created both in HDFS and in Google Cloud. 
The error shown in beeline/sql client "FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask" Hive Logs: 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (:()) - Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 Here are the steps to recreate this issue: use default DROP TABLE IF EXISTS repairtable CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) MSCK REPAIR TABLE default.repairtable > MSCK REPAIR TABLE throws null pointer exception > --- > > Key: HIVE-14798 > URL: https://issues.apache.org/jira/browse/HIVE-14798 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.1.0 >Reporter: Anbu Cheeralan > > MSKC REPAIR TABLE statement throws null pointer exception in Hive 2.1 > I have tested the same against external/internal tables created both in HDFS > and in Google Cloud. 
> The error shown in beeline/sql client > "FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask" > Hive Logs: > 2016-09-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (: ()) - > java.lang.NullPointerException > 2016-09-14T04:08:02,434 WARN [main]: exec.DDLTask (: ()) - Failed to run > metacheck: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444) > at > org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448 > Here are the steps to recreate this issue: > use default > DROP TABLE IF EXISTS repairtable > CREATE TABLE repairtable(col STRING) PARTITIONED BY (p1 STRING, p2 STRING) > MSCK REPAIR TABLE default.repairtable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506789#comment-15506789 ] Dmitry Zagorulkin commented on HIVE-1555: - Actually it works fine with mysql and postgres. For oracle it does not work. See here: http://stackoverflow.com/questions/39594861/creating-sparksql-jdbc-federation-with-oracle-db-table-fails-with-strange-error#comment66500144_39594861 > JDBC Storage Handler > > > Key: HIVE-1555 > URL: https://issues.apache.org/jira/browse/HIVE-1555 > Project: Hive > Issue Type: New Feature > Components: JDBC >Reporter: Bob Robertson >Assignee: Teddy Choi > Attachments: JDBCStorageHandler Design Doc.pdf > > Original Estimate: 24h > Remaining Estimate: 24h > > With the Cassandra and HBase Storage Handlers I thought it would make sense > to include a generic JDBC RDBMS Storage Handler so that you could import a > standard DB table into Hive. Many people must want to perform HiveQL joins, > etc against tables in other systems etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9530) constant * column is null interpreted as constant * boolean
[ https://issues.apache.org/jira/browse/HIVE-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Csanady reassigned HIVE-9530: Assignee: Miklos Csanady > constant * column is null interpreted as constant * boolean > --- > > Key: HIVE-9530 > URL: https://issues.apache.org/jira/browse/HIVE-9530 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Assignee: Miklos Csanady >Priority: Minor > > {code} > select c1 from tversion where 1 * cnnull is null > FAILED: SemanticException [Error 10014]: Line 1:30 Wrong arguments 'cnnull': > No matching method for class > org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply with (int, boolean) > create table if not exists TVERSION ( > RNUM int, > C1 int, > CVER char(6), > CNNULL int, > CCNULL char(1) > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9530) constant * column is null interpreted as constant * boolean
[ https://issues.apache.org/jira/browse/HIVE-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506778#comment-15506778 ] Miklos Csanady commented on HIVE-9530: -- Use parentheses to prevent this error. Your query is interpreted like this: {code} select c1 from tversion where 1 * ( cnnull is null ); {code} You should use this instead: {code} select c1 from tversion where ( 1 * cnnull ) is null; OK +-+ | c1 | +-+ +-+ {code} > constant * column is null interpreted as constant * boolean > --- > > Key: HIVE-9530 > URL: https://issues.apache.org/jira/browse/HIVE-9530 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.14.0 >Reporter: N Campbell >Priority: Minor > > {code} > select c1 from tversion where 1 * cnnull is null > FAILED: SemanticException [Error 10014]: Line 1:30 Wrong arguments 'cnnull': > No matching method for class > org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPMultiply with (int, boolean) > create table if not exists TVERSION ( > RNUM int, > C1 int, > CVER char(6), > CNNULL int, > CCNULL char(1) > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE ; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14412) Add a timezone-aware timestamp
[ https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-14412: -- Attachment: HIVE-14412.5.patch > Add a timezone-aware timestamp > -- > > Key: HIVE-14412 > URL: https://issues.apache.org/jira/browse/HIVE-14412 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-14412.1.patch, HIVE-14412.2.patch, > HIVE-14412.3.patch, HIVE-14412.4.patch, HIVE-14412.5.patch, HIVE-14412.5.patch > > > Java's Timestamp stores the time elapsed since the epoch. While it's by > itself unambiguous, ambiguity comes when we parse a string into timestamp, or > convert a timestamp to string, causing problems like HIVE-14305. > To solve the issue, I think we should make timestamp aware of timezone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14240) HoS itests shouldn't depend on a Spark distribution
[ https://issues.apache.org/jira/browse/HIVE-14240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506745#comment-15506745 ] Ferdinand Xu commented on HIVE-14240: - Hi [~stakiar], do you have any updates for this ticket? I am trying to move HIVE-14029 forward. Thanks, Ferd > HoS itests shouldn't depend on a Spark distribution > --- > > Key: HIVE-14240 > URL: https://issues.apache.org/jira/browse/HIVE-14240 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > The HoS integration tests download a full Spark distribution (a tar-ball) > from CloudFront. It uses this distribution to run Spark locally. It runs a > few tests with Spark in embedded mode, and some tests against a local Spark > on YARN cluster. The {{itests/pom.xml}} actually contains scripts to download > the tar-ball from a pre-defined location. > This is problematic because the Spark distribution shades all its > dependencies, including Hadoop dependencies. This can cause problems when > upgrading the Hadoop version for Hive (ref: HIVE-13930). > Removing it will also avoid having to download the tar-ball during every > build, and simplify the build process for the itests module. > The Hive itests should instead directly depend on Spark artifacts published > in Maven Central. It will require some effort to get this working. The > current Hive Spark Client uses a launch script in the Spark installation to > run Spark jobs. The script basically does some setup work and invokes > org.apache.spark.deploy.SparkSubmit. It is possible to invoke this class > directly, which avoids the need to have a full Spark distribution available > locally (in fact this option already exists, but isn't tested). > There may be other issues around classpath conflicts between Hive and Spark. > For example, Hive and Spark require different versions of Kryo.
One solution > to this would be to take the Spark artifacts and shade Kryo inside them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506738#comment-15506738 ] Ferdinand Xu commented on HIVE-14029: - Hi [~spena], I am not quite sure why the assembly tar.gz would cause issues for Hive, since it's included only in the Hive itests. Could you explain a little bit more? BTW, I will take a look at how to remove it from the itests. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit from those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12266) When client exits abnormally, it doesn't release ACID locks
[ https://issues.apache.org/jira/browse/HIVE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506736#comment-15506736 ] Chaoyu Tang commented on HIVE-12266: [~wzheng], To my understanding, this JIRA can only address the issue for the CLI, but not for Beeline. The shutdown hook can only be invoked when the JVM running the Driver shuts down, and in the Beeline case, Beeline runs in a different JVM from HS2. Is that correct? > When client exits abnormally, it doesn't release ACID locks > > > Key: HIVE-12266 > URL: https://issues.apache.org/jira/browse/HIVE-12266 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-12266.1.patch, HIVE-12266.2.patch, > HIVE-12266.3.patch, HIVE-12266.branch-1.patch > > > If you start the Hive CLI (locking enabled), run some command that acquires > locks, and ^C the shell before the command completes, the locks for the command > remain until they time out. > I believe Beeline has the same issue. > Need to add proper hooks to release locks when a command dies (as much as > possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
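The JVM-scoping point raised in the comment can be sketched with the standard shutdown-hook API: a hook registered via Runtime.addShutdownHook only ever fires when its own JVM exits, which is why a Driver-side hook can release locks on a CLI ^C but can never observe a remote Beeline process dying. A hypothetical illustration, not Hive code:

```java
public class ShutdownHookScope {
    // Returns true iff a hook registered in this JVM can be found (and removed) again.
    // The hook's entire scope is the registering JVM: in Hive CLI the Driver runs in the
    // client JVM, so a hook there covers ^C; in Beeline the Driver runs inside HS2, a
    // different JVM, so no HS2-side hook fires when the Beeline client dies.
    static boolean hookIsScopedToThisJvm() {
        Thread releaseLocks = new Thread(() -> System.out.println("releasing locks"));
        Runtime.getRuntime().addShutdownHook(releaseLocks);
        // removeShutdownHook returns true only for hooks previously registered here.
        return Runtime.getRuntime().removeShutdownHook(releaseLocks);
    }

    public static void main(String[] args) {
        System.out.println(hookIsScopedToThisJvm()); // true
    }
}
```

For the Beeline/HS2 split, releasing locks on abnormal client exit therefore has to be driven by HS2 noticing the broken connection or session timeout, not by a shutdown hook.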
[jira] [Commented] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506716#comment-15506716 ] Sergio Peña commented on HIVE-14029: [~Ferd] I think we should try to fix HIVE-14240 first to avoid dependency issues when Spark and Hive are running on the same machine. I talked with the Spark team a few times, and they think this assembly tar.gz will cause issues due to the other Hive libraries Spark depends on, such as the Hive 1.2 metastore and Hive 1.2 serde. Would you like to start working on HIVE-14240? You can ask [~stakiar] if you're interested. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit from those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9423) HiveServer2: Implement some admission control mechanism for graceful degradation when resources are exhausted
[ https://issues.apache.org/jira/browse/HIVE-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506729#comment-15506729 ] Peter Vary commented on HIVE-9423: -- I have investigated the issue, and here is what I found: - There was an issue in the Thrift code: if there were not enough executors, the TThreadPoolExecutor got stuck in an infinite loop (see THRIFT-2046). This issue is resolved in Thrift 0.9.2 - Hive 1.x, 2.x uses Thrift 0.9.3. I have tested the behavior on Hive 2.2.0-SNAPSHOT with the following configuration: - Add the following lines to hive-site.xml: {code} <property><name>hive.server2.thrift.max.worker.threads</name><value>1</value></property> <property><name>hive.server2.thrift.min.worker.threads</name><value>1</value></property> {code} - Start a metastore and an HS2 instance - Start 2 BeeLine instances, and connect them to the HS2 The 1st BeeLine connected as expected; the 2nd BeeLine, after the configured timeout period (default 20s), printed out the following: {code} Connecting to jdbc:hive2://localhost:1 16/09/20 16:23:57 [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:1 HS2 may be unavailable, check server status Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:1: null (state=08S01,code=0) Beeline version 2.2.0-SNAPSHOT by Apache Hive beeline> {code} This behavior is much better than the original problem (no HS2 restart is needed, and closing unused connections helps), but it is not a perfect solution, since there is no way to distinguish a non-running HS2 from an HS2 with an exhausted executor pool.
> HiveServer2: Implement some admission control mechanism for graceful > degradation when resources are exhausted > - > > Key: HIVE-9423 > URL: https://issues.apache.org/jira/browse/HIVE-9423 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0 >Reporter: Vaibhav Gumashta > > An example of where it is needed: it has been reported that when the # of client > connections is greater than {{hive.server2.thrift.max.worker.threads}}, > HiveServer2 stops accepting new connections and ends up having to be > restarted. This should be handled more gracefully by the server and the JDBC > driver, so that the end user becomes aware of the problem and can take > appropriate steps (either close existing connections, or bump up the config > value, or use multiple server instances with dynamic service discovery > enabled). Similarly, we should also review the behaviour of the background thread > pool to have well-defined behavior when the pool gets exhausted. > Ideally, implementing some form of general admission control would be a better > solution, so that we do not accept new work unless sufficient resources are > available, and degrade gracefully under overload. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
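The exhausted-pool situation tested above can be reproduced in miniature with a bounded java.util.concurrent pool: once the single worker and the single queue slot are taken, a rejection policy makes the next submission fail fast instead of leaving the caller hanging — an explicit signal of the kind the admission-control proposal asks for. A sketch under assumed pool sizes, not HS2 code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AdmissionControl {
    // One worker thread and a one-slot queue: a stand-in for an HS2 whose
    // max.worker.threads is exhausted.
    static ExecutorService boundedPool() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of blocking
    }

    static boolean thirdSubmissionRejected() {
        ExecutorService pool = boundedPool();
        CountDownLatch block = new CountDownLatch(1);
        Runnable stall = () -> {
            try { block.await(); } catch (InterruptedException ignored) { }
        };
        pool.submit(stall); // occupies the single worker
        pool.submit(stall); // occupies the single queue slot
        boolean rejected;
        try {
            pool.submit(() -> { }); // no capacity left: fails fast with an exception
            rejected = false;
        } catch (RejectedExecutionException e) {
            rejected = true;
        }
        block.countDown();
        pool.shutdownNow();
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(thirdSubmissionRejected()); // true
    }
}
```

The design point is the RejectedExecutionHandler: an immediate, distinguishable rejection lets a client tell "server saturated" apart from "server down", which is exactly what the timeout-based behavior observed with Thrift cannot do.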
[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-14029: Attachment: HIVE-14029.1.patch https://builds.apache.org/job/PreCommit-HIVE-Build/1239 failed. Retesting the patch. > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.1.patch, HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit from those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14029) Update Spark version to 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-14029: Attachment: (was: HIVE-14029.1.patch) > Update Spark version to 2.0.0 > - > > Key: HIVE-14029 > URL: https://issues.apache.org/jira/browse/HIVE-14029 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > Attachments: HIVE-14029.patch > > > There are quite some new optimizations in Spark 2.0.0. We need to bump up > Spark to 2.0.0 to benefit from those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10701) Escape apostrophe does not work properly
[ https://issues.apache.org/jira/browse/HIVE-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506651#comment-15506651 ] Miklos Csanady edited comment on HIVE-10701 at 9/20/16 2:11 PM: According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-string "Hive uses C-style escaping within the strings." 1: jdbc:hive2://> select 's''2' ; OK +--+ | _c0 | +--+ | s2 | +--+ 1 row selected (0.051 seconds) 1: jdbc:hive2://> select 's\'2' ; OK +--+ | _c0 | +--+ | s'2 | +--+ IMHO the '' is not meant to be an escape sequence, but two string literals. > Escape apostrophe does not work properly > --- > > Key: HIVE-10701 > URL: https://issues.apache.org/jira/browse/HIVE-10701 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.12.0, 0.13.0, 0.14.0 >Reporter: Tracy Y >Assignee: Miklos Csanady >Priority: Minor > > SELECT 'S''2' FROM table returns S2 instead of S'2 > The apostrophe is supposed to be escaped by the single quote in front. > Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
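The two readings contrasted in the comment — backslash as the C-style escape character versus '' as two adjacent string literals — can be mirrored in plain Java string handling (a hypothetical illustration; the Hive parser is not involved, and the mapping to the Hive queries is noted in the comments):

```java
public class EscapeStyles {
    // C-style escaping keeps the quote inside the literal.
    // Analogue of the Hive query: select 's\'2'  ->  s'2
    static String backslashEscaped() {
        return "s'2";
    }

    // What 's''2' parses as under Hive's rules per the comment: two adjacent
    // literals, 's' and '2', concatenated — not an escaped apostrophe.
    // Analogue of the Hive query: select 's''2'  ->  s2
    static String adjacentLiterals() {
        return "s" + "2";
    }

    public static void main(String[] args) {
        System.out.println(backslashEscaped()); // s'2
        System.out.println(adjacentLiterals()); // s2
    }
}
```

This matches the beeline output quoted above: doubling the quote is SQL-standard escaping, which Hive's C-style string grammar does not implement.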