[jira] [Commented] (HIVE-10384) RetryingMetaStoreClient does not retry wrapped TTransportExceptions
[ https://issues.apache.org/jira/browse/HIVE-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507336#comment-14507336 ] Chaoyu Tang commented on HIVE-10384: The test failures are not related to the patch. RetryingMetaStoreClient does not retry wrapped TTransportExceptions --- Key: HIVE-10384 URL: https://issues.apache.org/jira/browse/HIVE-10384 Project: Hive Issue Type: Bug Components: Clients Reporter: Eric Liang Assignee: Chaoyu Tang Attachments: HIVE-10384.1.patch, HIVE-10384.patch This bug is very similar to HIVE-9436, in that a TTransportException wrapped in a MetaException will not be retried. RetryingMetaStoreClient has a block of code above the MetaException handler that retries thrift exceptions, but this doesn't work when the exception is wrapped. {code} if ((e.getCause() instanceof TApplicationException) || (e.getCause() instanceof TProtocolException) || (e.getCause() instanceof TTransportException)) { caughtException = (TException) e.getCause(); } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("(?s).*JDO[a-zA-Z]*Exception.*")) { caughtException = (MetaException) e.getCause(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
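The retry gap described above can be sketched in isolation. The snippet below is a hypothetical illustration of the fix's idea, not the actual RetryingMetaStoreClient code; the exception classes are local stand-ins for the real Thrift/Hive types. The point is to walk the whole cause chain rather than inspect only the direct cause, so a TTransportException wrapped inside a MetaException is still recognized as retriable.

```java
import java.util.regex.Pattern;

// Stand-ins for the real Thrift/Hive exception types (assumption: the names
// only mirror org.apache.thrift.transport.TTransportException and
// org.apache.hadoop.hive.metastore.api.MetaException).
class TTransportException extends Exception {
    TTransportException(String msg) { super(msg); }
}

class MetaException extends Exception {
    MetaException(String msg, Throwable cause) { super(msg, cause); }
}

public class RetrySketch {
    private static final Pattern JDO_PATTERN =
        Pattern.compile("(?s).*JDO[a-zA-Z]*Exception.*");

    // Walk the entire cause chain so wrapped transport failures are still
    // treated as retriable, instead of checking e.getCause() exactly once.
    static boolean isRetriable(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof TTransportException) {
                return true;
            }
            if (cur instanceof MetaException && cur.getMessage() != null
                    && JDO_PATTERN.matcher(cur.getMessage()).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException(
            new MetaException("remote failure", new TTransportException("broken pipe")));
        System.out.println(isRetriable(wrapped));                       // prints "true"
        System.out.println(isRetriable(new RuntimeException("other"))); // prints "false"
    }
}
```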
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: Proposal-rscompressor.pdf Design document explaining the changes for ResultSet compressor architecture Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch Attachments: Proposal-rscompressor.pdf This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
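To make the plugin idea concrete, here is a minimal sketch of what a ResultSet compressor plugin could look like, using java.util.zip as the example codec. The {{ColumnCompressor}} interface name and method signatures are assumptions made for illustration; the actual contract is the one defined in the attached design document.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical plugin contract (assumed names, not the proposal's API):
// HiveServer2 would hand a serialized column to the plugin and ship the
// compressed bytes back to the client over Thrift.
interface ColumnCompressor {
    byte[] compress(byte[] column);
    byte[] decompress(byte[] column, int originalLength) throws DataFormatException;
}

// Example plugin backed by java.util.zip, standing in for an externally
// loaded compressor library.
class DeflateColumnCompressor implements ColumnCompressor {
    public byte[] compress(byte[] column) {
        Deflater deflater = new Deflater();
        deflater.setInput(column);
        deflater.finish();
        byte[] buf = new byte[column.length * 2 + 64]; // safe bound for tiny inputs
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    public byte[] decompress(byte[] column, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(column);
        byte[] out = new byte[originalLength];
        inflater.inflate(out);
        inflater.end();
        return out;
    }
}
```

A client advertising the same compressor name would run decompress on each received column; the round trip must return the original bytes unchanged.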
[jira] [Commented] (HIVE-5850) Multiple table join error for avro
[ https://issues.apache.org/jira/browse/HIVE-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507323#comment-14507323 ] Miguel Romero commented on HIVE-5850: - I have just run into the same problem. Is there a workaround? Will it be solved in any version? Multiple table join error for avro --- Key: HIVE-5850 URL: https://issues.apache.org/jira/browse/HIVE-5850 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Shengjun Xin Attachments: part.tar.gz, partsupp.tar.gz, schema.tar.gz Reproduce step: {code} -- Create table Part. CREATE EXTERNAL TABLE part ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://hostname/user/hadoop/tpc-h/data/part' TBLPROPERTIES ('avro.schema.url'='hdfs://hostname/user/hadoop/tpc-h/schema/part.avsc'); -- Create table Part Supplier.
CREATE EXTERNAL TABLE partsupp ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://hostname/user/hadoop/tpc-h/data/partsupp' TBLPROPERTIES ('avro.schema.url'='hdfs://hostname/user/hadoop/tpc-h/schema/partsupp.avsc'); -- Query select * from partsupp ps join part p on ps.ps_partkey = p.p_partkey where p.p_partkey=1; {code} {code} Error message is: Error: java.io.IOException: java.io.IOException: org.apache.avro.AvroTypeException: Found {"type": "record", "name": "partsupp", "namespace": "com.gs.sdst.pl.avro.tpch", "fields": [{"name": "ps_partkey", "type": "long"}, {"name": "ps_suppkey", "type": "long"}, {"name": "ps_availqty", "type": "long"}, {"name": "ps_supplycost", "type": "double"}, {"name": "ps_comment", "type": "string"}, {"name": "systimestamp", "type": "long"}]}, expecting {"type": "record", "name": "part", "namespace": "com.gs.sdst.pl.avro.tpch", "fields": [{"name": "p_partkey", "type": "long"}, {"name": "p_name", "type": "string"}, {"name": "p_mfgr", "type": "string"}, {"name": "p_brand", "type": "string"}, {"name": "p_type", "type": "string"}, {"name": "p_size", "type": "int"}, {"name": "p_container", "type": "string"}, {"name": "p_retailprice", "type": "double"}, {"name": "p_comment", "type": "string"}, {"name": "systimestamp", "type": "long"}]} at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197) at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10439) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia resolved HIVE-10439. --- Resolution: Duplicate Architecture for ResultSet Compression via external plugin --- Key: HIVE-10439 URL: https://issues.apache.org/jira/browse/HIVE-10439 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: TestingIntegerCompression.pdf Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch Attachments: Proposal-rscompressor.pdf, TestingIntegerCompression.pdf This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507383#comment-14507383 ] Ashutosh Chauhan commented on HIVE-10416: - I like the new patch since it projects only the needed columns while generating the Sel Op, as opposed to adding an unnecessary SelOp at the top. [~jpullokkaran] what do you think? CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10440) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia resolved HIVE-10440. --- Resolution: Duplicate Architecture for ResultSet Compression via external plugin --- Key: HIVE-10440 URL: https://issues.apache.org/jira/browse/HIVE-10440 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9824) LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;)
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507257#comment-14507257 ] Hive QA commented on HIVE-9824: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727207/HIVE-9824.09.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 8750 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3527/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3527/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3527/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727207 - PreCommit-HIVE-TRUNK-Build LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;) -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch in a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10417) Parallel Order By return wrong results for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-10417: - Component/s: Query Processor Parallel Order By return wrong results for partitioned tables - Key: HIVE-10417 URL: https://issues.apache.org/jira/browse/HIVE-10417 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0, 0.13.1, 1.0.0 Reporter: Nemon Lou Assignee: Nemon Lou Attachments: HIVE-10417.patch Following is the script that reproduces this bug.
{noformat}
set hive.optimize.sampling.orderby=true;
set mapreduce.job.reduces=10;
select * from src order by key desc limit 10;
+----------+------------+
| src.key  | src.value  |
+----------+------------+
| 98       | val_98     |
| 98       | val_98     |
| 97       | val_97     |
| 97       | val_97     |
| 96       | val_96     |
| 95       | val_95     |
| 95       | val_95     |
| 92       | val_92     |
| 90       | val_90     |
| 90       | val_90     |
+----------+------------+
10 rows selected (47.916 seconds)

reset;
create table src_orc_p (key string, value string) partitioned by (kp string) stored as orc tblproperties('orc.compress'='SNAPPY');
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;
set hive.exec.max.dynamic.partitions=1;
insert into table src_orc_p partition(kp) select *,substring(key,1) from src distribute by substring(key,1);

set mapreduce.job.reduces=10;
set hive.optimize.sampling.orderby=true;
select * from src_orc_p order by key desc limit 10;
+----------------+------------------+---------------+
| src_orc_p.key  | src_orc_p.value  | src_orc_p.kp  |
+----------------+------------------+---------------+
| 0              | val_0            | 0             |
| 0              | val_0            | 0             |
| 0              | val_0            | 0             |
+----------------+------------------+---------------+
3 rows selected (39.861 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
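For context on the mechanism involved: with hive.optimize.sampling.orderby=true, a parallel order by samples the keys, derives reducer cut points from the sample, and range-partitions rows across reducers so that concatenating reducer outputs yields a total order. The sketch below is a simplified, string-keyed illustration of that idea, not Hive's implementation; a wrong result like the one above is the kind of symptom you get when the cut points do not reflect the actual key distribution, leaving most reducers' ranges empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified sketch of sample-based range partitioning, the idea behind
// hive.optimize.sampling.orderby. Not Hive's code: real Hive samples the
// input and writes cut points consumed by a total-order partitioner.
class RangePartitionSketch {
    // Derive (reducers - 1) cut points from a sorted copy of the sample.
    static List<String> cutPoints(List<String> sample, int reducers) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        for (int i = 1; i < reducers; i++) {
            cuts.add(sorted.get(i * sorted.size() / reducers));
        }
        return cuts;
    }

    // Route a key to the reducer whose range contains it; a global order
    // only emerges if every row is routed through the same cut points.
    static int partition(String key, List<String> cuts) {
        for (int i = 0; i < cuts.size(); i++) {
            if (key.compareTo(cuts.get(i)) < 0) {
                return i;
            }
        }
        return cuts.size();
    }
}
```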
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507138#comment-14507138 ] Hive QA commented on HIVE-10416: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727191/HIVE-10416.01.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3526/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3526/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3526/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727191 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. 
The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Attachment: (was: HIVE-10165.0.patch) Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Fix For: 1.2.0 h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data.
Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. Implementation Our changes do not break the existing API contracts. Instead our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows: * We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object.
Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. * The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface. * Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class {{AbstractOperationRecordWriter}}.
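The proposed contract can be sketched as follows. The interface name {{RecordMutator}} comes from the description above, but the exact signatures here are assumptions, and the toy in-memory implementation exists only to exercise the contract where the real implementation would delegate to {{OrcRecordUpdater}} inside an open transaction batch.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the proposed mutation contract; the committed API may differ.
interface RecordMutator {
    void insert(Object record);
    void update(Object record); // record carries its RecordIdentifier
    void delete(Object record); // likewise located via its RecordIdentifier
}

// Toy implementation for illustration only; a real one would write ACID
// events through OrcRecordUpdater within a transaction batch.
class InMemoryMutator implements RecordMutator {
    final List<String> ops = new ArrayList<>();
    public void insert(Object record) { ops.add("INSERT:" + record); }
    public void update(Object record) { ops.add("UPDATE:" + record); }
    public void delete(Object record) { ops.add("DELETE:" + record); }
}
```

Because the row type is {{Object}} rather than {{byte[]}}, a merge job can hand its POJO or Tuple records straight to the mutator without an intermediate encoding step, as the description argues.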
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Attachment: HIVE-10165.0.patch Updated patch. Includes tests. Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Fix For: 1.2.0 Attachments: HIVE-10165.0.patch h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data.
Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. Implementation Our changes do not break the existing API contracts. Instead our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows: * We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object.
Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. * The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface. * Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a
[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507225#comment-14507225 ] Owen O'Malley commented on HIVE-4227: - I've started working on this. I'll post a patch this week. Add column level encryption to ORC files Key: HIVE-4227 URL: https://issues.apache.org/jira/browse/HIVE-4227 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
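Since each ORC column's streams are stored separately, per-column encryption reduces to sealing one column's bytes under a key that unauthorized readers never receive. A minimal sketch of that idea with the JDK's own crypto API follows; it is illustrative only (ECB mode is used purely for brevity and is not an appropriate mode for a real file format), and key distribution would live in an external server such as the one in HADOOP-9331, as the issue notes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Illustration of column-granularity sealing: only readers holding the
// column's key can recover its bytes; other columns remain untouched.
class ColumnEncryptionSketch {
    static byte[] apply(int mode, SecretKey key, byte[] data) throws Exception {
        // ECB chosen only to keep this sketch short; a real design would
        // use an authenticated mode with per-stripe IVs.
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(mode, key);
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        SecretKey columnKey = KeyGenerator.getInstance("AES").generateKey();
        byte[] column = "val_98,val_97,val_96".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = apply(Cipher.ENCRYPT_MODE, columnKey, column);
        byte[] opened = apply(Cipher.DECRYPT_MODE, columnKey, sealed);
        System.out.println(Arrays.equals(column, opened)); // prints "true"
    }
}
```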
[jira] [Updated] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4227: Labels: (was: gsoc gsoc2013) Add column level encryption to ORC files Key: HIVE-4227 URL: https://issues.apache.org/jira/browse/HIVE-4227 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HIVE-4227: --- Assignee: Owen O'Malley Add column level encryption to ORC files Key: HIVE-4227 URL: https://issues.apache.org/jira/browse/HIVE-4227 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: gsoc, gsoc2013 It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506522#comment-14506522 ] Hive QA commented on HIVE-10391: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727019/HIVE-10391.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3523/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3523/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3523/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727019 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506493#comment-14506493 ] Thejas M Nair commented on HIVE-9957: - [~subhashmv] If you don't want to build hive , you can also use Hive 1.0.0 (unless you are looking for some specific hive 1.1.0 feature). Hive 1.1.0 not compatible with Hadoop 2.4.0 --- Key: HIVE-9957 URL: https://issues.apache.org/jira/browse/HIVE-9957 Project: Hive Issue Type: Bug Components: Encryption Reporter: Vivek Shrivastava Assignee: Sergio Peña Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9957.1.patch Getting this exception while accessing data through Hive. Exception in thread main java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider; at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.init(Hadoop23Shims.java:1152) at org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279) at org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
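The NoSuchMethodError above stems from a hard dependency on DFSClient.getKeyProvider(), which exists only in newer Hadoop releases. The usual shim-layer remedy is to probe for the method reflectively and degrade gracefully when it is absent. A generic sketch of that pattern; the classes probed here are stand-ins, not the actual Hadoop23Shims fix:

```java
import java.lang.reflect.Method;

/** Sketch of the defensive-reflection pattern a shim layer can use to avoid
 *  NoSuchMethodError against older library versions. String methods are
 *  probed here as stand-ins for the real Hadoop APIs. */
public class ShimProbeSketch {

  /** True iff clazz exposes a public method with this name and parameters. */
  public static boolean hasMethod(Class<?> clazz, String name, Class<?>... params) {
    try {
      clazz.getMethod(name, params);
      return true;
    } catch (NoSuchMethodException e) {
      return false; // older library version: caller should fall back
    }
  }

  /** Invoke the no-arg method if present, else return a fallback value. */
  public static Object invokeOrDefault(Object target, String name, Object fallback) throws Exception {
    if (!hasMethod(target.getClass(), name)) {
      return fallback;
    }
    Method m = target.getClass().getMethod(name);
    return m.invoke(target);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(hasMethod(String.class, "isEmpty"));         // true
    System.out.println(hasMethod(String.class, "getKeyProvider"));  // false
    System.out.println(invokeOrDefault("abc", "length", -1));       // 3
    System.out.println(invokeOrDefault("abc", "noSuchMethod", -1)); // -1
  }
}
```

The same probe-before-call discipline is what the related hadoop-1 build breakages in this digest (HIVE-10442/10443/10444) are about: the call must not be linked or invoked unconditionally when the target method may be missing.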
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: CompressorProtocolHS2.patch Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
[ https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10239: - Attachment: HIVE-10239.02.patch Attaching a patch that has some debug. Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, HIVE-10239.0.patch, HIVE-10239.00.patch, HIVE-10239.01.patch, HIVE-10239.02.patch, HIVE-10239.patch Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507407#comment-14507407 ] Ashutosh Chauhan commented on HIVE-10391: - +1 TODOs of patch I guess will resolve themselves once we increase test coverage : ) I guess hope is once this gets in, we can probably enable IdentityProjectRemoval optimization again on return path. Right ? CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9726) Upgrade to spark 1.3 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507419#comment-14507419 ] Sushanth Sowmyan commented on HIVE-9726: Hi, I've had a request for inclusion of this patch in the upcoming 1.2 release. Looking at trunk's pom.xml, I see that the spark.version there is 1.2. Given that spark just released 1.3, is it feasible to port this patch to trunk as well? Upgrade to spark 1.3 [Spark Branch] --- Key: HIVE-9726 URL: https://issues.apache.org/jira/browse/HIVE-9726 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-9671.1-spark.patch, HIVE-9726.1-spark.patch, hive.log.txt.gz, yarn-am-stderr.txt, yarn-am-stdout.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507458#comment-14507458 ] Pengcheng Xiong commented on HIVE-10391: [~ashutoshc], I'm not sure if we can enable IdentityProjectRemoval, because the reason we turned it off is the difference between OI and RS. I will test and let you know. CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10275) GenericUDF getTimestampValue should return Timestamp instead of Date
[ https://issues.apache.org/jira/browse/HIVE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507561#comment-14507561 ] Jason Dere commented on HIVE-10275: --- +1 GenericUDF getTimestampValue should return Timestamp instead of Date Key: HIVE-10275 URL: https://issues.apache.org/jira/browse/HIVE-10275 Project: Hive Issue Type: Bug Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10275.1.patch Currently getTimestampValue casts Timestamp to Date and returns Date. Hive Timestamp type stores time with nanosecond precision. Timestamp class has getNanos method to extract nanoseconds. Date class has getTime method which returns unix time in milliseconds. So, in order to be able to get nanoseconds from Timestamp fields, GenericUDF.getTimestampValue should return Timestamp instead of Date. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
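The precision loss described above is easy to demonstrate: once a java.sql.Timestamp is handled through a java.util.Date reference, only millisecond precision is reachable, because getNanos() is declared on Timestamp alone. A small illustration (not Hive code, just the JDK types involved):

```java
import java.sql.Timestamp;
import java.util.Date;

/** Why getTimestampValue should return Timestamp: Date cannot expose nanos. */
public class TimestampPrecisionDemo {
  public static void main(String[] args) {
    Timestamp ts = new Timestamp(0L);
    ts.setNanos(123456789); // 123 ms plus 456789 ns of sub-millisecond detail

    Date asDate = ts; // what a Date-returning API hands back to the caller
    System.out.println(asDate.getTime()); // 123 -> only the millisecond part
    System.out.println(ts.getNanos());    // 123456789 -> full nanosecond field
    // asDate.getNanos() does not compile: the method exists only on Timestamp.
  }
}
```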
[jira] [Resolved] (HIVE-10409) Webhcat tests need to be updated, to accomodate HADOOP-10193
[ https://issues.apache.org/jira/browse/HIVE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aswathy Chellammal Sreekumar resolved HIVE-10409. - Resolution: Invalid Issue addressed by https://issues.apache.org/jira/browse/HADOOP-11859, no need to change tests Webhcat tests need to be updated, to accomodate HADOOP-10193 Key: HIVE-10409 URL: https://issues.apache.org/jira/browse/HIVE-10409 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10409.1.patch, HIVE-10409.patch Webhcat tests need to be updated to accommodate the url change brought in by HADOOP-10193. Add ?user.name=user-name for the templeton calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10437) NullPointerException on queries where map/reduce is not involved on tables with partitions
[ https://issues.apache.org/jira/browse/HIVE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507644#comment-14507644 ] Gunther Hagleitner commented on HIVE-10437: --- Seems pretty serious to break backward compat for SerDes. fyi [~ashutoshc]/[~navis] NullPointerException on queries where map/reduce is not involved on tables with partitions -- Key: HIVE-10437 URL: https://issues.apache.org/jira/browse/HIVE-10437 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Demeter Sztanko Priority: Critical Original Estimate: 0.5h Remaining Estimate: 0.5h On a table with partitions, whenever I try to do a simple query which tells hive not to execute mapreduce but just read data straight from hdfs, it raises an exception: {code} create external table jsonbug( a int, b string ) PARTITIONED BY ( `c` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'ignore.malformed.json'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/jsonbug'; ALTER TABLE jsonbug ADD PARTITION(c='1'); {code} Running a simple {code} select * from jsonbug; {code} raises the following exception: {code} FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578) at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) at org.apache.hadoop.hive.ql.exec.FetchOperator.init(FetchOperator.java:140) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) {code} It works fine if I execute a query involving a map/reduce job though. This problem occurs only when using SerDes created for Hive versions prior to 1.1.0, i.e. those which do not have the @SerDeSpec annotation specified. Most third-party SerDes, including hcat's JsonSerde, have this problem as well. It seems the changes made in HIVE-7977 introduced this bug. See org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) {code} Class<?> tableSerDe = tableDesc.getDeserializerClass(); String[] schemaProps = AnnotationUtils.getAnnotation(tableSerDe, SerDeSpec.class).schemaProps(); {code} And it also seems like a relatively easy fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
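The failure mode can be reproduced in miniature. A reflection-based annotation lookup returns null for a class that lacks the annotation, so chaining a method call onto it NPEs; a null check with a sensible default is one plausible shape of the "relatively easy fix". The annotation and lookup below are stand-ins, since AnnotationUtils and SerDeSpec are Hive classes:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Stand-in reproduction of the FetchOperator NPE: the annotation lookup is
 *  null for legacy SerDes, so the unguarded .schemaProps() chain blows up. */
public class SerDeSpecNpeDemo {

  @Retention(RetentionPolicy.RUNTIME)
  @interface SchemaSpec { String[] schemaProps(); } // stand-in for SerDeSpec

  @SchemaSpec(schemaProps = {"columns", "columns.types"})
  static class ModernSerDe {}

  static class LegacySerDe {} // pre-1.1.0 style: no annotation present

  /** Buggy pattern: NPEs for any class without the annotation. */
  static String[] schemaPropsUnguarded(Class<?> serDe) {
    return serDe.getAnnotation(SchemaSpec.class).schemaProps();
  }

  /** Guarded version: treat a missing annotation as "no schema props". */
  static String[] schemaPropsGuarded(Class<?> serDe) {
    SchemaSpec spec = serDe.getAnnotation(SchemaSpec.class);
    return spec == null ? new String[0] : spec.schemaProps();
  }

  public static void main(String[] args) {
    System.out.println(schemaPropsGuarded(ModernSerDe.class).length); // 2
    System.out.println(schemaPropsGuarded(LegacySerDe.class).length); // 0
    try {
      schemaPropsUnguarded(LegacySerDe.class);
    } catch (NullPointerException expected) {
      System.out.println("unguarded lookup NPEs, as reported");
    }
  }
}
```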
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10443: - Assignee: Vaibhav Gumashta HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-10403: - Attachment: HIVE-10403.03.patch Upload patch 03 for testing Add n-way join support for Hybrid Grace Hash Join - Key: HIVE-10403 URL: https://issues.apache.org/jira/browse/HIVE-10403 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-10403.01.patch, HIVE-10403.02.patch, HIVE-10403.03.patch Currently Hybrid Grace Hash Join only supports 2-way join (one big table and one small table). This task will enable n-way join (one big table and multiple small tables). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10441) Fix confusing log statement in SessionState about hive.execution.engine setting
[ https://issues.apache.org/jira/browse/HIVE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10441: -- Attachment: HIVE-10441.1.patch I don't see a lot of value in having this statement here to mention that no Tez Session is necessary, because that is redundant for mr/spark. Also, if a Tez session is created, there are log statements elsewhere for that. I'm just going to remove this log statement. Fix confusing log statement in SessionState about hive.execution.engine setting --- Key: HIVE-10441 URL: https://issues.apache.org/jira/browse/HIVE-10441 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10441.1.patch {code} LOG.info("No Tez session required at this point. hive.execution.engine=mr."); {code} This statement is misleading. It is true that it is printed in the case that a Tez session does not need to be created, but it is not necessarily true that hive.execution.engine=mr - it could be Spark, or it could even be set to Tez but the Session determined that a Tez Session did not need to be created (which is the case for HiveServer2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10444: - Assignee: Gunther Hagleitner HIVE-10223 breaks hadoop-1 build Key: HIVE-10444 URL: https://issues.apache.org/jira/browse/HIVE-10444 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Gunther Hagleitner FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 are not present in hadoop 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507684#comment-14507684 ] Brock Noland commented on HIVE-10347: - +1 Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9726) Upgrade to spark 1.3 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507578#comment-14507578 ] Sushanth Sowmyan commented on HIVE-9726: +cc [~xuefuz] : Same question as above. :) Upgrade to spark 1.3 [Spark Branch] --- Key: HIVE-9726 URL: https://issues.apache.org/jira/browse/HIVE-9726 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-9671.1-spark.patch, HIVE-9726.1-spark.patch, hive.log.txt.gz, yarn-am-stderr.txt, yarn-am-stdout.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10385) Optionally disable partition creation to speedup ETL jobs
[ https://issues.apache.org/jira/browse/HIVE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507584#comment-14507584 ] Slava Markeyev commented on HIVE-10385: --- This came up on the mailing list the other week. The use case that several people seem to have is using Hive to ETL and partition data. We don't necessarily care about the metastore partitions because the output data gets moved (at least in my case) after the query is completed. This makes the table partitions unnecessary. Optionally disable partition creation to speedup ETL jobs - Key: HIVE-10385 URL: https://issues.apache.org/jira/browse/HIVE-10385 Project: Hive Issue Type: Improvement Components: Hive Reporter: Slava Markeyev Priority: Minor Attachments: HIVE-10385.patch ETL jobs that create dynamic partitions with high cardinality perform the expensive step of metastore partition creation after query completion. Until bulk partition creation can be optimized there should be a way of optionally skipping this step. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507594#comment-14507594 ] Pengcheng Xiong commented on HIVE-10391: [~ashutoshc], as I just tested, cbo_union.q will still fail if we turn IdentityProjectRemoval on with this patch. CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10442) HIVE-10098 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507639#comment-14507639 ] Prasanth Jayachandran commented on HIVE-10442: -- [~csun]/[~ychena] Can someone take a look at this one? HIVE-10098 broke hadoop-1 build --- Key: HIVE-10442 URL: https://issues.apache.org/jira/browse/HIVE-10442 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran fs.addDelegationTokens() method does not seem to exist in hadoop 1.2.1. This breaks the hadoop-1 builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10437) NullPointerException on queries where map/reduce is not involved on tables with partitions
[ https://issues.apache.org/jira/browse/HIVE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10437: -- Priority: Critical (was: Minor) NullPointerException on queries where map/reduce is not involved on tables with partitions -- Key: HIVE-10437 URL: https://issues.apache.org/jira/browse/HIVE-10437 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Demeter Sztanko Priority: Critical Original Estimate: 0.5h Remaining Estimate: 0.5h On a table with partitions, whenever I try to do a simple query which tells hive not to execute mapreduce but just read data straight from hdfs, it raises an exception: {code} create external table jsonbug( a int, b string ) PARTITIONED BY ( `c` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'ignore.malformed.json'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/jsonbug'; ALTER TABLE jsonbug ADD PARTITION(c='1'); {code} Running a simple {code} select * from jsonbug; {code} raises the following exception: {code} FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578) at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) at org.apache.hadoop.hive.ql.exec.FetchOperator.init(FetchOperator.java:140) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) {code} It works fine if I execute a query involving a map/reduce job though. This problem occurs only when using SerDes created for Hive versions prior to 1.1.0, i.e. those which do not have the @SerDeSpec annotation specified. Most third-party SerDes, including hcat's JsonSerde, have this problem as well. It seems the changes made in HIVE-7977 introduced this bug. See org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) {code} Class<?> tableSerDe = tableDesc.getDeserializerClass(); String[] schemaProps = AnnotationUtils.getAnnotation(tableSerDe, SerDeSpec.class).schemaProps(); {code} And it also seems like a relatively easy fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507649#comment-14507649 ] Prasanth Jayachandran commented on HIVE-10443: -- [~vgumashta] fyi.. HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10442) HIVE-10098 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10442: - Affects Version/s: 1.2.0 HIVE-10098 broke hadoop-1 build --- Key: HIVE-10442 URL: https://issues.apache.org/jira/browse/HIVE-10442 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran fs.addDelegationTokens() method does not seem to exist in hadoop 1.2.1. This breaks the hadoop-1 builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9726) Upgrade to spark 1.3 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507661#comment-14507661 ] Sushanth Sowmyan commented on HIVE-9726: Awesome, thanks - I'll add that jira into the list then. Upgrade to spark 1.3 [Spark Branch] --- Key: HIVE-9726 URL: https://issues.apache.org/jira/browse/HIVE-9726 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-9671.1-spark.patch, HIVE-9726.1-spark.patch, hive.log.txt.gz, yarn-am-stderr.txt, yarn-am-stdout.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10444) HIVE-10223 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507674#comment-14507674 ] Prasanth Jayachandran commented on HIVE-10444: -- [~hagleitn] FYI... HIVE-10223 breaks hadoop-1 build Key: HIVE-10444 URL: https://issues.apache.org/jira/browse/HIVE-10444 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Gunther Hagleitner FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 are not present in hadoop 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
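One conventional workaround for the missing FileStatus.isFile()/isDirectory() accessors is to probe for the new method and fall back to the deprecated isDir(). A hypothetical sketch (OldStatus is a stand-in stub for demonstration, not a Hadoop class):

```java
// Hedged compatibility shim: prefer isFile() when present, otherwise derive
// the answer from the older isDir() accessor available on hadoop-1.
class StatusCompat {
    static boolean isFile(Object status) {
        try {
            return (Boolean) status.getClass().getMethod("isFile").invoke(status);
        } catch (NoSuchMethodException e) {
            try {
                // hadoop-1 path: a regular file is anything that is not a directory
                return !(Boolean) status.getClass().getMethod("isDir").invoke(status);
            } catch (ReflectiveOperationException e2) {
                throw new IllegalStateException(e2);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Minimal stub mimicking the old API shape, for demonstration only.
    static class OldStatus {
        private final boolean dir;
        OldStatus(boolean dir) { this.dir = dir; }
        public boolean isDir() { return dir; }
    }
}
```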
[jira] [Commented] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
[ https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507723#comment-14507723 ] Hive QA commented on HIVE-10239: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727282/HIVE-10239.02.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3528/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3528/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727282 - PreCommit-HIVE-TRUNK-Build Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, HIVE-10239.0.patch, HIVE-10239.00.patch, HIVE-10239.01.patch, HIVE-10239.02.patch, HIVE-10239.patch Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10227) Concrete implementation of Export/Import based ReplicationTaskFactory
[ https://issues.apache.org/jira/browse/HIVE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507738#comment-14507738 ] Alan Gates commented on HIVE-10227: --- +1 to committing this since we're agreed on 98% of it. I'm open to where you're going with this on the InvalidStateFactory, we'll continue the discussion on the other JIRA. Concrete implementation of Export/Import based ReplicationTaskFactory - Key: HIVE-10227 URL: https://issues.apache.org/jira/browse/HIVE-10227 Project: Hive Issue Type: Sub-task Components: Import/Export Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-10227.2.patch, HIVE-10227.3.patch, HIVE-10227.4.patch, HIVE-10227.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507739#comment-14507739 ] Laljo John Pullokkaran commented on HIVE-10416: --- [~jcamachorodriguez] Introducing top level select needs to traverse recursively as long as nodes are sortrel and !ProjectRel. Practically this may happen only in very few cases (may be OB followed by limit). regardless its better to traverse it down till you hit a non sort rel. CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
[ https://issues.apache.org/jira/browse/HIVE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507765#comment-14507765 ] Alan Gates commented on HIVE-10426: --- One thing that I misunderstood before that I want to make sure I have right now: ReplicationTask will be called in the context of the client. Part of my concern over the repeating error was that I thought this was being called in the context of the metastore server. In the client the repeated logs are less of a concern. I think this is a better approach with the invalid factory returning an error early and refusing to allow instantiation of replication tasks. +1 Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Components: Import/Export Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-10426.patch Creating a new jira to continue discussions from HIVE-10227 as to what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10441) Fix confusing log statement in SessionState about hive.execution.engine setting
[ https://issues.apache.org/jira/browse/HIVE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507711#comment-14507711 ] Gunther Hagleitner commented on HIVE-10441: --- +1 Fix confusing log statement in SessionState about hive.execution.engine setting --- Key: HIVE-10441 URL: https://issues.apache.org/jira/browse/HIVE-10441 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10441.1.patch {code} LOG.info("No Tez session required at this point. hive.execution.engine=mr."); {code} This statement is misleading. It is true that it is printed in the case that a Tez session does not need to be created, but it is not necessarily true that hive.execution.engine=mr - it could be Spark, or it could even be set to Tez but the Session determined that a Tez Session did not need to be created (which is the case for HiveServer2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
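A less misleading message would report whatever engine is actually configured rather than hard-coding mr. A hypothetical sketch of such a message builder (not the actual HIVE-10441 patch):

```java
// Hedged sketch: include the real hive.execution.engine value in the log line
// instead of asserting it is "mr".
class SessionLog {
    static String noTezSessionMessage(String engine) {
        return "No Tez session required at this point. hive.execution.engine=" + engine + ".";
    }
}
```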
[jira] [Updated] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-4625: Attachment: HIVE-4625.5.patch HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
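The suggested shape of the fix is a capability check before the call, so the embedded-metastore case never reaches the code path that throws and logs. An illustrative sketch (the flags and return value are stand-ins, not HiveServer2's real API):

```java
// Hedged sketch: only ask the metastore for a delegation token when the
// current configuration actually supports it (remote metastore + kerberos).
class TokenGuard {
    static String getTokenIfSupported(boolean remoteMetastore, boolean kerberos) {
        if (remoteMetastore && kerberos) {
            return "token"; // a real client would call the metastore here
        }
        // Embedded metastore: nothing to fetch, and nothing spurious is logged.
        return null;
    }
}
```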
[jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
[ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508162#comment-14508162 ] Richard Williams commented on HIVE-10410: - [~vgumashta] I can confirm that this issue only seems to manifest with a remote metastore. [~ekoifman] That's what I suspect as well. I was taking a look at the code that implements asynchronous execution of submitted statements in org.apache.hive.service.cli.operation.SQLOperation, and I noticed this suspicious-looking bit of code in the runInternal method:
{noformat}
// ThreadLocal Hive object needs to be set in background thread.
// The metastore client in Hive is associated with right user.
final Hive parentHive = getSessionHive();
// Current UGI will get used by metastore when metastore is in embedded mode
// So this needs to get passed to the new background thread
final UserGroupInformation currentUGI = getCurrentUGI(opConfig);
// Runnable impl to call runInternal asynchronously,
// from a different thread
Runnable backgroundOperation = new Runnable() {
  @Override
  public void run() {
    PrivilegedExceptionAction<Object> doAsAction = new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws HiveSQLException {
        Hive.set(parentHive);
        SessionState.setCurrentSessionState(parentSessionState);
        // Set current OperationLog in this async thread for keeping on saving query log.
        registerCurrentOperationLog();
        try {
          runQuery(opConfig);
        } catch (HiveSQLException e) {
          setOperationException(e);
          LOG.error("Error running hive query: ", e);
        } finally {
          unregisterOperationLog();
        }
        return null;
      }
    };
{noformat}
Correct me if I'm wrong, but it seems to me that passing the parent thread's ThreadLocal Hive object to Hive.set in the children will effectively thwart the usage of ThreadLocal, resulting in the children and the parent all sharing the same Hive object.
There are a number of paths in which calls to one of the Hive.get methods result in the current ThreadLocal Hive object being removed from the ThreadLocal map and replaced with a new Hive instance; however, I don't see anything that guarantees that that always happens on the first call to Hive.get in the child threads. Apparent race condition in HiveServer2 causing intermittent query failures -- Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.4 Reporter: Richard Williams On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!" or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent hive queries (we have been using "show databases"), two or three of which typically fail. However, when either of these conditions do not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs: {noformat} 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)!
org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)! at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at
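The sharing hazard Richard describes can be reproduced without any Hive classes: if a child thread calls set() with the parent's instance, the ThreadLocal no longer isolates anything. A small self-contained demonstration:

```java
// Demonstrates that ThreadLocal.set(parentInstance) in a child thread makes
// parent and child share one object, mirroring the Hive.set(parentHive) call
// quoted in the comment above.
class ThreadLocalShare {
    static final ThreadLocal<StringBuilder> TL =
            ThreadLocal.withInitial(StringBuilder::new);

    static boolean childSharesParentInstance() {
        final StringBuilder parent = TL.get(); // this thread's "private" instance
        final boolean[] shared = new boolean[1];
        Thread child = new Thread(() -> {
            TL.set(parent);                    // same move as Hive.set(parentHive)
            shared[0] = (TL.get() == parent);  // identical object, not a copy
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared[0];
    }
}
```

Once the instance is shared, any non-thread-safe state inside it is subject to exactly the kind of intermittent races reported in this issue.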
[jira] [Commented] (HIVE-9824) LLAP: Native Vectorization of Map Join
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508178#comment-14508178 ] Lefty Leverenz commented on HIVE-9824: -- Doc note: This adds 5 configuration parameters (and changes indentation of the descriptions for 2 others). * hive.vectorized.execution.mapjoin.native.enabled * hive.vectorized.execution.mapjoin.native.multikey.only.enabled * hive.vectorized.execution.mapjoin.minmax.enabled * hive.vectorized.execution.mapjoin.overflow.repeated.threshold * hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled The new parameters need to be documented in the Vectorization section of Configuration Properties: * [Configuration Properties -- Vectorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization] Is any other documentation needed? LLAP: Native Vectorization of Map Join -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch in a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10451) IdentityProjectRemover removed useful project
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10451: Attachment: HIVE-10451.patch IdentityProjectRemover removed useful project - Key: HIVE-10451 URL: https://issues.apache.org/jira/browse/HIVE-10451 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10451.patch In this particular case Select on top of PTF Op is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10451) IdentityProjectRemover removed useful project
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10451: Reporter: Gopal V (was: Ashutosh Chauhan) IdentityProjectRemover removed useful project - Key: HIVE-10451 URL: https://issues.apache.org/jira/browse/HIVE-10451 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-10451.patch In this particular case Select on top of PTF Op is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10452) Followup fix for HIVE-10202 to restrict it it for script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10452: - Attachment: HIVE-10452.patch Attached in a patch to resolve this issue. Followup fix for HIVE-10202 to restrict it it for script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-10452.patch The fix made in HIVE-10202 needs to be limited to when beeline is running in a script mode aka -f option. Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10452) Followup fix for HIVE-10202 to restrict it it for script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10452: - Description: The fix made in HIVE-10202 needs to be limited to when beeline is running in a script mode aka -f option. Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. Followup fix for HIVE-10202 to restrict it it for script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor The fix made in HIVE-10202 needs to be limited to when beeline is running in a script mode aka -f option. Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10453) HS2 leaking open file descriptors when using UDFs
[ https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned HIVE-10453: --- Assignee: Yongzhi Chen HS2 leaking open file descriptors when using UDFs - Key: HIVE-10453 URL: https://issues.apache.org/jira/browse/HIVE-10453 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen 1. create a custom function by CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar'; 2. Create a simple jdbc client, just do connect, run simple query which using the function such as: select myfunc(col1) from sometable 3. Disconnect. Check open file for HiveServer2 by: lsof -p HSProcID | grep myudf.jar You will see the leak as: {noformat} java 28718 ychen txt REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar java 28718 ychen 330r REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
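A common cause of this kind of leak is a URLClassLoader opened over the UDF jar but never closed when the session ends; since Java 7 the loader is Closeable and releases its jar handles on close(). A hedged sketch of the cleanup pattern (not the actual HIVE-10453 fix):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hedged sketch: scope the session's classloader with try-with-resources so
// its open jar file descriptors are released when the session goes away.
class LoaderCleanup {
    static boolean loadAndClose(URL[] jars) {
        try (URLClassLoader loader = new URLClassLoader(jars)) {
            // ... resolve UDF classes through 'loader' while the session lives ...
            return true; // close() runs on exit, releasing the jar descriptors
        } catch (IOException e) {
            return false;
        }
    }
}
```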
[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10233: -- Attachment: HIVE-10233-WIP-5.patch Fix runtime issues. Hive on LLAP: Memory manager Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Attachment: HIVE-10456.1.patch Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
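The control flow being proposed can be sketched abstractly: on a normal close the spilled partitions are reloaded to finish the join, while an aborted close should only discard them. An illustrative outline (names are stand-ins for the operator's real methods):

```java
// Hedged sketch of closeOp(abort): reload spills only on a normal close;
// on abort, clean them up without doing any further join work.
class SpillCloseSketch {
    static String closeOp(boolean abort, int spilledPartitions) {
        if (abort) {
            // discard spilled data; the query is being torn down anyway
            return "cleaned " + spilledPartitions + " spilled partitions";
        }
        // normal path: reload each spilled partition and complete the join
        return "joined " + spilledPartitions + " spilled partitions";
    }
}
```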
[jira] [Commented] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508292#comment-14508292 ] Prasanth Jayachandran commented on HIVE-10456: -- [~hagleitn]/[~wzheng]/[~mmokhtar] Can someone review this patch? Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508326#comment-14508326 ] Hive QA commented on HIVE-10347: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727366/HIVE-10347.6.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 8760 tests executed *Failed tests:* {noformat} TestCompareCliDriver - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3533/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3533/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3533/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727366 - PreCommit-HIVE-TRUNK-Build Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-10438: -- Assignee: Rohit Dholakia Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2resultSetcompressor.zip This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10451) IdentityProjectRemover removed useful project
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508404#comment-14508404 ] Hive QA commented on HIVE-10451: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727484/HIVE-10451.patch {color:red}ERROR:{color} -1 due to 131 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input39
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
[jira] [Commented] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
[ https://issues.apache.org/jira/browse/HIVE-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508187#comment-14508187 ] Prasanth Jayachandran commented on HIVE-10429: -- [~hagleitn] can you take a look at this patch? LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10429.1.patch Executors in LLAP can be interrupted by the user (kill) or by system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
[ https://issues.apache.org/jira/browse/HIVE-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10429: - Attachment: HIVE-10429.1.patch LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10429.1.patch Executors in LLAP can be interrupted by the user (kill) or by system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
[ https://issues.apache.org/jira/browse/HIVE-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508230#comment-14508230 ] Gunther Hagleitner commented on HIVE-10429: --- Comments on rb LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10429.1.patch Executors in LLAP can be interrupted by the user (kill) or by system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
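The interrupt propagation described above can be sketched as a loop that polls the thread's interrupt flag between units of work and aborts early. This is an illustrative stand-in, not Hive's actual operator code; the class and method names here are invented:

```java
// Hypothetical sketch (not the actual Hive operator pipeline): a processing
// loop that checks the thread's interrupt flag between rows and aborts,
// which is the behavior the patch propagates down to the operators.
public class InterruptAwareLoop {

    // Returns true if processing aborted because the thread was interrupted.
    public static boolean processRows(int rows) {
        for (int i = 0; i < rows; i++) {
            if (Thread.currentThread().isInterrupted()) {
                return true; // abort: stop work so cleanup can run promptly
            }
            // ... per-row work would go here ...
        }
        return false; // ran to completion
    }

    public static void main(String[] args) {
        // Simulate a kill/pre-emption: set the interrupt flag, then enter the loop.
        Thread.currentThread().interrupt();
        System.out.println("aborted=" + processRows(1_000_000));
    }
}
```

Note that `isInterrupted()` does not clear the flag, so every level of a pipeline can observe the same interrupt and unwind.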
[jira] [Updated] (HIVE-10323) Tez merge join operator does not honor hive.join.emit.interval
[ https://issues.apache.org/jira/browse/HIVE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10323: -- Attachment: HIVE-10323.2.patch Tez merge join operator does not honor hive.join.emit.interval -- Key: HIVE-10323 URL: https://issues.apache.org/jira/browse/HIVE-10323 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10323.1.patch, HIVE-10323.2.patch This affects efficiency in case of skews. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10324) Hive metatool should take table_param_key to allow for changes to avro serde's schema url key
[ https://issues.apache.org/jira/browse/HIVE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508270#comment-14508270 ] Ferdinand Xu commented on HIVE-10324: - Awesome!! You can add the following information to the use example section. Thanks [~leftylev] {noformat} ./hive --service metatool -updateLocation hdfs://localhost:9000 hdfs://namenode2:8020 -tablePropKey avro.schema.url -serdePropKey avro.schema.url Initializing HiveMetaTool.. 15/04/22 14:18:42 INFO metastore.ObjectStore: ObjectStore, initialize called 15/04/22 14:18:42 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 15/04/22 14:18:42 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 15/04/22 14:18:43 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order 15/04/22 14:18:43 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 15/04/22 14:18:43 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 15/04/22 14:18:44 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 15/04/22 14:18:44 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 
15/04/22 14:18:44 INFO DataNucleus.Query: Reading in results for query org.datanucleus.store.rdbms.query.SQLQuery@0 since the connection used is closing 15/04/22 14:18:44 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 15/04/22 14:18:44 INFO metastore.ObjectStore: Initialized ObjectStore Looking for LOCATION_URI field in DBS table to update.. Successfully updated the following locations.. Updated 0 records in DBS table Looking for LOCATION field in SDS table to update.. Successfully updated the following locations.. Updated 0 records in SDS table Looking for value of avro.schema.url key in TABLE_PARAMS table to update.. Successfully updated the following locations.. Updated 0 records in TABLE_PARAMS table Looking for value of avro.schema.url key in SD_PARAMS table to update.. Successfully updated the following locations.. Updated 0 records in SD_PARAMS table Looking for value of avro.schema.url key in SERDE_PARAMS table to update.. Successfully updated the following locations.. Updated 0 records in SERDE_PARAMS table {noformat} Hive metatool should take table_param_key to allow for changes to avro serde's schema url key - Key: HIVE-10324 URL: https://issues.apache.org/jira/browse/HIVE-10324 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Ferdinand Xu Fix For: 1.2.0 Attachments: HIVE-10324.1.patch, HIVE-10324.patch, HIVE-10324.patch.WIP HIVE-3443 added support to change the serdeParams from 'metatool updateLocation' command. 
However, in avro it is possible to specify the schema via the tableParams: {noformat} CREATE TABLE `testavro`( `test` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://namenode:8020/tmp/test.avsc', 'kite.compression.type'='snappy', 'transient_lastDdlTime'='1427996456') {noformat} Hence for those tables the 'metatool updateLocation' will not help. This is necessary in case like upgrade the namenode to HA where the absolute paths have changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10459) Add materialized views to Hive
[ https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-10459: -- Attachment: HIVE-10459.patch This patch is a start at implementing simple materialized views. It doesn't have enough testing yet (e.g. there's no negative testing), and I know it fails in the partitioned case. I suspect things like security and locking don't work properly yet either. But I'm posting it as a starting point. In this initial patch I'm just handling simple materialized views with manual rebuilds. In later JIRAs we can add features such as allowing the optimizer to rewrite queries to use materialized views rather than the tables named in the queries, giving the optimizer the ability to determine when a materialized view is stale, etc. Also, I didn't rebase this patch against trunk after the migration from svn to git, so it may not apply cleanly. Add materialized views to Hive -- Key: HIVE-10459 URL: https://issues.apache.org/jira/browse/HIVE-10459 Project: Hive Issue Type: Improvement Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-10459.patch Materialized views are useful as ways to store either alternate versions of data (e.g. same data, different sort order) or derivatives of data sets (e.g. commonly used aggregates). It is useful to store these as materialized views rather than as tables because it gives the optimizer the ability to understand how data sets are related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
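A minimal sketch of the intended usage, assuming the grammar follows the common CREATE MATERIALIZED VIEW form; the exact syntax accepted by this initial patch may differ, and the table and view names here are invented:

{code}
CREATE MATERIALIZED VIEW mv_daily_totals AS
SELECT ds, SUM(amount) AS total
FROM sales
GROUP BY ds;

-- manual rebuild only in this first cut; optimizer rewrites and
-- staleness detection are left to later JIRAs
ALTER MATERIALIZED VIEW mv_daily_totals REBUILD;
{code}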
[jira] [Updated] (HIVE-10413) [CBO] Return path assumes distinct column cant be same as grouping column
[ https://issues.apache.org/jira/browse/HIVE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10413: -- Attachment: HIVE-10413.2.patch [CBO] Return path assumes distinct column cant be same as grouping column - Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Laljo John Pullokkaran Attachments: HIVE-10413.1.patch, HIVE-10413.2.patch, HIVE-10413.patch Found in cbo_udf_udaf.q tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
[ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508195#comment-14508195 ] Eugene Koifman commented on HIVE-10410: --- I think you are right, ThreadLocal in this case doesn't prevent multiple threads sharing a connection. Apparent race condition in HiveServer2 causing intermittent query failures -- Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.4 Reporter: Richard Williams On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as Read a negative frame size (-2147418110)! or out of sequence response in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent hive queries (we have been using show databases), two or three of which typically fail. However, when either of these conditions do not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs: {noformat} 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)! 
org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)! at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837) at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139) at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at 
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145) at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69) at
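The ThreadLocal observation above can be illustrated with a small standalone example (names here are invented, not HiveServer2's code): a ThreadLocal gives each thread its own reference, but if every initializer hands back the same underlying object, the threads still share it. That is why wrapping a metastore connection in a ThreadLocal does not stop two request threads from interleaving traffic on one Thrift transport.

```java
// Hypothetical illustration: "thread-local" handles to a single shared object.
public class ThreadLocalSharing {
    // Stands in for a single shared Thrift transport/connection.
    static final Object sharedConnection = new Object();

    // Per-thread handle -- but the *same* object behind every handle.
    static final ThreadLocal<Object> perThread =
            ThreadLocal.withInitial(() -> sharedConnection);

    // Returns true when two distinct threads observe the identical object.
    public static boolean sameUnderlying() throws InterruptedException {
        final Object[] seen = new Object[2];
        Thread t1 = new Thread(() -> seen[0] = perThread.get());
        Thread t2 = new Thread(() -> seen[1] = perThread.get());
        t1.start(); t2.start();
        t1.join(); t2.join();
        return seen[0] != null && seen[0] == seen[1];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared=" + sameUnderlying());
    }
}
```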
[jira] [Assigned] (HIVE-10413) [CBO] Return path assumes distinct column cant be same as grouping column
[ https://issues.apache.org/jira/browse/HIVE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-10413: - Assignee: Laljo John Pullokkaran (was: Ashutosh Chauhan) [CBO] Return path assumes distinct column cant be same as grouping column - Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Laljo John Pullokkaran Attachments: HIVE-10413.1.patch, HIVE-10413.patch Found in cbo_udf_udaf.q tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10424) LLAP: Factor known capacity into scheduling decisions
[ https://issues.apache.org/jira/browse/HIVE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10424: -- Attachment: HIVE-10424.1.txt Patch to factor in running-queue plus wait-queue capacity per node. It also moves all scheduling onto a single thread: requests go onto a queue and are taken off whenever a node becomes available or has capacity. It can run with the old 'unlimited' capacity by setting llap.task.scheduler.num.schedulable.tasks.per.node to -1. LLAP: Factor known capacity into scheduling decisions - Key: HIVE-10424 URL: https://issues.apache.org/jira/browse/HIVE-10424 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10424.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
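The per-node admission check described above can be sketched as follows; the method name is invented, not the patch's actual API, and only the config key `llap.task.scheduler.num.schedulable.tasks.per.node` comes from the description:

```java
// Hypothetical sketch of the capacity decision: a node can accept another
// task while running + queued stays under the configured per-node limit;
// -1 preserves the old "unlimited" behavior.
public class NodeCapacity {
    public static boolean canSchedule(int running, int queued, int limitPerNode) {
        if (limitPerNode == -1) {
            return true; // llap.task.scheduler.num.schedulable.tasks.per.node = -1
        }
        return running + queued < limitPerNode;
    }

    public static void main(String[] args) {
        System.out.println(canSchedule(3, 1, 4));    // false: node is full
        System.out.println(canSchedule(3, 0, 4));    // true: one slot free
        System.out.println(canSchedule(50, 50, -1)); // true: unlimited mode
    }
}
```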
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Affects Version/s: (was: llap) 1.2.0 Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
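The closeOp() contract described above can be sketched as follows; this is an illustrative stand-in, not Hive's actual operator API, and the field names are invented:

```java
// Hypothetical sketch: on a normal close the spilled partitions are
// reloaded so the join can complete; on abort they are discarded instead.
public class GraceHashJoinClose {
    boolean reloadedSpill = false;
    boolean cleanedSpill = false;

    void closeOp(boolean abort) {
        if (abort) {
            cleanedSpill = true;   // drop spilled data, skip the join work
        } else {
            reloadedSpill = true;  // load spilled partitions, finish the join
        }
    }

    public static void main(String[] args) {
        GraceHashJoinClose op = new GraceHashJoinClose();
        op.closeOp(true); // abort path
        System.out.println("cleaned=" + op.cleanedSpill
                + " reloaded=" + op.reloadedSpill);
    }
}
```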
[jira] [Updated] (HIVE-10457) Merge trunk to spark (4/22/15) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10457: - Summary: Merge trunk to spark (4/22/15) [Spark Branch] (was: Merge trunk to spark (4/22/15)) Merge trunk to spark (4/22/15) [Spark Branch] - Key: HIVE-10457 URL: https://issues.apache.org/jira/browse/HIVE-10457 Project: Hive Issue Type: Bug Reporter: Szehon Ho -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9674) *DropPartitionEvent should handle partition-sets.
[ https://issues.apache.org/jira/browse/HIVE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508407#comment-14508407 ] Hive QA commented on HIVE-9674: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12726535/HIVE-9674.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3535/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3535/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3535/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3535/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d 
apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/results/clientpositive/windowing_navfn.q.out' Reverted 'ql/src/test/queries/clientpositive/windowing_navfn.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/thirdparty itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target itests/util/target itests/qtest-spark/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen spark-client/target contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1675534. At revision 1675534. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12726535 - PreCommit-HIVE-TRUNK-Build *DropPartitionEvent should handle partition-sets. - Key: HIVE-9674 URL: https://issues.apache.org/jira/browse/HIVE-9674 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9674.2.patch, HIVE-9674.3.patch, HIVE-9674.4.patch Dropping a set of N partitions from a table currently results in N DropPartitionEvents (and N PreDropPartitionEvents) being fired serially. This is wasteful, especially so for large N. It also makes it impossible to even try to run authorization-checks on all partitions in a batch. Taking the cue from HIVE-9609, we should compose an {{IterablePartition}} in the event, and expose them via an
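The batching idea in HIVE-9674 can be sketched as a single event carrying an iterable of all dropped partitions instead of N separate events; the class and method names below are invented for illustration, not Hive's API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one drop event for the whole partition-set, so
// listeners and authorization checks can walk the batch in a single call.
public class BatchedDropPartitionEvent {
    private final List<String> partitions;

    public BatchedDropPartitionEvent(List<String> partitions) {
        this.partitions = partitions;
    }

    // Listeners iterate once over the batch instead of being invoked N times.
    public Iterable<String> getPartitions() {
        return partitions;
    }

    public static void main(String[] args) {
        BatchedDropPartitionEvent event = new BatchedDropPartitionEvent(
                Arrays.asList("ds=2015-04-20", "ds=2015-04-21", "ds=2015-04-22"));
        int count = 0;
        for (String p : event.getPartitions()) count++;
        System.out.println("one event, " + count + " partitions");
    }
}
```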
[jira] [Commented] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
[ https://issues.apache.org/jira/browse/HIVE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508409#comment-14508409 ] Hive QA commented on HIVE-10426: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12726987/HIVE-10426.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3536/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3536/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3536/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3536/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d 
apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1675535. At revision 1675535. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12726987 - PreCommit-HIVE-TRUNK-Build Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Components: Import/Export Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-10426.patch Creating a new jira to continue discussions from HIVE-10227 as to what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-5672: Attachment: HIVE-5672.5.patch.tar.gz Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10452) Followup fix for HIVE-10202 to restrict it to script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10452: - Description: The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. For example: {code} beeline -u jdbc:hive2://localhost:1 --silent=true {code} It appears to hang, but in reality it doesn't display any prompt. The workaround is to not use the --silent=true option with non-interactive mode. was: The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. Followup fix for HIVE-10202 to restrict it to script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10452.patch The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. For example: {code} beeline -u jdbc:hive2://localhost:1 --silent=true {code} It appears to hang, but in reality it doesn't display any prompt. The workaround is to not use the --silent=true option with non-interactive mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10454: Description: The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. was: The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-10454: --- Assignee: Aihua Xu Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10424) LLAP: Factor known capacity into scheduling decisions
[ https://issues.apache.org/jira/browse/HIVE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-10424. --- Resolution: Fixed LLAP: Factor known capacity into scheduling decisions - Key: HIVE-10424 URL: https://issues.apache.org/jira/browse/HIVE-10424 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10424.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-5672: Attachment: HIVE-5672.4.patch Rebase the patch. The .out file will be added later. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch https://issues.apache.org/jira/browse/HIVE-3682 is great but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507863#comment-14507863 ] Hive QA commented on HIVE-10403: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727302/HIVE-10403.03.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8729 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3529/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3529/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3529/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727302 - PreCommit-HIVE-TRUNK-Build Add n-way join support for Hybrid Grace Hash Join - Key: HIVE-10403 URL: https://issues.apache.org/jira/browse/HIVE-10403 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-10403.01.patch, HIVE-10403.02.patch, HIVE-10403.03.patch Currently Hybrid Grace Hash Join only supports 2-way join (one big table and one small table). This task will enable n-way join (one big table and multiple small tables). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10328) Enable new return path for cbo
[ https://issues.apache.org/jira/browse/HIVE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10328: Attachment: HIVE-10328.2.patch Enable new return path for cbo -- Key: HIVE-10328 URL: https://issues.apache.org/jira/browse/HIVE-10328 Project: Hive Issue Type: Task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10328.1.patch, HIVE-10328.2.patch, HIVE-10328.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10441) Fix confusing log statement in SessionState about hive.execution.engine setting
[ https://issues.apache.org/jira/browse/HIVE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508020#comment-14508020 ] Hive QA commented on HIVE-10441: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727303/HIVE-10441.1.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8726 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3530/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3530/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3530/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727303 - PreCommit-HIVE-TRUNK-Build Fix confusing log statement in SessionState about hive.execution.engine setting --- Key: HIVE-10441 URL: https://issues.apache.org/jira/browse/HIVE-10441 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10441.1.patch {code} LOG.info("No Tez session required at this point. hive.execution.engine=mr."); {code} This statement is misleading. It is true that it is printed in the case that a Tez session does not need to be created, but it is not necessarily true that hive.execution.engine=mr - it could be Spark, or it could even be set to Tez but the Session determined that a Tez Session did not need to be created (which is the case for HiveServer2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
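For context, the kind of fix the HIVE-10441 report suggests can be sketched as follows. This is an illustrative stand-in only: the class and method names are invented, not the actual SessionState code. The point is simply to report the configured hive.execution.engine value rather than hard-coding "mr".

```java
// Illustrative sketch only -- SessionLogSketch and noTezSessionMessage are
// made-up names, not the actual Hive SessionState code. The message includes
// the real engine value, so "spark" (or even "tez" in the HiveServer2 case
// where no session is created) is reported accurately.
public class SessionLogSketch {
    static String noTezSessionMessage(String engine) {
        return "No Tez session required at this point. hive.execution.engine="
                + engine + ".";
    }

    public static void main(String[] args) {
        System.out.println(noTezSessionMessage("spark"));
    }
}
```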
[jira] [Commented] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508150#comment-14508150 ] Hive QA commented on HIVE-4625: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727315/HIVE-4625.5.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3531/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3531/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3531/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727315 - PreCommit-HIVE-TRUNK-Build HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
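The check proposed in the HIVE-4625 description can be pictured as a simple guard. All names below are hypothetical, chosen for illustration rather than taken from the HiveServer2 code: the idea is to test whether delegation tokens are supported by the current configuration instead of calling the metastore and swallowing the resulting UnsupportedOperationException.

```java
// Hypothetical guard -- class and method names are invented for illustration,
// not the actual Hive API. An embedded (in-process) metastore cannot issue
// delegation tokens, so the token fetch should be skipped up front rather
// than attempted and caught, which logs a misleading error at lower levels.
public class DelegationTokenGuard {
    static boolean delegationTokenSupported(boolean remoteMetastore,
                                            boolean kerberosEnabled) {
        return remoteMetastore && kerberosEnabled;
    }

    public static void main(String[] args) {
        // Embedded metastore: skip the token fetch entirely.
        System.out.println(delegationTokenSupported(false, true));
    }
}
```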
[jira] [Commented] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508168#comment-14508168 ] Thejas M Nair commented on HIVE-4625: - +1 HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9824) LLAP: Native Vectorization of Map Join
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9824: - Labels: TODOC1.2 (was: ) LLAP: Native Vectorization of Map Join -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch into a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: (was: HIVE-10397.trunk.patch) LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10347: - Attachment: HIVE-10347.6.patch Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10347: - Attachment: (was: HIVE-10347.6.patch) Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508124#comment-14508124 ] Xuefu Zhang commented on HIVE-10312: +1 SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10347: - Attachment: HIVE-10347.6.patch Made a version of the patch for git (now that the repos have changed). Submitting it to be tested. Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10368) VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin
[ https://issues.apache.org/jira/browse/HIVE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507950#comment-14507950 ] Wei Zheng commented on HIVE-10368: -- Here's another similar failure, probably related. TestMiniTezCliDriver.testCliDriver_vector_char_mapjoin1 Caused by: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:232) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:316) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addProjectedRowToBatchFrom(VectorizedBatchUtil.java:271) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.reProcessBigTable(VectorMapJoinOperator.java:320) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:530) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:485) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.closeOp(VectorMapJoinOperator.java:237) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:630) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:324) ... 
14 more VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin Key: HIVE-10368 URL: https://issues.apache.org/jira/browse/HIVE-10368 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Matt McCline This problem was exposed by HIVE-10284, when testing vectorized_context.q. Below is the query and backtrace: {code} select store.s_city, ss_net_profit from store_sales JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk limit 100 {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:175) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.getRowObject(VectorMapJoinOperator.java:347) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:306) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390) ... 24 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: (was: HIVE-10397.trunk.patch) LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: HIVE-10397.trunk.patch [~gopalv] I made a patch based on your suggestions. This patch removes OrcInputFormat implementing SplitSizeEstimator. Instead, a new generic ColumnSizeEstimator is added to the Tez code path. But this will still not solve the hadoop-1 issue, nor will it be compatible with older Tez installs. LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: HIVE-10397.trunk.patch LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: HIVE-10397.trunk.patch LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9711) ORC Vectorization DoubleColumnVector.isRepeating=false if all entries are NaN
[ https://issues.apache.org/jira/browse/HIVE-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508084#comment-14508084 ] Prasanth Jayachandran commented on HIVE-9711: - Committed patch to master. Thanks [~gopalv] for the patch! ORC Vectorization DoubleColumnVector.isRepeating=false if all entries are NaN - Key: HIVE-9711 URL: https://issues.apache.org/jira/browse/HIVE-9711 Project: Hive Issue Type: Bug Components: File Formats, Vectorization Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Fix For: 1.2.0 Attachments: HIVE-9711.1.patch, HIVE-9711.2.patch, HIVE-9711.3.patch The isRepeating=true check uses Java equality, which results in NaN != NaN comparison operations. The noNulls case needs the current check folded into the previous loop, while the hasNulls case needs a logical AND of the isNull[] field instead of == comparisons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
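The bug class fixed by HIVE-9711 is easy to reproduce in isolation. The snippet below is a standalone illustration, not the actual DoubleColumnVector code: under IEEE 754, NaN != NaN, so a plain == scan reports an all-NaN column as non-repeating, whereas comparing raw bit patterns (Double.doubleToLongBits canonicalizes NaN) gets it right.

```java
// Standalone illustration of the NaN isRepeating bug -- not Hive's actual
// DoubleColumnVector code. naiveIsRepeating uses Java value equality and
// fails on all-NaN input; bitwiseIsRepeating compares canonical bit patterns.
public class NaNRepeatCheck {
    static boolean naiveIsRepeating(double[] v) {
        for (int i = 1; i < v.length; i++) {
            if (v[i] != v[0]) return false; // NaN != NaN, so all-NaN => false
        }
        return true;
    }

    static boolean bitwiseIsRepeating(double[] v) {
        long first = Double.doubleToLongBits(v[0]); // canonicalizes NaN
        for (int i = 1; i < v.length; i++) {
            if (Double.doubleToLongBits(v[i]) != first) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] allNaN = {Double.NaN, Double.NaN, Double.NaN};
        System.out.println(naiveIsRepeating(allNaN));   // false: the bug
        System.out.println(bitwiseIsRepeating(allNaN)); // true: correct
    }
}
```

The same reasoning explains the hasNulls case noted in the description: null slots must be skipped via the isNull[] flags rather than compared by value.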
[jira] [Commented] (HIVE-9824) LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;)
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508014#comment-14508014 ] Matt McCline commented on HIVE-9824: Added 2 new JIRAs, as [~sershe] requested: HIVE-10448: Consider replacing BytesBytesMultiHashMap with new fast hash table code of Native Vector Map Join HIVE-10449: LLAP: Make new fast hash table for Native Vector Map Join work with Hybrid Grace LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;) -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch into a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10452) Followup fix for HIVE-10202 to restrict it to script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508474#comment-14508474 ] Hive QA commented on HIVE-10452: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727486/HIVE-10452.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3537/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3537/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3537/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727486 - PreCommit-HIVE-TRUNK-Build Followup fix for HIVE-10202 to restrict it to script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10452.patch The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in, say: {code} beeline -u jdbc:hive2://localhost:1 --silent=true {code} It appears to hang but in reality it doesn't display any prompt. The workaround is to not use the --silent=true option in interactive mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165:
---
Attachment: (was: ReflectiveOperationWriter.java)

Improve hive-hcatalog-streaming extensibility and support updates and deletes.
--
Key: HIVE-10165
URL: https://issues.apache.org/jira/browse/HIVE-10165
Project: Hive
Issue Type: Improvement
Components: HCatalog
Reporter: Elliot West
Assignee: Elliot West
Labels: streaming_api
Fix For: 1.2.0

h3. Overview
I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues:
* Excessive write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions) the scope for contention is high.

I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.

h3. Benefits
* Enables the creation of large-scale dataset merge processes
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.

h3. Implementation
Our changes do not break the existing API contracts. Instead, our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently, we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows:
* We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems sensible to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish.
* The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface.
* Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class
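The interface split described above can be sketched as follows. Only the type names ({{RecordOperationWriter}}, {{RecordWriter}}, {{RecordMutator}}) come from the issue text; the method signatures and the toy implementation are assumptions for illustration, not the attached patch:

```java
import java.util.ArrayList;
import java.util.List;

public class MutatorSketch {
    // Shared life-cycle super-interface; the name comes from the JIRA text,
    // the exact method set is an assumption.
    interface RecordOperationWriter {
        void flush();
        void closeBatch();
    }

    // The existing write-only contract (byte[] rows).
    interface RecordWriter extends RecordOperationWriter {
        void write(long txnId, byte[] record);
    }

    // The proposed mutation contract. Rows are Objects so callers can pass
    // strongly typed records (POJOs, Tuples) without a byte[] encoding step.
    interface RecordMutator extends RecordOperationWriter {
        void insert(long txnId, Object record);
        void update(long txnId, Object record);
        void delete(long txnId, Object record);
    }

    // Toy in-memory mutator used only to exercise the contract.
    static class LoggingMutator implements RecordMutator {
        final List<String> ops = new ArrayList<>();
        public void insert(long txnId, Object record) { ops.add("insert:" + record); }
        public void update(long txnId, Object record) { ops.add("update:" + record); }
        public void delete(long txnId, Object record) { ops.add("delete:" + record); }
        public void flush() { /* nothing buffered in this toy */ }
        public void closeBatch() { ops.add("close"); }
    }

    public static void main(String[] args) {
        LoggingMutator m = new LoggingMutator();
        m.insert(1L, "row-a");
        m.delete(1L, "row-a");
        m.closeBatch();
        System.out.println(m.ops);
    }
}
```

The point of the split is that both the old and new contracts share one life-cycle super-interface while exposing disjoint operation sets.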
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165:
---
Description:

h3. Overview
I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues:
* Excessive write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions) the scope for contention is high.

I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.

h3. Benefits
* Enables the creation of large-scale dataset merge processes
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.

h3. Implementation
Our changes do not break the existing API contracts. Instead, our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently, we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows:
* We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems sensible to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish.
* The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface.
* Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class {{AbstractOperationRecordWriter}}. The visibility is such that it is open to extension by third parties. The {{AbstractOperationRecordWriter}} also permits the setting of the {{AcidOutputFormat.Options#recordIdColumn()}} (defaulted to {{-1}}), which is a requirement for enabling updates and deletes. Additionally, these options are now fed an {{ObjectInspector}} via an abstract method so that a {{SerDe}} is not mandated (it was not required for our use-case). The {{AbstractRecordWriter}} is now much leaner, handling only the extraction of the {{ObjectInspector}} from the {{SerDe}}.
* A new abstract class,
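The {{recordIdColumn}} gating mentioned above can be illustrated with a small stand-in for {{AcidOutputFormat.Options}}. This class is illustrative only (not Hive's actual implementation); the only details taken from the issue are the fluent setter style and the {{-1}} default meaning "no record-id column":

```java
public class WriterOptions {
    // -1 means "no ROW_ID column": the writer is insert-only, mirroring
    // the described default of AcidOutputFormat.Options#recordIdColumn().
    private int recordIdColumn = -1;

    // Fluent setter, in the style of AcidOutputFormat.Options.
    public WriterOptions recordIdColumn(int column) {
        this.recordIdColumn = column;
        return this;
    }

    // Updates and deletes must locate the original row via its ROW_ID,
    // so mutation support hinges on a record-id column being configured.
    public boolean supportsMutation() {
        return recordIdColumn >= 0;
    }

    public static void main(String[] args) {
        System.out.println(new WriterOptions().supportsMutation());                   // insert-only default
        System.out.println(new WriterOptions().recordIdColumn(0).supportsMutation()); // mutations enabled
    }
}
```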
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438:
--
Attachment: hs2resultSetcompressor.zip

Architecture for ResultSet Compression via external plugin
---
Key: HIVE-10438
URL: https://issues.apache.org/jira/browse/HIVE-10438
Project: Hive
Issue Type: New Feature
Components: Hive, Thrift API
Affects Versions: 1.1.0
Reporter: Rohit Dholakia
Labels: patch
Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2resultSetcompressor.zip

This JIRA proposes an architecture for enabling ResultSet compression using an external plugin. The patch has three aspects to it:
0. An architecture for enabling ResultSet compression with external plugins
1. An example plugin to demonstrate end-to-end functionality
2. A container to allow everyone to write and test ResultSet compressors.

Also attaching a design document explaining the changes, an experimental results document, and a PDF explaining how to set up the Docker container to observe end-to-end functionality of ResultSet compression.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
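To make the plugin idea concrete, here is a hypothetical compressor contract with a DEFLATE-backed example plugin built on {{java.util.zip}}. The interface and class names are invented for illustration; the real contract and example plugin live in the attached patch and are not reproduced in this JIRA text:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressorPluginSketch {
    // Hypothetical plugin contract: compress a serialized column, and
    // restore it given the original (uncompressed) length.
    interface ResultSetCompressor {
        byte[] compress(byte[] column);
        byte[] decompress(byte[] blob, int originalLength) throws DataFormatException;
    }

    // Example plugin backed by DEFLATE from the JDK.
    static class DeflateCompressor implements ResultSetCompressor {
        public byte[] compress(byte[] column) {
            Deflater deflater = new Deflater();
            deflater.setInput(column);
            deflater.finish();
            // Oversized scratch buffer; trim to the actual compressed size.
            byte[] buf = new byte[column.length + 64];
            int n = deflater.deflate(buf);
            deflater.end();
            return Arrays.copyOf(buf, n);
        }

        public byte[] decompress(byte[] blob, int originalLength) throws DataFormatException {
            Inflater inflater = new Inflater();
            inflater.setInput(blob);
            byte[] out = new byte[originalLength];
            inflater.inflate(out);
            inflater.end();
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] column = new byte[256];
        Arrays.fill(column, (byte) 'a');  // highly compressible column data
        ResultSetCompressor c = new DeflateCompressor();
        byte[] packed = c.compress(column);
        byte[] restored = c.decompress(packed, column.length);
        System.out.println(Arrays.equals(column, restored));
    }
}
```

A server would look up such a plugin by name at query time and apply it per column batch before shipping results over Thrift; the round-trip above is the property any plugin must preserve.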