[jira] [Commented] (HIVE-10384) RetryingMetaStoreClient does not retry wrapped TTransportExceptions
[ https://issues.apache.org/jira/browse/HIVE-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507336#comment-14507336 ] Chaoyu Tang commented on HIVE-10384: The test failures are not related to the patch. RetryingMetaStoreClient does not retry wrapped TTransportExceptions --- Key: HIVE-10384 URL: https://issues.apache.org/jira/browse/HIVE-10384 Project: Hive Issue Type: Bug Components: Clients Reporter: Eric Liang Assignee: Chaoyu Tang Attachments: HIVE-10384.1.patch, HIVE-10384.patch This bug is very similar to HIVE-9436, in that a TTransportException wrapped in a MetaException will not be retried. RetryingMetaStoreClient has a block of code above the MetaException handler that retries thrift exceptions, but this doesn't work when the exception is wrapped. {code} if ((e.getCause() instanceof TApplicationException) || (e.getCause() instanceof TProtocolException) || (e.getCause() instanceof TTransportException)) { caughtException = (TException) e.getCause(); } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("(?s).*JDO[a-zA-Z]*Exception.*")) { caughtException = (MetaException) e.getCause(); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
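The retry gap described above can be sketched in isolation. The snippet below is a hypothetical illustration of the fix's idea, not the actual RetryingMetaStoreClient code; the exception classes are local stand-ins for the real Thrift/Hive types. The point is to walk the whole cause chain rather than inspect only the direct cause, so a TTransportException wrapped inside a MetaException is still recognized as retriable.

```java
import java.util.regex.Pattern;

// Stand-ins for the real Thrift/Hive exception types (assumption: the names
// only mirror org.apache.thrift.transport.TTransportException and
// org.apache.hadoop.hive.metastore.api.MetaException).
class TTransportException extends Exception {
    TTransportException(String msg) { super(msg); }
}

class MetaException extends Exception {
    MetaException(String msg, Throwable cause) { super(msg, cause); }
}

public class RetrySketch {
    private static final Pattern JDO_PATTERN =
        Pattern.compile("(?s).*JDO[a-zA-Z]*Exception.*");

    // Walk the entire cause chain so wrapped transport failures are still
    // treated as retriable, instead of checking e.getCause() exactly once.
    static boolean isRetriable(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof TTransportException) {
                return true;
            }
            if (cur instanceof MetaException && cur.getMessage() != null
                    && JDO_PATTERN.matcher(cur.getMessage()).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException(
            new MetaException("remote failure", new TTransportException("broken pipe")));
        System.out.println(isRetriable(wrapped));                       // prints "true"
        System.out.println(isRetriable(new RuntimeException("other"))); // prints "false"
    }
}
```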
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: Proposal-rscompressor.pdf Design document explaining the changes for ResultSet compressor architecture Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch Attachments: Proposal-rscompressor.pdf This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
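To make the plugin idea concrete, here is a minimal sketch of what a ResultSet compressor plugin could look like, using java.util.zip as the example codec. The {{ColumnCompressor}} interface name and method signatures are assumptions made for illustration; the actual contract is the one defined in the attached design document.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical plugin contract (assumed names, not the proposal's API):
// HiveServer2 would hand a serialized column to the plugin and ship the
// compressed bytes back to the client over Thrift.
interface ColumnCompressor {
    byte[] compress(byte[] column);
    byte[] decompress(byte[] column, int originalLength) throws DataFormatException;
}

// Example plugin backed by java.util.zip, standing in for an externally
// loaded compressor library.
class DeflateColumnCompressor implements ColumnCompressor {
    public byte[] compress(byte[] column) {
        Deflater deflater = new Deflater();
        deflater.setInput(column);
        deflater.finish();
        byte[] buf = new byte[column.length * 2 + 64]; // safe bound for tiny inputs
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    public byte[] decompress(byte[] column, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(column);
        byte[] out = new byte[originalLength];
        inflater.inflate(out);
        inflater.end();
        return out;
    }
}
```

A client advertising the same compressor name would run decompress on each received column; the round trip must return the original bytes unchanged.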
[jira] [Commented] (HIVE-5850) Multiple table join error for avro
[ https://issues.apache.org/jira/browse/HIVE-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507323#comment-14507323 ] Miguel Romero commented on HIVE-5850: - I have just run into the same problem. Is there a workaround? Will it be solved in any version? Multiple table join error for avro --- Key: HIVE-5850 URL: https://issues.apache.org/jira/browse/HIVE-5850 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Shengjun Xin Attachments: part.tar.gz, partsupp.tar.gz, schema.tar.gz Reproduce step: {code} -- Create table Part. CREATE EXTERNAL TABLE part ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://hostname/user/hadoop/tpc-h/data/part' TBLPROPERTIES ('avro.schema.url'='hdfs://hostname/user/hadoop/tpc-h/schema/part.avsc'); -- Create table Part Supplier.
CREATE EXTERNAL TABLE partsupp ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'hdfs://hostname/user/hadoop/tpc-h/data/partsupp' TBLPROPERTIES ('avro.schema.url'='hdfs://hostname/user/hadoop/tpc-h/schema/partsupp.avsc'); -- Query select * from partsupp ps join part p on ps.ps_partkey = p.p_partkey where p.p_partkey=1; {code} {code} Error message is: Error: java.io.IOException: java.io.IOException: org.apache.avro.AvroTypeException: Found {"type": "record", "name": "partsupp", "namespace": "com.gs.sdst.pl.avro.tpch", "fields": [{"name": "ps_partkey", "type": "long"}, {"name": "ps_suppkey", "type": "long"}, {"name": "ps_availqty", "type": "long"}, {"name": "ps_supplycost", "type": "double"}, {"name": "ps_comment", "type": "string"}, {"name": "systimestamp", "type": "long"}]}, expecting {"type": "record", "name": "part", "namespace": "com.gs.sdst.pl.avro.tpch", "fields": [{"name": "p_partkey", "type": "long"}, {"name": "p_name", "type": "string"}, {"name": "p_mfgr", "type": "string"}, {"name": "p_brand", "type": "string"}, {"name": "p_type", "type": "string"}, {"name": "p_size", "type": "int"}, {"name": "p_container", "type": "string"}, {"name": "p_retailprice", "type": "double"}, {"name": "p_comment", "type": "string"}, {"name": "systimestamp", "type": "long"}]} at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197) at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10439) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia resolved HIVE-10439. --- Resolution: Duplicate Architecture for ResultSet Compression via external plugin --- Key: HIVE-10439 URL: https://issues.apache.org/jira/browse/HIVE-10439 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: TestingIntegerCompression.pdf Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch Attachments: Proposal-rscompressor.pdf, TestingIntegerCompression.pdf This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507383#comment-14507383 ] Ashutosh Chauhan commented on HIVE-10416: - I like the new patch since it projects only the needed columns while generating the Sel Op, as opposed to adding an unnecessary SelOp at the top. [~jpullokkaran] what do you think? CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10440) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia resolved HIVE-10440. --- Resolution: Duplicate Architecture for ResultSet Compression via external plugin --- Key: HIVE-10440 URL: https://issues.apache.org/jira/browse/HIVE-10440 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, an experimental results document, and a pdf explaining how to set up the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9824) LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;)
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507257#comment-14507257 ] Hive QA commented on HIVE-9824: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727207/HIVE-9824.09.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 8750 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3527/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3527/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3527/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727207 - PreCommit-HIVE-TRUNK-Build LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;) -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch in a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10417) Parallel Order By return wrong results for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-10417: - Component/s: Query Processor Parallel Order By return wrong results for partitioned tables - Key: HIVE-10417 URL: https://issues.apache.org/jira/browse/HIVE-10417 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0, 0.13.1, 1.0.0 Reporter: Nemon Lou Assignee: Nemon Lou Attachments: HIVE-10417.patch Following is the script that reproduces this bug.
{noformat}
set hive.optimize.sampling.orderby=true;
set mapreduce.job.reduces=10;
select * from src order by key desc limit 10;
+----------+------------+
| src.key  | src.value  |
+----------+------------+
| 98       | val_98     |
| 98       | val_98     |
| 97       | val_97     |
| 97       | val_97     |
| 96       | val_96     |
| 95       | val_95     |
| 95       | val_95     |
| 92       | val_92     |
| 90       | val_90     |
| 90       | val_90     |
+----------+------------+
10 rows selected (47.916 seconds)

reset;
create table src_orc_p (key string, value string) partitioned by (kp string) stored as orc tblproperties('orc.compress'='SNAPPY');
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1;
set hive.exec.max.dynamic.partitions=1;
insert into table src_orc_p partition(kp) select *,substring(key,1) from src distribute by substring(key,1);

set mapreduce.job.reduces=10;
set hive.optimize.sampling.orderby=true;
select * from src_orc_p order by key desc limit 10;
+----------------+------------------+---------------+
| src_orc_p.key  | src_orc_p.value  | src_orc_p.kp  |
+----------------+------------------+---------------+
| 0              | val_0            | 0             |
| 0              | val_0            | 0             |
| 0              | val_0            | 0             |
+----------------+------------------+---------------+
3 rows selected (39.861 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
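For context on the mechanism involved: with hive.optimize.sampling.orderby=true, a parallel order by samples the keys, derives reducer cut points from the sample, and range-partitions rows across reducers so that concatenating reducer outputs yields a total order. The sketch below is a simplified, string-keyed illustration of that idea, not Hive's implementation; a wrong result like the one above is the kind of symptom you get when the cut points do not reflect the actual key distribution, leaving most reducers' ranges empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified sketch of sample-based range partitioning, the idea behind
// hive.optimize.sampling.orderby. Not Hive's code: real Hive samples the
// input and writes cut points consumed by a total-order partitioner.
class RangePartitionSketch {
    // Derive (reducers - 1) cut points from a sorted copy of the sample.
    static List<String> cutPoints(List<String> sample, int reducers) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        for (int i = 1; i < reducers; i++) {
            cuts.add(sorted.get(i * sorted.size() / reducers));
        }
        return cuts;
    }

    // Route a key to the reducer whose range contains it; a global order
    // only emerges if every row is routed through the same cut points.
    static int partition(String key, List<String> cuts) {
        for (int i = 0; i < cuts.size(); i++) {
            if (key.compareTo(cuts.get(i)) < 0) {
                return i;
            }
        }
        return cuts.size();
    }
}
```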
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507138#comment-14507138 ] Hive QA commented on HIVE-10416: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727191/HIVE-10416.01.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3526/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3526/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3526/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727191 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. 
The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Attachment: (was: HIVE-10165.0.patch) Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Fix For: 1.2.0 h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data.
Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. Implementation Our changes do not break the existing API contracts. Instead our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows: * We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object.
Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. * The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface. * Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class {{AbstractOperationRecordWriter}}.
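The proposed contract can be sketched as follows. The interface name {{RecordMutator}} comes from the description above, but the exact signatures here are assumptions, and the toy in-memory implementation exists only to exercise the contract where the real implementation would delegate to {{OrcRecordUpdater}} inside an open transaction batch.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the proposed mutation contract; the committed API may differ.
interface RecordMutator {
    void insert(Object record);
    void update(Object record); // record carries its RecordIdentifier
    void delete(Object record); // likewise located via its RecordIdentifier
}

// Toy implementation for illustration only; a real one would write ACID
// events through OrcRecordUpdater within a transaction batch.
class InMemoryMutator implements RecordMutator {
    final List<String> ops = new ArrayList<>();
    public void insert(Object record) { ops.add("INSERT:" + record); }
    public void update(Object record) { ops.add("UPDATE:" + record); }
    public void delete(Object record) { ops.add("DELETE:" + record); }
}
```

Because the row type is {{Object}} rather than {{byte[]}}, a merge job can hand its POJO or Tuple records straight to the mutator without an intermediate encoding step, as the description argues.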
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Attachment: HIVE-10165.0.patch Updated patch. Includes tests. Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Fix For: 1.2.0 Attachments: HIVE-10165.0.patch h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data.
Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. Implementation Our changes do not break the existing API contracts. Instead our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows: * We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object.
Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. * The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface. * Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a
[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507225#comment-14507225 ] Owen O'Malley commented on HIVE-4227: - I've started working on this. I'll post a patch this week. Add column level encryption to ORC files Key: HIVE-4227 URL: https://issues.apache.org/jira/browse/HIVE-4227 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
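Since each ORC column's streams are stored separately, per-column encryption reduces to sealing one column's bytes under a key that unauthorized readers never receive. A minimal sketch of that idea with the JDK's own crypto API follows; it is illustrative only (ECB mode is used purely for brevity and is not an appropriate mode for a real file format), and key distribution would live in an external server such as the one in HADOOP-9331, as the issue notes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Illustration of column-granularity sealing: only readers holding the
// column's key can recover its bytes; other columns remain untouched.
class ColumnEncryptionSketch {
    static byte[] apply(int mode, SecretKey key, byte[] data) throws Exception {
        // ECB chosen only to keep this sketch short; a real design would
        // use an authenticated mode with per-stripe IVs.
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(mode, key);
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        SecretKey columnKey = KeyGenerator.getInstance("AES").generateKey();
        byte[] column = "val_98,val_97,val_96".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = apply(Cipher.ENCRYPT_MODE, columnKey, column);
        byte[] opened = apply(Cipher.DECRYPT_MODE, columnKey, sealed);
        System.out.println(Arrays.equals(column, opened)); // prints "true"
    }
}
```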
[jira] [Updated] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4227: Labels: (was: gsoc gsoc2013) Add column level encryption to ORC files Key: HIVE-4227 URL: https://issues.apache.org/jira/browse/HIVE-4227 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HIVE-4227: --- Assignee: Owen O'Malley Add column level encryption to ORC files Key: HIVE-4227 URL: https://issues.apache.org/jira/browse/HIVE-4227 Project: Hive Issue Type: New Feature Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Labels: gsoc, gsoc2013 It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506522#comment-14506522 ] Hive QA commented on HIVE-10391: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727019/HIVE-10391.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3523/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3523/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3523/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727019 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506493#comment-14506493 ] Thejas M Nair commented on HIVE-9957: - [~subhashmv] If you don't want to build hive , you can also use Hive 1.0.0 (unless you are looking for some specific hive 1.1.0 feature). Hive 1.1.0 not compatible with Hadoop 2.4.0 --- Key: HIVE-9957 URL: https://issues.apache.org/jira/browse/HIVE-9957 Project: Hive Issue Type: Bug Components: Encryption Reporter: Vivek Shrivastava Assignee: Sergio Peña Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9957.1.patch Getting this exception while accessing data through Hive. Exception in thread main java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider; at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.init(Hadoop23Shims.java:1152) at org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279) at org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
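The NoSuchMethodError above stems from a hard dependency on DFSClient.getKeyProvider(), which exists only in newer Hadoop releases. The usual shim-layer remedy is to probe for the method reflectively and degrade gracefully when it is absent. A generic sketch of that pattern; the classes probed here are stand-ins, not the actual Hadoop23Shims fix:

```java
import java.lang.reflect.Method;

/** Sketch of the defensive-reflection pattern a shim layer can use to avoid
 *  NoSuchMethodError against older library versions. String methods are
 *  probed here as stand-ins for the real Hadoop APIs. */
public class ShimProbeSketch {

  /** True iff clazz exposes a public method with this name and parameters. */
  public static boolean hasMethod(Class<?> clazz, String name, Class<?>... params) {
    try {
      clazz.getMethod(name, params);
      return true;
    } catch (NoSuchMethodException e) {
      return false; // older library version: caller should fall back
    }
  }

  /** Invoke the no-arg method if present, else return a fallback value. */
  public static Object invokeOrDefault(Object target, String name, Object fallback) throws Exception {
    if (!hasMethod(target.getClass(), name)) {
      return fallback;
    }
    Method m = target.getClass().getMethod(name);
    return m.invoke(target);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(hasMethod(String.class, "isEmpty"));         // true
    System.out.println(hasMethod(String.class, "getKeyProvider"));  // false
    System.out.println(invokeOrDefault("abc", "length", -1));       // 3
    System.out.println(invokeOrDefault("abc", "noSuchMethod", -1)); // -1
  }
}
```

The same probe-before-call discipline is what the related hadoop-1 build breakages in this digest (HIVE-10442/10443/10444) are about: the call must not be linked or invoked unconditionally when the target method may be missing.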
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: CompressorProtocolHS2.patch Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
[ https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10239: - Attachment: HIVE-10239.02.patch Attaching a patch that has some debug. Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, HIVE-10239.0.patch, HIVE-10239.00.patch, HIVE-10239.01.patch, HIVE-10239.02.patch, HIVE-10239.patch Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507407#comment-14507407 ] Ashutosh Chauhan commented on HIVE-10391: - +1 TODOs of patch I guess will resolve themselves once we increase test coverage : ) I guess hope is once this gets in, we can probably enable IdentityProjectRemoval optimization again on return path. Right ? CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9726) Upgrade to spark 1.3 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507419#comment-14507419 ] Sushanth Sowmyan commented on HIVE-9726: Hi, I've had a request for inclusion of this patch in the upcoming 1.2 release. Looking at trunk's pom.xml, I see that the spark.version there is 1.2. Given that spark just released 1.3, is it feasible to port this patch to trunk as well? Upgrade to spark 1.3 [Spark Branch] --- Key: HIVE-9726 URL: https://issues.apache.org/jira/browse/HIVE-9726 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-9671.1-spark.patch, HIVE-9726.1-spark.patch, hive.log.txt.gz, yarn-am-stderr.txt, yarn-am-stdout.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507458#comment-14507458 ] Pengcheng Xiong commented on HIVE-10391: [~ashutoshc], I'm not sure if we can enable IdentityProjectRemoval, because the reason we turned it off is the difference between OI and RS. I will test and let you know. CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10275) GenericUDF getTimestampValue should return Timestamp instead of Date
[ https://issues.apache.org/jira/browse/HIVE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507561#comment-14507561 ] Jason Dere commented on HIVE-10275: --- +1 GenericUDF getTimestampValue should return Timestamp instead of Date Key: HIVE-10275 URL: https://issues.apache.org/jira/browse/HIVE-10275 Project: Hive Issue Type: Bug Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10275.1.patch Currently getTimestampValue casts Timestamp to Date and returns Date. Hive Timestamp type stores time with nanosecond precision. Timestamp class has getNanos method to extract nanoseconds. Date class has getTime method which returns unix time in milliseconds. So, in order to be able to get nanoseconds from Timestamp fields, GenericUDF.getTimestampValue should return Timestamp instead of Date. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
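The precision loss described above is easy to demonstrate: once a java.sql.Timestamp is handled through a java.util.Date reference, only millisecond precision is reachable, because getNanos() is declared on Timestamp alone. A small illustration (not Hive code, just the JDK types involved):

```java
import java.sql.Timestamp;
import java.util.Date;

/** Why getTimestampValue should return Timestamp: Date cannot expose nanos. */
public class TimestampPrecisionDemo {
  public static void main(String[] args) {
    Timestamp ts = new Timestamp(0L);
    ts.setNanos(123456789); // 123 ms plus 456789 ns of sub-millisecond detail

    Date asDate = ts; // what a Date-returning API hands back to the caller
    System.out.println(asDate.getTime()); // 123 -> only the millisecond part
    System.out.println(ts.getNanos());    // 123456789 -> full nanosecond field
    // asDate.getNanos() does not compile: the method exists only on Timestamp.
  }
}
```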
[jira] [Resolved] (HIVE-10409) Webhcat tests need to be updated, to accomodate HADOOP-10193
[ https://issues.apache.org/jira/browse/HIVE-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aswathy Chellammal Sreekumar resolved HIVE-10409. - Resolution: Invalid Issue addressed by https://issues.apache.org/jira/browse/HADOOP-11859, no need to change tests Webhcat tests need to be updated, to accomodate HADOOP-10193 Key: HIVE-10409 URL: https://issues.apache.org/jira/browse/HIVE-10409 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.2.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10409.1.patch, HIVE-10409.patch Webhcat tests need to be updated to accommodate the url change brought in by HADOOP-10193. Add ?user.name=user-name for the templeton calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10437) NullPointerException on queries where map/reduce is not involved on tables with partitions
[ https://issues.apache.org/jira/browse/HIVE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507644#comment-14507644 ] Gunther Hagleitner commented on HIVE-10437: --- Seems pretty serious to break backward compat for SerDes. fyi [~ashutoshc]/[~navis] NullPointerException on queries where map/reduce is not involved on tables with partitions -- Key: HIVE-10437 URL: https://issues.apache.org/jira/browse/HIVE-10437 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Demeter Sztanko Priority: Critical Original Estimate: 0.5h Remaining Estimate: 0.5h On a table with partitions, whenever I try to do a simple query which tells hive not to execute mapreduce but just read data straight from hdfs, it raises an exception: {code} create external table jsonbug( a int, b string ) PARTITIONED BY ( `c` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'ignore.malformed.json'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/jsonbug'; ALTER TABLE jsonbug ADD PARTITION(c='1'); {code} Running a simple {code} select * from jsonbug; {code} raises the following exception: {code} FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578) at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) at org.apache.hadoop.hive.ql.exec.FetchOperator.init(FetchOperator.java:140) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) {code} It works fine if I execute a query involving a map/reduce job though. This problem occurs only when using SerDes created for Hive versions prior to 1.1.0, i.e. those which do not have the @SerDeSpec annotation specified. Most third-party SerDes, including hcat's JsonSerde, have this problem as well. It seems the changes made in HIVE-7977 introduced this bug. See org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) {code} Class<?> tableSerDe = tableDesc.getDeserializerClass(); String[] schemaProps = AnnotationUtils.getAnnotation(tableSerDe, SerDeSpec.class).schemaProps(); {code} And it also seems like a relatively easy fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
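The failure mode can be reproduced in miniature. A reflection-based annotation lookup returns null for a class that lacks the annotation, so chaining a method call onto it NPEs; a null check with a sensible default is one plausible shape of the "relatively easy fix". The annotation and lookup below are stand-ins, since AnnotationUtils and SerDeSpec are Hive classes:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/** Stand-in reproduction of the FetchOperator NPE: the annotation lookup is
 *  null for legacy SerDes, so the unguarded .schemaProps() chain blows up. */
public class SerDeSpecNpeDemo {

  @Retention(RetentionPolicy.RUNTIME)
  @interface SchemaSpec { String[] schemaProps(); } // stand-in for SerDeSpec

  @SchemaSpec(schemaProps = {"columns", "columns.types"})
  static class ModernSerDe {}

  static class LegacySerDe {} // pre-1.1.0 style: no annotation present

  /** Buggy pattern: NPEs for any class without the annotation. */
  static String[] schemaPropsUnguarded(Class<?> serDe) {
    return serDe.getAnnotation(SchemaSpec.class).schemaProps();
  }

  /** Guarded version: treat a missing annotation as "no schema props". */
  static String[] schemaPropsGuarded(Class<?> serDe) {
    SchemaSpec spec = serDe.getAnnotation(SchemaSpec.class);
    return spec == null ? new String[0] : spec.schemaProps();
  }

  public static void main(String[] args) {
    System.out.println(schemaPropsGuarded(ModernSerDe.class).length); // 2
    System.out.println(schemaPropsGuarded(LegacySerDe.class).length); // 0
    try {
      schemaPropsUnguarded(LegacySerDe.class);
    } catch (NullPointerException expected) {
      System.out.println("unguarded lookup NPEs, as reported");
    }
  }
}
```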
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10443: - Assignee: Vaibhav Gumashta HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-10403: - Attachment: HIVE-10403.03.patch Upload patch 03 for testing Add n-way join support for Hybrid Grace Hash Join - Key: HIVE-10403 URL: https://issues.apache.org/jira/browse/HIVE-10403 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-10403.01.patch, HIVE-10403.02.patch, HIVE-10403.03.patch Currently Hybrid Grace Hash Join only supports 2-way join (one big table and one small table). This task will enable n-way join (one big table and multiple small tables). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10441) Fix confusing log statement in SessionState about hive.execution.engine setting
[ https://issues.apache.org/jira/browse/HIVE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10441: -- Attachment: HIVE-10441.1.patch I don't see a lot of value in having this statement here to mention that no Tez Session is necessary, because that is redundant for mr/spark. Also, if a Tez session is created, there are log statements elsewhere for that. I'm just going to remove this log statement. Fix confusing log statement in SessionState about hive.execution.engine setting --- Key: HIVE-10441 URL: https://issues.apache.org/jira/browse/HIVE-10441 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10441.1.patch {code} LOG.info("No Tez session required at this point. hive.execution.engine=mr."); {code} This statement is misleading. It is true that it is printed in the case that a Tez session does not need to be created, but it is not necessarily true that hive.execution.engine=mr - it could be Spark, or it could even be set to Tez but the Session determined that a Tez Session did not need to be created (which is the case for HiveServer2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10444) HIVE-10223 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10444: - Assignee: Gunther Hagleitner HIVE-10223 breaks hadoop-1 build Key: HIVE-10444 URL: https://issues.apache.org/jira/browse/HIVE-10444 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Gunther Hagleitner FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 are not present in hadoop 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507684#comment-14507684 ] Brock Noland commented on HIVE-10347: - +1 Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9726) Upgrade to spark 1.3 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507578#comment-14507578 ] Sushanth Sowmyan commented on HIVE-9726: +cc [~xuefuz] : Same question as above. :) Upgrade to spark 1.3 [Spark Branch] --- Key: HIVE-9726 URL: https://issues.apache.org/jira/browse/HIVE-9726 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-9671.1-spark.patch, HIVE-9726.1-spark.patch, hive.log.txt.gz, yarn-am-stderr.txt, yarn-am-stdout.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10385) Optionally disable partition creation to speedup ETL jobs
[ https://issues.apache.org/jira/browse/HIVE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507584#comment-14507584 ] Slava Markeyev commented on HIVE-10385: --- This came up on the mailing list the other week. The use case that several people seem to have is using Hive to ETL and partition data. We don't necessarily care about the metastore partitions because the output data gets moved (at least in my case) after the query is completed. This makes the table partitions unnecessary. Optionally disable partition creation to speedup ETL jobs - Key: HIVE-10385 URL: https://issues.apache.org/jira/browse/HIVE-10385 Project: Hive Issue Type: Improvement Components: Hive Reporter: Slava Markeyev Priority: Minor Attachments: HIVE-10385.patch ETL jobs that create dynamic partitions with high cardinality perform the expensive step of metastore partition creation after query completion. Until bulk partition creation can be optimized there should be a way of optionally skipping this step. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507594#comment-14507594 ] Pengcheng Xiong commented on HIVE-10391: [~ashutoshc], as I just tested, cbo_union.q will still fail if we turn IdentityProjectRemoval on with this patch. CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10442) HIVE-10098 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507639#comment-14507639 ] Prasanth Jayachandran commented on HIVE-10442: -- [~csun]/[~ychena] Can someone take a look at this one? HIVE-10098 broke hadoop-1 build --- Key: HIVE-10442 URL: https://issues.apache.org/jira/browse/HIVE-10442 Project: Hive Issue Type: Bug Reporter: Prasanth Jayachandran fs.addDelegationTokens() method does not seem to exist in hadoop 1.2.1. This breaks the hadoop-1 builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10437) NullPointerException on queries where map/reduce is not involved on tables with partitions
[ https://issues.apache.org/jira/browse/HIVE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10437: -- Priority: Critical (was: Minor) NullPointerException on queries where map/reduce is not involved on tables with partitions -- Key: HIVE-10437 URL: https://issues.apache.org/jira/browse/HIVE-10437 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Demeter Sztanko Priority: Critical Original Estimate: 0.5h Remaining Estimate: 0.5h On a table with partitions, whenever I try to do a simple query which tells hive not to execute mapreduce but just read data straight from hdfs, it raises an exception: {code} create external table jsonbug( a int, b string ) PARTITIONED BY ( `c` string) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'ignore.malformed.json'='true') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/tmp/jsonbug'; ALTER TABLE jsonbug ADD PARTITION(c='1'); {code} Running a simple {code} select * from jsonbug; {code} raises the following exception: {code} FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:578) at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) at org.apache.hadoop.hive.ql.exec.FetchOperator.init(FetchOperator.java:140) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) {code} It works fine if I execute a query involving a map/reduce job though. This problem occurs only when using SerDes created for Hive versions prior to 1.1.0, i.e. those which do not have the @SerDeSpec annotation specified. Most third-party SerDes, including hcat's JsonSerde, have this problem as well. It seems the changes made in HIVE-7977 introduced this bug. See org.apache.hadoop.hive.ql.exec.FetchOperator.needConversion(FetchOperator.java:607) {code} Class<?> tableSerDe = tableDesc.getDeserializerClass(); String[] schemaProps = AnnotationUtils.getAnnotation(tableSerDe, SerDeSpec.class).schemaProps(); {code} And it also seems like a relatively easy fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507649#comment-14507649 ] Prasanth Jayachandran commented on HIVE-10443: -- [~vgumashta] fyi.. HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10442) HIVE-10098 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10442: - Affects Version/s: 1.2.0 HIVE-10098 broke hadoop-1 build --- Key: HIVE-10442 URL: https://issues.apache.org/jira/browse/HIVE-10442 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran fs.addDelegationTokens() method does not seem to exist in hadoop 1.2.1. This breaks the hadoop-1 builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9726) Upgrade to spark 1.3 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507661#comment-14507661 ] Sushanth Sowmyan commented on HIVE-9726: Awesome, thanks - I'll add that jira into the list then. Upgrade to spark 1.3 [Spark Branch] --- Key: HIVE-9726 URL: https://issues.apache.org/jira/browse/HIVE-9726 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-9671.1-spark.patch, HIVE-9726.1-spark.patch, hive.log.txt.gz, yarn-am-stderr.txt, yarn-am-stdout.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10444) HIVE-10223 breaks hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507674#comment-14507674 ] Prasanth Jayachandran commented on HIVE-10444: -- [~hagleitn] FYI... HIVE-10223 breaks hadoop-1 build Key: HIVE-10444 URL: https://issues.apache.org/jira/browse/HIVE-10444 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Gunther Hagleitner FileStatus.isFile() and FileStatus.isDirectory() methods added in HIVE-10223 are not present in hadoop 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
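One conventional workaround for the missing FileStatus.isFile()/isDirectory() accessors is to probe for the new method and fall back to the deprecated isDir(). A hypothetical sketch (OldStatus is a stand-in stub for demonstration, not a Hadoop class):

```java
// Hedged compatibility shim: prefer isFile() when present, otherwise derive
// the answer from the older isDir() accessor available on hadoop-1.
class StatusCompat {
    static boolean isFile(Object status) {
        try {
            return (Boolean) status.getClass().getMethod("isFile").invoke(status);
        } catch (NoSuchMethodException e) {
            try {
                // hadoop-1 path: a regular file is anything that is not a directory
                return !(Boolean) status.getClass().getMethod("isDir").invoke(status);
            } catch (ReflectiveOperationException e2) {
                throw new IllegalStateException(e2);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Minimal stub mimicking the old API shape, for demonstration only.
    static class OldStatus {
        private final boolean dir;
        OldStatus(boolean dir) { this.dir = dir; }
        public boolean isDir() { return dir; }
    }
}
```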
[jira] [Commented] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
[ https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507723#comment-14507723 ] Hive QA commented on HIVE-10239: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727282/HIVE-10239.02.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3528/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3528/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727282 - PreCommit-HIVE-TRUNK-Build Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, HIVE-10239.0.patch, HIVE-10239.00.patch, HIVE-10239.01.patch, HIVE-10239.02.patch, HIVE-10239.patch Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10227) Concrete implementation of Export/Import based ReplicationTaskFactory
[ https://issues.apache.org/jira/browse/HIVE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507738#comment-14507738 ] Alan Gates commented on HIVE-10227: --- +1 to committing this since we're agreed on 98% of it. I'm open to where you're going with this on the InvalidStateFactory, we'll continue the discussion on the other JIRA. Concrete implementation of Export/Import based ReplicationTaskFactory - Key: HIVE-10227 URL: https://issues.apache.org/jira/browse/HIVE-10227 Project: Hive Issue Type: Sub-task Components: Import/Export Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-10227.2.patch, HIVE-10227.3.patch, HIVE-10227.4.patch, HIVE-10227.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507739#comment-14507739 ] Laljo John Pullokkaran commented on HIVE-10416: --- [~jcamachorodriguez] Introducing top level select needs to traverse recursively as long as nodes are sortrel and !ProjectRel. Practically this may happen only in very few cases (may be OB followed by limit). regardless its better to traverse it down till you hit a non sort rel. CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
[ https://issues.apache.org/jira/browse/HIVE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507765#comment-14507765 ] Alan Gates commented on HIVE-10426: --- One thing that I misunderstood before that I want to make sure I have right now: ReplicationTask will be called in the context of the client. Part of my concern over the repeating error was that I thought this was being called in the context of the metastore server. In the client the repeated logs are less of a concern. I think this is a better approach with the invalid factory returning an error early and refusing to allow instantiation of replication tasks. +1 Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Components: Import/Export Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-10426.patch Creating a new jira to continue discussions from HIVE-10227 as to what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10441) Fix confusing log statement in SessionState about hive.execution.engine setting
[ https://issues.apache.org/jira/browse/HIVE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507711#comment-14507711 ] Gunther Hagleitner commented on HIVE-10441: --- +1 Fix confusing log statement in SessionState about hive.execution.engine setting --- Key: HIVE-10441 URL: https://issues.apache.org/jira/browse/HIVE-10441 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10441.1.patch {code} LOG.info("No Tez session required at this point. hive.execution.engine=mr."); {code} This statement is misleading. It is true that it is printed in the case that a Tez session does not need to be created, but it is not necessarily true that hive.execution.engine=mr - it could be Spark, or it could even be set to Tez but the Session determined that a Tez Session did not need to be created (which is the case for HiveServer2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
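A less misleading message would report whatever engine is actually configured rather than hard-coding mr. A hypothetical sketch of such a message builder (not the actual HIVE-10441 patch):

```java
// Hedged sketch: include the real hive.execution.engine value in the log line
// instead of asserting it is "mr".
class SessionLog {
    static String noTezSessionMessage(String engine) {
        return "No Tez session required at this point. hive.execution.engine=" + engine + ".";
    }
}
```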
[jira] [Updated] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-4625: Attachment: HIVE-4625.5.patch HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
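The suggested shape of the fix is a capability check before the call, so the embedded-metastore case never reaches the code path that throws and logs. An illustrative sketch (the flags and return value are stand-ins, not HiveServer2's real API):

```java
// Hedged sketch: only ask the metastore for a delegation token when the
// current configuration actually supports it (remote metastore + kerberos).
class TokenGuard {
    static String getTokenIfSupported(boolean remoteMetastore, boolean kerberos) {
        if (remoteMetastore && kerberos) {
            return "token"; // a real client would call the metastore here
        }
        // Embedded metastore: nothing to fetch, and nothing spurious is logged.
        return null;
    }
}
```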
[jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
[ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508162#comment-14508162 ] Richard Williams commented on HIVE-10410: - [~vgumashta] I can confirm that this issue only seems to manifest with a remote metastore. [~ekoifman] That's what I suspect as well. I was taking a look at the code that implements asynchronous execution of submitted statements in org.apache.hive.service.cli.operation.SQLOperation, and I noticed this suspicious-looking bit of code in the runInternal method:
{noformat}
// ThreadLocal Hive object needs to be set in background thread.
// The metastore client in Hive is associated with right user.
final Hive parentHive = getSessionHive();
// Current UGI will get used by metastore when metastore is in embedded mode
// So this needs to get passed to the new background thread
final UserGroupInformation currentUGI = getCurrentUGI(opConfig);
// Runnable impl to call runInternal asynchronously,
// from a different thread
Runnable backgroundOperation = new Runnable() {
  @Override
  public void run() {
    PrivilegedExceptionAction<Object> doAsAction = new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws HiveSQLException {
        Hive.set(parentHive);
        SessionState.setCurrentSessionState(parentSessionState);
        // Set current OperationLog in this async thread for keeping on saving query log.
        registerCurrentOperationLog();
        try {
          runQuery(opConfig);
        } catch (HiveSQLException e) {
          setOperationException(e);
          LOG.error("Error running hive query: ", e);
        } finally {
          unregisterOperationLog();
        }
        return null;
      }
    };
{noformat}
Correct me if I'm wrong, but it seems to me that passing the parent thread's ThreadLocal Hive object to Hive.set in the children will effectively thwart the usage of ThreadLocal, resulting in the children and the parent all sharing the same Hive object.
There are a number of paths in which calls to one of the Hive.get methods result in the current ThreadLocal Hive object being removed from the ThreadLocal map and replaced with a new Hive instance; however, I don't see anything that guarantees that that always happens on the first call to Hive.get in the child threads. Apparent race condition in HiveServer2 causing intermittent query failures -- Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.4 Reporter: Richard Williams On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as "Read a negative frame size (-2147418110)!" or "out of sequence response" in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent hive queries (we have been using "show databases"), two or three of which typically fail. However, when either of these conditions do not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs: {noformat} 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)!
org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)! at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at
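The sharing hazard Richard describes can be reproduced without any Hive classes: if a child thread calls set() with the parent's instance, the ThreadLocal no longer isolates anything. A small self-contained demonstration:

```java
// Demonstrates that ThreadLocal.set(parentInstance) in a child thread makes
// parent and child share one object, mirroring the Hive.set(parentHive) call
// quoted in the comment above.
class ThreadLocalShare {
    static final ThreadLocal<StringBuilder> TL =
            ThreadLocal.withInitial(StringBuilder::new);

    static boolean childSharesParentInstance() {
        final StringBuilder parent = TL.get(); // this thread's "private" instance
        final boolean[] shared = new boolean[1];
        Thread child = new Thread(() -> {
            TL.set(parent);                    // same move as Hive.set(parentHive)
            shared[0] = (TL.get() == parent);  // identical object, not a copy
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared[0];
    }
}
```

Once the instance is shared, any non-thread-safe state inside it is subject to exactly the kind of intermittent races reported in this issue.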
[jira] [Commented] (HIVE-9824) LLAP: Native Vectorization of Map Join
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508178#comment-14508178 ] Lefty Leverenz commented on HIVE-9824: -- Doc note: This adds 5 configuration parameters (and changes indentation of the descriptions for 2 others). * hive.vectorized.execution.mapjoin.native.enabled * hive.vectorized.execution.mapjoin.native.multikey.only.enabled * hive.vectorized.execution.mapjoin.minmax.enabled * hive.vectorized.execution.mapjoin.overflow.repeated.threshold * hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled The new parameters need to be documented in the Vectorization section of Configuration Properties: * [Configuration Properties -- Vectorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization] Is any other documentation needed? LLAP: Native Vectorization of Map Join -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch in a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10451) IdentityProjectRemover removed useful project
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10451: Attachment: HIVE-10451.patch IdentityProjectRemover removed useful project - Key: HIVE-10451 URL: https://issues.apache.org/jira/browse/HIVE-10451 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10451.patch In this particular case Select on top of PTF Op is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10451) IdentityProjectRemover removed useful project
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10451: Reporter: Gopal V (was: Ashutosh Chauhan) IdentityProjectRemover removed useful project - Key: HIVE-10451 URL: https://issues.apache.org/jira/browse/HIVE-10451 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-10451.patch In this particular case Select on top of PTF Op is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10452) Followup fix for HIVE-10202 to restrict it it for script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10452: - Attachment: HIVE-10452.patch Attached in a patch to resolve this issue. Followup fix for HIVE-10202 to restrict it it for script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-10452.patch The fix made in HIVE-10202 needs to be limited to when beeline is running in a script mode aka -f option. Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10452) Followup fix for HIVE-10202 to restrict it it for script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10452: - Description: The fix made in HIVE-10202 needs to be limited to when beeline is running in a script mode aka -f option. Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. Followup fix for HIVE-10202 to restrict it it for script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor The fix made in HIVE-10202 needs to be limited to when beeline is running in a script mode aka -f option. Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10453) HS2 leaking open file descriptors when using UDFs
[ https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned HIVE-10453: --- Assignee: Yongzhi Chen HS2 leaking open file descriptors when using UDFs - Key: HIVE-10453 URL: https://issues.apache.org/jira/browse/HIVE-10453 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen 1. create a custom function by CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar'; 2. Create a simple jdbc client, just do connect, run simple query which using the function such as: select myfunc(col1) from sometable 3. Disconnect. Check open file for HiveServer2 by: lsof -p HSProcID | grep myudf.jar You will see the leak as: {noformat} java 28718 ychen txt REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar java 28718 ychen 330r REG1,4741 212977666 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
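A common cause of this kind of leak is a URLClassLoader opened over the UDF jar but never closed when the session ends; since Java 7 the loader is Closeable and releases its jar handles on close(). A hedged sketch of the cleanup pattern (not the actual HIVE-10453 fix):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hedged sketch: scope the session's classloader with try-with-resources so
// its open jar file descriptors are released when the session goes away.
class LoaderCleanup {
    static boolean loadAndClose(URL[] jars) {
        try (URLClassLoader loader = new URLClassLoader(jars)) {
            // ... resolve UDF classes through 'loader' while the session lives ...
            return true; // close() runs on exit, releasing the jar descriptors
        } catch (IOException e) {
            return false;
        }
    }
}
```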
[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10233: -- Attachment: HIVE-10233-WIP-5.patch Fix runtime issues. Hive on LLAP: Memory manager Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Attachment: HIVE-10456.1.patch Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
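The control flow being proposed can be sketched abstractly: on a normal close the spilled partitions are reloaded to finish the join, while an aborted close should only discard them. An illustrative outline (names are stand-ins for the operator's real methods):

```java
// Hedged sketch of closeOp(abort): reload spills only on a normal close;
// on abort, clean them up without doing any further join work.
class SpillCloseSketch {
    static String closeOp(boolean abort, int spilledPartitions) {
        if (abort) {
            // discard spilled data; the query is being torn down anyway
            return "cleaned " + spilledPartitions + " spilled partitions";
        }
        // normal path: reload each spilled partition and complete the join
        return "joined " + spilledPartitions + " spilled partitions";
    }
}
```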
[jira] [Commented] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508292#comment-14508292 ] Prasanth Jayachandran commented on HIVE-10456: -- [~hagleitn]/[~wzheng]/[~mmokhtar] Can someone review this patch? Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508326#comment-14508326 ] Hive QA commented on HIVE-10347: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727366/HIVE-10347.6.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 8760 tests executed *Failed tests:* {noformat} TestCompareCliDriver - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3533/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3533/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3533/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727366 - PreCommit-HIVE-TRUNK-Build Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-10438: -- Assignee: Rohit Dholakia Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.1.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2resultSetcompressor.zip This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10451) IdentityProjectRemover removed useful project
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508404#comment-14508404 ] Hive QA commented on HIVE-10451: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727484/HIVE-10451.patch {color:red}ERROR:{color} -1 due to 131 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input39
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merge_multi_expressions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadataonly1
[jira] [Commented] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
[ https://issues.apache.org/jira/browse/HIVE-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508187#comment-14508187 ] Prasanth Jayachandran commented on HIVE-10429: -- [~hagleitn] can you take a look at this patch? LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10429.1.patch Executors in LLAP can be interrupted by the user (kill) or by system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
[ https://issues.apache.org/jira/browse/HIVE-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10429: - Attachment: HIVE-10429.1.patch LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10429.1.patch Executors in LLAP can be interrupted by the user (kill) or by system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10429) LLAP: Abort hive tez processor on interrupts
[ https://issues.apache.org/jira/browse/HIVE-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508230#comment-14508230 ] Gunther Hagleitner commented on HIVE-10429: --- Comments on rb LLAP: Abort hive tez processor on interrupts Key: HIVE-10429 URL: https://issues.apache.org/jira/browse/HIVE-10429 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10429.1.patch Executors in LLAP can be interrupted by the user (kill) or by system (pre-emption). The task interruption should be propagated all the way down to the operator pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
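The interrupt propagation described above can be sketched as a loop that polls the thread's interrupt flag between units of work and aborts early. This is an illustrative stand-in, not Hive's actual operator code; the class and method names here are invented:

```java
// Hypothetical sketch (not the actual Hive operator pipeline): a processing
// loop that checks the thread's interrupt flag between rows and aborts,
// which is the behavior the patch propagates down to the operators.
public class InterruptAwareLoop {

    // Returns true if processing aborted because the thread was interrupted.
    public static boolean processRows(int rows) {
        for (int i = 0; i < rows; i++) {
            if (Thread.currentThread().isInterrupted()) {
                return true; // abort: stop work so cleanup can run promptly
            }
            // ... per-row work would go here ...
        }
        return false; // ran to completion
    }

    public static void main(String[] args) {
        // Simulate a kill/pre-emption: set the interrupt flag, then enter the loop.
        Thread.currentThread().interrupt();
        System.out.println("aborted=" + processRows(1_000_000));
    }
}
```

Note that `isInterrupted()` does not clear the flag, so every level of a pipeline can observe the same interrupt and unwind.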
[jira] [Updated] (HIVE-10323) Tez merge join operator does not honor hive.join.emit.interval
[ https://issues.apache.org/jira/browse/HIVE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10323: -- Attachment: HIVE-10323.2.patch Tez merge join operator does not honor hive.join.emit.interval -- Key: HIVE-10323 URL: https://issues.apache.org/jira/browse/HIVE-10323 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10323.1.patch, HIVE-10323.2.patch This affects efficiency in case of skews. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10324) Hive metatool should take table_param_key to allow for changes to avro serde's schema url key
[ https://issues.apache.org/jira/browse/HIVE-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508270#comment-14508270 ] Ferdinand Xu commented on HIVE-10324: - Awesome!! You can add the following information to the use example section. Thanks [~leftylev] {noformat} ./hive --service metatool -updateLocation hdfs://localhost:9000 hdfs://namenode2:8020 -tablePropKey avro.schema.url -serdePropKey avro.schema.url Initializing HiveMetaTool.. 15/04/22 14:18:42 INFO metastore.ObjectStore: ObjectStore, initialize called 15/04/22 14:18:42 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 15/04/22 14:18:42 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored 15/04/22 14:18:43 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order 15/04/22 14:18:43 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 15/04/22 14:18:43 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 15/04/22 14:18:44 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MFieldSchema is tagged as embedded-only so does not have its own datastore table. 15/04/22 14:18:44 INFO DataNucleus.Datastore: The class org.apache.hadoop.hive.metastore.model.MOrder is tagged as embedded-only so does not have its own datastore table. 
15/04/22 14:18:44 INFO DataNucleus.Query: Reading in results for query org.datanucleus.store.rdbms.query.SQLQuery@0 since the connection used is closing 15/04/22 14:18:44 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL 15/04/22 14:18:44 INFO metastore.ObjectStore: Initialized ObjectStore Looking for LOCATION_URI field in DBS table to update.. Successfully updated the following locations.. Updated 0 records in DBS table Looking for LOCATION field in SDS table to update.. Successfully updated the following locations.. Updated 0 records in SDS table Looking for value of avro.schema.url key in TABLE_PARAMS table to update.. Successfully updated the following locations.. Updated 0 records in TABLE_PARAMS table Looking for value of avro.schema.url key in SD_PARAMS table to update.. Successfully updated the following locations.. Updated 0 records in SD_PARAMS table Looking for value of avro.schema.url key in SERDE_PARAMS table to update.. Successfully updated the following locations.. Updated 0 records in SERDE_PARAMS table {noformat} Hive metatool should take table_param_key to allow for changes to avro serde's schema url key - Key: HIVE-10324 URL: https://issues.apache.org/jira/browse/HIVE-10324 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Ferdinand Xu Fix For: 1.2.0 Attachments: HIVE-10324.1.patch, HIVE-10324.patch, HIVE-10324.patch.WIP HIVE-3443 added support to change the serdeParams from 'metatool updateLocation' command. 
However, in avro it is possible to specify the schema via the tableParams: {noformat} CREATE TABLE `testavro`( `test` string COMMENT 'from deserializer') ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://namenode:8020/tmp/test.avsc', 'kite.compression.type'='snappy', 'transient_lastDdlTime'='1427996456') {noformat} Hence for those tables the 'metatool updateLocation' will not help. This is necessary in case like upgrade the namenode to HA where the absolute paths have changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10459) Add materialized views to Hive
[ https://issues.apache.org/jira/browse/HIVE-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-10459: -- Attachment: HIVE-10459.patch This patch is a start at implementing simple materialized views. It doesn't have enough testing yet (e.g. there's no negative testing), and I know it fails in the partitioned case. I suspect things like security and locking don't work properly yet either. But I'm posting it as a starting point. In this initial patch I'm just handling simple materialized views with manual rebuilds. In later JIRAs we can add features such as allowing the optimizer to rewrite queries to use materialized views rather than the tables named in the queries, giving the optimizer the ability to determine when a materialized view is stale, etc. Also, I didn't rebase this patch against trunk after the migration from svn to git, so it may not apply cleanly. Add materialized views to Hive -- Key: HIVE-10459 URL: https://issues.apache.org/jira/browse/HIVE-10459 Project: Hive Issue Type: Improvement Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-10459.patch Materialized views are useful as ways to store either alternate versions of data (e.g. same data, different sort order) or derivatives of data sets (e.g. commonly used aggregates). It is useful to store these as materialized views rather than as tables because it gives the optimizer the ability to understand how data sets are related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
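A minimal sketch of the intended usage, assuming the grammar follows the common CREATE MATERIALIZED VIEW form; the exact syntax accepted by this initial patch may differ, and the table and view names here are invented:

{code}
CREATE MATERIALIZED VIEW mv_daily_totals AS
SELECT ds, SUM(amount) AS total
FROM sales
GROUP BY ds;

-- manual rebuild only in this first cut; optimizer rewrites and
-- staleness detection are left to later JIRAs
ALTER MATERIALIZED VIEW mv_daily_totals REBUILD;
{code}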
[jira] [Updated] (HIVE-10413) [CBO] Return path assumes distinct column cant be same as grouping column
[ https://issues.apache.org/jira/browse/HIVE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10413: -- Attachment: HIVE-10413.2.patch [CBO] Return path assumes distinct column cant be same as grouping column - Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Laljo John Pullokkaran Attachments: HIVE-10413.1.patch, HIVE-10413.2.patch, HIVE-10413.patch Found in cbo_udf_udaf.q tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10410) Apparent race condition in HiveServer2 causing intermittent query failures
[ https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508195#comment-14508195 ] Eugene Koifman commented on HIVE-10410: --- I think you are right, ThreadLocal in this case doesn't prevent multiple threads sharing a connection. Apparent race condition in HiveServer2 causing intermittent query failures -- Key: HIVE-10410 URL: https://issues.apache.org/jira/browse/HIVE-10410 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Environment: CDH 5.3.3 CentOS 6.4 Reporter: Richard Williams On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC occasionally trigger odd Thrift exceptions with messages such as Read a negative frame size (-2147418110)! or out of sequence response in HiveServer2's connections to the metastore. For certain metastore calls (for example, showDatabases), these Thrift exceptions are converted to MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient from retrying these calls and thus causes the failure to bubble out to the JDBC client. Note that as far as we can tell, this issue appears to only affect queries that are submitted with the runAsync flag on TExecuteStatementReq set to true (which, in practice, seems to mean all JDBC queries), and it appears to only manifest when HiveServer2 is using the new HTTP transport mechanism. When both these conditions hold, we are able to fairly reliably reproduce the issue by spawning about 100 simple, concurrent hive queries (we have been using show databases), two or three of which typically fail. However, when either of these conditions do not hold, we are no longer able to reproduce the issue. Some example stack traces from the HiveServer2 logs: {noformat} 2015-04-16 13:54:55,486 ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException Read a negative frame size (-2147418110)! 
org.apache.thrift.transport.TTransportException: Read a negative frame size (-2147418110)! at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837) at org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90) at com.sun.proxy.$Proxy6.getDatabases(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139) at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at 
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145) at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69) at
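The ThreadLocal observation above can be illustrated with a small standalone example (names here are invented, not HiveServer2's code): a ThreadLocal gives each thread its own reference, but if every initializer hands back the same underlying object, the threads still share it. That is why wrapping a metastore connection in a ThreadLocal does not stop two request threads from interleaving traffic on one Thrift transport.

```java
// Hypothetical illustration: "thread-local" handles to a single shared object.
public class ThreadLocalSharing {
    // Stands in for a single shared Thrift transport/connection.
    static final Object sharedConnection = new Object();

    // Per-thread handle -- but the *same* object behind every handle.
    static final ThreadLocal<Object> perThread =
            ThreadLocal.withInitial(() -> sharedConnection);

    // Returns true when two distinct threads observe the identical object.
    public static boolean sameUnderlying() throws InterruptedException {
        final Object[] seen = new Object[2];
        Thread t1 = new Thread(() -> seen[0] = perThread.get());
        Thread t2 = new Thread(() -> seen[1] = perThread.get());
        t1.start(); t2.start();
        t1.join(); t2.join();
        return seen[0] != null && seen[0] == seen[1];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared=" + sameUnderlying());
    }
}
```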
[jira] [Assigned] (HIVE-10413) [CBO] Return path assumes distinct column cant be same as grouping column
[ https://issues.apache.org/jira/browse/HIVE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-10413: - Assignee: Laljo John Pullokkaran (was: Ashutosh Chauhan) [CBO] Return path assumes distinct column cant be same as grouping column - Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Laljo John Pullokkaran Attachments: HIVE-10413.1.patch, HIVE-10413.patch Found in cbo_udf_udaf.q tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10424) LLAP: Factor known capacity into scheduling decisions
[ https://issues.apache.org/jira/browse/HIVE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10424: -- Attachment: HIVE-10424.1.txt Patch to factor in running-queue plus wait-queue capacity per node. It also moves all scheduling onto a single thread: requests go onto a queue and are taken off whenever a node becomes available or has capacity. It can run with the old 'unlimited' capacity by setting llap.task.scheduler.num.schedulable.tasks.per.node to -1. LLAP: Factor known capacity into scheduling decisions - Key: HIVE-10424 URL: https://issues.apache.org/jira/browse/HIVE-10424 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10424.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
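The per-node admission check described above can be sketched as follows; the method name is invented, not the patch's actual API, and only the config key `llap.task.scheduler.num.schedulable.tasks.per.node` comes from the description:

```java
// Hypothetical sketch of the capacity decision: a node can accept another
// task while running + queued stays under the configured per-node limit;
// -1 preserves the old "unlimited" behavior.
public class NodeCapacity {
    public static boolean canSchedule(int running, int queued, int limitPerNode) {
        if (limitPerNode == -1) {
            return true; // llap.task.scheduler.num.schedulable.tasks.per.node = -1
        }
        return running + queued < limitPerNode;
    }

    public static void main(String[] args) {
        System.out.println(canSchedule(3, 1, 4));    // false: node is full
        System.out.println(canSchedule(3, 0, 4));    // true: one slot free
        System.out.println(canSchedule(50, 50, -1)); // true: unlimited mode
    }
}
```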
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Affects Version/s: (was: llap) 1.2.0 Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
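The closeOp() contract described above can be sketched as follows; this is an illustrative stand-in, not Hive's actual operator API, and the field names are invented:

```java
// Hypothetical sketch: on a normal close the spilled partitions are
// reloaded so the join can complete; on abort they are discarded instead.
public class GraceHashJoinClose {
    boolean reloadedSpill = false;
    boolean cleanedSpill = false;

    void closeOp(boolean abort) {
        if (abort) {
            cleanedSpill = true;   // drop spilled data, skip the join work
        } else {
            reloadedSpill = true;  // load spilled partitions, finish the join
        }
    }

    public static void main(String[] args) {
        GraceHashJoinClose op = new GraceHashJoinClose();
        op.closeOp(true); // abort path
        System.out.println("cleaned=" + op.cleanedSpill
                + " reloaded=" + op.reloadedSpill);
    }
}
```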
[jira] [Updated] (HIVE-10457) Merge trunk to spark (4/22/15) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10457: - Summary: Merge trunk to spark (4/22/15) [Spark Branch] (was: Merge trunk to spark (4/22/15)) Merge trunk to spark (4/22/15) [Spark Branch] - Key: HIVE-10457 URL: https://issues.apache.org/jira/browse/HIVE-10457 Project: Hive Issue Type: Bug Reporter: Szehon Ho -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9674) *DropPartitionEvent should handle partition-sets.
[ https://issues.apache.org/jira/browse/HIVE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508407#comment-14508407 ] Hive QA commented on HIVE-9674: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12726535/HIVE-9674.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3535/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3535/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3535/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3535/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d 
apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/results/clientpositive/windowing_navfn.q.out' Reverted 'ql/src/test/queries/clientpositive/windowing_navfn.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/thirdparty itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target itests/util/target itests/qtest-spark/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen spark-client/target contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1675534. At revision 1675534. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12726535 - PreCommit-HIVE-TRUNK-Build *DropPartitionEvent should handle partition-sets. - Key: HIVE-9674 URL: https://issues.apache.org/jira/browse/HIVE-9674 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9674.2.patch, HIVE-9674.3.patch, HIVE-9674.4.patch Dropping a set of N partitions from a table currently results in N DropPartitionEvents (and N PreDropPartitionEvents) being fired serially. This is wasteful, especially so for large N. It also makes it impossible to even try to run authorization-checks on all partitions in a batch. Taking the cue from HIVE-9609, we should compose an {{IterablePartition}} in the event, and expose them via an
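The batching idea in HIVE-9674 can be sketched as a single event carrying an iterable of all dropped partitions instead of N separate events; the class and method names below are invented for illustration, not Hive's API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one drop event for the whole partition-set, so
// listeners and authorization checks can walk the batch in a single call.
public class BatchedDropPartitionEvent {
    private final List<String> partitions;

    public BatchedDropPartitionEvent(List<String> partitions) {
        this.partitions = partitions;
    }

    // Listeners iterate once over the batch instead of being invoked N times.
    public Iterable<String> getPartitions() {
        return partitions;
    }

    public static void main(String[] args) {
        BatchedDropPartitionEvent event = new BatchedDropPartitionEvent(
                Arrays.asList("ds=2015-04-20", "ds=2015-04-21", "ds=2015-04-22"));
        int count = 0;
        for (String p : event.getPartitions()) count++;
        System.out.println("one event, " + count + " partitions");
    }
}
```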
[jira] [Commented] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
[ https://issues.apache.org/jira/browse/HIVE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508409#comment-14508409 ] Hive QA commented on HIVE-10426: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12726987/HIVE-10426.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3536/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3536/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3536/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3536/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d 
apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1675535. At revision 1675535. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12726987 - PreCommit-HIVE-TRUNK-Build Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Components: Import/Export Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-10426.patch Creating a new jira to continue discussions from HIVE-10227 as to what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-5672: Attachment: HIVE-5672.5.patch.tar.gz Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10452) Followup fix for HIVE-10202 to restrict it to script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10452: - Description: The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. For example: {code} beeline -u jdbc:hive2://localhost:1 --silent=true {code} It appears to hang, but in reality it doesn't display any prompt. The workaround is to not use the --silent=true option with non-interactive mode. was: The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. Followup fix for HIVE-10202 to restrict it to script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10452.patch The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in. For example: {code} beeline -u jdbc:hive2://localhost:1 --silent=true {code} It appears to hang, but in reality it doesn't display any prompt. The workaround is to not use the --silent=true option with non-interactive mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10454: Description: The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. was: The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-10454: --- Assignee: Aihua Xu Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10424) LLAP: Factor known capacity into scheduling decisions
[ https://issues.apache.org/jira/browse/HIVE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved HIVE-10424. --- Resolution: Fixed LLAP: Factor known capacity into scheduling decisions - Key: HIVE-10424 URL: https://issues.apache.org/jira/browse/HIVE-10424 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10424.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-5672: Attachment: HIVE-5672.4.patch Rebase the patch. The .out file will be added later. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch https://issues.apache.org/jira/browse/HIVE-3682 is great but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10403) Add n-way join support for Hybrid Grace Hash Join
[ https://issues.apache.org/jira/browse/HIVE-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507863#comment-14507863 ] Hive QA commented on HIVE-10403: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727302/HIVE-10403.03.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8729 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3529/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3529/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3529/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727302 - PreCommit-HIVE-TRUNK-Build Add n-way join support for Hybrid Grace Hash Join - Key: HIVE-10403 URL: https://issues.apache.org/jira/browse/HIVE-10403 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Wei Zheng Attachments: HIVE-10403.01.patch, HIVE-10403.02.patch, HIVE-10403.03.patch Currently Hybrid Grace Hash Join only supports 2-way join (one big table and one small table). This task will enable n-way join (one big table and multiple small tables). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10328) Enable new return path for cbo
[ https://issues.apache.org/jira/browse/HIVE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10328: Attachment: HIVE-10328.2.patch Enable new return path for cbo -- Key: HIVE-10328 URL: https://issues.apache.org/jira/browse/HIVE-10328 Project: Hive Issue Type: Task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10328.1.patch, HIVE-10328.2.patch, HIVE-10328.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10441) Fix confusing log statement in SessionState about hive.execution.engine setting
[ https://issues.apache.org/jira/browse/HIVE-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508020#comment-14508020 ] Hive QA commented on HIVE-10441: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727303/HIVE-10441.1.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8726 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3530/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3530/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3530/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727303 - PreCommit-HIVE-TRUNK-Build Fix confusing log statement in SessionState about hive.execution.engine setting --- Key: HIVE-10441 URL: https://issues.apache.org/jira/browse/HIVE-10441 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10441.1.patch {code} LOG.info("No Tez session required at this point. hive.execution.engine=mr."); {code} This statement is misleading. It is true that it is printed in the case that a Tez session does not need to be created, but it is not necessarily true that hive.execution.engine=mr - it could be Spark, or it could even be set to Tez but the Session determined that a Tez Session did not need to be created (which is the case for HiveServer2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
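For context, the kind of fix the HIVE-10441 report suggests can be sketched as follows. This is an illustrative stand-in only: the class and method names are invented, not the actual SessionState code. The point is simply to report the configured hive.execution.engine value rather than hard-coding "mr".

```java
// Illustrative sketch only -- SessionLogSketch and noTezSessionMessage are
// made-up names, not the actual Hive SessionState code. The message includes
// the real engine value, so "spark" (or even "tez" in the HiveServer2 case
// where no session is created) is reported accurately.
public class SessionLogSketch {
    static String noTezSessionMessage(String engine) {
        return "No Tez session required at this point. hive.execution.engine="
                + engine + ".";
    }

    public static void main(String[] args) {
        System.out.println(noTezSessionMessage("spark"));
    }
}
```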
[jira] [Commented] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508150#comment-14508150 ] Hive QA commented on HIVE-4625: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727315/HIVE-4625.5.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3531/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3531/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3531/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727315 - PreCommit-HIVE-TRUNK-Build HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
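The check proposed in the HIVE-4625 description can be pictured as a simple guard. All names below are hypothetical, chosen for illustration rather than taken from the HiveServer2 code: the idea is to test whether delegation tokens are supported by the current configuration instead of calling the metastore and swallowing the resulting UnsupportedOperationException.

```java
// Hypothetical guard -- class and method names are invented for illustration,
// not the actual Hive API. An embedded (in-process) metastore cannot issue
// delegation tokens, so the token fetch should be skipped up front rather
// than attempted and caught, which logs a misleading error at lower levels.
public class DelegationTokenGuard {
    static boolean delegationTokenSupported(boolean remoteMetastore,
                                            boolean kerberosEnabled) {
        return remoteMetastore && kerberosEnabled;
    }

    public static void main(String[] args) {
        // Embedded metastore: skip the token fetch entirely.
        System.out.println(delegationTokenSupported(false, true));
    }
}
```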
[jira] [Commented] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508168#comment-14508168 ] Thejas M Nair commented on HIVE-4625: - +1 HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9824) LLAP: Native Vectorization of Map Join
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9824: - Labels: TODOC1.2 (was: ) LLAP: Native Vectorization of Map Join -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch into a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: (was: HIVE-10397.trunk.patch) LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10347: - Attachment: HIVE-10347.6.patch Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10347: - Attachment: (was: HIVE-10347.6.patch) Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508124#comment-14508124 ] Xuefu Zhang commented on HIVE-10312: +1 SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10347: - Attachment: HIVE-10347.6.patch Made a version of the patch for git (now that the repos have changed). Submitting it to be tested. Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10368) VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin
[ https://issues.apache.org/jira/browse/HIVE-10368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507950#comment-14507950 ] Wei Zheng commented on HIVE-10368: -- Here's another similar failure, probably related. TestMiniTezCliDriver.testCliDriver_vector_char_mapjoin1 Caused by: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct at org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:232) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.setVector(VectorizedBatchUtil.java:316) at org.apache.hadoop.hive.ql.exec.vector.VectorizedBatchUtil.addProjectedRowToBatchFrom(VectorizedBatchUtil.java:271) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.reProcessBigTable(VectorMapJoinOperator.java:320) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:530) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:485) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.closeOp(VectorMapJoinOperator.java:237) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:630) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:324) ... 
14 more VectorExpressionWriter doesn't match vectorColumn during row spilling in HybridGraceHashJoin Key: HIVE-10368 URL: https://issues.apache.org/jira/browse/HIVE-10368 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Wei Zheng Assignee: Matt McCline This problem was exposed by HIVE-10284, when testing vectorized_context.q. Below is the query and backtrace: {code} select store.s_city, ss_net_profit from store_sales JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk limit 100 {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:175) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.getRowObject(VectorMapJoinOperator.java:347) at org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.spillBigTableRow(VectorMapJoinOperator.java:306) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:390) ... 24 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: (was: HIVE-10397.trunk.patch) LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: HIVE-10397.trunk.patch [~gopalv] I made a patch based on your suggestions. This patch removes OrcInputFormat implementing SplitSizeEstimator. Instead, a new generic ColumnSizeEstimator is added to the Tez code path. But this will still not solve the hadoop-1 issue, nor will it be compatible with older Tez installs. LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: HIVE-10397.trunk.patch LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10397) LLAP: Implement Tez SplitSizeEstimator for Orc
[ https://issues.apache.org/jira/browse/HIVE-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10397: - Attachment: HIVE-10397.trunk.patch LLAP: Implement Tez SplitSizeEstimator for Orc -- Key: HIVE-10397 URL: https://issues.apache.org/jira/browse/HIVE-10397 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10397.patch, HIVE-10397.trunk.patch This is patch for HIVE-7428. For now this will be in llap branch as hive has not bumped up the tez version yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9711) ORC Vectorization DoubleColumnVector.isRepeating=false if all entries are NaN
[ https://issues.apache.org/jira/browse/HIVE-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508084#comment-14508084 ] Prasanth Jayachandran commented on HIVE-9711: - Committed patch to master. Thanks [~gopalv] for the patch! ORC Vectorization DoubleColumnVector.isRepeating=false if all entries are NaN - Key: HIVE-9711 URL: https://issues.apache.org/jira/browse/HIVE-9711 Project: Hive Issue Type: Bug Components: File Formats, Vectorization Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Fix For: 1.2.0 Attachments: HIVE-9711.1.patch, HIVE-9711.2.patch, HIVE-9711.3.patch The isRepeating=true check uses Java equality, which results in NaN != NaN comparison operations. The noNulls case needs the current check folded into the previous loop, while the hasNulls case needs a logical AND of the isNull[] field instead of == comparisons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
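The bug class fixed by HIVE-9711 is easy to reproduce in isolation. The snippet below is a standalone illustration, not the actual DoubleColumnVector code: under IEEE 754, NaN != NaN, so a plain == scan reports an all-NaN column as non-repeating, whereas comparing raw bit patterns (Double.doubleToLongBits canonicalizes NaN) gets it right.

```java
// Standalone illustration of the NaN isRepeating bug -- not Hive's actual
// DoubleColumnVector code. naiveIsRepeating uses Java value equality and
// fails on all-NaN input; bitwiseIsRepeating compares canonical bit patterns.
public class NaNRepeatCheck {
    static boolean naiveIsRepeating(double[] v) {
        for (int i = 1; i < v.length; i++) {
            if (v[i] != v[0]) return false; // NaN != NaN, so all-NaN => false
        }
        return true;
    }

    static boolean bitwiseIsRepeating(double[] v) {
        long first = Double.doubleToLongBits(v[0]); // canonicalizes NaN
        for (int i = 1; i < v.length; i++) {
            if (Double.doubleToLongBits(v[i]) != first) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] allNaN = {Double.NaN, Double.NaN, Double.NaN};
        System.out.println(naiveIsRepeating(allNaN));   // false: the bug
        System.out.println(bitwiseIsRepeating(allNaN)); // true: correct
    }
}
```

The same reasoning explains the hasNulls case noted in the description: null slots must be skipped via the isNull[] flags rather than compared by value.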
[jira] [Commented] (HIVE-9824) LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;)
[ https://issues.apache.org/jira/browse/HIVE-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508014#comment-14508014 ] Matt McCline commented on HIVE-9824: Added 2 new JIRAs, as [~sershe] requested: HIVE-10448: Consider replacing BytesBytesMultiHashMap with new fast hash table code of Native Vector Map Join HIVE-10449: LLAP: Make new fast hash table for Native Vector Map Join work with Hybrid Grace LLAP: Native Vectorization of Map Join so previously CPU bound queries shift their bottleneck to I/O and make it possible for the rest of LLAP to shine ;) -- Key: HIVE-9824 URL: https://issues.apache.org/jira/browse/HIVE-9824 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9824.01.patch, HIVE-9824.02.patch, HIVE-9824.04.patch, HIVE-9824.06.patch, HIVE-9824.07.patch, HIVE-9824.08.patch, HIVE-9824.09.patch Today's VectorMapJoinOperator is a pass-through that converts each row from a vectorized row batch into a Java Object[] row and passes it to the MapJoinOperator superclass. This enhancement creates specialized vectorized map join operator classes that are optimized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10452) Followup fix for HIVE-10202 to restrict it to script mode.
[ https://issues.apache.org/jira/browse/HIVE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508474#comment-14508474 ] Hive QA commented on HIVE-10452: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727486/HIVE-10452.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3537/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3537/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3537/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727486 - PreCommit-HIVE-TRUNK-Build Followup fix for HIVE-10202 to restrict it to script mode. -- Key: HIVE-10452 URL: https://issues.apache.org/jira/browse/HIVE-10452 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 1.2.0 Attachments: HIVE-10452.patch The fix made in HIVE-10202 needs to be limited to when beeline is running in script mode (the -f option). Otherwise, if --silent=true is set in interactive mode, the prompt disappears and so does what you type in, say: {code} beeline -u jdbc:hive2://localhost:1 --silent=true {code} It appears to hang but in reality it doesn't display any prompt. The workaround is to not use the --silent=true option in interactive mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165:
---
Attachment: (was: ReflectiveOperationWriter.java)

Improve hive-hcatalog-streaming extensibility and support updates and deletes.
--
Key: HIVE-10165
URL: https://issues.apache.org/jira/browse/HIVE-10165
Project: Hive
Issue Type: Improvement
Components: HCatalog
Reporter: Elliot West
Assignee: Elliot West
Labels: streaming_api
Fix For: 1.2.0

h3. Overview
I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues:
* Excessive write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions) the scope for contention is high.

I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.

h3. Benefits
* Enables the creation of large-scale dataset merge processes
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.

h3. Implementation
Our changes do not break the existing API contracts. Instead, our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently, we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows:
* We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems sensible to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish.
* The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface.
* Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class
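The interface split described above can be sketched as follows. Only the type names ({{RecordOperationWriter}}, {{RecordWriter}}, {{RecordMutator}}) come from the issue text; the method signatures and the toy implementation are assumptions for illustration, not the attached patch:

```java
import java.util.ArrayList;
import java.util.List;

public class MutatorSketch {
    // Shared life-cycle super-interface; the name comes from the JIRA text,
    // the exact method set is an assumption.
    interface RecordOperationWriter {
        void flush();
        void closeBatch();
    }

    // The existing write-only contract (byte[] rows).
    interface RecordWriter extends RecordOperationWriter {
        void write(long txnId, byte[] record);
    }

    // The proposed mutation contract. Rows are Objects so callers can pass
    // strongly typed records (POJOs, Tuples) without a byte[] encoding step.
    interface RecordMutator extends RecordOperationWriter {
        void insert(long txnId, Object record);
        void update(long txnId, Object record);
        void delete(long txnId, Object record);
    }

    // Toy in-memory mutator used only to exercise the contract.
    static class LoggingMutator implements RecordMutator {
        final List<String> ops = new ArrayList<>();
        public void insert(long txnId, Object record) { ops.add("insert:" + record); }
        public void update(long txnId, Object record) { ops.add("update:" + record); }
        public void delete(long txnId, Object record) { ops.add("delete:" + record); }
        public void flush() { /* nothing buffered in this toy */ }
        public void closeBatch() { ops.add("close"); }
    }

    public static void main(String[] args) {
        LoggingMutator m = new LoggingMutator();
        m.insert(1L, "row-a");
        m.delete(1L, "row-a");
        m.closeBatch();
        System.out.println(m.ops);
    }
}
```

The point of the split is that both the old and new contracts share one life-cycle super-interface while exposing disjoint operation sets.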
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165:
---
Description:

h3. Overview
I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues:
* Excessive write activity is required for small data changes.
* Downstream applications cannot robustly read these datasets while they are being updated.
* Due to the scale of the updates (hundreds of partitions) the scope for contention is high.

I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner.

h3. Benefits
* Enables the creation of large-scale dataset merge processes
* Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive.

h3. Implementation
Our changes do not break the existing API contracts. Instead, our approach has been to consider the functionality offered by the existing API and our proposed API as fulfilling separate and distinct use-cases. The existing API is primarily focused on the task of continuously writing large volumes of new data into a Hive table for near-immediate analysis. Our use-case, however, is concerned more with the frequent but not continuous ingestion of mutations to a Hive table from some ETL merge process. Consequently, we feel it is justifiable to add our new functionality via an alternative set of public interfaces and leave the existing API as is. This keeps both APIs clean and focused at the expense of presenting additional options to potential users. Wherever possible, shared implementation concerns have been factored out into abstract base classes that are open to third-party extension. A detailed breakdown of the changes is as follows:
* We've introduced a public {{RecordMutator}} interface whose purpose is to expose insert/update/delete operations to the user. This is a counterpart to the write-only {{RecordWriter}}. We've also factored out life-cycle methods common to these two interfaces into a super {{RecordOperationWriter}} interface. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems sensible to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish.
* The introduction of {{RecordMutator}} requires that insert/update/delete operations are then also exposed on a {{TransactionBatch}} type. We've done this with the introduction of a public {{MutatorTransactionBatch}} interface which is a counterpart to the write-only {{TransactionBatch}}. We've also factored out life-cycle methods common to these two interfaces into a super {{BaseTransactionBatch}} interface.
* Functionality that would be shared by implementations of both {{RecordWriters}} and {{RecordMutators}} has been factored out of {{AbstractRecordWriter}} into a new abstract base class {{AbstractOperationRecordWriter}}. The visibility is such that it is open to extension by third parties. The {{AbstractOperationRecordWriter}} also permits the setting of the {{AcidOutputFormat.Options#recordIdColumn()}} (defaulted to {{-1}}), which is a requirement for enabling updates and deletes. Additionally, these options are now fed an {{ObjectInspector}} via an abstract method so that a {{SerDe}} is not mandated (it was not required for our use-case). The {{AbstractRecordWriter}} is now much leaner, handling only the extraction of the {{ObjectInspector}} from the {{SerDe}}.
* A new abstract class,
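The {{recordIdColumn}} gating mentioned above can be illustrated with a small stand-in for {{AcidOutputFormat.Options}}. This class is illustrative only (not Hive's actual implementation); the only details taken from the issue are the fluent setter style and the {{-1}} default meaning "no record-id column":

```java
public class WriterOptions {
    // -1 means "no ROW_ID column": the writer is insert-only, mirroring
    // the described default of AcidOutputFormat.Options#recordIdColumn().
    private int recordIdColumn = -1;

    // Fluent setter, in the style of AcidOutputFormat.Options.
    public WriterOptions recordIdColumn(int column) {
        this.recordIdColumn = column;
        return this;
    }

    // Updates and deletes must locate the original row via its ROW_ID,
    // so mutation support hinges on a record-id column being configured.
    public boolean supportsMutation() {
        return recordIdColumn >= 0;
    }

    public static void main(String[] args) {
        System.out.println(new WriterOptions().supportsMutation());                   // insert-only default
        System.out.println(new WriterOptions().recordIdColumn(0).supportsMutation()); // mutations enabled
    }
}
```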
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438:
--
Attachment: hs2resultSetcompressor.zip

Architecture for ResultSet Compression via external plugin
---
Key: HIVE-10438
URL: https://issues.apache.org/jira/browse/HIVE-10438
Project: Hive
Issue Type: New Feature
Components: Hive, Thrift API
Affects Versions: 1.1.0
Reporter: Rohit Dholakia
Labels: patch
Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2resultSetcompressor.zip

This JIRA proposes an architecture for enabling ResultSet compression using an external plugin. The patch has three aspects to it:
0. An architecture for enabling ResultSet compression with external plugins
1. An example plugin to demonstrate end-to-end functionality
2. A container to allow everyone to write and test ResultSet compressors.

Also attaching a design document explaining the changes, an experimental results document, and a PDF explaining how to set up the Docker container to observe end-to-end functionality of ResultSet compression.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
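To make the plugin idea concrete, here is a hypothetical compressor contract with a DEFLATE-backed example plugin built on {{java.util.zip}}. The interface and class names are invented for illustration; the real contract and example plugin live in the attached patch and are not reproduced in this JIRA text:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressorPluginSketch {
    // Hypothetical plugin contract: compress a serialized column, and
    // restore it given the original (uncompressed) length.
    interface ResultSetCompressor {
        byte[] compress(byte[] column);
        byte[] decompress(byte[] blob, int originalLength) throws DataFormatException;
    }

    // Example plugin backed by DEFLATE from the JDK.
    static class DeflateCompressor implements ResultSetCompressor {
        public byte[] compress(byte[] column) {
            Deflater deflater = new Deflater();
            deflater.setInput(column);
            deflater.finish();
            // Oversized scratch buffer; trim to the actual compressed size.
            byte[] buf = new byte[column.length + 64];
            int n = deflater.deflate(buf);
            deflater.end();
            return Arrays.copyOf(buf, n);
        }

        public byte[] decompress(byte[] blob, int originalLength) throws DataFormatException {
            Inflater inflater = new Inflater();
            inflater.setInput(blob);
            byte[] out = new byte[originalLength];
            inflater.inflate(out);
            inflater.end();
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] column = new byte[256];
        Arrays.fill(column, (byte) 'a');  // highly compressible column data
        ResultSetCompressor c = new DeflateCompressor();
        byte[] packed = c.compress(column);
        byte[] restored = c.decompress(packed, column.length);
        System.out.println(Arrays.equals(column, restored));
    }
}
```

A server would look up such a plugin by name at query time and apply it per column batch before shipping results over Thrift; the round-trip above is the property any plugin must preserve.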