[jira] [Commented] (HIVE-11041) Update tests for HIVE-9302 after removing binaries

2015-06-18 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591662#comment-14591662
 ] 

Jesus Camacho Rodriguez commented on HIVE-11041:


[~hsubramaniyan], could you take a look? This patch contains the missing pieces 
from HIVE-10684/HIVE-10705 for 1.2. Thanks

 Update tests for HIVE-9302 after removing binaries
 --

 Key: HIVE-11041
 URL: https://issues.apache.org/jira/browse/HIVE-11041
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11041.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591767#comment-14591767
 ] 

Hive QA commented on HIVE-11028:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740285/HIVE-11028.2.patch

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 9010 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_eq_with_case_when
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_when
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_subquery
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4302/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4302/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4302/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740285 - PreCommit-HIVE-TRUNK-Build

 Tez: table self join and join with another table fails with 
 IndexOutOfBoundsException
 -

 Key: HIVE-11028
 URL: https://issues.apache.org/jira/browse/HIVE-11028
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch


 {noformat}
 create table tez_self_join1(id1 int, id2 string, id3 string);
 insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), 
 (3,'ba','ba');
 create table tez_self_join2(id1 int);
 insert into table tez_self_join2 values(1),(2),(3);
 explain
 select s.id2, s.id3
 from
 (
  select self1.id1, self1.id2, self1.id3
  from tez_self_join1 self1 join tez_self_join1 self2
  on self1.id2=self2.id3 ) s
 join tez_self_join2
 on s.id1=tez_self_join2.id1
 where s.id2='ab';
 {noformat}
 fails with error:
 {noformat}
 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver 
 (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
 vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, 
 diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 
 0, Size: 0
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 

[jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591684#comment-14591684
 ] 

Hive QA commented on HIVE-11031:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740329/HIVE-11031.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9010 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4301/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4301/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4301/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740329 - PreCommit-HIVE-TRUNK-Build

 ORC concatenation of old files can fail while merging column statistics
 ---

 Key: HIVE-11031
 URL: https://issues.apache.org/jira/browse/HIVE-11031
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-11031.2.patch, HIVE-11031.3.patch, 
 HIVE-11031.4.patch, HIVE-11031.patch


 Column statistics in ORC are optional protobuf fields. Old ORC files might 
 not have statistics for newly added types such as decimal, date, and timestamp. 
 But column statistics merging assumes that statistics exist for these types and 
 invokes merge. For example, merging of TimestampColumnStatistics directly casts 
 the received ColumnStatistics object without an instanceof check. If the ORC 
 file contains timestamp column statistics this works; otherwise it throws 
 ClassCastException.
 Also, the file merge operator swallows the exception.
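 A minimal, self-contained sketch of the guard described above; the interfaces 
 and method names here are illustrative stand-ins, not ORC's actual classes:
 {code}
 // Stand-ins for ORC's ColumnStatistics hierarchy so the sketch compiles on its own.
 interface ColumnStatistics { }

 interface TimestampColumnStatistics extends ColumnStatistics {
   long getMaximumUtc();
 }

 class TimestampStatisticsMerger {
   private Long maxUtc;   // running maximum across merged files/stripes

   // Guarded merge: an old file may only carry a plain (empty) ColumnStatistics
   // for a timestamp column, so check the runtime type instead of casting blindly.
   void merge(ColumnStatistics other) {
     if (other instanceof TimestampColumnStatistics) {
       TimestampColumnStatistics ts = (TimestampColumnStatistics) other;
       if (maxUtc == null || ts.getMaximumUtc() > maxUtc) {
         maxUtc = ts.getMaximumUtc();
       }
     }
     // else: statistics are missing in the old file; skip the merge (or drop the
     // merged statistics) rather than throw ClassCastException.
   }
 }
 {code}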



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9388) HiveServer2 fails to reconnect to MetaStore after MetaStore restart

2015-06-18 Thread Mariusz Strzelecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariusz Strzelecki resolved HIVE-9388.
--
Resolution: Duplicate

HIVE-10384

 HiveServer2 fails to reconnect to MetaStore after MetaStore restart
 ---

 Key: HIVE-9388
 URL: https://issues.apache.org/jira/browse/HIVE-9388
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0, 0.14.0, 0.13.1, 1.0.0
Reporter: Piotr Ackermann
 Attachments: HIVE-9388.2.patch, HIVE-9388.patch


 How to reproduce:
 # Use Hue to connect to HiveServer2
 # Restart Metastore
 # Try to execute any query in Hue
 HiveServer2 reports the error:
 {quote}
 ERROR hive.log: Got exception: 
 org.apache.thrift.transport.TTransportException null
 org.apache.thrift.transport.TTransportException
 at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at 
 org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:355)
 at 
 org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:432)
 at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
 at 
 org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at 
 org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
 at com.sun.proxy.$Proxy10.getDatabases(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1681)
 at com.sun.proxy.$Proxy10.getDatabases(Unknown Source)
 at 
 org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:438)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
 at 
 org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
 at 
 org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
 at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
 at com.sun.proxy.$Proxy19.getSchemas(Unknown Source)
 at org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:277)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:436)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1433)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1418)
 at 

[jira] [Updated] (HIVE-11041) Update tests for HIVE-9302 after removing binaries

2015-06-18 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11041:
---
Attachment: HIVE-11041.patch

 Update tests for HIVE-9302 after removing binaries
 --

 Key: HIVE-11041
 URL: https://issues.apache.org/jira/browse/HIVE-11041
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11041.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11041) Update tests for HIVE-9302 after removing binaries

2015-06-18 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11041:
---
Component/s: Tests

 Update tests for HIVE-9302 after removing binaries
 --

 Key: HIVE-11041
 URL: https://issues.apache.org/jira/browse/HIVE-11041
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11041.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: HIVE-10999.1-spark.patch

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10746) Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from TextInputFormat

2015-06-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10746:
---
Attachment: HIVE-10746.2.patch

Test failures look unrelated.

Reformat before commit.

  Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from 
 TextInputFormat
 --

 Key: HIVE-10746
 URL: https://issues.apache.org/jira/browse/HIVE-10746
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Attachments: HIVE-10746.1.patch, HIVE-10746.2.patch, 
 slow_query_output.zip


 The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount 
 FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs 
 consistently fast in Spark and MapReduce on Hive 1.2.0. When the same query is 
 run with Tez as the execution engine it consistently takes 300-500 seconds, 
 which seems extremely long. This is a basic external table delimited by tabs, 
 backed by a single file in a folder. In Hive 0.13 this query runs fast with 
 Tez; I tested Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and something is 
 clearly going awry with Hive on Tez for single-file or small-file tables. I 
 can attach further logs if someone needs them for deeper analysis.
 HDFS Output:
 hadoop fs -ls /example_dw/crc/arsn
 Found 2 items
 -rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
 /example_dw/crc/arsn/_SUCCESS
 -rwxr-x---   6 loaduser hadoopusers3883880 2015-05-17 20:03 
 /example_dw/crc/arsn/part-m-0
 Hive Table Describe:
 hive describe formatted crc_arsn;
 OK
 # col_name  data_type   comment 
  
 arsn_cd string  
 clmlvl_cd   string  
 arclss_cd   string  
 arclssg_cd  string  
 arsn_prcsr_rmk_ind  string  
 arsn_mbr_rspns_ind  string  
 savtyp_cd   string  
 arsn_eff_dt string  
 arsn_exp_dt string  
 arsn_pstd_dts   string  
 arsn_lstupd_dts string  
 arsn_updrsn_txt string  
 appl_user_idstring  
 arsntyp_cd  string  
 pre_d_indicator string  
 arsn_display_txtstring  
 arstat_cd   string  
 arsn_tracking_nostring  
 arsn_cstspcfc_ind   string  
 arsn_mstr_rcrd_ind  string  
 state_specific_ind  string  
 region_specific_in  string  
 arsn_dpndnt_cd  string  
 unit_adjustment_in  string  
 arsn_mbr_only_ind   string  
 arsn_qrmb_ind   string  
  
 # Detailed Table Information 
 Database:   adw  
 Owner:  loadu...@exa.example.com   
 CreateTime: Mon Apr 28 13:28:05 EDT 2014 
 LastAccessTime: UNKNOWN  
 Protect Mode:   None 
 Retention:  0
 Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn 

 Table Type: EXTERNAL_TABLE   
 Table Parameters:
 EXTERNALTRUE
 transient_lastDdlTime   1398706085  
  
 # Storage Information
 SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

 InputFormat:org.apache.hadoop.mapred.TextInputFormat 
 OutputFormat:   
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
 Compressed: No   
 Num Buckets:-1   
 Bucket Columns: []   
 Sort Columns:  

[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592061#comment-14592061
 ] 

Hive QA commented on HIVE-10996:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740328/HIVE-10996.01.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8996 tests executed
*Failed tests:*
{noformat}
TestCliDriver-enforce_order.q-bucketcontext_4.q-stats_publisher_error_1.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4304/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4304/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4304/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740328 - PreCommit-HIVE-TRUNK-Build

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
 HIVE-10996.patch, explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
 like a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not 
 returning results. The above queries are the simplified queries that produce 
 the issue. I will note that if I convert the inner join to a table and select 
 from that, the issue does not appear.
 Update: Found that turning off hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435
[jira] [Updated] (HIVE-10746) Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from TextInputFormat

2015-06-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10746:
---
Description: 
The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount 
FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs 
consistently fast in Spark and MapReduce on Hive 1.2.0. When the same query is 
run with Tez as the execution engine it consistently takes 300-500 seconds, 
which seems extremely long. This is a basic external table delimited by tabs, 
backed by a single file in a folder. In Hive 0.13 this query runs fast with 
Tez; I tested Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and something is 
clearly going awry with Hive on Tez for single-file or small-file tables. I 
can attach further logs if someone needs them for deeper analysis.

HDFS Output:
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
/example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers3883880 2015-05-17 20:03 
/example_dw/crc/arsn/part-m-0


Hive Table Describe:
{code}
hive describe formatted crc_arsn;
OK
# col_name  data_type   comment 
 
arsn_cd string  
clmlvl_cd   string  
arclss_cd   string  
arclssg_cd  string  
arsn_prcsr_rmk_ind  string  
arsn_mbr_rspns_ind  string  
savtyp_cd   string  
arsn_eff_dt string  
arsn_exp_dt string  
arsn_pstd_dts   string  
arsn_lstupd_dts string  
arsn_updrsn_txt string  
appl_user_idstring  
arsntyp_cd  string  
pre_d_indicator string  
arsn_display_txtstring  
arstat_cd   string  
arsn_tracking_nostring  
arsn_cstspcfc_ind   string  
arsn_mstr_rcrd_ind  string  
state_specific_ind  string  
region_specific_in  string  
arsn_dpndnt_cd  string  
unit_adjustment_in  string  
arsn_mbr_only_ind   string  
arsn_qrmb_ind   string  
 
# Detailed Table Information 
Database:   adw  
Owner:  loadu...@exa.example.com   
CreateTime: Mon Apr 28 13:28:05 EDT 2014 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn   
 
Table Type: EXTERNAL_TABLE   
Table Parameters:
EXTERNALTRUE
transient_lastDdlTime   1398706085  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
field.delim \t  
line.delim  \n  
serialization.format\t  
Time taken: 1.245 seconds, Fetched: 54 row(s)

{code}


Explain Hive 1.2.0 w/Tez:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)


Explain Hive 0.13 w/Tez:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 

[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11031:
-
Attachment: HIVE-11031-branch-1.0.patch

 ORC concatenation of old files can fail while merging column statistics
 ---

 Key: HIVE-11031
 URL: https://issues.apache.org/jira/browse/HIVE-11031
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 1.2.1, 2.0.0

 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, 
 HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch


 Column statistics in ORC are optional protobuf fields. Old ORC files might 
 not have statistics for newly added types such as decimal, date, and timestamp. 
 But column statistics merging assumes that statistics exist for these types and 
 invokes merge. For example, merging of TimestampColumnStatistics directly casts 
 the received ColumnStatistics object without an instanceof check. If the ORC 
 file contains timestamp column statistics this works; otherwise it throws 
 ClassCastException.
 Also, the file merge operator swallows the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager

2015-06-18 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10233:
--
Affects Version/s: 2.0.0

 Hive on LLAP: Memory manager
 

 Key: HIVE-10233
 URL: https://issues.apache.org/jira/browse/HIVE-10233
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap, 2.0.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
 HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
 HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch


 We need a memory manager in llap/tez to manage the usage of memory across 
 threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: (was: HIVE-10999.1-spark.patch)

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11031:
-
Attachment: HIVE-11031-branch-1.0.patch

 ORC concatenation of old files can fail while merging column statistics
 ---

 Key: HIVE-11031
 URL: https://issues.apache.org/jira/browse/HIVE-11031
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 1.2.1, 2.0.0

 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, 
 HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch


 Column statistics in ORC are optional protobuf fields. Old ORC files might 
 not have statistics for newly added types such as decimal, date, and timestamp. 
 But column statistics merging assumes that statistics exist for these types and 
 invokes merge. For example, merging of TimestampColumnStatistics directly casts 
 the received ColumnStatistics object without an instanceof check. If the ORC 
 file contains timestamp column statistics this works; otherwise it throws 
 ClassCastException.
 Also, the file merge operator swallows the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10746) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat

2015-06-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10746:
---
Summary:  Hive 1.2.0+Tez produces 1-byte FileSplits from 
mapred.TextInputFormat  (was:  Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 
1-byte FileSplits from TextInputFormat)

  Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat
 --

 Key: HIVE-10746
 URL: https://issues.apache.org/jira/browse/HIVE-10746
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Attachments: HIVE-10746.1.patch, HIVE-10746.2.patch, 
 slow_query_output.zip


 The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount 
 FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs 
 consistently fast in Spark and MapReduce on Hive 1.2.0. When the same query is 
 run with Tez as the execution engine it consistently takes 300-500 seconds, 
 which seems extremely long. This is a basic external table delimited by tabs, 
 backed by a single file in a folder. In Hive 0.13 this query runs fast with 
 Tez; I tested Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and something is 
 clearly going awry with Hive on Tez for single-file or small-file tables. I 
 can attach further logs if someone needs them for deeper analysis.
 HDFS Output:
 hadoop fs -ls /example_dw/crc/arsn
 Found 2 items
 -rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
 /example_dw/crc/arsn/_SUCCESS
 -rwxr-x---   6 loaduser hadoopusers3883880 2015-05-17 20:03 
 /example_dw/crc/arsn/part-m-0
 Hive Table Describe:
 {code}
 hive describe formatted crc_arsn;
 OK
 # col_name  data_type   comment 
  
 arsn_cd string  
 clmlvl_cd   string  
 arclss_cd   string  
 arclssg_cd  string  
 arsn_prcsr_rmk_ind  string  
 arsn_mbr_rspns_ind  string  
 savtyp_cd   string  
 arsn_eff_dt string  
 arsn_exp_dt string  
 arsn_pstd_dts   string  
 arsn_lstupd_dts string  
 arsn_updrsn_txt string  
 appl_user_idstring  
 arsntyp_cd  string  
 pre_d_indicator string  
 arsn_display_txtstring  
 arstat_cd   string  
 arsn_tracking_nostring  
 arsn_cstspcfc_ind   string  
 arsn_mstr_rcrd_ind  string  
 state_specific_ind  string  
 region_specific_in  string  
 arsn_dpndnt_cd  string  
 unit_adjustment_in  string  
 arsn_mbr_only_ind   string  
 arsn_qrmb_ind   string  
  
 # Detailed Table Information 
 Database:   adw  
 Owner:  loadu...@exa.example.com   
 CreateTime: Mon Apr 28 13:28:05 EDT 2014 
 LastAccessTime: UNKNOWN  
 Protect Mode:   None 
 Retention:  0
 Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn 

 Table Type: EXTERNAL_TABLE   
 Table Parameters:
 EXTERNALTRUE
 transient_lastDdlTime   1398706085  
  
 # Storage Information
 SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

 InputFormat:org.apache.hadoop.mapred.TextInputFormat 
 OutputFormat:   
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
 Compressed: No   
 Num Buckets:-1   
 

[jira] [Commented] (HIVE-11029) hadoop.proxyuser.mapr.groups does not work to restrict the groups that can be impersonated

2015-06-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592128#comment-14592128
 ] 

Xuefu Zhang commented on HIVE-11029:


[~nyang], thanks for working on this. To clarify, the problem happens only if 
the cluster is not a secure one, correct?

 hadoop.proxyuser.mapr.groups does not work to restrict the groups that can be 
 impersonated
 --

 Key: HIVE-11029
 URL: https://issues.apache.org/jira/browse/HIVE-11029
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-11029.patch


 In the core-site.xml, the hadoop.proxyuser.user.groups specifies the user 
 groups which can be impersonated by the HS2 user. However, this does not 
 work properly in Hive. 
 In my core-site.xml, I have the following configs:
 <property>
   <name>hadoop.proxyuser.mapr.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.mapr.groups</name>
   <value>root</value>
 </property>
 I would expect with this configuration that 'mapr' can impersonate only 
 members of the Unix group 'root'. However if I submit a query as user 'jon' 
 the query is running as user 'jon' even though 'mapr' should not be able to 
 impersonate this user. 
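 For reference, a sketch of how the hadoop.proxyuser.* settings are normally 
 enforced with Hadoop's ProxyUsers API. This is an assumption about the intended 
 behaviour (whether HiveServer2 actually reaches such a check on a non-secure 
 cluster is what this issue is about), not Hive's current code path:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.security.authorize.ProxyUsers;

 public class ProxyUserCheck {
   // Throws AuthorizationException unless hadoop.proxyuser.<realUser>.groups/.hosts
   // in core-site.xml allow realUser to impersonate proxyUser from remoteAddress.
   public static void authorizeImpersonation(Configuration conf, String proxyUser,
       String remoteAddress) throws Exception {
     ProxyUsers.refreshSuperUserGroupsConfiguration(conf);   // load hadoop.proxyuser.*

     UserGroupInformation realUgi = UserGroupInformation.getLoginUser();   // e.g. 'mapr'
     UserGroupInformation proxyUgi =
         UserGroupInformation.createProxyUser(proxyUser, realUgi);         // e.g. 'jon'

     // Rejected if the proxied user is not in one of the allowed groups (here: root).
     ProxyUsers.authorize(proxyUgi, remoteAddress);
   }
 }
 {code}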



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11042) Need fix Utilities.replaceTaskId method

2015-06-18 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11042:

Attachment: HIVE-11042.1.patch

Need code review

 Need fix Utilities.replaceTaskId method
 ---

 Key: HIVE-11042
 URL: https://issues.apache.org/jira/browse/HIVE-11042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11042.1.patch


 While looking at another bug, I found that the Utilities.replaceTaskId(String, 
 int) method is not right.
 For example,
 Utilities.replaceTaskId("(ds%3D1)01", 5)
 returns 5.
 It should return (ds%3D1)05.
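 A self-contained sketch of the expected behaviour: replace only the trailing 
 task-id digits and preserve the zero padding. This illustrates the semantics the 
 reporter describes, not the actual patch:
 {code}
 public class ReplaceTaskIdSketch {

   // Replace the trailing run of digits (the task id) with bucketNum,
   // keeping the original width by re-padding with zeros.
   static String replaceTaskId(String name, int bucketNum) {
     int end = name.length();
     int start = end;
     while (start > 0 && Character.isDigit(name.charAt(start - 1))) {
       start--;
     }
     if (start == end) {
       return name;                      // nothing that looks like a task id
     }
     String bucket = String.valueOf(bucketNum);
     StringBuilder sb = new StringBuilder(name.substring(0, start));
     for (int i = bucket.length(); i < end - start; i++) {
       sb.append('0');                   // preserve zero padding, e.g. 01 -> 05
     }
     return sb.append(bucket).toString();
   }

   public static void main(String[] args) {
     System.out.println(replaceTaskId("(ds%3D1)01", 5));   // prints (ds%3D1)05
   }
 }
 {code}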



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11031:
-
Attachment: (was: HIVE-11031-branch-1.0.patch)

 ORC concatenation of old files can fail while merging column statistics
 ---

 Key: HIVE-11031
 URL: https://issues.apache.org/jira/browse/HIVE-11031
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 1.2.1, 2.0.0

 Attachments: HIVE-11031.2.patch, HIVE-11031.3.patch, 
 HIVE-11031.4.patch, HIVE-11031.patch


 Column statistics in ORC are optional protobuf fields. Old ORC files might 
 not have statistics for newly added types such as decimal, date, and timestamp. 
 But column statistics merging assumes that statistics exist for these types and 
 invokes merge. For example, merging of TimestampColumnStatistics directly casts 
 the received ColumnStatistics object without an instanceof check. If the ORC 
 file contains timestamp column statistics this works; otherwise it throws 
 ClassCastException.
 Also, the file merge operator swallows the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-06-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592029#comment-14592029
 ] 

Alan Gates commented on HIVE-10165:
---

bq. I wanted the ability to mock them in the TestMutatorCoordinator test. They 
are package private, so this separation doesn't leak into the public API.
If this is undesirable, can you recommend an alternative approach?
That's fine.  I think comments to reflect that those arguments are only for 
testing purposes would be helpful.

bq. This class relies on the correct grouping of the data (by partition,bucket) 
to avoid the problem that you describe. ... Very keen to hear your thoughts on 
this.
I am fine with pushing this responsibility to the client.  But the following in 
the class javadoc is confusing.  It starts by saying {{Events must be grouped 
by partition, then bucket}} but then later says {{Events are free to target any 
bucket and partition, including new partitions if {@link 
MutatorDestination#createPartitions()} is set. Internally the coordinator 
creates and closes {@link Mutator Mutators} as needed to write to the 
appropriate partition and bucket.}} The latter makes it sound like random 
order is ok.  I think you're trying to say group by partition and bucket, and the 
MutatorCoordinator will seamlessly handle the transitions between groups.  Is 
that right?  I think we should be very clear to users that there is an extreme 
performance and storage penalty for jumping around in random order.
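(Illustrative sketch of that grouping contract: sort the change records by 
partition, then bucket, before handing them to the coordinator. The record and 
comparator types below are hypothetical, not the API from this patch.)
{code}
import java.util.Comparator;
import java.util.List;

// Hypothetical change record; the real API defines its own types.
class MutationRecord {
  final String partition;
  final int bucket;
  final long rowId;

  MutationRecord(String partition, int bucket, long rowId) {
    this.partition = partition;
    this.bucket = bucket;
    this.rowId = rowId;
  }
}

class MutationOrdering {
  // Group events by partition, then bucket, so the coordinator only handles
  // transitions between groups instead of random jumps between partitions.
  static void sortForCoordinator(List<MutationRecord> records) {
    records.sort(Comparator
        .comparing((MutationRecord r) -> r.partition)
        .thenComparingInt(r -> r.bucket)
        .thenComparingLong(r -> r.rowId));
  }
}
{code}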

bq. I now wonder whether the work I’m doing in UgiMetaStoreClientFactory is 
already available in an existing Hive class as it seems like a common 
requirement. Can you advise?
There are a number of places Hive does UGI calls, but I'm not aware of any 
where it does them for metastore calls.

At this point the only issues I see remaining to get this committed are the two 
javadoc comments I've pointed out above.

 Improve hive-hcatalog-streaming extensibility and support updates and deletes.
 --

 Key: HIVE-10165
 URL: https://issues.apache.org/jira/browse/HIVE-10165
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
  Labels: streaming_api
 Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
 HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, 
 mutate-system-overview.png


 h3. Overview
 I'd like to extend the 
 [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
  API so that it also supports the writing of record updates and deletes in 
 addition to the already supported inserts.
 h3. Motivation
 We have many Hadoop processes outside of Hive that merge changed facts into 
 existing datasets. Traditionally we achieve this by: reading in a 
 ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
 sequence and then applying a function to determine inserted, updated, and 
 deleted rows. However, in our current scheme we must rewrite all partitions 
 that may potentially contain changes. In practice the number of mutated 
 records is very small when compared with the records contained in a 
 partition. This approach results in a number of operational issues:
 * Excessive amount of write activity required for small data changes.
 * Downstream applications cannot robustly read these datasets while they are 
 being updated.
 * Due to the scale of the updates (hundreds of partitions) the scope for 
 contention is high. 
 I believe we can address this problem by instead writing only the changed 
 records to a Hive transactional table. This should drastically reduce the 
 amount of data that we need to write and also provide a means for managing 
 concurrent access to the data. Our existing merge processes can read and 
 retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
 an updated form of the hive-hcatalog-streaming API which will then have the 
 required data to perform an update or insert in a transactional manner. 
 h3. Benefits
 * Enables the creation of large-scale dataset merge processes  
 * Opens up Hive transactional functionality in an accessible manner to 
 processes that operate outside of Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics

2015-06-18 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592140#comment-14592140
 ] 

Prasanth Jayachandran commented on HIVE-11031:
--

Note for backport: the branch-1.0 patch applies cleanly, but orc_merge_incompat1.q 
can fail on some platforms. We also need the HIVE-8801 patch, which makes the 
orc_merge_incompat1.q test consistent across platforms.

 ORC concatenation of old files can fail while merging column statistics
 ---

 Key: HIVE-11031
 URL: https://issues.apache.org/jira/browse/HIVE-11031
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical
 Fix For: 1.2.1, 2.0.0

 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, 
 HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch


 Column statistics in ORC are optional protobuf fields. Old ORC files might 
 not have statistics for newly added types such as decimal, date, and timestamp. 
 But column statistics merging assumes that statistics exist for these types and 
 invokes merge. For example, merging of TimestampColumnStatistics directly casts 
 the received ColumnStatistics object without an instanceof check. If the ORC 
 file contains timestamp column statistics this works; otherwise it throws 
 ClassCastException.
 Also, the file merge operator swallows the exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592990#comment-14592990
 ] 

Lefty Leverenz commented on HIVE-7193:
--

Yes, I see.  Thanks [~ngangam].  Could the parameter descriptions include this 
information?

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593003#comment-14593003
 ] 

Lefty Leverenz commented on HIVE-7193:
--

Well, I'd like to see commas & colons explained in the description but maybe 
that's just because I'm ignorant about LDAP.  If you don't think it's 
necessary, it can still be added to the description in the wiki.  And of course 
it's available here.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593005#comment-14593005
 ] 

Lefty Leverenz commented on HIVE-7193:
--

Well, I'd like to see commas & colons explained in the description but maybe 
that's just because I'm ignorant about LDAP.  If you don't think it's 
necessary, it can still be added to the description in the wiki.  And of course 
it's available here.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7193:
-
Comment: was deleted

(was: Well, I'd like to see commas & colons explained in the description but 
maybe that's just because I'm ignorant about LDAP.  If you don't think it's 
necessary, it can still be added to the description in the wiki.  And of course 
it's available here.)

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-18 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592882#comment-14592882
 ] 

Greg Senia commented on HIVE-11051:
---

This seems to be related/similar:
http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2

http://qnalist.com/questions/5904003/map-side-join-fails-when-a-serialized-table-contains-arrays


 Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
 -

 Key: HIVE-11051
 URL: https://issues.apache.org/jira/browse/HIVE-11051
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Attachments: problem_table_joins.tar.gz


 I tried applying HIVE-10729, which did not solve the issue.
 The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
 0.5.4/0.5.3:
 Status: Running (Executing on YARN cluster with App id 
 application_1434641270368_1038)
 
 VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
 KILLED
 
 Map 1 ..   SUCCEEDED  3  300   0  
  0
 Map 2 ... FAILED  3  102   7  
  0
 
 VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s
  
 
 Status: Failed
 Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
 diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
 at 
 

[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592957#comment-14592957
 ] 

Hive QA commented on HIVE-10996:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740526/HIVE-10996.03.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9011 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_having
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4315/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4315/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4315/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740526 - PreCommit-HIVE-TRUNK-Build

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
 HIVE-10996.03.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
 like a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context 
 but what led us to this issue was select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Comment: was deleted

(was: 

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740473/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 603 failed/errored test(s), 7154 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct

[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: (was: HIVE-10999.1-spark.patch)

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Comment: was deleted

(was: 

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740429/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 605 failed/errored test(s), 7101 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18

[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default

2015-06-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11037:
---
Attachment: HIVE-11037.02.patch

address [~jpullokkaran] and [~hagleitn]'s comments.

 HiveOnTez: make explain user level = true as default
 

 Key: HIVE-11037
 URL: https://issues.apache.org/jira/browse/HIVE-11037
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch


 In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would 
 like to make it run by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7193:

Attachment: HIVE-7193.5.patch

Attaching new patch with doc changes from Lefty's review.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593019#comment-14593019
 ] 

Lefty Leverenz commented on HIVE-7193:
--

Great, then in Configuration Properties the parameters will be linked to the 
LDAP section.  Thanks Naveen.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-18 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592862#comment-14592862
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


The schema of the operators in the new plan would be:

{noformat}
GB - (col0, col1, col2)
SEL - (col1, col2)
FIL - (col1, col2)
SEL - (col1, col2)
{noformat}

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
 HIVE-10996.03.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
 like a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context 
 but what led us to this issue was select count(distinct s) was not returning 
 results. The above queries are the simplified queries that produce the issue. 
 I will note that if I convert the inner join to a table and select from that 
 the issue does not appear.
 Update: Found that turning off  hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592984#comment-14592984
 ] 

Naveen Gangam commented on HIVE-7193:
-

Thank you for the review.
Q. Also, why is the example a comma-separated list when the description says 
colon-separated?
A. The example shows a single pattern for LDAP users. Each attribute within an 
LDAP DN is separated by a COMMA:
CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com
However, it is possible that an LDAP directory has users in different trees. 
The baseDN pattern for each tree is separated by a COLON. For example:
CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com:CN=%s,OU=IT,DC=domain,DC=com

The same is true for group patterns. Does this help? Thanks
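
To make the comma/colon distinction concrete, here is a minimal illustrative 
sketch; the class and method names are hypothetical and this is not 
HiveServer2's actual LDAP code, only one reading of how a colon-separated list 
of user DN patterns with a %s placeholder could be expanded:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only -- not HiveServer2's actual LDAP authenticator code.
// It illustrates the rule described above: the DN patterns are separated by a
// COLON, while the attributes inside each DN stay COMMA-separated.
public class LdapDnPatternSketch {
  static List<String> candidateDns(String patterns, String user) {
    List<String> dns = new ArrayList<>();
    for (String pattern : patterns.split(":")) {   // one baseDN pattern per tree
      dns.add(pattern.replace("%s", user));        // substitute the user into %s
    }
    return dns;
  }

  public static void main(String[] args) {
    String patterns =
        "CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com:CN=%s,OU=IT,DC=domain,DC=com";
    System.out.println(candidateDns(patterns, "jdoe"));
    // [CN=jdoe,CN=Users,DC=subdomain,DC=domain,DC=com, CN=jdoe,OU=IT,DC=domain,DC=com]
  }
}
{code}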

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11050) testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593013#comment-14593013
 ] 

Hive QA commented on HIVE-11050:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740528/HIVE-11050.01.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9010 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4316/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4316/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4316/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740528 - PreCommit-HIVE-TRUNK-Build

 testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data 
 creation queries
 --

 Key: HIVE-11050
 URL: https://issues.apache.org/jira/browse/HIVE-11050
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.1
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Blocker
 Attachments: HIVE-11050.01.patch


 In some environments the Q file tests vector_outer_join\{1-4\}.q fail because 
 the data creation queries produce different input files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593014#comment-14593014
 ] 

Naveen Gangam commented on HIVE-7193:
-

I intend to enhance the LDAP section of the wiki docs about using these new 
properties in detail, with examples. I'm just holding out until this patch gets 
committed. I figured that's where most users will look when attempting to use 
this feature. Would that suffice? And leave patch 5 as it was for now?

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593046#comment-14593046
 ] 

Hive QA commented on HIVE-10999:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740569/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 7943 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lateral_view_explode2
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/894/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/894/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-894/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740569 - PreCommit-HIVE-SPARK-Build

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592953#comment-14592953
 ] 

Lefty Leverenz commented on HIVE-11023:
---

Should this be documented in the wiki?

 Disable directSQL if datanucleus.identifierFactory = datanucleus2
 -

 Key: HIVE-11023
 URL: https://issues.apache.org/jira/browse/HIVE-11023
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical
 Fix For: 1.2.1

 Attachments: HIVE-11023.patch


 We hit an interesting bug in a case where datanucleus.identifierFactory = 
 datanucleus2.
 The problem is that directSql hand-generates SQL strings assuming the 
 datanucleus1 naming scheme. If a user has their metastore JDO managed by 
 datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are 
 incorrect.
 One simple example of what this results in is the following: whenever DN 
 persists a field which is held as a List<T>, it winds up storing each T as a 
 separate line in the appropriate mapping table, with a column called 
 INTEGER_IDX that holds the position in the list. Then, upon reading, it 
 automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which 
 results in the list retaining its order. In the DN2 naming scheme, the column 
 is called IDX instead of INTEGER_IDX. If the user has run the appropriate 
 metatool upgrade scripts, it is highly likely that they have both columns, 
 INTEGER_IDX and IDX.
 Whenever they use JDO, such as with all writes, it will then use the IDX 
 field, and whenever they do any sort of optimized read, such as through 
 directSQL, it will ORDER BY INTEGER_IDX.
 An immediate danger is seen when we consider that the schema of a table is 
 stored as a List<FieldSchema>, and while IDX holds 0,1,2,3,..., INTEGER_IDX 
 will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch 
 its schema can come back mixed up in the table's native hashing order rather 
 than sorted by the index.
 This can then result in the schema ordering being different from the actual 
 table. For example, if a user has (a:int, b:string, c:string), a describe on 
 this may return (c:string, a:int, b:string), and thus queries which insert 
 after selecting from another table can hit ClassCastExceptions when trying to 
 insert data in the wrong order - this is how we discovered this bug. The 
 problem, however, can be far worse if there are no type problems: it is 
 possible, for example, that if a, b, and c were all strings, the insert query 
 would succeed but mix up the order, which then results in user table data 
 being mixed up. This has the potential to be very bad.
 We should write a tool to help convert metastores that use datanucleus2 to 
 datanucleus1 (more difficult, needs more one-time testing), or change 
 directSql to support both (easier to code, but it increases the test-coverage 
 matrix significantly and we should really then be testing against both 
 schemes). But in the short term, we should disable directSql if we see that 
 the identifierFactory is datanucleus2.
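
 As a purely illustrative sketch of the mismatch described above (this is not 
 Hive's actual MetaStoreDirectSql code; COLUMNS_V2, CD_ID and INTEGER_IDX are 
 assumed from the stock metastore schema for the List<FieldSchema> mapping), 
 the only difference between the two naming schemes is which index column the 
 hand-generated SQL orders by:
 {code:java}
 // Hypothetical sketch only -- not Hive's actual MetaStoreDirectSql code.
 // Under DN1 naming the list-position column is INTEGER_IDX, under DN2 it is
 // IDX; ordering by the column JDO does not populate returns the columns in
 // hash order instead of schema order.
 public class DirectSqlOrderingSketch {
   static String columnsQuery(boolean datanucleus2Naming) {
     String idxColumn = datanucleus2Naming ? "IDX" : "INTEGER_IDX";
     return "SELECT COLUMN_NAME, TYPE_NAME FROM COLUMNS_V2"
          + " WHERE CD_ID = ? ORDER BY " + idxColumn;  // decides the schema order
   }

   public static void main(String[] args) {
     // directSql today always emits the DN1 form, even on a DN2-managed metastore:
     System.out.println(columnsQuery(false));
   }
 }
 {code}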



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593020#comment-14593020
 ] 

Rui Li commented on HIVE-10999:
---

When I tried the patch earlier, the downloaded jar was still invalid. Now it 
works well and I've passed some tests locally.

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7193:

Attachment: (was: HIVE-7193.5.patch)

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592996#comment-14592996
 ] 

Lefty Leverenz commented on HIVE-7193:
--

I'm getting "404 Oops, you've found a dead link" for patch 5.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-18 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated HIVE-11051:
--
Component/s: Tez

 Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
 -

 Key: HIVE-11051
 URL: https://issues.apache.org/jira/browse/HIVE-11051
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Tez
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Attachments: problem_table_joins.tar.gz


 I tried to apply HIVE-10729, which did not solve the issue.
 The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
 0.5.4/0.5.3
 Status: Running (Executing on YARN cluster with App id 
 application_1434641270368_1038)
 
 VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
 KILLED
 
 Map 1 ..   SUCCEEDED  3  300   0  
  0
 Map 2 ... FAILED  3  102   7  
  0
 
 VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s
  
 
 Status: Failed
 Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
 diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
 at 
 

[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-18 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11051:
---
Description: 
I tried to apply HIVE-10729, which did not solve the issue.

The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
0.5.4/0.5.3


{code}
Status: Running (Executing on YARN cluster with App id 
application_1434641270368_1038)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..   SUCCEEDED  3  300   0   0
Map 2 ... FAILED  3  102   7   0

VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s 

Status: Failed
Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-23 
22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
11:54:40.740061,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-23 
22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
11:54:40.740061,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-23 

[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592981#comment-14592981
 ] 

Xuefu Zhang commented on HIVE-10999:


[~lirui], could you try the patch locally to see if you can run at least one 
Spark q test successfully? It worked for me, but the pre-commit test seems 
to be having trouble. Thanks.

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7193:
-
Comment: was deleted

(was: I'm getting "404 Oops, you've found a dead link" for patch 5.)

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592995#comment-14592995
 ] 

Lefty Leverenz commented on HIVE-7193:
--

I'm getting "404 Oops, you've found a dead link" for patch 5.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593000#comment-14593000
 ] 

Naveen Gangam commented on HIVE-7193:
-

Sorry, I just deleted it after seeing your latest comment about including additional 
info in the parameter description. Should all of the above info be in the 
description? Thanks

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7193:

Attachment: HIVE-7193.5.patch

Re-attaching the patch with Lefty's suggestion. I will include full details 
when I update the wiki docs for LDAP Configuration. (Thanks Lefty)

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-18 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592870#comment-14592870
 ] 

Gunther Hagleitner commented on HIVE-10233:
---

Partial review: 
  * There's some unnecessary commented-out code in HashTableLoader (NOCOND..)
  * getOutputMemoryNeeded isn't referenced anywhere. I think this can be 
dropped together with setOutputMemoryNeeded + references.
  * IMO a definition of ONE_MB makes no sense; might as well use the number in 
the code.
  * Same for getInputMemoryNeededFraction.
  * memoryInUse in AbstractOperatorDesc isn't used anywhere.
  * || (conf.getBoolVar(HiveConf.ConfVars.HIVEUSEHYBRIDGRACEHASHJOIN))) { 
did you mean to say ? Are you trying to run the mem manager only if tez and 
hybrid?
  * You set a work's memory usage to the data size of its terminal operator. 
How come?

 Hive on tez: memory manager for grace hash join
 ---

 Key: HIVE-10233
 URL: https://issues.apache.org/jira/browse/HIVE-10233
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap, 2.0.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
 HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
 HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
 HIVE-10233.09.patch


 We need a memory manager in llap/tez to manage the usage of memory across 
 threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11042) Need fix Utilities.replaceTaskId method

2015-06-18 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592929#comment-14592929
 ] 

Yongzhi Chen commented on HIVE-11042:
-

The one failure is not related. Its age is 2.
[~csun], [~szehon], could you review the change? Thanks

 Need fix Utilities.replaceTaskId method
 ---

 Key: HIVE-11042
 URL: https://issues.apache.org/jira/browse/HIVE-11042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11042.1.patch


 While I was looking at another bug, I found that the 
 Utilities.replaceTaskId(String, int) method is not right.
 For example, 
 Utilities.replaceTaskId("(ds%3D1)01", 5) 
 returns "5".
 It should return "(ds%3D1)05".
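
 For illustration, here is a minimal sketch of the intended behavior; it is 
 not the actual Utilities.replaceTaskId implementation nor the attached patch, 
 and it assumes the task id is the trailing run of digits whose zero-padded 
 width should be preserved:
 {code:java}
 // Hypothetical sketch of the intended behavior only -- not Hive's actual
 // Utilities.replaceTaskId implementation or the attached patch.
 public class ReplaceTaskIdSketch {
   static String replaceTaskId(String name, int newId) {
     int end = name.length();
     int start = end;
     while (start > 0 && Character.isDigit(name.charAt(start - 1))) {
       start--;                                  // walk back over trailing digits
     }
     String digits = name.substring(start);
     if (digits.isEmpty()) {
       return name;                              // nothing to replace
     }
     String padded = String.format("%0" + digits.length() + "d", newId);
     return name.substring(0, start) + padded;   // keep the prefix, swap the id
   }

   public static void main(String[] args) {
     System.out.println(replaceTaskId("(ds%3D1)01", 5));  // prints (ds%3D1)05
   }
 }
 {code}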



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: (was: HIVE-10999.1-spark.patch)

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: HIVE-10999.1-spark.patch

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593009#comment-14593009
 ] 

Lefty Leverenz commented on HIVE-7193:
--

Sorry about the duplicate comments.  I'll review patch 5 tomorrow.

 Hive should support additional LDAP authentication parameters
 -

 Key: HIVE-7193
 URL: https://issues.apache.org/jira/browse/HIVE-7193
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mala Chikka Kempanna
Assignee: Naveen Gangam
 Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, 
 HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, 
 LDAPAuthentication_Design_Doc_V2.docx


 Currently hive has only following authenticator parameters for LDAP 
 authentication for hiveserver2:
 {code:xml}
 <property>
   <name>hive.server2.authentication</name>
   <value>LDAP</value>
 </property>
 <property>
   <name>hive.server2.authentication.ldap.url</name>
   <value>ldap://our_ldap_address</value>
 </property>
 {code}
 We need to include other LDAP properties as part of hive-LDAP authentication 
 like below:
 {noformat}
 a group search base - dc=domain,dc=com 
 a group search filter - member={0} 
 a user search base - dc=domain,dc=com 
 a user search filter - sAMAAccountName={0} 
 a list of valid user groups - group1,group2,group3 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11041) Update tests for HIVE-9302 after removing binaries

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592224#comment-14592224
 ] 

Hive QA commented on HIVE-11041:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740366/HIVE-11041.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4307/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4307/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4307/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4307/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   9b10194..f6d8075  branch-1   -> origin/branch-1
   fb86cef..06f10fe  branch-1.0 -> origin/branch-1.0
   2e1bee8..d8ff0bc  branch-1.1 -> origin/branch-1.1
   234b82d..703882c  branch-1.2 -> origin/branch-1.2
   3f8b0ef..dd30afc  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 3f8b0ef HIVE-11031: ORC concatenation of old files can fail 
while merging column statistics (Prasanth Jayachandran reviewed by Gopal V)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at dd30afc HIVE-11040 : Change Derby dependency version to 
10.10.2.0 (Jason Dere, reviewed by Sushanth Sowmyan, Gunther Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740366 - PreCommit-HIVE-TRUNK-Build

 Update tests for HIVE-9302 after removing binaries
 --

 Key: HIVE-11041
 URL: https://issues.apache.org/jira/browse/HIVE-11041
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11041.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: HIVE-10999.1-spark.patch

 Upgrade Spark dependency to 1.4 [Spark Branch]
 --

 Key: HIVE-10999
 URL: https://issues.apache.org/jira/browse/HIVE-10999
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch


 Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592388#comment-14592388
 ] 

Hive QA commented on HIVE-10996:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740417/HIVE-10996.02.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9011 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4308/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4308/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4308/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740417 - PreCommit-HIVE-TRUNK-Build

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
 HIVE-10996.patch, explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like 
 a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context 
 but what led us to 

[jira] [Commented] (HIVE-11047) Update versions of branch-1.2 to 1.2.1

2015-06-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592393#comment-14592393
 ] 

Thejas M Nair commented on HIVE-11047:
--

+1

 Update versions of branch-1.2 to 1.2.1
 --

 Key: HIVE-11047
 URL: https://issues.apache.org/jira/browse/HIVE-11047
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11047.patch


 Need to update all pom.xml files in branch-1.2 to 1.2.1, and update 
 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
 to reflect that 1.2.1's schema is identical to 1.2.0.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function

2015-06-18 Thread Yi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592377#comment-14592377
 ] 

Yi Zhang commented on HIVE-10142:
-

This request is more along the lines of the following decay definition:

Exponential rate of change can be modeled algebraically by the formula:

N(t) = N(0) e^(−λt)

where N is the quantity, N(0) is the initial quantity, λ is the decay constant, 
and t is time.

The window function would then be a summary of the values of all records in the 
window relative to the current record.
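
For illustration only, a decayed sum of that shape can sometimes be expressed 
with the existing windowing support by factoring the exponential, since 
e^(-λ(t_current - t_row)) = e^(-λ t_current) * e^(λ t_row). A minimal HiveQL 
sketch, assuming a hypothetical table t(id string, val double, ts int) and a 
decay constant of 0.1:

{code:sql}
-- Sketch only: table t and the 0.1 decay constant are hypothetical.
-- The inner query computes a plain running sum of val * exp(lambda * ts);
-- the outer query rescales it by exp(-lambda * ts) of the current row,
-- which gives sum over the window of val_r * exp(-lambda * (ts_current - ts_r)).
SELECT id, ts,
       EXP(-0.1 * ts) * running AS decayed_sum
FROM (
  SELECT id, val, ts,
         SUM(val * EXP(0.1 * ts))
           OVER (PARTITION BY id ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running
  FROM t
) w;
{code}

This factoring only works for the exponential form and can overflow for large 
raw timestamps, which is part of why a built-in way to reference the current 
row's value inside the window calculation would be useful.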

 Calculating formula based on difference between each row's value and current 
 row's in Windowing function
 

 Key: HIVE-10142
 URL: https://issues.apache.org/jira/browse/HIVE-10142
 Project: Hive
  Issue Type: New Feature
  Components: PTF-Windowing
Affects Versions: 1.0.0
Reporter: Yi Zhang
Assignee: Aihua Xu

 For analytics with windowing functions, the calculation formula sometimes 
 needs to evaluate each row's value against the current row's value. Decay 
 is a good example, such as a sum of values with a decay function based on 
 the difference in timestamp between each row and the current row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11036) Race condition in DataNucleus makes Metastore to hang

2015-06-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11036:

Reporter: Takahiko Saito  (was: Ashutosh Chauhan)

 Race condition in DataNucleus makes Metastore to hang
 -

 Key: HIVE-11036
 URL: https://issues.apache.org/jira/browse/HIVE-11036
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Takahiko Saito
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11036.patch


 Under moderate to high concurrent query workload Metastore gets deadlocked in 
 DataNucleus



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11047) Update versions of branch-1.2 to 1.2.1

2015-06-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11047:

Description: 
Need to update all pom.xml files in branch-1.2 to 1.2.1, and update 
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to 
reflect that 1.2.1's schema is identical to 1.2.0.

NO PRECOMMIT TESTS


 Update versions of branch-1.2 to 1.2.1
 --

 Key: HIVE-11047
 URL: https://issues.apache.org/jira/browse/HIVE-11047
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11047.patch


 Need to update all pom.xml files in branch-1.2 to 1.2.1, and update 
 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
 to reflect that 1.2.1's schema is identical to 1.2.0.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11047) Update versions of branch-1.2 to 1.2.1

2015-06-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11047:

Attachment: HIVE-11047.patch

Patch attached.

 Update versions of branch-1.2 to 1.2.1
 --

 Key: HIVE-11047
 URL: https://issues.apache.org/jira/browse/HIVE-11047
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-11047.patch


 Need to update all pom.xml files in branch-1.2 to 1.2.1, and update 
 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
 to reflect that 1.2.1's schema is identical to 1.2.0.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11047) Update versions of branch-1.2 to 1.2.1

2015-06-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-11047:

Attachment: HIVE-11047.2.patch

Updated patch slightly to reflect the MetaStoreSchemaInfo.java change happening 
outside this patch for all branches.

 Update versions of branch-1.2 to 1.2.1
 --

 Key: HIVE-11047
 URL: https://issues.apache.org/jira/browse/HIVE-11047
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.1

 Attachments: HIVE-11047.2.patch, HIVE-11047.patch


 Need to update all pom.xml files in branch-1.2 to 1.2.1, and update 
 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
 to reflect that 1.2.1's schema is identical to 1.2.0.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592267#comment-14592267
 ] 

Hive QA commented on HIVE-10999:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740429/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 605 failed/errored test(s), 7101 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18

[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592452#comment-14592452
 ] 

Hive QA commented on HIVE-10999:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740473/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 603 failed/errored test(s), 7154 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct

[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-18 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592463#comment-14592463
 ] 

Laljo John Pullokkaran commented on HIVE-11028:
---

+1

 Tez: table self join and join with another table fails with 
 IndexOutOfBoundsException
 -

 Key: HIVE-11028
 URL: https://issues.apache.org/jira/browse/HIVE-11028
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, 
 HIVE-11028.3.patch


 {noformat}
 create table tez_self_join1(id1 int, id2 string, id3 string);
 insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), 
 (3,'ba','ba');
 create table tez_self_join2(id1 int);
 insert into table tez_self_join2 values(1),(2),(3);
 explain
 select s.id2, s.id3
 from
 (
  select self1.id1, self1.id2, self1.id3
  from tez_self_join1 self1 join tez_self_join1 self2
  on self1.id2=self2.id3 ) s
 join tez_self_join2
 on s.id1=tez_self_join2.id1
 where s.id2='ab';
 {noformat}
 fails with error:
 {noformat}
 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver 
 (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
 vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, 
 diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 
 0, Size: 0
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
 at 
 org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71)
 at 
 org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
 ... 13 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11045) ArrayIndexOutOfBoundsException with Hive 1.2.0 and Tez 0.7.0

2015-06-18 Thread Soundararajan Velu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592502#comment-14592502
 ] 

Soundararajan Velu commented on HIVE-11045:
---

Vikram, 

I face this issue only with Hive on Tez. My data is in JSON format and I use 
the JsonSerde from https://github.com/rcongiu/Hive-JSON-Serde.
The query runs perfectly fine on Hive; this only occurs with Tez.
The data set is huge and I have no clue for which records this exception arises.

The query is as below:
SELECT t1.return_id AS return_id,
   t1.approve_date AS approve_date,
   t1.approve_date_key AS approve_date_key,
   t1.cancel_date AS cancel_date,
   t1.cancel_date_key AS cancel_date_key,
   t1.complete_date AS complete_date,
   t1.complete_date_key AS complete_date_key,
   t1.init_cancellation_date AS init_cancellation_date,
   t1.init_cancellation_date_key AS init_cancellation_date_key,
   t1.reject_date AS reject_date,
   t1.reject_date_key AS reject_date_key,
   t1.unhold_date AS unhold_date,
   t1.unhold_date_key AS unhold_date_key,
   t1.request_service_date AS request_service_date,
   t1.request_service_date_key AS request_service_date_key,
   t1.service_approve_return_date AS service_approve_return_date,
   t1.service_approve_return_date_key AS service_approve_return_date_key,
   CASE
   WHEN t2.action_override_status_time IS NULL THEN 0
   ELSE 1
   END AS flag_action_override,
   CASE
   WHEN t2.action_override_status_time IS NULL THEN NULL
   ELSE t2.action_override_status_time
   END AS action_override_status_time,
   CASE
   WHEN t2.action_override_user_login IS NULL THEN 'NA'
   ELSE t2.action_override_user_login
   END AS action_override_user_login,
   CASE
   WHEN t2.action_override_change_reason IS NULL THEN 'NA'
   ELSE t2.action_override_change_reason
   END AS action_override_change_reason,
   CASE
   WHEN t2.action_override_change_sub_reason IS NULL THEN 'NA'
   ELSE t2.action_override_change_sub_reason
   END AS action_override_change_sub_reason,
   CASE
   WHEN t2.action_override_count IS NULL THEN cast(0 AS bigint)
   ELSE t2.action_override_count
   END AS action_override_count,
   CASE
   WHEN t2.action_change_data IS NULL THEN 'NA'
   ELSE t2.action_change_data
   END AS action_change_data,
   CASE
   WHEN t3.policy_override_status_time IS NULL THEN 0
   ELSE 1
   END AS flag_policy_override,
   CASE
   WHEN t3.policy_override_status_time IS NULL THEN NULL
   ELSE t3.policy_override_status_time
   END AS policy_override_status_time,
   CASE
   WHEN t3.policy_override_user_login IS NULL THEN 'NA'
   ELSE t3.policy_override_user_login
   END AS policy_override_user_login,
   CASE
   WHEN t3.policy_override_change_reason IS NULL THEN 'NA'
   ELSE t3.policy_override_change_reason
   END AS policy_override_change_reason,
   CASE
   WHEN t3.policy_override_change_sub_reason IS NULL THEN 'NA'
   ELSE t3.policy_override_change_sub_reason
   END AS policy_override_change_sub_reason,
   CASE
   WHEN t3.policy_override_count IS NULL THEN cast(0 AS bigint)
   ELSE t3.policy_override_count
   END AS policy_override_count,
   CASE
   WHEN t3.policy_change_data IS NULL THEN 'NA'
   ELSE t3.policy_change_data
   END AS policy_change_data,
   cast(0 AS bigint) AS temp_flag,
   CASE
   WHEN t3.policy_override_status_date_key IS NULL THEN 0
   ELSE t3.policy_override_status_date_key
   END AS policy_override_status_date_key,
   CASE
   WHEN t2.action_override_status_date_key IS NULL THEN 0
   ELSE t2.action_override_status_date_key
   END AS action_override_status_date_key,
   t1.user_approved_by AS user_approved_by,
   t1.user_rejected_by AS user_rejected_by,
   t1.user_cancelled_by AS user_cancelled_by,
   t1.reject_reason AS reject_reason,
   t1.reject_sub_reason AS reject_sub_reason,
   t1.reject_change_data AS reject_change_data
FROM
  (SELECT rh1.`data`.return_id,
  MIN (CASE WHEN rh1.`data`.event = 'approve' THEN 
rh1.`data`.status_time ELSE NULL END) AS approve_date,

MIN (CASE WHEN rh1.`data`.event = 'cancel' THEN 
rh1.`data`.status_time ELSE NULL END) AS cancel_date,


 MIN (CASE WHEN rh1.`data`.event = 'complete' THEN 
rh1.`data`.status_time ELSE NULL END) AS 

[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-18 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592466#comment-14592466
 ] 

Laljo John Pullokkaran commented on HIVE-11028:
---

[~jdere] Test failures need to be addressed; otherwise looks good.

 Tez: table self join and join with another table fails with 
 IndexOutOfBoundsException
 -

 Key: HIVE-11028
 URL: https://issues.apache.org/jira/browse/HIVE-11028
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, 
 HIVE-11028.3.patch


 {noformat}
 create table tez_self_join1(id1 int, id2 string, id3 string);
 insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), 
 (3,'ba','ba');
 create table tez_self_join2(id1 int);
 insert into table tez_self_join2 values(1),(2),(3);
 explain
 select s.id2, s.id3
 from
 (
  select self1.id1, self1.id2, self1.id3
  from tez_self_join1 self1 join tez_self_join1 self2
  on self1.id2=self2.id3 ) s
 join tez_self_join2
 on s.id1=tez_self_join2.id1
 where s.id2='ab';
 {noformat}
 fails with error:
 {noformat}
 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver 
 (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
 vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, 
 diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 
 0, Size: 0
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
 at 
 org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71)
 at 
 org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
 ... 13 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11044) Some optimizable predicates being missed by constant propagation

2015-06-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11044:
--
Attachment: HIVE-11044.1.patch

Initial patch, running ConstantPropagate one additional time after 
PartitionPruner during Optimizer.initialize().

The qfile updates show removal of unnecessary predicates, either (constant = 
constant), or (column is not null) when there are additional predicates on the 
column, along with updated stats due to the removal of the predicates.
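
For illustration only (hypothetical table and constant, not taken from the 
actual qfiles), the kind of simplification this pass enables looks like:

{code:sql}
-- Sketch: p is a hypothetical table partitioned by a string column ds.
-- After the PartitionPruner substitutes the partition value, the predicate
-- on ds degenerates into a constant comparison such as (12.0 = 12.0);
-- an extra ConstantPropagate pass can then fold it away, leaving only
-- the predicate that actually filters rows.
EXPLAIN
SELECT key
FROM p
WHERE ds = 12 AND key > 10;
-- before the extra pass (sketch): predicate ((12.0 = 12.0) and (key > 10))
-- after the extra pass  (sketch): predicate (key > 10)
{code}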

Will need to update this patch for test explainuser_2.q, once HIVE-11028 is 
committed.

 Some optimizable predicates being missed by constant propagation
 

 Key: HIVE-11044
 URL: https://issues.apache.org/jira/browse/HIVE-11044
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11044.1.patch


 Some of the qfile explain plans show some predicates that could be taken care 
 of by running ConstantPropagate after the PartitionPruner:
 index_auto_unused.q:
 {noformat}
 filterExpr: ((12.0 = 12.0) and (UDFToDouble(key)  10.0)) (type: boolean)
 {noformat}
 join28.q:
 {noformat}
 predicate: ((11.0 = 11.0) and key is not null) (type: boolean)
 {noformat}
 bucketsort_optimize_insert_7.q (is not null is unnecessary)
 {noformat}
 predicate: (((key < 8) and key is not null) and ((key = 0) or (key = 5))) 
 (type: boolean)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592475#comment-14592475
 ] 

Jason Dere commented on HIVE-11028:
---

Would like to add this to branch-1.2

 Tez: table self join and join with another table fails with 
 IndexOutOfBoundsException
 -

 Key: HIVE-11028
 URL: https://issues.apache.org/jira/browse/HIVE-11028
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, 
 HIVE-11028.3.patch


 {noformat}
 create table tez_self_join1(id1 int, id2 string, id3 string);
 insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), 
 (3,'ba','ba');
 create table tez_self_join2(id1 int);
 insert into table tez_self_join2 values(1),(2),(3);
 explain
 select s.id2, s.id3
 from
 (
  select self1.id1, self1.id2, self1.id3
  from tez_self_join1 self1 join tez_self_join1 self2
  on self1.id2=self2.id3 ) s
 join tez_self_join2
 on s.id1=tez_self_join2.id1
 where s.id2='ab';
 {noformat}
 fails with error:
 {noformat}
 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver 
 (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
 vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, 
 diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 
 0, Size: 0
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
 at 
 org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71)
 at 
 org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
 ... 13 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11042) Need fix Utilities.replaceTaskId method

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592528#comment-14592528
 ] 

Hive QA commented on HIVE-11042:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740423/HIVE-11042.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9011 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4309/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4309/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4309/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740423 - PreCommit-HIVE-TRUNK-Build

 Need fix Utilities.replaceTaskId method
 ---

 Key: HIVE-11042
 URL: https://issues.apache.org/jira/browse/HIVE-11042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-11042.1.patch


 While looking at another bug, I found that the Utilities.replaceTaskId(String, 
 int) method is not right.
 For example, 
 Utilities.replaceTaskId("(ds%3D1)01", 5) 
 returns "5".
 It should return "(ds%3D1)05".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2

2015-06-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592540#comment-14592540
 ] 

Sushanth Sowmyan commented on HIVE-11023:
-

[~xuefuz] : Yes, this will happen to all releases with directSql - which means 
anything past 0.12, I think. That said, the number of installations that 
override this parameter should hopefully be minimal.

If people are using datanucleus2 as their identifierFactory version, then:
a) They should disable directSql (can be done by conf parameter, does not need 
this code fix - the code fix simply automates that for current and future 
releases)
b) They should retain that identifierFactory - a mixed metastore with both is 
bad.
c) Once we have a way of migrating them (HIVE-11039 has been filed for that), we 
should migrate them out of it.

This parameter should never have been a part of hive-site.xml, I think, since 
it's dangerous if a user changes it. A datanucleus1 installation changing the 
parameter to datanucleus2 or vice-versa can result in metadata corruption for 
us.


 Disable directSQL if datanucleus.identifierFactory = datanucleus2
 -

 Key: HIVE-11023
 URL: https://issues.apache.org/jira/browse/HIVE-11023
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical
 Fix For: 1.2.1

 Attachments: HIVE-11023.patch


 We hit an interesting bug in a case where datanucleus.identifierFactory = 
 datanucleus2 .
 The problem is that directSql hand-generates SQL strings assuming the 
 datanucleus1 naming scheme. If a user has their metastore JDO managed by 
 datanucleus.identifierFactory = datanucleus2, the SQL strings we generate 
 are incorrect.
 One simple example of what this results in is the following: whenever DN 
 persists a field which is held as a List<T>, it winds up storing each T as a 
 separate line in the appropriate mapping table, and has a column called 
 INTEGER_IDX, which holds the position in the list. Then, upon reading, it 
 automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which 
 results in the list retaining its order. In DN2 naming scheme, the column is 
 called IDX, instead of INTEGER_IDX. If the user has run appropriate metatool 
 upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX 
 and IDX.
 Whenever they use JDO, such as with all writes, it will then use the IDX 
 field, and when they do any sort of optimized reads, such as through 
 directSQL, it will ORDER BY INTEGER_IDX.
 An immediate danger is seen when we consider that the schema of a table is 
 stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX 
 will contain 0,0,0,0,... and thus, any attempt to describe the table or fetch 
 schema for the table can come up mixed up in the table's native hashing 
 order, rather than sorted by the index.
 This can then result in schema ordering being different from the actual 
 table. For example, if a user has a table (a:int,b:string,c:string), a describe 
 on this may return (c:string, a:int, b:string), and thus, queries which are 
 inserting after selecting from another table can have ClassCastExceptions 
 when trying to insert data in the wrong order - this is how we discovered this 
 bug. This problem, however, can be far worse if there are no type problems - 
 it is possible, for example, that if a, b, and c were all strings, the insert 
 query would succeed but mix up the order, which then results in user table 
 data being mixed up. This has the potential to be very bad.
 We should write a tool to help convert metastores that use datanucleus2 to 
 datanucleus1 (more difficult, needs more one-time testing) or change 
 directSql to support both (easier to code, but increases the test-coverage matrix 
 significantly and we should really then be testing against both schemes). But 
 in the short term, we should disable directSql if we see that the 
 identifierfactory is datanucleus2
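
As a rough sketch of the read-side mismatch described above (table and column 
names are representative of the metastore mapping tables, not an exact schema):

{code:sql}
-- Under the datanucleus2 naming scheme, JDO writes list positions into IDX,
-- so the INTEGER_IDX column left over from upgrade scripts stays at its
-- default value for new rows. directSQL still orders by INTEGER_IDX, so the
-- List<FieldSchema> comes back in an effectively arbitrary order.
SELECT COLUMN_NAME, TYPE_NAME
FROM COLUMNS_V2              -- representative mapping table for a List field
WHERE CD_ID = 42             -- hypothetical column-descriptor id
ORDER BY INTEGER_IDX;        -- all zeros here; the order-preserving column is IDX
{code}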



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced with Job.newInstance() Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader

2015-06-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10754:

Summary: new Job() is deprecated. Replaced with Job.newInstance() 
Pig+Hcatalog doesn't work properly since we need to clone the Job instance in 
HCatLoader  (was: Pig+Hcatalog doesn't work properly since we need to clone the 
Job instance in HCatLoader)

 new Job() is deprecated. Replaced with Job.newInstance() Pig+Hcatalog doesn't 
 work properly since we need to clone the Job instance in HCatLoader
 -

 Key: HIVE-10754
 URL: https://issues.apache.org/jira/browse/HIVE-10754
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10754.patch


 {noformat}
 Create table tbl1 (key string, value string) stored as rcfile;
 Create table tbl2 (key string, value string);
 insert into tbl1 values( '1', '111');
 insert into tbl2 values('1', '2');
 {noformat}
 Pig script:
 {noformat}
 src_tbl1 = FILTER tbl1 BY (key == '1');
 prj_tbl1 = FOREACH src_tbl1 GENERATE
key as tbl1_key,
value as tbl1_value,
'333' as tbl1_v1;

 src_tbl2 = FILTER tbl2 BY (key == '1');
 prj_tbl2 = FOREACH src_tbl2 GENERATE
key as tbl2_key,
value as tbl2_value;

 dump prj_tbl1;
 dump prj_tbl2;
 result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
 prj_result = FOREACH result 
   GENERATE  prj_tbl1::tbl1_key AS key1,
 prj_tbl1::tbl1_value AS value1,
 prj_tbl1::tbl1_v1 AS v1,
 prj_tbl2::tbl2_key AS key2,
 prj_tbl2::tbl2_value AS value2;

 dump prj_result;
 {noformat}
 The expected result is (1,111,333,1,2) while the actual result is (1,2,333,1,2). We 
 need to clone the job instance in HCatLoader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11045) ArrayIndexOutOfBoundsException with Hive 1.2.0 and Tez 0.7.0

2015-06-18 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592453#comment-14592453
 ] 

Vikram Dixit K commented on HIVE-11045:
---

[~raj_velu] Can you provide some more information here to help debug this 
issue? Can you share the query and, if possible, a sample data set so that I can 
repro this issue? Also, any configuration settings used would be helpful.

Thanks
Vikram.

 ArrayIndexOutOfBoundsException with Hive 1.2.0 and Tez 0.7.0
 

 Key: HIVE-11045
 URL: https://issues.apache.org/jira/browse/HIVE-11045
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
 Environment: Hive 1.2.0, HDP 2.2, Hadoop 2.6, Tez 0.7.0
Reporter: Soundararajan Velu

  TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) 
 {key:{_col0:4457890},value:{_col0:null,_col1:null,_col2:null,_col3:null,_col4:null,_col5:null,_col6:null,_col7:null,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null,_col14:null,_col15:null,_col16:null,_col17:fkl_shipping_b2c,_col18:null,_col19:null,_col20:null,_col21:null}}
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) 
 {key:{_col0:4457890},value:{_col0:null,_col1:null,_col2:null,_col3:null,_col4:null,_col5:null,_col6:null,_col7:null,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null,_col14:null,_col15:null,_col16:null,_col17:fkl_shipping_b2c,_col18:null,_col19:null,_col20:null,_col21:null}}
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302)
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
 ... 14 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row (tag=0) 
 {key:{_col0:4457890},value:{_col0:null,_col1:null,_col2:null,_col3:null,_col4:null,_col5:null,_col6:null,_col7:null,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null,_col14:null,_col15:null,_col16:null,_col17:fkl_shipping_b2c,_col18:null,_col19:null,_col20:null,_col21:null}}
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
 at 
 org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row (tag=1) 
 {key:{_col0:6417306,_col1:{0:{_col0:2014-08-01 
 02:14:02}}},value:{_col0:2014-08-01 
 02:14:02,_col1:20140801,_col2:sc_jarvis_b2c,_col3:action_override,_col4:WITHIN_GRACE_PERIOD,_col5:policy_override}}
 at 
 org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:413)
 at 
 org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381)
 at 
 

[jira] [Commented] (HIVE-11046) Filesystem Closed Exception

2015-06-18 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592478#comment-14592478
 ] 

Siddharth Seth commented on HIVE-11046:
---

[~raj_velu] - a bunch of questions. 
Do you have additional logs from the container where this error was seen? Also, 
are there any steps to reproduce, and how often are you able to reproduce this?
Is this using the Tez 0.7.0 release or a snapshot?


 Filesystem Closed Exception
 ---

 Key: HIVE-11046
 URL: https://issues.apache.org/jira/browse/HIVE-11046
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 0.7.0, 1.2.0
 Environment: Hive 1.2.0, Tez 0.7.0, HDP 2.2, Hadoop 2.6
Reporter: Soundararajan Velu

  TaskAttempt 2 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
 Filesystem closed
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
 at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.io.IOException: Filesystem closed
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
 ... 14 more
 Caused by: java.io.IOException: Filesystem closed
 at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:795)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:629)
 at java.io.FilterInputStream.close(FilterInputStream.java:181)
 at 
 org.apache.hadoop.io.compress.DecompressorStream.close(DecompressorStream.java:205)
 at org.apache.hadoop.util.LineReader.close(LineReader.java:150)
 at 
 org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:282)
 at 
 org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:104)
 at 
 org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:170)
 at 
 org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:138)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
 ... 16 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10746) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat

2015-06-18 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10746:

Description: 
The following query: 
{code:sql}
SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP 
BY appl_user_id,arsn_cd ORDER BY appl_user_id;
{code}
 runs consistently fast in Spark and MapReduce on Hive 1.2.0. When attempting 
to run this same query against Tez as the execution engine, it consistently 
runs for over 300-500 seconds, which seems extremely long. This is a basic 
external table delimited by tabs, stored as a single file in a folder. In Hive 
0.13 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and 
now Hive 1.2.0, and there clearly is something going awry with Hive w/Tez as 
an execution engine on single-file or small-file tables. I can attach further 
logs if someone needs them for deeper analysis.

HDFS Output:
{noformat}
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
/example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers3883880 2015-05-17 20:03 
/example_dw/crc/arsn/part-m-0
{noformat}

Hive Table Describe:
{noformat}
hive> describe formatted crc_arsn;
OK
# col_name  data_type   comment 
 
arsn_cd string  
clmlvl_cd   string  
arclss_cd   string  
arclssg_cd  string  
arsn_prcsr_rmk_ind  string  
arsn_mbr_rspns_ind  string  
savtyp_cd   string  
arsn_eff_dt string  
arsn_exp_dt string  
arsn_pstd_dts   string  
arsn_lstupd_dts string  
arsn_updrsn_txt string  
appl_user_idstring  
arsntyp_cd  string  
pre_d_indicator string  
arsn_display_txtstring  
arstat_cd   string  
arsn_tracking_nostring  
arsn_cstspcfc_ind   string  
arsn_mstr_rcrd_ind  string  
state_specific_ind  string  
region_specific_in  string  
arsn_dpndnt_cd  string  
unit_adjustment_in  string  
arsn_mbr_only_ind   string  
arsn_qrmb_ind   string  
 
# Detailed Table Information 
Database:   adw  
Owner:  loadu...@exa.example.com   
CreateTime: Mon Apr 28 13:28:05 EDT 2014 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn   
 
Table Type: EXTERNAL_TABLE   
Table Parameters:
EXTERNALTRUE
transient_lastDdlTime   1398706085  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
field.delim \t  
line.delim  \n  
serialization.format\t  
Time taken: 1.245 seconds, Fetched: 54 row(s)
{noformat}


Explain Hive 1.2.0 w/Tez:
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)


Explain Hive 0.13 w/Tez:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
Tez
  

[jira] [Commented] (HIVE-10978) Document fs.trash.interval wrt Hive and HDFS Encryption

2015-06-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592493#comment-14592493
 ] 

Lefty Leverenz commented on HIVE-10978:
---

[~eugene.koifman], the only encryption doc in the Hive wiki is this section 
(plus Configuration Properties):

* [Setting Up HiveServer2 -- SSL Encryption | 
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-SSLEncryption]
 

 Document fs.trash.interval wrt Hive and HDFS Encryption
 ---

 Key: HIVE-10978
 URL: https://issues.apache.org/jira/browse/HIVE-10978
 Project: Hive
  Issue Type: Bug
  Components: Documentation, Security
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Priority: Critical
  Labels: TODOC1.2

 This should be documented in 1.2.1 Release Notes
 When HDFS is encrypted (TDE is enabled), DROP TABLE and DROP PARTITION have 
 unexpected behavior when the Hadoop Trash feature is enabled.
 The latter is enabled by setting fs.trash.interval > 0 in core-site.xml.
 When Trash is enabled, the data file for the table should be moved to the 
 Trash bin. If the table is inside an Encryption Zone, this move operation 
 is not allowed.
 There are 2 ways to deal with this:
 1. Use PURGE, as in DROP TABLE blah PURGE. This skips the Trash bin even if 
 enabled.
 2. Set fs.trash.interval = 0. It is critical that this config change is done 
 in core-site.xml. Setting it in hive-site.xml may lead to very strange 
 behavior where the table metadata is deleted but the data file remains. This 
 will lead to data corruption if a table with the same name is later created.
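 For illustration, a minimal sketch (assuming hadoop-common on the classpath and 
 that core-site.xml is visible; the class name is made up) that reads the same 
 fs.trash.interval setting programmatically:
 {code:java}
import org.apache.hadoop.conf.Configuration;

public class TrashIntervalCheck {
  public static void main(String[] args) {
    // Picks up core-site.xml from the classpath, if present.
    Configuration conf = new Configuration();
    // fs.trash.interval is in minutes; a value > 0 means the Trash feature is on.
    long trashIntervalMinutes = conf.getLong("fs.trash.interval", 0L);
    if (trashIntervalMinutes > 0) {
      System.out.println("Trash enabled: inside an encryption zone use DROP TABLE ... PURGE");
    } else {
      System.out.println("Trash disabled: plain DROP TABLE will not attempt a move to Trash");
    }
  }
}
 {code}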



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog

2015-06-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10754:

Summary: new Job() is deprecated. Replaced all with Job.getInstance() for 
Hcatalog  (was: new Job() is deprecated. Replaced with Job.newInstance() 
Pig+Hcatalog doesn't work properly since we need to clone the Job instance in 
HCatLoader)

 new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
 -

 Key: HIVE-10754
 URL: https://issues.apache.org/jira/browse/HIVE-10754
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10754.patch


 {noformat}
 Create table tbl1 (key string, value string) stored as rcfile;
 Create table tbl2 (key string, value string);
 insert into tbl1 values( '1', '111');
 insert into tbl2 values('1', '2');
 {noformat}
 Pig script:
 {noformat}
 src_tbl1 = FILTER tbl1 BY (key == '1');
 prj_tbl1 = FOREACH src_tbl1 GENERATE
key as tbl1_key,
value as tbl1_value,
'333' as tbl1_v1;

 src_tbl2 = FILTER tbl2 BY (key == '1');
 prj_tbl2 = FOREACH src_tbl2 GENERATE
key as tbl2_key,
value as tbl2_value;

 dump prj_tbl1;
 dump prj_tbl2;
 result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
 prj_result = FOREACH result 
   GENERATE  prj_tbl1::tbl1_key AS key1,
 prj_tbl1::tbl1_value AS value1,
 prj_tbl1::tbl1_v1 AS v1,
 prj_tbl2::tbl2_key AS key2,
 prj_tbl2::tbl2_value AS value2;

 dump prj_result;
 {noformat}
 The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2).  We 
 need to clone the job instance in HCatLoader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592673#comment-14592673
 ] 

Hive QA commented on HIVE-11028:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740446/HIVE-11028.3.patch

{color:green}SUCCESS:{color} +1 9011 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4311/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4311/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4311/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740446 - PreCommit-HIVE-TRUNK-Build

 Tez: table self join and join with another table fails with 
 IndexOutOfBoundsException
 -

 Key: HIVE-11028
 URL: https://issues.apache.org/jira/browse/HIVE-11028
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, 
 HIVE-11028.3.patch


 {noformat}
 create table tez_self_join1(id1 int, id2 string, id3 string);
 insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), 
 (3,'ba','ba');
 create table tez_self_join2(id1 int);
 insert into table tez_self_join2 values(1),(2),(3);
 explain
 select s.id2, s.id3
 from
 (
  select self1.id1, self1.id2, self1.id3
  from tez_self_join1 self1 join tez_self_join1 self2
  on self1.id2=self2.id3 ) s
 join tez_self_join2
 on s.id1=tez_self_join2.id1
 where s.id2='ab';
 {noformat}
 fails with error:
 {noformat}
 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver 
 (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
 vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, 
 diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 
 0, Size: 0
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
 at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
 at 
 

[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592678#comment-14592678
 ] 

Hive QA commented on HIVE-10233:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740448/HIVE-10233.08.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4312/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4312/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4312/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4312/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at b98a30b HIVE-10746: Hive 1.2.0+Tez produces 1-byte FileSplits 
from mapred.TextInputFormat (Gopal V via Gunther H)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/tez_self_join.q
Removing ql/src/test/results/clientpositive/tez/tez_self_join.q.out
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at b98a30b HIVE-10746: Hive 1.2.0+Tez produces 1-byte FileSplits 
from mapred.TextInputFormat (Gopal V via Gunther H)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740448 - PreCommit-HIVE-TRUNK-Build

 Hive on tez: memory manager for grace hash join
 ---

 Key: HIVE-10233
 URL: https://issues.apache.org/jira/browse/HIVE-10233
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap, 2.0.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
 HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
 HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch


 We need a memory manager in llap/tez to manage the usage of memory across 
 threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6897) Allow overwrite/append to external Hive table (with partitions) via HCatStorer

2015-06-18 Thread Alen Frantz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592366#comment-14592366
 ] 

Alen Frantz commented on HIVE-6897:
---

I am facing the same issues. Since there is no overwrite feature in HCatalog, 
we need to do it outside Pig. 

The workaround right now is to delete the part files inside the table 
directory before executing your Pig script. But be careful here, as the rm -r 
command is not a good practice.

Many people are facing the same issue; having said that, we actually need to 
add these features to take HCatalog to a higher level. This would help the 
community.

I would really appreciate it if these features were added. 

Also, I would be glad to help with this in any way. Feel free to get in 
touch.

Alen

 Allow overwrite/append to external Hive table (with partitions) via HCatStorer
 --

 Key: HIVE-6897
 URL: https://issues.apache.org/jira/browse/HIVE-6897
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, HiveServer2
Affects Versions: 0.12.0
Reporter: Dip Kharod

 I'm using HCatStorer to write to an external Hive table with partitions from 
 Pig and have the following different use cases:
 1) Need to overwrite (aka refresh) data in the table: Currently I end up 
 doing this outside of Pig (drop partition and delete HDFS folder), which is 
 very painful and error-prone.
 2) Need to append (aka add a new file) data to the Hive external 
 table/partition: Again, I end up doing this outside of Pig by copying the 
 file into the appropriate folder.
 It would be very productive for developers to have both options in 
 HCatStorer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11047) Update versions of branch-1.2 to 1.2.1

2015-06-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan resolved HIVE-11047.
-
   Resolution: Fixed
Fix Version/s: 1.2.1

Committed to branch-1.2 only. Thanks, Thejas!

 Update versions of branch-1.2 to 1.2.1
 --

 Key: HIVE-11047
 URL: https://issues.apache.org/jira/browse/HIVE-11047
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Fix For: 1.2.1

 Attachments: HIVE-11047.2.patch, HIVE-11047.patch


 Need to update all pom.xml files in branch-1.2 to 1.2.1, and update 
 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java 
 to reflect that 1.2.1's schema is identical to 1.2.0.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog

2015-06-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10754:

Description: 
Replace all the deprecated new Job() with Job.getInstance() in HCatalog.


  was:
Replace all the deprecated new Job() with Job.getInstance().



 new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
 -

 Key: HIVE-10754
 URL: https://issues.apache.org/jira/browse/HIVE-10754
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10754.patch


 Replace all the deprecated new Job() with Job.getInstance() in HCatalog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog

2015-06-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10754:

Description: 
Replace all the deprecated new Job() with Job.getInstance().


  was:
Some older versions of new Job() seem not to be implemented properly, which 
causes the following issue:
{noformat}
Create table tbl1 (key string, value string) stored as rcfile;
Create table tbl2 (key string, value string);
insert into tbl1 values( '1', '111');
insert into tbl2 values('1', '2');
{noformat}

Pig script:
{noformat}
src_tbl1 = FILTER tbl1 BY (key == '1');
prj_tbl1 = FOREACH src_tbl1 GENERATE
   key as tbl1_key,
   value as tbl1_value,
   '333' as tbl1_v1;
   
src_tbl2 = FILTER tbl2 BY (key == '1');
prj_tbl2 = FOREACH src_tbl2 GENERATE
   key as tbl2_key,
   value as tbl2_value;
   
dump prj_tbl1;
dump prj_tbl2;
result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
prj_result = FOREACH result 
  GENERATE  prj_tbl1::tbl1_key AS key1,
prj_tbl1::tbl1_value AS value1,
prj_tbl1::tbl1_v1 AS v1,
prj_tbl2::tbl2_key AS key2,
prj_tbl2::tbl2_value AS value2;
   
dump prj_result;
{noformat}

The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2).  
Replace all the deprecated new Job() with Job.getInstance().



 new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
 -

 Key: HIVE-10754
 URL: https://issues.apache.org/jira/browse/HIVE-10754
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10754.patch


 Replace all the deprecated new Job() with Job.getInstance().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog

2015-06-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10754:

Description: 
Some older versions of new Job() seem not to be implemented properly, which 
causes the following issue:
{noformat}
Create table tbl1 (key string, value string) stored as rcfile;
Create table tbl2 (key string, value string);
insert into tbl1 values( '1', '111');
insert into tbl2 values('1', '2');
{noformat}

Pig script:
{noformat}
src_tbl1 = FILTER tbl1 BY (key == '1');
prj_tbl1 = FOREACH src_tbl1 GENERATE
   key as tbl1_key,
   value as tbl1_value,
   '333' as tbl1_v1;
   
src_tbl2 = FILTER tbl2 BY (key == '1');
prj_tbl2 = FOREACH src_tbl2 GENERATE
   key as tbl2_key,
   value as tbl2_value;
   
dump prj_tbl1;
dump prj_tbl2;
result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
prj_result = FOREACH result 
  GENERATE  prj_tbl1::tbl1_key AS key1,
prj_tbl1::tbl1_value AS value1,
prj_tbl1::tbl1_v1 AS v1,
prj_tbl2::tbl2_key AS key2,
prj_tbl2::tbl2_value AS value2;
   
dump prj_result;
{noformat}

The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2).  
Replace all the deprecated new Job() with Job.getInstance().


  was:
{noformat}
Create table tbl1 (key string, value string) stored as rcfile;
Create table tbl2 (key string, value string);
insert into tbl1 values( '1', '111');
insert into tbl2 values('1', '2');
{noformat}

Pig script:
{noformat}
src_tbl1 = FILTER tbl1 BY (key == '1');
prj_tbl1 = FOREACH src_tbl1 GENERATE
   key as tbl1_key,
   value as tbl1_value,
   '333' as tbl1_v1;
   
src_tbl2 = FILTER tbl2 BY (key == '1');
prj_tbl2 = FOREACH src_tbl2 GENERATE
   key as tbl2_key,
   value as tbl2_value;
   
dump prj_tbl1;
dump prj_tbl2;
result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
prj_result = FOREACH result 
  GENERATE  prj_tbl1::tbl1_key AS key1,
prj_tbl1::tbl1_value AS value1,
prj_tbl1::tbl1_v1 AS v1,
prj_tbl2::tbl2_key AS key2,
prj_tbl2::tbl2_value AS value2;
   
dump prj_result;
{noformat}

The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2).  We 
need to clone the job instance in HCatLoader.



 new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
 -

 Key: HIVE-10754
 URL: https://issues.apache.org/jira/browse/HIVE-10754
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10754.patch


 Some older versions of new Job() seem not to be implemented properly, which 
 causes the following issue:
 {noformat}
 Create table tbl1 (key string, value string) stored as rcfile;
 Create table tbl2 (key string, value string);
 insert into tbl1 values( '1', '111');
 insert into tbl2 values('1', '2');
 {noformat}
 Pig script:
 {noformat}
 src_tbl1 = FILTER tbl1 BY (key == '1');
 prj_tbl1 = FOREACH src_tbl1 GENERATE
key as tbl1_key,
value as tbl1_value,
'333' as tbl1_v1;

 src_tbl2 = FILTER tbl2 BY (key == '1');
 prj_tbl2 = FOREACH src_tbl2 GENERATE
key as tbl2_key,
value as tbl2_value;

 dump prj_tbl1;
 dump prj_tbl2;
 result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
 prj_result = FOREACH result 
   GENERATE  prj_tbl1::tbl1_key AS key1,
 prj_tbl1::tbl1_value AS value1,
 prj_tbl1::tbl1_v1 AS v1,
 prj_tbl2::tbl2_key AS key2,
 prj_tbl2::tbl2_value AS value2;

 dump prj_result;
 {noformat}
 The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2).  
 Replace all the deprecated new Job() with Job.getInstance().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11048) Make test cbo_windowing robust

2015-06-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11048:

Attachment: HIVE-11048.patch

 Make test cbo_windowing robust
 --

 Key: HIVE-11048
 URL: https://issues.apache.org/jira/browse/HIVE-11048
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-11048.patch


 Add partition / order by in over clause to make result set deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10746) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat

2015-06-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592604#comment-14592604
 ] 

Sushanth Sowmyan commented on HIVE-10746:
-

Please add to the release wiki ( 
https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status ) when 
you commit any patch to branch-1.2. I'll go ahead and add this one in.

  Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat
 --

 Key: HIVE-10746
 URL: https://issues.apache.org/jira/browse/HIVE-10746
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Fix For: 1.2.1

 Attachments: HIVE-10746.1.patch, HIVE-10746.2.patch, 
 slow_query_output.zip


 The following query: 
 {code:sql}
 SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP 
 BY appl_user_id,arsn_cd ORDER BY appl_user_id;
 {code}
  runs consistently fast in Spark and MapReduce on Hive 1.2.0. When attempting 
 to run this same query against Tez as the execution engine, it consistently 
 runs for over 300-500 seconds, which seems extremely long. This is a basic 
 external table delimited by tabs, stored as a single file in a folder. In 
 Hive 0.13 this query with Tez runs fast; I tested with Hive 0.14, 
 0.14.1/1.0.0 and now Hive 1.2.0, and there clearly is something going awry 
 with Hive w/Tez as an execution engine on single-file or small-file tables. 
 I can attach further logs if someone needs them for deeper analysis.
 HDFS Output:
 {noformat}
 hadoop fs -ls /example_dw/crc/arsn
 Found 2 items
 -rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
 /example_dw/crc/arsn/_SUCCESS
 -rwxr-x---   6 loaduser hadoopusers3883880 2015-05-17 20:03 
 /example_dw/crc/arsn/part-m-0
 {noformat}
 Hive Table Describe:
 {noformat}
 hive> describe formatted crc_arsn;
 OK
 # col_name  data_type   comment 
  
 arsn_cd string  
 clmlvl_cd   string  
 arclss_cd   string  
 arclssg_cd  string  
 arsn_prcsr_rmk_ind  string  
 arsn_mbr_rspns_ind  string  
 savtyp_cd   string  
 arsn_eff_dt string  
 arsn_exp_dt string  
 arsn_pstd_dts   string  
 arsn_lstupd_dts string  
 arsn_updrsn_txt string  
 appl_user_idstring  
 arsntyp_cd  string  
 pre_d_indicator string  
 arsn_display_txtstring  
 arstat_cd   string  
 arsn_tracking_nostring  
 arsn_cstspcfc_ind   string  
 arsn_mstr_rcrd_ind  string  
 state_specific_ind  string  
 region_specific_in  string  
 arsn_dpndnt_cd  string  
 unit_adjustment_in  string  
 arsn_mbr_only_ind   string  
 arsn_qrmb_ind   string  
  
 # Detailed Table Information 
 Database:   adw  
 Owner:  loadu...@exa.example.com   
 CreateTime: Mon Apr 28 13:28:05 EDT 2014 
 LastAccessTime: UNKNOWN  
 Protect Mode:   None 
 Retention:  0
 Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn 

 Table Type: EXTERNAL_TABLE   
 Table Parameters:
 EXTERNALTRUE
 transient_lastDdlTime   1398706085  
  
 # Storage Information
 SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

 InputFormat:org.apache.hadoop.mapred.TextInputFormat 
 OutputFormat:   
 

[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2

2015-06-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592608#comment-14592608
 ] 

Xuefu Zhang commented on HIVE-11023:


Makes sense. Thanks for the explanation.

 Disable directSQL if datanucleus.identifierFactory = datanucleus2
 -

 Key: HIVE-11023
 URL: https://issues.apache.org/jira/browse/HIVE-11023
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical
 Fix For: 1.2.1

 Attachments: HIVE-11023.patch


 We hit an interesting bug in a case where datanucleus.identifierFactory = 
 datanucleus2.
 The problem is that directSQL hand-generates SQL strings assuming the 
 datanucleus1 naming scheme. If a user has their metastore JDO managed by 
 datanucleus.identifierFactory = datanucleus2, the SQL strings we generate 
 are incorrect.
 One simple example of what this results in is the following: whenever DN 
 persists a field which is held as a List<T>, it winds up storing each T as a 
 separate row in the appropriate mapping table, with a column called 
 INTEGER_IDX which holds the position in the list. Then, upon reading, it 
 automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which 
 results in the list retaining its order. In the DN2 naming scheme, the column 
 is called IDX instead of INTEGER_IDX. If the user has run the appropriate 
 metatool upgrade scripts, it is highly likely that they have both columns, 
 INTEGER_IDX and IDX.
 Whenever they use JDO, such as with all writes, it will then use the IDX 
 field, and when they do any sort of optimized read, such as through 
 directSQL, it will ORDER BY INTEGER_IDX.
 An immediate danger is seen when we consider that the schema of a table is 
 stored as a List<FieldSchema>, and while IDX has 0,1,2,3,..., INTEGER_IDX 
 will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch 
 its schema can come back mixed up in the table's native hashing order, 
 rather than sorted by the index.
 This can then result in the schema ordering being different from the actual 
 table. For example, if a user has (a:int, b:string, c:string), a describe on 
 this may return (c:string, a:int, b:string), and thus queries which insert 
 after selecting from another table can hit ClassCastExceptions when trying 
 to insert data in the wrong order - this is how we discovered this bug. This 
 problem, however, can be far worse if there are no type problems - it is 
 possible, for example, that if a, b, and c were all strings, the insert query 
 would succeed but mix up the order, which then results in user table data 
 being mixed up. This has the potential to be very bad.
 We should write a tool to help convert metastores that use datanucleus2 to 
 datanucleus1 (more difficult, needs more one-time testing) or change 
 directSQL to support both (easier to code, but it increases the test-coverage 
 matrix significantly and we should really then be testing against both 
 schemes). But in the short term, we should disable directSQL if we see that 
 the identifierFactory is datanucleus2.
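 For illustration, a minimal sketch of that short-term guard (the class and helper 
 names are made up, and a plain Hadoop Configuration stands in for the actual 
 metastore configuration object):
 {code:java}
import org.apache.hadoop.conf.Configuration;

public class DirectSqlGuardSketch {
  // directSQL hand-generates SQL against the datanucleus1 naming scheme
  // (e.g. INTEGER_IDX), so it is unsafe when the JDO layer is configured
  // with the datanucleus2 identifier factory (which names the column IDX).
  static boolean isDirectSqlSafe(Configuration conf) {
    String factory = conf.get("datanucleus.identifierFactory", "datanucleus1");
    return !"datanucleus2".equalsIgnoreCase(factory);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("datanucleus.identifierFactory", "datanucleus2");
    System.out.println("use directSQL: " + isDirectSqlSafe(conf)); // false
  }
}
 {code}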



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-18 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592630#comment-14592630
 ] 

Laljo John Pullokkaran commented on HIVE-10996:
---

[~jcamachorodriguez] How could GB child Filter have a different schema than GB?

 Aggregation / Projection over Multi-Join Inner Query producing incorrect 
 results
 

 Key: HIVE-10996
 URL: https://issues.apache.org/jira/browse/HIVE-10996
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, 
 HIVE-10996.patch, explain_q1.txt, explain_q2.txt


 We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which seems 
 like a regression.
 The following query (Q1) produces no results:
 {code}
 select s
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 {code}
 While this one (Q2) does produce results :
 {code}
 select *
 from (
   select last.*, action.st2, action.n
   from (
 select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
 last_stage_timestamp
 from (select * from purchase_history) purchase
 join (select * from cart_history) mevt
 on purchase.s = mevt.s
 where purchase.timestamp > mevt.timestamp
 group by purchase.s, purchase.timestamp
   ) last
   join (select * from events) action
   on last.s = action.s and last.last_stage_timestamp = action.timestamp
 ) list;
 1 21  20  Bob 1234
 1 31  30  Bob 1234
 3 51  50  Jeff1234
 {code}
 The setup to test this is:
 {code}
 create table purchase_history (s string, product string, price double, 
 timestamp int);
 insert into purchase_history values ('1', 'Belt', 20.00, 21);
 insert into purchase_history values ('1', 'Socks', 3.50, 31);
 insert into purchase_history values ('3', 'Belt', 20.00, 51);
 insert into purchase_history values ('4', 'Shirt', 15.50, 59);
 create table cart_history (s string, cart_id int, timestamp int);
 insert into cart_history values ('1', 1, 10);
 insert into cart_history values ('1', 2, 20);
 insert into cart_history values ('1', 3, 30);
 insert into cart_history values ('1', 4, 40);
 insert into cart_history values ('3', 5, 50);
 insert into cart_history values ('4', 6, 60);
 create table events (s string, st2 string, n int, timestamp int);
 insert into events values ('1', 'Bob', 1234, 20);
 insert into events values ('1', 'Bob', 1234, 30);
 insert into events values ('1', 'Bob', 1234, 25);
 insert into events values ('2', 'Sam', 1234, 30);
 insert into events values ('3', 'Jeff', 1234, 50);
 insert into events values ('4', 'Ted', 1234, 60);
 {code}
 I realize select * and select s are not all that interesting in this context, 
 but what led us to this issue was that select count(distinct s) was not 
 returning results. The above queries are the simplified queries that produce 
 the issue. I will note that if I convert the inner join to a table and select 
 from that, the issue does not appear.
 Update: Found that turning off hive.optimize.remove.identity.project fixes 
 this issue. This optimization was introduced in 
 https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join

2015-06-18 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-10233:
-
Attachment: HIVE-10233.09.patch

Rebase and upload patch 09

 Hive on tez: memory manager for grace hash join
 ---

 Key: HIVE-10233
 URL: https://issues.apache.org/jira/browse/HIVE-10233
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap, 2.0.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
 HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
 HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, 
 HIVE-10233.09.patch


 We need a memory manager in llap/tez to manage the usage of memory across 
 threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]

2015-06-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592751#comment-14592751
 ] 

Xuefu Zhang commented on HIVE-10844:


[~chengxiang li], could you please provide a RB entry for this?

 Combine equivalent Works for HoS[Spark Branch]
 --

 Key: HIVE-10844
 URL: https://issues.apache.org/jira/browse/HIVE-10844
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-10844.1-spark.patch, HIVE-10844.2-spark.patch


 Some Hive queries (like [TPCDS 
 Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql])
 may share the same subquery, which is translated into separate but equivalent 
 Works in SparkWork; combining these equivalent Works into a single one would 
 help to benefit from the following dynamic RDD caching optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default

2015-06-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11037:
---
Attachment: HIVE-11037.01.patch

The temporary patch. We need to update the tez q files, too.

 HiveOnTez: make explain user level = true as default
 

 Key: HIVE-11037
 URL: https://issues.apache.org/jira/browse/HIVE-11037
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11037.01.patch


 In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would 
 like to make it run by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11035) PPD: Orc Split elimination fails because filterColumns=[-1]

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11035:
-
Attachment: HIVE-11035-branch-1.0.patch

 PPD: Orc Split elimination fails because filterColumns=[-1]
 ---

 Key: HIVE-11035
 URL: https://issues.apache.org/jira/browse/HIVE-11035
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11035-branch-1.0.patch, HIVE-11035.patch


 {code}
 create temporary table xx (x int) stored as orc ;
 insert into xx values (20),(200);
 set hive.fetch.task.conversion=none;
 select * from xx where x is null;
 {code}
 This should generate zero tasks after optional split elimination in the app 
 master, instead of generating the 1 task which for sure hits the row-index 
 filters and removes all rows anyway.
 Right now, this runs 1 task for the stripe containing (min=20, max=200, 
 has_null=false), which is broken.
 Instead, it returns YES_NO_NULL from the following default case
 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L976
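 For illustration, a minimal sketch of the intended stripe-level decision (all 
 type and method names here are made up; this is not the OrcInputFormat code): 
 when the stripe statistics record has_null=false, an "x is null" predicate can 
 be answered with a definite NO and the split dropped, instead of the catch-all 
 YES_NO_NULL that forces a task to be scheduled.
 {code:java}
public class NullPredicateEliminationSketch {
  // Hypothetical stand-in for the min/max/has_null statistics kept per stripe.
  static final class StripeStats {
    final long min;
    final long max;
    final boolean hasNull;
    StripeStats(long min, long max, boolean hasNull) {
      this.min = min;
      this.max = max;
      this.hasNull = hasNull;
    }
  }

  // Evaluate "column IS NULL" against stripe statistics: if the stripe recorded
  // has_null=false the predicate can never match, so answer NO and skip the split.
  static String evaluateIsNull(StripeStats stats) {
    return stats.hasNull ? "YES_NO_NULL" : "NO";
  }

  public static void main(String[] args) {
    StripeStats stats = new StripeStats(20, 200, false); // matches the example above
    System.out.println(evaluateIsNull(stats)); // NO -> zero tasks needed
  }
}
 {code}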



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591366#comment-14591366
 ] 

Hive QA commented on HIVE-11025:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740235/HIVE-11025.patch

{color:green}SUCCESS:{color} +1 9008 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4297/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4297/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4297/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740235 - PreCommit-HIVE-TRUNK-Build

 In windowing spec, when the datatype is decimal, it's comparing the value 
 against NULL value incorrectly
 

 Key: HIVE-11025
 URL: https://issues.apache.org/jira/browse/HIVE-11025
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Affects Versions: 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-11025.patch


 Given data and the following query,
 {noformat}
 deptno  empno  bonus  salary
 30      7698   NULL   2850.0
 30      7900   NULL   950.0
 30      7844   0      1500.0
 select avg(salary) over (partition by deptno order by bonus range 200 
 preceding) from emp2;
 {noformat}
 It produces an incorrect result for the row in which bonus=0:
 1900.0
 1900.0
 1766.7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10952) Describe a non-partitioned table fails

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591368#comment-14591368
 ] 

Hive QA commented on HIVE-10952:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740238/HIVE-10952.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4298/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4298/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4298/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4298/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   9692e89..fb86cef  branch-1.0 -> origin/branch-1.0
   11a0901..2e1bee8  branch-1.1 -> origin/branch-1.1
   d1eaa37..749bbfc  branch-1.2 -> origin/branch-1.2
   524cd79..9a511eb  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 524cd79 HIVE-11023 : Disable directSQL if 
datanucleus.identifierFactory = datanucleus2 (Sushanth Sowmyan, reviewed by 
Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 9a511eb HIVE-11035: PPD: Orc Split elimination fails because 
filterColumns=[-1] (Prasanth Jayachandran reviewed by Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740238 - PreCommit-HIVE-TRUNK-Build

 Describe a non-partitioned table fails
 -

 Key: HIVE-10952
 URL: https://issues.apache.org/jira/browse/HIVE-10952
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Alan Gates
 Fix For: hbase-metastore-branch

 Attachments: HIVE-10952-1.patch, HIVE-10952.patch


 This section of alter1.q fails:
 create table alter1(a int, b int);
 describe extended alter1;
 Exception:
 {code}
 Trying to fetch a non-existent storage descriptor from hash 
 iNVRGkfwwQDGK9oX0fo9XA==^M
 at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer$QualifiedNameUtil.getAttemptTableName(DDLSemanticAnalyzer.java:1765)
 at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer$QualifiedNameUtil.getTableName(DDLSemanticAnalyzer.java:1807)
 at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDescribeTable(DDLSemanticAnalyzer.java:1985)
 at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:318)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
 at 

[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11031:
-
Attachment: HIVE-11031.3.patch

Addressed minor nit.

 ORC concatenation of old files can fail while merging column statistics
 ---

 Key: HIVE-11031
 URL: https://issues.apache.org/jira/browse/HIVE-11031
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical
 Attachments: HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.patch


 Column statistics in ORC are optional protobuf fields. Old ORC files might 
 not have statistics for newly added types like decimal, date, timestamp, etc. 
 But column statistics merging assumes column statistics exist for these 
 types and invokes the merge. For example, merging of TimestampColumnStatistics 
 directly casts the received ColumnStatistics object without doing an 
 instanceof check. If the ORC file contains timestamp column statistics then 
 this will work; otherwise it will throw a ClassCastException.
 Also, the file merge operator swallows the exception.
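 For illustration, a minimal sketch of the kind of instanceof guard implied above 
 (assuming Hive's org.apache.hadoop.hive.ql.io.orc statistics interfaces are on 
 the classpath; the class and method names are made up and this is not the actual 
 merge code):
 {code:java}
import java.sql.Timestamp;

import org.apache.hadoop.hive.ql.io.orc.ColumnStatistics;
import org.apache.hadoop.hive.ql.io.orc.TimestampColumnStatistics;

public class SafeTimestampStatsMerge {
  // Old ORC files may lack timestamp statistics entirely, so check the runtime
  // type before casting; a blind cast is what throws the ClassCastException.
  static void mergeTimestampStats(ColumnStatistics other) {
    if (other instanceof TimestampColumnStatistics) {
      TimestampColumnStatistics ts = (TimestampColumnStatistics) other;
      Timestamp min = ts.getMinimum();
      Timestamp max = ts.getMaximum();
      // ... merge min/max into the accumulated statistics here ...
    } else {
      // Statistics are missing in the old file: skip the merge instead of failing.
    }
  }
}
 {code}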



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10685) Alter table concatenate operator will cause duplicate data

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10685:
-
Fix Version/s: (was: 1.1.0)
   (was: 1.2.0)
   2.0.0
   1.2.1
   1.1.1
   1.0.1

 Alter table concatenate operator will cause duplicate data
 --

 Key: HIVE-10685
 URL: https://issues.apache.org/jira/browse/HIVE-10685
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1
Reporter: guoliming
Assignee: guoliming
Priority: Critical
 Fix For: 1.0.1, 1.1.1, 1.2.1, 2.0.0

 Attachments: HIVE-10685.patch


 Orders table has 15 rows and is stored as ORC. 
 {noformat}
 hive> select count(*) from orders;
 OK
 15
 Time taken: 37.692 seconds, Fetched: 1 row(s)
 {noformat}
 The table contains 14 files; the size of each file is about 2.1 ~ 3.2 GB.
 After executing the command ALTER TABLE orders CONCATENATE,
 the table already has 1530115000 rows.
 My Hive version is 1.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11040) Change Derby dependency version to 10.10.2.0

2015-06-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591594#comment-14591594
 ] 

Hive QA commented on HIVE-11040:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12740263/HIVE-11040.1.patch

{color:green}SUCCESS:{color} +1 9009 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4300/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4300/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4300/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12740263 - PreCommit-HIVE-TRUNK-Build

 Change Derby dependency version to 10.10.2.0
 

 Key: HIVE-11040
 URL: https://issues.apache.org/jira/browse/HIVE-11040
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-11040.1.patch


 We don't see this on the Apache pre-commit tests because it uses PTest, but 
 running the entire TestCliDriver suite results in failures in some of the 
 partition-related qtests (partition_coltype_literals, partition_date, 
 partition_date2). I've only really seen this on Linux (I was using CentOS).
 HIVE-8879 changed the Derby dependency version from 10.10.1.1 to 10.11.1.1. 
 Testing with 10.10.1.1 or 10.10.2.0 seems to allow the partition-related 
 tests to pass. I'd like to change the dependency version to 10.10.2.0, since 
 that version should also contain the Derby fix that motivated HIVE-8879.
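 For illustration only, the change amounts to bumping the pinned Derby version 
 in the build. The snippet below assumes the usual property-based layout of the 
 root pom and is not a verbatim excerpt of HIVE-11040.1.patch:
{code:xml}
<!-- Assumed layout: a derby.version property referenced by the dependency. -->
<properties>
  <derby.version>10.10.2.0</derby.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.derby</groupId>
      <artifactId>derby</artifactId>
      <version>${derby.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}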



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1

2015-06-18 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-9511:
---
Attachment: HIVE-9511.4.patch

 Switch Tez to 0.6.1
 ---

 Key: HIVE-9511
 URL: https://issues.apache.org/jira/browse/HIVE-9511
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Reporter: Damien Carol
Assignee: Damien Carol
 Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, 
 HIVE-9511.4.patch, HIVE-9511.patch.txt


 Tez 0.6.1 has been released.
 Investigate switching to version 0.6.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1

2015-06-18 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-9511:
---
Attachment: (was: HIVE-9511.4.patch.txt)

 Switch Tez to 0.6.1
 ---

 Key: HIVE-9511
 URL: https://issues.apache.org/jira/browse/HIVE-9511
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Reporter: Damien Carol
Assignee: Damien Carol
 Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, 
 HIVE-9511.4.patch, HIVE-9511.patch.txt


 Tez 0.6.1 has been released.
 Investigate switching to version 0.6.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader

2015-06-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11033:
-
Attachment: HIVE-11033.2.patch

 BloomFilter index is not honored by ORC reader
 --

 Key: HIVE-11033
 URL: https://issues.apache.org/jira/browse/HIVE-11033
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Allan Yan
Assignee: Prasanth Jayachandran
 Attachments: HIVE-11033.2.patch, HIVE-11033.patch


 There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class 
 which causes the bloom filter index saved in the ORC file to never be used. The 
 root cause is that the bloomFilterIndices variable defined in the SargApplier 
 class supersedes the one defined in the enclosing RecordReaderImpl class. 
 Therefore, in RecordReaderImpl.pickRowGroups():
 {code}
   protected boolean[] pickRowGroups() throws IOException {
 // if we don't have a sarg or indexes, we read everything
 if (sargApp == null) {
   return null;
 }
 readRowIndex(currentStripe, included, sargApp.sargColumns);
 return sargApp.pickRowGroups(stripes.get(currentStripe), indexes);
   }
 {code}
 The bloomFilterIndices populated by readRowIndex() are not picked up by the 
 sargApp object. One solution is to make SargApplier.bloomFilterIndices a 
 reference to its counterpart in the enclosing class, as in the diff below 
 (a reduced sketch of the problem follows the diff).
 {noformat}
 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original
 174d173
 <       bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
 178c177
 <           sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices);
 ---
 >           sarg, options.getColumnNames(), strideRate, types, included.length);
 204a204
 >       bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
 673c673
 <                        List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) {
 ---
 >                        List<OrcProto.Type> types, int includedCount) {
 677c677
 <       this.bloomFilterIndices = bloomFilterIndices;
 ---
 >       bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
 {noformat}
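 A reduced sketch of the shadowing and of the fix (Outer and Helper are 
 simplified, hypothetical stand-ins for RecordReaderImpl and SargApplier, and 
 String[] stands in for OrcProto.BloomFilterIndex[]; illustrative only, not 
 the patched Hive code):
{code}
class Outer {
  String[] bloomFilterIndices = new String[4];

  // Buggy wiring: the helper allocates its own array and never sees the
  // indices the outer reader loads.
  Helper buggyHelper = new Helper();

  // Fixed wiring (what the diff above does): hand the helper a reference to
  // the outer array so both sides observe the same data.
  Helper fixedHelper = new Helper(bloomFilterIndices);

  void readRowIndex() {
    bloomFilterIndices[0] = "bloom filter for column 0";
  }

  static class Helper {
    final String[] bloomFilterIndices;

    Helper() { this.bloomFilterIndices = new String[4]; }        // shadows the outer data
    Helper(String[] shared) { this.bloomFilterIndices = shared; } // shares the reference

    boolean hasFilters() { return bloomFilterIndices[0] != null; }
  }

  public static void main(String[] args) {
    Outer o = new Outer();
    o.readRowIndex();
    System.out.println("buggy helper sees filters: " + o.buggyHelper.hasFilters()); // false
    System.out.println("fixed helper sees filters: " + o.fixedHelper.hasFilters()); // true
  }
}
{code}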



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

