[jira] [Commented] (HIVE-11041) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591662#comment-14591662 ] Jesus Camacho Rodriguez commented on HIVE-11041: [~hsubramaniyan], could you take a look? This patch contains the missing pieces from HIVE-10684/HIVE-10705 for 1.2. Thanks. Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11041 URL: https://issues.apache.org/jira/browse/HIVE-11041 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11041.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591767#comment-14591767 ] Hive QA commented on HIVE-11028: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740285/HIVE-11028.2.patch {color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 9010 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_eq_with_case_when org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_when org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join32_lessSize org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join33 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_subquery 
{noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4302/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4302/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4302/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 21 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12740285 - PreCommit-HIVE-TRUNK-Build Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch {noformat} create table tez_self_join1(id1 int, id2 string, id3 string); insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba'); create table tez_self_join2(id1 int); insert into table tez_self_join2 values(1),(2),(3); explain select s.id2, s.id3 from ( select self1.id1, self1.id2, self1.id3 from tez_self_join1 self1 join tez_self_join1 self2 on self1.id2=self2.id3 ) s join tez_self_join2 on s.id1=tez_self_join2.id1 where s.id2='ab'; {noformat} fails with error: {noformat} 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at
[jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591684#comment-14591684 ] Hive QA commented on HIVE-11031: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740329/HIVE-11031.4.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9010 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4301/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4301/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4301/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12740329 - PreCommit-HIVE-TRUNK-Build ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp etc. But column statistics merging assumes column statistics exists for these types and invokes merge. 
For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing instanceof check. If the ORC file contains time stamp column statistics then this will work else it will throw ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
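The failure mode described above can be illustrated with a minimal Java sketch. The types below (`ColumnStats`, `GenericStats`, `TimestampStats`) are hypothetical stand-ins rather than the real ORC classes, and the guarded merge is only one possible way to avoid the ClassCastException; it is not taken from the attached patches.

```java
// Hypothetical stand-ins for ORC column statistics; NOT the real Hive/ORC classes.
interface ColumnStats {}

// What an old file might yield for a column with no type-specific statistics.
final class GenericStats implements ColumnStats {}

final class TimestampStats implements ColumnStats {
    Long min; // null means "no value recorded"
    Long max;

    TimestampStats(Long min, Long max) {
        this.min = min;
        this.max = max;
    }

    // Unsafe variant: casts without checking, as the issue describes.
    void mergeUnchecked(ColumnStats other) {
        TimestampStats ts = (TimestampStats) other; // throws ClassCastException for GenericStats
        mergeRange(ts);
    }

    // Guarded variant: reports whether the merge could be applied instead of throwing.
    boolean mergeChecked(ColumnStats other) {
        if (!(other instanceof TimestampStats)) {
            return false;
        }
        mergeRange((TimestampStats) other);
        return true;
    }

    private void mergeRange(TimestampStats ts) {
        if (ts.min != null && (min == null || ts.min < min)) min = ts.min;
        if (ts.max != null && (max == null || ts.max > max)) max = ts.max;
    }
}

final class StatsMergeDemo {
    public static void main(String[] args) {
        TimestampStats stats = new TimestampStats(10L, 20L);
        System.out.println(stats.mergeChecked(new TimestampStats(5L, 15L)));
        System.out.println(stats.mergeChecked(new GenericStats()));
    }
}
```

Returning a boolean keeps the caller aware that statistics were missing, which also speaks to the concern that the file merge operator swallows the exception.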
[jira] [Resolved] (HIVE-9388) HiveServer2 fails to reconnect to MetaStore after MetaStore restart
[ https://issues.apache.org/jira/browse/HIVE-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariusz Strzelecki resolved HIVE-9388. -- Resolution: Duplicate HIVE-10384 HiveServer2 fails to reconnect to MetaStore after MetaStore restart --- Key: HIVE-9388 URL: https://issues.apache.org/jira/browse/HIVE-9388 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0, 0.14.0, 0.13.1, 1.0.0 Reporter: Piotr Ackermann Attachments: HIVE-9388.2.patch, HIVE-9388.patch How to reproduce: # Use Hue to connect to HiveServer2 # Restart Metastore # Try to execute any query in Hue HiveServer2 report error: {quote} ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException null org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:355) at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:432) at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414) at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587) at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90) at com.sun.proxy.$Proxy10.getDatabases(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1681) at com.sun.proxy.$Proxy10.getDatabases(Unknown Source) at org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62) at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715) at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:438) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60) at com.sun.proxy.$Proxy19.getSchemas(Unknown Source) at org.apache.hive.service.cli.CLIService.getSchemas(CLIService.java:277) at org.apache.hive.service.cli.thrift.ThriftCLIService.GetSchemas(ThriftCLIService.java:436) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1433) at org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas.getResult(TCLIService.java:1418) at
[jira] [Updated] (HIVE-11041) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11041: --- Attachment: HIVE-11041.patch Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11041 URL: https://issues.apache.org/jira/browse/HIVE-11041 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11041.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11041) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11041: --- Component/s: Tests Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11041 URL: https://issues.apache.org/jira/browse/HIVE-11041 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11041.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10746) Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from TextInputFormat
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10746: --- Attachment: HIVE-10746.2.patch Test failures look unrelated. Reformat before commit. Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from TextInputFormat -- Key: HIVE-10746 URL: https://issues.apache.org/jira/browse/HIVE-10746 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1 Reporter: Greg Senia Assignee: Gopal V Priority: Critical Attachments: HIVE-10746.1.patch, HIVE-10746.2.patch, slow_query_output.zip The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to run this same query against Tez as the execution engine it consistently runs for over 300-500 seconds this seems extremely long. This is a basic external table delimited by tabs and is a single file in a folder. In Hive 0.13 this query with Tez runs fast and I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0 and there clearly is something going awry with Hive w/Tez as an execution engine with Single or small file tables. I can attach further logs if someone needs them for deeper analysis. 
HDFS Output:
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers          0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers    3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0

Hive Table Describe:
hive> describe formatted crc_arsn;
OK
# col_name              data_type       comment

arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
arsn_mbr_only_ind       string
arsn_qrmb_ind           string

# Detailed Table Information
Database:               adw
Owner:                  loadu...@exa.example.com
CreateTime:             Mon Apr 28 13:28:05 EDT 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1398706085

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592061#comment-14592061 ] Hive QA commented on HIVE-10996: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740328/HIVE-10996.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8996 tests executed *Failed tests:* {noformat} TestCliDriver-enforce_order.q-bucketcontext_4.q-stats_publisher_error_1.q-and-12-more - did not produce a TEST-*.xml file org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4304/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4304/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4304/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12740328 - PreCommit-HIVE-TRUNK-Build Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. 
The following query (Q1) produces no results:
{code}
select s from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
{code}
While this one (Q2) does produce results:
{code}
select * from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;

1 21 20 Bob 1234
1 31 30 Bob 1234
3 51 50 Jeff 1234
{code}
The setup to test this is:
{code}
create table purchase_history (s string, product string, price double, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);

create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);

create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);
{code}
I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced
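As a cross-check of what Q1 ought to return, the join logic can be replayed in memory over the setup data. This is a rough sketch, not Hive code: the class and method names are hypothetical, `s` is modeled as an int rather than a string for brevity, only the `s` and `timestamp` columns are kept, and the filter is assumed to keep cart rows with a timestamp below the purchase timestamp, which is what the Q2 output implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory replay of Q1 over the setup data; not Hive code.
final class ExpectedQ1Sketch {
    // {s, timestamp} pairs from the insert statements; other columns omitted.
    static final int[][] PURCHASES = {{1, 21}, {1, 31}, {3, 51}, {4, 59}};
    static final int[][] CARTS = {{1, 10}, {1, 20}, {1, 30}, {1, 40}, {3, 50}, {4, 60}};
    static final int[][] EVENTS = {{1, 20}, {1, 30}, {1, 25}, {2, 30}, {3, 50}, {4, 60}};

    static List<Integer> expectedQ1() {
        // Inner query: for each (s, purchase timestamp) group, the max cart
        // timestamp that is strictly below the purchase timestamp.
        Map<List<Integer>, Integer> last = new LinkedHashMap<>();
        for (int[] p : PURCHASES) {
            for (int[] c : CARTS) {
                if (p[0] == c[0] && p[1] > c[1]) {
                    last.merge(Arrays.asList(p[0], p[1]), c[1], Integer::max);
                }
            }
        }
        // Outer join against events on s and last_stage_timestamp, projecting s.
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<List<Integer>, Integer> e : last.entrySet()) {
            for (int[] ev : EVENTS) {
                if (e.getKey().get(0) == ev[0] && e.getValue() == ev[1]) {
                    result.add(e.getKey().get(0));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expectedQ1()); // prints [1, 1, 3]
    }
}
```

The three values of `s` line up with the first column of the three rows Q2 returns, so an empty result for Q1 is inconsistent with the data. SQL does not guarantee row order here; the order in the sketch just follows insertion order.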
[jira] [Updated] (HIVE-10746) Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from TextInputFormat
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10746: --- Description: The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to run this same query against Tez as the execution engine it consistently runs for over 300-500 seconds this seems extremely long. This is a basic external table delimited by tabs and is a single file in a folder. In Hive 0.13 this query with Tez runs fast and I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0 and there clearly is something going awry with Hive w/Tez as an execution engine with Single or small file tables. I can attach further logs if someone needs them for deeper analysis. HDFS Output: hadoop fs -ls /example_dw/crc/arsn Found 2 items -rwxr-x--- 6 loaduser hadoopusers 0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS -rwxr-x--- 6 loaduser hadoopusers3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0 Hive Table Describe: {code} hive describe formatted crc_arsn; OK # col_name data_type comment arsn_cd string clmlvl_cd string arclss_cd string arclssg_cd string arsn_prcsr_rmk_ind string arsn_mbr_rspns_ind string savtyp_cd string arsn_eff_dt string arsn_exp_dt string arsn_pstd_dts string arsn_lstupd_dts string arsn_updrsn_txt string appl_user_idstring arsntyp_cd string pre_d_indicator string arsn_display_txtstring arstat_cd string arsn_tracking_nostring arsn_cstspcfc_ind string arsn_mstr_rcrd_ind string state_specific_ind string region_specific_in string arsn_dpndnt_cd string unit_adjustment_in string arsn_mbr_only_ind string arsn_qrmb_ind string # Detailed Table Information Database: adw Owner: loadu...@exa.example.com CreateTime: Mon Apr 28 13:28:05 EDT 2014 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: 
hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn Table Type: EXTERNAL_TABLE Table Parameters: EXTERNALTRUE transient_lastDdlTime 1398706085 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim \t line.delim \n serialization.format\t Time taken: 1.245 seconds, Fetched: 54 row(s) {code} Explain Hive 1.2.0 w/Tez: {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 - Map 1 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) Explain Hive 0.13 w/Tez: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 - Map 1 (SIMPLE_EDGE) Reducer 3
[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11031: - Attachment: HIVE-11031-branch-1.0.patch ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Fix For: 1.2.1, 2.0.0 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp etc. But column statistics merging assumes column statistics exists for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing instanceof check. If the ORC file contains time stamp column statistics then this will work else it will throw ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10233: -- Affects Version/s: 2.0.0 Hive on LLAP: Memory manager Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10746) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10746: --- Summary: Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat (was: Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from TextInputFormat) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat -- Key: HIVE-10746 URL: https://issues.apache.org/jira/browse/HIVE-10746 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1 Reporter: Greg Senia Assignee: Gopal V Priority: Critical Attachments: HIVE-10746.1.patch, HIVE-10746.2.patch, slow_query_output.zip The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to run this same query against Tez as the execution engine it consistently runs for over 300-500 seconds this seems extremely long. This is a basic external table delimited by tabs and is a single file in a folder. In Hive 0.13 this query with Tez runs fast and I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0 and there clearly is something going awry with Hive w/Tez as an execution engine with Single or small file tables. I can attach further logs if someone needs them for deeper analysis. 
HDFS Output: hadoop fs -ls /example_dw/crc/arsn Found 2 items -rwxr-x--- 6 loaduser hadoopusers 0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS -rwxr-x--- 6 loaduser hadoopusers3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0 Hive Table Describe: {code} hive describe formatted crc_arsn; OK # col_name data_type comment arsn_cd string clmlvl_cd string arclss_cd string arclssg_cd string arsn_prcsr_rmk_ind string arsn_mbr_rspns_ind string savtyp_cd string arsn_eff_dt string arsn_exp_dt string arsn_pstd_dts string arsn_lstupd_dts string arsn_updrsn_txt string appl_user_idstring arsntyp_cd string pre_d_indicator string arsn_display_txtstring arstat_cd string arsn_tracking_nostring arsn_cstspcfc_ind string arsn_mstr_rcrd_ind string state_specific_ind string region_specific_in string arsn_dpndnt_cd string unit_adjustment_in string arsn_mbr_only_ind string arsn_qrmb_ind string # Detailed Table Information Database: adw Owner: loadu...@exa.example.com CreateTime: Mon Apr 28 13:28:05 EDT 2014 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn Table Type: EXTERNAL_TABLE Table Parameters: EXTERNALTRUE transient_lastDdlTime 1398706085 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1
[jira] [Commented] (HIVE-11029) hadoop.proxyuser.mapr.groups does not work to restrict the groups that can be impersonated
[ https://issues.apache.org/jira/browse/HIVE-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592128#comment-14592128 ] Xuefu Zhang commented on HIVE-11029: [~nyang], thanks for working on this. To clarify, the problem happens only if the cluster is not a secure one, correct? hadoop.proxyuser.mapr.groups does not work to restrict the groups that can be impersonated -- Key: HIVE-11029 URL: https://issues.apache.org/jira/browse/HIVE-11029 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-11029.patch In the core-site.xml, the hadoop.proxyuser.user.groups specifies the user groups which can be impersonated by the HS2 user. However, this does not work properly in Hive. In my core-site.xml, I have the following configs: <property> <name>hadoop.proxyuser.mapr.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.mapr.groups</name> <value>root</value> </property> I would expect with this configuration that 'mapr' can impersonate only members of the Unix group 'root'. However, if I submit a query as user 'jon', the query runs as user 'jon' even though 'mapr' should not be able to impersonate this user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
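Editorial sketch of the behavior the reporter expects from the configuration above: a proxy user may impersonate a target user only if that user belongs to one of the configured groups. This is a plain-Java model for illustration only. The class and method names are hypothetical, and treating `*` as a wildcard for groups is an assumption carried over from the hosts value shown above; it is not Hadoop's or Hive's actual implementation.

```java
import java.util.Set;

// Hypothetical model of the expected group-based impersonation check.
final class ProxyUserGroupsSketch {

    /**
     * Returns true when the target user may be impersonated, i.e. when the
     * allowed-groups setting is the wildcard "*" (assumed) or the target user
     * is a member of at least one allowed group.
     */
    static boolean mayImpersonate(Set<String> allowedGroups, Set<String> targetUserGroups) {
        if (allowedGroups.contains("*")) {
            return true;
        }
        for (String group : targetUserGroups) {
            if (allowedGroups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> allowed = java.util.Collections.singleton("root");
        // 'jon' is assumed here to belong only to a group named "users".
        System.out.println(mayImpersonate(allowed, java.util.Collections.singleton("users")));
        System.out.println(mayImpersonate(allowed, java.util.Collections.singleton("root")));
    }
}
```

Under this model, a query submitted as 'jon' would be rejected rather than run as 'jon', which is the outcome the report says is missing.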
[jira] [Updated] (HIVE-11042) Need fix Utilities.replaceTaskId method
[ https://issues.apache.org/jira/browse/HIVE-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-11042: Attachment: HIVE-11042.1.patch Need code review Need fix Utilities.replaceTaskId method --- Key: HIVE-11042 URL: https://issues.apache.org/jira/browse/HIVE-11042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11042.1.patch While looking at another bug, I found that the Utilities.replaceTaskId(String, int) method is not right. For example, Utilities.replaceTaskId("(ds%3D1)01", 5) returns 5. It should return (ds%3D1)05. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
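Editorial sketch of the behavior the report asks for: replace only the trailing digits of the task ID with the bucket number, zero-padded to the same width, and keep any prefix intact. The class below is hypothetical and is not the code in HIVE-11042.1.patch; it also ignores any attempt suffix that real task IDs may carry.

```java
// Hypothetical stand-in for the expected behavior of Utilities.replaceTaskId(String, int).
final class ReplaceTaskIdSketch {

    /**
     * Replaces the trailing run of digits in taskId with bucketNum,
     * left-padded with zeros to the width of the digits it replaces.
     * Everything before the trailing digits is preserved.
     */
    static String replaceTaskId(String taskId, int bucketNum) {
        int end = taskId.length();
        int start = end;
        // Walk backwards over the trailing digits only.
        while (start > 0 && Character.isDigit(taskId.charAt(start - 1))) {
            start--;
        }
        String bucket = Integer.toString(bucketNum);
        StringBuilder sb = new StringBuilder(taskId.substring(0, start));
        // Pad so the replacement is at least as wide as the original ID.
        for (int i = bucket.length(); i < end - start; i++) {
            sb.append('0');
        }
        return sb.append(bucket).toString();
    }

    public static void main(String[] args) {
        // The example from the report.
        System.out.println(replaceTaskId("(ds%3D1)01", 5));
    }
}
```

The point of scanning from the end is that digits inside the prefix, such as the `1` in `ds%3D1`, are never mistaken for part of the task ID.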
[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11031: - Attachment: (was: HIVE-11031-branch-1.0.patch) ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Fix For: 1.2.1, 2.0.0 Attachments: HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp etc. But column statistics merging assumes column statistics exists for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing instanceof check. If the ORC file contains time stamp column statistics then this will work else it will throw ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
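Editorial sketch of the guard the description says is missing: check the runtime type before casting when merging type-specific statistics. All class names below are hypothetical stand-ins, not the ORC classes, and the choice to mark min/max as unknown when the other side has no timestamp statistics is an assumption made for illustration, not necessarily what the attached patches do.

```java
// Hypothetical model of merging optional, type-specific column statistics.
final class StatsMergeSketch {

    /** Generic statistics that every column has in this model. */
    static class ColumnStats {
        long count;

        ColumnStats(long count) {
            this.count = count;
        }

        void merge(ColumnStats other) {
            count += other.count;
        }
    }

    /** Timestamp-specific statistics; null min/max means "unknown". */
    static final class TimestampStats extends ColumnStats {
        Long min;
        Long max;

        TimestampStats(long count, Long min, Long max) {
            super(count);
            this.min = min;
            this.max = max;
        }

        @Override
        void merge(ColumnStats other) {
            super.merge(other);
            if (other instanceof TimestampStats) {
                TimestampStats ts = (TimestampStats) other;
                min = (min == null || ts.min == null) ? null : Long.valueOf(Math.min(min, ts.min));
                max = (max == null || ts.max == null) ? null : Long.valueOf(Math.max(max, ts.max));
            } else {
                // An old file without timestamp statistics: no blind cast,
                // so no ClassCastException; the range is simply unknown.
                min = null;
                max = null;
            }
        }
    }

    public static void main(String[] args) {
        TimestampStats merged = new TimestampStats(2, 10L, 20L);
        merged.merge(new ColumnStats(1)); // statistics from an "old" file
        System.out.println(merged.count + " " + merged.min + " " + merged.max);
    }
}
```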
[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592029#comment-14592029 ] Alan Gates commented on HIVE-10165: --- bq. I wanted the ability to mock them in the TestMutatorCoordinator test. They are package private, so this separation doesn't leak into the public API. If this is undesirable, can you recommend an alternative approach? That's fine. I think comments to reflect that those arguments are only for testing purposes would be helpful. bq. This class relies on the correct grouping of the data (by partition,bucket) to avoid the problem that you describe. ... Very keen to hear your thoughts on this. I am fine with pushing this responsibility to the client. But the following in the class javadoc is confusing. It starts by saying {{Events must be grouped by partition, then bucket}} but then later says {{Events are free to target any bucket and partition, including new partitions if {@link MutatorDestination#createPartitions()} is set. Internally the coordinator creates and closes {@link Mutator Mutators} as needed to write to the appropriate partition and bucket.}} The latter makes it sound like random order is ok. I think you're trying to say: group by partition, then bucket, and the MutatorCoordinator will seamlessly handle the transitions between groups. Is that right? I think we should be very clear to users that there is an extreme performance and storage penalty for jumping around in random order. bq. I now wonder whether the work I’m doing in UgiMetaStoreClientFactory is already available in an existing Hive class as it seems like a common requirement. Can you advise? There are a number of places Hive does UGI calls, but I'm not aware of any where it does them for metastore calls. At this point the only issues I see remaining to get this committed are the two javadoc comments I've pointed out above. 
Improve hive-hcatalog-streaming extensibility and support updates and deletes. -- Key: HIVE-10165 URL: https://issues.apache.org/jira/browse/HIVE-10165 Project: Hive Issue Type: Improvement Components: HCatalog Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Labels: streaming_api Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, mutate-system-overview.png h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. 
Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
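Editorial sketch of the grouping requirement discussed in the review comment on this issue: if events arrive grouped by partition, then bucket, a coordinator only has to switch writers at group boundaries, whereas random order forces a switch on almost every event. The `Event` type and both helper methods are hypothetical and are not part of the hive-hcatalog-streaming API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of grouping mutation events before writing them.
final class GroupedEventsSketch {

    /** Minimal event: the partition and bucket it targets. */
    static final class Event {
        final String partition;
        final int bucket;

        Event(String partition, int bucket) {
            this.partition = partition;
            this.bucket = bucket;
        }
    }

    /** Returns a copy of the events sorted by partition, then bucket. */
    static List<Event> groupByPartitionThenBucket(List<Event> events) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparing((Event e) -> e.partition)
                .thenComparingInt(e -> e.bucket));
        return sorted;
    }

    /** Counts how often consecutive events target a different (partition, bucket) pair. */
    static int countWriterSwitches(List<Event> events) {
        int switches = 0;
        Event previous = null;
        for (Event e : events) {
            if (previous != null
                    && (!previous.partition.equals(e.partition) || previous.bucket != e.bucket)) {
                switches++;
            }
            previous = e;
        }
        return switches;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event("b", 1));
        events.add(new Event("a", 0));
        events.add(new Event("b", 1));
        events.add(new Event("a", 1));
        events.add(new Event("a", 0));
        System.out.println("unsorted switches: " + countWriterSwitches(events));
        System.out.println("grouped switches:  "
                + countWriterSwitches(groupByPartitionThenBucket(events)));
    }
}
```

Each switch in this model stands for closing one writer and opening another, which is where the performance and storage penalty mentioned in the comment would come from.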
[jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592140#comment-14592140 ] Prasanth Jayachandran commented on HIVE-11031: -- Note for backport: The branch-1.0 patch will apply cleanly, but if we run orc_merge_incompat1.q it can fail on some platforms. To make it more consistent we need HIVE-8801 patch which makes the orc_merge_incompat1.q test more consistent across platforms. ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Fix For: 1.2.1, 2.0.0 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp etc. But column statistics merging assumes column statistics exists for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing instanceof check. If the ORC file contains time stamp column statistics then this will work else it will throw ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592990#comment-14592990 ] Lefty Leverenz commented on HIVE-7193: -- Yes, I see. Thanks [~ngangam]. Could the parameter descriptions include this information? Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593003#comment-14593003 ] Lefty Leverenz commented on HIVE-7193: -- Well, I'd like to see commas and colons explained in the description but maybe that's just because I'm ignorant about LDAP. If you don't think it's necessary, it can still be added to the description in the wiki. And of course it's available here. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593005#comment-14593005 ] Lefty Leverenz commented on HIVE-7193: -- Well, I'd like to see commas and colons explained in the description but maybe that's just because I'm ignorant about LDAP. If you don't think it's necessary, it can still be added to the description in the wiki. And of course it's available here. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7193: - Comment: was deleted (was: Well, I'd like to see commas and colons explained in the description but maybe that's just because I'm ignorant about LDAP. If you don't think it's necessary, it can still be added to the description in the wiki. And of course it's available here.) Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for HiveServer2:
{code:xml}
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://our_ldap_address</value>
</property>
{code}
We need to include other LDAP properties as part of hive-LDAP authentication like below:
{noformat}
a group search base - dc=domain,dc=com
a group search filter - member={0}
a user search base - dc=domain,dc=com
a user search filter - sAMAccountName={0}
a list of valid user groups - group1,group2,group3
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592882#comment-14592882 ] Greg Senia commented on HIVE-11051: --- This seems to be related/similar: http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2 http://qnalist.com/questions/5904003/map-side-join-fails-when-a-serialized-table-contains-arrays Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object; - Key: HIVE-11051 URL: https://issues.apache.org/jira/browse/HIVE-11051 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Greg Senia Assignee: Gopal V Priority: Critical Attachments: problem_table_joins.tar.gz I tried to apply: HIVE-10729 which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3 Status: Running (Executing on YARN cluster with App id application_1434641270368_1038) VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED Map 1 .. SUCCEEDED 3 300 0 0 Map 2 ... 
FAILED 3 102 7 0 VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s Status: Failed Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592957#comment-14592957 ] Hive QA commented on HIVE-10996: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740526/HIVE-10996.03.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9011 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_having {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4315/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4315/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4315/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12740526 - PreCommit-HIVE-TRUNK-Build Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. 
The following query (Q1) produces no results:
{code}
select s from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
{code}
While this one (Q2) does produce results:
{code}
select * from (
  select last.*, action.st2, action.n from (
    select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
1 21 20 Bob 1234
1 31 30 Bob 1234
3 51 50 Jeff 1234
{code}
The setup to test this is:
{code}
create table purchase_history (s string, product string, price double, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);
create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);
create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);
{code}
I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
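Editorial sketch: a reference computation of what Q1 should return for the setup data in HIVE-10996, written in plain Java so it does not depend on Hive. It assumes the filter in the inner query is `purchase.timestamp > mevt.timestamp`, which is the reading consistent with the three Q2 rows listed in the report. The class name and array layout are hypothetical, and only the columns the query actually uses are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference evaluation of Q1 over the setup rows (columns: s, timestamp).
final class ExpectedQ1Sketch {

    static final String[][] PURCHASES = {{"1", "21"}, {"1", "31"}, {"3", "51"}, {"4", "59"}};
    static final String[][] CARTS = {
        {"1", "10"}, {"1", "20"}, {"1", "30"}, {"1", "40"}, {"3", "50"}, {"4", "60"}};
    static final String[][] EVENTS = {
        {"1", "20"}, {"1", "30"}, {"1", "25"}, {"2", "30"}, {"3", "50"}, {"4", "60"}};

    /** Returns the s column Q1 should produce, one entry per joined row. */
    static List<String> expectedQ1() {
        List<String> result = new ArrayList<>();
        // Each purchase row is its own (s, timestamp) group in the setup data.
        for (String[] purchase : PURCHASES) {
            int purchaseTs = Integer.parseInt(purchase[1]);
            // max(mevt.timestamp) over cart rows with the same s and an earlier timestamp.
            Integer lastStage = null;
            for (String[] cart : CARTS) {
                int cartTs = Integer.parseInt(cart[1]);
                if (cart[0].equals(purchase[0]) && purchaseTs > cartTs) {
                    lastStage = (lastStage == null) ? cartTs : Math.max(lastStage, cartTs);
                }
            }
            if (lastStage == null) {
                continue; // no qualifying cart row, so the inner join yields no group
            }
            // Join with events on s and last_stage_timestamp = timestamp.
            for (String[] event : EVENTS) {
                if (event[0].equals(purchase[0]) && Integer.parseInt(event[1]) == lastStage) {
                    result.add(purchase[0]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expectedQ1());
    }
}
```

The computed values line up with the first column of the Q2 output quoted in the report, so an empty result from Q1 points at the planner rather than at the data.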
[jira] [Issue Comment Deleted] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Comment: was deleted (was: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740473/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 603 failed/errored test(s), 7154 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Comment: was deleted (was: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740429/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 605 failed/errored test(s), 7101 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11037: --- Attachment: HIVE-11037.02.patch Addressed [~jpullokkaran]'s and [~hagleitn]'s comments. HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to enable it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7193: Attachment: HIVE-7193.5.patch Attaching new patch with doc changes from Lefty's review. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593019#comment-14593019 ] Lefty Leverenz commented on HIVE-7193: -- Great, then in Configuration Properties the parameters will be linked to the LDAP section. Thanks Naveen. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592862#comment-14592862 ] Jesus Camacho Rodriguez commented on HIVE-10996: The schema of the operators in the new plan would be: {noformat} GB - (col0, col1, col2) SEL - (col1, col2) FIL - (col1, col2) SEL - (col1, col2) {noformat} Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup
to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
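For readers who want to check what Q1 and Q2 in HIVE-10996 should return, the following is a minimal in-memory sketch in Python, not Hive code. It assumes the join filter is purchase.timestamp > mevt.timestamp, which is consistent with the Q2 rows reported in the issue; the function names (last_stage, q2, q1) are hypothetical and exist only for this illustration.

```python
# Test data copied from the HIVE-10996 setup script.
purchase_history = [("1", "Belt", 20.00, 21), ("1", "Socks", 3.50, 31),
                    ("3", "Belt", 20.00, 51), ("4", "Shirt", 15.50, 59)]
cart_history = [("1", 1, 10), ("1", 2, 20), ("1", 3, 30),
                ("1", 4, 40), ("3", 5, 50), ("4", 6, 60)]
events = [("1", "Bob", 1234, 20), ("1", "Bob", 1234, 30), ("1", "Bob", 1234, 25),
          ("2", "Sam", 1234, 30), ("3", "Jeff", 1234, 50), ("4", "Ted", 1234, 60)]


def last_stage():
    """Inner aggregate: latest cart timestamp strictly before each purchase."""
    groups = {}
    for s, _product, _price, p_ts in purchase_history:
        for c_s, _cart_id, c_ts in cart_history:
            # Assumed predicate: purchase.timestamp > mevt.timestamp
            if s == c_s and p_ts > c_ts:
                key = (s, p_ts)
                groups[key] = max(groups.get(key, c_ts), c_ts)
    return [(s, p_ts, last_ts) for (s, p_ts), last_ts in sorted(groups.items())]


def q2():
    """select * from (...) list: join the aggregate with events."""
    rows = []
    for s, p_ts, last_ts in last_stage():
        for e_s, st2, n, e_ts in events:
            if s == e_s and last_ts == e_ts:
                rows.append((s, p_ts, last_ts, st2, n))
    return rows


def q1():
    """select s from (...) list: same rows, projected to the first column."""
    return [row[0] for row in q2()]


if __name__ == "__main__":
    print(q2())
    print(q1())
```

Under these assumptions Q1 is just a projection of Q2, so it should return one value of s per Q2 row rather than an empty result.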
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592984#comment-14592984 ] Naveen Gangam commented on HIVE-7193: - Thank you for the review. Q. Also, why is the example a comma-separated list when the description says colon-separated? A. The example shows a single LDAP user pattern. Each attribute in an LDAP DN is separated by a COMMA, for example: CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com However, an LDAP directory could have users in different trees. The baseDN patterns for the individual trees are separated by a COLON, for example: CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com:CN=%s,OU=IT,DC=domain,DC=com The same is true for group patterns. Does this help? Thanks Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
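The comma-versus-colon convention described in the comment above can be illustrated with a short Python sketch. This is only an assumption-laden illustration, not the code in the HIVE-7193 patch: the helper name candidate_dns is hypothetical, and it does no escaping of DN special characters in the user name.

```python
from typing import List


def candidate_dns(patterns: str, user: str) -> List[str]:
    """Expand a colon-separated list of DN patterns for one user name.

    Commas stay inside each pattern because they separate the attributes
    of a single DN; only colons separate one pattern from the next.
    """
    return [p.replace("%s", user) for p in patterns.split(":") if p]


if __name__ == "__main__":
    patterns = ("CN=%s,CN=Users,DC=subdomain,DC=domain,DC=com"
                ":CN=%s,OU=IT,DC=domain,DC=com")
    for dn in candidate_dns(patterns, "jdoe"):
        print(dn)
```

Each resulting DN would then be tried in turn when binding; how the real implementation orders or validates those attempts is not shown in this thread.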
[jira] [Commented] (HIVE-11050) testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries
[ https://issues.apache.org/jira/browse/HIVE-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593013#comment-14593013 ] Hive QA commented on HIVE-11050: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740528/HIVE-11050.01.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9010 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4316/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4316/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4316/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12740528 - PreCommit-HIVE-TRUNK-Build testCliDriver_vector_outer_join.* failures in Unit tests due to unstable data creation queries -- Key: HIVE-11050 URL: https://issues.apache.org/jira/browse/HIVE-11050 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Matt McCline Assignee: Matt McCline Priority: Blocker Attachments: HIVE-11050.01.patch In some environments the Q file tests vector_outer_join\{1-4\}.q fail because the data creation queries produce different input files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593014#comment-14593014 ] Naveen Gangam commented on HIVE-7193: - I intend to expand the LDAP section of the wiki docs to describe these new properties in detail, with examples. I am just holding off until this patch gets committed. I figured that's where most users will look when attempting to use this feature. Would that suffice, leaving patch 5 as is for now? Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593046#comment-14593046 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740569/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 7943 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_lateral_view_explode2 org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/894/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/894/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-894/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed 
{noformat} This message is automatically generated. ATTACHMENT ID: 12740569 - PreCommit-HIVE-SPARK-Build Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592953#comment-14592953 ] Lefty Leverenz commented on HIVE-11023: --- Should this be documented in the wiki? Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Fix For: 1.2.1 Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2 . The problem is that directSql hand-generates SQL strings assuming datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2 , the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In DN2 naming scheme, the column is called IDX, instead of INTEGER_IDX. If the user has run appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema> , and while IDX has 0,1,2,3,... , INTEGER_IDX will contain 0,0,0,0,...
and thus, any attempt to describe the table or fetch schema for the table can come up mixed up in the table's native hashing order, rather than sorted by the index. This can then result in schema ordering being different from the actual table. For example, if a user has a table (a:int,b:string,c:string), a describe on this may return (c:string, a:int, b:string), and thus, queries which are inserting after selecting from another table can have ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems - it is possible, for example, that if a, b and c were all strings, that insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing) or change directSql to support both (easier to code, but increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
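The ordering problem described in HIVE-11023 can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not metastore code: the rows, the helper names read_columns and direct_sql_allowed, and the fallback value "datanucleus1" for an unset datanucleus.identifierFactory are all hypothetical choices made for the example.

```python
# Hypothetical rows of a column-mapping table in arbitrary storage order:
# (column_name, IDX, INTEGER_IDX). Under the datanucleus2 naming scheme JDO
# maintains IDX, while INTEGER_IDX is left at 0 for every row.
ROWS = [("c", 2, 0), ("a", 0, 0), ("b", 1, 0)]
POSITION = {"IDX": 1, "INTEGER_IDX": 2}


def read_columns(rows, order_by):
    """Emulate SELECT ... ORDER BY <order_by> over the mapping rows."""
    key_pos = POSITION[order_by]
    return [name for name, *_ in sorted(rows, key=lambda r: r[key_pos])]


def direct_sql_allowed(conf):
    """Short-term guard: skip direct SQL when datanucleus2 naming is in use.

    The "datanucleus1" fallback for an unset property is an assumption.
    """
    factory = conf.get("datanucleus.identifierFactory", "datanucleus1")
    return factory.strip().lower() != "datanucleus2"


if __name__ == "__main__":
    print(read_columns(ROWS, "IDX"))          # order maintained by JDO
    print(read_columns(ROWS, "INTEGER_IDX"))  # all keys equal: storage order
    print(direct_sql_allowed({"datanucleus.identifierFactory": "datanucleus2"}))
```

Sorting on a column whose values are all 0 leaves the rows in whatever order the store returned them, which is how a table declared as (a, b, c) can be described as (c, a, b).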
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593020#comment-14593020 ] Rui Li commented on HIVE-10999: --- When I tried the patch earlier the downloaded jar was still invalid. Now it works well and I've passed some tests locally. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7193: Attachment: (was: HIVE-7193.5.patch) Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592996#comment-14592996 ] Lefty Leverenz commented on HIVE-7193: -- I'm getting 404 Oops, you've found a dead link for patch 5. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently hive has only following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Senia updated HIVE-11051: -- Component/s: Tez Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object; - Key: HIVE-11051 URL: https://issues.apache.org/jira/browse/HIVE-11051 Project: Hive Issue Type: Bug Components: Serializers/Deserializers, Tez Affects Versions: 1.2.0 Reporter: Greg Senia Assignee: Gopal V Priority: Critical Attachments: problem_table_joins.tar.gz I tried to apply: HIVE-10729 which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3 Status: Running (Executing on YARN cluster with App id application_1434641270368_1038) VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED Map 1 .. SUCCEEDED 3 300 0 0 Map 2 ... FAILED 3 102 7 0 VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s Status: Failed Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290) at
[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11051: --- Description: I tried to apply: HIVE-10729 which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3 {code} Status: Running (Executing on YARN cluster with App id application_1434641270368_1038) VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED Map 1 .. SUCCEEDED 3 300 0 0 Map 2 ... FAILED 3 102 7 0 VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s Status: Failed Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 
13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592981#comment-14592981 ] Xuefu Zhang commented on HIVE-10999: [~lirui], could you try the patch locally to see if you can run at least one Spark q test successfully? It worked for me, but the pre-commit test seems to be having trouble. Thanks. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7193: - Comment: was deleted (was: I'm getting 404 Oops, you've found a dead link for patch 5.) Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592995#comment-14592995 ] Lefty Leverenz commented on HIVE-7193: -- I'm getting 404 Oops, you've found a dead link for patch 5. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593000#comment-14593000 ] Naveen Gangam commented on HIVE-7193: - Sorry, I just deleted it after seeing your latest comment about including additional info in the parameter description. Should all of the above info be in the description? Thanks Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-7193: Attachment: HIVE-7193.5.patch Re-attaching the patch with Lefty's suggestion. I will include full details when I update the wiki docs for LDAP Configuration. (Thanks Lefty) Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.5.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592870#comment-14592870 ] Gunther Hagleitner commented on HIVE-10233: --- Partial review: * There's some unnecessary commented-out code in HashTableLoader (NOCOND..) * getOutputMemoryNeeded isn't referenced anywhere. I think this can be dropped together with setOutputMemoryNeeded + references. * IMO a definition of ONE_MB makes no sense; might as well use the number in the code * Same for getInputMemoryNeededFraction * memoryInUse in AbstractOperatorDesc isn't used anywhere * || (conf.getBoolVar(HiveConf.ConfVars.HIVEUSEHYBRIDGRACEHASHJOIN))) { did you mean to say ? Are you trying to run the mem manager only if tez and hybrid? * You set a work's memory usage to the data size of its terminal operator. How come? Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11042) Need fix Utilities.replaceTaskId method
[ https://issues.apache.org/jira/browse/HIVE-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592929#comment-14592929 ] Yongzhi Chen commented on HIVE-11042: - The one failure is not related. Its age is 2. [~csun], [~szehon], could you review the change? Thanks Need fix Utilities.replaceTaskId method --- Key: HIVE-11042 URL: https://issues.apache.org/jira/browse/HIVE-11042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11042.1.patch While looking at another bug, I found that the Utilities.replaceTaskId(String, int) method is not right. For example, Utilities.replaceTaskId("(ds%3D1)01", 5); returns 5. It should return (ds%3D1)05. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
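The behaviour requested in HIVE-11042 can be illustrated with a small standalone sketch. This is not Hive's Utilities.replaceTaskId and not the attached patch: the class name ReplaceTaskIdSketch and the helper below are hypothetical, and the sketch assumes that the task id is simply the trailing run of digits in the string and that the bucket number is non-negative.

```java
// Hypothetical illustration only; not Hive's actual Utilities.replaceTaskId.
class ReplaceTaskIdSketch {

    // Replaces the trailing run of digits in taskId with bucketNum,
    // zero-padded to the width of the digits it replaces.
    // Assumption: bucketNum is non-negative.
    static String replaceTaskId(String taskId, int bucketNum) {
        int end = taskId.length();
        int start = end;
        // Walk backwards over the trailing digits.
        while (start > 0 && Character.isDigit(taskId.charAt(start - 1))) {
            start--;
        }
        String prefix = taskId.substring(0, start);
        int width = end - start;
        String num = String.valueOf(bucketNum);

        StringBuilder sb = new StringBuilder(prefix);
        // Pad with zeros so the new id is at least as wide as the old one.
        for (int i = num.length(); i < width; i++) {
            sb.append('0');
        }
        sb.append(num);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example from the issue description: the prefix must be kept.
        System.out.println(replaceTaskId("(ds%3D1)01", 5));
    }
}
```

The point of the sketch is that the non-digit prefix survives the replacement, which is the part the issue reports as lost. Real task ids may carry other suffixes that this simplified version does not handle.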
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Attachment: (was: HIVE-10999.1-spark.patch) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7193) Hive should support additional LDAP authentication parameters
[ https://issues.apache.org/jira/browse/HIVE-7193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593009#comment-14593009 ] Lefty Leverenz commented on HIVE-7193: -- Sorry about the duplicate comments. I'll review patch 5 tomorrow. Hive should support additional LDAP authentication parameters - Key: HIVE-7193 URL: https://issues.apache.org/jira/browse/HIVE-7193 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mala Chikka Kempanna Assignee: Naveen Gangam Attachments: HIVE-7193.2.patch, HIVE-7193.3.patch, HIVE-7193.4.patch, HIVE-7193.patch, LDAPAuthentication_Design_Doc.docx, LDAPAuthentication_Design_Doc_V2.docx Currently Hive has only the following authenticator parameters for LDAP authentication for hiveserver2: {code:xml} <property> <name>hive.server2.authentication</name> <value>LDAP</value> </property> <property> <name>hive.server2.authentication.ldap.url</name> <value>ldap://our_ldap_address</value> </property> {code} We need to include other LDAP properties as part of hive-LDAP authentication like below: {noformat} a group search base - dc=domain,dc=com a group search filter - member={0} a user search base - dc=domain,dc=com a user search filter - sAMAccountName={0} a list of valid user groups - group1,group2,group3 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11041) Update tests for HIVE-9302 after removing binaries
[ https://issues.apache.org/jira/browse/HIVE-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592224#comment-14592224 ] Hive QA commented on HIVE-11041: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740366/HIVE-11041.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4307/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4307/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4307/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4307/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ 
-z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin From https://github.com/apache/hive 9b10194..f6d8075 branch-1 - origin/branch-1 fb86cef..06f10fe branch-1.0 - origin/branch-1.0 2e1bee8..d8ff0bc branch-1.1 - origin/branch-1.1 234b82d..703882c branch-1.2 - origin/branch-1.2 3f8b0ef..dd30afc master - origin/master + git reset --hard HEAD HEAD is now at 3f8b0ef HIVE-11031: ORC concatenation of old files can fail while merging column statistics (Prasanth Jayachandran reviewed by Gopal V) + git clean -f -d + git checkout master Already on 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. + git reset --hard origin/master HEAD is now at dd30afc HIVE-11040 : Change Derby dependency version to 10.10.2.0 (Jason Dere, reviewed by Sushanth Sowmyan, Gunther Hagleitner) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12740366 - PreCommit-HIVE-TRUNK-Build Update tests for HIVE-9302 after removing binaries -- Key: HIVE-11041 URL: https://issues.apache.org/jira/browse/HIVE-11041 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11041.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10999: --- Attachment: HIVE-10999.1-spark.patch Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592388#comment-14592388 ] Hive QA commented on HIVE-10996: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740417/HIVE-10996.02.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9011 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4308/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4308/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4308/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12740417 - PreCommit-HIVE-TRUNK-Build Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4',
'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what led us to
[jira] [Commented] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592393#comment-14592393 ] Thejas M Nair commented on HIVE-11047: -- +1 Update versions of branch-1.2 to 1.2.1 -- Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-11047.patch Need to update all pom.xml files in branch-1.2 to 1.2.1 , and update metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to reflect that 1.2.1's schema is identical to 1.2.0. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592377#comment-14592377 ] Yi Zhang commented on HIVE-10142: - This request is more in line with the following decay variable definition: exponential rate of change can be modeled algebraically by the formula N(t) = N(0) e^(−λt), where N(t) is the quantity at time t, N(0) is the initial quantity, λ is the decay constant, and t is time. The window function would then be a summary of the values of all records in the window relative to the current record. Calculating formula based on difference between each row's value and current row's in Windowing function Key: HIVE-10142 URL: https://issues.apache.org/jira/browse/HIVE-10142 Project: Hive Issue Type: New Feature Components: PTF-Windowing Affects Versions: 1.0.0 Reporter: Yi Zhang Assignee: Aihua Xu For analytics with windowing functions, the calculation formula sometimes needs to be evaluated on each row's value against the current row's value. The decay value is a good example, such as a sum of values weighted by a decay function based on the difference in timestamp between each row and the current row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
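To make the HIVE-10142 request concrete, here is a minimal sketch of a decayed sum over one window, written as plain Java rather than as a Hive windowing function. The class name DecayWindowSketch, the method decayedSum, and its parameters are all hypothetical. The sketch assumes each row contributes value * e^(−λ·Δt), where Δt is the absolute timestamp difference between that row and the current row; the issue itself does not fix these details.

```java
// Hypothetical illustration only; not a Hive API and not the proposed implementation.
class DecayWindowSketch {

    // Sums the values in a window, weighting each row by exp(-lambda * dt),
    // where dt is the absolute timestamp difference to the current row.
    static double decayedSum(double[] values, long[] timestamps, int current, double lambda) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            double dt = Math.abs(timestamps[current] - timestamps[i]);
            sum += values[i] * Math.exp(-lambda * dt);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 1.0};
        long[] timestamps = {0L, 1L};
        // With lambda = ln 2, a row one time unit away counts half as much.
        System.out.println(decayedSum(values, timestamps, 1, Math.log(2)));
    }
}
```

The weight depends on the current row, so the result cannot be computed from a single running aggregate that is independent of the row being evaluated. That dependency is what sets this request apart from ordinary windowed sums.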
[jira] [Updated] (HIVE-11036) Race condition in DataNucleus makes Metastore to hang
[ https://issues.apache.org/jira/browse/HIVE-11036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11036: Reporter: Takahiko Saito (was: Ashutosh Chauhan) Race condition in DataNucleus makes Metastore to hang - Key: HIVE-11036 URL: https://issues.apache.org/jira/browse/HIVE-11036 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Takahiko Saito Assignee: Ashutosh Chauhan Attachments: HIVE-11036.patch Under moderate to high concurrent query workload Metastore gets deadlocked in DataNucleus -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11047: Description: Need to update all pom.xml files in branch-1.2 to 1.2.1 , and update metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to reflect that 1.2.1's schema is identical to 1.2.0. NO PRECOMMIT TESTS Update versions of branch-1.2 to 1.2.1 -- Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-11047.patch Need to update all pom.xml files in branch-1.2 to 1.2.1 , and update metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to reflect that 1.2.1's schema is identical to 1.2.0. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11047: Attachment: HIVE-11047.patch Patch attached. Update versions of branch-1.2 to 1.2.1 -- Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-11047.patch Need to update all pom.xml files in branch-1.2 to 1.2.1 , and update metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to reflect that 1.2.1's schema is identical to 1.2.0. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-11047: Attachment: HIVE-11047.2.patch Updated patch slightly to reflect the MetaStoreSchemaInfo.java change happening outside this patch for all branches. Update versions of branch-1.2 to 1.2.1 -- Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 1.2.1 Attachments: HIVE-11047.2.patch, HIVE-11047.patch Need to update all pom.xml files in branch-1.2 to 1.2.1 , and update metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to reflect that 1.2.1's schema is identical to 1.2.0. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592267#comment-14592267 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740429/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 605 failed/errored test(s), 7101 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592452#comment-14592452 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740473/HIVE-10999.1-spark.patch {color:red}ERROR:{color} -1 due to 603 failed/errored test(s), 7154 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592463#comment-14592463 ] Laljo John Pullokkaran commented on HIVE-11028: --- +1 Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, HIVE-11028.3.patch {noformat} create table tez_self_join1(id1 int, id2 string, id3 string); insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba'); create table tez_self_join2(id1 int); insert into table tez_self_join2 values(1),(2),(3); explain select s.id2, s.id3 from ( select self1.id1, self1.id2, self1.id3 from tez_self_join1 self1 join tez_self_join1 self2 on self1.id2=self2.id3 ) s join tez_self_join2 on s.id1=tez_self_join2.id1 where s.id2='ab'; {noformat} fails with error: {noformat} 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313) at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147) ... 13 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11045) ArrayIndexOutOfBoundsException with Hive 1.2.0 and Tez 0.7.0
[ https://issues.apache.org/jira/browse/HIVE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592502#comment-14592502 ] Soundararajan Velu commented on HIVE-11045: --- Vikram, I face this issue only with Hive on Tez, my data is in json format and I use JsonSerde from https://github.com/rcongiu/Hive-JSON-Serde, The query runs perfectly fine on Hive. This only occurs with Tez. Data set is huge and I have no clue on which records this exception arises, The query is as below, SELECT t1.return_id AS return_id, t1.approve_date AS approve_date, t1.approve_date_key AS approve_date_key, t1.cancel_date AS cancel_date, t1.cancel_date_key AS cancel_date_key, t1.complete_date AS complete_date, t1.complete_date_key AS complete_date_key, t1.init_cancellation_date AS init_cancellation_date, t1.init_cancellation_date_key AS init_cancellation_date_key, t1.reject_date AS reject_date, t1.reject_date_key AS reject_date_key, t1.unhold_date AS unhold_date, t1.unhold_date_key AS unhold_date_key, t1.request_service_date AS request_service_date, t1.request_service_date_key AS request_service_date_key, t1.service_approve_return_date AS service_approve_return_date, t1.service_approve_return_date_key AS service_approve_return_date_key, CASE WHEN t2.action_override_status_time IS NULL THEN 0 ELSE 1 END AS flag_action_override, CASE WHEN t2.action_override_status_time IS NULL THEN NULL ELSE t2.action_override_status_time END AS action_override_status_time, CASE WHEN t2.action_override_user_login IS NULL THEN 'NA' ELSE t2.action_override_user_login END AS action_override_user_login, CASE WHEN t2.action_override_change_reason IS NULL THEN 'NA' ELSE t2.action_override_change_reason END AS action_override_change_reason, CASE WHEN t2.action_override_change_sub_reason IS NULL THEN 'NA' ELSE t2.action_override_change_sub_reason END AS action_override_change_sub_reason, CASE WHEN t2.action_override_count IS NULL THEN cast(0 AS bigint) ELSE 
t2.action_override_count END AS action_override_count, CASE WHEN t2.action_change_data IS NULL THEN 'NA' ELSE t2.action_change_data END AS action_change_data, CASE WHEN t3.policy_override_status_time IS NULL THEN 0 ELSE 1 END AS flag_policy_override, CASE WHEN t3.policy_override_status_time IS NULL THEN NULL ELSE t3.policy_override_status_time END AS policy_override_status_time, CASE WHEN t3.policy_override_user_login IS NULL THEN 'NA' ELSE t3.policy_override_user_login END AS policy_override_user_login, CASE WHEN t3.policy_override_change_reason IS NULL THEN 'NA' ELSE t3.policy_override_change_reason END AS policy_override_change_reason, CASE WHEN t3.policy_override_change_sub_reason IS NULL THEN 'NA' ELSE t3.policy_override_change_sub_reason END AS policy_override_change_sub_reason, CASE WHEN t3.policy_override_count IS NULL THEN cast(0 AS bigint) ELSE t3.policy_override_count END AS policy_override_count, CASE WHEN t3.policy_change_data IS NULL THEN 'NA' ELSE t3.policy_change_data END AS policy_change_data, cast(0 AS bigint) AS temp_flag, CASE WHEN t3.policy_override_status_date_key IS NULL THEN 0 ELSE t3.policy_override_status_date_key END AS policy_override_status_date_key, CASE WHEN t2.action_override_status_date_key IS NULL THEN 0 ELSE t2.action_override_status_date_key END AS action_override_status_date_key, t1.user_approved_by AS user_approved_by, t1.user_rejected_by AS user_rejected_by, t1.user_cancelled_by AS user_cancelled_by, t1.reject_reason AS reject_reason, t1.reject_sub_reason AS reject_sub_reason, t1.reject_change_data AS reject_change_data FROM (SELECT rh1.`data`.return_id, MIN (CASE WHEN rh1.`data`.event = 'approve' THEN rh1.`data`.status_time ELSE NULL END) AS approve_date, MIN (CASE WHEN rh1.`data`.event = 'cancel' THEN rh1.`data`.status_time ELSE NULL END) AS cancel_date, MIN (CASE WHEN rh1.`data`.event = 'complete' THEN rh1.`data`.status_time ELSE NULL END) AS
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592466#comment-14592466 ] Laljo John Pullokkaran commented on HIVE-11028: --- [~jdere] Test failures needs to be addressed otherwise looks good. Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, HIVE-11028.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11044) Some optimizable predicates being missed by constant propagation
[ https://issues.apache.org/jira/browse/HIVE-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11044: -- Attachment: HIVE-11044.1.patch Initial patch, running ConstantPropagate one additional time after PartitionPruner during Optimizer.initialize(). The qfile updates show removal of unnecessary predicates, either (constant = constant), or (column is not null) when there are additional predicates on the column, along with updated stats due to the removal of the predicates. Will need to update this patch for test explainuser_2.q, once HIVE-11028 is committed. Some optimizable predicates being missed by constant propagation Key: HIVE-11044 URL: https://issues.apache.org/jira/browse/HIVE-11044 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11044.1.patch Some of the qfile explain plans show some predicates that could be taken care of by running ConstantPropagate after the PartitionPruner: index_auto_unused.q: {noformat} filterExpr: ((12.0 = 12.0) and (UDFToDouble(key) 10.0)) (type: boolean) {noformat} join28.q: {noformat} predicate: ((11.0 = 11.0) and key is not null) (type: boolean) {noformat} bucketsort_optimize_insert_7.q (is not null is unnecessary) {noformat} predicate: (((key 8) and key is not null) and ((key = 0) or (key = 5))) (type: boolean) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
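To illustrate the kind of simplification described in HIVE-11044, here is a minimal sketch of dropping an always-true (constant = constant) conjunct from a filter. It is not Hive's ConstantPropagate code: the class ConstantFoldSketch, the Comparison type, and the example predicate on a column named col are all hypothetical, and the sketch does not handle the (column is not null) case mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConstantFoldSketch {
    /** One conjunct of the form "left op right"; each side is a numeric literal or a column name. */
    static final class Comparison {
        final String left;
        final String op;
        final String right;

        Comparison(String left, String op, String right) {
            this.left = left;
            this.op = op;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Returns the value of a numeric literal, or null if the operand is not one (e.g. a column). */
    static Double asConstant(String operand) {
        try {
            return Double.valueOf(operand);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Keeps every conjunct except those comparing two equal constants, which are always true. */
    static List<Comparison> dropTrivialConjuncts(List<Comparison> conjuncts) {
        List<Comparison> kept = new ArrayList<>();
        for (Comparison c : conjuncts) {
            Double l = asConstant(c.left);
            Double r = asConstant(c.right);
            boolean alwaysTrue = "=".equals(c.op) && l != null && r != null && l.equals(r);
            if (!alwaysTrue) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical filter: (12.0 = 12.0) and (col = 5.0)
        List<Comparison> filter = Arrays.asList(
                new Comparison("12.0", "=", "12.0"),
                new Comparison("col", "=", "5.0"));
        System.out.println(dropTrivialConjuncts(filter));
    }
}
```

A conjunct comparing two unequal constants is deliberately kept, since it makes the whole filter false rather than redundant.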
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592475#comment-14592475 ] Jason Dere commented on HIVE-11028: --- Would like to add this to branch-1.2 Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, HIVE-11028.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11042) Need fix Utilities.replaceTaskId method
[ https://issues.apache.org/jira/browse/HIVE-11042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592528#comment-14592528 ] Hive QA commented on HIVE-11042: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740423/HIVE-11042.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9011 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4309/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4309/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4309/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12740423 - PreCommit-HIVE-TRUNK-Build Need fix Utilities.replaceTaskId method --- Key: HIVE-11042 URL: https://issues.apache.org/jira/browse/HIVE-11042 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11042.1.patch While looking at another bug, I found that the Utilities.replaceTaskId(String, int) method is not right. For example, Utilities.replaceTaskId("(ds%3D1)01", 5) returns 5. It should return (ds%3D1)05. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
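The behaviour expected in the HIVE-11042 report can be sketched as follows. This is not the code from HIVE-11042.1.patch or from Hive's Utilities class: TaskIdSketch and replaceTrailingTaskId are hypothetical names, and the sketch assumes the task id is simply the final run of digits in the name, which covers the case in the report but may not cover every file-name shape Hive handles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaskIdSketch {
    // Group 1: everything before the final run of digits. Group 2: that run of digits.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("^(.*?)(\\d+)$");

    /**
     * Replaces only the trailing digits of name with newTaskId, zero-padded to the
     * original width, and leaves any prefix such as "(ds%3D1)" untouched.
     */
    static String replaceTrailingTaskId(String name, int newTaskId) {
        Matcher m = TRAILING_DIGITS.matcher(name);
        if (!m.matches()) {
            return name; // no trailing digits, nothing to replace
        }
        String prefix = m.group(1);
        int width = m.group(2).length();
        return prefix + String.format("%0" + width + "d", newTaskId);
    }

    public static void main(String[] args) {
        System.out.println(replaceTrailingTaskId("(ds%3D1)01", 5));
    }
}
```

The lazy prefix group is what keeps the digit inside "(ds%3D1)" from being treated as part of the task id.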
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592540#comment-14592540 ] Sushanth Sowmyan commented on HIVE-11023: - [~xuefuz] : Yes, this will happen to all releases with directSql - which means anything past 0.12, I think. That said, the number of installations that override this parameter should hopefully be minimal. If people are using datanucleus2 as their identifierFactory version, then: a) They should disable directSql (can be done by conf parameter, does not need this code fix - the code fix simply automates that for current and future releases). b) They should retain that identifierFactory - a mixed metastore with both is bad. c) Once we have a way of migrating them, as with HIVE-11039 filed, we should migrate them out of it. This parameter should never have been a part of hive-site.xml, I think, since it's dangerous if a user changes it. A datanucleus1 installation changing the parameter to datanucleus2 or vice-versa can result in metadata corruption for us. Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Fix For: 1.2.1 Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2. The problem is that directSql hand-generates SQL strings assuming the datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2, the SQL strings we generate are incorrect.
One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate row in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant rows with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In the DN2 naming scheme, the column is called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized read, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema>: while IDX has 0,1,2,3,..., INTEGER_IDX will contain 0,0,0,0,..., and thus any attempt to describe the table or fetch its schema can come back in the table's native hashing order rather than sorted by the index. This can then result in the schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on it may return (c:string, a:int, b:string), and thus queries which insert after selecting from another table can hit ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems: it is possible, for example, that if a, b, and c were all strings, the insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad.
We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing) or change directSql to support both (easier to code, but it increases the test-coverage matrix significantly, and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
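The ordering hazard described in HIVE-11023 can be reproduced in miniature with plain Java. This is only an illustration under stated assumptions: ListOrderSketch and its Row type are hypothetical stand-ins for rows of a metastore mapping table, the storage order is made up, and no DataNucleus or Hive API is involved. Sorting on a column that holds only zeros leaves the rows in storage order, while sorting on the maintained column restores the list order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ListOrderSketch {
    /** Hypothetical mapping-table row holding one list element and both index columns. */
    static final class Row {
        final String column;
        final int integerIdx; // datanucleus1-style column, left at 0 when writes go through datanucleus2
        final int idx;        // datanucleus2-style column, holds the real list position

        Row(String column, int integerIdx, int idx) {
            this.column = column;
            this.integerIdx = integerIdx;
            this.idx = idx;
        }
    }

    static final Comparator<Row> BY_INTEGER_IDX = Comparator.comparingInt(r -> r.integerIdx);
    static final Comparator<Row> BY_IDX = Comparator.comparingInt(r -> r.idx);

    /** Rows for a table (a:int, b:string, c:string), in an assumed arbitrary storage order. */
    static List<Row> storedRows() {
        return Arrays.asList(
                new Row("c:string", 0, 2),
                new Row("a:int", 0, 0),
                new Row("b:string", 0, 1));
    }

    /** Mimics "SELECT ... ORDER BY <index column>" over the stored rows. */
    static List<String> columnsOrderedBy(Comparator<Row> order) {
        List<Row> sorted = new ArrayList<>(storedRows());
        sorted.sort(order); // stable sort: rows with equal keys keep their storage order
        List<String> names = new ArrayList<>();
        for (Row r : sorted) {
            names.add(r.column);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("ORDER BY INTEGER_IDX: " + columnsOrderedBy(BY_INTEGER_IDX));
        System.out.println("ORDER BY IDX:         " + columnsOrderedBy(BY_IDX));
    }
}
```

A real database gives no ordering guarantee at all for rows with equal sort keys, so the stable sort here understates the problem.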
[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced with Job.newInstance() Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10754: Summary: new Job() is deprecated. Replaced with Job.newInstance() Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader (was: Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader) new Job() is deprecated. Replaced with Job.newInstance() Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch {noformat} Create table tbl1 (key string, value string) stored as rcfile; Create table tbl2 (key string, value string); insert into tbl1 values( '1', '111'); insert into tbl2 values('1', '2'); {noformat} Pig script: {noformat} src_tbl1 = FILTER tbl1 BY (key == '1'); prj_tbl1 = FOREACH src_tbl1 GENERATE key as tbl1_key, value as tbl1_value, '333' as tbl1_v1; src_tbl2 = FILTER tbl2 BY (key == '1'); prj_tbl2 = FOREACH src_tbl2 GENERATE key as tbl2_key, value as tbl2_value; dump prj_tbl1; dump prj_tbl2; result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key); prj_result = FOREACH result GENERATE prj_tbl1::tbl1_key AS key1, prj_tbl1::tbl1_value AS value1, prj_tbl1::tbl1_v1 AS v1, prj_tbl2::tbl2_key AS key2, prj_tbl2::tbl2_value AS value2; dump prj_result; {noformat} The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). We need to clone the job instance in HCatLoader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
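As a rough analogy for why the report asks for the job instance to be cloned, the sketch below shows two loaders writing to one shared, mutable configuration versus each working on its own copy. It assumes nothing about the real HCatLoader internals: SharedJobSketch, Loader, and the "input.table" key are hypothetical, and a plain Map stands in for the Hadoop job configuration.

```java
import java.util.HashMap;
import java.util.Map;

class SharedJobSketch {
    /** Hypothetical loader that records which table it reads in the configuration it is given. */
    static final class Loader {
        private final Map<String, String> conf;

        Loader(Map<String, String> conf, String table) {
            this.conf = conf;
            conf.put("input.table", table);
        }

        String table() {
            return conf.get("input.table");
        }
    }

    /** Both loaders write into the same map, so the second one overwrites the first. */
    static String firstTableWhenShared() {
        Map<String, String> shared = new HashMap<>();
        Loader first = new Loader(shared, "tbl1");
        new Loader(shared, "tbl2");
        return first.table();
    }

    /** Each loader gets its own copy of the base settings, so they cannot interfere. */
    static String firstTableWhenCopied() {
        Map<String, String> base = new HashMap<>();
        Loader first = new Loader(new HashMap<>(base), "tbl1");
        new Loader(new HashMap<>(base), "tbl2");
        return first.table();
    }

    public static void main(String[] args) {
        System.out.println("shared: " + firstTableWhenShared());
        System.out.println("copied: " + firstTableWhenCopied());
    }
}
```

With the shared map the first loader ends up pointing at tbl2, loosely mirroring how values from tbl2 showed up in the tbl1 columns of the join result above.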
[jira] [Commented] (HIVE-11045) ArrayIndexOutOfBoundsException with Hive 1.2.0 and Tez 0.7.0
[ https://issues.apache.org/jira/browse/HIVE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592453#comment-14592453 ] Vikram Dixit K commented on HIVE-11045: --- [~raj_velu] Can you provide some more information here so as to help debug this issue? Can you share the query and if possible a sample data set so that I can repro this issue. Also any configuration settings used would be helpful. Thanks Vikram. ArrayIndexOutOfBoundsException with Hive 1.2.0 and Tez 0.7.0 Key: HIVE-11045 URL: https://issues.apache.org/jira/browse/HIVE-11045 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Environment: Hive 1.2.0, HDP 2.2, Hadoop 2.6, Tez 0.7.0 Reporter: Soundararajan Velu TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{_col0:4457890},value:{_col0:null,_col1:null,_col2:null,_col3:null,_col4:null,_col5:null,_col6:null,_col7:null,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null,_col14:null,_col15:null,_col16:null,_col17:fkl_shipping_b2c,_col18:null,_col19:null,_col20:null,_col21:null}} at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{_col0:4457890},value:{_col0:null,_col1:null,_col2:null,_col3:null,_col4:null,_col5:null,_col6:null,_col7:null,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null,_col14:null,_col15:null,_col16:null,_col17:fkl_shipping_b2c,_col18:null,_col19:null,_col20:null,_col21:null}} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{_col0:4457890},value:{_col0:null,_col1:null,_col2:null,_col3:null,_col4:null,_col5:null,_col6:null,_col7:null,_col8:null,_col9:null,_col10:null,_col11:null,_col12:null,_col13:null,_col14:null,_col15:null,_col16:null,_col17:fkl_shipping_b2c,_col18:null,_col19:null,_col20:null,_col21:null}} at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292) ... 
16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {key:{_col0:6417306,_col1:{0:{_col0:2014-08-01 02:14:02}}},value:{_col0:2014-08-01 02:14:02,_col1:20140801,_col2:sc_jarvis_b2c,_col3:action_override,_col4:WITHIN_GRACE_PERIOD,_col5:policy_override}} at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:413) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:381) at
[jira] [Commented] (HIVE-11046) Filesystem Closed Exception
[ https://issues.apache.org/jira/browse/HIVE-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592478#comment-14592478 ] Siddharth Seth commented on HIVE-11046: --- [~raj_velu] - bunch of questions. Do you have additional logs from the container where this error was seen ? Also any steps to reproduce and how often are you able to reproduce this ? Is this using the Tez 0.7.0 release or a snapshot ? Filesystem Closed Exception --- Key: HIVE-11046 URL: https://issues.apache.org/jira/browse/HIVE-11046 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 0.7.0, 1.2.0 Environment: Hive 1.2.0, Tez0.7.0, HDP2.2, Hadoop 2.6 Reporter: Soundararajan Velu TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Filesystem closed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Filesystem closed at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 14 more Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:795) at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:629) at java.io.FilterInputStream.close(FilterInputStream.java:181) at org.apache.hadoop.io.compress.DecompressorStream.close(DecompressorStream.java:205) at org.apache.hadoop.util.LineReader.close(LineReader.java:150) at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:282) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doClose(HiveRecordReader.java:50) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:104) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:170) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:138) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61) ... 16 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10746) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-10746: Description: The following query: {code:sql} SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; {code} runs consistently fast in Spark and MapReduce on Hive 1.2.0. When the same query is run with Tez as the execution engine, it consistently takes 300-500 seconds, which seems extremely long. This is a basic external table delimited by tabs, stored as a single file in a folder. In Hive 0.13 this query runs fast with Tez. I tested with Hive 0.14, 0.14.1/1.0.0, and now Hive 1.2.0, and something is clearly going awry with Hive using Tez as the execution engine on single-file or small-file tables. I can attach further logs if someone needs them for deeper analysis. HDFS Output: {noformat} hadoop fs -ls /example_dw/crc/arsn Found 2 items -rwxr-x--- 6 loaduser hadoopusers 0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS -rwxr-x--- 6 loaduser hadoopusers 3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0 {noformat} Hive Table Describe: {noformat} hive> describe formatted crc_arsn; OK # col_name data_type comment arsn_cd string clmlvl_cd string arclss_cd string arclssg_cd string arsn_prcsr_rmk_ind string arsn_mbr_rspns_ind string savtyp_cd string arsn_eff_dt string arsn_exp_dt string arsn_pstd_dts string arsn_lstupd_dts string arsn_updrsn_txt string appl_user_id string arsntyp_cd string pre_d_indicator string arsn_display_txt string arstat_cd string arsn_tracking_no string arsn_cstspcfc_ind string arsn_mstr_rcrd_ind string state_specific_ind string region_specific_in string arsn_dpndnt_cd string unit_adjustment_in string arsn_mbr_only_ind string arsn_qrmb_ind string # Detailed Table Information Database: adw Owner: loadu...@exa.example.com CreateTime: Mon Apr 28 13:28:05 EDT 2014 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 
Location: hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn Table Type: EXTERNAL_TABLE Table Parameters: EXTERNAL TRUE transient_lastDdlTime 1398706085 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim \t line.delim \n serialization.format \t Time taken: 1.245 seconds, Fetched: 54 row(s) {noformat} Explain Hive 1.2.0 w/Tez: {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Explain Hive 0.13 w/Tez: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Tez
[jira] [Commented] (HIVE-10978) Document fs.trash.interval wrt Hive and HDFS Encryption
[ https://issues.apache.org/jira/browse/HIVE-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592493#comment-14592493 ] Lefty Leverenz commented on HIVE-10978: --- [~eugene.koifman], the only encryption doc in the Hive wiki is this section (plus Configuration Properties): * [Setting Up HiveServer2 -- SSL Encryption | https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-SSLEncryption] Document fs.trash.interval wrt Hive and HDFS Encryption --- Key: HIVE-10978 URL: https://issues.apache.org/jira/browse/HIVE-10978 Project: Hive Issue Type: Bug Components: Documentation, Security Affects Versions: 1.2.0 Reporter: Eugene Koifman Priority: Critical Labels: TODOC1.2 This should be documented in 1.2.1 Release Notes. When HDFS is encrypted (TDE is enabled), DROP TABLE and DROP PARTITION have unexpected behavior when the Hadoop Trash feature is enabled. The latter is enabled by setting fs.trash.interval > 0 in core-site.xml. When Trash is enabled, the data file for the table should be moved to the Trash bin. If the table is inside an Encryption Zone, this move operation is not allowed. There are 2 ways to deal with this: 1. Use PURGE, as in DROP TABLE blah PURGE. This skips the Trash bin even if enabled. 2. Set fs.trash.interval = 0. It is critical that this config change is done in core-site.xml. Setting it in hive-site.xml may lead to very strange behavior where the table metadata is deleted but the data file remains. This will lead to data corruption if a table with the same name is later created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
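The second workaround in HIVE-10978 depends on what core-site.xml actually says, so a quick check of that file can be useful. The following is only a minimal sketch, not part of Hive or Hadoop: the helper names `trash_interval` and `trash_enabled` and the sample XML are hypothetical, and it assumes the usual Hadoop `<configuration>/<property>/<name>/<value>` layout, with trash treated as enabled when `fs.trash.interval` is greater than 0 and disabled (interval 0) when the property is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real core-site.xml usually holds many more properties.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>0</value>
  </property>
</configuration>
"""


def trash_interval(core_site_xml: str) -> int:
    """Return fs.trash.interval in minutes, or 0 if the property is absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "fs.trash.interval":
            return int(prop.findtext("value", default="0").strip())
    return 0  # assumption: an unset property means trash is disabled


def trash_enabled(core_site_xml: str) -> bool:
    """Trash is treated as enabled when the interval is greater than zero."""
    return trash_interval(core_site_xml) > 0


if __name__ == "__main__":
    print("trash enabled:", trash_enabled(SAMPLE_CORE_SITE))
```

The check reads only the XML text it is given; it does not inspect hive-site.xml, which the issue warns against using for this setting.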
[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10754: Summary: new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog (was: new Job() is deprecated. Replaced with Job.newInstance() Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch {noformat} Create table tbl1 (key string, value string) stored as rcfile; Create table tbl2 (key string, value string); insert into tbl1 values( '1', '111'); insert into tbl2 values('1', '2'); {noformat} Pig script: {noformat} src_tbl1 = FILTER tbl1 BY (key == '1'); prj_tbl1 = FOREACH src_tbl1 GENERATE key as tbl1_key, value as tbl1_value, '333' as tbl1_v1; src_tbl2 = FILTER tbl2 BY (key == '1'); prj_tbl2 = FOREACH src_tbl2 GENERATE key as tbl2_key, value as tbl2_value; dump prj_tbl1; dump prj_tbl2; result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key); prj_result = FOREACH result GENERATE prj_tbl1::tbl1_key AS key1, prj_tbl1::tbl1_value AS value1, prj_tbl1::tbl1_v1 AS v1, prj_tbl2::tbl2_key AS key2, prj_tbl2::tbl2_value AS value2; dump prj_result; {noformat} The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). We need to clone the job instance in HCatLoader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592673#comment-14592673 ] Hive QA commented on HIVE-11028: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740446/HIVE-11028.3.patch {color:green}SUCCESS:{color} +1 9011 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4311/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4311/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4311/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. 
ATTACHMENT ID: 12740446 - PreCommit-HIVE-TRUNK-Build Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, HIVE-11028.3.patch {noformat} create table tez_self_join1(id1 int, id2 string, id3 string); insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba'); create table tez_self_join2(id1 int); insert into table tez_self_join2 values(1),(2),(3); explain select s.id2, s.id3 from ( select self1.id1, self1.id2, self1.id3 from tez_self_join1 self1 join tez_self_join1 self2 on self1.id2=self2.id3 ) s join tez_self_join2 on s.id1=tez_self_join2.id1 where s.id2='ab'; {noformat} fails with error: {noformat} 2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313) at
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592678#comment-14592678 ] Hive QA commented on HIVE-10233: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740448/HIVE-10233.08.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4312/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4312/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4312/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4312/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + 
[[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at b98a30b HIVE-10746: Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat (Gopal V via Gunther H) + git clean -f -d Removing ql/src/test/queries/clientpositive/tez_self_join.q Removing ql/src/test/results/clientpositive/tez/tez_self_join.q.out + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at b98a30b HIVE-10746: Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat (Gopal V via Gunther H) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12740448 - PreCommit-HIVE-TRUNK-Build Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6897) Allow overwrite/append to external Hive table (with partitions) via HCatStorer
[ https://issues.apache.org/jira/browse/HIVE-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592366#comment-14592366 ] Alen Frantz commented on HIVE-6897: --- I am facing the same issue. Since there is no overwrite feature in HCatalog, we need to do it outside Pig. The workaround right now is to delete the part files inside the table directory before executing your Pig script. But be careful here, as using rm -r is not good practice. Many people are facing the same issue, so we really need these features to take HCatalog to a higher level. This would help the community. I would really appreciate it if these features were added. Also, I would be glad to help with this in any way. Feel free to get in touch. Alen Allow overwrite/append to external Hive table (with partitions) via HCatStorer -- Key: HIVE-6897 URL: https://issues.apache.org/jira/browse/HIVE-6897 Project: Hive Issue Type: Improvement Components: HCatalog, HiveServer2 Affects Versions: 0.12.0 Reporter: Dip Kharod I'm using HCatStorer to write to an external Hive table with partitions from Pig and have the following use cases: 1) Need to overwrite (aka, refresh) data in the table: Currently I end up doing this outside of Pig (drop partition and delete HDFS folder), which is very painful and error-prone. 2) Need to append (aka, add new file) data to the Hive external table/partition: Again, I end up doing this outside of Pig by copying the file into the appropriate folder. It would be very productive for developers to have both options in HCatStorer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11047) Update versions of branch-1.2 to 1.2.1
[ https://issues.apache.org/jira/browse/HIVE-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan resolved HIVE-11047. - Resolution: Fixed Fix Version/s: 1.2.1 Committed to branch-1.2 only. Thanks, Thejas! Update versions of branch-1.2 to 1.2.1 -- Key: HIVE-11047 URL: https://issues.apache.org/jira/browse/HIVE-11047 Project: Hive Issue Type: Bug Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 1.2.1 Attachments: HIVE-11047.2.patch, HIVE-11047.patch Need to update all pom.xml files in branch-1.2 to 1.2.1 , and update metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java to reflect that 1.2.1's schema is identical to 1.2.0. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10754: Description: Replace all the deprecated new Job() with Job.getInstance() in HCatalog. was: Replace all the deprecated new Job() with Job.getInstance(). new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch Replace all the deprecated new Job() with Job.getInstance() in HCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10754: Description: Replace all the deprecated new Job() with Job.getInstance(). was: Some older version of new Job() seems not implemented properly, which causes the following issue: {noformat} Create table tbl1 (key string, value string) stored as rcfile; Create table tbl2 (key string, value string); insert into tbl1 values( '1', '111'); insert into tbl2 values('1', '2'); {noformat} Pig script: {noformat} src_tbl1 = FILTER tbl1 BY (key == '1'); prj_tbl1 = FOREACH src_tbl1 GENERATE key as tbl1_key, value as tbl1_value, '333' as tbl1_v1; src_tbl2 = FILTER tbl2 BY (key == '1'); prj_tbl2 = FOREACH src_tbl2 GENERATE key as tbl2_key, value as tbl2_value; dump prj_tbl1; dump prj_tbl2; result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key); prj_result = FOREACH result GENERATE prj_tbl1::tbl1_key AS key1, prj_tbl1::tbl1_value AS value1, prj_tbl1::tbl1_v1 AS v1, prj_tbl2::tbl2_key AS key2, prj_tbl2::tbl2_value AS value2; dump prj_result; {noformat} The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). Replace all the deprecated new Job() with Job.getInstance(). new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch Replace all the deprecated new Job() with Job.getInstance(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10754: Description: Some older version of new Job() seems not implemented properly, which causes the following issue: {noformat} Create table tbl1 (key string, value string) stored as rcfile; Create table tbl2 (key string, value string); insert into tbl1 values( '1', '111'); insert into tbl2 values('1', '2'); {noformat} Pig script: {noformat} src_tbl1 = FILTER tbl1 BY (key == '1'); prj_tbl1 = FOREACH src_tbl1 GENERATE key as tbl1_key, value as tbl1_value, '333' as tbl1_v1; src_tbl2 = FILTER tbl2 BY (key == '1'); prj_tbl2 = FOREACH src_tbl2 GENERATE key as tbl2_key, value as tbl2_value; dump prj_tbl1; dump prj_tbl2; result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key); prj_result = FOREACH result GENERATE prj_tbl1::tbl1_key AS key1, prj_tbl1::tbl1_value AS value1, prj_tbl1::tbl1_v1 AS v1, prj_tbl2::tbl2_key AS key2, prj_tbl2::tbl2_value AS value2; dump prj_result; {noformat} The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). Replace all the deprecated new Job() with Job.getInstance(). 
was: {noformat} Create table tbl1 (key string, value string) stored as rcfile; Create table tbl2 (key string, value string); insert into tbl1 values( '1', '111'); insert into tbl2 values('1', '2'); {noformat} Pig script: {noformat} src_tbl1 = FILTER tbl1 BY (key == '1'); prj_tbl1 = FOREACH src_tbl1 GENERATE key as tbl1_key, value as tbl1_value, '333' as tbl1_v1; src_tbl2 = FILTER tbl2 BY (key == '1'); prj_tbl2 = FOREACH src_tbl2 GENERATE key as tbl2_key, value as tbl2_value; dump prj_tbl1; dump prj_tbl2; result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key); prj_result = FOREACH result GENERATE prj_tbl1::tbl1_key AS key1, prj_tbl1::tbl1_value AS value1, prj_tbl1::tbl1_v1 AS v1, prj_tbl2::tbl2_key AS key2, prj_tbl2::tbl2_value AS value2; dump prj_result; {noformat} The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). We need to clone the job instance in HCatLoader. new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch Some older version of new Job() seems not implemented properly, which causes the following issue: {noformat} Create table tbl1 (key string, value string) stored as rcfile; Create table tbl2 (key string, value string); insert into tbl1 values( '1', '111'); insert into tbl2 values('1', '2'); {noformat} Pig script: {noformat} src_tbl1 = FILTER tbl1 BY (key == '1'); prj_tbl1 = FOREACH src_tbl1 GENERATE key as tbl1_key, value as tbl1_value, '333' as tbl1_v1; src_tbl2 = FILTER tbl2 BY (key == '1'); prj_tbl2 = FOREACH src_tbl2 GENERATE key as tbl2_key, value as tbl2_value; dump prj_tbl1; dump prj_tbl2; result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key); prj_result = FOREACH result GENERATE prj_tbl1::tbl1_key AS key1, prj_tbl1::tbl1_value AS value1, prj_tbl1::tbl1_v1 AS v1, 
prj_tbl2::tbl2_key AS key2, prj_tbl2::tbl2_value AS value2; dump prj_result; {noformat} The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). Replace all the deprecated new Job() with Job.getInstance(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11048) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11048: Attachment: HIVE-11048.patch Make test cbo_windowing robust -- Key: HIVE-11048 URL: https://issues.apache.org/jira/browse/HIVE-11048 Project: Hive Issue Type: Test Components: Tests Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11048.patch Add partition / order by in over clause to make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
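To illustrate why adding an order by inside the over clause makes a windowing test robust, here is a small Python simulation. It is not Hive code: `row_number`, the sample rows, and the two "runs" are hypothetical and only model the idea that an engine may deliver rows within a partition in any order unless the window specifies one.

```python
from itertools import groupby


def row_number(rows, partition_key, order_key=None):
    """Simulate row_number() OVER (PARTITION BY ... [ORDER BY ...]).

    `rows` arrive in whatever order the engine produced them. Without an
    order_key, numbering inside each partition follows that arrival order.
    """
    # Stable sort: groups rows by partition, keeps arrival order inside each.
    by_partition = sorted(rows, key=partition_key)
    out = []
    for _, group in groupby(by_partition, key=partition_key):
        group = list(group)
        if order_key is not None:
            group.sort(key=order_key)
        out.extend((row, n) for n, row in enumerate(group, start=1))
    # Sort the (row, number) pairs so two runs can be compared directly.
    return sorted(out)


def part(row):
    return row[0]


def val(row):
    return row[1]


# The same three rows, delivered in two different orders.
run_a = [("a", 10), ("a", 20), ("b", 5)]
run_b = [("a", 20), ("b", 5), ("a", 10)]

if __name__ == "__main__":
    print(row_number(run_a, part) == row_number(run_b, part))
    print(row_number(run_a, part, val) == row_number(run_b, part, val))
```

Without an ordering the two runs number the rows in partition "a" differently, so a golden-file comparison can fail spuriously. The ordering only removes the ambiguity if the order key is unique within each partition, since ties can still be numbered either way.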
[jira] [Commented] (HIVE-10746) Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592604#comment-14592604 ] Sushanth Sowmyan commented on HIVE-10746: - Please add to the release wiki ( https://cwiki.apache.org/confluence/display/Hive/Hive+1.2+Release+Status ) when you commit any patch to branch-1.2. I'll go ahead and add this one in. Hive 1.2.0+Tez produces 1-byte FileSplits from mapred.TextInputFormat -- Key: HIVE-10746 URL: https://issues.apache.org/jira/browse/HIVE-10746 Project: Hive Issue Type: Bug Components: Hive, Tez Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1 Reporter: Greg Senia Assignee: Gopal V Priority: Critical Fix For: 1.2.1 Attachments: HIVE-10746.1.patch, HIVE-10746.2.patch, slow_query_output.zip The following query: {code:sql} SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; {code} runs consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to run this same query against Tez as the execution engine it consistently runs for over 300-500 seconds this seems extremely long. This is a basic external table delimited by tabs and is a single file in a folder. In Hive 0.13 this query with Tez runs fast and I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0 and there clearly is something going awry with Hive w/Tez as an execution engine with Single or small file tables. I can attach further logs if someone needs them for deeper analysis. 
HDFS Output:
{noformat}
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x--- 6 loaduser hadoopusers 0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x--- 6 loaduser hadoopusers 3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0
{noformat}
Hive Table Describe:
{noformat}
hive describe formatted crc_arsn;
OK
# col_name data_type comment
arsn_cd string
clmlvl_cd string
arclss_cd string
arclssg_cd string
arsn_prcsr_rmk_ind string
arsn_mbr_rspns_ind string
savtyp_cd string
arsn_eff_dt string
arsn_exp_dt string
arsn_pstd_dts string
arsn_lstupd_dts string
arsn_updrsn_txt string
appl_user_id string
arsntyp_cd string
pre_d_indicator string
arsn_display_txt string
arstat_cd string
arsn_tracking_no string
arsn_cstspcfc_ind string
arsn_mstr_rcrd_ind string
state_specific_ind string
region_specific_in string
arsn_dpndnt_cd string
unit_adjustment_in string
arsn_mbr_only_ind string
arsn_qrmb_ind string
# Detailed Table Information
Database: adw
Owner: loadu...@exa.example.com
CreateTime: Mon Apr 28 13:28:05 EDT 2014
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
transient_lastDdlTime 1398706085
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat:
[jira] [Commented] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2
[ https://issues.apache.org/jira/browse/HIVE-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592608#comment-14592608 ] Xuefu Zhang commented on HIVE-11023: Makes sense. Thanks for the explanation. Disable directSQL if datanucleus.identifierFactory = datanucleus2 - Key: HIVE-11023 URL: https://issues.apache.org/jira/browse/HIVE-11023 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.3.0, 1.2.1, 2.0.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Priority: Critical Fix For: 1.2.1 Attachments: HIVE-11023.patch We hit an interesting bug in a case where datanucleus.identifierFactory = datanucleus2 . The problem is that directSql handgenerates SQL strings assuming datanucleus1 naming scheme. If a user has their metastore JDO managed by datanucleus.identifierFactory = datanucleus2 , the SQL strings we generate are incorrect. One simple example of what this results in is the following: whenever DN persists a field which is held as a List<T>, it winds up storing each T as a separate line in the appropriate mapping table, and has a column called INTEGER_IDX, which holds the position in the list. Then, upon reading, it automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which results in the list retaining its order. In DN2 naming scheme, the column is called IDX, instead of INTEGER_IDX. If the user has run appropriate metatool upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX and IDX. Whenever they use JDO, such as with all writes, it will then use the IDX field, and when they do any sort of optimized reads, such as through directSQL, it will ORDER BY INTEGER_IDX. An immediate danger is seen when we consider that the schema of a table is stored as a List<FieldSchema> , and while IDX has 0,1,2,3,... , INTEGER_IDX will contain 0,0,0,0,... 
and thus, any attempt to describe the table or fetch schema for the table can come up mixed up in the table's native hashing order, rather than sorted by the index. This can then result in schema ordering being different from the actual table. For example, if a user has a table (a:int, b:string, c:string), a describe on this may return (c:string, a:int, b:string), and thus, queries which are inserting after selecting from another table can have ClassCastExceptions when trying to insert data in the wrong order - this is how we discovered this bug. This problem, however, can be far worse if there are no type problems - it is possible, for example, that if a, b, c were all strings, the insert query would succeed but mix up the order, which then results in user table data being mixed up. This has the potential to be very bad. We should write a tool to help convert metastores that use datanucleus2 to datanucleus1 (more difficult, needs more one-time testing) or change directSql to support both (easier to code, but increases test-coverage matrix significantly and we should really then be testing against both schemes). But in the short term, we should disable directSql if we see that the identifierFactory is datanucleus2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
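The ordering hazard described in HIVE-11023 can be illustrated with a small sketch. It is not metastore code: `IdxOrderingSketch`, `ColumnRow`, and the sample rows are hypothetical, and the stable in-memory sort only stands in for a database that is free to return ties in any order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IdxOrderingSketch {
    /** One row of a hypothetical list-mapping table. */
    static final class ColumnRow {
        final String name;
        final int integerIdx; // DN1-style column; stays 0 when writes use the DN2 scheme
        final int idx;        // DN2-style column; holds the real list position

        ColumnRow(String name, int integerIdx, int idx) {
            this.name = name;
            this.integerIdx = integerIdx;
            this.idx = idx;
        }
    }

    /** Rows in an arbitrary storage order, as written through the DN2 scheme. */
    static List<ColumnRow> storedRows() {
        List<ColumnRow> rows = new ArrayList<>();
        rows.add(new ColumnRow("c", 0, 2));
        rows.add(new ColumnRow("a", 0, 0));
        rows.add(new ColumnRow("b", 0, 1));
        return rows;
    }

    static List<String> namesOrderedBy(Comparator<ColumnRow> order) {
        List<ColumnRow> sorted = storedRows();
        sorted.sort(order); // List.sort is stable, so ties keep storage order
        List<String> names = new ArrayList<>();
        for (ColumnRow row : sorted) {
            names.add(row.name);
        }
        return names;
    }

    /** Mimics ORDER BY INTEGER_IDX: every key is 0, so the order is whatever storage gave us. */
    static List<String> orderByIntegerIdx() {
        return namesOrderedBy(Comparator.comparingInt((ColumnRow r) -> r.integerIdx));
    }

    /** Mimics ORDER BY IDX: the declared column order is recovered. */
    static List<String> orderByIdx() {
        return namesOrderedBy(Comparator.comparingInt((ColumnRow r) -> r.idx));
    }
}
```

With these sample rows, ordering by the all-zero column leaves the schema as (c, a, b), while ordering by the populated column yields (a, b, c).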
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592630#comment-14592630 ] Laljo John Pullokkaran commented on HIVE-10996: --- [~jcamachorodriguez] How could GB child Filter have a different schema than GB? Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13, which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results: {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff 1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, 
timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-10233: - Attachment: HIVE-10233.09.patch Rebase and upload patch 09 Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592751#comment-14592751 ] Xuefu Zhang commented on HIVE-10844: [~chengxiang li], could you please provide a RB entry for this? Combine equivalent Works for HoS[Spark Branch] -- Key: HIVE-10844 URL: https://issues.apache.org/jira/browse/HIVE-10844 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-10844.1-spark.patch, HIVE-10844.2-spark.patch Some Hive queries (like [TPCDS Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]) may share the same subquery, which is translated into separate but equivalent Works in SparkWork. Combining these equivalent Works into a single one would help to benefit from the following dynamic RDD caching optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11037: --- Attachment: HIVE-11037.01.patch The temporary patch. We need to update the tez q files, too. HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to make it run by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11035) PPD: Orc Split elimination fails because filterColumns=[-1]
[ https://issues.apache.org/jira/browse/HIVE-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11035: - Attachment: HIVE-11035-branch-1.0.patch PPD: Orc Split elimination fails because filterColumns=[-1] --- Key: HIVE-11035 URL: https://issues.apache.org/jira/browse/HIVE-11035 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gopal V Assignee: Prasanth Jayachandran Attachments: HIVE-11035-branch-1.0.patch, HIVE-11035.patch {code} create temporary table xx (x int) stored as orc ; insert into xx values (20),(200); set hive.fetch.task.conversion=none; select * from xx where x is null; {code} This should generate zero tasks after optional split elimination in the app master, instead of generating the 1 task which for sure hits the row-index filters and removes all rows anyway. Right now, this runs 1 task for the stripe containing (min=20, max=200, has_null=false), which is broken. Instead, it returns YES_NO_NULL from the following default case https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L976 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
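The expected behaviour described in HIVE-11035 can be sketched as follows. This is a simplified illustration, not the OrcInputFormat code: `Truth`, `StripeStats`, `evaluateIsNull`, and `needsRead` are hypothetical names, and the three-valued enum is a cut-down stand-in for the truth values ORC actually uses.

```java
final class NullSplitEliminationSketch {
    /** Simplified answer to "can any row in this stripe satisfy the predicate?". */
    enum Truth { YES, NO, YES_NO }

    /** Column statistics for one stripe, as quoted in the report. */
    static final class StripeStats {
        final long min;
        final long max;
        final boolean hasNull;

        StripeStats(long min, long max, boolean hasNull) {
            this.min = min;
            this.max = max;
            this.hasNull = hasNull;
        }
    }

    /** Evaluates "x IS NULL" against the stripe statistics only. */
    static Truth evaluateIsNull(StripeStats stats) {
        // Without any nulls in the stripe, no row can match, so the stripe is skippable.
        return stats.hasNull ? Truth.YES_NO : Truth.NO;
    }

    /** A task is only needed when some row might match. */
    static boolean needsRead(Truth truth) {
        return truth != Truth.NO;
    }
}
```

For the stripe in the report (min=20, max=200, has_null=false) this evaluates to NO, so no task would be scheduled, which is the outcome the issue asks for.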
[jira] [Commented] (HIVE-11025) In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591366#comment-14591366 ] Hive QA commented on HIVE-11025: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740235/HIVE-11025.patch {color:green}SUCCESS:{color} +1 9008 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4297/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4297/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4297/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12740235 - PreCommit-HIVE-TRUNK-Build In windowing spec, when the datatype is decimal, it's comparing the value against NULL value incorrectly Key: HIVE-11025 URL: https://issues.apache.org/jira/browse/HIVE-11025 Project: Hive Issue Type: Sub-task Components: PTF-Windowing Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11025.patch Given data and the following query,
{noformat}
deptno empno bonus salary
30 7698 NULL 2850.0
30 7900 NULL 950.0
30 7844 0 1500.0
select avg(salary) over (partition by deptno order by bonus range 200 preceding) from emp2;
{noformat}
It produces incorrect result for the row in which bonus=0
1900.0
1900.0
1766.7
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
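The frame computation in HIVE-11025 can be worked through with a small sketch. It is not Hive's PTF code, and it rests on an assumption the report only implies: a NULL order key is a peer of other NULL keys and never falls inside a numeric range. `RangeFrameNullSketch` and `Emp` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeFrameNullSketch {
    static final class Emp {
        final Integer bonus; // null models SQL NULL
        final double salary;

        Emp(Integer bonus, double salary) {
            this.bonus = bonus;
            this.salary = salary;
        }
    }

    /** avg(salary) over (order by bonus range N preceding) for a single partition. */
    static List<Double> avgSalary(List<Emp> partition, int preceding) {
        List<Double> result = new ArrayList<>();
        for (Emp current : partition) {
            double sum = 0;
            int count = 0;
            for (Emp other : partition) {
                boolean inFrame;
                if (current.bonus == null || other.bonus == null) {
                    // Assumption: NULL keys only match other NULL keys.
                    inFrame = current.bonus == null && other.bonus == null;
                } else {
                    inFrame = other.bonus >= current.bonus - preceding
                            && other.bonus <= current.bonus;
                }
                if (inFrame) {
                    sum += other.salary;
                    count++;
                }
            }
            // The current row is always in its own frame, so count is at least 1.
            result.add(sum / count);
        }
        return result;
    }

    /** The three deptno=30 rows from the report. */
    static List<Double> example() {
        List<Emp> rows = new ArrayList<>();
        rows.add(new Emp(null, 2850.0));
        rows.add(new Emp(null, 950.0));
        rows.add(new Emp(0, 1500.0));
        return avgSalary(rows, 200);
    }
}
```

Under that assumption the row with bonus=0 averages only its own salary (1500.0), whereas the reported 1766.7 is the average of all three salaries, meaning the NULL rows were wrongly pulled into the frame.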
[jira] [Commented] (HIVE-10952) Describe a non-partitioned table fail
[ https://issues.apache.org/jira/browse/HIVE-10952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591368#comment-14591368 ] Hive QA commented on HIVE-10952: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740238/HIVE-10952.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4298/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4298/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4298/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4298/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ 
-z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin From https://github.com/apache/hive 9692e89..fb86cef branch-1.0 - origin/branch-1.0 11a0901..2e1bee8 branch-1.1 - origin/branch-1.1 d1eaa37..749bbfc branch-1.2 - origin/branch-1.2 524cd79..9a511eb master - origin/master + git reset --hard HEAD HEAD is now at 524cd79 HIVE-11023 : Disable directSQL if datanucleus.identifierFactory = datanucleus2 (Sushanth Sowmyan, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded. + git reset --hard origin/master HEAD is now at 9a511eb HIVE-11035: PPD: Orc Split elimination fails because filterColumns=[-1] (Prasanth Jayachandran reviewed by Gopal V) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12740238 - PreCommit-HIVE-TRUNK-Build Describe a non-partitioned table fail - Key: HIVE-10952 URL: https://issues.apache.org/jira/browse/HIVE-10952 Project: Hive Issue Type: Sub-task Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Daniel Dai Assignee: Alan Gates Fix For: hbase-metastore-branch Attachments: HIVE-10952-1.patch, HIVE-10952.patch This section of alter1.q fails: create table alter1(a int, b int); describe extended alter1; Exception: {code} Trying to fetch a non-existent storage descriptor from hash iNVRGkfwwQDGK9oX0fo9XA== at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer$QualifiedNameUtil.getAttemptTableName(DDLSemanticAnalyzer.java:1765) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer$QualifiedNameUtil.getTableName(DDLSemanticAnalyzer.java:1807) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDescribeTable(DDLSemanticAnalyzer.java:1985) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:318) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430) at
[jira] [Updated] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11031: - Attachment: HIVE-11031.3.patch Addressed minor nit. ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp etc. But column statistics merging assumes column statistics exists for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing instanceof check. If the ORC file contains time stamp column statistics then this will work else it will throw ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
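The guarded merge that HIVE-11031 calls for can be sketched as below. This is not the ORC implementation: `ColumnStats`, `TimestampStats`, and the `rangeKnown` flag are hypothetical, and marking the range as unknown when typed statistics are missing is an assumed policy, not necessarily what the patch does.

```java
final class StatsMergeSketch {
    /** Generic statistics that every file provides. */
    static class ColumnStats {
        long count;

        ColumnStats(long count) {
            this.count = count;
        }
    }

    /** Type-specific statistics that old files may lack. */
    static final class TimestampStats extends ColumnStats {
        long min;
        long max;
        boolean rangeKnown = true;

        TimestampStats(long count, long min, long max) {
            super(count);
            this.min = min;
            this.max = max;
        }
    }

    /** Merges other into target without assuming other carries timestamp statistics. */
    static void merge(TimestampStats target, ColumnStats other) {
        target.count += other.count;
        if (other instanceof TimestampStats) {
            TimestampStats typed = (TimestampStats) other;
            target.min = Math.min(target.min, typed.min);
            target.max = Math.max(target.max, typed.max);
            target.rangeKnown = target.rangeKnown && typed.rangeKnown;
        } else {
            // Assumed policy: an unguarded cast would throw ClassCastException here,
            // so record that min/max no longer cover every merged file.
            target.rangeKnown = false;
        }
    }
}
```

Merging a plain `ColumnStats` from an old file then updates the count and clears `rangeKnown` instead of failing on the cast.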
[jira] [Updated] (HIVE-10685) Alter table concatenate oparetor will cause duplicate data
[ https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10685: - Fix Version/s: (was: 1.1.0) (was: 1.2.0) 2.0.0 1.2.1 1.1.1 1.0.1 Alter table concatenate oparetor will cause duplicate data -- Key: HIVE-10685 URL: https://issues.apache.org/jira/browse/HIVE-10685 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1 Reporter: guoliming Assignee: guoliming Priority: Critical Fix For: 1.0.1, 1.1.1, 1.2.1, 2.0.0 Attachments: HIVE-10685.patch Orders table has 15 rows and is stored as ORC. {noformat} hive select count(*) from orders; OK 15 Time taken: 37.692 seconds, Fetched: 1 row(s) {noformat} The table contains 14 files, each about 2.1 ~ 3.2 GB in size. After executing the command ALTER TABLE orders CONCATENATE; the table has 1530115000 rows. My Hive version is 1.1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11040) Change Derby dependency version to 10.10.2.0
[ https://issues.apache.org/jira/browse/HIVE-11040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591594#comment-14591594 ] Hive QA commented on HIVE-11040: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12740263/HIVE-11040.1.patch {color:green}SUCCESS:{color} +1 9009 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4300/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4300/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4300/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12740263 - PreCommit-HIVE-TRUNK-Build Change Derby dependency version to 10.10.2.0 Key: HIVE-11040 URL: https://issues.apache.org/jira/browse/HIVE-11040 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11040.1.patch We don't see this on the Apache pre-commit tests because it uses PTest, but running the entire TestCliDriver suite results in failures in some of the partition-related qtests (partition_coltype_literals, partition_date, partition_date2). I've only really seen this on Linux (I was using CentOS). HIVE-8879 changed the Derby dependency version from 10.10.1.1 to 10.11.1.1. Testing with 10.10.1.1 or 10.10.2.0 seems to allow the partition related tests to pass. I'd like to change the dependency version to 10.10.2.0, since that version should also contain the fix for HIVE-8879. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Attachment: HIVE-9511.4.patch Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.4.patch, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9511) Switch Tez to 0.6.1
[ https://issues.apache.org/jira/browse/HIVE-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9511: --- Attachment: (was: HIVE-9511.4.patch.txt) Switch Tez to 0.6.1 --- Key: HIVE-9511 URL: https://issues.apache.org/jira/browse/HIVE-9511 Project: Hive Issue Type: Improvement Components: Tez Reporter: Damien Carol Assignee: Damien Carol Attachments: HIVE-9511.2.patch, HIVE-9511.3.patch.txt, HIVE-9511.4.patch, HIVE-9511.patch.txt Tez 0.6.1 has been released. Research to switch to version 0.6.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11033) BloomFilter index is not honored by ORC reader
[ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11033: - Attachment: HIVE-11033.2.patch BloomFilter index is not honored by ORC reader -- Key: HIVE-11033 URL: https://issues.apache.org/jira/browse/HIVE-11033 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Allan Yan Assignee: Prasanth Jayachandran Attachments: HIVE-11033.2.patch, HIVE-11033.patch There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class which caused the bloom filter index saved in the ORC file not being used. The root cause is the bloomFilterIndices variable defined in the SargApplier class superseded the one defined in its parent class. Therefore, in the ReaderImpl.pickRowGroups() {code} protected boolean[] pickRowGroups() throws IOException { // if we don't have a sarg or indexes, we read everything if (sargApp == null) { return null; } readRowIndex(currentStripe, included, sargApp.sargColumns); return sargApp.pickRowGroups(stripes.get(currentStripe), indexes); } {code} The bloomFilterIndices populated by readRowIndex() is not picked up by sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to its parent counterpart. 
{noformat}
18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original
174d173
< bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
178c177
< sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices);
---
> sarg, options.getColumnNames(), strideRate, types, included.length);
204a204
> bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
673c673
< List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) {
---
> List<OrcProto.Type> types, int includedCount) {
677c677
< this.bloomFilterIndices = bloomFilterIndices;
---
> bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
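The root cause described in HIVE-11033, two separate arrays with the same name, can be reproduced in miniature. The sketch below is not RecordReaderImpl: `Reader`, `Applier`, and the `shareArray` switch are hypothetical, and strings stand in for OrcProto.BloomFilterIndex entries.

```java
final class SharedIndexSketch {
    /** Stand-in for SargApplier: consults bloom filter indexes when picking row groups. */
    static final class Applier {
        final String[] bloomFilterIndices;

        Applier(String[] bloomFilterIndices) {
            this.bloomFilterIndices = bloomFilterIndices;
        }

        boolean seesIndex(int column) {
            return bloomFilterIndices[column] != null;
        }
    }

    /** Stand-in for the record reader that owns the applier. */
    static final class Reader {
        final String[] bloomFilterIndices;
        final Applier applier;

        Reader(int columns, boolean shareArray) {
            bloomFilterIndices = new String[columns];
            // Bug: the applier allocates its own array. Fix: hand it the reader's array.
            applier = new Applier(shareArray ? bloomFilterIndices : new String[columns]);
        }

        /** Populates the reader's array only, as readRowIndex() does in the report. */
        void readRowIndex(int column) {
            bloomFilterIndices[column] = "bloom-filter-for-column-" + column;
        }
    }

    static boolean applierSeesIndexAfterRead(boolean shareArray) {
        Reader reader = new Reader(3, shareArray);
        reader.readRowIndex(1);
        return reader.applier.seesIndex(1);
    }
}
```

With separate arrays the applier never sees what `readRowIndex` loaded, so the bloom filter is silently ignored. Passing the reader's array into the constructor, as the diff above does, makes both sides look at the same entries.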