[jira] [Created] (HIVE-13020) Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
Greg Senia created HIVE-13020:

Summary: Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
Key: HIVE-13020
URL: https://issues.apache.org/jira/browse/HIVE-13020
Project: Hive
Issue Type: Bug
Components: HiveServer2, Metastore, Shims
Affects Versions: 1.2.1, 1.2.0, 1.3.0
Environment: Linux X86_64 and IBM JDK 8
Reporter: Greg Senia
Assignee: Greg Senia
Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0

The ZooKeeper component used by HiveServer2 and the Hive Metastore is hardcoded to support only the Oracle/Open JDK. I discovered this issue while testing Hadoop on the IBM JDK and have since drawn up the attached patch. It appears to resolve the issue in a manner similar to how the Hadoop core project handles the IBM JDK.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
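For readers unfamiliar with the kind of vendor check mentioned above, here is a minimal sketch of the general idea. It is not the attached patch. The class and method names are hypothetical. The login module class names and keytab option names are those Hadoop core is understood to use for the IBM and Oracle/Open JDKs; verify them against the documentation of the JDK in use before relying on them.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: chooses Kerberos JAAS settings from the JVM vendor string.
 * JaasVendorSketch and its methods are hypothetical names for illustration.
 */
final class JaasVendorSketch {

    /** True when the vendor string identifies an IBM JVM. */
    static boolean isIbmJava(String javaVendor) {
        return javaVendor != null && javaVendor.contains("IBM");
    }

    /** Krb5LoginModule class name to put in the JAAS configuration. */
    static String krb5LoginModuleName(String javaVendor) {
        return isIbmJava(javaVendor)
                ? "com.ibm.security.auth.module.Krb5LoginModule"
                : "com.sun.security.auth.module.Krb5LoginModule";
    }

    /**
     * Keytab login options. The two modules use different option names;
     * the names below are assumptions to be checked against each JDK.
     */
    static Map<String, String> keytabOptions(String javaVendor, String keytab, String principal) {
        Map<String, String> options = new HashMap<>();
        if (isIbmJava(javaVendor)) {
            options.put("useKeytab", "file://" + keytab);
            options.put("credsType", "both");
        } else {
            options.put("useKeyTab", "true");
            options.put("keyTab", keytab);
            options.put("storeKey", "true");
            options.put("doNotPrompt", "true");
        }
        options.put("principal", principal);
        return options;
    }

    public static void main(String[] args) {
        // Inspect the JVM this sketch is running on.
        String vendor = System.getProperty("java.vendor");
        System.out.println("java.vendor  = " + vendor);
        System.out.println("login module = " + krb5LoginModuleName(vendor));
    }
}
```

Passing the vendor string as a parameter, rather than reading the system property inside each method, keeps both branches easy to exercise on any JDK.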
[jira] [Created] (HIVE-11051) Hive 1.2.0
Greg Senia created HIVE-11051:

Summary: Hive 1.2.0
Key: HIVE-11051
URL: https://issues.apache.org/jira/browse/HIVE-11051
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical

The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3:

Status: Running (Executing on YARN cluster with App id application_1434641270368_1038)

VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..   SUCCEEDED      3          3        0        0       0       0
Map 2 ...     FAILED      3          1        0        2       7       0
VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s

Status: Failed
Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
    ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd:
[jira] [Created] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
Greg Senia created HIVE-10746:

Summary: Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
Key: HIVE-10746
URL: https://issues.apache.org/jira/browse/HIVE-10746
Project: Hive
Issue Type: Bug
Components: Hive, Tez
Affects Versions: 1.2.0, 0.14.0, 0.14.1, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical

The following query:

SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id;

runs consistently fast in Spark and MapReduce on Hive 1.2.0. With Tez as the execution engine, the same query consistently takes 300-500 seconds or longer, which seems extremely long. The table is a basic external table, delimited by tabs, backed by a single file in a folder. In Hive 0.13 this query runs fast with Tez. I tested with Hive 0.14, 0.14.1/1.0.0, and now Hive 1.2.0, and something is clearly going awry with Hive on Tez for tables made up of a single file or small files. I can attach further logs if someone needs them for deeper analysis.
HDFS Output:

hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers          0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers    3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0

Hive Table Describe:

hive> describe formatted crc_arsn;
OK
# col_name              data_type       comment

arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
arsn_mbr_only_ind       string
arsn_qrmb_ind           string

# Detailed Table Information
Database:               adw
Owner:                  loadu...@exa.example.com
CreateTime:             Mon Apr 28 13:28:05 EDT 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1398706085

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             \t
        line.delim              \n
        serialization.format    \t
Time taken: 1.245 seconds, Fetched: 54 row(s)

Explain Hive 1.2.0 w/Tez:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
[jira] [Created] (HIVE-10712) Hive on Apache Flink
Greg Senia created HIVE-10712:

Summary: Hive on Apache Flink
Key: HIVE-10712
URL: https://issues.apache.org/jira/browse/HIVE-10712
Project: Hive
Issue Type: Wish
Reporter: Greg Senia

Flink, an open-source cluster computing framework for data analytics, has gained some momentum recently. First, this initiative will give users a new alternative so that they can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption, as it exposes Flink users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Flink also has performance benefits: Hive queries, especially those involving multiple reducer stages, will run faster, improving the user experience as Tez and Spark do.

This is an umbrella JIRA that will cover many upcoming subtasks. Feedback from the community is greatly appreciated!