[jira] [Created] (HIVE-13020) Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK

2016-02-07 Greg Senia (JIRA)
Greg Senia created HIVE-13020:
-

 Summary: Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
 Key: HIVE-13020
 URL: https://issues.apache.org/jira/browse/HIVE-13020
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore, Shims
Affects Versions: 1.2.1, 1.2.0, 1.3.0
 Environment: Linux X86_64 and IBM JDK 8
Reporter: Greg Senia
Assignee: Greg Senia
 Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0


The ZooKeeper component of HiveServer2 and the Hive Metastore is hardcoded to
support only the Oracle JDK/OpenJDK. I discovered this issue while testing Hadoop
on the IBM JDK and have since drawn up the attached patch. The patch appears to
resolve the issue in a manner similar to how Hadoop core handles the IBM JDK.
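For context, Hadoop core detects the IBM JDK by checking whether the `java.vendor` system property contains "IBM", and then selects IBM's Kerberos login module instead of the Sun/Oracle one. The sketch below illustrates that pattern for a JAAS configuration, assuming (this is not stated in the report) that the JDK-specific part of the ZooKeeper connection is the `Krb5LoginModule` class named in its JAAS entry. It is not the attached patch: the class name `VendorAwareJaasConfig` is hypothetical, and the option names follow Hadoop's usage and should be checked against the IBM JDK documentation.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Hypothetical JAAS configuration that picks the Kerberos login module
// according to the running JDK's vendor, mirroring Hadoop's IBM JDK check.
class VendorAwareJaasConfig extends Configuration {

    // Hadoop-style detection: IBM JDKs report a java.vendor containing "IBM".
    static final boolean IBM_JAVA =
        System.getProperty("java.vendor", "").contains("IBM");

    static String krb5LoginModuleName(boolean ibmJava) {
        return ibmJava
            ? "com.ibm.security.auth.module.Krb5LoginModule"
            : "com.sun.security.auth.module.Krb5LoginModule";
    }

    // The two modules use different option names (assumed from Hadoop's usage).
    static Map<String, String> krb5Options(boolean ibmJava, String principal, String keytab) {
        Map<String, String> opts = new HashMap<>();
        opts.put("principal", principal);
        if (ibmJava) {
            // IBM's module takes the keytab as a file URL.
            opts.put("useKeytab", keytab.startsWith("file://") ? keytab : "file://" + keytab);
            opts.put("credsType", "both");
        } else {
            opts.put("useKeyTab", "true");
            opts.put("keyTab", keytab);
            opts.put("storeKey", "true");
            opts.put("doNotPrompt", "true");
        }
        return opts;
    }

    private final String appName;
    private final String principal;
    private final String keytab;

    VendorAwareJaasConfig(String appName, String principal, String keytab) {
        this.appName = appName;
        this.principal = principal;
        this.keytab = keytab;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!appName.equals(name)) {
            return null; // not our section; a real implementation might delegate
        }
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                krb5LoginModuleName(IBM_JAVA),
                LoginModuleControlFlag.REQUIRED,
                krb5Options(IBM_JAVA, principal, keytab))
        };
    }
}
```

It would be installed with `Configuration.setConfiguration(new VendorAwareJaasConfig("Client", principal, keytab))`, where `Client` is the default JAAS section name the ZooKeeper client looks up. Deciding on the vendor once, at class load, keeps the Oracle/OpenJDK path unchanged.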



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11051) Hive 1.2.0

2015-06-18 Greg Senia (JIRA)
Greg Senia created HIVE-11051:
-

 Summary: Hive 1.2.0 
 Key: HIVE-11051
 URL: https://issues.apache.org/jira/browse/HIVE-11051
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical


The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3:

Status: Running (Executing on YARN cluster with App id application_1434641270368_1038)


VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..   SUCCEEDED      3          3        0        0       0       0
Map 2 ...     FAILED      3          1        0        2       7       0

VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s

Status: Failed
Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-23 
22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
11:54:40.740061,ignore_me:1,notes:null}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-23 
22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
11:54:40.740061,ignore_me:1,notes:null}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 

[jira] [Created] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-18 Greg Senia (JIRA)
Greg Senia created HIVE-10746:
-

 Summary: Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
 Key: HIVE-10746
 URL: https://issues.apache.org/jira/browse/HIVE-10746
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 1.2.0, 0.14.0, 0.14.1, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical


The following query:

SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id;

runs consistently fast on Spark and MapReduce with Hive 1.2.0. When the same
query is run with Tez as the execution engine, it consistently takes between 300
and 500 seconds, which seems extremely long. The table is a basic tab-delimited
external table consisting of a single file in one folder. In Hive 0.13 this query
runs fast on Tez. I tested Hive 0.14, 0.14.1/1.0.0, and now Hive 1.2.0, and
something is clearly going awry with Hive on Tez as the execution engine for
tables made up of a single file or a few small files. I can attach further logs
if anyone needs them for deeper analysis.
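To reproduce the engine comparison described above from a single client, the same query can be timed under each engine in one HiveServer2 session. This is a minimal sketch under stated assumptions: the JDBC URL is a hypothetical placeholder, the Hive JDBC driver (`org.apache.hive.jdbc.HiveDriver`) is assumed to be on the classpath (older releases may need it loaded explicitly with `Class.forName`), and only `mr` and `tez` are timed; `spark` can be added if Hive on Spark is configured. `hive.execution.engine` is Hive's standard session setting for choosing the engine; the class name `EngineTiming` is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical harness: times the reported query under different engines.
class EngineTiming {
    static final String QUERY =
        "SELECT appl_user_id, arsn_cd, COUNT(*) AS RecordCount FROM adw.crc_arsn "
        + "GROUP BY appl_user_id, arsn_cd ORDER BY appl_user_id";

    // Statements for one run: switch the session's engine, then run the query.
    static List<String> statementsFor(String engine) {
        return Arrays.asList("SET hive.execution.engine=" + engine, QUERY);
    }

    // Runs the query once under the given engine; returns elapsed milliseconds.
    static long timeQuery(Connection conn, String engine) throws SQLException {
        List<String> stmts = statementsFor(engine);
        try (Statement st = conn.createStatement()) {
            st.execute(stmts.get(0));
            long start = System.nanoTime();
            try (ResultSet rs = st.executeQuery(stmts.get(1))) {
                while (rs.next()) {
                    // drain every row so the whole job is included in the timing
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical default URL; pass the real HiveServer2 URL as args[0].
        String url = args.length > 0 ? args[0] : "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url)) {
            for (String engine : Arrays.asList("mr", "tez")) {
                System.out.println(engine + ": " + timeQuery(conn, engine) + " ms");
            }
        }
    }
}
```

Because the `SET` runs on the same connection, the engine choice stays session-scoped. The first Tez query in a session may also include Tez session startup time, so repeating each engine several times gives a fairer comparison.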

HDFS Output:
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers          0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers    3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0


Hive Table Describe:
hive> describe formatted crc_arsn;
OK
# col_name              data_type   comment
 
arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
arsn_mbr_only_ind       string
arsn_qrmb_ind           string
 
# Detailed Table Information 
Database:               adw
Owner:                  loadu...@exa.example.com
CreateTime:             Mon Apr 28 13:28:05 EDT 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
 
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1398706085
 
# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             \t
        line.delim              \n
        serialization.format    \t
Time taken: 1.245 seconds, Fetched: 54 row(s)




Explain Hive 1.2.0 w/Tez:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)

[jira] [Created] (HIVE-10712) Hive on Apache Flink

2015-05-14 Greg Senia (JIRA)
Greg Senia created HIVE-10712:
-

 Summary: Hive on Apache Flink
 Key: HIVE-10712
 URL: https://issues.apache.org/jira/browse/HIVE-10712
 Project: Hive
  Issue Type: Wish
Reporter: Greg Senia


Flink, an open-source cluster computing framework for data analytics, has gained
some momentum recently. This initiative will give users a new alternative so that
they can consolidate their backends.
Second, providing such an alternative further increases Hive's adoption, as it
exposes Flink users to a viable, feature-rich, de facto standard SQL tool on
Hadoop.
Finally, allowing Hive to run on Flink also has performance benefits. Hive
queries, especially those involving multiple reducer stages, are expected to run
faster, improving the user experience as Tez and Spark do.
This is an umbrella JIRA that will cover many upcoming subtasks. Feedback from
the community is greatly appreciated!


