[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-10-28 Thread Greg Senia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961647#comment-16961647
 ] 

Greg Senia commented on HIVE-17395:
---

After spending many hours troubleshooting, this enhancement appears to be the
root cause of the LParen problem.

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue outlined in HIVE-15388, where
> the query parser spends a lot of time (it never returns, and I need to kill
> the beeline process) parsing a command with a lot of "(". I tried this in
> both 2.2 and now 2.3.
> Here's an example query (auto-generated SQL, BTW) in beeline that never
> completes parsing; I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388. Any
> ideas on how to address this? Write better SQL? A patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2 process, it appears to be
> stuck/running in the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324)
>   at 
> 
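The stack trace above shows the ANTLR-generated parser stuck in DFA prediction on the deeply parenthesized expression. The failure mode can be illustrated with a toy sketch (this is NOT Hive's actual grammar): an ordered-choice recursive-descent parser without memoization re-parses the same suffix once per failed alternative, so its work doubles with each nesting level of "(".

```python
def parse(s, i, calls):
    """Toy grammar E := '(' E ')' | '(' E ']' | 'a', tried in order with backtracking."""
    calls[0] += 1
    if i < len(s) and s[i] == 'a':
        return i + 1
    if i < len(s) and s[i] == '(':
        j = parse(s, i + 1, calls)              # alt 1: '(' E ')'
        if j is not None and j < len(s) and s[j] == ')':
            return j + 1
        j = parse(s, i + 1, calls)              # alt 2: '(' E ']' -- re-parses the whole suffix
        if j is not None and j < len(s) and s[j] == ']':
            return j + 1
    return None

def count_calls(depth):
    # Every nesting level forces alt 1 to fail, so calls grow as 2^(depth+1) - 1.
    s = '(' * depth + 'a' + ']' * depth
    calls = [0]
    assert parse(s, 0, calls) == len(s)          # the input does parse -- just slowly
    return calls[0]
```

Memoizing intermediate results, or refactoring the grammar so the alternatives no longer share a long common prefix, restores roughly linear behavior; the latter is the kind of change the Hive parser patches make.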

[jira] [Comment Edited] (HIVE-18624) Parsing time is extremely high (~10 min) for queries with complex select expressions

2019-10-28 Thread Greg Senia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961586#comment-16961586
 ] 

Greg Senia edited comment on HIVE-18624 at 10/29/19 1:14 AM:
-

Re-created this problem with a simple query:

Query:: select 'x' from processed_opendata_samples.nyse_stocks limit 2


This bug seems to have made it into many commercial distributions. I spent about
7 hours debugging today and determined that HIVE-11600 brought this problem to light.


was (Author: gss2002):
Re-created this problem with a simple query as:

Query:: select 
'x'
 from processed_opendata_samples.nyse_stocks limit 2


> Parsing time is extremely high (~10 min) for queries with complex select 
> expressions
> 
>
> Key: HIVE-18624
> URL: https://issues.apache.org/jira/browse/HIVE-18624
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Parser
>Affects Versions: 2.0.0, 3.0.0, 2.3.2
>Reporter: Amruth Sampath
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 2.4.0, 4.0.0, 3.2.0, 3.1.2
>
> Attachments: HIVE-18624.01.patch, HIVE-18624.02.patch, thread_dump
>
>
> EXPLAIN of the same query takes 0.1 to 3 seconds in Hive 2.1.0 and
> 10-15 minutes in Hive 2.3.2 and the latest master.
> Sample expression below:
> {code:java}
> EXPLAIN
> SELECT DISTINCT
>   IF(lower('a') <= lower('a')
>   ,'a'
>   ,IF(('a' IS NULL AND from_unixtime(UNIX_TIMESTAMP()) <= 'a')
>   ,'a'
>   ,IF(if('a' = 'a', TRUE, FALSE) = 1
>   ,'a'
>   ,IF(('a' = 1 and lower('a') NOT IN ('a', 'a')
>and lower(if('a' = 'a','a','a')) <= lower('a'))
>   OR ('a' like 'a' OR 'a' like 'a')
>   OR 'a' in ('a','a')
>   ,'a'
>   ,IF(if(lower('a') in ('a', 'a') and 'a'='a', TRUE, FALSE) = 1
>   ,'a'
>   ,IF('a'='a' and unix_timestamp(if('a' = 'a',cast('a' as 
> string),coalesce('a',cast('a' as string),from_unixtime(unix_timestamp() 
> <= unix_timestamp(concat_ws('a',cast(lower('a') as string),'00:00:00')) + 
> 9*3600
>   ,'a'
>   ,If(lower('a') <= lower('a')
>   and if(lower('a') in ('a', 'a') and 'a'<>'a', TRUE, FALSE) <> 1
>   ,'a'
>   ,IF('a'=1 AND 'a'=1
>   ,'a'
>   ,IF('a' = 1 and COALESCE(cast('a' as int),0) = 0
>   ,'a'
>   ,IF('a' = 'a'
>   ,'a'
>   ,If('a' = 'a' AND 
> lower('a')>lower(if(lower('a')<1830,'a',cast(date_add('a',1) as timestamp)))
>   ,'a'
>   ,IF('a' = 1
>   ,IF('a' in ('a', 'a') and ((unix_timestamp('a')-unix_timestamp('a')) / 60) 
> > 30 and 'a' = 1
>   ,'a', 'a')
>   ,IF(if('a' = 'a', FALSE, TRUE ) = 1 AND 'a' IS NULL
>   ,'a'
>   ,IF('a' = 1 and 'a'>0
>   , 'a'
>   ,IF('a' = 1 AND 'a' ='a'
>   ,'a'
>   ,IF('a' is not null and 'a' is not null and 'a' > 'a'
>   ,'a'
>   ,IF('a' = 1
>   ,'a'
>   ,IF('a' = 'a'
>   ,'a'
>   ,If('a' = 1
>   ,'a'
>   ,IF('a' = 1
>   ,'a'
>   ,IF('a' = 1
>   ,'a'
>   ,IF('a' ='a' and 'a' ='a' and cast(unix_timestamp('a') as  int) + 93600 < 
> cast(unix_timestamp()  as int)
>   ,'a'
>   ,IF('a' = 'a'
>   ,'a'
>   ,IF('a' = 'a' and 'a' in ('a','a','a')
>   ,'a'
>   ,IF('a' = 'a'
>   ,'a','a'))
>   )))
> AS test_comp_exp
> {code}
>  
> Taking a look at [^thread_dump] shows a very large function stack getting 
> created.
> Reverting HIVE-15578 (92f31d07aa988d4a460aac56e369bfa386361776) seems to speed 
> up the parsing.
>  
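The pathology above is easy to reproduce synthetically. A minimal sketch (the helpers `nested_if` and `explain_query` are hypothetical, not the reporter's generator) that builds an N-deep nested IF expression in the shape of the sample, suitable for timing EXPLAIN against different Hive versions:

```python
def nested_if(depth):
    """Build a depth-deep chain of IF('a' = 'a', 'a', ...) expressions."""
    expr = "'a'"
    for _ in range(depth):
        expr = "IF('a' = 'a', 'a', {})".format(expr)
    return expr

def explain_query(depth):
    """Wrap the nested expression in the same EXPLAIN SELECT shape as the report."""
    return "EXPLAIN SELECT DISTINCT {} AS test_comp_exp".format(nested_if(depth))
```

Feeding, say, `explain_query(25)` to beeline on Hive 2.1.0 versus 2.3.2 should make the parse-time regression visible without the full 40-branch expression.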



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-18624) Parsing time is extremely high (~10 min) for queries with complex select expressions

2019-10-28 Thread Greg Senia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961586#comment-16961586
 ] 

Greg Senia commented on HIVE-18624:
---

Re-created this problem with a simple query:

Query:: select 'x' from processed_opendata_samples.nyse_stocks limit 2







[jira] [Updated] (HIVE-18624) Parsing time is extremely high (~10 min) for queries with complex select expressions

2019-10-28 Thread Greg Senia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated HIVE-18624:
--
Affects Version/s: 2.0.0






[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

2017-02-13 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864981#comment-15864981
 ] 

Greg Senia commented on HIVE-10511:
---

[~gopalv] thank you for the further insight. I wish all Hadoop vendors I talked
with felt the same way. I know multiple vendors who feel Hive and HiveServer2
should be the ONLY access mechanism on top of Hadoop. The approach we've been
taking at my current and past employers is that tools like Voltage, Protegrity,
or Dataguise would be used to secure column-level access using FPE. But I can
see how some companies would not want to invest down that road. How will LLAP
work with doAs? Tools like Protegrity require jobs to run as the actual end
user; they cannot run as the hive user, and unfortunately I don't think this
requirement will be changing any time soon.

> Replacing the implementation of Hive CLI using Beeline
> --
>
> Key: HIVE-10511
> URL: https://issues.apache.org/jira/browse/HIVE-10511
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.10.0
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
>
> Hive CLI is a legacy tool which had two main use cases: 
> 1. a thick client for SQL on hadoop
> 2. a command line tool for HiveServer1.
> HiveServer1 is already deprecated and removed from the Hive code base, so use 
> case #2 is out of the question. For #1, Beeline provides, or is supposed to 
> provide, equal functionality, yet is implemented differently from Hive CLI.
> As the Hive community has been recommending the Beeline + HS2 configuration 
> for a while now, ideally we should deprecate Hive CLI. Because of the wide use 
> of Hive CLI, we instead propose replacing Hive CLI's implementation with 
> Beeline plus an embedded HS2, so that the Hive community only needs to maintain 
> a single code path. In this way, Hive CLI is just an alias to Beeline, either 
> at the shell script level or at a higher code level. The goal is that no 
> changes, or minimal changes, are expected for existing user scripts using Hive 
> CLI.
> This is an umbrella JIRA covering all tasks related to this initiative. Over 
> the last year or two, Beeline has been improved significantly to match what 
> Hive CLI offers. Still, there may be some gaps or deficiencies to be 
> discovered and fixed. In the meantime, we also want to make sure that enough 
> tests are included and that the performance impact is identified and addressed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

2017-02-13 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864337#comment-15864337
 ] 

Greg Senia commented on HIVE-10511:
---

[~gopalv] out of curiosity: the big issue is that HS2 has always had
scalability problems. So the next question is, how do you plan on stopping
folks from using the Spark SQL CLI, which goes directly at the metastore and
the filesystem? We use it, along with native MR/Spark jobs that go directly at
the filesystem location of this data.






[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline

2017-01-11 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819606#comment-15819606
 ] 

Greg Senia commented on HIVE-10511:
---

I must ask how this has been evaluated against real-world usage. Two companies
I have worked for will have major scalability issues with HiveServer2 and
Beeline, specifically with large result sets that a user may return. What's the
mitigation plan so clusters don't end up with 50 HiveServer2 instances with
12GB heaps?

http://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_hiveserver2_configure.html






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13020) Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK

2016-02-10 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142137#comment-15142137
 ] 

Greg Senia commented on HIVE-13020:
---

[~thejas] and [~gopalv] no problem

> Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK
> --
>
> Key: HIVE-13020
> URL: https://issues.apache.org/jira/browse/HIVE-13020
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Shims
>Affects Versions: 1.2.0, 1.3.0, 1.2.1
> Environment: Linux X86_64 and IBM JDK 8
>Reporter: Greg Senia
>Assignee: Greg Senia
>  Labels: hdp, ibm, ibm-jdk
> Attachments: HIVE-13020.patch, hivemetastore_afterpatch.txt, 
> hivemetastore_beforepatch.txt, hiveserver2_afterpatch.txt, 
> hiveserver2_beforepatch.txt
>
>
> The HiveServer2 and Hive Metastore Zookeeper components are hardcoded to 
> support only the Oracle/OpenJDK. I was testing Hadoop running on the IBM JDK, 
> discovered this issue, and have since drawn up the attached patch. It resolves 
> the issue in a similar manner to how the Hadoop core folks handle the IBM JDK.
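The Hadoop-core approach referenced above keys the JAAS Kerberos login module off the JVM vendor instead of hardcoding the Sun/Oracle class name. A sketch of that selection logic (the helper name is illustrative; the two class names are the standard JAAS Kerberos login modules on the Oracle/OpenJDK and IBM JDK respectively):

```python
def krb5_login_module_name(java_vendor):
    """Pick the JAAS Kerberos login module class for the running JVM vendor,
    mirroring the vendor check Hadoop core performs (sketch, not the patch)."""
    if "IBM" in java_vendor:
        return "com.ibm.security.auth.module.Krb5LoginModule"
    return "com.sun.security.auth.module.Krb5LoginModule"
```

In Java, the equivalent check reads the `java.vendor` system property; hardcoding the `com.sun.security` class is exactly what makes the Zookeeper connection fail on the IBM JDK.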





[jira] [Commented] (HIVE-7443) Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs

2016-02-09 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139638#comment-15139638
 ] 

Greg Senia commented on HIVE-7443:
--

[~crystal_gaoyu] did this fix ever make it into Hive? If it didn't: by applying 
https://issues.apache.org/jira/browse/HADOOP-9969, this beeline issue is gone 
with Hive 1.2.0 plus the following fixes: 
https://issues.apache.org/jira/browse/TEZ-3105 and 
https://issues.apache.org/jira/browse/HIVE-13020


> Fix HiveConnection to communicate with Kerberized Hive JDBC server and 
> alternative JDKs
> ---
>
> Key: HIVE-7443
> URL: https://issues.apache.org/jira/browse/HIVE-7443
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC, Security
>Affects Versions: 0.12.0, 0.13.1
> Environment: Kerberos
> Run Hive server2 and client with IBM JDK7.1
>Reporter: Yu Gao
>Assignee: Yu Gao
> Attachments: HIVE-7443.patch
>
>
> Hive Kerberos authentication has been enabled in my cluster. I ran kinit to 
> initialize the current login user's ticket cache successfully, and then tried 
> to use beeline to connect to Hive Server2, but failed. After I manually added 
> some logging to catch the failure exception, this is what I got that caused 
> the failure:
> beeline>  !connect 
> jdbc:hive2://:1/default;principal=hive/@REALM.COM
>  org.apache.hive.jdbc.HiveDriver
> scan complete in 2ms
> Connecting to 
> jdbc:hive2://:1/default;principal=hive/@REALM.COM
> Enter password for 
> jdbc:hive2://:1/default;principal=hive/@REALM.COM:
> 14/07/17 15:12:45 ERROR jdbc.HiveConnection: Failed to open client transport
> javax.security.sasl.SaslException: Failed to open client transport [Caused by 
> java.io.IOException: Could not instantiate SASL transport]
> at 
> org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:78)
> at 
> org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:342)
> at 
> org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:200)
> at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:178)
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> at java.sql.DriverManager.getConnection(DriverManager.java:582)
> at java.sql.DriverManager.getConnection(DriverManager.java:198)
> at 
> org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
> at 
> org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:186)
> at org.apache.hive.beeline.Commands.connect(Commands.java:959)
> at org.apache.hive.beeline.Commands.connect(Commands.java:880)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:44)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:801)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Could not instantiate SASL transport
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Client.createClientTransport(HadoopThriftAuthBridge20S.java:177)
> at 
> org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:74)
> ... 24 more
> Caused by: javax.security.sasl.SaslException: Failure to initialize security 
> context [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
> major string: Invalid credentials
> minor string: SubjectCredFinder: no JAAS Subject]
> at 
> com.ibm.security.sasl.gsskerb.GssKrb5Client.(GssKrb5Client.java:131)
> at 
> com.ibm.security.sasl.gsskerb.FactoryImpl.createSaslClient(FactoryImpl.java:53)
> at javax.security.sasl.Sasl.createSaslClient(Sasl.java:362)
> at 
> 

[jira] [Commented] (HIVE-9545) Build FAILURE with IBM JVM

2016-02-08 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137134#comment-15137134
 ] 

Greg Senia commented on HIVE-9545:
--

Any way we can get these integrated into Hive? If there are issues getting them 
integrated, please let me know and I will have a discussion with some folks who 
could hopefully help get these IBM JDK-related fixes for Hadoop into trunk.

> Build FAILURE with IBM JVM 
> ---
>
> Key: HIVE-9545
> URL: https://issues.apache.org/jira/browse/HIVE-9545
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment:  mvn -version
> Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4; 
> 2014-08-11T22:58:10+02:00)
> Maven home: /opt/apache-maven-3.2.3
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-x86_64-71/jre
> Default locale: en_US, platform encoding: ISO-8859-1
> OS name: "linux", version: "3.10.0-123.4.4.el7.x86_64", arch: "amd64", 
> family: "unix"
>Reporter: pascal oliva
>Assignee: Navis
> Attachments: HIVE-9545.1.patch.txt
>
>
>  NO PRECOMMIT TESTS 
> With the use of IBM JVM environment :
> [root@dorado-vm2 hive]# java -version
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxa6470_27sr2-20141026_01(SR2))
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 
> 20141017_217728 (JIT enabled, AOT enabled).
> The build failed on:
> [INFO] Hive Query Language  FAILURE [ 50.053 s]
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-exec: Compilation failure: Compilation failure:
> [ERROR] 
> /home/pascal/hive0.14/hive/ql/src/java/org/apache/hadoop/hive/ql/debug/Utils.java:[29,26]
>  package com.sun.management does not exist.
> HOWTO : 
> #git clone -b branch-0.14 https://github.com/apache/hive.git
> #cd hive
> #mvn  install -DskipTests -Phadoop-2





[jira] [Updated] (HIVE-13020) Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK

2016-02-07 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated HIVE-13020:
--
Attachment: HIVE-13020.patch

Patch

> Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
> ---
>
> Key: HIVE-13020
> URL: https://issues.apache.org/jira/browse/HIVE-13020
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Shims
>Affects Versions: 1.2.0, 1.3.0, 1.2.1
> Environment: Linux X86_64 and IBM JDK 8
>Reporter: Greg Senia
>Assignee: Greg Senia
> Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0
>
> Attachments: HIVE-13020.patch, hivemetastore_afterpatch.txt, 
> hivemetastore_beforepatch.txt, hiveserver2_afterpatch.txt, 
> hiveserver2_beforepatch.txt





[jira] [Updated] (HIVE-13020) Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK

2016-02-07 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated HIVE-13020:
--
Attachment: hiveserver2_beforepatch.txt
hiveserver2_afterpatch.txt
hivemetastore_beforepatch.txt
hivemetastore_afterpatch.txt

Logs showing behavior before and after applying the provided patch.

> Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
> ---
>
> Key: HIVE-13020
> URL: https://issues.apache.org/jira/browse/HIVE-13020
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Shims
>Affects Versions: 1.2.0, 1.3.0, 1.2.1
> Environment: Linux X86_64 and IBM JDK 8
>Reporter: Greg Senia
>Assignee: Greg Senia
> Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0
>
> Attachments: HIVE-13020.patch, hivemetastore_afterpatch.txt, 
> hivemetastore_beforepatch.txt, hiveserver2_afterpatch.txt, 
> hiveserver2_beforepatch.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13020) Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK

2016-02-07 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated HIVE-13020:
--
Labels: hdp ibm ibm-jdk  (was: )

> Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK
> --
>
> Key: HIVE-13020
> URL: https://issues.apache.org/jira/browse/HIVE-13020
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore, Shims
>Affects Versions: 1.2.0, 1.3.0, 1.2.1
> Environment: Linux X86_64 and IBM JDK 8
>Reporter: Greg Senia
>Assignee: Greg Senia
>  Labels: hdp, ibm, ibm-jdk
> Fix For: 1.3.0, 2.0.0, 1.2.2, 2.1.0
>
> Attachments: HIVE-13020.patch, hivemetastore_afterpatch.txt, 
> hivemetastore_beforepatch.txt, hiveserver2_afterpatch.txt, 
> hiveserver2_beforepatch.txt
>
>
> HiveServer2's and the Hive Metastore's ZooKeeper components are hardcoded to 
> support only the Oracle/OpenJDK. I was testing Hadoop running on the IBM JDK, 
> discovered this issue, and have since drawn up the attached patch. It 
> resolves the issue in a similar manner to how Hadoop core handles the IBM JDK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-24 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14599913#comment-14599913
 ] 

Greg Senia commented on HIVE-11051:
---

Fix looks good. Tested in our environment; testing one final use case today.

 Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
 -

 Key: HIVE-11051
 URL: https://issues.apache.org/jira/browse/HIVE-11051
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Tez
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-11051.01.patch, problem_table_joins.tar.gz


 I tried to apply: HIVE-10729 which did not solve the issue.
 The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
 0.5.4/0.5.3
 {code}
 Status: Running (Executing on YARN cluster with App id 
 application_1434641270368_1038)
 
 VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
 KILLED
 
 Map 1 ..   SUCCEEDED  3  300   0  
  0
 Map 2 ... FAILED  3  102   7  
  0
 
 VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s
  
 
 Status: Failed
 Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
 diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
 at 
 

[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-23 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598804#comment-14598804
 ] 

Greg Senia commented on HIVE-10729:
---

Gunther Hagleitner and Matt McCline: using this patch against my JIRA HIVE-11051 
and its test case on Hadoop 2.4.1 with Hive 1.2.0 and Tez 0.5.4, it still fails:

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cnctevn_id:002246948195,svcrqst_id:003629537980,svcrqst_crt_dts:2015-04-24
 12:48:37.859683,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-24 
12:48:37.859683,crsr_lupdt:null,cntevsds_lupdt:2015-04-24 
12:48:40.499238,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{cnctevn_id:002246948195,svcrqst_id:003629537980,svcrqst_crt_dts:2015-04-24
 12:48:37.859683,subject_seq_no:1,plan_component:HMOM1 
,cust_segment:RM 
,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-04-24 
12:48:37.859683,crsr_lupdt:null,cntevsds_lupdt:2015-04-24 
12:48:40.499238,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected 
exception: Index: 0, Size: 0
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:426)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.set(ArrayList.java:426)
at 
org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.fixupComplexObjects(MapJoinBytesTableContainer.java:424)
at 
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.uppack(HybridHashTableContainer.java:875)
at 
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:845)
at 
org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:722)
at 
org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:62)
at 
org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:33)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:650)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:756)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:414)
... 23 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
vertex_1434641270368_13820_2_01 [Map 2] killed/failed due to:null]DAG failed 
due to vertex failure. failedVertices:1 killedVertices:0

 Query failed when select complex columns from joinned table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: 

[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-18 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592882#comment-14592882
 ] 

Greg Senia commented on HIVE-11051:
---

This seems to be related/similar:
http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2

http://qnalist.com/questions/5904003/map-side-join-fails-when-a-serialized-table-contains-arrays


 Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
 -

 Key: HIVE-11051
 URL: https://issues.apache.org/jira/browse/HIVE-11051
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Attachments: problem_table_joins.tar.gz


 I tried to apply: HIVE-10729 which did not solve the issue.
 The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
 0.5.4/0.5.3
 Status: Running (Executing on YARN cluster with App id 
 application_1434641270368_1038)
 
 VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
 KILLED
 
 Map 1 ..   SUCCEEDED  3  300   0  
  0
 Map 2 ... FAILED  3  102   7  
  0
 
 VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s
  
 
 Status: Failed
 Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
 diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
 at 
 

[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;

2015-06-18 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated HIVE-11051:
--
Component/s: Tez

 Hive 1.2.0  MapJoin w/Tez - LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
 -

 Key: HIVE-11051
 URL: https://issues.apache.org/jira/browse/HIVE-11051
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Tez
Affects Versions: 1.2.0
Reporter: Greg Senia
Assignee: Gopal V
Priority: Critical
 Attachments: problem_table_joins.tar.gz


 I tried to apply: HIVE-10729 which did not solve the issue.
 The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 
 0.5.4/0.5.3
 Status: Running (Executing on YARN cluster with App id 
 application_1434641270368_1038)
 
 VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
 KILLED
 
 Map 1 ..   SUCCEEDED  3  300   0  
  0
 Map 2 ... FAILED  3  102   7  
  0
 
 VERTICES: 01/02  [=-] 66%   ELAPSED TIME: 7.39 s
  
 
 Status: Failed
 Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, 
 diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
 at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
 at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row 
 {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23
  11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 
 ,cust_segment:RM 
 ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd:
  ,catsrsn_cd:,apealvl_cd: 
 ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
],svcrqst_lupdt:2015-04-23 
 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 
 11:54:40.740061,ignore_me:1,notes:null}
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
 at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
 at 
 

[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-09 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578856#comment-14578856
 ] 

Greg Senia commented on HIVE-10729:
---

Here is the query and the source table's describe output, which shows the 
array<string> column that seems to be the cause...

drop table debug.ct_gsd_events1_test;
create table debug.ct_gsd_events1_test
as select  a.*,
b.svcrqst_id,
b.svcrqct_cds,
b.svcrtyp_cd,
b.cmpltyp_cd,
b.sum_reason_cd as src,
b.cnctmd_cd,
b.notes
from ctm.ct_gsd_events a
inner join
mbr.gsd_service_request b
on a.contact_event_id = b.cnctevn_id;


hive> describe formatted ctm.ct_gsd_events;
OK
# col_name  data_type   comment 
 
hmoid   string  
cumb_id_no  int 
mbrind_id   string  
contact_event_id    string  
ce_create_dt    string  
ce_end_dt   string  
contact_type    string  
cnctevs_cd  string  
contact_mode    string  
cntvnst_stts_cd string  
total_transfers int 
ce_notes    array<string>   
 
# Detailed Table Information 
Database:   ctm  
Owner:  LOAD_USER  
CreateTime: Fri May 29 09:41:58 EDT 2015 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   
hdfs://xhadnnm1p.example.com:8020/apps/hive/warehouse/ctm.db/ct_gsd_events  
   
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE   true
numFiles154 
numRows 0   
rawDataSize 0   
totalSize   5464108 
transient_lastDdlTime   1432906919  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
serialization.format1   
Time taken: 2.968 seconds, Fetched: 42 row(s)

 Query failed when select complex columns from joinned table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch


 When map join happens, if projection columns include complex data types, 
 query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a     array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key   int
 value string
 hive> select * from src_int where key=2;
 2     val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at 

[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-06-09 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578874#comment-14578874
 ] 

Greg Senia commented on HIVE-10729:
---

Here is a sample of the data. I think the cause is that there is a null in the 
array<string> field notes... this was not a problem with Hive 0.13; it 
definitely started with the Hive 0.14/1.x line.


Vertex failed, vertexName=Map 2, vertexId=vertex_1426958683478_216665_2_01, 
diagnostics=[Task failed, taskId=task_1426958683478_216665_2_01_000104, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cumb_id_no:31585,cumb_id_no_sub:31585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09
 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 
,plan_component:H 
,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX
 
,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX
 
,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-03-09 
11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 
11:24:51.315134,cntevsds_lupdt:2015-03-09 
11:25:13.429458,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{cumb_id_no:31XXX585,cumb_id_no_sub:31XXX585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09
 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 
,plan_component:H 
,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX
 
,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX
 
,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-03-09 
11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 
11:24:51.315134,cntevsds_lupdt:2015-03-09 
11:25:13.429458,ignore_me:1,notes:null}
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{cumb_id_no:31585,cumb_id_no_sub:31585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09
 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 
,plan_component:H 
,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX
 
,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd:
 ,catsrsn_cd:,apealvl_cd: 
,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX
 
,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[
   ],svcrqst_lupdt:2015-03-09 
11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 
11:24:51.315134,cntevsds_lupdt:2015-03-09 

[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-05-27 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561260#comment-14561260
 ] 

Greg Senia commented on HIVE-10729:
---

I tried this patch with Hive 1.2.0 and I am still getting this error:

Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
[Ljava.lang.Object;
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListElement(StandardListObjectInspector.java:66)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFIndex.evaluate(GenericUDFIndex.java:102)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
... 31 more

 Query failed when select complex columns from joinned table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch


 When map join happens, if projection columns include complex data types, 
 query will fail. 
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a     array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key   int
 value string
 hive> select * from src_int where key=2;
 2     val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when projection columns include a map:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM 
 src LIMIT 1;
 hive> select * from src join test where src.key=test.a;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-21 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555456#comment-14555456
 ] 

Greg Senia commented on HIVE-10746:
---

After an offline discussion with Gopal V, he determined the cause of this 
problem: starting in Hive 0.14, org.apache.hadoop.mapred.TextInputFormat 
uses whatever is defined in the property 
mapreduce.input.fileinputformat.split.minsize. In my case this was set to 
1... Unfortunately that is 1 byte, so it created 40040 splits, causing 40040 
reads of the single 3MB file...

Hope this helps someone else out.

It should be around half of the HDFS block size: in my case 64MB, since my 
block size is 128MB.
mapreduce.input.fileinputformat.split.minsize=67108864
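The arithmetic behind those numbers can be sketched with the classic FileInputFormat split-size formula, splitSize = max(minSize, min(goalSize, blockSize)). This is a sketch, not Tez's actual grouping logic: treating the observed tiny splits as roughly 97 bytes each (3883880 bytes / 40040 splits) is an assumption made to reproduce the reported figure.

```java
// Sketch of FileInputFormat-style split sizing for the ~3 MB file above.
// The 97-byte effective split size is an assumption derived from the thread.
public class SplitMath {

    // Classic computeSplitSize(): max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long minSize, long goalSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits needed to cover the file (ceiling division).
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileSize = 3_883_880L;            // the single part file from the HDFS listing
        long blockSize = 128L * 1024 * 1024;   // 128 MB HDFS block size, per the comment
        long minSize = 64L * 1024 * 1024;      // the recommended 67108864

        // With minsize at half the block size, the whole file is one split.
        System.out.println(numSplits(fileSize,
                computeSplitSize(minSize, fileSize, blockSize)));  // 1

        // With ~97-byte effective splits, the file shatters into 40040 reads.
        System.out.println(numSplits(fileSize, 97L));              // 40040
    }
}
```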


Gopal V, if no fix is coming, should we resolve/close this JIRA?

 Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
 

 Key: HIVE-10746
 URL: https://issues.apache.org/jira/browse/HIVE-10746
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical
 Attachments: slow_query_output.zip


 The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount 
 FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs 
 consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to 
 run this same query against Tez as the execution engine, it consistently runs 
 for over 300-500 seconds, which seems extremely long. This is a basic external 
 table delimited by tabs, stored as a single file in a folder. In Hive 0.13 this 
 query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and now 
 Hive 1.2.0, and there clearly is something going awry with Hive w/Tez as an 
 execution engine on single- or small-file tables. I can attach further logs 
 if someone needs them for deeper analysis.
 HDFS Output:
 hadoop fs -ls /example_dw/crc/arsn
 Found 2 items
 -rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
 /example_dw/crc/arsn/_SUCCESS
 -rwxr-x---   6 loaduser hadoopusers    3883880 2015-05-17 20:03 
 /example_dw/crc/arsn/part-m-0
 Hive Table Describe:
 hive describe formatted crc_arsn;
 OK
 # col_name  data_type   comment 
  
 arsn_cd string  
 clmlvl_cd   string  
 arclss_cd   string  
 arclssg_cd  string  
 arsn_prcsr_rmk_ind  string  
 arsn_mbr_rspns_ind  string  
 savtyp_cd   string  
 arsn_eff_dt string  
 arsn_exp_dt string  
 arsn_pstd_dts   string  
 arsn_lstupd_dts string  
 arsn_updrsn_txt string  
 appl_user_id   string  
 arsntyp_cd  string  
 pre_d_indicator string  
 arsn_display_txt   string  
 arstat_cd   string  
 arsn_tracking_no   string  
 arsn_cstspcfc_ind   string  
 arsn_mstr_rcrd_ind  string  
 state_specific_ind  string  
 region_specific_in  string  
 arsn_dpndnt_cd  string  
 unit_adjustment_in  string  
 arsn_mbr_only_ind   string  
 arsn_qrmb_ind   string  
  
 # Detailed Table Information 
 Database:   adw  
 Owner:  loadu...@exa.example.com   
 CreateTime: Mon Apr 28 13:28:05 EDT 2014 
 LastAccessTime: UNKNOWN  
 Protect Mode:   None 
 Retention:  0
 Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn 

 Table Type: EXTERNAL_TABLE   
 Table Parameters:
 EXTERNALTRUE
 

[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-20 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552508#comment-14552508
 ] 

Greg Senia commented on HIVE-10746:
---

It seems that this single file with a group by/order by is generating 40040 
splits... I think the map file is needed at this point to determine why this 
is happening, correct?

2015-05-19 16:20:32,462 INFO [AsyncDispatcher event handler] impl.VertexImpl: 
Num tasks is -1. Expecting VertexManager/InputInitializers/1-1 split to set 
#tasks for the vertex vertex_1426958683478_171530_1_00
2015-05-19 16:20:32,707 DEBUG [InputInitializer [Map 1] #0] 
security.UserGroupInformation: PrivilegedAction as:gss2002 (auth:SIMPLE) 
from:org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
2015-05-19 16:20:32,708 INFO [InputInitializer [Map 1] #0] 
dag.RootInputInitializerManager: Starting InputInitializer for Input: crc_arsn 
on vertex vertex_1426958683478_171530_1_00 [Map 1]
2015-05-19 16:20:32,722 INFO [InputInitializer [Map 1] #0] log.PerfLogger: 
PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-19 16:20:32,723 INFO [InputInitializer [Map 1] #0] exec.Utilities: PLAN 
PATH = 
hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/431ae2bc-ebc9-48e7-bbb3-f03144198009/hive_2015-05-19_16-20-28_783_5570914503219655045-1/gss2002/_tez_scratch_dir/9da6870e-7388-40b1-bab6-9d0f242b1702/map.xml
2015-05-19 16:20:32,723 DEBUG [InputInitializer [Map 1] #0] exec.Utilities: 
Found plan in cache for name: map.xml
2015-05-19 16:20:32,744 INFO [InputInitializer [Map 1] #0] exec.Utilities: 
Processing alias crc_arsn
2015-05-19 16:20:32,744 INFO [InputInitializer [Map 1] #0] exec.Utilities: 
Adding input file hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
2015-05-19 16:20:32,747 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
hive.io.file.readcolumn.ids=
2015-05-19 16:20:32,747 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
hive.io.file.readcolumn.names=,arsn_cd,appl_user_id
2015-05-19 16:20:32,747 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
Generating splits
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-19 16:20:32,781 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: 
multipleLinearRandomRetry = null
2015-05-19 16:20:32,782 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting 
client out of cache: org.apache.hadoop.ipc.Client@7879a53d
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-19 16:20:32,786 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: 
multipleLinearRandomRetry = null
2015-05-19 16:20:32,786 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting 
client out of cache: org.apache.hadoop.ipc.Client@7879a53d
2015-05-19 16:20:32,876 DEBUG [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Time taken to get FileStatuses: 87
2015-05-19 16:20:32,876 INFO [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Total input paths to process : 1
2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-19 16:20:32,882 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: 
multipleLinearRandomRetry = null
2015-05-19 16:20:32,883 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting 
client out of cache: org.apache.hadoop.ipc.Client@7879a53d
2015-05-19 16:20:32,907 DEBUG [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Total # of splits generated by getSplits: 40040, 
TimeTaken: 124
2015-05-19 16:20:32,916 INFO [InputInitializer 

[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-20 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552793#comment-14552793
 ] 

Greg Senia commented on HIVE-10746:
---

I am guessing this JIRA could be the root of this issue: 
https://issues.apache.org/jira/browse/HIVE-7156

gss2002_20150520132600_e4199888_c149_4394_8231_238d9d9dee98_1.Map_1_crc_arsn 
- gss2002_20150520132600_e4199888_c149_4394_8231_238d9d9dee98_1.Map_1 [ 
label = Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator] ]




2015-05-20 13:26:03,760 INFO [IPC Server handler 0 on 33574] app.DAGAppMaster: 
JSON dump for submitted DAG, dagId=dag_1426958683478_173250_1, 
json={dagName:gss2002_20150520132600_e4199888-c149-4394-8231-238d9d9dee98:1,dagInfo:{\description\:\\\nSELECT
 appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY 
appl_user_id,arsn_cd ORDER BY 
appl_user_id\,\context\:\Hive\},version:1,vertices:[{vertexName:Map
 
1,processorClass:org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor,outEdgeIds:[196588160],additionalInputs:[{name:crc_arsn,class:org.apache.tez.mapreduce.input.MRInputLegacy,initializer:org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator}]},{vertexName:Reducer
 
2,processorClass:org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,inEdgeIds:[196588160],outEdgeIds:[1320926067]},{vertexName:Reducer
 
3,processorClass:org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,inEdgeIds:[1320926067],additionalOutputs:[{name:out_Reducer
 
3,class:org.apache.tez.mapreduce.output.MROutput}]}],edges:[{edgeId:196588160,inputVertexName:Map
 1,outputVertexName:Reducer 
2,dataMovementType:SCATTER_GATHER,dataSourceType:PERSISTED,schedulingType:SEQUENTIAL,edgeSourceClass:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput,edgeDestinationClass:org.apache.tez.runtime.library.input.OrderedGroupedKVInput},{edgeId:1320926067,inputVertexName:Reducer
 2,outputVertexName:Reducer 
3,dataMovementType:SCATTER_GATHER,dataSourceType:PERSISTED,schedulingType:SEQUENTIAL,edgeSourceClass:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput,edgeDestinationClass:org.apache.tez.runtime.library.input.OrderedGroupedKVInput}]}
2015-05-20 13:26:03,762 INFO [IPC Server handler 0 on 33574] app.DAGAppMaster: 
Generating DAG graphviz file, dagId=dag_1426958683478_173250_1, 
filePath=/u01/hadoop/yarn/log/application_1426958683478_173250/container_1426958683478_173250_01_01/dag_1426958683478_173250_1.dot


2015-05-20 13:26:05,142 DEBUG [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Total # of splits generated by getSplits: 40040, 
TimeTaken: 168
2015-05-20 13:26:05,144 DEBUG [Socket Reader #1 for port 33574] ipc.Server:  
got #159
2015-05-20 13:26:05,145 DEBUG [IPC Server handler 0 on 33574] ipc.Server: IPC 
Server handler 0 on 33574: 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 167.69.200.206:54162 Call#159 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2015-05-20 13:26:05,145 DEBUG [IPC Server handler 0 on 33574] 
security.UserGroupInformation: PrivilegedAction as:gss2...@exa.example.com 
(auth:TOKEN) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-05-20 13:26:05,147 INFO [IPC Server handler 0 on 33574] ipc.Server: 
Served: getDAGStatus queueTime= 1 procesingTime= 2
2015-05-20 13:26:05,147 DEBUG [IPC Server handler 0 on 33574] ipc.Server: IPC 
Server handler 0 on 33574: responding to 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 167.69.200.206:54162 Call#159 Retry#0
2015-05-20 13:26:05,147 DEBUG [IPC Server handler 0 on 33574] ipc.Server: IPC 
Server handler 0 on 33574: responding to 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 167.69.200.206:54162 Call#159 Retry#0 Wrote 145 bytes.
2015-05-20 13:26:05,154 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
number of splits 40040
2015-05-20 13:26:05,154 INFO [InputInitializer [Map 1] #0] log.PerfLogger: 
/PERFLOG method=getSplits start=1432142764918 end=1432142765154 duration=236 
from=org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-20 13:26:05,155 INFO [InputInitializer [Map 1] #0] 
tez.HiveSplitGenerator: Number of input splits: 40040. 23542 available slots, 
1.7 waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat


[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-20 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553061#comment-14553061
 ] 

Greg Senia commented on HIVE-10746:
---

Debug logs from the DAG: with the compressed copy it sets 1 split... so how 
do we fix this issue?
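For context on why the compressed copy collapses to one split (my summary, not stated in the thread): TextInputFormat treats a file whose codec is not a SplittableCompressionCodec (Snappy and plain gzip included) as unsplittable, so getSplits returns the whole file as a single split and split.minsize never enters the picture. A toy sketch, not the Hadoop source:

```python
# Toy sketch (assumption, not Hive/Hadoop code): FileInputFormat only
# divides a file when its codec is splittable; otherwise the whole file
# becomes one split, matching the "1 split" in the debug log above.

def num_splits(file_size, split_size, splittable):
    if not splittable:
        return 1                        # unsplittable codec: one mapper
    return -(-file_size // split_size)  # ceiling division

print(num_splits(3883880, 97, splittable=True))    # 40040
print(num_splits(3883880, 97, splittable=False))   # 1
```

The 97-byte split size here is the assumed per-split size implied by 40040 splits over the ~3 MB file; the point is only that the splittable flag, not the minsize, decides the compressed case.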


2015-05-20 16:15:12,041 DEBUG [InputInitializer [Map 1] #0] exec.Utilities: 
Found plan in cache for name: map.xml
2015-05-20 16:15:12,055 INFO [InputInitializer [Map 1] #0] exec.Utilities: 
Processing alias gss_rsn2
2015-05-20 16:15:12,055 INFO [InputInitializer [Map 1] #0] exec.Utilities: 
Adding input file 
hdfs://xhadnnm1p.example.com:8020/apps/hive/warehouse/hue_debug.db/gss_rsn2
2015-05-20 16:15:12,057 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
hive.io.file.readcolumn.ids=
2015-05-20 16:15:12,058 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
hive.io.file.readcolumn.names=,arsn_cd,appl_user_id
2015-05-20 16:15:12,058 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
Generating splits
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-20 16:15:12,088 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: 
multipleLinearRandomRetry = null
2015-05-20 16:15:12,088 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting 
client out of cache: org.apache.hadoop.ipc.Client@6c93595a
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: 
multipleLinearRandomRetry = null
2015-05-20 16:15:12,092 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting 
client out of cache: org.apache.hadoop.ipc.Client@6c93595a
2015-05-20 16:15:12,216 DEBUG [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Time taken to get FileStatuses: 112
2015-05-20 16:15:12,216 INFO [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Total input paths to process : 1
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] 
hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-20 16:15:12,220 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: 
multipleLinearRandomRetry = null
2015-05-20 16:15:12,220 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting 
client out of cache: org.apache.hadoop.ipc.Client@6c93595a
2015-05-20 16:15:12,222 DEBUG [InputInitializer [Map 1] #0] 
mapred.FileInputFormat: Total # of splits generated by getSplits: 1, TimeTaken: 
132
2015-05-20 16:15:12,222 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: 
number of splits 1
2015-05-20 16:15:12,222 INFO [InputInitializer [Map 1] #0] log.PerfLogger: 
/PERFLOG method=getSplits start=1432152912040 end=143215291 duration=182 
from=org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-20 16:15:12,222 INFO [InputInitializer [Map 1] #0] 
tez.HiveSplitGenerator: Number of input splits: 1. 23542 available slots, 1.7 
waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-20 16:15:12,223 INFO [InputInitializer [Map 1] #0] exec.Utilities: PLAN 
PATH = 
hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/646469af-0a87-4080-9d2b-e40af4a34c0e/hive_2015-05-20_16-15-06_565_5281905327000741927-1/gss2002/_tez_scratch_dir/049d6a0d-aea4-4805-90a5-84b8c38fe1f4/map.xml
2015-05-20 16:15:12,223 INFO [InputInitializer [Map 1] #0] exec.Utilities: 
***non-local mode***
2015-05-20 16:15:12,223 INFO [InputInitializer [Map 1] #0] exec.Utilities: 
local path = 
hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/646469af-0a87-4080-9d2b-e40af4a34c0e/hive_2015-05-20_16-15-06_565_5281905327000741927-1/gss2002/_tez_scratch_dir/049d6a0d-aea4-4805-90a5-84b8c38fe1f4/map.xml

[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-20 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553018#comment-14553018
 ] 

Greg Senia commented on HIVE-10746:
---

Just to clarify, this data is tab-delimited, loaded via Sqoop v1... What is 
the difference between compressed vs uncompressed at this point?

Map 1: 0(+1)/1  Reducer 2: 0/1  Reducer 3: 0/1  
Map 1: 0(+1)/1  Reducer 2: 0/1  Reducer 3: 0/1  
Map 1: 0(+1)/1  Reducer 2: 0/1  Reducer 3: 0/1  
Map 1: 1/1  Reducer 2: 0/1  Reducer 3: 0/1  
Map 1: 1/1  Reducer 2: 0(+1)/1  Reducer 3: 0/1  
Map 1: 1/1  Reducer 2: 1/1  Reducer 3: 0(+1)/1  
Map 1: 1/1  Reducer 2: 1/1  Reducer 3: 1/1  
Status: DAG finished successfully in 523.42 seconds


METHOD                 DURATION(ms) 
parse                            17
semanticAnalyze               1,593
TezBuildDag                     585
TezSubmitToRunningDag           187
TotalPrepTime                 3,522

VERTICES   TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1                1                0             0            516.72          752,950          15,318         13,440          11,516
Reducer 2            1                0             0              0.81            1,890              24         11,516          11,516
Reducer 3            1                0             0              0.61            1,460              19         11,516               0
OK
BB166674 P16 1


[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-20 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552976#comment-14552976
 ] 

Greg Senia commented on HIVE-10746:
---

With Snappy compression it ran in 7 seconds...

Status: DAG finished successfully in 7.93 seconds


METHOD                 DURATION(ms) 
parse                         1,081
semanticAnalyze               1,488
TezBuildDag                     490
TezSubmitToRunningDag           374
TotalPrepTime                 4,958

VERTICES   TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1                1                0             0              2.23            3,790              29         13,440          11,516
Reducer 2            1                0             0              0.81            2,150               0         11,516          11,516
Reducer 3            1                0             0              0.61            1,110               0         11,516               0
OK
BB166674 P16 1
