[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("
[ https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961647#comment-16961647 ]

Greg Senia commented on HIVE-17395:
------------------------------------

After spending many hours troubleshooting, this enhancement appears to be the root cause of the LParen problem.

> HiveServer2 parsing a command with a lot of "("
> -----------------------------------------------
>
>                 Key: HIVE-17395
>                 URL: https://issues.apache.org/jira/browse/HIVE-17395
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline, HiveServer2
>    Affects Versions: 2.3.0
>            Reporter: dan young
>            Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in
> HIVE-15388, where the query parser spends a lot of time (it never returns and I
> need to kill the beeline process) parsing a command with a lot of "(". I
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto-generated SQL, BTW) in beeline that
> never completes/parses; I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388. Any
> ideas on how to address this? Write better SQL? A patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 AS STRING), '-',
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING),
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 AS STRING), '-',
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING),
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 AS STRING), '-',
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING),
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 AS STRING), '-',
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING),
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears to be stuck/running in
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96"
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable
> [0x7fb77d7bb000]
>    java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
> at org.antlr.runtime.DFA.predict(DFA.java:80)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
> at
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324)
> at
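The hang inside DFA.predict is characteristic of a parser backtracking over grammar alternatives that share a long "(" prefix: at every nesting level, each failed alternative re-scans the entire inner expression, so the work doubles per level. The toy parser below is only an illustration of that failure mode (its grammar is made up, not Hive's): two alternatives differ only after the closing paren, and the call count grows as 2^(depth+1) - 1.

```python
def count_calls(depth):
    """Parse `depth` nested parens with a naive backtracking parser.

    Toy grammar (NOT Hive's):  E -> '(' E ')!'  |  '(' E ')?'  |  'a'
    Both parenthesized alternatives share a '(' prefix, so whenever the
    first alternative fails on its suffix, the second one re-parses the
    whole inner expression from scratch.  Returns the number of parse calls.
    """
    s = "(" * depth + "a" + ")?" * depth
    calls = [0]

    def parse_e(i):
        calls[0] += 1
        for suffix in ("!", "?"):          # two alternatives, tried in order
            if i < len(s) and s[i] == "(":
                j = parse_e(i + 1)          # re-parses the entire inner expression
                if j is not None and s[j:j + 2] == ")" + suffix:
                    return j + 2
        if i < len(s) and s[i] == "a":
            return i + 1
        return None

    assert parse_e(0) == len(s)             # the input does parse -- eventually
    return calls[0]

# calls double with every extra level of nesting: 2^(depth+1) - 1
print(count_calls(5), count_calls(10))  # 63 2047
```

With a memoizing (packrat) or properly predicated parser the same input is linear; the HIVE-15388/HIVE-17395 discussion is about restoring that behavior in the generated HiveParser.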
[jira] [Comment Edited] (HIVE-18624) Parsing time is extremely high (~10 min) for queries with complex select expressions
[ https://issues.apache.org/jira/browse/HIVE-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961586#comment-16961586 ]

Greg Senia edited comment on HIVE-18624 at 10/29/19 1:14 AM:
-------------------------------------------------------------

Re-created this problem with a simple query:

Query: select 'x' from processed_opendata_samples.nyse_stocks limit 2

This bug seems to have made it into many commercial distributions. I spent about 7 hours debugging today and determined that HIVE-11600 brought this problem to light.

was (Author: gss2002):
Re-created this problem with a simple query:

Query: select 'x' from processed_opendata_samples.nyse_stocks limit 2

> Parsing time is extremely high (~10 min) for queries with complex select
> expressions
> ------------------------------------------------------------------------
>
>                 Key: HIVE-18624
>                 URL: https://issues.apache.org/jira/browse/HIVE-18624
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Parser
>    Affects Versions: 2.0.0, 3.0.0, 2.3.2
>            Reporter: Amruth Sampath
>            Assignee: Zoltan Haindrich
>            Priority: Major
>             Fix For: 2.4.0, 4.0.0, 3.2.0, 3.1.2
>
>         Attachments: HIVE-18624.01.patch, HIVE-18624.02.patch, thread_dump
>
> An EXPLAIN of the same query takes
> 0.1 to 3 seconds in Hive 2.1.0 and
> 10-15 min in Hive 2.3.2 and latest master.
> Sample expression below:
> {code:java}
> EXPLAIN
> SELECT DISTINCT
> IF(lower('a') <= lower('a')
> ,'a'
> ,IF(('a' IS NULL AND from_unixtime(UNIX_TIMESTAMP()) <= 'a')
> ,'a'
> ,IF(if('a' = 'a', TRUE, FALSE) = 1
> ,'a'
> ,IF(('a' = 1 and lower('a') NOT IN ('a', 'a')
> and lower(if('a' = 'a','a','a')) <= lower('a'))
> OR ('a' like 'a' OR 'a' like 'a')
> OR 'a' in ('a','a')
> ,'a'
> ,IF(if(lower('a') in ('a', 'a') and 'a'='a', TRUE, FALSE) = 1
> ,'a'
> ,IF('a'='a' and unix_timestamp(if('a' = 'a',cast('a' as
> string),coalesce('a',cast('a' as string),from_unixtime(unix_timestamp()
> <= unix_timestamp(concat_ws('a',cast(lower('a') as string),'00:00:00')) +
> 9*3600
> ,'a'
> ,If(lower('a') <= lower('a')
> and if(lower('a') in ('a', 'a') and 'a'<>'a', TRUE, FALSE) <> 1
> ,'a'
> ,IF('a'=1 AND 'a'=1
> ,'a'
> ,IF('a' = 1 and COALESCE(cast('a' as int),0) = 0
> ,'a'
> ,IF('a' = 'a'
> ,'a'
> ,If('a' = 'a' AND
> lower('a')>lower(if(lower('a')<1830,'a',cast(date_add('a',1) as timestamp)))
> ,'a'
> ,IF('a' = 1
> ,IF('a' in ('a', 'a') and ((unix_timestamp('a')-unix_timestamp('a')) / 60) >
> 30 and 'a' = 1
> ,'a', 'a')
> ,IF(if('a' = 'a', FALSE, TRUE ) = 1 AND 'a' IS NULL
> ,'a'
> ,IF('a' = 1 and 'a'>0
> , 'a'
> ,IF('a' = 1 AND 'a' ='a'
> ,'a'
> ,IF('a' is not null and 'a' is not null and 'a' > 'a'
> ,'a'
> ,IF('a' = 1
> ,'a'
> ,IF('a' = 'a'
> ,'a'
> ,If('a' = 1
> ,'a'
> ,IF('a' = 1
> ,'a'
> ,IF('a' = 1
> ,'a'
> ,IF('a' ='a' and 'a' ='a' and cast(unix_timestamp('a') as int) + 93600 <
> cast(unix_timestamp() as int)
> ,'a'
> ,IF('a' = 'a'
> ,'a'
> ,IF('a' = 'a' and 'a' in ('a','a','a')
> ,'a'
> ,IF('a' = 'a'
> ,'a','a'))
> )))
> AS test_comp_exp
> {code}
>
> Taking a look at [^thread_dump] shows a very large function stack getting
> created.
> Reverting HIVE-15578 (92f31d07aa988d4a460aac56e369bfa386361776) seems to speed
> up the parsing.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
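Since the sample expression is mechanically regular, reproducers of arbitrary nesting depth can be generated rather than hand-edited. A small sketch (the `IF('a' = 'a', ...)` shape is a placeholder chosen to mirror the sample above; EXPLAIN is used so that parsing alone, not execution, is exercised):

```python
def nested_if(depth):
    """Build a deeply nested IF(...) expression like the sample reproducer."""
    if depth == 0:
        return "'a'"
    return "IF('a' = 'a', 'a', {})".format(nested_if(depth - 1))

def reproducer(depth):
    # EXPLAIN avoids actually running the query; parsing alone triggers the bug
    return "EXPLAIN SELECT {} AS test_comp_exp".format(nested_if(depth))

print(reproducer(2))
```

Feeding `reproducer(n)` for increasing `n` into beeline makes it easy to compare parse times across Hive versions (e.g. 2.1.0 vs 2.3.2) without hand-maintaining a giant query.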
[jira] [Commented] (HIVE-18624) Parsing time is extremely high (~10 min) for queries with complex select expressions
[ https://issues.apache.org/jira/browse/HIVE-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961586#comment-16961586 ]

Greg Senia commented on HIVE-18624:
------------------------------------

Re-created this problem with a simple query:

Query: select 'x' from processed_opendata_samples.nyse_stocks limit 2
[jira] [Updated] (HIVE-18624) Parsing time is extremely high (~10 min) for queries with complex select expressions
[ https://issues.apache.org/jira/browse/HIVE-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Senia updated HIVE-18624:
------------------------------
    Affects Version/s: 2.0.0
[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline
[ https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864981#comment-15864981 ]

Greg Senia commented on HIVE-10511:
------------------------------------

[~gopalv] thank you for the further insight. I wish all the Hadoop vendors I talked with felt the same way. I know multiple vendors who feel Hive and HiveServer2 should be the ONLY access mechanism on top of Hadoop. I guess the approach we've been taking at my current and past employer is that tools like Voltage, Protegrity, or Dataguise would be used to secure column-level access using FPE. But I can see how some companies would not want to invest down that road. How are LLAP and doAs going to work together? Tools like Protegrity require jobs to run as the actual end user; they cannot run as hive, and unfortunately I don't think this requirement will be changing any time soon.

> Replacing the implementation of Hive CLI using Beeline
> ------------------------------------------------------
>
>                 Key: HIVE-10511
>                 URL: https://issues.apache.org/jira/browse/HIVE-10511
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.10.0
>            Reporter: Xuefu Zhang
>            Assignee: Ferdinand Xu
>
> Hive CLI is a legacy tool which had two main use cases:
> 1. a thick client for SQL on Hadoop
> 2. a command line tool for HiveServer1
> HiveServer1 is already deprecated and removed from the Hive code base, so use
> case #2 is out of the question. For #1, Beeline provides, or is supposed to
> provide, equal functionality, yet is implemented differently from Hive CLI.
> As the Hive community has been recommending the Beeline + HS2 configuration for
> a while now, ideally we should deprecate Hive CLI. Because of the wide use of
> Hive CLI, we instead propose replacing Hive CLI's implementation with
> Beeline plus an embedded HS2, so that the Hive community only needs to maintain a
> single code path. In this way, Hive CLI is just an alias for Beeline, at either
> the shell script level or at a higher code level. The goal is that no changes, or
> minimal changes, are expected for existing user scripts using Hive CLI.
> This is an umbrella JIRA covering all tasks related to this initiative. Over
> the last year or two, Beeline has been improved significantly to match what
> Hive CLI offers. Still, there may be some gaps or deficiencies to be
> discovered and fixed. In the meantime, we also want to make sure that enough
> tests are included and that the performance impact is identified and addressed.
[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline
[ https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864337#comment-15864337 ]

Greg Senia commented on HIVE-10511:
------------------------------------

[~gopalv] out of curiosity: the big issue is that HS2 has always had scalability problems. So the next question is, how do you plan on stopping folks from using the Spark SQL CLI, which goes directly at the metastore and the filesystem? We are using that, along with native MR/Spark jobs that go directly at the filesystem location of this data.
[jira] [Commented] (HIVE-10511) Replacing the implementation of Hive CLI using Beeline
[ https://issues.apache.org/jira/browse/HIVE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819606#comment-15819606 ]

Greg Senia commented on HIVE-10511:
------------------------------------

I must ask: how has this been evaluated against real-world usage? Two companies I have worked for will have major scalability issues with HiveServer2 and Beeline, specifically with large result sets that a user may return. What's the mitigation plan so clusters don't end up with 50 HiveServer2 instances with 12GB heaps?
http://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_hiveserver2_configure.html
[jira] [Commented] (HIVE-13020) Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142137#comment-15142137 ]

Greg Senia commented on HIVE-13020:
------------------------------------

[~thejas] and [~gopalv] no problem

> Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK
> --------------------------------------------------------------
>
>                 Key: HIVE-13020
>                 URL: https://issues.apache.org/jira/browse/HIVE-13020
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Metastore, Shims
>    Affects Versions: 1.2.0, 1.3.0, 1.2.1
>         Environment: Linux X86_64 and IBM JDK 8
>            Reporter: Greg Senia
>            Assignee: Greg Senia
>              Labels: hdp, ibm, ibm-jdk
>         Attachments: HIVE-13020.patch, hivemetastore_afterpatch.txt,
> hivemetastore_beforepatch.txt, hiveserver2_afterpatch.txt,
> hiveserver2_beforepatch.txt
>
> The HiveServer2 and Hive Metastore ZooKeeper component is hardcoded to
> support only the Oracle/OpenJDK. I was performing testing of Hadoop running on
> the IBM JDK, discovered this issue, and have since drawn up the attached
> patch. It resolves the issue in a manner similar to how the Hadoop core
> folks handle the IBM JDK.
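For context: Hadoop core detects the IBM JDK by inspecting the `java.vendor` system property (the `PlatformName.IBM_JAVA` flag) and then selects the vendor's JAAS Kerberos login module, since the IBM JDK ships `com.ibm.security.auth.module.Krb5LoginModule` rather than `com.sun.security.auth.module.Krb5LoginModule`. A sketch of that selection logic, written in Python for brevity (the class names are the ones Hadoop uses; treat the exact wiring inside the Hive patch as an assumption):

```python
def krb5_login_module(java_vendor):
    """Pick the JAAS Kerberos login module class for the running JVM.

    Mirrors Hadoop's PlatformName.IBM_JAVA check: the IBM JDK ships its own
    login module under com.ibm.security, while Oracle/OpenJDK use
    com.sun.security.  Hardcoding the latter is what broke ZooKeeper
    connections on the IBM JDK.
    """
    if "IBM" in java_vendor:
        return "com.ibm.security.auth.module.Krb5LoginModule"
    return "com.sun.security.auth.module.Krb5LoginModule"

print(krb5_login_module("IBM Corporation"))
print(krb5_login_module("Oracle Corporation"))
```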
[jira] [Commented] (HIVE-7443) Fix HiveConnection to communicate with Kerberized Hive JDBC server and alternative JDKs
[ https://issues.apache.org/jira/browse/HIVE-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139638#comment-15139638 ]

Greg Senia commented on HIVE-7443:
-----------------------------------

[~crystal_gaoyu] did this fix ever make it into Hive? If it didn't: by applying https://issues.apache.org/jira/browse/HADOOP-9969, this issue with beeline is gone on Hive 1.2.0, together with the following fixes: https://issues.apache.org/jira/browse/TEZ-3105 and https://issues.apache.org/jira/browse/HIVE-13020

> Fix HiveConnection to communicate with Kerberized Hive JDBC server and
> alternative JDKs
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7443
>                 URL: https://issues.apache.org/jira/browse/HIVE-7443
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC, Security
>    Affects Versions: 0.12.0, 0.13.1
>         Environment: Kerberos
> Run Hive server2 and client with IBM JDK7.1
>            Reporter: Yu Gao
>            Assignee: Yu Gao
>         Attachments: HIVE-7443.patch
>
> Hive Kerberos authentication has been enabled in my cluster. I ran kinit to
> initialize the current login user's ticket cache successfully, and then tried
> to use beeline to connect to Hive Server2, but failed. After I manually added
> some logging to catch the failure exception, this is what I got that caused
> the failure:
> beeline> !connect
> jdbc:hive2://:1/default;principal=hive/@REALM.COM
> org.apache.hive.jdbc.HiveDriver
> scan complete in 2ms
> Connecting to
> jdbc:hive2://:1/default;principal=hive/@REALM.COM
> Enter password for
> jdbc:hive2://:1/default;principal=hive/@REALM.COM:
> 14/07/17 15:12:45 ERROR jdbc.HiveConnection: Failed to open client transport
> javax.security.sasl.SaslException: Failed to open client transport [Caused by
> java.io.IOException: Could not instantiate SASL transport]
> at
> org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:78)
> at
> org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:342)
> at
> org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:200)
> at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:178)
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> at java.sql.DriverManager.getConnection(DriverManager.java:582)
> at java.sql.DriverManager.getConnection(DriverManager.java:198)
> at
> org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145)
> at
> org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:186)
> at org.apache.hive.beeline.Commands.connect(Commands.java:959)
> at org.apache.hive.beeline.Commands.connect(Commands.java:880)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at
> org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:44)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:801)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
> at
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Could not instantiate SASL transport
> at
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Client.createClientTransport(HadoopThriftAuthBridge20S.java:177)
> at
> org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:74)
> ... 24 more
> Caused by: javax.security.sasl.SaslException: Failure to initialize security
> context [Caused by org.ietf.jgss.GSSException, major code: 13, minor code: 0
> major string: Invalid credentials
> minor string: SubjectCredFinder: no JAAS Subject]
> at
> com.ibm.security.sasl.gsskerb.GssKrb5Client.(GssKrb5Client.java:131)
> at
> com.ibm.security.sasl.gsskerb.FactoryImpl.createSaslClient(FactoryImpl.java:53)
> at javax.security.sasl.Sasl.createSaslClient(Sasl.java:362)
> at
[jira] [Commented] (HIVE-9545) Build FAILURE with IBM JVM
[ https://issues.apache.org/jira/browse/HIVE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137134#comment-15137134 ]

Greg Senia commented on HIVE-9545:
-----------------------------------

Is there any way we can get these integrated into Hive? If there are issues getting them integrated, please let me know and I will have a discussion with some folks who could hopefully influence getting these IBM JDK-related fixes for Hadoop into trunk.

> Build FAILURE with IBM JVM
> --------------------------
>
>                 Key: HIVE-9545
>                 URL: https://issues.apache.org/jira/browse/HIVE-9545
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>         Environment: mvn -version
> Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4;
> 2014-08-11T22:58:10+02:00)
> Maven home: /opt/apache-maven-3.2.3
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-x86_64-71/jre
> Default locale: en_US, platform encoding: ISO-8859-1
> OS name: "linux", version: "3.10.0-123.4.4.el7.x86_64", arch: "amd64",
> family: "unix"
>            Reporter: pascal oliva
>            Assignee: Navis
>         Attachments: HIVE-9545.1.patch.txt
>
> NO PRECOMMIT TESTS
> With the use of an IBM JVM environment:
> [root@dorado-vm2 hive]# java -version
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxa6470_27sr2-20141026_01(SR2))
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References
> 20141017_217728 (JIT enabled, AOT enabled).
> The build failed on:
> [INFO] Hive Query Language FAILURE [ 50.053 s]
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile)
> on project hive-exec: Compilation failure: Compilation failure:
> [ERROR]
> /home/pascal/hive0.14/hive/ql/src/java/org/apache/hadoop/hive/ql/debug/Utils.java:[29,26]
> package com.sun.management does not exist.
> HOWTO:
> # git clone -b branch-0.14 https://github.com/apache/hive.git
> # cd hive
> # mvn install -DskipTests -Phadoop-2
[jira] [Updated] (HIVE-13020) Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Senia updated HIVE-13020:
------------------------------
    Attachment: HIVE-13020.patch

Patch
[jira] [Updated] (HIVE-13020) Hive Zookeeper Connection From MetaStore and HiveServer2 fails with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Senia updated HIVE-13020:
------------------------------
    Attachment: hiveserver2_beforepatch.txt
                hiveserver2_afterpatch.txt
                hivemetastore_beforepatch.txt
                hivemetastore_afterpatch.txt

Logs showing before and after patching with the provided patch
[jira] [Updated] (HIVE-13020) Hive Metastore and HiveServer2 to Zookeeper fails with IBM JDK
[ https://issues.apache.org/jira/browse/HIVE-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Senia updated HIVE-13020:
------------------------------
    Labels: hdp ibm ibm-jdk  (was: )
[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599913#comment-14599913 ]

Greg Senia commented on HIVE-11051:
-----------------------------------

Fix looks good. Tested in our environment; testing one final use case today.

> Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-11051
>                 URL: https://issues.apache.org/jira/browse/HIVE-11051
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers, Tez
>    Affects Versions: 1.2.0
>            Reporter: Greg Senia
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11051.01.patch, problem_table_joins.tar.gz
>
>
> I tried to apply HIVE-10729, which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3:
> {code}
> Status: Running (Executing on YARN cluster with App id application_1434641270368_1038)
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> Map 1 .. SUCCEEDED 3 300 0 0
> Map 2 ... FAILED 3 102 7 0
> VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s
> Status: Failed
> Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598804#comment-14598804 ]

Greg Senia commented on HIVE-10729:
-----------------------------------

Gunther Hagleitner and Matt McCline: using this patch against my JIRA HIVE-11051 and its test case, on Hadoop 2.4.1 with Hive 1.2.0 and Tez 0.5.4, it still fails:

{noformat}
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002246948195,svcrqst_id:003629537980,svcrqst_crt_dts:2015-04-24 12:48:37.859683,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-24 12:48:37.859683,crsr_lupdt:null,cntevsds_lupdt:2015-04-24 12:48:40.499238,ignore_me:1,notes:null}
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
	... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002246948195,svcrqst_id:003629537980,svcrqst_crt_dts:2015-04-24 12:48:37.859683,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-24 12:48:37.859683,crsr_lupdt:null,cntevsds_lupdt:2015-04-24 12:48:40.499238,ignore_me:1,notes:null}
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
	... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Index: 0, Size: 0
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:426)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
	at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
	... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.set(ArrayList.java:426)
	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.fixupComplexObjects(MapJoinBytesTableContainer.java:424)
	at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.unpack(HybridHashTableContainer.java:875)
	at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:845)
	at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:722)
	at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:62)
	at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:33)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:650)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:756)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:414)
	... 23 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1434641270368_13820_2_01 [Map 2] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
{noformat}

> Query failed when select complex columns from joinned table (tez map join only)
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-10729
>                 URL: https://issues.apache.org/jira/browse/HIVE-10729
>             Project: Hive
>          Issue Type: Bug
>          Components:
[jira] [Commented] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592882#comment-14592882 ]

Greg Senia commented on HIVE-11051:
-----------------------------------

This seems to be related/similar:
http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2
http://qnalist.com/questions/5904003/map-side-join-fails-when-a-serialized-table-contains-arrays

> Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-11051
>                 URL: https://issues.apache.org/jira/browse/HIVE-11051
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 1.2.0
>            Reporter: Greg Senia
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: problem_table_joins.tar.gz
>
>
> I tried to apply HIVE-10729, which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3:
> Status: Running (Executing on YARN cluster with App id application_1434641270368_1038)
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> Map 1 .. SUCCEEDED 3 300 0 0
> Map 2 ... FAILED 3 102 7 0
> VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s
> Status: Failed
> Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at
[jira] [Updated] (HIVE-11051) Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
[ https://issues.apache.org/jira/browse/HIVE-11051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Senia updated HIVE-11051:
------------------------------
    Component/s: Tez

> Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object;
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-11051
>                 URL: https://issues.apache.org/jira/browse/HIVE-11051
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers, Tez
>    Affects Versions: 1.2.0
>            Reporter: Greg Senia
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: problem_table_joins.tar.gz
>
>
> I tried to apply HIVE-10729, which did not solve the issue. The following exception is thrown on a Tez MapJoin with Hive 1.2.0 and Tez 0.5.4/0.5.3:
> Status: Running (Executing on YARN cluster with App id application_1434641270368_1038)
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> Map 1 .. SUCCEEDED 3 300 0 0
> Map 2 ... FAILED 3 102 7 0
> VERTICES: 01/02 [=-] 66% ELAPSED TIME: 7.39 s
> Status: Failed
> Vertex failed, vertexName=Map 2, vertexId=vertex_1434641270368_1038_2_01, diagnostics=[Task failed, taskId=task_1434641270368_1038_2_01_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002245282386,svcrqst_id:003627217285,svcrqst_crt_dts:2015-04-23 11:54:39.238357,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-23 22:14:01.288132,crsr_lupdt:null,cntevsds_lupdt:2015-04-23 11:54:40.740061,ignore_me:1,notes:null}
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290) at
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578856#comment-14578856 ]

Greg Senia commented on HIVE-10729:
-----------------------------------

Here is the query and the source-table describe that shows the array<string> column which seems to be the cause...

drop table debug.ct_gsd_events1_test;
create table debug.ct_gsd_events1_test as
select a.*, b.svcrqst_id, b.svcrqct_cds, b.svcrtyp_cd, b.cmpltyp_cd, b.sum_reason_cd as src, b.cnctmd_cd, b.notes
from ctm.ct_gsd_events a
inner join mbr.gsd_service_request b on a.contact_event_id = b.cnctevn_id;

hive> describe formatted ctm.ct_gsd_events;
OK
# col_name            data_type       comment
hmoid                 string
cumb_id_no            int
mbrind_id             string
contact_event_id      string
ce_create_dt          string
ce_end_dt             string
contact_type          string
cnctevs_cd            string
contact_mode          string
cntvnst_stts_cd       string
total_transfers       int
ce_notes              array<string>

# Detailed Table Information
Database:             ctm
Owner:                LOAD_USER
CreateTime:           Fri May 29 09:41:58 EDT 2015
LastAccessTime:       UNKNOWN
Protect Mode:         None
Retention:            0
Location:             hdfs://xhadnnm1p.example.com:8020/apps/hive/warehouse/ctm.db/ct_gsd_events
Table Type:           MANAGED_TABLE
Table Parameters:
	COLUMN_STATS_ACCURATE	true
	numFiles              154
	numRows               0
	rawDataSize           0
	totalSize             5464108
	transient_lastDdlTime 1432906919

# Storage Information
SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:          org.apache.hadoop.mapred.TextInputFormat
OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:           No
Num Buckets:          -1
Bucket Columns:       []
Sort Columns:         []
Storage Desc Params:
	serialization.format  1
Time taken: 2.968 seconds, Fetched: 42 row(s)

> Query failed when select complex columns from joinned table (tez map join only)
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-10729
>                 URL: https://issues.apache.org/jira/browse/HIVE-10729
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.0
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch
>
>
> When map join happens, if projection columns include complex data types, query will fail.
> Steps to reproduce:
> {code:sql}
> hive> set hive.auto.convert.join;
> hive.auto.convert.join=true
> hive> desc foo;
> a                   array<int>
> hive> select * from foo;
> [1,2]
> hive> desc src_int;
> key                 int
> value               string
> hive> select * from src_int where key=2;
> 2	val_2
> hive> select * from foo join src_int src on src.key = foo.a[1];
> {code}
> Query will fail with stack trace
> {noformat}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
> 	at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
> 	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
> 	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
> 	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
> 	at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
> 	at
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578874#comment-14578874 ]

Greg Senia commented on HIVE-10729:
-----------------------------------

Here is a sample of the data. I think the cause is that there is a null in the array<string> notes field... this was not a problem with Hive 0.13; it definitely started with the Hive 0.14/1.x line.

{noformat}
Vertex failed, vertexName=Map 2, vertexId=vertex_1426958683478_216665_2_01, diagnostics=[Task failed, taskId=task_1426958683478_216665_2_01_000104, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cumb_id_no:31585,cumb_id_no_sub:31585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 ,plan_component:H ,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX ,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX ,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-03-09 11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 11:24:51.315134,cntevsds_lupdt:2015-03-09 11:25:13.429458,ignore_me:1,notes:null}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cumb_id_no:31XXX585,cumb_id_no_sub:31XXX585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 ,plan_component:H ,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX ,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX ,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-03-09 11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 11:24:51.315134,cntevsds_lupdt:2015-03-09 11:25:13.429458,ignore_me:1,notes:null}
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cumb_id_no:31585,cumb_id_no_sub:31585,cnctevn_id:0021XXX86715,svcrqst_id:003XXX346030,svcrqst_crt_dts:2015-03-09 11:25:10.927722,subject_seq_no:1,cntmbrp_id:692XX60 ,plan_component:H ,psuniq_id:14XXX279,cust_segment:RM ,idcard:MEXX ,cnctyp_cd:001,cnctmd_cd:D01,cnctevs_cd:007,svcrtyp_cd:722,svrstyp_cd:832,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,svcrqst_lupdusr_id:XXX ,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-03-09 11:25:10.927722,crsr_lupdt:null,cntmbrp_lupdt:2015-03-09 11:24:51.315134,cntevsds_lupdt:2015-03-09
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561260#comment-14561260 ]

Greg Senia commented on HIVE-10729:
-----------------------------------

I tried this patch with Hive 1.2.0 and I am still getting this error:

{noformat}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
	at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListElement(StandardListObjectInspector.java:66)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIndex.evaluate(GenericUDFIndex.java:102)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
	... 31 more
{noformat}

> Query failed when select complex columns from joinned table (tez map join only)
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-10729
>                 URL: https://issues.apache.org/jira/browse/HIVE-10729
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.0
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch
>
>
> When map join happens, if projection columns include complex data types, query will fail.
> Steps to reproduce:
> {code:sql}
> hive> set hive.auto.convert.join;
> hive.auto.convert.join=true
> hive> desc foo;
> a                   array<int>
> hive> select * from foo;
> [1,2]
> hive> desc src_int;
> key                 int
> value               string
> hive> select * from src_int where key=2;
> 2	val_2
> hive> select * from foo join src_int src on src.key = foo.a[1];
> {code}
> Query will fail with stack trace
> {noformat}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
> 	at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
> 	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
> 	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
> 	at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
> 	at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
> 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
> 	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
> 	... 23 more
> {noformat}
> Similar error when projection columns include a map:
> {code:sql}
> hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
> hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, "val_1", 2, "val_2") FROM src LIMIT 1;
> hive> select * from src join test where src.key=test.a;
> {code}
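The ClassCastException in these traces has a simple shape: code that assumes the standard Java representation of a list (`Object[]`) receives a different list implementation and applies a blind cast. A tiny self-contained Java illustration of that failure mode, using `ArrayList` as a stand-in for Hive's internal `LazyBinaryArray` (the `firstElement` helper is hypothetical, not Hive code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CastDemo {
    // Stand-in for an ObjectInspector method that assumes the "standard"
    // list representation, i.e. Object[] -- like getListElement() above.
    static Object firstElement(Object listObject) {
        Object[] arr = (Object[]) listObject;  // blind cast
        return arr[0];
    }

    public static void main(String[] args) {
        // Standard representation: the cast succeeds.
        System.out.println(firstElement(new Object[]{1, 2}));  // prints 1

        // A different List representation (stand-in for LazyBinaryArray):
        // the cast fails even though the logical value is the same list.
        List<Object> lazy = new ArrayList<>(Arrays.asList(1, 2));
        try {
            firstElement(lazy);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught");
        }
    }
}
```

This is why the bug only appears on the Tez map-join path: that path hands the operator a lazy-binary list where the serializer expects the standard `Object[]` form.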
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555456#comment-14555456 ]

Greg Senia commented on HIVE-10746:
-----------------------------------

After an offline discussion with Gopal V, he determined the cause of this problem: starting in Hive 0.14, org.apache.hadoop.mapred.TextInputFormat uses whatever is defined in the property mapreduce.input.fileinputformat.split.minsize. In my case this was set to 1... unfortunately that is 1 byte, so it created 40040 splits, meaning 40040 reads of the single 3 MB file. Hope this helps someone else out. The value should be around half of the HDFS block size; in my case 64 MB, since my block size is 128 MB:

mapreduce.input.fileinputformat.split.minsize=67108864

Gopal V, if no fix is coming, should we resolve/close this JIRA?

> Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
> ------------------------------------------------------------------------
>
>                 Key: HIVE-10746
>                 URL: https://issues.apache.org/jira/browse/HIVE-10746
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Tez
>    Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
>            Reporter: Greg Senia
>            Priority: Critical
>         Attachments: slow_query_output.zip
>
>
> The following query:
> SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id, arsn_cd ORDER BY appl_user_id;
> runs consistently fast in Spark and MapReduce on Hive 1.2.0. When attempting to run this same query with Tez as the execution engine, it consistently runs for over 300-500 seconds, which seems extremely long. This is a basic external table delimited by tabs, a single file in a folder. In Hive 0.13 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and there clearly is something going awry with Hive w/Tez as an execution engine on single-file or small-file tables. I can attach further logs if someone needs them for deeper analysis.
> HDFS Output:
> hadoop fs -ls /example_dw/crc/arsn
> Found 2 items
> -rwxr-x---   6 loaduser hadoopusers        0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
> -rwxr-x---   6 loaduser hadoopusers  3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0
> Hive Table Describe:
> hive> describe formatted crc_arsn;
> OK
> # col_name            data_type       comment
> arsn_cd               string
> clmlvl_cd             string
> arclss_cd             string
> arclssg_cd            string
> arsn_prcsr_rmk_ind    string
> arsn_mbr_rspns_ind    string
> savtyp_cd             string
> arsn_eff_dt           string
> arsn_exp_dt           string
> arsn_pstd_dts         string
> arsn_lstupd_dts       string
> arsn_updrsn_txt       string
> appl_user_id          string
> arsntyp_cd            string
> pre_d_indicator       string
> arsn_display_txt      string
> arstat_cd             string
> arsn_tracking_no      string
> arsn_cstspcfc_ind     string
> arsn_mstr_rcrd_ind    string
> state_specific_ind    string
> region_specific_in    string
> arsn_dpndnt_cd        string
> unit_adjustment_in    string
> arsn_mbr_only_ind     string
> arsn_qrmb_ind         string
>
> # Detailed Table Information
> Database:             adw
> Owner:                loadu...@exa.example.com
> CreateTime:           Mon Apr 28 13:28:05 EDT 2014
> LastAccessTime:       UNKNOWN
> Protect Mode:         None
> Retention:            0
> Location:             hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
> Table Type:           EXTERNAL_TABLE
> Table Parameters:
> 	EXTERNAL              TRUE
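The split arithmetic explains the magnitude of the slowdown. The old `mapred` `org.apache.hadoop.mapred.FileInputFormat` computes `splitSize = max(minSize, min(goalSize, blockSize))`, so with `mapreduce.input.fileinputformat.split.minsize=1` the minimum no longer protects a small file from being shredded into tiny splits. A rough sketch of that formula; the file and block sizes come from the listing above, but the 97-byte `goalSize` is a reconstruction chosen to be consistent with the reported 40040 splits, not a value taken from the logs:

```java
public class SplitMath {
    // splitSize = max(minSize, min(goalSize, blockSize)) -- the formula used
    // by the old mapred FileInputFormat.computeSplitSize().
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long fileSize = 3_883_880L;   // the ~3 MB part-m file listed above
        long blockSize = 128L << 20;  // 128 MB HDFS block size

        // minSize = 1 byte: the (tiny) goal size wins, shredding the file.
        long bad = computeSplitSize(97L, 1L, blockSize);
        System.out.println("splits with minSize=1: " + (fileSize / bad));          // 40040

        // minSize = 64 MB (half the block size, as suggested): one split.
        long good = computeSplitSize(97L, 64L << 20, blockSize);
        System.out.println("splits with minSize=64MB: " + Math.max(1, fileSize / good)); // 1
    }
}
```

With the 64 MB floor in place, `min(goalSize, blockSize)` can never drag the split size below 64 MB, so the 3 MB file becomes a single split and a single read.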
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552508#comment-14552508 ]

Greg Senia commented on HIVE-10746:
-----------------------------------

Seems to be that a single file with a group by/order by is generating 40040 splits... I think the map file is needed at this point to determine why this is happening, correct?

{noformat}
2015-05-19 16:20:32,462 INFO [AsyncDispatcher event handler] impl.VertexImpl: Num tasks is -1. Expecting VertexManager/InputInitializers/1-1 split to set #tasks for the vertex vertex_1426958683478_171530_1_00
2015-05-19 16:20:32,707 DEBUG [InputInitializer [Map 1] #0] security.UserGroupInformation: PrivilegedAction as:gss2002 (auth:SIMPLE) from:org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
2015-05-19 16:20:32,708 INFO [InputInitializer [Map 1] #0] dag.RootInputInitializerManager: Starting InputInitializer for Input: crc_arsn on vertex vertex_1426958683478_171530_1_00 [Map 1]
2015-05-19 16:20:32,722 INFO [InputInitializer [Map 1] #0] log.PerfLogger: PERFLOG method=getSplits from=org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-19 16:20:32,723 INFO [InputInitializer [Map 1] #0] exec.Utilities: PLAN PATH = hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/431ae2bc-ebc9-48e7-bbb3-f03144198009/hive_2015-05-19_16-20-28_783_5570914503219655045-1/gss2002/_tez_scratch_dir/9da6870e-7388-40b1-bab6-9d0f242b1702/map.xml
2015-05-19 16:20:32,723 DEBUG [InputInitializer [Map 1] #0] exec.Utilities: Found plan in cache for name: map.xml
2015-05-19 16:20:32,744 INFO [InputInitializer [Map 1] #0] exec.Utilities: Processing alias crc_arsn
2015-05-19 16:20:32,744 INFO [InputInitializer [Map 1] #0] exec.Utilities: Adding input file hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn
2015-05-19 16:20:32,747 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: hive.io.file.readcolumn.ids=
2015-05-19 16:20:32,747 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: hive.io.file.readcolumn.names=,arsn_cd,appl_user_id
2015-05-19 16:20:32,747 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: Generating splits
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-19 16:20:32,780 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-19 16:20:32,781 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: multipleLinearRandomRetry = null
2015-05-19 16:20:32,782 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@7879a53d
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-19 16:20:32,785 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-19 16:20:32,786 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: multipleLinearRandomRetry = null
2015-05-19 16:20:32,786 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@7879a53d
2015-05-19 16:20:32,876 DEBUG [InputInitializer [Map 1] #0] mapred.FileInputFormat: Time taken to get FileStatuses: 87
2015-05-19 16:20:32,876 INFO [InputInitializer [Map 1] #0] mapred.FileInputFormat: Total input paths to process : 1
2015-05-19 16:20:32,881 DEBUG [InputInitializer
[Map 1] #0] hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false 2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true 2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false 2015-05-19 16:20:32,881 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket 2015-05-19 16:20:32,882 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: multipleLinearRandomRetry = null 2015-05-19 16:20:32,883 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@7879a53d 2015-05-19 16:20:32,907 DEBUG [InputInitializer [Map 1] #0] mapred.FileInputFormat: Total # of splits generated by getSplits: 40040, TimeTaken: 124 2015-05-19 16:20:32,916 INFO [InputInitializer
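For context on where a number like 40040 could come from: the file listed later in this thread is 3,883,880 bytes, and 3,883,880 / 40040 is exactly 97 bytes per split. Below is a sketch of the classic old-API FileInputFormat sizing math (my reconstruction for illustration, not Hive/Hadoop source; the 40040-map request is a hypothetical input that reproduces the observed split count):

```python
# Sketch (reconstruction, not Hadoop source) of the classic
# org.apache.hadoop.mapred.FileInputFormat.getSplits() sizing logic,
# to show how a 3,883,880-byte file can end up as 40040 splits.

def split_count(total_size, requested_maps, min_size=1, block_size=128 * 1024 * 1024):
    """Approximate number of splits the old mapred API would generate."""
    goal_size = total_size // max(requested_maps, 1)
    split_size = max(min_size, min(goal_size, block_size))
    # getSplits() carves the file into split_size-byte chunks
    # (ignoring the SPLIT_SLOP remainder handling in this sketch).
    return -(-total_size // split_size)  # ceiling division

# The part-m file above is 3,883,880 bytes. If something upstream asked
# for ~40040 map tasks, goal_size collapses to 97 bytes per split:
print(split_count(3_883_880, 40_040))  # -> 40040
# With a sane request, the small file is a single split:
print(split_count(3_883_880, 1))       # -> 1
```

Under this math, a huge requested-map count (or a tiny configured minimum split size) is the kind of input that turns one small file into tens of thousands of splits.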
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552793#comment-14552793 ] Greg Senia commented on HIVE-10746: --- I am guessing this JIRA could be the root of this issue: https://issues.apache.org/jira/browse/HIVE-7156
{noformat}
gss2002_20150520132600_e4199888_c149_4394_8231_238d9d9dee98_1.Map_1_crc_arsn - gss2002_20150520132600_e4199888_c149_4394_8231_238d9d9dee98_1.Map_1 [ label = Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator] ]
2015-05-20 13:26:03,760 INFO [IPC Server handler 0 on 33574] app.DAGAppMaster: JSON dump for submitted DAG, dagId=dag_1426958683478_173250_1, json={dagName:gss2002_20150520132600_e4199888-c149-4394-8231-238d9d9dee98:1,dagInfo:{\description\:\\\nSELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id\,\context\:\Hive\},version:1,vertices:[{vertexName:Map 1,processorClass:org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor,outEdgeIds:[196588160],additionalInputs:[{name:crc_arsn,class:org.apache.tez.mapreduce.input.MRInputLegacy,initializer:org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator}]},{vertexName:Reducer 2,processorClass:org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,inEdgeIds:[196588160],outEdgeIds:[1320926067]},{vertexName:Reducer 3,processorClass:org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,inEdgeIds:[1320926067],additionalOutputs:[{name:out_Reducer 3,class:org.apache.tez.mapreduce.output.MROutput}]}],edges:[{edgeId:196588160,inputVertexName:Map 1,outputVertexName:Reducer 2,dataMovementType:SCATTER_GATHER,dataSourceType:PERSISTED,schedulingType:SEQUENTIAL,edgeSourceClass:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput,edgeDestinationClass:org.apache.tez.runtime.library.input.OrderedGroupedKVInput},{edgeId:1320926067,inputVertexName:Reducer 2,outputVertexName:Reducer 3,dataMovementType:SCATTER_GATHER,dataSourceType:PERSISTED,schedulingType:SEQUENTIAL,edgeSourceClass:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput,edgeDestinationClass:org.apache.tez.runtime.library.input.OrderedGroupedKVInput}]}
2015-05-20 13:26:03,762 INFO [IPC Server handler 0 on 33574] app.DAGAppMaster: Generating DAG graphviz file, dagId=dag_1426958683478_173250_1, filePath=/u01/hadoop/yarn/log/application_1426958683478_173250/container_1426958683478_173250_01_01/dag_1426958683478_173250_1.dot
2015-05-20 13:26:05,142 DEBUG [InputInitializer [Map 1] #0] mapred.FileInputFormat: Total # of splits generated by getSplits: 40040, TimeTaken: 168
2015-05-20 13:26:05,144 DEBUG [Socket Reader #1 for port 33574] ipc.Server: got #159
2015-05-20 13:26:05,145 DEBUG [IPC Server handler 0 on 33574] ipc.Server: IPC Server handler 0 on 33574: org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from 167.69.200.206:54162 Call#159 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2015-05-20 13:26:05,145 DEBUG [IPC Server handler 0 on 33574] security.UserGroupInformation: PrivilegedAction as:gss2...@exa.example.com (auth:TOKEN) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-05-20 13:26:05,147 INFO [IPC Server handler 0 on 33574] ipc.Server: Served: getDAGStatus queueTime= 1 procesingTime= 2
2015-05-20 13:26:05,147 DEBUG [IPC Server handler 0 on 33574] ipc.Server: IPC Server handler 0 on 33574: responding to org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from 167.69.200.206:54162 Call#159 Retry#0
2015-05-20 13:26:05,147 DEBUG [IPC Server handler 0 on 33574] ipc.Server: IPC Server handler 0 on 33574: responding to org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from 167.69.200.206:54162 Call#159 Retry#0 Wrote 145 bytes.
2015-05-20 13:26:05,154 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: number of splits 40040
2015-05-20 13:26:05,154 INFO [InputInitializer [Map 1] #0] log.PerfLogger: /PERFLOG method=getSplits start=1432142764918 end=1432142765154 duration=236 from=org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-20 13:26:05,155 INFO [InputInitializer [Map 1] #0] tez.HiveSplitGenerator: Number of input splits: 40040. 23542 available slots, 1.7 waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat
{noformat}
Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
Key: HIVE-10746
URL: https://issues.apache.org/jira/browse/HIVE-10746
Project: Hive
Issue Type: Bug
Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical
Attachments: slow_query_output.zip
The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn
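For the oversplitting shown above (40040 splits for a single small file), one knob-level mitigation worth trying is to raise the minimum split/grouping sizes so that tiny splits get coalesced. These are standard Hadoop/Tez properties, but whether they actually help on the Hive/Tez versions affected here is an assumption on my part, not something verified in this thread; the values are illustrative:

```sql
-- Hedged suggestion, not a confirmed fix for HIVE-10746:
-- coalesce tiny splits by raising the minimum sizes (values illustrative).
set mapreduce.input.fileinputformat.split.minsize=16777216;  -- 16 MB floor for FileInputFormat splits
set tez.grouping.min-size=16777216;                          -- Tez split-grouping lower bound
set tez.grouping.max-size=1073741824;                        -- Tez split-grouping upper bound (1 GB)
```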
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553061#comment-14553061 ] Greg Senia commented on HIVE-10746: --- Debug logs from the DAG with compressed input: it sets 1 split... so how do we fix this issue?
{noformat}
2015-05-20 16:15:12,041 DEBUG [InputInitializer [Map 1] #0] exec.Utilities: Found plan in cache for name: map.xml
2015-05-20 16:15:12,055 INFO [InputInitializer [Map 1] #0] exec.Utilities: Processing alias gss_rsn2
2015-05-20 16:15:12,055 INFO [InputInitializer [Map 1] #0] exec.Utilities: Adding input file hdfs://xhadnnm1p.example.com:8020/apps/hive/warehouse/hue_debug.db/gss_rsn2
2015-05-20 16:15:12,057 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: hive.io.file.readcolumn.ids=
2015-05-20 16:15:12,058 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: hive.io.file.readcolumn.names=,arsn_cd,appl_user_id
2015-05-20 16:15:12,058 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: Generating splits
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-20 16:15:12,087 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-20 16:15:12,088 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: multipleLinearRandomRetry = null
2015-05-20 16:15:12,088 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@6c93595a
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-20 16:15:12,091 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: multipleLinearRandomRetry = null
2015-05-20 16:15:12,092 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@6c93595a
2015-05-20 16:15:12,216 DEBUG [InputInitializer [Map 1] #0] mapred.FileInputFormat: Time taken to get FileStatuses: 112
2015-05-20 16:15:12,216 INFO [InputInitializer [Map 1] #0] mapred.FileInputFormat: Total input paths to process : 1
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
2015-05-20 16:15:12,219 DEBUG [InputInitializer [Map 1] #0] hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
2015-05-20 16:15:12,220 DEBUG [InputInitializer [Map 1] #0] retry.RetryUtils: multipleLinearRandomRetry = null
2015-05-20 16:15:12,220 DEBUG [InputInitializer [Map 1] #0] ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@6c93595a
2015-05-20 16:15:12,222 DEBUG [InputInitializer [Map 1] #0] mapred.FileInputFormat: Total # of splits generated by getSplits: 1, TimeTaken: 132
2015-05-20 16:15:12,222 INFO [InputInitializer [Map 1] #0] io.HiveInputFormat: number of splits 1
2015-05-20 16:15:12,222 INFO [InputInitializer [Map 1] #0] log.PerfLogger: /PERFLOG method=getSplits start=1432152912040 end=143215291 duration=182 from=org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-20 16:15:12,222 INFO [InputInitializer [Map 1] #0] tez.HiveSplitGenerator: Number of input splits: 1. 23542 available slots, 1.7 waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat
2015-05-20 16:15:12,223 INFO [InputInitializer [Map 1] #0] exec.Utilities: PLAN PATH = hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/646469af-0a87-4080-9d2b-e40af4a34c0e/hive_2015-05-20_16-15-06_565_5281905327000741927-1/gss2002/_tez_scratch_dir/049d6a0d-aea4-4805-90a5-84b8c38fe1f4/map.xml
2015-05-20 16:15:12,223 INFO [InputInitializer [Map 1] #0] exec.Utilities: ***non-local mode***
2015-05-20 16:15:12,223 INFO [InputInitializer [Map 1] #0] exec.Utilities: local path = hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/646469af-0a87-4080-9d2b-e40af4a34c0e/hive_2015-05-20_16-15-06_565_5281905327000741927-1/gss2002/_tez_scratch_dir/049d6a0d-aea4-4805-90a5-84b8c38fe1f4/map.xml
{noformat}
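On why the compressed run produces exactly 1 split: for plain text input, a codec that is not splittable makes TextInputFormat's isSplitable() check return false, so getSplits() emits a single split spanning the whole file regardless of its size. A toy sketch of that decision (my simplification for illustration, not Hadoop source; the codec names here are illustrative placeholders):

```python
# Toy model (not Hadoop source) of the split decision for text files:
# if the file's codec is not a SplittableCompressionCodec, the input
# format refuses to split it and the whole file becomes one split.

SPLITTABLE_CODECS = {"bzip2"}  # illustrative; e.g. BZip2Codec is splittable in Hadoop

def num_splits(file_size, codec=None, split_size=128 * 1024 * 1024):
    """Return the split count for one file under this simplified model."""
    if codec is not None and codec not in SPLITTABLE_CODECS:
        return 1  # non-splittable compressed file: one split, one mapper
    return -(-file_size // split_size)  # ceiling division over split_size

print(num_splits(3_883_880, codec="snappy"))  # -> 1 (non-splittable codec)
print(num_splits(3_883_880, codec=None))      # -> 1 (small file, one chunk)
```

This is consistent with the logs above: once the table data is compressed, the split count can no longer explode, whatever the sizing math would otherwise produce.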
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553018#comment-14553018 ] Greg Senia commented on HIVE-10746: --- Just to clarify: this data is tab-delimited and was loaded by Sqoop v1... What is the difference between compressed vs. uncompressed at this point?
{noformat}
Map 1: 0(+1)/1  Reducer 2: 0/1     Reducer 3: 0/1
Map 1: 0(+1)/1  Reducer 2: 0/1     Reducer 3: 0/1
Map 1: 0(+1)/1  Reducer 2: 0/1     Reducer 3: 0/1
Map 1: 1/1      Reducer 2: 0/1     Reducer 3: 0/1
Map 1: 1/1      Reducer 2: 0(+1)/1 Reducer 3: 0/1
Map 1: 1/1      Reducer 2: 1/1     Reducer 3: 0(+1)/1
Map 1: 1/1      Reducer 2: 1/1     Reducer 3: 1/1
Status: DAG finished successfully in 523.42 seconds

METHOD                 DURATION(ms)
parse                            17
semanticAnalyze               1,593
TezBuildDag                     585
TezSubmitToRunningDag           187
TotalPrepTime                 3,522

VERTICES   TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1                1                0             0            516.72          752,950          15,318         13,440          11,516
Reducer 2            1                0             0              0.81            1,890              24         11,516          11,516
Reducer 3            1                0             0              0.61            1,460              19         11,516               0

OK
BB166674 P16 1
{noformat}
Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
Key: HIVE-10746
URL: https://issues.apache.org/jira/browse/HIVE-10746
Project: Hive
Issue Type: Bug
Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical
Attachments: slow_query_output.zip
The following query:
{noformat}
SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id;
{noformat}
runs consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to run this same query with Tez as the execution engine, it consistently runs for over 300-500 seconds, which seems extremely long. This is a basic external table delimited by tabs, and it is a single file in a folder. In Hive 0.13 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and there clearly is something going awry with Hive w/ Tez as an execution engine on single- or small-file tables. I can attach further logs if someone needs them for deeper analysis.
HDFS Output:
{noformat}
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers        0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers  3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0
{noformat}
Hive Table Describe:
{noformat}
hive describe formatted crc_arsn;
OK
# col_name              data_type   comment
arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
{noformat}
[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552976#comment-14552976 ] Greg Senia commented on HIVE-10746: --- With Snappy compression it ran in 7 seconds...
{noformat}
Status: DAG finished successfully in 7.93 seconds

METHOD                 DURATION(ms)
parse                         1,081
semanticAnalyze               1,488
TezBuildDag                     490
TezSubmitToRunningDag           374
TotalPrepTime                 4,958

VERTICES   TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1                1                0             0              2.23            3,790              29         13,440          11,516
Reducer 2            1                0             0              0.81            2,150               0         11,516          11,516
Reducer 3            1                0             0              0.61            1,110               0         11,516               0

OK
BB166674 P16 1
{noformat}
Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
Key: HIVE-10746
URL: https://issues.apache.org/jira/browse/HIVE-10746
Project: Hive
Issue Type: Bug
Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical
Attachments: slow_query_output.zip
The following query:
{noformat}
SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id;
{noformat}
runs consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to run this same query with Tez as the execution engine, it consistently runs for over 300-500 seconds, which seems extremely long. This is a basic external table delimited by tabs, and it is a single file in a folder. In Hive 0.13 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and now Hive 1.2.0, and there clearly is something going awry with Hive w/ Tez as an execution engine on single- or small-file tables. I can attach further logs if someone needs them for deeper analysis.
HDFS Output:
{noformat}
hadoop fs -ls /example_dw/crc/arsn
Found 2 items
-rwxr-x---   6 loaduser hadoopusers        0 2015-05-17 20:03 /example_dw/crc/arsn/_SUCCESS
-rwxr-x---   6 loaduser hadoopusers  3883880 2015-05-17 20:03 /example_dw/crc/arsn/part-m-0
{noformat}
Hive Table Describe:
{noformat}
hive describe formatted crc_arsn;
OK
# col_name              data_type   comment
arsn_cd                 string
clmlvl_cd               string
arclss_cd               string
arclssg_cd              string
arsn_prcsr_rmk_ind      string
arsn_mbr_rspns_ind      string
savtyp_cd               string
arsn_eff_dt             string
arsn_exp_dt             string
arsn_pstd_dts           string
arsn_lstupd_dts         string
arsn_updrsn_txt         string
appl_user_id            string
arsntyp_cd              string
pre_d_indicator         string
arsn_display_txt        string
arstat_cd               string
arsn_tracking_no        string
arsn_cstspcfc_ind       string
arsn_mstr_rcrd_ind      string
state_specific_ind      string
region_specific_in      string
arsn_dpndnt_cd          string
unit_adjustment_in      string
arsn_mbr_only_ind       string
arsn_qrmb_ind           string

# Detailed Table Information
Database:        adw
Owner:           loadu...@exa.example.com
CreateTime:      Mon Apr 28 13:28:05 EDT 2014
LastAccessTime:  UNKNOWN
{noformat}