[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750784#comment-15750784 ]

Vikash Pareek commented on HIVE-15272:
--------------------------------------

You can find the query in the issue description itself:

SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my_db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name)

As for the DDL, the relevant column types are:

region -> STRING
amount -> DECIMAL
name -> STRING

> "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
> ----------------------------------------------------------------------
>
>                 Key: HIVE-15272
>                 URL: https://issues.apache.org/jira/browse/HIVE-15272
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 1.1.0
>         Environment: Hive 1.1.0, CentOS, Cloudera 5.7.4
>            Reporter: Vikash Pareek
>
> I ran the following Hive query multiple times, with the execution engine set
> to Hive on Spark and to Hive on MapReduce.
> {code}
> SELECT COUNT(DISTINCT t1.region, t1.amount)
> FROM my_db.my_table1 t1
> LEFT OUTER JOIN my_db.my_table2 t2
>   ON (t1.id = t2.id AND t1.name = t2.name)
> {code}
> With Hive on Spark: the result (count) was different on every execution.
> With Hive on MapReduce: the result (count) was the same on every execution.
> It seems that Hive on Spark behaves differently on each execution and does
> not produce the correct result.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikash Pareek updated HIVE-15272:
---------------------------------
    Description:
I ran the following Hive query multiple times, with the execution engine set to Hive on Spark and to Hive on MapReduce.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my_db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name)
{code}
With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does not produce the correct result.

  was:
I ran the following Hive query multiple times, with the execution engine set to Hive on Spark and to Hive on MapReduce.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my-db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name)
{code}
With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does not produce the correct result.
[jira] [Comment Edited] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693243#comment-15693243 ]

Vikash Pareek edited comment on HIVE-15272 at 11/24/16 3:10 PM:
----------------------------------------------------------------

I am only calculating a count of the records, and the result (count) does not depend on ordering.
The result should be the same on every execution, as it is with MR.

my_table1 (left) has ~30 million records
my_table2 (right) has ~85 million records

  was (Author: vpareek):
I am only calculating a count of the records, and the result (count) does not depend on ordering.
The result should be the same on every execution, as it is with MR.

I have around 30 million rows in my_table1 (left) and 85 million rows in my_table2 (right).
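The point that the count does not depend on ordering suggests a stronger sanity check. A LEFT OUTER JOIN keeps every row of the left table and can only duplicate rows, so a distinct count over left-table columns should equal the same distinct count over my_table1 alone, which gives a reference value for either engine. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in engine. It is not Hive, the sample rows are made up, amount is stored as text in place of DECIMAL, and `distinct_pair_counts` is a hypothetical helper name. It also assumes that region and amount are columns of my_table1, as the query is written.

```python
import sqlite3


def distinct_pair_counts(left_rows, right_rows):
    """Return (count over my_table1 alone, count over the LEFT OUTER JOIN)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schemas; the real tables surely have more columns.
    conn.execute(
        "CREATE TABLE my_table1 (id INTEGER, name TEXT, region TEXT, amount TEXT)"
    )
    conn.execute("CREATE TABLE my_table2 (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO my_table1 VALUES (?, ?, ?, ?)", left_rows)
    conn.executemany("INSERT INTO my_table2 VALUES (?, ?)", right_rows)

    # SQLite does not accept COUNT(DISTINCT a, b), so a DISTINCT subquery
    # stands in for it. NULL handling may differ from Hive's; the sample
    # data below contains no NULLs.
    baseline = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT region, amount FROM my_table1)"
    ).fetchone()[0]
    joined = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT DISTINCT t1.region, t1.amount"
        "  FROM my_table1 t1"
        "  LEFT OUTER JOIN my_table2 t2"
        "    ON (t1.id = t2.id AND t1.name = t2.name)"
        ")"
    ).fetchone()[0]
    conn.close()
    return baseline, joined


if __name__ == "__main__":
    left = [
        (1, "a", "east", "10.5"),
        (2, "b", "east", "10.5"),
        (3, "c", "west", "7"),
        (4, "d", "west", "8"),
    ]
    # id 1 matches twice (duplicating a left row), ids 2 and 4 match nothing.
    right = [(1, "a"), (1, "a"), (3, "c"), (9, "z")]
    print(distinct_pair_counts(left, right))
```

If the two numbers agree on MapReduce but the joined count drifts on Spark, that would point at the join or aggregation stage on Spark rather than at the data.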
[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikash Pareek updated HIVE-15272:
---------------------------------
    Summary: "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark  (was: "LEFT OUTER JOIN" Is not populating different records with Hive On Spark)
[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating different records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693243#comment-15693243 ]

Vikash Pareek commented on HIVE-15272:
--------------------------------------

I am only calculating a count of the records, and the result (count) does not depend on ordering.
The result should be the same on every execution, as it is with MR.

I have around 30 million rows in my_table1 (left) and 85 million rows in my_table2 (right).
[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating different records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikash Pareek updated HIVE-15272:
---------------------------------
    Description:
I ran the following Hive query multiple times, with the execution engine set to Hive on Spark and to Hive on MapReduce.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my-db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name)
{code}
With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does not produce the correct result.

  was:
I ran the following Hive query multiple times, with the execution engine set to Hive on Spark and to Hive on MapReduce.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my-db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name)
{code}
With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.
[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating different records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikash Pareek updated HIVE-15272:
---------------------------------
    Description:
I ran the following Hive query multiple times, with the execution engine set to Hive on Spark and to Hive on MapReduce.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my-db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name)
{code}
With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

  was:
The following query returns a different result every time I run it with Hive on Spark:
{code}
SELECT COUNT(*)
FROM
  (SELECT DISTINCT mt1.name, mt1.id
   FROM
     (SELECT mt1.*, mt2.region, mt2., regexp_replace(mt2.tr_dat,"\\.","") AS TRANSACTION_DATE
      FROM my_database.my_table1 mt1
      LEFT OUTER JOIN my_database.my_table2 mt2
        ON (mt1.id=mt2.id AND mt1.name = mt2.name))t6)A;
{code}
But the same query returns the same result every time with Hive on MapReduce.
[jira] [Commented] (HIVE-8960) ParsingException in the WHERE statement with a Sub Query
[ https://issues.apache.org/jira/browse/HIVE-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429315#comment-15429315 ]

Vikash Pareek commented on HIVE-8960:
-------------------------------------

I am trying the following query; it works in Impala but not in Hive.

SELECT t1.col1
FROM table1 t1
LEFT OUTER JOIN table2 t2
  ON (t1.col2 = t2.col2 AND t1.col3 = t2.col3)
WHERE t2.col4 = (SELECT MAX(t22.col4) FROM table2 t22 WHERE t22.col4 <= t1.col4);

Is there any alternative for this in Hive?

> ParsingException in the WHERE statement with a Sub Query
> --------------------------------------------------------
>
>                 Key: HIVE-8960
>                 URL: https://issues.apache.org/jira/browse/HIVE-8960
>             Project: Hive
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 0.13.0
>         Environment: Secured HDP 2.1.3 with Hive 0.13.0
>            Reporter: Rémy SAISSY
>
> Comparison with a Sub query in a WHERE statement does not work.
> Given that id_chargement is an integer:
> USE db1;
> SELECT * FROM tbl1 a WHERE a.id_chargement > (SELECT MAX(b.id_chargement) FROM tbl2 b);
> or
> SELECT * FROM tbl1 a WHERE a.id_chargement > (SELECT b.id_chargement FROM tbl2 b LIMIT 1);
> Both return the following parsing error:
> Error: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification (state=42000,code=4)
> java.sql.SQLException: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification
>       at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
>       at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
>       at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
>       at org.apache.hive.beeline.Commands.execute(Commands.java:736)
>       at org.apache.hive.beeline.Commands.sql(Commands.java:657)
>       at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:804)
>       at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
>       at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
>       at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:601)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
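One commonly used alternative for the uncorrelated case in the issue description (the MAX comparison) is to move the scalar subquery into the FROM clause as a one-row derived table and join against it. The sketch below checks, using Python's built-in sqlite3 module as a stand-in engine, that the two forms return the same rows. It has not been run against Hive 0.13.0. The single-column tables, the alias `m`, the column alias `max_id`, and the helper `run_both` are assumptions made for illustration. It does not cover the correlated subquery in the comment above, which would need a different rewrite.

```python
import sqlite3

# Original form from the issue: scalar subquery inside WHERE.
SUBQUERY_FORM = (
    "SELECT a.* FROM tbl1 a "
    "WHERE a.id_chargement > (SELECT MAX(b.id_chargement) FROM tbl2 b) "
    "ORDER BY a.id_chargement"
)

# Rewrite: the aggregate becomes a one-row derived table in FROM.
# ORDER BY is present only to make the two result lists comparable.
JOIN_FORM = (
    "SELECT a.* FROM tbl1 a "
    "CROSS JOIN (SELECT MAX(b.id_chargement) AS max_id FROM tbl2 b) m "
    "WHERE a.id_chargement > m.max_id "
    "ORDER BY a.id_chargement"
)


def run_both(tbl1_ids, tbl2_ids):
    """Run both query forms on small in-memory tables and return their rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schemas reduced to the one column the issue mentions.
    conn.execute("CREATE TABLE tbl1 (id_chargement INTEGER)")
    conn.execute("CREATE TABLE tbl2 (id_chargement INTEGER)")
    conn.executemany("INSERT INTO tbl1 VALUES (?)", [(i,) for i in tbl1_ids])
    conn.executemany("INSERT INTO tbl2 VALUES (?)", [(i,) for i in tbl2_ids])
    with_subquery = conn.execute(SUBQUERY_FORM).fetchall()
    with_join = conn.execute(JOIN_FORM).fetchall()
    conn.close()
    return with_subquery, with_join


if __name__ == "__main__":
    sub_rows, join_rows = run_both([1, 5, 8, 12], [3, 7])
    print(sub_rows == join_rows, join_rows)
```

Because the derived table always produces exactly one row, the cross join does not multiply the rows of tbl1. When tbl2 is empty the maximum is NULL and both forms return no rows.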