[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961373#comment-15961373 ]

Miklos Szurap commented on HIVE-15272:
--------------------------------------

As the amount field is a DECIMAL, it is very likely that HIVE-12768 is the root cause.

> "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
> ----------------------------------------------------------------------
>
>                 Key: HIVE-15272
>                 URL: https://issues.apache.org/jira/browse/HIVE-15272
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 1.1.0
>         Environment: Hive 1.1.0, CentOS, Cloudera 5.7.4
>            Reporter: Vikash Pareek
>            Assignee: Rui Li
>
> I ran the following Hive query multiple times with the execution engine set to
> Hive on Spark and to Hive on MapReduce.
> {code}
> SELECT COUNT(DISTINCT t1.region, t1.amount)
> FROM my_db.my_table1 t1
> LEFT OUTER JOIN my_db.my_table2 t2
>   ON (t1.id = t2.id AND t1.name = t2.name)
> {code}
> With Hive on Spark: the result (count) was different on every execution.
> With Hive on MapReduce: the result (count) was the same on every execution.
> It seems that Hive on Spark behaves differently on each execution and does not
> produce the correct result.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
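The engine comparison described in the report can be run from a single Hive session by switching the execution engine before each run. This is a minimal sketch, assuming both engines are configured on the cluster; `hive.execution.engine` is a standard Hive property, and the query is the one from the issue description.

```sql
-- Run on Spark several times and note the count returned by each run.
SET hive.execution.engine=spark;
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my_db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name);

-- Repeat on MapReduce to get the baseline the reporter compared against.
SET hive.execution.engine=mr;
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my_db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name);
```

If the Spark counts vary between runs while the MapReduce count stays fixed, that matches the behaviour reported here.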
[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753212#comment-15753212 ]

Rui Li commented on HIVE-15272:
-------------------------------

OK, I'll look into this.
[~VPareek], I think the two tables have the same DDL, right? Do they contain the same data? Could you upload some sample data that can reproduce the issue? Thanks!
[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751333#comment-15751333 ]

Xuefu Zhang commented on HIVE-15272:
------------------------------------

cc: [~lirui]
[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750784#comment-15750784 ]

Vikash Pareek commented on HIVE-15272:
--------------------------------------

You can find the query in the issue description itself:

    SELECT COUNT(DISTINCT t1.region, t1.amount)
    FROM my_db.my_table1 t1
    LEFT OUTER JOIN my_db.my_table2 t2
      ON (t1.id = t2.id AND t1.name = t2.name)

For the DDL:
region -> STRING
amount -> DECIMAL
name -> STRING
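A DDL sketch consistent with the column types listed above might look like the following. Only the types of `region`, `amount`, and `name` come from the thread. The type of `id`, the DECIMAL precision and scale, the storage format, and the assumption that both tables share the same DDL are not confirmed here and are hypothetical.

```sql
-- Hypothetical DDL: only region, amount and name types are stated in the thread.
CREATE TABLE my_db.my_table1 (
  id     BIGINT,   -- assumption: the thread does not give the type of id
  name   STRING,
  region STRING,
  amount DECIMAL   -- precision/scale not given; a bare DECIMAL defaults to DECIMAL(10,0)
);

-- Assumes the second table has an identical definition.
CREATE TABLE my_db.my_table2 LIKE my_db.my_table1;
```

Sample data that reproduces the varying count would still have to come from the reporter.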
[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
[ https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15703275#comment-15703275 ]

Xuefu Zhang commented on HIVE-15272:
------------------------------------

[~VPareek] would you mind providing a repro case (data, DDL, and query)? Thanks.