[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order

2014-12-18 Thread Kamil Gorlo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251495#comment-14251495
 ] 

Kamil Gorlo commented on HIVE-9146:
---

I've tested in on HDP 2.2 with Hive 0.14 and in fact everything is working as 
expected. Thanks.

 Query with left joins produces wrong result when join condition is written in 
 different order
 -

 Key: HIVE-9146
 URL: https://issues.apache.org/jira/browse/HIVE-9146
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Kamil Gorlo

 I have two queries which should be equal (I only swap two join conditions) 
 but they are not. They are simplest queries I could produce to reproduce bug.
 I have two simple tables:
 desc kgorlo_comm;
 | col_name  | data_type  | comment  |
 | id| bigint |  |
 | dest_id   | bigint |  |
 desc kgorlo_log; 
 | col_name  | data_type  | comment  |
 | id| bigint |  |
 | dest_id   | bigint |  |
 | tstamp| bigint |  |
 With data:
 select * from kgorlo_comm; 
 | kgorlo_comm.id  | kgorlo_comm.dest_id  |
 | 1   | 2|
 | 2   | 1|
 | 1   | 3|
 | 2   | 3|
 | 3   | 5|
 | 4   | 5|
 select * from kgorlo_log; 
 | kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
 | 1  | 2   | 0  |
 | 1  | 3   | 0  |
 | 1  | 5   | 0  |
 | 3  | 1   | 0  |
 And when I run this query (query no. 1):
 {quote}
 select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
 {quote}
 I get result (which is correct):
 | log.id  | log.dest_id  | com1.msgs  | com2.msgs  |
 | 1   | 2| 1  | 1  |
 | 1   | 3| 1  | NULL   |
 | 1   | 5| NULL   | NULL   |
 | 3   | 1| NULL   | 1  |
 But when I run second query (query no. 2):
 {quote}
 select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
 {quote}
 I get different (and bad, in my opinion) result:
 |log.id | log.dest_id | com1.msgs | com2.msgs|
 |1|2|1|1|
 |1|3|1|1|
 |1|5|NULL|NULL|
 |3|1|NULL|NULL|
 Query no. 1 and query no. 2 are different in only one place, it is second 
 join condition:
 bf. com2.dest_id=log.id and com2.id=log.dest_id
 vs
 bf. com2.id=log.dest_id and com2.dest_id=log.id
 which in my opinion are equal.
 Explains for both queries are of course slightly different (columns are 
 swapped) and they are here:
 https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
 https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order

2014-12-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250136#comment-14250136
 ] 

Ashutosh Chauhan commented on HIVE-9146:


you might be hitting into HIVE-8298 can you test your queries on Hive 0.14 and 
post your findings here.

 Query with left joins produces wrong result when join condition is written in 
 different order
 -

 Key: HIVE-9146
 URL: https://issues.apache.org/jira/browse/HIVE-9146
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Kamil Gorlo

 I have two queries which should be equal (I only swap two join conditions) 
 but they are not. They are simplest queries I could produce to reproduce bug.
 I have two simple tables:
 desc kgorlo_comm;
 | col_name  | data_type  | comment  |
 | id| bigint |  |
 | dest_id   | bigint |  |
 desc kgorlo_log; 
 | col_name  | data_type  | comment  |
 | id| bigint |  |
 | dest_id   | bigint |  |
 | tstamp| bigint |  |
 With data:
 select * from kgorlo_comm; 
 | kgorlo_comm.id  | kgorlo_comm.dest_id  |
 | 1   | 2|
 | 2   | 1|
 | 1   | 3|
 | 2   | 3|
 | 3   | 5|
 | 4   | 5|
 select * from kgorlo_log; 
 | kgorlo_log.id  | kgorlo_log.dest_id  | kgorlo_log.tstamp  |
 | 1  | 2   | 0  |
 | 1  | 3   | 0  |
 | 1  | 5   | 0  |
 | 3  | 1   | 0  |
 And when I run this query (query no. 1):
 {quote}
 select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
 {quote}
 I get result (which is correct):
 | log.id  | log.dest_id  | com1.msgs  | com2.msgs  |
 | 1   | 2| 1  | 1  |
 | 1   | 3| 1  | NULL   |
 | 1   | 5| NULL   | NULL   |
 | 3   | 1| NULL   | 1  |
 But when I run second query (query no. 2):
 {quote}
 select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
 left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm 
 group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
 {quote}
 I get different (and bad, in my opinion) result:
 |log.id | log.dest_id | com1.msgs | com2.msgs|
 |1|2|1|1|
 |1|3|1|1|
 |1|5|NULL|NULL|
 |3|1|NULL|NULL|
 Query no. 1 and query no. 2 are different in only one place, it is second 
 join condition:
 bf. com2.dest_id=log.id and com2.id=log.dest_id
 vs
 bf. com2.id=log.dest_id and com2.dest_id=log.id
 which in my opinion are equal.
 Explains for both queries are of course slightly different (columns are 
 swapped) and they are here:
 https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
 https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)