[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251495#comment-14251495 ] Kamil Gorlo commented on HIVE-9146: --- I've tested in on HDP 2.2 with Hive 0.14 and in fact everything is working as expected. Thanks. Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9146) Query with left joins produces wrong result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250136#comment-14250136 ] Ashutosh Chauhan commented on HIVE-9146: you might be hitting into HIVE-8298 can you test your queries on Hive 0.14 and post your findings here. Query with left joins produces wrong result when join condition is written in different order - Key: HIVE-9146 URL: https://issues.apache.org/jira/browse/HIVE-9146 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Kamil Gorlo I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug. I have two simple tables: desc kgorlo_comm; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | desc kgorlo_log; | col_name | data_type | comment | | id| bigint | | | dest_id | bigint | | | tstamp| bigint | | With data: select * from kgorlo_comm; | kgorlo_comm.id | kgorlo_comm.dest_id | | 1 | 2| | 2 | 1| | 1 | 3| | 2 | 3| | 3 | 5| | 4 | 5| select * from kgorlo_log; | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp | | 1 | 2 | 0 | | 1 | 3 | 0 | | 1 | 5 | 0 | | 3 | 1 | 0 | And when I run this query (query no. 1): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id; {quote} I get result (which is correct): | log.id | log.dest_id | com1.msgs | com2.msgs | | 1 | 2| 1 | 1 | | 1 | 3| 1 | NULL | | 1 | 5| NULL | NULL | | 3 | 1| NULL | 1 | But when I run second query (query no. 2): {quote} select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id; {quote} I get different (and bad, in my opinion) result: |log.id | log.dest_id | com1.msgs | com2.msgs| |1|2|1|1| |1|3|1|1| |1|5|NULL|NULL| |3|1|NULL|NULL| Query no. 1 and query no. 2 are different in only one place, it is second join condition: bf. com2.dest_id=log.id and com2.id=log.dest_id vs bf. com2.id=log.dest_id and com2.dest_id=log.id which in my opinion are equal. Explains for both queries are of course slightly different (columns are swapped) and they are here: https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good) https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad) -- This message was sent by Atlassian JIRA (v6.3.4#6332)