Hi I've recently noticed incorrect behaviour from Hive Query Planner. The simplest example I could construct is as follows
SELECT tbl3.col2 AS current_regularity_streak FROM (select 1 col1) tbl1 LEFT JOIN (select 1 col1) tbl2 ON tbl1.col1 = tbl2.col1 LEFT JOIN (select 1 col1, 1 col2) tbl3 ON tbl1.col1 = tbl3.col1 WHERE tbl1.col1 in (select 1 col1 union all select 2) AND tbl3.col2 >= 2 The query should logically return no rows, but it does! It returns 1 row with 1 column, with value = 1. The value=1 should have been filtered out by the filter tbl3.col2 >= 2 So df On further examination, I believe the culprit is the IN clause. If I remove this, the query works correctly and returns 0 rows. SELECT tbl3.col2 AS current_regularity_streak FROM (select 1 col1) tbl1 LEFT JOIN (select 1 col1) tbl2 ON tbl1.col1 = tbl2.col1 LEFT JOIN (select 1 col1, 1 col2) tbl3 ON tbl1.col1 = tbl3.col1 WHERE tbl3.col2 >= 2 Is this a known issue? I couldn't find anything on JIRA/Stack overflow/Google. I further analyzed using EXPLAIN FORMATTED and noticed that the plan of the first query doesn't contain the >=2 predicate. The plan of the second query does. I wonder how the planner could omit the filter clause altogether? I'm using HDP 2.6.5.10-2.