Hi

I've recently noticed incorrect behaviour from Hive Query Planner. The
simplest example I could construct is as follows

SELECT
tbl3.col2 AS current_regularity_streak
FROM (select 1 col1) tbl1
LEFT JOIN
(select 1 col1) tbl2 ON tbl1.col1 = tbl2.col1
LEFT JOIN (select 1 col1, 1 col2) tbl3 ON tbl1.col1 = tbl3.col1
WHERE
tbl1.col1 in (select 1 col1 union all select 2)
AND
tbl3.col2 >= 2


The query should logically return no rows, but it does! It returns 1 row
with 1 column, with value = 1. The value=1 should have been filtered out by
the filter tbl3.col2 >= 2
  So  df

On further examination, I believe the culprit is the IN clause. If I remove
this, the query works correctly and returns 0 rows.

SELECT
tbl3.col2 AS current_regularity_streak
FROM (select 1 col1) tbl1
LEFT JOIN
(select 1 col1) tbl2 ON tbl1.col1 = tbl2.col1
LEFT JOIN (select 1 col1, 1 col2) tbl3 ON tbl1.col1 = tbl3.col1
WHERE
tbl3.col2 >= 2

Is this a known issue? I couldn't find anything on JIRA/Stack
overflow/Google.

I further analyzed using EXPLAIN FORMATTED and noticed that the plan of the
first query doesn't contain the >=2 predicate. The plan of the second query
does. I wonder how the planner could omit the filter clause altogether?

I'm using HDP 2.6.5.10-2.

Reply via email to