Krisztian Kasa created HIVE-26452:
-------------------------------------

             Summary: NPE when converting join to mapjoin and join column referenced more than once
                 Key: HIVE-26452
                 URL: https://issues.apache.org/jira/browse/HIVE-26452
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa

{code}
explain
select count(*)
from LU_CUSTOMER pa11
join ORDER_FACT a15 on (pa11.CUSTOMER_ID = a15.CUSTOMER_ID)
join LU_CUSTOMER a16 on (a15.CUSTOMER_ID = a16.CUSTOMER_ID and pa11.CUSTOMER_ID = a16.CUSTOMER_ID);
{code}

{{a16.CUSTOMER_ID}} is referenced more than once in the join condition.

Hive generates ReduceSink (RS) operators for the join's children. The row schema of one of these RS operators contains only one instance of the join key (customer_id), while the map shown first in the dump holds two key entries:

{code}
RS[13]
result = {HashMap@16092}  size = 2
 "KEY.reducesinkkey0" -> {ExprNodeColumnDesc@16083} "Column[_col0]"
 "KEY.reducesinkkey1" -> {ExprNodeColumnDesc@16102} "Column[_col0]"
result = {RowSchema@16104} "(KEY.reducesinkkey0: int|{$hdt$_2}customer_id)"
 signature = {ArrayList@16110}  size = 1
  0 = {ColumnInfo@16087} "KEY.reducesinkkey0: int"
{code}

{{KEY.reducesinkkey1}} is missing from the schema. When the join is converted to a mapjoin, the converter tries to look up both join key column instances in this schema, and the lookup fails for the missing one:

https://github.com/apache/hive/blob/2aaba3c79e740ef27fc263b5a8ff33ad679c5a12/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java#L538
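
The mismatch in the RS[13] dump can be modelled in isolation. The following is a minimal sketch in plain Java, not Hive code: the class {{RsSchemaLookupSketch}} and the method {{keysMissingFromSchema}} are hypothetical names, and strings stand in for Hive's expression and column-info objects. It only illustrates why a lookup driven by the two-entry map finds no schema column for {{KEY.reducesinkkey1}}; how the converter actually performs the lookup is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the reported mismatch; none of these names are Hive classes. */
public class RsSchemaLookupSketch {

    /**
     * Returns the internal column names that appear as keys of the
     * column-to-expression map but have no entry in the row schema signature.
     */
    public static List<String> keysMissingFromSchema(Map<String, String> colExprMap,
                                                     List<String> signature) {
        List<String> missing = new ArrayList<>();
        for (String internalName : colExprMap.keySet()) {
            if (!signature.contains(internalName)) {
                missing.add(internalName);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Values copied from the RS[13] dump in this report.
        Map<String, String> colExprMap = new LinkedHashMap<>();
        colExprMap.put("KEY.reducesinkkey0", "Column[_col0]");
        colExprMap.put("KEY.reducesinkkey1", "Column[_col0]");
        List<String> signature = Arrays.asList("KEY.reducesinkkey0");

        // Any name listed here would resolve to null in a schema lookup.
        System.out.println(keysMissingFromSchema(colExprMap, signature));
    }
}
```

A check of this kind, applied to each RS operator before the conversion, would flag the inconsistent operator early instead of failing later with an NPE. Whether the fix belongs in the converter or in the code that builds the RS row schema is left open here.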