Krisztian Kasa created HIVE-26452:
-------------------------------------

             Summary: NPE when converting join to mapjoin and join column referenced more than once
                 Key: HIVE-26452
                 URL: https://issues.apache.org/jira/browse/HIVE-26452
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa


{code}
explain
select count(*)
from LU_CUSTOMER pa11
      join        ORDER_FACT        a15
      on         (pa11.CUSTOMER_ID = a15.CUSTOMER_ID)
      join        LU_CUSTOMER        a16
      on         (a15.CUSTOMER_ID = a16.CUSTOMER_ID and pa11.CUSTOMER_ID = a16.CUSTOMER_ID);
{code}
{{a16.CUSTOMER_ID}} is referenced more than once in the join condition.

Hive generates Reduce Sink (RS) operators for the join's children, and the row 
schema of one of these RS operators contains only one instance of the join key (customer_id).
{code}
RS[13]
result = {HashMap@16092}  size = 2
 "KEY.reducesinkkey0" -> {ExprNodeColumnDesc@16083} "Column[_col0]"
 "KEY.reducesinkkey1" -> {ExprNodeColumnDesc@16102} "Column[_col0]"

result = {RowSchema@16104} "(KEY.reducesinkkey0: int|{$hdt$_2}customer_id)"
 signature = {ArrayList@16110}  size = 1
  0 = {ColumnInfo@16087} "KEY.reducesinkkey0: int"
{code}

{{KEY.reducesinkkey1}} is missing from the schema.

When converting the join to a mapjoin, the converter algorithm fails with an NPE 
while looking up both join key column instances, because only one of them is 
present in the row schema.

https://github.com/apache/hive/blob/2aaba3c79e740ef27fc263b5a8ff33ad679c5a12/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java#L538
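
The mismatch described above can be illustrated with a minimal, self-contained sketch. It is not Hive code: the class {{MissingJoinKeySketch}} and its methods are hypothetical, and plain Java collections stand in for the two structures shown in the RS[13] debugger output (the key-to-expression map and the row schema signature). It only shows how such a state can be detected before a lookup returns null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these are not Hive classes or Hive APIs.
final class MissingJoinKeySketch {

    // Returns every key of the expression map that has no column in the row schema.
    static List<String> keysMissingFromSchema(Map<String, String> colExprMap,
                                              Set<String> schemaColumns) {
        List<String> missing = new ArrayList<>();
        for (String key : colExprMap.keySet()) {
            if (!schemaColumns.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Mirrors the two map entries from the RS[13] dump (assumed simplification:
    // the expression descriptors are represented as plain strings).
    static Map<String, String> rs13ColExprMap() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("KEY.reducesinkkey0", "Column[_col0]");
        map.put("KEY.reducesinkkey1", "Column[_col0]");
        return map;
    }

    // Mirrors the row schema signature from the dump, which has a single column.
    static Set<String> rs13SchemaColumns() {
        return Collections.singleton("KEY.reducesinkkey0");
    }
}
```

Calling {{keysMissingFromSchema(rs13ColExprMap(), rs13SchemaColumns())}} reports {{KEY.reducesinkkey1}}, the key whose schema lookup would come back empty.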


