PENG Zhengshuai created KYLIN-4083:
--------------------------------------

             Summary: Fact Distinct Column Step, UHC column may lose value 
when the hashcode of value is Integer.MIN_VALUE
                 Key: KYLIN-4083
                 URL: https://issues.apache.org/jira/browse/KYLIN-4083
             Project: Kylin
          Issue Type: Bug
            Reporter: PENG Zhengshuai
            Assignee: PENG Zhengshuai


In the Fact Distinct Column step, Kylin uses MapReduce (MR) to reduce the 
column values. If a column is a UHC (ultra high cardinality) column and the 
property *kylin.engine.mr.uhc-reducer-count* is set to a value greater than 
*1*, the Mapper task distributes the UHC column values across several reducers 
through *FactDistinctColumnPartitioner*.

The reducer id is calculated from the hash of the value. The implementation is 
in *FactDistinctColumnsReducerMapping#getReducerIdForCol*, which computes 
*reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount*.
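
As a rough illustration of the formula above, here is a minimal Java sketch. The class name, the method name and the sample numbers (reducerBeginIndex = 5, uhcReducerCount = 3) are hypothetical and only stand in for the real Kylin code, which is not reproduced here.

```java
// Hypothetical stand-in for the formula quoted in this issue;
// this is not the actual Kylin source.
class ReducerIdFormulaSketch {

    // reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount
    static int reducerId(int reducerBeginIndex, int uhcReducerCount, String value) {
        return reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount;
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97 and 97 % 3 is 1, so the id is 5 + 1.
        System.out.println(reducerId(5, 3, "a")); // prints 6
    }
}
```

For ordinary hash codes the result stays within reducerBeginIndex .. reducerBeginIndex + uhcReducerCount - 1, which is the range reserved for the UHC column.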

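When value.hashCode() is Integer.MIN_VALUE, Math.abs() returns 
Integer.MIN_VALUE itself, because the positive counterpart does not fit in an 
int. The remainder is then zero or negative, so the computed reducer id can be 
smaller than reducerBeginIndex or even negative. This may cause the Fact 
Distinct Column step to fail, or the UHC column value may be sent to a reducer 
that does not belong to the UHC column.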
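
The overflow can be reproduced in isolation with a small Java sketch. The class and method names below are hypothetical, the sample indexes are made up, and the Math.floorMod guard is only one possible way to keep the id in range; it is an assumption for illustration, not necessarily the fix that will be committed for this issue.

```java
// Hypothetical reproduction of the problem described in this issue;
// this is not the actual Kylin source.
class MinValueHashSketch {

    // Formula as quoted in this issue, taking the hash code directly.
    static int quotedReducerId(int reducerBeginIndex, int uhcReducerCount, int hash) {
        return reducerBeginIndex + Math.abs(hash) % uhcReducerCount;
    }

    // One possible guard (an assumption): Math.floorMod returns a result in
    // [0, uhcReducerCount) for any int hash when uhcReducerCount is positive.
    static int guardedReducerId(int reducerBeginIndex, int uhcReducerCount, int hash) {
        return reducerBeginIndex + Math.floorMod(hash, uhcReducerCount);
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;

        // Math.abs cannot represent 2147483648 as an int and returns the input.
        System.out.println(Math.abs(hash) == Integer.MIN_VALUE); // prints true

        // -2147483648 % 3 is -2, so the id falls below reducerBeginIndex.
        System.out.println(quotedReducerId(5, 3, hash)); // prints 3
        System.out.println(quotedReducerId(1, 3, hash)); // prints -1

        // With the guard the id stays inside the UHC reducer range.
        System.out.println(guardedReducerId(5, 3, hash)); // prints 6
    }
}
```

Note that Math.floorMod maps other negative hash codes to different reducers than Math.abs does, so the same computation would have to be used everywhere the reducer id is derived.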



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
