Zhihua Deng created HIVE-24575:
----------------------------------

             Summary: VectorGroupByOperator reusing keys can lead to wrong results
                 Key: HIVE-24575
                 URL: https://issues.apache.org/jira/browse/HIVE-24575
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
            Reporter: Zhihua Deng
            Assignee: Zhihua Deng


A common SQL query such as
{code:sql}
select category as category, count(distinct maskdid) as uv
from dwd_internal_inc_d
group by category
{code}
can return wrong results on trunk: the values of the category column can be
mixed up, and the distinct count of maskdid is also wrong.
After some debugging, we found that the problem is caused by a wrong
byteStarts[i] being used when copying the current keys to the reusable keys:
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]
byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0). So when
clone.byteValues[i].length >= byteValues[i].length holds, the copy reads
byteLengths[i] bytes of the current key starting at offset 0 rather than at
the real start index, and the reusable key ends up with the wrong bytes.
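
The effect of the zeroed offset can be illustrated with a minimal, standalone sketch. This is not the Hive code: the class KeyCopySketch, the helper copyKey and the sample buffer are hypothetical, and the sketch only assumes that a key is a (start, length) slice of a larger shared byte buffer.

```java
import java.nio.charset.StandardCharsets;

public class KeyCopySketch {

    // Hypothetical helper (not part of Hive): copies `length` bytes of
    // `source`, beginning at `start`, into the front of the `reusable`
    // buffer and returns the copied bytes as a string.
    static String copyKey(byte[] source, int start, int length, byte[] reusable) {
        System.arraycopy(source, start, reusable, 0, length);
        return new String(reusable, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: two keys share one buffer; the current key is "games",
        // which starts at offset 5 and is 5 bytes long.
        byte[] sharedBuffer = "booksgames".getBytes(StandardCharsets.UTF_8);
        int realStart = 5;
        int length = 5;
        byte[] reusable = new byte[16];

        // Offset reset to 0: the copy picks up the neighbouring key's bytes.
        System.out.println(copyKey(sharedBuffer, 0, length, reusable));

        // Real start offset: the copy picks up the current key.
        System.out.println(copyKey(sharedBuffer, realStart, length, reusable));
    }
}
```

With the offset forced to 0, the first call yields "books" even though the current key is "games"; this is the kind of mix-up seen in the category column.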
 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
