Zhihua Deng created HIVE-24575:
----------------------------------

             Summary: VectorGroupByOperator reusing keys can lead to wrong results
                 Key: HIVE-24575
                 URL: https://issues.apache.org/jira/browse/HIVE-24575
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
            Reporter: Zhihua Deng
            Assignee: Zhihua Deng

A common SQL query such as
{code:java}
select category as category, count(distinct maskdid) as uv from dwd_internal_inc_d group by category
{code}
can return wrong results on trunk: the values in the category column get mixed up, and the count(distinct maskdid) aggregate is also wrong.

After some debugging, we found that the problem is caused by a wrong byteStarts[i] being used when copying the current keys to the reusable keys:

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]

byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0). So when the condition clone.byteValues[i].length >= byteValues[i].length is met, the copy of the current keys into the reusable keys reads from offset 0 up to the key length, rather than from the key's real start index, which causes the problem.
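
The effect of copying from offset 0 instead of the real start index can be reproduced in isolation. The following is a minimal sketch, not the Hive code: the class KeyCopySketch, its two methods, and the sample data are hypothetical, and it assumes that two keys ("books" and "music") share one backing byte array, so the second key's real start index is 5.

```java
import java.nio.charset.StandardCharsets;

public class KeyCopySketch {

    // Mimics the reported behaviour: the start offset is ignored (treated as 0),
    // so the copy always reads the first `length` bytes of the backing array.
    static String copyFromZero(byte[] batchBytes, int start, int length, byte[] reusable) {
        System.arraycopy(batchBytes, 0, reusable, 0, length);
        return new String(reusable, 0, length, StandardCharsets.UTF_8);
    }

    // Reads from the key's real start index, which is what the copy should do.
    static String copyFromStart(byte[] batchBytes, int start, int length, byte[] reusable) {
        System.arraycopy(batchBytes, start, reusable, 0, length);
        return new String(reusable, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical shared buffer holding two keys back to back.
        byte[] batchBytes = "booksmusic".getBytes(StandardCharsets.UTF_8);
        int start = 5;   // real start index of the key "music"
        int length = 5;  // length of the key "music"

        // The reusable buffer is large enough, so it is reused rather than reallocated.
        byte[] reusable = new byte[16];

        System.out.println(copyFromZero(batchBytes, start, length, reusable));  // prints "books"
        System.out.println(copyFromStart(batchBytes, start, length, reusable)); // prints "music"
    }
}
```

In this sketch the buggy copy stores "books" under what should be the key "music", which is the kind of key mix-up that would confuse both the group-by column and any distinct aggregate keyed on it.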