Gopal V created HIVE-21531:
------------------------------

             Summary: Vectorization: all NULL hashcodes are not computed using Murmur3
                 Key: HIVE-21531
                 URL: https://issues.apache.org/jira/browse/HIVE-21531
             Project: Hive
          Issue Type: Bug
            Reporter: Gopal V


The comments in Vectorized hash computation call out the MurmurHash 
implementation (the one using 0x5bd1e995), while the non-vectorized codepath 
calls out the Murmur3 one (using 0xcc9e2d51).

The comment here is wrong, since the code below it actually calls Murmur3.hash32:

{code}
  /**
   * Batch compute the hash codes for all the serialized keys.
   *
   * NOTE: MAJOR MAJOR ASSUMPTION:
   *     We assume that HashCodeUtil.murmurHash produces the same result
   *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
   *     UNIFORM distribution).
   */
  protected void computeSerializedHashCodes() {
    int offset = 0;
    int keyLength;
    byte[] bytes = output.getData();
    for (int i = 0; i < nonNullKeyCount; i++) {
      keyLength = serializedKeyLengths[i];
      hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
      offset += keyLength;
    }
  }
{code}

but the Vector RS operator follows the wrong comment when it hashes the all-NULL key:

{code}
      System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
      nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
{code}
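
To illustrate why the mismatch matters, here is a minimal standalone sketch, not Hive's code, that hashes the same bytes with a textbook MurmurHash2 (multiplier 0x5bd1e995) and a textbook 32-bit Murmur3 (0xcc9e2d51), both with seed 0. The class name HashMismatchSketch, the method names, and the sample key are hypothetical; the sketch assumes HashCodeUtil and Murmur3.hash32 behave like these two reference algorithms, which is not verified here.

```java
import java.nio.charset.StandardCharsets;

final class HashMismatchSketch {

    // Reads four bytes starting at index i as a little-endian int.
    static int le32(byte[] d, int i) {
        return (d[i] & 0xff) | ((d[i + 1] & 0xff) << 8)
             | ((d[i + 2] & 0xff) << 16) | ((d[i + 3] & 0xff) << 24);
    }

    // Textbook MurmurHash2, 32-bit, seed fixed at 0 (multiplier 0x5bd1e995).
    static int murmur2(byte[] data, int offset, int length) {
        final int m = 0x5bd1e995;
        int h = length;                       // seed (0) XOR length
        int end = offset + (length & ~3);     // end of the whole 4-byte blocks
        for (int i = offset; i < end; i += 4) {
            int k = le32(data, i);
            k *= m;
            k ^= k >>> 24;
            k *= m;
            h *= m;
            h ^= k;
        }
        switch (length & 3) {                 // remaining 1..3 tail bytes
            case 3: h ^= (data[end + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[end + 1] & 0xff) << 8;  // fall through
            case 1: h ^= (data[end] & 0xff);
                    h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Textbook MurmurHash3 x86_32 (constants 0xcc9e2d51 and 0x1b873593).
    static int murmur3(byte[] data, int offset, int length, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int end = offset + (length & ~3);
        for (int i = offset; i < end; i += 4) {
            int k = le32(data, i);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        switch (length & 3) {                 // remaining 1..3 tail bytes
            case 3: k ^= (data[end + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[end + 1] & 0xff) << 8;  // fall through
            case 1: k ^= (data[end] & 0xff);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }
        h ^= length;                          // finalization mix
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the serialized all-NULL key bytes.
        byte[] key = "null-key".getBytes(StandardCharsets.UTF_8);
        int h2 = murmur2(key, 0, key.length);
        int h3 = murmur3(key, 0, key.length, 0);
        System.out.printf("murmur2=%08x murmur3=%08x same=%b%n", h2, h3, h2 == h3);
    }
}
```

If the two paths disagree like this, rows whose keys are all NULL would get a different hash code in the vectorized operator than in the non-vectorized one.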



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)