LuciferYang commented on code in PR #48248:
URL: https://github.com/apache/spark/pull/48248#discussion_r1777961544


##########
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala:
##########
@@ -266,7 +266,7 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: 
ClassTag](
   /**
    * Re-hash a value to deal better with hash functions that don't differ in 
the lower bits.
    */
-  private def hashcode(h: Int): Int = Hashing.murmur3_32().hashInt(h).asInt()
+  private def hashcode(h: Int): Int = 
Hashing.murmur3_32_fixed().hashInt(h).asInt()

Review Comment:
   @pan3793 
   
   I think this change is safe for Spark. The difference between `MURMUR3_32` 
and `MURMUR3_32_FIXED` lies in the different `supplementaryPlaneFix` parameters 
passed when constructing the `Murmur3_32HashFunction`:
   
   
https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L56-L59
   
   ```java
     static final HashFunction MURMUR3_32 =
         new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ false);
     static final HashFunction MURMUR3_32_FIXED =
         new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ true);
   ```
   
   However, the `supplementaryPlaneFix` parameter is only used in 
`Murmur3_32HashFunction#hashString`, and Spark only utilizes 
`Murmur3_32HashFunction#hashInt`. Therefore, there will be no logical changes 
to this method after this change.
   
   
https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L108-L114
   
   ```java
     @Override
     public HashCode hashInt(int input) {
       int k1 = mixK1(input);
       int h1 = mixH1(seed, k1);
   
       return fmix(h1, Ints.BYTES);
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to