LuciferYang commented on code in PR #48248:
URL: https://github.com/apache/spark/pull/48248#discussion_r1776295926
##########
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala:
##########
@@ -266,7 +266,7 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
   /**
    * Re-hash a value to deal better with hash functions that don't differ in the lower bits.
    */
-  private def hashcode(h: Int): Int = Hashing.murmur3_32().hashInt(h).asInt()
+  private def hashcode(h: Int): Int = Hashing.murmur3_32_fixed().hashInt(h).asInt()
Review Comment:
https://github.com/google/guava/blob/3c7c173e9c6ac93f154bfe40876f0c792d849f6e/guava/src/com/google/common/hash/Hashing.java#L117-L133
```java
  /**
   * Returns a hash function implementing the <a
   * href="https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp">32-bit murmur3
   * algorithm, x86 variant</a> (little-endian variant), using the given seed value, <b>with a known
   * bug</b> as described in the deprecation text.
   *
   * <p>The C++ equivalent is the MurmurHash3_x86_32 function (Murmur3A), which however does not
   * have the bug.
   *
   * @deprecated This implementation produces incorrect hash values from the {@link
   *     HashFunction#hashString} method if the string contains non-BMP characters. Use {@link
   *     #murmur3_32_fixed()} instead.
   */
  @Deprecated
  public static HashFunction murmur3_32() {
    return Murmur3_32HashFunction.MURMUR3_32;
  }
```
Yes, but `murmur3_32_fixed()` is the official replacement Guava provides for the
deprecated `murmur3_32()`, and there seems to be no other equivalent alternative.
@pan3793 any better suggestions?
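
For reference, a minimal sketch of why the swap should not change `OpenHashSet` behavior. The deprecation text ties the bug to `hashString` on non-BMP characters, while `hashcode(h: Int)` only goes through `hashInt`, which involves no string handling. The class below is hypothetical (`Murmur3IntSketch` is not part of Guava or Spark); it assumes seed 0 and that `hashInt` follows the reference MurmurHash3_x86_32 steps for a single 4-byte block, so treat it as an illustration rather than the library's code.

```java
public class Murmur3IntSketch {
    // Constants from the reference MurmurHash3_x86_32 algorithm.
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    /** Murmur3 x86_32 of a single 4-byte block, using the given seed. */
    static int hashInt(int input, int seed) {
        // Mix the one and only block.
        int k1 = input * C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;

        // Fold the block into the running hash.
        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        // Finalization: mix in the input length (4 bytes), then avalanche.
        h1 ^= 4;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        // Inputs that differ only in the upper bits, the case the
        // OpenHashSet comment on re-hashing is concerned with.
        for (int h : new int[] {0, 1, 2, 1 << 16, 2 << 16}) {
            System.out.printf("%d -> %d%n", h, hashInt(h, 0));
        }
    }
}
```

Nothing in this path touches character data, so under the assumptions above the result for a given `Int` would be the same whichever of the two Guava functions is used. An actual check against Guava would compare `Hashing.murmur3_32().hashInt(h)` with `Hashing.murmur3_32_fixed().hashInt(h)` directly.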