Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

via GitHub Wed, 07 Feb 2024 16:22:09 -0800


revans2 commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482265980



##########
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##########
@@ -269,4 +269,42 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
       assert(pos1 == pos2)
     }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {
+    // Therefore, 0.0 and -0.0 should get separate entries in the hash set.
+    //
+    // Exactly these elements provided in roughly this order will trigger the 
following scenario:
+    // When probing the bitset in `getPos(-0.0)`, the loop will happen upon 
the entry for 0.0.
+    // In the old logic pre-SPARK-45599, the loop will find that the bit is 
set and, because
+    // -0.0 == 0.0, it will think that's the position of -0.0. But in reality 
this is the position
+    // of 0.0. So -0.0 and 0.0 will be stored at different positions, but 
`getPos()` will return
+    // the same position for them. This can cause users of OpenHashSet, like 
OpenHashMap, to
+    // return the wrong value for a key based on whether or not this bitset 
lookup collision
+    // happens.

Review Comment:
   I found it because I was essentially running fuzz testing comparing 
https://github.com/NVIDIA/spark-rapids to the gold standard Apache Spark. Most 
failures that we find end up being issues with the RAPIDs Accelerator for 
Apache Spark, but every so often I find something that ends up being wrong with 
Spark, like in this case.  So yes it was just pure luck that we found it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

Reply via email to