Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-27 Thread via GitHub


nchammas commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1967915028

   Backport to 3.5: #45296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-27 Thread via GitHub


nchammas commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1967413553

   @dongjoon-hyun - I think this warning category was added in Scala 2.13 and 
then backported to Scala 2.12. Trying to confirm right now over on 
https://github.com/scala/scala/pull/8120 and 
https://github.com/scala/scala/pull/10446.
   
   I see that Spark 3.5 supports both Scala 2.12 and 2.13, so I will figure out 
some method that is compatible across both versions and open a PR against 
`branch-3.5`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-27 Thread via GitHub


dongjoon-hyun commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1967045615

   Sorry but I reverted this from `branch-3.5` to unblock the community PRs.
   - 
https://github.com/apache/spark/commit/cbf25fb633f4bf2f83a6f6e39aafaa80bf47e160
   
   Please make a backporting PR to branch-3.5 again after fixing the above 
annotation compilation. Thank you in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-27 Thread via GitHub


dongjoon-hyun commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1967032385

   Hi, @cloud-fan and @nchammas .
   
   This broke Apache Spark branch-3.5.
   
   ![Screenshot 2024-02-27 at 08 38 
03](https://github.com/apache/spark/assets/9700541/55560e72-aa6a-4aeb-8f23-66cb6ffd327c)
   
   ```
   [error] 
/home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala:136:4:
 Invalid message filter:
   [error] Unknown category: `other-non-cooperative-equals`
   [error]   @annotation.nowarn("cat=other-non-cooperative-equals")
   [error]^
   [error] one error found
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-26 Thread via GitHub


cloud-fan closed pull request #45036: [SPARK-45599][CORE] Use object equality 
in OpenHashSet
URL: https://github.com/apache/spark/pull/45036


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-26 Thread via GitHub


cloud-fan commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1963523699

   thanks, merging to master/3.5!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-22 Thread via GitHub


nchammas commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1959816906

   Is there anything more we're waiting on here? (Perhaps just an extra 
confirmation from @revans2 that the original [fuzz test he ran][1] now passes?)
   
   [1]: https://github.com/apache/spark/pull/45036#discussion_r1482265980


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


peter-toth commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1947853367

   > `SQLOpenHashSet` also handles null differently. Not sure if `OpenHashSet` 
already covers it.
   
   Yes, you are right. Probably the `NaN` special handling can be remvoved 
though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


cloud-fan commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1947841579

   `SQLOpenHashSet` also handles null differently. Not sure if `OpenHashSet` 
already covers it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


peter-toth commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1947835261

   If we go this direction and change `OpenHashSet` do we still need 
`SQLOpenHashSet`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


nchammas commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1947785597

   @revans2 - Just to confirm, have you rerun your tests against this branch 
(the ones that exposed this bug) and do they pass now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1491975003


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala:
##
@@ -249,4 +249,34 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers 
{
 map(null) = null
 assert(map.get(null) === Some(null))
   }
+
+  test("SPARK-45599: 0.0 and -0.0 should count distinctly") {
+// Exactly these elements provided in roughly this order trigger a 
condition where lookups of
+// 0.0 and -0.0 in the bitset happen to collide, causing their counts to 
be merged incorrectly
+// and inconsistently if `==` is used to check for key equality.

Review Comment:
   I tweaked the test name. Is that what you had in mind?
   
   This comment explains why we need exactly the following elements to trigger 
the 0.0/-0.0 miscount. It doesn't always happen (which is part of what kept 
this bug hidden for so long).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


cloud-fan commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1491931994


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala:
##
@@ -249,4 +249,34 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers 
{
 map(null) = null
 assert(map.get(null) === Some(null))
   }
+
+  test("SPARK-45599: 0.0 and -0.0 should count distinctly") {
+// Exactly these elements provided in roughly this order trigger a 
condition where lookups of
+// 0.0 and -0.0 in the bitset happen to collide, causing their counts to 
be merged incorrectly
+// and inconsistently if `==` is used to check for key equality.

Review Comment:
   shall we mention the NaN behavior as well? All NaN values are all the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1491176098


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala:
##
@@ -249,4 +249,32 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers 
{
 map(null) = null
 assert(map.get(null) === Some(null))
   }
+
+  test("SPARK-45599: 0.0 and -0.0 should count distinctly") {
+// Exactly these elements provided in roughly this order trigger a 
condition where lookups of
+// 0.0 and -0.0 in the bitset happen to collide, causing their counts to 
be merged incorrectly
+// and inconsistently if `==` is used to check for key equality.
+val spark45599Repro = Seq(
+  Double.NaN,
+  2.0,
+  168.0,
+  Double.NaN,
+  Double.NaN,
+  -0.0,
+  153.0,
+  0.0
+)
+
+val map1 = new OpenHashMap[Double, Int]()
+spark45599Repro.foreach(map1.changeValue(_, 1, {_ + 1}))
+assert(map1(0.0) == 1)
+assert(map1(-0.0) == 1)
+
+val map2 = new OpenHashMap[Double, Int]()
+// Simply changing the order in which the elements are added to the map 
should not change the
+// counts for 0.0 and -0.0.
+spark45599Repro.reverse.foreach(map2.changeValue(_, 1, {_ + 1}))
+assert(map2(0.0) == 1)
+assert(map2(-0.0) == 1)

Review Comment:
   Added just below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1491158277


##
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala:
##
@@ -110,6 +110,18 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) 
T: ClassTag](
 this
   }
 
+  /**
+   * Check if a key exists at the provided position using object equality 
rather than
+   * cooperative equality. Otherwise, hash sets will mishandle values for 
which `==`
+   * and `equals` return different results, like 0.0/-0.0 and NaN/NaN.

Review Comment:
   Yes, the differences are subtle:
   
   ```scala
   scala> 0.0 == -0.0
   val res0: Boolean = true
   
   scala> 0.0 equals -0.0
   val res1: Boolean = false
   
   scala> Double.NaN == Double.NaN
   val res2: Boolean = false
   
   scala> Double.NaN equals Double.NaN
   val res3: Boolean = true
   ```
   
   There is a long discussion on the Scala forums from 2017 about this 
difference and some of the problems it causes:
   
   [Can we get rid of cooperative 
equality?](https://contributors.scala-lang.org/t/can-we-get-rid-of-cooperative-equality/1131)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1491125038


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,43 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {
+// Therefore, 0.0 and -0.0 should get separate entries in the hash set.
+//
+// Exactly these elements provided in roughly this order will trigger the 
following scenario:
+// When probing the bitset in `getPos(-0.0)`, the loop will happen upon 
the entry for 0.0.
+// In the old logic pre-SPARK-45599, the loop will find that the bit is 
set and, because
+// -0.0 == 0.0, it will think that's the position of -0.0. But in reality 
this is the position
+// of 0.0. So -0.0 and 0.0 will be stored at different positions, but 
`getPos()` will return
+// the same position for them. This can cause users of OpenHashSet, like 
OpenHashMap, to
+// return the wrong value for a key based on whether or not this bitset 
lookup collision
+// happens.
+val spark45599Repro = Seq(
+  Double.NaN,
+  2.0,
+  168.0,
+  Double.NaN,
+  Double.NaN,
+  -0.0,
+  153.0,
+  0.0
+)
+val set = new OpenHashSet[Double]()
+spark45599Repro.foreach(set.add)
+assert(set.size == 6)
+val zeroPos = set.getPos(0.0)
+val negZeroPos = set.getPos(-0.0)
+assert(zeroPos != negZeroPos)
+  }
+
+  test("SPARK-45599: NaN and NaN are the same but not equal") {
+// Any mathematical comparison to NaN will return false, but when we place 
it in
+// a hash set we want the lookup to work like a "normal" value.
+val set = new OpenHashSet[Double]()
+set.add(Double.NaN)
+set.add(Double.NaN)
+assert(set.contains(Double.NaN))
+assert(set.size == 1)

Review Comment:
   Yes sir. On `master`, this is the actual behavior of `OpenHashSet`:
   
   ```scala
   // ...OpenHashSet$mcD$sp@21b327e6 did not contain NaN
   assert(set.contains(Double.NaN))
   
   // ...OpenHashSet$mcD$sp@1f09db1e had size 2 instead of expected size 1
   assert(set.size == 1)
   ```
   
   Every NaN will get its own entry in `OpenHashSet` on `master`. So if we add 
1,000,000 NaNs to the set, NaN will have 1,000,000 entries in there. And 
`.contains()` will _still_ return `false`. :D



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


cloud-fan commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1490973900


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,43 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {
+// Therefore, 0.0 and -0.0 should get separate entries in the hash set.
+//
+// Exactly these elements provided in roughly this order will trigger the 
following scenario:
+// When probing the bitset in `getPos(-0.0)`, the loop will happen upon 
the entry for 0.0.
+// In the old logic pre-SPARK-45599, the loop will find that the bit is 
set and, because
+// -0.0 == 0.0, it will think that's the position of -0.0. But in reality 
this is the position
+// of 0.0. So -0.0 and 0.0 will be stored at different positions, but 
`getPos()` will return
+// the same position for them. This can cause users of OpenHashSet, like 
OpenHashMap, to
+// return the wrong value for a key based on whether or not this bitset 
lookup collision
+// happens.
+val spark45599Repro = Seq(
+  Double.NaN,
+  2.0,
+  168.0,
+  Double.NaN,
+  Double.NaN,
+  -0.0,
+  153.0,
+  0.0
+)
+val set = new OpenHashSet[Double]()
+spark45599Repro.foreach(set.add)
+assert(set.size == 6)
+val zeroPos = set.getPos(0.0)
+val negZeroPos = set.getPos(-0.0)
+assert(zeroPos != negZeroPos)
+  }
+
+  test("SPARK-45599: NaN and NaN are the same but not equal") {
+// Any mathematical comparison to NaN will return false, but when we place 
it in
+// a hash set we want the lookup to work like a "normal" value.
+val set = new OpenHashSet[Double]()
+set.add(Double.NaN)
+set.add(Double.NaN)
+assert(set.contains(Double.NaN))
+assert(set.size == 1)

Review Comment:
   do we actually waste space in `OpenHashSet` to store all the NaN values?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


cloud-fan commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1490972974


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala:
##
@@ -249,4 +249,32 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers 
{
 map(null) = null
 assert(map.get(null) === Some(null))
   }
+
+  test("SPARK-45599: 0.0 and -0.0 should count distinctly") {
+// Exactly these elements provided in roughly this order trigger a 
condition where lookups of
+// 0.0 and -0.0 in the bitset happen to collide, causing their counts to 
be merged incorrectly
+// and inconsistently if `==` is used to check for key equality.
+val spark45599Repro = Seq(
+  Double.NaN,
+  2.0,
+  168.0,
+  Double.NaN,
+  Double.NaN,
+  -0.0,
+  153.0,
+  0.0
+)
+
+val map1 = new OpenHashMap[Double, Int]()
+spark45599Repro.foreach(map1.changeValue(_, 1, {_ + 1}))
+assert(map1(0.0) == 1)
+assert(map1(-0.0) == 1)
+
+val map2 = new OpenHashMap[Double, Int]()
+// Simply changing the order in which the elements are added to the map 
should not change the
+// counts for 0.0 and -0.0.
+spark45599Repro.reverse.foreach(map2.changeValue(_, 1, {_ + 1}))
+assert(map2(0.0) == 1)
+assert(map2(-0.0) == 1)

Review Comment:
   shall we test NaN  as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-15 Thread via GitHub


cloud-fan commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1490967547


##
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala:
##
@@ -110,6 +110,18 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) 
T: ClassTag](
 this
   }
 
+  /**
+   * Check if a key exists at the provided position using object equality 
rather than
+   * cooperative equality. Otherwise, hash sets will mishandle values for 
which `==`
+   * and `equals` return different results, like 0.0/-0.0 and NaN/NaN.

Review Comment:
   I was told that in scala `==` is the same as `equals`, but `eq` is a 
different operator. I need to refresh my knowledge now :)



##
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala:
##
@@ -110,6 +110,18 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) 
T: ClassTag](
 this
   }
 
+  /**
+   * Check if a key exists at the provided position using object equality 
rather than
+   * cooperative equality. Otherwise, hash sets will mishandle values for 
which `==`
+   * and `equals` return different results, like 0.0/-0.0 and NaN/NaN.
+   *
+   * See: https://issues.apache.org/jira/browse/SPARK-45599
+   */
+  @annotation.nowarn("cat=other-non-cooperative-equals")
+  private def keyExistsAtPos(k: T, pos: Int) =
+// _data(pos) == k

Review Comment:
   we should remove it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-12 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1486895251


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   > Spark's `OpenHashSet` does not have to match `java.util.HashSet`. What 
matters is the SQL semantic.
   
   Whether or not `OpenHashSet` matches `java.util.HashSet`, I want to 
emphasize for the record that `OpenHashSet` mishandles 0.0/-0.0 and NaN. Its 
behavior is simply _incorrect_. [These][test1] [tests][test2] fail on `master` 
in ways that can only be described as bugs, regardless of whatever SQL 
semantics we want to preserve.
   
   [This comment][why] explains the root cause. Basically, it is a mistake to 
combine hash code-based lookups with cooperative equality, at least in the way 
we are doing it in `OpenHashSet`.
   
   [test1]: 
https://github.com/apache/spark/pull/45036/files#diff-09400ec633b1f1322c5f7b39dc4e87073b0b0435b60b9cff93388053be5083b6R274-R278
   [test2]: 
https://github.com/apache/spark/pull/45036/files#diff-894198a5fea34e5b7f07d0a4641eb09995315d5de3e0fded3743c15a3c8af405R308-R309
   [why]: 
https://github.com/apache/spark/pull/45036/files#diff-894198a5fea34e5b7f07d0a4641eb09995315d5de3e0fded3743c15a3c8af405R277-R283
   
   But I understand what you are saying. Fixing bugs in `OpenHashSet` doesn't 
help us if it also breaks users' SQL.
   
   > Can you highlight which functions/operators are using this `OpenHashSet` 
   
   I've updated the PR description with a summary of what uses `OpenHashSet`.
   
   As a side note, I believe that if we accept the change proposed here, we 
should be able to eliminate `SQLOpenHashSet`. `SQLOpenHashSet` was created 
specifically to work around the bugs in `OpenHashSet` that we are addressing in 
this PR. See #33955 and #33993.
   
   > and what is the impact of this change to the SQL semantic?
   
   I've updated the PR description with a diff of what tests pass or fail on 
`master` vs. this branch. Please take a look and let me know if you think we 
need any more tests. I know we are touching a sensitive code path and I 
appreciate the need for caution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


cloud-fan commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482324222


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   Spark's `OpenHashSet` does not have to match `java.util.HashSet`. What 
matters is the SQL semantic. Can you highlight which functions/operators are 
using this `OpenHashSet` and what is the impact of this change to the SQL 
semantic?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


revans2 commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482265980


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,42 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {
+// Therefore, 0.0 and -0.0 should get separate entries in the hash set.
+//
+// Exactly these elements provided in roughly this order will trigger the 
following scenario:
+// When probing the bitset in `getPos(-0.0)`, the loop will happen upon 
the entry for 0.0.
+// In the old logic pre-SPARK-45599, the loop will find that the bit is 
set and, because
+// -0.0 == 0.0, it will think that's the position of -0.0. But in reality 
this is the position
+// of 0.0. So -0.0 and 0.0 will be stored at different positions, but 
`getPos()` will return
+// the same position for them. This can cause users of OpenHashSet, like 
OpenHashMap, to
+// return the wrong value for a key based on whether or not this bitset 
lookup collision
+// happens.

Review Comment:
   I found it because I was essentially running fuzz testing comparing 
https://github.com/NVIDIA/spark-rapids to the gold standard Apache Spark. Most 
failures that we find end up being issues with the RAPIDs Accelerator for 
Apache Spark, but every so often I find something that ends up being wrong with 
Spark, like in this case.  So yes it was just pure luck that we found it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482237064


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala:
##
@@ -249,4 +249,34 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers 
{
 map(null) = null
 assert(map.get(null) === Some(null))
   }
+
+  test("SPARK-45599: 0.0 and -0.0 should count distinctly") {
+// Exactly these elements provided in roughly this order trigger a 
condition where lookups of
+// 0.0 and -0.0 in the bitset happen to collide, causing their counts to 
be merged incorrectly
+// and inconsistently if `==` is used to check for key equality.
+val spark45599Repro = Seq(
+  Double.NaN,
+  2.0,
+  168.0,
+  Double.NaN,
+  Double.NaN,
+  -0.0,
+  153.0,
+  0.0
+)
+
+val map1 = new OpenHashMap[Double, Int]()
+spark45599Repro.foreach(map1.changeValue(_, 1, {_ + 1}))
+map1.iterator.foreach(println)
+assert(map1(0.0) == 1)
+assert(map1(-0.0) == 1)
+
+val map2 = new OpenHashMap[Double, Int]()
+// Simply changing the order in which the elements are added to the map 
should not change the
+// counts for 0.0 and -0.0.
+spark45599Repro.reverse.foreach(map2.changeValue(_, 1, {_ + 1}))
+map2.iterator.foreach(println)
+assert(map2(0.0) == 1)
+assert(map2(-0.0) == 1)

Review Comment:
   This is another expression of the same bug that this PR addresses. If you 
run this test on `master`, you will see that the counts for 0.0 and -0.0 depend 
on the order in which the elements from `spark45599Repro` are added to the map.



##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,42 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {
+// Therefore, 0.0 and -0.0 should get separate entries in the hash set.
+//
+// Exactly these elements provided in roughly this order will trigger the 
following scenario:
+// When probing the bitset in `getPos(-0.0)`, the loop will happen upon 
the entry for 0.0.
+// In the old logic pre-SPARK-45599, the loop will find that the bit is 
set and, because
+// -0.0 == 0.0, it will think that's the position of -0.0. But in reality 
this is the position
+// of 0.0. So -0.0 and 0.0 will be stored at different positions, but 
`getPos()` will return
+// the same position for them. This can cause users of OpenHashSet, like 
OpenHashMap, to
+// return the wrong value for a key based on whether or not this bitset 
lookup collision
+// happens.

Review Comment:
   It really is amazing that @revans2 found this bug, because it depends on the 
set being a specific size and the 0.0 and -0.0 being inserted and then looked 
up in just the right order so that they happen to collide in the bitset.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482123330


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   Consider another interesting case where `java.util.HashSet` and 
`OpenHashSet` differ:
   
   ```scala
   scala> val h = new HashSet[Double]()
   val h: java.util.HashSet[Double] = []
   
   scala> h.add(Double.NaN)
   val res9: Boolean = true
   
   scala> h.add(Double.NaN)
   val res10: Boolean = false
   
   scala> h.contains(Double.NaN)
   val res11: Boolean = true
   
   scala> h.size()
   val res12: Int = 1
   ```
   
   On `master`, `OpenHashSet` does something obviously wrong:
   
   ```scala
   val set = new OpenHashSet[Double]()
   set.add(Double.NaN)
   set.add(Double.NaN)
   set.size  // returns 2
   set.contains(Double.NaN)  // returns false
   ```
   
   This could possibly lead to a bug like the one reported in SPARK-45599 but 
in reverse, where a new NaN row is added rather than dropped. I will see if I 
can construct such a scenario as a demonstration. But regardless, I think this 
behavior is incorrect by itself.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482125925


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   Note also that the docstring for `OpenHashSet` seems to imply that it is 
meant to be a faster but semantically equivalent alternative to 
`java.util.HashSet`:
   
   
https://github.com/apache/spark/blob/0d5c2ce427e06adfebf47bc446ff6646513ae750/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala#L31-L32
   
   If that's true, then we should perhaps add property based tests to ensure 
alignment between the two implementations, but I'll leave that as a potential 
future improvement.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482123330


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   Consider another interesting case where `java.util.HashSet` and 
`OpenHashSet` differ:
   
   ```scala
   scala> val h = new HashSet[Double]()
   val h: java.util.HashSet[Double] = []
   
   scala> h.add(Double.NaN)
   val res9: Boolean = true
   
   scala> h.add(Double.NaN)
   val res10: Boolean = false
   
   scala> h.size()
   val res11: Int = 1
   ```
   
   On `master`, `OpenHashSet` does IMO the wrong thing:
   
   ```scala
   val set = new OpenHashSet[Double]()
   set.add(Double.NaN)
   set.add(Double.NaN)
   set.size  // returns 2
   ```
   
   This could possibly lead to a bug like the one reported in SPARK-45599 but 
in reverse, where a new NaN row is added rather than dropped. I will see if I 
can construct such a scenario as a demonstration. But regardless, I think this 
behavior is incorrect by itself.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482089432


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   >  In Spark, 0.0 == -0.0, and in GROUP BY, 0.0 and -0.0 are considered to be 
in the same group and normalized to 0.0.
   
   This PR does not change this behavior. I noticed, however, that we do not 
have any tests currently to check that -0.0 is normalized and grouped as you 
describe, so I went ahead and added such a test in 
2bfc60548db2c41f4c64b63d40a2652cb22732ab.
   
   Does this address your concern? Or are you suggesting that we should 
normalize -0.0 to 0.0 across the board?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482085088


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   > This is a bit tricky and it's better if we can find a reference system 
that defines this semantic.
   
   ```scala
   scala> import java.util.HashSet
   import java.util.HashSet
   
   scala> val h = new HashSet[Double]()
   val h: java.util.HashSet[Double] = []
   
   scala> h.add(0.0)
   val res0: Boolean = true
   
   scala> h.add(-0.0)
   val res1: Boolean = true
   
   scala> h.size()
   val res2: Int = 2
   ```
   
   The doc for [HashSet.add][1] states:
   
   > More formally, adds the specified element e to this set if this set 
contains no element e2 such that Objects.equals(e, e2). If this set already 
contains the element, the call leaves the set unchanged and returns false.
   
   In other words, `java.util.HashSet` uses `equals` and not `==`, and 
therefore it considers `0.0` and `-0.0` distinct elements.
   
   So this PR brings `OpenHashSet` more in line with the semantics of 
`java.util.HashSet`.
   
   [1]: 
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HashSet.html#add(E)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


peter-toth commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1932381997

   > I can hold cutting the tag of RC1 for this.
   
   Thanks @HeartSaVioR. BTW I don't think this should be a blocker of 3.5.1 as 
this is not a regression, but if we can find a quick fix for this issue it 
would be good to include it in the release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-07 Thread via GitHub


peter-toth commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1481726694


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   Yes, probably make sense if we just fix this particular issue as 
`OpenHashSet` is used at many other places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-06 Thread via GitHub


cloud-fan commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1480904118


##
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers 
{
   assert(pos1 == pos2)
 }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   This is a bit tricky and it's better if we can find a reference system that 
defines this semantic. In Spark, `0.0 == -0.0`, and in GROUP BY, 0.0 and -0.0 
are considered to be in the same group and normalized to 0.0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-06 Thread via GitHub


HeartSaVioR commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1931270883

   I have no expertise on this area so will leave the decision on merging the 
change to which version(s), and whether we want to safeguard or not, to 
introduce an escape-hatch on behavioral change. I can hold cutting the tag of 
RC1 for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-06 Thread via GitHub


peter-toth commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1930364725

   Yes, I was watching this ticket because it is a correctness issue.
   The fix looks good to me.
   
   cc @cloud-fan and @HeartSaVioR as it would be great to include the fix in 
3.5.1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-06 Thread via GitHub


nchammas commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1930043679

   Build looks good.
   
   @revans2 and @peter-toth - What do you think? (Tagging you since I noticed 
you are watching the linked JIRA ticket.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

2024-02-05 Thread via GitHub


nchammas commented on PR #45036:
URL: https://github.com/apache/spark/pull/45036#issuecomment-1928693126

   `core/testOnly org.apache.spark.util.collection.*` passes on my machine, but 
I haven't run the full test suite yet. I will monitor the full test results 
here on GitHub.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org