Re: [PR] [SPARK-54918][SQL] Normalize floating numbers in array set operations [spark]

via GitHub Tue, 26 May 2026 05:44:01 -0700


asugranyes commented on code in PR #53695:
URL: https://github.com/apache/spark/pull/53695#discussion_r3303700927



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala:
##########
@@ -45,8 +47,11 @@ import org.apache.spark.util.ArrayImplicits._
  * binary `UnsafeRow` and compare the binary data directly. Different NaNs have different binary
  * representation, and the same thing happens for -0.0 and 0.0.
  *
- * This rule normalizes NaN and -0.0 in window partition keys, join keys and aggregate grouping
- * keys.
+ * Case 5 is problematic for a similar reason: hash-based array set operations compare elements by
+ * their binary representation via hash sets.
+ *
+ * This rule normalizes NaN and -0.0 in window partition keys, join keys, aggregate grouping
+ * keys, and the inputs of array set operations.

Review Comment:
   Addressed the remaining doc nit and updated the docstring to reflect the two 
invocation points (FinishAnalysis + late optimizer batch).
   
   The FinishAnalysis placement is architecturally cleaner than maintaining 
local bypasses across the eager evaluation rules, and with both triggers in 
place the idempotence tests are more meaningful, because the rule can now run 
at both points.
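
For readers following the thread, here is a minimal sketch of why binary comparison needs normalization. It is not the code in this PR: `FloatNormalizationSketch`, `normalize`, `bits`, and `distinctByBits` are hypothetical names used only for illustration, and the sketch assumes the normalization maps every NaN to the canonical `Double.NaN` and `-0.0` to `0.0`, as the docstring describes.

```scala
import java.lang.{Double => JDouble}

// Illustration only; hypothetical helpers, not Spark's NormalizeFloatingNumbers rule.
object FloatNormalizationSketch {
  // Raw IEEE 754 bit pattern, i.e. what a binary comparison effectively sees.
  def bits(d: Double): Long = JDouble.doubleToRawLongBits(d)

  // Assumed normalization: every NaN becomes the canonical NaN, -0.0 becomes 0.0.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // -0.0 == 0.0 is true for primitive doubles
    else d

  // Counts how many distinct elements a bit-pattern-based set would keep.
  def distinctByBits(xs: Seq[Double]): Int = xs.map(bits).distinct.size

  def main(args: Array[String]): Unit = {
    val zeros = Seq(0.0d, -0.0d)
    // A NaN with a non-canonical payload; the payload value is arbitrary.
    val nans = Seq(Double.NaN, JDouble.longBitsToDouble(0x7ff8000000000001L))

    println(s"zeros by raw bits:        ${distinctByBits(zeros)}")
    println(s"zeros after normalize:    ${distinctByBits(zeros.map(normalize))}")
    println(s"NaNs by raw bits:         ${distinctByBits(nans)}")
    println(s"NaNs after normalize:     ${distinctByBits(nans.map(normalize))}")
  }
}
```

Because `0.0` and `-0.0` differ in the sign bit, a bit-pattern-based set treats them as two elements until both are normalized. The same reasoning applies to NaNs that carry different payloads.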



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


