asugranyes commented on code in PR #53695: URL: https://github.com/apache/spark/pull/53695#discussion_r3303700927
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala:
##########
@@ -45,8 +47,11 @@ import org.apache.spark.util.ArrayImplicits._
  * binary `UnsafeRow` and compare the binary data directly. Different NaNs have different binary
  * representation, and the same thing happens for -0.0 and 0.0.
  *
- * This rule normalizes NaN and -0.0 in window partition keys, join keys and aggregate grouping
- * keys.
+ * Case 5 is problematic for a similar reason: hash-based array set operations compare elements by
+ * their binary representation via hash sets.
+ *
+ * This rule normalizes NaN and -0.0 in window partition keys, join keys, aggregate grouping
+ * keys, and the inputs of array set operations.

Review Comment:
   Addressed the remaining doc nit and updated the docstring to reflect the two invocation points (FinishAnalysis + late optimizer batch). The FinishAnalysis placement ends up being much cleaner architecturally than maintaining local bypasses across eager evaluation rules, and the idempotence tests now become much more meaningful with the dual-trigger setup in place.
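
   For readers following the docstring change, here is a minimal standalone sketch of why comparing by binary representation goes wrong for -0.0 and 0.0, and why applying a normalization twice is harmless. It is illustrative only and is not the code in `NormalizeFloatingNumbers`: `NormalizeSketch`, `normalize`, and `bits` are hypothetical names, and the `Set[Long]` of raw bit patterns only stands in for a hash set keyed on binary data.

```scala
import java.lang.{Double => JDouble}

object NormalizeSketch {
  // Hypothetical helper (not Spark's implementation): map every NaN to the
  // canonical NaN and -0.0 to 0.0 so that equal values share one bit pattern.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // -0.0 == 0.0 is true, so -0.0 becomes 0.0
    else d

  // Raw IEEE 754 bit pattern, i.e. what a binary comparison would see.
  def bits(d: Double): Long = JDouble.doubleToRawLongBits(d)

  def main(args: Array[String]): Unit = {
    val zeros = Seq(0.0d, -0.0d)

    // Keyed on raw bits, the two zeros are distinct entries.
    val rawKeys = zeros.map(bits).toSet
    // After normalization they collapse into a single entry.
    val normalizedKeys = zeros.map(d => bits(normalize(d))).toSet

    println(s"distinct raw keys: ${rawKeys.size}")
    println(s"distinct normalized keys: ${normalizedKeys.size}")

    // Idempotence: normalizing an already normalized value changes nothing,
    // which is the property that matters when a rule can fire at two points.
    val once = normalize(-0.0d)
    println(s"idempotent: ${bits(normalize(once)) == bits(once)}")
  }
}
```

   The same reasoning applies to NaN: every NaN satisfies `isNaN`, but NaNs may carry different payload bits, so the sketch maps all of them to `Double.NaN` before any binary comparison.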
