asugranyes commented on PR #53695: URL: https://github.com/apache/spark/pull/53695#issuecomment-4532718404
> ### The redirect
>
> Add ~5 case clauses to `NormalizeFloatingNumbers`:
>
> ```scala
> case e: ArrayDistinct if needNormalize(e.child.dataType) =>
>   e.copy(child = normalize(e.child))
> case e: ArrayUnion if needNormalize(e.left.dataType) =>
>   e.copy(left = normalize(e.left), right = normalize(e.right))
> // + ArrayIntersect, ArrayExcept, ArraysOverlap
> ```
>
> For `ArrayType(DoubleType)`, `normalize` (`NormalizeFloatingNumbers.scala:149-153`) already wraps the input with `ArrayTransform(child, x => NormalizeNaNAndZero(x))`, i.e., produces a pre-normalized array. The four expressions then operate on already-normalized input with no changes.

Hi @cloud-fan, the redirect to NormalizeFloatingNumbers is in. The rule now normalizes array inputs for the five hash/set-like array expressions: array_distinct, array_union, array_intersect, array_except, and arrays_overlap.

Tests cover both layers:

* NormalizeFloatingPointNumbersSuite for the optimizer rewrite
* DataFrameFunctionsSuite for runtime semantics using Double.doubleToRawLongBits

CI is green.

One scope note worth flagging: the rule runs late in the optimizer, so expressions rewritten or eliminated by earlier optimizer rules bypass it. In particular:

* ConstantFolding folds all-literal expressions (e.g. SELECT array_distinct(array(-0.0D, 0.0D)))
* ConvertToLocalRelation collapses Projects over in-memory LocalRelations

Queries are covered as long as the expression reaches NormalizeFloatingNumbers in the optimized plan, as verified by the tests.
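For readers following the thread, here is a minimal standalone sketch of why pre-normalizing matters for bit-sensitive comparisons. It does not use Spark at all: `normalizeDouble` and `distinctByBits` are hypothetical helpers written for illustration, and deduplicating by raw bit pattern is only an assumed stand-in for how a hash/set-like expression might compare doubles, not the actual implementation in this PR.

```scala
// Standalone sketch, no Spark dependency. The helper names below are
// hypothetical and exist only for this illustration.
object NormalizeSketch {
  import java.lang.{Double => JDouble}

  // Map -0.0 to 0.0 and every NaN bit pattern to the canonical NaN.
  def normalizeDouble(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // the comparison is true for both 0.0 and -0.0
    else d

  // Deduplicate by raw bit pattern (assumed stand-in for a bit-sensitive set).
  def distinctByBits(xs: Seq[Double]): Seq[Double] =
    xs.map(d => JDouble.doubleToRawLongBits(d))
      .distinct
      .map(bits => JDouble.longBitsToDouble(bits))

  def main(args: Array[String]): Unit = {
    val input = Seq(-0.0d, 0.0d, 1.5d)
    // Without normalization, -0.0 and 0.0 have different raw bits.
    println(distinctByBits(input).size) // prints 3
    // After normalization, both zeros share one bit pattern.
    println(distinctByBits(input.map(normalizeDouble)).size) // prints 2
  }
}
```

The same raw-bits check (`Double.doubleToRawLongBits`) is what makes a surviving `-0.0` visible in a test, since `-0.0 == 0.0` evaluates to true under ordinary numeric comparison.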
