asugranyes commented on PR #53695:
URL: https://github.com/apache/spark/pull/53695#issuecomment-4532718404

   > ### The redirect
   > 
   > Add ~5 case clauses to `NormalizeFloatingNumbers`:
   > 
   > ```scala
   > case e: ArrayDistinct if needNormalize(e.child.dataType) =>
   >   e.copy(child = normalize(e.child))
   > case e: ArrayUnion    if needNormalize(e.left.dataType) =>
   >   e.copy(left = normalize(e.left), right = normalize(e.right))
   > // + ArrayIntersect, ArrayExcept, ArraysOverlap
   > ```
   > 
   > For `ArrayType(DoubleType)`, `normalize`
   > (`NormalizeFloatingNumbers.scala:149-153`) already wraps the input with
   > `ArrayTransform(child, x => NormalizeNaNAndZero(x))`, i.e., produces a
   > pre-normalized array. The four expressions then operate on already-normalized
   > input with no changes.
   
   Hi @cloud-fan, the redirect to `NormalizeFloatingNumbers` is in.
   
   The rule now normalizes array inputs for the five hash/set-like array 
expressions: array_distinct, array_union, array_intersect, array_except, and 
arrays_overlap.
   
   Tests cover both layers:
   
   * NormalizeFloatingPointNumbersSuite for the optimizer rewrite
   * DataFrameFunctionsSuite for runtime semantics, comparing values bit-for-bit via `Double.doubleToRawLongBits`
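
   For readers unfamiliar with why the runtime tests compare raw bits, here is a minimal plain-Scala sketch with no Spark dependency. The object `RawBitsSketch` and its helpers `normalize` and `distinctBitPatterns` are hypothetical names made up for illustration. They only model the idea behind `NormalizeNaNAndZero` and are not the code in the PR or in Spark.

   ```scala
   object RawBitsSketch {
     // Hypothetical stand-in for the idea behind NormalizeNaNAndZero:
     // map every NaN to the canonical NaN and -0.0 to 0.0.
     def normalize(d: Double): Double =
       if (d.isNaN) Double.NaN
       else if (d == 0.0d) 0.0d // the comparison is true for both 0.0 and -0.0
       else d

     // Count distinct values by raw bit pattern, not by numeric equality.
     def distinctBitPatterns(xs: Seq[Double]): Int =
       xs.map(x => java.lang.Double.doubleToRawLongBits(x)).distinct.size

     def main(args: Array[String]): Unit = {
       val input = Seq(-0.0d, 0.0d)
       // Numerically equal, so a plain == check cannot tell them apart.
       assert(-0.0d == 0.0d)
       // The sign bit differs, so the raw bit patterns are distinct.
       assert(distinctBitPatterns(input) == 2)
       // After normalization only one bit pattern remains.
       assert(distinctBitPatterns(input.map(normalize)) == 1)
       println("ok")
     }
   }
   ```

   A bit-level check like this is what lets a test tell a normalized `0.0` from a leftover `-0.0`, which an ordinary equality assertion would treat as the same value.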
   
   CI is green.
   
   One scope note worth flagging: the rule runs late in the optimizer, so 
expressions rewritten or eliminated by earlier optimizer rules bypass it. In 
particular:
   
   * ConstantFolding folds all-literal expressions (e.g. `SELECT array_distinct(array(-0.0D, 0.0D))`)
   * ConvertToLocalRelation collapses Projects over in-memory LocalRelations
   
   Queries are covered as long as the expression reaches 
NormalizeFloatingNumbers in the optimized plan, as verified by the tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

