yadavay-amzn commented on PR #56244:
URL: https://github.com/apache/spark/pull/56244#issuecomment-4795027584

   Thanks for the careful review, @peter-toth - both points were spot on.
   
   **Scope:** narrowed `canDecompose` to fire only when one operand is foldable
and the other is `CollapseProject.isCheap`. So `struct_col1 = struct_col2` (no
pushdown benefit) and `udf() = literal` (non-cheap, would duplicate the UDF) no
longer decompose; the rule now produces only the pushable, non-duplicating
`col <=> literal` shape it was meant for.
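
   As a rough illustration of the gating described above, here is a plain-Python toy model, not the Catalyst code in the PR. `Expr`, its `foldable`/`cheap` flags, and `can_decompose` are hypothetical stand-ins; `cheap` only approximates what `CollapseProject.isCheap` would report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, for illustration only)."""
    name: str
    foldable: bool  # can be constant-folded at planning time (e.g. a literal)
    cheap: bool     # stand-in for CollapseProject.isCheap: safe to duplicate


def can_decompose(left: Expr, right: Expr) -> bool:
    """Decompose only when one side is foldable and the other is cheap."""
    return (left.foldable and right.cheap) or (right.foldable and left.cheap)


# Assumed classifications for the three shapes mentioned in the comment.
literal = Expr("named_struct('a', 1)", foldable=True, cheap=True)
struct_col1 = Expr("struct_col1", foldable=False, cheap=True)
struct_col2 = Expr("struct_col2", foldable=False, cheap=True)
udf_call = Expr("udf()", foldable=False, cheap=False)

print(can_decompose(struct_col1, literal))      # column vs literal
print(can_decompose(struct_col1, struct_col2))  # column vs column
print(can_decompose(udf_call, literal))         # UDF vs literal
```

   Under these assumptions only the column-versus-literal pair passes the check, which matches the narrowed scope.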
   
   **NULL correctness:** you're right that the pushable shortcut dropped the
distinction between a whole-null struct and a struct whose fields are all null.
I now AND `IsNotNull(<nullable operand>)` into the conjunction on both the
EqualTo and EqualNullSafe pushable paths (the guard pushes down too), so an
`s IS NULL` row is correctly excluded for `s = named_struct('a', null)`.
Verified with eval-level tests and end-to-end `assertSameUnderRule` over
Parquet data containing genuine null structs (these fail without the guard).
The non-pushable paths were already correct and are unchanged.
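
   To make the whole-null vs all-null-fields case concrete, here is a small plain-Python sketch of the EqualNullSafe path. It is a model of the semantics only, not the PR's implementation. It assumes `None` stands for SQL NULL, a struct is a `dict` or `None`, and the decomposed form compares fields with `<=>`. All function names are hypothetical.

```python
def null_safe_eq(a, b):
    """Model of SQL `<=>`: NULL <=> NULL is true, NULL <=> x is false."""
    return a == b  # with None as NULL, Python equality has this behaviour


def struct_null_safe_eq(s, lit):
    """Reference semantics of `s <=> lit` on whole structs."""
    if s is None or lit is None:
        return s is None and lit is None
    return all(null_safe_eq(s[k], v) for k, v in lit.items())


def field(s, name):
    """Field extraction: any field of a NULL struct is NULL."""
    return None if s is None else s[name]


def decomposed(s, lit, guard):
    """Field-wise rewrite, optionally ANDed with an IsNotNull(s) guard."""
    fieldwise = all(null_safe_eq(field(s, k), v) for k, v in lit.items())
    return (s is not None and fieldwise) if guard else fieldwise


lit = {"a": None}              # named_struct('a', null)
whole_null = None              # s IS NULL
all_null_fields = {"a": None}  # s is non-null, s.a IS NULL

for s in (whole_null, all_null_fields):
    print(s, struct_null_safe_eq(s, lit),
          decomposed(s, lit, guard=False), decomposed(s, lit, guard=True))
```

   In this model the unguarded rewrite accepts the whole-null row, because every extracted field is NULL and matches the NULL literal field. The guarded rewrite agrees with the reference semantics for both rows.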
   
   Also bumped both confs to 4.3.0. PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

