utkarsh39 opened a new pull request #34263:
URL: https://github.com/apache/spark/pull/34263


   ### What changes were proposed in this pull request?
   The PR modifies `IsNotNull` constraint generation to generate constraints on 
the referenced nested field instead of on the top-level nested type. See the 
following section for an example.
   
   ### Why are the changes needed?
   
[InferFiltersFromConstraints](https://github.com/apache/spark/blob/05c0fa573881b49d8ead9a5e16071190e5841e1b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1206)
 optimization rule generates `IsNotNull` constraints for null-intolerant 
predicates. Each `IsNotNull` constraint is generated on the attribute 
referenced by the corresponding predicate.
   For example, a predicate `a > 0` on an integer column `a` results in the 
constraint `IsNotNull(a)`. A predicate on a nested int column `structCol.b`, 
where `structCol` is a struct column, instead results in the constraint 
`IsNotNull(structCol)`.
   
   Generating the constraint on the root-level nested type is overly 
conservative, as it can lead to materialization of the entire struct. The 
constraint should instead be generated on the nested field referenced by 
the predicate. In the example above, the constraint should be 
`IsNotNull(structCol.b)` instead of `IsNotNull(structCol)`. 
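
The difference can be illustrated with a small toy model. This is only a sketch in plain Python, not Catalyst code: the classes `Attr`, `GetField`, `Lit`, `GreaterThan`, and the helper `infer_not_null` are hypothetical stand-ins, and the `on_field` flag simply switches between the old behavior (constraint on the root attribute) and the proposed one (constraint on the referenced field).

```python
from dataclasses import dataclass

# Toy expression nodes. Illustrative stand-ins only, not Catalyst classes.
@dataclass(frozen=True)
class Attr:
    name: str

@dataclass(frozen=True)
class GetField:
    child: object  # Attr or another GetField
    field: str

@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class GreaterThan:
    left: object
    right: object

@dataclass(frozen=True)
class IsNotNull:
    child: object

def root_attr(expr):
    """Walk down a field-access chain to the top-level attribute."""
    while isinstance(expr, GetField):
        expr = expr.child
    return expr

def infer_not_null(pred, on_field):
    """Return the IsNotNull constraints implied by a null-intolerant comparison.

    on_field=False models the old behavior (constraint on the root attribute);
    on_field=True models the proposed behavior (constraint on the nested field).
    """
    constraints = set()
    for operand in (pred.left, pred.right):
        if isinstance(operand, Lit):
            continue  # literals do not produce constraints in this toy model
        target = operand if on_field else root_attr(operand)
        constraints.add(IsNotNull(target))
    return constraints

flat_pred = GreaterThan(Attr("a"), Lit(0))
nested_pred = GreaterThan(GetField(Attr("structCol"), "b"), Lit(0))

print(infer_not_null(flat_pred, on_field=True))     # constraint on `a`
print(infer_not_null(nested_pred, on_field=False))  # old: constraint on `structCol`
print(infer_not_null(nested_pred, on_field=True))   # new: constraint on `structCol.b`
```

For a flat column the two modes agree; they differ only when the predicate goes through a field access.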
   
   The new constraints also create opportunities for nested pruning. Currently, 
the `IsNotNull(structCol)` constraint precludes pruning of `structCol` because 
it references the whole struct. The constraint `IsNotNull(structCol.b)` 
references only the field `b`, so it could create opportunities to prune the 
remaining fields of `structCol`.
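
The pruning argument can be sketched in the same spirit. The snippet below is a simplified illustration, not Spark's actual pruning logic: `required_fields` is a hypothetical helper, the schema with fields `a`, `b`, `c` is made up, and access paths are modeled as tuples where a one-element tuple means the whole column is referenced. Whether Spark actually prunes in a given query depends on the rest of the plan and on the data source.

```python
def required_fields(schema, references):
    """Compute which struct fields must be read.

    schema: maps a struct column name to its list of field names.
    references: access paths as tuples, e.g. ("structCol",) for the whole
    column or ("structCol", "b") for a single nested field.
    """
    needed = {}
    for path in references:
        column = path[0]
        if len(path) == 1:
            # The whole struct is referenced, so every field must be read.
            needed[column] = set(schema[column])
        else:
            needed.setdefault(column, set()).add(path[1])
    return needed

# Hypothetical schema, and a query that selects and filters on structCol.b only.
schema = {"structCol": ["a", "b", "c"]}
projection = ("structCol", "b")

# Old constraint IsNotNull(structCol) references the whole struct.
old_refs = [projection, ("structCol",)]
# New constraint IsNotNull(structCol.b) references only field b.
new_refs = [projection, ("structCol", "b")]

print(sorted(required_fields(schema, old_refs)["structCol"]))
print(sorted(required_fields(schema, new_refs)["structCol"]))
```

In this model a single whole-struct reference is enough to require every field, which is why the constraint on the root attribute blocks pruning.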
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a test to `InferFiltersFromConstraintsSuite`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


