Utkarsh Agarwal created SPARK-36978:
---------------------------------------

             Summary: InferConstraints rule should create IsNotNull constraints 
on the nested field instead of the root nested type 
                 Key: SPARK-36978
                 URL: https://issues.apache.org/jira/browse/SPARK-36978
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0, 3.0.0, 3.2.0
            Reporter: Utkarsh Agarwal


[InferFiltersFromConstraints|https://github.com/apache/spark/blob/05c0fa573881b49d8ead9a5e16071190e5841e1b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1206]
 optimization rule generates {{IsNotNull}} constraints for null-intolerant 
predicates. Each {{IsNotNull}} constraint is generated on the attribute 
referenced inside the corresponding predicate. 
For example, a predicate {{a > 0}} on an integer column {{a}} results in the 
constraint {{IsNotNull(a)}}. A predicate on a nested int column 
{{structCol.b}}, where {{structCol}} is a struct column, instead results in 
the constraint {{IsNotNull(structCol)}}.

Generating the constraint on the root of the nested type is overly 
conservative, because it could lead to materialization of the entire struct. 
The constraint should instead be generated on the nested field referenced by 
the predicate. In the above example, the constraint should be 
{{IsNotNull(structCol.b)}} instead of {{IsNotNull(structCol)}}.
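
The difference between the two behaviours can be illustrated with a small toy model. This is only a sketch in plain Python and not Catalyst code: the classes {{Attr}}, {{GetField}}, {{Lit}} and {{GreaterThan}} and the functions {{root_constraints}} and {{nested_constraints}} are hypothetical names made up for this illustration, and constraints are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """A top-level column reference, e.g. a or structCol."""
    name: str


@dataclass(frozen=True)
class GetField:
    """Access to one field of a struct-valued expression, e.g. structCol.b."""
    child: object
    field: str


@dataclass(frozen=True)
class Lit:
    """An integer literal."""
    value: int


@dataclass(frozen=True)
class GreaterThan:
    """A null-intolerant predicate: the result is null if either side is null."""
    left: object
    right: object


def show(expr):
    """Render a column reference the way the issue text writes it."""
    if isinstance(expr, Attr):
        return expr.name
    if isinstance(expr, GetField):
        return f"{show(expr.child)}.{expr.field}"
    raise TypeError(f"not a column reference: {expr!r}")


def root_constraints(pred):
    """Behaviour described in the issue: IsNotNull on the root attribute."""
    def roots(expr):
        if isinstance(expr, Attr):
            return {expr}
        if isinstance(expr, GetField):
            # Walk through the field access down to the root column.
            return roots(expr.child)
        if isinstance(expr, GreaterThan):
            return roots(expr.left) | roots(expr.right)
        return set()  # literals contribute no constraint
    return {f"IsNotNull({show(e)})" for e in roots(pred)}


def nested_constraints(pred):
    """Proposed behaviour: IsNotNull on the field the predicate references."""
    def refs(expr):
        if isinstance(expr, (Attr, GetField)):
            # Stop at the referenced expression instead of descending further.
            return {expr}
        if isinstance(expr, GreaterThan):
            return refs(expr.left) | refs(expr.right)
        return set()  # literals contribute no constraint
    return {f"IsNotNull({show(e)})" for e in refs(pred)}


flat = GreaterThan(Attr("a"), Lit(0))
nested = GreaterThan(GetField(Attr("structCol"), "b"), Lit(0))

print(root_constraints(flat))      # flat column: same result either way
print(root_constraints(nested))    # constraint lands on the whole struct
print(nested_constraints(nested))  # constraint lands on the nested field
```

In this model the two functions agree for the flat column {{a}} and differ only for {{structCol.b}}, which is the case this issue is about.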




