cloud-fan commented on PR #37634:
URL: https://github.com/apache/spark/pull/37634#issuecomment-1250772044

   Spark trusts data nullability in many places (expressions, projection 
generators, optimizer rules, etc.). Improving the error messages in all of 
these places for data that does not match the declared nullability would take 
a lot of effort. We'd better pick a clear scope here.
   
   AFAIK the common sources of mismatch are data sources and UDFs. We can focus 
on these two cases only.
   
   For data sources, we can add a Filter node above the data source relation to 
apply the null check, using the existing `AssertNotNull` expression. For UDFs, 
we can wrap the UDF expression with `AssertNotNull` to do the null check as well.
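   
   To make the two check points concrete, here is a toy model in plain Python. 
It is only a sketch of the idea and does not use Spark or Catalyst APIs: 
`assert_not_null`, `checked_scan`, `checked_udf`, and `NullabilityError` are 
hypothetical names invented for this illustration, with `assert_not_null` 
standing in for the role `AssertNotNull` would play.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Set, Tuple


class NullabilityError(ValueError):
    """Raised when a value declared non-nullable turns out to be None."""


def assert_not_null(value: Any, where: str) -> Any:
    # Hypothetical stand-in for the null check: pass the value through
    # unchanged, or fail with a message that names where the null came from.
    if value is None:
        raise NullabilityError(f"Null value found in non-nullable {where}")
    return value


def checked_scan(
    rows: Iterable[Tuple[Any, ...]],
    columns: Sequence[str],
    non_nullable: Set[str],
) -> Iterator[Tuple[Any, ...]]:
    # Case 1: check rows as they leave the data source, before anything
    # downstream gets a chance to trust the declared nullability.
    for row in rows:
        for name, value in zip(columns, row):
            if name in non_nullable:
                assert_not_null(value, f"column '{name}'")
        yield row


def checked_udf(func: Callable[..., Any], name: str) -> Callable[..., Any]:
    # Case 2: wrap a UDF whose result is declared non-nullable so that a
    # None result fails at the UDF boundary.
    def wrapper(*args: Any) -> Any:
        return assert_not_null(func(*args), f"result of UDF '{name}'")

    return wrapper
```

   In this model a bad row or a bad UDF result fails immediately with an error 
that names the offending column or UDF, instead of surfacing later in code that 
assumed the value could not be null.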


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

