cloud-fan commented on PR #37634: URL: https://github.com/apache/spark/pull/37634#issuecomment-1250772044
Spark trusts data nullability in many places (expressions, projection generators, optimizer rules, etc.). It would take a lot of effort to improve the error messages in all of these places for data that does not match the declared nullability, so we'd better pick a clear scope here. AFAIK the common sources of mismatch are data sources and UDFs, so we can focus on these two cases only. For data sources, we can add a Filter node above the data source relation to apply the null check, using the existing `AssertNotNull` expression. For UDFs, we can wrap the UDF expression with `AssertNotNull` to do the null check as well.
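To make the proposed scope concrete, here is a minimal toy model of the two checks in plain Python. It does not use Spark's API: `assert_not_null`, `checked_scan`, `checked_udf`, and `NullCheckError` are hypothetical names invented for illustration, standing in for what `AssertNotNull` would do when placed above a data source relation or around a UDF expression.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class NullCheckError(ValueError):
    """Raised when a value declared non-nullable turns out to be null."""


def assert_not_null(value: Any, what: str) -> Any:
    # Hypothetical stand-in for the role AssertNotNull plays in the proposal:
    # pass the value through unchanged, or fail with a clear message.
    if value is None:
        raise NullCheckError(f"{what} is declared non-nullable but produced null")
    return value


def checked_scan(rows: Iterable[Row], non_nullable: List[str]) -> Iterator[Row]:
    # Case 1 (data source): check every non-nullable column right above the
    # scan, before any downstream operator gets to trust the schema.
    for i, row in enumerate(rows):
        for col in non_nullable:
            assert_not_null(row.get(col), f"column '{col}' (row {i})")
        yield row


def checked_udf(fn: Callable[..., Any], name: str) -> Callable[..., Any]:
    # Case 2 (UDF): wrap the function so its result is checked on every call.
    def wrapper(*args: Any) -> Any:
        return assert_not_null(fn(*args), f"UDF '{name}'")

    return wrapper
```

In this sketch a bad row or a bad UDF result fails at the boundary where it enters the plan, with a message naming the offending column or UDF, rather than surfacing later inside an operator that assumed the value could not be null.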
