xkrogen opened a new pull request, #37634:
URL: https://github.com/apache/spark/pull/37634

   ### What changes were proposed in this pull request?
   This modifies `GenerateUnsafeProjection` to wrap projections of non-null 
fields in try-catch blocks that catch any `NullPointerException` thrown and 
instead raise an error with a helpful message, indicating that a null value 
was encountered where the schema indicated non-null. This new error is added in 
`QueryExecutionErrors`.
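
As a rough illustration of the wrapping described above, the sketch below shows the general pattern in plain Java. It is not the code that `GenerateUnsafeProjection` emits: the class `NonNullProjectionSketch`, the method `projectNonNull`, and the shortened message are hypothetical names chosen for this example, and dropping the original exception is an assumption based on the word "swallow".

```java
// Hypothetical sketch of the pattern; not Spark's generated code.
final class NonNullProjectionSketch {

    // Runs a projection for a field the schema declares non-null.
    // If the projection dereferences a null value, the message-less
    // NullPointerException is replaced by an error that names the path.
    static <T> T projectNonNull(String path,
                                java.util.function.Supplier<T> projection) {
        try {
            return projection.get();
        } catch (NullPointerException e) {
            // Assumption: the original NPE is dropped rather than chained.
            throw new RuntimeException(
                "The value at " + path + " cannot be null, but a NULL was found.");
        }
    }
}
```

A projection that succeeds passes its result through unchanged; only the `NullPointerException` case is translated, so other failures keep their original type and message.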
   
   Small modifications are made to a few methods in `GenerateUnsafeProjection` 
to allow for passing down the path to the projection in question, to give the 
user a helpful indication of what they need to change. Getting the name of the 
top-level column seems to require substantial changes since the name is thrown 
away when the `BoundReference` is created (in favor of an ordinal), so for the 
top-level only an ordinal is given; for nested fields the name is provided. An 
example error message looks like:
   
   ```
   java.lang.RuntimeException: The value at <POS_0>.`middle`.`bottom` cannot be 
null, but a NULL was found. This is typically caused by the presence of a NULL 
value when the schema indicates the value should be non-null. Check that the 
input data matches the schema and/or that UDFs which can return null have a 
nullable return schema.
   ```
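
The path in the message above combines an ordinal for the top-level column with names for nested fields. A minimal sketch of how such a path could be assembled is shown below. The class and method names are hypothetical, and the `<POS_n>` and backtick formatting is inferred only from the single example message, so the actual implementation may differ.

```java
// Hypothetical sketch; formatting inferred from the example message only.
final class FieldPathSketch {

    // Top-level columns are identified by ordinal, because the column
    // name is no longer available once a BoundReference exists.
    static String topLevel(int ordinal) {
        return "<POS_" + ordinal + ">";
    }

    // Nested fields still carry their names, so each level appends
    // a backtick-quoted field name to the parent's path.
    static String nested(String parentPath, String fieldName) {
        return parentPath + ".`" + fieldName + "`";
    }
}
```

Passing the parent path down at each level of recursion is what lets the innermost projection report its full location.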
   
   ### Why are the changes needed?
   This is needed to help users decipher the error message; currently a 
`NullPointerException` without any message is thrown, which provides the user 
no guidance on what they've done wrong, and typically leads them to believe 
there is a bug in Spark. See the Jira for a specific example of how this 
behavior can be triggered and what the exception looks like currently.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, in the case that a user has a data-schema mismatch, they will now get a 
much more helpful error message. In other cases, no change.
   
   ### How was this patch tested?
   See tests in `DataFrameSuite` and `GeneratedProjectionSuite`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

