xkrogen opened a new pull request, #37634: URL: https://github.com/apache/spark/pull/37634
### What changes were proposed in this pull request?

This modifies `GenerateUnsafeProjection` to wrap projections of non-null fields in try-catch blocks that swallow any `NullPointerException` that is thrown and instead throw an error with a helpful message indicating that a null value was encountered where the schema indicated non-null. This new error is added in `QueryExecutionErrors`.

Small modifications are made to a few methods in `GenerateUnsafeProjection` to allow passing down the path to the projection in question, giving the user a helpful indication of what they need to change. Getting the name of the top-level column seems to require substantial changes, since the name is thrown away (in favor of an ordinal) when the `BoundReference` is created. So for the top level only an ordinal is given; for nested fields the name is provided.

An example error message looks like:

```
java.lang.RuntimeException: The value at <POS_0>.`middle`.`bottom` cannot be null, but a NULL was found. This is typically caused by the presence of a NULL value when the schema indicates the value should be non-null. Check that the input data matches the schema and/or that UDFs which can return null have a nullable return schema.
```

### Why are the changes needed?

This is needed to help users decipher the error. Currently a `NullPointerException` without any message is thrown, which gives the user no guidance on what they've done wrong and typically leads them to believe there is a bug in Spark. See the Jira for a specific example of how this behavior can be triggered and what the exception looks like currently.

### Does this PR introduce _any_ user-facing change?

Yes. When a user's data does not match its schema, they will now get a much more helpful error message. In other cases, there is no change.

### How was this patch tested?

See tests in `DataFrameSuite` and `GeneratedProjectionSuite`.
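The wrap-and-rethrow approach in this PR can be illustrated outside of codegen with a small plain-Scala model. This is only a sketch under stated assumptions: the real change lives in the Java source emitted by `GenerateUnsafeProjection`, whereas `Field`, `unsafeWrite`, `nullFieldError` and `project` below are hypothetical names invented for illustration, not Spark APIs. It shows how the path is threaded down (ordinal `<POS_n>` at the top level, backticked names below it) and how a bare `NullPointerException` from a non-null field becomes a descriptive error.

```scala
// Hypothetical, simplified schema model; this is NOT Spark's StructType/StructField.
final case class Field(name: String, nullable: Boolean, children: Seq[Field] = Nil)

// Builds an error whose text follows the example message quoted in this PR.
def nullFieldError(path: String): RuntimeException =
  new RuntimeException(
    s"The value at $path cannot be null, but a NULL was found. " +
      "This is typically caused by the presence of a NULL value when the schema " +
      "indicates the value should be non-null. Check that the input data matches " +
      "the schema and/or that UDFs which can return null have a nullable return schema.")

// Hypothetical stand-in for generated code that trusts the schema and dereferences
// the value without a null check, so a null input surfaces as a bare NullPointerException.
def unsafeWrite(value: Any): String = value.asInstanceOf[AnyRef].toString

// Projects a row (a Seq of values, with nested structs as nested Seqs) to strings.
// Top-level columns are identified only by ordinal (<POS_n>), nested fields by name.
def project(fields: Seq[Field], values: Seq[Any], parentPath: Option[String]): Seq[String] =
  fields.zip(values).zipWithIndex.flatMap { case ((field, value), ordinal) =>
    val path = parentPath.fold(s"<POS_$ordinal>")(p => s"$p.`${field.name}`")
    if (value == null && field.nullable) {
      Seq("null") // nullable fields are null-checked, so no exception is involved
    } else {
      // Wrap the write of a non-null field and translate the NPE into a descriptive
      // error. The innermost failing field throws first; the resulting
      // RuntimeException is not an NPE, so enclosing handlers let it propagate.
      try {
        if (field.children.nonEmpty)
          project(field.children, value.asInstanceOf[Seq[Any]].toList, Some(path))
        else
          Seq(unsafeWrite(value))
      } catch {
        case _: NullPointerException => throw nullFieldError(path)
      }
    }
  }
```

With a schema `top` → `middle` → `bottom` where every field is non-nullable, calling `project(schema, Seq(Seq(Seq(null))), None)` fails with a message starting `The value at <POS_0>.`middle`.`bottom` cannot be null`. As in the PR, the top-level column name does not appear in the path, only its ordinal.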
