bersprockets opened a new pull request, #38697: URL: https://github.com/apache/spark/pull/38697
Backport of #38635

### What changes were proposed in this pull request?

When a user specifies a null format in `to_number`/`try_to_number`, return `null`, with a data type of `DecimalType.USER_DEFAULT`, rather than throwing a `NullPointerException`. Also, since the code for `ToNumber` and `TryToNumber` is virtually identical, move all common code into a new abstract class, `ToNumberBase`, to avoid fixing the bug in two places.

### Why are the changes needed?

`to_number`/`try_to_number` currently throw a `NullPointerException` when the format is `null`:
```
spark-sql> SELECT to_number('454', null);
org.apache.spark.SparkException: The Spark SQL phase analysis failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
	at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:500)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:512)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
...
Caused by: java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat$lzycompute(numberFormatExpressions.scala:72)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat(numberFormatExpressions.scala:72)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter$lzycompute(numberFormatExpressions.scala:73)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter(numberFormatExpressions.scala:73)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.checkInputDataTypes(numberFormatExpressions.scala:81)
```
Also:
```
spark-sql> SELECT try_to_number('454', null);
org.apache.spark.SparkException: The Spark SQL phase analysis failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
	at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:500)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:512)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
...
Caused by: java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat$lzycompute(numberFormatExpressions.scala:72)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat(numberFormatExpressions.scala:72)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter$lzycompute(numberFormatExpressions.scala:73)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter(numberFormatExpressions.scala:73)
	at org.apache.spark.sql.catalyst.expressions.ToNumber.checkInputDataTypes(numberFormatExpressions.scala:81)
	at org.apache.spark.sql.catalyst.expressions.TryToNumber.checkInputDataTypes(numberFormatExpressions.scala:146)
```
Compare to `to_binary` and `try_to_binary`:
```
spark-sql> SELECT to_binary('abc', null);
NULL
Time taken: 3.111 seconds, Fetched 1 row(s)
spark-sql> SELECT try_to_binary('abc', null);
NULL
Time taken: 0.06 seconds, Fetched 1 row(s)
spark-sql>
```
Also compare to `to_number` in PostgreSQL 11.18:
```
SELECT to_number('454', null) is null as a;

a
true
```

### Does this PR introduce _any_ user-facing change?

`to_number`/`try_to_number` with a null format will now return `null` with a data type of `DecimalType.USER_DEFAULT`.

### How was this patch tested?

New unit test.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
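As an illustration of the refactoring approach described above, here is a hypothetical, self-contained Scala sketch (not the actual Spark patch — `ToNumberSketch`, the `Option[String]` format, and the simplified `eval` are stand-ins for the real catalyst expressions) showing how hoisting shared logic into one base class lets the null-format guard live in a single place:

```scala
// Simplified model of the fix: both expressions share a base class,
// so the null-format guard only has to be written (and fixed) once.
object ToNumberSketch {
  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Failure(msg: String) extends CheckResult

  // Stand-in for the common code the PR factors into `ToNumberBase`.
  // A `None` format plays the role of a SQL null format string.
  abstract class ToNumberBase(format: Option[String]) {
    // Before the fix, the format was dereferenced unconditionally during
    // input-type checking, which is where the NullPointerException arose.
    def checkInputDataTypes(): CheckResult =
      format match {
        case None                  => Success // null format: expression just yields null
        case Some(f) if f.nonEmpty => Success
        case Some(_)               => Failure("empty format string")
      }

    // Null format short-circuits to null; the format pattern itself is
    // ignored here, since only the null handling is being modeled.
    def eval(input: String): Option[BigDecimal] =
      format.map(_ => BigDecimal(input))
  }

  final class ToNumber(format: Option[String]) extends ToNumberBase(format)
  final class TryToNumber(format: Option[String]) extends ToNumberBase(format)
}
```

With this shape, a null format passes analysis and evaluates to null in both `ToNumber` and `TryToNumber`, mirroring the `to_binary`/`try_to_binary` behavior shown above.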
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org