bersprockets opened a new pull request, #38697:
URL: https://github.com/apache/spark/pull/38697

   Backport of #38635
   
   ### What changes were proposed in this pull request?
   
   When a user specifies a null format in `to_number`/`try_to_number`, return 
`null`, with a data type of `DecimalType.USER_DEFAULT`, rather than throwing a 
`NullPointerException`.
   
   Also, since the code for `ToNumber` and `TryToNumber` is virtually identical, move all common code into a new abstract class, `ToNumberBase`, so the bug only needs to be fixed in one place.
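   
   The shape of that refactoring can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in this PR: every name in it (`NullFormatSketch`, `ToNumberSketchBase`, `ToNumberSketch`, `TryToNumberSketch`, `failOnError`) is hypothetical, the toy format accepts only digit placeholders, and the local `DecimalType(10, 0)` merely stands in for `DecimalType.USER_DEFAULT`.
   
   ```scala
   object NullFormatSketch {
     // Stand-in for Spark's DecimalType; not the real class.
     final case class DecimalType(precision: Int, scale: Int)

     // Assumption: mirrors DecimalType.USER_DEFAULT, i.e. decimal(10, 0).
     val UserDefault: DecimalType = DecimalType(10, 0)

     // Hypothetical shared base: the null-format guard lives here only.
     abstract class ToNumberSketchBase(format: String) {
       // true: raise on a mismatch (to_number); false: return None (try_to_number).
       protected def failOnError: Boolean

       // A null format has no precision to derive, so fall back to the default type.
       def dataType: DecimalType =
         if (format == null) UserDefault else DecimalType(format.length, 0)

       // Toy parser: the input must be all digits and no longer than the format.
       def eval(input: String): Option[BigDecimal] = {
         if (format == null || input == null) {
           None // null format or null input yields null instead of an NPE
         } else if (input.nonEmpty && input.length <= format.length && input.forall(_.isDigit)) {
           Some(BigDecimal(input))
         } else if (failOnError) {
           throw new IllegalArgumentException(s"'$input' does not match format '$format'")
         } else {
           None
         }
       }
     }

     final class ToNumberSketch(format: String) extends ToNumberSketchBase(format) {
       protected def failOnError: Boolean = true
     }

     final class TryToNumberSketch(format: String) extends ToNumberSketchBase(format) {
       protected def failOnError: Boolean = false
     }
   }
   ```
   
   In this sketch the two concrete classes differ only in `failOnError`, so the null check is written once and both variants pick it up.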
   
   ### Why are the changes needed?
   
   `to_number` and `try_to_number` currently throw a `NullPointerException` when the format is `null`:
   
   ```
   spark-sql> SELECT to_number('454', null);
   org.apache.spark.SparkException: The Spark SQL phase analysis failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
        at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:500)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:512)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
   ...
   Caused by: java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat$lzycompute(numberFormatExpressions.scala:72)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat(numberFormatExpressions.scala:72)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter$lzycompute(numberFormatExpressions.scala:73)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter(numberFormatExpressions.scala:73)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.checkInputDataTypes(numberFormatExpressions.scala:81)
   ```
   Also:
   ```
   spark-sql> SELECT try_to_number('454', null);
   org.apache.spark.SparkException: The Spark SQL phase analysis failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
        at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:500)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:512)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
   ...
   Caused by: java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat$lzycompute(numberFormatExpressions.scala:72)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormat(numberFormatExpressions.scala:72)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter$lzycompute(numberFormatExpressions.scala:73)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.numberFormatter(numberFormatExpressions.scala:73)
        at org.apache.spark.sql.catalyst.expressions.ToNumber.checkInputDataTypes(numberFormatExpressions.scala:81)
        at org.apache.spark.sql.catalyst.expressions.TryToNumber.checkInputDataTypes(numberFormatExpressions.scala:146)
   ```
   Compare to `to_binary` and `try_to_binary`:
   ```
   spark-sql> SELECT to_binary('abc', null);
   NULL
   Time taken: 3.111 seconds, Fetched 1 row(s)
   spark-sql> SELECT try_to_binary('abc', null);
   NULL
   Time taken: 0.06 seconds, Fetched 1 row(s)
   spark-sql>
   ```
   Also compare to `to_number` in PostgreSQL 11.18:
   ```
   SELECT to_number('454', null) is null as a;
   a
   true
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. `to_number` and `try_to_number` with a null format now return `null`, with a data type of `DecimalType.USER_DEFAULT`, instead of failing with an internal error.
   
   ### How was this patch tested?
   
   New unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

