zhengruifeng opened a new pull request, #53730: URL: https://github.com/apache/spark/pull/53730
### What changes were proposed in this pull request?

When an input integral column is nullable and a batch contains a null value, the batch is converted to float64. This was a workaround added in Spark 2.3 to resolve another issue in `DataFrame.toPandas` (https://issues.apache.org/jira/browse/SPARK-21766). However, it introduced a correctness issue of its own: when the integer is large, the int->float->int roundtrip loses precision. This PR focuses on the correctness issue in Pandas UDFs.

### Why are the changes needed?

To fix a correctness bug in Pandas UDFs.

### Does this PR introduce _any_ user-facing change?

Yes.

```py
from pyspark.sql.types import *
from pyspark.sql.functions import pandas_udf

identity_udf = pandas_udf(lambda s: s, returnType=LongType())
query = "SELECT * FROM VALUES (9223372036854775707, 1), (NULL, 2) AS tab(a, b)"
df = spark.sql(query).repartition(1).sortWithinPartitions("b")
df.select("a", identity_udf("a").alias("b")).show()
```

Before:

```
+-------------------+-------------------+
|                  a|                  b|
+-------------------+-------------------+
|9223372036854775707|9223372036854775807|
|               NULL|               NULL|
+-------------------+-------------------+
```

The value is changed from `9223372036854775707` to `9223372036854775807`.

After:

```
+-------------------+-------------------+
|                  a|                  b|
+-------------------+-------------------+
|9223372036854775707|9223372036854775707|
|               NULL|               NULL|
+-------------------+-------------------+
```

### How was this patch tested?

Added a test and updated golden files.

### Was this patch authored or co-authored using generative AI tooling?

No.
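The precision loss from the int->float->int roundtrip described above can be reproduced in plain Python, without Spark. This is a minimal sketch of the underlying float64 rounding only; the saturation to exactly `9223372036854775807` seen in the `show()` output is a detail of how the out-of-range float is cast back to int64 downstream, which this sketch does not model:

```python
# 9223372036854775707 is int64 max minus 100; it needs 63 significant bits.
x = 9223372036854775707

# float64 carries only a 53-bit mantissa, so converting rounds x
# to the nearest representable double, which here is 2**63.
f = float(x)

# Converting back yields a different integer than we started with.
y = int(f)

assert y != x            # precision was lost in the roundtrip
assert y == 2**63        # x was rounded up to 9223372036854775808
```

Note that `2**63` itself overflows int64, which is why the casted result in the old `show()` output lands on the int64 boundary rather than on `y`.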
