HyukjinKwon opened a new pull request #28135: [SPARK-26412][PYTHON][FOLLOW-UP] Improve error messages in Scalar iterator pandas UDF
URL: https://github.com/apache/spark/pull/28135

### What changes were proposed in this pull request?

This PR proposes to improve the error message raised by a Scalar iterator pandas UDF when the length of an output series (or frame) does not match the length of its input.

### Why are the changes needed?

To show the correct error message. Previously, this mismatch in a Scalar iterator pandas UDF raised an `AssertionError` that referred to `MAP_ITER` UDFs, which is misleading.

### Does this PR introduce any user-facing change?

Yes, but only in unreleased branches.

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Assumes an active SparkSession named `spark` (e.g. the PySpark shell).
@pandas_udf('long', PandasUDFType.SCALAR_ITER)
def pandas_plus_one(iterator):
    for _ in iterator:
        # Always yields 20 rows, regardless of the input batch's length.
        yield pd.Series(list(range(20)))

spark.range(10).repartition(1).select(pandas_plus_one("id")).show()
```

Before:

```
AssertionError: Pandas MAP_ITER UDF outputted more rows than input rows.
```

After:

```
RuntimeError: The length of each output series (or frame) in Scalar iterator pandas UDF should be the same with the input's; however, the length of output series (or frame) was 20 and the length of the input's was 10.
```

### How was this patch tested?

Unit tests were updated accordingly.
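The length check behind the new error message can be sketched in plain Python. This is a hypothetical illustration, not Spark's worker code: `run_scalar_iter_udf` is an invented helper, plain lists stand in for pandas Series, and output batches are assumed to pair with input batches in the order they were consumed. Only the `RuntimeError` text is taken from the PR.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Sized


def run_scalar_iter_udf(
    batches: Iterable[Sized],
    udf: Callable[[Iterator[Sized]], Iterator[Sized]],
) -> Iterator[Sized]:
    """Hypothetical sketch: run an iterator-style UDF and check that every
    output batch has the same length as the input batch it pairs with."""
    pending = deque()  # lengths of input batches consumed but not yet matched

    def recording(it: Iterator[Sized]) -> Iterator[Sized]:
        # Record each input batch's length as the UDF pulls it.
        for batch in it:
            pending.append(len(batch))
            yield batch

    for out in udf(recording(iter(batches))):
        if not pending:
            raise RuntimeError("The UDF produced output without consuming input.")
        expected = pending.popleft()
        if len(out) != expected:
            # Message text mirrors the "After" output shown in this PR.
            raise RuntimeError(
                "The length of each output series (or frame) in Scalar iterator "
                "pandas UDF should be the same with the input's; however, the "
                f"length of output series (or frame) was {len(out)} and the "
                f"length of the input's was {expected}."
            )
        yield out

    if pending:
        raise RuntimeError("The UDF produced fewer output batches than input batches.")
```

Feeding it a single 10-element batch and a UDF that always yields 20 elements, as in the PR's example, raises the `RuntimeError` with "was 20" and "was 10" in the message, while a UDF that maps each batch element-wise passes through unchanged.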
