[GitHub] [spark] HyukjinKwon opened a new pull request #28135: [SPARK-26412][PYTHON][FOLLOW-UP] Improve error messages in Scalar iterator pandas UDF

GitBox Mon, 06 Apr 2020 01:41:27 -0700

HyukjinKwon opened a new pull request #28135: [SPARK-26412][PYTHON][FOLLOW-UP] 
Improve error messages in Scalar iterator pandas UDF
URL: https://github.com/apache/spark/pull/28135
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to improve the error message raised by a Scalar iterator 
pandas UDF when the length of its output does not match the length of its input.
   
   ### Why are the changes needed?
   
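   To show the correct error message. Previously, the error referred to a 
MAP_ITER UDF even when it was raised from a Scalar iterator pandas UDF.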
   
   ### Does this PR introduce any user-facing change?
   
   Yes, but only in unreleased branches.
   
   ```python
   import pandas as pd
   from pyspark.sql.functions import pandas_udf, PandasUDFType
   
   @pandas_udf('long', PandasUDFType.SCALAR_ITER)
   def pandas_plus_one(iterator):
       for _ in iterator:
           # Deliberately yields 20 rows although the input has only 10.
           yield pd.Series(list(range(20)))
   
   spark.range(10).repartition(1).select(pandas_plus_one("id")).show()
   ```
   
   Before:
   
   ```
   AssertionError: Pandas MAP_ITER UDF outputted more rows than input rows.
   ```
   
   
   After:
   
   ```
   RuntimeError: The length of each output series (or frame) in Scalar iterator
   pandas UDF should be the same with the input's; however, the length of
   output series (or frame) was 20 and the length of the input's was 10.
   ```
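
The check behind the new message can be illustrated with a minimal sketch. This is not Spark's actual worker code: the helper name `checked_scalar_iter` is hypothetical, batches are anything that supports `len()` (plain lists here, `pd.Series` in practice), and it assumes the UDF yields one output batch per input batch.

```python
from typing import Callable, Iterable, Iterator, List, Sized


def checked_scalar_iter(
    udf: Callable[[Iterator[Sized]], Iterable[Sized]],
    batches: Iterable[Sized],
) -> Iterator[Sized]:
    """Hypothetical wrapper: compare each output batch length with its input's."""
    input_lengths: List[int] = []

    def recording(source: Iterable[Sized]) -> Iterator[Sized]:
        # Remember the length of every input batch the UDF consumes.
        for batch in source:
            input_lengths.append(len(batch))
            yield batch

    for index, output in enumerate(udf(recording(batches))):
        if index >= len(input_lengths):
            raise RuntimeError("The UDF produced more output batches than input batches.")
        expected = input_lengths[index]
        if len(output) != expected:
            raise RuntimeError(
                "The length of each output series (or frame) should be the same "
                "as the input's; however, the length of the output was %d and "
                "the length of the input's was %d." % (len(output), expected)
            )
        yield output
```

With a UDF that yields 20 values for a 10-row input batch, as in the example above, iterating over the wrapper raises `RuntimeError` instead of tripping an unrelated assertion.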
   
   
   ### How was this patch tested?
   
   Unit tests were updated accordingly.

