WeichenXu123 commented on a change in pull request #28135:
[SPARK-26412][PYTHON][FOLLOW-UP] Improve error messages in Scala iterator pandas UDF
URL: https://github.com/apache/spark/pull/28135#discussion_r405923792
 
 

 ##########
 File path: python/pyspark/worker.py
 ##########
 @@ -357,8 +357,14 @@ def map_batch(batch):
             num_output_rows = 0
             for result_batch, result_type in result_iter:
                 num_output_rows += len(result_batch)
-                assert is_map_iter or num_output_rows <= num_input_rows[0], \
-                    "Pandas MAP_ITER UDF outputted more rows than input rows."
+
+                if is_scalar_iter and num_output_rows > num_input_rows[0]:
+                    raise RuntimeError(
+                        "The length of each output series (or frame) in Scalar iterator pandas "
+                        "UDF should be the same with the input's; however, the length of output "
+                        "series (or frame) was %d and the length of the input's was %d." % (
+                            num_output_rows, num_input_rows[0]))
+
 
 Review comment:
   message: Pandas MAP_ITER UDF outputted more rows than input rows.
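
The guard in the diff is cumulative and lazy. After each output batch, the rows emitted so far are compared with the rows the UDF has consumed so far. The following is a minimal, stdlib-only sketch of that same guard outside Spark. It is not `pyspark.worker` code: plain Python lists stand in for pandas Series batches, and the names `count_rows`, `verify_scalar_iter`, `times_two_udf`, and `duplicate_rows_udf` are hypothetical. As with `num_input_rows` in the diff, the input counter is a one-element list, which lets the wrapping generator update it while the UDF pulls input.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

# Hypothetical stand-in for a scalar iterator UDF: it takes an iterator of input
# batches and yields output batches (plain lists here, pandas Series in Spark).
IterUDF = Callable[[Iterator[Sequence]], Iterable[Sequence]]


def count_rows(batches: Iterable[Sequence], counter: List[int]) -> Iterator[Sequence]:
    """Yield input batches unchanged, adding each batch's length to counter[0]."""
    for batch in batches:
        counter[0] += len(batch)
        yield batch


def verify_scalar_iter(udf: IterUDF, input_batches: Iterable[Sequence]) -> Iterator[Sequence]:
    """Yield the UDF's output batches, failing fast if output rows outrun input rows."""
    num_input_rows = [0]  # mutable cell; grows only as the UDF pulls input batches
    num_output_rows = 0
    for result_batch in udf(count_rows(input_batches, num_input_rows)):
        num_output_rows += len(result_batch)
        # A scalar iterator UDF must never have emitted more rows than it has read.
        if num_output_rows > num_input_rows[0]:
            raise RuntimeError(
                "Scalar iterator UDF produced %d output rows but had only consumed "
                "%d input rows." % (num_output_rows, num_input_rows[0]))
        yield result_batch


def times_two_udf(batches: Iterator[Sequence]) -> Iterator[List[int]]:
    # One output row per input row, so the check passes.
    for batch in batches:
        yield [x * 2 for x in batch]


def duplicate_rows_udf(batches: Iterator[Sequence]) -> Iterator[List[int]]:
    # Twice as many output rows as input rows, so the check trips.
    for batch in batches:
        yield list(batch) + list(batch)
```

Here `list(verify_scalar_iter(times_two_udf, [[1, 2], [3]]))` returns `[[2, 4], [6]]`. The same call with `duplicate_rows_udf` raises `RuntimeError` on the first output batch, because 4 rows have been emitted against 2 consumed. Like the diff, this sketch only catches the "too many rows" direction. A UDF that emits too few rows would need a separate equality check after the loop finishes.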
