Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19325#discussion_r140835837
--- Diff: python/pyspark/worker.py ---
@@ -80,14 +77,12 @@ def wrap_pandas_udf(f, return_type):
arrow_return_type = toArrowType(return_type)
def verify_result_length(*a):
- kwargs = a[-1]
- result = f(*a[:-1], **kwargs)
- if len(result) != kwargs["length"]:
+ result = f(*a)
+ if len(result) != len(a[0]):
--- End diff ---
Good point. We should probably have a test that returns a scalar value too.
I'm not sure we should limit the return type so much. As long as pyarrow can
consume it, then it should be ok - it can also take a numpy array which might
be useful. Otherwise it should raise a clear exception. Maybe checking that it
has `__len__` is good enough?
---