Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142571660
--- Diff: python/pyspark/worker.py ---
@@ -74,17 +75,37 @@ def wrap_udf(f, return_type):
 def wrap_pandas_udf(f, return_type):
-    arrow_return_type = toArrowType(return_type)
-
-    def verify_result_length(*a):
-        result = f(*a)
-        if not hasattr(result, "__len__"):
-            raise TypeError("Return type of pandas_udf should be a Pandas.Series")
-        if len(result) != len(a[0]):
-            raise RuntimeError("Result vector from pandas_udf was not the required length: "
-                               "expected %d, got %d" % (len(a[0]), len(result)))
-        return result
-    return lambda *a: (verify_result_length(*a), arrow_return_type)
+    if isinstance(return_type, StructType):
+        import pyarrow as pa
+
+        arrow_return_types = list(to_arrow_type(field.dataType) for field in return_type)
--- End diff --
Fixed.
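
For readers following along, here is a minimal sketch of how the StructType branch above might continue, assuming the user function returns a pandas.DataFrame with one column per field of return_type, in field order. The helper name verify_struct_result, the error messages, and the import location of to_arrow_type are illustrative assumptions, not taken from the PR.

    import pandas as pd
    import pyarrow as pa

    from pyspark.sql.types import StructType
    # to_arrow_type lives in pyspark.sql.types around this Spark version;
    # its location has moved in later releases (assumption).
    from pyspark.sql.types import to_arrow_type

    def wrap_struct_pandas_udf(f, return_type):
        # One Arrow type per struct field, in declaration order.
        arrow_return_types = [to_arrow_type(field.dataType) for field in return_type]

        def verify_struct_result(*a):
            result = f(*a)
            if not isinstance(result, pd.DataFrame):
                raise TypeError("Return type of pandas_udf should be a "
                                "pandas.DataFrame, got %s" % type(result))
            if len(result) != len(a[0]):
                raise RuntimeError("Result DataFrame from pandas_udf was not the "
                                   "required length: expected %d, got %d"
                                   % (len(a[0]), len(result)))
            # Convert each column to an Arrow array with the declared field type.
            return [pa.Array.from_pandas(result[col], type=t)
                    for col, t in zip(result.columns, arrow_return_types)]

        return verify_struct_result

The per-field conversion mirrors the non-struct path, which produces a single Arrow array per batch; here each field of the struct gets its own array with its own declared Arrow type.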
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]