[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

icexelloss Thu, 05 Oct 2017 12:22:37 -0700

Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18732#discussion_r143033289
  
    --- Diff: python/pyspark/worker.py ---
    @@ -74,17 +74,35 @@ def wrap_udf(f, return_type):
     
     
     def wrap_pandas_udf(f, return_type):
    -    arrow_return_type = toArrowType(return_type)
    -
    -    def verify_result_length(*a):
    -        result = f(*a)
    -        if not hasattr(result, "__len__"):
    -            raise TypeError("Return type of pandas_udf should be a Pandas.Series")
    -        if len(result) != len(a[0]):
    -            raise RuntimeError("Result vector from pandas_udf was not the required length: "
    -                               "expected %d, got %d" % (len(a[0]), len(result)))
    -        return result
    -    return lambda *a: (verify_result_length(*a), arrow_return_type)
    +    if isinstance(return_type, StructType):
    --- End diff --
    
    Added doc. The check for `returnType == StructType` is done earlier, in `group.py`:
    
    
https://github.com/icexelloss/spark/blob/groupby-apply-SPARK-20396/python/pyspark/sql/group.py#L238
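
A rough sketch of the idea, not the code in the PR: if the return type is validated once, when the grouped UDF is applied, the worker-side wrapper can branch on `isinstance(return_type, StructType)` without validating again. The classes below are stand-ins for the `pyspark.sql.types` classes so the snippet runs without Spark, and `check_grouped_return_type` and `wrap_kind` are hypothetical names used only for illustration.

```python
# Stand-ins for the pyspark.sql.types classes, so the sketch runs without Spark.
class DataType(object):
    pass


class StructType(DataType):
    pass


class DoubleType(DataType):
    pass


def check_grouped_return_type(return_type):
    """Hypothetical early check, run once when the grouped UDF is applied."""
    if not isinstance(return_type, StructType):
        raise ValueError(
            "returnType of a grouped pandas_udf should be a StructType, got %s"
            % type(return_type).__name__
        )
    return return_type


def wrap_kind(return_type):
    """Hypothetical worker-side dispatch: branches on the type, no re-validation."""
    return "grouped" if isinstance(return_type, StructType) else "scalar"
```

With the early check in place, a non-struct return type fails at the call site, and the worker only has to decide which wrapper to use.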


