[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

viirya Thu, 14 Sep 2017 01:45:11 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19147#discussion_r138831553
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2111,6 +2126,53 @@ def wrapper(*args):
             return wrapper
     
     
    +def _udf(f, returnType, vectorized):
    +    udf_obj = UserDefinedFunction(f, returnType, vectorized=vectorized)
    +    return udf_obj._wrapped()
    +
    +
    +if _have_pandas and _have_arrow:
    +
    +    @since(2.3)
    +    def pandas_udf(f=None, returnType=StringType()):
    +        """
    +        Creates a :class:`Column` expression representing a vectorized user defined function (UDF).
    +
    +        .. note:: The vectorized user-defined functions must be deterministic. Due to optimization,
    +            duplicate invocations may be eliminated or the function may even be invoked more times
    +            than it is present in the query.
    --- End diff --
    
    Should we explain in more detail what a vectorized UDF is, along with its expected input parameters and outputs?
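
    For illustration only, here is a minimal sketch of the kind of example the docstring could carry. It assumes the vectorized function receives each input column as a `pandas.Series` and must return a `pandas.Series` of the same length; that contract is an assumption, not something the quoted diff states. `add_one` and `check_vectorized_contract` are hypothetical names, and the sketch uses plain pandas so it runs without Spark.

```python
import pandas as pd


def add_one(v):
    # Hypothetical vectorized function: operates on a whole batch of values
    # (a pandas.Series) at once instead of one row at a time.
    return v + 1


def check_vectorized_contract(f, *columns):
    """Hypothetical helper, not part of PySpark.

    Calls f on the given pandas.Series inputs and verifies the assumed
    contract: the result is a pandas.Series with one value per input row.
    """
    result = f(*columns)
    if not isinstance(result, pd.Series):
        raise TypeError("expected pandas.Series, got %s" % type(result).__name__)
    if len(result) != len(columns[0]):
        raise ValueError("output length must match input length")
    return result


batch = pd.Series([1, 2, 3])
out = check_vectorized_contract(add_one, batch)
```

    Because the function sees whole batches, the determinism note in the docstring would apply per call: the same input Series should always produce the same output Series.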


