GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20217

    [SPARK-23026] [PySpark] Add RegisterUDF to PySpark

    ## What changes were proposed in this pull request?
    Add a new API for registering row-at-a-time or scalar vectorized UDFs. The registered UDFs can be used in SQL statements. For example:
    
    ```
    >>> from pyspark.sql.types import IntegerType
    >>> from pyspark.sql.functions import udf
    >>> slen = udf(lambda s: len(s), IntegerType())
    >>> _ = spark.udf.registerUDF("slen", slen)
    >>> spark.sql("SELECT slen('test')").collect()
    [Row(slen(test)=4)]
    
    >>> import random
    >>> from pyspark.sql.functions import udf
    >>> from pyspark.sql.types import IntegerType
    >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    >>> newRandom_udf = spark.catalog.registerUDF("random_udf", random_udf)
    >>> spark.sql("SELECT random_udf()").collect()  
    [Row(random_udf()=82)]
    >>> spark.range(1).select(newRandom_udf()).collect()  
    [Row(random_udf()=62)]
    
    >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    >>> @pandas_udf("integer", PandasUDFType.SCALAR)  
    ... def add_one(x):
    ...     return x + 1
    ...
    >>> _ = spark.udf.registerUDF("add_one", add_one)  
    >>> spark.sql("SELECT add_one(id) FROM range(10)").collect()  
    ```
    ## How was this patch tested?
    Added test cases.
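
The description above distinguishes row-at-a-time UDFs from scalar vectorized ones. As a rough illustration of that difference, here is a toy, Spark-free sketch of a name-to-function registry. It is not how Spark implements registration; `ToyUDFRegistry`, `register_udf`, and `call` are hypothetical names invented for this example, and plain Python lists stand in for columns.

```python
# Toy model only: this is NOT Spark's implementation, and every name here
# (ToyUDFRegistry, register_udf, call) is hypothetical.
from typing import Callable, Dict, List, Tuple


class ToyUDFRegistry:
    """Maps a SQL-style function name to a Python callable."""

    def __init__(self) -> None:
        self._udfs: Dict[str, Tuple[Callable, bool]] = {}

    def register_udf(self, name: str, func: Callable,
                     vectorized: bool = False) -> Callable:
        # Store the callable under the given name and hand it back to the
        # caller (an assumption modeled on the examples, which bind the
        # return value of the registration call).
        self._udfs[name] = (func, vectorized)
        return func

    def call(self, name: str, column: List) -> List:
        func, vectorized = self._udfs[name]
        if vectorized:
            # Scalar vectorized: a single call receives the whole batch.
            return func(column)
        # Row-at-a-time: one call per value in the column.
        return [func(value) for value in column]


registry = ToyUDFRegistry()
registry.register_udf("slen", lambda s: len(s))
registry.register_udf("add_one", lambda xs: [x + 1 for x in xs],
                      vectorized=True)

slen_result = registry.call("slen", ["test", "spark"])
add_one_result = registry.call("add_one", list(range(3)))
print(slen_result)     # [4, 5]
print(add_one_result)  # [1, 2, 3]
```

In this sketch the only difference between the two kinds is how often the function is invoked: once per value for `slen`, once per batch for `add_one`.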

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark registerUDF

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20217
    
----
commit f25669a4b6c2298359df1b9083037468652cd141
Author: gatorsmile <gatorsmile@...>
Date:   2018-01-10T10:24:08Z

    fix

----


---

