GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/20217
[SPARK-23026] [PySpark] Add RegisterUDF to PySpark

## What changes were proposed in this pull request?

Add a new API for registering row-at-a-time or scalar vectorized UDFs. The registered UDFs can be used in SQL statements. For example,

```
>>> from pyspark.sql.types import IntegerType
>>> from pyspark.sql.functions import udf
>>> slen = udf(lambda s: len(s), IntegerType())
>>> _ = spark.udf.registerUDF("slen", slen)
>>> spark.sql("SELECT slen('test')").collect()
[Row(slen(test)=4)]

>>> import random
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import IntegerType
>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
>>> newRandom_udf = spark.catalog.registerUDF("random_udf", random_udf)
>>> spark.sql("SELECT random_udf()").collect()
[Row(random_udf()=82)]
>>> spark.range(1).select(newRandom_udf()).collect()
[Row(random_udf()=62)]

>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
>>> @pandas_udf("integer", PandasUDFType.SCALAR)
... def add_one(x):
...     return x + 1
...
>>> _ = spark.udf.registerUDF("add_one", add_one)
>>> spark.sql("SELECT add_one(id) FROM range(10)").collect()
```

## How was this patch tested?
Added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark registerUDF

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20217

----
commit f25669a4b6c2298359df1b9083037468652cd141
Author: gatorsmile <gatorsmile@...>
Date:   2018-01-10T10:24:08Z

    fix

----