[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162448507
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -181,3 +183,179 @@ def asNondeterministic(self):
 """
 self.deterministic = False
 return self
+
+
+class UDFRegistration(object):
+"""
+Wrapper for user-defined function registration. This instance can be accessed by
+:attr:`spark.udf` or :attr:`sqlContext.udf`.
+
+.. versionadded:: 1.3.1
+"""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+@since("1.3.1")
+def register(self, name, f, returnType=None):
+"""Registers a Python function (including lambda function) or a user-defined function
+in SQL statements.
+
+:param name: name of the user-defined function in SQL statements.
+:param f: a Python function, or a user-defined function. The user-defined function can
+be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
+:meth:`pyspark.sql.functions.pandas_udf`.
+:param returnType: the return type of the registered user-defined function.
+:return: a user-defined function.
+
+`returnType` can be optionally specified when `f` is a Python function but not
+when `f` is a user-defined function. Please see below.
--- End diff --

Could you add another paragraph explaining how to register a non-deterministic Python function? This sounds like a common question from end users.
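For reference, a minimal sketch of what such a paragraph could demonstrate, reusing the `asNondeterministic` doctest already in this PR (assumes a live `spark` session):

    import random
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    # Wrap the Python function, mark it non-deterministic, then register it by name.
    random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    new_random_udf = spark.udf.register("random_udf", random_udf)

    # The registered name works in SQL; the returned UDF works in the DataFrame API.
    spark.sql("SELECT random_udf()").collect()
    spark.range(1).select(new_random_udf()).collect()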


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20288


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162229716
  
--- Diff: python/pyspark/sql/context.py ---
@@ -29,9 +29,10 @@
 from pyspark.sql.readwriter import DataFrameReader
 from pyspark.sql.streaming import DataStreamReader
 from pyspark.sql.types import IntegerType, Row, StringType
+from pyspark.sql.udf import UDFRegistration
--- End diff --

I intentionally kept this to retain the import path `pyspark.sql.context.UDFRegistration`, just in case.
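A quick sketch of the compatibility this preserves (a hypothetical check, not part of the PR):

    # The class now lives in pyspark.sql.udf, but the old import path keeps working.
    from pyspark.sql.udf import UDFRegistration as from_new_path
    from pyspark.sql.context import UDFRegistration as from_old_path

    assert from_new_path is from_old_path  # one class, two import paths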


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162230241
  
--- Diff: python/pyspark/sql/context.py ---
@@ -172,113 +173,29 @@ def range(self, start, end=None, step=1, numPartitions=None):
 """
 return self.sparkSession.range(start, end, step, numPartitions)
 
-@ignore_unicode_prefix
 @since(1.2)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
->>> sqlContext.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = sqlContext.udf.register("slen", slen)
->>> sqlContext.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
->>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
->>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
+"""An alias for :func:`spark.udf.register`.
+See :meth:`pyspark.sql.UDFRegistration.register`.
+
+.. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead.
--- End diff --

It renders the doc as below:

![2018-01-18 10 28 46](https://user-images.githubusercontent.com/6477701/35076515-379756f4-fc3c-11e7-99db-447fb466c626.png)

I checked the link; `pyspark.sql.UDFRegistration.register` is correct.
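As a usage-level sketch, the deprecated alias still delegates to the new API but warns first (assumes a `sqlContext` built from this branch):

    import warnings

    # registerFunction still works, but now emits a DeprecationWarning
    # before delegating to spark.udf.register.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        sqlContext.registerFunction("twice", lambda x: x * 2)

    assert any(issubclass(w.category, DeprecationWarning) for w in caught)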



---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162198033
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -224,92 +224,20 @@ def dropGlobalTempView(self, viewName):
 """
 self._jcatalog.dropGlobalTempView(viewName)
 
-@ignore_unicode_prefix
-@since(2.0)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = spark.catalog.registerFunction("stringLengthString", len)
->>> spark.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = spark.udf.register("slen", slen)
->>> spark.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
->>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
->>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-
-# This is to check whether the input function is a wrapped/native UserDefinedFunction
-if hasattr(f, 'asNondeterministic'):
-if returnType is not None:
-raise TypeError(
-"Invalid returnType: None is expected when f is a UserDefinedFunction, "
-"but got %s." % returnType)
-if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
-  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
-raise ValueError(
-"Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
-register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
-   evalType=f.evalType,
-   deterministic=f.deterministic)
-return_udf = f
-else:
-if returnType is None:
-returnType = StringType()
-register_udf = 
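From the caller's side, the `hasattr(f, 'asNondeterministic')` branch above means an already-wrapped UDF must not be registered with an explicit `returnType`; a small sketch (assumes a live `spark` session):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    slen = udf(lambda s: len(s), IntegerType())

    # A wrapped UDF already carries its return type, so passing returnType
    # again trips the TypeError raised in the diff above.
    try:
        spark.catalog.registerFunction("slen", slen, IntegerType())
    except TypeError as e:
        print(e)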

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162089110
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -224,92 +224,20 @@ def dropGlobalTempView(self, viewName):
 """
 self._jcatalog.dropGlobalTempView(viewName)
 
-@ignore_unicode_prefix
-@since(2.0)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = spark.catalog.registerFunction("stringLengthString", len)
->>> spark.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = spark.udf.register("slen", slen)
->>> spark.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
->>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
->>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-
-# This is to check whether the input function is a wrapped/native UserDefinedFunction
-if hasattr(f, 'asNondeterministic'):
-if returnType is not None:
-raise TypeError(
-"Invalid returnType: None is expected when f is a UserDefinedFunction, "
-"but got %s." % returnType)
-if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
-  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
-raise ValueError(
-"Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
-register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
-   evalType=f.evalType,
-   deterministic=f.deterministic)
-return_udf = f
-else:
-if returnType is None:
-returnType = StringType()
-register_udf = 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162035869
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -181,3 +183,179 @@ def asNondeterministic(self):
 """
 self.deterministic = False
 return self
+
+
+class UDFRegistration(object):
+"""
+Wrapper for user-defined function registration.
+
+.. versionadded:: 1.3.1
+"""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+@since(1.3)
+def register(self, name, f, returnType=None):
+"""Registers a Python function (including lambda function) or a user-defined function
+in SQL statements.
+
+:param name: name of the user-defined function in SQL statements.
+:param f: a Python function, or a user-defined function. The user-defined function can
+be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
+:meth:`pyspark.sql.functions.pandas_udf`.
+:param returnType: the return type of the registered user-defined function.
+:return: a user-defined function.
+
+`returnType` can be optionally specified when `f` is a Python function but not
+when `f` is a user-defined function. Please see below.
+
+1. When `f` is a Python function:
+
+`returnType` defaults to string type and can be optionally specified. The produced
+object must match the specified type. In this case, this API works as if
+`register(name, f, returnType=StringType())`.
+
+>>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
+>>> spark.sql("SELECT stringLengthString('test')").collect()
+[Row(stringLengthString(test)=u'4')]
+
+>>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
+[Row(stringLengthString(text)=u'3')]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+2. When `f` is a user-defined function:
+
+Spark uses the return type of the given user-defined function as the return type of
+the registered user-defined function. `returnType` should not be specified.
+In this case, this API works as if `register(name, f)`.
+
+>>> from pyspark.sql.types import IntegerType
+>>> from pyspark.sql.functions import udf
+>>> slen = udf(lambda s: len(s), IntegerType())
+>>> _ = spark.udf.register("slen", slen)
+>>> spark.sql("SELECT slen('test')").collect()
+[Row(slen(test)=4)]
+
+>>> import random
+>>> from pyspark.sql.functions import udf
+>>> from pyspark.sql.types import IntegerType
+>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
+>>> new_random_udf = spark.udf.register("random_udf", random_udf)
+>>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
+[Row(random_udf()=82)]
+
+>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+>>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
+... def add_one(x):
+... return x + 1
+...
+>>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
+>>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
+[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
+
+.. note:: Registration for a user-defined function (case 2.) was added from
+Spark 2.3.0.
+"""
+
+# This is to check whether the input function is from a user-defined function or
+# Python function.
+if hasattr(f, 'asNondeterministic'):
+if returnType is not None:
+raise TypeError(
+"Invalid returnType: data type can not be specified when f is"
+"a user-defined function, but 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162035576
  
--- Diff: python/pyspark/sql/context.py ---
@@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
 """
 return self.sparkSession.range(start, end, step, numPartitions)
 
-@ignore_unicode_prefix
-@since(1.2)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
->>> sqlContext.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = sqlContext.udf.register("slen", slen)
->>> sqlContext.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
->>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
->>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-return self.sparkSession.catalog.registerFunction(name, f, returnType)
+warnings.warn(
+"Deprecated in 2.3.0. Use spark.udf.register instead.",
+DeprecationWarning)
+return self.sparkSession.udf.register(name, f, returnType)
+# Reuse the docstring from UDFRegistration but with few notes.
+_register_doc = UDFRegistration.register.__doc__.strip()
+registerFunction.__doc__ = """%s
 
-@ignore_unicode_prefix
-@since(2.1)
-def registerJavaFunction(self, name, javaClassName, returnType=None):
-"""Register a java UDF so it can be used in SQL statements.
-
-In addition to a name and the function itself, the return type can be optionally specified.
-When the return type is not specified we would infer it via reflection.
-:param name:  name of the UDF
-:param javaClassName: fully 
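The docstring-reuse pattern in the `+` lines above can be sketched standalone (hypothetical minimal classes, not Spark's actual ones):

    class UDFRegistration(object):
        def register(self, name, f, returnType=None):
            """Registers a Python function or a user-defined function."""

    class SQLContext(object):
        def registerFunction(self, name, f, returnType=None):
            pass  # delegates to spark.udf.register in the real diff

        # Reuse the canonical docstring and append a deprecation note.
        registerFunction.__doc__ = UDFRegistration.register.__doc__.strip() + """

        .. note:: Deprecated in 2.3.0. Use :func:`spark.udf.register` instead."""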

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162035758
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -181,3 +183,179 @@ def asNondeterministic(self):
 """
 self.deterministic = False
 return self
+
+
+class UDFRegistration(object):
+"""
+Wrapper for user-defined function registration.
+
+.. versionadded:: 1.3.1
+"""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+@since(1.3)
+def register(self, name, f, returnType=None):
+"""Registers a Python function (including lambda function) or a user-defined function
+in SQL statements.
+
+:param name: name of the user-defined function in SQL statements.
+:param f: a Python function, or a user-defined function. The user-defined function can
+be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
+:meth:`pyspark.sql.functions.pandas_udf`.
+:param returnType: the return type of the registered user-defined function.
+:return: a user-defined function.
+
+`returnType` can be optionally specified when `f` is a Python function but not
+when `f` is a user-defined function. Please see below.
+
+1. When `f` is a Python function:
+
+`returnType` defaults to string type and can be optionally specified. The produced
+object must match the specified type. In this case, this API works as if
+`register(name, f, returnType=StringType())`.
+
+>>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
+>>> spark.sql("SELECT stringLengthString('test')").collect()
+[Row(stringLengthString(test)=u'4')]
+
+>>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
+[Row(stringLengthString(text)=u'3')]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+2. When `f` is a user-defined function:
+
+Spark uses the return type of the given user-defined function as the return type of
+the registered user-defined function. `returnType` should not be specified.
+In this case, this API works as if `register(name, f)`.
+
+>>> from pyspark.sql.types import IntegerType
+>>> from pyspark.sql.functions import udf
+>>> slen = udf(lambda s: len(s), IntegerType())
+>>> _ = spark.udf.register("slen", slen)
+>>> spark.sql("SELECT slen('test')").collect()
+[Row(slen(test)=4)]
+
+>>> import random
+>>> from pyspark.sql.functions import udf
+>>> from pyspark.sql.types import IntegerType
+>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
+>>> new_random_udf = spark.udf.register("random_udf", random_udf)
+>>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
+[Row(random_udf()=82)]
+
+>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+>>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
+... def add_one(x):
+... return x + 1
+...
+>>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
+>>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
+[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
+
+.. note:: Registration for a user-defined function (case 2.) was added from
+Spark 2.3.0.
+"""
--- End diff --

![documentation screenshot](https://user-images.githubusercontent.com/6477701/35042729-1acaa234-fbcd-11e7-9d3f-4e94dc200e2c.png)



---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162035427
  
--- Diff: python/pyspark/sql/context.py ---
@@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
 """
 return self.sparkSession.range(start, end, step, numPartitions)
 
-@ignore_unicode_prefix
-@since(1.2)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
->>> sqlContext.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = sqlContext.udf.register("slen", slen)
->>> sqlContext.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
->>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
->>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-return self.sparkSession.catalog.registerFunction(name, f, returnType)
+warnings.warn(
+"Deprecated in 2.3.0. Use spark.udf.register instead.",
+DeprecationWarning)
+return self.sparkSession.udf.register(name, f, returnType)
+# Reuse the docstring from UDFRegistration but with few notes.
+_register_doc = UDFRegistration.register.__doc__.strip()
+registerFunction.__doc__ = """%s
 
-@ignore_unicode_prefix
-@since(2.1)
-def registerJavaFunction(self, name, javaClassName, returnType=None):
-"""Register a java UDF so it can be used in SQL statements.
-
-In addition to a name and the function itself, the return type can be optionally specified.
-When the return type is not specified we would infer it via reflection.
-:param name:  name of the UDF
-:param javaClassName: fully 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162035316
  
--- Diff: python/pyspark/sql/context.py ---
@@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
 """
 return self.sparkSession.range(start, end, step, numPartitions)
 
-@ignore_unicode_prefix
-@since(1.2)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
->>> sqlContext.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = sqlContext.udf.register("slen", slen)
->>> sqlContext.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
->>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
->>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-return self.sparkSession.catalog.registerFunction(name, f, returnType)
+warnings.warn(
+"Deprecated in 2.3.0. Use spark.udf.register instead.",
+DeprecationWarning)
+return self.sparkSession.udf.register(name, f, returnType)
+# Reuse the docstring from UDFRegistration but with few notes.
+_register_doc = UDFRegistration.register.__doc__.strip()
+registerFunction.__doc__ = """%s
 
-@ignore_unicode_prefix
-@since(2.1)
-def registerJavaFunction(self, name, javaClassName, returnType=None):
-"""Register a java UDF so it can be used in SQL statements.
-
-In addition to a name and the function itself, the return type can be optionally specified.
-When the return type is not specified we would infer it via reflection.
-:param name:  name of the UDF
-:param javaClassName: fully 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162035180
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -224,92 +224,20 @@ def dropGlobalTempView(self, viewName):
 """
 self._jcatalog.dropGlobalTempView(viewName)
 
-@ignore_unicode_prefix
-@since(2.0)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = spark.catalog.registerFunction("stringLengthString", len)
->>> spark.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = spark.udf.register("slen", slen)
->>> spark.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
->>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
->>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-
-# This is to check whether the input function is a wrapped/native UserDefinedFunction
-if hasattr(f, 'asNondeterministic'):
-if returnType is not None:
-raise TypeError(
-"Invalid returnType: None is expected when f is a UserDefinedFunction, "
-"but got %s." % returnType)
-if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
-  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
-raise ValueError(
-"Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
-register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
-   evalType=f.evalType,
-   deterministic=f.deterministic)
-return_udf = f
-else:
-if returnType is None:
-returnType = StringType()
-register_udf = 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162031948
  
--- Diff: python/pyspark/sql/context.py ---
@@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
 """
 return self.sparkSession.range(start, end, step, numPartitions)
 
-@ignore_unicode_prefix
-@since(1.2)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
->>> sqlContext.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = sqlContext.udf.register("slen", slen)
->>> sqlContext.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
->>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
->>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-return self.sparkSession.catalog.registerFunction(name, f, returnType)
+warnings.warn(
+"Deprecated in 2.3.0. Use spark.udf.register instead.",
+DeprecationWarning)
+return self.sparkSession.udf.register(name, f, returnType)
+# Reuse the docstring from UDFRegistration but with few notes.
+_register_doc = UDFRegistration.register.__doc__.strip()
+registerFunction.__doc__ = """%s
 
-@ignore_unicode_prefix
-@since(2.1)
-def registerJavaFunction(self, name, javaClassName, returnType=None):
-"""Register a java UDF so it can be used in SQL statements.
-
-In addition to a name and the function itself, the return type can be optionally specified.
-When the return type is not specified we would infer it via reflection.
-:param name:  name of the UDF
-:param javaClassName: fully 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162031848
  
--- Diff: python/pyspark/sql/context.py ---
@@ -172,113 +173,34 @@ def range(self, start, end=None, step=1, numPartitions=None):
 """
 return self.sparkSession.range(start, end, step, numPartitions)
 
-@ignore_unicode_prefix
-@since(1.2)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`sqlContext.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = sqlContext.registerFunction("stringLengthString", lambda x: len(x))
->>> sqlContext.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> sqlContext.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.registerFunction("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
->>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = sqlContext.udf.register("slen", slen)
->>> sqlContext.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = sqlContext.registerFunction("random_udf", random_udf)
->>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> sqlContext.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = sqlContext.udf.register("add_one", add_one)  # doctest: +SKIP
->>> sqlContext.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-return self.sparkSession.catalog.registerFunction(name, f, returnType)
+warnings.warn(
+"Deprecated in 2.3.0. Use spark.udf.register instead.",
+DeprecationWarning)
+return self.sparkSession.udf.register(name, f, returnType)
+# Reuse the docstring from UDFRegistration but with few notes.
+_register_doc = UDFRegistration.register.__doc__.strip()
+registerFunction.__doc__ = """%s
 
-@ignore_unicode_prefix
-@since(2.1)
-def registerJavaFunction(self, name, javaClassName, returnType=None):
-"""Register a java UDF so it can be used in SQL statements.
-
-In addition to a name and the function itself, the return type can be optionally specified.
-When the return type is not specified we would infer it via reflection.
-:param name:  name of the UDF
-:param javaClassName: fully 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162031680
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -181,3 +183,180 @@ def asNondeterministic(self):
 """
 self.deterministic = False
 return self
+
+
+class UDFRegistration(object):
+"""
+Wrapper for user-defined function registration.
+
+.. versionadded:: 1.3.1
+"""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+@since(1.3)
+def register(self, name, f, returnType=None):
+"""Registers a Python function (including lambda function) or a user-defined function
+in SQL statements.
+
+:param name: name of the user-defined function in SQL statements.
+:param f: a Python function, or a user-defined function. The user-defined function can
+be either row-at-a-time or vectorized. See :meth:`pyspark.sql.functions.udf` and
+:meth:`pyspark.sql.functions.pandas_udf`.
+:param returnType: the return type of the registered user-defined function.
+:return: a user-defined function.
+
+`returnType` can be optionally specified when `f` is a Python function but not
+when `f` is a user-defined function. See below:
+
+1. When `f` is a Python function:
+
+`returnType` defaults to string type and can be optionally specified. The produced
+object must match the specified type. In this case, this API works as if
+`register(name, f, returnType=StringType())`.
+
+>>> strlen = spark.udf.register("stringLengthString", lambda x: len(x))
+>>> spark.sql("SELECT stringLengthString('test')").collect()
+[Row(stringLengthString(test)=u'4')]
+
+>>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
+[Row(stringLengthString(text)=u'3')]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+
+2. When `f` is a user-defined function:
+
+Spark uses the return type of the given user-defined function as the return type of
+the registered user-defined function. `returnType` should not be specified.
+In this case, this API works as if `register(name, f)`.
+
+>>> from pyspark.sql.types import IntegerType
+>>> from pyspark.sql.functions import udf
+>>> slen = udf(lambda s: len(s), IntegerType())
+>>> _ = spark.udf.register("slen", slen)
+>>> spark.sql("SELECT slen('test')").collect()
+[Row(slen(test)=4)]
+
+>>> import random
+>>> from pyspark.sql.functions import udf
+>>> from pyspark.sql.types import IntegerType
+>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
+>>> new_random_udf = spark.udf.register("random_udf", random_udf)
+>>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
+[Row(random_udf()=82)]
+
+>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+>>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
+... def add_one(x):
+... return x + 1
+...
+>>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
+>>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
+[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
+
+.. note:: Registration for a user-defined function (case 2.) was added from
+Spark 2.3.0.
+"""
+
+# This is to check whether the input function is from a user-defined function or
+# Python function.
+if hasattr(f, 'asNondeterministic'):
+if returnType is not None:
+raise TypeError(
+"Invalid returnType: data type can not be specified when f is"
+"a user-defined function, but got 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r162031475
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -181,3 +183,180 @@ def asNondeterministic(self):
 """
 self.deterministic = False
 return self
+
+
+class UDFRegistration(object):
--- End diff --

This seems to have been introduced in 1.3.1 - https://issues.apache.org/jira/browse/SPARK-6603


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161983722
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -224,92 +225,18 @@ def dropGlobalTempView(self, viewName):
 """
 self._jcatalog.dropGlobalTempView(viewName)
 
-@ignore_unicode_prefix
-@since(2.0)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a :class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for :func:`spark.catalog.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. The produced object must
-match the specified type. 2) When f is a :class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. The input parameter
-`returnType` is None by default. If given by users, the value must be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = spark.catalog.registerFunction("stringLengthString", len)
->>> spark.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> spark.sql("SELECT 'foo' AS text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.catalog.registerFunction("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = spark.udf.register("slen", slen)
->>> spark.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
->>> new_random_udf = spark.catalog.registerFunction("random_udf", random_udf)
->>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> spark.range(1).select(new_random_udf()).collect()  # doctest: +SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
->>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-
-# This is to check whether the input function is a wrapped/native UserDefinedFunction
-if hasattr(f, 'asNondeterministic'):
-if returnType is not None:
-raise TypeError(
-"Invalid returnType: None is expected when f is a UserDefinedFunction, "
-"but got %s." % returnType)
-if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
-  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
-raise ValueError(
-"Invalid f: f must be either SQL_BATCHED_UDF or SQL_PANDAS_SCALAR_UDF")
-register_udf = UserDefinedFunction(f.func, returnType=f.returnType, name=name,
-   evalType=f.evalType,
-   deterministic=f.deterministic)
-return_udf = f
-else:
-if returnType is None:
-returnType = StringType()
-register_udf = 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161969687
  
--- Diff: python/pyspark/sql/context.py ---
@@ -147,7 +147,8 @@ def udf(self):
 
 :return: :class:`UDFRegistration`
 """
-return UDFRegistration(self)
+from pyspark.sql.session import UDFRegistration
+return UDFRegistration(self.sparkSession)
--- End diff --

How about `return self.sparkSession.udf`?
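That would make the property delegate to the session's registration object instead of constructing a new wrapper on each access; a sketch of the resulting property (simplified, assumed shape):

    @property
    def udf(self):
        """Returns a :class:`UDFRegistration` for UDF registration.

        :return: :class:`UDFRegistration`
        """
        # Reuse the SparkSession's UDFRegistration rather than
        # building a fresh wrapper per call.
        return self.sparkSession.udf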


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161966469
  
--- Diff: python/pyspark/sql/session.py ---
@@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
 self.stop()
 
 
+class UDFRegistration(object):
+"""Wrapper for user-defined function registration."""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+def register(self, name, f, returnType=None):
+"""Registers a Python function (including lambda function) or a 
user-defined function
+in SQL statements.
+
+:param name: name of the user-defined function in SQL statements.
+:param f: a Python function, or a user-defined function. The 
user-defined function can
+be either row-at-a-time or vectorized. See 
:meth:`pyspark.sql.functions.udf` and
+:meth:`pyspark.sql.functions.pandas_udf`.
+:param returnType: the return type of the registered user-defined 
function.
+:return: a user-defined function.
+
+`returnType` can be optionally specified when `f` is a Python 
function but not
+when `f` is a user-defined function. See below:
+
+1. When `f` is a Python function, `returnType` defaults to string 
type and can be
+optionally specified. The produced object must match the specified 
type. In this case,
+this API works as if `register(name, f, returnType=StringType())`.
+
+>>> strlen = spark.udf.register("stringLengthString", lambda 
x: len(x))
+>>> spark.sql("SELECT stringLengthString('test')").collect()
+[Row(stringLengthString(test)=u'4')]
+
+>>> spark.sql("SELECT 'foo' AS 
text").select(strlen("text")).collect()
+[Row(stringLengthString(text)=u'3')]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+2. When `f` is a user-defined function, Spark uses the return type 
of the given a
+user-defined function as the return type of the registered a 
user-defined function.
--- End diff --

the registered a user-defined function -> the registered user-defined function
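
As a side note, the two `returnType` cases that this docstring describes can be
illustrated as follows (a minimal sketch, assuming a running `spark` session):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    # Case 1: a plain Python function -- returnType may be passed
    # (it defaults to string type when omitted).
    spark.udf.register("plus_one", lambda x: x + 1, IntegerType())

    # Case 2: an already-built UDF -- its own return type is reused, so
    # passing returnType here would raise a TypeError.
    slen = udf(lambda s: len(s), IntegerType())
    spark.udf.register("slen", slen)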


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161964247
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -224,92 +225,18 @@ def dropGlobalTempView(self, viewName):
 """
 self._jcatalog.dropGlobalTempView(viewName)
 
-@ignore_unicode_prefix
-@since(2.0)
 def registerFunction(self, name, f, returnType=None):
-"""Registers a Python function (including lambda function) or a 
:class:`UserDefinedFunction`
-as a UDF. The registered UDF can be used in SQL statements.
-
-:func:`spark.udf.register` is an alias for 
:func:`spark.catalog.registerFunction`.
-
-In addition to a name and the function itself, `returnType` can be 
optionally specified.
-1) When f is a Python function, `returnType` defaults to a string. 
The produced object must
-match the specified type. 2) When f is a 
:class:`UserDefinedFunction`, Spark uses the return
-type of the given UDF as the return type of the registered UDF. 
The input parameter
-`returnType` is None by default. If given by users, the value must 
be None.
-
-:param name: name of the UDF in SQL statements.
-:param f: a Python function, or a wrapped/native 
UserDefinedFunction. The UDF can be either
-row-at-a-time or vectorized.
-:param returnType: the return type of the registered UDF.
-:return: a wrapped/native :class:`UserDefinedFunction`
-
->>> strlen = spark.catalog.registerFunction("stringLengthString", 
len)
->>> spark.sql("SELECT stringLengthString('test')").collect()
-[Row(stringLengthString(test)=u'4')]
-
->>> spark.sql("SELECT 'foo' AS 
text").select(strlen("text")).collect()
-[Row(stringLengthString(text)=u'3')]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.catalog.registerFunction("stringLengthInt", len, 
IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
->>> spark.sql("SELECT stringLengthInt('test')").collect()
-[Row(stringLengthInt(test)=4)]
-
->>> from pyspark.sql.types import IntegerType
->>> from pyspark.sql.functions import udf
->>> slen = udf(lambda s: len(s), IntegerType())
->>> _ = spark.udf.register("slen", slen)
->>> spark.sql("SELECT slen('test')").collect()
-[Row(slen(test)=4)]
-
->>> import random
->>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType
->>> random_udf = udf(lambda: random.randint(0, 100), 
IntegerType()).asNondeterministic()
->>> new_random_udf = spark.catalog.registerFunction("random_udf", 
random_udf)
->>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=82)]
->>> spark.range(1).select(new_random_udf()).collect()  # doctest: 
+SKIP
-[Row(()=26)]
-
->>> from pyspark.sql.functions import pandas_udf, PandasUDFType
->>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
-... def add_one(x):
-... return x + 1
-...
->>> _ = spark.udf.register("add_one", add_one)  # doctest: +SKIP
->>> spark.sql("SELECT add_one(id) FROM range(3)").collect()  # 
doctest: +SKIP
-[Row(add_one(id)=1), Row(add_one(id)=2), Row(add_one(id)=3)]
-"""
-
-# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
-if hasattr(f, 'asNondeterministic'):
-if returnType is not None:
-raise TypeError(
-"Invalid returnType: None is expected when f is a 
UserDefinedFunction, "
-"but got %s." % returnType)
-if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
-  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
-raise ValueError(
-"Invalid f: f must be either SQL_BATCHED_UDF or 
SQL_PANDAS_SCALAR_UDF")
-register_udf = UserDefinedFunction(f.func, 
returnType=f.returnType, name=name,
-   evalType=f.evalType,
-   
deterministic=f.deterministic)
-return_udf = f
-else:
-if returnType is None:
-returnType = StringType()
-register_udf = 

[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161964383
  
--- Diff: python/pyspark/sql/context.py ---
@@ -147,7 +147,8 @@ def udf(self):
 
 :return: :class:`UDFRegistration`
 """
-return UDFRegistration(self)
+from pyspark.sql.session import UDFRegistration
--- End diff --

Why do we import `UDFRegistration` here again? Isn't it imported at the top?


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161966507
  
--- Diff: python/pyspark/sql/session.py ---
@@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
 self.stop()
 
 
+class UDFRegistration(object):
+"""Wrapper for user-defined function registration."""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+def register(self, name, f, returnType=None):
+"""Registers a Python function (including lambda function) or a 
user-defined function
+in SQL statements.
+
+:param name: name of the user-defined function in SQL statements.
+:param f: a Python function, or a user-defined function. The 
user-defined function can
+be either row-at-a-time or vectorized. See 
:meth:`pyspark.sql.functions.udf` and
+:meth:`pyspark.sql.functions.pandas_udf`.
+:param returnType: the return type of the registered user-defined 
function.
+:return: a user-defined function.
+
+`returnType` can be optionally specified when `f` is a Python 
function but not
+when `f` is a user-defined function. See below:
+
+1. When `f` is a Python function, `returnType` defaults to string 
type and can be
+optionally specified. The produced object must match the specified 
type. In this case,
+this API works as if `register(name, f, returnType=StringType())`.
+
+>>> strlen = spark.udf.register("stringLengthString", lambda 
x: len(x))
+>>> spark.sql("SELECT stringLengthString('test')").collect()
+[Row(stringLengthString(test)=u'4')]
+
+>>> spark.sql("SELECT 'foo' AS 
text").select(strlen("text")).collect()
+[Row(stringLengthString(text)=u'3')]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+>>> from pyspark.sql.types import IntegerType
+>>> _ = spark.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
+>>> spark.sql("SELECT stringLengthInt('test')").collect()
+[Row(stringLengthInt(test)=4)]
+
+2. When `f` is a user-defined function, Spark uses the return type 
of the given a
--- End diff --

of the given a user-defined function -> of the given user-defined function



---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161965369
  
--- Diff: python/pyspark/sql/session.py ---
@@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
 self.stop()
 
 
+class UDFRegistration(object):
+"""Wrapper for user-defined function registration."""
+
+def __init__(self, sparkSession):
+self.sparkSession = sparkSession
+
+@ignore_unicode_prefix
+def register(self, name, f, returnType=None):
--- End diff --

shall we add `since 2.3`?
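
That would look something like the following (a sketch; the exact version
string is whatever release the method lands in):

    from pyspark import since

    @ignore_unicode_prefix
    @since(2.3)
    def register(self, name, f, returnType=None):
        ...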


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161965278
  
--- Diff: python/pyspark/sql/context.py ---
@@ -624,6 +536,9 @@ def _test():
 globs['os'] = os
 globs['sc'] = sc
 globs['sqlContext'] = SQLContext(sc)
+# 'spark' alias is a small hack for reusing doctests. Please see the reassignment
+# of docstrings above.
+globs['spark'] = globs['sqlContext']
--- End diff --

shall we do `globs['spark'] = globs['sqlContext'].sparkSession`?
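
With that change the reused doctests would exercise the real `SparkSession`
entry point; a sketch of the resulting `_test()` setup (assuming the
surrounding scaffolding stays as-is):

    globs['sc'] = sc
    globs['sqlContext'] = SQLContext(sc)
    # Resolve the 'spark' alias to the session itself rather than to the
    # SQLContext wrapper, so `spark.udf.register` hits SparkSession.
    globs['spark'] = globs['sqlContext'].sparkSession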


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20288#discussion_r161964738
  
--- Diff: python/pyspark/sql/session.py ---
@@ -778,6 +778,146 @@ def __exit__(self, exc_type, exc_val, exc_tb):
 self.stop()
 
 
+class UDFRegistration(object):
--- End diff --

shall we put it in `udf.py`?
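
That is, moving the wrapper over unchanged, e.g. (a sketch):

    # python/pyspark/sql/udf.py
    class UDFRegistration(object):
        """Wrapper for user-defined function registration."""

        def __init__(self, sparkSession):
            self.sparkSession = sparkSession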


---




[GitHub] spark pull request #20288: [SPARK-23122][PYTHON][SQL] Deprecate register* fo...

2018-01-16 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/20288

[SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark

## What changes were proposed in this pull request?

This PR proposes to deprecate `register*` for UDFs in `SQLContext` and 
`Catalog` in Spark 2.3.0.

These are inconsistent with the Scala / Java APIs, and they basically do 
the same thing as `spark.udf.register*`.

Also, this PR moves the logic from `[sqlContext|spark.catalog].register*` 
to `spark.udf.register*` and reuses the docstring.
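
In user-facing terms (a sketch, not lifted from the patch; all three calls
currently do the same thing):

    from pyspark.sql.types import IntegerType

    # Deprecated by this PR -- kept as aliases that forward to spark.udf.register:
    sqlContext.registerFunction("strlen", lambda s: len(s), IntegerType())
    spark.catalog.registerFunction("strlen", lambda s: len(s), IntegerType())

    # Preferred, and consistent with the Scala / Java APIs:
    spark.udf.register("strlen", lambda s: len(s), IntegerType())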

## How was this patch tested?

Manually tested, manually checked the API documentation, and added tests to 
check that the deprecated APIs call the aliases correctly.
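
For example, such a check could look like the following (a hypothetical
sketch; the actual tests in the patch may differ):

    import warnings

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        sqlContext.registerFunction("f", lambda x: x)
    # The deprecated entry point should warn, yet still register through the alias.
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)
    assert spark.sql("SELECT f('ab')").collect()[0][0] == 'ab'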

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark deprecate-udf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20288


commit f63105c7faddc79ccd624c9234b56916efec3569
Author: hyukjinkwon 
Date:   2018-01-17T02:49:08Z

Deprecate register* for UDFs in SQLContext and Catalog in PySpark




---
