Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20171#discussion_r161638977
--- Diff: python/pyspark/sql/context.py ---
@@ -204,15 +206,31 @@ def registerFunction(self, name, f, returnType=StringType()):
>>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
[Row(stringLengthInt(test)=4)]
+ >>> from pyspark.sql.types import IntegerType
+ >>> from pyspark.sql.functions import udf
+ >>> slen = udf(lambda s: len(s), IntegerType())
+ >>> _ = sqlContext.udf.register("slen", slen)
+ >>> sqlContext.sql("SELECT slen('test')").collect()
+ [Row(slen(test)=4)]
+
>>> import random
>>> from pyspark.sql.functions import udf
- >>> from pyspark.sql.types import IntegerType, StringType
+ >>> from pyspark.sql.types import IntegerType
>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
- >>> newRandom_udf = sqlContext.registerFunction("random_udf", random_udf, StringType())
+ >>> newRandom_udf = sqlContext.udf.register("random_udf", random_udf)
--- End diff ---
I mean it doesn't completely cover the concern:
> `sqlContext` has been deprecated since 2.0. SparkSession should be the default entrance
and this change doesn't completely replace it either. If that migration is meant to be done separately, we should leave this change out. What I was wondering is why this separate PR fixes the concern only partially.
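
For reference, here is a minimal, self-contained sketch of the SparkSession-based form the quoted concern points to. This is not code from the PR; the `spark` session, the builder settings, and the app name are illustrative assumptions:

```python
# Minimal sketch (not from this PR): the same registrations as in the doctest
# above, but through SparkSession, the recommended entry point since Spark 2.0.
import random

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Illustrative local session; the master/appName settings are assumptions.
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("udf-register-sketch") \
    .getOrCreate()

# Register a column-function UDF under a SQL name, mirroring
# sqlContext.udf.register("slen", slen) in the diff.
slen = udf(lambda s: len(s), IntegerType())
spark.udf.register("slen", slen)
spark.sql("SELECT slen('test')").show()  # -> 4

# Register a nondeterministic UDF, mirroring the random_udf example.
random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
spark.udf.register("random_udf", random_udf)
spark.sql("SELECT random_udf()").show()
```

As in the diff, the return value of `register` can also be kept and used as a column expression (`newRandom_udf = spark.udf.register("random_udf", random_udf)`).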