Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20171#discussion_r161638977
  
    --- Diff: python/pyspark/sql/context.py ---
    @@ -204,15 +206,31 @@ def registerFunction(self, name, f, returnType=StringType()):
             >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
             [Row(stringLengthInt(test)=4)]
     
    +        >>> from pyspark.sql.types import IntegerType
    +        >>> from pyspark.sql.functions import udf
    +        >>> slen = udf(lambda s: len(s), IntegerType())
    +        >>> _ = sqlContext.udf.register("slen", slen)
    +        >>> sqlContext.sql("SELECT slen('test')").collect()
    +        [Row(slen(test)=4)]
    +
             >>> import random
             >>> from pyspark.sql.functions import udf
    -        >>> from pyspark.sql.types import IntegerType, StringType
    +        >>> from pyspark.sql.types import IntegerType
             >>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    -        >>> newRandom_udf = sqlContext.registerFunction("random_udf", random_udf, StringType())
    +        >>> newRandom_udf = sqlContext.udf.register("random_udf", random_udf)
    --- End diff --
    
    I mean it doesn't completely cover the concern:
    
    > `sqlContext` has been deprecated since 2.0. SparkSession should be the default entrance
    
    and this change doesn't completely replace it either. If that migration is meant to be done separately, it would be better to leave this change out. What I was wondering is why we partially fix that concern here rather than in a separate PR.
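    
    For reference, here is a minimal sketch of the SparkSession-based equivalent (an illustration only, assuming Spark 2.3+, where `udf.register` accepts a UDF object and `asNondeterministic` is available):
    
    ```python
    import random
    
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType
    
    spark = SparkSession.builder.getOrCreate()
    
    # Register a deterministic Python UDF under a SQL-accessible name.
    slen = udf(lambda s: len(s), IntegerType())
    spark.udf.register("slen", slen)
    spark.sql("SELECT slen('test')").collect()  # [Row(slen(test)=4)]
    
    # asNondeterministic() tells the optimizer not to assume repeated
    # calls return the same value, so it won't collapse duplicate calls.
    random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
    new_random_udf = spark.udf.register("random_udf", random_udf)
    ```
    
    As in the doctest above, `register` returns the registered UDF, which is why the result is captured (as `newRandom_udf` there, `new_random_udf` here) for reuse in DataFrame expressions.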

