Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20171#discussion_r161638977
--- Diff: python/pyspark/sql/context.py ---
@@ -204,15 +206,31 @@ def registerFunction(self, name, f, returnType=StringType()):
>>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
[Row(stringLengthInt(test)=4)]
+ >>> from pyspark.sql.types import IntegerType
+ >>> from pyspark.sql.functions import udf
+ >>> slen = udf(lambda s: len(s), IntegerType())
+ >>> _ = sqlContext.udf.register("slen", slen)
+ >>> sqlContext.sql("SELECT slen('test')").collect()
+ [Row(slen(test)=4)]
+
>>> import random
>>> from pyspark.sql.functions import udf
- >>> from pyspark.sql.types import IntegerType, StringType
+ >>> from pyspark.sql.types import IntegerType
>>> random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
- >>> newRandom_udf = sqlContext.registerFunction("random_udf", random_udf, StringType())
+ >>> newRandom_udf = sqlContext.udf.register("random_udf", random_udf)
--- End diff ---
I mean it doesn't completely cover the concern:
> `sqlContext` has been deprecated since 2.0. SparkSession should be the default entrance
and this change doesn't completely replace it either. If that migration is meant to be done separately, we should leave this change out. What I was wondering is why this separate PR fixes the concern only partially.
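
For reference, here is a minimal, self-contained sketch of the SparkSession-based form the quoted concern points to. This is not code from the PR; the `spark` session, the builder settings, and the app name are illustrative assumptions:

```python
# Minimal sketch (not from this PR): the same registrations as in the doctest
# above, but through SparkSession, the recommended entry point since Spark 2.0.
import random

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Illustrative local session; the master/appName settings are assumptions.
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("udf-register-sketch") \
    .getOrCreate()

# Register a column-function UDF under a SQL name, mirroring
# sqlContext.udf.register("slen", slen) in the diff.
slen = udf(lambda s: len(s), IntegerType())
spark.udf.register("slen", slen)
spark.sql("SELECT slen('test')").show()  # -> 4

# Register a nondeterministic UDF, mirroring the random_udf example.
random_udf = udf(lambda: random.randint(0, 100), IntegerType()).asNondeterministic()
spark.udf.register("random_udf", random_udf)
spark.sql("SELECT random_udf()").show()
```

As in the diff, the return value of `register` can also be kept and used as a column expression (`newRandom_udf = spark.udf.register("random_udf", random_udf)`).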