[GitHub] spark pull request #20507: [SPARK-23334][SQL][PYTHON] Fix pandas_udf with re...

HyukjinKwon Mon, 05 Feb 2018 05:34:59 -0800

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20507#discussion_r165972212
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3920,6 +3920,14 @@ def test_vectorized_udf_null_string(self):
             res = df.select(str_f(col('str')))
             self.assertEquals(df.collect(), res.collect())
     
    +    def test_vectorized_udf_string_in_udf(self):
    +        from pyspark.sql.functions import pandas_udf, col
    +        import pandas as pd
    +        df = self.spark.range(10)
    +        str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), StringType())
    --- End diff --
    
    Not a big deal. How about `pd.Series(map(str, x))`?
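For illustration, the two spellings of the UDF body build the same strings. The plain-Python sketch below uses a hypothetical list `batch` in place of one batch of the `id` column produced by `spark.range(10)`. It assumes integer inputs and omits the `pd.Series(...)` wrapper, so it runs without pandas or Spark.

```python
# Hypothetical stand-in for one batch of the `id` column from spark.range(10);
# inside a real pandas_udf, `x` would be a pandas.Series of int64 values.
batch = list(range(10))

# Original spelling from the test: %-formatting inside a list comprehension.
via_format = ["%s" % i for i in batch]

# Reviewer's suggestion: apply str to every element. In Python 3, map()
# returns a lazy iterator, so it is materialized with list() here.
via_map = list(map(str, batch))

print(via_format == via_map)  # True
print(via_map[:3])            # ['0', '1', '2']
```

The `map(str, x)` form states the intent (convert each value to its string form) more directly than `"%s" % i`.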



