GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20507#discussion_r165972212
--- Diff: python/pyspark/sql/tests.py ---
@@ -3920,6 +3920,14 @@ def test_vectorized_udf_null_string(self):
res = df.select(str_f(col('str')))
self.assertEquals(df.collect(), res.collect())
+ def test_vectorized_udf_string_in_udf(self):
+ from pyspark.sql.functions import pandas_udf, col
+ import pandas as pd
+ df = self.spark.range(10)
+ str_f = pandas_udf(lambda x: pd.Series(["%s" % i for i in x]), StringType())
--- End diff --
Not a big deal. How about `pd.Series(map(str, x))`?
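A minimal sketch of why the two forms should be interchangeable here. It assumes the UDF receives a plain integer `pd.Series`, which is what `spark.range(10)` should hand to a scalar `pandas_udf`, and it runs the lambda bodies directly on such a batch without Spark. The helper names `format_with_comprehension` and `format_with_map` are hypothetical and exist only for this comparison. The `list(...)` around `map` is a defensive choice so the result does not depend on how a given pandas version treats a bare iterator.

```python
import pandas as pd


def format_with_comprehension(x):
    # Form used in the diff: %-formatting inside a list comprehension.
    return pd.Series(["%s" % i for i in x])


def format_with_map(x):
    # Suggested form; list() materializes the map iterator before pandas sees it.
    return pd.Series(list(map(str, x)))


# Stand-in (assumption) for the batch a pandas_udf would receive from spark.range(10).
batch = pd.Series(range(10))

print(format_with_comprehension(batch).equals(format_with_map(batch)))  # True
print(format_with_map(batch).tolist()[:3])  # ['0', '1', '2']
```

For integers, `"%s" % i` and `str(i)` produce the same text, so the change only affects readability.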