Yes, but I don’t want to use it in a select() call. I want to use either 
selectExpr() or spark.sql(), with the UDF called inside a SQL string.

Now I got it to work using 
sqlContext.registerFunction('encodeOneHot_udf', encodeOneHot, VectorUDT())
But this sqlContext approach will disappear, right? So I’m curious what to use 
instead.
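For reference, the non-deprecated route in Spark 2.0 is spark.udf.register on the SparkSession, which makes a Python function callable from SQL strings (spark.sql or selectExpr), just like sqlContext.registerFunction did. A minimal sketch using the squareIt example from later in the thread (the registered name and return type here are illustrative):

```python
def squareIt(x):
    # The plain Python function to be exposed to Spark SQL.
    return x * x

def register_udfs(spark):
    # spark.udf.register replaces sqlContext.registerFunction in Spark 2.0:
    # after this, the name "squareIt" is usable inside SQL strings passed
    # to spark.sql(...) or df.selectExpr(...).
    from pyspark.sql.types import IntegerType
    spark.udf.register("squareIt", squareIt, IntegerType())
```

After registering against an active SparkSession, something like spark.sql("select squareIt(adgroupid) as function_result from df") should work against a registered temp view; for the one-hot case, VectorUDT() would take the place of IntegerType() as the return type.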

> On Aug 4, 2016, at 3:54 PM, Nicholas Chammas <nicholas.cham...@gmail.com> 
> wrote:
> 
> Have you looked at pyspark.sql.functions.udf and the associated examples?
> On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen <bteeu...@gmail.com> wrote:
> Hi,
> 
> I’d like to use a UDF in pyspark 2.0. As in:
> ________ 
> 
> def squareIt(x):
>   return x * x
> 
> # register the function and define return type
> ….
> 
> spark.sql("select myUdf(adgroupid, 'extra_string_parameter') as 
> function_result from df")
> 
> _________
> 
> How can I register the function? I only see registerFunction in the 
> deprecated sqlContext at 
> http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html.
> As the ‘spark’ object unifies hiveContext and sqlContext, what is the new way 
> to go?
> 
> Ben
