The second option is more correct and should provide better performance.
From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
Sent: Thursday, November 17, 2016 1:50 PM
To: user@spark.apache.org
Subject: Re: Nested UDFs
Hi,
My example was a little bogus; my real use case is to do multiple regexp replacements.
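In plain Python terms, multiple replacements inside one per-row function could be sketched like this (the rule list and names here are illustrative, not from the thread):

```python
import re

# Illustrative pattern -> replacement pairs; a real use case supplies its own.
RULES = [('a', 'X'), ('b', 'Y')]

def replace_all(data):
    # Apply each regexp in turn to the plain string value of one row.
    for pattern, repl in RULES:
        data = re.sub(pattern, repl, data)
    return data

print(replace_all('abc'))  # XYc
```

Because this uses only the stdlib re module on plain strings, it is safe to wrap with pyspark.sql.functions.udf, unlike regexp_replace, which builds a Column expression.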
> You would need a UDF if you wanted to do something on the string
> value of a single row (e.g. return data + “bla”)
>
>
>
> Assaf.
>
>
>
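A minimal sketch of the kind of per-row function meant above (the name is illustrative):

```python
# A function like this, which takes and returns a plain Python string,
# is what pyspark.sql.functions.udf expects to wrap.
def append_bla(data):
    return data + "bla"

print(append_bla("name"))  # namebla
```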
From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
Sent: Thursday, November 17, 2016 9:15 AM
To: user@spark.apache.org
Subject: Nested UDFs
Hi,
Shouldn't this work?
from pyspark.sql.functions import regexp_replace, udf

def my_f(data):
    return regexp_replace(data, 'a', 'X')

my_udf = udf(my_f)
test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
test_data.select(my_udf(test_data.name)).show()
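The snippet above cannot work as intended: regexp_replace builds a Column expression, so inside the UDF it does not replace anything in the plain Python string the UDF receives. Inside a UDF, the stdlib re module does the equivalent per-row work:

```python
import re

# Per-row replacement on a plain Python string; safe to wrap with udf().
def my_f(data):
    return re.sub('a', 'X', data)

print(my_f('banana'))  # bXnXnX
```

Alternatively, drop the UDF entirely and select regexp_replace(test_data.name, 'a', 'X') directly; that keeps the work in the JVM and avoids Python serialization overhead.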
But instead of