second option is more correct and should provide better performance.
From: Perttu Ranta-aho [mailto:ranta...@iki.fi]
Sent: Thursday, November 17, 2016 1:50 PM
To: user@spark.apache.org
Subject: Re: Nested UDFs
Hi,
My example was a little bogus; my real use case is to do multiple regexp
replacements so something like:
def my_f(data):
    for match, repl in regexp_list:
        data = regexp_replace(match, repl, data)
    return data
I could achieve my goal by multiple .select(regexp_replace()) lines, but o
regexp_replace is supposed to receive a column, so you don't need to write a UDF
for it.
Instead try:
test_data.select(regexp_replace(test_data.name, 'a', 'X'))
You would need a UDF only if you wanted to do something on the string value of
a single row (e.g. return data + "bla")
Assaf.
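
For the multiple-replacement case asked about above, one possible approach is to fold the list of (pattern, replacement) pairs into a single nested column expression, so that no UDF is involved. This is only a minimal sketch: the helper names chain_replacements and select_cleaned are hypothetical, and it assumes a DataFrame test_data with a string column "name" and that regexp_list holds (pattern, replacement) string pairs. Note that PySpark's regexp_replace takes the column first, then the pattern, then the replacement.

```python
from functools import reduce


def chain_replacements(column, regexp_list, replace_fn):
    """Apply each (pattern, replacement) pair in order to `column`.

    `replace_fn` is called as replace_fn(column, pattern, replacement);
    with PySpark this would be pyspark.sql.functions.regexp_replace,
    which returns a new Column, so the result is one nested expression.
    """
    return reduce(
        lambda col, pair: replace_fn(col, pair[0], pair[1]),
        regexp_list,
        column,
    )


def select_cleaned(test_data, regexp_list):
    """Hypothetical usage: assumes `test_data` has a string column `name`."""
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql.functions import regexp_replace

    cleaned = chain_replacements(test_data.name, regexp_list, regexp_replace)
    return test_data.select(cleaned.alias("name"))
```

Because the whole chain is an ordinary column expression, it should behave the same as writing the nested regexp_replace calls by hand inside a single select.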
From: Pert