You could use a UDF fairly easily; something like this should work. The
lastElement function can be changed to do pretty much any string manipulation
you want.
import org.apache.spark.sql.functions.udf
import sqlContext.implicits._  // needed for the $"col1" column syntax

// Return the last "/"-separated segment of the input string
def lastElement(input: String): String = input.split("/").last
val lastElementUdf = udf(lastElement(_: String))

df.select(lastElementUdf($"col1")).show()
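The core of that UDF is plain string manipulation, so it can be sanity-checked without a SparkContext. A minimal sketch (the sample paths are hypothetical, in the same shape as Raja's data):

```scala
// Core logic of the lastElement UDF, in plain Scala. Note that
// String.split("/") drops trailing empty segments, so a leading "/"
// yields an empty first element, but the last element is always the
// final path segment.
def lastElement(input: String): String = input.split("/").last

val a = lastElement("/client/service/version/method") // "method"
val b = lastElement("a/b/c")                          // "c"
```

If I remember right, Spark 1.5+ also ships `substring_index` in `org.apache.spark.sql.functions`, so `substring_index($"col1", "/", -1)` should give the last segment without a UDF at all.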
Ewan
From: Bharathi Raja [mailto:[email protected]]
Sent: 12 May 2016 11:40
To: Raghavendra Pandey <[email protected]>; Bharathi Raja
<[email protected]>
Cc: User <[email protected]>
Subject: RE: Spark 1.6.0: substring on df.select
Thanks Raghav.
I have 5+ million records. I feel creating multiple columns is not an optimal
way.
Please suggest any other alternate solution.
Can’t we do string operations in df.select?
Regards,
Raja
From: Raghavendra Pandey <[email protected]>
Sent: 11 May 2016 09:04 PM
To: Bharathi Raja <[email protected]>
Cc: User <[email protected]>
Subject: Re: Spark 1.6.0: substring on df.select
You can create a column with count of /. Then take max of it and create that
many columns for every row with null fillers.
Raghav
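For what it's worth, the pad-to-max idea can be sketched in plain Scala (the sample rows are made up; on a real DataFrame this would be a split plus one withColumn per index):

```scala
// Sketch of the suggestion above: find the maximum number of
// "/"-separated parts across all rows, then pad every row's split
// result with nulls up to that length, so each row yields the same
// number of columns. Sample rows are hypothetical.
val rows = Seq("/a/b/c", "/x/y", "/p/q/r/s")

// maximum number of segments in any row (leading "/" adds an empty
// first segment, consistently for every row)
val maxParts = rows.map(_.split("/").length).max

// split each row and pad with nulls up to maxParts
val padded: Seq[Seq[String]] = rows.map { r =>
  val parts = r.split("/").toSeq
  parts ++ Seq.fill(maxParts - parts.length)(null)
}
// every inner Seq now has exactly maxParts elements
```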
On 11 May 2016 20:37, "Bharathi Raja" <[email protected]> wrote:
Hi,
I have a dataframe column col1 with values something like
“/client/service/version/method”. The number of “/” is not constant.
Could you please help me to extract all methods from the column col1?
In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).
Thanks in advance.
Regards,
Raja
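For completeness, the Pig-style SUBSTRING/LAST_INDEX_OF logic Raja mentions maps directly onto java.lang.String methods, which avoids the per-row Array allocation of split (possibly worth considering at 5M+ rows). A sketch that could be wrapped in a UDF the same way as above (the function name is mine):

```scala
// Equivalent of Pig's SUBSTRING(col, LAST_INDEX_OF(col, "/") + 1, ...):
// take everything after the last "/". If there is no "/", lastIndexOf
// returns -1, so substring(0) returns the whole string unchanged.
def afterLastSlash(s: String): String =
  s.substring(s.lastIndexOf("/") + 1)
```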