Re: Spark 1.6.0: substring on df.select
Alternatively, you may try the built-in function regexp_extract.

> On May 12, 2016, at 20:27, Ewan Leith <ewan.le...@realitymine.com> wrote:
>
> You could use a UDF pretty easily; something like this should work. The
> lastElement function could be changed to do pretty much any string
> manipulation you want.
>
> import org.apache.spark.sql.functions.udf
>
> def lastElement(input: String) = input.split("/").last
>
> val lastElementUdf = udf(lastElement(_: String))
>
> df.select(lastElementUdf($"col1")).show()
>
> Ewan
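On a DataFrame this would be roughly df.select(regexp_extract($"col1", "[^/]+$", 0)), with group index 0 taking the whole match. The regex itself can be checked in plain Scala, using the sample path from the thread:

```scala
// "[^/]+$" matches the run of non-slash characters at the end of the
// string, i.e. the last path segment ("method" here).
val pattern = "[^/]+$".r
val method = pattern.findFirstIn("/client/service/version/method").getOrElse("")
println(method)  // prints "method"
```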
RE: Spark 1.6.0: substring on df.select
You could use a UDF pretty easily; something like this should work. The lastElement function could be changed to do pretty much any string manipulation you want.

import org.apache.spark.sql.functions.udf

def lastElement(input: String) = input.split("/").last

val lastElementUdf = udf(lastElement(_: String))

df.select(lastElementUdf($"col1")).show()

Ewan

From: Bharathi Raja [mailto:raja...@yahoo.com.INVALID]
Sent: 12 May 2016 11:40
To: Raghavendra Pandey <raghavendra.pan...@gmail.com>; Bharathi Raja <raja...@yahoo.com.invalid>
Cc: User <user@spark.apache.org>
Subject: RE: Spark 1.6.0: substring on df.select

Thanks Raghav.

I have 5+ million records. I feel creating multiple columns is not an optimal way.

Please suggest any other alternate solution. Can't we do any string operation in df.select?

Regards,
Raja
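The body of that UDF can be exercised outside Spark; a quick check of the split logic on the sample path from the thread:

```scala
// Same function as in the UDF above.
def lastElement(input: String): String = input.split("/").last

// split("/") on a leading-slash path yields an empty first element,
// but .last still returns the final segment.
println(lastElement("/client/service/version/method"))  // prints "method"
```

Note that Java's split drops trailing empty strings, so a path ending in "/" (e.g. "/a/b/") would return the segment before the trailing slash ("b"), not an empty string.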
RE: Spark 1.6.0: substring on df.select
Thanks Raghav.

I have 5+ million records. I feel creating multiple columns is not an optimal way.

Please suggest any other alternate solution. Can't we do any string operation in df.select?

Regards,
Raja

From: Raghavendra Pandey
Sent: 11 May 2016 09:04 PM
To: Bharathi Raja
Cc: User
Subject: Re: Spark 1.6.0: substring on df.select

You can create a column with a count of "/". Then take the max of it and create that many columns for every row, with null fillers.

Raghav
Re: Spark 1.6.0: substring on df.select
You can create a column with a count of "/". Then take the max of it and create that many columns for every row, with null fillers.

Raghav

On 11 May 2016 20:37, "Bharathi Raja" <raja...@yahoo.com.invalid> wrote:

Hi,

I have a dataframe column col1 with values something like "/client/service/version/method". The number of "/" is not constant. Could you please help me to extract all methods from the column col1?

In Pig I used SUBSTRING with LAST_INDEX_OF("/").

Thanks in advance.
Regards,
Raja
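That split-into-columns idea can be sketched with plain Scala collections, outside Spark (the second sample row is hypothetical, added to show the null padding): count the separators per row, take the max, and pad shorter rows with nulls.

```scala
val rows = Seq("/client/service/version/method", "/client/service/method")

// The widest row determines how many columns are needed.
val maxParts = rows.map(_.count(_ == '/')).max

// split("/") leaves an empty first element for the leading "/"; drop it,
// then pad each row to the common width with nulls.
val padded = rows.map(_.split("/").drop(1).padTo(maxParts, null: String))

padded.foreach(p => println(p.mkString("|")))
// prints "client|service|version|method"
// prints "client|service|method|null"
```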