Re: Spark 1.6.0: substring on df.select

2016-05-12 Thread Sun Rui
Alternatively, you may try the built-in function:
regexp_extract
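
For example, a minimal sketch using regexp_extract to pull everything after the last "/" (assuming a DataFrame named df with a string column col1, and that sqlContext.implicits._ is imported so the $ syntax works; the regex and output column name are illustrative):

```scala
import org.apache.spark.sql.functions.regexp_extract

// "([^/]+)$" captures the run of non-slash characters at the end of the
// string, so "/client/service/version/method" should yield "method".
val methods = df.select(regexp_extract($"col1", "([^/]+)$", 1).as("method"))
methods.show()
```

Since regexp_extract is a built-in Catalyst expression, this avoids the serialization overhead of a Scala UDF, which can matter on 5+ million rows.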

> On May 12, 2016, at 20:27, Ewan Leith <ewan.le...@realitymine.com> wrote:
> 
> You could use a UDF pretty easily; something like this should work. The 
> lastElement function could be changed to do pretty much any string 
> manipulation you want.
>  
> import org.apache.spark.sql.functions.udf
>  
> def lastElement(input: String) = input.split("/").last
>  
> val lastElementUdf = udf(lastElement(_: String))
>  
> df.select(lastElementUdf($"col1")).show()
>  
> Ewan
>  
>  
> From: Bharathi Raja [mailto:raja...@yahoo.com.INVALID] 
> Sent: 12 May 2016 11:40
> To: Raghavendra Pandey <raghavendra.pan...@gmail.com>; Bharathi Raja 
> <raja...@yahoo.com.invalid>
> Cc: User <user@spark.apache.org>
> Subject: RE: Spark 1.6.0: substring on df.select
>  
> Thanks Raghav. 
>  
> I have 5+ million records. I feel creating multiple columns is not an 
> optimal way.
>  
> Please suggest any other alternate solution.
> Can’t we do any string operation in DF.Select?
>  
> Regards,
> Raja
>  
> From: Raghavendra Pandey <mailto:raghavendra.pan...@gmail.com>
> Sent: 11 May 2016 09:04 PM
> To: Bharathi Raja <mailto:raja...@yahoo.com.invalid>
> Cc: User <mailto:user@spark.apache.org>
> Subject: Re: Spark 1.6.0: substring on df.select
>  
> You can create a column with the count of "/". Then take the max of it and 
> create that many columns for every row, with null fillers.
> 
> Raghav 
> 
> On 11 May 2016 20:37, "Bharathi Raja" <raja...@yahoo.com.invalid 
> <mailto:raja...@yahoo.com.invalid>> wrote:
> Hi,
>  
> I have a dataframe column col1 with values something like 
> “/client/service/version/method”. The number of “/” is not constant.
> Could you please help me to extract all methods from the column col1?
>  
> In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).
>  
> Thanks in advance.
> Regards,
> Raja



RE: Spark 1.6.0: substring on df.select

2016-05-12 Thread Ewan Leith
You could use a UDF pretty easily; something like this should work. The 
lastElement function could be changed to do pretty much any string 
manipulation you want.

import org.apache.spark.sql.functions.udf

def lastElement(input: String) = input.split("/").last

val lastElementUdf = udf(lastElement(_: String))

df.select(lastElementUdf($"col1")).show()

Ewan


From: Bharathi Raja [mailto:raja...@yahoo.com.INVALID]
Sent: 12 May 2016 11:40
To: Raghavendra Pandey <raghavendra.pan...@gmail.com>; Bharathi Raja 
<raja...@yahoo.com.invalid>
Cc: User <user@spark.apache.org>
Subject: RE: Spark 1.6.0: substring on df.select

Thanks Raghav.

I have 5+ million records. I feel creating multiple columns is not an optimal way.

Please suggest any other alternate solution.
Can’t we do any string operation in DF.Select?

Regards,
Raja

From: Raghavendra Pandey <mailto:raghavendra.pan...@gmail.com>
Sent: 11 May 2016 09:04 PM
To: Bharathi Raja <mailto:raja...@yahoo.com.invalid>
Cc: User <mailto:user@spark.apache.org>
Subject: Re: Spark 1.6.0: substring on df.select


You can create a column with the count of "/". Then take the max of it and 
create that many columns for every row, with null fillers.

Raghav
On 11 May 2016 20:37, "Bharathi Raja" 
<raja...@yahoo.com.invalid<mailto:raja...@yahoo.com.invalid>> wrote:
Hi,

I have a dataframe column col1 with values something like 
“/client/service/version/method”. The number of “/” is not constant.
Could you please help me to extract all methods from the column col1?

In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).

Thanks in advance.
Regards,
Raja



RE: Spark 1.6.0: substring on df.select

2016-05-12 Thread Bharathi Raja
Thanks Raghav. 

I have 5+ million records. I feel creating multiple columns is not an optimal way.

Please suggest any other alternate solution.
Can’t we do any string operation in DF.Select?

Regards,
Raja

From: Raghavendra Pandey
Sent: 11 May 2016 09:04 PM
To: Bharathi Raja
Cc: User
Subject: Re: Spark 1.6.0: substring on df.select

You can create a column with the count of "/". Then take the max of it and 
create that many columns for every row, with null fillers.
Raghav 
On 11 May 2016 20:37, "Bharathi Raja" <raja...@yahoo.com.invalid> wrote:
Hi,
 
I have a dataframe column col1 with values something like 
“/client/service/version/method”. The number of “/” is not constant. 
Could you please help me to extract all methods from the column col1?
 
In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).
 
Thanks in advance.
Regards,
Raja



Re: Spark 1.6.0: substring on df.select

2016-05-11 Thread Raghavendra Pandey
You can create a column with the count of "/". Then take the max of it and
create that many columns for every row, with null fillers.
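
A rough sketch of that idea against the Spark 1.6 DataFrame API (assuming a DataFrame df with a string column col1; the intermediate column names are illustrative):

```scala
import org.apache.spark.sql.functions.{col, max, size, split}

// Split col1 on "/" and record how many parts each row produced.
val withParts = df
  .withColumn("parts", split(col("col1"), "/"))
  .withColumn("nParts", size(col("parts")))

// Take the maximum part count across all rows (this triggers a job).
val maxParts = withParts.agg(max("nParts")).first().getInt(0)

// One column per position; getItem yields null where a row has fewer parts.
val widened = (0 until maxParts).foldLeft(withParts) { (acc, i) =>
  acc.withColumn(s"part_$i", col("parts").getItem(i))
}
```

Note this scans the data once just to find maxParts, which is part of why the thread moves toward extracting only the last element instead.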

Raghav
On 11 May 2016 20:37, "Bharathi Raja"  wrote:

Hi,



I have a dataframe column col1 with values something like
“/client/service/version/method”. The number of “/” is not constant.

Could you please help me to extract all methods from the column col1?



In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).



Thanks in advance.

Regards,

Raja