Re: The equivalent for INSTR in Spark FP

Mich Talebzadeh Tue, 02 Aug 2016 00:53:10 -0700

No thinking on my part!!!

rs.select(mySubstr($"transactiondescription", lit(1),
instr($"transactiondescription", "CD"))).show(2)
+--------------------------------------------------------------+
|UDF(transactiondescription,1,instr(transactiondescription,CD))|
+--------------------------------------------------------------+
|                                           VERSEAS TRANSACTI C|
|                                           XYZ.COM 80...|
+--------------------------------------------------------------+
only showing top 2 rows
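
A note on the indices, offered as a sketch rather than a definitive fix: the first row above appears to have lost its leading character and to run on past the text before 'CD'. That is consistent with the udf calling Java's String.substring(start, end), which is 0-based with an exclusive end, whereas SQL substring(str, pos, len) is 1-based and takes a length. The plain-Scala snippet below illustrates the difference without Spark. The names SubstrSemantics, instr and sqlSubstring are made up for illustration, and sqlSubstring is a simplification that only handles pos >= 1.

```scala
object SubstrSemantics {
  /** 1-based position of `sub` in `s`, or 0 when absent (the INSTR convention). */
  def instr(s: String, sub: String): Int = s.indexOf(sub) + 1

  /** Simplified SQL-style substring: 1-based start `pos`, then a length `len`.
    * Assumes pos >= 1; a non-positive length yields the empty string. */
  def sqlSubstring(s: String, pos: Int, len: Int): String = {
    val from = pos - 1
    if (len <= 0 || from >= s.length) ""
    else s.substring(from, math.min(from + len, s.length))
  }

  def main(args: Array[String]): Unit = {
    val s = "hello world"
    // What the udf does: 0-based start, exclusive end.
    println(s.substring(1, instr(s, "ll")))          // el
    // What SQL substring(s, 1, INSTR(s, 'll')) would give.
    println(sqlSubstring(s, 1, instr(s, "ll")))      // hel
    // The original SQL also subtracts 2 from the INSTR result.
    println(sqlSubstring(s, 1, instr(s, "ll") - 2))  // h
  }
}
```

If the udf route is kept, one option is to have it mimic the SQL semantics (start - 1, then start - 1 + len) and to guard against rows where 'CD' is absent, since INSTR then returns 0.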


Let me test it.
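
For the window specification itself, one possible approach that avoids the udf is Column.substr, which accepts Column arguments for both the start position and the length. The sketch below assumes Spark 2.x (for SparkSession). The object PrefixWindowSketch, its helper methods and the sample rows are hypothetical stand-ins for the real accounts.ll_18740868 table, so treat it as an illustration and not as the tested answer.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, instr, lit, sum}

object PrefixWindowSketch {
  // Column form of substring(desc, 1, INSTR(desc, 'CD') - 2).
  // Column.substr(startPos, len) is 1-based and takes Columns.
  def prefix(desc: Column): Column =
    desc.substr(lit(1), instr(desc, "CD") - 2)

  // Sum of debitamount per prefix, restricted to 'DEB' rows as in the SQL.
  def spentByPrefix(rs: DataFrame): DataFrame = {
    val p = prefix(col("transactiondescription"))
    val wSpec2 = Window.partitionBy(p)
    rs.filter(col("transactiontype") === "DEB")
      .select(p.as("prefix"), sum(col("debitamount")).over(wSpec2).as("spent"))
      .distinct()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("prefix-window-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows for illustration only.
    val rs = Seq(
      ("SHOP A CD 1234", "DEB", 10.0),
      ("SHOP A CD 5678", "DEB", 5.0),
      ("SHOP B CD 1111", "DEB", 7.5)
    ).toDF("transactiondescription", "transactiontype", "debitamount")

    spentByPrefix(rs).show(false)
    spark.stop()
  }
}
```

The expression is passed straight to Window.partitionBy, so no select on a column is needed; the earlier "value select is not a member of ColumnName" error came from calling select on a column instead of on a DataFrame.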

Cheers



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 1 August 2016 at 23:43, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Jacek.
>
> It sounds like the issue is the position of the second argument in substring().
>
> This works
>
> scala> val wSpec2 =
> Window.partitionBy(substring($"transactiondescription",1,20))
> wSpec2: org.apache.spark.sql.expressions.WindowSpec =
> org.apache.spark.sql.expressions.WindowSpec@1a4eae2
>
> Using udf as suggested
>
> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>      |  s.substring(start, end) }
> mySubstr: org.apache.spark.sql.UserDefinedFunction =
> UserDefinedFunction(<function3>,StringType,List(StringType, IntegerType,
> IntegerType))
>
>
> This was throwing an error:
>
> val wSpec2 = Window.partitionBy(substring("transactiondescription",1,
> indexOf("transactiondescription",'CD')-2))
>
>
> So I tried using udf
>
> scala> val wSpec2 =
> Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
> instr('s, "CD")))
>      | )
> <console>:28: error: value select is not a member of
> org.apache.spark.sql.ColumnName
>          val wSpec2 =
> Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
> instr('s, "CD")))
>
> Obviously I am not doing it correctly :(
>
> cheers
>
> On 1 August 2016 at 23:02, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> Interesting...
>>
>> I'm tempted to think that the substring function should accept columns
>> that hold the numbers for start and end. I'd love to hear people's
>> thoughts on this.
>>
>> For now, I'd say you need to define udf to do substring as follows:
>>
>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>> s.substring(start, end) }
>> mySubstr: org.apache.spark.sql.expressions.UserDefinedFunction =
>> UserDefinedFunction(<function3>,StringType,Some(List(StringType,
>> IntegerType, IntegerType)))
>>
>> scala> df.show
>> +-----------+
>> |          s|
>> +-----------+
>> |hello world|
>> +-----------+
>>
>> scala> df.select(mySubstr('s, lit(1), instr('s, "ll"))).show
>> +-----------------------+
>> |UDF(s, 1, instr(s, ll))|
>> +-----------------------+
>> |                     el|
>> +-----------------------+
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Mon, Aug 1, 2016 at 11:18 PM, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>> > Thanks Jacek,
>> >
>> > Do I have any other way of writing this with functional programming?
>> >
>> > select
>> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>> >
>> >
>> > Cheers,
>> >
>> > On 1 August 2016 at 22:13, Jacek Laskowski <ja...@japila.pl> wrote:
>> >>
>> >> Hi Mich,
>> >>
>> >> There's no indexOf UDF; see
>> >> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>> >>
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >>
>> >>
>> >> On Mon, Aug 1, 2016 at 7:24 PM, Mich Talebzadeh
>> >> <mich.talebza...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > What is the FP equivalent of the following window/analytic query,
>> >> > which works OK in Spark SQL?
>> >> >
>> >> > This one using INSTR
>> >> >
>> >> > select
>> >> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>> >> >
>> >> >
>> >> > select distinct *
>> >> > from (
>> >> >       select
>> >> >       substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>> >> >       SUM(debitamount) OVER (PARTITION BY
>> >> >       substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2)) AS spent
>> >> >       from accounts.ll_18740868 where transactiontype = 'DEB'
>> >> >      ) tmp
>> >> >
>> >> >
>> >> > I tried indexOf but it does not work!
>> >> >
>> >> > val wSpec2 =
>> >> > Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>> >> > <console>:26: error: not found: value indexOf
>> >> >          val wSpec2 =
>> >> > Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>> >> >
>> >> >
>> >> > Thanks
