Job Opportunities in India or UK with Tier 2 Sponsorship - Spark Expert

2024-08-26 Thread sri hari kali charan Tummala
Hi Spark Community,

I'm a seasoned Data Engineering professional with 13+ years of experience
and expertise in Apache Spark, particularly in Structured Streaming. I'm
looking for job opportunities in India or the UK that offer Tier 2
sponsorship. If anyone knows of openings or can connect me with potential
employers, please reach out.

Thanks,
Sri Tummala


India Scala & Big Data Job Referral

2023-12-21 Thread sri hari kali charan Tummala
Hi Community,

I was laid off from Apple in February 2023, which led to my relocation from
the USA due to immigration issues related to my H1B visa.


I have over 12 years of experience as a consultant in Big Data, Spark,
Scala, Python, and Flink.


Despite my move to India, I haven't secured a job yet. I am seeking
referrals within product firms (preferably non-consulting) in India that
work with Flink, Spark, Scala, Big Data, or in the fields of ML & AI. Can
someone assist me with this?

Thanks
Sri


Spark Scala Contract Opportunity @USA

2022-11-10 Thread sri hari kali charan Tummala
Hi All,

Is anyone looking for a Spark/Scala contract role inside the USA? A company
called Maxonic has an open Spark/Scala contract position (100% remote)
inside the USA. If anyone is interested, please send your CV to
kali.tumm...@gmail.com.

Thanks & Regards
Sri Tummala


Big Data Contract Roles ?

2022-09-14 Thread sri hari kali charan Tummala
Hi Flink Users/ Spark Users,

Is anyone hiring for contract (corp-to-corp) Big Data Spark/Scala or Flink/Scala
roles?


Thanks
Sri


Re: sql to spark scala rdd

2016-08-01 Thread sri hari kali charan Tummala
Hi All,

The code below calculates the cumulative sum (running sum) and moving average
using Scala RDD-style programming. I was using the wrong function earlier
(sliding); use scanLeft instead.


sc.textFile("C:\\Users\\kalit_000\\Desktop\\Hadoop_IMP_DOC\\spark\\data.txt")
  .map(x => x.split("\\~"))
  .map(x => (x(0), x(1), x(2).toDouble))
  .groupBy(_._1)
  .mapValues{(x => x.toList.sortBy(_._2).zip(Stream from
1).scanLeft(("","",0.0,0.0,0.0,0.0))
  { (a,b) => (b._1._1,b._1._2,b._1._3,(b._1._3.toDouble +
a._3.toDouble),(b._1._3.toDouble + a._3.toDouble)/b._2,b._2)}.tail)}
  .flatMapValues(x => x.sortBy(_._1))
  .foreach(println)

Input Data:-

Headers:-
Key,Date,balance

786~20160710~234
786~20160709~-128
786~20160711~-457
987~20160812~456
987~20160812~567

Output Data:-

Column Headers:-
key, (key, Date, balance, daily balance, running average, row_number based on key)

(786,(786,20160709,-128.0,-128.0,-128.0,1.0))
(786,(786,20160710,234.0,106.0,53.0,2.0))
(786,(786,20160711,-457.0,-223.0,-74.33,3.0))

(987,(987,20160812,567.0,1023.0,511.5,2.0))
(987,(987,20160812,456.0,456.0,456.0,1.0))

Reference:-

https://bzhangusc.wordpress.com/2014/06/21/calculate-running-sums/
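
For comparison, the same running sum and running average can also be written with
DataFrame window functions instead of RDDs. Below is a minimal sketch, assuming a
Spark 1.6+ spark-shell style session where sc and sqlContext are already in scope,
and using the column names from the headers above:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, row_number, sum}
import sqlContext.implicits._

val df = sc.textFile("data.txt")                  // same ~-delimited input as above
  .map(_.split("\\~"))
  .map(x => (x(0), x(1), x(2).toDouble))
  .toDF("key", "date", "balance")

// UNBOUNDED PRECEDING .. CURRENT ROW, per key, ordered by date
val runningFrame = Window.partitionBy("key").orderBy("date").rowsBetween(Long.MinValue, 0)
val ordered      = Window.partitionBy("key").orderBy("date")

df.withColumn("running_sum", sum("balance").over(runningFrame))
  .withColumn("running_avg", avg("balance").over(runningFrame))
  .withColumn("row_number", row_number().over(ordered))
  .show()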


Thanks
Sri


On Mon, Aug 1, 2016 at 12:07 AM, Sri  wrote:

> Hi ,
>
> I solved it using spark SQL which uses similar window functions mentioned
> below , for my own knowledge I am trying to solve using Scala RDD which I
> am unable to.
> What function in Scala supports window function like SQL unbounded
> preceding and current row ? Is it sliding ?
>
>
> Thanks
> Sri
>
> Sent from my iPhone
>
> On 31 Jul 2016, at 23:16, Mich Talebzadeh 
> wrote:
>
> hi
>
> You mentioned:
>
> I already solved it using DF and spark sql ...
>
> Are you referring to this code which is a classic analytics:
>
> SELECT DATE,balance,
>  SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING
>
>  AND
>
>  CURRENT ROW) daily_balance
>
>  FROM  table
>
>
> So how did you solve it using DF in the first place?
>
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 1 August 2016 at 07:04, Sri  wrote:
>
>> Hi ,
>>
>> Just wondering how spark SQL works behind the scenes does it not convert
>> SQL to some Scala RDD ? Or Scala ?
>>
>> How to write below SQL in Scala or Scala RDD
>>
>> SELECT DATE,balance,
>>
>> SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING
>>
>> AND
>>
>> CURRENT ROW) daily_balance
>>
>> FROM  table
>>
>>
>> Thanks
>> Sri
>> Sent from my iPhone
>>
>> On 31 Jul 2016, at 13:21, Jacek Laskowski  wrote:
>>
>> Hi,
>>
>> Impossible - see
>>
>> http://www.scala-lang.org/api/current/index.html#scala.collection.Seq@sliding(size:Int,step:Int):Iterator[Repr]
>> .
>>
>> I tried to show you why you ended up with "non-empty iterator" after
>> println. You should really start with
>> http://www.scala-lang.org/documentation/
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Sun, Jul 31, 2016 at 8:49 PM, sri hari kali charan Tummala
>>  wrote:
>>
>> Tuple
>>
>>
>> [Lscala.Tuple2;@65e4cb84
>>
>>
>> On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski  wrote:
>>
>>
>> Hi,
>>
>>
>> What's the result type of sliding(2,1)?
>>
>>
>> Pozdrawiam,
>>
>> Jacek Laskowski
>>
>> 
>>
>> https://medium.com/@jaceklaskowski/
>>
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>>
>> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>>
>>  wrote:
>>
>> tried this no luck, wht is non-empty iterator here ?
>>
&

Re: sql to spark scala rdd

2016-07-31 Thread sri hari kali charan Tummala
val test = sc.textFile(file).keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => (x(0), x(1), x(2)))
  .map { case (account, datevalue, amount) => ((account, datevalue), amount.toDouble) }
  .mapValues(x => x)
  .toArray
  .sliding(2, 1)
  .map(x => (x(0)._1, x(1)._2, x.foldLeft(0.0)(_ + _._2 / x.size), x.foldLeft(0.0)(_ + _._2)))
  .foreach(println)


On Sun, Jul 31, 2016 at 12:15 PM, sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi All,
>
> I already solved it using DF and spark sql I was wondering how to solve in
> scala rdd, I just got the answer need to check my results compared to spark
> sql thanks all for your time.
>
> I am trying to solve moving average using scala RDD group by key.
>
>
> input:-
> -987~20150728~100
> -987~20150729~50
> -987~20150730~-100
> -987~20150804~200
> -987~20150807~-300
> -987~20150916~100
>
>
> val test=sc.textFile(file).keyBy(x => x.split("\\~") (0))
>   .map(x => x._2.split("\\~"))
>   .map(x => ((x(0),x(1),x(2
>   .map{case (account,datevalue,amount) => 
> ((account,datevalue),(amount.toDouble))}.mapValues(x => 
> x).toArray.sliding(2,1).map(x => (x(0)._1,x(1)._2,(x.foldLeft(0.0)(_ + 
> _._2/x.size)),x.foldLeft(0.0)(_ + _._2))).foreach(println)
>
> Op:-
>
> accountkey,date,balance_of_account, daily_average, sum_base_on_window
>
> ((-987,20150728),50.0,75.0,150.0)
> ((-987,20150729),-100.0,-25.0,-50.0)
> ((-987,20150730),200.0,50.0,100.0)
> ((-987,20150804),-300.0,-50.0,-100.0)
> ((-987,20150807),100.0,-100.0,-200.0)
>
>
> below book is written for Hadoop Mapreduce the book has solution for
> moving average but its in Java.
>
>
> https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch06.html
>
>
> Sql:-
>
>
> SELECT DATE,balance,
> SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND
> CURRENT ROW) daily_balance
> FROM  table
>
> Thanks
> Sri
>
>
>
> On Sun, Jul 31, 2016 at 11:54 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Check also this
>> <https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html>
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 31 July 2016 at 19:49, sri hari kali charan Tummala <
>> kali.tumm...@gmail.com> wrote:
>>
>>> Tuple
>>>
>>> [Lscala.Tuple2;@65e4cb84
>>>
>>> On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> What's the result type of sliding(2,1)?
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> 
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>
>>>>
>>>> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>>>>  wrote:
>>>> > tried this no luck, wht is non-empty iterator here ?
>>>> >
>>>> > OP:-
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> > (-987,non-empty iterator)
>>>> >
>>>> >
>>>> > sc.textFile(file).keyBy(x => x.split("\\~") (0))
>>>> >   .map(x => x._2.split("\\~"))
>>>> >   .map(x => (x(0),x(2)))
>>>> > .map { case (key,value) =>
>>>> (key,value.toArray.toSeq.sliding(2,1).map(x
>>>> > => x.sum/x.size))}.foreach(println)
>>>> >
>>>> >
>>>> > On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala
>>>> >  wrote:
>>>> >>
>>>> >> Hi

Re: sql to spark scala rdd

2016-07-31 Thread sri hari kali charan Tummala
Hi All,

I already solved it using DataFrames and Spark SQL; I was wondering how to solve
it with a Scala RDD. I just got the answer and need to check my results against
the Spark SQL version. Thanks, all, for your time.

I am trying to solve the moving average using a Scala RDD, grouping by key.


input:-
-987~20150728~100
-987~20150729~50
-987~20150730~-100
-987~20150804~200
-987~20150807~-300
-987~20150916~100


val test = sc.textFile(file).keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => (x(0), x(1), x(2)))
  .map { case (account, datevalue, amount) => ((account, datevalue), amount.toDouble) }
  .mapValues(x => x)
  .toArray
  .sliding(2, 1)
  .map(x => (x(0)._1, x(1)._2, x.foldLeft(0.0)(_ + _._2 / x.size), x.foldLeft(0.0)(_ + _._2)))
  .foreach(println)

Op:-

accountkey,date,balance_of_account, daily_average, sum_base_on_window

((-987,20150728),50.0,75.0,150.0)
((-987,20150729),-100.0,-25.0,-50.0)
((-987,20150730),200.0,50.0,100.0)
((-987,20150804),-300.0,-50.0,-100.0)
((-987,20150807),100.0,-100.0,-200.0)


The book below is written for Hadoop MapReduce; it has a solution for the moving
average, but it is in Java.

https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch06.html


Sql:-


SELECT DATE,balance,
SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW) daily_balance
FROM  table

Thanks
Sri



On Sun, Jul 31, 2016 at 11:54 AM, Mich Talebzadeh  wrote:

> Check also this
> <https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html>
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 31 July 2016 at 19:49, sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> Tuple
>>
>> [Lscala.Tuple2;@65e4cb84
>>
>> On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski  wrote:
>>
>>> Hi,
>>>
>>> What's the result type of sliding(2,1)?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>>>  wrote:
>>> > tried this no luck, wht is non-empty iterator here ?
>>> >
>>> > OP:-
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> > (-987,non-empty iterator)
>>> >
>>> >
>>> > sc.textFile(file).keyBy(x => x.split("\\~") (0))
>>> >   .map(x => x._2.split("\\~"))
>>> >   .map(x => (x(0),x(2)))
>>> > .map { case (key,value) =>
>>> (key,value.toArray.toSeq.sliding(2,1).map(x
>>> > => x.sum/x.size))}.foreach(println)
>>> >
>>> >
>>> > On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala
>>> >  wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I managed to write using sliding function but can it get key as well
>>> in my
>>> >> output ?
>>> >>
>>> >> sc.textFile(file).keyBy(x => x.split("\\~") (0))
>>> >>   .map(x => x._2.split("\\~"))
>>> >>   .map(x => (x(2).toDouble)).toArray().sliding(2,1).map(x =>
>>> >> (x,x.size)).foreach(println)
>>> >>
>>> >>
>>> >> at the moment my output:-
>>> >>
>>> >> 75.0
>>> >> -25.0
>>> >> 50.0
>>> >> -50.0
>>> >> -100.0
>>> >>
>>> >> I want with key how to get moving average output based on key ?
>>> >>
>>> >>
>>> >> 987,75.0
>>> >> 987,-25
>>> >> 987,50.0
>>> >>
>>> >> Thanks
>>> >> Sri
>>> >>
>>> 

Re: sql to spark scala rdd

2016-07-31 Thread sri hari kali charan Tummala
Tuple

[Lscala.Tuple2;@65e4cb84

On Sun, Jul 31, 2016 at 1:00 AM, Jacek Laskowski  wrote:

> Hi,
>
> What's the result type of sliding(2,1)?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sun, Jul 31, 2016 at 9:23 AM, sri hari kali charan Tummala
>  wrote:
> > tried this no luck, wht is non-empty iterator here ?
> >
> > OP:-
> > (-987,non-empty iterator)
> > (-987,non-empty iterator)
> > (-987,non-empty iterator)
> > (-987,non-empty iterator)
> > (-987,non-empty iterator)
> >
> >
> > sc.textFile(file).keyBy(x => x.split("\\~") (0))
> >   .map(x => x._2.split("\\~"))
> >   .map(x => (x(0),x(2)))
> > .map { case (key,value) =>
> (key,value.toArray.toSeq.sliding(2,1).map(x
> > => x.sum/x.size))}.foreach(println)
> >
> >
> > On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala
> >  wrote:
> >>
> >> Hi All,
> >>
> >> I managed to write using sliding function but can it get key as well in
> my
> >> output ?
> >>
> >> sc.textFile(file).keyBy(x => x.split("\\~") (0))
> >>   .map(x => x._2.split("\\~"))
> >>   .map(x => (x(2).toDouble)).toArray().sliding(2,1).map(x =>
> >> (x,x.size)).foreach(println)
> >>
> >>
> >> at the moment my output:-
> >>
> >> 75.0
> >> -25.0
> >> 50.0
> >> -50.0
> >> -100.0
> >>
> >> I want with key how to get moving average output based on key ?
> >>
> >>
> >> 987,75.0
> >> 987,-25
> >> 987,50.0
> >>
> >> Thanks
> >> Sri
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Sat, Jul 30, 2016 at 11:40 AM, sri hari kali charan Tummala
> >>  wrote:
> >>>
> >>> for knowledge just wondering how to write it up in scala or spark RDD.
> >>>
> >>> Thanks
> >>> Sri
> >>>
> >>> On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski 
> >>> wrote:
> >>>>
> >>>> Why?
> >>>>
> >>>> Pozdrawiam,
> >>>> Jacek Laskowski
> >>>> 
> >>>> https://medium.com/@jaceklaskowski/
> >>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> >>>> Follow me at https://twitter.com/jaceklaskowski
> >>>>
> >>>>
> >>>> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
> >>>>  wrote:
> >>>> > Hi All,
> >>>> >
> >>>> > I managed to write business requirement in spark-sql and hive I am
> >>>> > still
> >>>> > learning scala how this below sql be written using spark RDD not
> spark
> >>>> > data
> >>>> > frames.
> >>>> >
> >>>> > SELECT DATE,balance,
> >>>> > SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING
> AND
> >>>> > CURRENT ROW) daily_balance
> >>>> > FROM  table
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > View this message in context:
> >>>> >
> http://apache-spark-user-list.1001560.n3.nabble.com/sql-to-spark-scala-rdd-tp27433.html
> >>>> > Sent from the Apache Spark User List mailing list archive at
> >>>> > Nabble.com.
> >>>> >
> >>>> >
> -
> >>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>>> >
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks & Regards
> >>> Sri Tummala
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks & Regards
> >> Sri Tummala
> >>
> >
> >
> >
> > --
> > Thanks & Regards
> > Sri Tummala
> >
>



-- 
Thanks & Regards
Sri Tummala


Re: sql to spark scala rdd

2016-07-31 Thread sri hari kali charan Tummala
I tried this with no luck; what is the non-empty iterator here?

OP:-
(-987,non-empty iterator)
(-987,non-empty iterator)
(-987,non-empty iterator)
(-987,non-empty iterator)
(-987,non-empty iterator)


sc.textFile(file).keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => (x(0), x(2)))
  .map { case (key, value) =>
    (key, value.toArray.toSeq.sliding(2, 1).map(x => x.sum / x.size))
  }
  .foreach(println)


On Sun, Jul 31, 2016 at 12:03 AM, sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi All,
>
> I managed to write using sliding function but can it get key as well in my
> output ?
>
> sc.textFile(file).keyBy(x => x.split("\\~") (0))
>   .map(x => x._2.split("\\~"))
>   .map(x => (x(2).toDouble)).toArray().sliding(2,1).map(x => 
> (x,x.size)).foreach(println)
>
>
> at the moment my output:-
>
> 75.0
> -25.0
> 50.0
> -50.0
> -100.0
>
> I want with key how to get moving average output based on key ?
>
>
> 987,75.0
> 987,-25
> 987,50.0
>
> Thanks
> Sri
>
>
>
>
>
>
> On Sat, Jul 30, 2016 at 11:40 AM, sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
>> for knowledge just wondering how to write it up in scala or spark RDD.
>>
>> Thanks
>> Sri
>>
>> On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski 
>> wrote:
>>
>>> Why?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
>>>  wrote:
>>> > Hi All,
>>> >
>>> > I managed to write business requirement in spark-sql and hive I am
>>> still
>>> > learning scala how this below sql be written using spark RDD not spark
>>> data
>>> > frames.
>>> >
>>> > SELECT DATE,balance,
>>> > SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND
>>> > CURRENT ROW) daily_balance
>>> > FROM  table
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/sql-to-spark-scala-rdd-tp27433.html
>>> > Sent from the Apache Spark User List mailing list archive at
>>> Nabble.com.
>>> >
>>> > -
>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> >
>>>
>>
>>
>>
>> --
>> Thanks & Regards
>> Sri Tummala
>>
>>
>
>
> --
> Thanks & Regards
> Sri Tummala
>
>


-- 
Thanks & Regards
Sri Tummala


Re: sql to spark scala rdd

2016-07-31 Thread sri hari kali charan Tummala
Hi All,

I managed to write it using the sliding function, but can I get the key in my
output as well?

sc.textFile(file).keyBy(x => x.split("\\~")(0))
  .map(x => x._2.split("\\~"))
  .map(x => x(2).toDouble)
  .toArray()
  .sliding(2, 1)
  .map(x => (x, x.size))
  .foreach(println)


at the moment my output:-

75.0
-25.0
50.0
-50.0
-100.0

I want the key included; how do I get the moving average output per key?


987,75.0
987,-25
987,50.0
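
One way to keep the key is to group by it first and apply sliding within each key.
A minimal sketch (not necessarily the final answer in this thread), assuming each
key's rows fit in memory after groupByKey and reusing the same file variable as above:

val movingAvgByKey = sc.textFile(file)
  .map(_.split("\\~"))
  .map(x => (x(0), (x(1), x(2).toDouble)))   // (key, (date, balance))
  .groupByKey()
  .flatMapValues { rows =>
    rows.toList.sortBy(_._1)                 // order by date within the key
      .map(_._2)
      .sliding(2, 1)                         // pairwise window
      .map(w => w.sum / w.size)              // window average
  }

movingAvgByKey.foreach(println)              // e.g. (987,75.0)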

Thanks
Sri






On Sat, Jul 30, 2016 at 11:40 AM, sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> for knowledge just wondering how to write it up in scala or spark RDD.
>
> Thanks
> Sri
>
> On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski  wrote:
>
>> Why?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
>>  wrote:
>> > Hi All,
>> >
>> > I managed to write business requirement in spark-sql and hive I am still
>> > learning scala how this below sql be written using spark RDD not spark
>> data
>> > frames.
>> >
>> > SELECT DATE,balance,
>> > SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND
>> > CURRENT ROW) daily_balance
>> > FROM  table
>> >
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/sql-to-spark-scala-rdd-tp27433.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > -
>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> >
>>
>
>
>
> --
> Thanks & Regards
> Sri Tummala
>
>


-- 
Thanks & Regards
Sri Tummala


Re: sql to spark scala rdd

2016-07-30 Thread sri hari kali charan Tummala
For my own knowledge, I am just wondering how to write it in Scala or with a Spark RDD.

Thanks
Sri

On Sat, Jul 30, 2016 at 11:24 AM, Jacek Laskowski  wrote:

> Why?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Sat, Jul 30, 2016 at 4:42 AM, kali.tumm...@gmail.com
>  wrote:
> > Hi All,
> >
> > I managed to write business requirement in spark-sql and hive I am still
> > learning scala how this below sql be written using spark RDD not spark
> data
> > frames.
> >
> > SELECT DATE,balance,
> > SUM(balance) OVER (ORDER BY DATE ROWS BETWEEN UNBOUNDED PRECEDING AND
> > CURRENT ROW) daily_balance
> > FROM  table
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/sql-to-spark-scala-rdd-tp27433.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > -
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >
>



-- 
Thanks & Regards
Sri Tummala


Re: spark local dir to HDFS ?

2016-07-05 Thread sri hari kali charan Tummala
Thanks, that makes sense. Can anyone answer the question below?

http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-td27264.html

Thanks
Sri
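
Following the advice in the quoted reply below (several locally mounted disks rather
than HDFS), here is a minimal sketch; the mount paths are assumptions for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// spark.local.dir accepts a comma-separated list of local directories,
// so spill/shuffle data can be spread across multiple mounted disks.
val conf = new SparkConf()
  .setAppName("local-dirs-example")
  .set("spark.local.dir", "/mnt/disk1/spark_tmp,/mnt/disk2/spark_tmp")

val sc = new SparkContext(conf)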

On Tue, Jul 5, 2016 at 8:15 PM, Saisai Shao  wrote:

> It is not worked to configure local dirs to HDFS. Local dirs are mainly
> used for data spill and shuffle data persistence, it is not suitable to use
> hdfs. If you met capacity problem, you could configure multiple dirs
> located in different mounted disks.
>
> On Wed, Jul 6, 2016 at 9:05 AM, Sri  wrote:
>
>> Hi ,
>>
>> Space issue  we are currently using /tmp and at the moment we don't have
>> any mounted location setup yet.
>>
>> Thanks
>> Sri
>>
>>
>> Sent from my iPhone
>>
>> On 5 Jul 2016, at 17:22, Jeff Zhang  wrote:
>>
>> Any reason why you want to set this on hdfs ?
>>
>> On Tue, Jul 5, 2016 at 3:47 PM, kali.tumm...@gmail.com <
>> kali.tumm...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> can I set spark.local.dir to HDFS location instead of /tmp folder ?
>>>
>>> I tried setting up temp folder to HDFS but it didn't worked can
>>> spark.local.dir write to HDFS ?
>>>
>>> .set("spark.local.dir","hdfs://namednode/spark_tmp/")
>>>
>>>
>>> 16/07/05 15:35:47 ERROR DiskBlockManager: Failed to create local dir in
>>> hdfs://namenode/spark_tmp/. Ignoring this directory.
>>> java.io.IOException: Failed to create a temp directory (under
>>> hdfs://namenode/spark_tmp/) after 10 attempts!
>>>
>>>
>>> Thanks
>>> Sri
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-local-dir-to-HDFS-tp27291.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com
>>> .
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>


-- 
Thanks & Regards
Sri Tummala


Re: spark parquet too many small files ?

2016-07-02 Thread sri hari kali charan Tummala
Hi Takeshi,

I can't use coalesce in the spark-sql shell, right? I know we can use coalesce in
a Spark application written in Scala, but in this project we are not building a
jar or using Python; we are just executing a Hive query in the spark-sql shell
and submitting it to YARN in client mode.

Example:-
spark-sql --verbose --queue default --name wchargeback_event.sparksql.kali \
  --master yarn-client --driver-memory 15g --executor-memory 15g \
  --num-executors 10 --executor-cores 2 \
  -f /x/home/pp_dt_fin_batch/users/srtummala/run-spark/sql/wtr_full.sql \
  --conf "spark.yarn.executor.memoryOverhead=8000" \
  --conf "spark.sql.shuffle.partitions=50" \
  --conf "spark.kyroserializer.buffer.max.mb=5g" \
  --conf "spark.driver.maxResultSize=20g" \
  --conf "spark.storage.memoryFraction=0.8" \
  --conf "spark.hadoopConfiguration=2560" \
  --conf "spark.dynamicAllocation.enabled=false" \
  --conf "spark.shuffle.service.enabled=false" \
  --conf "spark.executor.instances=10"

Thanks
Sri




On Sat, Jul 2, 2016 at 2:53 AM, Takeshi Yamamuro 
wrote:

> Please also see https://issues.apache.org/jira/browse/SPARK-16188.
>
> // maropu
>
> On Fri, Jul 1, 2016 at 7:39 PM, kali.tumm...@gmail.com <
> kali.tumm...@gmail.com> wrote:
>
>> I found the jira for the issue will there be a fix in future ? or no fix ?
>>
>> https://issues.apache.org/jira/browse/SPARK-6221
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264p27267.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>



-- 
Thanks & Regards
Sri Tummala


Re: how to run latest version of spark in old version of spark in cloudera cluster ?

2016-01-27 Thread sri hari kali charan Tummala
Thank you very much, well documented.

Thanks
Sri

On Wed, Jan 27, 2016 at 8:46 PM, Deenar Toraskar 
wrote:

> Sri
>
> Look at the instructions here. They are for 1.5.1, but should also work
> for 1.6
>
>
> https://www.linkedin.com/pulse/running-spark-151-cdh-deenar-toraskar-cfa?trk=hp-feed-article-title-publish&trkSplashRedir=true&forceNoSplash=true
>
> Deenar
>
>
> On 27 January 2016 at 20:16, Koert Kuipers  wrote:
>
>> you need to build spark 1.6 for your hadoop distro, and put that on the
>> proxy node and configure it correctly to find your cluster (hdfs and yarn).
>> then use the spark-submit script for that spark 1.6 version to launch your
>> application on yarn
>>
>> On Wed, Jan 27, 2016 at 3:11 PM, sri hari kali charan Tummala <
>> kali.tumm...@gmail.com> wrote:
>>
>>> Hi Koert,
>>>
>>> I am submitting my code (spark jar ) using spark-submit in proxy node ,
>>> I checked the version of the cluster and node its says 1.2 I dint really
>>> understand what you mean.
>>>
>>> can I ask yarn to use different version of spark ? or should I say
>>> override the spark_home variables to look at 1.6 spark jar ?
>>>
>>> Thanks
>>> Sri
>>>
>>> On Wed, Jan 27, 2016 at 7:45 PM, Koert Kuipers 
>>> wrote:
>>>
>>>> If you have yarn you can just launch your spark 1.6 job from a single
>>>> machine with spark 1.6 available on it and ignore the version of spark
>>>> (1.2) that is installed
>>>> On Jan 27, 2016 11:29, "kali.tumm...@gmail.com" 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Just realized cloudera version of spark on my cluster is 1.2, the jar
>>>>> which
>>>>> I built using maven is version 1.6 which is causing issue.
>>>>>
>>>>> Is there a way to run spark version 1.6 in 1.2 version of spark ?
>>>>>
>>>>> Thanks
>>>>> Sri
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-run-latest-version-of-spark-in-old-version-of-spark-in-cloudera-cluster-tp26087.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>> -
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Sri Tummala
>>>
>>>
>>
>


-- 
Thanks & Regards
Sri Tummala


Re: how to run latest version of spark in old version of spark in cloudera cluster ?

2016-01-27 Thread sri hari kali charan Tummala
Hi Koert,

I am submitting my code (a Spark jar) using spark-submit on the proxy node. I
checked the version on the cluster and the node, and it says 1.2; I didn't
really understand what you meant.

Can I ask YARN to use a different version of Spark? Or should I override the
SPARK_HOME variable to point at the 1.6 Spark jars?

Thanks
Sri

On Wed, Jan 27, 2016 at 7:45 PM, Koert Kuipers  wrote:

> If you have yarn you can just launch your spark 1.6 job from a single
> machine with spark 1.6 available on it and ignore the version of spark
> (1.2) that is installed
> On Jan 27, 2016 11:29, "kali.tumm...@gmail.com" 
> wrote:
>
>> Hi All,
>>
>> Just realized cloudera version of spark on my cluster is 1.2, the jar
>> which
>> I built using maven is version 1.6 which is causing issue.
>>
>> Is there a way to run spark version 1.6 in 1.2 version of spark ?
>>
>> Thanks
>> Sri
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-run-latest-version-of-spark-in-old-version-of-spark-in-cloudera-cluster-tp26087.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


-- 
Thanks & Regards
Sri Tummala


Re: how to turn off spark streaming gracefully ?

2015-12-18 Thread sri hari kali charan Tummala
Hi Cody,

KafkaUtils.createRDD makes total sense now: I can run my Spark job once every
15 minutes, extract the data out of Kafka, and stop. I rely on the Kafka offsets
for incremental data, am I right? So no duplicate data will be returned.
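
For reference, a minimal sketch of the batch approach Cody suggests, assuming the
Spark 1.x spark-streaming-kafka (Kafka 0.8) API, an existing SparkContext sc, and
made-up broker, topic, offset, and path values; with createRDD the job itself has
to track and persist the offsets between runs:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

// Assumed values: broker address, topic "events", partition 0, and the offset
// range saved from the previous run.
val kafkaParams  = Map("metadata.broker.list" -> "broker1:9092")
val offsetRanges = Array(OffsetRange("events", 0, 0L, 1000L))

val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)

rdd.map(_._2).saveAsTextFile("/tmp/kafka_batch_output")   // assumed output path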


Thanks
Sri





On Fri, Dec 18, 2015 at 2:41 PM, Cody Koeninger  wrote:

> If you're really doing a daily batch job, have you considered just using
> KafkaUtils.createRDD rather than a streaming job?
>
> On Fri, Dec 18, 2015 at 5:04 AM, kali.tumm...@gmail.com <
> kali.tumm...@gmail.com> wrote:
>
>> Hi All,
>>
>> Imagine I have a Production spark streaming kafka (direct connection)
>> subscriber and publisher jobs running which publish and subscriber
>> (receive)
>> data from a kafka topic and I save one day's worth of data using
>> dstream.slice to Cassandra daily table (so I create daily table before
>> running spark streaming job).
>>
>> My question if all the above code runs in some scheduler like autosys how
>> should I say to spark publisher to stop publishing as it is End of day and
>> to spark subscriber to stop receiving to stop receiving without killing
>> the
>> jobs ? if I kill my autosys scheduler turns red saying the job had failed
>> etc ...
>> Is there a way to stop both subscriber and publisher with out killing or
>> terminating the code.
>>
>> Thanks
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-turn-off-spark-streaming-gracefully-tp25734.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


-- 
Thanks & Regards
Sri Tummala


Re: Release data for spark 1.6?

2015-12-12 Thread sri hari kali charan Tummala
thanks Sean and Ted, I will wait for 1.6 to be out.

Happy Christmas to all !

Thanks
Sri

On Sat, Dec 12, 2015 at 12:18 PM, Ted Yu  wrote:

> Please take a look at SPARK-9078 which allows jdbc dialects to override
> the query for checking table existence.
>
> On Dec 12, 2015, at 7:12 PM, sri hari kali charan Tummala <
> kali.tumm...@gmail.com> wrote:
>
> Hi Michael, Ted,
>
>
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L48
>
> In Present spark version in line 48 there is a bug, to check whether table
> exists in a database using limit doesnt work for all databases sql server
> for example.
>
> best way to check whehter table exists in any database is to use, select *
> from table where 1=2;  or select 1 from table where 1=2; this supports all
> the databases.
>
> In spark 1.6 can this change be implemented, this lets
>  write.mode("append") bug to go away.
>
>
>
> def tableExists(conn: Connection, table: String): Boolean = {
>   // Somewhat hacky, but there isn't a good way to identify whether a table
>   // exists for all SQL database systems, considering "table" could also
>   // include the database name.
>   Try(conn.prepareStatement(s"SELECT 1 FROM $table LIMIT 1").executeQuery().next()).isSuccess
> }
>
> Solution:-
> def tableExists(conn: Connection, table: String): Boolean = {
>   // Somewhat hacky, but there isn't a good way to identify whether a table
>   // exists for all SQL database systems, considering "table" could also
>   // include the database name.
>   Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess
> }
>
>
>
> Thanks
> Sri
>
>
>
> On Wed, Dec 9, 2015 at 10:30 PM, Michael Armbrust 
> wrote:
>
>> The release date is "as soon as possible".  In order to make an Apache
>> release we must present a release candidate and have 72-hours of voting by
>> the PMC.  As soon as there are no known bugs, the vote will pass and 1.6
>> will be released.
>>
>> In the mean time, I'd love support from the community testing the most
>> recent release candidate.
>>
>> On Wed, Dec 9, 2015 at 2:19 PM, Sri  wrote:
>>
>>> Hi Ted,
>>>
>>> Thanks for the info , but there is no particular release date from my
>>> understanding the package is in testing there is no release date mentioned.
>>>
>>> Thanks
>>> Sri
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> > On 9 Dec 2015, at 21:38, Ted Yu  wrote:
>>> >
>>> > See this thread:
>>> >
>>> >
>>> http://search-hadoop.com/m/q3RTtBMZpK7lEFB1/Spark+1.6.0+RC&subj=Re+VOTE+Release+Apache+Spark+1+6+0+RC1+
>>> >
>>> >> On Dec 9, 2015, at 1:20 PM, "kali.tumm...@gmail.com" <
>>> kali.tumm...@gmail.com> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> does anyone know exact release data for spark 1.6 ?
>>> >>
>>> >> Thanks
>>> >> Sri
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Release-data-for-spark-1-6-tp25654.html
>>> >> Sent from the Apache Spark User List mailing list archive at
>>> Nabble.com <http://nabble.com>.
>>> >>
>>> >> -
>>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: user-h...@spark.apache.org
>>> >>
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>
>
> --
> Thanks & Regards
> Sri Tummala
>
>


-- 
Thanks & Regards
Sri Tummala


Re: spark data frame write.mode("append") bug

2015-12-12 Thread sri hari kali charan Tummala
Hi All,

https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L48

In the present Spark version there is a bug at line 48: checking whether a table
exists in a database using LIMIT does not work for all databases (SQL Server,
for example).

The best way to check whether a table exists in any database is to use
select * from table where 1=2; or select 1 from table where 1=2; both forms are
supported by all databases.

Can this change be implemented in Spark 1.6? It would make the
write.mode("append") bug go away.



def tableExists(conn: Connection, table: String): Boolean = {
  // Somewhat hacky, but there isn't a good way to identify whether a table
  // exists for all SQL database systems, considering "table" could also
  // include the database name.
  Try(conn.prepareStatement(s"SELECT 1 FROM $table LIMIT 1").executeQuery().next()).isSuccess
}

Solution:-
def tableExists(conn: Connection, table: String): Boolean = {
  // Somewhat hacky, but there isn't a good way to identify whether a table
  // exists for all SQL database systems, considering "table" could also
  // include the database name.
  Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess
}
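
For reference, SPARK-9078 (mentioned later in this thread) lets a JDBC dialect
override the table-existence query instead of patching JdbcUtils. A minimal sketch,
assuming the Spark 1.6 JdbcDialect hook for this and a SQL Server URL prefix:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object SqlServerExistsDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  // Returns no rows but fails if the table is missing; works without LIMIT support.
  override def getTableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table WHERE 1=2"
}

// Register once before calling df.write.mode("append").jdbc(...)
JdbcDialects.registerDialect(SqlServerExistsDialect)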



Thanks

On Wed, Dec 9, 2015 at 4:24 PM, Seongduk Cheon  wrote:

> Not for sure, but I think it is bug as of 1.5.
>
> Spark is using LIMIT keyword whether a table exists.
>
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L48
>
> If your database does not support LIMIT keyword such as SQL Server, spark
> try to create table
>
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L272-L275
>
> This issue has already fixed and It will be released on 1.6
> https://issues.apache.org/jira/browse/SPARK-9078
>
>
> --
> Cheon
>
> 2015-12-09 22:54 GMT+09:00 kali.tumm...@gmail.com 
> :
>
>> Hi Spark Contributors,
>>
>> I am trying to append data  to target table using df.write.mode("append")
>> functionality but spark throwing up table already exists exception.
>>
>> Is there a fix scheduled in later spark release ?, I am using spark 1.5.
>>
>> val sourcedfmode=sourcedf.write.mode("append")
>> sourcedfmode.jdbc(TargetDBinfo.url,TargetDBinfo.table,targetprops)
>>
>> Full Code:-
>>
>> https://github.com/kali786516/ScalaDB/blob/master/src/main/java/com/kali/db/SaprkSourceToTargetBulkLoad.scala
>>
>> Spring Config File:-
>>
>> https://github.com/kali786516/ScalaDB/blob/master/src/main/resources/SourceToTargetBulkLoad.xml
>>
>>
>> Thanks
>> Sri
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-data-frame-write-mode-append-bug-tp25650.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
>


-- 
Thanks & Regards
Sri Tummala


Re: Release data for spark 1.6?

2015-12-12 Thread sri hari kali charan Tummala
Hi Michael, Ted,

https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L48

In the present Spark version there is a bug at line 48: checking whether a table
exists in a database using LIMIT does not work for all databases (SQL Server,
for example).

The best way to check whether a table exists in any database is to use
select * from table where 1=2; or select 1 from table where 1=2; both forms are
supported by all databases.

Can this change be implemented in Spark 1.6? It would make the
write.mode("append") bug go away.



def tableExists(conn: Connection, table: String): Boolean = {
  // Somewhat hacky, but there isn't a good way to identify whether a table
  // exists for all SQL database systems, considering "table" could also
  // include the database name.
  Try(conn.prepareStatement(s"SELECT 1 FROM $table LIMIT 1").executeQuery().next()).isSuccess
}

Solution:-
def tableExists(conn: Connection, table: String): Boolean = {
  // Somewhat hacky, but there isn't a good way to identify whether a table
  // exists for all SQL database systems, considering "table" could also
  // include the database name.
  Try(conn.prepareStatement(s"SELECT 1 FROM $table where 1=2").executeQuery().next()).isSuccess
}



Thanks
Sri



On Wed, Dec 9, 2015 at 10:30 PM, Michael Armbrust 
wrote:

> The release date is "as soon as possible".  In order to make an Apache
> release we must present a release candidate and have 72-hours of voting by
> the PMC.  As soon as there are no known bugs, the vote will pass and 1.6
> will be released.
>
> In the mean time, I'd love support from the community testing the most
> recent release candidate.
>
> On Wed, Dec 9, 2015 at 2:19 PM, Sri  wrote:
>
>> Hi Ted,
>>
>> Thanks for the info , but there is no particular release date from my
>> understanding the package is in testing there is no release date mentioned.
>>
>> Thanks
>> Sri
>>
>>
>>
>> Sent from my iPhone
>>
>> > On 9 Dec 2015, at 21:38, Ted Yu  wrote:
>> >
>> > See this thread:
>> >
>> >
>> http://search-hadoop.com/m/q3RTtBMZpK7lEFB1/Spark+1.6.0+RC&subj=Re+VOTE+Release+Apache+Spark+1+6+0+RC1+
>> >
>> >> On Dec 9, 2015, at 1:20 PM, "kali.tumm...@gmail.com" <
>> kali.tumm...@gmail.com> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> does anyone know exact release data for spark 1.6 ?
>> >>
>> >> Thanks
>> >> Sri
>> >>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Release-data-for-spark-1-6-tp25654.html
>> >> Sent from the Apache Spark User List mailing list archive at
>> Nabble.com.
>> >>
>> >> -
>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: user-h...@spark.apache.org
>> >>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


-- 
Thanks & Regards
Sri Tummala


Re: spark sql current time stamp function ?

2015-12-07 Thread sri hari kali charan Tummala
Hi Ted,

It gave an exception; am I following the right approach?

val test=sqlContext.sql("select *,  monotonicallyIncreasingId()  from kali")
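
For what it's worth, a minimal DataFrame-API sketch, assuming Spark 1.6 function
names and hypothetical columns column1/column2 in the registered table kali. The
camelCase monotonicallyIncreasingId is a Scala Column function in
org.apache.spark.sql.functions rather than a registered SQL function name, which
would explain the exception from sqlContext.sql:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonicallyIncreasingId, row_number}

val df = sqlContext.table("kali")

// Increasing (not consecutive) id per row.
val withId = df.withColumn("id", monotonicallyIncreasingId())

// SQL-style row_number() needs a window specification in the DataFrame API.
val w = Window.partitionBy(col("column1")).orderBy(col("column2"))
val withRowNumber = withId.withColumn("row_number", row_number().over(w))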


On Mon, Dec 7, 2015 at 4:52 PM, Ted Yu  wrote:

> Have you tried using monotonicallyIncreasingId ?
>
> Cheers
>
> On Mon, Dec 7, 2015 at 7:56 AM, Sri  wrote:
>
>> Thanks , I found the right function current_timestamp().
>>
>> different Question:-
>> Is there a row_number() function in spark SQL ? Not in Data frame just
>> spark SQL?
>>
>>
>> Thanks
>> Sri
>>
>> Sent from my iPhone
>>
>> On 7 Dec 2015, at 15:49, Ted Yu  wrote:
>>
>> Does unix_timestamp() satisfy your needs ?
>> See sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala
>>
>> On Mon, Dec 7, 2015 at 6:54 AM, kali.tumm...@gmail.com <
>> kali.tumm...@gmail.com> wrote:
>>
>>> I found a way out.
>>>
>>> import java.text.SimpleDateFormat
>>> import java.util.Date;
>>>
>>> val format = new SimpleDateFormat("-M-dd hh:mm:ss")
>>>
>>>  val testsql=sqlContext.sql("select
>>> column1,column2,column3,column4,column5
>>> ,'%s' as TIME_STAMP from TestTable limit 10".format(format.format(new
>>> Date(
>>>
>>>
>>> Thanks
>>> Sri
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-current-time-stamp-function-tp25620p25621.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>


-- 
Thanks & Regards
Sri Tummala


Re: Pass spark partition explicitly ?

2015-10-18 Thread sri hari kali charan Tummala
Hi Richard,

Thanks. So my takeaway from your discussion is that if we want to pass
partition values explicitly, it has to be written inside the code.

Thanks
Sri

On Sun, Oct 18, 2015 at 7:05 PM, Richard Eggert 
wrote:

> If you want to override the default partitioning behavior,  you have to do
> so in your code where you create each RDD. Different RDDs usually have
> different numbers of partitions (except when one RDD is directly derived
> from another without shuffling) because they usually have different sizes,
> so it wouldn't make sense to have some sort of "global" notion of how many
> partitions to create.  You could,  if you wanted,  pass partition counts in
> as command line options to your application and use those values in your
> code that creates the RDDs, of course.
>
> Rich
> On Oct 18, 2015 1:57 PM, "kali.tumm...@gmail.com" 
> wrote:
>
>> Hi All,
>>
>> can I pass number of partitions to all the RDD explicitly while submitting
>> the spark Job or di=o I need to mention in my spark code itself ?
>>
>> Thanks
>> Sri
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


-- 
Thanks & Regards
Sri Tummala