Re: aggregateByKey on PairRDD

2016-03-30 Thread write2sivakumar@gmail


Hi,
We can use combineByKey to achieve this.
val finalRDD = tempRDD.combineByKey(
  (x: (Any, Any)) => (x),
  (acc: (Any, Any), x) => (acc, x),
  (acc1: (Any, Any), acc2: (Any, Any)) => (acc1, acc2))
finalRDD.collect.foreach(println)

(amazon,((book1, tech),(book2,tech)))
(barns, (book,tech))
(eBay, (book1,tech))
Thanks,
Sivakumar
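With an (Any, Any) accumulator, createCombiner is the identity and every mergeValue wraps the previous result in one more tuple. A brand with three products in one partition would therefore print as (((p1,k1),(p2,k2)),(p3,k3)) rather than as a list. Below is a minimal local sketch, assuming the goal is RDD[(String, List[(String, String)])], of the same three functions typed with a List accumulator. It has no Spark dependency: the two "partitions" are a hypothetical split of the sample data, processed the way combineByKey applies its functions. In Spark the functions would be passed as tempRDD.combineByKey(createCombiner, mergeValue, mergeCombiners).

```scala
// Local sketch of combineByKey's three functions with a List accumulator.
// No Spark dependency: the two "partitions" below are a hypothetical split
// of the sample data, processed the way combineByKey would process them.
object CombineByKeySketch {
  type Value = (String, String) // (product, key)

  // Called for the first value of a key seen in a partition.
  val createCombiner: Value => List[Value] = v => List(v)
  // Called for every further value of that key in the same partition.
  val mergeValue: (List[Value], Value) => List[Value] = (acc, v) => v :: acc
  // Called to merge the per-partition lists for one key.
  val mergeCombiners: (List[Value], List[Value]) => List[Value] = (a, b) => a ::: b

  // Combine one partition's records into one list per key.
  def combinePartition(part: Seq[(String, Value)]): Map[String, List[Value]] =
    part.foldLeft(Map.empty[String, List[Value]]) { case (m, (k, v)) =>
      m.get(k) match {
        case Some(acc) => m.updated(k, mergeValue(acc, v))
        case None      => m.updated(k, createCombiner(v))
      }
    }

  // Hypothetical partition layout of the sample data.
  val partitions: Seq[Seq[(String, Value)]] = Seq(
    Seq(("amazon", ("book1", "tech")), ("eBay", ("book1", "tech"))),
    Seq(("barns", ("book", "tech")), ("amazon", ("book2", "tech")))
  )

  // Merge the per-partition results key by key with mergeCombiners.
  val result: Map[String, List[Value]] =
    partitions.map(combinePartition).reduce { (m1, m2) =>
      m2.foldLeft(m1) { case (m, (k, vs)) =>
        m.updated(k, m.get(k).map(mergeCombiners(_, vs)).getOrElse(vs))
      }
    }

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

In Spark the order of values within a key is not guaranteed; it depends on partitioning and on the order in which partial lists are merged.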

 Original message 
From: Daniel Haviv <daniel.ha...@veracity-group.com> 
Date: 30/03/2016  18:58  (GMT+08:00) 
To: Akhil Das <ak...@sigmoidanalytics.com> 
Cc: Suniti Singh <suniti.si...@gmail.com>, u...@spark.apache.org, dev <dev@spark.apache.org>
Subject: Re: aggregateByKey on PairRDD 

Hi,
Shouldn't groupByKey be avoided
(https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?

Thank you,
Daniel
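The advice in that article applies when the per-key result is smaller than the values (sums, counts, maxima): reduceByKey and aggregateByKey combine values on the map side, so each partition ships at most one partial result per key. When the goal is the complete list of values, as in this thread, the partial lists still hold every value, which is why Spark's groupByKey skips map-side combining. The local sketch below has no Spark dependency; the partition layout and the extra "book3" record are hypothetical. It counts how many values each approach would send across the shuffle and ignores serialization overhead and object sizes.

```scala
// Local sketch comparing how many values leave each partition after
// map-side combining. No Spark dependency; the layout and "book3" are hypothetical.
object ShuffleVolumeSketch {
  type Record = (String, (String, String)) // (brand, (product, key))

  val partitions: Seq[Seq[Record]] = Seq(
    Seq(("amazon", ("book1", "tech")), ("amazon", ("book2", "tech")), ("eBay", ("book1", "tech"))),
    Seq(("barns", ("book", "tech")), ("amazon", ("book3", "tech")))
  )

  // Count per brand: one (brand, partialCount) per key per partition.
  val partialCounts: Seq[Map[String, Int]] =
    partitions.map(_.groupBy(_._1).map { case (k, rs) => (k, rs.size) })

  // Collect into lists: one (brand, partialList) per key per partition,
  // but each partial list still contains every value.
  val partialLists: Seq[Map[String, List[(String, String)]]] =
    partitions.map(_.groupBy(_._1).map { case (k, rs) => (k, rs.map(_._2).toList) })

  val rawRecords: Int = partitions.map(_.size).sum
  val countValuesShuffled: Int = partialCounts.map(_.size).sum
  val listValuesShuffled: Int = partialLists.map(_.values.map(_.size).sum).sum

  def main(args: Array[String]): Unit = {
    println(s"raw records:             $rawRecords")
    println(s"values shuffled (count): $countValuesShuffled")
    println(s"values shuffled (list):  $listValuesShuffled")
  }
}
```

Here combining saves one value for the count and nothing for the list, so for this use case groupByKey followed by a conversion to List is a reasonable choice.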
On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
Isn't it what tempRDD.groupByKey does? 
Thanks
Best Regards

On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <suniti.si...@gmail.com> wrote:
Hi All,
I have an RDD with data in the following form:

tempRDD: RDD[(String, (String, String))]
(brand, (product, key))
("amazon",("book1","tech"))
("eBay",("book1","tech"))
("barns",("book","tech"))
("amazon",("book2","tech"))

I would like to group the data by brand and get the result set in the following format:
resultSetRDD : RDD[(String, List[(String, String)])]
I tried using aggregateByKey but can't figure out how to achieve this. Or is there any other way to achieve this?

val resultSetRDD = tempRDD.aggregateByKey("")(
  { case (aggr, value) => aggr + String.valueOf(value) + "," },
  (aggr1, aggr2) => aggr1 + aggr2)
resultSetRDD = (amazon,("book1","tech"),("book2","tech"))
Thanks,
Suniti






Re: aggregateByKey on PairRDD

2016-03-30 Thread Akhil Das
Isn't it what tempRDD.groupByKey does?

Thanks
Best Regards
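In Spark, groupByKey returns RDD[(String, Iterable[(String, String)])], so getting the List type asked for takes one more step, for example tempRDD.groupByKey().mapValues(_.toList). The sketch below reproduces that shape on a plain Scala collection holding the sample data (no Spark dependency), so the expected grouping can be checked locally.

```scala
// Local sketch of the shape produced by tempRDD.groupByKey().mapValues(_.toList).
// No Spark dependency: a plain Seq stands in for tempRDD.
object GroupByKeySketch {
  val tempData: Seq[(String, (String, String))] = Seq(
    ("amazon", ("book1", "tech")),
    ("eBay", ("book1", "tech")),
    ("barns", ("book", "tech")),
    ("amazon", ("book2", "tech"))
  )

  // brand -> list of all (product, key) values for that brand
  val grouped: Map[String, List[(String, String)]] =
    tempData.groupBy(_._1).map { case (brand, recs) => (brand, recs.map(_._2).toList) }

  def main(args: Array[String]): Unit = grouped.foreach(println)
}
```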



aggregateByKey on PairRDD

2016-03-29 Thread Suniti Singh
Hi All,

I have an RDD with data in the following form:

tempRDD: RDD[(String, (String, String))]

(brand , (product, key))

("amazon",("book1","tech"))

("eBay",("book1","tech"))

("barns",("book","tech"))

("amazon",("book2","tech"))


I would like to group the data by brand and get the result set in the
following format:

resultSetRDD : RDD[(String, List[(String, String)])]

I tried using aggregateByKey but can't figure out how to achieve
this. Or is there any other way to achieve this?

val resultSetRDD = tempRDD.aggregateByKey("")(
  { case (aggr, value) => aggr + String.valueOf(value) + "," },
  (aggr1, aggr2) => aggr1 + aggr2)

resultSetRDD = (amazon,("book1","tech"),("book2","tech"))

Thanks,

Suniti
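The attempt above passes "" as the zero value, so the accumulator is a String and the result can only be a concatenated string. A minimal sketch, assuming the target type RDD[(String, List[(String, String)])]: use an empty list as the zero value, prepend each value in seqOp, and concatenate partial lists in combOp. In Spark that would read tempRDD.aggregateByKey(List.empty[(String, String)])((acc, v) => v :: acc, (a, b) => a ::: b). The code below runs the same three pieces on plain Scala collections, with a hypothetical two-partition split of the sample data, to show how the zero value, seqOp and combOp fit together.

```scala
// Local sketch of aggregateByKey with a List zero value. No Spark dependency:
// the two "partitions" are a hypothetical split of the sample data.
object AggregateByKeySketch {
  type Value = (String, String) // (product, key)

  // Zero value: each key in each partition starts from an empty list.
  val zero: List[Value] = List.empty
  // seqOp: fold one value into the partition-local list.
  val seqOp: (List[Value], Value) => List[Value] = (acc, v) => v :: acc
  // combOp: merge two partition-local lists for the same key.
  val combOp: (List[Value], List[Value]) => List[Value] = (a, b) => a ::: b

  val partitions: Seq[Seq[(String, Value)]] = Seq(
    Seq(("amazon", ("book1", "tech")), ("eBay", ("book1", "tech"))),
    Seq(("barns", ("book", "tech")), ("amazon", ("book2", "tech")))
  )

  // Per partition: group by key, then fold each key's values starting from zero.
  val perPartition: Seq[Map[String, List[Value]]] =
    partitions.map(_.groupBy(_._1).map { case (k, recs) =>
      (k, recs.map(_._2).foldLeft(zero)(seqOp))
    })

  // Across partitions: merge the lists for each key with combOp.
  val result: Map[String, List[Value]] =
    perPartition.flatten.groupBy(_._1).map { case (k, parts) =>
      (k, parts.map(_._2).reduce(combOp))
    }

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

Because seqOp prepends, values for one key within one partition come out in reverse order; Spark makes no ordering promise here either, so sort the list afterwards if order matters.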