Re: aggregateByKey on PairRDD

2016-03-30 Thread write2sivakumar@gmail
dev <dev@spark.apache.org> Subject: Re: aggregateByKey on PairRDD

Hi,

Shouldn't groupByKey be avoided (https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?

Thank you,
Daniel

On Wed, Mar 30, 2016 at 9:01 AM, Akhi
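The concern behind the linked best-practices page can be illustrated without a cluster. Below is a minimal local-collections model of `aggregateByKey(zeroValue)(seqOp, combOp)`. It is not Spark's implementation. Each inner `Seq` stands in for one partition. `mapSide` plays the role of the per-partition `seqOp` pass, and `combOp` merges the partial results after the shuffle. The names `AggregateByKeyModel` and `mapSide` are hypothetical.

```scala
// Hypothetical local model of aggregateByKey; each inner Seq stands for one partition.
object AggregateByKeyModel {
  // Map side: within each partition, fold all values of a key into one partial
  // result, starting from `zero` for every key in every partition.
  def mapSide[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U): Seq[Map[K, U]] =
    partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zero), v))
      }
    }

  // Reduce side: only the partial results are "shuffled"; partials for the
  // same key coming from different partitions are merged with combOp.
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U, combOp: (U, U) => U): Map[K, U] =
    mapSide(partitions, zero)(seqOp).flatten.groupBy(_._1).map {
      case (k, partials) => k -> partials.map(_._2).reduce(combOp)
    }
}
```

Suppose both `amazon` records land in the same partition, for example `Seq(Seq(amazon/book1, amazon/book2, eBay/book1), Seq(barns/book))`. Then `mapSide` emits 3 partial records instead of the 4 raw ones. In Spark itself, the corresponding call on the question's data would be along the lines of `tempRDD.aggregateByKey(Set.empty[String])((s, v) => s + v._1, _ ++ _)`, giving an `RDD[(String, Set[String])]` of distinct products per brand. Map-side combining pays off mainly when keys repeat within a partition and the partial result is smaller than the raw values (a count, a sum, a deduplicated set). If every value must be kept anyway, the shuffled volume is roughly the same as with `groupByKey`.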

Re: aggregateByKey on PairRDD

2016-03-30 Thread Akhil Das
Isn't that what tempRDD.groupByKey does?

Thanks
Best Regards

On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh wrote:
> Hi All,
>
> I have an RDD with data in the following form:
>
> tempRDD: RDD[(String, (String, String))]
>
> (brand, (product, key))
>
>
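Here is a rough picture of what `groupByKey` does, as a local-collections model with no Spark dependency. The names `GroupByKeyModel` and `shuffledRecords` are hypothetical. The model has no map-side combine, so every `(key, value)` record crosses the shuffle and all values for a key are gathered in one place. In Spark, `tempRDD.groupByKey()` returns an `RDD[(String, Iterable[(String, String)])]`, and the order of values inside each group is not guaranteed.

```scala
// Hypothetical local model of groupByKey; each inner Seq stands for one partition.
object GroupByKeyModel {
  // No map-side combine: every record is shuffled as-is.
  def shuffledRecords[K, V](partitions: Seq[Seq[(K, V)]]): Int =
    partitions.map(_.size).sum

  // After the shuffle, all values for a key are collected together.
  def groupByKey[K, V](partitions: Seq[Seq[(K, V)]]): Map[K, Seq[V]] =
    partitions.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
}
```

For the four sample records, this model always shuffles 4 records regardless of how they are partitioned.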

aggregateByKey on PairRDD

2016-03-29 Thread Suniti Singh
Hi All,

I have an RDD with data in the following form:

tempRDD: RDD[(String, (String, String))]

(brand, (product, key))

("amazon",("book1","tech"))
("eBay",("book1","tech"))
("barns",("book","tech"))
("amazon",("book2","tech"))

I would like to group the data by Brand and would
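To reproduce the question locally, the four sample records can be written as an ordinary Scala `Seq`. In Spark, the same value would be passed to `sc.parallelize` to obtain `tempRDD`. This sketch assumes no Spark dependency. It uses the standard collections `groupBy` only to show the brand-keyed shape described above, and `SampleData` is a hypothetical name.

```scala
// Hypothetical holder for the sample (brand, (product, key)) records.
object SampleData {
  val records: Seq[(String, (String, String))] = Seq(
    ("amazon", ("book1", "tech")),
    ("eBay", ("book1", "tech")),
    ("barns", ("book", "tech")),
    ("amazon", ("book2", "tech"))
  )

  // Group by brand: brand -> the (product, key) values recorded for that brand.
  val byBrand: Map[String, Seq[(String, String)]] =
    records.groupBy(_._1).map { case (brand, rs) => brand -> rs.map(_._2) }
}
```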