dev <dev@spark.apache.org>
Subject: Re: aggregateByKey on PairRDD
Hi,
Shouldn't groupByKey be avoided
(https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html)?
Thank you,
Daniel
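
For reference, here is a minimal sketch of what the aggregateByKey alternative to groupByKey could look like for the sample data in this thread. It assumes the goal is simply to collect the (product, key) pairs for each brand into a list; the object name, the local-mode SparkContext setup, and the choice of List as the accumulator are illustrative assumptions, not something stated in the original messages.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object, for illustration only.
object AggregateByBrand {
  def main(args: Array[String]): Unit = {
    // Local-mode context so the sketch can run without a cluster (assumption).
    val conf = new SparkConf().setAppName("aggregate-by-brand").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sample rows from the thread: (brand, (product, key)).
    val tempRDD = sc.parallelize(Seq(
      ("amazon", ("book1", "tech")),
      ("eBay",   ("book1", "tech")),
      ("barns",  ("book",  "tech")),
      ("amazon", ("book2", "tech"))
    ))

    // Assumed goal: one list of (product, key) pairs per brand.
    val byBrand = tempRDD.aggregateByKey(List.empty[(String, String)])(
      (acc, value) => value :: acc,    // add one value to a partition-local list
      (left, right) => left ::: right  // merge lists built on different partitions
    )

    // Order of brands and of pairs within each list is not guaranteed.
    byBrand.collect().foreach(println)
    sc.stop()
  }
}
```

Note that when every value for a key has to be kept, as in this sketch, aggregateByKey still shuffles all of the values, so it is unlikely to be much cheaper than groupByKey. The advice in the linked page matters most when the values can be combined into something smaller (a count, a sum, a set of distinct items) before the shuffle.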
On Wed, Mar 30, 2016 at 9:01 AM, Akhi
Isn't that what tempRDD.groupByKey does?
Thanks
Best Regards
On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh wrote:
Hi All,
I have an RDD with data in the following form:
tempRDD: RDD[(String, (String, String))]
(brand, (product, key))
("amazon",("book1","tech"))
("eBay",("book1","tech"))
("barns",("book","tech"))
("amazon",("book2","tech"))
I would like to group the data by Brand and would