GroupByKey results in OOM - Any other alternative

2014-06-14 Thread Vivek YS
Hi, For the last couple of days I have been trying hard to get around this problem. Please share any insights on solving it. Problem: There is a huge list of (key, value) pairs. I want to transform this to (key, distinct values) and then eventually to (key, distinct values count). On

Re: GroupByKey results in OOM - Any other alternative

2014-06-14 Thread Vivek YS
accuracy. On Sat, Jun 14, 2014 at 1:58 PM, Vivek YS vivek...@gmail.com wrote: Hi, For the last couple of days I have been trying hard to get around this problem. Please share any insights on solving this problem. Problem: There is a huge list of (key, value) pairs. I want to transform
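
For readers following the problem stated in this thread, turning (key, value) pairs into (key, distinct values count), here is a minimal sketch in plain Python, not Spark code and not the resolution reached on the list. It assumes that only the count per key is needed, so the full list of values for a key never has to be built: duplicate (key, value) pairs are dropped first, then each surviving pair adds one to its key's counter. The function name and sample data are hypothetical. In Spark the analogous steps would be `distinct()` on the pair RDD followed by a `reduceByKey` sum, which avoids collecting every value of a key in one place the way `groupByKey` does.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def distinct_value_counts(
    pairs: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, int]:
    """Return the number of distinct values seen for each key."""
    seen = set()        # (key, value) pairs already counted
    counts = Counter()  # key -> number of distinct values
    for key, value in pairs:
        if (key, value) not in seen:
            # First time this exact pair appears: count it once.
            seen.add((key, value))
            counts[key] += 1
    return dict(counts)


# Hypothetical sample data, for illustration only.
sample = [("a", 1), ("a", 1), ("a", 2), ("b", 3), ("b", 3)]
result = distinct_value_counts(sample)
print(result)
```

This local version still keeps every distinct pair in memory; it only illustrates the order of operations (deduplicate pairs, then sum per key), not a fix for the memory limits of a particular cluster.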

Re: Broadcast RDD Lookup

2014-05-01 Thread Vivek YS
No, I am sure the items match, because userCluster and productCluster are prepared from data. The cross product of userCluster and productCluster is a superset of data. On Thu, May 1, 2014 at 3:41 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: Mostly none of the items in PairRDD match your input.