Yes, it would. You can key both datasets on the join column and partition them
with the same partitioner (say a HashPartitioner). The join is then faster
because rows with the same key land in the same partition, so the join itself
does not need a full shuffle.
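
For the RDD API, a minimal sketch of what that can look like (the input paths,
the partition count of 200, and the key extraction are just placeholders, not
from your setup):

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PartitionedJoin {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partitioned-join"))

        // Use the same partitioner (and partition count) for both sides.
        val partitioner = new HashPartitioner(200)

        // Hypothetical inputs: key each dataset on the join column
        // (first CSV field here).
        val left = sc.textFile("hdfs:///data/left.csv")
          .map(line => (line.split(",")(0), line))
          .partitionBy(partitioner)
          .persist() // keep the partitioned layout so it isn't recomputed

        val right = sc.textFile("hdfs:///data/right.csv")
          .map(line => (line.split(",")(0), line))
          .partitionBy(partitioner)
          .persist()

        // Both sides share the same partitioner, so matching keys are already
        // co-located and the join avoids shuffling both datasets again.
        val joined = left.join(right)
        println(joined.count())

        sc.stop()
      }
    }

Note that partitionBy itself does one shuffle to lay the data out; persisting
the partitioned RDDs lets later joins against the same side reuse that layout
instead of shuffling again.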
Thanks
Best Regards
On Sun, Feb 1, 2015 at 5:13 PM, Sunita Arvind wrote:
> Hi All
>
> We are joining large tables using Spark SQL and running into shuffle
> issues. We have explored multiple options: using coalesce to reduce the
> number of partitions, tuning various parameters like the disk buffer,
> reducing data in chunks, etc., which all seem to help, btw. What I would
> like to know is,