Yes, it would. You can key both datasets on the join column and partition them
with the same partitioner (say a HashPartitioner). The join is then faster
because rows with the same key land in the same partition, so the join itself
does not need a full shuffle.
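
For the RDD API, a minimal sketch of what that can look like (the input paths,
the partition count of 200, and the key extraction are just placeholders, not
from your setup):

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PartitionedJoin {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("partitioned-join"))

        // Use the same partitioner (and partition count) for both sides.
        val partitioner = new HashPartitioner(200)

        // Hypothetical inputs: key each dataset on the join column
        // (first CSV field here).
        val left = sc.textFile("hdfs:///data/left.csv")
          .map(line => (line.split(",")(0), line))
          .partitionBy(partitioner)
          .persist() // keep the partitioned layout so it isn't recomputed

        val right = sc.textFile("hdfs:///data/right.csv")
          .map(line => (line.split(",")(0), line))
          .partitionBy(partitioner)
          .persist()

        // Both sides share the same partitioner, so matching keys are already
        // co-located and the join avoids shuffling both datasets again.
        val joined = left.join(right)
        println(joined.count())

        sc.stop()
      }
    }

Note that partitionBy itself does one shuffle to lay the data out; persisting
the partitioned RDDs lets later joins against the same side reuse that layout
instead of shuffling again.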
Thanks
Best Regards
On Sun, Feb 1, 2015 at 5:13 PM, Sunita Arvind wrote:
> Hi All
>
> We are joining large tables using Spark SQL and running into shuffle
> issues. We have explored multiple options: using coalesce to reduce the
> number of partitions, tuning various parameters like the disk buffer,
> reducing data in chunks, etc., which all seem to help, btw. What I would
> like to know is,