Thank you Akhil!
Date: Fri, 14 Aug 2015 14:51:56 +0530
Subject: Re: RDD.join vs spark SQL join
From: ak...@sigmoidanalytics.com
To: jiangxia...@outlook.com
CC: user@spark.apache.org
Both work the same way, but with Spark SQL you also get the optimizations
done by Catalyst. One important thing to consider is the number of
partitions and the key distribution (when you are doing RDD.join). If the
keys are not evenly distributed across machines, then you can see the
process choking
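
To make the key-distribution point concrete, here is a minimal sketch in plain Python (no Spark involved). It only simulates hash partitioning by taking the key modulo the partition count, which is roughly how a hash partitioner assigns records. The helper name `partition_sizes` and the sample key sets are made up for illustration and are not part of any Spark API.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition when records are
    assigned by (key modulo num_partitions). Hypothetical helper that
    simulates hash partitioning; it is not a Spark function."""
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Evenly distributed keys: 1000 records, every key appears once.
even_keys = list(range(1000))

# Skewed keys: 900 of the 1000 records share the single key 7.
skewed_keys = [7] * 900 + list(range(100))

even = partition_sizes(even_keys, 4)
skewed = partition_sizes(skewed_keys, 4)

print("even:  ", even)    # every partition holds the same number of records
print("skewed:", skewed)  # one partition holds almost all of the records
```

With the skewed key set, the partition that owns key 7 receives nearly all of the records, so the task working on it would run far longer than the rest. Raising the number of partitions does not help in that case, because all records with the same key still end up in the same partition.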