RE: RDD.join vs spark SQL join

2015-08-15 Thread Xiao JIANG
Thank you Akhil!

Date: Fri, 14 Aug 2015 14:51:56 +0530
Subject: Re: RDD.join vs spark SQL join
From: ak...@sigmoidanalytics.com
To: jiangxia...@outlook.com
CC: user@spark.apache.org

Both work the same way, but with Spark SQL you also get the optimizations done by Catalyst. One important

Re: RDD.join vs spark SQL join

2015-08-14 Thread Akhil Das
Both work the same way, but with Spark SQL you also get the optimizations done by Catalyst. One important thing to consider is the number of partitions and the key distribution (when you are doing RDD.join). If the keys are not evenly distributed across machines then you can see the process
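
To illustrate the point about key distribution, here is a minimal sketch in plain Python (no Spark involved). It assumes integer keys assigned to partitions by `key % num_partitions`, which is only a stand-in for a hash partitioner and not Spark's actual code. The helper names `partition_sizes` and `skew_ratio` are made up for this example.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under modulo hashing.

    Assumption: non-negative integer keys, partition = key % num_partitions.
    """
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition size divided by the mean partition size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Evenly distributed keys: 0..999, each appearing once.
even_keys = list(range(1000))

# Skewed keys: one hot key (7) accounts for 900 of the 1000 records.
skewed_keys = [7] * 900 + list(range(100))

even_sizes = partition_sizes(even_keys, 4)
skewed_sizes = partition_sizes(skewed_keys, 4)

print("even:  ", even_sizes, "skew ratio:", skew_ratio(even_sizes))
print("skewed:", skewed_sizes, "skew ratio:", skew_ratio(skewed_sizes))
```

With the hot key, one partition ends up holding most of the records, so whichever task processes that partition during a join has far more work than the others. Changing the number of partitions alone does not help here, because every record with the same key still goes to the same partition.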

RDD.join vs spark SQL join

2015-08-13 Thread Xiao JIANG
Hi, may I know the performance difference between the RDD.join function and the Spark SQL join operation? If I want to join several big RDDs, how should I decide which one to use? What are the factors to consider here? Thanks!