First of all, nothing is computed until you trigger an action such as collect. When you trigger collect, Spark at that point reads the data from disk, joins the datasets together, and delivers the result to you.
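
To make the point concrete, here is a rough sketch in plain Python. It is a toy model of lazy evaluation, not the Spark API: the `LazyDataset` class, the `read_file` helper, and the `reads` log are all made up for illustration, and the "disk read" is only simulated by appending to a list. It just shows that building A, B and the join C touches no data, and that the reads happen only once collect is called.

```python
# Toy model of lazy evaluation. This is NOT the Spark API; every name here
# is hypothetical and the "disk read" is simulated with a list append.

class LazyDataset:
    def __init__(self, compute):
        # compute is a zero-argument function that produces (key, value) rows
        self._compute = compute

    def join(self, other):
        # Only records how to build the joined rows; nothing is read here.
        def compute():
            left = self._compute()
            right = dict(other._compute())  # assumes unique keys on the right
            return [(k, (v, right[k])) for k, v in left if k in right]
        return LazyDataset(compute)

    def collect(self):
        # The "action": this is the point where the recorded work runs.
        return self._compute()


reads = []  # log of simulated disk reads


def read_file(name, rows):
    def compute():
        reads.append(name)  # stands in for reading the file from disk
        return list(rows)
    return LazyDataset(compute)


a = read_file("A", [(1, "x"), (2, "y")])
b = read_file("B", [(1, "p"), (3, "q")])
c = a.join(b)  # still nothing has been read

reads_before_collect = list(reads)
result = c.collect()  # both inputs are read and joined here
reads_after_collect = list(reads)

print(reads_before_collect, reads_after_collect, result)
```

In this sketch the read log is empty before `collect()` and contains both inputs afterwards, which is the same ordering described above for the joined RDD.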

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, Nov 13, 2014 at 12:26 PM, ajay garg <[email protected]> wrote:

> Hi,
> I have two RDDs A and B which are created from reading file from HDFS.
> I have a third RDD C which is created by taking join of A and B. All three
> RDDs (A, B and C ) are not cached.
> Now if I perform any action on C (let say collect), action is served without
> reading any data from the disk.
> Since no data is cached in spark how is action on C is served without
> reading data from disk.
>
> Thanks
> --Ajay
