First of all, nothing is computed until you trigger an action such as
collect. When you trigger collect, Spark at that point reads the data from
disk, joins the datasets together, and delivers the result to you.
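
To make the lazy behaviour concrete, here is a minimal sketch in plain
Python. It is a toy model, not Spark's API or implementation: the
LazyDataset class, from_source, and read_log are hypothetical names made up
for illustration, and appending to read_log stands in for a read from
disk/HDFS.

```python
class LazyDataset:
    """Toy stand-in for an uncached RDD: it records how to compute its rows
    and computes nothing until collect() is called."""

    def __init__(self, compute, name):
        self._compute = compute  # zero-argument function that produces the rows
        self.name = name

    @classmethod
    def from_source(cls, name, rows, read_log):
        def compute():
            read_log.append(name)  # stands in for reading a file from disk
            return list(rows)
        return cls(compute, name)

    def join(self, other):
        # Only describes the join; no source is read here.
        def compute():
            left = self._compute()
            right = other._compute()
            index = {}
            for key, value in right:
                index.setdefault(key, []).append(value)
            return [(k, (v, w)) for k, v in left for w in index.get(k, [])]
        return LazyDataset(compute, "join(%s,%s)" % (self.name, other.name))

    def collect(self):
        # The "action": this is the point where the sources are actually read.
        return self._compute()


read_log = []
a = LazyDataset.from_source("A", [(1, "x"), (2, "y")], read_log)
b = LazyDataset.from_source("B", [(1, "p"), (3, "q")], read_log)
c = a.join(b)                         # no reads yet

reads_before_collect = list(read_log)
result = c.collect()                  # reads A and B, then joins
reads_after_collect = list(read_log)
c.collect()                           # nothing is cached, so sources are read again
reads_after_second = list(read_log)
```

In this toy model, defining A, B and C touches no data; the reads happen
inside collect, and because nothing is cached a second collect reads the
sources again.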

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Thu, Nov 13, 2014 at 12:26 PM, ajay garg <[email protected]> wrote:

> Hi,
>      I have two RDDs, A and B, which are created by reading files from HDFS.
> I have a third RDD, C, which is created by joining A and B. None of the
> three RDDs (A, B and C) are cached.
> Now if I perform any action on C (let's say collect), the action is served
> without reading any data from disk.
> Since no data is cached in Spark, how is the action on C served without
> reading data from disk?
>
> Thanks
> --Ajay
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Joined-RDD-tp18820.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
