I think it is because A.join(B) is a shuffle map stage, whose result is stored
temporarily (I'm not sure whether it is in memory or on disk). I saw the phrase
"map output" in the log of my Spark application. I think it is the intermediate
result of my application, and according to the log, it is stored
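
To illustrate the idea, here is a toy sketch in plain Python. It is not Spark
code and does not show how Spark is implemented; the class ToyShuffleJoin and
the loaders read_a and read_b are made-up names that stand in for the join
and for the HDFS reads. It only shows how an action could be served a second
time from a stored map output without reading the inputs again.

```python
from collections import defaultdict


class ToyShuffleJoin:
    """Toy model (hypothetical, not Spark) of a join that keeps its map output."""

    def __init__(self, read_a, read_b, num_partitions=2):
        self.read_a = read_a  # stand-in for reading A from HDFS
        self.read_b = read_b  # stand-in for reading B from HDFS
        self.num_partitions = num_partitions
        self.map_output = None  # stand-in for the stored "map output"

    def _run_map_stage(self):
        # Bucket every (key, value) pair from both inputs by the hash of its key.
        buckets = [defaultdict(lambda: ([], [])) for _ in range(self.num_partitions)]
        for side, read in enumerate((self.read_a, self.read_b)):
            for key, value in read():
                buckets[hash(key) % self.num_partitions][key][side].append(value)
        return buckets

    def collect(self):
        # Run the map stage only if its output is not already available.
        if self.map_output is None:
            self.map_output = self._run_map_stage()
        result = []
        for bucket in self.map_output:
            for key, (left, right) in bucket.items():
                result.extend((key, (l, r)) for l in left for r in right)
        return sorted(result)


reads = {"a": 0, "b": 0}  # counts how often each input is read


def read_a():
    reads["a"] += 1
    return [(1, "x"), (2, "y")]


def read_b():
    reads["b"] += 1
    return [(1, "p"), (3, "q")]


c = ToyShuffleJoin(read_a, read_b)
first = c.collect()   # reads both inputs and stores the map output
second = c.collect()  # served from the stored map output
```

In this toy model the second collect() returns the same result as the first
while each input is read only once. Whether the stored output lives in memory
or on disk is not modeled here.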


qinwei
From: ajay garg
Date: 2014-11-13 14:56
To: user
Subject: Joined RDD

Hi,
     I have two RDDs, A and B, which are created by reading files from HDFS.
I have a third RDD, C, which is created by joining A and B. None of the three
RDDs (A, B and C) is cached.
Now if I perform any action on C (let's say collect), the action is served
without reading any data from disk.
Since no data is cached in Spark, how is an action on C served without
reading data from disk?
 
Thanks
--Ajay
 
 
 
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Joined-RDD-tp18820.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
 