Thanks. Another basic question:
Let's say derivedRDD is much larger than originalRDD and it doesn't fit into memory. Will Spark take care of automatically spilling it to disk, or will I face a Java heap OutOfMemoryError? (A small illustrative sketch follows the quoted thread below.)

On Wed, Feb 19, 2014 at 11:05 PM, Ewen Cheslack-Postava <m...@ewencp.org> wrote:

> Only originalRDD is cached. You need to call cache/persist for every RDD
> you want cached.
>
> David Thomas <dt5434...@gmail.com>
> February 19, 2014 at 10:03 PM
>
> When I persist/cache an RDD, are all the derived RDDs cached as well, or do
> I need to call cache individually on each RDD if I need them to be cached?
>
> For example:
>
> val originalRDD = sc.parallelize(...)
> originalRDD.cache
> val derivedRDD = originalRDD.map(...)
>
> Is derivedRDD cached in this case?
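For context, here is a minimal sketch (not from the original thread; names and sizes are made up) of persisting a derived RDD with an explicit storage level, using the standard org.apache.spark.storage.StorageLevel API, so that partitions which do not fit in memory can go to disk:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local context just for illustration.
    val sc = new SparkContext("local[*]", "persist-sketch")

    val originalRDD = sc.parallelize(1 to 1000000)
    originalRDD.cache() // cache() uses the MEMORY_ONLY storage level for RDDs

    // The derived RDD must be persisted separately; MEMORY_AND_DISK asks Spark
    // to write partitions that don't fit in memory to disk instead of keeping
    // them unpersisted (MEMORY_ONLY simply drops partitions that don't fit).
    val derivedRDD = originalRDD.map(x => x.toString * 100)
    derivedRDD.persist(StorageLevel.MEMORY_AND_DISK)

    println(derivedRDD.count())
    sc.stop()
  }
}
```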