Thanks. Another basic question:
Let's say derivedRDD is much larger than originalRDD and it doesn't fit into memory. Will Spark take care of automatically spilling it to disk, or will I face a Java heap OutOfMemoryError? (A small illustrative sketch follows the quoted thread below.)

On Wed, Feb 19, 2014 at 11:05 PM, Ewen Cheslack-Postava <m...@ewencp.org> wrote:

> Only originalRDD is cached. You need to call cache/persist for every RDD
> you want cached.
>
> David Thomas <dt5434...@gmail.com>
> February 19, 2014 at 10:03 PM
>
> When I persist/cache an RDD, are all the derived RDDs cached as well, or do
> I need to call cache individually on each RDD if I need them to be cached?
>
> For example:
>
> val originalRDD = sc.parallelize(...)
> originalRDD.cache
> val derivedRDD = originalRDD.map(...)
>
> Is derivedRDD cached in this case?
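For context, here is a minimal sketch (not from the original thread; names and sizes are made up) of persisting a derived RDD with an explicit storage level, using the standard org.apache.spark.storage.StorageLevel API, so that partitions which do not fit in memory can go to disk:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local context just for illustration.
    val sc = new SparkContext("local[*]", "persist-sketch")

    val originalRDD = sc.parallelize(1 to 1000000)
    originalRDD.cache() // cache() uses the MEMORY_ONLY storage level for RDDs

    // The derived RDD must be persisted separately; MEMORY_AND_DISK asks Spark
    // to write partitions that don't fit in memory to disk instead of keeping
    // them unpersisted (MEMORY_ONLY simply drops partitions that don't fit).
    val derivedRDD = originalRDD.map(x => x.toString * 100)
    derivedRDD.persist(StorageLevel.MEMORY_AND_DISK)

    println(derivedRDD.count())
    sc.stop()
  }
}
```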