Hello Jeff,

I'm quite new to Spark, but as far as I understand caching: Spark uses the
available storage memory, and when more memory is requested, cached RDD
partitions are evicted in LRU order and recomputed from their lineage the
next time they are needed. Please correct me if I'm wrong.
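
For what it's worth, here is a rough sketch of what I mean (this assumes a
spark-shell session where `sc` already exists and a toy RDD, not your actual
job): with MEMORY_ONLY_SER, partitions that don't fit in memory are not
spilled to disk, they are just left uncached and recomputed on later actions;
MEMORY_AND_DISK_SER would spill them instead.

  import org.apache.spark.storage.StorageLevel

  // Toy RDD standing in for the real data.
  val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

  // MEMORY_ONLY_SER keeps partitions as serialized bytes in memory only:
  // partitions that do not fit are not spilled to disk, they are simply
  // left uncached and recomputed from their lineage on each later action.
  rdd.persist(StorageLevel.MEMORY_ONLY_SER)

  rdd.count()  // first action: computes and tries to cache each partition
  rdd.count()  // later actions: read cached partitions, recompute evicted ones

  // The Storage tab of the Spark web UI shows how much of the RDD actually
  // fit in memory; MEMORY_AND_DISK_SER would spill the rest to disk instead.
  println(rdd.getStorageLevel)

The Storage tab of the web UI should also tell you what fraction of the RDD
was actually cached, which might confirm whether only part of it fit.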

Regards,
Lars

> On 13.09.2015, at 05:14, Hemminger Jeff <j...@atware.co.jp> wrote:
> 
> I am trying to understand the process of caching and specifically what the 
> behavior is when the cache is full. Please excuse me if this question is a 
> little vague, I am trying to build my understanding of this process.
> 
> I have an RDD that I perform several computations with; I persist it with 
> MEMORY_ONLY_SER before performing the computations.
> 
> I believe that, due to insufficient memory, it is recomputing (at least part 
> of) the RDD each time.
> 
> Logging shows that the RDD was not cached previously, and therefore needs to 
> be computed.
> 
> I looked at the Spark BlockManager code, and see that getOrCompute attempts 
> to retrieve the block from the cache. If it is not available, it computes it.
> 
> Can I assume that when Spark attempts to cache an RDD but runs out of memory, 
> it recomputes a part of the RDD each time it is read?
> 
> I think I might be incorrect in this assumption, because I would have 
> expected a warning message if the cache was out of memory.
> 
> Thanks,
> Jeff
