Hello Jeff,

I'm still quite new to Spark, but as far as I understand caching: Spark uses the available storage memory, and when more memory is requested, cached RDD partitions are evicted in an LRU manner and are recomputed from their lineage when they are needed again. Please correct me if I'm wrong.
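To make that concrete, here is a rough sketch (the input path, the map step, and the storage level are placeholders, not taken from your job; it assumes a spark-shell session where `sc` is the SparkContext). It persists an RDD serialized in memory and then asks Spark how many partitions actually made it into the cache via sc.getRDDStorageInfo. Partitions that did not fit are simply not stored and are recomputed the next time an action touches them:

  import org.apache.spark.storage.StorageLevel

  val rdd = sc.textFile("/path/to/input")            // placeholder input
    .map(line => line.length)                        // some work worth caching
    .persist(StorageLevel.MEMORY_ONLY_SER)           // serialized, memory only: nothing spills to disk

  rdd.count()                                        // first action materializes and tries to cache the RDD

  // Anything below numPartitions here means those partitions were never cached
  // (or have since been evicted) and will be recomputed on the next action.
  sc.getRDDStorageInfo
    .filter(_.id == rdd.id)
    .foreach { info =>
      println(s"cached ${info.numCachedPartitions} of ${info.numPartitions} partitions, " +
        s"${info.memSize} bytes in memory")
    }

If the recomputation itself is the problem, persisting with MEMORY_AND_DISK_SER instead should spill the partitions that don't fit to local disk rather than dropping them, at the cost of some disk I/O.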
Regards,
Lars

> On 13.09.2015 at 05:14, Hemminger Jeff <j...@atware.co.jp> wrote:
>
> I am trying to understand the process of caching, and specifically what the
> behavior is when the cache is full. Please excuse me if this question is a
> little vague; I am trying to build my understanding of this process.
>
> I have an RDD that I perform several computations with. I persist it with
> MEMORY_ONLY_SER before performing the computations.
>
> I believe that, due to insufficient memory, it is recomputing (at least part
> of) the RDD each time.
>
> Logging shows that the RDD was not cached previously, and therefore needs to
> be computed.
>
> I looked at the BlockManager code in Spark and see that getOrCompute attempts
> to retrieve the block from the cache. If it is not available, it computes it.
>
> Can I assume that when Spark attempts to cache an RDD but runs out of memory,
> it recomputes a part of the RDD each time it is read?
>
> I think I might be incorrect in this assumption, because I would have
> expected a warning message if the cache was out of memory.
>
> Thanks,
> Jeff