I am trying to understand how caching works, and specifically what the behavior is when the cache is full. Please excuse me if this question is a little vague; I am still building my understanding of this process.
I have an RDD that I use in several computations, and I persist it with MEMORY_ONLY_SER before performing them. I believe that, due to insufficient memory, Spark is recomputing (at least part of) the RDD each time: the logs show that the RDD was not cached previously and therefore needs to be computed. Looking at the BlockManager code in Spark, I see that getOrCompute attempts to retrieve the block from the cache and, if it is not there, computes it.

Can I assume that when Spark attempts to cache an RDD but runs out of memory, it recomputes part of the RDD each time that part is read? I may be wrong in this assumption, because I would have expected a warning message if the cache ran out of memory.

Thanks,
Jeff
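To make the question concrete, here is a minimal toy sketch of the cache-or-recompute pattern I am describing. This is not Spark's actual implementation (the class and method names are made up for illustration); it just shows how a size-limited cache can silently fall back to recomputation with no warning:

```python
# Toy sketch of the cache-or-recompute pattern described above.
# NOT Spark's implementation; names here (BoundedCache, get_or_compute)
# are hypothetical and chosen only to illustrate the behavior in question.

class BoundedCache:
    """Cache that refuses new entries once it reaches max_entries."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.store = {}
        self.recomputations = 0  # how often we fell back to computing

    def get_or_compute(self, key, compute):
        if key in self.store:
            return self.store[key]       # cache hit
        value = compute()                # cache miss: recompute the value
        self.recomputations += 1
        if len(self.store) < self.max_entries:
            self.store[key] = value      # cache only if there is room
        return value                     # otherwise return it uncached


cache = BoundedCache(max_entries=2)

def expensive(n):
    return n * n                         # stand-in for a costly partition

for _ in range(3):                       # read "partitions" 0..3 three times
    results = [cache.get_or_compute(k, lambda k=k: expensive(k))
               for k in range(4)]

# Keys 0 and 1 fit in the cache; keys 2 and 3 never fit, so they are
# recomputed on every pass: 4 misses on pass 1, then 2 on each later pass.
print(results)                 # [0, 1, 4, 9]
print(cache.recomputations)    # 8
```

If this is roughly what Spark does when memory runs out, I would have expected the `recomputations`-style cost to show up somewhere in the logs as a warning rather than only as a "block not found" message.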