Maybe your memory isn't large enough to hold the current RDD as well as all the
past ones?
RDDs that are cached or persisted have to be unpersisted explicitly; no
auto-unpersist exists (maybe that will change in the 1.0 release?).
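Since unpersist has to be explicit, the iterative Gibbs-sampling case from this thread would look roughly like the sketch below. This is a hypothetical illustration, not code from the thread: the local SparkContext setup, the dummy `gibbsStep` function, and the iteration count are all made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ExplicitUnpersistSketch {
  // Hypothetical stand-in for one Gibbs sampling step.
  def gibbsStep(samples: RDD[Int]): RDD[Int] = samples.map(_ + 1)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]"))

    var state = sc.parallelize(1 to 1000).cache()
    state.count()                       // action: materialize the first cache

    for (_ <- 1 to 10) {
      val next = gibbsStep(state).cache()
      next.count()                      // materialize the new iteration's cache
      state.unpersist()                 // explicitly free the previous one
      state = next
    }

    sc.stop()
  }
}
```

The key point is unpersisting the old RDD only after an action has materialized the new one, so the lineage never has to be recomputed.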
Be careful that calling cache() or persist() doesn't imply the RDD will be
materialized right away; it is only cached when an action is first run on it.
thx for the help, unpersist is exactly what I want :)
I see that Spark will remove some cached data automatically when memory is full;
it would be much more helpful if the eviction rule were something like LRU.
It seems that persist and cache are somehow lazy?
Yes, persist/cache will cache an RDD only when an action is applied to it.
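That laziness can be seen in a small sketch (an illustration, not code from the thread; the local-mode setup and the numbers are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyCacheSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lazy-cache-sketch").setMaster("local[2]"))

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)
    rdd.persist()          // only sets the storage level; nothing is cached yet
    val n = rdd.count()    // first action: computes the RDD and fills the cache
    val s = rdd.sum()      // reuses the cached partitions instead of recomputing

    sc.stop()
  }
}
```

Until `count()` runs, `persist()` has merely marked the RDD; no partitions are computed or stored.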
On Sun, May 4, 2014 at 6:32 AM, Earthson earthson...@gmail.com wrote:
thx for the help, unpersist is exactly what I want :)
I see that Spark will remove some cached data automatically when memory is full;
it would be much more ...
I'm using Spark for an LDA implementation. I need to cache an RDD for the next
step of Gibbs Sampling; after the new result is cached, the previous cache
could be released. Something like an LRU cache should delete the previous cache,
because it is never used again, but the caching gets confused:
Here is the code :)