I too have seen cached RDDs not hit 100%, even when they are DISK_ONLY.
Just saw that yesterday in fact. In some cases RDDs I expected didn't show
up in the list at all. I have no idea if this is an issue with Spark or
something I'm not understanding about how persist works (probably the
latter).
Could you try to click on that RDD and see the storage info per
partition? I tried continuously caching RDDs, so new ones kick old
ones out when there is not enough memory. I saw similar glitches but
the storage info per partition is correct. If you find a way to
reproduce this error, please
Xiangrui, when I click into the RDD link, it gives the same message, saying
only 96 of 100 partitions are cached. The disk/memory usage is the same, and
it is far below the limit.
Is this what you wanted me to check, or is it a different issue?
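
For anyone following the thread, here is a toy sketch of the "new ones kick old
ones out" effect mentioned above. It is plain Python, not Spark code: the
ToyBlockCache class, its capacity of 150 blocks, and the RDD names are all
made up for illustration, and it assumes a simple least-recently-used policy.
It only shows how eviction under memory pressure could leave an RDD at 96 of
100 partitions; it does not explain the case reported here, where usage is far
below the limit.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Toy LRU cache of (rdd_id, partition) blocks. Hypothetical, not Spark's code."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # insertion/access order, oldest first

    def put(self, rdd_id, partition):
        key = (rdd_id, partition)
        self.blocks[key] = True
        self.blocks.move_to_end(key)  # mark as most recently used
        # Evict the oldest blocks until we are back under capacity.
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def cached_fraction(self, rdd_id, num_partitions):
        cached = sum(1 for (r, _) in self.blocks if r == rdd_id)
        return cached, num_partitions


cache = ToyBlockCache(capacity_blocks=150)  # assumed capacity, in blocks
for p in range(100):
    cache.put("rdd_a", p)  # first RDD fits entirely
for p in range(54):
    cache.put("rdd_b", p)  # second RDD pushes out the 4 oldest blocks of rdd_a

print(cache.cached_fraction("rdd_a", 100))  # (96, 100)
```

In this model the missing partitions are simply the oldest blocks, so a
per-partition listing would show exactly which ones were dropped.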
On Wed, Jun 11, 2014 at 4:38 PM, Xiangrui Meng men...@gmail.com wrote: