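The most likely explanation is that you wanted to keep all the partitions in memory and they don't all fit. If you persisted with a memory-only storage level (which is what `cache()` uses for RDDs), the partitions that don't fit are simply not cached, and they are recomputed whenever the RDD is used again.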
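To make this concrete, here is a toy model in plain Python. It is not Spark's actual block manager, which handles eviction, serialization and unrolling in far more detail. A hypothetical `ToyCache` with a fixed memory budget caches partitions until it is full. Without disk spill (roughly MEMORY_ONLY), the partitions that do not fit are dropped and recomputed by every later action. With spill (roughly MEMORY_AND_DISK), they are written to disk instead. The class, the helper `run_action`, the sizes and the capacity are all made up for illustration. In Spark itself you choose the level with `rdd.persist(StorageLevel.MEMORY_AND_DISK)`.

```python
from dataclasses import dataclass, field

# Toy model only: sizes and capacity are arbitrary units, not real Spark memory accounting.


@dataclass
class ToyCache:
    memory_capacity: int  # hypothetical memory budget, arbitrary units
    spill_to_disk: bool   # True ~ MEMORY_AND_DISK, False ~ MEMORY_ONLY
    used: int = 0
    in_memory: set = field(default_factory=set)
    on_disk: set = field(default_factory=set)

    def store(self, pid: int, size: int) -> None:
        """Try to cache partition `pid`; if it fits nowhere, it is not cached."""
        if self.used + size <= self.memory_capacity:
            self.in_memory.add(pid)
            self.used += size
        elif self.spill_to_disk:
            self.on_disk.add(pid)
        # else: memory-only and no room left, so the partition is dropped

    def is_cached(self, pid: int) -> bool:
        return pid in self.in_memory or pid in self.on_disk


def run_action(cache: ToyCache, sizes: list) -> list:
    """Return the ids of the partitions that had to be (re)computed for this action."""
    computed = []
    for pid, size in enumerate(sizes):
        if cache.is_cached(pid):
            continue  # served from the cache, no recomputation needed
        computed.append(pid)
        cache.store(pid, size)
    return computed


sizes = [30, 30, 30, 30]  # four partitions of 30 units each

memory_only = ToyCache(memory_capacity=100, spill_to_disk=False)
run_action(memory_only, sizes)         # first action computes all four partitions
print(run_action(memory_only, sizes))  # [3]: the partition that did not fit is recomputed

memory_and_disk = ToyCache(memory_capacity=100, spill_to_disk=True)
run_action(memory_and_disk, sizes)
print(run_action(memory_and_disk, sizes))  # []: the overflow partition was spilled to disk
```

In the memory-only case, only three of the four partitions fit in the 100-unit budget. Every later action recomputes the fourth one, which matches the "partially recomputed" symptom.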
Consider using MEMORY_AND_DISK persistence instead. This can also happen if cached blocks were lost due to a node failure.

On Wed, Feb 18, 2015 at 3:19 PM, shahab <shahab.mok...@gmail.com> wrote:
> Hi,
>
> I have a cached RDD (I can see in the UI that it is cached), but when I use this
> RDD, I can see that the RDD is partially recomputed. It is
> "partially" because I can see in the UI that some tasks are skipped (have a look
> at the attached figure).
>
> Now the question is: what causes a cached RDD to be recomputed again? And
> why are some tasks skipped and some not?
>
> best,
> /Shahab
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org