I used cache() followed by a count() on the RDD to ensure that caching is actually performed.
val rdd = srdd.flatMap(mapProfile_To_Sessions).cache
val count = rdd.count // so at this point the RDD should be cached? right?

On Tue, Oct 28, 2014 at 8:35 AM, Sean Owen <[email protected]> wrote:

> Did you just call cache()? By itself it does nothing, but once an action
> requires the RDD to be computed, it should become cached.
>
> On Oct 28, 2014 8:19 AM, "shahab" <[email protected]> wrote:
>
>> Hi,
>>
>> I have a standalone Spark cluster where each executor is set to 6.3 G of
>> memory; with two workers, that is 12.6 G of memory and 4 cores in total.
>>
>> I am trying to cache an RDD of approximately 3.2 G, but apparently it is
>> not cached: I neither see "BlockManagerMasterActor: Added rdd_XX in
>> memory" in the logs, nor any improvement in task performance.
>>
>> Why is it not cached when there is enough storage memory?
>>
>> I tried with smaller RDDs of 1 or 2 G and it works; at least I see
>> "BlockManagerMasterActor: Added rdd_0_1 in memory" and an improvement in
>> results.
>>
>> Any idea what I am missing in my settings, or... ?
>>
>> thanks,
>> /Shahab
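One possible explanation worth checking (an assumption, not confirmed anywhere in the thread): in Spark 1.x, only a fraction of executor memory is available for cached blocks, governed by `spark.storage.memoryFraction` (default 0.6) and a safety fraction (default 0.9). A minimal sketch of that arithmetic, using those assumed defaults and the 6.3 G figure from the question:

```scala
// Sketch of the per-executor storage budget under Spark 1.x static
// memory management. The fractions below are the documented 1.x
// defaults (spark.storage.memoryFraction = 0.6, safety = 0.9) and are
// assumptions about this particular cluster's configuration.
object StorageEstimate {
  def usableStorageGb(executorMemGb: Double,
                      memoryFraction: Double = 0.6,
                      safetyFraction: Double = 0.9): Double =
    executorMemGb * memoryFraction * safetyFraction

  def main(args: Array[String]): Unit = {
    val perExecutor = usableStorageGb(6.3)
    // prints "Usable storage per executor: 3.40 GB"
    println(f"Usable storage per executor: $perExecutor%.2f GB")
  }
}
```

So each 6.3 G executor would have roughly 3.4 G usable for caching, and the deserialized in-memory size of an RDD is often noticeably larger than its estimated 3.2 G, which could explain why smaller RDDs cache fine while this one does not. The Storage tab of the web UI shows the actual cached size and fraction per RDD.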
