I used cache() followed by a count() on the RDD to ensure that caching is actually performed.
val rdd = srdd.flatMap(mapProfile_To_Sessions).cache
val count = rdd.count // so at this point the RDD should be cached? right?

On Tue, Oct 28, 2014 at 8:35 AM, Sean Owen <[email protected]> wrote:

> Did you just call cache()? By itself it does nothing, but once an action
> requires the RDD to be computed, it should become cached.
>
> On Oct 28, 2014 8:19 AM, "shahab" <[email protected]> wrote:
>
>> Hi,
>>
>> I have a standalone Spark cluster where each executor is set to 6.3 G of
>> memory; with two workers, that is 12.6 G of memory and 4 cores in total.
>>
>> I am trying to cache an RDD of approximately 3.2 G, but apparently it is
>> not cached: I neither see "BlockManagerMasterActor: Added rdd_XX in
>> memory" in the logs, nor any improvement in task performance.
>>
>> Why is it not cached when there is enough storage memory?
>>
>> I tried with smaller RDDs of 1 or 2 G and it works; at least I see
>> "BlockManagerMasterActor: Added rdd_0_1 in memory" and an improvement in
>> results.
>>
>> Any idea what I am missing in my settings, or... ?
>>
>> thanks,
>> /Shahab
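One possible explanation worth checking (an assumption, not confirmed anywhere in the thread): in Spark 1.x, only a fraction of executor memory is available for cached blocks, governed by `spark.storage.memoryFraction` (default 0.6) and a safety fraction (default 0.9). A minimal sketch of that arithmetic, using those assumed defaults and the 6.3 G figure from the question:

```scala
// Sketch of the per-executor storage budget under Spark 1.x static
// memory management. The fractions below are the documented 1.x
// defaults (spark.storage.memoryFraction = 0.6, safety = 0.9) and are
// assumptions about this particular cluster's configuration.
object StorageEstimate {
  def usableStorageGb(executorMemGb: Double,
                      memoryFraction: Double = 0.6,
                      safetyFraction: Double = 0.9): Double =
    executorMemGb * memoryFraction * safetyFraction

  def main(args: Array[String]): Unit = {
    val perExecutor = usableStorageGb(6.3)
    // prints "Usable storage per executor: 3.40 GB"
    println(f"Usable storage per executor: $perExecutor%.2f GB")
  }
}
```

So each 6.3 G executor would have roughly 3.4 G usable for caching, and the deserialized in-memory size of an RDD is often noticeably larger than its estimated 3.2 G, which could explain why smaller RDDs cache fine while this one does not. The Storage tab of the web UI shows the actual cached size and fraction per RDD.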
