Hello team, I found and resolved the issue. In case anyone runs into the same problem, here is what it was.
Each node was allocated 1397 MB of memory for storage:

16/10/11 13:16:58 INFO storage.MemoryStore: MemoryStore started with capacity 1397.3 MB

However, my RDD exceeded the storage limit (although it says it computed 1224 MB):

16/10/11 13:18:36 WARN storage.MemoryStore: Not enough space to cache rdd_6_0 in memory! (computed 1224.3 MB so far)
16/10/11 13:18:36 INFO storage.MemoryStore: Memory use = 331.8 KB (blocks) + 1224.3 MB (scratch space shared across 2 tasks(s)) = 1224.6 MB. Storage limit = 1397.3 MB.

Therefore, I repartitioned the RDDs for better memory utilisation, which resolved the issue.

Kind regards,

Guru

On 11 October 2016 at 11:23, diplomatic Guru <[email protected]> wrote:

> @Song, I have called an action, but it did not cache, as you can see in the
> screenshot provided in my original email. It has cached to disk but not to
> memory.
>
> @Chin Wei Low, I have 15GB of memory allocated, which is more than the
> dataset size.
>
> Any other suggestions, please?
>
> Kind regards,
>
> Guru
>
> On 11 October 2016 at 03:34, Chin Wei Low <[email protected]> wrote:
>
>> Hi,
>>
>> Your RDD is 5GB; perhaps it is too large to fit into the executor's
>> storage memory. You can refer to the Executors tab in the Spark UI to
>> check the available storage memory for each executor.
>>
>> Regards,
>> Chin Wei
>>
>> On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <
>> [email protected]> wrote:
>>
>>> Hello team,
>>>
>>> Spark version: 1.6.0
>>>
>>> I'm trying to persist some data in memory in order to reuse it. However,
>>> when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()), it
>>> does not store the data, as I cannot see any RDD information under the
>>> Web UI (Storage tab).
>>>
>>> Therefore, I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which
>>> stored the data to disk only, as shown in the screenshot below:
>>>
>>> [image: Inline images 2]
>>>
>>> Do you know why the memory is not being used?
>>>
>>> Is there a cluster-level configuration that stops jobs from storing data
>>> in memory altogether?
>>>
>>> Please let me know.
>>>
>>> Thanks,
>>>
>>> Guru
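P.S. For anyone who wants the arithmetic behind the repartitioning fix, here is a minimal sketch. It is plain Python, not Spark API; the helper name is mine, the 5 GB figure comes from Chin Wei's reply, and the other numbers come from the logs above. The idea: each cached partition is unrolled in the MemoryStore's scratch space, which is shared by the tasks running concurrently on the executor, so each partition must fit within its share of the storage limit.

```python
import math

def partitions_needed(rdd_size_mb, storage_limit_mb, concurrent_tasks):
    """Rough partition count so that each partition's unroll (scratch)
    space, shared across the tasks running at once, fits the storage
    limit.  Hypothetical helper for illustration only."""
    per_task_budget_mb = storage_limit_mb / concurrent_tasks
    return math.ceil(rdd_size_mb / per_task_budget_mb)

# Numbers from this thread: ~5 GB RDD, 1397.3 MB storage per executor,
# 2 tasks sharing the MemoryStore's scratch space.
print(partitions_needed(5 * 1024, 1397.3, 2))  # → 8
```

With a count like this in hand, the fix in the RDD API is simply rdd.repartition(8) (or higher) before calling persist(), so that no single partition overwhelms the unroll budget.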

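On the earlier question about cluster-level configuration: in Spark 1.6 the unified memory manager splits executor memory using spark.memory.fraction and spark.memory.storageFraction, so storage memory can also be raised that way if repartitioning alone is not enough. A hedged sketch (the values shown are the 1.6 defaults, the executor memory figure is illustrative, and my_job.jar is a placeholder, not a real artifact):

```shell
# Spark 1.6 unified memory settings; raise --executor-memory or
# spark.memory.storageFraction if cached partitions still do not fit.
spark-submit \
  --executor-memory 4g \
  --conf spark.memory.fraction=0.75 \
  --conf spark.memory.storageFraction=0.5 \
  my_job.jar   # placeholder application jar
```

Storage and execution memory borrow from each other in this model, so cached blocks can still be evicted under execution pressure; only the storageFraction portion is protected from eviction.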