Hello team, I found and resolved the issue. In case anyone runs into the same problem, here is what it was.
Each node was allocated 1397 MB of memory for storage:

16/10/11 13:16:58 INFO storage.MemoryStore: MemoryStore started with capacity 1397.3 MB

However, my RDD exceeded the storage limit (although it says it computed 1224 MB):

16/10/11 13:18:36 WARN storage.MemoryStore: Not enough space to cache rdd_6_0 in memory! (computed 1224.3 MB so far)
16/10/11 13:18:36 INFO storage.MemoryStore: Memory use = 331.8 KB (blocks) + 1224.3 MB (scratch space shared across 2 tasks(s)) = 1224.6 MB. Storage limit = 1397.3 MB.

Therefore, I repartitioned the RDDs for better memory utilisation, which resolved the issue.

Kind regards,

Guru

On 11 October 2016 at 11:23, diplomatic Guru <[email protected]> wrote:

> @Song, I have called an action, but it did not cache, as you can see in the
> screenshot provided in my original email. It has cached to disk but not to
> memory.
>
> @Chin Wei Low, I have 15GB of memory allocated, which is more than the
> dataset size.
>
> Any other suggestions, please?
>
> Kind regards,
>
> Guru
>
> On 11 October 2016 at 03:34, Chin Wei Low <[email protected]> wrote:
>
>> Hi,
>>
>> Your RDD is 5GB; perhaps it is too large to fit into the executor's
>> storage memory. You can refer to the Executors tab in the Spark UI to
>> check the available storage memory for each executor.
>>
>> Regards,
>> Chin Wei
>>
>> On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <
>> [email protected]> wrote:
>>
>>> Hello team,
>>>
>>> Spark version: 1.6.0
>>>
>>> I'm trying to persist some data in memory in order to reuse it. However,
>>> when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()), it
>>> does not store the data, as I cannot see any RDD information under the
>>> Web UI (Storage tab).
>>>
>>> Therefore, I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which
>>> stored the data to disk only, as shown in the screenshot below:
>>>
>>> [image: Inline images 2]
>>>
>>> Do you know why the memory is not being used?
>>>
>>> Is there a cluster-level configuration that stops jobs from storing data
>>> in memory altogether?
>>>
>>> Please let me know.
>>>
>>> Thanks,
>>>
>>> Guru
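P.S. For anyone who wants the arithmetic behind the repartitioning fix, here is a minimal sketch. It is plain Python, not Spark API; the helper name is mine, the 5 GB figure comes from Chin Wei's reply, and the other numbers come from the logs above. The idea: each cached partition is unrolled in the MemoryStore's scratch space, which is shared by the tasks running concurrently on the executor, so each partition must fit within its share of the storage limit.

```python
import math

def partitions_needed(rdd_size_mb, storage_limit_mb, concurrent_tasks):
    """Rough partition count so that each partition's unroll (scratch)
    space, shared across the tasks running at once, fits the storage
    limit.  Hypothetical helper for illustration only."""
    per_task_budget_mb = storage_limit_mb / concurrent_tasks
    return math.ceil(rdd_size_mb / per_task_budget_mb)

# Numbers from this thread: ~5 GB RDD, 1397.3 MB storage per executor,
# 2 tasks sharing the MemoryStore's scratch space.
print(partitions_needed(5 * 1024, 1397.3, 2))  # → 8
```

With a count like this in hand, the fix in the RDD API is simply rdd.repartition(8) (or higher) before calling persist(), so that no single partition overwhelms the unroll budget.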

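On the earlier question about cluster-level configuration: in Spark 1.6 the unified memory manager splits executor memory using spark.memory.fraction and spark.memory.storageFraction, so storage memory can also be raised that way if repartitioning alone is not enough. A hedged sketch (the values shown are the 1.6 defaults, the executor memory figure is illustrative, and my_job.jar is a placeholder, not a real artifact):

```shell
# Spark 1.6 unified memory settings; raise --executor-memory or
# spark.memory.storageFraction if cached partitions still do not fit.
spark-submit \
  --executor-memory 4g \
  --conf spark.memory.fraction=0.75 \
  --conf spark.memory.storageFraction=0.5 \
  my_job.jar   # placeholder application jar
```

Storage and execution memory borrow from each other in this model, so cached blocks can still be evicted under execution pressure; only the storageFraction portion is protected from eviction.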