I run Spark 1.6.1 on YARN (EMR-4.5.0) and call `RDD.count` on an RDD cached with `MEMORY_ONLY_SER` (`spark.serializer` is set to `KryoSerializer`).
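For context, the caching path looks roughly like this (a minimal sketch; the input path and variable names are placeholders, not from my actual job):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("CacheCountExample")  // placeholder app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// hdfs:///path/to/input is a placeholder for my real input
val rdd = sc.textFile("hdfs:///path/to/input")
rdd.persist(StorageLevel.MEMORY_ONLY_SER)  // serialized in-memory cache
val n = rdd.count()                        // materializes and caches the RDD
```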
After the count finishes, the Spark UI shows that the RDD's Fraction Cached is only 6%, with Size in Memory = 65.3 GB.

I looked at the executors' stderr on the Spark UI and saw lots of messages like:

```
16/04/15 19:08:03 INFO storage.MemoryStore: Will not store rdd_16_4383 as it would require dropping another block from the same RDD
16/04/15 19:08:03 WARN storage.MemoryStore: Not enough space to cache rdd_16_4383 in memory! (computed 1462.4 MB so far)
16/04/15 19:08:03 INFO storage.MemoryStore: Memory use = 11.0 KB (blocks) + 33.8 GB (scratch space shared across 17 tasks(s)) = 33.8 GB. Storage limit = 33.8 GB.
16/04/15 19:08:06 INFO storage.MemoryStore: Will not store rdd_16_4306 as it would require dropping another block from the same RDD
16/04/15 19:08:06 WARN storage.MemoryStore: Not enough space to cache rdd_16_4306 in memory! (computed 1920.6 MB so far)
16/04/15 19:08:06 INFO storage.MemoryStore: Memory use = 11.0 KB (blocks) + 33.8 GB (scratch space shared across 17 tasks(s)) = 33.8 GB. Storage limit = 33.8 GB.
```

But the cluster has enough memory to cache roughly three RDDs of that size:

- `spark.executor.instances` = 100
- `spark.executor.memory` = 48524M
- Storage Memory per executor = 33.8 GB
- Executors Memory (Spark UI): 67.2 GB Used (3.3 TB Total)

If my RDD takes 65.3 GB of storage memory at Fraction Cached = 6%, the full RDD should be about 65.3 GB / 0.06 ≈ 1.1 TB. The cluster has 3.3 TB of total storage memory, and only one application is running right now (this is the first RDD my program caches).

Why can't Spark store the entire RDD in memory?

By the way, under Spark 1.5.2 the same RDD was 100% cached. Should I switch to legacy mode (`spark.memory.useLegacyMode=true`)?
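If legacy mode is the answer, I assume it would be set roughly like this (a sketch only; the two fraction values are the Spark 1.5.x defaults, listed for illustration rather than something I've tuned):

```scala
import org.apache.spark.SparkConf

// Enable the pre-1.6 ("legacy") memory manager
val legacyConf = new SparkConf()
  .set("spark.memory.useLegacyMode", "true")
  .set("spark.storage.memoryFraction", "0.6")  // pre-1.6 default storage fraction
  .set("spark.shuffle.memoryFraction", "0.2")  // pre-1.6 default shuffle fraction
```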