Set the spark.storage.memoryFraction flag to 1 while creating the SparkContext to utilize up to 73 GB of your memory; the default is 0.6, which is why you are seeing 33.6 GB. Also set spark.rdd.compress and use StorageLevel MEMORY_ONLY_SER if your data is larger than your available memory (you could also try MEMORY_AND_DISK_SER).
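
A minimal sketch of how those settings could be applied when creating the context (the app name, master and input path below are placeholders, not taken from your job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("join-job")                    // placeholder name
      .setMaster("local[16]")                    // you mentioned local mode on a 16 vCPU machine
      .set("spark.storage.memoryFraction", "1")  // default is 0.6
      .set("spark.rdd.compress", "true")         // compress serialized RDD partitions
    val sc = new SparkContext(conf)

    // Persist serialized instead of the default deserialized MEMORY_ONLY used by cache():
    val rdd = sc.textFile("hdfs:///path/to/input")  // placeholder path
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)
    // or rdd.persist(StorageLevel.MEMORY_AND_DISK_SER) if it may not fit in memory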
Thanks
Best Regards

On Wed, Dec 3, 2014 at 12:23 AM, akhandeshi <[email protected]> wrote:

> I am running in local mode. I am using a google n1-highmem-16 (16 vCPU, 104 GB
> memory) machine.
>
> I have allocated SPARK_DRIVER_MEMORY=95g
>
> I see Memory: 33.6 GB Used (73.7 GB Total) that the executor is using.
>
> In the log output below, I see 33.6 GB of blocks are used by 2 RDDs that I
> have cached. I should still have 40.2 GB left.
>
> However, I see messages like:
>
> 14/12/02 18:15:04 WARN storage.MemoryStore: Not enough space to cache
> rdd_15_9 in memory! (computed 8.1 GB so far)
> 14/12/02 18:15:04 INFO storage.MemoryStore: Memory use = 33.6 GB (blocks) +
> 40.1 GB (scratch space shared across 14 thread(s)) = 73.7 GB. Storage limit
> = 73.7 GB.
> 14/12/02 18:15:04 WARN spark.CacheManager: Persisting partition rdd_15_9 to
> disk instead.
> ...
>
> Further down I see:
>
> 14/12/02 18:30:08 INFO storage.BlockManagerInfo: Added rdd_15_9 on disk on
> localhost:41889 (size: 6.9 GB)
> 14/12/02 18:30:08 INFO storage.BlockManagerMaster: Updated info of block
> rdd_15_9
> 14/12/02 18:30:08 ERROR executor.Executor: Exception in task 9.0 in stage
> 2.0 (TID 348)
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>
> I don't understand a couple of things:
>
> 1) In this case, I am joining 2 RDDs (sizes 16.3 GB and 17.2 GB); both RDDs
> are created by reading HDFS files. The size of each .part file is 24.87 MB,
> and I am reading these files into 250 partitions, so I shouldn't have any
> individual partition over 25 MB. How could rdd_15_9 be 8.1 GB?
>
> 2) Even if the data is 8.1 GB, Spark should have enough memory to write it,
> though I would expect to hit the Integer.MAX_VALUE 2 GB limitation. However,
> I don't get that error message, and a partial dataset is written to disk
> (6.9 GB). I don't understand how and why only a partial dataset is written.
>
> 3) Why do I get "java.lang.IllegalArgumentException: Size exceeds
> Integer.MAX_VALUE" after writing the partial dataset?
>
> I would love to hear from anyone who can shed some light on this...
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Help-understanding-Not-enough-space-to-cache-rdd-tp20186.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
