This probably means that there’s not enough free memory for the “scratch” space used for computations, so we OOM before the Spark cache decides that it’s full and starts to spill stuff. Try reducing spark.storage.memoryFraction (the default is 0.66) to something like 0.5.
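
For example, a minimal sketch of dialing that setting down when building the context (the app name and master URL below are just placeholders, not anything from this thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    // Shrink the cache's share of the heap so computation "scratch" space has more room.
    val conf = new SparkConf()
      .setAppName("cache-tuning-example")          // placeholder app name
      .setMaster("local[*]")                       // placeholder master URL
      .set("spark.storage.memoryFraction", "0.5")  // down from the default
    val sc = new SparkContext(conf)
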
Matei

On Feb 5, 2014, at 10:29 PM, Andrew Ash <[email protected]> wrote:

> // version 0.9.0
>
> Hi Spark users,
>
> My understanding of the MEMORY_AND_DISK_SER persistence level was that if an
> RDD could fit into memory then it would be left there (same as MEMORY_ONLY),
> and only if it was too big for memory would it spill to disk. Here's how the
> docs describe it:
>
> MEMORY_AND_DISK_SER  Similar to MEMORY_ONLY_SER, but spill partitions that
> don't fit in memory to disk instead of recomputing them on the fly each time
> they're needed.
> https://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
>
> What I'm observing though is that really large RDDs are actually causing
> OOMs. I'm not sure if this is a regression in 0.9.0 or if it has been this
> way for some time.
>
> While I look through the source code, has anyone actually observed the
> correct spill to disk behavior rather than an OOM?
>
> Thanks!
> Andrew
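
(For context, a rough sketch of the kind of usage being described above; this is not code from the thread, and the input path is a placeholder:)

    import org.apache.spark.storage.StorageLevel

    // Store serialized partitions in memory and spill the ones that don't fit to disk.
    val rdd = sc.textFile("hdfs://...")  // placeholder input path
    val cached = rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
    cached.count()  // forces evaluation so the partitions actually get cached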
