Hi everyone,

Maybe it's a good time to reevaluate off-heap storage for RDDs with a custom allocator?
On a few occasions recently I had to lower both spark.storage.memoryFraction and spark.shuffle.memoryFraction. spark.shuffle.spill helps a bit with large-scale reduces. Also, it could be that you're hitting: https://github.com/apache/incubator-spark/pull/180

/Rafal

Andrew Ash wrote:
> I dropped down to 0.5 but still OOM'd, so sent it all the way to 0.1
> and didn't get an OOM. I could tune this some more to find where the
> cliff is, but this is a one-off job, so now that it's completed I don't
> want to spend any more time tuning it.
>
> Is there a reason that this value couldn't be dynamically adjusted in
> response to actual heap usage?
>
> I can imagine a scenario where spending too much time in GC
> (descending into GC hell) drops the value a little to keep from OOM,
> or directly measuring how much of the heap is spent on this scratch
> space and adjusting appropriately.
>
>
> On Sat, Feb 8, 2014 at 3:40 PM, Matei Zaharia <[email protected]> wrote:
>
>     This probably means that there’s not enough free memory for the
>     “scratch” space used for computations, so we OOM before the Spark
>     cache decides that it’s full and starts to spill stuff. Try
>     reducing spark.storage.memoryFraction (default is 0.66, try 0.5).
>
>     Matei
>
>     On Feb 5, 2014, at 10:29 PM, Andrew Ash <[email protected]> wrote:
>
>>     // version 0.9.0
>>
>>     Hi Spark users,
>>
>>     My understanding of the MEMORY_AND_DISK_SER persistence level was
>>     that if an RDD could fit into memory then it would be left there
>>     (same as MEMORY_ONLY), and only if it was too big for memory
>>     would it spill to disk. Here's how the docs describe it:
>>
>>     MEMORY_AND_DISK_SER  Similar to MEMORY_ONLY_SER, but spill
>>     partitions that don't fit in memory to disk instead of
>>     recomputing them on the fly each time they're needed.
>>
>>     https://spark.incubator.apache.org/docs/latest/scala-programming-guide.html
>>
>>     What I'm observing though is that really large RDDs are actually
>>     causing OOMs. I'm not sure if this is a regression in 0.9.0 or
>>     if it has been this way for some time.
>>
>>     While I look through the source code, has anyone actually
>>     observed the correct spill-to-disk behavior rather than an OOM?
>>
>>     Thanks!
>>     Andrew
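
For reference, here is a minimal sketch (against Spark 0.9, with a hypothetical app name and input path) of how one might lower the two memory fractions Rafal mentions and persist an RDD with MEMORY_AND_DISK_SER; the specific fraction values are just illustrative starting points, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object MemoryFractionSketch {
      def main(args: Array[String]): Unit = {
        // Lower the cache fraction (default 0.66 per Matei's note) and the
        // shuffle fraction so more of the heap is left for task "scratch" space.
        val conf = new SparkConf()
          .setAppName("memory-fraction-sketch")        // hypothetical app name
          .set("spark.storage.memoryFraction", "0.5")
          .set("spark.shuffle.memoryFraction", "0.2")
          .set("spark.shuffle.spill", "true")          // let large reduces spill to disk

        val sc = new SparkContext(conf)

        // MEMORY_AND_DISK_SER: keep serialized partitions in memory, and spill
        // partitions that don't fit to disk instead of recomputing them.
        val lines = sc.textFile("hdfs:///path/to/input")  // hypothetical input path
        val counts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _)
          .persist(StorageLevel.MEMORY_AND_DISK_SER)

        println(counts.count())
        sc.stop()
      }
    }

Note that these fractions are per-executor heap fractions, so lowering them trades cache/shuffle capacity for the unaccounted scratch space the thread is describing.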
