But wait, does Spark know to unpersist() RDDs that are not referenced anywhere? That would’ve taken care of the RDDs that I kept creating and then orphaning as part of my job testing/profiling.
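For what it's worth, a minimal sketch of the explicit-unpersist pattern being discussed here, in Scala. It assumes an existing SparkContext `sc`; the input path and the map transformation are hypothetical stand-ins for the job being profiled:

    import org.apache.spark.storage.StorageLevel

    // Assumes an existing SparkContext `sc`; the path is hypothetical.
    val events = sc.textFile("hdfs:///data/events")
    val cached = events.map(_.split(",")).persist(StorageLevel.MEMORY_ONLY)

    cached.count()   // run the job once to profile it

    // Release the old copy before building the next tweaked variant;
    // otherwise the orphaned RDD sits in the block manager until LRU
    // eviction (or an automatic cleaner, if that is what SPARK-1103
    // adds) gets around to it.
    cached.unpersist()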
Is that what SPARK-1103 <https://issues.apache.org/jira/browse/SPARK-1103> is about, btw?

(Sorry to keep digging up this thread.)

On Wed, Apr 16, 2014 at 5:55 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Never mind. I'll take it from both Andrew's and Syed's comments that the
> answer is yes. Dunno why I thought otherwise.
>
> On Wed, Apr 16, 2014 at 5:43 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> I'm running into a similar issue as the OP. I'm running the same job over
>> and over (with minor tweaks) on the same cluster to profile it. It just
>> recently started throwing java.lang.OutOfMemoryError: Java heap space.
>>
>>> Are you caching a lot of RDDs? If so, maybe you should unpersist() the
>>> ones that you're not using.
>>
>> I thought that Spark automatically ejects RDDs from the cache using LRU.
>> Do I need to explicitly unpersist() RDDs that are cached with the
>> default storage level?
>>
>> Nick
>>
>> On Thu, Mar 27, 2014 at 1:46 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Are you caching a lot of RDDs? If so, maybe you should unpersist() the
>>> ones that you're not using. Also, if you're on 0.9, make sure
>>> spark.shuffle.spill is enabled (which it is by default). This allows your
>>> application to spill in-memory content to disk if necessary.
>>>
>>> How much memory are you giving to your executors? The default for
>>> spark.executor.memory is 512m, which is quite low; consider raising it.
>>> Checking the web UI is a good way to figure out your runtime memory usage.
>>>
>>> On Thu, Mar 27, 2014 at 9:22 AM, Ognen Duzlevski <og...@plainvanillagames.com> wrote:
>>>
>>>> Look at the tuning guide on Spark's webpage for strategies to cope with
>>>> this. I have run into quite a few memory issues like these; some were
>>>> resolved by changing the StorageLevel strategy and employing things like
>>>> Kryo, and some were solved by specifying the number of tasks to break a
>>>> given operation into, etc.
>>>>
>>>> Ognen
>>>>
>>>> On 3/27/14, 10:21 AM, Sai Prasanna wrote:
>>>>
>>>> "java.lang.OutOfMemoryError: GC overhead limit exceeded"
>>>>
>>>> What is the problem? The same code, when I run it, finishes in 8 seconds
>>>> one time; the next time it takes really long, say 300-500 seconds...
>>>> In the logs I see a lot of "GC overhead limit exceeded" messages. What
>>>> should be done?
>>>>
>>>> Can someone please throw some light on this?
>>>>
>>>> --
>>>> Sai Prasanna. AN
>>>> II M.Tech (CS), SSSIHL
>>>>
>>>> Entire water in the ocean can never sink a ship, unless it gets inside.
>>>> All the pressures of life can never hurt you, unless you let them in.
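To make Andrew's and Ognen's suggestions concrete, here is a sketch of a Spark 0.9-era configuration: raising executor memory above the 512m default, keeping shuffle spilling on, and switching to Kryo serialization. The app name, memory value, and partition count below are illustrative assumptions, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: the values below are illustrative, not prescriptive.
    val conf = new SparkConf()
      .setAppName("profiling-job")                 // hypothetical app name
      .set("spark.executor.memory", "4g")          // default is a low 512m
      .set("spark.shuffle.spill", "true")          // on by default in 0.9
      .set("spark.serializer",
           "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // Breaking an operation into more tasks also reduces per-task memory
    // pressure, e.g. pairs.reduceByKey(_ + _, 200) instead of relying on
    // the default partition count.

Checking the Storage and Executors pages of the web UI after a change like this, as Andrew suggests, is the quickest way to confirm the effect on runtime memory usage.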