I was faced with the same problem. As it turns out, this is not a Spark problem but a Scala one, and I don't think it is a memory leak in the usual sense. spark-shell is built on the Scala REPL, which was originally designed as a short-lived application, not something that runs 24/7. The Scala REPL allocates a great many objects, such as Symbol and Naming instances in Global, the internal compiler. These objects appear to be needed to store the class tree, hierarchy information, variable names, history and so on every time you enter code. That is why it easily causes a heap-space OOME. The PermGen OOME usually comes from the many anonymous classes the REPL generates. The PermGen sweeping options may not help us, because we commonly create instances of these anonymous classes and use them somewhere in our code; they stay reachable through the AppClassLoader, so I think it is impossible to break the chain of references to them. Restarting the JVM is the simplest solution.
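If you want to watch this class growth from inside the shell instead of attaching jstat from outside, a minimal sketch is below. It uses only plain JMX, nothing Spark-specific; "probe" is just a throwaway name for illustration:

```scala
import java.lang.management.ManagementFactory

// The REPL compiles every line you type into fresh wrapper classes, so the
// driver JVM's loaded-class count only ever climbs. The ClassLoadingMXBean
// exposes those counters directly.
val classBean = ManagementFactory.getClassLoadingMXBean
val before = classBean.getTotalLoadedClassCount

// Define something trivial; inside the REPL even this forces new class loads.
val probe = (x: Int) => x + 1
probe(41)

val after = classBean.getTotalLoadedClassCount
println(s"classes currently loaded: ${classBean.getLoadedClassCount}")
println(s"total ever loaded:        $after (was $before)")
```

Running a snippet like this periodically in a long-lived spark-shell should show getTotalLoadedClassCount rising with every command, which matches the PU column growth you measured.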
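One more note on the switches quoted below: spark.executor.extraJavaOptions only reaches the executor JVMs, while the OOME here is thrown in the driver. As a sketch of the driver-side equivalent (the 512m value is only an example; assuming the standard Spark 1.x launch flags):

./bin/spark-shell --driver-java-options "-XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"

This only buys headroom, though; since the generated classes stay reachable, it postpones the OOME rather than preventing it.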
-----Original Message-----
From: "Samson Zhu" <[email protected]>
To: "Ted Yu" <[email protected]>
Cc: "user" <[email protected]>
Sent: 2015-12-16 (Wed) 13:46:55
Subject: Re: How to keep long running spark-shell but avoid hitting Java Out of Memory Exception: PermGen Space

Hi Ted,

The top three classes on the graph are:
scala.collection.immutable.$colon$colon
scala.reflect.internal.Symbols$TypeHistory
scala.reflect.internal.Scopes$ScopeEntry

I also attach a bigger graph. And I use Spark 1.5.2.

Thanks,
Yunshan

2015-12-15 19:20 GMT-08:00 Ted Yu <[email protected]>:

Can you post the class names in your graph? After zooming in on the picture, I can only see the package of the first class: scala.collection.immutable

BTW which release of Spark are you using?

Cheers

On Tue, Dec 15, 2015 at 6:13 PM, yunshan <[email protected]> wrote:

Hi,

Recently, I wanted to build a system that can continuously process Spark jobs. Under the hood, I keep a spark-shell alive so I can use RDD caching to cache the jobs' input (sometimes our jobs have the same input data). It worked well until we found a problem with PermGen space: after 500 job runs in spark-shell, my Spark driver throws the Java OOM exception: PermGen Space. At first I thought there might be some memory leak in my code, but after diving deeper I realized that might not be the case: every time I send a command to spark-shell, the PermGen space increases. Here is what I did to measure the spark-shell driver's PermGen space. I launch a spark-shell and run a simple command multiple times:

scala> for (i <- 1 to 50) { val rdd = sc.binaryFiles("/share/HIGGS") }

I can see the PermGen space keep increasing (see the PU column).
Even with explicit GC (scala> for (i <- 1 to 50) { System.gc() }), the PermGen space still increases:

[dev@sandbox ~]$ jstat -gc 20581
 S0C    S1C    S0U    S1U      EC        EU        OC        OU       PC       PU     YGC  YGCT   FGC  FGCT    GCT
2560.0 2560.0  0.0   2541.0 344576.0   81066.0  699392.0  206814.0 97280.0  96796.2  207  0.803  198  37.506  38.309
[dev@sandbox ~]$ jstat -gc 20581
 S0C    S1C    S0U    S1U      EC        EU        OC        OU       PC       PU     YGC  YGCT   FGC  FGCT    GCT
2560.0 8704.0 2545.4  0.0   332288.0  331671.6  699392.0  214443.6 97280.0  96851.0  208  0.813  198  37.506  38.319

The graph I attach shows the heap-dump comparison. It seems a lot of Scala reflection classes are loaded; I am not sure whether that is the root cause. Moreover, it seems that even calling the garbage collector itself increases the PermGen usage. We could try to increase the PermGen size to mitigate the problem, but that will only postpone it: we want to keep spark-shell alive as long as possible, and with ever-increasing PermGen usage we will eventually run out of memory.

1. Is there any interdependence between the garbage collector and the Spark shell? Can the Spark shell stop the garbage collector from cleaning PermGen space?
2. We don't want a full Spark restart because it takes too long: is there a way to clean PermGen space in Spark?

This also happens with these switches:

./bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"

Appreciate any advice. Thank you.

Thanks,
Yunshan

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n25713/heap-comparison_%2800000002%29.png>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-keep-long-running-spark-shell-but-avoid-hitting-Java-Out-of-Memory-Exception-PermGen-Space-tp25713.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
