Can you post the class names in your graph? After zooming in on the picture, I can only see the package of the first class: scala.collection.immutable
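In the meantime, a cheap way to confirm that class loading (rather than an object leak) is what's growing is to watch the class counters directly, alongside the `jstat -gc` output you already collected. A sketch, reusing the driver pid 20581 from your output (substitute your own):

```shell
# Sketch: sample the driver JVM's class-loading counters every 5 seconds.
# "Loaded" climbing with every REPL command while "Unloaded" stays near
# zero would point at classes accumulating in PermGen and never collected.
jstat -class 20581 5000
```

If `Unloaded` never moves even after full GCs, that matches the PermGen growth you measured.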
BTW which release of Spark are you using?

Cheers

On Tue, Dec 15, 2015 at 6:13 PM, yunshan <[email protected]> wrote:
> Hi,
>
> Recently, I want to build a system that can continuously process Spark
> jobs. Under the hood, I keep a spark-shell alive so I can utilize RDD
> caching to cache Spark jobs' input (sometimes our jobs have the same
> input data). It works well until we hit a problem with PermGen space:
> after 500 job runs in spark-shell, my Spark driver throws a Java OOM
> exception: PermGen Space.
>
> At first, I thought maybe there were some memory leaks in my code. After
> I dug deeper, I realized that might not be the case. Every time I send a
> command to spark-shell, the PermGen space increases. Here is what I did
> to measure the spark-shell driver's PermGen space:
>
> I launch a spark-shell and run a simple command multiple times:
>
>     scala> for (i <- 1 to 50) { val rdd = sc.binaryFiles("/share/HIGGS") }
>
> I can see the PermGen space keep increasing (see the PU column). Even
> with an explicit GC ( scala> for (i <- 1 to 50) { System.gc() } ), the
> PermGen space still increases:
>
> [dev@sandbox ~]$ jstat -gc 20581
>  S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC      PU     YGC   YGCT   FGC    FGCT     GCT
> 2560.0 2560.0  0.0   2541.0 344576.0  81066.0  699392.0  206814.0  97280.0 96796.2  207   0.803  198   37.506   38.309
> [dev@sandbox ~]$ jstat -gc 20581
>  S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC      PU     YGC   YGCT   FGC    FGCT     GCT
> 2560.0 8704.0 2545.4  0.0   332288.0 331671.6  699392.0  214443.6  97280.0 96851.0  208   0.813  198   37.506   38.319
>
> The graph I attach shows the heap-dump comparison. It seems a lot of
> Scala reflection classes are loaded; I am not sure whether that is the
> root cause.
>
> Moreover, it seems that even calling the garbage collector itself
> increases the PermGen usage.
> We could try to increase the PermGen size to mitigate the problem, but
> that will only postpone it: we want to keep spark-shell alive as long as
> possible, and with ever-increasing PermGen usage, we will run out of
> memory.
>
> 1. Is there any interdependence between the garbage collector and the
>    Spark shell? Can the Spark shell stop the garbage collector from
>    cleaning PermGen space?
> 2. We don't want a full Spark restart because it takes too long: is
>    there a way to clean PermGen space in Spark?
>
> This also happens with these switches:
>
>     ./bin/spark-shell --conf
>     "spark.executor.extraJavaOptions=-XX:+CMSClassUnloadingEnabled
>     -XX:+CMSPermGenSweepingEnabled"
>
> Appreciate any advice. Thank you.
>
> Thanks,
> Yunshan
>
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n25713/heap-comparison_%2800000002%29.png>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-keep-long-running-spark-shell-but-avoid-hitting-Java-Out-of-Memory-Exception-PermGen-Space-tp25713.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
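One thing worth noting about the switches above: spark.executor.extraJavaOptions only affects the executor JVMs, while the PermGen OOM here is in the driver (the spark-shell process itself). A sketch of passing the same class-unloading flags to the driver instead, with an enlarged PermGen (the 512m size is illustrative, not a tuned value; class unloading with these flags also requires the CMS collector to be active):

```shell
# Sketch: apply class-unloading flags and a larger PermGen to the *driver*
# JVM, where the REPL-generated classes accumulate. Sizes are illustrative.
./bin/spark-shell \
  --driver-java-options "-XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC \
    -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
```

This would not remove the underlying REPL class accumulation, but it targets the JVM that is actually running out of PermGen.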
