Can you post the class names in your graph?
After zooming in on the picture, I can only see the package of the first
class: scala.collection.immutable

BTW, which release of Spark are you using?

Cheers

On Tue, Dec 15, 2015 at 6:13 PM, yunshan <[email protected]> wrote:

> Hi,
>
> Recently, I want to build a system that can continuously process spark
> jobs.
> Under the hood, I keep a spark-shell alive so I can utilize RDD caching to
> cache spark jobs’ input (sometimes our jobs have the same input data). It
> works well until we find a problem about PermGen Space: After 500 job runs
> in spark-shell, my spark driver throws the Java OOM Exception: PermGen
> Space.
>
> At first, I thought maybe there are some memory leaks in my code. After I
> dived deeper, I realized it might not be the case. Every time I send a
> command to spark-shell, the permGen space increases. Here is what I did to
> measure the spark-shell driver’s permGen space:
>
> I launch a spark-shell and run a simple command multiple times:
>
> scala> for (i <- 1 to 50) { val rdd = sc.binaryFiles("/share/HIGGS") }
>
> I can see the PermGen space keeps increasing (see the PU column). Even with
> an explicit GC ( scala> for (i <- 1 to 50) { System.gc() } ), the PermGen
> space still increases:
>
> [dev@sandbox ~]$ jstat -gc 20581
>  S0C    S1C    S0U    S1U      EC       EU        OC        OU        PC       PU     YGC   YGCT   FGC   FGCT    GCT
> 2560.0 2560.0  0.0   2541.0 344576.0  81066.0  699392.0  206814.0  97280.0  96796.2   207  0.803   198  37.506  38.309
> [dev@sandbox ~]$ jstat -gc 20581
>  S0C    S1C    S0U    S1U      EC       EU        OC        OU        PC       PU     YGC   YGCT   FGC   FGCT    GCT
> 2560.0 8704.0 2545.4  0.0   332288.0 331671.6  699392.0  214443.6  97280.0  96851.0   208  0.813   198  37.506  38.319
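> A way to confirm that the growth is driven by class loading (a sketch on
> our side; 20581 is the driver pid from the session above) would be to
> watch the loaded-class count alongside PU:

```shell
# jstat -class reports the number and size of classes loaded/unloaded by the
# JVM; if PU grows in step with the Loaded column, class loading is the cause.
# 20581 is the driver pid from the session above; 5000 = sample interval in ms.
jstat -class 20581 5000
```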
>
> The graph I attached shows the heap-dump comparison. It seems a lot of
> Scala reflection classes are loaded; I am not sure whether that is the root
> cause.
>
> Moreover, it seems that even calling the garbage collector itself increases
> the PermGen usage.
> We could try increasing the PermGen size to mitigate the problem, but that
> would only postpone it: we want to keep spark-shell alive as long as
> possible, and with ever-increasing PermGen usage we will eventually run out
> of memory.
>
> 1.      Is there any interdependence between the garbage collector and the
> Spark shell? Can the Spark shell stop the garbage collector from cleaning
> PermGen space?
> 2.      We don’t want a full Spark restart because it takes too long: is
> there a way to clean PermGen space in Spark?
>
> This also happens with these switches:
> ./bin/spark-shell --conf
> "spark.executor.extraJavaOptions=-XX:+CMSClassUnloadingEnabled
> -XX:+CMSPermGenSweepingEnabled"
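> (Side note, and an assumption on our part: the OOM above is thrown in the
> driver, and spark.executor.extraJavaOptions only reaches the executors, so
> perhaps the same flags would need to go to the driver JVM as well, e.g. via
> spark-shell's --driver-java-options. A sketch, with an illustrative
> MaxPermSize value:)

```shell
# Pass the class-unloading flags to the driver JVM; spark-shell's
# --driver-java-options corresponds to spark.driver.extraJavaOptions.
# 512m is only an illustrative value, not a recommendation.
./bin/spark-shell --driver-java-options \
  "-XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
```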
>
> Appreciate any advice. Thank you.
>
> Thanks,
> Yunshan
>
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n25713/heap-comparison_%2800000002%29.png>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-keep-long-running-spark-shell-but-avoid-hitting-Java-Out-of-Memory-Exception-PermGen-Space-tp25713.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
