PermGen tends to be more of an issue with Spark than with other JVM-based applications, so this is fairly normal. You should simply increase the PermGen size; I don't see that set in your command. For example, -XX:MaxPermSize=256m allows it to grow to 256 MB. The right size depends on your total heap size and your app.
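For illustration, here is roughly how your command might look with the flag added. Treat it as a sketch rather than a tested invocation: I'm assuming the flag belongs in spark.executor.extraJavaOptions next to the two CMS options you already pass (the error comes from the executor JVMs), 256m is only the example value from above, and EXECUTOR_OPTS is just a variable name I made up to keep the line readable.

```shell
# Executor JVM flags: the two CMS options from the original command,
# plus a PermGen cap. 256m is an example value, not a recommendation.
EXECUTOR_OPTS="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"

# Paths and class name are copied from the original command.
# The guard only avoids a "not found" error when run outside the cluster.
if [ -x spark/bin/spark-submit ]; then
  spark/bin/spark-submit \
    --class mything.core \
    --name "My Thing" \
    --conf spark.yarn.executor.memoryOverhead=4096 \
    --conf "spark.executor.extraJavaOptions=$EXECUTOR_OPTS" \
    /root/spark/code/myjar.jar
else
  echo "spark-submit not found; executor options would be: $EXECUTOR_OPTS"
fi
```

Quoting the whole --conf argument keeps the space-separated JVM flags together as a single value.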
Also, Java 8 no longer has a permanent generation, so this particular type of problem and tuning is not needed. You might consider running on Java 8.

On Fri, Jan 9, 2015 at 10:38 AM, Joe Wass <[email protected]> wrote:
> I'm running on an AWS cluster of 10 x m1.large (64 bit, 7.5 GiB RAM). FWIW
> I'm using the Flambo Clojure wrapper which uses the Java API but I don't
> think that should make any difference. I'm running with the following
> command:
>
> spark/bin/spark-submit --class mything.core --name "My Thing" --conf
> spark.yarn.executor.memoryOverhead=4096 --conf
> spark.executor.extraJavaOptions="-XX:+CMSClassUnloadingEnabled
> -XX:+CMSPermGenSweepingEnabled" /root/spark/code/myjar.jar
>
> For one of the stages I'm getting errors:
>
> - ExecutorLostFailure (executor lost)
> - Resubmitted (resubmitted due to lost executor)
>
> And I think they're caused by slave executor JVMs dying up with this error:
>
> java.lang.OutOfMemoryError: PermGen space
>     java.lang.Class.getDeclaredConstructors0(Native Method)
>     java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
>     java.lang.Class.getConstructor0(Class.java:2885)
>     java.lang.Class.newInstance(Class.java:350)
>     sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
>     sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
>     java.security.AccessController.doPrivileged(Native Method)
>     sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
>     sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
>     sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
>     java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
>     java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
>     java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
>     java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
>     java.security.AccessController.doPrivileged(Native Method)
>     java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
>     java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
>     java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
>     java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>     java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> 1 stage out of 14 (so far) is failing. My failing stage is 1768 succeeded /
> 1862 (940 failed). 7 tasks failed with OOM, 919 were "Resubmitted
> (resubmitted due to lost executor)".
>
> Now my "Aggregated Metrics by Executor" shows that 10 out of 16 executors
> show "CANNOT FIND ADDRESS" which I imagine means the JVM blew up and hasn't
> been restarted. Now the 'Executors' tab shows only 7 executors.
>
> - Is this normal?
> - Any ideas why this is happening?
> - Any other measures I can take to prevent this?
> - Is the rest of my app going to run on a reduced number of executors?
> - Can I re-start the executors mid-application? This is a long-running job,
> so I'd like to do what I can whilst it's running, if possible.
> - Am I correct in thinking that the --conf arguments are supplied to the
> JVMs of the slave executors, so they will be receiving the extraJavaOptions
> and memoryOverhead?
>
> Thanks very much!
>
> Joe
