Hi, Sparkers
Our cluster is running Spark 1.5.2 in standalone mode.
It had been running fine for weeks, but today I found that the master crashed due to an OOM.
We have several ETL jobs that run daily on Spark, plus ad-hoc jobs, and I can see the 
"Completed Applications" table growing in the master UI.
Originally I set "export SPARK_DAEMON_MEMORY=1g", as I didn't think the 
master/worker JVM daemons needed much memory.
I never saw a Spark master OOM when we ran version 1.3.1, but suddenly I got 
one on 1.5.2.
I am not sure whether it is due to the growing "Completed Applications" history. I 
haven't started the Spark history server, as we don't have that requirement yet.
For now I have changed the daemon memory from 1g to 2g and restarted the cluster.
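For reference, this is roughly what the relevant part of our conf/spark-env.sh looks like after the change. The retainedApplications line is only an idea I am considering, on the assumption that the completed-application history is what's eating the heap; we have not verified that:

```shell
# conf/spark-env.sh (sketch, not a verified fix)

# Heap for the master/worker daemon JVMs; raised from 1g after the OOM
export SPARK_DAEMON_MEMORY=2g

# Assumption: if the growing "Completed Applications" history is the cause,
# capping how many completed apps the master keeps in memory might help
# (spark.deploy.retainedApplications defaults to 200)
export SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=50"
```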
Below is the OOM log from the Spark master. The OOM actually happened minutes after a 
job finished. That job completed successfully, as its final HDFS output was 
generated around 09:31:00. So in theory there were no active jobs when the OOM 
happened, unless it was triggered by the completion of that job; I don't know. In any 
case, a Spark master OOM is effectively a SPOF for us. Does anyone have any idea about it?
16/03/30 09:36:40 ERROR akka.ErrorMonitor: Uncaught fatal error from thread [sparkMaster-akka.remote.default-remote-dispatcher-33] shutting down ActorSystem [sparkMaster]
java.lang.OutOfMemoryError: Java heap space
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
        at java.lang.Class.getDeclaredMethod(Class.java:2007)
        at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
        at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:494)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
        at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
        at scala.util.Try$.apply(Try.scala:161)
        at akka.serialization.Serialization.deserialize(Serialization.scala:98)
Thanks
Yong
