Brian created SPARK-11049:
-----------------------------

             Summary: If a single executor fails to allocate memory, entire job fails
                 Key: SPARK-11049
                 URL: https://issues.apache.org/jira/browse/SPARK-11049
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.4.0
            Reporter: Brian


To reproduce:

* Create a Spark cluster using start-master.sh and start-slave.sh (I believe this is the "standalone cluster manager").
* Leave a process running on some nodes that takes up a significant amount of RAM.
* Leave other nodes with plenty of free RAM to run Spark.
* Run a job against this cluster with spark.executor.memory requesting all or most of the memory available on each node (a concrete sketch follows this list).
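
A minimal sketch of the reproduction, assuming default ports; the host name master-host, the 16g figure, and my-app.jar are placeholders, not values from an actual run:

# on the master node
./sbin/start-master.sh

# on each worker node
./sbin/start-slave.sh spark://master-host:7077

# submit a job that asks for most of each node's RAM
./bin/spark-submit --master spark://master-host:7077 \
  --conf spark.executor.memory=16g \
  my-app.jar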

On the node that has insufficient memory, there will of course be an error like:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.
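
For reference, this error comes from the JVM itself rather than from Spark. It can be reproduced directly on most machines by asking for a heap larger than the memory available, e.g.:

java -Xmx1000g -version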

On the driver node, and in the Spark master UI, I see that _all_ executors exit or are killed, and the entire job fails.  It would be better if there were an indication of which individual node is actually at fault.  It would also be better if the cluster manager could fail over to nodes that are still operating properly and have sufficient RAM.
