Rui Li created SPARK-2387:
-----------------------------

             Summary: Remove the stage barrier for better resource utilization
                 Key: SPARK-2387
                 URL: https://issues.apache.org/jira/browse/SPARK-2387
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
            Reporter: Rui Li


DAGScheduler divides a Spark job into multiple stages according to RDD
dependencies. Whenever there is a shuffle dependency, DAGScheduler creates a
shuffle map stage on the map side and another stage that depends on it.
Currently, the downstream stage cannot start until all the stages it depends on
have finished. This barrier between stages leaves slots idle while the last few
upstream tasks finish, which wastes cluster resources.
We therefore propose removing the barrier and pre-starting the reduce stage as
soon as free slots are available. This can achieve better resource utilization
and improve overall job performance, especially when many executors are granted
to the application.
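
To make the idle-slot cost concrete, here is a minimal toy model in Python. It
is not Spark's scheduler: the function name map_stage_idle_time, the task
durations, and the slot count are all hypothetical, and tasks are simply
assigned to the earliest free slot. It reports how much slot-time sits unused
behind the barrier while a single straggling map task finishes. How much of
that time a pre-started reduce task could actually use depends on which map
outputs are already available, which this sketch ignores.

```python
import heapq


def map_stage_idle_time(durations, slots):
    """Return (makespan, idle_slot_time) for a map stage on `slots` slots.

    Toy model: each task goes to whichever slot frees up first, and no
    downstream task may start before the whole stage ends (the barrier).
    """
    # Each heap entry is the time at which one slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest free slot
        heapq.heappush(free_at, start + d)  # slot is busy until start + d
    makespan = max(free_at)                 # the stage ends with its last task
    # Slot-time between each slot's last task and the end of the stage.
    idle = sum(makespan - t for t in free_at)
    return makespan, idle


# Hypothetical workload: seven short map tasks and one straggler on 4 slots.
durations = [1.0] * 7 + [5.0]
makespan, idle = map_stage_idle_time(durations, slots=4)
print(f"stage ends at t={makespan}, idle slot-time behind the barrier={idle}")
```

In this made-up workload, three of the four slots finish early and wait for
the straggler. That waiting time is what pre-starting the reduce stage aims to
reclaim.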



--
This message was sent by Atlassian JIRA
(v6.2#6252)
