GitHub user juanrh opened a pull request:

    https://github.com/apache/spark/pull/19590

    [WIP][SPARK-22148][CORE] TaskSetManager.abortIfCompletelyBlacklisted should 
not abort when all current executors are blacklisted but dynamic allocation is 
enabled

    ## What changes were proposed in this pull request?
    I've been working on this issue, and I would like to get your feedback on 
the following approach. Instead of failing in 
`TaskSetManager.abortIfCompletelyBlacklisted` when a task cannot be scheduled 
on any executor but dynamic allocation is enabled, we register the task 
with `ExecutorAllocationManager`. `ExecutorAllocationManager` then requests 
additional executors for these "unschedulable tasks" by increasing the value 
returned by `ExecutorAllocationManager.maxNumExecutorsNeeded`. This counts 
these tasks twice, but that makes sense: the current executors have no slot 
for these tasks, so we actually want new executors that are able to run 
them. To avoid a deadlock caused by tasks remaining unschedulable forever, we 
store the timestamp at which each task was registered as unschedulable, and in 
`ExecutorAllocationManager.schedule` we abort the application if any task has 
been unschedulable for longer than a configurable age threshold. This gives 
dynamic allocation an opportunity to get more executors that are able to run 
the tasks, without making the application wait forever.
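
    As a rough illustration of the bookkeeping described above, here is a 
minimal, self-contained sketch. It is not the code in this PR: the class 
`UnschedulableTaskTracker`, its method names, and the way the extra executors 
are added to the estimate are all hypothetical, and time is passed in 
explicitly so the logic can be exercised without a clock.

```scala
import scala.collection.mutable

// Hypothetical helper, for illustration only; not part of Spark or of this PR.
final class UnschedulableTaskTracker(maxAgeMs: Long) {

  // Task id -> time (ms) at which the task was first registered as unschedulable.
  private val registeredAt = mutable.Map.empty[Long, Long]

  // Record a task that no current executor can run. Registering the same task
  // again keeps the original timestamp, so the age is measured from the first report.
  def register(taskId: Long, nowMs: Long): Unit = {
    if (!registeredAt.contains(taskId)) {
      registeredAt(taskId) = nowMs
    }
  }

  // Forget a task once it has been scheduled or its task set has finished.
  def unregister(taskId: Long): Unit = {
    registeredAt -= taskId
  }

  // Assumed shape of the estimate: executors for all pending and running tasks,
  // plus extra executors for the registered tasks. The registered tasks are
  // therefore counted twice, on purpose.
  def maxNumExecutorsNeeded(pendingAndRunningTasks: Int, tasksPerExecutor: Int): Int = {
    def ceilDiv(a: Int, b: Int): Int = (a + b - 1) / b
    ceilDiv(pendingAndRunningTasks, tasksPerExecutor) +
      ceilDiv(registeredAt.size, tasksPerExecutor)
  }

  // True if any task has been unschedulable for longer than the age threshold,
  // in which case the application should be aborted instead of waiting forever.
  def shouldAbort(nowMs: Long): Boolean =
    registeredAt.values.exists(firstSeenMs => nowMs - firstSeenMs > maxAgeMs)
}
```

    In this sketch, a periodic scheduling step would call `shouldAbort` first 
and only then use `maxNumExecutorsNeeded` to decide how many executors to 
request. A task that gets scheduled on a new executor would be removed with 
`unregister`, so it no longer counts toward the threshold.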
    
    ## How was this patch tested?
    This is WIP for discussion; unit tests will be provided later.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/juanrh/spark hortala-SPARK-22148

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19590
    
----
commit 7baa51e5cbafc93a0fe56eb11d8c76c69b51c893
Author: Juan Rodriguez Hortala <[email protected]>
Date:   2017-10-21T00:40:07Z

    first prototype

----