Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18739#discussion_r129749722
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -665,10 +667,15 @@ private[spark] class TaskSetManager(
                 }
               }
               if (blacklistedEverywhere) {
    -            val partition = tasks(indexInTaskSet).partitionId
    -            abort(s"Aborting $taskSet because task $indexInTaskSet (partition $partition) " +
    -              s"cannot run anywhere due to node and executor blacklist.  Blacklisting behavior " +
    -              s"can be configured via spark.blacklist.*.")
    +            val dynamicAllocationEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
    +            val mayAllocateNewExecutor =
    +              conf.getInt("spark.executor.instances", -1) > currentExecutorNumber
    +            if (!dynamicAllocationEnabled && !mayAllocateNewExecutor) {
    --- End diff --
    
    But even with dynamic allocation, you might still want this, right? Are you hoping that, with dynamic allocation, even if everything is blacklisted, the executors will eventually go idle and get torn down, and then new executors will be created because tasks are still pending?
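The recovery path asked about above can be sketched as a toy model. It assumes, for illustration only, that executor-level blacklist entries are keyed by executor ID and node-level entries by host. Under that assumption, a replacement executor with a fresh ID helps only if it lands on a host that is not node-blacklisted. All names here are hypothetical.

```scala
object ExecutorChurnSketch {
  final case class Executor(id: String, host: String)

  // Hypothetical model: executor-level blacklist keyed by executor ID,
  // node-level blacklist keyed by host name.
  def canRunTask(e: Executor, blacklistedExecs: Set[String], blacklistedNodes: Set[String]): Boolean =
    !blacklistedExecs.contains(e.id) && !blacklistedNodes.contains(e.host)

  // Models idle executors being torn down and replaced by executors with
  // fresh, sequential IDs on the given hosts; returns whether any
  // replacement could run the task.
  def replacementsHelp(
      replacementHosts: Seq[String],
      firstNewId: Int,
      blacklistedExecs: Set[String],
      blacklistedNodes: Set[String]): Boolean = {
    val replacements = replacementHosts.zipWithIndex.map { case (host, i) =>
      Executor((firstNewId + i).toString, host)
    }
    replacements.exists(canRunTask(_, blacklistedExecs, blacklistedNodes))
  }
}
```

In this model, a replacement on a host where only the old executor was blacklisted can run the task. If every host is node-blacklisted, no amount of executor churn helps.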
    
    On large clusters, this seems desirable. There are odd cases with small clusters, though. Suppose the cluster has only two nodes and you end up blacklisting both of them (on a cluster that small, this can happen simply because tasks fail from bugs in user code). With this change, the job goes back to sitting idle for a long time, just waiting for the blacklist entries to time out.
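The small-cluster case can be quantified with a hypothetical helper that computes when a task could next run. It assumes each blacklisted host has a known expiry time; hosts absent from the map are treated as usable. The one-hour figure in the usage note is an assumption for illustration, not a value taken from the diff.

```scala
object SmallClusterSketch {
  // Returns the earliest time (ms) at which at least one host can run the task.
  // blacklistExpiry maps each blacklisted host to the time its entry expires;
  // hosts missing from the map are not blacklisted.
  def earliestUsableTime(nowMs: Long, clusterHosts: Set[String], blacklistExpiry: Map[String, Long]): Long = {
    require(clusterHosts.nonEmpty, "cluster must have at least one host")
    val usableNow = clusterHosts.exists(h => blacklistExpiry.get(h).forall(_ <= nowMs))
    if (usableNow) nowMs
    // Every host is blacklisted, so the job waits for the first entry to expire.
    else clusterHosts.flatMap(blacklistExpiry.get).min
  }
}
```

Assuming a one-hour blacklist timeout, a two-node cluster with both nodes blacklisted at time 0 cannot run the task until 3,600,000 ms. A third, unblacklisted node makes it runnable immediately.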
    
    I agree the current solution isn't great, but I don't know if this really improves things.

