GitHub user srowen commented on the issue:

    https://github.com/apache/spark/pull/18874
  
    @tgravescs that's actually progress. You're no longer saying that the goal 
is to keep a few executors around just in case 
(https://issues.apache.org/jira/browse/SPARK-21656) or that the problem is 
waiting on locality 
(https://issues.apache.org/jira/browse/SPARK-21656?focusedCommentId=16117159&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16117159).
    
    I believe then you're saying the problem is what I asked about in 
https://github.com/apache/spark/pull/18874#issuecomment-321315467 : there 
should be no way to go to 0 executors when there is any work to do. The 
scheduler should never make that decision, even if the minimum is 0. I agree.
    
    You're actually saying something stronger: the number of executors should 
not go below the target, not just the minimum. If so, can we update the 
description to state it that way? (And if so, is the check against the minimum 
redundant?)
    
    But what about just fixing the 0 executor case as that is the scenario 
where no progress can be made?
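
    To make the two options concrete, here is a minimal sketch of the decision 
being discussed. It is not Spark's actual allocation code; the function name, 
parameters, and policy labels are all hypothetical and only illustrate the 
difference between guarding the 0-executor case and never going below target.

```python
def can_remove_idle_executor(current, minimum, target, pending_tasks, policy):
    """Decide whether one idle executor may be removed (illustrative only).

    All names here are hypothetical; this is not Spark's API.
    """
    remaining = current - 1
    # Both policies respect the configured minimum.
    if remaining < minimum:
        return False
    if policy == "narrow":
        # Narrow fix: only refuse when removal would leave 0 executors
        # while there is still work to do.
        return not (remaining == 0 and pending_tasks > 0)
    if policy == "target":
        # Stronger rule: never let the count drop below the current target.
        return remaining >= target
    raise ValueError(f"unknown policy: {policy}")
```

    With 3 executors, a target of 3, and pending tasks, the "narrow" policy 
still removes an idle executor while the "target" policy keeps it. Both refuse 
to remove the last executor while work is pending.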
    
    This change is a heuristic with side effects, as noted just above. It 
means you don't remove legitimately idle executors that the scheduler won't 
use. It harms the common case, though probably marginally. It's a behavior 
change.
    
    It helps only in the case where the driver is stuck for periods longer 
than the executor idle timeout. I think you have bigger problems if this is 
the case, right? If you have 60s GC pauses, you need to tune GC (or the idle 
timeout), but it's fair to have to tune _something_ if you don't like the 
slowdown from executors having to be reallocated.
    
    Is this a good tradeoff? I don't think so, but I don't feel strongly about it.
    Is it important to address 0 executors, more narrowly? Yes.



