GitHub user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/13482
  
    We should probably decouple task scheduling from the executor loss reason 
eventually, but that is a separate issue.
    
    The only case where I could see removing the notifyAll being a problem is if 
users increase the heartbeat timeout to a very large value, and even then it 
would have to approach the RPC timeout, which they simply shouldn't do. 
Otherwise, a couple of extra seconds to reschedule tasks in this uncommon 
failure case shouldn't be a problem, and as soon as one occurs, the interval 
drops to the 200ms that this patch suggests anyway.
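
    The timing argument above can be illustrated with a toy model. This is a 
minimal sketch, not Spark's code: the class ReporterLoopSketch, its methods, 
and the 3000 ms idle sleep are hypothetical values chosen for illustration; 
only the 200 ms figure comes from the discussion. It assumes a loop that sleeps 
a fixed heartbeat period while idle and 200 ms while work is pending. Without 
notifyAll, an event is only noticed at the next scheduled wake-up, so the extra 
delay is bounded by one heartbeat period.

```java
// Hypothetical toy model of a reporter loop; not Spark's ApplicationMaster code.
class ReporterLoopSketch {
    // Illustrative values only (assumed, not Spark defaults).
    static final long HEARTBEAT_MS = 3000; // sleep while idle
    static final long PENDING_MS = 200;    // sleep while work is pending

    /** Sleep chosen for the next iteration: short while work is pending. */
    static long nextSleepMs(boolean workPending) {
        return workPending ? PENDING_MS : HEARTBEAT_MS;
    }

    /**
     * Time at which a loop that starts at t=0 and sleeps HEARTBEAT_MS while
     * idle first wakes at or after eventMs, i.e. when it notices an event
     * that no notifyAll announced.
     */
    static long noticedAtWithoutNotify(long eventMs) {
        long wake = 0;
        while (wake < eventMs) {
            wake += nextSleepMs(false);
        }
        return wake;
    }

    public static void main(String[] args) {
        long event = 1000;
        long noticed = noticedAtWithoutNotify(event);
        // Event at 1000 ms is noticed at the 3000 ms wake-up.
        System.out.println("extra delay without notifyAll: " + (noticed - event) + " ms"); // 2000 ms
        System.out.println("next sleep once work is pending: " + nextSleepMs(true) + " ms"); // 200 ms
    }
}
```

    With notifyAll the loop would wake at once; without it, the worst case is 
one idle period of delay, after which the loop polls at the short interval.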
    
    @rdblue, does removing the notifyAll call solve your problem as well? That 
seems like a much cleaner approach than notifying and then sleeping for some 
time again.

