Github user rustagi commented on the issue:

    https://github.com/apache/spark/pull/11205
  
    I am seeing this issue quite frequently. I am not sure what causes it, but 
we often receive an onTaskEnd event after its stage has already ended. This 
drives numRunningTasks negative, and once the executor count is updated, the 
number of required executors (maxNumExecutorsNeeded) becomes negative as well, 
which breaks executor allocation and deallocation. In the best case you end up 
with executors that cannot be deallocated; over time Spark stops allocating new 
executors even when tasks are pending.
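    To illustrate the failure mode, here is a minimal sketch (not the actual 
Spark internals, and not the change proposed in either PR) of the defensive 
clamp idea: a standalone SparkListener that tracks running tasks and refuses to 
let the counter drop below zero when a late onTaskEnd arrives. The field name 
numRunningTasks simply mirrors the discussion above.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, SparkListenerTaskStart}

// Hypothetical listener, for illustration only: it keeps its own running-task
// count and clamps it at zero so a task-end event delivered after its stage
// has completed cannot push the count (and anything derived from it, such as
// maxNumExecutorsNeeded) negative.
class SafeRunningTaskCounter extends SparkListener {
  private var numRunningTasks = 0

  override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = synchronized {
    numRunningTasks += 1
  }

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = synchronized {
    // Clamp at zero: without this, a late onTaskEnd would drive the counter
    // below zero and corrupt the executor-demand calculation.
    numRunningTasks = math.max(0, numRunningTasks - 1)
  }

  def runningTasks: Int = synchronized { numRunningTasks }
}
```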
    There is a simple, hacky patch here: 
https://github.com/apache/spark/pull/9288, and this PR is an attempt to correct 
the problem with more accountability.
    I am seeing this issue so frequently that I am not sure it is possible to 
run Spark with dynamic allocation successfully for a long duration without 
fixing it. I'll try the hacky patch and confirm.


