Github user rustagi commented on the issue:
https://github.com/apache/spark/pull/11205
    I am seeing this issue quite frequently. I'm not sure what is causing it, but
we often get an onTaskEnd event after its stage has already ended. This causes
numRunningTasks to go negative. When the executor count is next updated, the
number of required executors (maxNumExecutorsNeeded) also becomes negative,
which breaks executor allocation and deallocation. In the best case you end up
with executors that cannot be deallocated, and over time Spark stops allocating
new executors even when tasks are pending. A rough sketch of the arithmetic is
below.
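    To make the failure mode concrete, here is a small standalone Scala sketch of
the `ceil((pending + running) / tasksPerExecutor)` calculation. This is not the
actual Spark source; the object name, method signature and the
`tasksPerExecutor = 4` value are just illustrative. It shows how a few late
onTaskEnd events after the counters are reset can push the computed executor
target below zero:

    ```scala
    // Standalone illustration only, not ExecutorAllocationManager itself.
    object ExecutorCountSketch {
      val tasksPerExecutor = 4 // example value

      // Same shape as the "needed executors" calculation:
      // ceil((pending + running) / tasksPerExecutor) via integer arithmetic.
      def maxNumExecutorsNeeded(pendingTasks: Int, runningTasks: Int): Int =
        (pendingTasks + runningTasks + tasksPerExecutor - 1) / tasksPerExecutor

      def main(args: Array[String]): Unit = {
        var numRunningTasks = 0
        // Stage has ended and counters were reset; several late onTaskEnd
        // events then arrive and each decrements the counter anyway.
        (1 to 8).foreach(_ => numRunningTasks -= 1)
        // With no pending work the "needed" executor count is now negative,
        // which confuses the allocation/deallocation logic downstream.
        println(maxNumExecutorsNeeded(pendingTasks = 0, runningTasks = numRunningTasks)) // prints -1
      }
    }
    ```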
    There is a simple hacky patch here:
https://github.com/apache/spark/pull/9288, and this PR is an attempt to fix
the issue with more accountability.
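    The guard I would expect such a workaround to take is roughly the following.
This is a hypothetical sketch only; I haven't checked that it is literally what
#9288 changes. The idea is simply to refuse to decrement the running-task
counter below zero when a stray onTaskEnd arrives:

    ```scala
    // Hypothetical clamp, not the actual patch: a late onTaskEnd after stage
    // completion should not be able to drive the counter negative.
    def decrementRunningTasks(numRunningTasks: Int): Int =
      if (numRunningTasks > 0) numRunningTasks - 1
      else 0 // log and ignore the stray event instead of going negative
    ```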
    I am hitting this so frequently that I don't think it is possible to run
Spark with dynamic allocation successfully for long durations without fixing it.
I'll try the hacky patch and confirm.