GitHub user ArcherShao opened a pull request:
https://github.com/apache/spark/pull/5676
[SPARK-6891] Fix the bug that ExecutorAllocationManager will request
negative number executors
In ExecutorAllocationManager, the executor allocation schedule runs at a
fixed rate (100 ms). Each run first calls the method
'addOrCancelExecutorRequests' and then removes expired executors.
Suppose that at time T no task is running or pending, and there are 5
executors running, all of them expired.
1. The method 'addOrCancelExecutorRequests' will be called, and the value
of 'ExecutorAllocationManager.numExecutorsPending' will be updated to -5.
2. The 5 expired executors are removed.
Suppose there is still no task running or pending at T+1. The method
'targetNumExecutors' will return -5, and the method 'addExecutors' will be
called:
    private def addExecutors(maxNumExecutorsNeeded: Int): Int = {
      val currentTarget = targetNumExecutors
      ....
      val actualMaxNumExecutors = math.min(maxNumExecutors,
        maxNumExecutorsNeeded)
      val newTotalExecutors = math.min(currentTarget + numExecutorsToAdd,
        actualMaxNumExecutors)
      val addRequestAcknowledged = testing ||
        client.requestTotalExecutors(newTotalExecutors)
      ....
    }
newTotalExecutors will then be a negative number, and when
client.requestTotalExecutors(newTotalExecutors) is called, it will throw an
exception.
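
To make the arithmetic concrete, here is a minimal standalone sketch that
reproduces only the two math.min lines quoted above. The object name and the
sample values (numExecutorsToAdd = 1, maxNumExecutors = 10) are assumptions
chosen for illustration; only the target of -5 and the absence of tasks
(maxNumExecutorsNeeded = 0) come from the scenario described here.

```scala
// Hypothetical standalone model; not the actual ExecutorAllocationManager code.
object NegativeTargetSketch {
  // Same two math.min steps as in the quoted addExecutors snippet.
  def newTotalExecutors(
      currentTarget: Int,
      numExecutorsToAdd: Int,
      maxNumExecutors: Int,
      maxNumExecutorsNeeded: Int): Int = {
    val actualMaxNumExecutors = math.min(maxNumExecutors, maxNumExecutorsNeeded)
    math.min(currentTarget + numExecutorsToAdd, actualMaxNumExecutors)
  }

  def main(args: Array[String]): Unit = {
    // currentTarget = -5 is the value described for time T+1;
    // the other arguments are assumed sample values.
    val total = newTotalExecutors(
      currentTarget = -5,
      numExecutorsToAdd = 1,
      maxNumExecutors = 10,
      maxNumExecutorsNeeded = 0)
    println(s"newTotalExecutors = $total") // a negative value under these inputs
  }
}
```

Because the outer math.min picks the smaller operand, a negative
currentTarget passes straight through to the request unless something clamps
it first.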
This patch makes the method 'targetNumExecutors' return a value no less than
minNumExecutors, so newTotalExecutors will never be negative.
Keeping targetNumExecutors at or above minNumExecutors also makes sense on
its own.
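
The idea can be sketched as a simple lower bound. This is a hypothetical
illustration, not the patch itself: the object name, the 'rawTarget'
parameter, and the sample values are assumptions, and it presumes
minNumExecutors, maxNumExecutors and maxNumExecutorsNeeded are all
non-negative.

```scala
// Hypothetical sketch of clamping the target; not the code in this pull request.
object ClampedTargetSketch {
  // Never report a target below the configured minimum.
  def targetNumExecutors(rawTarget: Int, minNumExecutors: Int): Int =
    math.max(rawTarget, minNumExecutors)

  // Same shape as the two math.min steps in addExecutors.
  def newTotalExecutors(
      currentTarget: Int,
      numExecutorsToAdd: Int,
      maxNumExecutors: Int,
      maxNumExecutorsNeeded: Int): Int = {
    val actualMaxNumExecutors = math.min(maxNumExecutors, maxNumExecutorsNeeded)
    math.min(currentTarget + numExecutorsToAdd, actualMaxNumExecutors)
  }

  def main(args: Array[String]): Unit = {
    // A raw target of -5 with an assumed minNumExecutors of 0.
    val target = targetNumExecutors(rawTarget = -5, minNumExecutors = 0)
    val total = newTotalExecutors(
      currentTarget = target,
      numExecutorsToAdd = 1,
      maxNumExecutors = 10,
      maxNumExecutorsNeeded = 0)
    println(s"target = $target, newTotalExecutors = $total")
  }
}
```

With the clamp in place, both operands of the final math.min are
non-negative under the stated assumptions, so the requested total cannot
drop below zero.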
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ArcherShao/spark SPARK-6891
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5676.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5676
----
commit 1693b54f209a17ebb6bed449f81840737f97366a
Author: ArcherShao <[email protected]>
Date: 2015-04-24T00:59:59Z
[SPARK-6891] Fix the bug that ExecutorAllocationManager will request
negative number executors
----