Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/17480#discussion_r111298162
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -249,7 +249,14 @@ private[spark] class ExecutorAllocationManager(
* yarn-client mode when AM re-registers after a failure.
*/
def reset(): Unit = synchronized {
- initializing = true
+ /**
+ * When some tasks need to be scheduled and initial executor = 0, resetting the initializing
+ * field may cause it to not be set to false in yarn.
+ * SPARK-20079: https://issues.apache.org/jira/browse/SPARK-20079
+ */
+ if (maxNumExecutorsNeeded() == 0) {
+ initializing = true
--- End diff --
>instead it shouldn't have any effect on the current number of needed
executors
I don't think I said that `reset` shouldn't have any effect on the current
number of required executors.
`reset` is called in yarn-client mode when the AM fails. In this situation,
executors are re-spawned up to the initial executor number, so
`numExecutorsTarget` should also be set to that value to match the initial
state.
IIUC, your purpose for `reset` is to change the state (executor number) to be
the same as the final state of the last attempt. For example, if we had 10
executors before the AM went away, my understanding is that you want dynamic
allocation to reset to 10 after the AM restarts. Am I right?
The original implementation of `reset` instead changes the state back to the
first state of the last attempt, which means that if the initial state is 1
executor, then after reset we also go back to 1.
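To make the two readings of `reset` concrete, here is a minimal sketch. It is a simplified, hypothetical model and not the real `ExecutorAllocationManager`: the class `AllocationModel` and the methods `scaleTo`, `resetToInitial`, and `resetKeepingLastTarget` are made up for illustration, and only the executor target is modeled (the `initializing` flag is left out).

```scala
// Hypothetical model for illustration only; not Spark's ExecutorAllocationManager.
final class AllocationModel(initialExecutors: Int) {
  // Mirrors the idea of numExecutorsTarget: how many executors we currently want.
  private var numExecutorsTarget: Int = initialExecutors

  def target: Int = synchronized { numExecutorsTarget }

  // Stand-in for dynamic allocation moving the target up or down (assumed helper).
  def scaleTo(n: Int): Unit = synchronized { numExecutorsTarget = n }

  // Reading 1: go back to the first state of the attempt (the initial executor number).
  def resetToInitial(): Unit = synchronized { numExecutorsTarget = initialExecutors }

  // Reading 2: keep the final state of the last attempt, so the target is left unchanged.
  def resetKeepingLastTarget(): Unit = synchronized { () }
}

object ResetDemo {
  def main(args: Array[String]): Unit = {
    val a = new AllocationModel(initialExecutors = 1)
    a.scaleTo(10)            // 10 executors before the AM is gone
    a.resetToInitial()
    println(a.target)        // prints 1

    val b = new AllocationModel(initialExecutors = 1)
    b.scaleTo(10)
    b.resetKeepingLastTarget()
    println(b.target)        // prints 10
  }
}
```

With an initial executor number of 1 and 10 executors before the AM failure, the first reading returns the target to 1, while the second keeps it at 10.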