Github user skonto commented on the issue:
https://github.com/apache/spark/pull/20640
@squito @IgorBerman a few points:
a) This should be configurable. SPARK-16630 is not closed; it seems Kubernetes also needs to support checking for blacklisted nodes when launching tasks. From what I see [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L172) though, CoarseGrainedSchedulerBackend checks whether an executor is acceptable when it registers with the backend, so an executor launched on a blacklisted node is removed immediately. Of course we want to fail fast, and thus in the Kubernetes case we should also have code to exclude blacklisted nodes when launching pods; see [here](https://github.com/kubernetes/kubernetes/issues/14573) and [here](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/). This is essentially the same concept as in [YARN](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L127-L133).
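For illustration only, a minimal sketch of what excluding blacklisted nodes at pod-launch time could look like, assuming the fabric8 Kubernetes client that the backend already uses. Node anti-affinity with a `NotIn` match on `kubernetes.io/hostname` is one option alongside the taints/tolerations from the links above, and `excludeBlacklistedNodes` is a hypothetical helper, not existing Spark code:

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Hypothetical helper: rebuild an executor pod spec so the Kubernetes
// scheduler will not place it on any currently blacklisted node.
def excludeBlacklistedNodes(pod: Pod, blacklistedNodes: Set[String]): Pod = {
  if (blacklistedNodes.isEmpty) return pod
  new PodBuilder(pod)
    .editOrNewSpec()
      .editOrNewAffinity()
        .editOrNewNodeAffinity()
          .editOrNewRequiredDuringSchedulingIgnoredDuringExecution()
            .addNewNodeSelectorTerm()
              .addNewMatchExpression()
                .withKey("kubernetes.io/hostname") // well-known node label
                .withOperator("NotIn")             // keep the pod off these nodes
                .withValues(blacklistedNodes.toSeq: _*)
              .endMatchExpression()
            .endNodeSelectorTerm()
          .endRequiredDuringSchedulingIgnoredDuringExecution()
        .endNodeAffinity()
      .endAffinity()
    .endSpec()
    .build()
}
```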
b) I see TaskSchedulerImpl as a higher-level abstraction compared to CoarseGrainedSchedulerBackend (which is subclassed by all major backends). The thing is, blacklist info is kept in TaskSchedulerImpl, and all backends update it implicitly through different paths. CoarseGrainedSchedulerBackend, for [example](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L122), updates it when there is a new status update. At the end of that path, [handleFailedTask](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L772) is called, which updates the related info.
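To make that path concrete, here is a self-contained miniature of the flow, not the actual Spark classes: `countTowardsTaskFailures` mirrors the real flag on Spark's `TaskFailedReason`, everything else is simplified for illustration:

```scala
object BlacklistFlowSketch {
  // Mirrors the real flag on Spark's TaskFailedReason; the rest is simplified.
  final case class TaskFailedReason(message: String, countTowardsTaskFailures: Boolean)

  // Stand-in for the per-task-set blacklist state that TaskSetManager feeds.
  final class TaskSetBlacklist {
    private var failures = Map.empty[(String, String), Int] // (host, executor) -> count
    def updateBlacklistForFailedTask(host: String, exec: String): Unit = {
      val key = (host, exec)
      failures = failures.updated(key, failures.getOrElse(key, 0) + 1)
      println(s"failure recorded for $exec on $host: ${failures(key)}")
    }
  }

  private val blacklist = new TaskSetBlacklist

  // Mirrors the shape of TaskSetManager.handleFailedTask: only failures that
  // "count" (e.g. app errors, not preemption) reach the blacklist state.
  def handleFailedTask(host: String, exec: String, reason: TaskFailedReason): Unit = {
    if (reason.countTowardsTaskFailures) {
      blacklist.updateBlacklistForFailedTask(host, exec)
    } // else: preemption or a loss not caused by the app -- ignored here
  }

  def main(args: Array[String]): Unit = {
    handleFailedTask("node-1", "exec-7", TaskFailedReason("OOM", countTowardsTaskFailures = true))
    handleFailedTask("node-1", "exec-7", TaskFailedReason("preempted", countTowardsTaskFailures = false))
  }
}
```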
A more interesting path for this discussion is when an executor disconnects. This is where [Yarn](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L213) and [Kubernetes](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L428) override the driver's endpoint to set the [criteria](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L356-L366) for when an executor is considered lost and whether the resulting task failures should be taken into consideration for blacklisting. The reason you want this is preemption; for Mesos, besides that, there is the scenario where an agent restarts (its [process is restarted](http://mesos.apache.org/documentation/latest/agent-recovery/)) and the executors are killed because framework checkpointing is not enabled (e.g. when upgrading). In the latter case, if I restart my agent process I don't want to blacklist it; my slave is ready to go as soon as it reconnects to the master. I also don't want to wait for the blacklisting to expire.
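For Mesos, a rough sketch of such an override could look like the following, following the driver-endpoint subclassing pattern from the YARN/Kubernetes links above. `ExecutorExited` and `SlaveLost` are the real Spark loss reasons (`exitCausedByApp = false` keeps the loss out of blacklist accounting), but `agentIsRestarting` is a hypothetical helper and the surrounding signatures are simplified:

```scala
import org.apache.spark.rpc.RpcAddress
import org.apache.spark.scheduler.{ExecutorExited, SlaveLost}

// Sketch only -- simplified from CoarseGrainedSchedulerBackend's DriverEndpoint,
// which in real code is an inner class of the backend.
private class MesosDriverEndpoint(rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
    extends DriverEndpoint(rpcEnv, sparkProperties) {

  override def onDisconnected(remoteAddress: RpcAddress): Unit = {
    addressToExecutorId.get(remoteAddress).foreach { executorId =>
      val reason =
        if (agentIsRestarting(executorId)) { // hypothetical check, e.g. via Mesos master state
          // Agent bounce without checkpointing: the executor dies, but it is
          // not the app's fault, so the loss should not feed the blacklist.
          ExecutorExited(0, exitCausedByApp = false,
            "Mesos agent restarting; not counting toward blacklisting")
        } else {
          SlaveLost(s"Executor $executorId disconnected")
        }
      removeExecutor(executorId, reason)
    }
  }
}
```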
To move forward, I suggest:
a) remove the blacklist-handling logic in Mesos; logging is already done when a node is blacklisted [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194).
b) override the logic at the driver's endpoint, as in the sketch above, when there is a [need](http://mesos.apache.org/documentation/latest/oversubscription/) for doing so.