Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640

@squito @IgorBerman a few points:

a) This should be configurable. SPARK-16630 is not closed; it seems Kubernetes also needs to support checking for blacklisted nodes when launching tasks. From what I see [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L172), though, `CoarseGrainedSchedulerBackend` checks whether an executor is acceptable when it registers back with the backend, so if it is launched on a blacklisted node it is removed immediately. Of course we want to fail fast, and thus in the Kubernetes case we should also have code to exclude the blacklisted nodes when launching pods; see [here](https://github.com/kubernetes/kubernetes/issues/14573) and [here](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/). This is essentially the same concept as in [YARN](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L127-L133).

b) I see `TaskSchedulerImpl` as a higher-level abstraction compared to `CoarseGrainedSchedulerBackend` (which is subclassed by all major backends). The thing is, the blacklist info is kept in `TaskSchedulerImpl`, and all backends update it implicitly through different paths. `CoarseGrainedSchedulerBackend` updates it when there is a new status update, for [example](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L122). At the end of this path [handleFailedTask](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L772) is called, which updates the related info.
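To make the "exclude blacklisted nodes at launch time" idea in a) concrete, here is a minimal sketch of filtering resource offers against the currently blacklisted nodes before launching executors. The `Offer` case class and `OfferFilter` are illustrative names, not Spark's actual API; the only real assumption is a `nodeBlacklist()`-style accessor returning the set of blacklisted hostnames, like the one `TaskSchedulerImpl` keeps:

```scala
// Hypothetical sketch: drop offers on blacklisted hosts before launching,
// instead of launching and having the executor rejected at registration.
case class Offer(offerId: String, hostname: String, cpus: Double, memMb: Int)

class OfferFilter(nodeBlacklist: () => Set[String]) {
  /** Keep only offers whose host is not currently blacklisted. */
  def usableOffers(offers: Seq[Offer]): Seq[Offer] = {
    val blacklisted = nodeBlacklist()
    offers.filterNot(o => blacklisted.contains(o.hostname))
  }
}
```

The same shape works whether the resources are Mesos offers or Kubernetes nodes (via taints/tolerations or a node selector); the point is that the check happens before launch, not after registration.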
A more interesting path for this discussion is when an executor disconnects. This is where [YARN](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L213) and [Kubernetes](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L428) override the driver's endpoint to set the [criteria](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L356-L366) for when an executor is considered lost and whether its task failures should count toward blacklisting. The reason you want this is preemption, or, for Mesos, besides that, a scenario where an agent restarts (the [process is restarted](http://mesos.apache.org/documentation/latest/agent-recovery/)) and its executors are killed (framework checkpointing not enabled, e.g. when upgrading). In the latter case, if I restart my agent process I don't want to blacklist it: my slave is ready to go as soon as it reconnects to the master, and I don't want to wait for the blacklisting to expire.

To move forward, I suggest: a) remove the blacklist-handling logic in Mesos; logging is already done when a node is blacklisted [here](https://github.com/apache/spark/blob/f41c0a93fd3913ad93e55ddbfd875229872ecc97/core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala#L194). b) override the logic at the driver's endpoint when there is a [need](http://mesos.apache.org/documentation/latest/oversubscription/) for doing so.
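The "override the criteria at the driver's endpoint" idea above can be sketched as classifying the loss reason and only feeding blacklisting-relevant failures through. All names here are illustrative, not Spark's actual `ExecutorLossReason` hierarchy; the sketch just shows the decision the overridden endpoint would make:

```scala
// Hypothetical sketch: classify why an executor went away, so an agent
// restart or a preemption does not count toward node blacklisting, while
// a genuine task failure does.
sealed trait LossReason { def countsTowardBlacklist: Boolean }
case object TaskFailure  extends LossReason { val countsTowardBlacklist = true  }
case object AgentRestart extends LossReason { val countsTowardBlacklist = false }
case object Preempted    extends LossReason { val countsTowardBlacklist = false }

/** The check an overriding driver endpoint would apply before updating blacklist state. */
def shouldUpdateBlacklist(reason: LossReason): Boolean =
  reason.countsTowardBlacklist
```

This mirrors what the YARN backend does for preemption: the subclass decides how a disconnect is interpreted, and only the interpretations that indicate a genuinely bad node reach the blacklist tracker.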