akirillov commented on issue #20640: [SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/20640#issuecomment-526274978

Thanks, @IgorBerman. Based on the conversation, it looks like the more general idea is to use logic similar to that in `BlacklistTracker`, but for Mesos task failures. A Mesos task launch can fail for several reasons, including `TASK_ERROR` due to lack of permissions (not node-specific) and `TASK_KILLED` due to over-commitment or the upcoming Mesos node-draining feature. So it doesn't seem that `BlacklistTracker` can be used for this purpose, and another implementation is needed.

Speaking more generally, if a node has failed or there is a network failure, the scheduler will not receive offers from that node and won't attempt to launch a task (executor) on it. Also, because the coarse-grained scheduler is the default and the fine-grained scheduler is deprecated, scheduling happens only at application start (except in the dynamic allocation use case). So given the nature and duration of the scheduling step, it's not clear whether blacklisting makes sense for the scheduling of executors themselves.
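
To make the distinction concrete, here is a minimal sketch of what a separate tracker might look like if it counted only failures that say something about the node. It is an illustration, not Spark or Mesos code: `AgentFailureTracker`, `NODE_ATTRIBUTABLE_STATES`, and `max_failures` are hypothetical names, and treating `TASK_FAILED` as node-attributable is an assumption made only for the example. The state names themselves come from the Mesos `TaskState` enum.

```python
# Assumption for illustration only: which Mesos task states are treated as
# evidence against the node. TASK_ERROR (e.g. missing permissions) and
# TASK_KILLED (over-commitment, draining) are deliberately left out because
# they are not node-specific; including TASK_FAILED here is a guess.
NODE_ATTRIBUTABLE_STATES = {"TASK_FAILED"}


class AgentFailureTracker:
    """Hypothetical per-agent failure counter; not a Spark or Mesos API."""

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self._failures = {}  # agent id -> count of node-attributable failures

    def record(self, agent_id, task_state):
        # Ignore states that do not point at the node itself.
        if task_state in NODE_ATTRIBUTABLE_STATES:
            self._failures[agent_id] = self._failures.get(agent_id, 0) + 1

    def is_excluded(self, agent_id):
        # An agent is excluded once it reaches the failure threshold.
        return self._failures.get(agent_id, 0) >= self.max_failures


tracker = AgentFailureTracker(max_failures=2)
tracker.record("agent-1", "TASK_ERROR")   # not node-specific, ignored
tracker.record("agent-1", "TASK_KILLED")  # not node-specific, ignored
tracker.record("agent-2", "TASK_FAILED")
tracker.record("agent-2", "TASK_FAILED")
```

With these inputs only `agent-2` reaches the threshold. Whether such a tracker is worth having at all is still the open question above, since executors are mostly scheduled once at application start.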
