Github user IgorBerman commented on a diff in the pull request:
https://github.com/apache/spark/pull/20640#discussion_r195829448
--- Diff:
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
---
@@ -648,14 +645,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
totalGpusAcquired -= gpus
gpusByTaskId -= taskId
}
- // If it was a failure, mark the slave as failed for blacklisting purposes
if (TaskState.isFailed(state)) {
- slave.taskFailures += 1
-
- if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
- logInfo(s"Blacklisting Mesos slave $slaveId due to too many failures; " +
- "is Spark installed on it?")
- }
+ logError(s"Task $taskId failed on Mesos slave $slaveId.")
--- End diff --
@squito @felixcheung wdyt about adding almost the same lines here as in
https://github.com/apache/spark/pull/21068/files#diff-65ed0dbf413c9f48cfa8f6eed9f3f0d5R73
---