[GitHub] spark pull request #20640: [SPARK-19755][Mesos] Blacklist is always active f...

squito Wed, 20 Jun 2018 19:54:01 -0700

Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20640#discussion_r196997885
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -648,14 +645,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
               totalGpusAcquired -= gpus
               gpusByTaskId -= taskId
             }
    -        // If it was a failure, mark the slave as failed for blacklisting purposes
             if (TaskState.isFailed(state)) {
    -          slave.taskFailures += 1
    -
    -          if (slave.taskFailures >= MAX_SLAVE_FAILURES) {
    -            logInfo(s"Blacklisting Mesos slave $slaveId due to too many failures; " +
    -                "is Spark installed on it?")
    -          }
    +          logError(s"Task $taskId failed on Mesos slave $slaveId.")
    --- End diff --
    
    @IgorBerman I'm not entirely sure what you mean.
    
    Yes, *eventually* I think Mesos should be doing something very similar to what's in that PR. You can't use that immediately, because for now the other PR is tied to YARN internals. But I don't think it would be too hard to refactor what's there just a little bit so most of the logic could be reused.
    
    But I think everybody just wants to get this change in, and do that refactoring in a followup.


