tgravescs commented on a change in pull request #23223: 
[SPARK-26269][YARN]Yarnallocator should have same blacklist behaviour with yarn 
to maxmize use of cluster resource
URL: https://github.com/apache/spark/pull/23223#discussion_r243312174
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -612,13 +612,23 @@ private[yarn] class YarnAllocator(
             val message = "Container killed by YARN for exceeding physical memory limits. " +
               s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
             (true, message)
-          case _ =>
-            // all the failures which not covered above, like:
-            // disk failure, kill by app master or resource manager, ...
-            allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt)
-            (true, "Container marked as failed: " + containerId + onHostStr +
-              ". Exit status: " + completedContainer.getExitStatus +
-              ". Diagnostics: " + completedContainer.getDiagnostics)
+          case other_exit_status =>
+            // SPARK-26269: follow YARN's blacklisting behaviour(see https://github
+            // .com/apache/hadoop/blob/228156cfd1b474988bc4fedfbf7edddc87db41e3/had
+            // oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/ap
+            // ache/hadoop/yarn/util/Apps.java#L273 for details)
+            if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
+              (true, s"Container marked as failed: $containerId$onHostStr" +
 
 Review comment:
   I think we want to return false here for exitCausedByApp, since these don't 
seem to be issues with the app. From the comment in the YARN code:

         // Neither the app's fault nor the system's fault. This happens by design,
         // so no need for skipping nodes
   
   If we mark it as true, it counts against our task failures; if it's false, 
it doesn't. It seems like these shouldn't count against our failures.
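
   A minimal, self-contained sketch of the suggestion, not the actual YarnAllocator code. The `ExitStatus` cases, the contents of `notAppAndSystemFault`, and `BlacklistTrackerStub` are hypothetical stand-ins chosen for illustration. The fallback branch simply mirrors the removed `case _` from the diff, since the PR's real else-branch is not shown here.

```scala
object ExitClassificationSketch {
  // Hypothetical stand-ins; the names loosely mirror YARN's ContainerExitStatus.
  sealed trait ExitStatus
  case object Preempted extends ExitStatus
  case object KilledByResourceManager extends ExitStatus
  case object KilledByAppMaster extends ExitStatus
  case object UnknownFailure extends ExitStatus

  // Assumed contents of the set the diff calls NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.
  val notAppAndSystemFault: Set[ExitStatus] =
    Set(Preempted, KilledByResourceManager, KilledByAppMaster)

  // Stub that only counts calls, standing in for allocatorBlacklistTracker.
  final class BlacklistTrackerStub {
    var failures: Int = 0
    def handleResourceAllocationFailure(hostOpt: Option[String]): Unit =
      failures += 1
  }

  // Returns (exitCausedByApp, message).
  def classify(
      status: ExitStatus,
      hostOpt: Option[String],
      tracker: BlacklistTrackerStub): (Boolean, String) = {
    if (notAppAndSystemFault.contains(status)) {
      // Happens by design: not counted as an app failure, node not blacklisted.
      (false, s"Container exited by design. Exit status: $status")
    } else {
      // Mirrors the removed default case: record the host and count the failure.
      tracker.handleResourceAllocationFailure(hostOpt)
      (true, s"Container marked as failed. Exit status: $status")
    }
  }
}
```

   With this shape, a preempted container would leave both the failure count and the blacklist tracker untouched, while an unrecognized exit status would still be handled as before.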
