Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14079#discussion_r70848996
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -217,18 +219,28 @@ private[yarn] class YarnAllocator(
        * @param localityAwareTasks number of locality aware tasks to be used as container placement hint
        * @param hostToLocalTaskCount a map of preferred hostname to possible task counts to be used as
        *                             container placement hint.
    +   * @param nodeBlacklist a set of blacklisted node to avoid allocating new container on them. It
    +   *                              will be used to update AM blacklist.
        * @return Whether the new requested total is different than the old value.
        */
       def requestTotalExecutorsWithPreferredLocalities(
           requestedTotal: Int,
           localityAwareTasks: Int,
    -      hostToLocalTaskCount: Map[String, Int]): Boolean = synchronized {
    +      hostToLocalTaskCount: Map[String, Int],
    +      nodeBlacklist: Set[String]): Boolean = synchronized {
         this.numLocalityAwareTasks = localityAwareTasks
         this.hostToLocalTaskCounts = hostToLocalTaskCount
     
         if (requestedTotal != targetNumExecutors) {
           logInfo(s"Driver requested a total number of $requestedTotal executor(s).")
           targetNumExecutors = requestedTotal
    +
    +      // Update blacklist infomation to YARN ResouceManager for this application,
    +      // in order to avoid allocating new Container on the problematic nodes.
    +      val blacklistAdditions = nodeBlacklist -- currentNodeBlacklist
    --- End diff --
    
    We probably want to remove blacklisted nodes from our actual asks (the
    locality preferences) as well. The capacity scheduler (I didn't check the
    fair scheduler) is smart enough not to schedule on a blacklisted node, but
    it would make sense to remove the node and potentially add another one that
    would be local, rather than letting YARN fall back to picking a non-local
    one for us. Not super critical, so if we want to move that to another JIRA
    I'm fine with it. I think you had some other things planned around
    integrating it more with the resource managers later.
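
    Roughly what I have in mind, as a sketch only and not code from this PR:
    filter the blacklisted hosts out of the locality preferences before they
    are turned into container requests. The object name, the helper
    `filterBlacklistedHosts`, and the sample host names are made up for
    illustration; where this would actually hook into YarnAllocator is left
    open.

```scala
// Hypothetical sketch, not part of the PR: drop blacklisted hosts from the
// locality preferences so that no container ask names a node YARN will refuse.
object LocalityFilterSketch {

  // `filterBlacklistedHosts` is an illustrative name, not a YarnAllocator method.
  // It keeps only the preferred hosts that are not currently blacklisted.
  def filterBlacklistedHosts(
      hostToLocalTaskCount: Map[String, Int],
      nodeBlacklist: Set[String]): Map[String, Int] = {
    hostToLocalTaskCount.filter { case (host, _) => !nodeBlacklist.contains(host) }
  }

  def main(args: Array[String]): Unit = {
    // Sample data only; the host names are placeholders.
    val preferences = Map("host1" -> 3, "host2" -> 2, "host3" -> 1)
    val blacklist = Set("host2")

    // The remaining preferences no longer mention the blacklisted host.
    println(filterBlacklistedHosts(preferences, blacklist))
  }
}
```

    With the filtered map as input, the placement logic could then pick a
    different local host for the affected tasks instead of relying on the
    scheduler's fallback to a non-local node.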

