GitHub user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18651#discussion_r128831747
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -525,9 +534,11 @@ private[yarn] class YarnAllocator(
                   } catch {
                     case NonFatal(e) =>
                       logError(s"Failed to launch executor $executorId on container $containerId", e)
    -                  // Assigned container should be released immediately to avoid unnecessary resource
    -                  // occupation.
    +                  // Assigned container should be released immediately
    +                  // to avoid unnecessary resource occupation.
                       amClient.releaseAssignedContainer(containerId)
    +              } finally {
    +                numExecutorsStarting.decrementAndGet()
    --- End diff --
    
    Yes, but it's a bug right now because the numbers can be wrong. Are you
    looking at the synchronization?

    Right now everything is called synchronized up to the point where the
    launcher pool runs the ExecutorRunnable. At that point running is not
    incremented, pending is decremented, and we now increment starting. That
    is fine.

    But when the ExecutorRunnable finishes, the only place that is called
    synchronized is updateInternalState. Right now it increments running but
    does not decrement starting. If updateResourceRequests (which is
    synchronized) gets called right after updateInternalState leaves the
    synchronized block, but before the finally block executes and decrements
    starting, the total can be higher than it really is: that executor is
    counted as both running and starting.
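
    To make the interleaving concrete, here is a minimal sketch. It is not
    the real YarnAllocator: CounterSketch and all of its method names are
    hypothetical stand-ins, and the "reader" is called by hand at the racy
    point instead of from another thread, so the result is deterministic.
    Moving the decrement into the same synchronized block is shown only as
    one possible way to close the window.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the allocator's bookkeeping (not Spark code).
class CounterSketch {
  private val starting = new AtomicInteger(0)
  private val running = new AtomicInteger(0)

  // Container handed to the launcher pool: counted as starting, under the lock.
  def markStarting(): Unit = synchronized { starting.incrementAndGet() }

  // Racy variant: running is bumped under the lock (like updateInternalState),
  // while starting is decremented later, outside the lock (like the finally block).
  def markRunningRacy(): Unit = synchronized { running.incrementAndGet() }
  def finishLaunchRacy(): Unit = starting.decrementAndGet()

  // One possible fix: move the executor from starting to running in a single
  // synchronized step, so a synchronized reader never sees it in both counters.
  def markRunningAtomic(): Unit = synchronized {
    running.incrementAndGet()
    starting.decrementAndGet()
  }

  // Synchronized reader, playing the role of updateResourceRequests.
  def total: Int = synchronized { starting.get + running.get }
}

object CounterSketchDemo {
  def main(args: Array[String]): Unit = {
    val racy = new CounterSketch
    racy.markStarting()
    racy.markRunningRacy()
    val between = racy.total // reader runs inside the window: executor counted twice
    racy.finishLaunchRacy()
    println(s"racy: between=$between after=${racy.total}") // racy: between=2 after=1

    val fixed = new CounterSketch
    fixed.markStarting()
    fixed.markRunningAtomic()
    println(s"fixed: total=${fixed.total}") // fixed: total=1
  }
}
```

    With a single executor, the racy ordering reports a total of 2 to a
    reader that runs inside the window, while the single-step variant never
    reports more than 1. A failed launch would still need its own decrement
    of starting, which this sketch does not model.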

