Yuqi Wang created YARN-6959:

             Summary: RM may allocate wrong AM Container for new attempt
                 Key: YARN-6959
                 URL: https://issues.apache.org/jira/browse/YARN-6959
             Project: Hadoop YARN
          Issue Type: Bug
          Components: scheduler
    Affects Versions: 2.7.1
            Reporter: Yuqi Wang
            Assignee: Yuqi Wang
             Fix For: 3.0.0-alpha4, 2.7.1

*Issue Summary:*
A ResourceRequest from the previous attempt may be recorded into the current 
attempt's ResourceRequests. These mis-recorded ResourceRequests may confuse the 
AM Container request and allocation for the current attempt.

*Issue Pipeline:*

// Executing precondition check for the incoming attempt id.
ApplicationMasterService.allocate() ->

scheduler.allocate(attemptId, ask, ...) ->

// The earlier precondition check for the attempt id may be outdated here, 
// i.e. the currentAttempt may not be the attempt that attemptId refers to, 
// for example when the attempt id belongs to the previous attempt.
currentAttempt = scheduler.getApplicationAttempt(attemptId) ->

// A ResourceRequest from the previous attempt may be recorded into the 
// current attempt's ResourceRequests.
currentAttempt.updateResourceRequests(ask) ->

// RM may allocate the wrong AM Container for the current attempt, because its 
// ResourceRequests may come from the previous attempt, which can be any 
// ResourceRequests the previous AM sent, and there is no matching logic between 
// the original AM Container ResourceRequest and the returned 
// amContainerAllocation below.
AMContainerAllocatedTransition.transition(...) ->
amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
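
The pipeline above can be illustrated with a minimal toy model. This is only a sketch: ToyScheduler, ToyAttempt and their methods are hypothetical stand-ins, not the real YARN classes, and the model assumes the scheduler keeps a single "current attempt" per application and ignores the attempt number on lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a scheduler-side application attempt.
class ToyAttempt {
    final int attemptNumber;
    final List<String> resourceRequests = new ArrayList<>();

    ToyAttempt(int attemptNumber) {
        this.attemptNumber = attemptNumber;
    }
}

// Hypothetical stand-in for the scheduler; it tracks only the current attempt per app.
class ToyScheduler {
    private final Map<String, ToyAttempt> currentAttemptByApp = new HashMap<>();

    void startAttempt(String appId, int attemptNumber) {
        // A new attempt replaces the previous one as the current attempt.
        currentAttemptByApp.put(appId, new ToyAttempt(attemptNumber));
    }

    // Assumption of this sketch: the lookup uses only the application id,
    // so the attemptNumber argument has no effect on the result.
    ToyAttempt getApplicationAttempt(String appId, int attemptNumber) {
        return currentAttemptByApp.get(appId);
    }

    void allocate(String appId, int attemptNumber, List<String> ask) {
        ToyAttempt currentAttempt = getApplicationAttempt(appId, attemptNumber);
        // The ask is recorded into whatever attempt is current right now.
        currentAttempt.resourceRequests.addAll(ask);
    }
}

class StaleAskDemo {
    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.startAttempt("app_1", 1);
        // Attempt 1 passes its precondition check, then attempt 2 starts.
        scheduler.startAttempt("app_1", 2);
        // The late ask from attempt 1 now reaches the scheduler.
        scheduler.allocate("app_1", 1, Collections.singletonList("ask-from-attempt-1"));

        ToyAttempt current = scheduler.getApplicationAttempt("app_1", 2);
        System.out.println("attempt " + current.attemptNumber
                + " requests: " + current.resourceRequests);
    }
}
```

In this model the request list of attempt 2 ends up containing the ask sent by attempt 1, which is the mis-recording described above.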

*Patch Correctness:*
After this patch, RM will always record ResourceRequests from different 
attempts into different AppSchedulingInfo objects. 
So even if RM still records ResourceRequests from an old attempt at any time, 
these ResourceRequests will be recorded in the old AppSchedulingInfo object, 
which will not affect the current attempt's resource requests and allocation.
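
The isolation property claimed here can be sketched as follows. This is not the patch itself: PerAttemptScheduler and ToySchedulingInfo are hypothetical names, and the sketch simply assumes that each attempt owns its own request-recording object, keyed by application id plus attempt number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-attempt AppSchedulingInfo object.
class ToySchedulingInfo {
    final List<String> resourceRequests = new ArrayList<>();
}

// Hypothetical scheduler that keeps one scheduling-info object per attempt.
class PerAttemptScheduler {
    private final Map<String, ToySchedulingInfo> infoByAttempt = new HashMap<>();

    private static String key(String appId, int attemptNumber) {
        return appId + "#" + attemptNumber;
    }

    void startAttempt(String appId, int attemptNumber) {
        // The old attempt's object is kept; the new attempt gets a fresh one.
        infoByAttempt.put(key(appId, attemptNumber), new ToySchedulingInfo());
    }

    ToySchedulingInfo getSchedulingInfo(String appId, int attemptNumber) {
        return infoByAttempt.get(key(appId, attemptNumber));
    }

    void allocate(String appId, int attemptNumber, List<String> ask) {
        ToySchedulingInfo info = getSchedulingInfo(appId, attemptNumber);
        if (info != null) {
            // A stale ask lands in the old attempt's object only.
            info.resourceRequests.addAll(ask);
        }
    }
}

class IsolatedAskDemo {
    public static void main(String[] args) {
        PerAttemptScheduler scheduler = new PerAttemptScheduler();
        scheduler.startAttempt("app_1", 1);
        scheduler.startAttempt("app_1", 2);
        // Late ask from attempt 1 arrives after attempt 2 has started.
        scheduler.allocate("app_1", 1, Collections.singletonList("ask-from-attempt-1"));

        System.out.println("attempt 1: " + scheduler.getSchedulingInfo("app_1", 1).resourceRequests);
        System.out.println("attempt 2: " + scheduler.getSchedulingInfo("app_1", 2).resourceRequests);
    }
}
```

Here the late ask is still recorded, but only in the object belonging to attempt 1; the request list of attempt 2 stays empty.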

The getApplicationAttempt function in AbstractYarnScheduler is confusing; it 
would be better to rename it to getCurrentApplicationAttempt. We should also 
reconsider whether there are any other bugs related to getApplicationAttempt.
