Yuqi Wang commented on YARN-6959:

As this issue, other places which call getApplicationAttempt may also want to 
get the attempt specified in the arg instead of current attempt.
And if I just change getApplicationAttempt to getCurrentApplicationAttempt, it 
is more likely to hide the bugs.
I think only for this fix, I will not touch getApplicationAttempt until we have 
a confirmed all places used getApplicationAttempt is safe.

> RM may allocate wrong AM Container for new attempt
> --------------------------------------------------
>                 Key: YARN-6959
>                 URL: https://issues.apache.org/jira/browse/YARN-6959
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, fairscheduler, scheduler
>    Affects Versions: 2.7.1
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>              Labels: patch
>             Fix For: 2.7.1, 3.0.0-alpha4
>         Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to