[ 
https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855418#comment-13855418
 ] 

Jian He commented on YARN-1493:
-------------------------------

The new patch also includes the patch for YARN-1490 because, while working on 
them, the two seemed tightly coupled to me. I will separate them if necessary.
What this patch does is share the containers across AM attempts, basically by 
copying the needed objects from the first attempt to the second attempt when 
the attempt fails over.

Summary of the patch:
- Make schedulers send the App_accepted/App_rejected event to the RMApp instead 
of RMAppAttempt.
- Create two new events AppAddedSchedulerEvent and AppRemovedSchedulerEvent for 
adding and removing apps in the schedulers.
- Change the state transition so that a new attempt is not started until the 
app is accepted by the scheduler.
- Create a new SchedulerApplication and rename the current SchedulerApplication 
to SchedulerApplicationAttempt.
- Create a field in AppSubmissionContext to indicate whether to clean the 
containers on AM failure or not.
- Copy the data structures inside SchedulerApplicationAttempt over when the new 
attempt is recovering the failed attempt's scheduler info.
- Similarly, copy the needed data structures inside RMAppAttempt over when the 
new attempt is recovering the failed RMAppAttempt's info.
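
To illustrate the app/attempt split and the state copy described above, here 
is a rough sketch. It is not the code in the patch: only the names 
SchedulerApplication and SchedulerApplicationAttempt come from the summary. 
SketchScheduler, its methods, the liveContainers list and the 
keepContainersAcrossAttempts flag are hypothetical stand-ins, and the real 
classes carry much more state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; NOT the real YARN classes.

/** Attempt-level scheduler state: containers tracked for one attempt. */
class SchedulerApplicationAttempt {
    final int attemptId;
    // Hypothetical field: container ids are plain strings in this sketch.
    final List<String> liveContainers = new ArrayList<>();

    SchedulerApplicationAttempt(int attemptId) {
        this.attemptId = attemptId;
    }

    /** Copy the container bookkeeping of a failed attempt into this one. */
    void transferStateFrom(SchedulerApplicationAttempt previous) {
        liveContainers.addAll(previous.liveContainers);
    }
}

/** App-level scheduler state: outlives any single attempt. */
class SchedulerApplication {
    final String appId;
    // Hypothetical flag modelling the new submission-context field.
    final boolean keepContainersAcrossAttempts;
    SchedulerApplicationAttempt currentAttempt; // null until an attempt is added

    SchedulerApplication(String appId, boolean keepContainersAcrossAttempts) {
        this.appId = appId;
        this.keepContainersAcrossAttempts = keepContainersAcrossAttempts;
    }
}

/** Hypothetical scheduler that handles app-level and attempt-level calls separately. */
class SketchScheduler {
    private final Map<String, SchedulerApplication> apps = new HashMap<>();

    /** App-level "added" handling: true means accepted, false means rejected. */
    boolean addApplication(String appId, boolean keepContainers) {
        if (apps.containsKey(appId)) {
            return false; // duplicate submission is rejected in this sketch
        }
        apps.put(appId, new SchedulerApplication(appId, keepContainers));
        return true;
    }

    /** Attempt-level handling: only allowed once the app has been accepted. */
    SchedulerApplicationAttempt addApplicationAttempt(String appId, int attemptId) {
        SchedulerApplication app = apps.get(appId);
        if (app == null) {
            throw new IllegalStateException("app not accepted: " + appId);
        }
        SchedulerApplicationAttempt next = new SchedulerApplicationAttempt(attemptId);
        SchedulerApplicationAttempt previous = app.currentAttempt;
        if (previous != null && app.keepContainersAcrossAttempts) {
            next.transferStateFrom(previous); // share containers across attempts
        }
        app.currentAttempt = next;
        return next;
    }

    /** App-level "removed" handling. */
    void removeApplication(String appId) {
        apps.remove(appId);
    }
}
```

In this sketch, a second call to addApplicationAttempt for the same app 
inherits the first attempt's containers only when the app was added with the 
flag set to true. Otherwise the new attempt starts with an empty list.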

> Schedulers don't recognize apps separately from app-attempts
> ------------------------------------------------------------
>
>                 Key: YARN-1493
>                 URL: https://issues.apache.org/jira/browse/YARN-1493
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, 
> YARN-1493.4.patch, YARN-1493.5.patch
>
>
> Today, the scheduler is tied to the attempt only.
> We need to separate app-level handling logic in the scheduler. We can add new 
> app-level events to the scheduler and separate the app-level logic out. This 
> is good for work-preserving AM restart and RM restart, and is also needed for 
> differentiating app-level metrics from attempt-level metrics.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
