[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804905#comment-13804905
 ] 

Jian He commented on YARN-891:
------------------------------

Summary of the patch:

- RMStateStore changes:
   -- Add fields to the ApplicationState and ApplicationAttemptState classes, 
and to the corresponding PBImpls, for storing the final state of the 
application/attempt.
   -- Add a separate API (updateApplicationState, 
updateApplicationStateInternal, etc.) for updating the final state of an 
application/attempt.
   -- Add the corresponding update events for the above update operations.
- RMAppImpl/RMAppAttemptImpl:
  -- Create a new FinalSavingTransition. When an app/attempt is finishing, 
being killed, or failing, it goes through FinalSavingTransition, which 
notifies RMStateStore to update the final state and remembers both the 
transition that should run once the save completes and the event that 
triggered it.
  -- Create a new FINAL_SAVING state, in which the app/attempt waits for the 
final-state update to complete.
  -- Create a new FinalStateSavedTransition, which runs the remembered 
transition with the remembered event.
- RMAppManager:
  -- RMAppManager.recover() now always recovers applications and lets 
RMAppRecoveredTransition decide internally whether to launch the application.
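
To illustrate the flow described above, here is a minimal, self-contained 
sketch. It is not the code from the patch: FinalSavingSketch, InMemoryStore, 
App, the enum values, and the recover() helper are all hypothetical stand-ins, 
and the real RMAppImpl/RMAppAttemptImpl state machines and the asynchronous 
RMStateStore are considerably more involved. The sketch only shows the idea of 
parking in FINAL_SAVING, remembering the intended transition and event, and 
replaying them once the store reports that the final state has been saved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the flow; not the actual RMAppImpl code. */
class FinalSavingSketch {
    enum State { RUNNING, FINAL_SAVING, FINISHED, FAILED, KILLED }
    enum EventType { FINISH, FAIL, KILL, STATE_SAVED }

    /** Stand-in for RMStateStore: records the final states it is asked to persist. */
    static class InMemoryStore {
        final List<State> saved = new ArrayList<>();
        void updateApplicationState(State finalState) { saved.add(finalState); }
    }

    static class App {
        private final InMemoryStore store;
        State state = State.RUNNING;
        private State targetState;       // state to enter once the save is done
        private EventType pendingEvent;  // event that triggered the final transition
        EventType replayedEvent;         // set when the remembered event is replayed

        App(InMemoryStore store) { this.store = store; }

        void handle(EventType event) {
            if (state == State.RUNNING && event != EventType.STATE_SAVED) {
                // "FinalSavingTransition": remember what to do, ask the store to save.
                targetState = finalStateFor(event);
                pendingEvent = event;
                state = State.FINAL_SAVING;
                store.updateApplicationState(targetState);
            } else if (state == State.FINAL_SAVING && event == EventType.STATE_SAVED) {
                // "FinalStateSavedTransition": replay the remembered transition and event.
                replayedEvent = pendingEvent;
                state = targetState;
            }
            // Any other state/event combination is ignored in this sketch.
        }

        /** Recovery: a stored final state is restored as-is; null means "still running". */
        static App recover(InMemoryStore store, State storedFinalState) {
            App app = new App(store);
            if (storedFinalState != null) {
                app.state = storedFinalState;  // completed app: do not relaunch
            }
            return app;
        }

        private static State finalStateFor(EventType event) {
            switch (event) {
                case FINISH: return State.FINISHED;
                case FAIL:   return State.FAILED;
                case KILL:   return State.KILLED;
                default:
                    throw new IllegalArgumentException("not a terminal event: " + event);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        App app = new App(store);
        app.handle(EventType.KILL);         // app parks in FINAL_SAVING
        System.out.println(app.state);
        app.handle(EventType.STATE_SAVED);  // store finished; app enters its final state
        System.out.println(app.state);
    }
}
```

In this sketch the store call is synchronous and the STATE_SAVED event is 
delivered by hand; in the RM the save is asynchronous, which is why a 
dedicated waiting state is needed at all.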

Did a manual single-node test with the HDFS store and the ZK store. After 
restarting the RM once an application has succeeded, failed, or been killed, 
the application still shows up on the UI, and the yarn command can also 
retrieve the application status.

To do:
  - Move the newInstance methods from both data PBImpls to the data objects 
themselves.
  - Change the app kill flow to kill the attempt first and let the attempt 
notify the app that it has been killed, instead of sending the kill event 
directly to the app.
  - Support recovering unmanaged AMs.
  - Add an RMStateStore app cleaner.
  - Reject container allocation requests in the scheduler while in the 
FINAL_SAVING state.


> Store completed application information in RM state store
> ---------------------------------------------------------
>
>                 Key: YARN-891
>                 URL: https://issues.apache.org/jira/browse/YARN-891
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
> YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when the 
> application/attempt completes. This solves problems such as finished 
> applications being lost after an RM restart, and some other races such as 
> YARN-1195.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
