[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804905#comment-13804905 ]
Jian He commented on YARN-891:
------------------------------
Summary of the patch:
- RMStateStore change:
-- Add more fields to the ApplicationState and ApplicationAttemptState classes,
and to the corresponding PBImpls, for storing the final state of the
application/attempt.
-- Add separate APIs (updateApplicationState, updateApplicationStateInternal,
etc.) for updating the final state of an application/attempt.
-- Add the corresponding update events for the above update operations.
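A minimal, hypothetical sketch of the store-side split: the submission-time state and the final state go through separate entry points, and each completes with its own event type. Only updateApplicationState and updateApplicationStateInternal are names taken from the summary above; the class, the in-memory map, the BiConsumer callback, the event enum, and the simplified signatures are assumptions for illustration and do not match the real RMStateStore API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class StateStoreSketch {

    // Simplified final states of an application (NONE = not completed yet).
    enum FinalState { NONE, FINISHED, FAILED, KILLED }

    // Store-side event types: the original store operation plus the new update.
    enum StoreEventType { STORE_APP, UPDATE_APP }

    // Immutable stand-in for ApplicationState. state, diagnostics and
    // finishTime represent the kind of final-state fields the patch adds.
    static final class ApplicationState {
        final String appId;
        final long submitTime;
        final FinalState state;
        final String diagnostics;
        final long finishTime;

        ApplicationState(String appId, long submitTime, FinalState state,
                         String diagnostics, long finishTime) {
            this.appId = appId;
            this.submitTime = submitTime;
            this.state = state;
            this.diagnostics = diagnostics;
            this.finishTime = finishTime;
        }

        // Snapshot written at submission time: no final state yet.
        static ApplicationState submitted(String appId, long submitTime) {
            return new ApplicationState(appId, submitTime, FinalState.NONE, "", 0L);
        }

        // Copy of this snapshot carrying the final state recorded at completion.
        ApplicationState withFinalState(FinalState s, String diag, long finish) {
            return new ApplicationState(appId, submitTime, s, diag, finish);
        }
    }

    private final Map<String, ApplicationState> store = new HashMap<>();

    // Invoked when an operation completes; stands in for the event the store
    // sends back so the app can leave its waiting state.
    private final BiConsumer<StoreEventType, String> onDone;

    StateStoreSketch(BiConsumer<StoreEventType, String> onDone) {
        this.onDone = onDone;
    }

    // Existing path: persist the app when it is submitted.
    public void storeApplication(ApplicationState s) {
        storeApplicationStateInternal(s);
        onDone.accept(StoreEventType.STORE_APP, s.appId);
    }

    // New, separate path: persist the final state when the app completes.
    public void updateApplicationState(ApplicationState s) {
        updateApplicationStateInternal(s);
        onDone.accept(StoreEventType.UPDATE_APP, s.appId);
    }

    // Backend hooks; an HDFS or ZK store would write to its own storage here.
    protected void storeApplicationStateInternal(ApplicationState s) {
        store.put(s.appId, s);
    }

    protected void updateApplicationStateInternal(ApplicationState s) {
        store.put(s.appId, s); // overwrite the submission-time snapshot
    }

    public ApplicationState load(String appId) {
        return store.get(appId);
    }
}
```

Keeping the update as its own API and event lets each backend decide how to overwrite an existing entry, while the caller only waits for the matching completion event.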
- RMAppImpl/RMAppAttemptImpl:
-- Create a new FinalSavingTransition. When an app/attempt is
finishing/killing/failing, it goes through FinalSavingTransition, which
notifies RMStateStore to update the final state and remembers both the
transition that is supposed to run once the saving operation is done and the
event that caused it.
-- Create a new FINAL_SAVING state in which the app/attempt waits for the
operation that updates its final state to complete.
-- Create a new FinalStateSavedTransition, which performs the earlier
remembered transition with the remembered event.
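The FINAL_SAVING flow described above can be sketched as a tiny hand-written state machine. This is an illustration under assumptions, not the RMAppImpl code: the real classes are built with YARN's StateMachineFactory, and the states, event types, and the Consumer callback standing in for RMStateStore here are simplified and hypothetical. What it shows is that the transition and its triggering event are stored when entering FINAL_SAVING and replayed only after the store reports that the save has finished.

```java
import java.util.function.Consumer;

public class FinalSavingSketch {

    enum State { RUNNING, FINAL_SAVING, FINISHED, FAILED, KILLED }

    enum EventType { FINISH, FAIL, KILL, FINAL_STATE_SAVED }

    static final class Event {
        final EventType type;
        final String diagnostics;

        Event(EventType type, String diagnostics) {
            this.type = type;
            this.diagnostics = diagnostics;
        }
    }

    private State state = State.RUNNING;
    private String diagnostics = "";

    // Remembered while in FINAL_SAVING: the transition that should run once
    // the store confirms the write, and the event that triggered it.
    private Consumer<Event> transitionTodo;
    private Event eventCausingFinalSaving;

    // Hypothetical stand-in for asking RMStateStore to update the final state.
    private final Consumer<State> updateStore;

    FinalSavingSketch(Consumer<State> updateStore) {
        this.updateStore = updateStore;
    }

    State getState() { return state; }

    String getDiagnostics() { return diagnostics; }

    void handle(Event event) {
        if (state == State.RUNNING) {
            switch (event.type) {
                case FINISH: finalSaving(event, State.FINISHED); break;
                case FAIL:   finalSaving(event, State.FAILED);   break;
                case KILL:   finalSaving(event, State.KILLED);   break;
                default:
                    throw new IllegalStateException(event.type + " in " + state);
            }
        } else if (state == State.FINAL_SAVING
                && event.type == EventType.FINAL_STATE_SAVED) {
            finalStateSaved();
        }
        // Any other event (e.g. a second KILL while saving) is ignored here.
    }

    // FinalSavingTransition: remember what to do, then ask the store to save.
    private void finalSaving(Event event, State target) {
        transitionTodo = e -> {
            diagnostics = e.diagnostics;
            state = target;
        };
        eventCausingFinalSaving = event;
        state = State.FINAL_SAVING;
        updateStore.accept(target);
    }

    // FinalStateSavedTransition: replay the remembered transition and event.
    private void finalStateSaved() {
        Consumer<Event> todo = transitionTodo;
        Event cause = eventCausingFinalSaving;
        transitionTodo = null;
        eventCausingFinalSaving = null;
        todo.accept(cause);
    }
}
```

Holding the pending transition as a value (here a Consumer<Event>) lets a single FINAL_SAVING state serve finishing, failing, and killing alike.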
- RMAppManager
-- RMAppManager.recover() is changed to always recover applications and to let
RMAppRecoveredTransition decide internally whether to launch the application.
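A rough sketch of this recovery split, assuming the stored record tells whether an app had already completed: the manager-level loop recovers every stored application unconditionally, and the per-app step decides whether a new attempt is needed. StoredApp, App, recoverAll, and the attemptScheduled flag are hypothetical stand-ins for RMAppManager.recover() and RMAppRecoveredTransition, not their real signatures.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecoverySketch {

    enum State { NEW, ACCEPTED, FINISHED, FAILED, KILLED }

    // What the store holds for one app; finalState is null if the app had
    // not completed before the RM went down.
    static final class StoredApp {
        final String appId;
        final State finalState;

        StoredApp(String appId, State finalState) {
            this.appId = appId;
            this.finalState = finalState;
        }
    }

    static final class App {
        final String appId;
        State state = State.NEW;
        boolean attemptScheduled = false;

        App(String appId) { this.appId = appId; }

        // Plays the role of RMAppRecoveredTransition: the decision is per app.
        void recover(StoredApp stored) {
            if (stored.finalState != null) {
                // Completed before restart: restore the final state only, so
                // the app is still visible to the UI and CLI.
                state = stored.finalState;
            } else {
                // Still running before restart: a new attempt is needed.
                state = State.ACCEPTED;
                attemptScheduled = true;
            }
        }
    }

    // Plays the role of RMAppManager.recover(): recovers every stored app.
    static Map<String, App> recoverAll(List<StoredApp> stored) {
        Map<String, App> apps = new LinkedHashMap<>();
        for (StoredApp s : stored) {
            App app = new App(s.appId);
            app.recover(s);
            apps.put(s.appId, app);
        }
        return apps;
    }
}
```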
Did a manual single-node test with both the HDFS store and the ZK store. When
the RM is restarted after an application has succeeded, failed, or been killed,
the application still shows up on the UI and the yarn command is able to
retrieve its status.
To do:
- We should move the newInstance methods from the PB impls of both data
classes to the data objects themselves.
- Change the app kill flow to kill the attempt first and let the attempt notify
the app that it has been killed, instead of sending the kill event directly to
the app.
- Support recovering unmanaged AM.
- RMStateStore app cleaner.
- Reject container allocation requests in the scheduler while in the
FINAL_SAVING state.
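The proposed kill-flow change (a to-do item, not part of this patch) can be sketched as follows. All classes and log strings are hypothetical; each "save final state" log entry stands in for that object passing through its own FINAL_SAVING step. The sketch only illustrates the intended ordering: the attempt is killed and its final state saved first, and the app records KILLED only when the attempt calls back.

```java
import java.util.List;

public class KillFlowSketch {

    enum State { RUNNING, KILLED }

    static final class App {
        State state = State.RUNNING;
        final Attempt attempt;
        private final List<String> log;

        App(List<String> log) {
            this.log = log;
            this.attempt = new Attempt(this, log);
        }

        // Proposed flow: the app does not kill itself directly, it forwards
        // the kill to its current attempt.
        void kill() {
            log.add("app: forward KILL to attempt");
            attempt.kill();
        }

        // Called back by the attempt once its own final state is saved.
        void onAttemptKilled() {
            log.add("app: save final state KILLED");
            state = State.KILLED;
        }
    }

    static final class Attempt {
        State state = State.RUNNING;
        private final App app;
        private final List<String> log;

        Attempt(App app, List<String> log) {
            this.app = app;
            this.log = log;
        }

        void kill() {
            log.add("attempt: save final state KILLED");
            state = State.KILLED;
            app.onAttemptKilled(); // notify the app back
        }
    }
}
```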
> Store completed application information in RM state store
> ---------------------------------------------------------
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Bikas Saha
> Assignee: Jian He
> Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch,
> YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch,
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch,
> YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when the
> application/attempt completes. This solves problems such as finished
> applications getting lost after an RM restart, and some other races like
> YARN-1195.