[
https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767371#comment-13767371
]
Bikas Saha commented on YARN-540:
---------------------------------
{code}
Note: This flag is only needed for RM recovery purpose. If RM recovery is
+ * enabled, user is expected to retry until this flag becomes true.
+ * Otherwise,user will risk restarting an already finished application after RM
+ * restarts.
{code}
How about the following?
The flag indicates whether the application has successfully unregistered and is
safe to stop. The application may stop after the flag is true. If the
application stops before the flag is true, then the RM may retry the
application.
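To illustrate the contract described above, here is a minimal sketch of a caller that keeps retrying the unregister call until the flag is true. The UnregisterCall interface, the unregisterUntilSafe helper and the retry parameters are hypothetical stand-ins for illustration, not the actual YARN client API.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class UnregisterRetrySketch {

    /** Hypothetical stand-in for the RM-facing call; not the real YARN API. */
    interface UnregisterCall {
        /** Returns true once the RM reports the application as unregistered. */
        boolean finishApplicationMaster();
    }

    /**
     * Retries the unregister call until the flag is true or attempts run out.
     * Returns true only when it is safe for the application to stop.
     */
    static boolean unregisterUntilSafe(UnregisterCall rm, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rm.finishApplicationMaster()) {
                return true; // flag is true: safe to stop
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and report "not yet safe".
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false; // stopping now risks the RM retrying the application
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Fake RM that only reports success on the third call.
        UnregisterCall fakeRm = () -> calls.incrementAndGet() >= 3;
        boolean safe = unregisterUntilSafe(fakeRm, 10, 1L);
        System.out.println("safe to stop: " + safe + ", calls: " + calls.get());
    }
}
```

If the helper returns false, the application has not been confirmed as unregistered, so stopping at that point carries the risk described above.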
{code}
/**
+ * Get the flag which indicates that the application has successfully
+ * unregistered with RM and the application state has been removed from
+ * RMStateStore.
+ */
{code}
Let's not mention internal names like RMStateStore in the javadoc. We can simply
say "unregistered with the RM and the application can safely stop".
Can we create an RMApp method createYarnApplicationState() (and remove the
ServerUtils method) instead of exposing internal stuff via
getPreviousStateAtRemoving()?
{code}
+ public static YarnApplicationState createApplicationState(RMApp rmApp) {
+ RMAppState rmAppState = rmApp.getState();
+ // If App is in REMOVING state, return its previous state.
+ if (rmAppState.equals(RMAppState.REMOVING)) {
+ rmAppState = rmApp.getPreviousStateAtRemoving();
{code}
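A rough sketch of what the RMApp-owned method could look like, so that the pre-REMOVING state never leaves the class. The enums here are cut-down stand-ins for the real RMAppState and YarnApplicationState, and the FINISHING to RUNNING mapping is an assumption made for the example.

```java
final class AppStateMappingSketch {

    /** Simplified stand-in for the internal RM state enum. */
    enum RMAppState { NEW, RUNNING, FINISHING, REMOVING, FINISHED, FAILED, KILLED }

    /** Simplified stand-in for the user-facing state enum. */
    enum YarnApplicationState { NEW, RUNNING, FINISHED, FAILED, KILLED }

    /** Hypothetical app class that keeps the pre-REMOVING state private. */
    static final class RMApp {
        private RMAppState state;
        private RMAppState previousStateAtRemoving;

        RMApp(RMAppState initial) {
            this.state = initial;
        }

        /** Enters REMOVING and remembers the state it was entered from. */
        void startRemoving() {
            if (state != RMAppState.REMOVING) {
                previousStateAtRemoving = state;
                state = RMAppState.REMOVING;
            }
        }

        /** Maps the internal state to the user-facing one in a single place. */
        YarnApplicationState createYarnApplicationState() {
            // While REMOVING, report the state the app was in before removal began.
            RMAppState effective =
                (state == RMAppState.REMOVING) ? previousStateAtRemoving : state;
            switch (effective) {
                case NEW:
                    return YarnApplicationState.NEW;
                case RUNNING:
                case FINISHING: // assumption: still reported as RUNNING to users
                    return YarnApplicationState.RUNNING;
                case FINISHED:
                    return YarnApplicationState.FINISHED;
                case FAILED:
                    return YarnApplicationState.FAILED;
                case KILLED:
                    return YarnApplicationState.KILLED;
                default:
                    throw new IllegalStateException("Unmapped state: " + effective);
            }
        }
    }

    public static void main(String[] args) {
        RMApp app = new RMApp(RMAppState.FINISHING);
        app.startRemoving();
        System.out.println(app.createYarnApplicationState());
    }
}
```

With this shape, callers only ever see YarnApplicationState, and no getter for the previous state is needed.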
Can we make this a common method instead of duplicating the code?
{code}
+ if (!app.isAppRemovalRequestSent) {
+ // application completely done and remove from state store.
+ app.rmContext.getStateStore().removeApplication(app);
+ app.isAppRemovalRequestSent = true;
+ }
{code}
This should be in FinalTransition.transition() because it's common to all kinds
of terminal transitions. All terminal transitions, including the AttemptFinished
transition, call FinalTransition.transition(). Sorry for not noticing this
earlier.
{code}
app.finishTime = System.currentTimeMillis();
}
+ if (!app.isAppRemovalRequestSent) {
+ // application completely done and remove from state store.
+ app.rmContext.getStateStore().removeApplication(app);
+ app.isAppRemovalRequestSent = true;
+ }
+
{code}
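For reference, a minimal sketch of keeping the removal guard in one shared final transition that every terminal path calls. The App and StateStore types and the method names are hypothetical simplifications, not the actual RMAppImpl code.

```java
import java.util.ArrayList;
import java.util.List;

final class FinalTransitionSketch {

    /** Hypothetical stand-in for the state store. */
    interface StateStore {
        void removeApplication(String appId);
    }

    /** Hypothetical, simplified application record. */
    static final class App {
        final String id;
        final StateStore store;
        boolean isAppRemovalRequestSent;
        long finishTime;

        App(String id, StateStore store) {
            this.id = id;
            this.store = store;
        }
    }

    /** Shared by every terminal transition, so the guard lives in one place. */
    static void finalTransition(App app) {
        if (app.finishTime == 0) {
            app.finishTime = System.currentTimeMillis();
        }
        if (!app.isAppRemovalRequestSent) {
            // Application is completely done: ask the store to remove it, once.
            app.store.removeApplication(app.id);
            app.isAppRemovalRequestSent = true;
        }
    }

    /** Terminal paths delegate instead of repeating the guard. */
    static void onAttemptFinished(App app) {
        finalTransition(app);
    }

    static void onKilled(App app) {
        finalTransition(app);
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        App app = new App("app_1", removed::add);
        onAttemptFinished(app);
        onKilled(app); // second terminal path must not send another request
        System.out.println("removal requests: " + removed.size());
    }
}
```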
Isn't testAppRemovingFinishing() already covered by testCreateAppFinishing()?
> Race condition causing RM to potentially relaunch already unregistered AMs on
> RM restart
> ----------------------------------------------------------------------------------------
>
> Key: YARN-540
> URL: https://issues.apache.org/jira/browse/YARN-540
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-540.10.patch, YARN-540.10.patch, YARN-540.1.patch,
> YARN-540.2.patch, YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch,
> YARN-540.6.patch, YARN-540.7.patch, YARN-540.7.patch, YARN-540.8.patch,
> YARN-540.9.patch, YARN-540.9.patch, YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, the RM may
> shut down and restart, with the dispatcher stopped before it can process the
> REMOVE_APP event. The next time the RM comes back, it will reload the existing
> state files even though the job has succeeded.