[ 
https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765187#comment-13765187
 ] 

Bikas Saha commented on YARN-540:
---------------------------------


I think there is a member variable in AMRMClient that is used to get the ping 
interval from config. We could use that instead of the hardcoded 100. Sorry for 
not mentioning it earlier.
{code}
+        LOG.info("Waiting for application to be successfully unregistered.");
+        Thread.sleep(100);
{code}
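
A minimal sketch of the wait loop with the interval passed in rather than hardcoded. UnregisterWaiter, waitForUnregister, and all of its parameters are hypothetical names for illustration, not the AMRMClient API; in the patch the interval would come from whichever member variable already holds the configured ping interval.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of AMRMClient. */
final class UnregisterWaiter {
    private UnregisterWaiter() {}

    /**
     * Checks isUnregistered up to maxAttempts times, sleeping pollIntervalMs
     * between checks. Returns true as soon as a check succeeds, false if all
     * checks fail or the thread is interrupted while sleeping.
     */
    static boolean waitForUnregister(BooleanSupplier isUnregistered,
                                     long pollIntervalMs,
                                     int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isUnregistered.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts - 1) {
                try {
                    // Interval supplied by the caller (e.g. read from config),
                    // instead of a literal 100.
                    Thread.sleep(pollIntervalMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status for the caller.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

The attempt bound is an addition of the sketch; the loop in the patch may be intended to wait indefinitely.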

Can we rename isAppRemovedFromStateStore() to isAppSafeToUnregister()? Then we 
can move the check for unmanagedAM into that method. This way we won't leak 
unmanagedAM outside RMAppImpl.
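
Roughly what I have in mind, as a sketch only. AppSketch is a hypothetical stand-in for RMAppImpl, its fields are assumptions, and it assumes an unmanaged AM does not need to wait for the state store removal; adjust if that assumption is wrong.

```java
/** Hypothetical stand-in for RMAppImpl; field names are assumptions. */
final class AppSketch {
    private final boolean unmanagedAM;
    private volatile boolean removedFromStateStore;

    AppSketch(boolean unmanagedAM) {
        this.unmanagedAM = unmanagedAM;
    }

    /** Called once the app's state has been removed from the state store. */
    void markRemovedFromStateStore() {
        removedFromStateStore = true;
    }

    /**
     * Single question callers need answered. The unmanagedAM check lives
     * here, so callers never look at unmanagedAM themselves.
     * Assumption: unmanaged AMs are always safe to unregister.
     */
    boolean isAppSafeToUnregister() {
        return unmanagedAM || removedFromStateStore;
    }
}
```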

This transition is invalid and should not be ignored. It's a bug if it happens.
{code}
+    // ignorable transitions
+    .addTransition(RMAppState.REMOVING, RMAppState.REMOVING,
+        RMAppEventType.ATTEMPT_UNREGISTERED)
{code}
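
To illustrate the point: if the transition is simply not registered, the state machine surfaces it as an error instead of silently swallowing it. The sketch below is a toy table-driven state machine, not the YARN StateMachineFactory; TinyStateMachine and the APP_REMOVED event are illustrative names, and IllegalStateException stands in for whatever the real state machine throws.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy state machine for illustration; not the YARN implementation. */
final class TinyStateMachine {
    enum State { RUNNING, REMOVING, FINISHING }
    enum Event { ATTEMPT_UNREGISTERED, APP_REMOVED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current;

    TinyStateMachine(State initial) {
        this.current = initial;
    }

    /** Registers a legal transition; anything not registered is invalid. */
    TinyStateMachine addTransition(State from, State to, Event on) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
        return this;
    }

    /** Applies the event, or throws if no transition was registered for it. */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }
}
```

With only RUNNING to REMOVING registered for ATTEMPT_UNREGISTERED, a second ATTEMPT_UNREGISTERED while in REMOVING throws, which is how the bug would be noticed.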

Shouldn't the app.isAppRemovalRequestSent flag be checked here, since this will 
typically happen after unregister has already removed the app? How is this 
working on a single-node cluster? Is delete not throwing an exception for a 
non-existent location?
{code}
+      // application completely done and remove from state store.
+      // App state may be already removed during RMAppFinishingOrRemovingTransition.
+      RMStateStore store = app.rmContext.getStateStore();
+      store.removeApplication(app);
{code}
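
A sketch of the kind of guard I mean, so that removal is requested at most once per app. RemovalGuardSketch and its in-memory sets are hypothetical stand-ins; the real code would consult the existing flag on the app and call the real state store.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of a remove-once guard; not RMStateStore. */
final class RemovalGuardSketch {
    // Stand-in for the persisted application state.
    private final Set<String> stateStore = new HashSet<>();
    // Stand-in for a per-app "removal request sent" flag.
    private final Set<String> removalRequested = new HashSet<>();

    void store(String appId) {
        stateStore.add(appId);
    }

    boolean isStored(String appId) {
        return stateStore.contains(appId);
    }

    /**
     * Removes the app from the store only if removal was not already
     * requested. Returns true if this call issued the removal.
     */
    boolean removeOnce(String appId) {
        if (!removalRequested.add(appId)) {
            return false; // removal already requested earlier; skip the delete
        }
        stateStore.remove(appId);
        return true;
    }
}
```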

What is the YarnApplicationState enum value corresponding to RMAppState.REMOVING?

Is MockRMApp never expected to get removed from the store? I would have 
expected this to return true.
{code}
+  @Override
+  public boolean isAppRemovedFromStateStore() {
+    return false;
{code}

Can RMAppEventType.ATTEMPT_FAILED be received when in REMOVING state (and also 
when in FINISHING state)?


                
> Race condition causing RM to potentially relaunch already unregistered AMs on 
> RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.1.patch, YARN-540.2.patch, YARN-540.3.patch, 
> YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.patch, 
> YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, and the RM 
> then shuts down and restarts, the dispatcher is stopped before it can process 
> the REMOVE_APP event. The next time the RM comes back, it reloads the existing 
> state files even though the job has succeeded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
