[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037791#comment-14037791
 ] 

Jian He commented on YARN-1365:
-------------------------------

Thanks for updating the patch

1.how about UnregisteredApplicationMasterException -> 
ApplicationMasterNotRegisteredException ?
please also add comments that this exception can happen even if application has 
registered before because RM may have restarted and the expectation to handle 
this exception is to re-register.

2.This newly added constructor is not used anywhere?  we can just use 
“app.handler.handle” to send the scheduler event in RMAppRecoverdTransition 
instead of refactoring the transition. 
{code}
    public void transition(RMAppImpl app, RMAppEvent event,
                           boolean shouldSchedulerNotifyAppAdded) {
      transitionImplementation(app, event, shouldSchedulerNotifyAppAdded);
    }
{code}

3. the following code format in FifoScheduler can be consolidated to 2 lines. 
{code}
  public synchronized void addApplication(ApplicationId applicationId,
                                          String queue, String user, boolean
      shouldNotifyAppAccepted) {
{code}

4. some minor comments on testRMRestartWorkPreservingAppReregister:
this conf.set is not needed, it’s already enabled globally.
{code}
conf.setBoolean(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED,
        true);
{code}
We can use MockRM.launchAndRegisterAM instead of changing 
TestRMRestart.launchAM to be static
{code}
MockAM am0 = TestRMRestart.launchAM(app0, rm1, nm1);
{code}
If using the global variable rm1,rm2, the following two statements are not 
needed.
{code}
    rm1.stop();
    rm2.stop();
{code}

> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1365
>                 URL: https://issues.apache.org/jira/browse/YARN-1365
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.003.patch, YARN-1365.004.patch, YARN-1365.005.patch, 
> YARN-1365.005.patch, YARN-1365.006.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed and 
> the RMApp state machine should transition to completed like normal. 
> Unregistration should succeed for an app that the RM considers complete since 
> the RM may have died after saving completion in the store but before 
> notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to