[jira] [Commented] (YARN-128) Resurrect RM Restart

Arinto Murdopo (JIRA) Thu, 15 Nov 2012 00:36:23 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497843#comment-13497843
 ]


Arinto Murdopo commented on YARN-128:
-------------------------------------

Based on YARN-128.full-code-4.patch, I have the following observations:

1) In TestRMRestart.java, line 78, app1 and appState refer to the same instance
because the states are stored in memory (MemoryRMStateStore).
Therefore, the assertion will always pass.

2) ApplicationState is stored when we invoke MockRM's submitApp method. More
precisely, this happens in the ClientRMService class, line 266. The stored
state contains the resource request from the client. In this case, the value
of the resource request is 200. However, after some time the value is updated
to 1024 (the normalized value assigned by the Scheduler).

3) Our school project is currently trying to persist the state in persistent
storage, and the assert statement in our modified test class fails because our
storage holds the resource value from before the scheduler updated it.
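
Observation 1 can be illustrated with a minimal sketch. The classes below
(AppState, InMemoryStore, AliasingDemo) are hypothetical stand-ins, not the
classes from the patch; the only assumption is that the in-memory store keeps
the reference it is given rather than a copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the stored application state.
final class AppState {
    int memoryMb;

    AppState(int memoryMb) {
        this.memoryMb = memoryMb;
    }
}

// Hypothetical in-memory store: it keeps the reference itself, not a copy.
final class InMemoryStore {
    private final Map<String, AppState> apps = new HashMap<>();

    void store(String appId, AppState state) {
        apps.put(appId, state);
    }

    AppState load(String appId) {
        return apps.get(appId);
    }
}

final class AliasingDemo {
    // Returns true when the loaded state is the very same object as the live one.
    static boolean loadedIsSameInstance() {
        InMemoryStore store = new InMemoryStore();
        AppState live = new AppState(200);
        store.store("app1", live);

        // A later update to the live object (for example by a scheduler)
        // is visible through the store because both names alias one object.
        live.memoryMb = 1024;

        AppState loaded = store.load("app1");
        return loaded == live && loaded.memoryMb == live.memoryMb;
    }

    public static void main(String[] args) {
        System.out.println(loadedIsSameInstance());
    }
}
```

Because the comparison is between an object and itself, it cannot fail, which
is why such an assertion says nothing about whether recovery restored the
right values.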

Based on the above observations, should we update the persisted memory value
with the new value assigned by the scheduler?
Since both the ApplicationMaster and the NodeManager are going to be restarted
when the ResourceManager fails, I think the answer is no: we can use the
original value requested by the user. But I'm not really sure about my own
reasoning, so please comment on it. :) If the answer is yes, then we should
wait until the Scheduler updates the resource value before persisting it to
storage.
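
The mismatch behind this question can be sketched as follows. This is only an
illustration: the class and method names are hypothetical, and the
normalization rule (round the request up to a multiple of a minimum
allocation, assumed here to be 1024) is an assumption chosen to reproduce the
200 -> 1024 change described above, not code taken from the patch.

```java
final class NormalizeDemo {
    // Assumed rule: round the request up to the next multiple of the minimum
    // allocation, and never return less than one minimum allocation.
    static int normalize(int requestedMb, int minimumMb) {
        int multiples = (requestedMb + minimumMb - 1) / minimumMb;
        return Math.max(1, multiples) * minimumMb;
    }

    // Returns {persisted, live}: the value copied to storage at submission
    // time and the value the live object holds after normalization.
    static int[] persistedVersusLive() {
        int requested = 200;
        int persisted = requested;             // snapshot taken at submission
        int live = normalize(requested, 1024); // updated later by the scheduler
        return new int[] {persisted, live};
    }

    public static void main(String[] args) {
        int[] values = persistedVersusLive();
        System.out.println("persisted=" + values[0] + " live=" + values[1]);
    }
}
```

A store that copies the state at submission time keeps 200 while the live
object moves to 1024, so a test comparing the two fails unless either the
persisted value is refreshed or the comparison uses the originally requested
value.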
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt, 
> RMRestartPhase1.pdf, YARN-128.full-code.3.patch, YARN-128.full-code-4.patch, 
> YARN-128.new-code-added.3.patch, YARN-128.new-code-added-4.patch, 
> YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch, 
> YARN-128.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM 
> refactor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
