[
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497843#comment-13497843
]
Arinto Murdopo commented on YARN-128:
-------------------------------------
Based on YARN-128.full-code-4.patch, I have the following observations:
1) In TestRMRestart.java, line 78, app1 and appState refer to the same instance
because the states are stored in memory (MemoryRMStateStore).
Therefore, the assertion will always pass.
2) ApplicationState is stored when we invoke MockRM's submitApp method; more
precisely, in the ClientRMService class, line 266. The state that we store
contains the resource request from the client. In this case, the requested
resource value is 200. However, if we wait for some time, the value is updated
to 1024 (the normalized value given by the Scheduler).
3) Currently, our school project is trying to persist the state in persistent
storage, and the assert statement in our modified test class fails because our
storage holds the resource value from before the scheduler updated it.
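
To make the difference between the two kinds of store concrete, here is a
minimal sketch. AppState, InMemoryStore and SnapshotStore are hypothetical
stand-ins written for this illustration, not the classes in the patch. The only
assumption is that the in-memory store keeps the reference it is given, while a
persistent store keeps the values as they were at store time.

```java
import java.util.HashMap;
import java.util.Map;

public class StateStoreAliasingSketch {

    /** Hypothetical stand-in for the stored application state (not the YARN class). */
    static class AppState {
        int memoryMb;
        AppState(int memoryMb) { this.memoryMb = memoryMb; }
    }

    /** In-memory store: keeps the very reference that was handed to it. */
    static class InMemoryStore {
        private final Map<String, AppState> states = new HashMap<>();
        void store(String appId, AppState state) { states.put(appId, state); }
        AppState load(String appId) { return states.get(appId); }
    }

    /** Persistent-style store: records the value as it was when store() ran. */
    static class SnapshotStore {
        private final Map<String, Integer> persisted = new HashMap<>();
        void store(String appId, AppState state) { persisted.put(appId, state.memoryMb); }
        AppState load(String appId) { return new AppState(persisted.get(appId)); }
    }

    public static void main(String[] args) {
        AppState app1 = new AppState(200);           // value requested by the client
        InMemoryStore memory = new InMemoryStore();
        SnapshotStore snapshot = new SnapshotStore();
        memory.store("app1", app1);
        snapshot.store("app1", app1);

        app1.memoryMb = 1024;                        // scheduler normalizes later

        System.out.println(memory.load("app1") == app1);      // true: same instance
        System.out.println(memory.load("app1").memoryMb);     // 1024
        System.out.println(snapshot.load("app1").memoryMb);   // 200
    }
}
```

With the in-memory store, comparing the loaded state against app1 compares an
object with itself, so the check cannot fail. With the snapshot store, the
loaded state still carries the value from submission time.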
Based on the above observations, should we update the persisted memory value
with the new value assigned by the scheduler?
Since we are going to restart both the ApplicationMaster and the NodeManager
when the ResourceManager fails, I think the answer is no: we can use the
original value requested by the user. But I'm not really sure about my own
reasoning, so please comment on it. :) If the answer is yes, then we should
wait until the Scheduler updates the resource value before persisting it to
storage.
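
A small sketch of why persisting the original value may be enough. It assumes
(this is an assumption, not taken from the patch) that the scheduler normalizes
a request by rounding it up to a multiple of a configured minimum allocation,
that the minimum is 1024 here, and that the configuration is the same before
and after the restart. The normalize method below is a hypothetical helper, not
the scheduler's code.

```java
public class NormalizeSketch {

    /**
     * Hypothetical normalization: round the request up to the nearest
     * multiple of minimumMb, and never return less than minimumMb.
     */
    static int normalize(int requestedMb, int minimumMb) {
        if (minimumMb <= 0) {
            throw new IllegalArgumentException("minimumMb must be positive");
        }
        int multiples = (requestedMb + minimumMb - 1) / minimumMb; // ceiling division
        return Math.max(1, multiples) * minimumMb;
    }

    public static void main(String[] args) {
        int minimumMb = 1024;     // assumed minimum allocation
        int persistedMb = 200;    // original value requested by the user

        int beforeRestart = normalize(persistedMb, minimumMb);
        int afterRestart = normalize(persistedMb, minimumMb); // same input, same config

        System.out.println(beforeRestart);                    // 1024
        System.out.println(beforeRestart == afterRestart);    // true
    }
}
```

Under these assumptions, normalizing the stored original request again after a
restart gives the same 1024, so the stored 200 loses no information. If the
minimum allocation changed between runs, the two results could differ.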
> Resurrect RM Restart
> ---------------------
>
> Key: YARN-128
> URL: https://issues.apache.org/jira/browse/YARN-128
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Bikas Saha
> Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt,
> RMRestartPhase1.pdf, YARN-128.full-code.3.patch, YARN-128.full-code-4.patch,
> YARN-128.new-code-added.3.patch, YARN-128.new-code-added-4.patch,
> YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch,
> YARN-128.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM
> refactor.