[ 
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494202#comment-13494202
 ] 

Bikas Saha commented on YARN-128:
---------------------------------

Attaching a proposal doc and code for the first iteration. The proposal is 
along the same lines as the earlier initial design sketch but limits the first 
iteration of the work to restarting the applications after the RM comes back 
up. The reasoning and ideas are detailed in the doc.

Attaching some code that implements the proposal. It includes a functional test 
that verifies the end-to-end scenario using an in-memory store. If everything 
looks good overall then I will tie up the loose ends and add more tests.
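
A rough illustration of the end-to-end scenario described above, not the attached patch: the RM records each submitted application in a store, and a freshly started RM restarts whatever the store still holds. Every class and method name below (RmRestartSketch, InMemoryStateStore, ToyResourceManager, recover) is hypothetical and chosen only for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the YARN-128 patch.
public class RmRestartSketch {

    /** Stand-in for a state store; this one keeps everything in memory. */
    static class InMemoryStateStore {
        private final Map<String, String> submissions = new LinkedHashMap<>();

        void storeApplication(String appId, String submissionContext) {
            submissions.put(appId, submissionContext);
        }

        void removeApplication(String appId) {
            submissions.remove(appId);
        }

        Map<String, String> loadApplications() {
            return new LinkedHashMap<>(submissions);
        }
    }

    /** Stand-in for the RM: saves each submission before tracking it. */
    static class ToyResourceManager {
        private final InMemoryStateStore store;
        private final List<String> running = new ArrayList<>();

        ToyResourceManager(InMemoryStateStore store) {
            this.store = store;
        }

        void submit(String appId, String submissionContext) {
            store.storeApplication(appId, submissionContext);
            running.add(appId);
        }

        void finish(String appId) {
            running.remove(appId);
            store.removeApplication(appId);
        }

        /** On startup, restart every application the store still knows about. */
        void recover() {
            for (String appId : store.loadApplications().keySet()) {
                if (!running.contains(appId)) {
                    running.add(appId);
                }
            }
        }

        List<String> runningApplications() {
            return new ArrayList<>(running);
        }
    }

    public static void main(String[] args) {
        InMemoryStateStore store = new InMemoryStateStore();
        ToyResourceManager rm1 = new ToyResourceManager(store);
        rm1.submit("app_1", "ctx-1");
        rm1.submit("app_2", "ctx-2");
        rm1.finish("app_1");

        // Simulate an RM restart: a new instance sharing the same store.
        ToyResourceManager rm2 = new ToyResourceManager(store);
        rm2.recover();
        System.out.println(rm2.runningApplications()); // prints [app_2]
    }
}
```

The store outlives the RM instance here only because both objects share it in one process, which is what an in-memory store allows in a functional test; a real deployment would need a store that survives the RM process.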

For review, the code is broken into 1) removal of old code and 2) new code + 
test. There are TODO comments in the code where folks could make suggestions. 
The code is also attached in full so that Jenkins can run a build and test 
pass, because my machine is hitting long host resolution timeouts. Any ideas 
on this?

During testing I found a bug in the CapacityScheduler that causes it to fail 
to activate applications when resources are added to the cluster. Folks can 
comment on the fix. There is a separate test case that shows the bug and 
verifies the fix.
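
To make the class of bug concrete, here is a toy model, not the CapacityScheduler code and not the attached fix. It assumes, purely for illustration, that the number of active applications is capped by cluster memory (MB_PER_ACTIVE_APP is an invented constant), so an application submitted to an empty cluster stays pending until capacity arrives. All names (ActivationSketch, ToyQueue, addNode, activateApplications) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; the real scheduler's activation rules are more involved.
public class ActivationSketch {

    static class ToyQueue {
        // Assumed for illustration: one active application per 1024 MB of cluster memory.
        private static final int MB_PER_ACTIVE_APP = 1024;

        private final Deque<String> pending = new ArrayDeque<>();
        private final List<String> active = new ArrayList<>();
        private int clusterMemoryMb = 0;

        void submit(String appId) {
            pending.add(appId);
            activateApplications();
        }

        void addNode(int memoryMb) {
            clusterMemoryMb += memoryMb;
            // Re-evaluate pending applications whenever capacity grows.
            // Omitting this call leaves earlier submissions stuck in pending.
            activateApplications();
        }

        private void activateApplications() {
            int maxActive = clusterMemoryMb / MB_PER_ACTIVE_APP;
            while (!pending.isEmpty() && active.size() < maxActive) {
                active.add(pending.poll());
            }
        }

        List<String> activeApplications() {
            return new ArrayList<>(active);
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        ToyQueue queue = new ToyQueue();
        queue.submit("app_1");                          // no capacity yet, stays pending
        System.out.println(queue.activeApplications()); // prints []
        queue.addNode(2048);                            // capacity arrives
        System.out.println(queue.activeApplications()); // prints [app_1]
    }
}
```

A test in the same spirit submits an application before any node registers, then adds a node and checks that the application becomes active.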
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt, 
> RMRestartPhase1.pdf, YARN-128-combined.patch, YARN-128.new-code.1.patch, 
> YARN-128.patch, YARN-128.remove-old-code.1.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM 
> refactor.
