[ 
https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498876#comment-13498876
 ] 

Bikas Saha commented on YARN-128:
---------------------------------

@Arinto
Thanks for using the code!
1) Yes. Both are the same object, but that is exactly what the test checks: 
that the context saved in the store is the same as the one the app was 
submitted with. We do this with an in-memory store that lets us examine the 
stored data and compare it with the real data. A real store would persist the 
data, so this comparison would not be possible.
3) Yes. It seems incorrect to store scheduler side-effects. For example, if 
the scheduler config sets minimum container size = 512 upon restart, then 
again it will not match.
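
To illustrate points 1) and 3), here is a minimal sketch. It is not the code 
from the patch. MemoryStoreSketch, SubmissionContext, MemoryStore and 
normalize are hypothetical names, and rounding a request up to a multiple of 
the minimum container size is only an assumption about how a scheduler might 
normalize it.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoryStoreSketch {

    /** Hypothetical stand-in for the context an app is submitted with. */
    static final class SubmissionContext {
        final String appId;
        final int requestedMemoryMb;

        SubmissionContext(String appId, int requestedMemoryMb) {
            this.appId = appId;
            this.requestedMemoryMb = requestedMemoryMb;
        }
    }

    /** Hypothetical in-memory store: keeps the reference it is handed, with no serialization. */
    static final class MemoryStore {
        private final Map<String, SubmissionContext> apps = new HashMap<>();

        void storeApplication(SubmissionContext ctx) {
            apps.put(ctx.appId, ctx);
        }

        SubmissionContext getStored(String appId) {
            return apps.get(appId);
        }
    }

    /** Assumed normalization rule: round the request up to a multiple of the minimum. */
    static int normalize(int requestedMb, int minimumMb) {
        return ((requestedMb + minimumMb - 1) / minimumMb) * minimumMb;
    }

    public static void main(String[] args) {
        SubmissionContext submitted = new SubmissionContext("app_1", 300);
        MemoryStore store = new MemoryStore();
        store.storeApplication(submitted);

        // Point 1: with an in-memory store the test can compare the stored
        // object with the submitted one directly (here, the same reference).
        System.out.println("same object: " + (store.getStored("app_1") == submitted));

        // Point 3: a normalized value depends on the scheduler config, so a
        // value stored before restart need not match one computed after it.
        int beforeRestart = normalize(submitted.requestedMemoryMb, 1024);
        int afterRestart = normalize(submitted.requestedMemoryMb, 512);
        System.out.println("normalized before restart: " + beforeRestart);
        System.out.println("normalized after restart:  " + afterRestart);
    }
}
```

Storing only what the app asked for (300 in the sketch) leaves the scheduler 
free to re-apply whatever config is in effect after the restart.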
I am attaching a patch for a ZK store that you can try. It applies on top of 
the current full patch.

@Tom
Thanks for reviewing!
1) There is no race condition because the Dispatcher has not been started yet 
and hence the attempt start event has not been processed. There is a comment to 
that effect in the code.
2) I agree. I had thought about it too. But it looks like the current behavior 
(before this patch) does this because it does not differentiate killed/failed 
attempts when deciding that the attempt retry limit has been reached. So I 
thought about leaving it for a separate jira which would be unrelated to this. 
Once that is done this code could use it and not count the restarted attempt. 
This patch is already huge. Does that sound good?
3) Yes. That could be done. The constructor makes it easier to write tests 
without mangling configs.
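
For point 1), a minimal sketch of why nothing races before the dispatcher is 
started. DeferredDispatcherSketch and its methods are hypothetical, not the 
YARN dispatcher API, and the sketch is single-threaded for simplicity. It 
only shows that an event queued before start() is not handled until start() 
is called.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DeferredDispatcherSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> processed = new ArrayList<>();
    private boolean started = false;

    /** Queue an event; it is handled right away only if start() was already called. */
    public void dispatch(String event) {
        pending.add(event);
        if (started) {
            drain();
        }
    }

    /** Mark the dispatcher as started and handle everything queued so far. */
    public void start() {
        started = true;
        drain();
    }

    /** "Handling" an event here just means moving it to the processed list. */
    private void drain() {
        String event;
        while ((event = pending.poll()) != null) {
            processed.add(event);
        }
    }

    public List<String> processed() {
        return processed;
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        DeferredDispatcherSketch dispatcher = new DeferredDispatcherSketch();
        dispatcher.dispatch("ATTEMPT_START");

        // The event is queued but untouched, so state it depends on can
        // still be set up here without racing against its handler.
        System.out.println("pending before start:   " + dispatcher.pendingCount());
        System.out.println("processed before start: " + dispatcher.processed());

        dispatcher.start();
        System.out.println("processed after start:  " + dispatcher.processed());
    }
}
```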
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, restart-12-11-zkstore.patch, 
> RM-recovery-initial-thoughts.txt, RMRestartPhase1.pdf, 
> YARN-128.full-code.3.patch, YARN-128.full-code-4.patch, 
> YARN-128.new-code-added.3.patch, YARN-128.new-code-added-4.patch, 
> YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch, 
> YARN-128.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM 
> refactor.
