[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430973#comment-15430973
 ] 

Jason Lowe commented on YARN-3998:
----------------------------------

bq. it looks like we throw an exception when we encounter a field we don't know 
of.

Yeah, I misremembered how unexpected keys were handled.  We should fix recovery 
to tolerate unrecognized keys instead of throwing, which would make rolling 
downgrades easier.  Bonus points for having it clean up all unrecognized keys 
underneath a container "directory" when the container completes, so we don't 
leak unknown keys after a downgrade.
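
To illustrate the idea, here is a minimal sketch, not the NodeManager's actual 
state store code.  It assumes a sorted key/value store (a TreeMap stands in for 
the real store) and a hypothetical key layout of <prefix><containerId>/<suffix>. 
The class name, key prefix, and suffix names are made up for the example. 
Recovery skips suffixes it does not recognize instead of throwing, and removal 
deletes every key under the container's "directory", known or not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ContainerKeySketch {
    // Hypothetical key layout: PREFIX + containerId + "/" + suffix
    static final String PREFIX = "ContainerManager/containers/";
    // Suffixes this (older) version understands; names are illustrative only.
    static final List<String> KNOWN = Arrays.asList("request", "launched", "exitcode");

    /** Recover one container's state, collecting unknown keys instead of throwing. */
    static Map<String, String> recover(SortedMap<String, String> store,
                                       String containerId, List<String> skipped) {
        String dir = PREFIX + containerId + "/";
        Map<String, String> state = new TreeMap<>();
        // Range scan over every key that starts with the container "directory".
        for (Map.Entry<String, String> e
                : store.subMap(dir, dir + Character.MAX_VALUE).entrySet()) {
            String suffix = e.getKey().substring(dir.length());
            if (KNOWN.contains(suffix)) {
                state.put(suffix, e.getValue());
            } else {
                skipped.add(e.getKey()); // unknown key: remember it, keep going
            }
        }
        return state;
    }

    /** Delete all keys under the container "directory"; returns how many were removed. */
    static int removeContainer(SortedMap<String, String> store, String containerId) {
        String dir = PREFIX + containerId + "/";
        SortedMap<String, String> view = store.subMap(dir, dir + Character.MAX_VALUE);
        int removed = view.size();
        view.clear(); // clearing the view removes the entries from the backing map
        return removed;
    }

    public static void main(String[] args) {
        SortedMap<String, String> store = new TreeMap<>();
        store.put(PREFIX + "container_1/request", "...");
        store.put(PREFIX + "container_1/someNewerKey", "..."); // written by a newer version
        List<String> skipped = new ArrayList<>();
        System.out.println("recovered: " + recover(store, "container_1", skipped).keySet());
        System.out.println("skipped:   " + skipped);
        System.out.println("removed:   " + removeContainer(store, "container_1"));
    }
}
```

Deleting by prefix rather than by a fixed list of known suffixes is what keeps 
keys written by a newer version from leaking after a downgrade.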

In the meantime, we'll have to treat it as an incompatible change with respect 
to downgrades.  As I mentioned on YARN-5049, if we don't create these new store 
keys unless the feature is being used, then at least downgrades won't break 
until the feature is enabled.  Users who don't turn on the feature can still 
downgrade if needed.
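
A rough sketch of that gating, under assumptions: the class, the key names, and 
the use of a retry count of 0 to mean "feature not used" are hypothetical and 
not taken from the patch.  The point is only that the new key is written when 
the feature is in use, so a store written without it contains nothing an older 
version would fail to recognize.

```java
import java.util.Map;
import java.util.TreeMap;

public class GatedStoreSketch {
    // Hypothetical key layout and suffix names, for illustration only.
    static final String PREFIX = "ContainerManager/containers/";
    static final String REQUEST_SUFFIX = "/request";
    static final String RETRIES_SUFFIX = "/remainingRetryAttempts";

    /** Persist a container; the new retry key is written only when retries are requested. */
    static void storeContainer(Map<String, String> store, String containerId,
                               String request, int maxRetries) {
        // Key that older versions already know about.
        store.put(PREFIX + containerId + REQUEST_SUFFIX, request);
        // Assumption: 0 means the application did not ask for retries.
        if (maxRetries != 0) {
            store.put(PREFIX + containerId + RETRIES_SUFFIX, Integer.toString(maxRetries));
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new TreeMap<>();
        storeContainer(store, "container_1", "...", 0); // feature unused: old keys only
        storeContainer(store, "container_2", "...", 3); // feature used: new key appears
        System.out.println(store.keySet());
    }
}
```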

> Add support in the NodeManager to re-launch containers
> ------------------------------------------------------
>
>                 Key: YARN-3998
>                 URL: https://issues.apache.org/jira/browse/YARN-3998
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>             Fix For: 2.9.0
>
>         Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, 
> YARN-3998.06.patch, YARN-3998.07.patch, YARN-3998.08.patch, YARN-3998.09.patch
>
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM 
> launches a container, it can specify the value. The NM will then re-launch 
> the container up to 'retry-times' times when it fails to run (e.g. the exit 
> code is not 0). This saves a lot of time: it avoids container localization, 
> the RM does not need to re-schedule the container, and local files in the 
> container's working directory are left in place for re-use. (If the container 
> has downloaded some big files, it does not need to re-download them when 
> running again.) 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
