[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023859#comment-16023859
 ] 

Vrushali C commented on YARN-6555:
----------------------------------

Thanks Rohith and Haibo for the patch and discussions.

I agree that we should store the flow context in the state store only if all 
three fields of the flow context are present. Default values for the flow 
context should be used only when upgrading from a setup with no flow context 
to one with ATSv2 enabled. In that case, the defaults are applied at read time, 
when the flow context is not found in the state store (see YARN-6323).
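
To make the write-side rule concrete, here is a minimal sketch of "persist only if all three fields are present". The class names (FlowContextSketch, FlowContextWriteGuard), the field names (flowName, flowVersion, flowRunId), and the helper toPersist are hypothetical stand-ins chosen for illustration; they are not the actual NM state store API or the code in the patch.

```java
import java.util.Optional;

/** Hypothetical stand-in for the NM flow context; field names are assumed. */
final class FlowContextSketch {
    final String flowName;
    final String flowVersion;
    final Long flowRunId; // boxed so that "missing" can be modelled as null

    FlowContextSketch(String flowName, String flowVersion, Long flowRunId) {
        this.flowName = flowName;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }

    /** True only when all three fields carry a value. */
    boolean isComplete() {
        return flowName != null && !flowName.isEmpty()
            && flowVersion != null && !flowVersion.isEmpty()
            && flowRunId != null;
    }
}

/** Hypothetical write-side guard; not the real state store interface. */
final class FlowContextWriteGuard {
    private FlowContextWriteGuard() {}

    /** Returns the context to persist, or empty when any field is missing. */
    static Optional<FlowContextSketch> toPersist(FlowContextSketch ctx) {
        if (ctx == null || !ctx.isComplete()) {
            return Optional.empty(); // write nothing rather than a partial record
        }
        return Optional.of(ctx);
    }
}
```

With this shape, a caller would write to the state store only when toPersist returns a value, so a partially populated context is never stored and no defaults are baked in at write time.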

That leads me to a question regarding YARN-6323. Rohith, should I rebase my 
patch on YARN-6323 after this one goes in? Then I can create a default flow 
context if the null check at line 386 in ContainerManagerImpl.java fails.
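
For the read side, a rough sketch of the fallback being proposed: use the recovered flow context when it exists, and build a default one only when nothing was found and ATSv2 is enabled. Everything here is an assumption for illustration: the class names, the recover helper, the "flow_" + appId default name, the "1" default version, and the caller-supplied fallback run id are placeholders, not the values or code that YARN-6323 or ContainerManagerImpl actually use.

```java
/** Hypothetical recovered flow context; field names are assumed. */
final class RecoveredFlowContext {
    final String flowName;
    final String flowVersion;
    final long flowRunId;

    RecoveredFlowContext(String flowName, String flowVersion, long flowRunId) {
        this.flowName = flowName;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }
}

/** Hypothetical read-time fallback; not the actual recovery code. */
final class FlowContextRecoverySketch {
    // Placeholder default, chosen only for this sketch.
    static final String DEFAULT_FLOW_VERSION = "1";

    private FlowContextRecoverySketch() {}

    static RecoveredFlowContext recover(RecoveredFlowContext stored,
                                        boolean timelineV2Enabled,
                                        String appId,
                                        long fallbackRunId) {
        if (stored != null) {
            return stored; // a persisted context always wins
        }
        if (!timelineV2Enabled) {
            return null; // assumed: nothing requires a flow context here
        }
        // Upgrade case: nothing was persisted before ATSv2 was enabled.
        return new RecoveredFlowContext(
            "flow_" + appId, DEFAULT_FLOW_VERSION, fallbackRunId);
    }
}
```

Keeping the default creation in the recovery path means the state store only ever contains contexts that were really supplied, and the defaults apply solely to applications that predate the upgrade.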

One more question. In buildAppProto, from line 986 onwards in 
ContainerManagerImpl.java, should those steps be done only if ATSv2 is enabled?

At line 1041, in startContainerInternal in ContainerManagerImpl.java, I am 
trying to understand why these lines were moved. Should the flow context not 
be created from the launch context if the application already exists? I would 
like to understand what behavior changes as a result of this code movement.



> Enable flow context read (& corresponding write) for recovering application 
> with NM restart 
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-6555
>                 URL: https://issues.apache.org/jira/browse/YARN-6555
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-6555.001.patch, YARN-6555.002.patch
>
>
> If timeline service v2 is enabled and the NM is restarted with recovery 
> enabled, the NM fails to start and throws the error "flow context cannot be 
> null".
> This happens because the flow context did not exist before, but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist.
> It would also happen even if the flow context existed before: since we are 
> not persisting it or reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> Full stack trace:
> {code}
> 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



