Vrushali C commented on YARN-6323:

This jira, YARN-6323, is not about data inconsistencies. It deals with an NM 
startup failure. If you bring up an NM with atsv2 enabled on a node that has 
an app which was already running before atsv2 was turned on, the NM cannot 
recover the flow context for this app, because no flow context was ever 
stored for it. 

The related jira is YARN-6555, in which [~rohithsharma] added work-preserving 
flow context storage and retrieval on the NM. 

To explain this jira a bit more: in the patch on YARN-6555, at line 386 in 
ContainerManagerImpl, if p.getFlowContext() != null, we create the 
FlowContext correctly and pass it as an argument to ApplicationImpl on line 
393. But if it is null (that is, it was never stored), a null FlowContext is 
passed to ApplicationImpl, and the ApplicationImpl constructor will throw new 
IllegalArgumentException("flow context cannot be 
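
To make the failure mode and one possible way around it concrete, here is a 
minimal sketch. It is not the code from the YARN-6555 or YARN-6323 patches. 
The FlowContext class below is a hypothetical stand-in, and recoverOrDefault, 
the "default_flow_" prefix, and the default version "1" are assumptions made 
up for illustration. The idea is only that recovery falls back to a default 
flow context when nothing was stored for a pre-existing app, instead of 
passing null along.

```java
import java.util.Objects;

// Illustrative sketch only; none of these types are the real Hadoop classes.
final class FlowContextRecoverySketch {

    // Hypothetical stand-in for the flow context kept by the NM.
    static final class FlowContext {
        final String flowName;
        final String flowVersion;
        final long flowRunId;

        FlowContext(String flowName, String flowVersion, long flowRunId) {
            this.flowName = Objects.requireNonNull(flowName);
            this.flowVersion = Objects.requireNonNull(flowVersion);
            this.flowRunId = flowRunId;
        }
    }

    // Assumed default, chosen for this example only.
    static final String DEFAULT_FLOW_VERSION = "1";

    // Returns the stored flow context when one was recovered. When it is
    // null (the app started before atsv2 was enabled), builds a default
    // one from the app id rather than handing null to the caller.
    static FlowContext recoverOrDefault(FlowContext stored, String appId,
                                        long appStartTimeMs) {
        if (stored != null) {
            return stored;
        }
        return new FlowContext("default_flow_" + appId,
                DEFAULT_FLOW_VERSION, appStartTimeMs);
    }
}
```

With a guard like this, a constructor that rejects a null flow context would 
no longer be reached with null during recovery. Whether the defaults should 
be derived from the app id or taken from configuration is a separate design 
choice.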

> Rolling upgrade/config change is broken on timeline v2. 
> --------------------------------------------------------
>                 Key: YARN-6323
>                 URL: https://issues.apache.org/jira/browse/YARN-6323
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Vrushali C
>              Labels: yarn-5355-merge-blocker
>         Attachments: YARN-6323.001.patch
> Found this issue when deploying on real clusters. If there are apps running 
> when we enable timeline v2 (with work preserving restart enabled), node 
> managers will fail to start due to missing app context data. We should 
> probably assign some default names to these "left over" apps. I believe it's 
> suboptimal to let users clean up the whole cluster before enabling timeline 
> v2. 
