[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Attachment: YARN-6555.003.patch Updated the patch fixing review comment from Haibo. > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. > full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Labels: yarn-5355-merge-blocker (was: ) > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6555.001.patch, YARN-6555.002.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. > full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Target Version/s: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Attachments: YARN-6555.001.patch, YARN-6555.002.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. > full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Attachment: YARN-6555.002.patch Assigned the JIRA to myself and updated patch with test case. > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Attachments: YARN-6555.001.patch, YARN-6555.002.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. > full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Attachment: YARN-6555.001.patch Taking liberty to attach a patch without test cases added so that this can be used in case NM recovery is blocking. And also submitting a patch to see jenkins report. > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-6555.001.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. > full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Affects Version/s: 3.0.0-alpha3 YARN-5355-branch-2 YARN-5355 > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Vrushali C > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. > full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-6555: - Description: If timeline service v2 is enabled and NM is restarted with recovery enabled, then NM fails to start and throws an error as "flow context can't be null". This is happening because the flow context did not exist before but now that timeline service v2 is enabled, ApplicationImpl expects it to exist. This would also happen even if flow context existed before but since we are not persisting it / reading it during ContainerManagerImpl#recoverApplication, it does not get passed in to ApplicationImpl. full stack trace {code} 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager java.lang.IllegalArgumentException: flow context cannot be null at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) {code} was: If timeline service v2 is enabled and NM is restarted with recovery enabled, then NM fails to start and throws an error as "flow context can't be null". This is happening because the flow context did not exist before but now that timeline service v2 is enabled, ApplicationImpl expects it to exist. This would also happen even if flow context existed before but since we are not persisting it / reading it during ContainerManagerImpl#recoverApplication, it does not get passed in to ApplicationImpl. {code} full stack trace {code} 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager java.lang.IllegalArgumentException: flow context cannot be null at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) {code} {code} > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not