[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-1: -- Hadoop Flags: Reviewed > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When i make configure-type from delegated-centralized to centralized in > yarn-site.xml and restart the RM, it failed. > The error stacktrace is as follows > > {code:txt} > 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > ... 4 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) > ... 5 more > 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > {code} > When i digging into the codebase, found that the node and labels mapping is > stored in the nodelabel.mirror file when configured the type of centralized. > So the content of nodelabel.mirror
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-1: -- Component/s: yarn > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When i make configure-type from delegated-centralized to centralized in > yarn-site.xml and restart the RM, it failed. > The error stacktrace is as follows > > {code:txt} > 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > ... 4 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) > ... 5 more > 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > {code} > When i digging into the codebase, found that the node and labels mapping is > stored in the nodelabel.mirror file when configured the type of centralized. > So the content of nodelabel.mirror file
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-1: -- Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.4.0 >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When i make configure-type from delegated-centralized to centralized in > yarn-site.xml and restart the RM, it failed. > The error stacktrace is as follows > > {code:txt} > 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > ... 4 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) > ... 5 more > 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > {code} > When i digging into the codebase, found that the node and labels mapping is > stored in the nodelabel.mirror file when configured the type of centralized. > So the content of
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-1: - Fix Version/s: 3.4.0 (was: 3.3.4) Changed the fix version to 3.4.0 because it is now only in trunk. > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > When i make configure-type from delegated-centralized to centralized in > yarn-site.xml and restart the RM, it failed. > The error stacktrace is as follows > > {code:txt} > 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > ... 4 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) > ... 5 more > 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > {code} > When i digging into the codebase, found that the node and labels mapping is > stored in the nodelabel.mirror file when configured the type of
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-1: -- Labels: pull-request-available (was: ) > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When i make configure-type from delegated-centralized to centralized in > yarn-site.xml and restart the RM, it failed. > The error stacktrace is as follows > > {code:txt} > 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > ... 4 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) > at > org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) > at > org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) > at > org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) > ... 5 more > 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Trying to re-establish ZK session > {code} > When i digging into the codebase, found that the node and labels mapping is > stored in the nodelabel.mirror file when configured the type of centralized. > So the content of nodelabel.mirror file is as follows > 1. the node-label list > 2. the node to
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated YARN-1: Description: When i make configure-type from delegated-centralized to centralized in yarn-site.xml and restart the RM, it failed. The error stacktrace is as follows {code:txt} 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) ... 4 more Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) ... 5 more 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session {code} When i digging into the codebase, found that the node and labels mapping is stored in the nodelabel.mirror file when configured the type of centralized. So the content of nodelabel.mirror file is as follows 1. the node-label list 2. the node to label mapping (only exist when configured the type of centralized) was: When i make configure-type from delegated-centralized to centralized in yarn-site.xml and restart the RM, it failed. The error stacktrace is as follows {code:txt} 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) at
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated YARN-1: Description: When i make configure-type from delegated-centralized to centralized in yarn-site.xml and restart the RM, it failed. The error stacktrace is as follows {code:txt} 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) ... 4 more Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) ... 5 more 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session {code} When i digging into the codebase, found that the node and labels mapping is stored in the nodelabel.mirror file when configured the type of centralized. However the conf was: When i make configure-type from delegated-centralized to centralized in yarn-site.xml and restart the RM, it failed. The error stacktrace is as follows {code:txt} 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610)
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated YARN-1: Description: When i make configure-type from delegated-centralized to centralized in yarn-site.xml and restart the RM, it failed. The error stacktrace is as follows {code:txt} 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333) at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) ... 4 more Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61) at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138) at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76) at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41) at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120) at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328) ... 5 more 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session {code} When i digging into the codebase, found that the node and labels mapping is stored into the nodelabel.mirror file when configured the was:When i > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > > When i make configure-type from delegated-centralized to centralized in > yarn-site.xml and restart the RM, it failed. > The error stacktrace is as follows > > {code:txt} > 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated YARN-1: Description: When i > Recovery failure when node-label configure-type transit from > delegated-centralized to centralized > - > > Key: YARN-1 > URL: https://issues.apache.org/jira/browse/YARN-1 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > > When i -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org