[jira] [Created] (YARN-9171) Replace incorrect use of system property user.name
Dinesh Chitlangia created YARN-9171:
---

Summary: Replace incorrect use of system property user.name
Key: YARN-9171
URL: https://issues.apache.org/jira/browse/YARN-9171
Project: Hadoop YARN
Issue Type: Improvement
Environment: Kerberized
Reporter: Dinesh Chitlangia
Assignee: Dinesh Chitlangia

This jira has been created to track the suggested changes for YARN as identified in HDFS-14176.

The following occurrences need to be corrected:
YARN/ApiServiceClient L211
YARN/CGroupsHandler L460
YARN/YarnServiceJobSubmitter L216
YARN/YarnClientImpl L (here, it is using this pattern only if security is not enabled)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
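The message above does not say why reading user.name is considered incorrect. One likely reason, stated here as an assumption rather than taken from the issue: the property is an ordinary JVM setting that can be overridden with -Duser.name=... or System.setProperty, so in a Kerberized environment it need not match the authenticated principal. Hadoop code normally obtains the authenticated identity through UserGroupInformation (for example UserGroupInformation.getCurrentUser().getShortUserName()). The sketch below uses only the JDK, and the class and method names in it are hypothetical; it just shows that the property is mutable state.

```java
/** Sketch: the JVM system property "user.name" is plain mutable state. */
final class UserNamePropertySketch {

    /** Reads the user name from the system property (the pattern the issue wants replaced). */
    static String userFromSystemProperty() {
        return System.getProperty("user.name");
    }

    /** Temporarily overrides the property and returns what a caller would then see. */
    static String userSeenAfterOverride(String spoofed) {
        String original = System.getProperty("user.name");
        try {
            System.setProperty("user.name", spoofed);
            return userFromSystemProperty();
        } finally {
            // Restore the original value so the demo has no lasting side effect.
            if (original == null) {
                System.clearProperty("user.name");
            } else {
                System.setProperty("user.name", original);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("before override: " + userFromSystemProperty());
        System.out.println("after override:  " + userSeenAfterOverride("someone-else"));
    }
}
```

Any code in the same JVM, or whoever writes the launch command line, can change the value the same way, which is why it is a weak source of identity on a secured cluster.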
[jira] [Resolved] (YARN-9171) Replace incorrect use of system property user.name
[ https://issues.apache.org/jira/browse/YARN-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-9171.
---
Resolution: Won't Fix

> Replace incorrect use of system property user.name
> --
>
> Key: YARN-9171
> URL: https://issues.apache.org/jira/browse/YARN-9171
> Project: Hadoop YARN
> Issue Type: Improvement
> Environment: Kerberized
> Reporter: Dinesh Chitlangia
> Assignee: Dinesh Chitlangia
> Priority: Major
>
> This jira has been created to track the suggested changes for YARN as
> identified in HDFS-14176
> Following occurrences need to be corrected:
> -YARN/ApiServiceClient L211-
> YARN/CGroupsHandlerImpl L460
> -YARN/YarnServiceJobSubmitter L216-
> -YARN/YarnClientImpl L- (here, it is using this pattern only if security
> is not enabled)
[jira] [Resolved] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized
[ https://issues.apache.org/jira/browse/YARN-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-11111.
---
Fix Version/s: 3.3.4
Resolution: Fixed

> Recovery failure when node-label configure-type transit from
> delegated-centralized to centralized
> -
>
> Key: YARN-11111
> URL: https://issues.apache.org/jira/browse/YARN-11111
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.4
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When I change configure-type from delegated-centralized to centralized in
> yarn-site.xml and restart the RM, the RM fails to start.
> The error stack trace is as follows:
>
> {code:txt}
> 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     ... 4 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61)
>     at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138)
>     at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76)
>     at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41)
>     at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120)
>     at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149)
>     at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106)
>     at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252)
>     at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328)
>     ... 5 more
> 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
> {code}
> When I dug into the codebase, I found that the node-to-labels mapping is
> stored in the nodelabel.mirror file when the configured type is centralized.
> So the content of nodelabel.mirror file is as
[jira] [Resolved] (YARN-11247) Remove unused classes introduced by YARN-9615
[ https://issues.apache.org/jira/browse/YARN-11247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-11247.
---
Fix Version/s: 3.4.0
Resolution: Fixed

> Remove unused classes introduced by YARN-9615
> -
>
> Key: YARN-11247
> URL: https://issues.apache.org/jira/browse/YARN-11247
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: DisableEventTypeMetrics-Not used.png
>
>
> YARN-9615 adds metrics to the RM's dispatcher, but the patch introduces a
> class that is never used:
> org.apache.hadoop.yarn.metrics#DisableEventTypeMetrics
> 1. It has no references in production code.
> 2. It has no references in test code.
> 3. After deleting this class, the project still compiles locally.
> I think this class can be removed.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11307) Fix Yarn Router Broken Link
[ https://issues.apache.org/jira/browse/YARN-11307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-11307.
---
Fix Version/s: 3.3.9
Resolution: Fixed

> Fix Yarn Router Broken Link
> ---
>
> Key: YARN-11307
> URL: https://issues.apache.org/jira/browse/YARN-11307
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: federation, router
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.3.9
[jira] [Resolved] (YARN-11324) [Federation] Fix some PBImpl classes to avoid NPE.
[ https://issues.apache.org/jira/browse/YARN-11324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-11324.
---
Fix Version/s: 3.4.0
Resolution: Fixed

> [Federation] Fix some PBImpl classes to avoid NPE.
> --
>
> Key: YARN-11324
> URL: https://issues.apache.org/jira/browse/YARN-11324
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation, router, yarn
> Affects Versions: 3.4.0
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2022-09-30-16-52-25-031.png
>
>
> When completing YARN-11323, I found a bug in
> ApplicationHomeSubClusterPBImpl that may cause a null pointer exception
> when calling getApplicationId:
> {code:java}
> @Test
> public void testGetApplicationIdNullException() throws YarnException {
>   ApplicationId appId = ApplicationId.newInstance(Time.now(), 1);
>   ApplicationHomeSubCluster appHomeSC = ApplicationHomeSubCluster.newInstance(
>       appId, subClusterId);
>   System.out.println(appHomeSC.getApplicationId());
> }
> {code}
> The test results are as follows:
> !image-2022-09-30-16-52-25-031.png|width=818,height=271!
>
> After we set the ApplicationId, reading it back directly returns null.
> *Why does this problem occur?*
> We do not set a value on ApplicationHomeSubClusterProtoOrBuilder when we
> setApplication.
> *Improve the code:*
> 1. Set a value on ApplicationHomeSubClusterProtoOrBuilder when we
> setApplication.
> 2. At the same time, to improve access efficiency, getApplication should
> first check whether the internal property is empty. If it is not empty, we
> can return it directly. If it is empty, we convert it from the proto object.
> While modifying ApplicationHomeSubClusterImpl, I will check the PBImpl
> classes of all router modules to make sure all of them are fixed.
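The two improvements listed in YARN-11324 (write through to the proto builder on set, and return the locally cached value on get before falling back to a conversion from the proto) can be sketched without any Hadoop or protobuf dependency. Everything below is a hypothetical stand-in: ProtoBuilderSketch plays the role of the generated builder, HomeSubClusterPBImplSketch plays the role of the PBImpl record, and a plain String replaces ApplicationId. It is not the code from the actual patch.

```java
/** Hypothetical stand-in for a PBImpl record; not the real ApplicationHomeSubClusterPBImpl. */
final class HomeSubClusterPBImplSketch {
    private final ProtoBuilderSketch builder;
    private String cachedApplicationId;   // local, already-converted copy
    private int conversions = 0;          // counts proto-to-local conversions, for the demo only

    HomeSubClusterPBImplSketch() {
        this(new ProtoBuilderSketch());
    }

    HomeSubClusterPBImplSketch(ProtoBuilderSketch builder) {
        this.builder = builder;
    }

    void setApplicationId(String appId) {
        // Improvement 1: keep the builder in sync instead of only updating the local field.
        if (appId == null) {
            builder.clearApplicationId();
        } else {
            builder.setApplicationId(appId);
        }
        cachedApplicationId = appId;
    }

    String getApplicationId() {
        // Improvement 2: return the cached value when it is present.
        if (cachedApplicationId != null) {
            return cachedApplicationId;
        }
        if (!builder.hasApplicationId()) {
            return null;
        }
        // Otherwise convert from the proto once and remember the result.
        conversions++;
        cachedApplicationId = builder.getApplicationId();
        return cachedApplicationId;
    }

    int conversions() {
        return conversions;
    }

    public static void main(String[] args) {
        HomeSubClusterPBImplSketch record = new HomeSubClusterPBImplSketch();
        record.setApplicationId("application_1_0001");
        System.out.println(record.getApplicationId());
    }
}

/** Hypothetical stand-in for a generated protobuf builder with one optional field. */
final class ProtoBuilderSketch {
    private String applicationId;   // null means the field is not set

    boolean hasApplicationId() { return applicationId != null; }
    String getApplicationId() { return applicationId; }
    void setApplicationId(String id) { applicationId = id; }
    void clearApplicationId() { applicationId = null; }
}
```

With this shape, a get that follows a set never sees null, and a record built from an existing proto converts the field at most once.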
[jira] [Resolved] (YARN-6169) container-executor message on empty configuration file can be improved
[ https://issues.apache.org/jira/browse/YARN-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-6169.
---
Fix Version/s: 3.4.0
Resolution: Fixed

> container-executor message on empty configuration file can be improved
> --
>
> Key: YARN-6169
> URL: https://issues.apache.org/jira/browse/YARN-6169
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Miklos Szegedi
> Assignee: Riya Khandelwal
> Priority: Trivial
> Labels: newbie, pull-request-available, trivial
> Fix For: 3.4.0
>
>
> If the configuration file is empty, we get the following error message:
> {{Invalid configuration provided in /root/etc/hadoop/container-executor.cfg}}
> This does not provide enough detail to figure out what the issue is at
> first glance. We should use something like 'Empty configuration file
> provided...'
> {code}
> if (cfg->size == 0) {
>   fprintf(ERRORFILE, "Invalid configuration provided in %s\n", file_name);
>   exit(INVALID_CONFIG_FILE);
> }
> {code}
[jira] [Resolved] (YARN-11626) Optimization of the safeDelete operation in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-11626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-11626.
---
Fix Version/s: 3.5.0
Resolution: Fixed

> Optimization of the safeDelete operation in ZKRMStateStore
> --
>
> Key: YARN-11626
> URL: https://issues.apache.org/jira/browse/YARN-11626
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 3.0.0-alpha4, 3.1.1, 3.3.0
> Reporter: wangzhihui
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> h1. Description
> * We can observe that removal of the app info started at 06:17:20, but the
> NoNodeException was received at 06:17:35.
> * During the 15s interval, Curator was retrying the metadata operation. Due
> to the non-idempotent nature of the ZooKeeper deletion operation, one of the
> retry attempts succeeded but no response was received. The next retry then
> resulted in a NoNodeException, triggering the STATE_STORE_FENCED event and
> ultimately causing the current ResourceManager to switch to standby.
> {code:java}
> 2023-10-28 06:17:20,359 INFO recovery.RMStateStore (RMStateStore.java:transition(333)) - Removing info for app: application_1697410508608_140368
> 2023-10-28 06:17:20,359 INFO resourcemanager.RMAppManager (RMAppManager.java:checkAppNumCompletedLimit(303)) - Application should be expired, max number of completed apps kept in memory met: maxCompletedAppsInMemory = 1000, removing app application_1697410508608_140368 from memory:
> 2023-10-28 06:17:35,665 ERROR recovery.RMStateStore (RMStateStore.java:transition(337)) - Error removing app: application_1697410508608_140368
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 2023-10-28 06:17:35,666 INFO recovery.RMStateStore (RMStateStore.java:handleStoreEvent(1147)) - RMStateStore state change from ACTIVE to FENCED
> 2023-10-28 06:17:35,666 ERROR resourcemanager.ResourceManager (ResourceManager.java:handle(898)) - Received RMFatalEvent of type STATE_STORE_FENCED, caused by org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
> 2023-10-28 06:17:35,666 INFO resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1309)) - Transitioning to standby state
> {code}
> h1. Solution
> The NoNodeException clearly indicates that the znode no longer exists, so we
> can safely ignore this exception and avoid the larger cluster impact of a
> ResourceManager failover.
> h1. Other
> We also need to discuss and optimize the same issue in safeCreate.
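The solution described in YARN-11626 amounts to making the delete idempotent from the caller's point of view: if the node is already gone, the desired end state holds and the error can be swallowed. A minimal sketch of that idea follows, using only the JDK. FakeStore and the nested NoNodeException are hypothetical stand-ins for a ZooKeeper client and KeeperException.NoNodeException; this is not the ZKRMStateStore code and it does not model Curator's retry loop.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of a delete that treats "node already absent" as success. */
final class SafeDeleteSketch {

    /** Stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static final class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** In-memory stand-in for a znode store. */
    static final class FakeStore {
        private final Set<String> nodes = new HashSet<>();

        void create(String path) {
            nodes.add(path);
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }

        void delete(String path) throws NoNodeException {
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    /**
     * Deletes the path. Returns true if this call removed the node and false
     * if the node was already absent, for example because an earlier attempt
     * succeeded but its response was lost.
     */
    static boolean safeDelete(FakeStore store, String path) {
        try {
            store.delete(path);
            return true;
        } catch (NoNodeException e) {
            // The node is gone either way, so there is nothing to escalate.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.create("/rmstore/app_1");
        System.out.println("first attempt removed node:  " + safeDelete(store, "/rmstore/app_1"));
        System.out.println("second attempt removed node: " + safeDelete(store, "/rmstore/app_1"));
    }
}
```

The second call models the retry after a lost response: it finds nothing to delete and returns normally instead of surfacing an error that would fence the store.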
[jira] [Resolved] (YARN-11670) Add CallerContext in NodeManager
[ https://issues.apache.org/jira/browse/YARN-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Chitlangia resolved YARN-11670.
---
Fix Version/s: 3.5.0
Resolution: Fixed

Thanks [~yangjiandan] for the contribution and [~whbing] and [~slfan1989] for the reviews.

> Add CallerContext in NodeManager
> 
>
> Key: YARN-11670
> URL: https://issues.apache.org/jira/browse/YARN-11670
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Jiandan Yang
> Assignee: Jiandan Yang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Currently, MR and Spark have added caller context, enabling tracing of
> HDFS/ResourceManager operations from Spark apps and MapReduce apps. However,
> operations from NodeManagers cannot be identified in the audit log. For
> example, HDFS operations issued from NodeManagers during resource
> localization cannot be identified.
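To illustrate what a caller context buys in an audit log, here is a small JDK-only sketch. Hadoop ships its own org.apache.hadoop.ipc.CallerContext class; the sketch deliberately does not use it. CallerContextSketch, its methods, the "nm_localizer" tag, and the simplified audit line format are all hypothetical and chosen only for illustration.

```java
/** Sketch of a per-thread caller tag that is appended to audit lines. */
final class CallerContextSketch {
    // One context string per thread, so concurrent requests do not mix tags.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void setCurrent(String context) {
        CURRENT.set(context);
    }

    static String getCurrent() {
        return CURRENT.get();
    }

    static void clear() {
        CURRENT.remove();
    }

    /** Builds a simplified audit line and appends the caller context when one is set. */
    static String auditLine(String user, String cmd, String src) {
        String base = "ugi=" + user + " cmd=" + cmd + " src=" + src;
        String context = getCurrent();
        return context == null ? base : base + " callerContext=" + context;
    }

    public static void main(String[] args) {
        // Without a context the line says who acted, but not which component.
        System.out.println(auditLine("yarn", "open", "/apps/job.jar"));

        // With a context the same operation can be attributed to its origin.
        setCurrent("nm_localizer");
        try {
            System.out.println(auditLine("yarn", "open", "/apps/job.jar"));
        } finally {
            clear();
        }
    }
}
```

The point of the issue is the second case: once the NodeManager sets a context before it talks to HDFS, operations such as resource localization become distinguishable in the audit log from other requests made by the same user.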