[jira] [Created] (YARN-9171) Replace incorrect use of system property user.name

2019-01-02 Thread Dinesh Chitlangia (JIRA)
Dinesh Chitlangia created YARN-9171:
---

 Summary: Replace incorrect use of system property user.name
 Key: YARN-9171
 URL: https://issues.apache.org/jira/browse/YARN-9171
 Project: Hadoop YARN
  Issue Type: Improvement
 Environment: Kerberized
Reporter: Dinesh Chitlangia
Assignee: Dinesh Chitlangia


This jira has been created to track the suggested changes for YARN as
identified in HDFS-14176.

The following occurrences need to be corrected (see the sketch after this list):
YARN/ApiServiceClient L211
YARN/CGroupsHandlerImpl L460
YARN/YarnServiceJobSubmitter L216
YARN/YarnClientImpl L (here, this pattern is used only if security is
not enabled)
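
For context, a minimal sketch of the intended replacement, assuming the usual
Hadoop pattern of resolving the user via UserGroupInformation rather than the
JVM system property (which ignores the authenticated Kerberos identity):
{code:java}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class UserNameExample {
  public static void main(String[] args) throws IOException {
    // Problematic in a Kerberized cluster: returns the OS-level login,
    // not the authenticated Hadoop identity.
    String osUser = System.getProperty("user.name");

    // Preferred: resolve the user through UserGroupInformation, which
    // honors the Kerberos/UGI identity when security is enabled.
    String hadoopUser = UserGroupInformation.getCurrentUser().getShortUserName();

    System.out.println(osUser + " vs " + hadoopUser);
  }
}
{code}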






[jira] [Resolved] (YARN-9171) Replace incorrect use of system property user.name

2019-05-01 Thread Dinesh Chitlangia (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-9171.
-
Resolution: Won't Fix

> Replace incorrect use of system property user.name
> --
>
> Key: YARN-9171
> URL: https://issues.apache.org/jira/browse/YARN-9171
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: Kerberized
>Reporter: Dinesh Chitlangia
>Assignee: Dinesh Chitlangia
>Priority: Major
>
> This jira has been created to track the suggested changes for YARN as
> identified in HDFS-14176.
> The following occurrences need to be corrected:
> -YARN/ApiServiceClient L211-
> YARN/CGroupsHandlerImpl L460
> -YARN/YarnServiceJobSubmitter L216-
> -YARN/YarnClientImpl L- (here, this pattern is used only if security
> is not enabled)





[jira] [Resolved] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized

2022-04-21 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-11111.
--
Fix Version/s: 3.3.4
   Resolution: Fixed

> Recovery failure when node-label configure-type transit from 
> delegated-centralized to centralized
> -
>
> Key: YARN-11111
> URL: https://issues.apache.org/jira/browse/YARN-11111
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When I change the configure-type from delegated-centralized to centralized in
> yarn-site.xml and restart the RM, it fails.
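> For reference, a minimal yarn-site.xml sketch of the change being made (a
> hypothetical snippet based on the standard node-labels property, not taken
> from the patch):
> {code:xml}
> <!-- Before: node-to-labels mapping supplied by a delegated provider -->
> <property>
>   <name>yarn.node-labels.configuration-type</name>
>   <value>delegated-centralized</value>
> </property>
> <!-- After: labels managed centrally via the RM admin CLI/REST -->
> <property>
>   <name>yarn.node-labels.configuration-type</name>
>   <value>centralized</value>
> </property>
> {code}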
> The error stacktrace is as follows
>  
> {code:txt}
> 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> ... 4 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61)
> at 
> org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138)
> at 
> org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76)
> at 
> org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41)
> at 
> org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120)
> at 
> org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328)
> ... 5 more
> 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
>  {code}
> When digging into the codebase, I found that the node-to-labels mapping is
> stored in the nodelabel.mirror file when the configure-type is centralized.
> So the content of the nodelabel.mirror file is as 

[jira] [Resolved] (YARN-11247) Remove unused classes introduced by YARN-9615

2022-10-18 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-11247.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> Remove unused classes introduced by YARN-9615
> -
>
> Key: YARN-11247
> URL: https://issues.apache.org/jira/browse/YARN-11247
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: DisableEventTypeMetrics-Not used.png
>
>
> YARN-9615 added metrics to the RM's dispatcher, but the patch introduced a
> class that is never used:
> org.apache.hadoop.yarn.metrics#DisableEventTypeMetrics
> 1. It has no code references.
> 2. It has no test code references.
> 3. After deleting this class, the project still compiles successfully.
> I think this class can be removed.





[jira] [Resolved] (YARN-11307) Fix Yarn Router Broken Link

2022-09-20 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-11307.
--
Fix Version/s: 3.3.9
   Resolution: Fixed

> Fix Yarn Router Broken Link
> ---
>
> Key: YARN-11307
> URL: https://issues.apache.org/jira/browse/YARN-11307
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9
>
>






[jira] [Resolved] (YARN-11324) [Federation] Fix some PBImpl classes to avoid NPE.

2022-10-04 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-11324.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> [Federation] Fix some PBImpl classes to avoid NPE.
> --
>
> Key: YARN-11324
> URL: https://issues.apache.org/jira/browse/YARN-11324
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router, yarn
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: image-2022-09-30-16-52-25-031.png
>
>
> When completing YARN-11323, I found that there is a bug in
> ApplicationHomeSubClusterPBImpl, which may cause a null pointer exception
> when calling getApplicationId:
> {code:java}
> @Test
> public void testGetApplicationIdNullException() throws YarnException {
>   ApplicationId appId = ApplicationId.newInstance(Time.now(), 1);
>   // subClusterId is defined elsewhere in the test class; a plausible value:
>   SubClusterId subClusterId = SubClusterId.newInstance("SC-1");
>   ApplicationHomeSubCluster appHomeSC = ApplicationHomeSubCluster.newInstance(
>       appId, subClusterId);
>   System.out.println(appHomeSC.getApplicationId());
> }
> {code}
> The test results are as follows:
> !image-2022-09-30-16-52-25-031.png|width=818,height=271!
>  
> After we set the ApplicationId, a direct get returns null.
> *Why does this problem occur?*
> The reason is that we did not set a value on the
> ApplicationHomeSubClusterProtoOrBuilder when we call setApplicationId.
> *Improving the code:*
> 1. Set a value on the ApplicationHomeSubClusterProtoOrBuilder when we call
> setApplicationId.
> 2. At the same time, to improve access efficiency, getApplicationId should
> first check whether the internal field is null: if it is not null, return it
> directly; if it is null, convert it from the proto object. A sketch of this
> pattern follows below.
> While modifying ApplicationHomeSubClusterPBImpl, I will check the PBImpl
> classes of all Router modules to make sure all PBImpls are fixed.
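> For illustration, a minimal sketch of that getter/setter pattern, following
> the common Hadoop PBImpl convention (field and helper names here are
> assumptions, not the exact patch):
> {code:java}
> @Override
> public void setApplicationId(ApplicationId applicationId) {
>   maybeInitBuilder();
>   if (applicationId == null) {
>     builder.clearApplicationId();
>   }
>   // Cache locally; mergeLocalToBuilder() must copy this into the proto
>   // builder, which is the step the buggy version was missing.
>   this.applicationId = applicationId;
> }
>
> @Override
> public ApplicationId getApplicationId() {
>   // Fast path: return the cached field when present (improvement 2).
>   if (this.applicationId != null) {
>     return this.applicationId;
>   }
>   ApplicationHomeSubClusterProtoOrBuilder p = viaProto ? proto : builder;
>   if (!p.hasApplicationId()) {
>     return null;
>   }
>   this.applicationId = convertFromProtoFormat(p.getApplicationId());
>   return this.applicationId;
> }
> {code}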





[jira] [Resolved] (YARN-6169) container-executor message on empty configuration file can be improved

2022-10-03 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-6169.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> container-executor message on empty configuration file can be improved
> --
>
> Key: YARN-6169
> URL: https://issues.apache.org/jira/browse/YARN-6169
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Riya Khandelwal
>Priority: Trivial
>  Labels: newbie, pull-request-available, trivial
> Fix For: 3.4.0
>
>
> If the configuration file is empty, we get the following error message:
> {{Invalid configuration provided in /root/etc/hadoop/container-executor.cfg}}
> This does not provide enough detail to figure out what the issue is at
> first glance. We should use something like 'Empty configuration file
> provided...'
> {code}
>   if (cfg->size == 0) {
> fprintf(ERRORFILE, "Invalid configuration provided in %s\n", file_name);
> exit(INVALID_CONFIG_FILE);
>   }
> {code}





[jira] [Resolved] (YARN-11626) Optimization of the safeDelete operation in ZKRMStateStore

2024-03-21 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-11626.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

> Optimization of the safeDelete operation in ZKRMStateStore
> --
>
> Key: YARN-11626
> URL: https://issues.apache.org/jira/browse/YARN-11626
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha4, 3.1.1, 3.3.0
>Reporter: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> h1. Description
>  * It can be observed that removing the app info started at 06:17:20, but
> the NoNodeException was received at 06:17:35.
>  * During the 15s interval, Curator was retrying the metadata operation. Due
> to the non-idempotent nature of the ZooKeeper delete operation, one of the
> retry attempts succeeded but its response was never received. The next retry
> then resulted in a NoNodeException, triggering the STATE_STORE_FENCED event
> and ultimately causing the current ResourceManager to switch to standby.
> {code:java}
> 2023-10-28 06:17:20,359 INFO  recovery.RMStateStore 
> (RMStateStore.java:transition(333)) - Removing info for app: 
> application_1697410508608_140368
> 2023-10-28 06:17:20,359 INFO  resourcemanager.RMAppManager 
> (RMAppManager.java:checkAppNumCompletedLimit(303)) - Application should be 
> expired, max number of completed apps kept in memory met: 
> maxCompletedAppsInMemory = 1000, removing app 
> application_1697410508608_140368 from memory:
> 2023-10-28 06:17:35,665 ERROR recovery.RMStateStore 
> (RMStateStore.java:transition(337)) - Error removing app: 
> application_1697410508608_140368
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 2023-10-28 06:17:35,666 INFO  recovery.RMStateStore 
> (RMStateStore.java:handleStoreEvent(1147)) - RMStateStore state change from 
> ACTIVE to FENCED
> 2023-10-28 06:17:35,666 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:handle(898)) - Received RMFatalEvent of type 
> STATE_STORE_FENCED, caused by 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
> 2023-10-28 06:17:35,666 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:transitionToStandby(1309)) - Transitioning to standby 
> state
>  {code}
> h1. Solution
> The NoNodeException clearly indicates that the znode no longer exists, so we
> can safely ignore this exception and avoid the larger cluster impact caused
> by a ResourceManager failover. A sketch of this change follows below.
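> For illustration, a hypothetical sketch of safeDelete with this change
> (helper names such as zkManager, zkAcl, and fencingNodePath are assumptions
> modeled on ZKRMStateStore, not the exact patch):
> {code:java}
> private void safeDelete(final String path) throws Exception {
>   try {
>     zkManager.safeDelete(path, zkAcl, fencingNodePath);
>   } catch (KeeperException.NoNodeException nne) {
>     // A retried delete may have already succeeded server-side; the znode
>     // is gone either way, so do not fence the state store over this.
>     LOG.debug("Attempted to delete a non-existing znode {}", path);
>   }
> }
> {code}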
> h1. Other
> We also need to discuss and optimize the same issues in safeCreate.





[jira] [Resolved] (YARN-11670) Add CallerContext in NodeManager

2024-04-08 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved YARN-11670.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Thanks [~yangjiandan] for the contribution, and [~whbing] and [~slfan1989] for
the reviews.

> Add CallerContext in NodeManager
> 
>
> Key: YARN-11670
> URL: https://issues.apache.org/jira/browse/YARN-11670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Currently, MR and Spark have added caller context, enabling tracing of
> HDFS/ResourceManager operations from Spark and MapReduce apps. However,
> operations from NodeManagers cannot be identified in the audit log. For
> example, HDFS operations issued from NodeManagers during resource
> localization cannot be identified. A sketch of the mechanism follows below.
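> For illustration, a minimal sketch of attaching a caller context, using the
> existing org.apache.hadoop.ipc.CallerContext API (the context string shown
> is an assumption, not necessarily the one this patch adds):
> {code:java}
> import org.apache.hadoop.ipc.CallerContext;
>
> // Tag the current thread before issuing HDFS/RM calls so the operation
> // can be attributed to the NodeManager in the audit logs.
> String containerId = "container_1700000000000_0001_01_000001"; // example
> CallerContext nmContext =
>     new CallerContext.Builder("NM_Localizer_" + containerId).build();
> CallerContext.setCurrent(nmContext);
> // Subsequent RPCs on this thread carry "NM_Localizer_<containerId>" into
> // the NameNode/ResourceManager audit logs.
> {code}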


