date:20240127

[jira] [Updated] (YARN-11103) SLS cleanup after previously merged SLS refactor jiras

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11103:
--
Affects Version/s: 3.4.0

> SLS cleanup after previously merged SLS refactor jiras
> --
>
> Key: YARN-11103
> URL: https://issues.apache.org/jira/browse/YARN-11103
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Several jiras moved SLS code around in order to make SLSRunner more 
> readable.
> Mostly, the code fragments were just moved to separate classes.
> Most of the issues that came up were flagged as failures by our build system, 
> but they were part of the original code, so they were not newly 
> introduced.
> There were some comments about fixing these. Here are all of the ones I found, 
> which we need to fix (if they are not fixed yet):
> * https://issues.apache.org/jira/browse/YARN-10548?focusedCommentId=17512336=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17512336
> * https://issues.apache.org/jira/browse/YARN-10548?focusedCommentId=17513012=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17513012
> * https://issues.apache.org/jira/browse/YARN-10552?focusedCommentId=17511762=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17511762
> * https://issues.apache.org/jira/browse/YARN-10552?focusedCommentId=17390981=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17390981
> * https://issues.apache.org/jira/browse/YARN-10547?focusedCommentId=17510839=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17510839
> * https://issues.apache.org/jira/browse/YARN-11094?focusedCommentId=17512324=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17512324



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-11103) SLS cleanup after previously merged SLS refactor jiras

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11103:
--
Target Version/s: 3.4.0


[jira] [Updated] (YARN-11106) Fix the test failure due to missing conf of yarn.resourcemanager.node-labels.am.default-node-label-expression

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11106:
--
Component/s: test

> Fix the test failure due to missing conf of 
> yarn.resourcemanager.node-labels.am.default-node-label-expression
> -
>
> Key: YARN-11106
> URL: https://issues.apache.org/jira/browse/YARN-11106
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11106) Fix the test failure due to missing conf of yarn.resourcemanager.node-labels.am.default-node-label-expression

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11106:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0


[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11107:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not 
> work properly
> -
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Assignee: Xiping Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> YARN node labels are enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs carrying its queue's label, after which 
> other applications in the queue could not obtain computing resources. We found 
> that the RM printed many log lines like "Trying to fulfill 
> reservation for application..."
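The report does not include the RM code path, so what follows is only a hedged illustration. YARN stops honoring an AM's node blacklist once the blacklisted share of NodeManager hosts exceeds yarn.am.blacklisting.disable-failure-threshold. This is a safeguard against blacklisting the whole cluster. Suppose that share is computed against every node in the cluster. Then an AM whose queue is confined to a small labeled partition can blacklist every node it could ever use without reaching the threshold, which matches the symptom above. The sketch contrasts the two denominators. The class, method, node counts and the 0.8 threshold are hypothetical illustration values, not RM code.

```java
// Hypothetical sketch: not the ResourceManager's actual blacklist code.
final class AmBlacklistSketch {
  private AmBlacklistSketch() {}

  /**
   * Returns true when the AM blacklist should be ignored because it covers at least
   * {@code threshold} of the nodes the AM could actually run on.
   */
  static boolean shouldDisableBlacklist(int blacklistedNodes, int candidateNodes, double threshold) {
    if (candidateNodes <= 0) {
      // Assumed policy: with no candidate nodes, keeping the blacklist gains nothing.
      return true;
    }
    return (double) blacklistedNodes / candidateNodes >= threshold;
  }

  public static void main(String[] args) {
    int clusterNodes = 100; // all NMs in the cluster (illustrative)
    int labeledNodes = 5;   // NMs carrying the queue's label (illustrative)
    int blacklisted = 5;    // the AM blacklisted every labeled NM
    double threshold = 0.8; // illustrative disable threshold

    // Ratio against the whole cluster: 5/100, so the blacklist stays active.
    System.out.println(shouldDisableBlacklist(blacklisted, clusterNodes, threshold));
    // Ratio against the label's partition: 5/5, so the blacklist is ignored.
    System.out.println(shouldDisableBlacklist(blacklisted, labeledNodes, threshold));
  }
}
```

Computing the ratio over the nodes of the AM's partition is one way to make the safeguard label-aware. The report does not say whether the committed fix works this way.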




[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11111:
--
Hadoop Flags: Reviewed

> Recovery failure when node-label configure-type transit from 
> delegated-centralized to centralized
> -
>
> Key: YARN-11111
> URL: https://issues.apache.org/jira/browse/YARN-11111
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When I changed the configuration type from delegated-centralized to centralized 
> in yarn-site.xml and restarted the RM, it failed.
> The error stack trace is as follows:
>  
> {code:txt}
> 2022-04-13 14:44:14,885 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:901)
> at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:610)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:333)
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> ... 4 more
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.initNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:61)
> at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.ReplaceLabelsOnNodeRequestPBImpl.getNodeToLabels(ReplaceLabelsOnNodeRequestPBImpl.java:138)
> at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:76)
> at org.apache.hadoop.yarn.nodelabels.store.op.NodeLabelMirrorOp.recover(NodeLabelMirrorOp.java:41)
> at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:120)
> at org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:149)
> at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:106)
> at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:252)
> at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:266)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:910)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1278)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1319)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1315)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1315)
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:328)
> ... 5 more
> 2022-04-13 14:44:14,886 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>  {code}
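The report is cut off before its analysis ends, so the following is only a sketch under one assumption. The assumption is that the nodelabel.mirror file written in delegated-centralized mode has no node-to-labels section, so the delimited read during NodeLabelMirrorOp.recover hits end-of-stream. Protobuf's parseDelimitedFrom returns null in that case. A null message handed to ReplaceLabelsOnNodeRequestPBImpl would be consistent with the NullPointerException in initNodeToLabels shown above. The sketch mimics that "null at EOF" contract with a plain length-prefixed record and treats a missing section as an empty mapping. The record format ("host=label;host=label") and all class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a null-safe mirror recovery step; not Hadoop's NodeLabelMirrorOp.
final class MirrorRecoverySketch {
  private MirrorRecoverySketch() {}

  /** Reads one length-prefixed record, or returns null if the stream is already at EOF. */
  static String readDelimitedOrNull(DataInputStream in) throws IOException {
    int length;
    try {
      length = in.readInt();
    } catch (EOFException atEof) {
      return null; // mirrors protobuf's parseDelimitedFrom contract at end of stream
    }
    byte[] payload = new byte[length];
    in.readFully(payload);
    return new String(payload, StandardCharsets.UTF_8);
  }

  /** Recovers the node-to-label mapping, treating a missing section as "no mappings". */
  static Map<String, String> recoverNodeToLabels(InputStream raw) throws IOException {
    String section = readDelimitedOrNull(new DataInputStream(raw));
    if (section == null) {
      return Collections.emptyMap(); // guard instead of dereferencing null later
    }
    Map<String, String> nodeToLabel = new HashMap<>();
    for (String entry : section.split(";")) {
      int eq = entry.indexOf('=');
      if (eq > 0) {
        nodeToLabel.put(entry.substring(0, eq), entry.substring(eq + 1));
      }
    }
    return nodeToLabel;
  }

  /** Helper that writes one length-prefixed record, used to build sample input. */
  static byte[] writeDelimited(String record) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    byte[] payload = record.getBytes(StandardCharsets.UTF_8);
    out.writeInt(payload.length);
    out.write(payload);
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Mirror without a node-to-labels section: empty map, no NullPointerException.
    System.out.println(recoverNodeToLabels(new ByteArrayInputStream(new byte[0])));
    // Mirror with a section: the mapping is restored.
    System.out.println(recoverNodeToLabels(new ByteArrayInputStream(writeDelimited("host1=gpu"))));
  }
}
```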
> Digging into the codebase, I found that the node-to-labels mapping is 
> stored in the nodelabel.mirror file when the configuration type is centralized. 
> So the content of nodelabel.mirror

[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11111:
--
Component/s: yarn


[jira] [Updated] (YARN-11111) Recovery failure when node-label configure-type transit from delegated-centralized to centralized

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11111:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0


[jira] [Updated] (YARN-11116) Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter class

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11116:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.3.5, 3.4.0
Affects Version/s: 3.3.5
   3.4.0

> Migrate Times util from SimpleDateFormat to thread-safe DateTimeFormatter 
> class
> ---
>
> Key: YARN-11116
> URL: https://issues.apache.org/jira/browse/YARN-11116
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
> Attachments: YARN-11116.001.perftest.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I came across a stack trace with SimpleDateFormat in it, which led me to 
> investigate current practices:
>  
> {noformat}
> "IPC Server handler 29 on 8032" #797 daemon prio=5 os_prio=0 tid=0x7fb6527d nid=0x953b runnable [0x7fb5ba034000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.hadoop.yarn.util.Times.formatISO8601(Times.java:95)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:810)
>     at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:396)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:224)
>     at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:529)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:500)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1069)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:936)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2135)
>     at org.apache.hadoop.security.UserGroupInformation.doAsPrivileged(UserGroupInformation.java:2123)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2875)
> {noformat}
>  
> DateTimeFormatter is thread-safe, meaning there is no need to wrap it in a 
> ThreadLocal: a single instance can be reused safely across threads. In 
> addition, the new classes are slightly more performant.
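A minimal sketch of the pattern described above, assuming a UTC zone and a millisecond ISO 8601 pattern chosen purely for illustration (this is not Hadoop's Times class, and IsoTimesSketch is a hypothetical name). A single static DateTimeFormatter is shared by many threads with no ThreadLocal. The main method formats concurrently from a thread pool and checks every result against a single-threaded computation.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Hadoop's org.apache.hadoop.yarn.util.Times.
final class IsoTimesSketch {
  private IsoTimesSketch() {}

  // DateTimeFormatter is immutable and thread-safe, so one shared instance suffices.
  // The pattern and the UTC zone are illustrative assumptions.
  private static final DateTimeFormatter ISO_8601 =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXX").withZone(ZoneOffset.UTC);

  static String formatISO8601(long epochMillis) {
    return ISO_8601.format(Instant.ofEpochMilli(epochMillis));
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<String>> results = new ArrayList<>();
      for (int i = 0; i < 1000; i++) {
        final long millis = i * 1000L;
        results.add(pool.submit(() -> formatISO8601(millis)));
      }
      for (int i = 0; i < results.size(); i++) {
        // Every concurrently produced string must equal the single-threaded result.
        if (!results.get(i).get().equals(formatISO8601(i * 1000L))) {
          throw new IllegalStateException("mismatch at " + i);
        }
      }
      System.out.println("all " + results.size() + " concurrent results consistent");
    } finally {
      pool.shutdown();
    }
  }
}
```

With SimpleDateFormat, the same shared-instance pattern is unsafe because the formatter keeps mutable internal state. That is why it is usually wrapped in a ThreadLocal.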




[jira] [Updated] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11114:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-4
> URL: https://issues.apache.org/jira/browse/YARN-4
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I've added two test cases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/commit/88dcf40f4dab564477542b8efb82f4f20d132eee].
> 1. In 'testAppsQueryByQueueShortname', a finishedApp is submitted to 
> "root.default" and a runningApp is submitted to "default".
> The test case queries the apps by queue name "default", and the response 
> contains only the runningApp, which was submitted to "default"; the app 
> submitted to "root.default" is not returned.
> 2. In 'testAppsQueryByQueueFullname', the setup is the same as above: a 
> finishedApp is submitted to "root.default" and a runningApp is submitted to 
> "default".
> The test case queries the apps by queue name "root.default" (the full queue 
> path), and the response contains only the finishedApp, which was submitted to 
> "root.default"; the app submitted to "default" is not returned.
> The straightforward conclusion is that the response includes only those 
> applications whose queue name exactly matches the queried name, where the 
> queue name is the one specified explicitly at submission or resolved by the 
> placement engine.
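A hedged sketch of one way a queue filter could accept either form of the name. It resolves both the queried name and each application's recorded queue to a full path before comparing. It assumes a lookup from leaf name to full path, which is only trustworthy when leaf names are unique, as Capacity Scheduler guaranteed before YARN-9879. QueueFilterSketch, resolve and appsInQueue are hypothetical names, not RMWebServices code. The sample data mirrors the two test cases above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not RMWebServices' actual filtering code.
final class QueueFilterSketch {
  private QueueFilterSketch() {}

  /** Resolves a short leaf name to its full path; full paths are returned unchanged. */
  static String resolve(String queueName, Map<String, String> leafToFullPath) {
    if (queueName.equals("root") || queueName.startsWith("root.")) {
      return queueName;
    }
    // Assumes unique leaf names; an ambiguous short name cannot be resolved this way.
    return leafToFullPath.getOrDefault(queueName, queueName);
  }

  /** Returns the sorted ids of apps whose queue resolves to the same full path as the query. */
  static List<String> appsInQueue(String query, Map<String, String> appToQueue,
      Map<String, String> leafToFullPath) {
    String wanted = resolve(query, leafToFullPath);
    List<String> matches = new ArrayList<>();
    for (Map.Entry<String, String> app : appToQueue.entrySet()) {
      if (resolve(app.getValue(), leafToFullPath).equals(wanted)) {
        matches.add(app.getKey());
      }
    }
    Collections.sort(matches);
    return matches;
  }

  public static void main(String[] args) {
    Map<String, String> apps = new HashMap<>();
    apps.put("finishedApp", "root.default");
    apps.put("runningApp", "default");
    Map<String, String> leaves = new HashMap<>();
    leaves.put("default", "root.default");
    // Both queries return both apps once names are normalized to full paths.
    System.out.println(appsInQueue("default", apps, leaves));
    System.out.println(appsInQueue("root.default", apps, leaves));
  }
}
```

With duplicate leaf names allowed after YARN-9879, a short name may map to several full paths. A real fix would have to decide how to handle that ambiguity, which this sketch deliberately ignores.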
> Before YARN-9879 was implemented, Capacity Scheduler could define a leaf queue 
> with a given name only once in the whole hierarchy, meaning that leaf queue 
> names were unique.
> For example, root.a.testQueue and root.b.testQueue could not coexist, as their 
> leaf queue names are the same.
> At this point, I suspected that YARN-9879 was causing this issue: before it was 
> merged, CS did not allow two leaf queues with the same name, so a query by 
> either "root.default" or "default" could easily work, since there was 
> guaranteed to be only one "default" leaf queue in the hierarchy. I dug a bit further.
> I also noticed that YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797])
>  could have introduced this issue a long time ago, as it removed the iterator 
> logic that queried the applications with method YarnScheduler#getAppsInQueue 
> (see 
> [this|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797#diff-5b432bf3a8eb3e039878300ffb9db1f728226b9e3f63c4eb53be5ed5a833390aL843]).
> Let's follow the implementation of YarnScheduler#getAppsInQueue for CS: 
> 1. First of all, 
> [here|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L2501-L2509]
>  is the method definition.
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is called from here.
> 2. 
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is then calling 
> [QueueManager#getQueue|https://github.com/apache/hadoop/blob/da09d68056d4e6a9490ddc6d9ae816b65217e117/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java#L136-L138].
> 3. 
> [QueueManager#getQueue|https://github.com/apache/hadoop/blob/da09d68056d4e6a9490ddc6d9ae816b65217e117/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java#L136-L138]
>  is then calling

[jira] [Updated] (YARN-11121) Check GetClusterMetrics Request parameter is null

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11121:
--
Fix Version/s: (was: 3.4.0)

> Check GetClusterMetrics Request parameter is null
> -
>
> Key: YARN-11121
> URL: https://issues.apache.org/jira/browse/YARN-11121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The original code does not check whether the request is null. Add a null 
> check so that a null request is handled properly.
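A minimal sketch of the null check described above. It uses hypothetical stand-in types rather than the real GetClusterMetricsRequest, YarnException or Router classes. The method rejects a null request up front with a descriptive exception before any field of the request is accessed.

```java
// Hypothetical stand-ins; not the Hadoop federation classes.
final class NullRequestCheckSketch {
  private NullRequestCheckSketch() {}

  /** Stand-in for a request object such as GetClusterMetricsRequest. */
  static final class ClusterMetricsRequest {}

  /** Stand-in for a checked service exception such as YarnException. */
  static final class ServiceException extends Exception {
    ServiceException(String message) {
      super(message);
    }
  }

  static int getClusterMetrics(ClusterMetricsRequest request) throws ServiceException {
    // Guard clause: handle the null request explicitly before using it.
    if (request == null) {
      throw new ServiceException("Missing getClusterMetrics request.");
    }
    return 0; // placeholder for the real metrics lookup
  }

  public static void main(String[] args) {
    try {
      getClusterMetrics(null);
    } catch (ServiceException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```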




[jira] [Updated] (YARN-11123) ResourceManager webapps test failures due to org.apache.hadoop.metrics2.MetricsException and subsequent java.net.BindException: Address already in use

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11123:
--
  Component/s: resourcemanager
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> ResourceManager webapps test failures due to 
> org.apache.hadoop.metrics2.MetricsException and subsequent 
> java.net.BindException: Address already in use
> --
>
> Key: YARN-11123
> URL: https://issues.apache.org/jira/browse/YARN-11123
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Running all tests from org/apache/hadoop/yarn/server/resourcemanager/webapp 
> produces the following test failures: 
>  # First, org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication#testDelegationTokenAuth fails with:
> {code}
> org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
>   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:479)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:1443)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.startWepApp(MockRM.java:822)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1552)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:195)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.setupAndStartRM(TestRMWebServicesDelegationTokenAuthentication.java:190)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.before(TestRMWebServicesDelegationTokenAuthentication.java:133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at

[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11128:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0  (was: 3.4.0, 3.3.5)

> Fix comments in TestProportionalCapacityPreemptionPolicy*
> -
>
> Key: YARN-11128
> URL: https://issues.apache.org/jira/browse/YARN-11128
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, documentation
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In various places, the comment for appsConfig is 
> {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}}
> but it should be 
> {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
>  




[jira] [Updated] (YARN-11126) ZKConfigurationStore Java deserialisation vulnerability

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11126:
--
Target Version/s: 3.3.4, 2.10.2, 3.4.0

> ZKConfigurationStore Java deserialisation vulnerability
> ---
>
> Key: YARN-11126
> URL: https://issues.apache.org/jira/browse/YARN-11126
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.2
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.4
>
> Attachments: TestZKConfigurationStoreCVE.java
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> ZKConfigurationStore uses ObjectInputStream to deserialise objects from 
> ZooKeeper. An attacker who *has access to ZK* can exploit this. For example, 
> by using [gadget chain deserialisation 
> attacks|https://snyk.io/blog/serialization-and-deserialization-in-java/], the 
> attacker can run arbitrary commands and even create reverse shells.
> A useful 
> [CheatSheet|https://github.com/GrrrDog/Java-Deserialization-Cheat-Sheet/blob/master/README.md]
>  for Java deserialisation is available.
> I managed to start the Calculator app on my Mac using the following payload:
> {code}
>   //java -jar ./target/ysoserial-0.0.6-SNAPSHOT-all.jar CommonsBeanutils1 
> 'open /System/Applications/Calculator.app' | base64
>   @Test
>   public void testDeserializationCommonsBeanutils1() throws Exception {
> 
>
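A common mitigation for this class of vulnerability is to deserialise through an allow-list, so that only expected classes can be instantiated. This is offered as an assumption, not as the committed fix. The sketch uses the JDK 9+ `java.io.ObjectInputFilter` API. `StoredConf`, `UnexpectedType` and `AllowListDeserializer` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Set;

// Hypothetical payload type standing in for whatever the store persists.
final class StoredConf implements Serializable {
  private static final long serialVersionUID = 1L;
  final int version;

  StoredConf(int version) {
    this.version = version;
  }
}

// Hypothetical type an attacker might try to smuggle into the stream.
final class UnexpectedType implements Serializable {
  private static final long serialVersionUID = 1L;
}

final class AllowListDeserializer {
  private static final Set<Class<?>> ALLOWED = Set.of(StoredConf.class);

  // Allow only the expected classes. A null serialClass means the filter is
  // being asked about depth/reference limits, which this sketch leaves
  // UNDECIDED (i.e. not rejected).
  private static final ObjectInputFilter FILTER = info -> {
    Class<?> c = info.serialClass();
    if (c == null) {
      return ObjectInputFilter.Status.UNDECIDED;
    }
    return ALLOWED.contains(c)
        ? ObjectInputFilter.Status.ALLOWED
        : ObjectInputFilter.Status.REJECTED;
  };

  private AllowListDeserializer() {
  }

  static byte[] serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  // Rejected classes make readObject throw java.io.InvalidClassException.
  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      in.setObjectInputFilter(FILTER); // must be set before the first readObject
      return in.readObject();
    }
  }
}
```

An allow-list is preferable to a deny-list here because new gadget chains keep being discovered. Rejecting everything unexpected does not depend on knowing them in advance.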

[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11128:
--
 Target Version/s: 3.3.5, 3.4.0
Affects Version/s: 3.3.5
   3.4.0

> Fix comments in TestProportionalCapacityPreemptionPolicy*
> -
>
> Key: YARN-11128
> URL: https://issues.apache.org/jira/browse/YARN-11128
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, documentation
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (YARN-11133) YarnClient gets the wrong EffectiveMinCapacity value

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11133:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0

> YarnClient gets the wrong EffectiveMinCapacity value
> 
>
> Key: YARN-11133
> URL: https://issues.apache.org/jira/browse/YARN-11133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.2.3, 3.3.2
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When I use YarnClient, QueueConfigurations#getEffectiveMinCapacity returns 
> the wrong value. I found a bug in 
> QueueConfigurationsPBImpl#mergeLocalToBuilder:
> {code:java}
> private void mergeLocalToBuilder() {
>   if (this.effMinResource != null) {
>     builder
>         .setEffectiveMinCapacity(convertToProtoFormat(this.effMinResource));
>   }
>   if (this.effMaxResource != null) {
>     builder
>         .setEffectiveMaxCapacity(convertToProtoFormat(this.effMaxResource));
>   }
>   if (this.configuredMinResource != null) {
>     builder.setEffectiveMinCapacity(
>         convertToProtoFormat(this.configuredMinResource));
>   }
>   if (this.configuredMaxResource != null) {
>     builder.setEffectiveMaxCapacity(
>         convertToProtoFormat(this.configuredMaxResource));
>   }
> } {code}
> configuredMinResource is incorrectly written to the effective min capacity 
> field (and configuredMaxResource to the effective max capacity field). This 
> overwrites the real effMinResource, and configuredMinResource ends up null.
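Following the analysis above, each local field should be copied to its own builder field. The sketch below is self-contained. `ProtoBuilder` is a hypothetical stand-in for the generated protobuf builder, and `String` replaces the `Resource` type. The `setConfiguredMinCapacity`/`setConfiguredMaxCapacity` setter names are assumed to mirror the real builder's.

```java
// Hypothetical stand-in for the generated protobuf builder; String replaces
// the Resource proto type to keep the sketch self-contained.
final class ProtoBuilder {
  String effectiveMinCapacity;
  String effectiveMaxCapacity;
  String configuredMinCapacity;
  String configuredMaxCapacity;

  ProtoBuilder setEffectiveMinCapacity(String v) { effectiveMinCapacity = v; return this; }
  ProtoBuilder setEffectiveMaxCapacity(String v) { effectiveMaxCapacity = v; return this; }
  ProtoBuilder setConfiguredMinCapacity(String v) { configuredMinCapacity = v; return this; }
  ProtoBuilder setConfiguredMaxCapacity(String v) { configuredMaxCapacity = v; return this; }
}

final class QueueConfigurationsSketch {
  final ProtoBuilder builder = new ProtoBuilder();
  String effMinResource;
  String effMaxResource;
  String configuredMinResource;
  String configuredMaxResource;

  // Each local field goes to its own builder field, so the configured values
  // no longer overwrite the effective ones.
  void mergeLocalToBuilder() {
    if (effMinResource != null) {
      builder.setEffectiveMinCapacity(effMinResource);
    }
    if (effMaxResource != null) {
      builder.setEffectiveMaxCapacity(effMaxResource);
    }
    if (configuredMinResource != null) {
      builder.setConfiguredMinCapacity(configuredMinResource);
    }
    if (configuredMaxResource != null) {
      builder.setConfiguredMaxCapacity(configuredMaxResource);
    }
  }
}
```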




[jira] [Updated] (YARN-11134) Support getNodeToLabels API in FederationClientInterceptor

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11134:
--
Fix Version/s: (was: 3.4.0)

> Support getNodeToLabels API in FederationClientInterceptor
> --
>
> Key: YARN-11134
> URL: https://issues.apache.org/jira/browse/YARN-11134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
> Attachments: YARN-11134.01.patch
>
>
> Node labels are a very important YARN capability, and they are equally 
> important for YARN Federation.
> This patch completes the getNodeToLabels method.
> Work on this JIRA will continue in YARN-10465.
>  
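The following is a minimal sketch of how a router could merge per-sub-cluster getNodeToLabels answers into one response. It assumes each answer is a `Map<String, Set<String>>` from node ID to label names. `NodeToLabelsMerger` is hypothetical, and the real response types use YARN records rather than plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NodeToLabelsMerger {

  private NodeToLabelsMerger() {
  }

  // Merge the node-to-labels maps returned by each sub-cluster into one map.
  // Node IDs are expected to be unique across sub-clusters; if one repeats,
  // its label sets are unioned rather than one silently replacing the other.
  static Map<String, Set<String>> merge(List<Map<String, Set<String>>> perSubCluster) {
    Map<String, Set<String>> merged = new HashMap<>();
    for (Map<String, Set<String>> subClusterMap : perSubCluster) {
      for (Map.Entry<String, Set<String>> e : subClusterMap.entrySet()) {
        merged.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
      }
    }
    return merged;
  }
}
```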




[jira] [Updated] (YARN-11137) Improve log message in FederationClientInterceptor

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11137:
--
Target Version/s: 3.4.0

> Improve log message in FederationClientInterceptor
> --
>
> Key: YARN-11137
> URL: https://issues.apache.org/jira/browse/YARN-11137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-11137.01.patch, YARN-11137.02.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> While reading the yarn-federation-router code, I found the following issue 
> with logging in FederationClientInterceptor:
> the log calls are inconsistent. Some build the message by string 
> concatenation, and some use placeholders, as follows:
> org.apache.hadoop.yarn.server.router.clientrmsubmit.FederationClientInterceptor#getNewApplication
> {code:java}
> for (int i = 0; i < numSubmitRetries; ++i) {
>   SubClusterId subClusterId =
>       getRandomActiveSubCluster(subClustersActive);
>   LOG.debug(
>       "getNewApplication try #{} on SubCluster {}", i, subClusterId);
>   ApplicationClientProtocol clientRMProxy =
>       getClientRMProxyForSubCluster(subClusterId);
>   ...
> }{code}
> org.apache.hadoop.yarn.server.router.clientrmsubmit.FederationClientInterceptor#submitApplication
> {code:java}
> for (int i = 0; i < numSubmitRetries; ++i) {
>   SubClusterId subClusterId = policyFacade.getHomeSubcluster(
>       request.getApplicationSubmissionContext(), blacklist);
>   LOG.info("submitApplication appId" + applicationId + " try #" + i
>       + " on SubCluster " + subClusterId);
>   ...
> } {code}
> I think the first (placeholder) style is better.
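The placeholder style can be sketched as follows. The example needs to run without SLF4J on the classpath, so a hypothetical `PlaceholderLogSketch.format` helper imitates SLF4J's `{}` substitution. With SLF4J itself, the call would be `LOG.info("submitApplication appId {} try #{} on SubCluster {}", applicationId, i, subClusterId);`.

```java
final class PlaceholderLogSketch {

  private PlaceholderLogSketch() {
  }

  // Minimal stand-in for SLF4J-style "{}" substitution: each "{}" is replaced
  // by the next argument, left to right; extra "{}" markers are kept as-is.
  static String format(String pattern, Object... args) {
    StringBuilder sb = new StringBuilder();
    int argIndex = 0;
    int from = 0;
    int at;
    while ((at = pattern.indexOf("{}", from)) >= 0 && argIndex < args.length) {
      sb.append(pattern, from, at).append(args[argIndex++]);
      from = at + 2;
    }
    sb.append(pattern.substring(from));
    return sb.toString();
  }
}
```

Placeholders also avoid the missing space after `appId` in the concatenated version. In addition, SLF4J only formats the message when the log level is enabled.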




[jira] [Resolved] (YARN-11138) TestRouterWebServicesREST Junit Test Error Fix

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11138.
---
Hadoop Flags:   (was: Reviewed)
  Resolution: Duplicate

> TestRouterWebServicesREST Junit Test Error Fix
> --
>
> Key: YARN-11138
> URL: https://issues.apache.org/jira/browse/YARN-11138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, test
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 28.818 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST
> [ERROR] org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST 
>  Time elapsed: 28.817 s  <<< FAILURE!
> java.lang.AssertionError: Web app not running
> at org.junit.Assert.fail(Assert.java:89)
> at 
> org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.waitWebAppRunning(TestRouterWebServicesREST.java:199)
> at 
> org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.setUp(TestRouterWebServicesREST.java:217)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)




[jira] [Updated] (YARN-11138) TestRouterWebServicesREST Junit Test Error Fix

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11138:
--
Fix Version/s: (was: 3.4.0)

> TestRouterWebServicesREST Junit Test Error Fix
> --
>
> Key: YARN-11138
> URL: https://issues.apache.org/jira/browse/YARN-11138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, test
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>




[jira] [Reopened] (YARN-11138) TestRouterWebServicesREST Junit Test Error Fix

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reopened YARN-11138:
---

> TestRouterWebServicesREST Junit Test Error Fix
> --
>
> Key: YARN-11138
> URL: https://issues.apache.org/jira/browse/YARN-11138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, test
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>




[jira] [Updated] (YARN-11141) Capacity Scheduler does not support ambiguous queue names when moving application across queues

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11141:
--
 Target Version/s: 3.3.5, 3.4.0
Affects Version/s: 3.3.5
   3.4.0

> Capacity Scheduler does not support ambiguous queue names when moving 
> application across queues
> ---
>
> Key: YARN-11141
> URL: https://issues.apache.org/jira/browse/YARN-11141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0, 3.3.5
>Reporter: András Győri
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> CapacityScheduler#moveApplication cannot resolve ambiguous queue names 
> because it uses the queue name instead of the queue path.
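The difference between resolving by name and by path can be sketched as below. The sketch assumes queues are identified by dot-separated full paths under `root`. `QueuePathResolver` is hypothetical and is not the Capacity Scheduler's queue manager API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class QueuePathResolver {

  private QueuePathResolver() {
  }

  // A full path is returned as-is. A short (leaf) name is accepted only if
  // exactly one queue path ends with it; otherwise it is unknown or ambiguous.
  static String resolve(String nameOrPath, Set<String> allQueuePaths) {
    if (allQueuePaths.contains(nameOrPath)) {
      return nameOrPath;
    }
    List<String> matches = new ArrayList<>();
    for (String path : allQueuePaths) {
      if (path.endsWith("." + nameOrPath)) {
        matches.add(path);
      }
    }
    if (matches.isEmpty()) {
      throw new IllegalArgumentException("Unknown queue: " + nameOrPath);
    }
    if (matches.size() > 1) {
      throw new IllegalArgumentException(
          "Ambiguous queue name " + nameOrPath + ": " + matches);
    }
    return matches.get(0);
  }
}
```

With queues `root.a.default` and `root.b.default`, the short name `default` cannot identify a target. Only the full path can, which is why moving an application needs the path.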




[jira] [Updated] (YARN-11140) Support getClusterNodeLabels API in FederationClientInterceptor

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11140:
--
Fix Version/s: (was: 3.4.0)

> Support getClusterNodeLabels API in FederationClientInterceptor
> ---
>
> Key: YARN-11140
> URL: https://issues.apache.org/jira/browse/YARN-11140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> *getClusterNodeLabels* is used by clients to get the node labels of the 
> cluster. It is a basic, commonly used method and should be implemented in 
> YARN Federation.
> This JIRA will be linked directly to YARN-10465, and a PR implementing the 
> feature will be submitted under YARN-10465.
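The following is a minimal sketch of how the router might combine getClusterNodeLabels results by taking the union of the label names each sub-cluster reports. `ClusterNodeLabelsMerger` is hypothetical, and plain strings stand in for the real node label records.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class ClusterNodeLabelsMerger {

  private ClusterNodeLabelsMerger() {
  }

  // Union of the label names reported by every sub-cluster. TreeSet removes
  // duplicates and gives the client a stable, sorted order.
  static Set<String> merge(Collection<? extends Collection<String>> perSubCluster) {
    Set<String> all = new TreeSet<>();
    for (Collection<String> labels : perSubCluster) {
      all.addAll(labels);
    }
    return all;
  }
}
```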




[jira] [Updated] (YARN-11138) TestRouterWebServicesREST Junit Test Error Fix

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11138:
--
  Component/s: federation
   test
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> TestRouterWebServicesREST Junit Test Error Fix
> --
>
> Key: YARN-11138
> URL: https://issues.apache.org/jira/browse/YARN-11138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, test
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
> Fix For: 3.4.0
>
>




[jira] [Updated] (YARN-11142) Remove unused Imports in Hadoop YARN project

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11142:
--
  Component/s: yarn
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove unused Imports in Hadoop YARN project
> 
>
> Key: YARN-11142
> URL: https://issues.apache.org/jira/browse/YARN-11142
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. Optimize Imports to keep code clean
>  # Remove any unused imports




[jira] [Updated] (YARN-11147) ResourceUsage and QueueCapacities classes provide node label iterators that are not thread safe

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11147:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> ResourceUsage and QueueCapacities classes provide node label iterators that 
> are not thread safe
> ---
>
> Key: YARN-11147
> URL: https://issues.apache.org/jira/browse/YARN-11147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: András Győri
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> AbstractResourceUsage#getNodePartitionsSet and 
> QueueCapacities#getNodePartitionsSet return keySet, a live view of the 
> HashMap's keys that changes whenever the map changes. Iterating over it while 
> another thread modifies the map results in a 
> ConcurrentModificationException, as the following stack trace shows:
> {code:java}
> 2022-04-28 13:21:53,692 FATAL org.apache.hadoop.yarn.event.EventDispatcher: 
> Error in handling event type NODE_LABELS_UPDATE to the Event Dispatcher
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
>     at com.google.common.collect.Sets$1$1.computeNext(Sets.java:758)
>     at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
>     at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:236)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:1281)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:2115)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1900)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:169)
>     at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}
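One way to avoid the race is to hand callers a snapshot rather than the live view. The sketch assumes a hypothetical `PartitionUsage` class guarded by a `ReentrantReadWriteLock`. It is not the actual AbstractResourceUsage code. Copying costs a small allocation per call but makes iteration safe.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class PartitionUsage {
  private final Map<String, Long> usageByPartition = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void setUsage(String partition, long value) {
    lock.writeLock().lock();
    try {
      usageByPartition.put(partition, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Returns a copy taken under the read lock instead of the live keySet()
  // view, so callers can iterate it while the map is being modified.
  Set<String> getNodePartitionsSet() {
    lock.readLock().lock();
    try {
      return new HashSet<>(usageByPartition.keySet());
    } finally {
      lock.readLock().unlock();
    }
  }
}
```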




[jira] [Updated] (YARN-11152) QueueMetrics is leaking memory when creating a new queue during reinitialisation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11152:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> QueueMetrics is leaking memory when creating a new queue during 
> reinitialisation
> 
>
> Key: YARN-11152
> URL: https://issues.apache.org/jira/browse/YARN-11152
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: András Győri
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Capacity Scheduler handles reinitialisation by reparsing the entire queue 
> hierarchy, then reinitialising the old queue hierarchy by taking the newly 
> parsed queues into account. After this, the newly parsed queues are discarded 
> and garbage-collected.
> However, since YARN-6492, QueueMetrics stores a parent queue. This is 
> problematic because, at that point, the parent queue reference could still 
> point to a newly parsed parent queue (which should be discarded after the 
> reinitialisation). As a result, QueueMetrics could hold parent members of an 
> entirely different queue hierarchy than the one currently in use. This can 
> lead to subtle problems as well as a memory leak, because a single parent 
> reference keeps the whole queue hierarchy alive.
> The problem arose when we programmatically added queues one after another 
> via the mutation API. This kept hundreds of queue hierarchies alive at the 
> same time, crippling the GC and the whole RM.
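The following is a heavily simplified model of the problem and of one possible remedy, offered as an assumption rather than as the committed fix. When a newly parsed queue is adopted into the live hierarchy, its metrics' parent reference is re-pointed at the live parent. As a result, nothing reachable from the live hierarchy refers to the discarded one. `SimQueue`, `SimMetrics` and `ReinitSketch` are hypothetical.

```java
// Hypothetical metrics object: it holds a strong reference to a parent queue,
// so a parent from a discarded hierarchy keeps that whole hierarchy reachable.
final class SimMetrics {
  SimQueue parentQueue;
}

// Hypothetical, heavily simplified queue.
final class SimQueue {
  final String path;
  SimQueue parent;
  final SimMetrics metrics = new SimMetrics();

  SimQueue(String path, SimQueue parent) {
    this.path = path;
    this.parent = parent;
    this.metrics.parentQueue = parent;
  }
}

final class ReinitSketch {

  private ReinitSketch() {
  }

  // Adopt a newly parsed child into the live hierarchy: update both the
  // queue's parent and its metrics' parent, so neither still points into the
  // freshly parsed (and soon discarded) hierarchy.
  static void adoptNewChild(SimQueue liveParent, SimQueue parsedChild) {
    parsedChild.parent = liveParent;
    parsedChild.metrics.parentQueue = liveParent;
  }
}
```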




[jira] [Updated] (YARN-11153) Make proxy server support YARN federation.

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11153:
--
Target Version/s: 3.4.0

> Make proxy server support YARN federation.
> --
>
> Key: YARN-11153
> URL: https://issues.apache.org/jira/browse/YARN-11153
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.2.1
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10775-design-doc.001.pdf
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> I set up a YARN federation cluster. I cannot connect to the web UI of a 
> running application, but the web UIs of completed and accepted applications 
> work.
> I think this needs two steps: 
> (a) YARN-11153: make the proxy server support federation.
> (b) YARN-11154: make the router support the proxy server.
> The problem is not difficult, but it is not easy to describe, so I have 
> submitted a document, YARN-10775-design-doc.001.pdf, to explain it.
>  
> If the standalone proxy server is enabled, step (a) solves the problem.
> If the standalone proxy server is disabled, then after steps (a) and (b) the 
> router acts as the web proxy server, which hides the cluster details from the 
> client. I think this is reasonable.
>  




[jira] [Updated] (YARN-11160) Support getResourceProfiles, getResourceProfile API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11160:
--
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support getResourceProfiles, getResourceProfile API's for Federation
> 
>
> Key: YARN-11160
> URL: https://issues.apache.org/jira/browse/YARN-11160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Support getResourceProfiles, getResourceProfile API's for Federation.




[jira] [Updated] (YARN-11158) Support getDelegationToken, renewDelegationToken, cancelDelegationToken API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11158:
--
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support getDelegationToken, renewDelegationToken, cancelDelegationToken API's 
> for Federation
> 
>
> Key: YARN-11158
> URL: https://issues.apache.org/jira/browse/YARN-11158
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11159) Support failApplicationAttempt, updateApplicationPriority, updateApplicationTimeouts API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11159:
--
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support failApplicationAttempt, updateApplicationPriority, 
> updateApplicationTimeouts API's for Federation
> -
>
> Key: YARN-11159
> URL: https://issues.apache.org/jira/browse/YARN-11159
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Support failApplicationAttempt, updateApplicationPriority, 
> updateApplicationTimeouts API's for Federation




[jira] [Updated] (YARN-11161) Support getAttributesToNodes, getClusterNodeAttributes, getNodesToAttributes API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11161:
--
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support getAttributesToNodes, getClusterNodeAttributes, getNodesToAttributes 
> API's for Federation
> -
>
> Key: YARN-11161
> URL: https://issues.apache.org/jira/browse/YARN-11161
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Support getAttributesToNodes, getClusterNodeAttributes, getNodesToAttributes 
> API's for Federation.




[jira] [Updated] (YARN-11162) Set the zk acl for nodes created by ZKConfigurationStore.

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11162:
--
Target Version/s: 3.3.4, 2.10.2, 3.4.0

> Set the zk acl for nodes created by ZKConfigurationStore.
> -
>
> Key: YARN-11162
> URL: https://issues.apache.org/jira/browse/YARN-11162
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.10.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11167) impove import * In YARN Project

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11167:
--
Fix Version/s: (was: 3.4.0)

> impove import * In YARN Project
> ---
>
> Key: YARN-11167
> URL: https://issues.apache.org/jira/browse/YARN-11167
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Using wildcard (*) imports does not conform to the code style; replace them 
> with explicit imports of the specific classes that are used.
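A small illustration of the change requested above; ExplicitImportsExample is a hypothetical class, and java.util stands in for whichever package was imported with a wildcard.

```java
// Before: a wildcard import pulls in every public type of the package.
// import java.util.*;

// After: import only the types the class actually uses.
import java.util.ArrayList;
import java.util.List;

final class ExplicitImportsExample {
    static List<String> queueNames() {
        List<String> names = new ArrayList<>();
        names.add("root.default");
        return names;
    }
}
```

Explicit imports make each dependency visible at the top of the file and avoid ambiguity when two packages export a type with the same simple name.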




[jira] [Updated] (YARN-11169) Support moveApplicationAcrossQueues, getQueueInfo API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11169:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> Support moveApplicationAcrossQueues, getQueueInfo API's for Federation
> --
>
> Key: YARN-11169
> URL: https://issues.apache.org/jira/browse/YARN-11169
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Support moveApplicationAcrossQueues, getQueueInfo API's for Federation.




[jira] [Updated] (YARN-11176) Refactor TestAggregatedLogDeletionService

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11176:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Refactor TestAggregatedLogDeletionService
> -
>
> Key: YARN-11176
> URL: https://issues.apache.org/jira/browse/YARN-11176
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The code of TestAggregatedLogDeletionService is quite messy.
> Some refactor could be performed on this code to make it more readable and 
> easier to understand.




[jira] [Updated] (YARN-11172) Fix testDelegationToken

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11172:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0

> Fix testDelegationToken
> ---
>
> Key: YARN-11172
> URL: https://issues.apache.org/jira/browse/YARN-11172
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.3.5
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The unit test fails after HDFS-16563, which blocks other PRs.
> {code:java}
> [ERROR] 
> testDelegationToken(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 17.379 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testDelegationToken(TestClientRMTokens.java:207)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  {code}




[jira] [Updated] (YARN-11169) Support moveApplicationAcrossQueues, getQueueInfo API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11169:
--
Component/s: federation

> Support moveApplicationAcrossQueues, getQueueInfo API's for Federation
> --
>
> Key: YARN-11169
> URL: https://issues.apache.org/jira/browse/YARN-11169
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Support moveApplicationAcrossQueues, getQueueInfo API's for Federation.




[jira] [Updated] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11177:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support getNewReservation, submitReservation, updateReservation, 
> deleteReservation API's for Federation
> ---
>
> Key: YARN-11177
> URL: https://issues.apache.org/jira/browse/YARN-11177
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11180) Refactor some code of getNewApplication, submitApplication, forceKillApplication, getApplicationReport

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11180:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Refactor some code of getNewApplication, submitApplication, 
> forceKillApplication, getApplicationReport
> --
>
> Key: YARN-11180
> URL: https://issues.apache.org/jira/browse/YARN-11180
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> *1) FederationClientInterceptor#getNewApplication*
> 1. Add a check for an empty (null) request.
> 2. Use RouterServerUtil.logAndThrowException instead of throwing 
> YarnRuntimeException.
> *2) 
> FederationClientInterceptor#submitApplication/forceKillApplication/getApplicationReport/getApplications*
> 1. Use RouterServerUtil.logAndThrowException instead of throwing 
> YarnRuntimeException.
> 2. Use String.format instead of string concatenation (+).
> 3. Fix code style.
> *3) FederationClientInterceptor#getClusterMetrics*
> 1. Add a check for an empty (null) request.
> *4) 
> FederationClientInterceptor#getClusterNodes/getQueueUserAcls/listReservations/getNodeToLabels/getLabelsToNodes/getClusterNodeLabels*
> 1. Use RouterServerUtil.logAndThrowException instead of throwing 
> YarnRuntimeException.
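A minimal sketch of the pattern in the list above (null check first, log and throw a checked exception, build messages with String.format). The real RouterServerUtil.logAndThrowException signature is not shown in this issue, so logAndThrow, SketchYarnException and InterceptorChecksSketch are hypothetical stand-ins.

```java
import java.util.logging.Logger;

// Hypothetical checked exception standing in for YarnException.
final class SketchYarnException extends Exception {
    SketchYarnException(String message) {
        super(message);
    }
}

final class InterceptorChecksSketch {
    private static final Logger LOG =
        Logger.getLogger(InterceptorChecksSketch.class.getName());

    // Hypothetical helper in the spirit of RouterServerUtil.logAndThrowException:
    // log the message, then throw a checked exception rather than a YarnRuntimeException.
    static void logAndThrow(String message) throws SketchYarnException {
        LOG.severe(message);
        throw new SketchYarnException(message);
    }

    // Null check first; the result message is built with String.format instead of '+'.
    static String getClusterMetrics(Object request, String subClusterId)
        throws SketchYarnException {
        if (request == null) {
            logAndThrow("Missing getClusterMetrics request.");
        }
        return String.format("Got cluster metrics from SubCluster %s.", subClusterId);
    }
}
```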




[jira] [Updated] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11177:
--
Component/s: federation

> Support getNewReservation, submitReservation, updateReservation, 
> deleteReservation API's for Federation
> ---
>
> Key: YARN-11177
> URL: https://issues.apache.org/jira/browse/YARN-11177
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11182) Refactor TestAggregatedLogDeletionService: 2nd phase

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11182:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Refactor TestAggregatedLogDeletionService: 2nd phase
> 
>
> Key: YARN-11182
> URL: https://issues.apache.org/jira/browse/YARN-11182
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The code of TestAggregatedLogDeletionService is quite messy.
> YARN-11176 already performed a significant refactor.
> Some further refactoring could be performed on this file so that new tests 
> can be defined easily, without copying ~100-200 lines of code per testcase.




[jira] [Updated] (YARN-11185) Pending app metrics are increased doubly when a queue reaches its max-parallel-apps limit

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11185:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Pending app metrics are increased doubly when a queue reaches its 
> max-parallel-apps limit
> -
>
> Key: YARN-11185
> URL: https://issues.apache.org/jira/browse/YARN-11185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: András Győri
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When an application is submitted to a queue, its pending app metric is 
> increased even if the queue has reached its max-parallel-apps limit. If the 
> application is later allowed to run because some other application has 
> finished, it is submitted to the queue again, which increases the pending app 
> queue and user metrics a second time. When the application finishes, it can 
> only decrease the pending metric by one, so the pending app metric increases 
> monotonically, whereas it should eventually return to 0.
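A minimal sketch of the double counting described above, assuming a simplified model in which a held-back app calls submit twice but leaves the pending state only once. PendingAppMetricSketch is hypothetical, and deduplicating by app ID is just one way to keep the metric balanced, not necessarily the committed fix.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of a queue's pending-app metric.
final class PendingAppMetricSketch {
    private final boolean deduplicate;
    private final Set<String> countedAsPending = new HashSet<>();
    private int pendingApps;

    PendingAppMetricSketch(boolean deduplicate) {
        this.deduplicate = deduplicate;
    }

    // Called on the first submission and again when an app that was held back
    // by max-parallel-apps is re-submitted after another app finishes.
    void submit(String appId) {
        // Without deduplication every call increments the metric (the reported bug).
        if (!deduplicate || countedAsPending.add(appId)) {
            pendingApps++;
        }
    }

    // Called exactly once when the app stops being pending.
    void leavePending(String appId) {
        countedAsPending.remove(appId);
        pendingApps--;
    }

    int pendingApps() {
        return pendingApps;
    }
}
```

With deduplication disabled, submitting the same app twice and then calling leavePending once leaves the metric at 1; with it enabled, the metric returns to 0.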




[jira] [Updated] (YARN-11187) Remove WhiteBox in yarn module.

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11187:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0

> Remove WhiteBox in yarn module.
> ---
>
> Key: YARN-11187
> URL: https://issues.apache.org/jira/browse/YARN-11187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11188) Only files belong to the first file controller are removed even if multiple log aggregation file controllers are configured

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11188:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Only files belong to the first file controller are removed even if multiple 
> log aggregation file controllers are configured
> ---
>
> Key: YARN-11188
> URL: https://issues.apache.org/jira/browse/YARN-11188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Log aggregation can be configured to have a comma-separated list of file 
> controllers.
> The current behaviour only removes files that belong to the first file 
> controller.
> This can be problematic. 
> For example, if a user configures IFile as the file controller and later 
> changes the setting to specify multiple file controllers (e.g. 
> value = TFile,IFile), only the first controller is considered and only the 
> files belonging to it are removed: in this case, files written by the TFile 
> controller are removed, while files created by the IFile controller are kept.
> This behaviour should be changed so that the files of all configured file 
> controllers are removed when multiple file controllers are enabled.
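A minimal sketch of the difference between the current and the desired behaviour, plus the retention cutoff arithmetic from the code path walkthrough that follows (retentionMillis = retentionSecs * 1000, cutOffMillis = now - retentionMillis). LogDeletionSketch and its methods are hypothetical; this is not the AggregatedLogDeletionService code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual AggregatedLogDeletionService code.
final class LogDeletionSketch {
    // Splits a comma-separated controller list such as "TFile,IFile".
    static List<String> parseControllers(String configValue) {
        List<String> names = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // Reported behaviour: only the first configured controller is cleaned up.
    static List<String> controllersCleanedBeforeFix(String configValue) {
        List<String> all = parseControllers(configValue);
        return all.isEmpty() ? all : new ArrayList<>(all.subList(0, 1));
    }

    // Desired behaviour: every configured controller is cleaned up.
    static List<String> controllersToClean(String configValue) {
        return parseControllers(configValue);
    }

    // Files older than this timestamp are eligible for deletion.
    static long cutOffMillis(long nowMillis, long retentionSecs) {
        return nowMillis - retentionSecs * 1000L;
    }
}
```

For "TFile,IFile", controllersCleanedBeforeFix returns only TFile, which matches the symptom above where the IFile files are kept.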
> h2. CODE PATH
> 
> 1. 
> [AggregatedLogDeletionService$LogDeletionTask#run|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L82-L108]:
>  
> Let's understand what this method does.
> 1.1 An important bit is to see how the value of the field called 
> 'retentionMillis' is set. In the constructor of LogDeletionTask, there's an 
> incoming parameter called 'retentionSecs' that is just multiplied by 1000 to 
> have a millisecond value.
> Let's see where 'retentionSecs' is coming from.
> 1.2 
> [AggregatedLogDeletionService#scheduleLogDeletionTask|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L258-L283]
>  that sets the value of retentionSecs.
> The config key for this value is 'yarn.log-aggregation.retain-seconds'.
> The javadoc says: "How long to wait before deleting aggregated logs, -1 
> disables. Be careful set this too small and you will spam the name node."
> 1.3 Going back to 
> [https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L82-L108],
>  the 'cutOffMillis' value is computed by getting the current time in millis 
> minus the retentionMillis.
> 1.4 The main point of this method is to iterate over the files in the remote 
> root log dir (field called 'remoteRootLogDir') and to check if it is a 
> directory. If so, a new Path is created with that particular directory ([code 
> link|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L90-L96]).
> One more important thing to mention: there's a field called 'suffix' that is 
> added to the remote root log dir path.
> Let's check how the 'remoteRootLogDir' and 'suffix' fields get their values, as 
> this is crucial to understanding how the log dirs are deleted.
> 1.5 remoteRootLogDir is set in the constructor of LogDeletionTask, 
> [here|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java#L77].
> The value is returned by calling fileController.getRemoteRootLogDir().
> The LogAggregationFileControllerFactory creates the instance of 
> LogAggregationFileController.
> 
> *The process of determining the log aggregation file controller is quite 
> messy; let me describe it in detail.*
> *There are 2 types of file controllers: LogAggregationIndexedFileController 
> and LogAggregationTFileController*
> *There's a testcase called 
> [TestLogAggregationFileControllerFactory#testLogAggregationFileControllerFactory|#testLogAggregationFileControllerFactory]
>  that shows how the LogAggregationFileControllerFactory is configured.*
> 2.1

[jira] [Updated] (YARN-11187) Remove WhiteBox in yarn module.

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11187:
--
Component/s: test

> Remove WhiteBox in yarn module.
> ---
>
> Key: YARN-11187
> URL: https://issues.apache.org/jira/browse/YARN-11187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11192) TestRouterWebServicesREST failing after YARN-9827

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11192:
--
Target Version/s: 3.4.0

> TestRouterWebServicesREST failing after YARN-9827
> -
>
> Key: YARN-11192
> URL: https://issues.apache.org/jira/browse/YARN-11192
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> YARN-9827 made the following change:
> {code:java}
> GenericExceptionHandler should respond with SERVICE_UNAVAILABLE in case of 
> connection and service unavailable exception instead of 
> INTERNAL_SERVICE_ERROR. {code}
> This modification caused all of YARN Federation's TestRouterWebServicesREST 
> unit tests to fail
> {code:java}
> [ERROR] Tests run: 201, Failures: 15, Errors: 0, Skipped: 0, Flakes: 2
> .
> [ERROR] 
> org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppStateXML(org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST)
> [ERROR]   Run 1: TestRouterWebServicesREST.testUpdateAppStateXML:774 
> expected:<500> but was:<503>
> [ERROR]   Run 2: TestRouterWebServicesREST.testUpdateAppStateXML:774 
> expected:<500> but was:<503>
> [ERROR]   Run 3: TestRouterWebServicesREST.testUpdateAppStateXML:774 
> expected:<500> but was:<503> {code}
> Report-URL:
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4464/5/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt
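A minimal sketch of the status-code change that explains the expected:<500> but was:<503> failures above. StatusMappingSketch is hypothetical, and only java.net.ConnectException is modelled here; the real GenericExceptionHandler also handles service-unavailable exceptions and many other cases.

```java
import java.net.ConnectException;

// Hypothetical sketch of the mapping described in YARN-9827; not the actual
// GenericExceptionHandler code.
final class StatusMappingSketch {
    static final int INTERNAL_SERVER_ERROR = 500;
    static final int SERVICE_UNAVAILABLE = 503;

    static int statusFor(Throwable t) {
        // Connection failures now map to 503 instead of 500, so tests that
        // still expect 500 for this case start failing.
        if (t instanceof ConnectException) {
            return SERVICE_UNAVAILABLE;
        }
        return INTERNAL_SERVER_ERROR;
    }
}
```

Under this reading, the Router tests need to expect 503 for the connection-failure cases.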




[jira] [Updated] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11196:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> NUMA Awareness support in DefaultContainerExecutor
> --
>
> Key: YARN-11196
> URL: https://issues.apache.org/jira/browse/YARN-11196
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.3.3
>Reporter: Prabhu Joseph
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> [YARN-5764|https://issues.apache.org/jira/browse/YARN-5764] added support 
> for NUMA awareness for containers launched through LinuxContainerExecutor. 
> This feature would be useful to have in DefaultContainerExecutor as well.
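A minimal sketch of one way NUMA binding can be applied to a container launch, assuming the executor prefixes the launch command with numactl. NumaCommandSketch is hypothetical and is not the YARN-5764 or YARN-11196 implementation. It uses the standard numactl options --cpunodebind and --membind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the YARN-5764 or YARN-11196 implementation.
final class NumaCommandSketch {
    // Prefixes a launch command with numactl so that the process runs on the
    // CPUs of one NUMA node and allocates memory only from that node.
    static List<String> wrapWithNumactl(int numaNode, List<String> launchCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("--cpunodebind=" + numaNode);
        cmd.add("--membind=" + numaNode);
        cmd.addAll(launchCommand);
        return cmd;
    }
}
```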




[jira] [Updated] (YARN-11198) Deletion of assigned resources (e.g. GPU's, NUMA, FPGA's) from State Store

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11198:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> Deletion of assigned resources (e.g. GPU's, NUMA, FPGA's) from State Store
> --
>
> Key: YARN-11198
> URL: https://issues.apache.org/jira/browse/YARN-11198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.3
>Reporter: Prabhu Joseph
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> [YARN-7033|https://issues.apache.org/jira/browse/YARN-7033] added support 
> for recovering the resources assigned to a container, but did not delete them 
> from the State Store when the container is removed after the configured 
> duration, yarn.nodemanager.duration-to-track-stopped-containers.
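A minimal sketch of the missing cleanup, assuming a key-value state store in which all records of a container share a common key prefix. ContainerStateStoreSketch and the "containers/<id>/..." key layout are hypothetical; the real NodeManager state store layout differs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the NodeManager state store; the key
// layout "containers/<id>/..." is illustrative only.
final class ContainerStateStoreSketch {
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String value) {
        store.put(key, value);
    }

    // Removes every record under the container's prefix, including
    // assigned-resource entries (e.g. GPU, FPGA, NUMA), not only the core
    // container records.
    void removeContainer(String containerId) {
        String prefix = "containers/" + containerId + "/";
        store.keySet().removeIf(key -> key.startsWith(prefix));
    }

    int size() {
        return store.size();
    }
}
```

The trailing "/" in the prefix keeps the removal of container c1 from also deleting the records of a container such as c10.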




[jira] [Updated] (YARN-11190) CS Mapping rule bug: User matcher does not work correctly for usernames with dot

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11190:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> CS Mapping rule bug: User matcher does not work correctly for usernames with 
> dot
> 
>
> Key: YARN-11190
> URL: https://issues.apache.org/jira/browse/YARN-11190
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: testUserNameSetDefaultAndPlaceWith2Rules.log, 
> testUserNameSetDefaultAndPlaceWith2RulesUsernameReplacedWithDot.log, 
> testcases.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Given the following scenario, the placement engine does not work as expected.
> A user with a '.' (dot) inside his/her username submits a job.
> Let the username be "test.user".
> There are 2 mapping rules: 
> 1. The matcher matches the user with name "test.user" and has an associated 
> mapping rule action that sets the default queue to "root.user".
> 2. The second mapping rule matches the same user ("test.user") and places the 
> application in the default queue.
> *Expectation:*
> When the user with username "test.user" submits a job, the application will 
> be placed in queue "root.user".
> *Observed behaviour:* 
> The application is placed in test_dot_user.
> This means that the dot is replaced with "_dot_" too early, so the default 
> queue is set incorrectly.
>  
> I have attached a patch file that demonstrates this behaviour with 2 new 
> testcases along with the logs of these testcases.
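A minimal sketch of why the order of escaping matters, under the assumption that the user matcher compares the rule's username against the submitting user's name. If the '.' to "_dot_" replacement has already been applied, "test_dot_user" no longer equals "test.user". UserMatcherSketch is hypothetical and does not reproduce the real placement-engine code.

```java
// Hypothetical sketch; not the actual CS placement-engine code.
final class UserMatcherSketch {
    // Escaping applied when a username is used as a queue-name segment,
    // since '.' is the queue-path separator.
    static String escapeForQueueName(String user) {
        return user.replace(".", "_dot_");
    }

    // A user matcher should compare the rule's user with the raw username.
    static boolean matchesUser(String ruleUser, String submittingUser) {
        return ruleUser.equals(submittingUser);
    }
}
```

In this sketch, matching against the raw name succeeds, while matching against the escaped name fails. Escaping therefore belongs only at the point where the username becomes part of a queue path.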




[jira] [Updated] (YARN-11198) Deletion of assigned resources (e.g. GPU's, NUMA, FPGA's) from State Store

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11198:
--
Component/s: nodemanager

> Deletion of assigned resources (e.g. GPU's, NUMA, FPGA's) from State Store
> --
>
> Key: YARN-11198
> URL: https://issues.apache.org/jira/browse/YARN-11198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.3
>Reporter: Prabhu Joseph
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> [YARN-7033|https://issues.apache.org/jira/browse/YARN-7033] provided support 
> for recovering the resources assigned to a container, but it did not delete 
> them from the State Store when the container was removed after the configured 
> duration yarn.nodemanager.duration-to-track-stopped-containers.
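A minimal sketch of the missing cleanup, assuming a hypothetical key layout in which every record of a container, including its assigned resources, shares the prefix `containers/<containerId>/`. This is an in-memory TreeMap model, not the NodeManager's actual LevelDB state store or its real key names. Deleting by prefix means that resource records such as GPU or FPGA assignments are removed together with the container.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of a NodeManager state store (not the real
 * implementation). All records of one container share the key prefix
 * "containers/<containerId>/".
 */
public class StateStoreSketch {
    private final TreeMap<String, String> db = new TreeMap<>();

    void storeContainer(String containerId, String request) {
        db.put(prefix(containerId) + "request", request);
    }

    // Assumed key suffix for the assigned resources of one type, e.g. "gpu".
    void storeAssignedResources(String containerId, String type, String resources) {
        db.put(prefix(containerId) + "assignedResources_" + type, resources);
    }

    // Removing a container deletes every key under its prefix, so the
    // assigned-resource entries go away together with the container record.
    void removeContainer(String containerId) {
        String p = prefix(containerId);
        // Keys starting with p are contiguous in sorted order, beginning at p.
        Iterator<String> it = db.tailMap(p, true).keySet().iterator();
        while (it.hasNext()) {
            if (!it.next().startsWith(p)) {
                break;
            }
            it.remove(); // removal through the view also removes from db
        }
    }

    List<String> keys() {
        return new ArrayList<>(db.keySet());
    }

    // The trailing "/" keeps "c1" from matching "c10".
    private static String prefix(String containerId) {
        return "containers/" + containerId + "/";
    }
}
```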




[jira] [Updated] (YARN-11203) Fix typo in hadoop-yarn-server-router module

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11203:
--
Target Version/s: 3.4.0

> Fix typo in hadoop-yarn-server-router module
> 
>
> Key: YARN-11203
> URL: https://issues.apache.org/jira/browse/YARN-11203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Fix typo in hadoop-yarn-server-router module.




[jira] [Updated] (YARN-11210) Fix YARN RMAdminCLI retry logic for non-retryable kerberos configuration exception

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11210:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix YARN RMAdminCLI retry logic for non-retryable kerberos configuration 
> exception
> --
>
> Key: YARN-11210
> URL: https://issues.apache.org/jira/browse/YARN-11210
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.4.0
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> h2. Description of Problem
> Applications which call YARN RMAdminCLI (i.e. YARN ResourceManager client) 
> synchronously can be blocked for up to 15 minutes with the default 
> configuration of "yarn.resourcemanager.connect.max-wait.ms". This is not an 
> issue in and of itself, but a non-retryable IllegalArgumentException thrown 
> within the YARN ResourceManager client is being swallowed and treated as a 
> retryable "connection exception", which means that it gets retried for 15 
> minutes.
> The purpose of this JIRA (and PR) is to modify the YARN client so that it 
> does not retry on this non-retryable exception.
> h2. Background Information
> YARN ResourceManager client treats connection exceptions as retryable & with 
> the default value of "yarn.resourcemanager.connect.max-wait.ms" will attempt 
> to connect to the ResourceManager for up to 15 minutes when facing 
> "connection exceptions". This arguably makes sense because connection 
> exceptions are in some cases transient & can be recovered from without any 
> action needed from the client. See the example below, where the YARN 
> ResourceManager client was able to recover from connection issues that 
> resulted from the ResourceManager process being down.
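The distinction the issue asks for can be sketched as a plain retry loop. This is a hypothetical illustration, not Hadoop's RetryPolicy or RMProxy code: java.net.ConnectException is treated as possibly transient and retried up to a limit, while an IllegalArgumentException, which here stands for a configuration error, is rethrown on the first attempt.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch (not Hadoop's retry API): connection failures are
 * retried, configuration errors surface immediately.
 */
public class RetrySketch {

    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long sleepMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (ConnectException e) {
                // Possibly transient (e.g. the ResourceManager is restarting): retry.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            } catch (IllegalArgumentException e) {
                // Non-retryable: waiting will not fix a bad configuration.
                throw e;
            }
        }
        throw last; // all attempts failed with connection errors
    }
}
```

Any other exception type also propagates on the first attempt; only the explicitly listed connection failure is retried.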
> {quote}> yarn rmadmin -refreshNodes
> 22/06/28 14:40:17 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8033
> 22/06/28 14:40:18 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:40:19 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:40:20 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 2 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> ...
> 22/06/28 14:40:27 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:40:28 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:40:29 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> ...
> 22/06/28 14:40:37 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:40:37 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Your endpoint configuration is wrong; For more 
> details see:  [http://wiki.apache.org/hadoop/UnsetHostnameOrPort], while 
> invoking ResourceManagerAdministrationProtocolPBClientImpl.refreshNodes over 
> null after 1 failover attempts. Trying to failover after sleeping for 41061ms.
> 22/06/28 14:41:19 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:41:20 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 1 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> ...
> 22/06/28 14:41:28 INFO ipc.Client: Retrying connect to server: 
> 0.0.0.0/0.0.0.0:8033. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 22/06/28 14:41:28 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Your endpoint configuration is wrong; For more 
> details

[jira] [Updated] (YARN-11204) Various MapReduce tests fail with NPE in AggregatedLogDeletionService.stopRMClient

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11204:
--
Component/s: log-aggregation
 (was: test)

> Various MapReduce tests fail with NPE in 
> AggregatedLogDeletionService.stopRMClient
> --
>
> Key: YARN-11204
> URL: https://issues.apache.org/jira/browse/YARN-11204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: 
> hadoop-mapreduce-project_hadoop-mapreduce-client_testlogs.txt, 
> testAllOpportunisticMaps_logs.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During testing of HADOOP-15327, I noticed that many unit tests are failing 
> in the module called 'hadoop-mapreduce-client-jobclient'.
> See this link for details: 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3259/9/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client.txt
> In case the above Jenkins link expires later, I attached the same text 
> file to this jira.
> Let's see one example: 
> org.apache.hadoop.mapred.TestMROpportunisticMaps#testAllOpportunisticMaps
> Logs are also attached.
> An example stacktrace, for reference: 
> {code}
> 2022-06-29 11:24:13,510 INFO  [Listener at 0.0.0.0/8049] 
> service.AbstractService (AbstractService.java:noteFailure(268)) - Service 
> TestMROpportunisticMaps failed in state STOPPED
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService.stopRMClient(AggregatedLogDeletionService.java:322)
>     at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService.serviceStop(AggregatedLogDeletionService.java:229)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>     at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>     at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:160)
>     at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:134)
>     at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStop(JobHistoryServer.java:203)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster$JobHistoryServerWrapper.serviceStop(MiniMRYarnCluster.java:293)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>     at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>     at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:160)
>     at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:134)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.stop(MiniMRYarnClusterAdapter.java:56)
>     at 
> org.apache.hadoop.mapred.TestMROpportunisticMaps.doTest(TestMROpportunisticMaps.java:108)
>     at 
> org.apache.hadoop.mapred.TestMROpportunisticMaps.doTest(TestMROpportunisticMaps.java:74)
>     at 
> org.apache.hadoop.mapred.TestMROpportunisticMaps.testAllOpportunisticMaps(TestMROpportunisticMaps.java:60)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at
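The stack trace in this report shows the NPE being raised in AggregatedLogDeletionService.stopRMClient during serviceStop. One common defensive pattern is sketched below under an assumption the report does not confirm: that the RM client field may still be null when the service is stopped, for example if it was never started. The class is hypothetical and uses java.io.Closeable in place of the real client type.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical sketch; not the actual AggregatedLogDeletionService code. */
public class NullSafeStopSketch {
    // Assumption: the client is only created on start, so it may be null
    // if the service is stopped without ever having been started.
    private Closeable rmClient;

    void start(Closeable client) {
        this.rmClient = client;
    }

    // Guard against the never-started case instead of dereferencing null,
    // and clear the field so a second stop is a no-op.
    void stopRMClient() throws IOException {
        if (rmClient != null) {
            rmClient.close();
            rmClient = null;
        }
    }
}
```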

[jira] [Updated] (YARN-11204) Various MapReduce tests fail with NPE in AggregatedLogDeletionService.stopRMClient

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11204:
--
  Component/s: test
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Various MapReduce tests fail with NPE in 
> AggregatedLogDeletionService.stopRMClient
> --
>
> Key: YARN-11204
> URL: https://issues.apache.org/jira/browse/YARN-11204
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: 
> hadoop-mapreduce-project_hadoop-mapreduce-client_testlogs.txt, 
> testAllOpportunisticMaps_logs.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During testing of HADOOP-15327, I noticed that many unit tests are failing 
> in the module called 'hadoop-mapreduce-client-jobclient'.
> See this link for details: 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3259/9/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client.txt
> In case the above Jenkins link expires later, I attached the same text 
> file to this jira.
> Let's see one example: 
> org.apache.hadoop.mapred.TestMROpportunisticMaps#testAllOpportunisticMaps
> Logs are also attached.
> An example stacktrace, for reference: 
> {code}
> 2022-06-29 11:24:13,510 INFO  [Listener at 0.0.0.0/8049] 
> service.AbstractService (AbstractService.java:noteFailure(268)) - Service 
> TestMROpportunisticMaps failed in state STOPPED
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService.stopRMClient(AggregatedLogDeletionService.java:322)
>     at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService.serviceStop(AggregatedLogDeletionService.java:229)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>     at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>     at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:160)
>     at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:134)
>     at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStop(JobHistoryServer.java:203)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster$JobHistoryServerWrapper.serviceStop(MiniMRYarnCluster.java:293)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>     at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>     at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:160)
>     at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:134)
>     at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at 
> org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.stop(MiniMRYarnClusterAdapter.java:56)
>     at 
> org.apache.hadoop.mapred.TestMROpportunisticMaps.doTest(TestMROpportunisticMaps.java:108)
>     at 
> org.apache.hadoop.mapred.TestMROpportunisticMaps.doTest(TestMROpportunisticMaps.java:74)
>     at 
> org.apache.hadoop.mapred.TestMROpportunisticMaps.testAllOpportunisticMaps(TestMROpportunisticMaps.java:60)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at

[jira] [Updated] (YARN-11212) [Federation] Add getNodeToLabels REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11212:
--
Target Version/s: 3.4.0

> [Federation] Add getNodeToLabels REST APIs for Router
> -
>
> Key: YARN-11212
> URL: https://issues.apache.org/jira/browse/YARN-11212
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Add getNodeToLabels REST APIs for Router.




[jira] [Updated] (YARN-11211) QueueMetrics leaks Configuration objects when validation API is called multiple times

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11211:
--
Target Version/s: 3.4.0

> QueueMetrics leaks Configuration objects when validation API is called 
> multiple times
> -
>
> Key: YARN-11211
> URL: https://issues.apache.org/jira/browse/YARN-11211
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: András Győri
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> QueueMetrics#QUEUE_METRICS is a static map, which is a source of multiple 
> bugs, e.g. YARN-11152.
> The current scenario can be reproduced by adding queues one at a time via 
> the mutation API.
>  # Validate adding queue1 via the validation API
>  # The validation API instantiates a new CS, with a new Configuration, that 
> instantiates a ConfigurationProperties
>  # QueueMetrics shares the same QUEUE_METRICS cache with the original CS, 
> which now contains a Metrics object that belongs to the new CS
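The mechanism described in these steps can be modelled with a hypothetical Java class (not the real QueueMetrics): a static map keyed by queue name is shared by every scheduler instance in the JVM, so the scheduler created by the validation API registers its own Metrics object, which keeps its Configuration reachable, and the original scheduler later reuses that entry. A per-instance map is shown for contrast as one possible remedy; the issue text does not say which fix was applied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the reported leak; not the real QueueMetrics class. */
public class StaticCacheSketch {
    static final class Config {
        final String owner;
        Config(String owner) { this.owner = owner; }
    }

    static final class Metrics {
        final Config conf; // a cached Metrics keeps its Configuration reachable
        Metrics(Config conf) { this.conf = conf; }
    }

    // JVM-wide cache, analogous to a static QUEUE_METRICS map.
    static final Map<String, Metrics> QUEUE_METRICS = new HashMap<>();

    static final class Scheduler {
        final Config conf;
        // Per-instance cache: discarded together with the scheduler.
        final Map<String, Metrics> ownMetrics = new HashMap<>();

        Scheduler(Config conf) { this.conf = conf; }

        // Whichever scheduler touches a queue first wins; later ones reuse it.
        Metrics staticMetricsFor(String queue) {
            return QUEUE_METRICS.computeIfAbsent(queue, q -> new Metrics(conf));
        }

        Metrics instanceMetricsFor(String queue) {
            return ownMetrics.computeIfAbsent(queue, q -> new Metrics(conf));
        }
    }
}
```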




[jira] [Updated] (YARN-11211) QueueMetrics leaks Configuration objects when validation API is called multiple times

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11211:
--
Affects Version/s: 3.4.0

> QueueMetrics leaks Configuration objects when validation API is called 
> multiple times
> -
>
> Key: YARN-11211
> URL: https://issues.apache.org/jira/browse/YARN-11211
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: András Győri
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> QueueMetrics#QUEUE_METRICS is a static map, which is a source of multiple 
> bugs, e.g. YARN-11152.
> The current scenario can be reproduced by adding queues one at a time via 
> the mutation API.
>  # Validate adding queue1 via the validation API
>  # The validation API instantiates a new CS, with a new Configuration, that 
> instantiates a ConfigurationProperties
>  # QueueMetrics shares the same QUEUE_METRICS cache with the original CS, 
> which now contains a Metrics object that belongs to the new CS




[jira] [Updated] (YARN-11221) [Federation] Add replaceLabelsOnNodes, replaceLabelsOnNode REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11221:
--
Target Version/s: 3.4.0

> [Federation] Add replaceLabelsOnNodes, replaceLabelsOnNode REST APIs for 
> Router
> ---
>
> Key: YARN-11221
> URL: https://issues.apache.org/jira/browse/YARN-11221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11220) [Federation] Add getLabelsToNodes, getClusterNodeLabels, getLabelsOnNode REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11220:
--
Target Version/s: 3.4.0

> [Federation] Add getLabelsToNodes, getClusterNodeLabels, getLabelsOnNode REST 
> APIs for Router
> -
>
> Key: YARN-11220
> URL: https://issues.apache.org/jira/browse/YARN-11220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11219) [Federation] Add getAppActivities, getAppStatistics REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11219:
--
Target Version/s: 3.4.0

> [Federation] Add getAppActivities, getAppStatistics REST APIs for Router
> 
>
> Key: YARN-11219
> URL: https://issues.apache.org/jira/browse/YARN-11219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11217) [Federation] Add dumpSchedulerLogs REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11217:
--
Target Version/s: 3.4.0

> [Federation] Add dumpSchedulerLogs REST APIs for Router
> ---
>
> Key: YARN-11217
> URL: https://issues.apache.org/jira/browse/YARN-11217
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11218) [Federation] Add getActivities, getBulkActivities REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11218:
--
Target Version/s: 3.4.0

> [Federation] Add getActivities, getBulkActivities REST APIs for Router
> --
>
> Key: YARN-11218
> URL: https://issues.apache.org/jira/browse/YARN-11218
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11223) [Federation] Add getAppPriority, updateApplicationPriority REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11223:
--
Target Version/s: 3.4.0

> [Federation] Add getAppPriority, updateApplicationPriority REST APIs for 
> Router
> ---
>
> Key: YARN-11223
> URL: https://issues.apache.org/jira/browse/YARN-11223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11225) [Federation] Add postDelegationToken, postDelegationTokenExpiration, cancelDelegationToken REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11225:
--
Target Version/s: 3.4.0

> [Federation] Add postDelegationToken, postDelegationTokenExpiration, 
> cancelDelegationToken REST APIs for Router
> 
>
> Key: YARN-11225
> URL: https://issues.apache.org/jira/browse/YARN-11225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11224) [Federation] Add getAppQueue, updateAppQueue REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11224:
--
Target Version/s: 3.4.0

> [Federation] Add getAppQueue, updateAppQueue REST APIs for Router
> -
>
> Key: YARN-11224
> URL: https://issues.apache.org/jira/browse/YARN-11224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11222) [Federation] Add addToClusterNodeLabels, removeFromClusterNodeLabels REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11222:
--
Target Version/s: 3.4.0

> [Federation] Add addToClusterNodeLabels, removeFromClusterNodeLabels REST 
> APIs for Router
> -
>
> Key: YARN-11222
> URL: https://issues.apache.org/jira/browse/YARN-11222
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11226) [Federation] Add createNewReservation, submitReservation, updateReservation, deleteReservation REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11226:
--
Target Version/s: 3.4.0

> [Federation] Add createNewReservation, submitReservation, updateReservation, 
> deleteReservation REST APIs for Router
> ---
>
> Key: YARN-11226
> URL: https://issues.apache.org/jira/browse/YARN-11226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11230) [Federation] Add getContainer, signalToContainer REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11230:
--
Target Version/s: 3.4.0

> [Federation] Add getContainer, signalToContainer REST APIs for Router
> --
>
> Key: YARN-11230
> URL: https://issues.apache.org/jira/browse/YARN-11230
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11236) [RESERVATION] Implement FederationReservationHomeSubClusterStore With MemoryStore

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11236:
--
Target Version/s: 3.4.0

> [RESERVATION] Implement FederationReservationHomeSubClusterStore With 
> MemoryStore
> -
>
> Key: YARN-11236
> URL: https://issues.apache.org/jira/browse/YARN-11236
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11228) [Federation] Add getAppAttempts, getAppAttempt REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11228:
--
Target Version/s: 3.4.0

> [Federation] Add getAppAttempts, getAppAttempt REST APIs for Router
> ---
>
> Key: YARN-11228
> URL: https://issues.apache.org/jira/browse/YARN-11228
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-11229) [Federation] Add checkUserAccessToQueue REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11229:
--
Target Version/s: 3.4.0

> [Federation] Add checkUserAccessToQueue REST APIs for Router
> 
>
> Key: YARN-11229
> URL: https://issues.apache.org/jira/browse/YARN-11229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11227) [Federation] Add getAppTimeout, getAppTimeouts, updateApplicationTimeout REST APIs for Router

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11227:
--
Target Version/s: 3.4.0

> [Federation] Add getAppTimeout, getAppTimeouts, updateApplicationTimeout REST 
> APIs for Router
> -
>
> Key: YARN-11227
> URL: https://issues.apache.org/jira/browse/YARN-11227
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11235) [RESERVATION] Refactor Policy Code and Define getReservationHomeSubcluster

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11235:
--
Target Version/s: 3.4.0

> [RESERVATION] Refactor Policy Code and Define getReservationHomeSubcluster
> --
>
> Key: YARN-11235
> URL: https://issues.apache.org/jira/browse/YARN-11235
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: [RESERVATION] Add support for reservation-based 
> routing.pdf
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Refer to section 2.1, Router Policy, which describes the changes to be made. 
> The document will continue to be improved; the current version is V1.




[jira] [Updated] (YARN-11238) Optimizing FederationClientInterceptor Call with Parallelism

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11238:
--
Target Version/s: 3.4.0

> Optimizing FederationClientInterceptor Call with Parallelism
> 
>
> Key: YARN-11238
> URL: https://issues.apache.org/jira/browse/YARN-11238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11240) Fix incorrect placeholder in yarn-module

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11240:
--
Target Version/s: 3.4.0

> Fix incorrect placeholder in yarn-module
> 
>
> Key: YARN-11240
> URL: https://issues.apache.org/jira/browse/YARN-11240
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Try to deal with the module's problems all at once.




[jira] [Updated] (YARN-11239) Optimize FederationClientInterceptor audit log

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11239:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Optimize FederationClientInterceptor audit log
> --
>
> Key: YARN-11239
> URL: https://issues.apache.org/jira/browse/YARN-11239
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11237) Bug while disabling proxy failover with Federation

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11237:
--
Target Version/s: 3.4.0

> Bug while disabling proxy failover with Federation
> --
>
> Key: YARN-11237
> URL: https://issues.apache.org/jira/browse/YARN-11237
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.3.3
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When the RM failover proxy is disabled with federation, the code checks the 
> wrong (parent) flag, `yarn.federation.enabled`, which only says whether 
> federation is used, instead of the federation failover feature flag 
> `yarn.federation.failover.enabled`. Without this change, the NodeManager 
> cannot be started when the failover feature is disabled.
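A minimal sketch of the corrected check follows, assuming the proxy decision boils down to the two boolean settings named above. The key names come from the issue; the class and method names, the plain `Map` standing in for Hadoop's `Configuration`, and the default of `true` for `yarn.federation.failover.enabled` are illustrative assumptions, not the committed patch.

```java
import java.util.Map;

// Hypothetical sketch of the flag check described in YARN-11237.
// Only the two key names come from the issue text; defaults are assumptions.
final class FederationFailoverCheckSketch {
  static final String FEDERATION_ENABLED = "yarn.federation.enabled";
  static final String FEDERATION_FAILOVER_ENABLED = "yarn.federation.failover.enabled";

  private FederationFailoverCheckSketch() {}

  // Reads a boolean value, falling back to the given default when unset.
  private static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
    String v = conf.get(key);
    return v == null ? dflt : Boolean.parseBoolean(v.trim());
  }

  /** Buggy variant: only looks at the parent federation flag. */
  static boolean useFailoverProxyBuggy(Map<String, String> conf) {
    return getBoolean(conf, FEDERATION_ENABLED, false);
  }

  /** Corrected variant: federation must be on AND failover must not be disabled. */
  static boolean useFailoverProxy(Map<String, String> conf) {
    return getBoolean(conf, FEDERATION_ENABLED, false)
        && getBoolean(conf, FEDERATION_FAILOVER_ENABLED, true); // assumed default
  }
}
```

With `yarn.federation.enabled=true` and `yarn.federation.failover.enabled=false`, the buggy variant still reports that the failover proxy should be used, while the corrected variant does not.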




[jira] [Updated] (YARN-11240) Fix incorrect placeholder in yarn-module

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11240:
--
Component/s: yarn

> Fix incorrect placeholder in yarn-module
> 
>
> Key: YARN-11240
> URL: https://issues.apache.org/jira/browse/YARN-11240
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Try to deal with the module's problems all at once.




[jira] [Updated] (YARN-11241) Add uncleaning option for local app log file with log-aggregation enabled

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11241:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.3.5, 3.4.0
Affects Version/s: 3.3.5
   3.4.0

> Add uncleaning option for local app log file with log-aggregation enabled
> -
>
> Key: YARN-11241
> URL: https://issues.apache.org/jira/browse/YARN-11241
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add an option to keep (not clean up) local app log files when log aggregation 
> is enabled. This will be helpful for debugging purposes.




[jira] [Updated] (YARN-11245) Upgrade JUnit from 4 to 5 in hadoop-yarn-csi

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11245:
--
  Component/s: yarn-csi
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Upgrade JUnit from 4 to 5 in hadoop-yarn-csi
> 
>
> Key: YARN-11245
> URL: https://issues.apache.org/jira/browse/YARN-11245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-csi
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Upgrade JUnit from 4 to 5 in hadoop-yarn-csi




[jira] [Updated] (YARN-11245) Upgrade JUnit from 4 to 5 in hadoop-yarn-csi

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11245:
--
Hadoop Flags: Reviewed

> Upgrade JUnit from 4 to 5 in hadoop-yarn-csi
> 
>
> Key: YARN-11245
> URL: https://issues.apache.org/jira/browse/YARN-11245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-csi
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Upgrade JUnit from 4 to 5 in hadoop-yarn-csi




[jira] [Updated] (YARN-11253) Add Configuration to delegationToken RemoverScanInterval

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11253:
--
Target Version/s: 3.4.0

> Add Configuration to delegationToken RemoverScanInterval
> 
>
> Key: YARN-11253
> URL: https://issues.apache.org/jira/browse/YARN-11253
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When reading the code, I found a hard-coded value; I think this parameter 
> should be abstracted into the configuration.
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService#createRMDelegationTokenSecretManager
> {code:java}
> protected RMDelegationTokenSecretManager createRMDelegationTokenSecretManager(
>     Configuration conf, RMContext rmContext) {
>   // ...
>   // The hard-coded 360 below should be extracted into the configuration.
>   return new RMDelegationTokenSecretManager(secretKeyInterval,
>       tokenMaxLifetime, tokenRenewInterval, 360, rmContext);
> }
> {code}
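A minimal sketch of making the value configurable, assuming a simple key/value lookup. `java.util.Properties` stands in for Hadoop's `Configuration`, and both the property name and the one-hour default are hypothetical placeholders, not the key or default that YARN-11253 introduced.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: read the delegation-token remover scan interval from
 * configuration instead of passing a hard-coded literal. The property name
 * and default are placeholders, not confirmed YARN configuration keys.
 */
final class TokenRemoverScanIntervalSketch {
  // Hypothetical property name, for illustration only.
  static final String REMOVE_SCAN_INTERVAL_KEY =
      "example.delegation-token.remove-scan-interval-ms";
  // Hypothetical default of one hour, expressed in milliseconds.
  static final long REMOVE_SCAN_INTERVAL_DEFAULT_MS = 60L * 60L * 1000L;

  private TokenRemoverScanIntervalSketch() {}

  /** Returns the configured interval, or the default when the key is unset. */
  static long removeScanIntervalMs(Properties conf) {
    String raw = conf.getProperty(REMOVE_SCAN_INTERVAL_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return REMOVE_SCAN_INTERVAL_DEFAULT_MS;
    }
    long value = Long.parseLong(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(
          REMOVE_SCAN_INTERVAL_KEY + " must be positive, got " + value);
    }
    return value;
  }
}
```

The returned value would be passed in place of the literal in the `RMDelegationTokenSecretManager` constructor call quoted above; rejecting non-positive values is a design choice of this sketch.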




[jira] [Updated] (YARN-11250) Capture the Performance Metrics of ZookeeperFederationStateStore

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11250:
--
Target Version/s: 3.4.0

> Capture the Performance Metrics of ZookeeperFederationStateStore
> 
>
> Key: YARN-11250
> URL: https://issues.apache.org/jira/browse/YARN-11250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Capture the Performance Metrics of ZookeeperFederationStateStore.




[jira] [Updated] (YARN-11248) Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11248:
--
Hadoop Flags: Reviewed

> Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING
> ---
>
> Key: YARN-11248
> URL: https://issues.apache.org/jira/browse/YARN-11248
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.3.3
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>
> Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING




[jira] [Updated] (YARN-11252) [RESERVATION] Yarn Federation Router Supports Update / Delete Reservation in MemoryStore

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11252:
--
Target Version/s: 3.4.0

> [RESERVATION] Yarn Federation Router Supports Update / Delete Reservation in 
> MemoryStore
> 
>
> Key: YARN-11252
> URL: https://issues.apache.org/jira/browse/YARN-11252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0, 3.3.4
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11255) Support loading alternative docker client config from system environment

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11255:
--
Labels: pull-request-available  (was: )

> Support loading alternative docker client config from system environment
> 
>
> Key: YARN-11255
> URL: https://issues.apache.org/jira/browse/YARN-11255
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When using YARN Docker support, the hadoop shell supports
> {code:java}
> -docker_client_config{code}
> to pass a client config file containing the security token, from which a Docker 
> config is generated for each job as a temporary file.
> However, other applications that submit jobs to YARN, e.g. Spark, pass the Docker 
> settings as environment variables, e.g. via
> {code:java}
> spark.executorEnv.* {code}
> and so cannot add those authorization tokens, because YARN does not consider 
> this environment variable.
> Add a generic solution to handle these kinds of cases without changing the code 
> of Spark or other applications.
> For example, when using a remote container registry, 
> {{YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG}} must reference the config.json 
> file containing the credentials used to authenticate.
> {code:java}
> DOCKER_IMAGE_NAME=hadoop-docker
> DOCKER_CLIENT_CONFIG=hdfs:///user/hadoop/config.json
> spark-submit --master yarn \
> --deploy-mode cluster \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
> sparkR.R{code}
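A hypothetical sketch of the requested fallback: prefer an explicitly supplied client config and otherwise read `YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG` from the container's launch environment. Only the variable name comes from the issue; the class, the method, and the precedence rule are assumptions, and the real change may resolve and load the file differently.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve the Docker client config location for a job.
// Only the environment variable name comes from the issue text.
final class DockerClientConfigSketch {
  static final String ENV_KEY = "YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG";

  private DockerClientConfigSketch() {}

  /**
   * Prefers an explicitly passed config (as with -docker_client_config);
   * otherwise falls back to the container's launch environment.
   */
  static Optional<String> resolve(String explicitConfig, Map<String, String> containerEnv) {
    if (explicitConfig != null && !explicitConfig.isEmpty()) {
      return Optional.of(explicitConfig);
    }
    String fromEnv = containerEnv.get(ENV_KEY);
    if (fromEnv != null && !fromEnv.isEmpty()) {
      return Optional.of(fromEnv);
    }
    return Optional.empty(); // no client config supplied either way
  }
}
```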




[jira] [Updated] (YARN-11254) hadoop-minikdc dependency duplicated in hadoop-yarn-server-nodemanager

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11254:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> hadoop-minikdc dependency duplicated in hadoop-yarn-server-nodemanager
> --
>
> Key: YARN-11254
> URL: https://issues.apache.org/jira/browse/YARN-11254
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: Clara Fang
>Assignee: Clara Fang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The dependency hadoop-minikdc is defined twice in 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/pom.xml
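> {code:xml}
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-minikdc</artifactId>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-minikdc</artifactId>
>   <scope>test</scope>
> </dependency>
> {code}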




[jira] [Updated] (YARN-11254) hadoop-minikdc dependency duplicated in hadoop-yarn-server-nodemanager

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11254:
--
Affects Version/s: 3.4.0

> hadoop-minikdc dependency duplicated in hadoop-yarn-server-nodemanager
> --
>
> Key: YARN-11254
> URL: https://issues.apache.org/jira/browse/YARN-11254
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.4.0
>Reporter: Clara Fang
>Assignee: Clara Fang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The dependency hadoop-minikdc is defined twice in 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/pom.xml
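> {code:xml}
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-minikdc</artifactId>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-minikdc</artifactId>
>   <scope>test</scope>
> </dependency>
> {code}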




[jira] [Updated] (YARN-11255) Support loading alternative docker client config from system environment

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11255:
--
 Component/s: yarn
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> Support loading alternative docker client config from system environment
> 
>
> Key: YARN-11255
> URL: https://issues.apache.org/jira/browse/YARN-11255
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
> Fix For: 3.4.0
>
>
> When using YARN Docker support, the hadoop shell supports
> {code:java}
> -docker_client_config{code}
> to pass a client config file containing the security token, from which a Docker 
> config is generated for each job as a temporary file.
> However, other applications that submit jobs to YARN, e.g. Spark, pass the Docker 
> settings as environment variables, e.g. via
> {code:java}
> spark.executorEnv.* {code}
> and so cannot add those authorization tokens, because YARN does not consider 
> this environment variable.
> Add a generic solution to handle these kinds of cases without changing the code 
> of Spark or other applications.
> For example, when using a remote container registry, 
> {{YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG}} must reference the config.json 
> file containing the credentials used to authenticate.
> {code:java}
> DOCKER_IMAGE_NAME=hadoop-docker
> DOCKER_CLIENT_CONFIG=hdfs:///user/hadoop/config.json
> spark-submit --master yarn \
> --deploy-mode cluster \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
> sparkR.R{code}




[jira] [Updated] (YARN-11255) Support loading alternative docker client config from system environment

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11255:
--
Affects Version/s: 3.4.0

> Support loading alternative docker client config from system environment
> 
>
> Key: YARN-11255
> URL: https://issues.apache.org/jira/browse/YARN-11255
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
> Fix For: 3.4.0
>
>
> When using YARN Docker support, the hadoop shell supports
> {code:java}
> -docker_client_config{code}
> to pass a client config file containing the security token, from which a Docker 
> config is generated for each job as a temporary file.
> However, other applications that submit jobs to YARN, e.g. Spark, pass the Docker 
> settings as environment variables, e.g. via
> {code:java}
> spark.executorEnv.* {code}
> and so cannot add those authorization tokens, because YARN does not consider 
> this environment variable.
> Add a generic solution to handle these kinds of cases without changing the code 
> of Spark or other applications.
> For example, when using a remote container registry, 
> {{YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG}} must reference the config.json 
> file containing the credentials used to authenticate.
> {code:java}
> DOCKER_IMAGE_NAME=hadoop-docker
> DOCKER_CLIENT_CONFIG=hdfs:///user/hadoop/config.json
> spark-submit --master yarn \
> --deploy-mode cluster \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
> --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
> --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
> sparkR.R{code}




[jira] [Updated] (YARN-11271) Upgrade JUnit from 4 to 5 in hadoop-yarn-server-timelineservice-hbase-common

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11271:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> Upgrade JUnit from 4 to 5 in hadoop-yarn-server-timelineservice-hbase-common
> 
>
> Key: YARN-11271
> URL: https://issues.apache.org/jira/browse/YARN-11271
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test, yarn
>Affects Versions: 3.3.4
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11270) Upgrade JUnit from 4 to 5 in hadoop-yarn-server-timelineservice-hbase-client

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11270:
--
Target Version/s: 3.4.0

> Upgrade JUnit from 4 to 5 in hadoop-yarn-server-timelineservice-hbase-client
> 
>
> Key: YARN-11270
> URL: https://issues.apache.org/jira/browse/YARN-11270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test, yarn
>Affects Versions: 3.3.4
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11278) Ambiguous error message in mutation API

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11278:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Ambiguous error message in mutation API
> ---
>
> Key: YARN-11278
> URL: https://issues.apache.org/jira/browse/YARN-11278
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: András Győri
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In RMWebServices#updateSchedulerConfiguration, we check two prerequisites:
> {code:java}
> if (scheduler instanceof MutableConfScheduler
>     && ((MutableConfScheduler) scheduler).isConfigurationMutable()) {
> {code}
> However, the error message is misleading in the second case (i.e. when the 
> configuration is not mutable, e.g. with a FILE_CONFIGURATION_STORE):
> {code:java}
> } else {
>   return Response.status(Status.BAD_REQUEST)
>       .entity("Configuration change only supported by " +
>           "MutableConfScheduler.")
>       .build();
> {code}
>  
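A minimal sketch of splitting the check so that each failed prerequisite gets its own message. The nested interfaces are simplified, hypothetical stand-ins for the scheduler types named in the issue, the second message's wording is illustrative, and returning a `String` replaces building a JAX-RS `Response` to keep the example self-contained.

```java
// Hypothetical sketch: report which prerequisite failed instead of one
// generic message. The interfaces only mimic the types named in the issue.
final class MutationApiCheckSketch {
  interface Scheduler {}

  interface MutableConfScheduler extends Scheduler {
    boolean isConfigurationMutable();
  }

  private MutationApiCheckSketch() {}

  /** Returns null when the update may proceed, otherwise an error message. */
  static String validate(Scheduler scheduler) {
    if (!(scheduler instanceof MutableConfScheduler)) {
      // First prerequisite: the scheduler type itself.
      return "Configuration change only supported by MutableConfScheduler.";
    }
    if (!((MutableConfScheduler) scheduler).isConfigurationMutable()) {
      // Second prerequisite: the configuration store must be mutable.
      return "Configuration change is not allowed: the configured "
          + "configuration store is not mutable.";
    }
    return null;
  }
}
```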




[jira] [Updated] (YARN-11297) Improve Yarn Router Reservation Submission Code

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11297:
--
Hadoop Flags: Reviewed

> Improve Yarn Router Reservation Submission Code
> ---
>
> Key: YARN-11297
> URL: https://issues.apache.org/jira/browse/YARN-11297
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The same reservation may be submitted repeatedly. In that case, we should 
> reuse the existing reservation result first; only if it is not available 
> should we consider applying to other RMs.
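A minimal sketch of the reuse-first policy described above, assuming an in-memory map from reservation ID to home sub-cluster and a caller-supplied availability check. The class, method, and sub-cluster IDs are hypothetical and do not reflect the Router's actual state store API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: reuse an earlier home sub-cluster for a resubmitted
// reservation, falling back to other candidates only when it is unusable.
final class ReservationReuseSketch {
  private final Map<String, String> homeByReservation = new HashMap<>();

  Optional<String> selectHome(String reservationId, List<String> candidates,
      Predicate<String> isUsable) {
    String existing = homeByReservation.get(reservationId);
    if (existing != null && isUsable.test(existing)) {
      return Optional.of(existing); // reuse the earlier result
    }
    for (String subCluster : candidates) { // fall back to other RMs
      if (!subCluster.equals(existing) && isUsable.test(subCluster)) {
        homeByReservation.put(reservationId, subCluster);
        return Optional.of(subCluster);
      }
    }
    return Optional.empty(); // no usable sub-cluster
  }
}
```

Resubmitting `r1` returns the sub-cluster chosen the first time; only when that sub-cluster is reported unavailable does the sketch move the reservation to another candidate.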




[jira] [Updated] (YARN-11283) [Federation] Fix Typo of NodeManager AMRMProxy.

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11283:
--
Target Version/s: 3.4.0

> [Federation] Fix Typo of NodeManager AMRMProxy.
> ---
>
> Key: YARN-11283
> URL: https://issues.apache.org/jira/browse/YARN-11283
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, nodemanager
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix typos in the NodeManager AMRMProxy.




[jira] [Updated] (YARN-11287) Fix NoClassDefFoundError: org/junit/platform/launcher/core/LauncherFactory after YARN-10793

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11287:
--
Hadoop Flags: Reviewed

> Fix NoClassDefFoundError: org/junit/platform/launcher/core/LauncherFactory 
> after YARN-10793
> ---
>
> Key: YARN-11287
> URL: https://issues.apache.org/jira/browse/YARN-11287
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> After running the unit tests for the whole yarn-project, I found the following 
> error:
> {code:java}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-yarn-server-applicationhistoryservice: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test failed: java.lang.NoClassDefFoundError: org/junit/platform/launcher/core/LauncherFactory: org.junit.platform.launcher.core.LauncherFactory -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the command
> [ERROR]   mvn  -rf :hadoop-yarn-server-applicationhistoryservice {code}




[jira] [Updated] (YARN-11307) Fix Yarn Router Broken Link

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11307:
--
Hadoop Flags: Reviewed

> Fix Yarn Router Broken Link
> ---
>
> Key: YARN-11307
> URL: https://issues.apache.org/jira/browse/YARN-11307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-11303) Upgrade jquery ui to 1.13.2

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11303:
--
 Target Version/s: 3.3.5, 3.4.0
Affects Version/s: 3.3.5
   3.4.0

> Upgrade jquery ui to 1.13.2
> ---
>
> Key: YARN-11303
> URL: https://issues.apache.org/jira/browse/YARN-11303
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 3.4.0, 3.3.5
>Reporter: D M Murali Krishna Reddy
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>
> The jquery-ui version currently used in trunk (1.13.1) is affected by 
> [CVE-2022-31160|https://nvd.nist.gov/vuln/detail/CVE-2022-31160], so we need 
> to upgrade to at least 1.13.2.
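
The "at least 1.13.2" requirement compares dotted versions numerically, component by component. A plain string comparison gets cases such as "1.9.9" versus "1.13.2" wrong. The hypothetical helper below is a minimal sketch of that check. It assumes purely numeric components, with no pre-release suffixes.

```java
// Hypothetical helper for checking a dotted numeric version against a minimum.
// Assumes every component is a plain integer (no "-beta" style suffixes).
public class VersionCheck {

    // Minimum jquery-ui version named in the issue.
    static final String MINIMUM = "1.13.2";

    // Compares versions component by component; missing components count as 0,
    // so "1.13" and "1.13.0" compare as equal.
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True if the given version is at or above MINIMUM.
    static boolean meetsMinimum(String version) {
        return compare(version, MINIMUM) >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.13.1", "1.13.2", "1.14.0"}) {
            System.out.println(v + " meets minimum: " + meetsMinimum(v));
        }
    }
}
```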




[jira] [Updated] (YARN-11324) [Federation] Fix some PBImpl classes to avoid NPE.

2024-01-27 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11324:
--
Hadoop Flags: Reviewed

> [Federation] Fix some PBImpl classes to avoid NPE.
> --
>
> Key: YARN-11324
> URL: https://issues.apache.org/jira/browse/YARN-11324
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router, yarn
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-09-30-16-52-25-031.png
>
>
> While working on YARN-11323, I found a bug in 
> ApplicationHomeSubClusterPBImpl that may cause a null pointer exception 
> when calling getApplicationId:
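> {code:java}
> @Test
> public void testGetApplicationIdNullException() throws YarnException {
>   ApplicationId appId = ApplicationId.newInstance(Time.now(), 1);
>   // Placeholder sub-cluster id for the test.
>   SubClusterId subClusterId = SubClusterId.newInstance("SC-1");
>   ApplicationHomeSubCluster appHomeSC = ApplicationHomeSubCluster.newInstance(
>       appId, subClusterId);
>   // Before the fix, this prints null instead of appId.
>   System.out.println(appHomeSC.getApplicationId());
> } {code}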
> The test results are as follows:
> !image-2022-09-30-16-52-25-031.png|width=818,height=271!
>  
> After the ApplicationId is set, calling getApplicationId directly returns null.
> *Why does this problem occur?*
> setApplicationId does not set a value on 
> ApplicationHomeSubClusterProtoOrBuilder, so getApplicationId, which reads the 
> value from the proto, finds nothing.
> *Proposed fix:*
> 1. Set a value on ApplicationHomeSubClusterProtoOrBuilder in 
> setApplicationId.
> 2. To improve access efficiency, getApplicationId should first check whether 
> the internal field is null. If it is not, return it directly; otherwise, 
> convert it from the proto object.
> While modifying ApplicationHomeSubClusterPBImpl, I will also check the PBImpl 
> classes of all router modules to make sure they are all fixed.
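
The two-step fix described in the issue can be illustrated with a minimal, self-contained Java sketch. It does not use the real Hadoop classes. FakeProtoBuilder is a hypothetical stand-in for the generated ApplicationHomeSubClusterProto builder, and ApplicationId is simplified to a String. The sketch also omits the proto/builder switching (viaProto) that Hadoop PBImpl classes use. It shows the setter writing the value through to the builder (step 1) and the getter returning the cached field before falling back to the proto (step 2).

```java
import java.util.Objects;

// Hypothetical sketch of the PBImpl get/set pattern from YARN-11324.
// Not the actual ApplicationHomeSubClusterPBImpl code.
public class HomeSubClusterSketch {

    // Stand-in for the generated protobuf builder (assumption).
    static final class FakeProtoBuilder {
        private String applicationId; // real code stores an ApplicationIdProto

        boolean hasApplicationId() { return applicationId != null; }
        String getApplicationId() { return applicationId; }
        void setApplicationId(String id) { applicationId = Objects.requireNonNull(id); }
        void clearApplicationId() { applicationId = null; }
    }

    private final FakeProtoBuilder builder = new FakeProtoBuilder();
    // Cached value, analogous to the PBImpl's local applicationId field.
    private String applicationId;

    public void setApplicationId(String id) {
        if (id == null) {
            builder.clearApplicationId();
            applicationId = null;
            return;
        }
        // Step 1: write through to the builder as well as the cache, so a
        // read from the proto side does not find an empty field.
        builder.setApplicationId(id);
        applicationId = id;
    }

    public String getApplicationId() {
        // Step 2: return the cached value directly when it is present ...
        if (applicationId != null) {
            return applicationId;
        }
        // ... otherwise convert it from the builder, if it is set there.
        if (!builder.hasApplicationId()) {
            return null;
        }
        applicationId = builder.getApplicationId();
        return applicationId;
    }

    // Exposes what the builder holds, to show the value was written through.
    String builderApplicationId() {
        return builder.hasApplicationId() ? builder.getApplicationId() : null;
    }

    public static void main(String[] args) {
        HomeSubClusterSketch record = new HomeSubClusterSketch();
        record.setApplicationId("application_1_0001");
        System.out.println(record.getApplicationId());
    }
}
```

Writing through in the setter makes the builder the source of truth for serialization. The cached field only saves repeated proto-to-record conversions.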



