[jira] [Created] (YARN-11136) Support getLabelsToNodes API in FederationClientInterceptor
fanshilun created YARN-11136: Summary: Support getLabelsToNodes API in FederationClientInterceptor Key: YARN-11136 URL: https://issues.apache.org/jira/browse/YARN-11136 Project: Hadoop YARN Issue Type: Sub-task Reporter: fanshilun -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11135) Optimize Imports in hadoop project
[ https://issues.apache.org/jira/browse/YARN-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11135: -- Labels: pull-request-available (was: ) > Optimize Imports in hadoop project > -- > > Key: YARN-11135 > URL: https://issues.apache.org/jira/browse/YARN-11135 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Ashutosh Gupta > Assignee: Ashutosh Gupta > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > h3. Optimize Imports to keep code clean > # Remove any unused imports > # Sort the import statements. > # Remove .* imports
[jira] [Updated] (YARN-11135) Optimize Imports in hadoop project
[ https://issues.apache.org/jira/browse/YARN-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Gupta updated YARN-11135: -- Description: h3. Optimize Imports to keep code clean # Remove any unused imports # Sort the import statements. # Remove .* imports was: h3. Optimize Imports to keep code clean # Remove any unused imports, # Sort the import statements. > Optimize Imports in hadoop project > -- > > Key: YARN-11135 > URL: https://issues.apache.org/jira/browse/YARN-11135 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Ashutosh Gupta > Assignee: Ashutosh Gupta > Priority: Major > > h3. Optimize Imports to keep code clean > # Remove any unused imports > # Sort the import statements. > # Remove .* imports
[jira] [Created] (YARN-11135) Optimize Imports in hadoop project
Ashutosh Gupta created YARN-11135: - Summary: Optimize Imports in hadoop project Key: YARN-11135 URL: https://issues.apache.org/jira/browse/YARN-11135 Project: Hadoop YARN Issue Type: New Feature Reporter: Ashutosh Gupta Assignee: Ashutosh Gupta h3. Optimize Imports to keep code clean # Remove any unused imports, # Sort the import statements.
[jira] [Updated] (YARN-10080) Support show app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-10080: -- Labels: pull-request-available (was: ) > Support show app id on localizer thread pool > > > Key: YARN-10080 > URL: https://issues.apache.org/jira/browse/YARN-10080 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Reporter: zhoukang > Assignee: Ashutosh Gupta > Priority: Major > Labels: pull-request-available > Attachments: YARN-10080-001.patch, YARN-10080.002.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, when we are troubleshooting a container localizer issue and want > to analyze a jstack with thread detail, we cannot figure out which thread > is processing a given container. So I want to add the app id to the thread name.
[jira] [Assigned] (YARN-10080) Support show app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Gupta reassigned YARN-10080: - Assignee: Ashutosh Gupta (was: zhoukang) > Support show app id on localizer thread pool > > > Key: YARN-10080 > URL: https://issues.apache.org/jira/browse/YARN-10080 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Reporter: zhoukang > Assignee: Ashutosh Gupta > Priority: Major > Attachments: YARN-10080-001.patch, YARN-10080.002.patch > > > Currently, when we are troubleshooting a container localizer issue and want > to analyze a jstack with thread detail, we cannot figure out which thread > is processing a given container. So I want to add the app id to the thread name.
[jira] [Commented] (YARN-10080) Support show app id on localizer thread pool
[ https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533382#comment-17533382 ] Ashutosh Gupta commented on YARN-10080: --- Taking it up. > Support show app id on localizer thread pool > > > Key: YARN-10080 > URL: https://issues.apache.org/jira/browse/YARN-10080 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Reporter: zhoukang > Assignee: zhoukang > Priority: Major > Attachments: YARN-10080-001.patch, YARN-10080.002.patch > > > Currently, when we are troubleshooting a container localizer issue and want > to analyze a jstack with thread detail, we cannot figure out which thread > is processing a given container. So I want to add the app id to the thread name.
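The change proposed above can be sketched with a `ThreadFactory` that embeds the application id in each localizer thread's name, so a jstack immediately shows which app a thread is localizing for. This is an illustrative sketch only, not the actual YARN-10080 patch; the class name and the `"ContainerLocalizer <appId> #n"` name format are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a ThreadFactory that puts the app id into the
// thread name, so jstack output identifies which app each localizer
// thread is working for.
public class AppIdThreadFactory implements ThreadFactory {
  private final String appId;
  private final AtomicInteger count = new AtomicInteger();

  public AppIdThreadFactory(String appId) {
    this.appId = appId;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Name format is an assumption for illustration.
    return new Thread(r, "ContainerLocalizer " + appId + " #" + count.incrementAndGet());
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(
        2, new AppIdThreadFactory("application_1651900000000_0001"));
    Future<String> name = pool.submit(() -> Thread.currentThread().getName());
    System.out.println(name.get()); // thread name now carries the app id
    pool.shutdown();
  }
}
```

With this in place, any thread dump of the localizer pool shows the owning application directly in the thread name instead of an anonymous pool index.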
[jira] [Updated] (YARN-9355) RMContainerRequestor#makeRemoteRequest has confusing log message
[ https://issues.apache.org/jira/browse/YARN-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-9355: - Labels: newbie newbie++ pull-request-available (was: newbie newbie++) > RMContainerRequestor#makeRemoteRequest has confusing log message > > > Key: YARN-9355 > URL: https://issues.apache.org/jira/browse/YARN-9355 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Assignee: Ashutosh Gupta > Priority: Trivial > Labels: newbie, newbie++, pull-request-available > Attachments: YARN-9355.001.patch, YARN-9355.002.patch > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#makeRemoteRequest > has this log: > {code:java} > if (ask.size() > 0 || release.size() > 0) { > LOG.info("getResources() for " + applicationId + ":" + " ask=" > + ask.size() + " release= " + release.size() + " newContainers=" > + allocateResponse.getAllocatedContainers().size() > + " finishedContainers=" + numCompletedContainers > + " resourcelimit=" + availableResources + " knownNMs=" > + clusterNmCount); > } > {code} > The reason "getResources()" is printed is that > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#getResources > invokes makeRemoteRequest. This is not very informative and is error-prone, as > the name of getResources could change over time and leave the log outdated. > Moreover, it's not a good idea to print the name of a method that sits below the > current one in the stack.
[jira] [Commented] (YARN-9355) RMContainerRequestor#makeRemoteRequest has confusing log message
[ https://issues.apache.org/jira/browse/YARN-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533380#comment-17533380 ] Ashutosh Gupta commented on YARN-9355: -- Taking it up. > RMContainerRequestor#makeRemoteRequest has confusing log message > > > Key: YARN-9355 > URL: https://issues.apache.org/jira/browse/YARN-9355 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Assignee: Umesh Mittal > Priority: Trivial > Labels: newbie, newbie++ > Attachments: YARN-9355.001.patch, YARN-9355.002.patch > > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#makeRemoteRequest > has this log: > {code:java} > if (ask.size() > 0 || release.size() > 0) { > LOG.info("getResources() for " + applicationId + ":" + " ask=" > + ask.size() + " release= " + release.size() + " newContainers=" > + allocateResponse.getAllocatedContainers().size() > + " finishedContainers=" + numCompletedContainers > + " resourcelimit=" + availableResources + " knownNMs=" > + clusterNmCount); > } > {code} > The reason "getResources()" is printed is that > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#getResources > invokes makeRemoteRequest. This is not very informative and is error-prone, as > the name of getResources could change over time and leave the log outdated. > Moreover, it's not a good idea to print the name of a method that sits below the > current one in the stack.
[jira] [Assigned] (YARN-9355) RMContainerRequestor#makeRemoteRequest has confusing log message
[ https://issues.apache.org/jira/browse/YARN-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Gupta reassigned YARN-9355: Assignee: Ashutosh Gupta (was: Umesh Mittal) > RMContainerRequestor#makeRemoteRequest has confusing log message > > > Key: YARN-9355 > URL: https://issues.apache.org/jira/browse/YARN-9355 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Assignee: Ashutosh Gupta > Priority: Trivial > Labels: newbie, newbie++ > Attachments: YARN-9355.001.patch, YARN-9355.002.patch > > > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#makeRemoteRequest > has this log: > {code:java} > if (ask.size() > 0 || release.size() > 0) { > LOG.info("getResources() for " + applicationId + ":" + " ask=" > + ask.size() + " release= " + release.size() + " newContainers=" > + allocateResponse.getAllocatedContainers().size() > + " finishedContainers=" + numCompletedContainers > + " resourcelimit=" + availableResources + " knownNMs=" > + clusterNmCount); > } > {code} > The reason "getResources()" is printed is that > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#getResources > invokes makeRemoteRequest. This is not very informative and is error-prone, as > the name of getResources could change over time and leave the log outdated. > Moreover, it's not a good idea to print the name of a method that sits below the > current one in the stack.
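One direction such a fix could take is to have makeRemoteRequest log its own operation name rather than hard-coding the caller's name "getResources()". The helper class below is a hypothetical sketch of that idea, not the committed patch; the formatting method and its parameters are assumptions for illustration.

```java
// Hypothetical sketch: build the log line around the method doing the
// logging (makeRemoteRequest), so the message stays correct even if the
// calling method is renamed.
public class MakeRemoteRequestLog {
  static String format(String applicationId, int ask, int release,
                       int newContainers, int finished,
                       String availableResources, int knownNMs) {
    // Same fields as the original log line, but a caller-independent prefix.
    return "makeRemoteRequest() for " + applicationId + ": ask=" + ask
        + " release=" + release + " newContainers=" + newContainers
        + " finishedContainers=" + finished
        + " resourcelimit=" + availableResources
        + " knownNMs=" + knownNMs;
  }

  public static void main(String[] args) {
    System.out.println(format("application_1651900000000_0002", 3, 1, 2, 5,
        "<memory:8192, vCores:8>", 10));
  }
}
```

The point of the sketch is only the prefix: naming the method that actually emits the log avoids the stale-caller-name problem the issue describes.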
[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*
[ https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11128: -- Labels: pull-request-available (was: ) > Fix comments in TestProportionalCapacityPreemptionPolicy* > - > > Key: YARN-11128 > URL: https://issues.apache.org/jira/browse/YARN-11128 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, documentation > Reporter: Ashutosh Gupta > Assignee: Ashutosh Gupta > Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > At various places, comment for appsConfig is > {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}} > but should be > {{// > queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
[jira] [Updated] (YARN-11122) Support getClusterNodes API in FederationClientInterceptor
[ https://issues.apache.org/jira/browse/YARN-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fanshilun updated YARN-11122: - Summary: Support getClusterNodes API in FederationClientInterceptor (was: Support getClusterNodes In Federation architecture) > Support getClusterNodes API in FederationClientInterceptor > -- > > Key: YARN-11122 > URL: https://issues.apache.org/jira/browse/YARN-11122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation > Affects Versions: 3.3.0, 3.3.1, 3.3.2 > Reporter: fanshilun > Priority: Major > Attachments: YARN-11122.01.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > Yarn Federation is a very useful feature, especially important in clusters with more than 1,000 nodes. > For example, we have 2 offline clusters, used for ad-hoc queries and offline (ETL) scheduling respectively. > We want to isolate the two clusters' resources while still using them rationally: at night, the ad-hoc cluster's resources can be used by the ETL cluster, and during the day (9:00-22:00) the ETL cluster's resources can be used by the ad-hoc cluster. > So that more people can use this feature, we are gradually implementing the methods that are still missing. > YARN-10465 implemented some of these methods, but personally I have some doubts. > Question 1: Without a metric implementation, it is impossible to understand how the related functions execute. > Question 2: The multi-threading and reflection implementation is hard to read, and in theory it is not much faster than a conventional loop. > Question 3: The code is already 2 years old; merging it into the local branch may produce conflicts.
[jira] [Created] (YARN-11134) Support getNodeToLabels API in FederationClientInterceptor
fanshilun created YARN-11134: Summary: Support getNodeToLabels API in FederationClientInterceptor Key: YARN-11134 URL: https://issues.apache.org/jira/browse/YARN-11134 Project: Hadoop YARN Issue Type: Sub-task Reporter: fanshilun
[jira] [Updated] (YARN-11131) FlowRunCoprocessor Scan Used Deprecated Method
[ https://issues.apache.org/jira/browse/YARN-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fanshilun updated YARN-11131: - Component/s: ATSv2 Fix Version/s: 3.4.0 Affects Version/s: 3.3.2 3.3.1 3.3.0 > FlowRunCoprocessor Scan Used Deprecated Method > -- > > Key: YARN-11131 > URL: https://issues.apache.org/jira/browse/YARN-11131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 > Affects Versions: 3.3.0, 3.3.1, 3.3.2 > Reporter: fanshilun > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: FlowRunCoprocessor#Scan Used Deprecated Methods.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > The Scan handling in FlowRunCoprocessor uses deprecated methods in > hadoop-yarn-server-timelineservice-hbase-server-2; try to replace them.
[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*
[ https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-11128: - Component/s: documentation Issue Type: Bug (was: New Feature) > Fix comments in TestProportionalCapacityPreemptionPolicy* > - > > Key: YARN-11128 > URL: https://issues.apache.org/jira/browse/YARN-11128 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, documentation > Reporter: Ashutosh Gupta > Assignee: Ashutosh Gupta > Priority: Minor > > At various places, comment for appsConfig is > {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}} > but should be > {{// > queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*
[ https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-11128: - Description: At various places, comment for appsConfig is {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}} but should be {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}} was: At various places, comment for appsConfig is `// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)` but should be `// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)` > Fix comments in TestProportionalCapacityPreemptionPolicy* > - > > Key: YARN-11128 > URL: https://issues.apache.org/jira/browse/YARN-11128 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler > Reporter: Ashutosh Gupta > Assignee: Ashutosh Gupta > Priority: Minor > > At various places, comment for appsConfig is > {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}} > but should be > {{// > queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
[jira] [Updated] (YARN-11133) YarnClient gets the wrong EffectiveMinCapacity value
[ https://issues.apache.org/jira/browse/YARN-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11133: -- Labels: pull-request-available (was: ) > YarnClient gets the wrong EffectiveMinCapacity value > > > Key: YARN-11133 > URL: https://issues.apache.org/jira/browse/YARN-11133 > Project: Hadoop YARN > Issue Type: Bug > Components: api > Affects Versions: 3.2.3, 3.3.2 > Reporter: Zilong Zhu > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > QueueConfigurations#getEffectiveMinCapacity returns the wrong value when the > queue information is fetched through YarnClient. The bug is in > QueueConfigurationsPBImpl#mergeLocalToBuilder: > {code:java} > private void mergeLocalToBuilder() { > if (this.effMinResource != null) { > builder > .setEffectiveMinCapacity(convertToProtoFormat(this.effMinResource)); > } > if (this.effMaxResource != null) { > builder > .setEffectiveMaxCapacity(convertToProtoFormat(this.effMaxResource)); > } > if (this.configuredMinResource != null) { > builder.setEffectiveMinCapacity( > convertToProtoFormat(this.configuredMinResource)); > } > if (this.configuredMaxResource != null) { > builder.setEffectiveMaxCapacity( > convertToProtoFormat(this.configuredMaxResource)); > } > } {code} > configuredMinResource is incorrectly assigned to the effective-min field. This causes > the real effMinResource to be overwritten, while configuredMinResource stays null.
[jira] [Created] (YARN-11133) YarnClient gets the wrong EffectiveMinCapacity value
Zilong Zhu created YARN-11133: - Summary: YarnClient gets the wrong EffectiveMinCapacity value Key: YARN-11133 URL: https://issues.apache.org/jira/browse/YARN-11133 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 3.3.2, 3.2.3 Reporter: Zilong Zhu QueueConfigurations#getEffectiveMinCapacity returns the wrong value when the queue information is fetched through YarnClient. The bug is in QueueConfigurationsPBImpl#mergeLocalToBuilder: {code:java} private void mergeLocalToBuilder() { if (this.effMinResource != null) { builder .setEffectiveMinCapacity(convertToProtoFormat(this.effMinResource)); } if (this.effMaxResource != null) { builder .setEffectiveMaxCapacity(convertToProtoFormat(this.effMaxResource)); } if (this.configuredMinResource != null) { builder.setEffectiveMinCapacity( convertToProtoFormat(this.configuredMinResource)); } if (this.configuredMaxResource != null) { builder.setEffectiveMaxCapacity( convertToProtoFormat(this.configuredMaxResource)); } } {code} configuredMinResource is incorrectly assigned to the effective-min field. This causes the real effMinResource to be overwritten, while configuredMinResource stays null.
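The overwrite described above can be reproduced with a minimal stand-in for the protobuf builder. The classes below are a hypothetical simulation, not Hadoop code: the buggy branch routes configuredMinResource through setEffectiveMinCapacity, clobbering the effective value, while the corrected branch uses a dedicated configured-capacity setter.

```java
// Minimal simulation (not Hadoop code) of the YARN-11133 bug: the
// configured-min branch of mergeLocalToBuilder calls the *effective*-min
// setter, so a non-null configured value silently overwrites the
// effective one.
public class MergeBugDemo {
  static class Builder {
    Long effMin;
    Long cfgMin;
    void setEffectiveMinCapacity(Long v) { effMin = v; }
    void setConfiguredMinCapacity(Long v) { cfgMin = v; }
  }

  // Buggy merge, mirroring the snippet in the issue description.
  static void buggyMerge(Builder b, Long effMinResource, Long configuredMinResource) {
    if (effMinResource != null) {
      b.setEffectiveMinCapacity(effMinResource);
    }
    if (configuredMinResource != null) {
      b.setEffectiveMinCapacity(configuredMinResource); // wrong setter: overwrites effMin
    }
  }

  // Fixed merge: route the configured value to its own setter.
  static void fixedMerge(Builder b, Long effMinResource, Long configuredMinResource) {
    if (effMinResource != null) {
      b.setEffectiveMinCapacity(effMinResource);
    }
    if (configuredMinResource != null) {
      b.setConfiguredMinCapacity(configuredMinResource);
    }
  }

  public static void main(String[] args) {
    Builder buggy = new Builder();
    buggyMerge(buggy, 1024L, 512L);
    System.out.println("buggy effective min = " + buggy.effMin); // 512: overwritten

    Builder fixed = new Builder();
    fixedMerge(fixed, 1024L, 512L);
    System.out.println("fixed effective min = " + fixed.effMin
        + ", configured min = " + fixed.cfgMin); // 1024 and 512
  }
}
```

The actual setter names on the generated proto builder are an assumption here; the point is only that the configured and effective values must go through distinct setters.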
[jira] (YARN-11122) Support getClusterNodes In Federation architecture
[ https://issues.apache.org/jira/browse/YARN-11122 ] fanshilun deleted comment on YARN-11122: -- was (Author: slfan1989): This submission causes the DelegationTokenSecretManagerMetrics to be initialized twice in the Yarn ResourceManager and throw an exception, which makes the JUnit test fail. > Support getClusterNodes In Federation architecture > -- > > Key: YARN-11122 > URL: https://issues.apache.org/jira/browse/YARN-11122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation > Affects Versions: 3.3.0, 3.3.1, 3.3.2 > Reporter: fanshilun > Priority: Major > Attachments: YARN-11122.01.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > Yarn Federation is a very useful feature, especially important in clusters with more than 1,000 nodes. > For example, we have 2 offline clusters, used for ad-hoc queries and offline (ETL) scheduling respectively. > We want to isolate the two clusters' resources while still using them rationally: at night, the ad-hoc cluster's resources can be used by the ETL cluster, and during the day (9:00-22:00) the ETL cluster's resources can be used by the ad-hoc cluster. > So that more people can use this feature, we are gradually implementing the methods that are still missing. > YARN-10465 implemented some of these methods, but personally I have some doubts. > Question 1: Without a metric implementation, it is impossible to understand how the related functions execute. > Question 2: The multi-threading and reflection implementation is hard to read, and in theory it is not much faster than a conventional loop. > Question 3: The code is already 2 years old; merging it into the local branch may produce conflicts.
[jira] [Comment Edited] (YARN-11122) Support getClusterNodes In Federation architecture
[ https://issues.apache.org/jira/browse/YARN-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533235#comment-17533235 ] fanshilun edited comment on YARN-11122 at 5/7/22 9:22 AM: -- Hi, [~snemeth] Could you kindly help in reviewing this PR? Thanks. was (Author: slfan1989): [~snemeth] Could you kindly help in reviewing this PR? Thanks. > Support getClusterNodes In Federation architecture > -- > > Key: YARN-11122 > URL: https://issues.apache.org/jira/browse/YARN-11122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation > Affects Versions: 3.3.0, 3.3.1, 3.3.2 > Reporter: fanshilun > Priority: Major > Attachments: YARN-11122.01.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > Yarn Federation is a very useful feature, especially important in clusters with more than 1,000 nodes. > For example, we have 2 offline clusters, used for ad-hoc queries and offline (ETL) scheduling respectively. > We want to isolate the two clusters' resources while still using them rationally: at night, the ad-hoc cluster's resources can be used by the ETL cluster, and during the day (9:00-22:00) the ETL cluster's resources can be used by the ad-hoc cluster. > So that more people can use this feature, we are gradually implementing the methods that are still missing. > YARN-10465 implemented some of these methods, but personally I have some doubts. > Question 1: Without a metric implementation, it is impossible to understand how the related functions execute. > Question 2: The multi-threading and reflection implementation is hard to read, and in theory it is not much faster than a conventional loop. > Question 3: The code is already 2 years old; merging it into the local branch may produce conflicts.
[jira] [Commented] (YARN-11122) Support getClusterNodes In Federation architecture
[ https://issues.apache.org/jira/browse/YARN-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533235#comment-17533235 ] fanshilun commented on YARN-11122: -- [~snemeth] Could you kindly help in reviewing this PR? Thanks. > Support getClusterNodes In Federation architecture > -- > > Key: YARN-11122 > URL: https://issues.apache.org/jira/browse/YARN-11122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation > Affects Versions: 3.3.0, 3.3.1, 3.3.2 > Reporter: fanshilun > Priority: Major > Attachments: YARN-11122.01.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > Yarn Federation is a very useful feature, especially important in clusters with more than 1,000 nodes. > For example, we have 2 offline clusters, used for ad-hoc queries and offline (ETL) scheduling respectively. > We want to isolate the two clusters' resources while still using them rationally: at night, the ad-hoc cluster's resources can be used by the ETL cluster, and during the day (9:00-22:00) the ETL cluster's resources can be used by the ad-hoc cluster. > So that more people can use this feature, we are gradually implementing the methods that are still missing. > YARN-10465 implemented some of these methods, but personally I have some doubts. > Question 1: Without a metric implementation, it is impossible to understand how the related functions execute. > Question 2: The multi-threading and reflection implementation is hard to read, and in theory it is not much faster than a conventional loop. > Question 3: The code is already 2 years old; merging it into the local branch may produce conflicts.
[jira] [Updated] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.
[ https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated YARN-11127: --- Description: I found an RM deadlock in our cluster. It's a low-probability event. Some critical jstack information is below:
{code:java}
"RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 waiting on condition [0x7f85dd00b000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x7f9389aab478> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740)
	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481)
	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104)
	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242)
	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198)
	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
	- locked <0x7f88db78c5c8> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670)
	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
	at java.lang.Thread.run(Thread.java:748)

"IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x7f938976e818> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getResourceUsageReport(FiCaSchedulerApp.java:1115)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:143)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getRMAppMetrics(RMAppImpl.java:1693)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:742)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:428)
	at
[jira] [Commented] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.
[ https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533204#comment-17533204 ]

zhengchenyu commented on YARN-11127:
------------------------------------

Another problem is that when the dispatcher thread is stuck, the RM can't stop by itself, which means RM failover fails. I opened YARN-11132 to discuss this problem.

> Potential deadlock in AsyncDispatcher caused by RMNodeImpl,
> SchedulerApplicationAttempt and RMAppImpl's lock contention.
> -----------------------------------------------------------
>
>                 Key: YARN-11127
>                 URL: https://issues.apache.org/jira/browse/YARN-11127
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.2.1
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: rm-dead-lock.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I found an RM deadlock in our cluster. It's a low-probability event. Some critical jstack information is below:
> {code:java}
> "RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 waiting on condition [0x7f85dd00b000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <0x7f9389aab478> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> 	- locked <0x7f88db78c5c8> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
> 	at java.lang.Thread.run(Thread.java:748)
> "IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <0x7f938976e818> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> 	at
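The two jstack traces above show the shape of the problem: the event dispatcher holds a state-machine monitor while waiting on an app-side write lock, and an IPC handler holds the app-side lock while waiting the other way round. A minimal, self-contained sketch of that ABBA pattern (the lock and thread names are illustrative, not the actual YARN fields) also shows that the JVM itself can report such a cycle via `ThreadMXBean.findDeadlockedThreads()`, which covers `ReentrantReadWriteLock` write locks:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DeadlockDemo {
    // Two locks standing in for the RMAppImpl / RMNodeImpl write locks (illustrative names).
    static final ReentrantReadWriteLock appLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock nodeLock = new ReentrantReadWriteLock();

    /** Forces an ABBA write-lock cycle, then asks the JVM whether it sees a deadlock. */
    public static boolean detectDeadlock() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        spawn("dispatcher", appLock, nodeLock, bothHoldFirstLock).start();
        spawn("ipc-handler", nodeLock, appLock, bothHoldFirstLock).start();
        Thread.sleep(500); // give both threads time to park on their second lock
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // detects ownable-synchronizer cycles
        return ids != null && ids.length >= 2;
    }

    private static Thread spawn(String name, ReentrantReadWriteLock first,
                                ReentrantReadWriteLock second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            first.writeLock().lock();
            latch.countDown();
            try {
                latch.await();          // wait until the peer holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            second.writeLock().lock();  // blocks forever: the peer holds this lock
        }, name);
        t.setDaemon(true);              // let the JVM exit despite the stuck threads
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(detectDeadlock() ? "deadlock detected" : "no deadlock");
    }
}
```

Note that `findDeadlockedThreads()` cannot see read-lock waiters (read locks have no single owner), which is one reason a deadlock like the one above only shows up clearly in a full jstack.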
[jira] [Updated] (YARN-11132) RM failover may fail when Dispatcher stuck.
[ https://issues.apache.org/jira/browse/YARN-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11132:
----------------------------------
    Labels: pull-request-available  (was: )

> RM failover may fail when Dispatcher stuck.
> -------------------------------------------
>
>                 Key: YARN-11132
>                 URL: https://issues.apache.org/jira/browse/YARN-11132
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager, yarn
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the dispatcher is stuck because of a deadlock, RM failover will fail.
[jira] [Commented] (YARN-11132) RM failover may fail when Dispatcher stuck.
[ https://issues.apache.org/jira/browse/YARN-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533203#comment-17533203 ]

zhengchenyu commented on YARN-11132:
------------------------------------

I think we could watch the head element of the eventQueue to detect a deadlock.

> RM failover may fail when Dispatcher stuck.
> -------------------------------------------
>
>                 Key: YARN-11132
>                 URL: https://issues.apache.org/jira/browse/YARN-11132
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager, yarn
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>
> If the dispatcher is stuck because of a deadlock, RM failover will fail.
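One way to realize the idea in the comment above: a watchdog thread polls the head of the dispatcher's event queue and flags the dispatcher as stuck if the same event sits at the head past a timeout. The sketch below is a rough illustration of that detection strategy under assumed names and thresholds, not the actual YARN-11132 patch:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DispatcherWatchdog {
    /**
     * Returns true if the same element stays at the head of the queue for
     * stuckMillis -- a rough signal that the consumer (dispatcher) thread
     * is stuck. Compares heads by reference, since the dispatcher removes
     * the head object itself when it makes progress.
     */
    public static boolean headLooksStuck(BlockingQueue<?> queue, long stuckMillis,
                                         long pollMillis) throws InterruptedException {
        Object lastHead = queue.peek();
        if (lastHead == null) {
            return false;                  // empty queue: dispatcher is idle, not stuck
        }
        long sameHeadSince = System.currentTimeMillis();
        while (true) {
            Thread.sleep(pollMillis);
            if (queue.peek() != lastHead) {
                return false;              // head advanced: dispatcher is alive
            }
            if (System.currentTimeMillis() - sameHeadSince >= stuckMillis) {
                return true;               // same event at the head for too long
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
        eventQueue.add("NODE_UPDATE");     // no consumer ever drains this queue
        System.out.println(headLooksStuck(eventQueue, 200, 50)); // prints "true"
    }
}
```

In a real RM this check would run on its own daemon thread and trigger a fail-fast exit (so standby RMs can take over) rather than just return a boolean; the timeout would need to be well above the slowest legitimate event-handling time to avoid false positives.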
[jira] [Created] (YARN-11132) RM failover may fail when Dispatcher stuck.
zhengchenyu created YARN-11132:
-------------------------------

             Summary: RM failover may fail when Dispatcher stuck.
                 Key: YARN-11132
                 URL: https://issues.apache.org/jira/browse/YARN-11132
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: resourcemanager, yarn
            Reporter: zhengchenyu
            Assignee: zhengchenyu

If the dispatcher is stuck because of a deadlock, RM failover will fail.
[jira] [Updated] (YARN-11131) FlowRunCoprocessor Scan Used Deprecated Method
[ https://issues.apache.org/jira/browse/YARN-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11131:
----------------------------------
    Labels: pull-request-available  (was: )

> FlowRunCoprocessor Scan Used Deprecated Method
> ----------------------------------------------
>
>                 Key: YARN-11131
>                 URL: https://issues.apache.org/jira/browse/YARN-11131
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: fanshilun
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: FlowRunCoprocessor#Scan Used Deprecated Methods.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> FlowRunCoprocessor uses deprecated methods in hadoop-yarn-server-timelineservice-hbase-server-2; try to replace them.