[jira] [Created] (YARN-11136) Support getLabelsToNodes API in FederationClientInterceptor

2022-05-07 Thread fanshilun (Jira)
fanshilun created YARN-11136:


 Summary: Support getLabelsToNodes API in 
FederationClientInterceptor
 Key: YARN-11136
 URL: https://issues.apache.org/jira/browse/YARN-11136
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: fanshilun






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11135) Optimize Imports in hadoop project

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11135:
--
Labels: pull-request-available  (was: )

> Optimize Imports in hadoop project
> --
>
> Key: YARN-11135
> URL: https://issues.apache.org/jira/browse/YARN-11135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Optimize Imports to keep code clean
>  # Remove any unused imports
>  # Sort the import statements.
>  # Remove .* imports
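A hypothetical before/after illustrating the three rules above (the class name and imports here are illustrative, not actual Hadoop code):

```java
// Before (shown as comments): an unused import, unsorted order, and a
// wildcard import.
//
//   import java.util.*;
//   import java.io.IOException;   // unused
//   import java.util.concurrent.TimeUnit;
//
// After: only what is used, explicit rather than wildcard, and sorted.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ImportCleanupExample {
  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    names.add("nodemanager");
    names.add("resourcemanager");
    System.out.println(names.size() + " " + TimeUnit.SECONDS.name());
  }
}
```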






[jira] [Updated] (YARN-11135) Optimize Imports in hadoop project

2022-05-07 Thread Ashutosh Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Gupta updated YARN-11135:
--
Description: 
h3. Optimize Imports to keep code clean
 # Remove any unused imports
 # Sort the import statements.
 # Remove .* imports

  was:
h3. Optimize Imports to keep code clean
 # Remove any unused imports,
 # Sort the import statements.


> Optimize Imports in hadoop project
> --
>
> Key: YARN-11135
> URL: https://issues.apache.org/jira/browse/YARN-11135
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>
> h3. Optimize Imports to keep code clean
>  # Remove any unused imports
>  # Sort the import statements.
>  # Remove .* imports






[jira] [Created] (YARN-11135) Optimize Imports in hadoop project

2022-05-07 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created YARN-11135:
-

 Summary: Optimize Imports in hadoop project
 Key: YARN-11135
 URL: https://issues.apache.org/jira/browse/YARN-11135
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


h3. Optimize Imports to keep code clean
 # Remove any unused imports,
 # Sort the import statements.






[jira] [Updated] (YARN-10080) Support show app id on localizer thread pool

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10080:
--
Labels: pull-request-available  (was: )

> Support show app id on localizer thread pool
> 
>
> Key: YARN-10080
> URL: https://issues.apache.org/jira/browse/YARN-10080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10080-001.patch, YARN-10080.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when we are troubleshooting a container localizer issue and want 
> to analyze a jstack with thread details, we cannot figure out which thread 
> is processing a given container. So I want to add the app id to the thread 
> name.
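The idea can be sketched with a plain `ThreadFactory` whose thread names carry the application id, so a jstack immediately shows which application a localizer thread serves. The class and thread-name format below are illustrative, not the actual NodeManager localizer code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LocalizerThreadNaming {
  // Factory whose thread names embed the (hypothetical) app id.
  static ThreadFactory namedFactory(String appId) {
    AtomicInteger seq = new AtomicInteger();
    return r -> new Thread(r,
        "ContainerLocalizer-" + appId + "-" + seq.incrementAndGet());
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(
        2, namedFactory("application_1651900000000_0001"));
    // Each task prints its own thread name, app id included.
    pool.submit(() -> System.out.println(Thread.currentThread().getName()));
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```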






[jira] [Assigned] (YARN-10080) Support show app id on localizer thread pool

2022-05-07 Thread Ashutosh Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Gupta reassigned YARN-10080:
-

Assignee: Ashutosh Gupta  (was: zhoukang)

> Support show app id on localizer thread pool
> 
>
> Key: YARN-10080
> URL: https://issues.apache.org/jira/browse/YARN-10080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: Ashutosh Gupta
>Priority: Major
> Attachments: YARN-10080-001.patch, YARN-10080.002.patch
>
>
> Currently, when we are troubleshooting a container localizer issue and want 
> to analyze a jstack with thread details, we cannot figure out which thread 
> is processing a given container. So I want to add the app id to the thread 
> name.






[jira] [Commented] (YARN-10080) Support show app id on localizer thread pool

2022-05-07 Thread Ashutosh Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533382#comment-17533382
 ] 

Ashutosh Gupta commented on YARN-10080:
---

Taking it up. 

> Support show app id on localizer thread pool
> 
>
> Key: YARN-10080
> URL: https://issues.apache.org/jira/browse/YARN-10080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-10080-001.patch, YARN-10080.002.patch
>
>
> Currently, when we are troubleshooting a container localizer issue and want 
> to analyze a jstack with thread details, we cannot figure out which thread 
> is processing a given container. So I want to add the app id to the thread 
> name.






[jira] [Updated] (YARN-9355) RMContainerRequestor#makeRemoteRequest has confusing log message

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-9355:
-
Labels: newbie newbie++ pull-request-available  (was: newbie newbie++)

> RMContainerRequestor#makeRemoteRequest has confusing log message
> 
>
> Key: YARN-9355
> URL: https://issues.apache.org/jira/browse/YARN-9355
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Ashutosh Gupta
>Priority: Trivial
>  Labels: newbie, newbie++, pull-request-available
> Attachments: YARN-9355.001.patch, YARN-9355.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#makeRemoteRequest 
> has this log: 
> {code:java}
> if (ask.size() > 0 || release.size() > 0) {
>   LOG.info("getResources() for " + applicationId + ":" + " ask="
>   + ask.size() + " release= " + release.size() + " newContainers="
>   + allocateResponse.getAllocatedContainers().size()
>   + " finishedContainers=" + numCompletedContainers
>   + " resourcelimit=" + availableResources + " knownNMs="
>   + clusterNmCount);
> }
> {code}
> "getResources()" is printed because 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#getResources 
> invokes makeRemoteRequest. This is not very informative and is error-prone: 
> the name of getResources could change over time, leaving the log message 
> outdated. Moreover, it is not a good idea to print the name of a method 
> that sits below the current one in the call stack.
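One possible shape of the fix, assuming the message is built inside makeRemoteRequest itself rather than naming its caller. This is a simplified, self-contained sketch, not the actual patch; the helper method and sample values are hypothetical:

```java
public class RemoteRequestLogging {
  // Build the log line without hard-coding the name of a caller
  // further up the stack; describe the request contents instead.
  static String buildLogLine(String applicationId, int ask, int release,
      int newContainers, int finishedContainers, String limit, int knownNMs) {
    return "applicationId=" + applicationId + ": ask=" + ask
        + " release=" + release + " newContainers=" + newContainers
        + " finishedContainers=" + finishedContainers
        + " resourcelimit=" + limit + " knownNMs=" + knownNMs;
  }

  public static void main(String[] args) {
    System.out.println(buildLogLine("application_1651900000000_0002",
        3, 1, 2, 5, "<memory:8192, vCores:8>", 4));
  }
}
```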






[jira] [Commented] (YARN-9355) RMContainerRequestor#makeRemoteRequest has confusing log message

2022-05-07 Thread Ashutosh Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533380#comment-17533380
 ] 

Ashutosh Gupta commented on YARN-9355:
--

Taking it up.

> RMContainerRequestor#makeRemoteRequest has confusing log message
> 
>
> Key: YARN-9355
> URL: https://issues.apache.org/jira/browse/YARN-9355
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Umesh Mittal
>Priority: Trivial
>  Labels: newbie, newbie++
> Attachments: YARN-9355.001.patch, YARN-9355.002.patch
>
>
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#makeRemoteRequest 
> has this log: 
> {code:java}
> if (ask.size() > 0 || release.size() > 0) {
>   LOG.info("getResources() for " + applicationId + ":" + " ask="
>   + ask.size() + " release= " + release.size() + " newContainers="
>   + allocateResponse.getAllocatedContainers().size()
>   + " finishedContainers=" + numCompletedContainers
>   + " resourcelimit=" + availableResources + " knownNMs="
>   + clusterNmCount);
> }
> {code}
> "getResources()" is printed because 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#getResources 
> invokes makeRemoteRequest. This is not very informative and is error-prone: 
> the name of getResources could change over time, leaving the log message 
> outdated. Moreover, it is not a good idea to print the name of a method 
> that sits below the current one in the call stack.






[jira] [Assigned] (YARN-9355) RMContainerRequestor#makeRemoteRequest has confusing log message

2022-05-07 Thread Ashutosh Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Gupta reassigned YARN-9355:


Assignee: Ashutosh Gupta  (was: Umesh Mittal)

> RMContainerRequestor#makeRemoteRequest has confusing log message
> 
>
> Key: YARN-9355
> URL: https://issues.apache.org/jira/browse/YARN-9355
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Ashutosh Gupta
>Priority: Trivial
>  Labels: newbie, newbie++
> Attachments: YARN-9355.001.patch, YARN-9355.002.patch
>
>
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor#makeRemoteRequest 
> has this log: 
> {code:java}
> if (ask.size() > 0 || release.size() > 0) {
>   LOG.info("getResources() for " + applicationId + ":" + " ask="
>   + ask.size() + " release= " + release.size() + " newContainers="
>   + allocateResponse.getAllocatedContainers().size()
>   + " finishedContainers=" + numCompletedContainers
>   + " resourcelimit=" + availableResources + " knownNMs="
>   + clusterNmCount);
> }
> {code}
> "getResources()" is printed because 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#getResources 
> invokes makeRemoteRequest. This is not very informative and is error-prone: 
> the name of getResources could change over time, leaving the log message 
> outdated. Moreover, it is not a good idea to print the name of a method 
> that sits below the current one in the call stack.






[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11128:
--
Labels: pull-request-available  (was: )

> Fix comments in TestProportionalCapacityPreemptionPolicy*
> -
>
> Key: YARN-11128
> URL: https://issues.apache.org/jira/browse/YARN-11128
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, documentation
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At various places, comment for appsConfig is 
> {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}}
> but should be 
> {{// 
> queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
>  






[jira] [Updated] (YARN-11122) Support getClusterNodes API in FederationClientInterceptor

2022-05-07 Thread fanshilun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fanshilun updated YARN-11122:
-
Summary: Support getClusterNodes API in FederationClientInterceptor  (was: 
Support getClusterNodes In Federation architecture)

> Support getClusterNodes API in FederationClientInterceptor
> --
>
> Key: YARN-11122
> URL: https://issues.apache.org/jira/browse/YARN-11122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: fanshilun
>Priority: Major
> Attachments: YARN-11122.01.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Yarn Federation is a very useful feature, especially important in clusters 
> with more than 1,000 nodes. For example, we have 2 offline clusters, used 
> for ad-hoc queries and offline (ETL) scheduling respectively. We want to 
> isolate the two clusters while still using their resources sensibly: at 
> night, the ad-hoc cluster's resources can be used by the ETL cluster, and 
> during the day (9:00-22:00) the ETL cluster's resources can be used by the 
> ad-hoc cluster.
> So that more people can use this feature, we should gradually implement the 
> methods that have not been implemented yet.
> YARN-10465 implemented some of these methods, but from a personal point of 
> view I have some doubts:
> Question 1: Without metrics, it is impossible to understand how the related 
> functions are executing.
> Question 2: The multi-threading and reflection implementation makes the 
> related logic hard to read, and in theory it offers little performance 
> advantage over a conventional loop.
> Question 3: The code is already 2 years old; merging it into a local branch 
> may produce conflicts.






[jira] [Created] (YARN-11134) Support getNodeToLabels API in FederationClientInterceptor

2022-05-07 Thread fanshilun (Jira)
fanshilun created YARN-11134:


 Summary: Support getNodeToLabels API in FederationClientInterceptor
 Key: YARN-11134
 URL: https://issues.apache.org/jira/browse/YARN-11134
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: fanshilun









[jira] [Updated] (YARN-11131) FlowRunCoprocessor Scan Used Deprecated Method

2022-05-07 Thread fanshilun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fanshilun updated YARN-11131:
-
  Component/s: ATSv2
Fix Version/s: 3.4.0
Affects Version/s: 3.3.2
   3.3.1
   3.3.0

> FlowRunCoprocessor Scan Used Deprecated Method
> --
>
> Key: YARN-11131
> URL: https://issues.apache.org/jira/browse/YARN-11131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: fanshilun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: FlowRunCoprocessor#Scan Used Deprecated Methods.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> FlowRunCoprocessor uses deprecated methods in 
> hadoop-yarn-server-timelineservice-hbase-server-2; try to replace them.






[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*

2022-05-07 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-11128:
-
Component/s: documentation
 Issue Type: Bug  (was: New Feature)

> Fix comments in TestProportionalCapacityPreemptionPolicy*
> -
>
> Key: YARN-11128
> URL: https://issues.apache.org/jira/browse/YARN-11128
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, documentation
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>
> At various places, comment for appsConfig is 
> {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}}
> but should be 
> {{// 
> queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
>  






[jira] [Updated] (YARN-11128) Fix comments in TestProportionalCapacityPreemptionPolicy*

2022-05-07 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-11128:
-
Description: 
At various places, comment for appsConfig is 

{{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}}

but should be 

{{// 
queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}

 

  was:
At various places, comment for appsConfig is 

`// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)`

but should be 

`// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)`

 


> Fix comments in TestProportionalCapacityPreemptionPolicy*
> -
>
> Key: YARN-11128
> URL: https://issues.apache.org/jira/browse/YARN-11128
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>
> At various places, comment for appsConfig is 
> {{// queueName\t(priority,resource,host,expression,#repeat,reserved,pending)}}
> but should be 
> {{// 
> queueName\t(priority,resource,host,expression,#repeat,reserved,pending,user)}}
>  






[jira] [Updated] (YARN-11133) YarnClient gets the wrong EffectiveMinCapacity value

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11133:
--
Labels: pull-request-available  (was: )

> YarnClient gets the wrong EffectiveMinCapacity value
> 
>
> Key: YARN-11133
> URL: https://issues.apache.org/jira/browse/YARN-11133
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.2.3, 3.3.2
>Reporter: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I use the YarnClient, QueueConfigurations#getEffectiveMinCapacity 
> returns the wrong value. I found a bug in 
> QueueConfigurationsPBImpl#mergeLocalToBuilder.
> {code:java}
> private void mergeLocalToBuilder() {
>   if (this.effMinResource != null) {
> builder
> .setEffectiveMinCapacity(convertToProtoFormat(this.effMinResource));
>   }
>   if (this.effMaxResource != null) {
> builder
> .setEffectiveMaxCapacity(convertToProtoFormat(this.effMaxResource));
>   }
>   if (this.configuredMinResource != null) {
> builder.setEffectiveMinCapacity(
> convertToProtoFormat(this.configuredMinResource));
>   }
>   if (this.configuredMaxResource != null) {
> builder.setEffectiveMaxCapacity(
> convertToProtoFormat(this.configuredMaxResource));
>   }
> } {code}
> configuredMinResource is incorrectly assigned to the effective-minimum 
> field. This causes the real effMinResource to be overwritten, and 
> configuredMinResource is left unset.
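The overwrite can be illustrated with a minimal stand-in builder. The `Builder` class and setter names below are simplified stand-ins for the generated protobuf builder, and the "fixed" branch assumes the proto exposes a separate configured-minimum field:

```java
import java.util.HashMap;
import java.util.Map;

public class MergeLocalToBuilderSketch {
  // Stand-in for the generated protobuf builder: records which
  // field each value was written to.
  static class Builder {
    final Map<String, String> fields = new HashMap<>();
    void setEffectiveMinCapacity(String v)  { fields.put("effMin", v); }
    void setConfiguredMinCapacity(String v) { fields.put("confMin", v); }
  }

  public static void main(String[] args) {
    String effMinResource = "<memory:1024>";
    String configuredMinResource = "<memory:512>";

    // Buggy merge: the second write overwrites effMin; confMin stays unset.
    Builder buggy = new Builder();
    buggy.setEffectiveMinCapacity(effMinResource);
    buggy.setEffectiveMinCapacity(configuredMinResource);   // wrong setter
    System.out.println("buggy effMin=" + buggy.fields.get("effMin")
        + " confMin=" + buggy.fields.get("confMin"));

    // Fixed merge: each local field goes to its own builder field.
    Builder fixed = new Builder();
    fixed.setEffectiveMinCapacity(effMinResource);
    fixed.setConfiguredMinCapacity(configuredMinResource);
    System.out.println("fixed effMin=" + fixed.fields.get("effMin")
        + " confMin=" + fixed.fields.get("confMin"));
  }
}
```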






[jira] [Created] (YARN-11133) YarnClient gets the wrong EffectiveMinCapacity value

2022-05-07 Thread Zilong Zhu (Jira)
Zilong Zhu created YARN-11133:
-

 Summary: YarnClient gets the wrong EffectiveMinCapacity value
 Key: YARN-11133
 URL: https://issues.apache.org/jira/browse/YARN-11133
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.3.2, 3.2.3
Reporter: Zilong Zhu


When I use the YarnClient, QueueConfigurations#getEffectiveMinCapacity 
returns the wrong value. I found a bug in 
QueueConfigurationsPBImpl#mergeLocalToBuilder.
{code:java}
private void mergeLocalToBuilder() {
  if (this.effMinResource != null) {
builder
.setEffectiveMinCapacity(convertToProtoFormat(this.effMinResource));
  }
  if (this.effMaxResource != null) {
builder
.setEffectiveMaxCapacity(convertToProtoFormat(this.effMaxResource));
  }
  if (this.configuredMinResource != null) {
builder.setEffectiveMinCapacity(
convertToProtoFormat(this.configuredMinResource));
  }
  if (this.configuredMaxResource != null) {
builder.setEffectiveMaxCapacity(
convertToProtoFormat(this.configuredMaxResource));
  }
} {code}
configuredMinResource is incorrectly assigned to the effective-minimum field. 
This causes the real effMinResource to be overwritten, and 
configuredMinResource is left unset.






[jira] (YARN-11122) Support getClusterNodes In Federation architecture

2022-05-07 Thread fanshilun (Jira)


[ https://issues.apache.org/jira/browse/YARN-11122 ]


fanshilun deleted comment on YARN-11122:
--

was (Author: slfan1989):
This submission causes the DelegationTokenSecretManagerMetrics to be 
initialized twice in the YARN ResourceManager and to throw an exception, 
which makes the JUnit test fail.



> Support getClusterNodes In Federation architecture
> --
>
> Key: YARN-11122
> URL: https://issues.apache.org/jira/browse/YARN-11122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: fanshilun
>Priority: Major
> Attachments: YARN-11122.01.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Yarn Federation is a very useful feature, especially important in clusters 
> with more than 1,000 nodes. For example, we have 2 offline clusters, used 
> for ad-hoc queries and offline (ETL) scheduling respectively. We want to 
> isolate the two clusters while still using their resources sensibly: at 
> night, the ad-hoc cluster's resources can be used by the ETL cluster, and 
> during the day (9:00-22:00) the ETL cluster's resources can be used by the 
> ad-hoc cluster.
> So that more people can use this feature, we should gradually implement the 
> methods that have not been implemented yet.
> YARN-10465 implemented some of these methods, but from a personal point of 
> view I have some doubts:
> Question 1: Without metrics, it is impossible to understand how the related 
> functions are executing.
> Question 2: The multi-threading and reflection implementation makes the 
> related logic hard to read, and in theory it offers little performance 
> advantage over a conventional loop.
> Question 3: The code is already 2 years old; merging it into a local branch 
> may produce conflicts.






[jira] [Comment Edited] (YARN-11122) Support getClusterNodes In Federation architecture

2022-05-07 Thread fanshilun (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533235#comment-17533235
 ] 

fanshilun edited comment on YARN-11122 at 5/7/22 9:22 AM:
--

Hi, [~snemeth]  Could you kindly help in reviewing this PR? Thanks.


was (Author: slfan1989):
[~snemeth]  Could you kindly help in reviewing this PR? Thanks.

> Support getClusterNodes In Federation architecture
> --
>
> Key: YARN-11122
> URL: https://issues.apache.org/jira/browse/YARN-11122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: fanshilun
>Priority: Major
> Attachments: YARN-11122.01.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Yarn Federation is a very useful feature, especially important in clusters 
> with more than 1,000 nodes. For example, we have 2 offline clusters, used 
> for ad-hoc queries and offline (ETL) scheduling respectively. We want to 
> isolate the two clusters while still using their resources sensibly: at 
> night, the ad-hoc cluster's resources can be used by the ETL cluster, and 
> during the day (9:00-22:00) the ETL cluster's resources can be used by the 
> ad-hoc cluster.
> So that more people can use this feature, we should gradually implement the 
> methods that have not been implemented yet.
> YARN-10465 implemented some of these methods, but from a personal point of 
> view I have some doubts:
> Question 1: Without metrics, it is impossible to understand how the related 
> functions are executing.
> Question 2: The multi-threading and reflection implementation makes the 
> related logic hard to read, and in theory it offers little performance 
> advantage over a conventional loop.
> Question 3: The code is already 2 years old; merging it into a local branch 
> may produce conflicts.






[jira] [Commented] (YARN-11122) Support getClusterNodes In Federation architecture

2022-05-07 Thread fanshilun (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533235#comment-17533235
 ] 

fanshilun commented on YARN-11122:
--

[~snemeth]  Could you kindly help in reviewing this PR? Thanks.

> Support getClusterNodes In Federation architecture
> --
>
> Key: YARN-11122
> URL: https://issues.apache.org/jira/browse/YARN-11122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: fanshilun
>Priority: Major
> Attachments: YARN-11122.01.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Yarn Federation is a very useful feature, especially important in clusters 
> with more than 1,000 nodes. For example, we have 2 offline clusters, used 
> for ad-hoc queries and offline (ETL) scheduling respectively. We want to 
> isolate the two clusters while still using their resources sensibly: at 
> night, the ad-hoc cluster's resources can be used by the ETL cluster, and 
> during the day (9:00-22:00) the ETL cluster's resources can be used by the 
> ad-hoc cluster.
> So that more people can use this feature, we should gradually implement the 
> methods that have not been implemented yet.
> YARN-10465 implemented some of these methods, but from a personal point of 
> view I have some doubts:
> Question 1: Without metrics, it is impossible to understand how the related 
> functions are executing.
> Question 2: The multi-threading and reflection implementation makes the 
> related logic hard to read, and in theory it offers little performance 
> advantage over a conventional loop.
> Question 3: The code is already 2 years old; merging it into a local branch 
> may produce conflicts.






[jira] [Updated] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.

2022-05-07 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-11127:
---
Description: 
I found an RM deadlock in our cluster. It is a low-probability event. Some 
critical jstack information is below: 
{code:java}
"RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 
waiting on condition [0x7f85dd00b000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x7f9389aab478> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        - locked <0x7f88db78c5c8> (a 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
        at java.lang.Thread.run(Thread.java:748)


"IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 
tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x7f938976e818> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getResourceUsageReport(FiCaSchedulerApp.java:1115)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:433)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:143)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getRMAppMetrics(RMAppImpl.java:1693)
        at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:742)
        at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:428)
        at 

[jira] [Commented] (YARN-11127) Potential deadlock in AsyncDispatcher caused by RMNodeImpl, SchedulerApplicationAttempt and RMAppImpl's lock contention.

2022-05-07 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533204#comment-17533204
 ] 

zhengchenyu commented on YARN-11127:


Another problem is that when the dispatcher thread is stuck, the RM cannot stop by itself, 
which means RM failover fails. I opened YARN-11132 to discuss this problem.
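The two traces above form a classic lock-ordering cycle: the event dispatcher holds the RMNode state-machine lock and waits on the RMAppImpl write lock, while the IPC handler holds the app-side lock and waits the other way. A minimal, self-contained sketch of that cycle (lock names `appLock`/`attemptLock` are illustrative, not Hadoop's); it uses a timed `tryLock` so the cycle becomes observable instead of hanging:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderDemo {
    static final ReentrantReadWriteLock appLock = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock attemptLock = new ReentrantReadWriteLock();

    // Acquire appLock, then try attemptLock with a timeout. With a plain
    // lock() here, a thread holding attemptLock and waiting on appLock
    // would complete the deadlock cycle seen in the jstack output.
    static boolean acquireBoth() throws InterruptedException {
        appLock.writeLock().lock();
        try {
            if (attemptLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                attemptLock.writeLock().unlock();
                return true;
            }
            return false;  // timed out: another thread holds attemptLock
        } finally {
            appLock.writeLock().unlock();
        }
    }

    // Returns false when an opposite-order holder makes acquireBoth time out.
    static boolean demo() throws InterruptedException {
        CountDownLatch held = new CountDownLatch(1);
        Thread other = new Thread(() -> {
            attemptLock.writeLock().lock();   // opposite acquisition order
            held.countDown();
            try { Thread.sleep(600); } catch (InterruptedException ignored) { }
            attemptLock.writeLock().unlock();
        });
        other.start();
        held.await();                         // wait until the cycle is armed
        boolean acquired = acquireBoth();
        other.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo() ? "acquired" : "would deadlock");
    }
}
```

The usual fix is to enforce a single global acquisition order for these locks, or to avoid calling into another component while holding a write lock.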

> Potential deadlock in AsyncDispatcher caused by RMNodeImpl, 
> SchedulerApplicationAttempt and RMAppImpl's lock contention.
> 
>
> Key: YARN-11127
> URL: https://issues.apache.org/jira/browse/YARN-11127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Attachments: rm-dead-lock.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I found rm deadlock in our cluster. It's a low probability event. some 
> critical jstack information are below: 
> {code:java}
> "RM Event dispatcher" #63 prio=5 os_prio=0 tid=0x7f9a73aaa800 nid=0x221e7 
> waiting on condition [0x7f85dd00b000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x7f9389aab478> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppLogAggregation.aggregateLogReport(RMAppLogAggregation.java:120)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.aggregateLogReport(RMAppImpl.java:1740)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handleLogAggregationStatus(RMNodeImpl.java:1481)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.access$500(RMNodeImpl.java:104)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl$StatusUpdateWhenHealthyTransition.transition(RMNodeImpl.java:1198)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         - locked <0x7f88db78c5c8> (a 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:670)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.handle(RMNodeImpl.java:101)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1116)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher.handle(ResourceManager.java:1100)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
>         at java.lang.Thread.run(Thread.java:748)
> "IPC Server handler 264 on default port 8032" #1717 daemon prio=5 os_prio=0 
> tid=0x55b69acc2800 nid=0x229a5 waiting on condition [0x7f8574ba2000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x7f938976e818> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at 
> 

[jira] [Updated] (YARN-11132) RM failover may fail when Dispatcher stuck.

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11132:
--
Labels: pull-request-available  (was: )

> RM failover may fail when Dispatcher stuck.
> ---
>
> Key: YARN-11132
> URL: https://issues.apache.org/jira/browse/YARN-11132
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, yarn
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If dispatcher stuck because of dead lock, rm failover will fail.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11132) RM failover may fail when Dispatcher stuck.

2022-05-07 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533203#comment-17533203
 ] 

zhengchenyu commented on YARN-11132:


I think we could watch the head element of the eventQueue to detect a deadlock.
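A minimal sketch of that idea, assuming a monitor thread polls periodically (the class and method names are illustrative, not from Hadoop's AsyncDispatcher): if the same object stays at the head of the queue past a threshold, the dispatcher thread has made no progress and can be declared stuck.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Watches the head of an event queue: if the same element is still at the
// head after thresholdMillis, the consumer (dispatcher) thread has made no
// progress and is reported as stuck.
class StallDetector {
    private final BlockingQueue<?> queue;
    private final long thresholdMillis;
    private Object lastHead;
    private long headSince;

    StallDetector(BlockingQueue<?> queue, long thresholdMillis) {
        this.queue = queue;
        this.thresholdMillis = thresholdMillis;
    }

    // Intended to be polled periodically from a monitor thread.
    synchronized boolean isStuck() {
        Object head = queue.peek();
        long now = System.currentTimeMillis();
        if (head == null || head != lastHead) {  // empty queue or progress made
            lastHead = head;
            headSince = now;
            return false;
        }
        return now - headSince >= thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        events.add("NODE_UPDATE");                 // head is never consumed
        StallDetector detector = new StallDetector(events, 100);
        detector.isStuck();                        // first call records the head
        Thread.sleep(150);
        System.out.println(detector.isStuck() ? "dispatcher stuck" : "ok");
    }
}
```

On detection, the monitor could dump thread stacks or force the RM to exit so that failover can proceed instead of hanging.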

> RM failover may fail when Dispatcher stuck.
> ---
>
> Key: YARN-11132
> URL: https://issues.apache.org/jira/browse/YARN-11132
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, yarn
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>
> If dispatcher stuck because of dead lock, rm failover will fail.






[jira] [Created] (YARN-11132) RM failover may fail when Dispatcher stuck.

2022-05-07 Thread zhengchenyu (Jira)
zhengchenyu created YARN-11132:
--

 Summary: RM failover may fail when Dispatcher stuck.
 Key: YARN-11132
 URL: https://issues.apache.org/jira/browse/YARN-11132
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, yarn
Reporter: zhengchenyu
Assignee: zhengchenyu


If the dispatcher is stuck because of a deadlock, RM failover will fail.






[jira] [Updated] (YARN-11131) FlowRunCoprocessor Scan Used Deprecated Method

2022-05-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11131:
--
Labels: pull-request-available  (was: )

> FlowRunCoprocessor Scan Used Deprecated Method
> --
>
> Key: YARN-11131
> URL: https://issues.apache.org/jira/browse/YARN-11131
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: fanshilun
>Priority: Major
>  Labels: pull-request-available
> Attachments: FlowRunCoprocessor#Scan Used Deprecated Methods.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Found FlowRunCoprocessor Used Deprecated Methods in 
> hadoop-yarn-server-timelineservice-hbase-server-2, try to replace


