[jira] [Commented] (YARN-10282) CLONE - hadoop-yarn-server-nodemanager build failed: make failed with error code 2

2021-04-29 Thread Wenhui Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337038#comment-17337038
 ] 

Wenhui Xu commented on YARN-10282:
--

I have the same problem with 3.3.0 on macOS. Can anyone help?

 

input:

 mvn package -Pdist,native -DskipTests -Dmaven.javadoc.skip -e -X

 

output:

...

 

[INFO] ------------------------------------------------------------------------

[INFO] BUILD FAILURE

[INFO] ------------------------------------------------------------------------

[INFO] Total time:  01:48 min

[INFO] Finished at: 2021-04-30T10:43:40+08:00

[INFO] ------------------------------------------------------------------------

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.3.0:cmake-compile (cmake-compile) on project hadoop-yarn-server-nodemanager: make failed with error code 2 -> [Help 1]

org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.3.0:cmake-compile (cmake-compile) on project hadoop-yarn-server-nodemanager: make failed with error code 2
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:64)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:564)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.MojoExecutionException: make failed with error code 2
    at org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.runMake (CompileMojo.java:229)
    at org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.execute (CompileMojo.java:98)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:64)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
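Maven reports only the make exit status here; the actual compiler error appears earlier in the build output. One way to surface it (assuming the default cmake-compile output layout of hadoop-maven-plugins, where the CMake build lands in the module's target/native directory) is to re-run make verbosely there:

{noformat}
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native
make VERBOSE=1
{noformat}

On macOS, failures in the native profile are commonly environment problems (missing cmake, missing protobuf or OpenSSL headers, or an incompatible Xcode toolchain), so the verbose output usually points at the missing dependency.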

[jira] [Commented] (YARN-10571) Refactor dynamic queue handling logic

2021-04-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335845#comment-17335845
 ] 

Hadoop QA commented on YARN-10571:
--

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 2m 58s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| -1 | mvninstall | 31m 12s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-mvninstall-root.txt | root in trunk failed. |
| -1 | compile | 0m 28s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt | hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04. |
| -1 | compile | 0m 27s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt | hadoop-yarn-server-resourcemanager in trunk failed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08. |
| -0 | checkstyle | 0m 29s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/buildtool-branch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | The patch fails to run checkstyle in hadoop-yarn-server-resourcemanager. |
| -1 | mvnsite | 0m 31s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-server-resourcemanager in trunk failed. |
| +1 | shadedclient | 1m 41s | | branch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 0m 33s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt | hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04. |
| -1 | javadoc | 0m 30s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt | hadoop-yarn-server-resourcemanager in trunk failed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08. |
| 0 | spotbugs | 3m 17s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| -1 | spotbugs | 0m 31s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/946/artifact/out/branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-server-resourcemanager in trunk failed. |
|| || || || Patch Compile Tests || ||
| -1 | mvninstall | 0m 22s |

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-29 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335804#comment-17335804
 ] 

Eric Badger commented on YARN-9927:
---

{noformat}
+// Test multi thread dispatcher
+conf.setBoolean(YarnConfiguration.
+MULTI_THREAD_DISPATCHER_ENABLED, true);
{noformat}
If this feature is disabled by default, I don't think we should have it enabled 
by default in all of the RM tests. I would be happier running them as 
parameterized tests covering both the multi-threaded and single-threaded 
dispatchers.
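
For illustration, a minimal JUnit 4 sketch of that shape (the configuration key is the one proposed by the patch excerpt above; the test body is a placeholder for the existing RM test logic):

{noformat}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestRMWithDispatcherModes {

  // Run every test once with the single-threaded dispatcher and once with
  // the multi-threaded dispatcher, instead of hard-coding "true".
  @Parameterized.Parameters(name = "multiThreadDispatcher={0}")
  public static Collection<Object[]> dispatcherModes() {
    return Arrays.asList(new Object[][] {{false}, {true}});
  }

  @Parameterized.Parameter
  public boolean multiThreadDispatcher;

  @Test
  public void testRMEventHandling() throws Exception {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.MULTI_THREAD_DISPATCHER_ENABLED,
        multiThreadDispatcher);
    // ... run the existing RM test scenario against this conf ...
  }
}
{noformat}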

In general I think the patch looks reasonable, but I would like to see 
performance testing to determine whether this makes the problem better or 
worse. I would expect it to help, but until we run some real tests we won't 
really know. Getting measurements similar to what [~hcarrot] provided 
originally would be good; that way we can merge this with confidence.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that:
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler
> 3) meanwhile, RM event processing is single-threaded, which leaves the RM 
> event scheduler little headroom and thus degrades RM performance.
> So we propose an RM multi-threaded event processing mechanism to improve RM 
> performance.






[jira] [Updated] (YARN-10571) Refactor dynamic queue handling logic

2021-04-29 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10571:

Attachment: YARN-10571.003.patch

> Refactor dynamic queue handling logic
> -
>
> Key: YARN-10571
> URL: https://issues.apache.org/jira/browse/YARN-10571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10571.001.patch, YARN-10571.002.patch, 
> YARN-10571.003.patch
>
>
> As per YARN-10506 we have introduced another mode for auto queue creation 
> and a new class that handles it. We should move the old managed-queue 
> related logic to CSAutoQueueHandler as well, and do additional cleanup 
> regarding queue management.






[jira] [Updated] (YARN-10571) Refactor dynamic queue handling logic

2021-04-29 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10571:

Attachment: (was: YARN-10571.003.patch)

> Refactor dynamic queue handling logic
> -
>
> Key: YARN-10571
> URL: https://issues.apache.org/jira/browse/YARN-10571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10571.001.patch, YARN-10571.002.patch, 
> YARN-10571.003.patch
>
>
> As per YARN-10506 we have introduced another mode for auto queue creation 
> and a new class that handles it. We should move the old managed-queue 
> related logic to CSAutoQueueHandler as well, and do additional cleanup 
> regarding queue management.






[jira] [Commented] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-04-29 Thread Andrew Chung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335761#comment-17335761
 ] 

Andrew Chung commented on YARN-10760:
-

[~inigoiri] Sure thing!

> Number of allocated OPPORTUNISTIC containers can dip below 0
> 
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: Andrew Chung
>Assignee: Andrew Chung
>Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from 
> multiple sources, yet it appears that there are scenarios in which the caller 
> does not hold the appropriate lock, which can lead to the count of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple 
> fix might be to check if the {{RMContainer}} has already been removed 
> beforehand, though that may not fix the underlying issue that causes the race 
> condition.
> Following is a "capture" of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
> JMX query:
> {noformat}
> {
> "name" : 
> "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> "AllocateLatencyOQuantiles50thPercentileTime" : 0,
> "AllocateLatencyOQuantiles75thPercentileTime" : 0,
> "AllocateLatencyOQuantiles90thPercentileTime" : 0,
> "AllocateLatencyOQuantiles95thPercentileTime" : 0,
> "AllocateLatencyOQuantiles99thPercentileTime" : 0
>   }
> {noformat}






[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-29 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335726#comment-17335726
 ] 

Eric Badger commented on YARN-10707:


Thanks for the updates, [~zhuqi]! +1. I've committed this to trunk (3.4) and 
branch-3.3. There are conflicts backporting further than that.

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch, 
> YARN-10707.009.patch, YARN-10707.010.patch, YARN-10707.011.patch
>
>
> Support GPU in ResourceUtilization, and update the Node GPU utilization to 
> use it first.
> It will be very helpful for other use cases involving GPU utilization.






[jira] [Updated] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-29 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10707:
---
Fix Version/s: 3.3.1
               3.4.0

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch, 
> YARN-10707.009.patch, YARN-10707.010.patch, YARN-10707.011.patch
>
>
> Support GPU in ResourceUtilization, and update the Node GPU utilization to 
> use it first.
> It will be very helpful for other use cases involving GPU utilization.






[jira] [Commented] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-04-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335605#comment-17335605
 ] 

Íñigo Goiri commented on YARN-10760:


Thanks [~afchung90], could you create a PR for this?

> Number of allocated OPPORTUNISTIC containers can dip below 0
> 
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: Andrew Chung
>Assignee: Andrew Chung
>Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from 
> multiple sources, yet it appears that there are scenarios in which the caller 
> does not hold the appropriate lock, which can lead to the count of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple 
> fix might be to check if the {{RMContainer}} has already been removed 
> beforehand, though that may not fix the underlying issue that causes the race 
> condition.
> Following is a "capture" of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
> JMX query:
> {noformat}
> {
> "name" : 
> "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> "AllocateLatencyOQuantiles50thPercentileTime" : 0,
> "AllocateLatencyOQuantiles75thPercentileTime" : 0,
> "AllocateLatencyOQuantiles90thPercentileTime" : 0,
> "AllocateLatencyOQuantiles95thPercentileTime" : 0,
> "AllocateLatencyOQuantiles99thPercentileTime" : 0
>   }
> {noformat}






[jira] [Assigned] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-04-29 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned YARN-10760:
--

Assignee: Andrew Chung

> Number of allocated OPPORTUNISTIC containers can dip below 0
> 
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: Andrew Chung
>Assignee: Andrew Chung
>Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from 
> multiple sources, yet it appears that there are scenarios in which the caller 
> does not hold the appropriate lock, which can lead to the count of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple 
> fix might be to check if the {{RMContainer}} has already been removed 
> beforehand, though that may not fix the underlying issue that causes the race 
> condition.
> Following is a "capture" of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
> JMX query:
> {noformat}
> {
> "name" : 
> "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> "AllocateLatencyOQuantiles50thPercentileTime" : 0,
> "AllocateLatencyOQuantiles75thPercentileTime" : 0,
> "AllocateLatencyOQuantiles90thPercentileTime" : 0,
> "AllocateLatencyOQuantiles95thPercentileTime" : 0,
> "AllocateLatencyOQuantiles99thPercentileTime" : 0
>   }
> {noformat}






[jira] [Updated] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-04-29 Thread Andrew Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Chung updated YARN-10760:

Affects Version/s: 3.1.2

> Number of allocated OPPORTUNISTIC containers can dip below 0
> 
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: Andrew Chung
>Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from 
> multiple sources, yet it appears that there are scenarios in which the caller 
> does not hold the appropriate lock, which can lead to the count of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple 
> fix might be to check if the {{RMContainer}} has already been removed 
> beforehand, though that may not fix the underlying issue that causes the race 
> condition.
> Following is a "capture" of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
> JMX query:
> {noformat}
> {
> "name" : 
> "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> "AllocateLatencyOQuantiles50thPercentileTime" : 0,
> "AllocateLatencyOQuantiles75thPercentileTime" : 0,
> "AllocateLatencyOQuantiles90thPercentileTime" : 0,
> "AllocateLatencyOQuantiles95thPercentileTime" : 0,
> "AllocateLatencyOQuantiles99thPercentileTime" : 0
>   }
> {noformat}






[jira] [Updated] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-04-29 Thread Andrew Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Chung updated YARN-10760:

Description: 
{{AbstractYarnScheduler.completedContainers}} can potentially be called from 
multiple sources, yet it appears that there are scenarios in which the caller 
does not hold the appropriate lock, which can lead to the count of 
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
To prevent double counting when releasing allocated O containers, a simple fix 
might be to check if the {{RMContainer}} has already been removed beforehand, 
though that may not fix the underlying issue that causes the race condition.

Following is a "capture" of 
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
JMX query:

{noformat}
{
"name" : 
"Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
"modelerType" : "OpportunisticSchedulerMetrics",
"tag.OpportunisticSchedulerMetrics" : "ResourceManager",
"tag.Context" : "yarn",
"tag.Hostname" : "",
"AllocatedOContainers" : -2716,
"AggregateOContainersAllocated" : 306020,
"AggregateOContainersReleased" : 308736,
"AggregateNodeLocalOContainersAllocated" : 0,
"AggregateRackLocalOContainersAllocated" : 0,
"AggregateOffSwitchOContainersAllocated" : 306020,
"AllocateLatencyOQuantilesNumOps" : 0,
"AllocateLatencyOQuantiles50thPercentileTime" : 0,
"AllocateLatencyOQuantiles75thPercentileTime" : 0,
"AllocateLatencyOQuantiles90thPercentileTime" : 0,
"AllocateLatencyOQuantiles95thPercentileTime" : 0,
"AllocateLatencyOQuantiles99thPercentileTime" : 0
  }
{noformat}

  was:
{{AbstractYarnScheduler.completedContainers}} can potentially be called from 
multiple sources, yet it appears that there are scenarios in which the caller 
does not hold the appropriate lock, which can lead to the count of 
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
To prevent double counting when releasing allocated O containers, a simple fix 
might be to check if the {{RMContainer}} has already been removed beforehand, 
though that may not fix the underlying issue that causes the race condition.

Following is a screenshot of 
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
JMX query:

{noformat}
{
"name" : 
"Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
"modelerType" : "OpportunisticSchedulerMetrics",
"tag.OpportunisticSchedulerMetrics" : "ResourceManager",
"tag.Context" : "yarn",
"tag.Hostname" : "",
"AllocatedOContainers" : -2716,
"AggregateOContainersAllocated" : 306020,
"AggregateOContainersReleased" : 308736,
"AggregateNodeLocalOContainersAllocated" : 0,
"AggregateRackLocalOContainersAllocated" : 0,
"AggregateOffSwitchOContainersAllocated" : 306020,
"AllocateLatencyOQuantilesNumOps" : 0,
"AllocateLatencyOQuantiles50thPercentileTime" : 0,
"AllocateLatencyOQuantiles75thPercentileTime" : 0,
"AllocateLatencyOQuantiles90thPercentileTime" : 0,
"AllocateLatencyOQuantiles95thPercentileTime" : 0,
"AllocateLatencyOQuantiles99thPercentileTime" : 0
  }
{noformat}


> Number of allocated OPPORTUNISTIC containers can dip below 0
> 
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Andrew Chung
>Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from 
> multiple sources, yet it appears that there are scenarios in which the caller 
> does not hold the appropriate lock, which can lead to the count of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple 
> fix might be to check if the {{RMContainer}} has already been removed 
> beforehand, though that may not fix the underlying issue that causes the race 
> condition.
> Following is a "capture" of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
> JMX query:
> {noformat}
> {
> "name" : 
> "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> 

[jira] [Created] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-04-29 Thread Andrew Chung (Jira)
Andrew Chung created YARN-10760:
---

 Summary: Number of allocated OPPORTUNISTIC containers can dip 
below 0
 Key: YARN-10760
 URL: https://issues.apache.org/jira/browse/YARN-10760
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Andrew Chung


{{AbstractYarnScheduler.completedContainers}} can potentially be called from 
multiple sources, yet it appears that there are scenarios in which the caller 
does not hold the appropriate lock, which can lead to the count of 
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
To prevent double counting when releasing allocated O containers, a simple fix 
might be to check if the {{RMContainer}} has already been removed beforehand, 
though that may not fix the underlying issue that causes the race condition.

Following is a screenshot of 
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a 
JMX query:

{noformat}
{
"name" : 
"Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
"modelerType" : "OpportunisticSchedulerMetrics",
"tag.OpportunisticSchedulerMetrics" : "ResourceManager",
"tag.Context" : "yarn",
"tag.Hostname" : "",
"AllocatedOContainers" : -2716,
"AggregateOContainersAllocated" : 306020,
"AggregateOContainersReleased" : 308736,
"AggregateNodeLocalOContainersAllocated" : 0,
"AggregateRackLocalOContainersAllocated" : 0,
"AggregateOffSwitchOContainersAllocated" : 306020,
"AllocateLatencyOQuantilesNumOps" : 0,
"AllocateLatencyOQuantiles50thPercentileTime" : 0,
"AllocateLatencyOQuantiles75thPercentileTime" : 0,
"AllocateLatencyOQuantiles90thPercentileTime" : 0,
"AllocateLatencyOQuantiles95thPercentileTime" : 0,
"AllocateLatencyOQuantiles99thPercentileTime" : 0
  }
{noformat}
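
For reference, the record above is the kind of JSON served by the ResourceManager's JMX servlet (e.g. {{http://<rm-host>:8088/jmx?qry=Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics}}), and the negative gauge is consistent with the aggregates: 306020 allocated minus 308736 released is -2716. A hedged sketch of the idempotency guard suggested above follows; the field and method names are illustrative assumptions, not the actual scheduler code:

{noformat}
// Hypothetical guard in the scheduler's container-release path: decrement
// the gauge only when this call actually removes the container, so a racing
// double release cannot push AllocatedOContainers below zero.
RMContainer removed = liveContainers.remove(rmContainer.getContainerId());
if (removed == null) {
  return;  // already released by another caller; do not double-count
}
if (removed.getExecutionType() == ExecutionType.OPPORTUNISTIC) {
  opportunisticMetrics.releaseOContainer();  // assumed metrics hook
}
{noformat}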






[jira] [Commented] (YARN-10745) Change Log level from info to debug for few logs and remove unnecessary debuglog checks

2021-04-29 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335401#comment-17335401
 ] 

Bilwa S T commented on YARN-10745:
--

Hi [~dmmkr],

Thanks for the patch. I have a few minor comments:
 * In ProportionalCapacityPreemptionPolicy.java, the LOG.isDebugEnabled() check 
can be removed for the log below, since SLF4J evaluates the {} parameters only 
when DEBUG is enabled:
{quote}LOG.debug("Send to scheduler: in app={} " +
    "#containers-to-be-preemptionCandidates={}",
    appAttemptId, e.getValue().size());
{quote}
 * Why do we need the LOG.isDebugEnabled() check in AsyncDispatcher.java?

A few suggestions:
 * In NodesListManager.java we can print the log below only if either of the 
sets is non-empty (see the sketch after this comment):
{quote}LOG.info("hostsReader include:{" + StringUtils.join(",", hostsReader.getHosts()) + "} exclude:{" +
    StringUtils.join(",", hostsReader.getExcludedHosts()) + "}");
{quote}
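
A minimal sketch of that empty-check, assuming the HostsFileReader getters used above return collections:

{noformat}
// Log the include/exclude lists only when there is something to report.
if (!hostsReader.getHosts().isEmpty()
    || !hostsReader.getExcludedHosts().isEmpty()) {
  LOG.info("hostsReader include:{"
      + StringUtils.join(",", hostsReader.getHosts()) + "} exclude:{"
      + StringUtils.join(",", hostsReader.getExcludedHosts()) + "}");
}
{noformat}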

> Change Log level from info to debug for few logs and remove unnecessary 
> debuglog checks
> ---
>
> Key: YARN-10745
> URL: https://issues.apache.org/jira/browse/YARN-10745
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-10745.001.patch
>
>
> Change the log level from info to debug for a few logs so that the load on 
> the logger decreases in large clusters, improving performance.
> Remove the unnecessary isDebugEnabled() checks around log statements that do 
> no string concatenation.






[jira] [Resolved] (YARN-10505) Extend the maximum-capacity property to react to weight mode changes

2021-04-29 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori resolved YARN-10505.
-
Resolution: Duplicate

> Extend the maximum-capacity property to react to weight mode changes
> 
>
> Key: YARN-10505
> URL: https://issues.apache.org/jira/browse/YARN-10505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Priority: Major
>
> The property root.users.maximum-capacity could mean the following things:
>  * Relative Percentage: maximum capacity relative to its parent. If it’s set 
> to 50, then it means that the capacity is capped with respect to the parent.
>  * Absolute Percentage: maximum capacity expressed as a percentage of the 
> overall cluster capacity.
>  * Percentages of different resource types: this would refer to vCores, 
> memory, GPU, etc... Similarly to the single percentage value, this could 
> either mean percentage of the parent or percentage of the overall cluster 
> resource.
>  * Absolute limit: explicit definition of vCores and memory like vcores=20, 
> memory-mb=16384. 
>  
> Note that Fair Scheduler supports the following settings:
>  * Single percentage (absolute)
>  * Two percentages (absolute)
>  * Absolute resources
>  
> It is recommended that all three formats are supported for maximum-capacity 
> after introducing weight mode. The final form of the configuration for 
> example could look like this:
> root.users.maximum-capacity = 100% - single percentage
> root.users.maximum-capacity = (vcores=100%, memory-mb=100%) - two percentages
> root.users.maximum-capacity = (vcores=10, memory-mb=1mb) - absolute






[jira] [Commented] (YARN-10505) Extend the maximum-capacity property to react to weight mode changes

2021-04-29 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335286#comment-17335286
 ] 

Andras Gyori commented on YARN-10505:
-

This will be covered in YARN-9936. Closing it.

> Extend the maximum-capacity property to react to weight mode changes
> 
>
> Key: YARN-10505
> URL: https://issues.apache.org/jira/browse/YARN-10505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Priority: Major
>
> The property root.users.maximum-capacity could mean the following things:
>  * Relative Percentage: maximum capacity relative to its parent. If it’s set 
> to 50, then it means that the capacity is capped with respect to the parent.
>  * Absolute Percentage: maximum capacity expressed as a percentage of the 
> overall cluster capacity.
>  * Percentages of different resource types: this would refer to vCores, 
> memory, GPU, etc... Similarly to the single percentage value, this could 
> either mean percentage of the parent or percentage of the overall cluster 
> resource.
>  * Absolute limit: explicit definition of vCores and memory like vcores=20, 
> memory-mb=16384. 
>  
> Note that Fair Scheduler supports the following settings:
>  * Single percentage (absolute)
>  * Two percentages (absolute)
>  * Absolute resources
>  
> It is recommended that all three formats are supported for maximum-capacity 
> after introducing weight mode. The final form of the configuration for 
> example could look like this:
> root.users.maximum-capacity = 100% - single percentage
> root.users.maximum-capacity = (vcores=100%, memory-mb=100%) - two percentages
> root.users.maximum-capacity = (vcores=10, memory-mb=1mb) - absolute






[jira] [Commented] (YARN-9443) Fast RM Failover using Ratis (Raft protocol)

2021-04-29 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335205#comment-17335205
 ] 

Qi Zhu commented on YARN-9443:
--

[~prabhujoseph] [~ztang] [~ebadger] [~epayne]

Is this still moving forward? Currently the RM state is stored in ZooKeeper, 
but that does not scale well on large clusters.

YARN-5123 uses a SQL-based state store, but that is still not hot HA in the 
way the HDFS NameNode is.

If we want hot HA for the ResourceManager, Ratis (Raft) is a very good choice 
for keeping the state consistent in HA mode: the active RM's state stays 
consistent with the standby's through Raft log commits, so on failover the new 
active does not need to fence and reload the large state from ZK, and hot HA 
becomes achievable.

Thanks.

> Fast RM Failover using Ratis (Raft protocol)
> 
>
> Key: YARN-9443
> URL: https://issues.apache.org/jira/browse/YARN-9443
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> During Failover, the RM Standby will have a lag as it has to recover from 
> Zookeeper / FileSystem StateStore. RM HA using Ratis (Raft Protocol) can 
> achieve Fast failover as all RMs are in sync already. This is used by Ozone - 
> HDDS-505.
>  
> cc [~nandakumar131]


