[jira] [Updated] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10761:
--
Attachment: YARN-10761.003.patch

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, YARN-10761.002.patch, 
> YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, 
> image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We had better add a total of four busy event types to the metrics, according to 
> YARN-9927.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340582#comment-17340582
 ] 

Qi Zhu commented on YARN-10761:
---

Fixed checkstyle in the latest patch.

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, YARN-10761.002.patch, 
> YARN-10761.003.patch, image-2021-05-06-16-38-51-406.png, 
> image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We had better add a total of four busy event types to the metrics, according to 
> YARN-9927.






[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340580#comment-17340580
 ] 

Hadoop QA commented on YARN-10761:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 17s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 33s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m  0s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 10s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 40s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/963/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 46 unchanged - 0 fixed = 50 total (was 46) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m  6s{color} | 

[jira] [Updated] (YARN-9333) TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes fails intermittently

2021-05-06 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-9333:

Fix Version/s: 3.3.1

Backported to branch-3.3.

> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittently
> 
>
> Key: YARN-9333
> URL: https://issues.apache.org/jira/browse/YARN-9333
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-9333-001.patch, YARN-9333-002.patch, 
> YARN-9333-003.patch, YARN-9333-debug1.patch
>
>
> TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes
>  fails intermittently; observed in YARN-9311.
> {code}
> [ERROR] 
> testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes[MinSharePreemptionWithDRF](org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption)
>   Time elapsed: 11.056 s  <<< FAILURE!
> java.lang.AssertionError: Incorrect # of containers on the greedy app 
> expected:<6> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyPreemption(TestFairSchedulerPreemption.java:296)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.verifyRelaxLocalityPreemption(TestFairSchedulerPreemption.java:537)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption.testRelaxLocalityPreemptionWithNoLessAMInRemainingNodes(TestFairSchedulerPreemption.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> 

[jira] [Updated] (YARN-10515) Fix flaky test TestCapacitySchedulerAutoQueueCreation.testDynamicAutoQueueCreationWithTags

2021-05-06 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10515:
-
Fix Version/s: 3.3.1

Backported to branch-3.3.

> Fix flaky test 
> TestCapacitySchedulerAutoQueueCreation.testDynamicAutoQueueCreationWithTags
> --
>
> Key: YARN-10515
> URL: https://issues.apache.org/jira/browse/YARN-10515
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10515-001.patch
>
>
> The testcase 
> TestCapacitySchedulerAutoQueueCreation.testDynamicAutoQueueCreationWithTags 
> sometimes fails with the following error:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:174)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:110)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:884)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1296)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1018)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:158)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:134)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:130)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation$5.<init>(TestCapacitySchedulerAutoQueueCreation.java:873)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation.testDynamicAutoQueueCreationWithTags(TestCapacitySchedulerAutoQueueCreation.java:873)
> ...
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition=,q0=root,q1=a already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionQueueMetrics(QueueMetrics.java:317)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.setAvailableResourcesToQueue(QueueMetrics.java:513)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:308)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:412)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:350)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setupQueueConfigs(ParentQueue.java:137)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:119)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractManagedParentQueue.<init>(AbstractManagedParentQueue.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ManagedParentQueue.<init>(ManagedParentQueue.java:57)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:261)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:289)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(CapacitySchedulerQueueManager.java:162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:752)
> {noformat}
> We have to reset 

[jira] [Commented] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-05-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340570#comment-17340570
 ] 

Qi Zhu commented on YARN-10755:
---

Thanks [~chaosju] for the report, and [~BilwaST] for taking this on.

I will help review it.

> Multithreaded loading Apps from zk statestore
> -
>
> Key: YARN-10755
> URL: https://issues.apache.org/jira/browse/YARN-10755
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: version: hadoop-2.8.5
>Reporter: chaosju
>Assignee: Bilwa S T
>Priority: Major
> Attachments: image-2021-04-27-12-55-18-710.png
>
>
> In the RM, we could get the list of applications to be read from the state store 
> and then divide the work of reading the data associated with each app among 
> multiple threads.
> I think this is important for large clusters.
> h2. Profile
> Profiled with TestZKRMStateStorePerf 
> Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 
> Profile result: the loadRMAppState stage takes 5s.
> Profile logs:
> !image-2021-04-27-12-55-18-710.png!  
>  
>  
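
The idea in the description above (list the applications first, then spread the per-app reads over several threads) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch for this issue: ParallelAppLoadSketch, loadAppState and loadAll are hypothetical names, and the per-app read is a stub rather than a real ZooKeeper state-store call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelAppLoadSketch {

  // Hypothetical stand-in for one per-application read from the state store.
  public static String loadAppState(String appId) {
    return "state-of-" + appId;
  }

  // Submits one read task per application to a fixed-size pool and
  // collects the results in the original order of appIds.
  public static Map<String, String> loadAll(List<String> appIds, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (String appId : appIds) {
        pending.add(pool.submit(() -> loadAppState(appId)));
      }
      Map<String, String> loaded = new LinkedHashMap<>();
      for (int i = 0; i < appIds.size(); i++) {
        // get() blocks until that app's read has finished.
        loaded.put(appIds.get(i), pending.get(i).get());
      }
      return loaded;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> appIds = Arrays.asList("app_1", "app_2", "app_3");
    System.out.println(loadAll(appIds, 2));
  }
}
```

Collecting the futures in submission order keeps the result deterministic even though the reads may finish in any order; whether ordering matters for the real recovery path is not stated in the issue.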






[jira] [Updated] (YARN-10745) Change Log level from info to debug for few logs and remove unnecessary debuglog checks

2021-05-06 Thread D M Murali Krishna Reddy (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

D M Murali Krishna Reddy updated YARN-10745:

Attachment: YARN-10745.005.patch

> Change Log level from info to debug for few logs and remove unnecessary 
> debuglog checks
> ---
>
> Key: YARN-10745
> URL: https://issues.apache.org/jira/browse/YARN-10745
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-10745.001.patch, YARN-10745.002.patch, 
> YARN-10745.003.patch, YARN-10745.004.patch, YARN-10745.005.patch
>
>
> Change the log level from info to debug for a few logs so that the load on the 
> logger decreases in large clusters and performance improves.
> Remove the unnecessary isDebugEnabled() checks for printing strings without 
> any string concatenation.
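
To illustrate the second point of the description, here is a minimal sketch of when an isDebugEnabled() guard is redundant and when it still helps. MiniLogger is a hypothetical stand-in written for this example so that it runs without dependencies; it only mimics the shape of an SLF4J-style debug(format, args) call and is not the real logging API. expensiveDump is likewise an invented helper.

```java
import java.util.ArrayList;
import java.util.List;

public class DebugGuardSketch {

  /** Hypothetical stand-in for an SLF4J-style logger; not the real API. */
  public static final class MiniLogger {
    private final boolean debugEnabled;
    public final List<String> lines = new ArrayList<>();

    public MiniLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    public boolean isDebugEnabled() {
      return debugEnabled;
    }

    // The level check happens inside debug(), so callers need no guard
    // when the arguments are cheap to pass.
    public void debug(String format, Object... args) {
      if (!debugEnabled) {
        return;
      }
      StringBuilder sb = new StringBuilder();
      int from = 0;
      for (Object arg : args) {
        int at = format.indexOf("{}", from);
        if (at < 0) {
          break;
        }
        sb.append(format, from, at).append(arg);
        from = at + 2;
      }
      lines.add(sb.append(format.substring(from)).toString());
    }
  }

  public static int expensiveCalls = 0;

  // Hypothetical costly computation used only to build a debug message.
  public static String expensiveDump() {
    expensiveCalls++;
    return "big-dump";
  }

  public static void logAll(MiniLogger log) {
    // Constant string: a guard would add nothing.
    log.debug("Node heartbeat received");
    // Parameterized message: formatting is deferred, so no guard is needed.
    log.debug("Container {} on node {}", "c1", "n1");
    // Concatenating a costly value: the guard avoids the work when debug is off.
    if (log.isDebugEnabled()) {
      log.debug("State: " + expensiveDump());
    }
  }

  public static void main(String[] args) {
    MiniLogger log = new MiniLogger(true);
    logAll(log);
    log.lines.forEach(System.out::println);
  }
}
```

With debug disabled, the first two calls return immediately inside the logger and the guarded call never evaluates expensiveDump(), which is the distinction the issue description draws.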






[jira] [Commented] (YARN-10745) Change Log level from info to debug for few logs and remove unnecessary debuglog checks

2021-05-06 Thread D M Murali Krishna Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340555#comment-17340555
 ] 

D M Murali Krishna Reddy commented on YARN-10745:
-

Thanks [~ebadger] for the review comments.

Regarding the wrong indentation: in the initial patches I followed the 
correct indentation level, but the Hadoop QA checkstyle run reported errors, so 
I changed the indentation to fix the checkstyle warnings. I will change 
the indentation level as per your review.

 

Is {{clusterNodeReports}} guaranteed to be non-null here?

Yes, as per my understanding clusterNodeReports will never be null. If it 
were null, we would get an NPE in the for loop below anyway. Also, I think 
findbugs would catch this type of potential NPE, so I don't think it is a 
problem.
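
The claim about the for loop can be checked in isolation with a small sketch. It assumes nothing about the actual patch: NullIterationSketch, countReports and failsWithNpe are hypothetical names, and the list of strings merely stands in for the real node report type.

```java
import java.util.Arrays;
import java.util.List;

public class NullIterationSketch {

  // Iterates the list with an enhanced for loop and no null check.
  public static int countReports(List<String> clusterNodeReports) {
    int count = 0;
    for (String report : clusterNodeReports) {
      count++;
    }
    return count;
  }

  // Returns true if iterating the given list throws a NullPointerException.
  public static boolean failsWithNpe(List<String> clusterNodeReports) {
    try {
      countReports(clusterNodeReports);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(failsWithNpe(Arrays.asList("node-1", "node-2")));
    System.out.println(failsWithNpe(null));
  }
}
```

An enhanced for loop calls iterator() on its operand, so a null list fails at the loop itself; an earlier null check would only change where the failure surfaces, not whether it occurs.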

 
{code:java}
-// NodeManager is the last service to start, so NodeId is available.
+// NodeStatusUpdater is the last service to start, so NodeId is available.

{code}
Regarding the above change, I misunderstood the old comment and changed 
it. I will revert it.

 
{code:java}
+  LOG.info("Callback succeeded for initializing request processing " +
+  "pipeline for an AM ");
{code}
I haven't debugged AMRMProxy much, but while going through the code I found 
that this log might be useful. If you feel it is not required and doesn't add 
any value, I can remove it.

 
{code:java}
-LOG.info("hostsReader include:{" +
-StringUtils.join(",", hostsReader.getHosts()) +
-"} exclude:{" +
-StringUtils.join(",", hostsReader.getExcludedHosts()) + "}");
-
+if (!hostsReader.getHosts().isEmpty() ||
+!hostsReader.getExcludedHosts().isEmpty()) {
+  LOG.info("hostsReader include:{" +
+  StringUtils.join(",", hostsReader.getHosts()) +
+  "} exclude:{" +
+  StringUtils.join(",", hostsReader.getExcludedHosts()) + "}");
+}
{code}
I added this change at the suggestion of [~BilwaST]. I will remove 
this change in the 005 patch.

 

> Change Log level from info to debug for few logs and remove unnecessary 
> debuglog checks
> ---
>
> Key: YARN-10745
> URL: https://issues.apache.org/jira/browse/YARN-10745
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-10745.001.patch, YARN-10745.002.patch, 
> YARN-10745.003.patch, YARN-10745.004.patch
>
>
> Change the log level from info to debug for a few logs so that the load on the 
> logger decreases in large clusters and performance improves.
> Remove the unnecessary isDebugEnabled() checks for printing strings without 
> any string concatenation.






[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340529#comment-17340529
 ] 

Qi Zhu commented on YARN-10761:
---

Thanks a lot [~ebadger] for the review.

I have reduced the two create calls to one and saved the result in a local variable.

Updated in the latest patch.

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, YARN-10761.002.patch, 
> image-2021-05-06-16-38-51-406.png, image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We had better add a total of four busy event types to the metrics, according to 
> YARN-9927.






[jira] [Updated] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10761:
--
Attachment: YARN-10761.002.patch

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, YARN-10761.002.patch, 
> image-2021-05-06-16-38-51-406.png, image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We had better add a total of four busy event types to the metrics, according to 
> YARN-9927.






[jira] [Updated] (YARN-10762) docker builds having problems with non root user on MAC OS

2021-05-06 Thread chaosju (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaosju updated YARN-10762:
---
Affects Version/s: 3.2.1

> docker builds having problems with non root user on MAC OS
> --
>
> Key: YARN-10762
> URL: https://issues.apache.org/jira/browse/YARN-10762
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: mac os
>Reporter: chaosju
>Priority: Major
>
> [INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-common ---
> [INFO] Executing tasks
> main:
>  [mkdir] Created dir: 
> /home/chaosju/hadoop/hadoop-common-project/hadoop-common/target/test-dir
>  [mkdir] Created dir: 
> /home/chaosju/hadoop/hadoop-common-project/hadoop-common/target/test/data
> [INFO] Executed tasks
> [INFO] 
> [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-os) @ 
> hadoop-common ---
> [INFO] Skipping Rule Enforcement.
> [INFO] 
> [INFO] --- protobuf-maven-plugin:0.5.1:compile (src-compile-protoc) @ 
> hadoop-common ---
> [INFO] Compiling 14 proto file(s) to 
> /home/chaosju/hadoop/hadoop-common-project/hadoop-common/target/generated-sources/java
> [INFO] 
> [INFO] --- maven-antrun-plugin:1.7:run (isal) @ hadoop-common ---
> [INFO] Executing tasks
> main:
>  [exec] make --no-print-directory install-am
>  [exec] CC erasure_code/ec_base.lo
>  [exec] CC raid/raid_base.lo
>  [exec] CC erasure_code/ec_highlevel_func.lo
>  [exec] CCLD libisal.la
>  [exec] ar: `u' modifier ignored since `D' is the default (see `U')
>  [exec] CCLD programs/igzip
>  [exec] /usr/bin/install: cannot create regular file 
> '/usr/lib/libisal.so.2.0.30': Permission denied
>  [exec] make[2]: *** [install-libLTLIBRARIES] Error 1
>  [exec] make[1]: *** [install-am] Error 2
>  [exec] make: *** [install] Error 2
>  [exec] /bin/mkdir -p '/usr/lib'
>  [exec] /bin/bash ./libtool --mode=install /usr/bin/install -c libisal.la 
> '/usr/lib'
>  [exec] libtool: install: /usr/bin/install -c .libs/libisal.so.2.0.30 
> /usr/lib/libisal.so.2.0.30
>  [exec] Makefile:2099: recipe for target 'install-libLTLIBRARIES' failed
>  [exec] Makefile:3992: recipe for target 'install-am' failed
>  [exec] Makefile:3986: recipe for target 'install' failed
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Hadoop Main . SUCCESS [ 2.416 s]
> [INFO] Apache Hadoop Build Tools .. SUCCESS [ 27.978 
> s]
> [INFO] Apache Hadoop Project POM .. SUCCESS [ 2.207 s]
> [INFO] Apache Hadoop Annotations .. SUCCESS [ 1.917 s]
> [INFO] Apache Hadoop Assemblies ... SUCCESS [ 0.631 s]
> [INFO] Apache Hadoop Project Dist POM . SUCCESS [ 2.373 s]
> [INFO] Apache Hadoop Maven Plugins  SUCCESS [ 5.202 s]
> [INFO] Apache Hadoop MiniKDC .. SUCCESS [ 1.915 s]
> [INFO] Apache Hadoop Auth . SUCCESS [ 8.674 s]
> [INFO] Apache Hadoop Auth Examples  SUCCESS [ 2.773 s]
> [INFO] tq-security  SUCCESS [ 2.880 s]
> [INFO] Apache Hadoop Common ... FAILURE [ 5.035 s]
> [INFO] Apache Hadoop NFS .. SKIPPED
> [INFO] Apache Hadoop KMS .. SKIPPED
> [INFO] Apache Hadoop Common Project ... SKIPPED
> [INFO] Apache Hadoop HDFS Client .. SKIPPED
> [INFO] Apache Hadoop HDFS . SKIPPED
> [INFO] Apache Hadoop HDFS Native Client ... SKIPPED
> [INFO] Apache Hadoop HttpFS ... SKIPPED
> [INFO] Apache Hadoop HDFS-NFS . SKIPPED
> [INFO] Apache Hadoop HDFS-RBF . SKIPPED
> [INFO] Apache Hadoop HDFS Project . SKIPPED
> [INFO] Apache Hadoop YARN . SKIPPED
> [INFO] Apache Hadoop YARN API . SKIPPED
> [INFO] Apache Hadoop YARN Common .. SKIPPED
> [INFO] Apache Hadoop YARN Registry  SKIPPED
> [INFO] Apache Hadoop YARN Server .. SKIPPED
> [INFO] Apache Hadoop YARN Server Common ... SKIPPED
> [INFO] Apache Hadoop YARN NodeManager . SKIPPED
> [INFO] Apache Hadoop YARN Web Proxy ... SKIPPED
> [INFO] Apache Hadoop YARN ApplicationHistoryService ... SKIPPED
> [INFO] Apache Hadoop YARN Timeline Service  SKIPPED
> [INFO] Apache Hadoop YARN ResourceManager . SKIPPED
> [INFO] Apache Hadoop YARN 

[jira] [Created] (YARN-10762) docker builds having problems with non root user on MAC OS

2021-05-06 Thread chaosju (Jira)
chaosju created YARN-10762:
--

 Summary: docker builds having problems with non root user on MAC OS
 Key: YARN-10762
 URL: https://issues.apache.org/jira/browse/YARN-10762
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: mac os
Reporter: chaosju


[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-common ---
[INFO] Executing tasks

main:
 [mkdir] Created dir: 
/home/chaosju/hadoop/hadoop-common-project/hadoop-common/target/test-dir
 [mkdir] Created dir: 
/home/chaosju/hadoop/hadoop-common-project/hadoop-common/target/test/data
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-os) @ hadoop-common 
---
[INFO] Skipping Rule Enforcement.
[INFO] 
[INFO] --- protobuf-maven-plugin:0.5.1:compile (src-compile-protoc) @ 
hadoop-common ---
[INFO] Compiling 14 proto file(s) to 
/home/chaosju/hadoop/hadoop-common-project/hadoop-common/target/generated-sources/java
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (isal) @ hadoop-common ---
[INFO] Executing tasks

main:
 [exec] make --no-print-directory install-am
 [exec] CC erasure_code/ec_base.lo
 [exec] CC raid/raid_base.lo
 [exec] CC erasure_code/ec_highlevel_func.lo
 [exec] CCLD libisal.la
 [exec] ar: `u' modifier ignored since `D' is the default (see `U')
 [exec] CCLD programs/igzip
 [exec] /usr/bin/install: cannot create regular file 
'/usr/lib/libisal.so.2.0.30': Permission denied
 [exec] make[2]: *** [install-libLTLIBRARIES] Error 1
 [exec] make[1]: *** [install-am] Error 2
 [exec] make: *** [install] Error 2
 [exec] /bin/mkdir -p '/usr/lib'
 [exec] /bin/bash ./libtool --mode=install /usr/bin/install -c libisal.la 
'/usr/lib'
 [exec] libtool: install: /usr/bin/install -c .libs/libisal.so.2.0.30 
/usr/lib/libisal.so.2.0.30
 [exec] Makefile:2099: recipe for target 'install-libLTLIBRARIES' failed
 [exec] Makefile:3992: recipe for target 'install-am' failed
 [exec] Makefile:3986: recipe for target 'install' failed
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main . SUCCESS [ 2.416 s]
[INFO] Apache Hadoop Build Tools .. SUCCESS [ 27.978 s]
[INFO] Apache Hadoop Project POM .. SUCCESS [ 2.207 s]
[INFO] Apache Hadoop Annotations .. SUCCESS [ 1.917 s]
[INFO] Apache Hadoop Assemblies ... SUCCESS [ 0.631 s]
[INFO] Apache Hadoop Project Dist POM . SUCCESS [ 2.373 s]
[INFO] Apache Hadoop Maven Plugins  SUCCESS [ 5.202 s]
[INFO] Apache Hadoop MiniKDC .. SUCCESS [ 1.915 s]
[INFO] Apache Hadoop Auth . SUCCESS [ 8.674 s]
[INFO] Apache Hadoop Auth Examples  SUCCESS [ 2.773 s]
[INFO] tq-security  SUCCESS [ 2.880 s]
[INFO] Apache Hadoop Common ... FAILURE [ 5.035 s]
[INFO] Apache Hadoop NFS .. SKIPPED
[INFO] Apache Hadoop KMS .. SKIPPED
[INFO] Apache Hadoop Common Project ... SKIPPED
[INFO] Apache Hadoop HDFS Client .. SKIPPED
[INFO] Apache Hadoop HDFS . SKIPPED
[INFO] Apache Hadoop HDFS Native Client ... SKIPPED
[INFO] Apache Hadoop HttpFS ... SKIPPED
[INFO] Apache Hadoop HDFS-NFS . SKIPPED
[INFO] Apache Hadoop HDFS-RBF . SKIPPED
[INFO] Apache Hadoop HDFS Project . SKIPPED
[INFO] Apache Hadoop YARN . SKIPPED
[INFO] Apache Hadoop YARN API . SKIPPED
[INFO] Apache Hadoop YARN Common .. SKIPPED
[INFO] Apache Hadoop YARN Registry  SKIPPED
[INFO] Apache Hadoop YARN Server .. SKIPPED
[INFO] Apache Hadoop YARN Server Common ... SKIPPED
[INFO] Apache Hadoop YARN NodeManager . SKIPPED
[INFO] Apache Hadoop YARN Web Proxy ... SKIPPED
[INFO] Apache Hadoop YARN ApplicationHistoryService ... SKIPPED
[INFO] Apache Hadoop YARN Timeline Service  SKIPPED
[INFO] Apache Hadoop YARN ResourceManager . SKIPPED
[INFO] Apache Hadoop YARN Server Tests  SKIPPED
[INFO] Apache Hadoop YARN Client .. SKIPPED
[INFO] Apache Hadoop YARN SharedCacheManager .. SKIPPED
[INFO] Apache Hadoop YARN Timeline Plugin Storage . SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SKIPPED
[INFO] Apache Hadoop YARN TimelineService HBase Common  SKIPPED
[INFO] Apache Hadoop YARN TimelineService 

[jira] [Commented] (YARN-9279) Remove the old hamlet package

2021-05-06 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340483#comment-17340483
 ] 

Hadoop QA commented on YARN-9279:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} | {color:red}{color} | {color:red} YARN-9279 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9279 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12957469/YARN-9279.01.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/962/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Remove the old hamlet package
> -
>
> Key: YARN-9279
> URL: https://issues.apache.org/jira/browse/YARN-9279
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-9279.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The old hamlet package was deprecated in HADOOP-11875. Let's remove this to 
> improve the maintainability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9279) Remove the old hamlet package

2021-05-06 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340481#comment-17340481
 ] 

Akira Ajisaka commented on YARN-9279:
-

PR opened: https://github.com/apache/hadoop/pull/2986

> Remove the old hamlet package
> -
>
> Key: YARN-9279
> URL: https://issues.apache.org/jira/browse/YARN-9279
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-9279.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The old hamlet package was deprecated in HADOOP-11875. Let's remove this to 
> improve the maintainability.






[jira] [Updated] (YARN-9279) Remove the old hamlet package

2021-05-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-9279:
-
Labels: pull-request-available  (was: )

> Remove the old hamlet package
> -
>
> Key: YARN-9279
> URL: https://issues.apache.org/jira/browse/YARN-9279
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-9279.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The old hamlet package was deprecated in HADOOP-11875. Let's remove this to 
> improve the maintainability.






[jira] [Updated] (YARN-10756) Remove additional junit 4.11 dependency from javadoc

2021-05-06 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10756:
-
Summary: Remove additional junit 4.11 dependency from javadoc  (was: 
Upgrade JUnit to 4.13.1)

> Remove additional junit 4.11 dependency from javadoc
> 
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test, timelineservice
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340347#comment-17340347
 ] 

Eric Badger commented on YARN-10761:


Thanks for the patch, [~zhuqi]. 

Is there a reason we need to call {{create()}} twice for each metric? The code 
in the patch calls it once to create the {{GenericEventTypeMetricsManager}} 
and then again just so that it can call {{getEnumClass()}}. It seems better to 
save the result of the first {{create()}} call in a local variable and then 
call {{getEnumClass()}} on that, so we don't have to call {{create()}} twice 
per metric.
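
The suggestion above can be sketched as follows. This is only an illustration 
under stated assumptions: {{Metrics}}, {{create()}}, {{Dispatcher}} and 
{{addMetrics()}} below are hypothetical stand-ins written for this example, not 
the classes or signatures from the patch. The only point is that the result of 
{{create()}} is kept in a local variable and reused for {{getEnumClass()}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class CreateOnceSketch {
  enum NodeEvent { ADDED, REMOVED }

  /** Hypothetical stand-in for a per-event-type metrics object. */
  static final class Metrics<T extends Enum<T>> {
    private final Class<T> enumClass;
    Metrics(Class<T> enumClass) { this.enumClass = enumClass; }
    Class<T> getEnumClass() { return enumClass; }
  }

  /** Counts factory calls so the saving is observable. */
  static final AtomicInteger CREATE_CALLS = new AtomicInteger();

  /** Hypothetical factory; each call builds a new metrics object. */
  static <T extends Enum<T>> Metrics<T> create(Class<T> enumClass) {
    CREATE_CALLS.incrementAndGet();
    return new Metrics<>(enumClass);
  }

  /** Hypothetical dispatcher that keeps one metrics object per event class. */
  static final class Dispatcher {
    final Map<Class<?>, Metrics<?>> registered = new HashMap<>();

    <T extends Enum<T>> void addMetrics(Metrics<T> metrics, Class<T> eventClass) {
      registered.put(eventClass, metrics);
    }
  }

  /** Calls create() once and reuses the result for getEnumClass(). */
  static void registerOnce(Dispatcher dispatcher) {
    Metrics<NodeEvent> metrics = create(NodeEvent.class);
    dispatcher.addMetrics(metrics, metrics.getEnumClass());
  }

  public static void main(String[] args) {
    Dispatcher dispatcher = new Dispatcher();
    registerOnce(dispatcher);
    System.out.println("create() calls: " + CREATE_CALLS.get());
  }
}
```

With this shape, each registered metric costs exactly one factory call, and the 
object that is registered is the same one whose enum class is used as the key.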

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, image-2021-05-06-16-38-51-406.png, 
> image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We should also add all 4 busy event types to the metrics, according to 
> YARN-9927.






[jira] [Commented] (YARN-10760) Number of allocated OPPORTUNISTIC containers can dip below 0

2021-05-06 Thread Andrew Chung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340276#comment-17340276
 ] 

Andrew Chung commented on YARN-10760:
-

[~elgoiri] My purported fix did not work and we are still observing the issue 
in production. May need further investigation.

> Number of allocated OPPORTUNISTIC containers can dip below 0
> 
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: Andrew Chung
>Assignee: Andrew Chung
>Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from 
> multiple sources, yet it appears that there are scenarios in which the caller 
> does not hold the appropriate lock, which can lead to the count of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple 
> fix might be to check if the {{RMContainer}} has already been removed 
> beforehand, though that may not fix the underlying issue that causes the race 
> condition.
> The following is a capture of 
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0, 
> obtained via a JMX query:
> {noformat}
> {
> "name" : 
> "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> "AllocateLatencyOQuantiles50thPercentileTime" : 0,
> "AllocateLatencyOQuantiles75thPercentileTime" : 0,
> "AllocateLatencyOQuantiles90thPercentileTime" : 0,
> "AllocateLatencyOQuantiles95thPercentileTime" : 0,
> "AllocateLatencyOQuantiles99thPercentileTime" : 0
>   }
> {noformat}
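
In the capture above, AggregateOContainersAllocated minus 
AggregateOContainersReleased is 306020 - 308736 = -2716, which matches the 
AllocatedOContainers gauge, so releases appear to be counted more often than 
allocations. The "check whether the container was already removed" idea from 
the description could look roughly like the sketch below. It is a minimal, 
self-contained illustration: the class and method names are hypothetical and do 
not come from the YARN scheduler code, and, as the description says, such a 
guard would not by itself explain the underlying race.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ReleaseOnceSketch {
  // IDs of containers that are currently counted as allocated (hypothetical).
  private final Set<String> live = ConcurrentHashMap.newKeySet();
  // Gauge that must never drop below zero.
  private final AtomicLong allocated = new AtomicLong();

  /** Counts the container only if it was not already tracked. */
  void allocate(String containerId) {
    if (live.add(containerId)) {
      allocated.incrementAndGet();
    }
  }

  /** Decrements only for the caller that actually removes the container. */
  void release(String containerId) {
    if (live.remove(containerId)) {
      allocated.decrementAndGet();
    }
  }

  long allocatedCount() {
    return allocated.get();
  }

  public static void main(String[] args) {
    ReleaseOnceSketch metrics = new ReleaseOnceSketch();
    metrics.allocate("container_1");
    metrics.release("container_1");
    metrics.release("container_1"); // duplicate completion, ignored
    System.out.println("allocated: " + metrics.allocatedCount());
  }
}
```

Because Set.remove() returns true for only one caller per element, a duplicate 
completion from a second source cannot decrement the gauge again, even without 
an external lock.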






[jira] [Assigned] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-05-06 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned YARN-10755:


Assignee: Bilwa S T

> Multithreaded loading Apps from zk statestore
> -
>
> Key: YARN-10755
> URL: https://issues.apache.org/jira/browse/YARN-10755
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: version: hadoop-2.8.5
>Reporter: chaosju
>Assignee: Bilwa S T
>Priority: Major
> Attachments: image-2021-04-27-12-55-18-710.png
>
>
> In the RM, we could get the list of applications to be read from the state 
> store and then divide the work of reading the data associated with each app 
> among multiple threads.
> I think this is important for large clusters.
> h2. Profile
> Profiled with TestZKRMStateStorePerf 
> Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 
> Profile result: the loadRMAppState stage takes 5s.
> Profile logs:
> !image-2021-04-27-12-55-18-710.png!  
>  
>  
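
The idea in the description (list the applications first, then spread the 
per-application reads over several threads) can be sketched as below. This is 
a generic illustration only: {{loadAll}} and the {{loadOne}} function are 
hypothetical names, the state store is replaced by a plain function, and 
nothing here is taken from an actual patch for this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelLoadSketch {
  /** Loads every key with loadOne on a fixed pool and returns results in input order. */
  static <K, V> Map<K, V> loadAll(List<K> keys, Function<K, V> loadOne, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit one task per key; the pool divides them among its threads.
      List<Future<V>> futures = new ArrayList<>();
      for (K key : keys) {
        Callable<V> task = () -> loadOne.apply(key);
        futures.add(pool.submit(task));
      }
      // Collect results in the original order.
      Map<K, V> result = new LinkedHashMap<>();
      for (int i = 0; i < keys.size(); i++) {
        K key = keys.get(i);
        try {
          result.put(key, futures.get(i).get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while loading " + key, e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("failed to load " + key, e.getCause());
        }
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> appIds = List.of("app_1", "app_2", "app_3");
    // Stand-in for reading one application's data from the store.
    Map<String, String> loaded = loadAll(appIds, id -> "state-of-" + id, 2);
    System.out.println(loaded);
  }
}
```

Collecting the futures in submission order keeps the result deterministic even 
though the individual reads may finish in any order.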






[jira] [Comment Edited] (YARN-10258) Add metrics for 'ApplicationsRunning' in NodeManager

2021-05-06 Thread ANANDA G B (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340154#comment-17340154
 ] 

ANANDA G B edited comment on YARN-10258 at 5/6/21, 1:49 PM:


[~ebadger], [~BilwaST], [~brahmareddy] Fixed the UT and updated the patch. Can 
you review and merge to trunk?


was (Author: gb.ana...@gmail.com):
[~BilwaST] [~brahmareddy] [~Hemanth Boyina] Fixed the UT and updated the patch

> Add metrics for 'ApplicationsRunning' in NodeManager
> 
>
> Key: YARN-10258
> URL: https://issues.apache.org/jira/browse/YARN-10258
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: ANANDA G B
>Assignee: ANANDA G B
>Priority: Minor
> Attachments: YARN-10258-001.patch, YARN-10258-002.patch, 
> YARN-10258-003.patch, YARN-10258-005.patch, YARN-10258-006.patch, 
> YARN-10258-007.patch, YARN-10258-008.patch, YARN-10258-009.patch, 
> YARN-10258-010.patch, YARN-10258_004.patch
>
>
> Add metrics for 'ApplicationsRunning' in NodeManagers.






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-05-06 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340181#comment-17340181
 ] 

Bilwa S T commented on YARN-9615:
-

[~pbacsko] No problem. I just want this to be merged before 3.3.1 release is 
done. Thanks

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Commented] (YARN-10642) Race condition: AsyncDispatcher can get stuck by the changes introduced in YARN-8995

2021-05-06 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340172#comment-17340172
 ] 

Bilwa S T commented on YARN-10642:
--

Hi [~pbacsko]

YARN-8995 is merged to branch-3.1, so we need to backport this fix to 
branch-3.1 as well.

> Race condition: AsyncDispatcher can get stuck by the changes introduced in 
> YARN-8995
> 
>
> Key: YARN-10642
> URL: https://issues.apache.org/jira/browse/YARN-10642
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: MockForDeadLoop.java, YARN-10642-branch-3.2.001.patch, 
> YARN-10642-branch-3.2.002.patch, YARN-10642-branch-3.3.001.patch, 
> YARN-10642.001.patch, YARN-10642.002.patch, YARN-10642.003.patch, 
> YARN-10642.004.patch, YARN-10642.005.patch, deadloop.png, debugfornode.png, 
> put.png, take.png
>
>
> In our cluster, the ResourceManager got stuck twice within twenty days, and 
> the YARN client could not submit applications. I captured jstack output the 
> second time and found the reason.
> Analyzing the jstack output, I found many threads stuck because they could 
> not acquire LinkedBlockingQueue.putLock. (Note: the detailed analysis is 
> omitted for lack of space.)
> The reason is that one thread holds the putLock all the time: 
> printEventQueueDetails calls forEachRemaining, which then holds putLock and 
> takeLock, so the AsyncDispatcher gets stuck.
> {code}
> Thread 6526 (IPC Server handler 454 on default port 8030):
>   State: RUNNABLE
>   Blocked count: 29988
>   Waited count: 2035029
>   Stack:
> java.util.concurrent.LinkedBlockingQueue$LBQSpliterator.forEachRemaining(LinkedBlockingQueue.java:926)
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.printEventQueueDetails(AsyncDispatcher.java:270)
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:295)
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.handleProgress(DefaultAMSProcessor.java:408)
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:215)
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:432)
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1040)
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:958)
> java.security.AccessController.doPrivileged(Native Method)
> {code}
> I analyzed LinkedBlockingQueue's source code and found that forEachRemaining 
> in LinkedBlockingQueue.LBQSpliterator may get stuck when forEachRemaining and 
> take are called in different threads. 
> YARN-8995 introduced the printEventQueueDetails method; 
> "eventQueue.stream().collect" calls the forEachRemaining method.
> Why does this happen? "put.png" shows how put("a") works, and "take.png" 
> shows how take() works. Special note: a removed Node points to itself to help 
> GC.
> The key code is in forEachRemaining: LBQSpliterator uses forEachRemaining to 
> visit every Node, but after it reads the item value from a Node it releases 
> the lock. If take() is called at this moment, the variable 'p' in 
> forEachRemaining may point to a Node that points to itself, and 
> forEachRemaining then loops forever. You can see this in "deadloop.png".
> Here is a simple unit test: if forEachRemaining runs more slowly than take, 
> the problem reproduces. The unit test is MockForDeadLoop.java.
> I debugged MockForDeadLoop.java and saw a Node pointing to itself. You can see pic 
> 
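
For illustration, one way to count queued events by type without streaming 
over the live queue is to take a point-in-time copy first. The sketch below 
assumes that a snapshot is good enough for a diagnostic log line; in the 
OpenJDK implementation, LinkedBlockingQueue.toArray copies the elements while 
holding the queue's locks, so the counting loop never follows node links that 
a concurrent take() may have redirected. The class, enum and method names are 
hypothetical, and this is not necessarily the change made in the patches 
attached to this issue.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueSnapshotSketch {
  // Hypothetical event types, used only for this example.
  enum EventType { NODE_UPDATE, APP_ADDED, CONTAINER_EXPIRED }

  /** Counts queued events per type from a snapshot; the queue is not modified. */
  static Map<EventType, Long> countByType(BlockingQueue<EventType> queue) {
    // toArray returns a copy, so the loop below works on private data.
    EventType[] snapshot = queue.toArray(new EventType[0]);
    Map<EventType, Long> counts = new EnumMap<>(EventType.class);
    for (EventType type : snapshot) {
      counts.merge(type, 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    BlockingQueue<EventType> queue = new LinkedBlockingQueue<>();
    queue.add(EventType.NODE_UPDATE);
    queue.add(EventType.APP_ADDED);
    queue.add(EventType.NODE_UPDATE);
    System.out.println(countByType(queue));
  }
}
```

The trade-off is an extra array allocation proportional to the queue length 
each time the details are printed.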

[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-05-06 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340163#comment-17340163
 ] 

Peter Bacsko commented on YARN-9615:


[~BilwaST] I'm currently on vacation, I can get back to this on Monday. 

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Commented] (YARN-10258) Add metrics for 'ApplicationsRunning' in NodeManager

2021-05-06 Thread ANANDA G B (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340154#comment-17340154
 ] 

ANANDA G B commented on YARN-10258:
---

[~BilwaST] [~brahmareddy] [~Hemanth Boyina] Fixed the UT and updated the patch

> Add metrics for 'ApplicationsRunning' in NodeManager
> 
>
> Key: YARN-10258
> URL: https://issues.apache.org/jira/browse/YARN-10258
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: ANANDA G B
>Assignee: ANANDA G B
>Priority: Minor
> Attachments: YARN-10258-001.patch, YARN-10258-002.patch, 
> YARN-10258-003.patch, YARN-10258-005.patch, YARN-10258-006.patch, 
> YARN-10258-007.patch, YARN-10258-008.patch, YARN-10258-009.patch, 
> YARN-10258-010.patch, YARN-10258_004.patch
>
>
> Add metrics for 'ApplicationsRunning' in NodeManagers.






[jira] [Commented] (YARN-10258) Add metrics for 'ApplicationsRunning' in NodeManager

2021-05-06 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340150#comment-17340150
 ] 

Hadoop QA commented on YARN-10258:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
21s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
40s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 31s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
21s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
37s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
26s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
26s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green}{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 70 unchanged - 1 fixed = 70 total (was 71) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 52s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| 

[jira] [Commented] (YARN-10258) Add metrics for 'ApplicationsRunning' in NodeManager

2021-05-06 Thread ANANDA G B (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340131#comment-17340131
 ] 

ANANDA G B commented on YARN-10258:
---

Updated latest patch with UT fix

> Add metrics for 'ApplicationsRunning' in NodeManager
> 
>
> Key: YARN-10258
> URL: https://issues.apache.org/jira/browse/YARN-10258
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: ANANDA G B
>Assignee: ANANDA G B
>Priority: Minor
> Attachments: YARN-10258-001.patch, YARN-10258-002.patch, 
> YARN-10258-003.patch, YARN-10258-005.patch, YARN-10258-006.patch, 
> YARN-10258-007.patch, YARN-10258-008.patch, YARN-10258-009.patch, 
> YARN-10258-010.patch, YARN-10258_004.patch
>
>
> Add metrics for 'ApplicationsRunning' in NodeManagers.






[jira] [Updated] (YARN-10258) Add metrics for 'ApplicationsRunning' in NodeManager

2021-05-06 Thread ANANDA G B (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ANANDA G B updated YARN-10258:
--
Attachment: YARN-10258-010.patch

> Add metrics for 'ApplicationsRunning' in NodeManager
> 
>
> Key: YARN-10258
> URL: https://issues.apache.org/jira/browse/YARN-10258
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.3
>Reporter: ANANDA G B
>Assignee: ANANDA G B
>Priority: Minor
> Attachments: YARN-10258-001.patch, YARN-10258-002.patch, 
> YARN-10258-003.patch, YARN-10258-005.patch, YARN-10258-006.patch, 
> YARN-10258-007.patch, YARN-10258-008.patch, YARN-10258-009.patch, 
> YARN-10258-010.patch, YARN-10258_004.patch
>
>
> Add metrics for 'ApplicationsRunning' in NodeManagers.






[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340082#comment-17340082
 ] 

chaosju commented on YARN-10761:


(y)

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, image-2021-05-06-16-38-51-406.png, 
> image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We should also add all 4 busy event types to the metrics, according to 
> YARN-9927.






[jira] [Commented] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340081#comment-17340081
 ] 

Qi Zhu commented on YARN-10761:
---

[~ebadger] [~pbacsko] [~gandras] [~bilwa_st]

Could you help review this?

I have also confirmed it in a test case.

Thanks.

!image-2021-05-06-16-38-51-406.png|width=736,height=84!

!image-2021-05-06-16-39-28-362.png|width=698,height=93!

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, image-2021-05-06-16-38-51-406.png, 
> image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We should also add all 4 busy event types to the metrics, according to 
> YARN-9927.






[jira] [Updated] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10761:
--
Attachment: image-2021-05-06-16-39-28-362.png

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, image-2021-05-06-16-38-51-406.png, 
> image-2021-05-06-16-39-28-362.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We should also add all 4 busy event types to the metrics, according to 
> YARN-9927.






[jira] [Updated] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10761:
--
Attachment: image-2021-05-06-16-38-51-406.png

> Add more event type to RM Dispatcher event metrics.
> ---
>
> Key: YARN-10761
> URL: https://issues.apache.org/jira/browse/YARN-10761
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10761.001.patch, image-2021-05-06-16-38-51-406.png
>
>
> YARN-9615 added NodesListManagerEventType to the event metrics.
> We should also add a total of 4 busy event types to the metrics, according to 
> YARN-9927.






[jira] [Created] (YARN-10761) Add more event type to RM Dispatcher event metrics.

2021-05-06 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10761:
-

 Summary: Add more event type to RM Dispatcher event metrics.
 Key: YARN-10761
 URL: https://issues.apache.org/jira/browse/YARN-10761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Qi Zhu
Assignee: Qi Zhu


YARN-9615 added NodesListManagerEventType to the event metrics.

We should also add a total of 4 busy event types to the metrics, according to 
YARN-9927.






[jira] [Comment Edited] (YARN-9927) RM multi-thread event processing mechanism

2021-05-06 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339383#comment-17339383
 ] 

Qi Zhu edited comment on YARN-9927 at 5/6/21, 6:41 AM:
---

Great review and investigation!

Thanks very much, [~ebadger] and [~gandras].

I agree with you that we should run a stress test, either via SLS or manually. 
And the more generic way of handling events is a great improvement for YARN.

I will investigate how to use SLS to confirm the improvement.

As for the test, I will change it to cover both the multi-threaded and the 
single-threaded case.

 


was (Author: zhuqi):
Great review and investigation!

Thanks very much, [~ebadger] [~ebadger].

I agree with you that we should run a stress test, either via SLS or manually. 
And the more generic way of handling events is a great improvement for YARN.

I will investigate how to use SLS to confirm the improvement.

As for the test, I will change it to cover both the multi-threaded and the 
single-threaded case.

 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found the following:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler.
> 3) Meanwhile, RM event processing is single-threaded, which leaves the RM 
> event scheduler little headroom and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.
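
As a rough illustration of the idea described above, here is a minimal sketch in Python (not Hadoop code, and not the approach taken in the attached patches). It assumes, purely for illustration, that events carry a key such as a node id and that events with the same key must stay in order, so each key is always routed to the same worker queue. The class name `MultiThreadDispatcher` and the `key`/`event` parameters are hypothetical.

```python
import queue
import threading


class MultiThreadDispatcher:
    """Hypothetical sketch: fan events out to several worker threads."""

    def __init__(self, num_workers, handler):
        self._handler = handler
        # One FIFO queue per worker thread.
        self._queues = [queue.Queue() for _ in range(num_workers)]
        self._threads = [
            threading.Thread(target=self._run, args=(q,), daemon=True)
            for q in self._queues
        ]
        for t in self._threads:
            t.start()

    def _run(self, q):
        while True:
            item = q.get()
            if item is None:  # sentinel: stop this worker
                return
            key, event = item
            self._handler(key, event)

    def dispatch(self, key, event):
        # Assumption: the same key always maps to the same queue,
        # so events for one key are handled in submission order.
        self._queues[hash(key) % len(self._queues)].put((key, event))

    def stop(self):
        # Sentinels are queued behind pending events, so join() returns
        # only after everything dispatched earlier has been handled.
        for q in self._queues:
            q.put(None)
        for t in self._threads:
            t.join()
```

In this sketch, events for different keys may run in parallel on different workers, while a single slow key only delays the events that share its queue. An exception raised by the handler would end that worker, which real code would need to guard against.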






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-05-06 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340015#comment-17340015
 ] 

Bilwa S T commented on YARN-9615:
-

[~pbacsko] can you please backport it to branch-3.3?

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.
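
To make the request above concrete, here is a minimal sketch in Python (not Hadoop code, and not the implementation in the attached patches) of keeping a count and a cumulative processing time per event type. The names `EventTypeMetrics` and `dispatch` are hypothetical, and the injectable `clock` parameter is an assumption added so the timing can be checked deterministically.

```python
import time
from collections import defaultdict


class EventTypeMetrics:
    """Hypothetical sketch: per-event-type counts and processing times."""

    def __init__(self):
        self.counts = defaultdict(int)           # event type -> events handled
        self.total_seconds = defaultdict(float)  # event type -> cumulative time

    def record(self, event_type, seconds):
        self.counts[event_type] += 1
        self.total_seconds[event_type] += seconds


def dispatch(event_type, handler, metrics, clock=time.perf_counter):
    """Run handler() and record its duration under event_type."""
    start = clock()
    try:
        return handler()
    finally:
        # Recorded even if the handler raises, so failures are still counted.
        metrics.record(event_type, clock() - start)
```

A dispatcher could wrap each handler call in `dispatch(...)` and periodically publish `counts` and `total_seconds`; the average processing time per event type is then `total_seconds[t] / counts[t]`.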


