[jira] [Comment Edited] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332972#comment-17332972
 ] 

Qi Zhu edited comment on YARN-10707 at 4/27/21, 6:25 AM:
-

Thanks [~ebadger] for the patient review.

I have updated this in the latest patch.

We could add all plugin utilizations to the nodeUtilization object in a 
follow-up jira.

After investigating, I think we should use the totalNodeGpuUtilization on 
both the NM side and the RM side; you are right, and it is also consistent 
with the CPU utilization.

Very good suggestion.

Thanks.
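To make the idea concrete, here is a minimal, self-contained sketch of a ResourceUtilization-like holder extended with a custom-resource map (e.g. GPU). All names here are illustrative assumptions for this digest; the real Hadoop class is a protobuf-backed record with a different API, and the actual patch should be consulted for the real signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mirror of ResourceUtilization with a custom-resource map.
public class ResourceUtilizationSketch {
    private final int physicalMemoryMB;
    private final float cpu; // fraction of vcores in use, 0.0-1.0
    // custom resource name -> utilization fraction (0.0-1.0), e.g. "yarn.io/gpu"
    private final Map<String, Float> customResources = new HashMap<>();

    public ResourceUtilizationSketch(int physicalMemoryMB, float cpu) {
        this.physicalMemoryMB = physicalMemoryMB;
        this.cpu = cpu;
    }

    public void setCustomResource(String name, float utilization) {
        customResources.put(name, utilization);
    }

    public float getCustomResource(String name) {
        // unknown resources read as 0.0, matching the CPU/memory defaults
        return customResources.getOrDefault(name, 0f);
    }

    public static void main(String[] args) {
        ResourceUtilizationSketch node = new ResourceUtilizationSketch(4096, 0.5f);
        // an NM-side plugin would report total GPU utilization across devices
        node.setCustomResource("yarn.io/gpu", 0.75f);
        System.out.println(node.getCustomResource("yarn.io/gpu")); // prints 0.75
    }
}
```

With this shape, the NM can populate "yarn.io/gpu" from its plugin and the RM can read the same key, keeping both sides consistent as discussed above.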


was (Author: zhuqi):
Thanks [~ebadger] for the patient review.

We could add all plugin utilizations to the nodeUtilization object in a 
follow-up jira.

After investigating, I think we should use the totalNodeGpuUtilization on 
both the NM side and the RM side; you are right, and it is also consistent 
with the CPU utilization.

Very good suggestion.

Thanks.

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch
>
>
> Support GPU in ResourceUtilization, and update the Node GPU Utilization to 
> use it first.
> It will be very helpful for other use cases involving GPU utilization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332972#comment-17332972
 ] 

Qi Zhu commented on YARN-10707:
---

Thanks [~ebadger] for the patient review.

We could add all plugin utilizations to the nodeUtilization object in a 
follow-up jira.

After investigating, I think we should use the totalNodeGpuUtilization on 
both the NM side and the RM side; you are right, and it is also consistent 
with the CPU utilization.

Very good suggestion.

Thanks.

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch
>
>
> Support GPU in ResourceUtilization, and update the Node GPU Utilization to 
> use it first.
> It will be very helpful for other use cases involving GPU utilization.






[jira] [Updated] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10707:
--
Attachment: YARN-10707.008.patch

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch
>
>
> Support GPU in ResourceUtilization, and update the Node GPU Utilization to 
> use it first.
> It will be very helpful for other use cases involving GPU utilization.






[jira] [Commented] (YARN-10510) TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt will cause NullPointerException

2021-04-26 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332955#comment-17332955
 ] 

Akira Ajisaka commented on YARN-10510:
--

Hi [~tuyu], I couldn't reproduce the error. What command did you run?

> TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt  will cause 
> NullPointerException
> 
>
> Key: YARN-10510
> URL: https://issues.apache.org/jira/browse/YARN-10510
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: tuyu
>Priority: Minor
>  Labels: test
> Attachments: YARN-10510.001.patch
>
>
> Running TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt causes the 
> exception below:
> {code:java}
> 2020-12-01 20:16:41,412 ERROR [main] webapp.AppBlock 
> (AppBlock.java:render(124)) - Failed to read the application 
> application_1234_.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.getApplicationReport(RMAppBlock.java:218)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:112)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt(TestAppPage.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
> Disconnected from the target VM, address: '127.0.0.1:60623', transport: 
> 'socket'
> {code}
> This happens because mockClientRMService does not mock the 
> getApplicationReport and getApplicationAttempts interfaces.
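The failure mode above can be shown with a small self-contained sketch (not the actual Hadoop test, which uses Mockito against ClientRMService): when the stubbed service returns a null report, the render path throws NullPointerException; stubbing a non-null report fixes it. All class and method names below are illustrative stand-ins.

```java
// Stand-in for ClientRMService.
interface ClientService {
    AppReport getApplicationReport(String appId);
}

// Stand-in for ApplicationReport.
class AppReport {
    final String state;
    AppReport(String state) { this.state = state; }
}

public class AppBlockSketch {
    static String render(ClientService svc, String appId) {
        AppReport report = svc.getApplicationReport(appId);
        return report.state; // NPE here when the stub returns null
    }

    public static void main(String[] args) {
        // mock without the stub: getApplicationReport returns null
        ClientService unstubbed = appId -> null;
        try {
            render(unstubbed, "application_1234_0001");
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the reported test failure");
        }
        // mock with the stub: render succeeds
        ClientService stubbed = appId -> new AppReport("RUNNING");
        System.out.println(render(stubbed, "application_1234_0001"));
    }
}
```

The attached patch presumably adds the equivalent stubbing to mockClientRMService so the test no longer hits the null path.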






[jira] [Updated] (YARN-10510) TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt will cause NullPointerException

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10510:
-
Fix Version/s: (was: 3.3.1)
   (was: 3.2.1)

Removed the "Fix versions". It is set by a committer when the fix is merged.

> TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt  will cause 
> NullPointerException
> 
>
> Key: YARN-10510
> URL: https://issues.apache.org/jira/browse/YARN-10510
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: tuyu
>Priority: Minor
>  Labels: test
> Attachments: YARN-10510.001.patch
>
>






[jira] [Commented] (YARN-10731) Please upgrade jquery to 3.6.0

2021-04-26 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332940#comment-17332940
 ] 

Akira Ajisaka commented on YARN-10731:
--

jQuery is now at 3.5.1, and there is currently no known CVE against it. Hi 
[~hxhefx], do you think there is any vulnerability in 3.5.1?

> Please upgrade jquery to 3.6.0
> --
>
> Key: YARN-10731
> URL: https://issues.apache.org/jira/browse/YARN-10731
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, security, webapp
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
>
> Our Fortify scan picked up a couple of security issues with the current 
> jQuery library used by hadoop-yarn-common. Please upgrade it to the latest 
> available version (3.6.0). Thanks!






[jira] [Updated] (YARN-10731) Please upgrade jquery to 3.6.0

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10731:
-
Fix Version/s: (was: 3.4.0)
   (was: 3.3.0)

Removed "Fix versions". It is set by a committer when the fix is merged.

> Please upgrade jquery to 3.6.0
> --
>
> Key: YARN-10731
> URL: https://issues.apache.org/jira/browse/YARN-10731
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build, security, webapp
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
>
> Our Fortify scan picked up a couple of security issues with the current 
> jQuery library used by hadoop-yarn-common. Please upgrade it to the latest 
> available version (3.6.0). Thanks!






[jira] [Updated] (YARN-10756) Upgrade JUnit to 4.13.1

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10756:
-
   Component/s: timelineservice
External issue URL:   (was: 
https://issues.apache.org/jira/browse/HADOOP-17602)
Labels: pull-request-available  (was: TimeLine 
pull-request-available)

> Upgrade JUnit to 4.13.1
> ---
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test, timelineservice
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Updated] (YARN-10756) Upgrade JUnit to 4.13.1

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10756:
--
Labels: TimeLine pull-request-available  (was: TimeLine)

> Upgrade JUnit to 4.13.1
> ---
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: TimeLine, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Assigned] (YARN-10756) Upgrade JUnit to 4.13.1

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned YARN-10756:


Assignee: Akira Ajisaka

> Upgrade JUnit to 4.13.1
> ---
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: TimeLine
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Updated] (YARN-10756) Upgrade JUnit to 4.13.1

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10756:
-
Component/s: (was: security)

> Upgrade JUnit to 4.13.1
> ---
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Priority: Major
>  Labels: TimeLine
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Commented] (YARN-10756) Upgrade JUnit to 4.13.1

2021-04-26 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332934#comment-17332934
 ] 

Akira Ajisaka commented on YARN-10756:
--

Moved to YARN.

> Upgrade JUnit to 4.13.1
> ---
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, test
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Priority: Major
>  Labels: TimeLine
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Moved] (YARN-10756) Upgrade JUnit to 4.13.1

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka moved HADOOP-17656 to YARN-10756:
---

  Component/s: (was: test)
   (was: security)
   (was: build)
   test
   security
   build
  Key: YARN-10756  (was: HADOOP-17656)
 Target Version/s: 3.4.0, 3.3.1, 3.1.5, 3.2.3  (was: 3.3.1)
Affects Version/s: (was: 3.1.1)
   3.1.1
   Issue Type: Bug  (was: Improvement)
  Project: Hadoop YARN  (was: Hadoop Common)

> Upgrade JUnit to 4.13.1
> ---
>
> Key: YARN-10756
> URL: https://issues.apache.org/jira/browse/YARN-10756
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, security, test
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Priority: Major
>  Labels: TimeLine
>
> The YARN Timeline Server is still using JUnit 4.11; it needs to be upgraded 
> to 4.13.1.






[jira] [Updated] (YARN-10561) Upgrade node.js to at least 10.x in YARN application catalog webapp

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-10561:
-
Target Version/s: 3.4.0  (was: 3.4.0, 3.3.1)

> Upgrade node.js to at least 10.x in YARN application catalog webapp
> ---
>
> Key: YARN-10561
> URL: https://issues.apache.org/jira/browse/YARN-10561
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The YARN application catalog webapp is using node.js 8.11.3, and the 8.x 
> line is already EoL.






[jira] [Updated] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-04-26 Thread chaosju (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaosju updated YARN-10755:
---
Attachment: image-2021-04-27-12-55-18-710.png

> Multithreaded loading Apps from zk statestore
> -
>
> Key: YARN-10755
> URL: https://issues.apache.org/jira/browse/YARN-10755
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: version: hadooop-2.8.5
>Reporter: chaosju
>Priority: Major
> Attachments: image-2021-04-27-12-55-18-710.png
>
>
>  
> In RM, we may be get a list of applications to be read from state store and 
> then divide the work of reading data associated with each app  to multiple 
> threads.
> I think its import to large clusters.
> h2. Profile
> Profile by  TestZKRMStateStorePerf 
> Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 
> Profile Result: loadRMAppState stage cost is 8s.
>  
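The proposal above can be sketched in plain Java, without the Hadoop API: list the application IDs once, then fan the per-app reads out to a fixed thread pool. `loadAppData` below is a hypothetical stand-in for the per-app ZooKeeper read in RMStateStore, and the class and method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStateLoadSketch {

    // Placeholder for reading one application's state znode from ZooKeeper.
    static String loadAppData(String appId) {
        return "state-of-" + appId;
    }

    // Divide per-app reads across a fixed-size thread pool.
    static Map<String, String> loadAll(List<String> appIds, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map.Entry<String, String>>> futures = new ArrayList<>();
            for (String id : appIds) {
                Callable<Map.Entry<String, String>> task =
                    () -> Map.entry(id, loadAppData(id));
                futures.add(pool.submit(task));
            }
            Map<String, String> out = new ConcurrentHashMap<>();
            for (Future<Map.Entry<String, String>> f : futures) {
                Map.Entry<String, String> e = f.get(); // surfaces read failures
                out.put(e.getKey(), e.getValue());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> loaded =
            loadAll(List.of("app_1", "app_2", "app_3"), 2);
        System.out.println(loaded.size()); // prints 3
    }
}
```

Since each read is independent I/O against ZooKeeper, a modest pool should cut the loadRMAppState wall-clock time roughly in proportion to the thread count, which matches the 8s-to-5s improvement profiled here.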






[jira] [Updated] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-04-26 Thread chaosju (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaosju updated YARN-10755:
---
Description: 
In the RM, we could get the list of applications to be read from the state 
store and then divide the work of reading each application's data across 
multiple threads.

I think this is important for large clusters.
h2. Profile

Profiled with TestZKRMStateStorePerf.

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181

Profile Result: the loadRMAppState stage costs 5s.

Profile logs:

!image-2021-04-27-12-55-18-710.png!  

 

 

  was:
 

In RM, we may be get a list of applications to be read from state store and 
then divide the work of reading data associated with each app  to multiple 
threads.

I think its import to large clusters.
h2. Profile

Profile by  TestZKRMStateStorePerf 

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 

Profile Result: loadRMAppState stage cost is 8s.

 


> Multithreaded loading Apps from zk statestore
> -
>
> Key: YARN-10755
> URL: https://issues.apache.org/jira/browse/YARN-10755
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: version: hadooop-2.8.5
>Reporter: chaosju
>Priority: Major
> Attachments: image-2021-04-27-12-55-18-710.png
>
>
> In the RM, we could get the list of applications to be read from the state 
> store and then divide the work of reading each application's data across 
> multiple threads.
> I think this is important for large clusters.
> h2. Profile
> Profiled with TestZKRMStateStorePerf.
> Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 
> Profile Result: the loadRMAppState stage costs 5s.
> Profile logs:
> !image-2021-04-27-12-55-18-710.png!  
>  
>  






[jira] [Updated] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-04-26 Thread chaosju (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaosju updated YARN-10755:
---
Description: 
 

In the RM, we could get the list of applications to be read from the state 
store and then divide the work of reading each application's data across 
multiple threads.

I think this is important for large clusters.
h2. Profile

Profiled with TestZKRMStateStorePerf.

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181

Profile Result: the loadRMAppState stage costs 8s.

 

  was:
 

In RM, we may be get a list of applications to be read from state store and 
then divide the work of reading data associated with each app  to multiple 
threads.
h2. Profile

Profile by  TestZKRMStateStorePerf 

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 

Profile Result: loadRMAppState stage cost is 8s.

 


> Multithreaded loading Apps from zk statestore
> -
>
> Key: YARN-10755
> URL: https://issues.apache.org/jira/browse/YARN-10755
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: version: hadooop-2.8.5
>Reporter: chaosju
>Priority: Major
>
>  
> In the RM, we could get the list of applications to be read from the state 
> store and then divide the work of reading each application's data across 
> multiple threads.
> I think this is important for large clusters.
> h2. Profile
> Profiled with TestZKRMStateStorePerf.
> Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 
> Profile Result: the loadRMAppState stage costs 8s.
>  






[jira] [Created] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-04-26 Thread chaosju (Jira)
chaosju created YARN-10755:
--

 Summary: Multithreaded loading Apps from zk statestore
 Key: YARN-10755
 URL: https://issues.apache.org/jira/browse/YARN-10755
 Project: Hadoop YARN
  Issue Type: Improvement
 Environment: version: hadooop-2.8.5
Reporter: chaosju


 

In the RM, we could get the list of applications to be read from the state 
store and then divide the work of reading each application's data across 
multiple threads.
h2. Profile

Profiled with TestZKRMStateStorePerf.

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181

Profile Result: the loadRMAppState stage costs 8s.

 






[jira] [Updated] (YARN-10755) Multithreaded loading Apps from zk statestore

2021-04-26 Thread chaosju (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaosju updated YARN-10755:
---
Description: 
 

In the RM, we could get the list of applications to be read from the state 
store and then divide the work of reading each application's data across 
multiple threads.
h2. Profile

Profiled with TestZKRMStateStorePerf.

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181

Profile Result: the loadRMAppState stage costs 8s.

 

  was:
 

In RM, we may be get a list of applications to be read from state store and 
then divide the work of reading data associated with each app  to multiple 
threads.

 
h2. Profile

Profile by  TestZKRMStateStorePerf 

Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 

Profile Result: loadRMAppState stage cost is 8s.

 


> Multithreaded loading Apps from zk statestore
> -
>
> Key: YARN-10755
> URL: https://issues.apache.org/jira/browse/YARN-10755
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: version: hadooop-2.8.5
>Reporter: chaosju
>Priority: Major
>
>  
> In the RM, we could get the list of applications to be read from the state 
> store and then divide the work of reading each application's data across 
> multiple threads.
> h2. Profile
> Profiled with TestZKRMStateStorePerf.
> Params: -appSize 2 -appattemptsize 2 -hostPort localhost:2181 
> Profile Result: the loadRMAppState stage costs 8s.
>  






[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for

2021-04-26 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10670:
-
Description: 
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the below parameter in the RM yarn-site.xml:
   yarn.resourcemanager.opportunistic-container-allocation.enabled = true
 # Set this in the NM yarn-site.xml:
   yarn.nodemanager.opportunistic-containers-max-queue-length = 30

 Test Steps:

Job Command: yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: the Distributed Shell YARN job failed with the diagnostics message below
{noformat}
Application Failure: desired = 20, completed = 20, allocated = 20, failed = 1, 
diagnostics = [2021-02-09 22:11:48.440]Container killed to make room for 
Guaranateed container.
{noformat}
 Expected Result: Distributed Shell Yarn Job should not fail.

  was:
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : : yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: Distributed Shell Yarn Job should not fail.


> YARN: Opportunistic Container : : In distributed shell job if containers are 
> killed then application is failed. But in this case as containers are killed 
> to make room for guaranteed containers which is not correct to fail an 
> application
> 
>
> Key: YARN-10670
> URL: https://issues.apache.org/jira/browse/YARN-10670
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Affects Versions: 3.1.1
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Major
>
> Preconditions:
>  # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
>  # Set the below parameter in the RM yarn-site.xml:
>    yarn.resourcemanager.opportunistic-container-allocation.enabled = true
>  # Set this in the NM yarn-site.xml:
>    yarn.nodemanager.opportunistic-containers-max-queue-length = 30
>  Test Steps:
> Job Command : : yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
>  -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
> OPPORTUNISTIC -promote_opportunistic_after_start
> Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics 
> message
> {noformat}
> Application Failure: desired = 20, completed = 20, allocated = 20, failed = 
> 1, diagnostics = [2021-02-09 22:11:48.440]Container killed to make room for 
> Guaranateed container.
> {noformat}
>  Expected Result: Distributed Shell Yarn Job should not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for

2021-04-26 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10670:
-
Description: 
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : : yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: Distributed Shell Yarn Job should not fail.
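For reference, the flattened precondition settings above correspond to yarn-site.xml entries along these lines (a sketch; the property names and values are exactly as listed in the preconditions):

```xml
<!-- RM yarn-site.xml: enable opportunistic container allocation -->
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>

<!-- NM yarn-site.xml: opportunistic container queue length -->
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>
```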

  was:
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : : yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: Distributed Shell Yarn Job should not fail.


> YARN: Opportunistic Container : : In distributed shell job if containers are 
> killed then application is failed. But in this case as containers are killed 
> to make room for guaranteed containers which is not correct to fail an 
> application
> 
>
> Key: YARN-10670
> URL: https://issues.apache.org/jira/browse/YARN-10670
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Affects Versions: 3.1.1
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Major
>
> Preconditions:
>  # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
>  # Set the below parameters  in RM yarn-site.xml ::
>  yarn.resourcemanager.opportunistic-container-allocation.enabled
>  true
>  
>  # Set this in NM[s]yarn-site.xml ::: 
>  yarn.nodemanager.opportunistic-containers-max-queue-length
>  30
>  
>  
>  Test Steps:
> Job Command : : yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
>  -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
> OPPORTUNISTIC -promote_opportunistic_after_start
> Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics 
> message
> {noformat}
> Attempt recovered after RM restartApplication Failure: desired = 20, 
> completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
> 22:11:48.440]Container De-queued to meet NM queuing limits.
> [2021-02-09 22:11:48.441]Container terminated before launch.
> {noformat}
>  Expected Result: Distributed Shell Yarn Job should not fail.






[jira] [Commented] (YARN-10722) Improvement to DelegationTokenRenewer in RM

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332907#comment-17332907
 ] 

Qi Zhu commented on YARN-10722:
---

[~fengnanli] 

You can improve this with YARN-9768.

However, it only considers the app recovery case; I will address the newly 
submitted app case in YARN-10754.

Thanks.

> Improvement to DelegationTokenRenewer in RM
> ---
>
> Key: YARN-10722
> URL: https://issues.apache.org/jira/browse/YARN-10722
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: RM, yarn
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>
> We have seen multiple issues related to the YARN DelegationTokenRenewer, 
> especially when the namenodes that issued the token had problems (e.g. a 
> standby being down). This component has become a SPOF, blocking all YARN 
> applications from being accepted.






[jira] [Commented] (YARN-10754) RM Renew Delegation token thread should timeout and retry should also consider app new submitted.

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332904#comment-17332904
 ] 

Qi Zhu commented on YARN-10754:
---

cc [~ebadger] [~epayne]   [~Jim_Brennan]  [~snemeth] [~pbacsko] [~gandras] 
[~fengnanli]

What are your opinions on this?
Thanks.

 

> RM Renew Delegation token thread should timeout and retry should also 
> consider app new submitted.
> -
>
> Key: YARN-10754
> URL: https://issues.apache.org/jira/browse/YARN-10754
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10754.001.patch, image-2021-04-27-11-38-29-162.png
>
>
> As  YARN-9768 described:
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
> However, it only considers app recovery, not newly submitted apps:
> !image-2021-04-27-11-38-29-162.png|width=516,height=428!
> As a result, a newly submitted app will not retry when the token renewal 
> (against the HDFS NameNode/Router) times out.






[jira] [Updated] (YARN-10754) RM Renew Delegation token thread should timeout and retry should also consider app new submitted.

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10754:
--
Description: 
As  YARN-9768 described:

Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews HDFS 
tokens received to check for validity and expiration time.

This call is made to an underlying HDFS NN or Router Node (which has exact APIs 
as HDFS NN). If one of the nodes is bad and the renew call is stuck the thread 
remains stuck indefinitely. The thread should ideally timeout the renewToken 
and retry from the client's perspective.

However, it only considers app recovery, not newly submitted apps:

!image-2021-04-27-11-38-29-162.png|width=516,height=428!

As a result, a newly submitted app will not retry when the token renewal 
(against the HDFS NameNode/Router) times out.
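A minimal sketch of the timeout-and-retry idea described above: run the renew call on a worker thread and bound the wait with Future.get, so a hung NameNode/Router cannot block the renewer thread forever. The Callable here stands in for the real renew call; names are illustrative, not the actual DelegationTokenRenewer API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RenewWithTimeout {

    // Runs the renew call with a deadline. On timeout the task is
    // cancelled (interrupted) and a TimeoutException is thrown so the
    // caller can retry instead of hanging indefinitely.
    static long renewOrThrow(Callable<Long> renew, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> f = pool.submit(renew);
            try {
                return f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException te) {
                f.cancel(true); // interrupt the stuck renew call
                throw te;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast renew succeeds normally...
        long newExpiry = renewOrThrow(() -> 42L, 1000);

        // ...while a hung one times out instead of blocking forever,
        // which is the point where a retry would kick in.
        try {
            renewOrThrow(() -> { Thread.sleep(60_000); return 0L; }, 100);
        } catch (TimeoutException expected) {
            System.out.println("renew timed out, caller can retry");
        }
    }
}
```

The same wrapper would apply to both the recovery path and the submission path, which is the gap this issue describes.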

  was:
As 

Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews HDFS 
tokens received to check for validity and expiration time.

This call is made to an underlying HDFS NN or Router Node (which has exact APIs 
as HDFS NN). If one of the nodes is bad and the renew call is stuck the thread 
remains stuck indefinitely. The thread should ideally timeout the renewToken 
and retry from the client's perspective.


> RM Renew Delegation token thread should timeout and retry should also 
> consider app new submitted.
> -
>
> Key: YARN-10754
> URL: https://issues.apache.org/jira/browse/YARN-10754
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10754.001.patch, image-2021-04-27-11-38-29-162.png
>
>
> As  YARN-9768 described:
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
> But it only consider the app recovery, not consider the app submitted:
> !image-2021-04-27-11-38-29-162.png|width=516,height=428!
> It will cause the app submitted not retry, when renew token (HDFS Namenode/ 
> Router) timeout. 






[jira] [Updated] (YARN-10754) RM Renew Delegation token thread should timeout and retry should also consider app new submitted.

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10754:
--
Attachment: image-2021-04-27-11-38-29-162.png

> RM Renew Delegation token thread should timeout and retry should also 
> consider app new submitted.
> -
>
> Key: YARN-10754
> URL: https://issues.apache.org/jira/browse/YARN-10754
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10754.001.patch, image-2021-04-27-11-38-29-162.png
>
>
> As 
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.






[jira] [Updated] (YARN-10754) RM Renew Delegation token thread should timeout and retry should also consider app new submitted.

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10754:
--
Description: 
As 

Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews HDFS 
tokens received to check for validity and expiration time.

This call is made to an underlying HDFS NN or Router Node (which has exact APIs 
as HDFS NN). If one of the nodes is bad and the renew call is stuck the thread 
remains stuck indefinitely. The thread should ideally timeout the renewToken 
and retry from the client's perspective.

> RM Renew Delegation token thread should timeout and retry should also 
> consider app new submitted.
> -
>
> Key: YARN-10754
> URL: https://issues.apache.org/jira/browse/YARN-10754
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10754.001.patch
>
>
> As 
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.






[jira] [Commented] (YARN-10745) Change Log level from info to debug for few logs and remove unnecessary debuglog checks

2021-04-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332857#comment-17332857
 ] 

Hadoop QA commented on YARN-10745:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
11s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m  
7s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
25s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
24s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
47s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
29s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 10s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
49s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
55s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 39m 
46s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  8m  
0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
12s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
41s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 56s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/935/artifact/out/diff-checkstyle-root.txt{color}
 | {color:orange} root: The patch generated 7 new + 541 unchanged - 20 fixed = 
548 total (was 561) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {c

[jira] [Commented] (YARN-7769) FS QueueManager should not create default queue at init

2021-04-26 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332842#comment-17332842
 ] 

Wilfred Spiegelenburg commented on YARN-7769:
-

OK, in that case we're ready to commit. +1 from my side.

[~snemeth]: I do think we need a release note for this, as it is a difference 
in behaviour. Before we commit this, do we also need to make sure that 
YARN-8951 works, or at least does not throw an NPE and take down the RM?

> FS QueueManager should not create default queue at init
> ---
>
> Key: YARN-7769
> URL: https://issues.apache.org/jira/browse/YARN-7769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-7769.001.patch, YARN-7769.002.patch, 
> YARN-7769.003.patch
>
>
> Currently the FairScheduler QueueManager automatically creates the default 
> queue. However the default queue does not need to exist. We have two possible 
> cases which we should handle:
> * Based on the placement rule "Default" the name for the default queue might 
> not be default and it should be created with a different name
> * There might not be a "Default" placement rule at all which removes the need 
> to create the queue.
> We should leave the creation of the default queue to the point in time that 
> we can assess if it is needed or not.






[jira] [Created] (YARN-10754) RM Renew Delegation token thread should timeout and retry should also consider app new submitted.

2021-04-26 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10754:
-

 Summary: RM Renew Delegation token thread should timeout and retry 
should also consider app new submitted.
 Key: YARN-10754
 URL: https://issues.apache.org/jira/browse/YARN-10754
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Qi Zhu
Assignee: Qi Zhu









[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-26 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332739#comment-17332739
 ] 

Eric Badger commented on YARN-10707:


Thanks for the updated patch, [~zhuqi]! It's much cleaner and much smaller now.

{noformat}
 float nodeGpuUtilization = 0F;
+float nodeGpus = 0F;
 try {
   if (gpuNodeResourceUpdateHandler != null) {
 nodeGpuUtilization =
 gpuNodeResourceUpdateHandler.getNodeGpuUtilization();
+nodeGpus =
+gpuNodeResourceUpdateHandler.getNodePhysGpus();
   }
 } catch (Exception e) {
   LOG.error("Get Node GPU Utilization error: " + e);
 }
{noformat}
Ideally this wouldn't be GPU-specific and we could add all plugin utilizations 
to the nodeUtilization object. But that is beyond the scope of this JIRA, so I 
think this is fine. However, I think we can get a better name than 
{{nodeGpus}}. Maybe {{TotalNodeGpuUtilization}}?

Additionally, why are we sending the average GPU utilization to the NM metrics, 
but the total GPU utilization to the RM? Memory and CPU are consistent across 
the two. I don't understand why GPU is different.
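The average-versus-total distinction raised above can be made concrete with a small sketch (illustrative names only, not the patch code): two GPUs at 50% and 100% utilization give an average of 0.75 but a total of 1.5, so reporting one convention on the NM side and the other on the RM side would be inconsistent.

```java
public class GpuUtilizationSketch {

    // Total utilization summed over devices; 2 GPUs at 50% and 100% -> 1.5.
    static float totalUtilization(float[] perGpu) {
        float sum = 0f;
        for (float u : perGpu) {
            sum += u;
        }
        return sum;
    }

    // Average utilization across all physical GPUs (0.0 - 1.0).
    static float averageUtilization(float[] perGpu) {
        if (perGpu.length == 0) {
            return 0f;
        }
        return totalUtilization(perGpu) / perGpu.length;
    }

    public static void main(String[] args) {
        float[] gpus = {0.5f, 1.0f};
        System.out.println("avg=" + averageUtilization(gpus)
            + " total=" + totalUtilization(gpus));
    }
}
```

Using the total on both sides, as the review suggests, keeps GPU consistent with how memory and CPU utilization are reported.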

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch
>
>
> Support gpu in ResourceUtilization, and update Node GPU Utilization to use 
> first.
> It will be very helpful for other use cases about GPU utilization.






[jira] [Updated] (YARN-10745) Change Log level from info to debug for few logs and remove unnecessary debuglog checks

2021-04-26 Thread D M Murali Krishna Reddy (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

D M Murali Krishna Reddy updated YARN-10745:

Attachment: YARN-10745.001.patch

> Change Log level from info to debug for few logs and remove unnecessary 
> debuglog checks
> ---
>
> Key: YARN-10745
> URL: https://issues.apache.org/jira/browse/YARN-10745
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-10745.001.patch
>
>
> Change the log level from info to debug for a few logs so that the load on 
> the logger decreases in large clusters, improving performance.
> Remove the unnecessary isDebugEnabled() checks for printing strings without 
> any string concatenation.
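A small stand-in showing why the removed guards are unnecessary: with SLF4J-style "{}" parameterized logging, the message is only assembled when debug is enabled, so wrapping such a call in isDebugEnabled() is redundant. ToyLogger below is illustrative, not Hadoop's actual logger.

```java
public class DebugGuardSketch {

    static class ToyLogger {
        final boolean debugEnabled;
        int renders = 0; // how many times a message was actually assembled

        ToyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

        boolean isDebugEnabled() { return debugEnabled; }

        // Parameterized debug: the message string is built lazily,
        // only when debug logging is enabled.
        void debug(String template, Object arg) {
            if (debugEnabled) {
                String msg = template.replace("{}", String.valueOf(arg));
                renders++;
            }
        }
    }

    public static void main(String[] args) {
        ToyLogger log = new ToyLogger(false);

        // Redundant guard (the pattern the patch removes): debug() is
        // already cheap when debug is off, since no string is built.
        if (log.isDebugEnabled()) {
            log.debug("queue state: {}", "RUNNING");
        }

        // Equivalent call without the guard; still no message assembled.
        log.debug("queue state: {}", "RUNNING");

        System.out.println("renders with debug off: " + log.renders);
    }
}
```

A guard still pays off only when computing the log argument itself is expensive, e.g. heavy string concatenation before the call, which is why only the no-concatenation cases are being cleaned up.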






[jira] [Commented] (YARN-7769) FS QueueManager should not create default queue at init

2021-04-26 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332593#comment-17332593
 ] 

Benjamin Teke commented on YARN-7769:
-

[~wilfreds] both options are good to me, maybe it's a bit cleaner to update it 
in a new Jira so I created YARN-10753.

> FS QueueManager should not create default queue at init
> ---
>
> Key: YARN-7769
> URL: https://issues.apache.org/jira/browse/YARN-7769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-7769.001.patch, YARN-7769.002.patch, 
> YARN-7769.003.patch
>
>
> Currently the FairScheduler QueueManager automatically creates the default 
> queue. However the default queue does not need to exist. We have two possible 
> cases which we should handle:
> * Based on the placement rule "Default" the name for the default queue might 
> not be default and it should be created with a different name
> * There might not be a "Default" placement rule at all which removes the need 
> to create the queue.
> We should leave the creation of the default queue to the point in time that 
> we can assess if it is needed or not.






[jira] [Comment Edited] (YARN-7769) FS QueueManager should not create default queue at init

2021-04-26 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332593#comment-17332593
 ] 

Benjamin Teke edited comment on YARN-7769 at 4/26/21, 4:58 PM:
---

[~wilfreds] both options are good to me, I think it's a bit cleaner to update 
it in a new Jira so I created YARN-10753.


was (Author: bteke):
[~wilfreds] both options are good to me, maybe it's a bit cleaner to update it 
in a new Jira so I created YARN-10753.

> FS QueueManager should not create default queue at init
> ---
>
> Key: YARN-7769
> URL: https://issues.apache.org/jira/browse/YARN-7769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-7769.001.patch, YARN-7769.002.patch, 
> YARN-7769.003.patch
>
>
> Currently the FairScheduler QueueManager automatically creates the default 
> queue. However the default queue does not need to exist. We have two possible 
> cases which we should handle:
> * Based on the placement rule "Default" the name for the default queue might 
> not be default and it should be created with a different name
> * There might not be a "Default" placement rule at all which removes the need 
> to create the queue.
> We should leave the creation of the default queue to the point in time that 
> we can assess if it is needed or not.






[jira] [Created] (YARN-10753) Document the removal of FS default queue creation

2021-04-26 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-10753:


 Summary: Document the removal of FS default queue creation
 Key: YARN-10753
 URL: https://issues.apache.org/jira/browse/YARN-10753
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.4.0
Reporter: Benjamin Teke
Assignee: Benjamin Teke


In YARN-7769 the automatic creation of the root.default queue was removed from 
FS initialization. This impacts the 
[documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]
 in multiple places:

* By default, all users share a single queue, named “default”.
* user-as-default-queue says it falls back to the default queue
* allow-undeclared-pools is another point that mentions the default queue

These should be updated to reflect the new behaviour.






[jira] [Resolved] (YARN-10752) Shaded guava not found when compiling with profile hbase2.0

2021-04-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-10752.
--
Fix Version/s: 3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged the PR.

> Shaded guava not found when compiling with profile hbase2.0
> ---
>
> Key: YARN-10752
> URL: https://issues.apache.org/jira/browse/YARN-10752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When preparing for Hadoop 3.3.1 release, I found the build breaks when 
> compiling with profile hbase2.0 because the shaded guava classes in the 
> hadoop-thirdparty jars were not found.






[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332488#comment-17332488
 ] 

Hadoop QA commented on YARN-10739:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
12s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  8s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m  
8s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m 
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 52s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the 

[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332433#comment-17332433
 ] 

Qi Zhu commented on YARN-10739:
---

Thanks a lot [~pbacsko] for review.

Very valuable suggestions; updated in the latest patch (006). :D

Thanks.

> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch, 
> YARN-10739.005.patch, YARN-10739.006.patch
>
>
> YARN-8995 and YARN-10642 added GenericEventHandler.printEventQueueDetails to 
> AsyncDispatcher. If the event queue grows very large, printEventQueueDetails 
> becomes expensive and the RM takes a long time to process events.
> For example:
>  If we have 4K nodes and 4K running apps in the cluster, then after a 
> switchover every node manager re-registers with the RM, and the RM asks 
> NodesListManager to emit an RMAppNodeUpdateEvent per app, in code like below:
> {code:java}
> for (RMApp app : rmContext.getRMApps().values()) {
>   if (!app.isAppFinalStateStored()) {
>     this.rmContext
>         .getDispatcher()
>         .getEventHandler()
>         .handle(
>             new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
>                 appNodeUpdateType));
>   }
> }{code}
> So the total number of events is 4K * 4K = 16 million. During this window 
> printEventQueueDetails prints the event queue details and is called 
> frequently; once the queue size reaches 1 million+ entries, iterating over 
> the queue in printEventQueueDetails becomes very slow:
> {code:java}
> private void printEventQueueDetails() {
>   Iterator iterator = eventQueue.iterator();
>   Map counterMap = new HashMap<>();
>   while (iterator.hasNext()) {
>     Enum eventType = iterator.next().getType();
> {code}
> RM recovery then costs too much time.
>  From our logs:
> {code:java}
> 2021-04-14 20:35:34,432 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(306)) - Size of event-queue is 1200
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: KILL, Event 
> record counter: 310836
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: NODE_UPDATE, 
> Event record counter: 1103
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: 
> NODE_REMOVED, Event record counter: 1
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: APP_REMOVED, 
> Event record counter: 1
> {code}
> More than one second elapses between AsyncDispatcher.handle and 
> printEventQueueDetails, spent iterating the queue.
> I uploaded a patch that ensures printEventQueueDetails is called at most 
> once per 30 seconds.
>  
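The throttling approach the description ends with (run the queue scan at most once per 30 seconds) can be sketched as below. This is a minimal illustration rather than the actual patch: the class name, the 30-second constant, and the AtomicLong-based rate limiting are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class ThrottledQueuePrinter {
  private static final long PRINT_INTERVAL_MS = 30_000L;
  // Timestamp (ms) of the last print; 0 means "never printed".
  private final AtomicLong lastPrintTime = new AtomicLong(0L);
  private final BlockingQueue<Enum<?>> eventQueue = new LinkedBlockingQueue<>();

  /**
   * Counts queued events by type, but runs the O(n) scan at most once per
   * PRINT_INTERVAL_MS. Returns true if it actually printed.
   */
  public boolean maybePrintEventQueueDetails(long nowMs) {
    long last = lastPrintTime.get();
    if (nowMs - last < PRINT_INTERVAL_MS
        || !lastPrintTime.compareAndSet(last, nowMs)) {
      // Either still inside the quiet window, or another caller won the race:
      // skip the expensive iteration entirely.
      return false;
    }
    Map<Enum<?>, Long> counterMap = new HashMap<>();
    for (Enum<?> eventType : eventQueue) {
      counterMap.merge(eventType, 1L, Long::sum);
    }
    counterMap.forEach((type, count) ->
        System.out.println("Event type: " + type
            + ", Event record counter: " + count));
    return true;
  }
}
```

With this shape, a burst of millions of handle() calls pays the full-queue iteration at most once per window instead of on every logging threshold.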






[jira] [Updated] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10739:
--
Attachment: YARN-10739.006.patch

> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch, 
> YARN-10739.005.patch, YARN-10739.006.patch
>
>






[jira] [Updated] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10739:
--
Attachment: YARN-10739.005.patch

> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch, 
> YARN-10739.005.patch
>
>






[jira] [Commented] (YARN-10691) DominantResourceCalculator isInvalidDivisor should consider only countable resource types

2021-04-26 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332275#comment-17332275
 ] 

Bilwa S T commented on YARN-10691:
--

cc [~epayne] [~jbrennan]

> DominantResourceCalculator isInvalidDivisor should consider only countable 
> resource types
> -
>
> Key: YARN-10691
> URL: https://issues.apache.org/jira/browse/YARN-10691
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10691.001.patch
>
>







[jira] [Updated] (YARN-10752) Shaded guava not found when compiling with profile hbase2.0

2021-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10752:
--
Labels: pull-request-available  (was: )

> Shaded guava not found when compiling with profile hbase2.0
> ---
>
> Key: YARN-10752
> URL: https://issues.apache.org/jira/browse/YARN-10752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When preparing for Hadoop 3.3.1 release, I found the build breaks when 
> compiling with profile hbase2.0 because the shaded guava classes in the 
> hadoop-thirdparty jars were not found.






[jira] [Assigned] (YARN-10752) Shaded guava not found when compiling with profile hbase2.0

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned YARN-10752:
--

Assignee: Wei-Chiu Chuang

> Shaded guava not found when compiling with profile hbase2.0
> ---
>
> Key: YARN-10752
> URL: https://issues.apache.org/jira/browse/YARN-10752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>
> When preparing for Hadoop 3.3.1 release, I found the build breaks when 
> compiling with profile hbase2.0 because the shaded guava classes in the 
> hadoop-thirdparty jars were not found.






[jira] [Created] (YARN-10752) Shaded guava not found when compiling with profile hbase2.0

2021-04-26 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created YARN-10752:
--

 Summary: Shaded guava not found when compiling with profile 
hbase2.0
 Key: YARN-10752
 URL: https://issues.apache.org/jira/browse/YARN-10752
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 3.3.1
Reporter: Wei-Chiu Chuang


When preparing for Hadoop 3.3.1 release, I found the build breaks when 
compiling with profile hbase2.0 because the shaded guava classes in the 
hadoop-thirdparty jars were not found.






[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332076#comment-17332076
 ] 

Peter Bacsko commented on YARN-10739:
-

Thanks for the patch [~zhuqi].

I have some comments:
1. {{PrintEventDetailsService #%d}} - I think it's better to call it 
{{PrintEventDetailsThread #%d}}.

2. Variable {{printEventDetailsService}} - same here, 
{{printEventDetailsExecutor}} sounds better.

3. {{printEventDetailsService.allowCoreThreadTimeOut(true);}} --> there is just 
one core thread. I think it's fine if we don't allow it to time out, so I 
suggest setting this to "false" (which is the default).

4. {{printEventDetailsService.shutdown();}} -- since we're shutting it down in 
{{serviceStop()}}, let's call {{shutdownNow()}} which is safer. Don't wait for 
printing.

5. Tracing log:
{noformat}
// For test
if (LOG.isTraceEnabled()) {
  LOG.trace("Event type: " + entry.getKey() + " printed.");
}
{noformat}

I know that this is for testing, but still, this affects production code. Trace 
level already floods the logs with everything. I don't think we should print 
this, even on TRACE. It's not a huge issue if it is not tested. 
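
The executor wiring these review points describe could look roughly like the sketch below: a single named daemon thread, core-thread timeout left at its default, and {{shutdownNow()}} on stop. The surrounding class and the {{submitProbe}} helper are hypothetical, added only to make the sketch self-contained; this is not the actual patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class PrintEventDetailsWiring {
  private final ExecutorService printEventDetailsExecutor;

  public PrintEventDetailsWiring() {
    ThreadFactory factory = new ThreadFactory() {
      private final AtomicInteger count = new AtomicInteger();
      @Override
      public Thread newThread(Runnable r) {
        // Name matches review point 1; daemon so it never blocks JVM exit.
        Thread t = new Thread(r,
            "PrintEventDetailsThread #" + count.getAndIncrement());
        t.setDaemon(true);
        return t;
      }
    };
    // Single core thread; core-thread timeout stays at its default (false),
    // as suggested in review point 3.
    printEventDetailsExecutor = Executors.newSingleThreadExecutor(factory);
  }

  /** Hypothetical probe: runs a task on the executor, returns its thread name. */
  public String submitProbe() {
    try {
      return printEventDetailsExecutor
          .submit(() -> Thread.currentThread().getName()).get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  public void serviceStop() {
    // shutdownNow() interrupts an in-flight print instead of waiting for it,
    // which is the safer choice during service shutdown (review point 4).
    printEventDetailsExecutor.shutdownNow();
  }
}
```

Using {{shutdownNow()}} here trades a possibly truncated diagnostic log line for a prompt, bounded shutdown, which is usually the right trade-off for a purely informational task.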



> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch
>
>






[jira] [Comment Edited] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332076#comment-17332076
 ] 

Peter Bacsko edited comment on YARN-10739 at 4/26/21, 12:14 PM:


Thanks for the patch [~zhuqi].

I have some comments:
1. {{PrintEventDetailsService #%d}} - I think it's better to call it 
{{PrintEventDetailsThread #%d}}.

2. Variable {{printEventDetailsService}} - same here, 
{{printEventDetailsExecutor}} sounds better.

3. {{printEventDetailsService.allowCoreThreadTimeOut(true);}} --> there is just 
one core thread. I think it's fine if we don't allow it to time out, so I 
suggest setting this to "false" (which is the default).

4. {{printEventDetailsService.shutdown();}} -- since we're shutting it down in 
{{serviceStop()}}, let's call {{shutdownNow()}} which is safer. Don't wait for 
printing.

5. Tracing log:
{noformat}
// For test
if (LOG.isTraceEnabled()) {
  LOG.trace("Event type: " + entry.getKey() + " printed.");
}
{noformat}

I know that this is for testing, but still, this affects the production code. 
Trace level already floods the logs with everything. I don't think we should 
print this, even on TRACE. It's not a huge issue if it is not tested. 




was (Author: pbacsko):
Thanks for the patch [~zhuqi].

I have some comments:
1. {{PrintEventDetailsService #%d}} - I think it's better to call it 
{{PrintEventDetailsThread #%d}}.

2. Variable {{printEventDetailsService}} - same here, 
{{printEventDetailsExecutor}} sounds better.

3. {{printEventDetailsService.allowCoreThreadTimeOut(true);}} --> there is just 
one core thread. I think it's fine if we don't allow it to time out, so I 
suggest setting this to "false" (which is the default).

4. {{printEventDetailsService.shutdown();}} -- since we're shutting it down in 
{{serviceStop()}}, let's call {{shutdownNow()}} which is safer. Don't wait for 
printing.

5. Tracing log:
{noformat}
// For test
if (LOG.isTraceEnabled()) {
  LOG.trace("Event type: " + entry.getKey() + " printed.");
}
{noformat}

I know that this is for testing, but still, this affects production code. Trace 
level already floods the logs with everything. I don't think we should print 
this, even on TRACE. It's not a huge issue if it is not tested. 



> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch
>
>

[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332053#comment-17332053
 ] 

Hadoop QA commented on YARN-10739:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 
58s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
40s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 21s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 
42s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  9s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the 

[jira] [Commented] (YARN-10637) fs2cs: add queue autorefresh policy during conversion

2021-04-26 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332014#comment-17332014
 ] 

Peter Bacsko commented on YARN-10637:
-

+1

Thanks [~zhuqi] for the patch and [~gandras] for the review. Committed to trunk.

> fs2cs: add queue autorefresh policy during conversion
> -
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when  YARN-10623 finished.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10637) fs2cs: add queue autorefresh policy during conversion

2021-04-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10637:

Summary: fs2cs: add queue autorefresh policy during conversion  (was: We 
should support fs to cs support for auto refresh queues when conf changed, 
after YARN-10623 finished.)







[jira] [Updated] (YARN-10637) fs2cs: add queue autorefresh policy during conversion

2021-04-26 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10637:

Labels: fs2cs  (was: )







[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331984#comment-17331984
 ] 

Qi Zhu commented on YARN-10739:
---

Thanks [~gandras] for the review.

Good suggestion, I have updated it in the latest patch.

[~ebadger] [~snemeth] [~pbacsko] 

Do you have any other advice?

Thanks.

> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> ---
>
> Key: YARN-10739
> URL: https://issues.apache.org/jira/browse/YARN-10739
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Zhanqi Cai
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch
>
>
> YARN-8995 and YARN-10642 added GenericEventHandler.printEventQueueDetails to 
> AsyncDispatcher. If the event queue grows too large, printEventQueueDetails 
> costs too much time and the RM takes a long time to process events.
> For example:
>  If we have 4K nodes in the cluster and 4K apps running, then after an RM 
> switch every NodeManager re-registers with the RM, and the RM calls 
> NodesListManager to emit RMAppNodeUpdateEvents, as in the code below:
> {code:java}
> for(RMApp app : rmContext.getRMApps().values()) {
>   if (!app.isAppFinalStateStored()) {
> this.rmContext
> .getDispatcher()
> .getEventHandler()
> .handle(
> new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
> appNodeUpdateType));
>   }
> }{code}
> So the total number of events is 4K*4K = 16 million. During this window, 
> GenericEventHandler.printEventQueueDetails prints the event queue details and 
> is called frequently; once the event queue size reaches 1 million+, iterating 
> the queue in printEventQueueDetails becomes very slow, as shown below: 
> {code:java}
> private void printEventQueueDetails() {
>   Iterator iterator = eventQueue.iterator();
>   Map counterMap = new HashMap<>();
>   while (iterator.hasNext()) {
> Enum eventType = iterator.next().getType();
> {code}
> As a result, RM recovery takes too much time.
>  Refer to our log:
> {code:java}
> 2021-04-14 20:35:34,432 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(306)) - Size of event-queue is 1200
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: KILL, Event 
> record counter: 310836
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: NODE_UPDATE, 
> Event record counter: 1103
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: 
> NODE_REMOVED, Event record counter: 1
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: APP_REMOVED, 
> Event record counter: 1
> {code}
> Between AsyncDispatcher.handle and printEventQueueDetails, more than 1s is 
> spent iterating.
> I uploaded a patch to ensure printEventQueueDetails is called at most once 
> per 30s.
>  
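The rate-limiting idea described above can be sketched in isolation. This is an illustrative sketch only, not the attached patch: the class name `RateLimitedQueueDump`, the string-typed queue, and the hard-coded 30-second window are assumptions for demonstration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: only iterate the (potentially huge) event queue
// if at least MIN_INTERVAL_MS has passed since the last dump.
public class RateLimitedQueueDump {
  private static final long MIN_INTERVAL_MS = 30_000; // assumed 30s window
  private final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
  private long lastPrintMs = -MIN_INTERVAL_MS; // first call is always allowed

  public void add(String eventType) {
    eventQueue.add(eventType);
  }

  // Returns true if the dump actually ran, false if it was rate-limited.
  public boolean maybePrintEventQueueDetails(long nowMs) {
    if (nowMs - lastPrintMs < MIN_INTERVAL_MS) {
      return false; // skip the O(n) iteration
    }
    lastPrintMs = nowMs;
    Map<String, Integer> counterMap = new HashMap<>();
    for (Iterator<String> it = eventQueue.iterator(); it.hasNext(); ) {
      counterMap.merge(it.next(), 1, Integer::sum);
    }
    counterMap.forEach((type, count) ->
        System.out.println("Event type: " + type
            + ", Event record counter: " + count));
    return true;
  }

  public static void main(String[] args) {
    RateLimitedQueueDump d = new RateLimitedQueueDump();
    d.add("KILL");
    d.add("KILL");
    d.add("NODE_UPDATE");
    System.out.println(d.maybePrintEventQueueDetails(0));      // true
    System.out.println(d.maybePrintEventQueueDetails(10_000)); // false
    System.out.println(d.maybePrintEventQueueDetails(40_000)); // true
  }
}
```

The point is that the per-call cost drops from O(queue size) to O(1), except for at most one full iteration per window.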






[jira] [Updated] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10739:
--
Attachment: YARN-10739.004.patch







[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331981#comment-17331981
 ] 

Qi Zhu commented on YARN-9615:
--

Thanks [~chaosju] for the feedback.

I will improve it later.

I will also improve event handling generally in 
https://issues.apache.org/jira/browse/YARN-9927.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.

2021-04-26 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331974#comment-17331974
 ] 

Qi Zhu commented on YARN-10637:
---

Thanks [~gandras] for confirming.

[~pbacsko] do you have any comments?

Thanks.

> We should support fs to cs support for auto refresh queues when conf changed, 
> after YARN-10623 finished.
> 
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when  YARN-10623 finished.






[jira] [Updated] (YARN-10561) Upgrade node.js to at least 10.x in YARN application catalog webapp

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-10561:
---

===Bulk update===

I am planning to cut the branch for Hadoop 3.3.1 release, and this jira targets 
3.3.1 currently. Please take the time to review the patch, or push out of 3.3.1 
if you think it can't be finished in the next few weeks.

> Upgrade node.js to at least 10.x in YARN application catalog webapp
> ---
>
> Key: YARN-10561
> URL: https://issues.apache.org/jira/browse/YARN-10561
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> YARN application catalog webapp is using node.js 8.11.3, and 8.x are already 
> EoL.






[jira] [Updated] (YARN-10660) YARN Web UI have problem when show node partitions resource

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-10660:
---

===Bulk update===

I am planning to cut the branch for Hadoop 3.3.1 release, and this jira targets 
3.3.1 currently. Please take the time to review the patch, or push out of 3.3.1 
if you think it can't be finished in the next few weeks.

> YARN Web UI have problem when show node partitions resource
> ---
>
> Key: YARN-10660
> URL: https://issues.apache.org/jira/browse/YARN-10660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.0, 3.1.1, 3.2.1, 3.2.2
>Reporter: tuyu
>Priority: Minor
> Attachments: 2021-03-01 19-56-02 的屏幕截图.png, YARN-10660.patch
>
>
> When the YARN label function is enabled, the YARN UI shows queue resources per 
> partition, but there is a problem when clicking the expand button: the URL 
> grows very long, like this 
> {code:java}
> 127.0.0.1:20701/cluster/scheduler?openQueues=Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20
> {code}
> The root cause is:
> {code:java}
> origin url is:
>   Partition:  
> htmlencode is:
>   Partition:  
> SchedulerPageUtil has some JavaScript code in storeExpandedQueue:
>   tmpCurrentParam = tmpCurrentParam.split('&');
> the Partition: value will be split with len > 1. The problem logic is here: 
> when the expand button is clicked closed, the function clears the params, 
> but the split array does not match the origin url.
> {code}
> When the expand button is clicked closed, <DEFAULT_PARTITION>  vCores:96> is 
> appended; if expand is clicked multiple times, the length keeps increasing.






[jira] [Updated] (YARN-10510) TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt will cause NullPointerException

2021-04-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-10510:
---

===Bulk update===

I am planning to cut the branch for Hadoop 3.3.1 release, and this jira targets 
3.3.1 currently. Please take the time to review the patch, or push out of 3.3.1 
if you think it can't be finished in the next few weeks.

> TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt  will cause 
> NullPointerException
> 
>
> Key: YARN-10510
> URL: https://issues.apache.org/jira/browse/YARN-10510
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: tuyu
>Priority: Minor
>  Labels: test
> Fix For: 3.2.1, 3.3.1
>
> Attachments: YARN-10510.001.patch
>
>
> Running TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt causes the 
> exception below:
> {code:java}
> 2020-12-01 20:16:41,412 ERROR [main] webapp.AppBlock 
> (AppBlock.java:render(124)) - Failed to read the application 
> application_1234_.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.getApplicationReport(RMAppBlock.java:218)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:112)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt(TestAppPage.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
> Disconnected from the target VM, address: '127.0.0.1:60623', transport: 
> 'socket'
> {code}
> This is because mockClientRMService does not mock the getApplicationReport 
> and getApplicationAttempts interfaces.






[jira] [Commented] (YARN-10739) GenericEventHandler.printEventQueueDetails cause RM recovery cost too much time

2021-04-26 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331877#comment-17331877
 ] 

Andras Gyori commented on YARN-10739:
-

Thank you for the patch [~zhuqi]. Overall, the patch looks good to me and I 
also find it useful. One minor nit: the threadFactory could be extracted to a 
local variable.







[jira] [Commented] (YARN-8811) Support Container Storage Interface (CSI) in YARN

2021-04-26 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331858#comment-17331858
 ] 

Akira Ajisaka commented on YARN-8811:
-

Is this project still ongoing? I didn't see any documentation in the source code.

> Support Container Storage Interface (CSI) in YARN
> -
>
> Key: YARN-8811
> URL: https://issues.apache.org/jira/browse/YARN-8811
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: CSI
> Attachments: Support Container Storage Interface(CSI) in YARN_design 
> doc_20180921.pdf, Support Container Storage Interface(CSI) in YARN_design 
> doc_20180928.pdf, Support Container Storage Interface(CSI) in 
> YARN_design_doc_v3.pdf
>
>
> The Container Storage Interface (CSI) is a vendor neutral interface to bridge 
> Container Orchestrators and Storage Providers. With the adoption of CSI in 
> YARN, it will be easier to integrate 3rd party storage systems, and provide 
> the ability to attach persistent volumes for stateful applications.






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-04-26 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331831#comment-17331831
 ] 

chaosju commented on YARN-9615:
---

{code:java}
protected Dispatcher createDispatcher() {
  AsyncDispatcher dispatcher = new AsyncDispatcher("RM Event dispatcher");
  GenericEventTypeMetrics genericEventTypeMetrics =
      GenericEventTypeMetricsManager.create(dispatcher.getName(),
          NodesListManagerEventType.class);
  // We can add more
  dispatcher.addMetrics(genericEventTypeMetrics,
      genericEventTypeMetrics.getEnumClass());
  return dispatcher;
}
{code}
The metrics should also consider RMNodeEvent, RMAppAttemptEvent, RMAppEvent, 
etc. Which of these have the highest frequency? Reference: 
https://issues.apache.org/jira/browse/YARN-9927

Or could this method make event collection configurable?

[~zhuqi]
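As a concrete illustration of per-event-type metrics keyed by an enum class, here is a minimal hypothetical sketch. `EventTypeMetricsSketch` and `DemoEventType` are invented names for demonstration; this is not the actual GenericEventTypeMetrics implementation in Hadoop.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: count and total processing time per enum constant.
public class EventTypeMetricsSketch<T extends Enum<T>> {
  // Per-event-type counters; EnumMap gives array-backed O(1) lookup.
  private final Map<T, LongAdder> counts;
  private final Map<T, LongAdder> totalProcessingMs;

  public EventTypeMetricsSketch(Class<T> enumClass) {
    counts = new EnumMap<>(enumClass);
    totalProcessingMs = new EnumMap<>(enumClass);
    for (T type : enumClass.getEnumConstants()) {
      counts.put(type, new LongAdder());
      totalProcessingMs.put(type, new LongAdder());
    }
  }

  // Record one handled event of the given type and its processing time.
  public void record(T eventType, long processingMs) {
    counts.get(eventType).increment();
    totalProcessingMs.get(eventType).add(processingMs);
  }

  public long getCount(T eventType) {
    return counts.get(eventType).sum();
  }

  public long getTotalMs(T eventType) {
    return totalProcessingMs.get(eventType).sum();
  }

  // Invented demo enum standing in for e.g. NodesListManagerEventType.
  enum DemoEventType { NODE_UPDATE, APP_REMOVED }

  public static void main(String[] args) {
    EventTypeMetricsSketch<DemoEventType> metrics =
        new EventTypeMetricsSketch<>(DemoEventType.class);
    metrics.record(DemoEventType.NODE_UPDATE, 5);
    metrics.record(DemoEventType.NODE_UPDATE, 7);
    System.out.println(metrics.getCount(DemoEventType.NODE_UPDATE));   // 2
    System.out.println(metrics.getTotalMs(DemoEventType.NODE_UPDATE)); // 12
  }
}
```

A real implementation would additionally be registered with the metrics system and wired into the dispatcher's handle loop.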

 

 
 



