date:20210428

[jira] [Comment Edited] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335104#comment-17335104
 ] 

Qi Zhu edited comment on YARN-10738 at 4/29/21, 2:48 AM:
-

Thanks [~Jim_Brennan] for the review and the very patient investigation.

The original ResourceUsageMultiNodeLookupPolicy sometimes causes hot nodes in 
our test cluster. With the gap shuffle, hot-node cases dropped by more than 
50%. The gap of 10 is open for discussion: the right value depends on the size 
of the cluster, and a well-chosen gap should give better results.

I agree that another option to consider would be a policy that uses node 
utilization, which should more accurately reflect how busy the node is. We 
should also shuffle based on node utilization, because with multi-threaded 
scheduling that is not driven by node heartbeats, the threads may all commit to 
the same first node. That creates a hot node, and hot nodes are a major 
bottleneck for real-time clusters. 

Hot nodes mainly affect real-time clusters, because those clusters are 
stricter about job delay.

I agree that we should do more testing and gather more practical experience to 
move this issue forward. I think it will be helpful for large-scale clusters, 
especially real-time clusters, and I will help a lot.

Thanks again [~Jim_Brennan].



> When multi thread scheduling with multi node, we should shuffle with a gap to 
> prevent hot accessing nodes.
> --
>
> Key: YARN-10738
> URL: https://issues.apache.org/jira/browse/YARN-10738
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The current multi-threaded scheduling with multi-node placement is not 
> reasonable.
> In large clusters it causes hot-accessed nodes, which leads to abnormally 
> overloaded nodes.
> Solution:
> I think we should shuffle the sorted nodes (for example, nodes sorted by the 
> available-resource sort policy) with an interval. 
> This will solve the above problem and avoid hot-accessed nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335104#comment-17335104
 ] 

Qi Zhu edited comment on YARN-10738 at 4/29/21, 2:46 AM:
-

Thanks [~Jim_Brennan] for the review and the very patient investigation.

The original ResourceUsageMultiNodeLookupPolicy sometimes causes hot nodes in 
our test cluster. With the gap shuffle, hot-node cases dropped by more than 
50%. The gap of 10 is open for discussion: the right value depends on the size 
of the cluster, and a well-chosen gap should give better results.

I agree that another option to consider would be a policy that uses node 
utilization, which should more accurately reflect how busy the node is. We 
should also shuffle based on node utilization, because with multi-threaded 
scheduling that is not driven by node heartbeats, the threads may all commit to 
the same first node. That creates a hot node, and hot nodes are a major 
bottleneck for real-time clusters. 

Hot nodes mainly affect real-time clusters, because those clusters are 
stricter about job delay.

Thanks.




[jira] [Comment Edited] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335104#comment-17335104
 ] 

Qi Zhu edited comment on YARN-10738 at 4/29/21, 2:45 AM:
-

Thanks [~Jim_Brennan] for the review and the very patient investigation.

The original ResourceUsageMultiNodeLookupPolicy sometimes causes hot nodes in 
our test cluster. With the gap shuffle, hot-node cases dropped by more than 
50%. The gap of 10 is open for discussion: the right value depends on the size 
of the cluster, and a well-chosen gap should give better results.

I agree that another option to consider would be a policy that uses node 
utilization, which should more accurately reflect how busy the node is. We 
should also shuffle based on node utilization, because multi-threaded 
scheduling that is not driven by node heartbeats will commit to the same first 
node. That creates a hot node, and hot nodes are a major bottleneck for 
real-time clusters. 

Hot nodes mainly affect real-time clusters, because those clusters are 
stricter about job delay.

Thanks.




[jira] [Commented] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335104#comment-17335104
 ] 

Qi Zhu commented on YARN-10738:
---

Thanks [~Jim_Brennan] for the review and the very patient investigation.

The original ResourceUsageMultiNodeLookupPolicy sometimes causes hot nodes in 
our test cluster. With the gap shuffle, hot-node cases dropped by more than 
50%. The gap of 10 is open for discussion: the right value depends on the size 
of the cluster, and a well-chosen gap should give better results.

I agree that another option to consider would be a policy that uses node 
utilization, which should more accurately reflect how busy the node is. We 
should also shuffle based on node utilization, because multi-threaded 
scheduling will commit to the same first node. That creates a hot node, and 
hot nodes are a major bottleneck for real-time clusters. 

Hot nodes mainly affect real-time clusters, because those clusters are 
stricter about job delay.

Thanks.


[jira] [Commented] (YARN-10738) When multi thread scheduling with multi node, we should shuffle with a gap to prevent hot accessing nodes.

2021-04-28 Thread Jim Brennan (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335017#comment-17335017
 ] 

Jim Brennan commented on YARN-10738:


[~zhuqi], I am not very familiar with the multi-threaded scheduling code - we 
have not started using it yet.  So it would be very helpful if you could 
provide more details about what you are observing in your cluster, and how you 
think this will fix it.  Is your cluster made up of many nodes that are the 
same size, or do you have a mix of different sizes?  If you have any data that 
shows some nodes being more heavily utilized than others, that would be helpful.

Looking at  {{ResourceUsageMultiNodeLookupPolicy}}, it seems to sort by 
allocated resources to a node, so this seems to be trying to ensure we allocate 
more evenly across nodes.  It doesn't consider the relative sizes of the nodes 
though, so in a heterogeneous cluster, I could see it leading to smaller nodes 
being busier than larger nodes.   I wonder if a reverse sort by unallocated 
resources might be more fair, because it would favor nodes that have more room 
for new resource requests, rather than those that currently have fewer 
resources allocated.
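
To make the difference between the two orderings concrete, here is a minimal sketch in plain Java. It does not use any YARN classes: `NodeSketch`, `NodeOrderingSketch`, and the memory figures are hypothetical stand-ins chosen only for illustration, and the real policy may compare resources differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a scheduler node; not a YARN class. */
final class NodeSketch {
  final String name;
  final long totalMb;      // node capacity
  final long allocatedMb;  // resources already handed out

  NodeSketch(String name, long totalMb, long allocatedMb) {
    this.name = name;
    this.totalMb = totalMb;
    this.allocatedMb = allocatedMb;
  }

  long unallocatedMb() {
    return totalMb - allocatedMb;
  }
}

final class NodeOrderingSketch {
  /** Least-allocated first: ignores how large each node is. */
  static List<String> byAllocated(List<NodeSketch> nodes) {
    List<NodeSketch> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingLong((NodeSketch n) -> n.allocatedMb));
    return names(sorted);
  }

  /** Most-unallocated first: favors nodes with the most free room. */
  static List<String> byUnallocated(List<NodeSketch> nodes) {
    List<NodeSketch> sorted = new ArrayList<>(nodes);
    sorted.sort(
        Comparator.comparingLong((NodeSketch n) -> n.unallocatedMb()).reversed());
    return names(sorted);
  }

  private static List<String> names(List<NodeSketch> nodes) {
    List<String> out = new ArrayList<>();
    for (NodeSketch n : nodes) {
      out.add(n.name);
    }
    return out;
  }

  public static void main(String[] args) {
    List<NodeSketch> nodes = Arrays.asList(
        new NodeSketch("small", 8_192, 6_144),     // 2 GB free
        new NodeSketch("large", 65_536, 16_384));  // 48 GB free
    System.out.println(byAllocated(nodes));    // small node is tried first
    System.out.println(byUnallocated(nodes));  // large node is tried first
  }
}
```

In this made-up two-node cluster, sorting by allocated resources offers the nearly full small node first, while the reverse sort by unallocated resources offers the large node that still has most of its capacity free.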

Another option to consider would be to have a policy that uses node 
utilization, which should more accurately reflect how busy the node is.

With respect to the policy proposed in this ticket, I am not convinced it will 
help very much?  It's doing the same sort by allocated resources, but just 
adding a shuffle of every 10 nodes.  I'm not sure how much that will help in 
practice on a large cluster.  A rack is usually more than 10 nodes, so it's 
possible the same set of racks will be over-utilized.   Again, it would be 
helpful if you had some before/after data to show how it helps in a real 
cluster.
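
One possible reading of "a shuffle of every 10 nodes" is sketched below: take the already sorted node list and shuffle each consecutive window of `gap` entries independently. This is an assumption about the approach, not the code from the patch; `GapShuffleSketch` and `shuffleWithGap` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class GapShuffleSketch {
  /**
   * Returns a copy of {@code sorted} in which each consecutive window of
   * {@code gap} elements is shuffled, while the windows keep their order.
   */
  static <T> List<T> shuffleWithGap(List<T> sorted, int gap, Random random) {
    if (gap <= 0) {
      throw new IllegalArgumentException("gap must be positive");
    }
    List<T> result = new ArrayList<>(sorted);
    for (int start = 0; start < result.size(); start += gap) {
      int end = Math.min(start + gap, result.size());
      // subList is a view, so shuffling it reorders the backing list.
      Collections.shuffle(result.subList(start, end), random);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> sortedNodeIds = new ArrayList<>();
    for (int i = 0; i < 25; i++) {
      sortedNodeIds.add(i);
    }
    // Output varies per run; ids 0-9 always stay in the first ten slots.
    System.out.println(shuffleWithGap(sortedNodeIds, 10, new Random()));
  }
}
```

Under this reading a node never leaves its window, so the head of the list is always drawn from the same `gap` nodes. That matches the concern above: the load is spread over ten candidates instead of one, but not beyond them.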



[jira] [Commented] (YARN-10571) Refactor dynamic queue handling logic

2021-04-28 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334763#comment-17334763
 ] 

Hadoop QA commented on YARN-10571:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 16s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 34s | | trunk passed |
| +1 | compile | 1m 0s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 51s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 45s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 17m 2s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 20m 7s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 50s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 55s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| -1 | javac | 0m 55s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/945/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 5 new + 51 unchanged - 5 fixed = 56 total (was 56) |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 | javac | 0m 45s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/945/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 5 new + 40 unchanged - 5 fixed = 45 total (was 45) |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green

[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334756#comment-17334756
 ] 

Qi Zhu commented on YARN-10707:
---

The failed timeout test is unrelated; it passed locally.

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch, 
> YARN-10707.009.patch, YARN-10707.010.patch, YARN-10707.011.patch
>
>
> Support GPU in ResourceUtilization, and update Node GPU Utilization to use 
> it first.
> It will be very helpful for other use cases involving GPU utilization.




[jira] [Commented] (YARN-10759) Encapsulate queue config modes

2021-04-28 Thread Andras Gyori (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334727#comment-17334727
 ] 

Andras Gyori commented on YARN-10759:
-

Uploaded an initial revision of this refactor. Currently I have skipped 
refactoring ManagedParent and AutoCreatedLeafQueue related logic. I have also 
kept this improvement as simple as possible. It is not yet justified to 
complicate things by:
 * Encapsulating child queue modes for a ParentQueue
 * Creating a new WEIGHT mode, as it is more of a subtype of RELATIVE mode

This might change, however, according to the needs of the tasks defined under 
this Jira, e.g. YARN-9936.

> Encapsulate queue config modes
> --
>
> Key: YARN-10759
> URL: https://issues.apache.org/jira/browse/YARN-10759
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10759.001.patch
>
>
> Capacity Scheduler queues have three modes:
>  * relative/percentage
>  * weight
>  * absolute
> Most of them have their own:
>  * validation logic
>  * config setting logic
>  * effective capacity calculation logic
> These logics can be easily extracted and encapsulated in separate config mode 
> classes. 
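
As a rough illustration of what "separate config mode classes" could look like, the sketch below puts validation and effective-capacity calculation behind one interface, with one class per mode. Every name and formula here is a hypothetical simplification for illustration; none of it is taken from the Capacity Scheduler code or from the attached patch.

```java
/** Hypothetical interface; names and formulas are illustrative only. */
interface QueueCapacityModeSketch {
  /** Rejects configured values that make no sense for this mode. */
  void validate(double configured);

  /** Derives a queue's effective capacity from its parent's capacity. */
  double effectiveCapacity(double configured, double parentCapacity,
      double siblingWeightSum);
}

/** Relative/percentage mode: the value is a percentage of the parent. */
final class RelativeModeSketch implements QueueCapacityModeSketch {
  @Override
  public void validate(double configured) {
    if (configured < 0 || configured > 100) {
      throw new IllegalArgumentException("percentage must be in [0, 100]");
    }
  }

  @Override
  public double effectiveCapacity(double configured, double parentCapacity,
      double siblingWeightSum) {
    return parentCapacity * configured / 100.0;
  }
}

/** Weight mode: the value is a share of the summed sibling weights. */
final class WeightModeSketch implements QueueCapacityModeSketch {
  @Override
  public void validate(double configured) {
    if (configured < 0) {
      throw new IllegalArgumentException("weight must be non-negative");
    }
  }

  @Override
  public double effectiveCapacity(double configured, double parentCapacity,
      double siblingWeightSum) {
    if (siblingWeightSum <= 0) {
      throw new IllegalArgumentException("weight sum must be positive");
    }
    return parentCapacity * configured / siblingWeightSum;
  }
}

/** Absolute mode: the value is a resource amount, capped by the parent. */
final class AbsoluteModeSketch implements QueueCapacityModeSketch {
  @Override
  public void validate(double configured) {
    if (configured < 0) {
      throw new IllegalArgumentException("amount must be non-negative");
    }
  }

  @Override
  public double effectiveCapacity(double configured, double parentCapacity,
      double siblingWeightSum) {
    return Math.min(configured, parentCapacity);
  }
}

final class QueueModeDemo {
  public static void main(String[] args) {
    QueueCapacityModeSketch mode = new WeightModeSketch();
    mode.validate(1);
    // A queue with weight 1 out of a sibling total of 4, under a parent of 1000.
    System.out.println(mode.effectiveCapacity(1, 1000, 4));
  }
}
```

With this shape, a caller such as a queue class would hold a single mode object and would not need to branch on the mode itself, which is the kind of extraction the description proposes.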




[jira] [Updated] (YARN-10759) Encapsulate queue config modes

2021-04-28 Thread Andras Gyori (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10759:

Attachment: YARN-10759.001.patch


[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-28 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334691#comment-17334691
 ] 

Hadoop QA commented on YARN-10707:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 22m 25s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| 0 | buf | 0m 0s | | buf was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 3m 21s | | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 26s | | trunk passed |
| +1 | compile | 9m 59s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 8m 28s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 38s | | trunk passed |
| +1 | mvnsite | 2m 27s | | trunk passed |
| +1 | shadedclient | 19m 33s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 5s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 2m 17s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 29m 18s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 5m 27s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 19s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 52s | | the patch passed |
| +1 | compile | 9m 13s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | cc | 9m 13s | | the patch passed |
| +1 | javac | 9m 13s | | the patch passed |
| +1 | compile | 8m 13s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | cc | 8m 13s | | the patch passed |
| +1 | javac | 8m 13s | | the patch passed |
| +1 | checkstyle | 1m 36s | | the patch passed |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
20s{color} | {col

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-28 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334689#comment-17334689
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 45s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 3m 4s | | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 11s | | trunk passed |
| +1 | compile | 10m 55s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 8m 39s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 48s | | trunk passed |
| +1 | mvnsite | 3m 6s | | trunk passed |
| +1 | shadedclient | 19m 44s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 39s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 2m 42s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 31m 19s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 6m 18s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 15s | | the patch passed |
| +1 | compile | 10m 17s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 10m 17s | | the patch passed |
| +1 | compile | 8m 57s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | javac | 8m 57s | | the patch passed |
| +1 | checkstyle | 1m 42s | | the patch passed |
| +1 | mvnsite | 2m 47s | | the patch passed |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 55s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {c

[jira] [Commented] (YARN-10571) Refactor dynamic queue handling logic

2021-04-28 Thread Andras Gyori (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334675#comment-17334675
 ] 

Andras Gyori commented on YARN-10571:
-

Thank you [~pbacsko]. I have no idea what is going on, but I hope the OOM error 
is not related.

> Refactor dynamic queue handling logic
> -
>
> Key: YARN-10571
> URL: https://issues.apache.org/jira/browse/YARN-10571
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Minor
> Attachments: YARN-10571.001.patch, YARN-10571.002.patch, 
> YARN-10571.003.patch
>
>
> As per YARN-10506, we have introduced another mode for auto queue creation, 
> along with a new class that handles it. We should move the old managed-queue 
> logic to CSAutoQueueHandler as well, and do additional cleanup regarding 
> queue management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-10571) Refactor dynamic queue handling logic

2021-04-28 Thread Peter Bacsko (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334674#comment-17334674
 ] 

Peter Bacsko commented on YARN-10571:
-

Thanks [~gandras] for the patch. Do you know what's going on with the javac 
warnings? That code wasn't even touched. Maybe it has to do with the failing 
build ("Unable to create native thread").

I'll trigger a rebuild.


[jira] [Commented] (YARN-10571) Refactor dynamic queue handling logic

2021-04-28 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334608#comment-17334608
 ] 

Hadoop QA commented on YARN-10571:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 2m 16s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 28m 24s | | trunk passed |
| +1 | compile | 1m 34s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 1m 10s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 4s | | trunk passed |
| +1 | mvnsite | 1m 18s | | trunk passed |
| +1 | shadedclient | 22m 20s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 33s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 55s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 27m 50s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 58s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 1m 18s | | the patch passed |
| +1 | compile | 1m 28s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| -1 | javac | 1m 28s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/942/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 5 new + 51 unchanged - 5 fixed = 56 total (was 56) |
| +1 | compile | 1m 22s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 | javac | 1m 22s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/942/artifact/out/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 5 new + 40 unchanged - 5 fixed = 45 total (was 45) |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green}{color} | {color:green

[jira] [Updated] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-28 Thread Qi Zhu (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10707:
--
Attachment: YARN-10707.011.patch

> Support custom resources in ResourceUtilization, and update Node GPU 
> Utilization to use.
> 
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch, YARN-10707.004.patch, YARN-10707.005.patch, 
> YARN-10707.006.patch, YARN-10707.007.patch, YARN-10707.008.patch, 
> YARN-10707.009.patch, YARN-10707.010.patch, YARN-10707.011.patch
>
>
> Support GPU in ResourceUtilization and, as a first step, update Node GPU 
> Utilization to use it.
> This will be very helpful for other use cases involving GPU utilization.




[jira] [Commented] (YARN-10707) Support custom resources in ResourceUtilization, and update Node GPU Utilization to use.

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334606#comment-17334606
 ] 

Qi Zhu commented on YARN-10707:
---

Fixed the Javadoc in the latest patch.


[jira] [Comment Edited] (YARN-9927) RM multi-thread event processing mechanism

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334565#comment-17334565
 ] 

Qi Zhu edited comment on YARN-9927 at 4/28/21, 8:29 AM:


[~ebadger] [~pbacsko] [~gandras] [~epayne]

I have updated the patch with two improvements:

1. Producing events to the event queue:
Each eventType now has its own single async dispatcher thread. I counted the 
event types, and the resulting number of threads is small. Each dispatcher 
also has its own event queue, so events of different types no longer affect 
each other.

2. Consuming events from the event queue and processing them:
As an example, I added multi-threaded processing in the handler, for 
RMNodeEvent only. Each RMNodeImpl object has its own write lock, so I think 
handling events for different RMNode objects on multiple threads can be 
faster.

What is your opinion on this?

Thanks.
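
The two points above can be sketched in Java roughly as follows. This is only 
an illustration under stated assumptions, not the code from the YARN-9927 
patch: the class PerTypeDispatcherSketch, the EventType values, the Event 
fields and the Handler interface are all hypothetical names. Routing node 
events to a worker by hashing the node id is also an assumption; it is one 
simple way to keep events of the same node in order while different nodes are 
handled in parallel.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; no name here is taken from the YARN-9927 patch. */
class PerTypeDispatcherSketch {

  /** Stand-ins for the RM's real event types (assumption). */
  enum EventType { NODE, APP }

  /** Minimal event: its type, the id of the object it targets, a payload. */
  static final class Event {
    final EventType type;
    final String targetId;
    final int payload;

    Event(EventType type, String targetId, int payload) {
      this.type = type;
      this.targetId = targetId;
      this.payload = payload;
    }
  }

  /** Must be thread-safe: it is called from several worker threads. */
  interface Handler {
    void handle(Event event);
  }

  // Point 1: one single-thread executor per event type, that is, one
  // dispatcher thread with its own FIFO queue for each type.
  private final Map<EventType, ExecutorService> dispatchers =
      new EnumMap<>(EventType.class);
  // Point 2: extra workers used only for NODE events.
  private final ExecutorService[] nodeWorkers;
  private final Handler handler;

  /** nodeWorkerCount must be at least 1. */
  PerTypeDispatcherSketch(int nodeWorkerCount, Handler handler) {
    this.handler = handler;
    for (EventType type : EventType.values()) {
      dispatchers.put(type, Executors.newSingleThreadExecutor());
    }
    nodeWorkers = new ExecutorService[nodeWorkerCount];
    for (int i = 0; i < nodeWorkerCount; i++) {
      nodeWorkers[i] = Executors.newSingleThreadExecutor();
    }
  }

  /** Enqueues the event on the queue that belongs to its type. */
  void dispatch(Event event) {
    dispatchers.get(event.type).execute(() -> {
      if (event.type == EventType.NODE) {
        // Same node id -> same worker, so events of one node stay ordered
        // while different nodes can be handled in parallel.
        int idx = Math.floorMod(event.targetId.hashCode(), nodeWorkers.length);
        nodeWorkers[idx].execute(() -> handler.handle(event));
      } else {
        handler.handle(event);
      }
    });
  }

  /** Stops accepting events and waits for the queued ones to be handled. */
  void shutdownAndWait() {
    try {
      // Drain the dispatchers first so every NODE event reaches a worker.
      for (ExecutorService d : dispatchers.values()) {
        d.shutdown();
        d.awaitTermination(10, TimeUnit.SECONDS);
      }
      for (ExecutorService w : nodeWorkers) {
        w.shutdown();
        w.awaitTermination(10, TimeUnit.SECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    PerTypeDispatcherSketch d = new PerTypeDispatcherSketch(4,
        e -> System.out.println(e.type + " " + e.targetId + " " + e.payload));
    for (int i = 0; i < 3; i++) {
      d.dispatch(new Event(EventType.NODE, "node-1", i));
      d.dispatch(new Event(EventType.NODE, "node-2", i));
    }
    d.dispatch(new Event(EventType.APP, "app-1", 0));
    d.shutdownAndWait();
  }
}
```

In this sketch a slow handler for one event type only delays its own queue, 
and the number of threads is fixed: one per event type plus the configured 
number of node workers. The order in which events of different nodes are 
printed by main is not deterministic.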


was (Author: zhuqi):
[~ebadger] [~gandras] [~epayne]

I have updated the patch with two improvements:

1. Producing events to the event queue:
Each eventType now has its own single async dispatcher thread. I counted the 
event types, and the resulting number of threads is small. Each dispatcher 
also has its own event queue, so events of different types no longer affect 
each other.

2. Consuming events from the event queue and processing them:
As an example, I added multi-threaded processing in the handler, for 
RMNodeEvent only. Each RMNodeImpl object has its own write lock, so I think 
handling events for different RMNode objects on multiple threads can be 
faster.

What is your opinion on this?

Thanks.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 3.0.0, 2.9.2
> Reporter: hcarrot
> Assignee: Qi Zhu
> Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we observed serious event blocking in the RM event dispatcher 
> queue. After analyzing RM event monitoring data and the RM event processing 
> logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which leaves 
> the RM event scheduler with little headroom and limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.




[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-28 Thread Qi Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334565#comment-17334565
 ] 

Qi Zhu commented on YARN-9927:
--

[~ebadger] [~gandras] [~epayne]

I have updated the patch with two improvements:

1. Producing events to the event queue:
Each eventType now has its own single async dispatcher thread. I counted the 
event types, and the resulting number of threads is small. Each dispatcher 
also has its own event queue, so events of different types no longer affect 
each other.

2. Consuming events from the event queue and processing them:
As an example, I added multi-threaded processing in the handler, for 
RMNodeEvent only. Each RMNodeImpl object has its own write lock, so I think 
handling events for different RMNode objects on multiple threads can be 
faster.

What is your opinion on this?

Thanks.


[jira] [Updated] (YARN-9927) RM multi-thread event processing mechanism

2021-04-28 Thread Qi Zhu (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9927:
-
Attachment: YARN-9927.005.patch


[jira] [Updated] (YARN-10571) Refactor dynamic queue handling logic

2021-04-28 Thread Andras Gyori (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10571:

Attachment: YARN-10571.003.patch


[jira] [Updated] (YARN-10571) Refactor dynamic queue handling logic

2021-04-28 Thread Andras Gyori (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10571:

Attachment: (was: YARN-10571.003.patch)


