[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298587#comment-17298587
 ] 

Qi Zhu edited comment on YARN-10674 at 3/10/21, 7:36 AM:
-

Thanks a lot [~pbacsko] for reply.

Your advice make sense to me , and it's the only solution to solve it now, 
because the preemption is related to the enable option.

“However, you already added a policy in YARN-10623, so looks like this property 
always has to be enabled in weight mode.”

Actually this is used in all mode for auto refresh. 

I will update it in next patch.

Thanks.


was (Author: zhuqi):
Thanks a lot [~pbacsko] for reply.

Your advice make sense to me , and it's the only solution to solve it now, 
because the preemption is related to the enable option.

I will update it in next patch.

Thanks.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298587#comment-17298587
 ] 

Qi Zhu commented on YARN-10674:
---

Thanks a lot [~pbacsko] for reply.

Your advice make sense to me , and it's the only solution to solve it now, 
because the preemption is related to the enable option.

I will update it in next patch.

Thanks.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-09 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298568#comment-17298568
 ] 

Akira Ajisaka commented on YARN-10501:
--

Filed HADOOP-17572 for this issue.

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.1.001.patch, 
> YARN-10501-branch-2.10.1.002.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch
>
>
> When add a label to nodes without nodemanager port or use WILDCARD_PORT (0) 
> port, it can't remove all label info in these nodes
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see after the 4 process to remove nodemanager labels, the label info 
> is still in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is in 647 line, when add labels to node without port, the 0 port 
> and the real nm port with be both add to node info, and when remove labels, 
> the parameter node.labels in 647 line is null, so it will not remove the old 
> label. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-09 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298565#comment-17298565
 ] 

Akira Ajisaka commented on YARN-10501:
--

This error is not due to the yetus changes. It is because openjdk-7-jdk has 
been removed from ppa
https://launchpad.net/~openjdk-r/+archive/ubuntu/ppa/+packages?field.name_filter=&field.status_filter=superseded&field.series_filter=xenial

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.1.001.patch, 
> YARN-10501-branch-2.10.1.002.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch
>
>
> When add a label to nodes without nodemanager port or use WILDCARD_PORT (0) 
> port, it can't remove all label info in these nodes
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see after the 4 process to remove nodemanager labels, the label info 
> is still in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is in 647 line, when add labels to node without port, the 0 port 
> and the real nm port with be both add to node info, and when remove labels, 
> the parameter node.labels in 647 line is null, so it will not remove the old 
> label. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI

2021-03-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298332#comment-17298332
 ] 

Eric Payne commented on YARN-10588:
---

Thanks [~BilwaST] for reporting the issue and the fixes! The changes LGTM. 
[~Jim_Brennan], do you want to weigh in?

> Percentage of queue and cluster is zero in WebUI 
> -
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, 
> YARN-10588.003.patch, YARN-10588.004.patch
>
>
> Steps to reproduce:
> Configure below property in resource-types.xml
> {code:java}
> 
>  yarn.resource-types
>  yarn.io/gpu
>  {code}
> Submit a job
> In UI you can see % Of Queue and % Of Cluster is zero for the submitted 
> application
>  
> This is because in SchedulerApplicationAttempt has below check for 
> calculating queueUsagePerc and clusterUsagePerc
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
> float queueCapacityPerc = queue.getQueueInfo(false, false)
> .getCapacity();
> queueUsagePerc = calc.divide(cluster, usedResourceClone,
> Resources.multiply(cluster, queueCapacityPerc)) * 100;
> if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>   queueUsagePerc = 0.0f;
> }
> clusterUsagePerc =
> calc.divide(cluster, usedResourceClone, cluster) * 100;
>   }
> {code}
> calc.isInvalidDivisor(cluster) always returns true as gpu resource is 0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2021-03-09 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298253#comment-17298253
 ] 

Eric Badger commented on YARN-10501:


[~ahussein], [~aajisaka], is this due to any of the recent yetus changes? New 
branch-2.10 patches are failing Hadoop QA because it can't find openjdk-7-jdk

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10501-branch-2.10.1.001.patch, 
> YARN-10501-branch-2.10.1.002.patch, YARN-10501.002.patch, 
> YARN-10501.003.patch, YARN-10501.004.patch
>
>
> When add a label to nodes without nodemanager port or use WILDCARD_PORT (0) 
> port, it can't remove all label info in these nodes
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see after the 4 process to remove nodemanager labels, the label info 
> is still in the node info.
> {code:java}
>  641 case REPLACE:
>  642 replaceNodeForLabels(nodeId, host.labels, labels);
>  643 replaceLabelsForNode(nodeId, host.labels, labels);
>  644 host.labels.clear();
>  645 host.labels.addAll(labels);
>  646 for (Node node : host.nms.values()) {
>  647 replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649 node.labels = null;
>  650 }
>  651 break;{code}
> The cause is in 647 line, when add labels to node without port, the 0 port 
> and the real nm port with be both add to node info, and when remove labels, 
> the parameter node.labels in 647 line is null, so it will not remove the old 
> label. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl

2021-03-09 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298166#comment-17298166
 ] 

Brahma Reddy Battula commented on YARN-9731:


[~abmodi] could you check the latest patch, looks all your comments are 
addressed. It's an security issue,better to have for branch-3.3 also.

> In ATS v1.5, all jobs are visible to all users without view-acl
> ---
>
> Key: YARN-9731
> URL: https://issues.apache.org/jira/browse/YARN-9731
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9731.001.patch, YARN-9731.002.patch, 
> YARN-9731.003.patch, YARN-9731.004.patch, YARN-9731.005.patch, 
> ats_v1.5_screenshot.png
>
>
> In ATS v1.5 of secure mode,
> all jobs are visible to all users without view-acl.
> if user does not have view-acl,  user should not be able to see jobs.
> I attatched ATS UI screenshot.
>  
> ATS v1.5 log
> {code:java}
> 2019-08-09 10:21:13,679 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1951
> {code}
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI

2021-03-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298159#comment-17298159
 ] 

Eric Payne commented on YARN-10588:
---

[~BilwaST], sorry for the delay. I will look at this today.

> Percentage of queue and cluster is zero in WebUI 
> -
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, 
> YARN-10588.003.patch, YARN-10588.004.patch
>
>
> Steps to reproduce:
> Configure below property in resource-types.xml
> {code:java}
> 
>  yarn.resource-types
>  yarn.io/gpu
>  {code}
> Submit a job
> In UI you can see % Of Queue and % Of Cluster is zero for the submitted 
> application
>  
> This is because in SchedulerApplicationAttempt has below check for 
> calculating queueUsagePerc and clusterUsagePerc
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
> float queueCapacityPerc = queue.getQueueInfo(false, false)
> .getCapacity();
> queueUsagePerc = calc.divide(cluster, usedResourceClone,
> Resources.multiply(cluster, queueCapacityPerc)) * 100;
> if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>   queueUsagePerc = 0.0f;
> }
> clusterUsagePerc =
> calc.divide(cluster, usedResourceClone, cluster) * 100;
>   }
> {code}
> calc.isInvalidDivisor(cluster) always returns true as gpu resource is 0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298156#comment-17298156
 ] 

Peter Bacsko commented on YARN-10674:
-

[~zhuqi] this is very interesting. If we set RM Monitors to enabled, it means 
that system-wide preemption is always enabled, too:

AbstractCSQueue:
{noformat}
  private boolean isQueueHierarchyPreemptionDisabled(CSQueue q,
  CapacitySchedulerConfiguration configuration) {
boolean systemWidePreemption =
csContext.getConfiguration()
.getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS);
CSQueue parentQ = q.getParent();

// If the system-wide preemption switch is turned off, all of the queues in
// the qPath hierarchy have preemption disabled, so return true.
if (!systemWidePreemption) return true;
{noformat}
However, you already added a policy in YARN-10623, so looks like this property 
always has to be enabled in weight mode. But what if we convert an FS 
configuration which disabled preemption completely?

I think the best thing we can do right now is that we disable preemption for 
"root", which will propagate to all other parent queues.

So I suggest the following approach:
 1. In percentage conversion mode, do not enable RM monitors by default, 
because it's not needed.
 2. In weight mode (which is the default now), we have to enable it. But if 
"yarn.scheduler.fair.preemption" is false, then 
"yarn.scheduler.capacity.root.disable_preemption" must be set to true, but only 
for "root". This can be done in {{FSQueueConverter}}.

cc [~bteke] [~gandras] [~snemeth], not sure if this is a good approach, but I 
can't see anything better.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10497) Fix an issue in CapacityScheduler which fails to delete queues

2021-03-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298128#comment-17298128
 ] 

Hadoop QA commented on YARN-10497:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
20s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 41s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
49s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 52s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green}{color} | {color

[jira] [Commented] (YARN-8823) Monitor the healthy state of GPU

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298127#comment-17298127
 ] 

Qi Zhu commented on YARN-8823:
--

[~adam.antal] [~tangzhankun]

Is this going on?
"I was wondering if this issue can be easily finished by writing a custom 
health checker script (idea from YARN-9923).
I think it would make sense to push this feature - would you like to share your 
PoC and compare its advantages/disadvantages to a node checker script?"

This is a good suggestion.
 

> Monitor the healthy state of GPU
> 
>
> Key: YARN-8823
> URL: https://issues.apache.org/jira/browse/YARN-8823
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>
> We have GPU resource discovered when the NM bootstrap but not updated through 
> later heatbeat with RM. There should be a monitoring mechanism to check GPU 
> healthy status from time to time and also the corresponding handling.
> And YARN-8851 will also handle device's monitoring. There could be some 
> common part between the two.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298120#comment-17298120
 ] 

Qi Zhu commented on YARN-10674:
---

Thanks [~pbacsko] for patient review, your suggestion is valid.

I have one question:

YARN-10637 also need enable "RM monitors", if we need to set enable "RM 
monitors" default true, because the enable for "RM monitors" has no side 
effect, and the preemption default is false, i think set to true is more 
reasonable.

What's your opinion about this?

Thanks.:D

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298112#comment-17298112
 ] 

Peter Bacsko edited comment on YARN-10674 at 3/9/21, 3:23 PM:
--

[~zhuqi] I have the following comments:

1. This change seems to always enable "RM monitors":
{noformat}
// This should be always true to trigger dynamic queue auto deletion
// when expired.
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
{noformat}
But I don't think this is necessary. We need to enable it in two cases: 
preemption is enabled OR we're in weight mode. We don't have auto-queue delete 
in percentage mode (fs2cs can still convert to percentages with a command line 
switch).
 So I suggest that you pass an extra boolean "usePercentages".

Invocation from {{FSConfigToCSConfigConverter}}:
{noformat}
siteConverter.convertSiteProperties(inputYarnSiteConfig,
convertedYarnSiteConfig, drfUsed,
conversionOptions.isEnableAsyncScheduler(), usePercentages);  <-- last 
argument is new
{noformat}
Then in the site converter:
{noformat}
if (conf.getBoolean(FairSchedulerConfiguration.PREEMPTION,
FairSchedulerConfiguration.DEFAULT_PREEMPTION)) {
  yarnSiteConfig.setBoolean(
  YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  preemptionEnabled = true;
  ...
}

if (!usePercentages) {
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);   // 
setting it again is OK

String policies =
yarnSiteConfig.get(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES);
if (policies == null) {
  policies = AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();
} else {
  policies += "," + AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();
}

yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
policies);

// Set the expired for deletion interval to 10s, consistent with fs.
yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
If I think about it, {{yarnSiteConfig}} is the output config. So this cannot 
happen:
{noformat}
} else {
  policies += "," + AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();
}
{noformat}
This {{Configuration}} object is created with no entries. The {{else}} branch 
will never be taken.

So it can be simplified to:
{noformat}
if (!usePercentages) {
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);

String policy = AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();

yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
policy);

// Set the expired for deletion interval to 10s, consistent with fs.
yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
2. This also means two separate test cases:
 * When usePercentages = false, then {{RM_SCHEDULER_ENABLE_MONITORS}} and 
{{RM_SCHEDULER_MONITOR_POLICIES}} should be set (with preemption = false)
 * When usePercentages = true, then {{RM_SCHEDULER_ENABLE_MONITORS}} and 
{{RM_SCHEDULER_MONITOR_POLICIES}} should NOT be set (with preemption = false)

I recommend the following naming:
 {{testRmMonitorsAndPoliciesSetWhenUsingWeights()}} - first scenario
 {{testRmMonitorsAndPoliciesSetWhenUsingPercentages()}} - second scenario


was (Author: pbacsko):
[~zhuqi] I have the following comments:

1. This change seems to always enable "RM monitors":
{noformat}
// This should be always true to trigger dynamic queue auto deletion
// when expired.
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
{noformat}
But I don't think this is necessary. We need to enable it in two cases: 
preemption is enabled OR we're in weight mode. We don't have auto-queue delete 
in percentage mode (fs2cs can still convert to percentages with a command line 
switch).
 So I suggest that you pass an extra boolean "usePercentages".

Invocation from {{FSConfigToCSConfigConverter}}:
{noformat}
siteConverter.convertSiteProperties(inputYarnSiteConfig,
convertedYarnSiteConfig, drfUsed,
conversionOptions.isEnableAsyncScheduler(), usePercentages);  <-- last 
argument is new
{noformat}
Then in the site converter:
{noformat}
if (conf.getBoolean(FairSchedulerConfiguration.PREEMPTION,
FairSchedulerConfiguration.DEFAULT_PREEMPTION)) {
  yarnSiteConfig.setBoolean(
  YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  preemptionEnabled = true;
  ...
}

if (!usePercentages) {
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MO

[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298112#comment-17298112
 ] 

Peter Bacsko commented on YARN-10674:
-

[~zhuqi] I have the following comments:

1. This change seems to always enable "RM monitors":
{noformat}
// This should be always true to trigger dynamic queue auto deletion
// when expired.
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
{noformat}
But I don't think this is necessary. We need to enable it in two cases: 
preemption is enabled OR we're in weight mode. We don't have auto-queue delete 
in percentage mode (fs2cs can still convert to percentages with a command line 
switch).
 So I suggest that you pass an extra boolean "usePercentages".

Invocation from {{FSConfigToCSConfigConverter}}:
{noformat}
siteConverter.convertSiteProperties(inputYarnSiteConfig,
convertedYarnSiteConfig, drfUsed,
conversionOptions.isEnableAsyncScheduler(), usePercentages);  <-- last 
argument is new
{noformat}
Then in the site converter:
{noformat}
if (conf.getBoolean(FairSchedulerConfiguration.PREEMPTION,
FairSchedulerConfiguration.DEFAULT_PREEMPTION)) {
  yarnSiteConfig.setBoolean(
  YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  preemptionEnabled = true;
  ...
}

if (!usePercentages) {
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);   // 
setting it again is OK

String policies =
yarnSiteConfig.get(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES);
if (policies == null) {
  policies = AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();
} else {
  policies += "," + AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();
}

yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
policies);

// Set the expired for deletion interval to 10s, consistent with fs.
yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
If I think about it, {{yarnSiteConfig}} is the output config. So this cannot 
happen:
{noformat}
} else {
  policies += "," + AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();
}
{noformat}
This {{Configuration}} object is created with no entries. The {{else}} branch 
will never be taken.

So it can be simplified to:
{noformat}
if (!usePercentages) {
yarnSiteConfig.setBoolean(
YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);

String policy = AutoCreatedQueueDeletionPolicy.
  class.getCanonicalName();

yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
policy);

// Set the expired for deletion interval to 10s, consistent with fs.
yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
2. This also means two separate test cases:
 * When usePercentages = false, then {{RM_SCHEDULER_ENABLE_MONITORS}} and 
{{RM_SCHEDULER_MONITOR_POLICIES}} should be set (with preemption = false)
 * When usePercentages = true, then\{{RM_SCHEDULER_ENABLE_MONITORS}} and 
{{RM_SCHEDULER_MONITOR_POLICIES}} should NOT be set (with preemption = false)

I recommend the following naming:
 {{testRmMonitorsAndPoliciesSetWhenUsingWeights()}} - first scenario
 {{testRmMonitorsAndPoliciesSetWhenUsingPercentages()}} - second scenario

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10671) Fix Typo in TestSchedulingRequestContainerAllocation

2021-03-09 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298102#comment-17298102
 ] 

Brahma Reddy Battula commented on YARN-10671:
-

[~dmmkr] thanks for reporting.. Path lgtm.

> Fix Typo in TestSchedulingRequestContainerAllocation
> 
>
> Key: YARN-10671
> URL: https://issues.apache.org/jira/browse/YARN-10671
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Minor
> Attachments: YARN-10671.001.patch
>
>
> Remove extra colon in  
> TestSchedulingRequestContainerAllocation.testInterAppCompositeConstraints , 
> the test case is passing currently but, It will fail once YARN-4283 gets 
> merged.
> {code:java}
> MockNM nm1 = rm.registerNode("192.168.0.1:1234:", 100*GB, 100);{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298092#comment-17298092
 ] 

Peter Bacsko commented on YARN-10674:
-

Ok thanks, I'll review this one soon.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298076#comment-17298076
 ] 

Qi Zhu edited comment on YARN-10674 at 3/9/21, 2:10 PM:


Thanks [~pbacsko] for review.

It not depends on YARN-10682 , "," already supported now,  YARN-10682 fixed the 
"," with space error , just like YARN-10497 we should fix, such as the case:

"a,  b" is not supported now, but "a,b" is supported.:D

I updated the description for YARN-10682.

 


was (Author: zhuqi):
Thanks [~pbacsko] for review.

It not depends on YARN-10682 , "," already supported now,  YARN-10682 fixed the 
"," with space error such as the case:

"a,  b" is not supported now, but "a,b" is supported.:D

I updated the description for YARN-10682.

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298076#comment-17298076
 ] 

Qi Zhu edited comment on YARN-10674 at 3/9/21, 1:54 PM:


Thanks [~pbacsko] for review.

It not depends on YARN-10682 , "," already supported now,  YARN-10682 fixed the 
"," with space error such as the case:

"a,  b" is not supported now, but "a,b" is supported.:D

I updated the description for YARN-10682.

 


was (Author: zhuqi):
Thanks [~pbacsko] for review.

It not depends on YARN-10682 , "," already supported now,  YARN-10682 fixed the 
"," with space error such as the case:

"a,  b" is not supported now, but "a,b" is supported.:D

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should support trim between ",".

2021-03-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10682:
--
Description: 
When i configured scheduler monitor policies with space, the RM will start with 
error.

The conf should support trim between "," , such as :

"a,b,c" is supported now, but "a,   b,  c" is not supported now, just add trim 
in this jira.

 

When tested multi policy, it happened.

 
 yarn.resourcemanager.scheduler.monitor.policies
 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
 

 

  was:
When i configured scheduler monitor policies with space, the RM will start with 
error.

The conf should support trim between ",".

 

When tested multi policy, it happened.

 
 yarn.resourcemanager.scheduler.monitor.policies
 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
 

 


> The scheduler monitor policies conf should support trim between ",".
> 
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When i configured scheduler monitor policies with space, the RM will start 
> with error.
> The conf should support trim between "," , such as :
> "a,b,c" is supported now, but "a,   b,  c" is not supported now, just add 
> trim in this jira.
>  
> When tested multi policy, it happened.
>  
>  yarn.resourcemanager.scheduler.monitor.policies
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should support trim between ",".

2021-03-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10682:
--
Description: 
When i configured scheduler monitor policies with space, the RM will start with 
error.

The conf should support trim between "," , such as :

"a,b,c" is supported now, but "a,   b,  c" is not supported now, just add trim 
in this jira.

 

When tested multi policy, it happened.

 
 yarn.resourcemanager.scheduler.monitor.policies
 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,
   
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
 

 

  was:
When i configured scheduler monitor policies with space, the RM will start with 
error.

The conf should support trim between "," , such as :

"a,b,c" is supported now, but "a,   b,  c" is not supported now, just add trim 
in this jira.

 

When tested multi policy, it happened.

 
 yarn.resourcemanager.scheduler.monitor.policies
 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
 

 


> The scheduler monitor policies conf should support trim between ",".
> 
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When i configured scheduler monitor policies with space, the RM will start 
> with error.
> The conf should support trim between "," , such as :
> "a,b,c" is supported now, but "a,   b,  c" is not supported now, just add 
> trim in this jira.
>  
> When tested multi policy, it happened.
>  
>  yarn.resourcemanager.scheduler.monitor.policies
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,
>    
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298076#comment-17298076
 ] 

Qi Zhu commented on YARN-10674:
---

Thanks [~pbacsko] for review.

It not depends on YARN-10682 , "," already supported now,  YARN-10682 fixed the 
"," with space error such as the case:

"a,  b" is not supported now, but "a,b" is supported.:D

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298074#comment-17298074
 ] 

Peter Bacsko commented on YARN-10674:
-

[~zhuqi] am I right when I think that this patch depends on YARN-10682? Because 
this change generates a config entry with "," and it's not supported now.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298072#comment-17298072
 ] 

Qi Zhu commented on YARN-9615:
--

Thanks [~pbacsko] for commit and review.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298069#comment-17298069
 ] 

Peter Bacsko commented on YARN-9615:


+1

I had to commit twice because there are actually two authors.

Thanks for the patch [~jhung] / [~zhuqi] and [~bibinchundatt] for the review.

Committed to trunk.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298068#comment-17298068
 ] 

Peter Bacsko commented on YARN-9615:


Thanks [~zhuqi] patch v11 looks good, committing it soon.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298064#comment-17298064
 ] 

Peter Bacsko commented on YARN-10679:
-

+1 thanks [~snemeth] for the patch and [~shuzirra] for the review.

Committed to trunk.

> Better logging of uncaught exceptions throughout SLS
> 
>
> Key: YARN-10679
> URL: https://issues.apache.org/jira/browse/YARN-10679
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10679.001.patch
>
>
> In our internal environment, there was a test failure while running SLS tests 
> with Jenkins.
> It's difficult to align the uncaught exceptions (in this case an NPE) and the 
> log itself as the exception is logged with {{e.printStackTrace()}}.
> This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", 
> exception)}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298062#comment-17298062
 ] 

Peter Bacsko commented on YARN-10679:
-

Ok, this time the failed test is different, most likely a flaky one. Let's 
investigate it later.

> Better logging of uncaught exceptions throughout SLS
> 
>
> Key: YARN-10679
> URL: https://issues.apache.org/jira/browse/YARN-10679
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10679.001.patch
>
>
> In our internal environment, there was a test failure while running SLS tests 
> with Jenkins.
> It's difficult to align the uncaught exceptions (in this case an NPE) and the 
> log itself as the exception is logged with {{e.printStackTrace()}}.
> This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", 
> exception)}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS

2021-03-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298061#comment-17298061
 ] 

Hadoop QA commented on YARN-10679:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
27s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 50s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
48s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 39s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {col

[jira] [Commented] (YARN-10497) Fix an issue in CapacityScheduler which fails to delete queues

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298059#comment-17298059
 ] 

Qi Zhu commented on YARN-10497:
---

[~shuzirra] [~pbacsko]

Updated a patch to use getTrimmedStringCollection to fix this issue.:D

Thanks.

 

> Fix an issue in CapacityScheduler which fails to delete queues
> --
>
> Key: YARN-10497
> URL: https://issues.apache.org/jira/browse/YARN-10497
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-10497.001.patch, YARN-10497.002.patch, 
> YARN-10497.003.patch, YARN-10497.004.patch, YARN-10497.005.patch
>
>
> We saw an exception when using queue mutation APIs:
> {code:java}
> 2020-11-13 16:47:46,327 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Queue 
> root.am2cmQueueSecond not found
> {code}
> Which comes from this code:
> {code:java}
> List siblingQueues = getSiblingQueues(queueToRemove,
> proposedConf);
> if (!siblingQueues.contains(queueName)) {
>   throw new IOException("Queue " + queueToRemove + " not found");
> } 
> {code}
> (Inside MutableCSConfigurationProvider)
> If you look at the method:
> {code:java}
>  
>   private List getSiblingQueues(String queuePath, Configuration conf) 
> {
> String parentQueue = queuePath.substring(0, queuePath.lastIndexOf('.'));
> String childQueuesKey = CapacitySchedulerConfiguration.PREFIX +
> parentQueue + CapacitySchedulerConfiguration.DOT +
> CapacitySchedulerConfiguration.QUEUES;
> return new ArrayList<>(conf.getStringCollection(childQueuesKey));
>   }
> {code}
> And here's capacity-scheduler.xml I got
> {code:java}
> yarn.scheduler.capacity.root.queuesdefault, q1, 
> q2
> {code}
> You can notice there're spaces between default, q1, a2
> So conf.getStringCollection returns:
> {code:java}
> default
> q1
> ...
> {code}
> Which causes match issue when we try to delete the queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10497) Fix an issue in CapacityScheduler which fails to delete queues

2021-03-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10497:
--
Attachment: YARN-10497.005.patch

> Fix an issue in CapacityScheduler which fails to delete queues
> --
>
> Key: YARN-10497
> URL: https://issues.apache.org/jira/browse/YARN-10497
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-10497.001.patch, YARN-10497.002.patch, 
> YARN-10497.003.patch, YARN-10497.004.patch, YARN-10497.005.patch
>
>
> We saw an exception when using queue mutation APIs:
> {code:java}
> 2020-11-13 16:47:46,327 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Queue 
> root.am2cmQueueSecond not found
> {code}
> Which comes from this code:
> {code:java}
> List siblingQueues = getSiblingQueues(queueToRemove,
> proposedConf);
> if (!siblingQueues.contains(queueName)) {
>   throw new IOException("Queue " + queueToRemove + " not found");
> } 
> {code}
> (Inside MutableCSConfigurationProvider)
> If you look at the method:
> {code:java}
>  
>   private List getSiblingQueues(String queuePath, Configuration conf) 
> {
> String parentQueue = queuePath.substring(0, queuePath.lastIndexOf('.'));
> String childQueuesKey = CapacitySchedulerConfiguration.PREFIX +
> parentQueue + CapacitySchedulerConfiguration.DOT +
> CapacitySchedulerConfiguration.QUEUES;
> return new ArrayList<>(conf.getStringCollection(childQueuesKey));
>   }
> {code}
> And here's capacity-scheduler.xml I got
> {code:java}
> yarn.scheduler.capacity.root.queuesdefault, q1, 
> q2
> {code}
> You can notice there're spaces between default, q1, a2
> So conf.getStringCollection returns:
> {code:java}
> default
> q1
> ...
> {code}
> Which causes match issue when we try to delete the queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298057#comment-17298057
 ] 

Peter Bacsko commented on YARN-10679:
-

Re-triggered build to see what's going on with TestSLSRunner.

> Better logging of uncaught exceptions throughout SLS
> 
>
> Key: YARN-10679
> URL: https://issues.apache.org/jira/browse/YARN-10679
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10679.001.patch
>
>
> In our internal environment, there was a test failure while running SLS tests 
> with Jenkins.
> It's difficult to align the uncaught exceptions (in this case an NPE) and the 
> log itself as the exception is logged with {{e.printStackTrace()}}.
> This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", 
> exception)}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class

2021-03-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298055#comment-17298055
 ] 

Hadoop QA commented on YARN-10677:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
28s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
39s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  9s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
43s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green}{color} | {color:green} hadoop-tools/hadoop-sls: The 
patch generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 49s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green}{color} | {color:

[jira] [Commented] (YARN-10681) Fix assertion failure message in BaseSLSRunnerTest

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298041#comment-17298041
 ] 

Peter Bacsko commented on YARN-10681:
-

+1 thanks [~snemeth] and [~shuzirra] for the patch and review, committed to 
trunk.

> Fix assertion failure message in BaseSLSRunnerTest
> --
>
> Key: YARN-10681
> URL: https://issues.apache.org/jira/browse/YARN-10681
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Trivial
> Attachments: YARN-10681.001.patch
>
>
> There is this failure message: 
> https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/BaseSLSRunnerTest.java#L129-L130
> The word "catched" should be replaced with "caught".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-03-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298032#comment-17298032
 ] 

Hadoop QA commented on YARN-9615:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
39s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 4 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
40s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
16s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
18s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
8s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 31s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
31s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
56s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
36s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
36s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
25s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
25s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 10s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. 

[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298031#comment-17298031
 ] 

Peter Bacsko commented on YARN-10677:
-

+1 LGTM.

Thanks [~snemeth] for the patch and [~zhuqi] for the review. Committed to 
trunk. (Jenkins is running but I don't expect any issues).

> Logger of SLSFairScheduler is provided with the wrong class
> ---
>
> Key: YARN-10677
> URL: https://issues.apache.org/jira/browse/YARN-10677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10677.001.patch, YARN-10677.002.patch, 
> YARN-10677.003.patch, YARN-10677.004.patch
>
>
> In SLSFairScheduler, the Logger definition looks like: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69
> We need to fix this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class

2021-03-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298018#comment-17298018
 ] 

Szilard Nemeth commented on YARN-10677:
---

[~pbacsko],
Attached 004 patch that fixes the checkstyle and whitespace.

> Logger of SLSFairScheduler is provided with the wrong class
> ---
>
> Key: YARN-10677
> URL: https://issues.apache.org/jira/browse/YARN-10677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10677.001.patch, YARN-10677.002.patch, 
> YARN-10677.003.patch, YARN-10677.004.patch
>
>
> In SLSFairScheduler, the Logger definition looks like: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69
> We need to fix this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class

2021-03-09 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10677:
--
Attachment: YARN-10677.004.patch

> Logger of SLSFairScheduler is provided with the wrong class
> ---
>
> Key: YARN-10677
> URL: https://issues.apache.org/jira/browse/YARN-10677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10677.001.patch, YARN-10677.002.patch, 
> YARN-10677.003.patch, YARN-10677.004.patch
>
>
> In SLSFairScheduler, the Logger definition looks like: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69
> We need to fix this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks

2021-03-09 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10680:
--
Priority: Minor  (was: Major)

> Revisit try blocks without catch blocks but having finally blocks
> -
>
> Key: YARN-10680
> URL: https://issues.apache.org/jira/browse/YARN-10680
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Minor
>  Labels: newbie
>
> This jira is to revisit all try blocks without catch blocks but having 
> finally blocks in SLS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks

2021-03-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298017#comment-17298017
 ] 

Szilard Nemeth commented on YARN-10680:
---

[~shuzirra] You might be interested in taking this jira.

> Revisit try blocks without catch blocks but having finally blocks
> -
>
> Key: YARN-10680
> URL: https://issues.apache.org/jira/browse/YARN-10680
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie
>
> This jira is to revisit all try blocks without catch blocks but having 
> finally blocks in SLS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks

2021-03-09 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-10680:
-

Assignee: (was: Szilard Nemeth)

> Revisit try blocks without catch blocks but having finally blocks
> -
>
> Key: YARN-10680
> URL: https://issues.apache.org/jira/browse/YARN-10680
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie
>
> This jira is to revisit all try blocks without catch blocks but having 
> finally blocks in SLS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start

2021-03-09 Thread Sushanta Sen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanta Sen updated YARN-10684:

Description: 
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : :

Job Command :: yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
-jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar
 -shell_command sleep -shell_args 20 -num_containers 10 -container_type 
OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed almost all times with below 
Diagnostics message

*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated 
= 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to 
make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument 
"promote_opportunistic_after_start" * ** *

  was:
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
yarn.resourcemanager.opportunistic-container-allocation.enabled
true

 # Set this in NM[s]yarn-site.xml ::: 
yarn.nodemanager.opportunistic-containers-max-queue-length
30


 
Test Steps:

Job Command : :

Job Command :: yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
-jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar
 -shell_command sleep -shell_args 20 -num_containers 10 -container_type 
OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message

*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated 
= 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to 
make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument 
"promote_opportunistic_after_start" ** **


> YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried 
> adding flag -promote_opportunistic_after_start 
> ---
>
> Key: YARN-10684
> URL: https://issues.apache.org/jira/browse/YARN-10684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-scheduling
>Affects Versions: 3.1.1
>Reporter: Sushanta Sen
>Priority: Major
>
> Preconditions:
>  # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
>  # Set the below parameters  in RM yarn-site.xml ::
>  yarn.resourcemanager.opportunistic-container-allocation.enabled
>  true
>  
>  # Set this in NM[s]yarn-site.xml ::: 
>  yarn.nodemanager.opportunistic-containers-max-queue-length
>  30
>  
>  
>  Test Steps:
> Job Command : :
> Job Command :: yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar
>  -shell_command sleep -shell_args 20 -num_containers 10 -container_type 
> OPPORTUNISTIC -*promote_opportunistic_after_start*
> Actual Result: Distributed Shell Yarn Job Failed almost all times with below 
> Diagnostics message
> *[ Failed Reason : Application Failure: desired = 10, completed = 10, 
> allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container 
> Killed to make room for Guaranteed Container.]*
> Expected Result: DS job should be successful with argument 
> "promote_opportunistic_after_start" * ** *



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start

2021-03-09 Thread Sushanta Sen (Jira)
Sushanta Sen created YARN-10684:
---

 Summary: YARN: Opportunistic Container :: Distributed YARN Job has 
Failed when tried adding flag -promote_opportunistic_after_start 
 Key: YARN-10684
 URL: https://issues.apache.org/jira/browse/YARN-10684
 Project: Hadoop YARN
  Issue Type: Bug
  Components: distributed-scheduling
Affects Versions: 3.1.1
Reporter: Sushanta Sen


Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
yarn.resourcemanager.opportunistic-container-allocation.enabled
true

 # Set this in NM[s]yarn-site.xml ::: 
yarn.nodemanager.opportunistic-containers-max-queue-length
30


 
Test Steps:

Job Command : :

Job Command :: yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
-jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar
 -shell_command sleep -shell_args 20 -num_containers 10 -container_type 
OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message

*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated 
= 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to 
make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument 
"promote_opportunistic_after_start" ** **



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10682) The scheduler monitor policies conf should support trim between ",".

2021-03-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298011#comment-17298011
 ] 

Hadoop QA commented on YARN-10682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
26s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  9s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m  
1s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; 
considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 14s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {col

[jira] [Commented] (YARN-10678) Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298010#comment-17298010
 ] 

Peter Bacsko commented on YARN-10678:
-

+1 thanks [~snemeth] for the patch and [~shuzirra] for the review.

Committed to trunk.

> Try blocks without catch blocks in SLS scheduler classes can swallow other 
> exceptions
> -
>
> Key: YARN-10678
> URL: https://issues.apache.org/jira/browse/YARN-10678
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10678-unchecked-exception-from-FS-allocate.diff, 
> YARN-10678-unchecked-exception-from-FS-allocate_fixed.diff, 
> YARN-10678.001.patch, 
> org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_modified.log,
>  
> org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_original.log
>
>
> In SLSFairScheduler, we have this try-finally block (without catch block) in 
> the allocate method: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L109-L123
> Similarly, in SLSCapacityScheduler: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java#L116-L131
> In the finally block, the updateQueueWithAllocateRequest is invoked: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L118
> In our internal environment, there was a situation when an NPE was logged 
> from this method: 
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.sls.scheduler.SLSFairScheduler.updateQueueWithAllocateRequest(SLSFairScheduler.java:262)
>   at 
> org.apache.hadoop.yarn.sls.scheduler.SLSFairScheduler.allocate(SLSFairScheduler.java:118)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:288)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>   at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This can happen if the following events occur:
> 1. A runtime exception is thrown in FairScheduler or CapacityScheduler's 
> allocate method 
> 2. In this case, the local variable called 'allocation' remains null: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L110
> 3. In updateQueueWithAllocateRequest, this null object will be dereferenced 
> here: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L262
> 4. Then, we have an NPE here: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L117-L122
> In this case, we lost the original exception thrown from 
> FairScheduler#allocate.
> In order to fix this, a catch-block should be introduced and the exception 
> needs to be logged.
> The whole thing applies to SLSCapacityScheduler as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005

[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI

2021-03-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298009#comment-17298009
 ] 

Bilwa S T commented on YARN-10588:
--

Hi [~Jim_Brennan]

can you please take a look at this Jira when you get time? Thanks

> Percentage of queue and cluster is zero in WebUI 
> -
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, 
> YARN-10588.003.patch, YARN-10588.004.patch
>
>
> Steps to reproduce:
> Configure below property in resource-types.xml
> {code:java}
> 
>  yarn.resource-types
>  yarn.io/gpu
>  {code}
> Submit a job
> In UI you can see % Of Queue and % Of Cluster is zero for the submitted 
> application
>  
> This is because in SchedulerApplicationAttempt has below check for 
> calculating queueUsagePerc and clusterUsagePerc
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
> float queueCapacityPerc = queue.getQueueInfo(false, false)
> .getCapacity();
> queueUsagePerc = calc.divide(cluster, usedResourceClone,
> Resources.multiply(cluster, queueCapacityPerc)) * 100;
> if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>   queueUsagePerc = 0.0f;
> }
> clusterUsagePerc =
> calc.divide(cluster, usedResourceClone, cluster) * 100;
>   }
> {code}
> calc.isInvalidDivisor(cluster) always returns true as gpu resource is 0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298007#comment-17298007
 ] 

Peter Bacsko commented on YARN-10677:
-

[~snemeth] please fix the whitespace and checkstyle, thanks. 

> Logger of SLSFairScheduler is provided with the wrong class
> ---
>
> Key: YARN-10677
> URL: https://issues.apache.org/jira/browse/YARN-10677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10677.001.patch, YARN-10677.002.patch, 
> YARN-10677.003.patch
>
>
> In SLSFairScheduler, the Logger definition looks like: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69
> We need to fix this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS

2021-03-09 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298002#comment-17298002
 ] 

Gergely Pollak commented on YARN-10679:
---

[~snemeth] thank you for the patch, LGTM+1 (Non-binding)

> Better logging of uncaught exceptions throughout SLS
> 
>
> Key: YARN-10679
> URL: https://issues.apache.org/jira/browse/YARN-10679
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10679.001.patch
>
>
> In our internal environment, there was a test failure while running SLS tests 
> with Jenkins.
> It's difficult to align the uncaught exceptions (in this case an NPE) and the 
> log itself as the exception is logged with {{e.printStackTrace()}}.
> This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", 
> exception)}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10681) Fix assertion failure message in BaseSLSRunnerTest

2021-03-09 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298000#comment-17298000
 ] 

Gergely Pollak commented on YARN-10681:
---

[~snemeth] thank you for the patch LGTM+1(Non-binding)

> Fix assertion failure message in BaseSLSRunnerTest
> --
>
> Key: YARN-10681
> URL: https://issues.apache.org/jira/browse/YARN-10681
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Trivial
> Attachments: YARN-10681.001.patch
>
>
> There is this failure message: 
> https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/BaseSLSRunnerTest.java#L129-L130
> The word "catched" should be replaced with "caught".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10678) Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions

2021-03-09 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297999#comment-17297999
 ] 

Gergely Pollak commented on YARN-10678:
---

[~snemeth] thank you for the patch, LGTM+1 (Non-binding)

> Try blocks without catch blocks in SLS scheduler classes can swallow other 
> exceptions
> -
>
> Key: YARN-10678
> URL: https://issues.apache.org/jira/browse/YARN-10678
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10678-unchecked-exception-from-FS-allocate.diff, 
> YARN-10678-unchecked-exception-from-FS-allocate_fixed.diff, 
> YARN-10678.001.patch, 
> org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_modified.log,
>  
> org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_original.log
>
>
> In SLSFairScheduler, we have this try-finally block (without catch block) in 
> the allocate method: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L109-L123
> Similarly, in SLSCapacityScheduler: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java#L116-L131
> In the finally block, the updateQueueWithAllocateRequest is invoked: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L118
> In our internal environment, there was a situation when an NPE was logged 
> from this method: 
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.sls.scheduler.SLSFairScheduler.updateQueueWithAllocateRequest(SLSFairScheduler.java:262)
>   at 
> org.apache.hadoop.yarn.sls.scheduler.SLSFairScheduler.allocate(SLSFairScheduler.java:118)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:288)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348)
>   at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212)
>   at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This can happen if the following events occur:
> 1. A runtime exception is thrown in FairScheduler or CapacityScheduler's 
> allocate method 
> 2. In this case, the local variable called 'allocation' remains null: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L110
> 3. In updateQueueWithAllocateRequest, this null object will be dereferenced 
> here: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L262
> 4. Then, we have an NPE here: 
> https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L117-L122
> In this case, we lost the original exception thrown from 
> FairScheduler#allocate.
> In order to fix this, a catch-block should be introduced and the exception 
> needs to be logged.
> The whole thing applies to SLSCapacityScheduler as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (YARN-10675) Consolidate YARN-10672 and YARN-10447

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297998#comment-17297998
 ] 

Peter Bacsko commented on YARN-10675:
-

+1 LGTM.

Thanks [~snemeth] for the patch, committed to trunk.

> Consolidate YARN-10672 and YARN-10447
> -
>
> Key: YARN-10675
> URL: https://issues.apache.org/jira/browse/YARN-10675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10675.001.patch
>
>
> Let's consolidate the solution applied for YARN-10672 and apply it to the 
> code changes introduced with YARN-10447.
> Quoting [~pbacsko]: 
> {quote}
> The solution is much straightforward than mine in YARN-10447. Actually we 
> might consider applying this to TestLeafQueue with undoing my changes, 
> because that's more complicated (I had no patience to go deeper with Mockito 
> internal behavior, I just thought well, disable that thread and that's 
> enough).
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10676) Improve code quality in TestTimelineAuthenticationFilterForV1

2021-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297996#comment-17297996
 ] 

Peter Bacsko commented on YARN-10676:
-

+1 thanks [~snemeth] for the patch and [~bteke] / [~zhuqi] / [~shuzirra] for 
the review.

Committed to trunk.

> Improve code quality in TestTimelineAuthenticationFilterForV1
> -
>
> Key: YARN-10676
> URL: https://issues.apache.org/jira/browse/YARN-10676
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-10676.001.patch
>
>
> - In testcase "testDelegationTokenOperations", the exception message is 
> checked but in case it does not match the assertion, the exception is not 
> printed. This happens 3 times.
> - Assertion messages can be added
> - Fields called "httpSpnegoKeytabFile" and "httpSpnegoPrincipal" can be 
> static final.
> - There's a typo in comment "avaiable" (happens 2 times)
> - There are some Assert.fail() calls, without messages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10683) Add total resource in NodeManager metrics

2021-03-09 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal reassigned YARN-10683:
---

Assignee: Minni Mittal

> Add total resource in NodeManager metrics
> -
>
> Key: YARN-10683
> URL: https://issues.apache.org/jira/browse/YARN-10683
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10683) Add total resource in NodeManager metrics

2021-03-09 Thread Minni Mittal (Jira)
Minni Mittal created YARN-10683:
---

 Summary: Add total resource in NodeManager metrics
 Key: YARN-10683
 URL: https://issues.apache.org/jira/browse/YARN-10683
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Minni Mittal






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl

2021-03-09 Thread ANANDA G B (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297827#comment-17297827
 ] 

ANANDA G B edited comment on YARN-9731 at 3/9/21, 9:09 AM:
---

Thanks for working on it [~magnum].

[~prabhujoseph] Can we commit this 3.1.2 and 3.3.1 version.


was (Author: gb.ana...@gmail.com):
Thanks for working on it [~magnum].

[~prabhujoseph] Can we commit this 3.1.2 version.

> In ATS v1.5, all jobs are visible to all users without view-acl
> ---
>
> Key: YARN-9731
> URL: https://issues.apache.org/jira/browse/YARN-9731
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9731.001.patch, YARN-9731.002.patch, 
> YARN-9731.003.patch, YARN-9731.004.patch, YARN-9731.005.patch, 
> ats_v1.5_screenshot.png
>
>
> In ATS v1.5 of secure mode,
> all jobs are visible to all users without view-acl.
> if user does not have view-acl,  user should not be able to see jobs.
> I attatched ATS UI screenshot.
>  
> ATS v1.5 log
> {code:java}
> 2019-08-09 10:21:13,679 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1951
> {code}
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297927#comment-17297927
 ] 

Qi Zhu commented on YARN-9615:
--

Submitted again, trigger the Jenkins. 

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.002.patch, 
> YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, 
> YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, 
> YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, 
> YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, 
> image-2021-03-04-10-36-12-441.png, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10682) The scheduler monitor policies conf should support trim between ",".

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297923#comment-17297923
 ] 

Qi Zhu edited comment on YARN-10682 at 3/9/21, 8:01 AM:


 [~pbacsko], [~gandras],  [~shuzirra], [~bteke] 

No additional test needed.

Could you help to review this fix? :D

Thanks.


was (Author: zhuqi):
 [~pbacsko], [~gandras],  [~shuzirra], [~bteke] Could you help to review this 
fix? :D

Thanks.

> The scheduler monitor policies conf should support trim between ",".
> 
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When i configured scheduler monitor policies with space, the RM will start 
> with error.
> The conf should support trim between ",".
>  
> When tested multi policy, it happened.
>  
>  yarn.resourcemanager.scheduler.monitor.policies
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10682) The scheduler monitor policies conf should support trim between ",".

2021-03-09 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297923#comment-17297923
 ] 

Qi Zhu commented on YARN-10682:
---

 [~pbacsko], [~gandras],  [~shuzirra], [~bteke] Could you help to review this 
fix? :D

Thanks.

> The scheduler monitor policies conf should support trim between ",".
> 
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When i configured scheduler monitor policies with space, the RM will start 
> with error.
> The conf should support trim between ",".
>  
> When tested multi policy, it happened.
>  
>  yarn.resourcemanager.scheduler.monitor.policies
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should support trim between ",".

2021-03-09 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10682:
--
Description: 
When i configured scheduler monitor policies with space, the RM will start with 
error.

The conf should support trim between ",".

 

When tested multi policy, it happened.

 
 yarn.resourcemanager.scheduler.monitor.policies
 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
 

 

  was:
When i configured scheduler monitor policies with space, the RM will start with 
error.

The conf should support trim between ",".

 


> The scheduler monitor policies conf should support trim between ",".
> 
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When i configured scheduler monitor policies with space, the RM will start 
> with error.
> The conf should support trim between ",".
>  
> When tested multi policy, it happened.
>  
>  yarn.resourcemanager.scheduler.monitor.policies
>  
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org