[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
    [ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298587#comment-17298587 ]

Qi Zhu edited comment on YARN-10674 at 3/10/21, 7:36 AM:
---------------------------------------------------------

Thanks a lot [~pbacsko] for the reply. Your advice makes sense to me, and it is the only way to solve this for now, because preemption is tied to the enable option.

"However, you already added a policy in YARN-10623, so looks like this property always has to be enabled in weight mode."

Actually, this is used in all modes for auto refresh.

I will update it in the next patch. Thanks.

was (Author: zhuqi):
Thanks a lot [~pbacsko] for the reply. Your advice makes sense to me, and it is the only way to solve this for now, because preemption is tied to the enable option.

I will update it in the next patch. Thanks.

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
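The quoted excerpt describes a loop that calls a listener and then sleeps for a fixed interval. A minimal, self-contained sketch of that pattern follows. The class name PeriodicChecker, the nested CheckListener interface and the stop() method are hypothetical names chosen for illustration; this is not the Fair Scheduler code itself, only a sketch of the same shape.

```java
/** Illustrative periodic checker; every name here is an assumption, not Hadoop code. */
final class PeriodicChecker implements Runnable {

  /** Hypothetical callback, standing in for the reload listener in the excerpt. */
  interface CheckListener {
    void onCheck();
  }

  /** Same value as ALLOC_RELOAD_INTERVAL_MS in the quoted excerpt (10 seconds). */
  static final long DEFAULT_INTERVAL_MS = 10 * 1000;

  private final CheckListener listener;
  private final long intervalMs;
  private volatile boolean running = true;

  PeriodicChecker(CheckListener listener, long intervalMs) {
    this.listener = listener;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (running) {
      // Run the check under the lock, as the excerpt does.
      synchronized (this) {
        listener.onCheck();
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and leave the loop.
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Asks the loop to exit after the current sleep. */
  void stop() {
    running = false;
  }

  public static void main(String[] args) throws InterruptedException {
    // A short interval keeps the demo quick; the FS default would be 10s.
    PeriodicChecker checker = new PeriodicChecker(
        () -> System.out.println("checking for empty dynamic queues"), 100L);
    Thread worker = new Thread(checker, "queue-cleanup");
    worker.start();
    Thread.sleep(350L);
    checker.stop();
    worker.join();
  }
}
```

With the default interval, a queue that becomes empty is noticed at most roughly one interval later, which is the behaviour the issue wants fs2cs to reproduce on the Capacity Scheduler side.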
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
    [ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298587#comment-17298587 ]

Qi Zhu commented on YARN-10674:
-------------------------------

Thanks a lot [~pbacsko] for the reply. Your advice makes sense to me, and it is the only way to solve this for now, because preemption is tied to the enable option.

I will update it in the next patch. Thanks.
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
    [ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298568#comment-17298568 ]

Akira Ajisaka commented on YARN-10501:
--------------------------------------

Filed HADOOP-17572 for this issue.

> Can't remove all node labels after add node label without nodemanager port
> --------------------------------------------------------------------------
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
> Attachments: YARN-10501-branch-2.10.1.001.patch, YARN-10501-branch-2.10.1.002.patch, YARN-10501.002.patch, YARN-10501.003.patch, YARN-10501.004.patch
>
> When a label is added to nodes without a nodemanager port, or with the WILDCARD_PORT (0) port, not all label info can be removed from these nodes afterwards.
> Steps to reproduce:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
> {code}
> After step 4, which removes the nodemanager labels, the label info is still present in the node info.
> {code:java}
> 641 case REPLACE:
> 642   replaceNodeForLabels(nodeId, host.labels, labels);
> 643   replaceLabelsForNode(nodeId, host.labels, labels);
> 644   host.labels.clear();
> 645   host.labels.addAll(labels);
> 646   for (Node node : host.nms.values()) {
> 647     replaceNodeForLabels(node.nodeId, node.labels, labels);
> 649     node.labels = null;
> 650   }
> 651   break;{code}
> The cause is at line 647: when labels are added to a node without a port, both the 0 port and the real NM port are added to the node info. When the labels are removed, the parameter node.labels at line 647 is null, so the old label is not removed.
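The failure mode described in the issue can be reproduced in a small stand-alone model. The sketch below is not the node labels manager code: the class LabelReplaceSketch, its fields and the rule "skip removal when the old label set is null" are simplifying assumptions based only on the issue description above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified, hypothetical model of the host-level REPLACE path in the issue. */
final class LabelReplaceSketch {
  static final String HOST_NODE = "server001:0";    // wildcard-port entry
  static final String NM_NODE = "server001:45454";  // real NM entry

  private final Map<String, Set<String>> labelToNodes = new HashMap<>();
  private Set<String> hostLabels = new HashSet<>();
  private Set<String> nmLabels = null; // stays null after a host-level replace

  /** Assumed behaviour: removal is skipped when oldLabels is null. */
  void replaceNodeForLabels(String nodeId, Set<String> oldLabels,
      Set<String> newLabels) {
    if (oldLabels != null) {
      for (String label : oldLabels) {
        Set<String> nodes = labelToNodes.get(label);
        if (nodes != null) {
          nodes.remove(nodeId);
        }
      }
    }
    for (String label : newLabels) {
      labelToNodes.computeIfAbsent(label, k -> new TreeSet<>()).add(nodeId);
    }
  }

  /** Mirrors the shape of lines 641-651 in the issue for a single NM. */
  void replaceOnHost(Set<String> labels) {
    replaceNodeForLabels(HOST_NODE, hostLabels, labels);
    hostLabels = new HashSet<>(labels);
    // node.labels is null here, so the NM entry is never removed.
    replaceNodeForLabels(NM_NODE, nmLabels, labels);
    nmLabels = null;
  }

  Set<String> nodesFor(String label) {
    return labelToNodes.getOrDefault(label, Collections.emptySet());
  }

  public static void main(String[] args) {
    LabelReplaceSketch sketch = new LabelReplaceSketch();
    sketch.replaceOnHost(Collections.singleton("cpunode")); // step 2
    System.out.println("after add:    " + sketch.nodesFor("cpunode"));
    sketch.replaceOnHost(Collections.emptySet());           // step 4
    System.out.println("after remove: " + sketch.nodesFor("cpunode"));
  }
}
```

In this model the wildcard entry is removed in step 4 while the NM entry stays mapped to the label, which matches the second curl output in the report.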
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
    [ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298565#comment-17298565 ]

Akira Ajisaka commented on YARN-10501:
--------------------------------------

This error is not due to the yetus changes. It is because openjdk-7-jdk has been removed from ppa https://launchpad.net/~openjdk-r/+archive/ubuntu/ppa/+packages?field.name_filter=&field.status_filter=superseded&field.series_filter=xenial
[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI
    [ https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298332#comment-17298332 ]

Eric Payne commented on YARN-10588:
-----------------------------------

Thanks [~BilwaST] for reporting the issue and the fixes! The changes LGTM. [~Jim_Brennan], do you want to weigh in?

> Percentage of queue and cluster is zero in WebUI
> -------------------------------------------------
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, YARN-10588.003.patch, YARN-10588.004.patch
>
> Steps to reproduce:
> Configure the property below in resource-types.xml
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {code}
> Submit a job.
> In the UI, % Of Queue and % Of Cluster are zero for the submitted application.
>
> This is because SchedulerApplicationAttempt has the check below for calculating queueUsagePerc and clusterUsagePerc:
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
>   float queueCapacityPerc = queue.getQueueInfo(false, false)
>       .getCapacity();
>   queueUsagePerc = calc.divide(cluster, usedResourceClone,
>       Resources.multiply(cluster, queueCapacityPerc)) * 100;
>   if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>     queueUsagePerc = 0.0f;
>   }
>   clusterUsagePerc =
>       calc.divide(cluster, usedResourceClone, cluster) * 100;
> }
> {code}
> calc.isInvalidDivisor(cluster) always returns true because the gpu resource is 0.
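The effect of the guard can be shown with plain arrays. The sketch below assumes, as the issue states, that a divisor counts as invalid as soon as any resource type in it is zero. The class UsagePercentSketch, the array layout {memory-mb, vcores, gpu} and the dominant-share division are illustrative assumptions, not the ResourceCalculator API.

```java
/** Hypothetical model of the usage-percentage guard; not the YARN calculator. */
final class UsagePercentSketch {

  /** Assumed rule from the issue: any zero component makes the divisor invalid. */
  static boolean isInvalidDivisor(long[] resource) {
    for (long value : resource) {
      if (value == 0L) {
        return true;
      }
    }
    return false;
  }

  /**
   * Returns the cluster usage percentage as the largest per-resource share,
   * or 0 when the guard rejects the cluster resource.
   */
  static float clusterUsagePercent(long[] used, long[] cluster) {
    if (isInvalidDivisor(cluster)) {
      return 0.0f; // the percentage is never computed, so the UI shows zero
    }
    float dominantShare = 0.0f;
    for (int i = 0; i < cluster.length; i++) {
      dominantShare = Math.max(dominantShare, (float) used[i] / cluster[i]);
    }
    return dominantShare * 100;
  }

  public static void main(String[] args) {
    long[] used = {4096L, 2L, 0L};
    // GPU type is configured but the cluster has no GPUs.
    System.out.println(clusterUsagePercent(used, new long[] {8192L, 8L, 0L}));
    // Same usage on a cluster that does report GPUs.
    System.out.println(clusterUsagePercent(used, new long[] {8192L, 8L, 4L}));
  }
}
```

Under these assumptions, merely declaring yarn.io/gpu on a cluster without GPUs is enough to zero out both percentages, regardless of how much memory or CPU the application uses.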
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
    [ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298253#comment-17298253 ]

Eric Badger commented on YARN-10501:
------------------------------------

[~ahussein], [~aajisaka], is this due to any of the recent yetus changes? New branch-2.10 patches are failing Hadoop QA because it can't find openjdk-7-jdk
[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl
    [ https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298166#comment-17298166 ]

Brahma Reddy Battula commented on YARN-9731:
--------------------------------------------

[~abmodi] could you check the latest patch? It looks like all your comments are addressed. It's a security issue, so it would be better to have it in branch-3.3 as well.

> In ATS v1.5, all jobs are visible to all users without view-acl
> ---------------------------------------------------------------
>
> Key: YARN-9731
> URL: https://issues.apache.org/jira/browse/YARN-9731
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Affects Versions: 3.1.2
> Reporter: KWON BYUNGCHANG
> Assignee: KWON BYUNGCHANG
> Priority: Major
> Attachments: YARN-9731.001.patch, YARN-9731.002.patch, YARN-9731.003.patch, YARN-9731.004.patch, YARN-9731.005.patch, ats_v1.5_screenshot.png
>
> In ATS v1.5 in secure mode, all jobs are visible to all users, even those without view-acl.
> If a user does not have view-acl, the user should not be able to see jobs.
> I attached an ATS UI screenshot.
>
> ATS v1.5 log
> {code:java}
> 2019-08-09 10:21:13,679 WARN applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) - Failed to authorize when generating application report for application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) - Failed to authorize when generating application report for application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does not have privilege to see this application application_1565247558150_1951
> {code}
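The behaviour the reporter asks for, hiding applications from users without view access instead of listing them with placeholder data, can be sketched as a simple filter. Everything below is hypothetical: the App type, the canView rule (owner or explicitly listed user) and the method names are illustration only and do not reflect the timeline server classes or the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical view-ACL filter; not the ATS implementation. */
final class ViewAclFilterSketch {

  /** Minimal stand-in for an application record. */
  static final class App {
    final String id;
    final String owner;
    final Set<String> viewAcl;

    App(String id, String owner, Set<String> viewAcl) {
      this.id = id;
      this.owner = owner;
      this.viewAcl = viewAcl;
    }
  }

  /** Assumed rule: the owner and users named in the view ACL may see the app. */
  static boolean canView(String user, App app) {
    return user.equals(app.owner) || app.viewAcl.contains(user);
  }

  /** Returns only the ids the user may see; unauthorized apps are dropped. */
  static List<String> visibleAppIds(String user, List<App> apps) {
    List<String> visible = new ArrayList<>();
    for (App app : apps) {
      if (canView(user, app)) {
        visible.add(app.id);
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    List<App> apps = Arrays.asList(
        new App("application_1565247558150_1954", "alice",
            Collections.singleton("bob")),
        new App("application_1565247558150_1951", "carol",
            Collections.<String>emptySet()));
    System.out.println(visibleAppIds("magnum", apps));
    System.out.println(visibleAppIds("bob", apps));
  }
}
```

The design point is that the authorization result decides whether an entry is listed at all, rather than being logged as a warning while the entry is still returned.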
[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI
    [ https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298159#comment-17298159 ]

Eric Payne commented on YARN-10588:
-----------------------------------

[~BilwaST], sorry for the delay. I will look at this today.
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
    [ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298156#comment-17298156 ]

Peter Bacsko commented on YARN-10674:
-------------------------------------

[~zhuqi] this is very interesting. If we set RM Monitors to enabled, it means that system-wide preemption is always enabled, too:

AbstractCSQueue:
{noformat}
  private boolean isQueueHierarchyPreemptionDisabled(CSQueue q,
      CapacitySchedulerConfiguration configuration) {
    boolean systemWidePreemption =
        csContext.getConfiguration()
            .getBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS,
                YarnConfiguration.DEFAULT_RM_SCHEDULER_ENABLE_MONITORS);
    CSQueue parentQ = q.getParent();

    // If the system-wide preemption switch is turned off, all of the queues in
    // the qPath hierarchy have preemption disabled, so return true.
    if (!systemWidePreemption) return true;
{noformat}

However, you already added a policy in YARN-10623, so looks like this property always has to be enabled in weight mode.

But what if we convert an FS configuration which disabled preemption completely? I think the best thing we can do right now is that we disable preemption for "root", which will propagate to all other parent queues.

So I suggest the following approach:
1. In percentage conversion mode, do not enable RM monitors by default, because it's not needed.
2. In weight mode (which is the default now), we have to enable it. But if "yarn.scheduler.fair.preemption" is false, then "yarn.scheduler.capacity.root.disable_preemption" must be set to true, but only for "root". This can be done in {{FSQueueConverter}}.

cc [~bteke] [~gandras] [~snemeth], not sure if this is a good approach, but I can't see anything better.
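The two-step approach proposed in the comment above can be written down as a small decision function. This is only a sketch over plain maps: the class PreemptionConversionSketch and the convert method are hypothetical and are not part of fs2cs. The two scheduler property names are quoted from the comment, while the key used for RM monitors ("yarn.resourcemanager.scheduler.monitor.enable") and the default of false for Fair Scheduler preemption are assumptions about what the YarnConfiguration and FairSchedulerConfiguration constants resolve to.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the suggested conversion rules; not fs2cs code. */
final class PreemptionConversionSketch {
  static final String FS_PREEMPTION = "yarn.scheduler.fair.preemption";
  // Assumed to be the key behind YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS.
  static final String RM_MONITORS =
      "yarn.resourcemanager.scheduler.monitor.enable";
  static final String ROOT_DISABLE_PREEMPTION =
      "yarn.scheduler.capacity.root.disable_preemption";

  /**
   * Builds the converted properties from the Fair Scheduler input.
   * usePercentages == false means weight mode.
   */
  static Map<String, String> convert(Map<String, String> fsConf,
      boolean usePercentages) {
    Map<String, String> converted = new LinkedHashMap<>();
    boolean fsPreemption =
        Boolean.parseBoolean(fsConf.getOrDefault(FS_PREEMPTION, "false"));

    // Monitors are needed when preemption is on or when weight mode is used.
    if (fsPreemption || !usePercentages) {
      converted.put(RM_MONITORS, "true");
    }
    // Weight mode forces monitors on, so preemption has to be switched off
    // explicitly at "root" when the FS config did not enable it.
    if (!usePercentages && !fsPreemption) {
      converted.put(ROOT_DISABLE_PREEMPTION, "true");
    }
    return converted;
  }

  public static void main(String[] args) {
    Map<String, String> fsConf = new HashMap<>();
    System.out.println("weight mode:     " + convert(fsConf, false));
    System.out.println("percentage mode: " + convert(fsConf, true));
  }
}
```

Setting the flag only on "root" relies on the propagation described in the comment: a disabled parent disables preemption for the queues below it.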
[jira] [Commented] (YARN-10497) Fix an issue in CapacityScheduler which fails to delete queues
    [ https://issues.apache.org/jira/browse/YARN-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298128#comment-17298128 ]

Hadoop QA commented on YARN-10497:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 20s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 21m 47s | | trunk passed |
| +1 | compile | 1m 1s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 49s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 44s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 16m 41s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 1m 49s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 47s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 48s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | compile | 0m 44s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 44s | | the patch passed |
| +1 | checkstyle | 0m 40s | | the patch passed |
| +1 | mvnsite | 0m 48s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 52s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 35s |
[jira] [Commented] (YARN-8823) Monitor the healthy state of GPU
    [ https://issues.apache.org/jira/browse/YARN-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298127#comment-17298127 ]

Qi Zhu commented on YARN-8823:
------------------------------

[~adam.antal] [~tangzhankun] Is this still in progress?

"I was wondering if this issue can be easily finished by writing a custom health checker script (idea from YARN-9923). I think it would make sense to push this feature - would you like to share your PoC and compare its advantages/disadvantages to a node checker script?"

This is a good suggestion.

> Monitor the healthy state of GPU
> --------------------------------
>
> Key: YARN-8823
> URL: https://issues.apache.org/jira/browse/YARN-8823
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Priority: Major
>
> GPU resources are discovered when the NM bootstraps, but they are not updated through later heartbeats with the RM. There should be a monitoring mechanism that checks GPU health status from time to time, along with the corresponding handling.
> YARN-8851 will also handle device monitoring, so there could be some common parts between the two.
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
    [ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298120#comment-17298120 ]

Qi Zhu commented on YARN-10674:
-------------------------------

Thanks [~pbacsko] for the patient review; your suggestion is valid.

I have one question: YARN-10637 also needs "RM monitors" to be enabled. Should we enable "RM monitors" by default? Enabling "RM monitors" has no side effects and preemption is false by default, so I think setting it to true is more reasonable.

What's your opinion on this? Thanks. :D
[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298112#comment-17298112 ]

Peter Bacsko edited comment on YARN-10674 at 3/9/21, 3:23 PM:
--------------------------------------------------------------

[~zhuqi] I have the following comments:

1. This change seems to always enable "RM monitors":
{noformat}
// This should be always true to trigger dynamic queue auto deletion
// when expired.
yarnSiteConfig.setBoolean(
    YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
{noformat}
But I don't think this is necessary. We need to enable it in two cases: preemption is enabled OR we're in weight mode. We don't have auto-queue delete in percentage mode (fs2cs can still convert to percentages with a command line switch).

So I suggest that you pass an extra boolean "usePercentages".

Invocation from {{FSConfigToCSConfigConverter}}:
{noformat}
siteConverter.convertSiteProperties(inputYarnSiteConfig,
    convertedYarnSiteConfig, drfUsed,
    conversionOptions.isEnableAsyncScheduler(),
    usePercentages);  <-- last argument is new
{noformat}
Then in the site converter:
{noformat}
if (conf.getBoolean(FairSchedulerConfiguration.PREEMPTION,
    FairSchedulerConfiguration.DEFAULT_PREEMPTION)) {
  yarnSiteConfig.setBoolean(
      YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  preemptionEnabled = true;
  ...
}

if (!usePercentages) {
  yarnSiteConfig.setBoolean(
      YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);  // setting it again is OK
  String policies =
      yarnSiteConfig.get(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES);
  if (policies == null) {
    policies = AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
  } else {
    policies += "," + AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
  }
  yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
      policies);

  // Set the expired for deletion interval to 10s, consistent with fs.
  yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
      AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
If I think about it, {{yarnSiteConfig}} is the output config. So this cannot happen:
{noformat}
} else {
  policies += "," + AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
}
{noformat}
This {{Configuration}} object is created with no entries. The {{else}} branch will never be taken. So it can be simplified to:
{noformat}
if (!usePercentages) {
  yarnSiteConfig.setBoolean(
      YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  String policy = AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
  yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
      policy);

  // Set the expired for deletion interval to 10s, consistent with fs.
  yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
      AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
2. This also means two separate test cases:
* When usePercentages = false, then {{RM_SCHEDULER_ENABLE_MONITORS}} and {{RM_SCHEDULER_MONITOR_POLICIES}} should be set (with preemption = false)
* When usePercentages = true, then {{RM_SCHEDULER_ENABLE_MONITORS}} and {{RM_SCHEDULER_MONITOR_POLICIES}} should NOT be set (with preemption = false)

I recommend the following naming:
{{testRmMonitorsAndPoliciesSetWhenUsingWeights()}} - first scenario
{{testRmMonitorsAndPoliciesSetWhenUsingPercentages()}} - second scenario
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298112#comment-17298112 ]

Peter Bacsko commented on YARN-10674:
-------------------------------------

[~zhuqi] I have the following comments:

1. This change seems to always enable "RM monitors":
{noformat}
// This should be always true to trigger dynamic queue auto deletion
// when expired.
yarnSiteConfig.setBoolean(
    YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
{noformat}
But I don't think this is necessary. We need to enable it in two cases: preemption is enabled OR we're in weight mode. We don't have auto-queue delete in percentage mode (fs2cs can still convert to percentages with a command line switch).

So I suggest that you pass an extra boolean "usePercentages".

Invocation from {{FSConfigToCSConfigConverter}}:
{noformat}
siteConverter.convertSiteProperties(inputYarnSiteConfig,
    convertedYarnSiteConfig, drfUsed,
    conversionOptions.isEnableAsyncScheduler(),
    usePercentages);  // last argument is new
{noformat}
Then in the site converter:
{noformat}
if (conf.getBoolean(FairSchedulerConfiguration.PREEMPTION,
    FairSchedulerConfiguration.DEFAULT_PREEMPTION)) {
  yarnSiteConfig.setBoolean(
      YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  preemptionEnabled = true;
  ...
}

if (!usePercentages) {
  yarnSiteConfig.setBoolean(
      YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);  // setting it again is OK
  String policies =
      yarnSiteConfig.get(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES);
  if (policies == null) {
    policies = AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
  } else {
    policies += "," + AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
  }
  yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
      policies);

  // Set the expired for deletion interval to 10s, consistent with fs.
  yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
      AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
If I think about it, {{yarnSiteConfig}} is the output config. So this cannot happen:
{noformat}
} else {
  policies += "," + AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
}
{noformat}
This {{Configuration}} object is created with no entries. The {{else}} branch will never be taken. So it can be simplified to:
{noformat}
if (!usePercentages) {
  yarnSiteConfig.setBoolean(
      YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
  String policy = AutoCreatedQueueDeletionPolicy.class.getCanonicalName();
  yarnSiteConfig.set(YarnConfiguration.RM_SCHEDULER_MONITOR_POLICIES,
      policy);

  // Set the expired for deletion interval to 10s, consistent with fs.
  yarnSiteConfig.setInt(CapacitySchedulerConfiguration.
      AUTO_CREATE_CHILD_QUEUE_EXPIRED_TIME, 10);
}
{noformat}
2. This also means two separate test cases:
* When usePercentages = false, then {{RM_SCHEDULER_ENABLE_MONITORS}} and {{RM_SCHEDULER_MONITOR_POLICIES}} should be set (with preemption = false)
* When usePercentages = true, then {{RM_SCHEDULER_ENABLE_MONITORS}} and {{RM_SCHEDULER_MONITOR_POLICIES}} should NOT be set (with preemption = false)

I recommend the following naming:
{{testRmMonitorsAndPoliciesSetWhenUsingWeights()}} - first scenario
{{testRmMonitorsAndPoliciesSetWhenUsingPercentages()}} - second scenario

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
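The simplified branch suggested in the review comment above can be sketched in isolation. This is only an illustration under stated assumptions, not the fs2cs code: a plain `Map` stands in for the Hadoop `Configuration` output object, the class and method names are hypothetical, the `monitor.enable` key is assumed to be the value behind `RM_SCHEDULER_ENABLE_MONITORS`, and the expired-time key is a made-up placeholder because the real property name does not appear in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in for the site converter; all names here are hypothetical. */
final class SiteConverterSketch {
  // Assumed value of YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS.
  static final String ENABLE_MONITORS =
      "yarn.resourcemanager.scheduler.monitor.enable";
  // Property name as quoted in the YARN-10682 description.
  static final String MONITOR_POLICIES =
      "yarn.resourcemanager.scheduler.monitor.policies";
  // Placeholder key: the real property name is not shown in this thread.
  static final String EXPIRED_TIME =
      "example.auto-create-child-queue.expired-time";
  static final String DELETION_POLICY =
      "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
          + "AutoCreatedQueueDeletionPolicy";

  /** Builds the output site config from the two inputs discussed in the review. */
  static Map<String, String> convert(boolean preemptionEnabled,
      boolean usePercentages) {
    // The output config starts empty, so no existing policy list has to be merged.
    Map<String, String> out = new LinkedHashMap<>();
    if (preemptionEnabled) {
      out.put(ENABLE_MONITORS, "true");
    }
    if (!usePercentages) {
      // Weight mode: monitors are needed for auto queue deletion.
      out.put(ENABLE_MONITORS, "true"); // setting it again is harmless
      out.put(MONITOR_POLICIES, DELETION_POLICY);
      out.put(EXPIRED_TIME, "10"); // 10s, consistent with Fair Scheduler
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println("weights:     " + convert(false, false));
    System.out.println("percentages: " + convert(false, true));
  }
}
```

The two calls in `main` correspond to the two test scenarios proposed in the comment (preemption disabled, weights versus percentages).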
[jira] [Commented] (YARN-10671) Fix Typo in TestSchedulingRequestContainerAllocation
[ https://issues.apache.org/jira/browse/YARN-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298102#comment-17298102 ]

Brahma Reddy Battula commented on YARN-10671:
---------------------------------------------

[~dmmkr] thanks for reporting. Patch LGTM.

> Fix Typo in TestSchedulingRequestContainerAllocation
> ----------------------------------------------------
>
> Key: YARN-10671
> URL: https://issues.apache.org/jira/browse/YARN-10671
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: D M Murali Krishna Reddy
> Assignee: D M Murali Krishna Reddy
> Priority: Minor
> Attachments: YARN-10671.001.patch
>
>
> Remove the extra colon in
> TestSchedulingRequestContainerAllocation.testInterAppCompositeConstraints.
> The test case passes currently, but it will fail once YARN-4283 gets
> merged.
> {code:java}
> MockNM nm1 = rm.registerNode("192.168.0.1:1234:", 100*GB, 100);{code}
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298092#comment-17298092 ] Peter Bacsko commented on YARN-10674: - Ok thanks, I'll review this one soon. > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298076#comment-17298076 ]

Qi Zhu edited comment on YARN-10674 at 3/9/21, 2:10 PM:
--------------------------------------------------------

Thanks [~pbacsko] for the review.

It does not depend on YARN-10682: "," is already supported now. YARN-10682 fixes the error with "," followed by a space, which we should fix just like YARN-10497. For example, "a, b" is not supported now, but "a,b" is supported. :D

I updated the description for YARN-10682.

was (Author: zhuqi):
Thanks [~pbacsko] for the review.

It does not depend on YARN-10682: "," is already supported now. YARN-10682 fixes the error with "," followed by a space. For example, "a, b" is not supported now, but "a,b" is supported. :D

I updated the description for YARN-10682.

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298076#comment-17298076 ]

Qi Zhu edited comment on YARN-10674 at 3/9/21, 1:54 PM:
--------------------------------------------------------

Thanks [~pbacsko] for the review.

It does not depend on YARN-10682: "," is already supported now. YARN-10682 fixes the error with "," followed by a space. For example, "a, b" is not supported now, but "a,b" is supported. :D

I updated the description for YARN-10682.

was (Author: zhuqi):
Thanks [~pbacsko] for the review.

It does not depend on YARN-10682: "," is already supported now. YARN-10682 fixes the error with "," followed by a space. For example, "a, b" is not supported now, but "a,b" is supported. :D

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should support trim between ",".
[ https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10682:
--------------------------
    Description:
When I configured the scheduler monitor policies with a space after the comma, the RM started with an error.

The conf should trim the entries between ",". For example, "a,b,c" is supported now, but "a, b, c" is not; this jira just adds the trim.

It happened when I tested multiple policies:

yarn.resourcemanager.scheduler.monitor.policies
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy

  was:
When I configured the scheduler monitor policies with a space after the comma, the RM started with an error.

The conf should trim the entries between ",".

It happened when I tested multiple policies:

yarn.resourcemanager.scheduler.monitor.policies
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy

> The scheduler monitor policies conf should support trim between ",".
> --------------------------------------------------------------------
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When I configured the scheduler monitor policies with a space after the
> comma, the RM started with an error.
> The conf should trim the entries between ",". For example,
> "a,b,c" is supported now, but "a, b, c" is not; this jira just adds the
> trim.
>
> It happened when I tested multiple policies:
>
> yarn.resourcemanager.scheduler.monitor.policies
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should support trim between ",".
[ https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10682:
--------------------------
    Description:
When I configured the scheduler monitor policies with a space after the comma, the RM started with an error.

The conf should trim the entries between ",". For example, "a,b,c" is supported now, but "a, b, c" is not; this jira just adds the trim.

It happened when I tested multiple policies:

yarn.resourcemanager.scheduler.monitor.policies
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy, org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy

  was:
When I configured the scheduler monitor policies with a space after the comma, the RM started with an error.

The conf should trim the entries between ",". For example, "a,b,c" is supported now, but "a, b, c" is not; this jira just adds the trim.

It happened when I tested multiple policies:

yarn.resourcemanager.scheduler.monitor.policies
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy

> The scheduler monitor policies conf should support trim between ",".
> --------------------------------------------------------------------
>
> Key: YARN-10682
> URL: https://issues.apache.org/jira/browse/YARN-10682
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10682.001.patch
>
>
> When I configured the scheduler monitor policies with a space after the
> comma, the RM started with an error.
> The conf should trim the entries between ",". For example,
> "a,b,c" is supported now, but "a, b, c" is not; this jira just adds the
> trim.
>
> It happened when I tested multiple policies:
>
> yarn.resourcemanager.scheduler.monitor.policies
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
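The difference between "a,b,c" and "a, b, c" described in YARN-10682 can be shown with plain Java string handling. This is a minimal sketch, assuming the policy list is split on commas; it does not reproduce how the ResourceManager actually parses the property, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper contrasting untrimmed and trimmed comma splitting. */
final class PolicyListSketch {
  /** Splits on "," and keeps each entry exactly as written. */
  static List<String> splitRaw(String value) {
    List<String> out = new ArrayList<>();
    for (String part : value.split(",")) {
      if (!part.isEmpty()) {
        out.add(part);
      }
    }
    return out;
  }

  /** Splits on "," and trims surrounding whitespace from each entry. */
  static List<String> splitTrimmed(String value) {
    List<String> out = new ArrayList<>();
    for (String part : value.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        out.add(trimmed);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Untrimmed entries keep their leading space, so a name such as " b"
    // would not match a class called "b".
    System.out.println(splitRaw("a, b, c"));
    System.out.println(splitTrimmed("a, b, c"));
  }
}
```

With the untrimmed split, every entry after the first carries a leading space, which is the kind of value that cannot be resolved to a policy class name.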
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298076#comment-17298076 ]

Qi Zhu commented on YARN-10674:
-------------------------------

Thanks [~pbacsko] for the review.

It does not depend on YARN-10682: "," is already supported now. YARN-10682 fixes the error with "," followed by a space. For example, "a, b" is not supported now, but "a,b" is supported. :D

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298074#comment-17298074 ] Peter Bacsko commented on YARN-10674: - [~zhuqi] am I right when I think that this patch depends on YARN-10682? Because this change generates a config entry with "," and it's not supported now. > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298072#comment-17298072 ] Qi Zhu commented on YARN-9615: -- Thanks [~pbacsko] for commit and review. > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-9615.001.patch, YARN-9615.002.patch, > YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, > YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, > YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, > YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, > image-2021-03-04-10-36-12-441.png, screenshot-1.png > > > It'd be good to have counts/processing times for each event type in RM async > dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298069#comment-17298069 ] Peter Bacsko commented on YARN-9615: +1 I had to commit twice because there are actually two authors. Thanks for the patch [~jhung] / [~zhuqi] and [~bibinchundatt] for the review. Committed to trunk. > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-9615.001.patch, YARN-9615.002.patch, > YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, > YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, > YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, > YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, > image-2021-03-04-10-36-12-441.png, screenshot-1.png > > > It'd be good to have counts/processing times for each event type in RM async > dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298068#comment-17298068 ] Peter Bacsko commented on YARN-9615: Thanks [~zhuqi] patch v11 looks good, committing it soon. > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-9615.001.patch, YARN-9615.002.patch, > YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, > YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, > YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, > YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, > image-2021-03-04-10-36-12-441.png, screenshot-1.png > > > It'd be good to have counts/processing times for each event type in RM async > dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS
[ https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298064#comment-17298064 ] Peter Bacsko commented on YARN-10679: - +1 thanks [~snemeth] for the patch and [~shuzirra] for the review. Committed to trunk. > Better logging of uncaught exceptions throughout SLS > > > Key: YARN-10679 > URL: https://issues.apache.org/jira/browse/YARN-10679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10679.001.patch > > > In our internal environment, there was a test failure while running SLS tests > with Jenkins. > It's difficult to align the uncaught exceptions (in this case an NPE) and the > log itself as the exception is logged with {{e.printStackTrace()}}. > This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", > exception)}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
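The change described in YARN-10679 (logging the exception through the logger instead of calling `e.printStackTrace()`) can be sketched as follows. Note the substitution: the issue refers to `LOG.error("msg", exception)`, while this sketch uses `java.util.logging` from the JDK so that it stays self-contained. The class name, the `runTask` helper, and the message text are hypothetical and are not taken from the SLS code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example of routing uncaught exceptions through a logger. */
final class LoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(LoggingSketch.class.getName());

  /** Exposes the logger so callers can attach their own handlers. */
  static Logger logger() {
    return LOG;
  }

  /** Runs a task and logs any runtime exception together with its stack trace. */
  static void runTask(Runnable task) {
    try {
      task.run();
    } catch (RuntimeException e) {
      // Before: e.printStackTrace() wrote to stderr, detached from the log.
      // After: the exception travels with the log record, so it shares the
      // log's timestamp, level, and destination.
      LOG.log(Level.SEVERE, "Uncaught exception while running task", e);
    }
  }

  public static void main(String[] args) {
    runTask(() -> {
      throw new IllegalStateException("simulated failure");
    });
  }
}
```

Passing the throwable as a separate argument, rather than concatenating it into the message, is what keeps the stack trace attached to the log entry.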
[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS
[ https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298062#comment-17298062 ] Peter Bacsko commented on YARN-10679: - Ok, this time the failed test is different, most likely a flaky one. Let's investigate it later. > Better logging of uncaught exceptions throughout SLS > > > Key: YARN-10679 > URL: https://issues.apache.org/jira/browse/YARN-10679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10679.001.patch > > > In our internal environment, there was a test failure while running SLS tests > with Jenkins. > It's difficult to align the uncaught exceptions (in this case an NPE) and the > log itself as the exception is logged with {{e.printStackTrace()}}. > This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", > exception)}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS
[ https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298061#comment-17298061 ]

Hadoop QA commented on YARN-10679:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 44s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 19m 27s | | trunk passed |
| +1 | compile | 0m 31s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 28s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 23s | | trunk passed |
| +1 | mvnsite | 0m 31s | | trunk passed |
| +1 | shadedclient | 13m 50s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 26s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 0m 48s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 45s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 24s | | the patch passed |
| +1 | compile | 0m 21s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 21s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 20s | | the patch passed |
| +1 | checkstyle | 0m 15s | | the patch passed |
| +1 | mvnsite | 0m 23s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 39s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 24s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
[jira] [Commented] (YARN-10497) Fix an issue in CapacityScheduler which fails to delete queues
[ https://issues.apache.org/jira/browse/YARN-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298059#comment-17298059 ]

Qi Zhu commented on YARN-10497:
-------------------------------

[~shuzirra] [~pbacsko] Updated the patch to use getTrimmedStringCollection to fix this issue. :D Thanks.

> Fix an issue in CapacityScheduler which fails to delete queues
> --------------------------------------------------------------
>
> Key: YARN-10497
> URL: https://issues.apache.org/jira/browse/YARN-10497
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
> Attachments: YARN-10497.001.patch, YARN-10497.002.patch, YARN-10497.003.patch, YARN-10497.004.patch, YARN-10497.005.patch
>
> We saw an exception when using queue mutation APIs:
> {code:java}
> 2020-11-13 16:47:46,327 WARN org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: CapacityScheduler configuration validation failed:java.io.IOException: Queue root.am2cmQueueSecond not found
> {code}
> Which comes from this code:
> {code:java}
> List<String> siblingQueues = getSiblingQueues(queueToRemove,
>     proposedConf);
> if (!siblingQueues.contains(queueName)) {
>   throw new IOException("Queue " + queueToRemove + " not found");
> }
> {code}
> (Inside MutableCSConfigurationProvider)
> If you look at the method:
> {code:java}
> private List<String> getSiblingQueues(String queuePath, Configuration conf) {
>   String parentQueue = queuePath.substring(0, queuePath.lastIndexOf('.'));
>   String childQueuesKey = CapacitySchedulerConfiguration.PREFIX +
>       parentQueue + CapacitySchedulerConfiguration.DOT +
>       CapacitySchedulerConfiguration.QUEUES;
>   return new ArrayList<>(conf.getStringCollection(childQueuesKey));
> }
> {code}
> And here's the capacity-scheduler.xml I got:
> {code:java}
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>default, q1, q2</value>
> </property>
> {code}
> Note that there are spaces between default, q1, q2.
> So conf.getStringCollection returns:
> {code:java}
> default
> q1
> ...
> {code}
> The entries keep their leading spaces, so the match fails when we try to delete the queue.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
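For illustration only, the mismatch described in YARN-10497 can be reproduced without Hadoop. The sketch below is a stand-in, not the Hadoop implementation: the class name SiblingQueueLookup and both helper methods are hypothetical, and they merely mimic an untrimmed and a trimmed comma split of the value "default, q1, q2".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the two lookups discussed in the issue; not Hadoop code.
class SiblingQueueLookup {

  // Splits on commas only, so every entry keeps its surrounding whitespace.
  static List<String> splitUntrimmed(String value) {
    List<String> result = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(value, ",");
    while (tokenizer.hasMoreTokens()) {
      result.add(tokenizer.nextToken());
    }
    return result;
  }

  // Splits on commas, trims every entry and drops empty ones.
  static List<String> splitTrimmed(String value) {
    List<String> result = new ArrayList<>();
    for (String token : value.split(",")) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String queues = "default, q1, q2";
    // The untrimmed list holds " q1", so a lookup for "q1" fails.
    System.out.println(splitUntrimmed(queues).contains("q1"));
    // The trimmed list holds "q1", so the same lookup succeeds.
    System.out.println(splitTrimmed(queues).contains("q1"));
  }
}
```

The trimmed variant corresponds to the direction of the fix mentioned in the comment (switching to getTrimmedStringCollection); the exact behavior of the Hadoop methods should be checked against the Configuration class itself.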
[jira] [Updated] (YARN-10497) Fix an issue in CapacityScheduler which fails to delete queues
[ https://issues.apache.org/jira/browse/YARN-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10497: -- Attachment: YARN-10497.005.patch > Fix an issue in CapacityScheduler which fails to delete queues > -- > > Key: YARN-10497 > URL: https://issues.apache.org/jira/browse/YARN-10497 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-10497.001.patch, YARN-10497.002.patch, > YARN-10497.003.patch, YARN-10497.004.patch, YARN-10497.005.patch
[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS
[ https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298057#comment-17298057 ] Peter Bacsko commented on YARN-10679: - Re-triggered build to see what's going on with TestSLSRunner. > Better logging of uncaught exceptions throughout SLS > > > Key: YARN-10679 > URL: https://issues.apache.org/jira/browse/YARN-10679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10679.001.patch > > > In our internal environment, there was a test failure while running SLS tests > with Jenkins. > It's difficult to align the uncaught exceptions (in this case an NPE) and the > log itself as the exception is logged with {{e.printStackTrace()}}. > This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", > exception)}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
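As a rough sketch of the change YARN-10679 asks for, the example below routes a caught exception through a logger together with a message, instead of calling e.printStackTrace(). It uses java.util.logging only to stay self-contained; that choice is an assumption made for the example (the issue itself writes the call as LOG.error("msg", exception)), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical example class; java.util.logging is used as a stand-in logger.
class UncaughtExceptionLogging {
  static final Logger LOG =
      Logger.getLogger(UncaughtExceptionLogging.class.getName());

  // Runs a task and logs any runtime failure with a message and the throwable,
  // so the stack trace ends up in the same log stream as other messages.
  static void runLogged(Runnable task) {
    try {
      task.run();
    } catch (RuntimeException e) {
      LOG.log(Level.SEVERE, "Task failed", e);
    }
  }

  // Runs the task while collecting the emitted log records in memory.
  static List<LogRecord> capture(Runnable task) {
    List<LogRecord> records = new ArrayList<>();
    Handler handler = new Handler() {
      @Override public void publish(LogRecord record) { records.add(record); }
      @Override public void flush() { }
      @Override public void close() { }
    };
    LOG.addHandler(handler);
    try {
      runLogged(task);
    } finally {
      LOG.removeHandler(handler);
    }
    return records;
  }

  public static void main(String[] args) {
    List<LogRecord> records =
        capture(() -> { throw new NullPointerException("boom"); });
    LogRecord first = records.get(0);
    System.out.println(first.getMessage() + " / " + first.getThrown());
  }
}
```

Because the throwable is attached to the log record, its stack trace carries the same timestamp and logger name as the surrounding entries, which is what makes it easier to align with the rest of the log.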
[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
[ https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298055#comment-17298055 ]

Hadoop QA commented on YARN-10677:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 28s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 39s | | trunk passed |
| +1 | compile | 0m 26s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 23s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 18s | | trunk passed |
| +1 | mvnsite | 0m 27s | | trunk passed |
| +1 | shadedclient | 16m 9s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 23s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 21s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 0m 43s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 40s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 21s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 20s | | the patch passed |
| +1 | compile | 0m 18s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 18s | | the patch passed |
| +1 | checkstyle | 0m 13s | | hadoop-tools/hadoop-sls: The patch generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) |
| +1 | mvnsite | 0m 20s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 49s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 21s | |
[jira] [Commented] (YARN-10681) Fix assertion failure message in BaseSLSRunnerTest
[ https://issues.apache.org/jira/browse/YARN-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298041#comment-17298041 ] Peter Bacsko commented on YARN-10681: - +1 thanks [~snemeth] and [~shuzirra] for the patch and review, committed to trunk. > Fix assertion failure message in BaseSLSRunnerTest > -- > > Key: YARN-10681 > URL: https://issues.apache.org/jira/browse/YARN-10681 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Trivial > Attachments: YARN-10681.001.patch > > > There is this failure message: > https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/BaseSLSRunnerTest.java#L129-L130 > The word "catched" should be replaced with "caught". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298032#comment-17298032 ]

Hadoop QA commented on YARN-9615:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 39s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 1m 40s | | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 16s | | trunk passed |
| +1 | compile | 10m 18s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 9m 8s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 44s | | trunk passed |
| +1 | mvnsite | 1m 53s | | trunk passed |
| +1 | shadedclient | 19m 31s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 31s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 49s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 1m 56s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 54s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 20s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 30s | | the patch passed |
| +1 | compile | 9m 36s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 9m 36s | | the patch passed |
| +1 | compile | 8m 25s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 25s | | the patch passed |
| +1 | checkstyle | 1m 44s | | the patch passed |
| +1 | mvnsite | 1m 53s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 10s | | patch has no errors when building and testing our client artifacts.
[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
[ https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298031#comment-17298031 ] Peter Bacsko commented on YARN-10677: - +1 LGTM. Thanks [~snemeth] for the patch and [~zhuqi] for the review. Committed to trunk. (Jenkins is running but I don't expect any issues). > Logger of SLSFairScheduler is provided with the wrong class > --- > > Key: YARN-10677 > URL: https://issues.apache.org/jira/browse/YARN-10677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10677.001.patch, YARN-10677.002.patch, > YARN-10677.003.patch, YARN-10677.004.patch > > > In SLSFairScheduler, the Logger definition looks like: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69 > We need to fix this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
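To make the kind of defect in YARN-10677 concrete, here is a small sketch of a logger that is created with another class's name, typically after a copy-paste. The classes SchedulerA, SchedulerB and LoggerNameCheck are hypothetical, and java.util.logging is used only so the example runs on its own; the issue does not say which class was passed by mistake.

```java
import java.util.logging.Logger;

// Hypothetical class whose logger is correctly named after the class itself.
class SchedulerA {
  static final Logger LOG = Logger.getLogger(SchedulerA.class.getName());
}

// Hypothetical class with the copy-paste mistake: the logger carries
// SchedulerA's name, so its messages are attributed to the wrong class.
class SchedulerB {
  static final Logger LOG = Logger.getLogger(SchedulerA.class.getName());
}

class LoggerNameCheck {
  // Returns true when the logger is named after the class that owns it.
  static boolean namedAfter(Logger logger, Class<?> owner) {
    return owner.getName().equals(logger.getName());
  }

  public static void main(String[] args) {
    System.out.println(namedAfter(SchedulerA.LOG, SchedulerA.class));
    System.out.println(namedAfter(SchedulerB.LOG, SchedulerB.class));
  }
}
```

A check like namedAfter is only an illustration; in practice the fix is simply to pass the enclosing class when the logger is created.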
[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
[ https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298018#comment-17298018 ] Szilard Nemeth commented on YARN-10677: --- [~pbacsko], Attached 004 patch that fixes the checkstyle and whitespace. > Logger of SLSFairScheduler is provided with the wrong class > --- > > Key: YARN-10677 > URL: https://issues.apache.org/jira/browse/YARN-10677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10677.001.patch, YARN-10677.002.patch, > YARN-10677.003.patch, YARN-10677.004.patch > > > In SLSFairScheduler, the Logger definition looks like: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69 > We need to fix this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
[ https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10677: -- Attachment: YARN-10677.004.patch > Logger of SLSFairScheduler is provided with the wrong class > --- > > Key: YARN-10677 > URL: https://issues.apache.org/jira/browse/YARN-10677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10677.001.patch, YARN-10677.002.patch, > YARN-10677.003.patch, YARN-10677.004.patch > > > In SLSFairScheduler, the Logger definition looks like: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69 > We need to fix this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks
[ https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10680: -- Priority: Minor (was: Major) > Revisit try blocks without catch blocks but having finally blocks > - > > Key: YARN-10680 > URL: https://issues.apache.org/jira/browse/YARN-10680 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Priority: Minor > Labels: newbie > > This jira is to revisit all try blocks without catch blocks but having > finally blocks in SLS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks
[ https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298017#comment-17298017 ] Szilard Nemeth commented on YARN-10680: --- [~shuzirra] You might be interested in taking this jira. > Revisit try blocks without catch blocks but having finally blocks > - > > Key: YARN-10680 > URL: https://issues.apache.org/jira/browse/YARN-10680 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Priority: Major > Labels: newbie > > This jira is to revisit all try blocks without catch blocks but having > finally blocks in SLS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10680) Revisit try blocks without catch blocks but having finally blocks
[ https://issues.apache.org/jira/browse/YARN-10680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-10680: - Assignee: (was: Szilard Nemeth) > Revisit try blocks without catch blocks but having finally blocks > - > > Key: YARN-10680 > URL: https://issues.apache.org/jira/browse/YARN-10680 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Priority: Major > Labels: newbie > > This jira is to revisit all try blocks without catch blocks but having > finally blocks in SLS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
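The risk with a try block that has a finally block but no catch block can be sketched as follows: if the finally block itself throws, the exception from the try block is lost. The example is a simplified, hypothetical model (the class AllocateMasking and its methods are invented for illustration and are not the SLS code), and the null guard and rethrow in the second variant are assumptions about one reasonable way to handle it.

```java
// Hypothetical model of a try-finally block that hides the original failure.
class AllocateMasking {

  // Always fails, standing in for a scheduler call that throws at runtime.
  static String failingAllocate() {
    throw new IllegalStateException("scheduler failure");
  }

  // No catch block: the finally block dereferences the still-null result,
  // and its NullPointerException replaces the IllegalStateException.
  static String withoutCatch() {
    String allocation = null;
    try {
      allocation = failingAllocate();
      return allocation;
    } finally {
      allocation.length();
    }
  }

  // With a catch block: the original exception is recorded and rethrown,
  // and the finally block guards against the null result.
  static String withCatch(StringBuilder log) {
    String allocation = null;
    try {
      allocation = failingAllocate();
      return allocation;
    } catch (RuntimeException e) {
      log.append("allocate failed: ").append(e.getMessage());
      throw e;
    } finally {
      if (allocation != null) {
        allocation.length();
      }
    }
  }

  public static void main(String[] args) {
    try {
      withoutCatch();
    } catch (RuntimeException e) {
      System.out.println("without catch: " + e.getClass().getSimpleName());
    }
    StringBuilder log = new StringBuilder();
    try {
      withCatch(log);
    } catch (RuntimeException e) {
      System.out.println("with catch: " + e.getClass().getSimpleName());
    }
    System.out.println(log);
  }
}
```

When revisiting such blocks, the question for each one is whether the finally block can fail while an earlier exception is already in flight.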
[jira] [Updated] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
[ https://issues.apache.org/jira/browse/YARN-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10684:
--------------------------------
    Description:
Preconditions:
# Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
# Set the below parameters in RM yarn-site.xml ::
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
# Set this in NM[s] yarn-site.xml :::
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>

Test Steps:
Job Command ::
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar -shell_command sleep -shell_args 20 -num_containers 10 -container_type OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed almost all times with below Diagnostics message
*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument "promote_opportunistic_after_start"

  was:
Preconditions:
# Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
# Set the below parameters in RM yarn-site.xml ::
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
# Set this in NM[s] yarn-site.xml :::
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>

Test Steps:
Job Command ::
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar -shell_command sleep -shell_args 20 -num_containers 10 -container_type OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument "promote_opportunistic_after_start"

> YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10684
> URL: https://issues.apache.org/jira/browse/YARN-10684
> Project: Hadoop YARN
> Issue Type: Bug
> Components: distributed-scheduling
> Affects Versions: 3.1.1
> Reporter: Sushanta Sen
> Priority: Major
[jira] [Created] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
Sushanta Sen created YARN-10684:
-----------------------------------

Summary: YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
Key: YARN-10684
URL: https://issues.apache.org/jira/browse/YARN-10684
Project: Hadoop YARN
Issue Type: Bug
Components: distributed-scheduling
Affects Versions: 3.1.1
Reporter: Sushanta Sen

Preconditions:
# Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
# Set the below parameters in RM yarn-site.xml ::
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
# Set this in NM[s] yarn-site.xml :::
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>

Test Steps:
Job Command ::
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar -shell_command sleep -shell_args 20 -num_containers 10 -container_type OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument "promote_opportunistic_after_start"
[jira] [Commented] (YARN-10682) The scheduler monitor policies conf should support trim between ",".
[ https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298011#comment-17298011 ]

Hadoop QA commented on YARN-10682:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 26s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 23m 56s | | trunk passed |
| +1 | compile | 1m 5s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 52s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 44s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 17m 9s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 36s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 2m 1s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 58s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 57s | | the patch passed |
| +1 | compile | 0m 59s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 59s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | checkstyle | 0m 44s | | the patch passed |
| +1 | mvnsite | 0m 54s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 14s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
[jira] [Commented] (YARN-10678) Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions
[ https://issues.apache.org/jira/browse/YARN-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298010#comment-17298010 ] Peter Bacsko commented on YARN-10678: - +1 thanks [~snemeth] for the patch and [~shuzirra] for the review. Committed to trunk. > Try blocks without catch blocks in SLS scheduler classes can swallow other > exceptions > - > > Key: YARN-10678 > URL: https://issues.apache.org/jira/browse/YARN-10678 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10678-unchecked-exception-from-FS-allocate.diff, > YARN-10678-unchecked-exception-from-FS-allocate_fixed.diff, > YARN-10678.001.patch, > org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_modified.log, > > org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_original.log > > > In SLSFairScheduler, we have this try-finally block (without catch block) in > the allocate method: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L109-L123 > Similarly, in SLSCapacityScheduler: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java#L116-L131 > In the finally block, the updateQueueWithAllocateRequest is invoked: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L118 > In our internal environment, there was a situation when an NPE was logged > from this method: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.sls.scheduler.SLSFairScheduler.updateQueueWithAllocateRequest(SLSFairScheduler.java:262) > at > 
org.apache.hadoop.yarn.sls.scheduler.SLSFairScheduler.allocate(SLSFairScheduler.java:118) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:288) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436) > at > org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:352) > at > org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator$1.run(MRAMSimulator.java:349) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.sendContainerRequest(MRAMSimulator.java:348) > at > org.apache.hadoop.yarn.sls.appmaster.AMSimulator.middleStep(AMSimulator.java:212) > at > org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:94) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > This can happen if the following events occur: > 1. A runtime exception is thrown in FairScheduler or CapacityScheduler's > allocate method > 2. In this case, the local variable called 'allocation' remains null: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L110 > 3. 
In updateQueueWithAllocateRequest, this null object is dereferenced > here: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L262 > 4. The NPE is then thrown from the finally block here: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L117-L122 > In this case, the NPE replaces the original exception thrown from > FairScheduler#allocate, so the original exception is lost. > To fix this, a catch block should be introduced that logs the exception. > The same applies to SLSCapacityScheduler.
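The failure mode described in YARN-10678 can be reproduced outside Hadoop. The following is a minimal sketch, not the SLS code: the class AllocateSketch, its methods, and the in-memory LOG list are hypothetical stand-ins. It shows an exception from a finally block replacing the original one, and how a catch block that logs before rethrowing keeps the original visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the SLS scheduler wrappers; not Hadoop code. */
class AllocateSketch {
  /** In-memory stand-in for a real logger, so the example stays self-contained. */
  static final List<String> LOG = new ArrayList<>();

  /** Stands in for the wrapped scheduler's allocate call failing at runtime. */
  static String schedulerAllocate() {
    throw new IllegalStateException("original failure");
  }

  /** Stands in for updateQueueWithAllocateRequest: dereferences its argument. */
  static void updateQueue(String allocation) {
    allocation.length(); // throws NullPointerException when allocation is null
  }

  /** try-finally without catch: the NPE from finally replaces the original exception. */
  static String allocateWithoutCatch() {
    String allocation = null;
    try {
      allocation = schedulerAllocate();
      return allocation;
    } finally {
      updateQueue(allocation);
    }
  }

  /** With a catch block, the original exception is logged before finally runs. */
  static String allocateWithCatch() {
    String allocation = null;
    try {
      allocation = schedulerAllocate();
      return allocation;
    } catch (RuntimeException e) {
      LOG.add("Caught exception while executing allocate: " + e);
      throw e;
    } finally {
      updateQueue(allocation); // may still throw, but the cause is already logged
    }
  }
}
```

In both variants the caller sees a NullPointerException; only the second one leaves a record of the IllegalStateException that caused it.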
[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI
[ https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298009#comment-17298009 ] Bilwa S T commented on YARN-10588: -- Hi [~Jim_Brennan], can you please take a look at this Jira when you get time? Thanks. > Percentage of queue and cluster is zero in WebUI > - > > Key: YARN-10588 > URL: https://issues.apache.org/jira/browse/YARN-10588 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10588.001.patch, YARN-10588.002.patch, > YARN-10588.003.patch, YARN-10588.004.patch > > > Steps to reproduce: > Configure the property below in resource-types.xml: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>yarn.io/gpu</value> > </property> > {code} > Submit a job. > In the UI, % Of Queue and % Of Cluster are zero for the submitted > application. > > This is because SchedulerApplicationAttempt has the check below for > calculating queueUsagePerc and clusterUsagePerc: > {code:java} > if (!calc.isInvalidDivisor(cluster)) { > float queueCapacityPerc = queue.getQueueInfo(false, false) > .getCapacity(); > queueUsagePerc = calc.divide(cluster, usedResourceClone, > Resources.multiply(cluster, queueCapacityPerc)) * 100; > if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) { > queueUsagePerc = 0.0f; > } > clusterUsagePerc = > calc.divide(cluster, usedResourceClone, cluster) * 100; > } > {code} > calc.isInvalidDivisor(cluster) always returns true because the cluster's gpu resource is 0, so the block is skipped and both percentages stay at zero.
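To make the effect of the guard concrete, here is a simplified sketch. It is not the Hadoop ResourceCalculator: the class UsagePercentSketch is hypothetical, resources are modeled as plain maps, and the rule "a divisor is invalid as soon as any resource type in it is zero" is an assumption taken from the description above.

```java
import java.util.Map;

/** Hypothetical, simplified model of the guarded percentage computation. */
class UsagePercentSketch {

  /** Assumption: the divisor is invalid if any resource type in it is zero. */
  static boolean isInvalidDivisor(Map<String, Long> resource) {
    for (long value : resource.values()) {
      if (value == 0L) {
        return true;
      }
    }
    return false;
  }

  /** The percentage keeps its initial value of 0 whenever the guard rejects the cluster. */
  static float clusterUsagePerc(Map<String, Long> used, Map<String, Long> cluster) {
    float perc = 0.0f;
    if (!isInvalidDivisor(cluster)) {
      // Dominant share: the largest used/total ratio over all resource types.
      float dominant = 0.0f;
      for (Map.Entry<String, Long> entry : cluster.entrySet()) {
        long usedValue = used.getOrDefault(entry.getKey(), 0L);
        dominant = Math.max(dominant, (float) usedValue / entry.getValue());
      }
      perc = dominant * 100;
    }
    return perc;
  }

  public static void main(String[] args) {
    Map<String, Long> used = Map.of("memory-mb", 2048L, "vcores", 2L);
    // Without a gpu resource type the usage is computed normally.
    System.out.println(clusterUsagePerc(used, Map.of("memory-mb", 8192L, "vcores", 8L)));
    // Registering yarn.io/gpu on a cluster with no GPUs adds a zero entry and trips the guard.
    System.out.println(clusterUsagePerc(used,
        Map.of("memory-mb", 8192L, "vcores", 8L, "yarn.io/gpu", 0L)));
  }
}
```

The second call differs from the first only by the zero-valued gpu entry, which is enough to skip the whole computation even though memory and vcores are in use.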
[jira] [Commented] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
[ https://issues.apache.org/jira/browse/YARN-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298007#comment-17298007 ] Peter Bacsko commented on YARN-10677: - [~snemeth] please fix the whitespace and checkstyle, thanks. > Logger of SLSFairScheduler is provided with the wrong class > --- > > Key: YARN-10677 > URL: https://issues.apache.org/jira/browse/YARN-10677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10677.001.patch, YARN-10677.002.patch, > YARN-10677.003.patch > > > In SLSFairScheduler, the Logger definition looks like: > https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69 > We need to fix this.
[jira] [Commented] (YARN-10679) Better logging of uncaught exceptions throughout SLS
[ https://issues.apache.org/jira/browse/YARN-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298002#comment-17298002 ] Gergely Pollak commented on YARN-10679: --- [~snemeth] thank you for the patch, LGTM+1 (Non-binding) > Better logging of uncaught exceptions throughout SLS > > > Key: YARN-10679 > URL: https://issues.apache.org/jira/browse/YARN-10679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10679.001.patch > > > In our internal environment, there was a test failure while running SLS tests > with Jenkins. > It's difficult to correlate the uncaught exceptions (in this case an NPE) with the > log itself, because the exception is printed with {{e.printStackTrace()}} instead of going through the logger. > This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", > exception)}}.
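The replacement proposed in YARN-10679 can be sketched as follows. The issue refers to a LOG.error("msg", exception) style call; to stay self-contained this sketch uses java.util.logging instead, and the class LoggedFailureSketch, the runTask helper, and the CapturingHandler are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical example class; uses java.util.logging in place of the project's logger. */
class LoggedFailureSketch {
  static final Logger LOG = Logger.getLogger(LoggedFailureSketch.class.getName());

  /** Collects records in memory so the example can show what the logger received. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }
  }

  /** Runs a task and routes any runtime exception through the logger. */
  static void runTask(Runnable task) {
    try {
      task.run();
    } catch (RuntimeException e) {
      // Instead of e.printStackTrace(): the message and the stack trace are
      // emitted as one log record, with the logger's timestamp and level.
      LOG.log(Level.SEVERE, "Task failed with an uncaught exception", e);
    }
  }
}
```

Because the exception travels with the log record, it appears in the same stream and ordering as the surrounding log lines rather than on standard error.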
[jira] [Commented] (YARN-10681) Fix assertion failure message in BaseSLSRunnerTest
[ https://issues.apache.org/jira/browse/YARN-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298000#comment-17298000 ] Gergely Pollak commented on YARN-10681: --- [~snemeth] thank you for the patch, LGTM+1 (Non-binding) > Fix assertion failure message in BaseSLSRunnerTest > -- > > Key: YARN-10681 > URL: https://issues.apache.org/jira/browse/YARN-10681 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Trivial > Attachments: YARN-10681.001.patch > > > There is this failure message: > https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/BaseSLSRunnerTest.java#L129-L130 > The word "catched" should be replaced with "caught".
[jira] [Commented] (YARN-10678) Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions
[ https://issues.apache.org/jira/browse/YARN-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297999#comment-17297999 ] Gergely Pollak commented on YARN-10678: --- [~snemeth] thank you for the patch, LGTM+1 (Non-binding) > Try blocks without catch blocks in SLS scheduler classes can swallow other > exceptions > - > > Key: YARN-10678 > URL: https://issues.apache.org/jira/browse/YARN-10678 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10678-unchecked-exception-from-FS-allocate.diff, > YARN-10678-unchecked-exception-from-FS-allocate_fixed.diff, > YARN-10678.001.patch, > org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_modified.log, > > org.apache.hadoop.yarn.sls.TestReservationSystemInvariants__testSimulatorRunning_original.log
[jira] [Commented] (YARN-10675) Consolidate YARN-10672 and YARN-10447
[ https://issues.apache.org/jira/browse/YARN-10675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297998#comment-17297998 ] Peter Bacsko commented on YARN-10675: - +1 LGTM. Thanks [~snemeth] for the patch, committed to trunk. > Consolidate YARN-10672 and YARN-10447 > - > > Key: YARN-10675 > URL: https://issues.apache.org/jira/browse/YARN-10675 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-10675.001.patch > > > Let's consolidate the solution applied for YARN-10672 and apply it to the > code changes introduced with YARN-10447. > Quoting [~pbacsko]: > {quote} > The solution is much more straightforward than mine in YARN-10447. Actually we > might consider applying this to TestLeafQueue and undoing my changes, > because that's more complicated (I had no patience to go deeper into Mockito's > internal behavior, I just thought well, disable that thread and that's > enough). > {quote}
[jira] [Commented] (YARN-10676) Improve code quality in TestTimelineAuthenticationFilterForV1
[ https://issues.apache.org/jira/browse/YARN-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297996#comment-17297996 ] Peter Bacsko commented on YARN-10676: - +1 thanks [~snemeth] for the patch and [~bteke] / [~zhuqi] / [~shuzirra] for the review. Committed to trunk. > Improve code quality in TestTimelineAuthenticationFilterForV1 > - > > Key: YARN-10676 > URL: https://issues.apache.org/jira/browse/YARN-10676 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-10676.001.patch > > > - In testcase "testDelegationTokenOperations", the exception message is > checked but in case it does not match the assertion, the exception is not > printed. This happens 3 times. > - Assertion messages can be added > - Fields called "httpSpnegoKeytabFile" and "httpSpnegoPrincipal" can be > static final. > - There's a typo in comment "avaiable" (happens 2 times) > - There are some Assert.fail() calls, without messages.
[jira] [Assigned] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10683: --- Assignee: Minni Mittal > Add total resource in NodeManager metrics > - > > Key: YARN-10683 > URL: https://issues.apache.org/jira/browse/YARN-10683 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor
[jira] [Created] (YARN-10683) Add total resource in NodeManager metrics
Minni Mittal created YARN-10683: --- Summary: Add total resource in NodeManager metrics Key: YARN-10683 URL: https://issues.apache.org/jira/browse/YARN-10683 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal
[jira] [Comment Edited] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl
[ https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297827#comment-17297827 ] ANANDA G B edited comment on YARN-9731 at 3/9/21, 9:09 AM: --- Thanks for working on it [~magnum]. [~prabhujoseph] Can we commit this to the 3.1.2 and 3.3.1 versions? was (Author: gb.ana...@gmail.com): Thanks for working on it [~magnum]. [~prabhujoseph] Can we commit this to the 3.1.2 version? > In ATS v1.5, all jobs are visible to all users without view-acl > --- > > Key: YARN-9731 > URL: https://issues.apache.org/jira/browse/YARN-9731 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.2 >Reporter: KWON BYUNGCHANG >Assignee: KWON BYUNGCHANG >Priority: Major > Attachments: YARN-9731.001.patch, YARN-9731.002.patch, > YARN-9731.003.patch, YARN-9731.004.patch, YARN-9731.005.patch, > ats_v1.5_screenshot.png > > > In ATS v1.5 in secure mode, > all jobs are visible to all users without view-acl. > If a user does not have view-acl, the user should not be able to see the jobs. > I attached an ATS UI screenshot. > > ATS v1.5 log > {code:java} > 2019-08-09 10:21:13,679 WARN > applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore > (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) > - Failed to authorize when generating application report for > application_1565247558150_1954. Use a placeholder for its latest attempt id. > org.apache.hadoop.security.authorize.AuthorizationException: User magnum does > not have privilege to see this application application_1565247558150_1954 > 2019-08-09 10:21:13,680 WARN > applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore > (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) > - Failed to authorize when generating application report for > application_1565247558150_1951. Use a placeholder for its latest attempt id. 
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does > not have privilege to see this application application_1565247558150_1951 > {code}
[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297927#comment-17297927 ] Qi Zhu commented on YARN-9615: -- Submitted again to trigger Jenkins. > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-9615.001.patch, YARN-9615.002.patch, > YARN-9615.003.patch, YARN-9615.004.patch, YARN-9615.005.patch, > YARN-9615.006.patch, YARN-9615.007.patch, YARN-9615.008.patch, > YARN-9615.009.patch, YARN-9615.010.patch, YARN-9615.011.patch, > YARN-9615.011.patch, YARN-9615.poc.patch, image-2021-03-04-10-35-10-626.png, > image-2021-03-04-10-36-12-441.png, screenshot-1.png > > > It'd be good to have counts/processing times for each event type in the RM async > dispatcher and the scheduler async dispatcher.
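As an illustration of what per-event-type counts and processing times could look like, here is a rough sketch. It is not taken from any of the attached patches: the class DispatcherMetricsSketch, its methods, and the event type names used with it are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of per-event-type dispatcher metrics; not the YARN-9615 patch. */
class DispatcherMetricsSketch {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

  /** Runs the handler and records one event plus its elapsed time under eventType. */
  void dispatch(String eventType, Runnable handler) {
    long start = System.nanoTime();
    try {
      handler.run();
    } finally {
      // Recorded in finally so that failing handlers are counted as well.
      counts.computeIfAbsent(eventType, k -> new LongAdder()).increment();
      nanos.computeIfAbsent(eventType, k -> new LongAdder()).add(System.nanoTime() - start);
    }
  }

  /** Number of events of this type handled so far. */
  long count(String eventType) {
    LongAdder adder = counts.get(eventType);
    return adder == null ? 0L : adder.sum();
  }

  /** Total handling time for this event type, in nanoseconds. */
  long totalNanos(String eventType) {
    LongAdder adder = nanos.get(eventType);
    return adder == null ? 0L : adder.sum();
  }
}
```

LongAdder is used here because many threads may update the same counter; an average processing time per event type is totalNanos divided by count.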
[jira] [Comment Edited] (YARN-10682) The scheduler monitor policies conf should support trim between ",".
[ https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297923#comment-17297923 ] Qi Zhu edited comment on YARN-10682 at 3/9/21, 8:01 AM: [~pbacsko], [~gandras], [~shuzirra], [~bteke] No additional test is needed. Could you help review this fix? :D Thanks. was (Author: zhuqi): [~pbacsko], [~gandras], [~shuzirra], [~bteke] Could you help review this fix? :D Thanks. > The scheduler monitor policies conf should support trim between ",". > > > Key: YARN-10682 > URL: https://issues.apache.org/jira/browse/YARN-10682 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10682.001.patch > > > When I configured the scheduler monitor policies with a space, the RM started > with an error. > The conf should support trimming between ",". > > This happened when testing multiple policies. > > yarn.resourcemanager.scheduler.monitor.policies > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
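The trimming requested in YARN-10682 amounts to stripping whitespace around each comma-separated class name before it is resolved. A minimal sketch follows; the class PolicyListSketch and its parsePolicies method are hypothetical and are not the code from the attached patch. Hadoop's Configuration class provides getTrimmedStrings for this kind of value, though whether the patch uses it is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing whitespace-tolerant parsing of a policy list. */
class PolicyListSketch {

  /** Splits on commas, trims each entry, and drops entries that are empty after trimming. */
  static List<String> parsePolicies(String raw) {
    List<String> policies = new ArrayList<>();
    for (String part : raw.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        policies.add(name);
      }
    }
    return policies;
  }

  public static void main(String[] args) {
    // A space after the comma would otherwise become part of the second class name.
    String raw = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
        + "QueueConfigurationAutoRefreshPolicy, "
        + "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."
        + "AutoCreatedQueueDeletionPolicy";
    for (String policy : parsePolicies(raw)) {
      System.out.println("[" + policy + "]");
    }
  }
}
```

Without the trim, the second entry would keep its leading space, and a class lookup by that name would fail.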
[jira] [Commented] (YARN-10682) The scheduler monitor policies conf should support trim between ",".
[ https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297923#comment-17297923 ] Qi Zhu commented on YARN-10682: --- [~pbacsko], [~gandras], [~shuzirra], [~bteke] Could you help review this fix? :D Thanks. > The scheduler monitor policies conf should support trim between ",". > > > Key: YARN-10682 > URL: https://issues.apache.org/jira/browse/YARN-10682 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10682.001.patch > > > When I configured the scheduler monitor policies with a space, the RM started > with an error. > The conf should support trimming between ",". > > This happened when testing multiple policies. > > yarn.resourcemanager.scheduler.monitor.policies > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy
[jira] [Updated] (YARN-10682) The scheduler monitor policies conf should support trim between ",".
[ https://issues.apache.org/jira/browse/YARN-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10682: -- Description: When I configured the scheduler monitor policies with a space, the RM started with an error. The conf should support trimming between ",". This happened when testing multiple policies. yarn.resourcemanager.scheduler.monitor.policies org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy was: When I configured the scheduler monitor policies with a space, the RM started with an error. The conf should support trimming between ",". > The scheduler monitor policies conf should support trim between ",". > > > Key: YARN-10682 > URL: https://issues.apache.org/jira/browse/YARN-10682 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10682.001.patch > > > When I configured the scheduler monitor policies with a space, the RM started > with an error. > The conf should support trimming between ",". > > This happened when testing multiple policies. > > yarn.resourcemanager.scheduler.monitor.policies > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy