[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794680#comment-16794680 ] Zhaohui Xin commented on YARN-9278: --- [~wilfreds], thanks for your reply. I attached a new patch which resolves the code style issues and fixes the description in FairScheduler.md.
{quote}We need to either add some tests or explain why we cannot add tests.{quote}
I think it's difficult to test because we can't predict the number of nodes that will be preempted. Can you give me some suggestions?
> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, YARN-9278.003.patch
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently.
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // We should not iterate over all nodes; that would be very slow.
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
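The shuffle-and-limit idea in the patch can be sketched as a standalone program. The class name, the `selectCandidates` helper, and the use of `String` node names are illustrative only; the real patch operates on the scheduler's node objects and reads the cap from the preemption configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch of "shuffle the candidates, then keep at most N of them"
// so no fixed subset of nodes is preempted on every pass.
public class ShuffleLimitSketch {
    static List<String> selectCandidates(List<String> potentialNodes, int maxTryNodeNum) {
        if (potentialNodes.size() <= maxTryNodeNum) {
            return potentialNodes;
        }
        // Shuffle a copy so every node has an equal chance of being picked,
        // then keep only the first maxTryNodeNum entries.
        List<String> shuffled = new ArrayList<>(potentialNodes);
        Collections.shuffle(shuffled);
        return new ArrayList<>(shuffled.subList(0, maxTryNodeNum));
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            nodes.add("node-" + i);
        }
        List<String> picked = selectCandidates(nodes, 3);
        System.out.println(picked.size());            // always 3
        System.out.println(nodes.containsAll(picked)); // always true
    }
}
```

Because the selection is randomized, repeated preemption passes spread the load across the cluster instead of hammering whichever nodes happen to sort first.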
[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9278: -- Attachment: YARN-9278.003.patch
> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, YARN-9278.003.patch
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently.
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // We should not iterate over all nodes; that would be very slow.
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792415#comment-16792415 ] Zhaohui Xin commented on YARN-9344: --- [~wilfreds], I updated the patch. Can you help me review it? :D
{quote}The test does only check memory, we should also cover other resource types in the test not just the memory resource (vcores, custom resource types).{quote}
In the new patch, cpu and memory are tested separately, so there is no need to test other resource types.
> FS should not reserve when container capability is bigger than node total resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, YARN-9344.003.patch, YARN-9344.004.patch, YARN-9344.005.patch, YARN-9344.006.patch
>
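The shortcut under review can be illustrated with a standalone check. The `fitsOnNode` helper and the plain memory/vcore fields below are simplifications of my own; the actual patch compares YARN `Resource` objects against the node's total capability before attempting a reservation.

```java
// Standalone illustration of "do not reserve a node the container can never
// fit on": if the request exceeds the node's *total* resource in any
// dimension, reservation is pointless and should be skipped.
public class ReserveCheckSketch {
    static boolean fitsOnNode(long reqMemMb, int reqVcores, long nodeMemMb, int nodeVcores) {
        // A container fits only if every resource dimension fits.
        return reqMemMb <= nodeMemMb && reqVcores <= nodeVcores;
    }

    public static void main(String[] args) {
        // An 8 GB / 4 vcore node: a 16 GB request can never be satisfied
        // there, so the scheduler should not reserve that node for it.
        System.out.println(fitsOnNode(16384, 2, 8192, 4)); // false
        System.out.println(fitsOnNode(4096, 2, 8192, 4));  // true
        // The same applies per dimension: too many vcores also fails.
        System.out.println(fitsOnNode(4096, 8, 8192, 4));  // false
    }
}
```

This also shows why testing cpu and memory separately covers the logic: the check is a per-dimension comparison, so each dimension exercises the same code path.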
[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9344: -- Attachment: YARN-9344.006.patch > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch, > YARN-9344.003.patch, YARN-9344.004.patch, YARN-9344.005.patch, > YARN-9344.006.patch > >
[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9278: -- Attachment: YARN-9278.002.patch
> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently.
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // We should not iterate over all nodes; that would be very slow.
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9344: -- Attachment: YARN-9344.005.patch > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch, > YARN-9344.003.patch, YARN-9344.004.patch, YARN-9344.005.patch > >
[jira] [Commented] (YARN-9369) Yarn RM metrics test build failed
[ https://issues.apache.org/jira/browse/YARN-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787889#comment-16787889 ] Zhaohui Xin commented on YARN-9369: --- Thanks for your reply, [~Prabhu Joseph]. This is indeed a duplicate.
> Yarn RM metrics test build failed
> -
>
> Key: YARN-9369
> URL: https://issues.apache.org/jira/browse/YARN-9369
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zhaohui Xin
> Priority: Major
>
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics
>
> {code:java}
> java.lang.AssertionError: Expected 2 events to be published expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
[jira] [Updated] (YARN-9369) Yarn RM metrics test build failed
[ https://issues.apache.org/jira/browse/YARN-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9369: -- Description:
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics
{code:java}
java.lang.AssertionError: Expected 2 events to be published expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}

was:
{code:java}
java.lang.AssertionError: Expected 2 events to be published expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}

> Yarn RM metrics test build failed
> -
>
> Key: YARN-9369
> URL: https://issues.apache.org/jira/browse/YARN-9369
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zhaohui Xin
> Priority: Major
>
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics
>
> {code:java}
> java.lang.AssertionError: Expected 2 events to be published expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
[jira] [Created] (YARN-9369) Yarn RM metrics test build failed
Zhaohui Xin created YARN-9369: - Summary: Yarn RM metrics test build failed Key: YARN-9369 URL: https://issues.apache.org/jira/browse/YARN-9369 Project: Hadoop YARN Issue Type: Bug Reporter: Zhaohui Xin
[jira] [Updated] (YARN-9369) Yarn RM metrics test build failed
[ https://issues.apache.org/jira/browse/YARN-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9369: -- Description:
{code:java}
java.lang.AssertionError: Expected 2 events to be published expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}

> Yarn RM metrics test build failed
> -
>
> Key: YARN-9369
> URL: https://issues.apache.org/jira/browse/YARN-9369
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Zhaohui Xin
> Priority: Major
>
>
> {code:java}
> java.lang.AssertionError: Expected 2 events to be published expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
> at org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9344: -- Attachment: YARN-9344.004.patch > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch, > YARN-9344.003.patch, YARN-9344.004.patch > >
[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9344: -- Attachment: YARN-9344.003.patch > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch, > YARN-9344.003.patch > >
[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785603#comment-16785603 ] Zhaohui Xin commented on YARN-9344: --- The whitespace error is not related to this patch; [YARN-9348|https://issues.apache.org/jira/browse/YARN-9348] will fix it. > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch > >
[jira] [Comment Edited] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785603#comment-16785603 ] Zhaohui Xin edited comment on YARN-9344 at 3/6/19 1:00 PM: --- The whitespace error is not related to this patch; YARN-9348 will fix it. was (Author: uranus): The whitespace error is not relatived to this patch, [YARN-9348|https://issues.apache.org/jira/browse/YARN-9348] will fix it. > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch > >
[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785202#comment-16785202 ] Zhaohui Xin commented on YARN-9344: --- [~wilfreds], thanks for your reply. I updated the patch to short-circuit before assigning a container. Can you help me review it? :D > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch > >
[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9344: -- Attachment: YARN-9344.002.patch > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9344.001.patch, YARN-9344.002.patch > >
[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9344: -- Summary: FS should not reserve when container capability is bigger than node total resource (was: FS should not reserve when node total resource can not meet container capability) > FS should not reserve when container capability is bigger than node total > resource > -- > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major >
[jira] [Assigned] (YARN-9344) FS should not reserve when node total resource can not meet container capability
[ https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-9344: - Assignee: Zhaohui Xin > FS should not reserve when node total resource can not meet container > capability > > > Key: YARN-9344 > URL: https://issues.apache.org/jira/browse/YARN-9344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major >
[jira] [Created] (YARN-9344) FS should not reserve when node total resource can not meet container capability
Zhaohui Xin created YARN-9344: - Summary: FS should not reserve when node total resource can not meet container capability Key: YARN-9344 URL: https://issues.apache.org/jira/browse/YARN-9344 Project: Hadoop YARN Issue Type: Bug Reporter: Zhaohui Xin
[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783239#comment-16783239 ] Zhaohui Xin commented on YARN-6487: --- Thanks for your explanation. > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code
[jira] [Assigned] (YARN-6225) Global scheduler applies to Fair scheduler
[ https://issues.apache.org/jira/browse/YARN-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-6225: - Assignee: Zhaohui Xin
> Global scheduler applies to Fair scheduler
> --
>
> Key: YARN-6225
> URL: https://issues.apache.org/jira/browse/YARN-6225
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Tao Jie
> Assignee: Zhaohui Xin
> Priority: Major
>
> IIRC, in global scheduling, the logic for scheduling constraints such as node labels and affinity/anti-affinity would take place before the scheduler tries to commit a ResourceCommitRequest. This logic looks like it can be shared by FairScheduler and CapacityScheduler.
[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528 ] Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:45 PM: Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' heartbeats trigger original scheduling, the continuous scheduling thread will be starved because of lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill. {quote} was (Author: uranus): Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' heartbeats trigger original scheduling, the continuous scheduling thread will be starved because lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill. {quote} > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code
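The starvation mechanism being discussed can be shown with a toy demonstration. Everything below is illustrative, not FairScheduler code: one "continuous scheduling" thread competes with many "heartbeat" threads for a single scheduler lock, and with enough competitors the lone thread gets only a small, machine-dependent share of the lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Toy demonstration of lock contention between heartbeat-driven scheduling
// and a single continuous scheduling thread. Thread counts and timings are
// arbitrary; the point is only that one thread among many competitors gets
// a correspondingly small share of lock acquisitions.
public class StarvationSketch {
    static final ReentrantLock SCHEDULER_LOCK = new ReentrantLock();
    static final AtomicLong HEARTBEAT_RUNS = new AtomicLong();
    static final AtomicLong CONTINUOUS_RUNS = new AtomicLong();
    static volatile boolean running = true;

    static Runnable worker(AtomicLong counter) {
        return () -> {
            while (running) {
                SCHEDULER_LOCK.lock();
                try {
                    counter.incrementAndGet(); // one "scheduling pass" under the lock
                } finally {
                    SCHEDULER_LOCK.unlock();
                }
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // 8 heartbeat threads vs. 1 continuous scheduling thread.
        for (int i = 0; i < 8; i++) {
            new Thread(worker(HEARTBEAT_RUNS)).start();
        }
        Thread continuous = new Thread(worker(CONTINUOUS_RUNS));
        continuous.start();
        Thread.sleep(200);
        running = false;
        continuous.join();
        // The exact ratio is machine-dependent, but the heartbeat pool
        // collectively dominates the lock.
        System.out.println("heartbeat passes:  " + HEARTBEAT_RUNS.get());
        System.out.println("continuous passes: " + CONTINUOUS_RUNS.get());
    }
}
```

This is why removing the continuous scheduling thread (and relying on heartbeat-driven or async scheduling) sidesteps the contention rather than tuning around it.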
[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528 ] Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:43 PM: Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' heartbeats trigger scheduling, the continuous scheduling thread will be starved because lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill. {quote} was (Author: uranus): Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node heartbeats trigger scheduling, the continuous scheduling thread will be starved because lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill.{quote} > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code
[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528 ] Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:43 PM: Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' heartbeats trigger original scheduling, the continuous scheduling thread will be starved because lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill. {quote} was (Author: uranus): Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' heartbeats trigger scheduling, the continuous scheduling thread will be starved because lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill. {quote} > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528 ] Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:42 PM: Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node heartbeats trigger scheduling, the continuous scheduling thread will be starved because lock conflict. {quote}The side effect is however that when a cluster grows (100+ nodes) the number of heartbeats that needed processing started interfering with the continuous scheduling thread and other internal threads. This does cause thread starvation and in the worst case scheduling comes to a standstill.{quote} was (Author: uranus): Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node heartbeats trigger scheduling, the continuous scheduling thread will be starved because lock conflict. > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528 ] Zhaohui Xin commented on YARN-6487: --- Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node heartbeats trigger scheduling, the continuous scheduling thread will be starved because of lock conflict. > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776539#comment-16776539 ] Zhaohui Xin commented on YARN-6487: --- Hi, [~wilfreds]. Can you add some reasons why we should remove continuous scheduler code in FairScheduler? > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776536#comment-16776536 ] Zhaohui Xin commented on YARN-9278: --- Thanks for your suggestions, [~wilfreds]. I also think it's better to randomize the nodes when their number exceeds a certain threshold. Maybe our change could look like this: {code:java} List potentialNodes = scheduler.getNodeTracker() .getNodesByResourceName(rr.getResourceName()); int maxTryNodeNumOnce = conf.getMaxTryNodeNumOnce(); // we should not iterate all nodes, that will be very slow if (ResourceRequest.ANY.equals(rr.getResourceName()) && potentialNodes.size() > maxTryNodeNumOnce) { Collections.shuffle(potentialNodes); potentialNodes = potentialNodes.subList(0, maxTryNodeNumOnce); } {code} > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
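The shuffle-and-truncate step proposed in the comment above can be sketched as a small standalone helper. This is an illustrative sketch, not the actual FairScheduler code: the generic `pickCandidates` helper is invented here, and `maxTryNodeNumOnce` is just the name used in the snippet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PreemptionCandidates {
    // Shuffle the potential nodes and keep at most maxTryNodeNumOnce of them,
    // so preemption neither scans every node nor always hits the same ones.
    static <T> List<T> pickCandidates(List<T> potentialNodes, int maxTryNodeNumOnce) {
        if (potentialNodes.size() <= maxTryNodeNumOnce) {
            return potentialNodes;
        }
        // Copy first so the caller's list is not reordered as a side effect.
        List<T> shuffled = new ArrayList<>(potentialNodes);
        Collections.shuffle(shuffled);
        return shuffled.subList(0, maxTryNodeNumOnce);
    }
}
```

Copying before shuffling is a deliberate choice here: `Collections.shuffle` mutates its argument, and the node list handed out by a node tracker may be shared with other readers.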
[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776207#comment-16776207 ] Zhaohui Xin edited comment on YARN-9278 at 2/24/19 10:14 AM: - Thanks for your reply, [~yufeigu]. I think another solution is to stop looking for nodes when we find a suitable one. was (Author: uranus): Thanks for your reply, [~yufeigu]. Another solution is to stop looking for nodes when we find a suitable one. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776207#comment-16776207 ] Zhaohui Xin commented on YARN-9278: --- Thanks for your reply, [~yufeigu]. Another solution is to stop looking for nodes when we find a suitable one. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
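The alternative mentioned in this comment, stopping the search once a suitable node is found, amounts to a first-fit scan instead of a best-fit scan over all nodes. A minimal sketch follows; the `Node` class and its single free-memory field are stand-ins for the scheduler's real node abstraction and suitability test, not YARN API.

```java
import java.util.List;

public class FirstFitNode {
    // Minimal stand-in for a cluster node with some free resource.
    static class Node {
        final String name;
        final int freeMemMb;
        Node(String name, int freeMemMb) {
            this.name = name;
            this.freeMemMb = freeMemMb;
        }
    }

    // Walk the (ideally pre-shuffled) node list and return the first node that
    // can satisfy the request, instead of scoring every node to find the best.
    static Node findFirstFit(List<Node> nodes, int requestedMemMb) {
        for (Node node : nodes) {
            if (node.freeMemMb >= requestedMemMb) {
                return node; // early exit: "good enough" beats "globally best"
            }
        }
        return null; // no node can satisfy the request
    }
}
```

Combined with a shuffle of the input list, the early exit keeps the scan short on average while the randomization avoids repeatedly picking the same node.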
[jira] [Assigned] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only
[ https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-7904: - Assignee: Eric Yang (was: Zhaohui Xin) > Privileged, trusted containers need all of their bind-mounted directories to > be read-only > - > > Key: YARN-7904 > URL: https://issues.apache.org/jira/browse/YARN-7904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Yang >Priority: Major > Labels: Docker > > Since they will be running as some other user than themselves, the NM likely > won't be able to clean up after them because of permissions issues. So, to > prevent this, we should make these directories read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only
[ https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774703#comment-16774703 ] Zhaohui Xin commented on YARN-7904: --- [~eyang], I am not working on this. Please feel free to take it. :D > Privileged, trusted containers need all of their bind-mounted directories to > be read-only > - > > Key: YARN-7904 > URL: https://issues.apache.org/jira/browse/YARN-7904 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Zhaohui Xin >Priority: Major > Labels: Docker > > Since they will be running as some other user than themselves, the NM likely > won't be able to clean up after them because of permissions issues. So, to > prevent this, we should make these directories read-only. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773792#comment-16773792 ] Zhaohui Xin commented on YARN-9278: --- {quote}Without introduce more complexity to FS preemption, it is already very complicated, there are some workarounds you can try: To increase FairShare Preemption Timeout and FairShare Preemption Threshold to reduce the chance of preemption. This is specially useful for a large cluster, since there is more chance to get resources just by waiting. {quote} If our cluster has a lot of long-running jobs, the above method is not helpful. We have used this optimization for more than a year, and it improves preemption performance effectively. BTW, we have more than 10 clusters and most of them have about 10K nodes. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772638#comment-16772638 ] Zhaohui Xin edited comment on YARN-9278 at 2/20/19 5:32 AM: Hi, [~yufeigu]. When the preemption thread satisfies a starved container with ANY as the resource name, it searches all nodes in the cluster for the best node. This will be costly when the cluster has more than 10k nodes. I think we should limit the number of nodes in such a situation. What do you think? :D was (Author: uranus): Hi, [~yufeigu]. When the preemption thread satisfies a starved container with ANY as the resource name, it searches all nodes in the cluster for the best node. This will be costly when the cluster has more than 10k nodes. I think we should limit the number of nodes in such a situation. What do you think? :D > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772638#comment-16772638 ] Zhaohui Xin commented on YARN-9278: --- Hi, [~yufeigu]. When the preemption thread satisfies a starved container with ANY as the resource name, it searches all nodes in the cluster for the best node. This will be costly when the cluster has more than 10k nodes. I think we should limit the number of nodes in such a situation. What do you think? :D > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770420#comment-16770420 ] Zhaohui Xin commented on YARN-9277: --- In my opinion, preempting one container that has been running for more than 10 hours is equivalent to preempting ten containers that have each been running for one hour. So we should preempt short-running containers first. [~yufeigu], [~wilfreds], what do you think? I attached a new patch. :D > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch, > YARN-9277.003.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should preempt short-running containers firstly > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
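The "preempt short-running containers first" preference above amounts to ordering preemption candidates by how long they have been running and taking the newest ones first. The sketch below uses a hypothetical `Container` class with a start-time field; it is an illustration of the ordering, not the patch's actual code or the real RMContainer API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrder {
    // Hypothetical stand-in for a running container; only the start time matters here.
    static class Container {
        final String id;
        final long startTimeMs;
        Container(String id, long startTimeMs) {
            this.id = id;
            this.startTimeMs = startTimeMs;
        }
    }

    // Order candidates so the most recently started (i.e. shortest-running)
    // containers come first: preempting them wastes the least completed work.
    static List<Container> shortestRunningFirst(List<Container> containers) {
        List<Container> ordered = new ArrayList<>(containers);
        ordered.sort(Comparator.comparingLong((Container c) -> c.startTimeMs).reversed());
        return ordered;
    }
}
```

The preemption loop would then consume this ordered list until enough resources are reclaimed, so a 10-hour container is only touched after every shorter-running candidate.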
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Description: I think we should add more restrictions in fair scheduler preemption. * We should not preempt self * We should not preempt short-running containers firstly * ... was: I think we should add more restrictions in fair scheduler preemption. * We should not preempt self * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ... > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch, > YARN-9277.003.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt short-running containers firstly > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Description: I think we should add more restrictions in fair scheduler preemption. * We should not preempt self * We should preempt short-running containers firstly * ... was: I think we should add more restrictions in fair scheduler preemption. * We should not preempt self * We should not preempt short-running containers firstly * ... > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch, > YARN-9277.003.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should preempt short-running containers firstly > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Attachment: YARN-9277.003.patch > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch, > YARN-9277.003.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7021) TestResourceUtils to be moved to hadoop-yarn-api package
[ https://issues.apache.org/jira/browse/YARN-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-7021: - Assignee: Zhaohui Xin > TestResourceUtils to be moved to hadoop-yarn-api package > > > Key: YARN-7021 > URL: https://issues.apache.org/jira/browse/YARN-7021 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-3926 >Reporter: Sunil Govindan >Assignee: Zhaohui Xin >Priority: Major > > ResourceUtils class is now in yarn-api. Its better its test class also to be > moved there, however these tests using lot of resources and using > ConfigurationProvider which is available only in yarn-common. Hence > investigate and improve test for ResourceUtils class. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6971) Clean up different ways to create resources
[ https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-6971: - Assignee: (was: Zhaohui Xin) > Clean up different ways to create resources > --- > > Key: YARN-6971 > URL: https://issues.apache.org/jira/browse/YARN-6971 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Yufei Gu >Priority: Minor > Labels: newbie > > There are several ways to create a {{resource}} object, e.g., > BuilderUtils.newResource() and Resources.createResource(). These methods not > only cause confusing but also performance issues, for example > BuilderUtils.newResource() is significant slow than > Resources.createResource(). > We could merge them some how, and replace most BuilderUtils.newResource() > with Resources.createResource(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6971) Clean up different ways to create resources
[ https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-6971: - Assignee: Zhaohui Xin > Clean up different ways to create resources > --- > > Key: YARN-6971 > URL: https://issues.apache.org/jira/browse/YARN-6971 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Yufei Gu >Assignee: Zhaohui Xin >Priority: Minor > Labels: newbie > > There are several ways to create a {{resource}} object, e.g., > BuilderUtils.newResource() and Resources.createResource(). These methods not > only cause confusing but also performance issues, for example > BuilderUtils.newResource() is significant slow than > Resources.createResource(). > We could merge them some how, and replace most BuilderUtils.newResource() > with Resources.createResource(). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7518) Node manager should allow resource units to be lower cased
[ https://issues.apache.org/jira/browse/YARN-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-7518: - Assignee: Zhaohui Xin > Node manager should allow resource units to be lower cased > -- > > Key: YARN-7518 > URL: https://issues.apache.org/jira/browse/YARN-7518 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0-beta1, 3.1.0 >Reporter: Daniel Templeton >Assignee: Zhaohui Xin >Priority: Major > > When we do units checks, we should ignore case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6611) ResourceTypes should be renamed
[ https://issues.apache.org/jira/browse/YARN-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-6611: - Assignee: Zhaohui Xin > ResourceTypes should be renamed > --- > > Key: YARN-6611 > URL: https://issues.apache.org/jira/browse/YARN-6611 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: YARN-3926 >Reporter: Daniel Templeton >Assignee: Zhaohui Xin >Priority: Major > > {{ResourceTypes}} is too close to the unrelated {{ResourceType}} class. > Maybe {{ResourceClass}} would be better? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side
[ https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9302: -- Description: I think it's more flexible to make maxAssign configurable at NM side. After that, we can assign different amount of containers. (was: I think it's more flexible to make maxAssign configurable at NM side. ) > make maxAssign configurable at NM side > -- > > Key: YARN-9302 > URL: https://issues.apache.org/jira/browse/YARN-9302 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > I think it's more flexible to make maxAssign configurable at NM side. After > that, we can assign different amount of containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side
[ https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9302: -- Description: I think it's more flexible to make maxAssign configurable at NM side. (was: I think it's more flexible to config) > make maxAssign configurable at NM side > -- > > Key: YARN-9302 > URL: https://issues.apache.org/jira/browse/YARN-9302 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > I think it's more flexible to make maxAssign configurable at NM side. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side
[ https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9302: -- Description: I think it's more flexible to config > make maxAssign configurable at NM side > -- > > Key: YARN-9302 > URL: https://issues.apache.org/jira/browse/YARN-9302 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > I think it's more flexible to config -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9302) make maxAssign configurable at NM side
[ https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-9302: - Assignee: Zhaohui Xin > make maxAssign configurable at NM side > -- > > Key: YARN-9302 > URL: https://issues.apache.org/jira/browse/YARN-9302 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9302) make maxAssign configurable at NM side
Zhaohui Xin created YARN-9302: - Summary: make maxAssign configurable at NM side Key: YARN-9302 URL: https://issues.apache.org/jira/browse/YARN-9302 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhaohui Xin -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-2499) Respect labels in preemption policy of fair scheduler
[ https://issues.apache.org/jira/browse/YARN-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-2499: - Assignee: Zhaohui Xin > Respect labels in preemption policy of fair scheduler > - > > Key: YARN-2499 > URL: https://issues.apache.org/jira/browse/YARN-2499 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Zhaohui Xin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766890#comment-16766890 ] Zhaohui Xin commented on YARN-9277: --- Hi, [~Steven Rand], thanks for your reply. If a long-running task is preempted, its next attempt will take a similarly long time. If that attempt is also preempted, the job will be difficult to finish. Also, I don't think it's reasonable to confine long-running apps to specific queues; that approach is not generic. Maybe we have a better solution? > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
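The trade-off above (killing a task that has run for hours forces an equally long re-run of its next attempt) suggests a simple age cut-off when choosing preemption victims. Below is an illustrative sketch only, not FairScheduler code; the class name, method name, and the one-hour threshold are all assumptions:

```java
// Illustrative sketch of the "do not preempt a long-running container"
// restriction discussed above. Not actual FairScheduler code; the
// threshold name and one-hour value are assumptions for illustration.
public class PreemptionAgeFilter {
  // Assumed cap: containers older than this are too expensive to re-run.
  static final long MAX_PREEMPTIBLE_RUNTIME_MS = 60L * 60 * 1000;

  /** Returns true if the container is still young enough to preempt. */
  static boolean isPreemptible(long containerStartTimeMs, long nowMs) {
    return nowMs - containerStartTimeMs < MAX_PREEMPTIBLE_RUNTIME_MS;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(isPreemptible(now - 10_000L, now));        // started 10s ago
    System.out.println(isPreemptible(now - 2 * 3_600_000L, now)); // started 2h ago
  }
}
```

Such a filter would run alongside the other restrictions (AM containers, self-preemption) when building the list of candidate containers.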
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766880#comment-16766880 ] Zhaohui Xin commented on YARN-9277: --- Hi, [~wilfreds], you can see issue [YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may preempt itself in case of minshare preemption. In my opinion, even if this will not happen, we should also add this sanity check. > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption
[ https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-8061: - Assignee: Zhaohui Xin > An application may preempt itself in case of minshare preemption > > > Key: YARN-8061 > URL: https://issues.apache.org/jira/browse/YARN-8061 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.9.0, 2.8.3, 3.0.0 >Reporter: Yufei Gu >Assignee: Zhaohui Xin >Priority: Major > > Assume a leaf queue A's minshare is 10G memory and its fairshare is 12G. It used > 4G, so its minshare-starved resources are 6G and will be distributed to all its > apps. Assume there are 4 apps a1, a2, a3, a4 inside, which demand 3G, 2G, 1G, > and 0.5G. a1 gets 3G of minshare-starved resources, a2 gets 2G, and a3 gets 1G; they > are all considered starved apps except a4, which doesn't get any. > An app can preempt another under the same queue due to minshare starvation. > For example, a1 can preempt a4 if a4 uses more resources than its fair share, > which is 3G (12G/4). If a1 itself uses more than 3G of memory, it will preempt > itself! I will create a unit test later. > The solution would be to check the application's fair share while distributing minshare > starvation; more details in method > {{FSLeafQueue#updateStarvedAppsMinshare()}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
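The fix sketched in the description above (check an app's fair share while distributing minshare starvation, so an over-fair-share app is skipped) can be illustrated with the numbers from the example. This is a hypothetical, simplified sketch, not the real FSLeafQueue#updateStarvedAppsMinshare(); demands are rounded to whole GB, so a4 asks for 1G here instead of 0.5G:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fix described above: while distributing a
// queue's minshare starvation, skip any app already at or above its fair
// share, so an app can never become "starved" for resources it would have
// to preempt from itself. Names are illustrative, not Hadoop code.
public class MinshareStarvationSketch {
  public static final class App {
    final String name;
    final long usedGB;
    final long demandGB;
    public App(String name, long usedGB, long demandGB) {
      this.name = name;
      this.usedGB = usedGB;
      this.demandGB = demandGB;
    }
  }

  /** Greedily hand out pendingGB of minshare starvation to apps below fair share. */
  public static Map<String, Long> distribute(List<App> apps, long pendingGB,
      long fairShareGB) {
    Map<String, Long> starved = new LinkedHashMap<>();
    for (App app : apps) {
      if (pendingGB <= 0) {
        break;
      }
      if (app.usedGB >= fairShareGB) {
        continue; // the fix: over-fair-share apps get no minshare starvation
      }
      long share = Math.min(app.demandGB, pendingGB);
      if (share > 0) {
        starved.put(app.name, share);
        pendingGB -= share;
      }
    }
    return starved;
  }

  public static void main(String[] args) {
    // Queue from the example: 6G of minshare starvation, per-app fair share 3G.
    // a1 already uses 4G (over its 3G fair share), so it is skipped and can
    // no longer end up preempting its own containers.
    List<App> apps = Arrays.asList(
        new App("a1", 4, 3),
        new App("a2", 0, 2),
        new App("a3", 0, 1),
        new App("a4", 0, 1));
    System.out.println(distribute(apps, 6, 3));
  }
}
```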
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766880#comment-16766880 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 7:22 AM: Hi, [~wilfreds], you can see issue YARN-8061: An application may preempt itself in case of minshare preemption. In my opinion, even if this will not happen, we should also add this as a sanity check. was (Author: uranus): Hi, [~wilfreds], you can see issue [YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may preempt itself in case of minshare preemption. In my opinion, even if this will not happen, we should also add this sanity check. > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:17 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction will only be valid after YARN-2098; I will remove it soon. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} {quote} We should not preempt container which has been running for a long time. {quote} I think this is an important restriction. *It's very costly to kill one task which has been running for dozens of hours.* was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in the community version. It will become valid after YARN-2098. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} {quote} We should not preempt container which has been running for a long time. 
{quote} I think this is an important restriction. *It's very costly to kill one task which has been running for dozens of hours.* > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
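Once per-app priorities actually reach the scheduler (YARN-2098), the "do not preempt a high-priority job" restriction discussed above could reduce to a comparison like the sketch below. This is hypothetical code, not FairScheduler's; it assumes the YARN application-priority convention in which a larger numeric value means higher priority:

```java
// Illustrative sketch of the "do not preempt a higher-priority job"
// restriction discussed above. Only meaningful once per-app priorities
// reach the scheduler (YARN-2098); until then getPriority() returns the
// same value for every app, so canPreempt() is always true. Assumes the
// application-priority convention where larger value = higher priority.
public class PriorityRestrictionSketch {

  /** True if an app at preemptorPriority may preempt one at victimPriority. */
  static boolean canPreempt(int preemptorPriority, int victimPriority) {
    // Never take resources from a job whose priority is strictly higher.
    return preemptorPriority >= victimPriority;
  }

  public static void main(String[] args) {
    System.out.println(canPreempt(5, 3)); // higher-priority app preempts a lower one
    System.out.println(canPreempt(3, 5)); // lower-priority app may not preempt a higher one
  }
}
```

With equal priorities allowed to preempt each other, the check degenerates to today's behavior, which matches the comment that the restriction is inert until YARN-2098.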
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:14 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in the community version. It will become valid after YARN-2098. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} {quote} We should not preempt container which has been running for a long time. {quote} I think this is an important restriction. *It's very costly to kill one task which has been running for dozens of hours.* was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in community version. This will be valid after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098]. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. 
return appPriority; }{code} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:09 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in community version. This will be valid after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098]. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority currently. So this restriction is invalid in community version. BTW, we honored app's priority from _ApplicationSubmissionContext_ in our cluster. I think the community should also change like this, but this is another problem. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. 
return appPriority; }{code} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:58 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority currently. So this restriction is invalid in community version. BTW, we honored app's priority from _ApplicationSubmissionContext_ in our cluster. I think the community should also change like this, but this is another problem. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority currently. So this restriction is invalid in community version. BTW, we honored app's priority from _ApplicationSubmissionContext_ in our cluster. I think the community should also change like this, but this is another problem. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. 
return appPriority; }{code} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:57 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority currently. So this restriction is invalid in community version. BTW, we honored app's priority from _ApplicationSubmissionContext_ in our cluster. I think the community should also change like this, but this is another problem. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. 
> * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694 ] Zhaohui Xin commented on YARN-9277: --- Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766143#comment-16766143 ] Zhaohui Xin commented on YARN-9277: --- Hi, [~yufeigu]. Can you help me review this patch? :D > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Attachment: YARN-9277.002.patch > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Description: I think we should add more restrictions in fair scheduler preemption. * We should not preempt self * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ... was: I think we should add more restrictions in fair scheduler preemption. * We should not preempt AM container * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ... > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766008#comment-16766008 ] Zhaohui Xin edited comment on YARN-8655 at 2/12/19 1:09 PM: [~wilfreds], I accidentally discovered this problem in our production cluster a few months ago. *I think satisfying fair share starvation is enough, so in the end I removed min share starvation to fix this problem.* I just learned that the community will also abolish min share in the future. After YARN-9066, this issue will no longer be needed. Thanks for your reply. :D was (Author: uranus): [~wilfreds], I accidentally discovered this problem in our production cluster a few months ago. *I think satisfying fair share starvation is enough, so in the end I removed min share starvation to fix this problem.* I just learned that the community will also abolish this in the future. After [YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no longer be needed. Thanks for your reply. :D > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe; this may cause one starved app to be > processed twice in a row.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > updated to app1. At that moment, app1 is *starved by min share*, so this app > is added to appsToProcess again, because appBeingProcessed is null and > appsToProcess no longer contains it. 
> {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
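One way to close the race quoted above is to make the queue removal and the appBeingProcessed update a single atomic step, for example by replacing the blocking queue with an explicit monitor. The following is a minimal generic sketch of that idea, not the actual Hadoop FSStarvedApps class; doneProcessing() is an assumed addition the preemption thread would call after finishing an app:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Generic sketch (not Hadoop code) that removes the race window: the
// dequeue and the "being processed" marker update happen under one lock,
// so addStarvedApp() can never observe the app gone from the queue while
// the marker still holds a stale value.
public class StarvedAppsSketch<T> {
  private final Deque<T> appsToProcess = new ArrayDeque<>();
  private T appBeingProcessed;

  public synchronized void addStarvedApp(T app) {
    // Reject the app currently being processed and queued duplicates.
    if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
      appsToProcess.add(app);
      notifyAll();
    }
  }

  public synchronized T take() throws InterruptedException {
    while (appsToProcess.isEmpty()) {
      wait();
    }
    // Remove and mark atomically: no window where the app is neither
    // queued nor recorded as being processed.
    appBeingProcessed = appsToProcess.poll();
    return appBeingProcessed;
  }

  public synchronized void doneProcessing() {
    appBeingProcessed = null; // clear only after processing finishes
  }
}
```

Because the marker is set in the same critical section that removes the app, a concurrent min-share starvation check can no longer re-enqueue an app that was just taken.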
[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766008#comment-16766008 ] Zhaohui Xin commented on YARN-8655: --- [~wilfreds], I accidentally discovered this problem in our production cluster a few months ago. *I think satisfying fair share starvation is enough, so in the end I removed min share starvation to fix this problem.* I just learned that the community will also abolish this in the future. After [YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no longer be needed. Thanks for your reply. :D > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe; this may cause one starved app to be > processed twice in a row.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > updated to app1. At that moment, app1 is *starved by min share*, so this app > is added to appsToProcess again, because appBeingProcessed is null and > appsToProcess no longer contains it. 
> {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9066) Deprecate Fair Scheduler min share
[ https://issues.apache.org/jira/browse/YARN-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766003#comment-16766003 ] Zhaohui Xin commented on YARN-9066: --- [~wilfreds], [~haibochen]. I agree with you very much. It's very complicated to understand min share starvation. After we remove min share starvation, [YARN-8655|https://issues.apache.org/jira/browse/YARN-8655] will no longer be needed. > Deprecate Fair Scheduler min share > -- > > Key: YARN-9066 > URL: https://issues.apache.org/jira/browse/YARN-9066 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: Proposal_Deprecate_FS_Min_Share.pdf > > > See the attached docs for details -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765743#comment-16765743 ] Zhaohui Xin edited comment on YARN-8655 at 2/12/19 7:05 AM: [~wilfreds] Thanks for your reply. I think it's not reasonable to process the application twice, because once we preempt containers for this app, we will satisfy both fairshareStarvation and minshareStarvation. {code:java} Resource getStarvation() { return Resources.add(fairshareStarvation, minshareStarvation); } {code} was (Author: uranus): [~wilfreds] Thanks for your reply. I think it's not reasonable to process the application twice, because once we preempt containers for this app, we will consider both fairshareStarvation and minshareStarvation. {code:java} Resource getStarvation() { return Resources.add(fairshareStarvation, minshareStarvation); } {code} > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe; this may cause one starved app to be > processed twice in a row.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > updated to app1. At that moment, app1 is *starved by min share*, so this app > is added to appsToProcess again, because appBeingProcessed is null and > appsToProcess no longer contains it. 
> {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765743#comment-16765743 ] Zhaohui Xin commented on YARN-8655: --- [~wilfreds] Thanks for your reply. I don't think it's reasonable to process the application twice, because once we preempt containers for this app, we already consider both fairshareStarvation and minshareStarvation. {code:java} Resource getStarvation() { return Resources.add(fairshareStarvation, minshareStarvation); } {code} > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe, which may cause one starved app to be > processed twice in a row.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > updated to app1. At that moment, app1 becomes *starved by min share*, so the > app is added to appsToProcess again, because appBeingProcessed is null and > appsToProcess does not contain it. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
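One way to close the race described in YARN-8655 is to track queued and in-flight apps in a single set guarded by one lock, instead of a separate appBeingProcessed field that is reset to null before the blocking take. The sketch below is illustrative only (Strings stand in for FSAppAttempt, and the doneProcessing hook is hypothetical), not the actual patch:

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;

// Simplified stand-in for FSStarvedApps. A single set records every app that
// is queued OR currently being processed, and membership is checked and
// updated under one lock, so the window between "appBeingProcessed = null"
// and take() returning can no longer enqueue the same app twice.
public class StarvedApps {
    private final Queue<String> appsToProcess = new LinkedList<>();
    private final Set<String> pending = new HashSet<>(); // guards both fields

    public void addStarvedApp(String app) {
        synchronized (pending) {
            if (pending.add(app)) {      // atomic check-and-add
                appsToProcess.add(app);
            }
        }
    }

    // Non-blocking for the sake of the example; the real class blocks on a
    // BlockingQueue.take(). Returns null when no starved app is waiting.
    public String poll() {
        synchronized (pending) {
            return appsToProcess.poll(); // app stays in 'pending' while processed
        }
    }

    public void doneProcessing(String app) {
        synchronized (pending) {
            pending.remove(app);         // may be re-enqueued if starved again
        }
    }
}
```

With this shape, an app that is starved by min share while its fair-share starvation is still being processed is simply ignored until doneProcessing releases it.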
[jira] [Comment Edited] (YARN-7710) http://ip:8088/cluster show different ID with same name
[ https://issues.apache.org/jira/browse/YARN-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764635#comment-16764635 ] Zhaohui Xin edited comment on YARN-7710 at 2/11/19 3:07 AM: [~zjilvufe], can you reproduce this problem? We can narrow it down in the following ways: * Add _-verbose_ when submitting the job; this prints all job configs, so you can check _mapreduce.job.name_. {noformat} hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat} * Another way is remote debugging. was (Author: uranus): [~zjilvufe], can you reproduce this problem? I think we can locate the problem in the following ways, # Add _-verbose_ when submit job, this will print all job configs. You can check _mapreduce.job.name._ {noformat} hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat} # remote debug. > http://ip:8088/cluster show different ID with same name > - > > Key: YARN-7710 > URL: https://issues.apache.org/jira/browse/YARN-7710 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.7.3 > Environment: hadoop2.7.3 > jdk 1.8 >Reporter: jimmy >Priority: Blocker > > 1. Create five threads > 2. Submit five streaming jobs with different names > 3. Visit http://ip:8088; sometimes the same name is shown for different IDs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7710) http://ip:8088/cluster show different ID with same name
[ https://issues.apache.org/jira/browse/YARN-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764635#comment-16764635 ] Zhaohui Xin edited comment on YARN-7710 at 2/11/19 3:06 AM: [~zjilvufe], can you reproduce this problem? I think we can locate the problem in the following ways, # Add _-verbose_ when submit job, this will print all job configs. You can check _mapreduce.job.name._ {noformat} hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat} # remote debug. was (Author: uranus): [~zjilvufe], can you reproduce this problem? I think we can locate the problem in the following ways, # Add _-verbose_ when submit job, this will print all job configs. You can check _mapreduce.job.name._ {noformat} hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat} # remote debug. > http://ip:8088/cluster show different ID with same name > - > > Key: YARN-7710 > URL: https://issues.apache.org/jira/browse/YARN-7710 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.7.3 > Environment: hadoop2.7.3 > jdk 1.8 >Reporter: jimmy >Priority: Blocker > > 1.create five thread > 2.submit five steamJob with different name > 3.visit http://ip:8088 we can see same name for different id sometimes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7710) http://ip:8088/cluster show different ID with same name
[ https://issues.apache.org/jira/browse/YARN-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764635#comment-16764635 ] Zhaohui Xin commented on YARN-7710: --- [~zjilvufe], can you reproduce this problem? We can narrow it down in the following ways: # Add _-verbose_ when submitting the job; this prints all job configs, so you can check _mapreduce.job.name_. {noformat} hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat} # Remote debugging. > http://ip:8088/cluster show different ID with same name > - > > Key: YARN-7710 > URL: https://issues.apache.org/jira/browse/YARN-7710 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Affects Versions: 2.7.3 > Environment: hadoop2.7.3 > jdk 1.8 >Reporter: jimmy >Priority: Blocker > > 1. Create five threads > 2. Submit five streaming jobs with different names > 3. Visit http://ip:8088; sometimes the same name is shown for different IDs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare
[ https://issues.apache.org/jira/browse/YARN-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764440#comment-16764440 ] Zhaohui Xin commented on YARN-8707: --- [~yufeigu], [~zsiegl], I attached a new patch; could you help review it? :D > It's not reasonable to decide whether app is starved by fairShare > - > > Key: YARN-8707 > URL: https://issues.apache.org/jira/browse/YARN-8707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.0.0-alpha3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Minor > Attachments: YARN-8707.002.patch, YARN-8707.patch > > > When an app's usage has reached its demand, it is still considered fairShare > starved. Obviously, that's not reasonable! > {code:java} > boolean isStarvedForFairShare() { > return isUsageBelowShare(getResourceUsage(), getFairShare()); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
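The fix YARN-8707 argues for is to cap the fair share at the app's demand: an app that is already running everything it asked for cannot be starved. A minimal numeric sketch (plain longs instead of YARN's Resource type; the class and method names here are illustrative, not the actual patch):

```java
// Sketch of the proposed check: compare usage against min(fairShare, demand)
// rather than fairShare alone, so an app whose usage has reached its demand
// is never reported as fairShare-starved.
public class FairShareStarvation {
    static boolean isStarvedForFairShare(long usage, long fairShare, long demand) {
        // No app can be owed more than it asked for, so the effective
        // share is capped at demand before the comparison.
        return usage < Math.min(fairShare, demand);
    }
}
```

For example, an app with fairShare 8, demand 4, and usage 4 is not starved under this check, whereas the original isUsageBelowShare(usage, fairShare) would have reported it as starved.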
[jira] [Updated] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare
[ https://issues.apache.org/jira/browse/YARN-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-8707: -- Attachment: YARN-8707.002.patch > It's not reasonable to decide whether app is starved by fairShare > - > > Key: YARN-8707 > URL: https://issues.apache.org/jira/browse/YARN-8707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.0.0-alpha3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Minor > Attachments: YARN-8707.002.patch, YARN-8707.patch > > > When app's usage reached demand, it's still be considered fairShare starved. > Obviously, that's not reasonable! > {code:java} > boolean isStarvedForFairShare() { > return isUsageBelowShare(getResourceUsage(), getFairShare()); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764413#comment-16764413 ] Zhaohui Xin commented on YARN-8655: --- [~yufeigu], [~bsteinbach]. I attached new patch, can you help me review this? :D > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe, this may make one starve app is processed > for two times continuously.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > update to app1. At the moment, app1 is *starved by min share*, so this app > is added to appsToProcess again! Because appBeingProcessed is null and > appsToProcess also have not this one. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-8655: -- Attachment: YARN-8655.002.patch > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe, this may make one starve app is processed > for two times continuously.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > update to app1. At the moment, app1 is *starved by min share*, so this app > is added to appsToProcess again! Because appBeingProcessed is null and > appsToProcess also have not this one. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-8655: -- Description: *FSStarvedApps is not thread safe, this may make one starve app is processed for two times continuously.* For example, when app1 is *fair share starved*, it has been added to appsToProcess. After that, app1 is taken but appBeingProcessed is not yet update to app1. At the moment, app1 is *starved by min share*, so this app is added to appsToProcess again! Because appBeingProcessed is null and appsToProcess also have not this one. {code:java} void addStarvedApp(FSAppAttempt app) { if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { appsToProcess.add(app); } } FSAppAttempt take() throws InterruptedException { // Reset appBeingProcessed before the blocking call appBeingProcessed = null; // Blocking call to fetch the next starved application FSAppAttempt app = appsToProcess.take(); appBeingProcessed = app; return app; } {code} was: *FSStarvedApps is not thread safe, this may make one starve app is processed for two times continuously.* For example, when app1 is fair share starved, it has been added to appsToProcess. After that, app1 is taken but appBeingProcessed is not yet update to app1. At the moment, app1 is starved by min share, so this app is added to appsToProcess again! Because appBeingProcessed is null and appsToProcess also have not this one. 
{code:java} void addStarvedApp(FSAppAttempt app) { if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { appsToProcess.add(app); } } FSAppAttempt take() throws InterruptedException { // Reset appBeingProcessed before the blocking call appBeingProcessed = null; // Blocking call to fetch the next starved application FSAppAttempt app = appsToProcess.take(); appBeingProcessed = app; return app; } {code} > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.patch > > > *FSStarvedApps is not thread safe, this may make one starve app is processed > for two times continuously.* > For example, when app1 is *fair share starved*, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > update to app1. At the moment, app1 is *starved by min share*, so this app > is added to appsToProcess again! Because appBeingProcessed is null and > appsToProcess also have not this one. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare
[ https://issues.apache.org/jira/browse/YARN-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-8707: -- Issue Type: Sub-task (was: Bug) Parent: YARN-5990 > It's not reasonable to decide whether app is starved by fairShare > - > > Key: YARN-8707 > URL: https://issues.apache.org/jira/browse/YARN-8707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.0.0-alpha3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Minor > Attachments: YARN-8707.patch > > > When app's usage reached demand, it's still be considered fairShare starved. > Obviously, that's not reasonable! > {code:java} > boolean isStarvedForFairShare() { > return isUsageBelowShare(getResourceUsage(), getFairShare()); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-5990 > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt AM container > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-8655: -- Issue Type: Sub-task (was: Bug) Parent: YARN-5990 > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.patch > > > *FSStarvedApps is not thread safe, this may make one starve app is processed > for two times continuously.* > For example, when app1 is fair share starved, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > update to app1. At the moment, app1 is starved by min share, so this app is > added to appsToProcess again! Because appBeingProcessed is null and > appsToProcess also have not this one. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9278: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-5990 > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the num of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum){ > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++){ > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Issue Type: Sub-task (was: Bug) Parent: YARN-6242 > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Sub-task > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: Cluster-Scheduler-Performance-5X-Promotion.png, > YARN-9276.001.patch > > > In some scenarios, applications request nonexistent resource names in the > cluster, such as hosts or racks that do not exist. > Obviously, we should actively filter or downgrade these invalid resource > requests. > *This is especially effective when HDFS and YARN are deployed on different > nodes; the scheduling throughput of one of our clusters improved by > {color:#ff}5X{color}.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
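The idea in YARN-9276 can be sketched as a pre-scheduling filter: drop any request that names a host or rack the cluster does not contain, since it can never be satisfied and only adds work to every scheduling pass. The class and method names below are hypothetical, assuming "*" plays the role of ResourceRequest.ANY; this is not the patch's actual API:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative filter for resource-request names: keep ANY ("*") and names
// that match a known node or rack, drop everything else.
public class ResourceRequestFilter {
    static final String ANY = "*"; // wildcard request, matches any node

    static List<String> filterInvalid(List<String> requestedNames,
                                      Set<String> knownNodesAndRacks) {
        return requestedNames.stream()
                .filter(n -> ANY.equals(n) || knownNodesAndRacks.contains(n))
                .collect(Collectors.toList());
    }
}
```

Requests for hosts that only exist in a separate HDFS deployment would be removed here once, instead of being re-examined on every heartbeat.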
[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)
[ https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764374#comment-16764374 ] Zhaohui Xin commented on YARN-6487: --- {quote}it seems continuous scheduling will impact scheduler performance. {quote} Hi, [~imstefanlee]. Can you provide some test results to illustrate this? > FairScheduler: remove continuous scheduling (YARN-1010) > --- > > Key: YARN-6487 > URL: https://issues.apache.org/jira/browse/YARN-6487 > Project: Hadoop YARN > Issue Type: Task > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > Remove deprecated FairScheduler continuous scheduler code -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally
[ https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9184: -- Attachment: YARN-9184.005.patch > Docker run doesn't pull down latest image if the image exists locally > -- > > Key: YARN-9184 > URL: https://issues.apache.org/jira/browse/YARN-9184 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.1.0, 3.0.3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9184.001.patch, YARN-9184.002.patch, > YARN-9184.003.patch, YARN-9184.004.patch, YARN-9184.005.patch > > > See [docker run doesn't pull down latest image if the image exists > locally|https://github.com/moby/moby/issues/13331]. > So, I think we should pull image before run to make image always latest. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally
[ https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764186#comment-16764186 ] Zhaohui Xin commented on YARN-9184: --- {quote}I think Mockito update in HADOOP-14178 may have broken this patch. The patch doesn't compile anymore. [~uranus] could you take a look? Thanks {quote} [~eyang], patch 004 has been broken by -HADOOP-14178,- I attached new patch. > Docker run doesn't pull down latest image if the image exists locally > -- > > Key: YARN-9184 > URL: https://issues.apache.org/jira/browse/YARN-9184 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.1.0, 3.0.3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9184.001.patch, YARN-9184.002.patch, > YARN-9184.003.patch, YARN-9184.004.patch > > > See [docker run doesn't pull down latest image if the image exists > locally|https://github.com/moby/moby/issues/13331]. > So, I think we should pull image before run to make image always latest. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9278: -- Description: We should *shuffle* the nodes to avoid some nodes being preempted frequently. Also, we should *limit* the number of nodes to make preemption more efficient. Just like this, {code:java} // we should not iterate over all nodes; that would be very slow long maxTryNodeNum = context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); if (potentialNodes.size() > maxTryNodeNum) { Collections.shuffle(potentialNodes); List newPotentialNodes = new ArrayList(); for (int i = 0; i < maxTryNodeNum; i++) { newPotentialNodes.add(potentialNodes.get(i)); } potentialNodes = newPotentialNodes; } {code} was:We should shuffle the nodes to avoid some nodes being preempted frequently. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the number of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate over all nodes; that would be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum) { > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++) { > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9278: -- Description: We should shuffle the nodes to avoid some nodes being preempted frequently. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should shuffle the nodes to avoid some nodes being preempted frequently. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9278: -- Description: We should *shuffle* the nodes to avoid some nodes being preempted frequently. Also, we should *limit* the num of nodes to make preemption more efficient. Just like this, {code:java} // we should not iterate all nodes, that will be very slow long maxTryNodeNum = context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); if (potentialNodes.size() > maxTryNodeNum){ Collections.shuffle(potentialNodes); List newPotentialNodes = new ArrayList(); for (int i = 0; i < maxTryNodeNum; i++){ newPotentialNodes.add(potentialNodes.get(i)); } potentialNodes = newPotentialNodes; {code} was: We should *shuffle* the nodes to avoid some nodes being preempted frequently. Also, we should *limit* the num of nodes to make preemption more efficient. Just like this, {code:java} // we should not iterate all nodes, that will be very slow long maxTryNodeNum = context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); if (potentialNodes.size() > maxTryNodeNum){ Collections.shuffle(potentialNodes); List newPotentialNodes = new ArrayList(); for (int i = 0; i < maxTryNodeNum; i++){ newPotentialNodes.add(potentialNodes.get(i)); } potentialNodes = newPotentialNodes; {code} > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the num of nodes to make preemption more efficient. 
> Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum){ > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++){ > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
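The shuffle-then-limit snippet quoted above can be tightened with generics and subList, avoiding the manual copy loop. A self-contained sketch (generic element type and method name are mine; in the scheduler the elements would be node objects and maxTryNodeNum would come from the preemption config as quoted):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PreemptionNodeLimiter {
    // Shuffle, then truncate: every node gets an equal chance of being
    // selected, so preemption pressure spreads across the cluster over time,
    // and a single round never scans more than maxTryNodeNum nodes.
    static <T> List<T> shuffleAndLimit(List<T> potentialNodes, int maxTryNodeNum) {
        if (potentialNodes.size() <= maxTryNodeNum) {
            return potentialNodes;
        }
        // copy before shuffling so an immutable or shared input list is safe
        List<T> shuffled = new ArrayList<>(potentialNodes);
        Collections.shuffle(shuffled);
        return shuffled.subList(0, maxTryNodeNum);
    }
}
```

One design note: because the result is a random sample rather than a deterministic prefix, a unit test can only assert on size and membership, which may be why the original patch was hard to test directly.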
[jira] [Created] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
Zhaohui Xin created YARN-9278: - Summary: Shuffle nodes when selecting to be preempted nodes Key: YARN-9278 URL: https://issues.apache.org/jira/browse/YARN-9278 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhaohui Xin Assignee: Zhaohui Xin -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Description: I think we should add more restrictions in fair scheduler preemption. * We should not preempt AM container * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ... was: I think we should add more restrictions when preempti * We should not preempt AM container * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ... > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt AM container > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9277) Add more restrictions when preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Summary: Add more restrictions when preemption (was: Add more restrictions when preemption) > Add more restrictions when preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > > I think we should add more restrictions when preempting > * We should not preempt AM containers > * We should not preempt high-priority jobs > * We should not preempt containers that have been running for a long time. > * ...
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Summary: Add more restrictions In FairScheduler Preemption (was: Add more restrictions when preemption ) > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > > I think we should add more restrictions when preempting > * We should not preempt AM containers > * We should not preempt high-priority jobs > * We should not preempt containers that have been running for a long time. > * ...
[jira] [Updated] (YARN-9277) Add more restrictions when preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Description: I think we should add more restrictions when preempting * We should not preempt AM containers * We should not preempt high-priority jobs * We should not preempt containers that have been running for a long time. * ... > Add more restrictions when preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > > I think we should add more restrictions when preempting > * We should not preempt AM containers > * We should not preempt high-priority jobs > * We should not preempt containers that have been running for a long time. > * ...
[jira] [Created] (YARN-9277) Add more restrictions when preemption
Zhaohui Xin created YARN-9277: - Summary: Add more restrictions when preemption Key: YARN-9277 URL: https://issues.apache.org/jira/browse/YARN-9277 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhaohui Xin Assignee: Zhaohui Xin
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Attachment: (was: image-2019-02-03-14-58-48-148.png) > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.* > > !image-2019-02-03-14-58-48-148.png!
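The filtering idea in YARN-9276 can be sketched as a pre-pass over requested resource names against the set of hosts and racks the RM actually knows about. Everything below is an assumption for illustration: the class and method names are hypothetical, and plain strings stand in for the real `ResourceRequest` handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch: drop resource requests naming hosts/racks that do
// not exist in the cluster. Plain strings stand in for ResourceRequest.
public class RequestFilter {
  static final String ANY = "*"; // the off-switch resource name always matches

  static List<String> filterKnown(List<String> requestedNames, Set<String> knownHostsAndRacks) {
    List<String> valid = new ArrayList<>();
    for (String name : requestedNames) {
      // Keep ANY and every name the RM knows; drop the rest. A real
      // implementation might instead degrade unknown hosts to rack/ANY.
      if (ANY.equals(name) || knownHostsAndRacks.contains(name)) {
        valid.add(name);
      }
    }
    return valid;
  }
}
```

Skipping unknown names up front avoids repeatedly evaluating requests the scheduler can never satisfy, which is where the reported throughput gain would come from.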
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Description: In some scenarios, applications will request nonexistent resource names in this cluster, such as nonexistent hosts or racks. Obviously, we should filter or degrade these invalid resource requests actively. *This is especially effective when HDFS and YARN are deployed on different nodes, and the scheduling throughput of one of our clusters has improved by {color:#ff}5X{color}.* was: In some scenarios, applications will request nonexistent resource names in this cluster, such as nonexistent hosts or racks. Obviously, we should filter or degrade these invalid resource requests actively. *This is especially effective when HDFS and YARN are deployed on different nodes, and the scheduling throughput of one of our clusters has improved by {color:#ff}5X{color}.* !image-2019-02-03-14-58-48-148.png! > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.*
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Attachment: Cluster-Scheduler-Performance-5X-Promotion.png > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: Cluster-Scheduler-Performance-5X-Promotion.png > > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.*
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Attachment: (was: image-2019-02-03-14-58-31-246.png) > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.* > > !image-2019-02-03-14-58-48-148.png!
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Attachment: image-2019-02-03-14-58-48-148.png > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: image-2019-02-03-14-58-31-246.png, > image-2019-02-03-14-58-48-148.png > > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.* > > !image-2019-02-03-14-58-48-148.png!
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Description: In some scenarios, applications will request nonexistent resource names in this cluster, such as nonexistent hosts or racks. Obviously, we should filter or degrade these invalid resource requests actively. *This is especially effective when HDFS and YARN are deployed on different nodes, and the scheduling throughput of one of our clusters has improved by {color:#ff}5X{color}.* !image-2019-02-03-14-58-48-148.png! was: In some scenarios, applications will request nonexistent resource names in this cluster, such as nonexistent hosts or racks. Obviously, we should filter or degrade these invalid resource requests actively. *This is especially effective when HDFS and YARN are deployed on different nodes, and the scheduling throughput of one of our clusters has improved by {color:#ff}5X{color}.* !image-2019-02-03-14-58-31-246.png! > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: image-2019-02-03-14-58-31-246.png, > image-2019-02-03-14-58-48-148.png > > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.* > > !image-2019-02-03-14-58-48-148.png!
[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively
[ https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9276: -- Description: In some scenarios, applications will request nonexistent resource names in this cluster, such as nonexistent hosts or racks. Obviously, we should filter or degrade these invalid resource requests actively. *This is especially effective when HDFS and YARN are deployed on different nodes, and the scheduling throughput of one of our clusters has improved by {color:#ff}5X{color}.* !image-2019-02-03-14-58-31-246.png! was: In some scenarios, applications will request nonexistent resource names in this cluster, such as nonexistent hosts or racks. Obviously, we should filter or degrade these invalid resource requests actively. *This is especially effective when HDFS and YARN are deployed on different nodes, and the scheduling throughput of one of our clusters has improved by {color:#FF}5X{color}.* > Filter non-existent resource requests actively > -- > > Key: YARN-9276 > URL: https://issues.apache.org/jira/browse/YARN-9276 > Project: Hadoop YARN > Issue Type: Bug > Components: RM, scheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: image-2019-02-03-14-58-31-246.png, > image-2019-02-03-14-58-48-148.png > > > In some scenarios, applications will request nonexistent resource names in > this cluster, such as nonexistent hosts or racks. > Obviously, we should filter or degrade these invalid resource requests > actively. > *This is especially effective when HDFS and YARN are deployed on different > nodes, and the scheduling throughput of one of our clusters has improved by > {color:#ff}5X{color}.* > > !image-2019-02-03-14-58-31-246.png!