[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-17 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794680#comment-16794680
 ] 

Zhaohui Xin commented on YARN-9278:
---

[~wilfreds], thanks for your reply. I attached a new patch which resolves the 
code style issues and updates the description in FairScheduler.md.
{quote}We need to either add some tests or explain why we cannot add tests.
{quote}
I think it's difficult to test because we can't predict the number of nodes that 
will be preempted. Can you give me some suggestions? 
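
One way to get a deterministic test, as a rough sketch (the limitAndShuffle 
helper below is a hypothetical stand-in for the patch's selection step, not code 
from it): instead of asserting which nodes end up preempted, assert the 
properties of the trimmed candidate list that do not depend on the random 
shuffle, namely its size and membership.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffleLimitTestSketch {
  // Hypothetical stand-in for the patch's shuffle-and-limit step.
  static <T> List<T> limitAndShuffle(List<T> nodes, int maxNum) {
    if (nodes.size() <= maxNum) {
      return nodes;
    }
    List<T> copy = new ArrayList<>(nodes);
    Collections.shuffle(copy);
    return copy.subList(0, maxNum);
  }

  public static void main(String[] args) {
    List<Integer> nodes = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      nodes.add(i);
    }
    List<Integer> picked = limitAndShuffle(nodes, 10);
    // Deterministic even though the shuffle is random:
    System.out.println(picked.size() == 10);        // true
    System.out.println(nodes.containsAll(picked));  // true
  }
}
{code}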

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, 
> YARN-9278.003.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-17 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9278:
--
Attachment: YARN-9278.003.patch

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, 
> YARN-9278.003.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-14 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792415#comment-16792415
 ] 

Zhaohui Xin commented on YARN-9344:
---

[~wilfreds], I updated the patch. Can you help me review it? :D
{quote}The test only checks memory; we should also cover other resource 
types in the test, not just the memory resource (vcores, custom resource types).
{quote}
In the new patch, we test CPU and memory separately, so there is no need to test 
other resource types.
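
Purely as an illustration of that per-dimension argument (a sketch, not the 
patch's actual test, and assuming hadoop-yarn-api and hadoop-yarn-common on the 
classpath): one request exceeds the node total only in memory, the other only in 
vcores, and the fits check rejects each independently.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class PerDimensionFitSketch {
  public static void main(String[] args) {
    Resource nodeTotal = Resource.newInstance(8192, 8); // 8 GB, 8 vcores
    Resource memHeavy = Resource.newInstance(16384, 1); // only memory too big
    Resource cpuHeavy = Resource.newInstance(1024, 16); // only vcores too big
    // fitsIn compares every dimension, so each case fails on its own.
    System.out.println(Resources.fitsIn(memHeavy, nodeTotal)); // false
    System.out.println(Resources.fitsIn(cpuHeavy, nodeTotal)); // false
  }
}
{code}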

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, 
> YARN-9344.003.patch, YARN-9344.004.patch, YARN-9344.005.patch, 
> YARN-9344.006.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9344:
--
Attachment: YARN-9344.006.patch

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, 
> YARN-9344.003.patch, YARN-9344.004.patch, YARN-9344.005.patch, 
> YARN-9344.006.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9278:
--
Attachment: YARN-9278.002.patch

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9344:
--
Attachment: YARN-9344.005.patch

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, 
> YARN-9344.003.patch, YARN-9344.004.patch, YARN-9344.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9369) Yarn RM metrics test build failed

2019-03-08 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787889#comment-16787889
 ] 

Zhaohui Xin commented on YARN-9369:
---

Thanks for your reply, [~Prabhu Joseph]. This is indeed a duplicate.

> Yarn RM metrics test build failed
> -
>
> Key: YARN-9369
> URL: https://issues.apache.org/jira/browse/YARN-9369
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Priority: Major
>
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics
>  
> {code:java}
> java.lang.AssertionError: Expected 2 events to be published expected:<2> but 
> was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9369) Yarn RM metrics test build failed

2019-03-08 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9369:
--
Description: 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics

 
{code:java}
java.lang.AssertionError: Expected 2 events to be published expected:<2> but 
was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}

  was:
 
{code:java}
java.lang.AssertionError: Expected 2 events to be published expected:<2> but 
was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}


> Yarn RM metrics test build failed
> -
>
> Key: YARN-9369
> URL: https://issues.apache.org/jira/browse/YARN-9369
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Priority: Major
>
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics
>  
> {code:java}
> java.lang.AssertionError: Expected 2 events to be published expected:<2> but 
> was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> 

[jira] [Created] (YARN-9369) Yarn RM metrics test build failed

2019-03-08 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-9369:
-

 Summary: Yarn RM metrics test build failed
 Key: YARN-9369
 URL: https://issues.apache.org/jira/browse/YARN-9369
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhaohui Xin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9369) Yarn RM metrics test build failed

2019-03-08 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9369:
--
Description: 
 
{code:java}
java.lang.AssertionError: Expected 2 events to be published expected:<2> but 
was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}

> Yarn RM metrics test build failed
> -
>
> Key: YARN-9369
> URL: https://issues.apache.org/jira/browse/YARN-9369
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Priority: Major
>
>  
> {code:java}
> java.lang.AssertionError: Expected 2 events to be published expected:<2> but 
> was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.verifyEntity(TestSystemMetricsPublisherForV2.java:332)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2.testPublishAppAttemptMetrics(TestSystemMetricsPublisherForV2.java:259)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-08 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9344:
--
Attachment: YARN-9344.004.patch

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, 
> YARN-9344.003.patch, YARN-9344.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-07 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9344:
--
Attachment: YARN-9344.003.patch

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, 
> YARN-9344.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-06 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785603#comment-16785603
 ] 

Zhaohui Xin commented on YARN-9344:
---

The whitespace error is not related to this patch; 
[YARN-9348|https://issues.apache.org/jira/browse/YARN-9348] will fix it.

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-06 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785603#comment-16785603
 ] 

Zhaohui Xin edited comment on YARN-9344 at 3/6/19 1:00 PM:
---

The whitespace error is not related to this patch; YARN-9348 will fix it.


was (Author: uranus):
The whitespace error is not related to this patch,  
[YARN-9348|https://issues.apache.org/jira/browse/YARN-9348] will fix it.

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-05 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785202#comment-16785202
 ] 

Zhaohui Xin commented on YARN-9344:
---

[~wilfreds], thanks for your reply. I updated the patch to short-circuit before 
assigning the container. Can you help me review it? :D
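
For readers following the thread, a minimal sketch of the kind of short-circuit 
meant here (illustrative only; the method name and placement are assumptions, 
not the patch itself): before attempting assignment or reservation, check 
whether the container could ever fit on the node's total resource.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ReserveShortCircuitSketch {
  // If the capability exceeds the node's total resource, the request can
  // never be satisfied on this node, so reserving would be pointless.
  static boolean worthConsidering(Resource capability, Resource nodeTotal) {
    return Resources.fitsIn(capability, nodeTotal);
  }

  public static void main(String[] args) {
    Resource nodeTotal = Resource.newInstance(8192, 8);  // 8 GB, 8 vcores
    Resource container = Resource.newInstance(16384, 4); // can never fit
    if (!worthConsidering(container, nodeTotal)) {
      System.out.println("skip: container can never fit on this node");
    }
  }
}
{code}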

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-05 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9344:
--
Attachment: YARN-9344.002.patch

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-04 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9344:
--
Summary: FS should not reserve when container capability is bigger than 
node total resource  (was: FS should not reserve when node total resource can 
not meet container capability)

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9344) FS should not reserve when node total resource can not meet container capability

2019-03-04 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-9344:
-

Assignee: Zhaohui Xin

> FS should not reserve when node total resource can not meet container 
> capability
> 
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9344) FS should not reserve when node total resource can not meet container capability

2019-03-04 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-9344:
-

 Summary: FS should not reserve when node total resource can not 
meet container capability
 Key: YARN-9344
 URL: https://issues.apache.org/jira/browse/YARN-9344
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhaohui Xin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-03-04 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783239#comment-16783239
 ] 

Zhaohui Xin commented on YARN-6487:
---

Thanks for your explanation. 

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6225) Global scheduler applies to Fair scheduler

2019-03-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-6225:
-

Assignee: Zhaohui Xin

> Global scheduler applies to Fair scheduler
> --
>
> Key: YARN-6225
> URL: https://issues.apache.org/jira/browse/YARN-6225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Jie
>Assignee: Zhaohui Xin
>Priority: Major
>
> IIRC, in global scheduling, the logic for scheduling constraints such as node 
> labels and affinity/anti-affinity takes place before the scheduler tries to 
> commit a ResourceCommitRequest. This logic looks like it could be shared by 
> FairScheduler and CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-28 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528
 ] 

Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:45 PM:


Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' 
heartbeats trigger original scheduling, the continuous scheduling thread will 
be starved because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.
{quote}


was (Author: uranus):
Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' 
heartbeats trigger original scheduling, the continuous scheduling thread will 
be starved because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.
{quote}

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-28 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528
 ] 

Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:43 PM:


Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' 
heartbeats trigger scheduling, the continuous scheduling thread will be starved 
because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.
{quote}


was (Author: uranus):
Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node 
heartbeats trigger scheduling, the continuous scheduling thread will be starved 
because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.{quote}

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-28 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528
 ] 

Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:43 PM:


Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' 
heartbeats trigger original scheduling, the continuous scheduling thread will 
be starved because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.
{quote}


was (Author: uranus):
Hi [~wilfreds]. Please correct me if I am wrong. When a large number of nodes' 
heartbeats trigger scheduling, the continuous scheduling thread will be starved 
because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.
{quote}

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-28 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528
 ] 

Zhaohui Xin edited comment on YARN-6487 at 2/28/19 1:42 PM:


Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node 
heartbeats trigger scheduling, the continuous scheduling thread will be starved 
because of lock conflict.
{quote}The side effect is however that when a cluster grows (100+ nodes) the 
number of heartbeats that needed processing started interfering with the 
continuous scheduling thread and other internal threads. This does cause thread 
starvation and in the worst case scheduling comes to a standstill.{quote}


was (Author: uranus):
Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node 
heartbeats trigger scheduling, the continuous scheduling thread will be starved 
because of lock conflict.

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-28 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780528#comment-16780528
 ] 

Zhaohui Xin commented on YARN-6487:
---

Hi [~wilfreds]. Please correct me if I am wrong. When a large number of node 
heartbeats trigger scheduling, the continuous scheduling thread will be starved 
because of lock conflict.

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-24 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776539#comment-16776539
 ] 

Zhaohui Xin commented on YARN-6487:
---

Hi, [~wilfreds]. Can you share the reasons why we should remove the continuous 
scheduling code from FairScheduler?

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776536#comment-16776536
 ] 

Zhaohui Xin commented on YARN-9278:
---

Thanks for your suggestions, [~wilfreds]. I also think it's better to randomize 
nodes when the number of nodes exceeds a certain threshold.

Maybe our change could look like this:
{code:java}
List<FSSchedulerNode> potentialNodes = scheduler.getNodeTracker()
    .getNodesByResourceName(rr.getResourceName());
int maxTryNodeNumOnce = conf.getMaxTryNodeNumOnce();

// we should not iterate all nodes, that will be very slow
if (ResourceRequest.ANY.equals(rr.getResourceName()) &&
    potentialNodes.size() > maxTryNodeNumOnce) {
  Collections.shuffle(potentialNodes);
  potentialNodes = potentialNodes.subList(0, maxTryNodeNumOnce);
}
{code}

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776207#comment-16776207
 ] 

Zhaohui Xin edited comment on YARN-9278 at 2/24/19 10:14 AM:
-

Thanks for your reply, [~yufeigu]. I think another solution is to stop looking 
for nodes when we find a suitable one. 


was (Author: uranus):
Thanks for your reply, [~yufeigu]. Another solution is to stop looking for 
nodes when we find a suitable one. 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776207#comment-16776207
 ] 

Zhaohui Xin commented on YARN-9278:
---

Thanks for your reply, [~yufeigu]. Another solution is to stop looking for 
nodes when we find a suitable one. 
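
As a rough sketch of that early exit (the isSuitable predicate is hypothetical; 
the real check would compare the node's preemptable resources against the 
starved request):
{code:java}
import java.util.List;
import java.util.function.Predicate;

public class FirstFitSketch {
  // Return the first acceptable node instead of scoring the whole cluster.
  static <T> T firstSuitable(List<T> nodes, Predicate<T> isSuitable) {
    for (T node : nodes) {
      if (isSuitable.test(node)) {
        return node; // stop looking as soon as one fits
      }
    }
    return null; // nothing fits; the caller skips this request
  }

  public static void main(String[] args) {
    List<Integer> preemptableMemPerNode = List.of(1024, 2048, 8192, 512);
    Integer hit = firstSuitable(preemptableMemPerNode, mem -> mem >= 4096);
    System.out.println(hit); // 8192
  }
}
{code}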

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only

2019-02-21 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-7904:
-

Assignee: Eric Yang  (was: Zhaohui Xin)

> Privileged, trusted containers need all of their bind-mounted directories to 
> be read-only
> -
>
> Key: YARN-7904
> URL: https://issues.apache.org/jira/browse/YARN-7904
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Since they will be running as a user other than themselves, the NM likely 
> won't be able to clean up after them because of permission issues. So, to 
> prevent this, we should make these directories read-only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only

2019-02-21 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774703#comment-16774703
 ] 

Zhaohui Xin commented on YARN-7904:
---

[~eyang], I am not working on this. Please feel free to take it. :D

> Privileged, trusted containers need all of their bind-mounted directories to 
> be read-only
> -
>
> Key: YARN-7904
> URL: https://issues.apache.org/jira/browse/YARN-7904
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Zhaohui Xin
>Priority: Major
>  Labels: Docker
>
> Since they will be running as a user other than themselves, the NM likely 
> won't be able to clean up after them because of permission issues. So, to 
> prevent this, we should make these directories read-only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-21 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773792#comment-16773792
 ] 

Zhaohui Xin commented on YARN-9278:
---

{quote}Without introducing more complexity to FS preemption (it is already very 
complicated), there are some workarounds you can try: increase the FairShare 
Preemption Timeout and FairShare Preemption Threshold to reduce the chance of 
preemption. This is especially useful for a large cluster, since there is more 
chance to get resources just by waiting.
{quote}
If our cluster has a lot of long-running jobs, the above method is not helpful. 

We have used this optimization for more than a year, and it has improved 
preemption performance noticeably. BTW, we have more than 10 clusters, and most 
of them have about 10K nodes.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-19 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772638#comment-16772638
 ] 

Zhaohui Xin edited comment on YARN-9278 at 2/20/19 5:32 AM:


Hi, [~yufeigu]. When the preemption thread satisfies a starved container with ANY 
as the resource name, it searches all nodes of the cluster for the best node. This 
is costly when the cluster has more than 10k nodes.

I think we should limit the number of nodes considered in such a situation. What 
do you think? :D


was (Author: uranus):
Hi, [~yufeigu]. When the preemption thread satisfies a starved container with ANY 
as the resource name, it searches all nodes of the cluster for the best node. This 
is costly when the cluster has more than 10k nodes.

I think we should limit the number of nodes considered in such a situation. What 
do you think? :D

 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-19 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772638#comment-16772638
 ] 

Zhaohui Xin commented on YARN-9278:
---

Hi, [~yufeigu]. When the preemption thread satisfies a starved container with ANY 
as the resource name, it searches all nodes of the cluster for the best node. This 
is costly when the cluster has more than 10k nodes.

I think we should limit the number of nodes considered in such a situation. What 
do you think? :D

 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this:
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-17 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770420#comment-16770420
 ] 

Zhaohui Xin commented on YARN-9277:
---

In my opinion, preempting one container which has been running for more than 10 
hours wastes as much work as preempting 10 containers which have each been 
running for 1 hour.

So we should preempt short-running containers first.

[~yufeigu], [~wilfreds], what do you think? I attached a new patch. :D
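
To make the proposed ordering concrete, a small sketch (the Candidate record is 
a hypothetical stand-in; in YARN the start time would come from the container's 
metadata): sort candidates so the most recently started, i.e. shortest-running, 
containers are preempted first, since they carry the least lost work.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrderSketch {
  // Hypothetical stand-in for a running container and its start time.
  record Candidate(String id, long startTimeMs) {}

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>(List.of(
        new Candidate("c1", 1_000L),   // longest-running: preempt last
        new Candidate("c2", 9_000L),   // newest: cheapest to preempt
        new Candidate("c3", 5_000L)));
    // Latest start time first == least work lost when preempted.
    candidates.sort(
        Comparator.comparingLong(Candidate::startTimeMs).reversed());
    candidates.forEach(c -> System.out.println(c.id())); // c2, c3, c1
  }
}
{code}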

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch, 
> YARN-9277.003.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should preempt short-running containers first
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-17 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Description: 
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt self
 * We should not preempt short-running containers firstly
 * ...

  was:
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt self
 * We should not preempt high priority job
 * We should not preempt container which has been running for a long time.
 * ...


> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch, 
> YARN-9277.003.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt short-running containers firstly
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-17 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Description: 
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt self
 * We should preempt short-running containers first
 * ...

  was:
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt self
 * We should not preempt short-running containers firstly
 * ...


> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch, 
> YARN-9277.003.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should preempt short-running containers first
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-17 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Attachment: YARN-9277.003.patch

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch, 
> YARN-9277.003.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Assigned] (YARN-7021) TestResourceUtils to be moved to hadoop-yarn-api package

2019-02-15 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-7021:
-

Assignee: Zhaohui Xin

> TestResourceUtils to be moved to hadoop-yarn-api package
> 
>
> Key: YARN-7021
> URL: https://issues.apache.org/jira/browse/YARN-7021
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-3926
>Reporter: Sunil Govindan
>Assignee: Zhaohui Xin
>Priority: Major
>
> The ResourceUtils class is now in yarn-api. It would be better to move its test 
> class there as well; however, these tests use a lot of resources and rely on 
> ConfigurationProvider, which is available only in yarn-common. Hence, 
> investigate and improve the tests for the ResourceUtils class.






[jira] [Assigned] (YARN-6971) Clean up different ways to create resources

2019-02-15 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-6971:
-

Assignee: (was: Zhaohui Xin)

> Clean up different ways to create resources
> ---
>
> Key: YARN-6971
> URL: https://issues.apache.org/jira/browse/YARN-6971
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> There are several ways to create a {{resource}} object, e.g., 
> BuilderUtils.newResource() and Resources.createResource(). These methods not 
> only cause confusion but also performance issues; for example, 
> BuilderUtils.newResource() is significantly slower than 
> Resources.createResource(). 
> We could merge them somehow, and replace most BuilderUtils.newResource() 
> with Resources.createResource().






[jira] [Assigned] (YARN-6971) Clean up different ways to create resources

2019-02-15 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-6971:
-

Assignee: Zhaohui Xin

> Clean up different ways to create resources
> ---
>
> Key: YARN-6971
> URL: https://issues.apache.org/jira/browse/YARN-6971
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Yufei Gu
>Assignee: Zhaohui Xin
>Priority: Minor
>  Labels: newbie
>
> There are several ways to create a {{resource}} object, e.g., 
> BuilderUtils.newResource() and Resources.createResource(). These methods not 
> only cause confusion but also performance issues; for example, 
> BuilderUtils.newResource() is significantly slower than 
> Resources.createResource(). 
> We could merge them somehow, and replace most BuilderUtils.newResource() 
> with Resources.createResource().
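
For reference, a small sketch of the kind of replacement this cleanup implies; it assumes the standard Hadoop utility classes and is not taken from an actual patch:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.utils.BuilderUtils;
import org.apache.hadoop.yarn.util.resource.Resources;

static void example() {
  // Both lines build a 1024 MB / 1 vcore Resource; the cleanup would
  // standardize on the cheaper Resources.createResource() call.
  Resource viaBuilder = BuilderUtils.newResource(1024, 1);
  Resource viaResources = Resources.createResource(1024, 1);
}
{code}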






[jira] [Assigned] (YARN-7518) Node manager should allow resource units to be lower cased

2019-02-15 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-7518:
-

Assignee: Zhaohui Xin

> Node manager should allow resource units to be lower cased
> --
>
> Key: YARN-7518
> URL: https://issues.apache.org/jira/browse/YARN-7518
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: Daniel Templeton
>Assignee: Zhaohui Xin
>Priority: Major
>
> When we do units checks, we should ignore case.
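
A minimal sketch of the idea, with assumed variable names rather than the patch's actual code:
{code:java}
import java.util.Locale;

// Hedged sketch: normalize both units before comparing, so "Mi" and "mi"
// are accepted as the same unit during resource unit checks.
static boolean unitsMatch(String expectedUnit, String actualUnit) {
  return expectedUnit.toLowerCase(Locale.ENGLISH)
      .equals(actualUnit.toLowerCase(Locale.ENGLISH));
}
{code}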






[jira] [Assigned] (YARN-6611) ResourceTypes should be renamed

2019-02-15 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-6611:
-

Assignee: Zhaohui Xin

> ResourceTypes should be renamed
> ---
>
> Key: YARN-6611
> URL: https://issues.apache.org/jira/browse/YARN-6611
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Zhaohui Xin
>Priority: Major
>
> {{ResourceTypes}} is too close to the unrelated {{ResourceType}} class.  
> Maybe {{ResourceClass}} would be better?






[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9302:
--
Description: I think it's more flexible to make maxAssign configurable at the 
NM side. After that, we can assign a different number of containers on each 
node.  (was: I think it's more flexible to make maxAssign configurable at NM side. )

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> I think it's more flexible to make maxAssign configurable at the NM side. After 
> that, we can assign a different number of containers on each node.
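
One possible shape of such a setting, sketched under the assumption that a per-NM property overrides the cluster-wide default; the property key below is hypothetical, not an existing YARN configuration name:
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: read a per-NM maxAssign so the scheduler can assign a
// different number of containers per heartbeat on different nodes.
// "yarn.nodemanager.max-assign" is a hypothetical key, not a real one.
static int readNodeMaxAssign(Configuration conf, int clusterWideMaxAssign) {
  return conf.getInt("yarn.nodemanager.max-assign", clusterWideMaxAssign);
}
{code}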






[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9302:
--
Description: I think it's more flexible to make maxAssign configurable at the 
NM side.   (was: I think it's more flexible to config)

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> I think it's more flexible to make maxAssign configurable at the NM side. 






[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9302:
--
Description: I think it's more flexible to config

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> I think it's more flexible to config






[jira] [Assigned] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-9302:
-

Assignee: Zhaohui Xin

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>







[jira] [Created] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-9302:
-

 Summary: make maxAssign configurable at NM side
 Key: YARN-9302
 URL: https://issues.apache.org/jira/browse/YARN-9302
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhaohui Xin









[jira] [Assigned] (YARN-2499) Respect labels in preemption policy of fair scheduler

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-2499:
-

Assignee: Zhaohui Xin

> Respect labels in preemption policy of fair scheduler
> -
>
> Key: YARN-2499
> URL: https://issues.apache.org/jira/browse/YARN-2499
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Zhaohui Xin
>Priority: Major
>







[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766890#comment-16766890
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~Steven Rand], thanks for your reply. 

If a long-running task is preempted, its next attempt will run for a similarly 
long time. If that attempt is also preempted, the job will be difficult to 
finish.

Also, I think it's not reasonable to limit long-running apps to specific 
queues; that approach is not generic. Maybe we have a better solution?
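
To make the idea concrete, a rough sketch of such a restriction; the threshold and method names are assumptions, not code from the attached patches:
{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

// Hedged sketch: treat a container as preemptable only while its runtime is
// below a configurable threshold, since redoing dozens of hours of work is
// very costly. "maxPreemptableRuntimeMs" is a hypothetical config value.
static boolean isPreemptable(RMContainer container, long nowMs,
    long maxPreemptableRuntimeMs) {
  return nowMs - container.getCreationTime() <= maxPreemptableRuntimeMs;
}
{code}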

 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766880#comment-16766880
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi [~wilfreds], you can see issue 
[YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: an application may 
preempt itself in case of minshare preemption. In my opinion, even if this could 
never happen, we should still add this sanity check. 
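
A minimal sketch of such a sanity check; the names are illustrative and not the attached patch's actual code:
{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt;

// Hedged sketch: never select a container that belongs to the starved
// application we are preempting for, so an app cannot preempt itself.
static boolean isSelfPreemption(FSAppAttempt starvedApp,
    RMContainer candidate) {
  return candidate.getApplicationAttemptId()
      .equals(starvedApp.getApplicationAttemptId());
}
{code}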

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-8061:
-

Assignee: Zhaohui Xin

> An application may preempt itself in case of minshare preemption
> 
>
> Key: YARN-8061
> URL: https://issues.apache.org/jira/browse/YARN-8061
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Yufei Gu
>Assignee: Zhaohui Xin
>Priority: Major
>
> Assume a leaf queue A's minshare is 10G of memory and its fairshare is 12G. It 
> used 4G, so its minshare-starved resources are 6G and will be distributed to all 
> its apps. Assume there are 4 apps a1, a2, a3, a4 inside, which demand 3G, 2G, 1G, 
> and 0.5G. a1 gets 3G of minshare-starved resources, a2 gets 2G, a3 gets 1G; they 
> are all considered starved apps except a4, which doesn't get any. 
> An app can preempt another under the same queue due to minshare starvation. 
> For example, a1 can preempt a4 if a4 uses more resources than its fair share, 
> which is 3G (12G/4). If a1 itself used more than 3G of memory, it will preempt 
> itself! I will create a unit test later. 
> The solution would be to check the application's fair share while distributing 
> minshare starvation; more details in method 
> {{FSLeafQueue#updateStarvedAppsMinshare()}}.
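
Following the description above, a hedged sketch of the proposed check; the helper below is an illustration, not the committed fix:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hedged sketch: while distributing a queue's minshare starvation, only
// treat an app as starved if its usage is still below its fair share in at
// least one dimension; an app at or above its fair share (like a1 using
// more than 3G) can then never preempt itself.
static boolean eligibleForMinshareStarvation(Resource usage,
    Resource fairShare) {
  // fitsIn(fairShare, usage) is true when usage >= fairShare in every dimension.
  return !Resources.fitsIn(fairShare, usage);
}
{code}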






[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766880#comment-16766880
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 7:22 AM:


Hi [~wilfreds], you can see issue YARN-8061: an application may preempt itself 
in case of minshare preemption. In my opinion, even if this could never happen, 
we should still add this as a sanity check. 


was (Author: uranus):
Hi, [~wilfreds], you can see issue 
[YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may 
preempt itself in case of minshare preemption. In my opinion, even if this will 
not happen, we should also add this sanity check. 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:17 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs all have the same priority in FairScheduler currently, 
so this restriction will only become valid after YARN-2098; I will remove this 
restriction soon.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
{quote} We should not preempt container which has been running for a long time.
{quote}
I think this is an important restriction, *because it's very costly to kill a 
task that has been running for dozens of hours.* 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs have the same priority in FairScheduler currently. So 
this restriction is invalid in community version. This will be valid after 
YARN-2098.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
{quote} We should not preempt container which has been running for a long time.
{quote}
I think this is a import restriction. *Because it's very costly to kill one 
task which has been running with dozens of hours.* 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:14 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs all have the same priority in FairScheduler currently, 
so this restriction is invalid in the community version. It will become valid 
after YARN-2098.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
{quote} We should not preempt container which has been running for a long time.
{quote}
I think this is an important restriction, *because it's very costly to kill a 
task that has been running for dozens of hours.* 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs have the same priority in FairScheduler currently. So 
this restriction is invalid in community version. This will be valid after 
[YARN-2098|https://issues.apache.org/jira/browse/YARN-2098].
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:09 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs all have the same priority in FairScheduler currently, 
so this restriction is invalid in the community version. It will become valid 
after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098].
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs have the same priority currently. So this restriction 
is invalid in community version. BTW, we honored app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community should 
also change like this, but this is another problem.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:58 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs all have the same priority currently, so this restriction 
is invalid in the community version. BTW, we honor the app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community version 
should do the same, but that is a separate problem.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs have the same priority currently. So this restriction 
is invalid in community version. BTW, we honored app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community should 
also change like this, but this is another problem.

 

 
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:57 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. Yarn jobs all have the same priority currently, so this restriction 
is invalid in the community version. BTW, we honor the app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community version 
should do the same, but that is a separate problem.

 

 
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply.
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766694#comment-16766694
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~yufeigu]. Thanks for your reply.
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766143#comment-16766143
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~yufeigu]. Can you help me review this patch? :D

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Attachment: YARN-9277.002.patch

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Description: 
 

I think we should add more restrictions in fair scheduler preemption. 
 * An application should not preempt itself
 * We should not preempt high-priority jobs
 * We should not preempt containers that have been running for a long time.
 * ...

  was:
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt AM container
 * We should not preempt high priority job
 * We should not preempt container which has been running for a long time.
 * ...


> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * An application should not preempt itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers that have been running for a long time.
>  * ...






[jira] [Comment Edited] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766008#comment-16766008
 ] 

Zhaohui Xin edited comment on YARN-8655 at 2/12/19 1:09 PM:


[~wilfreds], I accidentally discovered this problem in our production cluster 
a few months ago. *I think satisfying fair share starvation is enough, so I 
removed min share starvation, which finally fixed this problem.* 

I just learned that the community will also abolish min share in the future. 
After YARN-9066, this issue will no longer be needed.

Thanks for your reply. :D


was (Author: uranus):
[~wilfreds], I accidentally discovered this problem in our production cluster 
about a few months ago. *I think it's enough to satisfy fair share starvation, 
so I removed min share starvation to fix this problem finally.* 

I just learned that the community will also abolish this in the future. After 
[YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no 
longer be needed.

Thanks for your reply. :D

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 is *starved by min share*, so the 
> app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  
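
For illustration, one possible shape of a fix, as a hedged sketch rather than the actual patch: do the duplicate check and the dequeue under one lock, so an app can never be re-added in the window between take() and the appBeingProcessed update:
{code:java}
private final java.util.Deque<FSAppAttempt> appsToProcess =
    new java.util.ArrayDeque<>();
private FSAppAttempt appBeingProcessed;

synchronized void addStarvedApp(FSAppAttempt app) {
  // The check against the queue and the in-flight app is now atomic.
  if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
    appsToProcess.add(app);
    notify();
  }
}

synchronized FSAppAttempt take() throws InterruptedException {
  appBeingProcessed = null;
  while (appsToProcess.isEmpty()) {
    wait(); // releases the lock until addStarvedApp() queues an app
  }
  // Dequeue and mark in-flight inside the same critical section, closing
  // the race described in this issue.
  appBeingProcessed = appsToProcess.removeFirst();
  return appBeingProcessed;
}
{code}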






[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766008#comment-16766008
 ] 

Zhaohui Xin commented on YARN-8655:
---

[~wilfreds], I accidentally discovered this problem in our production cluster 
a few months ago. *I think satisfying fair share starvation is enough, so I 
removed min share starvation, which finally fixed this problem.* 

I just learned that the community will also abolish this in the future. After 
[YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no 
longer be needed.

Thanks for your reply. :D

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 is *starved by min share*, so the 
> app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  






[jira] [Commented] (YARN-9066) Deprecate Fair Scheduler min share

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766003#comment-16766003
 ] 

Zhaohui Xin commented on YARN-9066:
---

[~wilfreds], [~haibochen]. I fully agree. Min share starvation is very 
complicated to understand. After we remove min share starvation, 
[YARN-8655|https://issues.apache.org/jira/browse/YARN-8655] will no longer be 
needed.

> Deprecate Fair Scheduler min share
> --
>
> Key: YARN-9066
> URL: https://issues.apache.org/jira/browse/YARN-9066
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: Proposal_Deprecate_FS_Min_Share.pdf
>
>
> See the attached docs for details






[jira] [Comment Edited] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-11 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765743#comment-16765743
 ] 

Zhaohui Xin edited comment on YARN-8655 at 2/12/19 7:05 AM:


[~wilfreds] Thanks for your reply. I don't think it's reasonable to process the 
application twice, because once we preempt containers for this app, we will 
satisfy both fairshareStarvation and minshareStarvation.
{code:java}
Resource getStarvation() {
  return Resources.add(fairshareStarvation, minshareStarvation);
}
{code}


was (Author: uranus):
[~wilfreds] Thanks for your reply. I think it's not reasonable to process the 
application twice, because once we preempt containers for this app, we will 
consider both fairshareStarvation  and minshareStarvation.
{code:java}
Resource getStarvation() {
  return Resources.add(fairshareStarvation, minshareStarvation);
}
{code}

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 is *starved by min share*, so the 
> app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  






[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-11 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765743#comment-16765743
 ] 

Zhaohui Xin commented on YARN-8655:
---

[~wilfreds] Thanks for your reply. I don't think it's reasonable to process the 
application twice, because once we preempt containers for this app, we will 
consider both fairshareStarvation and minshareStarvation.
{code:java}
Resource getStarvation() {
  return Resources.add(fairshareStarvation, minshareStarvation);
}
{code}

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 is *starved by min share*, so the 
> app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  






[jira] [Comment Edited] (YARN-7710) http://ip:8088/cluster show different ID with same name

2019-02-10 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764635#comment-16764635
 ] 

Zhaohui Xin edited comment on YARN-7710 at 2/11/19 3:07 AM:


[~zjilvufe], can you reproduce this problem? I think we can locate the problem 
in the following ways:
 * Add _-verbose_ when submitting the job; this will print all job configs. You 
can check _mapreduce.job.name._
{noformat}
hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat}

 * Another way is remote debugging.


was (Author: uranus):
[~zjilvufe], can you reproduce this problem? I think we can locate the problem 
in the following ways,
 # Add _-verbose_ when submit job, this will print all job configs. You can 
check _mapreduce.job.name._
{noformat}
hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat}

 # remote debug.

> http://ip:8088/cluster show different ID with same name  
> -
>
> Key: YARN-7710
> URL: https://issues.apache.org/jira/browse/YARN-7710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.7.3
> Environment: hadoop2.7.3 
> jdk 1.8
>Reporter: jimmy
>Priority: Blocker
>
> 1. Create five threads
> 2. Submit five streaming jobs with different names
> 3. Visit http://ip:8088; we can sometimes see the same name for different IDs.






[jira] [Comment Edited] (YARN-7710) http://ip:8088/cluster show different ID with same name

2019-02-10 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764635#comment-16764635
 ] 

Zhaohui Xin edited comment on YARN-7710 at 2/11/19 3:06 AM:


[~zjilvufe], can you reproduce this problem? I think we can locate the problem 
in the following ways:
 # Add _-verbose_ when submitting the job; this will print all job configs. You 
can check _mapreduce.job.name._
{noformat}
hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat}

 # Remote debugging.


was (Author: uranus):
[~zjilvufe], can you reproduce this problem? I think we can locate the problem 
in the following ways,
 # Add _-verbose_ when submit job, this will print all job configs. You can 
check _mapreduce.job.name._

{noformat}
hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat}

 # remote debug.

> http://ip:8088/cluster show different ID with same name  
> -
>
> Key: YARN-7710
> URL: https://issues.apache.org/jira/browse/YARN-7710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.7.3
> Environment: hadoop2.7.3 
> jdk 1.8
>Reporter: jimmy
>Priority: Blocker
>
> 1. Create five threads
> 2. Submit five streaming jobs with different names
> 3. Visit http://ip:8088; we can sometimes see the same name for different IDs.






[jira] [Commented] (YARN-7710) http://ip:8088/cluster show different ID with same name

2019-02-10 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764635#comment-16764635
 ] 

Zhaohui Xin commented on YARN-7710:
---

[~zjilvufe], can you reproduce this problem? I think we can locate the problem 
in the following ways:
 # Add _-verbose_ when submitting the job; this will print all job configs. You 
can check _mapreduce.job.name._

{noformat}
hadoop jar hadoop-streaming-xx.jar -D xx=xx -verbose{noformat}

 # Remote debugging.

> http://ip:8088/cluster show different ID with same name  
> -
>
> Key: YARN-7710
> URL: https://issues.apache.org/jira/browse/YARN-7710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.7.3
> Environment: hadoop2.7.3 
> jdk 1.8
>Reporter: jimmy
>Priority: Blocker
>
> 1. Create five threads
> 2. Submit five streaming jobs with different names
> 3. Visit http://ip:8088; we can sometimes see the same name for different IDs.






[jira] [Commented] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare

2019-02-10 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764440#comment-16764440
 ] 

Zhaohui Xin commented on YARN-8707:
---

[~yufeigu], [~zsiegl], I attached a new patch; can you help me review it? :D

> It's not reasonable to decide whether app is starved by fairShare
> -
>
> Key: YARN-8707
> URL: https://issues.apache.org/jira/browse/YARN-8707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Minor
> Attachments: YARN-8707.002.patch, YARN-8707.patch
>
>
> When an app's usage has reached its demand, it is still considered fairShare 
> starved. Obviously, that's not reasonable!
> {code:java}
> boolean isStarvedForFairShare() {
> return isUsageBelowShare(getResourceUsage(), getFairShare());
> }
> {code}
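
Reading the description, the fix presumably adds a demand check; a hedged sketch of that direction (my paraphrase, not necessarily the patch's exact code):
{code:java}
boolean isStarvedForFairShare() {
  // An app whose usage has already reached its demand is not starved, even
  // if that usage is still below its fair share.
  if (Resources.fitsIn(getDemand(), getResourceUsage())) {
    return false;
  }
  return isUsageBelowShare(getResourceUsage(), getFairShare());
}
{code}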






[jira] [Updated] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-8707:
--
Attachment: YARN-8707.002.patch

> It's not reasonable to decide whether app is starved by fairShare
> -
>
> Key: YARN-8707
> URL: https://issues.apache.org/jira/browse/YARN-8707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Minor
> Attachments: YARN-8707.002.patch, YARN-8707.patch
>
>
> When an app's usage has reached its demand, it is still considered fairShare 
> starved. Obviously, that's not reasonable!
> {code:java}
> boolean isStarvedForFairShare() {
> return isUsageBelowShare(getResourceUsage(), getFairShare());
> }
> {code}






[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-10 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764413#comment-16764413
 ] 

Zhaohui Xin commented on YARN-8655:
---

[~yufeigu], [~bsteinbach], I attached a new patch; can you help me review it? :D

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 is *starved by min share*, so the 
> app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  






[jira] [Updated] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-8655:
--
Attachment: YARN-8655.002.patch

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 is *starved by min share*, so the 
> app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  






[jira] [Updated] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-8655:
--
Description: 
*FSStarvedApps is not thread safe; this may cause one starved app to be 
processed twice in a row.*

For example, when app1 is *fair share starved*, it is added to appsToProcess. 
After that, app1 is taken, but appBeingProcessed has not yet been updated to 
app1. At that moment, app1 becomes *starved by min share*, so it is added to 
appsToProcess again, because appBeingProcessed is still null and 
appsToProcess no longer contains it. 
{code:java}
void addStarvedApp(FSAppAttempt app) {
  if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
    appsToProcess.add(app);
  }
}

FSAppAttempt take() throws InterruptedException {
  // Reset appBeingProcessed before the blocking call
  appBeingProcessed = null;

  // Blocking call to fetch the next starved application
  FSAppAttempt app = appsToProcess.take();
  appBeingProcessed = app;
  return app;
}
{code}
 

  was:
*FSStarvedApps is not thread safe; this may cause one starved app to be 
processed twice in a row.*

For example, when app1 is fair share starved, it is added to appsToProcess. 
After that, app1 is taken, but appBeingProcessed has not yet been updated to 
app1. At that moment, app1 becomes starved by min share, so it is added to 
appsToProcess again, because appBeingProcessed is still null and 
appsToProcess no longer contains it. 
{code:java}
void addStarvedApp(FSAppAttempt app) {
  if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
    appsToProcess.add(app);
  }
}

FSAppAttempt take() throws InterruptedException {
  // Reset appBeingProcessed before the blocking call
  appBeingProcessed = null;

  // Blocking call to fetch the next starved application
  FSAppAttempt app = appsToProcess.take();
  appBeingProcessed = app;
  return app;
}
{code}
 


> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it is added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 becomes *starved by min share*, 
> so it is added to appsToProcess again, because appBeingProcessed is still 
> null and appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
>   if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
>     appsToProcess.add(app);
>   }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-8707:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-5990

> It's not reasonable to decide whether app is starved by fairShare
> -
>
> Key: YARN-8707
> URL: https://issues.apache.org/jira/browse/YARN-8707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Minor
> Attachments: YARN-8707.patch
>
>
> When an app's usage has reached its demand, it is still considered fairShare 
> starved. Obviously, that's not reasonable!
> {code:java}
> boolean isStarvedForFairShare() {
>   return isUsageBelowShare(getResourceUsage(), getFairShare());
> }
> {code}
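
A tiny standalone model of the proposed guard (illustrative names, not the 
attached patch): an app whose usage already covers its demand should never 
count as starved, whatever its fair share is.
{code:java}
class StarvationCheckSketch {
  long usage;     // resources currently allocated to the app
  long demand;    // resources the app actually wants
  long fairShare; // the app's computed fair share

  boolean isUsageBelow(long threshold) {
    return usage < threshold;
  }

  // Starved only if usage is below BOTH the fair share and the demand.
  boolean isStarvedForFairShare() {
    return isUsageBelow(fairShare) && isUsageBelow(demand);
  }
}
{code}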



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-5990

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch
>
>
>  
> I think we should add more restrictions in FairScheduler preemption, as 
> sketched below:
>  * We should not preempt AM containers.
>  * We should not preempt high-priority jobs.
>  * We should not preempt containers that have been running for a long time.
>  * ...
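
A standalone sketch of what such a guard could look like (names and 
thresholds are hypothetical, not taken from the attached patch):
{code:java}
class PreemptionGuardSketch {
  // Assumed thresholds, purely for illustration.
  static final long PROTECTED_RUNTIME_MS = 10 * 60 * 1000L;
  static final int HIGH_PRIORITY = 10;

  static boolean canPreempt(boolean isAMContainer, int appPriority,
                            long containerRuntimeMs) {
    if (isAMContainer) {
      return false; // never preempt the AM container
    }
    if (appPriority >= HIGH_PRIORITY) {
      return false; // leave high-priority jobs alone
    }
    if (containerRuntimeMs >= PROTECTED_RUNTIME_MS) {
      return false; // long-running work is expensive to lose
    }
    return true;
  }
}
{code}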



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-8655:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-5990

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is fair share starved, it is added to appsToProcess. 
> After that, app1 is taken, but appBeingProcessed has not yet been updated to 
> app1. At that moment, app1 becomes starved by min share, so it is added to 
> appsToProcess again, because appBeingProcessed is still null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
>   if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
>     appsToProcess.add(app);
>   }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9278:
--
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-5990

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that would be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  
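
A side note on the snippet (my suggestion, assuming the same variables): once 
the list is shuffled, the copy loop can be replaced with a subList view, which 
is a little more idiomatic:
{code:java}
Collections.shuffle(potentialNodes);
potentialNodes = new ArrayList<>(potentialNodes.subList(
    0, (int) Math.min(maxTryNodeNum, potentialNodes.size())));
{code}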



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-10 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-6242

> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: Cluster-Scheduler-Performance-5X-Promotion.png, 
> YARN-9276.001.patch
>
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*
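
A standalone sketch of the idea (illustrative names only; the real change 
would live in the RM's request-normalization path): drop node- and rack-local 
requests whose resource name the cluster does not know.
{code:java}
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class RequestFilterSketch {
  // Keep ANY ("*") requests and names the cluster actually has; a fuller
  // implementation could degrade unknown hosts to their rack or to ANY.
  static List<String> filterUnknownNames(List<String> requestedNames,
                                         Set<String> knownHostsAndRacks) {
    return requestedNames.stream()
        .filter(n -> "*".equals(n) || knownHostsAndRacks.contains(n))
        .collect(Collectors.toList());
  }
}
{code}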



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-10 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764374#comment-16764374
 ] 

Zhaohui Xin commented on YARN-6487:
---

{quote}it seems continuous scheduling will impact scheduler performance.
{quote}
Hi, [~imstefanlee]. Can you provide some test results to illustrate this?

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally

2019-02-09 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9184:
--
Attachment: YARN-9184.005.patch

> Docker run doesn't pull down latest image if the image exists locally 
> --
>
> Key: YARN-9184
> URL: https://issues.apache.org/jira/browse/YARN-9184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9184.001.patch, YARN-9184.002.patch, 
> YARN-9184.003.patch, YARN-9184.004.patch, YARN-9184.005.patch
>
>
> See [docker run doesn't pull down latest image if the image exists 
> locally|https://github.com/moby/moby/issues/13331].
> So, I think we should pull the image before running it so the local image is 
> always the latest.
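
A rough standalone illustration of the order of operations (not the 
NodeManager code path, where the real change belongs):
{code:java}
import java.io.IOException;

class PullBeforeRunSketch {
  static void pullThenRun(String image) throws IOException, InterruptedException {
    exec("docker", "pull", image);        // refresh the local copy first
    exec("docker", "run", "--rm", image); // then start the container
  }

  static void exec(String... cmd) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    if (p.waitFor() != 0) {
      throw new IOException("command failed: " + String.join(" ", cmd));
    }
  }
}
{code}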



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally

2019-02-09 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764186#comment-16764186
 ] 

Zhaohui Xin commented on YARN-9184:
---

{quote}I think Mockito update in HADOOP-14178 may have broken this patch. The 
patch doesn't compile anymore. [~uranus] could you take a look? Thanks
{quote}
[~eyang], patch 004 was broken by -HADOOP-14178-, so I attached a new patch. 

 

> Docker run doesn't pull down latest image if the image exists locally 
> --
>
> Key: YARN-9184
> URL: https://issues.apache.org/jira/browse/YARN-9184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9184.001.patch, YARN-9184.002.patch, 
> YARN-9184.003.patch, YARN-9184.004.patch
>
>
> See [docker run doesn't pull down latest image if the image exists 
> locally|https://github.com/moby/moby/issues/13331].
> So, I think we should pull the image before running it so the local image is 
> always the latest.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9278:
--
Description: 
We should *shuffle* the nodes to avoid some nodes being preempted frequently. 

Also, we should *limit* the number of nodes to make preemption more efficient.

Just like this,
{code:java}
// we should not iterate all nodes, that would be very slow
long maxTryNodeNum =
    context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();

if (potentialNodes.size() > maxTryNodeNum) {
  Collections.shuffle(potentialNodes);
  List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();

  for (int i = 0; i < maxTryNodeNum; i++) {
    newPotentialNodes.add(potentialNodes.get(i));
  }
  potentialNodes = newPotentialNodes;
}
{code}
 

  was:We should shuffle the nodes to avoid some nodes being preempted 
frequently.


> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that would be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9278:
--
Description: We should shuffle the nodes to avoid some nodes being 
preempted frequently.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should shuffle the nodes to avoid some nodes being preempted frequently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9278:
--
Description: 
We should *shuffle* the nodes to avoid some nodes being preempted frequently. 

Also, we should *limit* the number of nodes to make preemption more efficient.

Just like this,
{code:java}
// we should not iterate all nodes, that would be very slow
long maxTryNodeNum =
    context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();

if (potentialNodes.size() > maxTryNodeNum) {
  Collections.shuffle(potentialNodes);
  List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();

  for (int i = 0; i < maxTryNodeNum; i++) {
    newPotentialNodes.add(potentialNodes.get(i));
  }
  potentialNodes = newPotentialNodes;
}
{code}
 

  was:
We should *shuffle* the nodes to avoid some nodes being preempted frequently. 

Also, we should *limit* the number of nodes to make preemption more efficient.

Just like this,
{code:java}
// we should not iterate all nodes, that would be very slow
long maxTryNodeNum =
    context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();

if (potentialNodes.size() > maxTryNodeNum) {
  Collections.shuffle(potentialNodes);
  List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();

  for (int i = 0; i < maxTryNodeNum; i++) {
    newPotentialNodes.add(potentialNodes.get(i));
  }
  potentialNodes = newPotentialNodes;
}
{code}
 


> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that would be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-03 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-9278:
-

 Summary: Shuffle nodes when selecting to be preempted nodes
 Key: YARN-9278
 URL: https://issues.apache.org/jira/browse/YARN-9278
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhaohui Xin
Assignee: Zhaohui Xin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Description: 
 

I think we should add more restrictions in FairScheduler preemption:
 * We should not preempt AM containers.
 * We should not preempt high-priority jobs.
 * We should not preempt containers that have been running for a long time.
 * ...

  was:
 

I think we should add more restrictions when preempting:
 * We should not preempt AM containers.
 * We should not preempt high-priority jobs.
 * We should not preempt containers that have been running for a long time.
 * ...


> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
>  
> I think we should add more restrictions in FairScheduler preemption:
>  * We should not preempt AM containers.
>  * We should not preempt high-priority jobs.
>  * We should not preempt containers that have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions when preemption

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Summary: Add more restrictions when preemption   (was: Add more 
restrictions when preemption)

> Add more restrictions when preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
>  
> I think we should add more restrictions when preempting:
>  * We should not preempt AM containers.
>  * We should not preempt high-priority jobs.
>  * We should not preempt containers that have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Summary: Add more restrictions In FairScheduler Preemption   (was: Add more 
restrictions when preemption )

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
>  
> I think we should add more restrictions when preempting:
>  * We should not preempt AM containers.
>  * We should not preempt high-priority jobs.
>  * We should not preempt containers that have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9277) Add more restrictions when preemption

2019-02-03 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Description: 
 

I think we should add more restrictions when preempting:
 * We should not preempt AM containers.
 * We should not preempt high-priority jobs.
 * We should not preempt containers that have been running for a long time.
 * ...

> Add more restrictions when preemption
> -
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
>  
> I think we should add more restrictions when preempting:
>  * We should not preempt AM containers.
>  * We should not preempt high-priority jobs.
>  * We should not preempt containers that have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9277) Add more restrictions when preemption

2019-02-03 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-9277:
-

 Summary: Add more restrictions when preemption
 Key: YARN-9277
 URL: https://issues.apache.org/jira/browse/YARN-9277
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhaohui Xin
Assignee: Zhaohui Xin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Attachment: (was: image-2019-02-03-14-58-48-148.png)

> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*
>  
> !image-2019-02-03-14-58-48-148.png!  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Description: 
In some scenarios, applications request nonexistent resource names in the 
cluster, such as nonexistent hosts or racks.

Obviously, we should actively filter or degrade these invalid resource 
requests.

*This is especially effective when HDFS and YARN are deployed on different 
nodes, and the scheduling throughput of one of our clusters has improved by 
{color:#ff}5X{color}.*

  was:
In some scenarios, applications request nonexistent resource names in the 
cluster, such as nonexistent hosts or racks.

Obviously, we should actively filter or degrade these invalid resource 
requests.

*This is especially effective when HDFS and YARN are deployed on different 
nodes, and the scheduling throughput of one of our clusters has improved by 
{color:#ff}5X{color}.*

 

!image-2019-02-03-14-58-48-148.png!  


> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Attachment: Cluster-Scheduler-Performance-5X-Promotion.png

> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: Cluster-Scheduler-Performance-5X-Promotion.png
>
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Attachment: (was: image-2019-02-03-14-58-31-246.png)

> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*
>  
> !image-2019-02-03-14-58-48-148.png!  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Attachment: image-2019-02-03-14-58-48-148.png

> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: image-2019-02-03-14-58-31-246.png, 
> image-2019-02-03-14-58-48-148.png
>
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*
>  
> !image-2019-02-03-14-58-48-148.png!  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Description: 
In some scenarios, applications request nonexistent resource names in the 
cluster, such as nonexistent hosts or racks.

Obviously, we should actively filter or degrade these invalid resource 
requests.

*This is especially effective when HDFS and YARN are deployed on different 
nodes, and the scheduling throughput of one of our clusters has improved by 
{color:#ff}5X{color}.*

 

!image-2019-02-03-14-58-48-148.png!  

  was:
In some scenarios, applications request nonexistent resource names in the 
cluster, such as nonexistent hosts or racks.

Obviously, we should actively filter or degrade these invalid resource 
requests.

*This is especially effective when HDFS and YARN are deployed on different 
nodes, and the scheduling throughput of one of our clusters has improved by 
{color:#ff}5X{color}.*

 

!image-2019-02-03-14-58-31-246.png!

 


> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: image-2019-02-03-14-58-31-246.png, 
> image-2019-02-03-14-58-48-148.png
>
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*
>  
> !image-2019-02-03-14-58-48-148.png!  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9276) Filter non-existent resource requests actively

2019-02-02 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9276:
--
Description: 
In some scenarios, applications request nonexistent resource names in the 
cluster, such as nonexistent hosts or racks.

Obviously, we should actively filter or degrade these invalid resource 
requests.

*This is especially effective when HDFS and YARN are deployed on different 
nodes, and the scheduling throughput of one of our clusters has improved by 
{color:#ff}5X{color}.*

 

!image-2019-02-03-14-58-31-246.png!

 

  was:
In some scenarios, applications request nonexistent resource names in the 
cluster, such as nonexistent hosts or racks.

Obviously, we should actively filter or degrade these invalid resource 
requests.

*This is especially effective when HDFS and YARN are deployed on different 
nodes, and the scheduling throughput of one of our clusters has improved by 
{color:#FF}5X{color}.*

 

 


> Filter non-existent resource requests actively
> --
>
> Key: YARN-9276
> URL: https://issues.apache.org/jira/browse/YARN-9276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM, scheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: image-2019-02-03-14-58-31-246.png, 
> image-2019-02-03-14-58-48-148.png
>
>
> In some scenarios, applications request nonexistent resource names in the 
> cluster, such as nonexistent hosts or racks.
> Obviously, we should actively filter or degrade these invalid resource 
> requests.
> *This is especially effective when HDFS and YARN are deployed on different 
> nodes, and the scheduling throughput of one of our clusters has improved by 
> {color:#ff}5X{color}.*
>  
> !image-2019-02-03-14-58-31-246.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


