[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4390:
-
Fix Version/s: (was: 2.9.0)

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch, YARN-4390.branch-2.8.002.patch, 
> YARN-4390.branch-2.8.003.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4390:
-
Attachment: YARN-4390.branch-2.8.003.patch

Thanks [~leftnoteasy]. I've addressed the {{getMemorySize()}} issues and 
attached new patch.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch, YARN-4390.branch-2.8.002.patch, 
> YARN-4390.branch-2.8.003.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4390:
-
Attachment: YARN-4390.branch-2.8.002.patch

Thanks [~leftnoteasy].
{quote}
Thanks Eric Payne, if the TestCapacitySchedulerSurgicalPreemption passes 
locally for you, please open a JIRA to track the UT failure and unblock the 
backport.
{quote}
YARN-5973
{quote}
And javac warnings / javadocs warnings looks related to the change, could you 
update the patch accordingly?
{quote}
Attaching patch {{YARN-4390.branch-2.8.002.patch}}

I have backported these to branch 2.8 in my local repo. If you like, I can push 
them once you approve of the patch.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch, YARN-4390.branch-2.8.002.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-12-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4390:
-
Attachment: YARN-4390.branch-2.8.001.patch

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch, 
> YARN-4390.branch-2.8.001.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-28 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Attachment: YARN-4390.8.patch

Attached ver.8 patch fixed UT failure.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch, YARN-4390.8.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Attachment: YARN-4390.7.patch

For comments from [~jianhe],
bq. why is this exception changed to be ignored ? (combine this try-catch 
clause with the one underneath)
Because we don't want to fail RM because of preemption policy's failure.

bq. code does not match comment? comment says not 
considersReservedResourceWhenCalculateIdeal
Updated, it is actually added to wrong place: instead of allowing preemption 
more than total_preemptable. We should not apply nature_termination_factor for 
reserved-preemption-candidates-selector.

In addtion (cc: [~kasha])
For changes of SchedulerNode, I run benchmark tests without volatile changes. 
So all fields of SchedulerNode kept to be synchronized. I can still get similar 
result: for a 1000 nodes cluster, each run of preemption policy takes ~10 ms.

So I removed all volatile/ConcurrentMap changes of 
SchedulerNode/FiCaSchedulerNode. Only kept few cosmetic changes, please let me 
know your thoughts.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch, YARN-4390.7.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-26 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4390:
-
Attachment: QueueNotHittingMax.jpg

Hi [~leftnoteasy]. I have been testing YARN-4390.6.patch. I have tried several 
times with different use cases, and I see something that may be a concern.

My cluster is 3 nodes with 4GB in each node. The queues are
|ops|8GB|preemption disabled|
|eng|2GB|preemption disabled|
|default|2GB|preemption enabled|

I start a job on the default queue that takes up the whole queue. Each 
container is 0.5GB:
{code}
$HADOOP_PREFIX/bin/hadoop jar 
$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
 sleep -Dmapreduce.job.queuename=default -m 30 -r 0 -mt 30
{code}
Once that starts, I start a second job on the ops queue with 1GB containers:
{code}
$HADOOP_PREFIX/bin/hadoop jar 
$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${HADOOP_VERSION}-tests.jar
 sleep -Dmapreduce.job.queuename=ops -Dmapreduce.map.memory.mb=1024 
-Dmapred.child.java.opts=-Xmx512m -m 30 -r 0 -mt 1
{code}
Now, please refer to the screenshot I put up.  The second job (0007) never is 
able to fill up the ops queue. Containers from the first job (0006) just keep 
getting preempted and then given back to the first job.

It may be that the 1GB container is such a large fraction of the ops queue, but 
I am somewhat concerned about these results. They are very repeatable.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: QueueNotHittingMax.jpg, YARN-4390-design.1.pdf, 
> YARN-4390-test-results.pdf, YARN-4390.1.patch, YARN-4390.2.patch, 
> YARN-4390.3.branch-2.patch, YARN-4390.3.patch, YARN-4390.4.patch, 
> YARN-4390.5.patch, YARN-4390.6.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Attachment: YARN-4390.6.patch

Attached ver.6 patch.

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch, YARN-4390.6.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Summary: Do surgical preemption based on reserved container in 
CapacityScheduler  (was: Consider container request size during CS preemption)

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)