[ 
https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063255#comment-14063255
 ] 

Wangda Tan commented on YARN-2297:
----------------------------------

Hi [~sunilg],
Thanks for providing thoughts here!
For your 1st point, I think it would be better solved, as Chris suggested, 
by using the dead-zone parameter 
"yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity".

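To make the dead-zone idea concrete, here is a minimal sketch. It assumes the parameter is a fraction of the guaranteed capacity that a queue may exceed before it is considered for preemption. The function name and the 0.1 value are illustrative only and are not taken from the scheduler code.

```python
def beyond_dead_zone(used_mb, guaranteed_mb, max_ignored_over_capacity=0.1):
    """Return True if usage exceeds the guarantee by more than the dead zone.

    Assumption: the dead zone is interpreted as a fraction of the guaranteed
    capacity, so usage up to guaranteed * (1 + fraction) is ignored.
    """
    return used_mb > guaranteed_mb * (1.0 + max_ignored_over_capacity)


# A queue slightly over its guarantee stays inside the dead zone.
print(beyond_dead_zone(105, 100))
# A queue well over its guarantee falls outside it.
print(beyond_dead_zone(120, 100))
```
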
For your 2nd point,
{code}
I feel now we will take a percentage here to find which queue is under utilized 
more based on its used vs guaranteed_capacity ?
{code}
If we use ratio(used, guaranteed), there is a problem. Suppose qA is 
configured with 100MB and has used 10MB, while qB is configured with 2GB and 
has used 500MB. qA has the lower ratio (10% versus roughly 25%), but can we 
really say we should allocate resources to qA instead of qB?
We have some other options here:
1. Use (guaranteed - used)
2. Use a combined function like sigmoid(ratio(used, guaranteed)) * (guaranteed 
- used)
Do you have any ideas here?
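
To compare the candidates on the qA/qB example above, here is a rough sketch. It assumes sigmoid means the standard logistic function and that 2GB is 2048MB. The helper names are made up for illustration and this is not a proposed patch.

```python
import math


def sigmoid(x):
    # Standard logistic function (assumed meaning of "sigmoid" here).
    return 1.0 / (1.0 + math.exp(-x))


def ratio(used, guaranteed):
    # Fraction of the guaranteed capacity currently in use.
    return used / guaranteed


def gap(used, guaranteed):
    # Option 1: absolute amount still owed to the queue.
    return guaranteed - used


def combined(used, guaranteed):
    # Option 2: sigmoid(ratio(used, guaranteed)) * (guaranteed - used).
    return sigmoid(ratio(used, guaranteed)) * gap(used, guaranteed)


# (used MB, guaranteed MB) for the two queues in the example.
queues = {"qA": (10, 100), "qB": (500, 2048)}

for name, (used, guaranteed) in queues.items():
    print(name,
          "ratio=%.3f" % ratio(used, guaranteed),
          "gap=%d" % gap(used, guaranteed),
          "combined=%.1f" % combined(used, guaranteed))
```

With these numbers the pure ratio ranks qA as more under-utilized, while both (guaranteed - used) and the combined function rank qB first.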

Thanks,
Wangda

> Preemption can hang in corner case by not allowing any task container to 
> proceed.
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-2297
>                 URL: https://issues.apache.org/jira/browse/YARN-2297
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.5.0
>            Reporter: Tassapol Athiapinya
>            Assignee: Wangda Tan
>            Priority: Critical
>
> Preemption can cause a hang in a single-node cluster. Only AMs run; no task 
> container can run.
> h3. queue configuration
> Queues A and B have 1% and 99% capacity, respectively.
> No max capacity is set.
> h3. scenario
> Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps. Use 
> 1 user.
> Submit app 1 to queue A. AM needs 2 GB. There is 1 task that needs 2 GB. 
> Occupy entire cluster.
> Submit app 2 to queue B. AM needs 2 GB. There are 3 tasks that need 2 GB each.
> Instead of app 1 being preempted entirely, the app 1 AM stays and the app 2 
> AM launches. No task of either app can proceed.
> h3. commands
> /usr/lib/hadoop/bin/hadoop jar 
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter 
> "-Dmapreduce.map.memory.mb=2000" 
> "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" 
> "-Dmapreduce.randomtextwriter.bytespermap=2147483648" 
> "-Dmapreduce.job.queuename=A" "-Dmapreduce.map.maxattempts=100" 
> "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" 
> "-Dmapreduce.map.java.opts=-Xmx1800M" 
> "-Dmapreduce.randomtextwriter.mapsperhost=1" 
> "-Dmapreduce.randomtextwriter.totalbytes=2147483648" dir1
> /usr/lib/hadoop/bin/hadoop jar 
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
> "-Dmapreduce.map.memory.mb=2000" 
> "-Dyarn.app.mapreduce.am.command-opts=-Xmx1800M" 
> "-Dmapreduce.job.queuename=B" "-Dmapreduce.map.maxattempts=100" 
> "-Dmapreduce.am.max-attempts=1" "-Dyarn.app.mapreduce.am.resource.mb=2000" 
> "-Dmapreduce.map.java.opts=-Xmx1800M" -m 1 -r 0 -mt 4000  -rt 0



