[
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257344#comment-15257344
]
Wangda Tan commented on YARN-4390:
----------------------------------
[~jianhe], [~sunilg],
Regarding your concerns about the performance impact of the new approach:
I ran an SLS test with the latest patch against a mocked 1000-node cluster in
which each node has 128G of memory; the cluster typically runs 20K+ containers
concurrently.
In 98% of cases, the total time of each PCPP execution is less than 10 ms. Only
a few runs (out of 200+) take 10-25 ms.
Since the time complexity of the approach is O(n), if we run a cluster with 10k
nodes, it could theoretically take up to 200 ms, which is also acceptable to me.
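
For reference, a back-of-the-envelope sketch of that extrapolation. It assumes
the cost scales strictly linearly with node count and reuses only the timings
quoted above; the helper name extrapolate_ms is hypothetical and not part of
the patch or of SLS.

```python
def extrapolate_ms(observed_ms: float, observed_nodes: int, target_nodes: int) -> float:
    """Scale an observed run time linearly with node count (assumes strict O(n))."""
    return observed_ms * target_nodes / observed_nodes


# Timings observed on the mocked 1000-node cluster.
typical_ms = extrapolate_ms(10, 1000, 10_000)  # bound that holds for 98% of runs
worst_ms = extrapolate_ms(25, 1000, 10_000)    # slowest of the 200+ runs

print(f"typical bound at 10k nodes: {typical_ms:.0f} ms")
print(f"worst observed at 10k nodes: {worst_ms:.0f} ms")
```

Under this assumption the 10 ms bound scales to 100 ms and the slowest observed
run to 250 ms, which is the same order of magnitude as the estimate above.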
Thoughts?
> Do surgical preemption based on reserved container in CapacityScheduler
> -----------------------------------------------------------------------
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.0.0, 2.8.0, 2.7.3
> Reporter: Eric Payne
> Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf,
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch,
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch, YARN-4390.6.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt
> containers. One is that an app could be requesting a large container (say,
> 8 GB), and the preemption monitor could conceivably preempt multiple
> containers (say, eight 1-GB containers) in order to fill the large container
> request. These smaller containers would then be rejected by the requesting AM
> and potentially given right back to the preempted app.