[
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503212#comment-16503212
]
Weiwei Yang commented on YARN-8320:
-----------------------------------
Hi [[email protected]]
What I mean is that letting users set both #vcore and #cpu in their resource
request is too complex, even for phase 1 where only EXCLUSIVE mode is supported.
For example:
{noformat}
NM:
#vcore: 100
#cpu: 10
{noformat}
A user who wants EXCLUSIVE mode must then shape the request like this:
{noformat}
Request1:
#vcore: 10 * N
#cpu: N (0<N<=10)
{noformat}
If {{#vcore < 10 * N}}, some cpu capacity is wasted. Suppose the user sets the request to
{noformat}
Request2:
#vcore: 80
#cpu: 9
{noformat}
After allocation, the remaining NM capacity is
{noformat}
NM:
#vcore: 20
#cpu: 1
{noformat}
Now when a #vcore=20 container lands on this node, it can only get 10% of the
cputime (instead of 20%), since 9 cpus are already occupied by Request2. This is
not expected. RESERVED/SHARED mode would make this even more complex: users
will not be able to tell how many cpus to specify in their request to achieve
RESERVED/SHARED cpu sharing.
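The arithmetic behind the Request2 case can be sketched as follows. This is only an illustration of the accounting described above, not NodeManager code: the `Node` type and the helper names are hypothetical, and it assumes each cpu backs exactly 10 vcores (100 vcores / 10 cpus) and that a container can use at most the cpus that are not exclusively pinned.

```python
from dataclasses import dataclass


@dataclass
class Node:
    vcores: int  # vcores still advertised as free on the NM
    cpus: int    # cpus not yet pinned exclusively


# Capacity from the example above (assumed fixed for this sketch).
NODE_VCORES = 100
NODE_CPUS = 10
VCORES_PER_CPU = NODE_VCORES // NODE_CPUS  # 10 vcores per cpu


def allocate_exclusive(node, vcores, cpus):
    """Return the capacity left after an EXCLUSIVE request is placed."""
    if vcores > node.vcores or cpus > node.cpus:
        raise ValueError("request does not fit on the node")
    return Node(node.vcores - vcores, node.cpus - cpus)


def stranded_vcores(node):
    """vcores still advertised as free but not backed by a free cpu."""
    return max(0, node.vcores - node.cpus * VCORES_PER_CPU)


def cputime_share(node, vcores):
    """(expected, achievable) fraction of node cputime for a new container."""
    expected = vcores / NODE_VCORES
    achievable = min(expected, node.cpus / NODE_CPUS)
    return expected, achievable


# Request2: #vcore=80, #cpu=9, followed by a #vcore=20 container.
left = allocate_exclusive(Node(NODE_VCORES, NODE_CPUS), vcores=80, cpus=9)
expected, achievable = cputime_share(left, vcores=20)
print(left, stranded_vcores(left), expected, achievable)
```

Under these assumptions, Request2 leaves 20 vcores but only 1 cpu, so 10 of the advertised vcores are stranded, while a request with {{#vcore = 10 * N}} (for example 90 vcores with 9 cpus) strands none.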
Does this make sense?
Thanks
> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Jiandan Yang
> Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf,
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and
> “cpu.shares” to isolate cpu resources. However,
> * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented scheduler;
> it has no support for differentiated latency
> * Request latency of services running in containers may fluctuate frequently
> when all containers share cpus, which latency-sensitive services in our
> production environment cannot afford.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to
> different processors. This is inspired by the isolation technique in [Borg
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].