[jira] [Commented] (YARN-8320) [Umbrella] Support CPU isolation for latency-sensitive (LS) service

Miklos Szegedi (JIRA) Mon, 28 May 2018 22:21:27 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493112#comment-16493112
 ]


Miklos Szegedi commented on YARN-8320:
--------------------------------------

Thank you for the responses, [~cheersyang]. They make sense to me in general.
{quote}how many cpuset resource on a NM and how a AM to request?
{quote}
In general, this is adapter code that passes cgroup functionality on to 
another API. As such, it can do one of two things: be transparent, or make 
the original API easier to use. Your design does the latter, which makes 
sense. Being transparent, however, would mean letting the AM request cpu 
resources (controlling cpu,cpuacct) and cpuset resources (controlling 
cpuset) separately. I would prefer the transparent approach, since it keeps 
all functionality without restrictions and makes any future design easier to 
implement. The cpuset resource would have as many processors as are available 
in cpuset.cpus of the container root cgroup, which is usually {{hadoop-yarn}}. 
The NM chooses the individual CPUs based on the number of cpuset CPUs granted 
by the RM.

However, I do not have a strong opinion about this.
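
To illustrate the transparent variant, here is a minimal sketch of how the NM 
side could pick individual CPUs once the RM has granted a count. The class and 
method names ({{CpusetPicker}}, {{parseCpuList}}, {{pick}}) are hypothetical 
and are not part of YARN or the attached patch. The first-fit policy is also 
only an assumption. The input string stands for the content of cpuset.cpus in 
the container root cgroup, and only comma-separated CPUs and ranges are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not a YARN API. */
final class CpusetPicker {

  /** Parses a cpuset.cpus style list such as "0-3,8,10-11". */
  static List<Integer> parseCpuList(String cpus) {
    List<Integer> result = new ArrayList<>();
    String trimmed = cpus.trim();
    if (trimmed.isEmpty()) {
      return result;
    }
    for (String part : trimmed.split(",")) {
      String[] bounds = part.trim().split("-");
      int first = Integer.parseInt(bounds[0]);
      // A single CPU ("8") is treated as the range 8-8.
      int last = bounds.length > 1 ? Integer.parseInt(bounds[1]) : first;
      for (int cpu = first; cpu <= last; cpu++) {
        result.add(cpu);
      }
    }
    return result;
  }

  /**
   * Picks the first {@code granted} CPUs from the root cgroup's list that
   * are not already bound to another container (assumed first-fit policy).
   */
  static List<Integer> pick(String rootCpus, Set<Integer> inUse, int granted) {
    List<Integer> chosen = new ArrayList<>();
    for (int cpu : parseCpuList(rootCpus)) {
      if (chosen.size() == granted) {
        break;
      }
      if (!inUse.contains(cpu)) {
        chosen.add(cpu);
      }
    }
    if (chosen.size() < granted) {
      throw new IllegalStateException(
          "Only " + chosen.size() + " free CPUs, but " + granted + " granted");
    }
    return chosen;
  }
}
```

With this sketch, {{CpusetPicker.pick("0-3,8,10-11", Set.of(0, 1), 3)}} would 
skip the two CPUs already in use and select the next three free ones. The 
selected IDs would then be written to the container cgroup's cpuset.cpus.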

> [Umbrella] Support CPU isolation for latency-sensitive (LS) service
> -------------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resources. However,
>  * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate frequently 
> when all containers share CPUs, and latency-sensitive services cannot afford 
> this in our production environment.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors. This is inspired by the isolation technique in the [Borg 
> system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
