[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481841#comment-16481841
 ] 

Weiwei Yang commented on YARN-8320:
-----------------------------------

Thanks [~yangjiandan] for the proposal. This is very interesting and will help 
extend YARN to support online services, especially latency-sensitive services. 
I think we should have an umbrella JIRA to track the effort to support LS 
services. [~leftnoteasy], what do you think?

I took a look at the proposal. Some details still need to be figured out, but 
overall it is a good start. Some early comments:
 # Section 2 presents 4 modes, which is a bit complex. If possible, we should 
start by supporting the exclusive and non-exclusive modes in the first phase.
 # The proposal needs more information about the RM-side changes. It's not clear 
to me whether the scheduler needs the cpu share mode info for its scheduling 
decisions, or what the relationship is between vcores and cpu share mode. 
Please add more info, with some examples.
 # Updating a container's cpu share mode could also be phase 2 work.

I will dive deeper into this area next week and share more comments if I 
have any. We can also discuss this further.

Thanks

> Add support CPU isolation for latency-sensitive  (LS) service
> -------------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate CPU resources. However,
> * The Linux Completely Fair Scheduler (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
> * The request latency of services running in containers may fluctuate 
> frequently when all containers share CPUs, and latency-sensitive services in 
> our production environment cannot tolerate this.
> So we need finer-grained CPU isolation.
> My co-workers and I propose a solution that uses cgroup cpuset to bind 
> containers to different processors, based on [Google’s 
> PPT|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
> Later I will upload a detailed design doc.
>  
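
The cpuset idea in the description can be illustrated with a small sketch. It is not YARN code and not the design in the attached proposal. The class name `CpusetSketch` and its methods are hypothetical, and the policy (take the lowest-numbered free processors for a latency-sensitive container, leave the rest as a shared pool) is an assumption made only for illustration. The strings it produces use the comma-separated list form accepted by the cgroup `cpuset.cpus` file.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical sketch: splits a node's processors into exclusive and shared sets. */
public class CpusetSketch {
    // Processors not yet reserved by any latency-sensitive container.
    private final TreeSet<Integer> shared = new TreeSet<>();

    public CpusetSketch(int processors) {
        for (int i = 0; i < processors; i++) {
            shared.add(i);
        }
    }

    /** Reserves `count` processors for one container and returns its cpuset.cpus value. */
    public String reserveExclusive(int count) {
        // Assumed policy: always leave at least one processor in the shared pool.
        if (count <= 0 || count >= shared.size()) {
            throw new IllegalArgumentException("cannot reserve " + count + " processors");
        }
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            picked.add(shared.pollFirst()); // lowest-numbered free processor
        }
        return format(picked);
    }

    /** Returns the cpuset.cpus value for containers that share the remaining processors. */
    public String sharedCpus() {
        return format(shared);
    }

    private static String format(Collection<Integer> cpus) {
        return cpus.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        CpusetSketch node = new CpusetSketch(8);
        System.out.println("LS container cpuset.cpus: " + node.reserveExclusive(2));
        System.out.println("other containers cpuset.cpus: " + node.sharedCpus());
    }
}
```

With 8 processors and one reservation of 2, the latency-sensitive container would be pinned to `0,1` and all other containers to `2,3,4,5,6,7`. A real implementation would also need to consider NUMA and hyper-thread topology, which this sketch ignores.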



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

