[ 
https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487596#comment-16487596
 ] 

Wangda Tan commented on YARN-8320:
----------------------------------

Thanks [~cheersyang] / [~yangjiandan] for the detailed design, it is very 
helpful for understanding the context. 

I took a quick look at the proposal, a couple of questions / comments: 

1) It seems that #vcores must be divisible by #physical-cores, otherwise 
rounding issues will cause containers to get fewer or more resources than 
requested. If the admin enables this feature, YARN should validate this value 
before starting the NM.
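The check in point 1 could be as simple as the following sketch (the class and 
method names are my illustration, not anything in the patch):

```java
// Hedged sketch: validate that the configured vcore count maps to a whole
// number of vcores per physical core, so cpuset binding involves no rounding.
// CpusetConfigCheck is a hypothetical helper, not actual YARN code.
public class CpusetConfigCheck {

    /**
     * Returns true iff vcores divides evenly by physicalCores, i.e. each
     * physical core is worth an integral number of vcores.
     */
    public static boolean isValidVcoreConfig(int vcores, int physicalCores) {
        if (vcores <= 0 || physicalCores <= 0) {
            return false;
        }
        return vcores % physicalCores == 0;
    }

    public static void main(String[] args) {
        // 8 vcores on 4 physical cores: 2 vcores per core, valid.
        System.out.println(isValidVcoreConfig(8, 4));
        // 10 vcores on 4 physical cores: 2.5 vcores per core, rounding.
        System.out.println(isValidVcoreConfig(10, 4));
    }
}
```

If the check fails, the NM could refuse to start with a clear error message 
rather than silently rounding.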

2) I'm still trying to understand the benefit of the RESERVED / SHARED modes. 
If a RESERVED core can be used by ANY containers, then in my mind a RESERVED 
container can still be affected by ad-hoc ANY containers. Similarly, if we 
allow SHARED containers to bind to the same set of cores, and considering that 
SHARED containers run LS services and are CPU-intensive, they could contend 
heavily on those shared cores, which could lead to even worse latency and more 
competition.
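To make the concern in point 2 concrete, here is a sketch of how cpuset.cpus 
strings might be derived per mode (the mode names come from the design doc; 
the layout, method names, and the assumption that ANY may spill onto RESERVED 
cores are mine):

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hedged sketch, not the actual patch: derive a cgroup cpuset.cpus string
// for each of the three modes described in the design.
public class CpusetModeSketch {

    public static String cpusFor(String mode, Set<Integer> reserved,
                                 Set<Integer> shared, int totalCores) {
        Set<Integer> cores = new TreeSet<>();
        switch (mode) {
            case "RESERVED":
                // Dedicated cores for one container.
                cores.addAll(reserved);
                break;
            case "SHARED":
                // A set of cores pinned by a group of LS containers; if they
                // are all CPU-intensive, they contend on exactly these cores.
                cores.addAll(shared);
                break;
            case "ANY":
                // Everything not reserved. If ANY were instead allowed onto
                // RESERVED cores too, RESERVED isolation would be lost --
                // the concern raised above.
                for (int c = 0; c < totalCores; c++) {
                    if (!reserved.contains(c)) {
                        cores.add(c);
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        return cores.stream().map(String::valueOf)
                    .collect(Collectors.joining(","));
    }
}
```

For example, with cores 0-1 reserved and 2-3 shared on an 8-core box, ANY 
containers would be confined to "2,3,4,5,6,7".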

3) Relationship to other features:
- Related to NUMA allocation on YARN (YARN-5764): to me the two features are 
related to each other. Allocating reserved cores for the same process on the 
same or closest NUMA zone(s) gives the best performance, but satisfying one 
condition can break the other. We should be very careful to make sure the two 
features can work together.

- Related to GPU allocation on YARN: on one machine, GPU performance is 
sensitive to the topology of the GPUs. Communication latency and bandwidth 
differ a lot depending on whether GPUs are connected by NVLink, PCI-E, etc. It 
might be valuable to think about whether it is possible to have a common 
framework in the NM for resource-specific scheduling and placement.

- Related to the ResourcePlugin framework: we added the ResourcePlugin 
framework in YARN-7224, and GPU/FPGA support is now implemented on top of it. 
I'm not sure whether this feature can benefit from the ResourcePlugin 
framework, or whether some refactoring of the framework is required. It would 
be better if we can extract the common parts and workflow.

4) To me, only privileged users and applications should be able to request a 
non-ANY CPU mode. How can we enforce this? (Maybe not in phase #1, but we need 
a plan here.)
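One possible shape for the enforcement in point 4 (everything here is a 
hypothetical illustration: the class, the allow-list idea, and the suggestion 
of a new config key are assumptions, not existing YARN APIs):

```java
import java.util.Set;

// Hedged sketch: the NM rejects non-ANY cpuset requests unless the user is
// on an admin-configured allow list, e.g. a new property along the lines of
// "yarn.nodemanager.cpuset.privileged-users" (assumed, not an existing key).
public class CpuModeAcl {

    private final Set<String> privilegedUsers;

    public CpuModeAcl(Set<String> privilegedUsers) {
        this.privilegedUsers = privilegedUsers;
    }

    /** ANY is open to everyone; RESERVED/SHARED require privilege. */
    public boolean canRequest(String user, String cpuMode) {
        return "ANY".equals(cpuMode) || privilegedUsers.contains(user);
    }
}
```

A real implementation would more likely hook into the existing YARN ACL 
machinery than keep its own list; this only shows the decision point.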

> Support CPU isolation for latency-sensitive (LS) service
> --------------------------------------------------------
>
>                 Key: YARN-8320
>                 URL: https://issues.apache.org/jira/browse/YARN-8320
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Jiandan Yang 
>            Priority: Major
>         Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, 
> CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch
>
>
> Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and 
> “cpu.shares” to isolate cpu resource. However,
>  * Linux Completely Fair Scheduling (CFS) is a throughput-oriented 
> scheduler with no support for differentiated latency.
>  * Request latency of services running in containers may fluctuate 
> frequently when all containers share CPUs, which latency-sensitive services 
> cannot afford in our production environment.
> So we need more fine-grained cpu isolation.
> Here we propose a solution that uses cgroup cpuset to bind containers to 
> different processors; this is inspired by the isolation technique in the 
> [Borg system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
