Hello,

I'd like to share some feedback from our experience, since we worked on this
last year.
We used CFS bandwidth isolation for several years and ran into many issues
(lack of predictability, bugs in old Linux kernels, and lack of cache/memory
locality). At some point, we implemented a custom isolator to manage cpusets
(using https://github.com/criteo/mesos-command-modules/ as a base to write an
isolator in a scripting language).

The isolator had a very simple behavior: when a new task starts, it looks at
which CPUs are not already in a cpuset cgroup, selects (if possible) CPUs from
the same NUMA node, and creates a cpuset cgroup for the starting task.
In practice, it brought a general decrease in CPU consumption (up to 8% for
some CPU-intensive applications) and made the CPU isolation model easier to
reason about.
The allocation is optimistic: it tries to use CPUs from the same NUMA node, but
if that is not possible, the task is spread across nodes. In practice this
happens very rarely, thanks to one small optimization: CPUs are assigned from
the most loaded NUMA node first, which reduces fragmentation of the available
CPUs across NUMA nodes.
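
To make the selection logic concrete, here is a minimal Python sketch. It is
not the isolator's actual code: the function name pick_cpus, the free_by_node
input format, the reading of "most loaded" as "fewest free CPUs", and the
order used when spreading across nodes are all assumptions made for
illustration.

```python
from typing import Dict, List, Set


def pick_cpus(free_by_node: Dict[int, Set[int]], wanted: int) -> List[int]:
    """Pick `wanted` free CPUs, preferring a single NUMA node.

    free_by_node (hypothetical format) maps a NUMA node id to the CPUs of
    that node that are not yet assigned to any cpuset cgroup.
    """
    if wanted <= 0:
        raise ValueError("wanted must be positive")
    total_free = sum(len(cpus) for cpus in free_by_node.values())
    if wanted > total_free:
        raise RuntimeError("not enough free CPUs")

    # Nodes that can host the whole task on their own.
    fitting = [n for n, cpus in free_by_node.items() if len(cpus) >= wanted]
    if fitting:
        # Assumption: "most loaded" means the node with the fewest free CPUs.
        # Ties are broken by node id to keep the choice deterministic.
        node = min(fitting, key=lambda n: (len(free_by_node[n]), n))
        return sorted(free_by_node[node])[:wanted]

    # Fallback: no single node fits, so spread across nodes.
    # Assumption: take from the nodes with the most free CPUs first, so the
    # task spans as few nodes as possible.
    chosen: List[int] = []
    for node in sorted(free_by_node, key=lambda n: (-len(free_by_node[n]), n)):
        need = wanted - len(chosen)
        if need == 0:
            break
        chosen.extend(sorted(free_by_node[node])[:need])
    return chosen
```

For example, with free_by_node = {0: {0, 1, 2, 3}, 1: {5, 6}}, a 2-CPU task
lands on node 1 (the fuller node), which keeps the 4 free CPUs of node 0
together for a later, larger task.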

I'd be glad to give more details if you are interested.

--
Grégoire
