Re: [BULK]Re: cgroup CPUSET for mesos agent

Charles-François Natali Thu, 14 Jan 2021 15:18:29 -0800

This thread is a bit old, but in case it helps: we recently implemented
this at work. Here's how we did it:
- the NUMA topology is exposed via agent custom resources
- the framework allocates the corresponding resources to the tasks
according to the NUMA topology: e.g. if a task requests 2 CPUs within the
same NUMA node, the framework allocates both from a single node
- a custom executor then implements the CPU affinity/cpuset using the
resources provided by the framework


It works really nicely.
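
For readers who want something concrete, here is a minimal sketch of the
last step (the executor applying CPU affinity). It is not the code described
above. It assumes the framework hands the executor its CPUs as a Linux
cpuset-style list string such as "0-1,4"; that encoding and the names
parse_cpu_list and pin_current_process are hypothetical. Only
os.sched_setaffinity and os.sched_getaffinity are real (Linux-only) calls.

```python
import os
from typing import Iterable, Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a cpuset-style list such as "0-3,8" into a set of CPU ids.

    Assumption: the CPUs arrive in this string form; the thread does not
    say how the custom resource is actually encoded.
    """
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(cpus: Iterable[int]) -> Set[int]:
    """Restrict the calling process to the given CPUs (Linux only).

    Child processes started afterwards inherit the affinity mask.
    Returns the mask the kernel reports after the call.
    """
    os.sched_setaffinity(0, set(cpus))
    return set(os.sched_getaffinity(0))
```

An executor built along these lines would call
pin_current_process(parse_cpu_list(spec)) before launching the task, so the
task inherits the mask. Writing the same list to a cpuset cgroup instead
would be the other option mentioned above.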

Cheers,

Charles


On Tue, Jul 7, 2020 at 18:12, Milind Chabbi <[email protected]> wrote:
>
> Grégoire, thanks for your reply. This is super helpful to make a stronger 
> case around the affinity benefits.
> Would you be able to share the additional details you mentioned? I am
> definitely interested.
> Is your isolator source code publicly available?
>
> -Milind
>
> On Tue, Jul 7, 2020 at 3:14 AM Grégoire Seux <[email protected]> wrote:
>>
>> Hello,
>>
>> I'd like to share our experience, since we worked on this last year.
>> We used CFS bandwidth isolation for several years and encountered many
>> issues (lack of predictability, bugs in old Linux kernels, and lack of
>> cache/memory locality). At some point, we implemented a custom isolator
>> to manage cpusets (using https://github.com/criteo/mesos-command-modules/
>> as a base to write an isolator in a scripting language).
>>
>> The isolator had a very simple behavior: when a new task starts, it looks
>> at which CPUs are not within a cpuset cgroup, selects (if possible) CPUs
>> from the same NUMA node, and creates a cpuset cgroup for the starting task.
>> In practice, it provided a general decrease in CPU consumption (up to 8%
>> for some CPU-intensive applications) and a better ability to reason about
>> the CPU isolation model.
>> The allocation is optimistic: it tries to use CPUs from the same NUMA node,
>> but if that's not possible, the task is spread across nodes. In practice
>> this happens very rarely, thanks to one small optimization: CPUs are
>> assigned from the most loaded NUMA node (decreasing fragmentation of
>> available CPUs across NUMA nodes).
>>
>> I'd be glad to give more details if you are interested.
>>
>> --
>> Grégoire
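
The allocation policy Grégoire describes in the quoted message (prefer a
single NUMA node, pick the most loaded node that still fits, otherwise
spread) can be sketched roughly as follows. This is an illustration only,
not the Criteo isolator. The function name pick_cpus and the input shape
(a map from NUMA node id to the CPUs not yet in any cpuset cgroup) are
hypothetical. "Most loaded" is taken to mean "fewest free CPUs", which
assumes nodes of equal size. The order used when spreading is also an
assumption, since the message does not specify it.

```python
from typing import Dict, List, Optional, Set


def pick_cpus(free_by_node: Dict[int, Set[int]], wanted: int) -> Optional[List[int]]:
    """Choose `wanted` free CPUs, preferring a single NUMA node.

    free_by_node maps a NUMA node id to its CPUs that are not yet part of
    any cpuset cgroup. Returns the chosen CPU ids, or None if there are
    not enough free CPUs overall. The input is not modified.
    """
    if wanted <= 0:
        return []
    total_free = sum(len(cpus) for cpus in free_by_node.values())
    if total_free < wanted:
        return None

    # Nodes that can host the whole task on their own.
    fitting = [node for node, cpus in free_by_node.items() if len(cpus) >= wanted]
    if fitting:
        # Most loaded node that still fits: fewest free CPUs, ties broken by
        # node id. This keeps emptier nodes intact for larger tasks.
        node = min(fitting, key=lambda n: (len(free_by_node[n]), n))
        return sorted(free_by_node[node])[:wanted]

    # Fallback (assumed order): spread across nodes, starting with the nodes
    # that have the most free CPUs so that as few nodes as possible are used.
    picked: List[int] = []
    for node in sorted(free_by_node, key=lambda n: (-len(free_by_node[n]), n)):
        picked.extend(sorted(free_by_node[node])[: wanted - len(picked)])
        if len(picked) == wanted:
            break
    return picked
```

For example, with 4 free CPUs on node 0 and 2 free CPUs on node 1, a
2-CPU task goes to node 1, which leaves node 0 able to take a later 4-CPU
task without spreading.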
