It's a bit old, but in case it helps: we recently implemented this at work. Here's how we did it:
- the NUMA topology is exposed via agent custom resources
- the framework allocates the corresponding resources to tasks according to the NUMA topology: e.g. if a task requests 2 CPUs within the same NUMA node, the framework allocates 2 CPUs from a single node
- a custom executor then implements the CPU affinity/cpuset using the resources provided by the framework
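For illustration, the framework-side and executor-side steps could look roughly like the sketch below. It is not Charles's code: the thread does not say how the topology is encoded in the custom resources, so the `Topology` mapping (NUMA node id to free CPU ids), the first-fit choice of node, and the helper names `allocate_same_node` / `pin_current_process` are all hypothetical. Only `os.sched_setaffinity` is a real (Linux-only) Python API.

```python
import os
from typing import Dict, List, Optional, Set

# Hypothetical view of what an agent might advertise via custom resources:
# NUMA node id -> set of CPU ids on that node that are still free.
Topology = Dict[int, Set[int]]


def allocate_same_node(topology: Topology, n_cpus: int) -> Optional[List[int]]:
    """Framework side: pick n_cpus CPUs that all belong to one NUMA node.

    Uses first fit by node id (an assumption). Returns the sorted CPU ids and
    removes them from `topology`, or returns None and leaves `topology`
    unchanged if no single node has enough free CPUs.
    """
    for node in sorted(topology):
        free = topology[node]
        if len(free) >= n_cpus:
            chosen = sorted(free)[:n_cpus]
            free.difference_update(chosen)
            return chosen
    return None


def pin_current_process(cpus: List[int]) -> None:
    """Executor side: restrict the calling process to the allocated CPUs.

    os.sched_setaffinity is Linux-only; a real executor might instead write
    the CPU list into the task's cpuset cgroup.
    """
    os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    topo: Topology = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
    print(allocate_same_node(topo, 2))  # [0, 1]
    print(allocate_same_node(topo, 3))  # [4, 5, 6]: node 0 only has 2 left
    print(allocate_same_node(topo, 4))  # None: no node has 4 free CPUs
```

The executor would then call something like `pin_current_process` with the CPU ids the framework attached to the task.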
It works really nicely.

Cheers,
Charles

On Tue, Jul 7, 2020 at 18:12, Milind Chabbi <mil...@uber.com> wrote:
>
> Grégoire, thanks for your reply. This is super helpful for making a stronger
> case around the affinity benefits.
> Would you be able to offer the additional details that you mentioned? I am
> definitely interested.
> Is your isolator source code publicly available?
>
> -Milind
>
> On Tue, Jul 7, 2020 at 3:14 AM Grégoire Seux <g.s...@criteo.com> wrote:
>>
>> Hello,
>>
>> I'd like to share our experience, because we worked on this
>> last year.
>> We used CFS bandwidth isolation for several years and encountered many
>> issues (lack of predictability, bugs present in old Linux kernels, and lack
>> of cache/memory locality). At some point, we implemented a custom
>> isolator to manage cpusets (using
>> https://github.com/criteo/mesos-command-modules/ as a base to write an
>> isolator in a scripting language).
>>
>> The isolator had a very simple behavior: when a new task starts, look at which CPUs
>> are not already in a cpuset cgroup, select (if possible) CPUs from the same NUMA
>> node, and create a cpuset cgroup for the starting task.
>> In practice, it provided a general decrease in CPU consumption (up to 8% for
>> some CPU-intensive applications) and made the CPU isolation model easier
>> to reason about.
>> The allocation is optimistic: it tries to use CPUs from the same NUMA node,
>> but if that's not possible, the task is spread across nodes. In practice this
>> happens very rarely, thanks to one small optimization: CPUs are assigned from
>> the most loaded NUMA node (decreasing fragmentation of available CPUs
>> across NUMA nodes).
>>
>> I'd be glad to give more details if you are interested.
>>
>> --
>> Grégoire
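As a rough illustration of the allocation policy Grégoire describes, here is a sketch (not the Criteo isolator, whose code is not shown in the thread). Assumptions: "most loaded" means the node with the most CPUs already in some cpuset, the fallback drains the most loaded nodes first, and `pick_cpus` / `format_cpulist` are hypothetical helpers. The cpulist string ("0-3,8") is the format a cgroup v1 `cpuset.cpus` file accepts; `cpuset.mems` must also be set before tasks can be attached.

```python
from typing import Dict, List, Optional, Set


def format_cpulist(cpus: List[int]) -> str:
    """Render CPU ids in the Linux cpulist syntax used by cpuset.cpus, e.g. '0-3,8'."""
    ids = sorted(cpus)
    parts: List[str] = []
    i = 0
    while i < len(ids):
        j = i
        # Extend the run while the next id is consecutive.
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        parts.append(str(ids[i]) if i == j else f"{ids[i]}-{ids[j]}")
        i = j + 1
    return ",".join(parts)


def pick_cpus(nodes: Dict[int, Set[int]], taken: Set[int], n_cpus: int) -> Optional[List[int]]:
    """Choose CPUs for a new task.

    nodes: NUMA node id -> all CPU ids on that node.
    taken: CPU ids already assigned to some cpuset cgroup.
    1. Prefer a single node that fits the task, trying the most loaded first
       to limit fragmentation of free CPUs across nodes.
    2. Otherwise (optimistic fallback) spread the task across nodes.
    Returns None if there are not enough free CPUs in total.
    """
    free = {node: sorted(cpus - taken) for node, cpus in nodes.items()}
    # Most loaded first; ties broken by node id so the result is deterministic.
    by_load = sorted(nodes, key=lambda node: (-len(nodes[node] & taken), node))

    for node in by_load:
        if len(free[node]) >= n_cpus:
            return free[node][:n_cpus]

    if sum(len(f) for f in free.values()) < n_cpus:
        return None
    chosen: List[int] = []
    for node in by_load:
        chosen.extend(free[node][: n_cpus - len(chosen)])
        if len(chosen) == n_cpus:
            break
    return chosen


if __name__ == "__main__":
    nodes = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
    cpus = pick_cpus(nodes, {4, 5}, 2)
    print(format_cpulist(cpus))  # 6-7: fills node 1 rather than splitting node 0
```

The example shows why the "most loaded" rule helps: with CPUs 4-5 already taken, picking 6-7 keeps node 0 entirely free, so a later 3-CPU task still fits on one node. Choosing the least loaded node instead would take 0-1 and leave only 2 free CPUs on each node.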