Hello, I'd like to share some feedback from our experience, since we worked on this last year. We used CFS bandwidth isolation for several years and ran into many issues (lack of predictability, bugs in old Linux kernels, and poor cache/memory locality). At some point, we implemented a custom isolator to manage cpusets (using https://github.com/criteo/mesos-command-modules/ as a base to write the isolator in a scripting language).
The isolator's behavior is very simple: when a new task starts, it looks at which CPUs are not already in a cpuset cgroup, selects (if possible) CPUs from the same NUMA node, and creates a cpuset cgroup for the starting task. In practice, this gave us a general decrease in CPU consumption (up to 8% for some CPU-intensive applications) and made the CPU isolation model easier to reason about. The allocation is optimistic: it tries to use CPUs from the same NUMA node, but if that is not possible, the task is spread across nodes. This happens very rarely, thanks to one small optimization: CPUs are assigned from the most loaded NUMA node first, which reduces fragmentation of the available CPUs across NUMA nodes. I'd be glad to give more details if you are interested -- Grégoire
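
For illustration, the selection logic described above could be sketched roughly as follows. This is not the actual isolator code, only a minimal sketch under assumptions: the function name `pick_cpus` and the `free_by_node` mapping (NUMA node id to the set of CPUs not yet in any cpuset cgroup) are hypothetical, "most loaded" is taken to mean "fewest free CPUs", and the order used when spreading across nodes is a guess.

```python
from typing import Dict, List, Optional, Set


def pick_cpus(free_by_node: Dict[int, Set[int]], wanted: int) -> Optional[List[int]]:
    """Pick `wanted` free CPUs, preferring a single NUMA node.

    `free_by_node` maps a NUMA node id to the CPUs on that node that are
    not yet assigned to any cpuset cgroup (hypothetical input format).
    Returns None when there are not enough free CPUs overall.
    """
    if wanted <= 0:
        return []

    # Nodes that can host the whole task on their own.
    fitting = [n for n, cpus in free_by_node.items() if len(cpus) >= wanted]
    if fitting:
        # Assumption: "most loaded" means fewest free CPUs; ties go to the
        # lowest node id. Filling busy nodes first keeps the emptier nodes
        # intact for later, larger tasks.
        node = min(fitting, key=lambda n: (len(free_by_node[n]), n))
        return sorted(free_by_node[node])[:wanted]

    # No single node fits: check that spreading is possible at all.
    if sum(len(cpus) for cpus in free_by_node.values()) < wanted:
        return None

    # Assumption: when spreading, take from the nodes with the most free
    # CPUs first so the task spans as few nodes as possible.
    picked: List[int] = []
    for node in sorted(free_by_node, key=lambda n: (-len(free_by_node[n]), n)):
        need = wanted - len(picked)
        if need == 0:
            break
        picked.extend(sorted(free_by_node[node])[:need])
    return picked
```

With `free = {0: {0, 1, 2, 3}, 1: {6, 7}}`, a 2-CPU task lands entirely on node 1 (the busier node), a 3-CPU task lands on node 0, and a 5-CPU task has to be spread across both nodes. The returned list would then be written to the new cgroup's cpuset.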