On Mon, Jul 6, 2020 at 1:18 PM Charles-François Natali <cf.nat...@gmail.com>
wrote:

> >> Also, there are some obvious limitations with this: for example
> >> binding processes to a specific NUMA node means that you might not
> >> benefit from CPU bursting (e.g. if there's some available CPU on
> >> another NUMA node).
> >
> >
> > True. I would like the burst to be limited to only the cores on a single
> > socket.
> > Data locality can sometimes be more important than available parallelism.
> >
> >>
> >> Also, NUMA binding actually has quite a few possible settings: for
> >> example, you might also want to bind the memory allocations, etc., which
> >> means a simple flag might not be enough to achieve what you want.
> >>
> >
> > True. I would like to rely on the default "first touch" policy: if the
> > container is restricted to a socket, the data will be allocated on the
> > same NUMA node, as long as memory is available.
> >
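A rough sketch of the "restrict to one socket, rely on first touch" idea at the process level, e.g. from inside a custom executor before it starts the task. It assumes Linux, Python 3.7+, and the standard sysfs layout (`/sys/devices/system/node/nodeN/cpulist`); `parse_cpulist`, `node_cpus` and `pin_to_node` are hypothetical helper names. It only sets CPU affinity with `os.sched_setaffinity`; it does not bind memory the way a cgroup `cpuset.mems` setting or `numactl --membind` would, so allocations can still spill to the other node when the local one is full.

```python
from __future__ import annotations

import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def node_cpus(node: int) -> set[int]:
    """Read the CPUs that belong to one NUMA node from sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def pin_to_node(node: int, pid: int = 0) -> None:
    """Restrict a process (0 = the caller) to the CPUs of one NUMA node.

    Only CPU affinity changes. Under the default local-allocation
    ("first touch") policy, pages first touched after this call are
    normally placed on `node` while it has free memory; pages that were
    already allocated elsewhere are not moved by this call.
    """
    os.sched_setaffinity(pid, node_cpus(node))
```

For example, calling `pin_to_node(0)` before spawning the task's process would keep it on node 0's cores, and child processes and threads inherit that affinity.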
>
> Yes, so it sounds like you probably want some fine-grained control over
> the NUMA policy, which would probably be difficult to implement in the
> agent.
>
> >> One possibility I can think of might be to write your own executor -
> >> we wrote our own executor at work for various reasons.
> >> It's a bit of work, but it would give you unlimited flexibility in how
> >> you start your tasks, bind them etc.
> >>
> >
> > I am new to the Mesos code base; I would appreciate any pointers or
> > examples.
>
> For the executor, have you read
> http://mesos.apache.org/documentation/latest/executor-http-api/ ?
> For code you can have a look e.g. at the command executor:
> https://github.com/apache/mesos/blob/master/src/launcher/executor.cpp
>
> Or a trivial example in Python:
> https://github.com/douban/pymesos/blob/master/examples/executor.py
>
> >> Also out of curiosity - is automatic NUMA balancing enabled on your
> >> agents (kernel.numa_balancing sysctl)?
> >
> >
> > Interesting. I was unaware of this sysctl flag. After reading more about
> > it, I realize that it may not work for our use case.
> > It migrates pages to the NUMA node of the cores a container runs on. If
> > no CPUSET was assigned to begin with, for the Go and Java programs with
> > tens (sometimes thousands) of threads, I notice that the data gets 50-50
> > split on a 2-socket system.
> > For real-time queries that last for hundreds of milliseconds, I don't see
> > the kernel's automatic migration being very effective; in fact, it may
> > worsen the situation.
> > Have you had success with kernel.numa_balancing? What was the scenario
> > where it helped?
>
> Yes, the reason I was asking is that it might actually be causing you
> some pain if it's enabled, depending on your workloads.
> The only time I had to use this sysctl was actually to disable it:
> in my experience it was causing latency spikes, and not the few usec
> you might expect from a soft page fault, but single-digit ms
> latencies.
> Obviously it depends on the workloads and can probably help most of
> the time, since I believe it's enabled by default on NUMA systems.
> I guess the best way to find out is to try :).
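To check the setting on an agent, reading the sysctl through procfs is enough. A minimal sketch, assuming Linux and Python 3.7+; `parse_mode`, `numa_balancing_mode` and `describe` are hypothetical helper names. A value of 0 means disabled and any non-zero value means a balancing mode is active; as root, `sysctl -w kernel.numa_balancing=0` turns it off until the next reboot.

```python
from __future__ import annotations

from pathlib import Path

# procfs view of the kernel.numa_balancing sysctl
SYSCTL_PATH = Path("/proc/sys/kernel/numa_balancing")


def parse_mode(text: str) -> int:
    """Convert the sysctl file contents (e.g. "1\\n") to an integer."""
    return int(text.strip())


def numa_balancing_mode(path: Path = SYSCTL_PATH) -> int | None:
    """Return the current value, or None if the knob does not exist."""
    try:
        return parse_mode(path.read_text())
    except FileNotFoundError:
        # Not Linux, or a kernel built without automatic NUMA balancing.
        return None


def describe(mode: int | None) -> str:
    """Human-readable summary of the sysctl value."""
    if mode is None:
        return "not available"
    return "disabled" if mode == 0 else f"enabled (mode {mode})"
```

On an agent, `describe(numa_balancing_mode())` gives a quick answer before trying the workload with balancing on and off.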
>
> > I notice that the data gets 50-50 split on a 2-socket system
>
> Do you mean for a single process, by looking at /proc/<pid>/numa_maps?
> Is it with or without NUMA balancing?
>
>
By looking at `numastat -p pid`. NUMA balancing is off.
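`numastat -p` already aggregates this per node, but the raw data lives in `/proc/<pid>/numa_maps`, where each mapping lists its per-node page counts as `N<node>=<pages>` together with its `kernelpagesize_kB`. A minimal sketch of summing those counts per node, assuming Python 3.7+ and treating a line without `kernelpagesize_kB` as 4 kB pages (an assumption); `node_usage_kb`, `read_node_usage_kb` and `node_shares` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Per-node page counts such as "N0=123" in a numa_maps line.
NODE_RE = re.compile(r"\bN(\d+)=(\d+)\b")
# Page size of the mapping, e.g. "kernelpagesize_kB=2048" for huge pages.
PAGESIZE_RE = re.compile(r"\bkernelpagesize_kB=(\d+)\b")


def node_usage_kb(numa_maps: str) -> dict[int, int]:
    """Sum resident memory per NUMA node (in kB) from numa_maps text."""
    usage: dict[int, int] = defaultdict(int)
    for line in numa_maps.splitlines():
        size = PAGESIZE_RE.search(line)
        page_kb = int(size.group(1)) if size else 4  # assumed 4 kB default
        for node, pages in NODE_RE.findall(line):
            usage[int(node)] += int(pages) * page_kb
    return dict(usage)


def read_node_usage_kb(pid: int) -> dict[int, int]:
    """Read and aggregate /proc/<pid>/numa_maps for one process."""
    with open(f"/proc/{pid}/numa_maps") as f:
        return node_usage_kb(f.read())


def node_shares(usage: dict[int, int]) -> dict[int, float]:
    """Fraction of the process's resident memory that sits on each node."""
    total = sum(usage.values())
    return {node: kb / total for node, kb in usage.items()} if total else {}
```

A result of roughly `{0: 0.5, 1: 0.5}` from `node_shares(read_node_usage_kb(pid))` would correspond to the 50-50 split on a 2-socket system.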


>
> >
> >>
> >>
> >> Cheers,
> >>
> >> Charles
> >>
> >>
> >> On Mon, Jul 6, 2020 at 19:36, Milind Chabbi <mil...@uber.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have noticed that without explicit flags, the mesos-agent does not
> >> > restrict a container's cgroup to any CPUSET. This has quite deleterious
> >> > consequences in our usage model: the OS threads in containerized
> >> > processes migrate across NUMA sockets over time and lose locality to the
> >> > memory they allocated under the first-touch policy. It would take a lot
> >> > of effort to specify the exact CPUSET at container launch time.
> >> >
> >> > I am wondering whether the mesos agent could expose a flag (e.g.,
> >> > --best-effort-numa-locality) so that, if the container's requested CPU
> >> > shares and memory fit within a single NUMA socket, the container is
> >> > launched with its cgroup affinity set to that socket, avoiding the
> >> > deleterious effects of unrestricted CPU migration.
> >> >
> >> > -Milind
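The placement rule behind the proposed `--best-effort-numa-locality` flag can be sketched as follows. This is not Mesos code: the flag is only a proposal in this thread, and `NodeCapacity` and `pick_node` are hypothetical names that assume the agent tracks free CPUs and memory per NUMA node. Among nodes that can hold the whole request, it picks the one with the most free CPUs, leaving the container headroom to burst within its socket; if none fits, it returns `None` and the container would be launched without a cpuset, as today.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeCapacity:
    """Free resources on one NUMA node (hypothetical agent bookkeeping)."""
    node: int
    free_cpus: float
    free_mem_mb: int


def pick_node(nodes: list[NodeCapacity], cpus: float, mem_mb: int) -> int | None:
    """Return a NUMA node that can hold the whole request, or None."""
    # Only nodes that satisfy both the CPU and the memory request qualify.
    candidates = [
        n for n in nodes if n.free_cpus >= cpus and n.free_mem_mb >= mem_mb
    ]
    if not candidates:
        # Best effort: fall back to an unrestricted cpuset.
        return None
    # Prefer the node with the most spare CPUs (ties broken by spare memory)
    # so the container keeps room to burst within its socket.
    return max(candidates, key=lambda n: (n.free_cpus, n.free_mem_mb)).node
```

With a node id in hand, the agent would then write that node's CPUs and the node itself into the container cgroup's `cpuset.cpus` and `cpuset.mems`, which would also cover the memory-binding concern raised earlier in the thread.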
>
