On Mon, Jun 23, 2014 at 9:24 AM, Chris Burroughs <[email protected]>
wrote:

> I have two related questions about the cpu and memory subsystems and at
> most vs at least semantics.
>
> ## cpu
>
> As I understand it, by default cpu.shares is used to give tasks at least
> the specified cpu resources.  Optionally, with cgroups_enable_cfs, both
> cpu.shares and cpu.cfs_period_us/cpu.cfs_quota_us are used to provide both
> at most and at least guarantees [1].  This is a per slave/cluster(?) setting
> as opposed to per task.
>
> Since cgroups_enable_cfs potentially could leave cpu resources unused I
> presume it has a corresponding advantage, but I'm not sure what it is. When
> would one be preferred over the other?
>

Setting the upper limit provides predictability. This is desired when
running online workloads, like web servers.

For example, let's say that you're happily running along with all your web
services using 16 cores instead of the 4 they were allocated. When the
system becomes more contended and you're forced back down to 4 cores, you
will have a bad time. :)
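
To make the difference concrete, here is a rough sketch of the two modes. It
is a toy model, not Mesos code: the constants (1024 shares per cpu, a 100 ms
CFS period) and the function name are assumptions for illustration, and it
assumes every competing task is busy enough to consume its full share.

```python
# Illustrative model only; constants are assumptions, not read from Mesos.
CPU_SHARES_PER_CPU = 1024   # assumed shares granted per allocated cpu
CFS_PERIOD_US = 100_000     # assumed cpu.cfs_period_us (100 ms)


def usable_cores(task_cpus, competing_cpus, total_cores, cfs_enabled):
    """Estimate the cores available to a CPU-hungry task.

    competing_cpus is the total allocation of the other tasks that are
    busy right now; idle tasks do not compete for their share.
    """
    task_shares = task_cpus * CPU_SHARES_PER_CPU
    competing_shares = competing_cpus * CPU_SHARES_PER_CPU
    # cpu.shares: proportional split among busy cgroups ("at least").
    by_shares = total_cores * task_shares / (task_shares + competing_shares)
    if not cfs_enabled:
        return by_shares
    # CFS bandwidth control: quota / period is a hard ceiling ("at most").
    quota_us = task_cpus * CFS_PERIOD_US
    return min(by_shares, quota_us / CFS_PERIOD_US)
```

With a 4 cpu task on a 16 core slave, the shares-only mode swings between 16
cores on an idle box and 4 cores on a full one, while the CFS mode stays at 4
in both cases. That flat line is the predictability you are paying for.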

It is currently configured at the slave level, but of course it might be
nice to provide different cpu isolation characteristics at the executor
level in the future.


>
> ## mem
>
> Memory uses memory.limit_in_bytes to limit at most how much memory a task
> can use.  There does not seem to be a way to specify that a task should
> have at least a certain amount of memory.  This makes sense for RSS (since
> 'bursting' to RSS > physical is not a desirable state and part of what
> these limits should avoid).  However, from the kernel docs [2] my
> understanding is that memory.limit_in_bytes also limits how much page cache
> a task can use.  The page cache is a shared resource and keeping the kernel
> from pre-fetching and caching as much as it can is a surprising choice to me.
>
> Is there a best practice for frameworks to choose memory limits that
> include RSS + page cache?  In particular, I'm unsure how to reason about
> page cache use since a page is counted against whichever cgroup happened to
> access it first.
>

The page cache accounting was a surprise to us as well. Have you seen Ian's
reply here?

http://mail-archives.apache.org/mod_mbox/mesos-user/201406.mbox/%3CCAAJX7shQ_FB6qmvDBYJv5%2Bdh5VgG3WyaedBY_cQo1bY4cPvsDA%40mail.gmail.com%3E

I believe we didn't see this on newer kernels.
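
If it helps with sizing limits, one way to see how much of a container's
charge is page cache rather than RSS is to look at the "rss" and "cache"
fields of the cgroup's memory.stat file. A minimal sketch, assuming the
cgroup v1 memory.stat format; the function names are made up for this
example, and where the file lives for a given container depends on your
cgroup hierarchy:

```python
def split_memory_charge(memory_stat_text):
    """Return (rss_bytes, cache_bytes) parsed from memory.stat contents."""
    stats = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        # Each line is expected to look like "<name> <integer value>".
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats.get("rss", 0), stats.get("cache", 0)


def cache_fraction(memory_stat_text):
    """Fraction of the rss + cache charge that is page cache."""
    rss, cache = split_memory_charge(memory_stat_text)
    total = rss + cache
    return cache / total if total else 0.0
```

You would feed it the text read from the container's memory.stat. Sampling
this over time for a task gives a rough idea of how much headroom to leave
for page cache on top of RSS.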


>
>
> [1] http://mail-archives.apache.org/mod_mbox/mesos-user/201310.mbox/%3CCA%2B8RcoTbSY%3DKm2jKTVbCid-G76ytFA%2Bq_Zmzu5zRGC-wE206Xg%40mail.gmail.com%3E
>
> [2] https://www.kernel.org/doc/Documentation/cgroups/memory.txt
>
