On 22/11/13 10:58, James Hunt wrote:
> Hi Stéphane,
>
> On 20/11/13 19:23, Stéphane Graber wrote:
>> This morning at vUDS we discussed adding support for cgroups in Upstart.
>>
>> Before I go into details about the proposed stanza and overall
>> behaviour, I'd begin by saying that contrary to some other init systems,
>> our intent is solely related to resource controls which is the main goal
>> of cgroups. Process grouping and tracking will remain unaffected by the
>> addition of cgroup support.
>>
>> Cgroup support will be implemented by adding a new "cgroup" stanza which
>> will control the application of cgroup based restrictions to the job.
>> The limits will be applied to any of the scripts
>                                 ~~~
> s/any/all/
>
>> (pre-start/post-start/job/pre-stop/post-stop) similar to what's done
>> with setuid/setgid/apparmor stanzas.
>>
>> Now my recommended format for the stanza, which I believe should be
>> flexible enough is:
>>     cgroup <controller> <cgroup name|auto> [<key> <value>]
>>
>>
>> Detail on the fields:
>> == controller ==
>> Name for one of the cgroup controllers
>>
>> Currently the valid values are (but won't be hardcoded into upstart):
>>  - blkio
>>  - cpu
>>  - cpuacct
>>  - cpuset
>>  - devices
>>  - freezer
>>  - hugetlb
>>  - memory
>>  - perf_event
>>
>> == cgroup-name|$auto ==
>> Name of the cgroup to use (and create if non-existing)
>>
>> The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
>> it's requesting a sub-cgroup.
> Since cgroups are represented by directories, we're either going to have to
> require that the name be quoted, or only support cgroups without spaces in
> them.
> I think quoting is preferable as it provides full flexibility, for example:
>
>     cgroup cpu "my cpu cgroup 1" soft_limit_in_bytes 1024
>
>>
>> "$auto" is the recommended name and will have upstart generate a name
>> based on the job instance name.
> I think this is confusing - "$auto" is too suggestive of an environment
> variable.
> However, we can change that to simply 'auto' if we require cgroup
> names to be quoted as mentioned since the bare-word auto can then be safely
> special-cased:
>
>     cgroup cpu auto soft_limit_in_bytes 1024
>
>>
>> The main use of that field is for cases where a set of jobs should share
>> limits, in such case the main job should declare the various values and
>> the others just refer to the cgroup by name but not define values.
>>
>> The name may be different for the various controllers but may not differ
>> within the same controller. Example:
>>  valid   => cgroup memory group1 limit_in_bytes 52428800
>>             cgroup cpuset group2 cpus 0-1
>>
>>  invalid => cgroup memory group1 limit_in_bytes 52428800
>>             cgroup memory group1 soft_limit_in_bytes 1024
>>
>> == key ==
>> The cgroup control file minus the controller name, so for example
>> memory.soft_limit_in_bytes will become soft_limit_in_bytes.
>>
>> == value ==
>> Any value valid for the given control file, upstart itself won't perform
>> any validation.
>>
>> If the value contains spaces, it should be put between double-quotes (e.g.):
>>     cgroup devices auto allow "c 1:2 rwm"
>>
>>
>> Upstart won't have any controller aware logic in its code, instead,
>> it'll simply talk over dbus (using a private dbus socket) to the cgroup
>> manager which will take care of applying the various limits.
>> That cgroup manager will be started very early in the boot sequence. Any
>> job containing a cgroup stanza will be held until the manager is
>> started.
>>
>> The cgroup will be destroyed when a job is stopped and the cgroup isn't
>> shared with another job (task count is 0 and it has no child cgroup).
>>
>> It'll be possible to disable cgroup support entirely by either building
>> upstart without it (needed for non-Linux systems) or by passing
>> --no-cgroup as a parameter to upstart. In that case, the cgroup stanza
>> will simply be ignored and the jobs will start without limitations.
>>
>>
>> All of the above is also meant to apply to user sessions. The cgroup
>> manager will allow unprivileged cgroup configuration, so as long as the
>> user has write access to a sub-section of a controller, it'll be allowed
>> to write entries there. Similarly to other restriction stanzas, failure
>> to apply a cgroup limit in a user session won't be fatal.
>>
>>
>> Now a few examples to try and illustrate the thoughts behind that proposal:
>>
>> == Single job simple example ==
>> === Job ===
>> cgroup memory $auto limit_in_bytes 52428800
>>
>> === Result ===
>> The job will only start once the manager is up and running and will have a
>> 50MB memory limit. If the system has less than 50MB, the job will fail
>> to start.
>>
>> == Single job complex example ==
>> === Job ===
>> cgroup memory $auto limit_in_bytes 52428800
>> cgroup cpuset $auto cpus 0-1
>> cgroup blkio slowio throttle.write_bps_device "8:16 1048576"
>>
>> === Result ===
>> The job will only start once the manager is up and running and will have a
>> 50MB memory limit, be restricted to CPU ids 0 and 1 and have a 1MB/s
>> write limit to the block device 8:16.
>> The job will fail to start if the system has less than 50MB of RAM or
>> less than 2 CPUs.
>>
>>
>> == Multiple jobs complex example ==
>> === Job 1 ===
>> cgroup cpuset db cpus 0-1
>> cgroup memory db limit_in_bytes 104857600
>> cgroup blkio db throttle.write_bps_device "8:16 1048576"
>>
>> === Job 2 ===
>> cgroup cpuset db/$auto cpus 1

We've realised that using a bare auto is going to be problematic in the
sub-cgroup scenario: if we require the name to be quoted, we have:

    cgroup cpuset "db/"auto cpus 1

However, that is rather odd syntax since it looks wrong: most folk would
expect a space immediately before the word auto. Added to which, it would be
too easy to inadvertently put the auto within the quotes, which would change
the behavior completely:

    cgroup cpuset "db/auto" cpus 1

That (probably) isn't going to do what was intended since, rather than
creating a sub-cgroup named 'db/<job details>', the sub-cgroup would be named
literally 'db/auto'.

Stéphane and I have discussed this and the feeling is that we should embrace
the fact that $auto looks like a variable and support variable expansion in
the cgroup name token (in fact Scott already suggested this in [1]). Further,
by supporting a $UPSTART_CGROUP (*) variable (which would hold the unique
name Upstart chooses for the job instance's sub-cgroup) we have:

    cgroup cpuset "db/$UPSTART_CGROUP" cpus 1

... or to create a literal 'auto' sub-cgroup:

    cgroup cpuset "db/auto" cpus 1

Note that $UPSTART_CGROUP would map any slashes to underscores (as is done,
for example, by the logger when logging instance job output in
/var/log/upstart/). We would need to decide how best to handle a job that
specifies a variable in the cgroup name string whose value does contain a
slash (a hard error would be safest of course).

Thoughts?

>> cgroup memory db/$auto limit_in_bytes 52428800
>> cgroup blkio db/$auto throttle.write_bps_device "8:17 1048576"
>>
>> === Job 3 ===
>> cgroup cpuset db
>> cgroup memory db
>>
>> === Job 4 ===
>> cgroup cpuset db/$auto cpus 2
>>
>> === Result ===
>> This is rather complex, so let's go job by job:
>>  - Job 1 will start bound to CPU 0 and 1 with a 100MB memory limit and
>>    1MB/s write limit to the 8:16 block device. It'll fail to start if
>>    the system has less than 2 CPUs or less than 100MB of RAM.
>>
>>  - Job 2 will start bound to CPU 1 and with a 50MB memory limit.
>>    It'll inherit the 1MB/s write limit to 8:16 and on top of that rate
>>    limit writes to 8:17, also at 1MB/s.
>>    The job will fail to start if the system has less than 50MB of RAM or
>>    less than 2 CPUs.
>>
>>  - Job 3 will start in the "db" cpuset and memory cgroups. If it starts
>>    before Job 1, no limit will be applied at startup time. As soon as Job 1
>>    starts however Job 3 will be limited to 2 CPUs and 100MB of memory.
>>    As it doesn't have a blkio statement, it won't have rate limited I/Os.
>>
>>  - Job 4 if started after Job 1 will fail to start as it's requesting a
>>    CPU that the parent cgroup doesn't have access to. If started before
>>    Job 1 however, it won't have a parent value set so will inherit the
>>    default and so will start so long as the system has at least 3 CPUs.
>>
>>
>>
>> I think this pretty much covers all I've got in mind at this point, I
>> think the above is flexible enough to work with all existing
>> controllers.
>>
>> Questions, comments and suggestions are much welcome!
>>
>>
>>
>
> Thanks for documenting this!
>
> Kind regards,
>
> James.
> --
> James Hunt
> ____________________________________
> #upstart on freenode
> http://upstart.ubuntu.com/cookbook
> https://lists.ubuntu.com/mailman/listinfo/upstart-devel

Kind regards,

James.

[1] - https://lists.ubuntu.com/archives/upstart-devel/2012-May/001877.html

(*) - $UPSTART_CGROUP would *not* be exported into the job's environment
      since, if it was not used in the cgroup stanza, it would not correctly
      represent the cgroup for the job instance.
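
To make the proposed name expansion a little more concrete, here is a rough
Python sketch (not Upstart code, which is written in C). It assumes the
stanza is tokenised with shell-style quoting, that $UPSTART_CGROUP is built
as "<job>-<instance>" with slashes mapped to underscores, and that any other
variable whose value contains a slash is a hard error. The function names
(instance_cgroup_name, expand_name, parse_cgroup_stanza) and the
"<job>-<instance>" layout are hypothetical, chosen only for illustration.

```python
import re
import shlex
from typing import Dict, Optional, Tuple

_VAR = re.compile(r"\$(\w+)")


def instance_cgroup_name(job: str, instance: str = "") -> str:
    """Assumed value of $UPSTART_CGROUP: job[-instance], with '/' mapped to '_'."""
    raw = f"{job}-{instance}" if instance else job
    return raw.replace("/", "_")


def expand_name(name: str, job: str, instance: str = "",
                env: Optional[Dict[str, str]] = None) -> str:
    """Expand $VARS in a cgroup name; reject expansions that contain '/'."""
    values = dict(env or {})
    values["UPSTART_CGROUP"] = instance_cgroup_name(job, instance)

    def substitute(match: "re.Match[str]") -> str:
        var = match.group(1)
        if var not in values:
            raise ValueError(f"unknown variable ${var} in cgroup name")
        if "/" in values[var]:
            # The "hard error" option discussed above.
            raise ValueError(f"${var} expands to a value containing '/'")
        return values[var]

    return _VAR.sub(substitute, name)


def parse_cgroup_stanza(line: str, job: str, instance: str = "",
                        env: Optional[Dict[str, str]] = None
                        ) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Split 'cgroup <controller> <name> [<key> <value>]' and expand the name."""
    tokens = shlex.split(line)  # honours double quotes around name and value
    if len(tokens) not in (3, 5) or tokens[0] != "cgroup":
        raise ValueError(f"malformed cgroup stanza: {line!r}")
    controller, name = tokens[1], tokens[2]
    key, value = (tokens[3], tokens[4]) if len(tokens) == 5 else (None, None)
    return controller, expand_name(name, job, instance, env), key, value
```

With these assumptions, 'cgroup cpuset "db/$UPSTART_CGROUP" cpus 1' for a
job "pgsql" with instance "main/5432" yields the sub-cgroup name
"db/pgsql-main_5432", while "db/auto" is left as the literal name.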
