Hi All,

= Caveat =

This is very much a brain dump and doesn't have all the answers - please
comment and fill in the blanks when you spot them! :-)

= Introduction =

We are looking to provide the ability to fully disable a job.

= Rationale =

Lots of users are familiar with the old SysV way of handling jobs and
are looking for a chkconfig-like tool to ease the transition to Upstart.

The "manual" stanza coupled with the Override facility already provides
this ability, but it has the following shortcomings.

== Shortcomings of Override Files ==

* There is no programmatic Upstart interface: a tool/user must manually
  create a ".override" file containing the "manual" stanza (or simply
  append "manual" to the ".conf" file).

* It is too generic a facility / not "fail-safe".

  Any Admin/tool/pkg can manipulate ".override" files. If an Admin
  disables a job using a ".override" file, they might find that it has
  later been changed by another tool that rewrote the override. This is
  undesirable since the job may no longer be disabled.

* It is not obvious how to determine if a job *is* enabled or disabled.
  It is possible though. See:

  http://upstart.ubuntu.com/cookbook/#determine-if-a-job-is-disabled

= Requirement =

A "chkconfig"-like tool [1] to allow:

* Jobs to be disabled in particular runlevels.
* The ability to determine if a job is disabled for a particular
  runlevel.
* The ability to determine if a job *will* run for a particular
  runlevel (note: this is *NOT* the same as the bullet above! See
  below...)

= Ideal =

The ideal tool would provide the following details:

* Job name.
* Instance name.
* Which runlevels a job is enabled and disabled in. This breaks down
  into:
  * Job is enabled for specified runlevel.
  * Job is explicitly disabled for specified runlevel.
  * Job is *implicitly* disabled for specified runlevel.
* Whether the job ran last time? (would require an event+job log. Can
  never be 100% reliable of course since config may have changed
  between boots.)
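
To make the three per-runlevel states above concrete, here is a rough
sketch of the record such a tool might report. Everything in it is
hypothetical and not part of Upstart: the names State, JobReport and
build_report, the "on"/"off" labels, and the idea of passing runlevels
as strings of digits are assumptions made purely for illustration. It
also only covers runlevels named in the job's "start on".

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """The three per-runlevel states listed under 'Ideal' (labels are made up)."""
    ENABLED = "on"
    EXPLICITLY_DISABLED = "off"
    IMPLICITLY_DISABLED = "off (implicit)"


@dataclass
class JobReport:
    """Hypothetical report record: job name, instance name, runlevel -> State."""
    job: str
    instance: str = ""
    states: dict = field(default_factory=dict)

    def line(self) -> str:
        # One tab-separated, chkconfig-like line per job, runlevels in order.
        name = f"{self.job} ({self.instance})" if self.instance else self.job
        cols = [f"{rl}:{st.value}" for rl, st in sorted(self.states.items())]
        return "\t".join([name] + cols)


def build_report(job, start_levels, limited="", implied="", instance=""):
    """Classify each runlevel in the job's 'start on' (assumed inputs).

    start_levels: runlevels the job would normally start in, e.g. "2345".
    limited:      runlevels in which the job was explicitly disabled.
    implied:      runlevels in which it is disabled only as a side effect
                  of some other job being disabled.
    """
    states = {}
    for rl in start_levels:
        if rl in limited:
            states[rl] = State.EXPLICITLY_DISABLED
        elif rl in implied:
            states[rl] = State.IMPLICITLY_DISABLED
        else:
            states[rl] = State.ENABLED
    return JobReport(job, instance, states)
```

For example, build_report("cron", "2345", limited="35", implied="4")
yields a record whose line() shows cron as on in runlevel 2, off in 3
and 5, and implicitly off in 4.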

= Preliminaries =

== Thoughts and Observations ==

* It is actually rather difficult to map the Upstart event model onto
  such a tool since SysV init doesn't behave like Upstart (further
  details below).

* If a job is explicitly disabled completely, jobs which start on that
  job will be implicitly disabled. This information needs to be
  conveyed somehow.

* If a job has a start on condition as below, what action should we
  take if the user requests the job be disabled in runlevel 2?

    start on foo or runlevel 2

  Since it is (currently) not possible to know upfront whether "foo" or
  "runlevel 2" will be satisfied at boot time, it may be reasonable to
  (by default) disable such a job in runlevel 2 since the "start on"
  has specified it *might* "start on runlevel 2". We could provide an
  option to control this subtle behaviour.

* If we provide the ability to disable any job, the system could become
  unbootable very quickly.

== Constraints ==

* Upstart currently has no knowledge of SystemV runlevels: they are
  supported through events and external applications such as telinit.
  This premise should not need to be contravened - the internals of
  Upstart should not need to be imbued with runlevel knowledge. This
  implies that:

  1. The facility should work for *any* event (not just runlevels).
  2. The facility should be driven by an external tool of some kind (in
     other words either a program or script which calls initctl as
     appropriate).

* Runlevels are implemented with the "runlevel" event which has a
  primary environment variable "RUNLEVEL" taking a value from 0 to 6.
  It needs to be possible to disable a job:

  * entirely (where it has any "start on" condition).
  * in all runlevels ("[0123456]").
  * in some runlevels (for example "[345]").

* Upstart allows jobs to be started based on arbitrarily complex
  conditions. Any facility to disable a job should consider these
  conditions.

== Categories of Jobs ==

There are a number of job categories that we need to consider:

1. Jobs that specify a start on which does *NOT* include runlevel.
   They may start before or after the runlevel event is emitted.

2. Jobs that start on the initial event. A small handful of jobs
   "start on startup". This is a specialisation of (1).

3. Jobs that "start on runlevel" (a single event). Such jobs may
   restrict the start on further by specifying environment variables
   (RUNLEVEL and PREVLEVEL).

4. Jobs that specify a "complex" start on (one using "and" / "or")
   which includes "runlevel".

= Terminology =

* "limit"

  Since we want to be able to disable Upstart jobs based on some
  condition, "disable" is rather a crude term. The word "limit" is
  better since it connotes the more fine-grained approach being
  proposed. Its antonym is "delimit". I'd initially thought of
  "restrict" and "derestrict", but (,de)limit is shorter :-)

= Scope =

Ideally, it would be possible to disable a job *instance*. But that is
probably going to be an "iteration 2" feature.

Of the four categories of Jobs outlined above, only categories (3) and
(4) can reasonably be dealt with by this design.

Category (1) breaks down into jobs that run before the runlevel event
is emitted (about 20 on an Ubuntu oneiric system currently) and jobs
that run after. The former have to be excluded but the latter may be
able to be considered. It is possible that many of those would end up
being implicitly disabled if a job in category (3) or (4) were disabled
anyway [2].

It isn't reasonable to stop category (2) jobs from running since that
will almost certainly break your system anyway: mountall won't run for
starters!

= High-Level Plan =

My thoughts at this stage are that we provide 3 new commands (note
these are not *necessarily* initctl commands):

* limit <job> [<expr>]

  Restrict conditions on which job <job> is started.
  <expr> is assumed to be a subset of the "start on" condition of
  <job>. If it is not, this is not an error, but a warning should
  probably be issued since the command would have no effect at that
  point in time.

  QUESTION: If job <job> has already been limited, what do we do?

  1. Throw an error.
  2. Replace the existing limit with the new one.

  QUESTION: How would we handle this scenario?

    $ limit cron runlevel [35]
    $ limit cron runlevel RUNLEVEL=4

  Possible outcomes:

  1. Cron is limited in runlevels 3+5.
  2. Cron is limited in runlevel 4.
  3. Cron is limited in runlevels 3, 4 and 5.

* delimit <job>

  Returns any current limit expression and undoes the effect of
  "limit".

* show-limit [<job> [<expr>]]

  Show limits for all jobs or specified job. Command should emit a
  warning if any limit is found that is not a subset of the "start on"
  for the job in question (since the limit will have no effect).

  If no expression is supplied, show "raw" limit.

  If an expression *is* specified, determine if job would run given
  that expression.

  Example:

  Assume a job specifies "start on runlevel [345]". If a limit of
  "runlevel RUNLEVEL=4" has been set, we want a higher-level tool to be
  able to query directly if the job would run in runlevel 4 so
  returning "runlevel [345]" isn't that helpful. What we really want to
  say is:

    $ show-limit foo runlevel 4

  And have the tool display whether for "runlevel 4" job foo would run
  based on the limit of "runlevel [345]". This could be displayed in
  parseable format and also maybe returned via the return code.

  Thought: maybe we could add a "query-limit" command specifically for
  this and have "show-limit" just return the "raw" limit details?

= Implementation Details =

== Limit Condition ==

To satisfy the chkconfig requirement, we could just allow a single
event and optional environment to be specified. However, the better
solution is to allow an arbitrary condition (like "start on" and
"stop on").
The condition could almost be viewed as a "restrict on" stanza. Only
one such limit condition may be specified.

XXX: Note that the condition itself, for the example of runlevels,
covers all the runlevels where that job must not run. This is an
important point: the condition only specifies a single runlevel if that
job should only be disabled in a single runlevel. The "norm" is
probably more likely to be where the condition covers *more than one*
runlevel. This is perfectly acceptable since "show-limit" allows an
*actual* runlevel to be specified so a higher-level tool can establish
if a job would be disabled for a particular runlevel.

== Matching Limits to Events ==

If a job condition becomes "true" such that Upstart would normally
attempt to start the job and if that job has a limit condition which
"matches" part of the EventOperator tree, Upstart will not run the job.

=== Examples ===

  start on : runlevel [2345]
  runlevel : 2
  limit    : runlevel 2
  outcome  : match - job will be disabled in runlevel 2.

  start on : runlevel [2345]
  runlevel : 2
  limit    : runlevel
  outcome  : match - job will be disabled in runlevel 2.

  start on : runlevel
  runlevel : 2
  limit    : runlevel [2345]
  outcome  : match - job will be disabled in runlevel 2.

  start on : runlevel 2
  runlevel : 2
  limit    : runlevel [2345]
  outcome  : match - job will be disabled.

  start on : runlevel RUNLEVEL=2
  runlevel : 2
  limit    : runlevel [2345]
  outcome  : match - job will be disabled.

  start on : runlevel [2345]
  runlevel : 2
  limit    : runlevel RUNLEVEL=2
  outcome  : match - job will be disabled.

  start on : runlevel RUNLEVEL=2 PREVLEVEL=S
  runlevel : 2
  limit    : runlevel [2345]
  outcome  : match - job will be disabled.

  start on : runlevel RUNLEVEL=2
  runlevel : 2
  limit    : runlevel [2345] S
  outcome  : no match - job will run.

  start on : runlevel 2
  runlevel : 2
  limit    : runlevel [345]
  outcome  : no match - job will run.
             warning will be generated since limit cannot match the
             start on condition.

  start on : foo or runlevel 2
  runlevel : 2 (foo has not been emitted).
  limit    : runlevel [2345]
  outcome  : match? I think yes.

  start on : foo and runlevel 2
  runlevel : 2 (and foo has been emitted).
  limit    : runlevel [2345]
  outcome  : match - job will not run.

== Storage of Limit Conditions ==

The two main ideas here are:

* Create a single file to store all limit information.

  A good location might be "/etc/init.limit". This file would store job
  restriction details in a simple format such as:

    <job> [<condition>]

  So, if job "cron" was disabled entirely, it would contain:

    cron

  Whereas if the job was disabled in runlevels 3-5 it would contain:

    cron runlevel [345]

  If the file exists on startup, Upstart would read the job limit
  details.

  Pros:

  * Single file outside of /etc/init/ so might be "safer" in the case
    where an admin ran "cd /etc/init; rm * .override" say by mistake.
  * It would be a "single point of definition" and thus easier to
    backup and apply to other systems maybe?

  Cons:

  * File would nominally need to be rewritten each time a change was
    made. Might not be too bad since changing limits is perceived as
    being an irregular activity (but tell me if you have other views on
    this! :)
  * Possible locking issues if multiple requests came in to change a
    limit at the same time.

* Create per job files

  In a similar fashion to the existing ".conf" and ".override" files,
  we could introduce "/etc/init/<job>.limit". If this file existed and
  was empty, the job would be fully disabled (never automatically
  started). However, if it contains "<condition>", that would be
  applied.

  Pros:

  * Analog to ".conf" and ".override" so familiar to users.

  Cons:

  * Easy to inadvertently delete a ".limit" file maybe?
  * We're starting to create a lot of files now. Theoretically there
    could now be 3 files / job (".conf", ".override" and ".limit").
    We're not likely to reach the inotify limit (4096 watches?)
    yet, but it is something to be aware of, more so in the server or
    maybe development server environment.

However the Limit Condition file(s) is/are created, care needs to be
taken to ensure that it is not possible to lose data should the system
fail / be rebooted in mid-write.

== What Writes the Limit Conditions File? ==

The entire Upstart system currently only reads files. That precedent
should not be changed lightly. Do we really want init or initctl to be
able to write to files? There are a number of issues around doing so
including:

* Security concerns.

  Having a daemon writing files as the superuser is always something to
  be highly wary of.

* Handling of failure conditions.

  Particularly for init itself, writes couldn't be synchronous since if
  they failed, that would block all other jobs.

* Potential data loss should the system crash / be powered off whilst
  writing.

  Really, this requires a transactional system. A simple, but not fully
  effective semi-solution would be to write the data to a temporary
  file (eeeeew!) and then atomically move that over the original.

There are 3 possibilities here:

* /sbin/init writes the file.

  This is probably best avoided for the reasons outlined above. That
  said, it would be the cleanest solution since the new commands could
  be initctl commands and would work in similar fashion to existing
  ones.

* /sbin/initctl writes the file.

  This is better than having init write the file, but by tasking
  initctl with the job, we would need to introduce some sync point with
  init such that:

  * initctl writes the file.
  * initctl asks init to read the file and confirm when it has done so.
  * initctl returns with a message to the user.

  The sync point would guarantee that when initctl returns, Upstart
  would "know" about the limits and would act on them. Without it,
  there is a window where the user may think a job was disabled when in
  fact it hadn't yet been disabled.
  However, that window is small and realistically may only apply to
  the pathological case whereby a *job* disables another job. If we go
  with the ".limit" idea whereby Upstart would be notified by inotify
  as usual, the window is probably so short that we don't have to
  worry (much).

* Some other tool writes the file.

  This sounds like a good option, but we still potentially have the
  sync issue.

= Questions =

* What about read-only root environments which would disallow writing
  to /etc/init*? We could provide a "--limitfile" option to init, but
  where could that point to that is guaranteed to exist potentially as
  early as the initramfs running?

[1] - http://manpages.ubuntu.com/manpages/en/man8/chkconfig.8.html
[2] - We should analyse the standard Ubuntu desktop and server
      installations to see how many fall into each category...

Kind regards,

James.
--
James Hunt
____________________________________
Ubuntu Foundations Team, Canonical.
