Public bug reported:

Upstart sometimes aborts on a stateful re-execution
triggered by "telinit u":

job.c:1977: Assertion failed in job_deserialise: job->kill_process
Caught abort, core dumped
init:job.c:1977: Assertion failed in job_deserialise: job->kill_process
[   69.668199] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

The attached file (sessions.json) is a salvaged dump of the Upstart state
that triggers the assertion failure; the problem evidently occurs while
processing the following piece:

[...]
          "name": "",
          "path": "\/com\/ubuntu\/Upstart\/jobs\/ureadahead\/_",
          "goal": "JOB_STOP",
          "state": "JOB_KILLED",
[...]
          "kill_timer": {
            "timeout": 180,
            "due": 245
          },
          "kill_process": "PROCESS_MAIN",
[...]

The issue was observed with the package ubuntu-1.12.1 (Ubuntu 14.04)
and is caused by the following code:

[init/job.c]

1954         json_kill_timer = json_object_object_get (json, "kill_timer");
1955 
1956         if (json_kill_timer) {
[...]
1973                 nih_local NihTimer *kill_timer = job_deserialise_kill_timer (json_kill_timer);
1974                 if (! kill_timer)
1975                         goto error;
1976 
1977                 nih_assert (job->kill_process);
1978                 job_process_set_kill_timer (job, job->kill_process,
1979                                             kill_timer->timeout);
1980                 job_process_adj_kill_timer (job, kill_timer->due);
1981         }

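The assertion (job->kill_process) fails in the routine job_deserialise()
if the deserialised job has an associated kill timer and
the field kill_process == PROCESS_MAIN. The assertion tests the enum
value for truthiness, so it presumably fails because PROCESS_MAIN is
numerically zero.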
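
To illustrate why a plain truthiness check misbehaves here, a minimal
sketch follows. It assumes that PROCESS_INVALID is -1 and PROCESS_MAIN
is 0; the Demo* names are hypothetical stand-ins, not Upstart code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for Upstart's ProcessType. The numeric values are an
 * assumption made for this sketch. */
typedef enum {
        DEMO_PROCESS_INVALID = -1,
        DEMO_PROCESS_MAIN    = 0,
        DEMO_PROCESS_PRE_START
} DemoProcessType;

/* Mirrors the condition asserted at job.c:1977: plain truthiness. */
bool
demo_truthiness_check (DemoProcessType kill_process)
{
        return kill_process ? true : false;
}

/* Self-check documenting the outcome under the assumed values. */
void
demo_self_check (void)
{
        /* A valid process type is rejected because it is zero ... */
        assert (! demo_truthiness_check (DEMO_PROCESS_MAIN));
        /* ... while the "no process" marker is accepted. */
        assert (demo_truthiness_check (DEMO_PROCESS_INVALID));
}
```

Under these assumptions the check rejects PROCESS_MAIN and accepts
PROCESS_INVALID, which is the opposite of what the assertion appears
to intend.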

The issue probably affects trunk as well: neither job_process_kill()
nor job_serialise() contains a comparable check. If the Upstart state
is serialised after job_process_kill() has run but before the job kill
timer fires, the resulting state representation cannot be restored,
because job_process_set_kill_timer() has left job->kill_timer non-NULL
and job->kill_process set to a value other than PROCESS_INVALID,
which may be PROCESS_MAIN.
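
The window in question can be modelled roughly as follows. This is a
simplified sketch and not the Upstart implementation: the Model* names
are hypothetical, the bool stands in for job->kill_timer != NULL, and
the reset performed when the timer fires is an assumption about how
the fields return to their idle values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values: -1 means "no process", 0 is the main process. */
typedef enum {
        MODEL_PROCESS_INVALID = -1,
        MODEL_PROCESS_MAIN    = 0
} ModelProcessType;

typedef struct {
        bool             has_kill_timer; /* job->kill_timer != NULL */
        ModelProcessType kill_process;   /* job->kill_process */
} ModelJob;

/* Idle job: no timer armed, no process being killed. */
void
model_job_init (ModelJob *job)
{
        assert (job != NULL);
        job->has_kill_timer = false;
        job->kill_process   = MODEL_PROCESS_INVALID;
}

/* Kill issued: the timer is armed and the target process recorded. */
void
model_process_kill (ModelJob *job, ModelProcessType process)
{
        assert (job != NULL);
        job->has_kill_timer = true;
        job->kill_process   = process;
}

/* Timer fired: both fields go back to their idle values (assumed). */
void
model_kill_timer_fired (ModelJob *job)
{
        assert (job != NULL);
        job->has_kill_timer = false;
        job->kill_process   = MODEL_PROCESS_INVALID;
}
```

A state captured between model_process_kill (&job, MODEL_PROCESS_MAIN)
and model_kill_timer_fired (&job) has the timer set and kill_process
equal to zero, which matches the combination seen in the salvaged dump.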

Probably the assertion in question should read

 (job->kill_process != PROCESS_INVALID)

if job_process_set_kill_timer() is assumed to operate correctly.
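
A sketch of the proposed condition, assuming that PROCESS_INVALID is
-1 and is the only value meaning "no process". The Sketch* names are
hypothetical; in job.c the change would presumably amount to replacing
the asserted expression at line 1977.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for Upstart's ProcessType; values assumed for the sketch. */
typedef enum {
        SKETCH_PROCESS_INVALID = -1,
        SKETCH_PROCESS_MAIN    = 0,
        SKETCH_PROCESS_PRE_START
} SketchProcessType;

/* Proposed condition: compare explicitly against the invalid marker
 * instead of relying on the numeric value being non-zero. */
bool
sketch_kill_process_valid (SketchProcessType kill_process)
{
        return kill_process != SKETCH_PROCESS_INVALID;
}

/* Self-check documenting the outcome under the assumed values. */
void
sketch_self_check (void)
{
        assert (sketch_kill_process_valid (SKETCH_PROCESS_MAIN));
        assert (! sketch_kill_process_valid (SKETCH_PROCESS_INVALID));
}
```

With the explicit comparison a job whose kill_process is PROCESS_MAIN
passes the check, and only a genuinely unset field is rejected.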

Unfortunately the issue is extremely difficult to reproduce, so
additional diagnostics may be hard to obtain, and adding them might
itself change the timing and hide the race that triggers the issue.

** Affects: upstart (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "Serialised Upstart state dump"
   https://bugs.launchpad.net/bugs/1514609/+attachment/4515781/+files/sessions.json

https://bugs.launchpad.net/bugs/1514609

Title:
  Deserialising a job with the attribute "kill_timer" and
  "kill_process"="PROCESS_MAIN" results in abort
