** Description changed:
+ = Summary =
+
+ The version of Upstart in vivid is affected by a coule of bugs relating
+ to the flushing data from early-boot jobs to disk which can both result
+ in a crash:
+
+ == Problem 1 ==
+
+ An internal list is mishandled meaning a crash could occur randomly.
+
+ == Problem 2 ==
+
+ Jobs which spawn processes in the background then themselves exit can
+ cause a crash due.
+
+ = Explanation of how Upstart flushes early job output =
+
+ If an Upstart job starts *and ends* early in the boot sequence (before
+ the log partition is mounted and writable) and produces output to its
+ stdout/stderr, Upstart will cache the output for later flushing by
+ adding the 'Log' object associated with the 'Job' to a list.
+
+ When the log partition is mounted writable, the
+ /etc/init/flush-early-job-log.conf job is run which calls "initctl
+ notify-disk-writeable". This is a signal to Upstart to flush its cache
+ of early-boot job output which takes the form of iterating the
+ 'log_unflushed_files' list and flushing all the 'Log' entries to disk.
+
+ = Code Specifics =
+
+ There are 2 issues (note that the numbers used below match those used in
+ the Summary).
+
+ == Problem 1 detail ==
+
+ Due to a bug in the way the 'log_unflushed_files' list is handled (the
+ 'Log' cannot be added to the list directly, so is added via an
+ intermediary ('NihListElem') node), a crash can result when iterating
+ the list since the 'Log' is freed, but NOT the intermediary node. The
+ implication is that it is possible for the intermediary node to be
+ attempt to dereference already-freed data, resulting in a crash.
+
+ == Problem 2 detail ==
+
+ If a job spawns a process in the background, then itself exits, that
+ jobs 'Log' entry will be added to the 'log_unflushed_files' list. But,
+ if the background process produces output and then exits before Upstart
+ attempts to flush the original jobs data to disk, the 'NihIo'
+ corresponding to the log will be serviced automatically and the data
+ flushed to disk. The problem comes when Upstart receives the
+ notification to flush the 'log_unflushed_files' list, since that list
+ now contains an entry which has already been freed (since all its data
+ has already been flushed). The result is an assertion failure.
+
+ = Fix =
+
+ == Problem 1 fix ==
+
+ Correct the 'log_unflushed_files' list handling by freeing the
+ 'NihListElem' (which will automatically free the 'Log' object), not by
+ simply freeing the 'Log' object itself.
+
+ * Branch: lp:~jamesodhunt/ubuntu/vivid/upstart/bug-1447756/
+ * New Upstart test added to avoid regression?: Yes.
+
+ == Problem 2 fix ==
+
+ Correct the assumption that the only entries in the
+ 'log_unflushed_files' list will always have data to flush by checking if
+ there is in fact any data to flush; if not, remove the entry from the
+ 'log_unflushed_files' list since it has already been handled
+ automatically by the 'NihIo'.
+
+ * Branch: lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix
+ * New Upstart test added to avoid regression?: Yes.
+
+ = Workarounds =
+
+ If a system is affected by this bug, it will be manifested by a crash
+ early in the boot sequence.
+
+ To overcome the issue, either:
+
+ a) Boot by adding "--no-log" to the kernel command-line.
+
+ b) Disable the flush-early-job-log job (assuming the machine is
+ bootable) by running the following:
+
+ $ echo manual | sudo tee -a /etc/init/flush-early-job-log.override
+
+ = Impact =
+
+ The issue has been present in Upstart since logging was introduced but
+ no known instances of crashes relating to these problems have been
+ reported prior to this bug being reported (which relates the the issue
+ being seen on a very small subset of specific Ubuntu Touch phone
+ hardware where Upstart is used as the system init daemon).
+
+ Note that vivid still uses Upstart for managing the graphical session,
+ but now uses systemd by default for the system init daemon. Since the session
(Upstart) init does not even require
+ a flush-early-job-log, the exposure to both the bug and the updated fix
codepath is extremely limited.
+
+ = Test Case =
+
+ This bug is extremely hard to surface so the approach is simply to check
+ that the internal list can be iterated correctly by:
+
+ 1) Booting the system with upstart
+ (select the Upstart option from the grub menu or add "init=/sbin/upstart"
to the kernel command-line).
+
+ 2) Running the following on a system booted with Upstart:
+
+ $ for i in $(seq 17); do sudo start flush-early-job-log; done
+
+ = Regression Potential =
+
+ None expected:
+
+ - As noted in Impact, the problems fixed by this version of Upstart have not
been observed on server/desktop systems before.
+ - The fix is already in wily and no problems have been reported.
+ - See Impact.
+
+ = Original Description =
+
We recently started getting reprots from phone users that their devices
go into a reboot loop after changing the language or getting an OTA
upgrade (either of both end with a reboot of the phone)
after a bit of research we collected the log at
http://pastebin.ubuntu.com/10872934/
this shows a segfault of upstarts init binary in the log.c code:
[ 6.999083]init: log.c:819: Assertion failed in log_clear_unflushed:
log->unflushed->len
[ 7.000279]init: Caught abort, core dumped
[ 7.467176]Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000600
** Summary changed:
- segfault in log.c code causes phone reboot loops
+ [SRU] segfault in log.c code causes phone reboot loops
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1447756
Title:
[SRU] segfault in log.c code causes phone reboot loops
To manage notifications about this bug go to:
https://bugs.launchpad.net/canonical-devices-system-image/+bug/1447756/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs