Hey folks,
I've investigated the hypothesis that this bug is caused by
open("/dev/console") failing with EIO. This is supported by comments,
including this log message from Upstart, and by evidence that removing
"console output" from Upstart jobs appears to correct the problem.
First, an important clarification. Some have suggested that Upstart
should "wait" for this device to be ready; that is nonsense.
/dev/console is a virtual device supplied by the kernel that represents
an active system console, whatever that may be - or the bit bucket if
there isn't one. It's always available, and trivial operations on it
such as open() and close() should always succeed. Waiting for
/dev/console makes as little sense as waiting for /dev/null.
open("/dev/console") is _not_supposed_to_fail_.
So I've read through the kernel source code and I have found a pattern
which *would* cause opening /dev/console to fail with EIO, and there is
also a good explanation of why this only started appearing in Ubuntu
10.04.
Opening /dev/console for the first time allocates memory within the
kernel, and future opens take a reference count to this allocated
memory. Closing /dev/console reduces this reference count, and should
it hit zero, it frees the memory.
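As a rough illustration of that reference counting, here is a toy Python model. It is not kernel code: the class name `ConsoleRefcount` and its `state` and `count` fields are hypothetical stand-ins for the kernel's tty allocation and its open count.

```python
class ConsoleRefcount:
    """Toy model, not kernel code: first open allocates, last close frees."""

    def __init__(self):
        self.state = None  # hypothetical stand-in for the kernel's tty allocation
        self.count = 0     # number of open references

    def open(self):
        if self.state is None:
            self.state = object()  # first open: allocate
        self.count += 1            # every open takes a reference
        return self.state

    def close(self):
        if self.count == 0:
            raise ValueError("close() without a matching open()")
        self.count -= 1
        if self.count == 0:
            self.state = None      # last reference dropped: free


console = ConsoleRefcount()
first = console.open()
second = console.open()    # reuses the same allocation; count is now 2
console.close()            # count drops to 1, state stays allocated
console.close()            # count hits 0, state is freed
```

Holding one extra reference for the lifetime of the system, as init effectively did before the Upstart fix described below, keeps `count` above zero so the free path is never taken.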
The trouble seems to be that the kernel doesn't free the memory within
the tty mutex; instead it marks the allocated tty information as
TTY_CLOSING, releases the mutex, then frees the memory later. The
open() path checks for this flag and bails out with EIO if it is set.
This is a clear SMP bug in the kernel: a race condition exists where, if
a process opens /dev/console "just after" the last file descriptor to it
is closed by a process running on another core, the opening process
finds the console information that is *being freed*, rather than live
information it could take a reference to and reuse, and its open() fails
with EIO.
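The interleaving above can be replayed deterministically with a toy Python model, assuming a simplified split close: `close_begin()` does the part done under the tty mutex (dropping the count and setting a flag that plays the role of TTY_CLOSING) and `close_finish()` does the deferred free. All names here (`TtyState`, `ConsoleModel`, `close_begin`, `close_finish`, `replay_race`) are hypothetical, and the real kernel paths are considerably more involved; this is only a sketch of the window, not of the actual tty code.

```python
import errno
import threading


class TtyState:
    """Hypothetical stand-in for the kernel's per-tty allocation."""

    def __init__(self):
        self.count = 0
        self.closing = False  # plays the role of the TTY_CLOSING flag


class ConsoleModel:
    """Toy model of the open/close paths; not actual kernel code."""

    def __init__(self):
        self.tty_mutex = threading.Lock()  # stands in for the tty mutex
        self.tty = None

    def open(self):
        with self.tty_mutex:
            if self.tty is None:
                self.tty = TtyState()      # first open allocates
            elif self.tty.closing:
                return -errno.EIO          # found state being freed: bail out
            self.tty.count += 1
            return 0

    def close_begin(self):
        """Part of close() done under the mutex; True if this was the last reference."""
        with self.tty_mutex:
            self.tty.count -= 1
            if self.tty.count == 0:
                self.tty.closing = True    # mark it, then drop the mutex
                return True
            return False

    def close_finish(self):
        """Deferred free, modelled as happening after the mutex was released."""
        self.tty = None


def replay_race():
    """Replay: A closes its last fd, B opens inside the window, then A frees."""
    console = ConsoleModel()
    console.open()                         # process A opens /dev/console
    if not console.close_begin():          # A's last close: flag set, mutex dropped
        raise RuntimeError("expected A to hold the only reference")
    b_result = console.open()              # process B, on another core, opens "just after"
    console.close_finish()                 # A frees the state
    after_free = console.open()            # a later open allocates afresh
    return b_result, after_free


print(replay_race())  # B's open fails with -EIO; the later open returns 0
```

The point of splitting close into two calls is that the window between them is exactly where the flag is visible to open() but the state has not yet been freed; doing the mark and the free under the same mutex hold would close that window in this model.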
As to why this has appeared only relatively recently: the kernel
previously seems to have done all of this under the BKL ("Big Kernel
Lock"), and the most recent commits to this code were attempts to remove
the BKL. This bug may be a side effect of reducing locking and increasing
pre-emptiveness.
An Upstart bug may also have been hiding the problem. Upstart is passed
/dev/console, or opens it, on initialisation so that it can set up the
console with sane parameters; however, it failed to close it again and
kept the console device open at all times. This was a bug, and meant
that the SysRq-K (SAK, Secure Attention Key) combination killed init and
caused a kernel panic. With that bug fixed, though, it once again became
possible for the console device to be released from memory (previously
init always held a reference), which exposed this underlying bug.
Reassigning to the kernel team to make the tty code SMP safe.
--
system services not starting at boot
https://bugs.launchpad.net/bugs/554172
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs