Re: [Bug 554172] Re: system services not starting at boot
Status returns start/running and the services start, which is nice though not conclusive; only the runlevel is unknown.
On 17 Aug 2010 03:12, Steve Langasek steve.langa...@canonical.com wrote:
> runlevel unknown is a separate error. Please check that your /etc/network/interfaces is correct and has successfully initialized your loopback interface (as shown by 'status network-interface INTERFACE=lo').
-- system services not starting at boot https://bugs.launchpad.net/bugs/554172 You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 554172] Re: system services not starting at boot
I updated to upstart 0.6.5-7, rebooted, and have the same issues with services not starting. Specifically, nginx doesn't start, and I have two CGI services that interact with nginx that don't start on boot. I've checked with rcconf, and all 3 are labelled to start on boot. Checking some of the logs (/var/log/messages, /var/log/syslog, and /var/log/boot{.log}) doesn't reveal anything that would indicate a problem with these services, as far as I can tell.
A related issue, which I mentioned earlier, is that 'shutdown -r now' doesn't work. I get the "System is shutting down" message on my console, but nothing happens. I have to use 'reboot', and this seems to work consistently. From some of the Ubuntu forum comments, it seems like this is possibly related to the upstart problems.
[Bug 554172] Re: system services not starting at boot
On one machine I have applied the later upstart from lucid-proposed (0.6.5-7) and still get runlevel unknown on reboot.
[Bug 554172] Re: system services not starting at boot
runlevel unknown is a separate error. Please check that your /etc/network/interfaces is correct and has successfully initialized your loopback interface (as shown by 'status network-interface INTERFACE=lo').
[Bug 554172] Re: system services not starting at boot
This bug was fixed in the package upstart - 0.6.5-7
---
upstart (0.6.5-7) lucid-proposed; urgency=low
  * Apply patch from trunk to use /dev/null when /dev/console is unavailable due to kernel bugs. This isn't a fix for those bugs, but it does work around it for now. LP: #554172.
 -- Scott James Remnant sc...@ubuntu.com Thu, 12 Aug 2010 10:45:46 -0400
** Changed in: upstart (Ubuntu Lucid)
   Status: Fix Committed => Fix Released
[Bug 554172] Re: system services not starting at boot
The patched version of upstart looks good so far. 25 reboots on the dual CPU Pentium 3 machine and it has entered runlevel 2 each time.
[Bug 554172] Re: system services not starting at boot
Can also confirm this working. The system takes longer to boot now, which is no surprise since all services start now. However, I had no chance to do extensive reboot testing since the machine is in production.
[Bug 554172] Re: system services not starting at boot
** Tags added: verification-done
** Tags removed: verification-needed
[Bug 554172] Re: system services not starting at boot
Seems to work fine on my PC, all services start now.
Re: [Bug 554172] Re: system services not starting at boot
Ditto.
> Seems to work fine on my PC, all services start now.
-- system services not starting at boot https://bugs.launchpad.net/bugs/554172 You received this bug notification because you are a direct subscriber of a duplicate bug (543506).
Status in “linux” package in Ubuntu: Confirmed
Status in “upstart” package in Ubuntu: Fix Released
Status in “linux” source package in Lucid: Confirmed
Status in “upstart” source package in Lucid: Fix Committed
Status in “linux” source package in Maverick: Confirmed
Status in “upstart” source package in Maverick: Fix Released
Bug description:
Binary package hint: cups
Cups is not loading on my machine at boot, must run sudo /etc/init.d/cups start after booting to print.
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: cups 1.4.2-10
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
NonfreeKernelModules: nvidia
Architecture: i386
Date: Fri Apr 2 13:07:35 2010
InstallationMedia: Ubuntu 10.04 Lucid Lynx - Alpha i386 (20100401)
Lpstat: Error: command ['lpstat', '-v'] failed with exit code 1: lpstat: Connection refused
MachineType: Dell Inc. Studio XPS 1340
Papersize: letter
PpdFiles: Brother-HL-2170W-series: Brother HL-2170W Foomatic/pxlmono (recommended)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-19-generic root=UUID=615bbe85-506a-4152-af5a-a5c2da303d83 ro quiet splash
ProcEnviron: LANG=en_US.utf8 SHELL=/bin/bash
SourcePackage: cups
dmi.bios.date: 09/08/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A11
dmi.board.name: 0Y279R
dmi.board.vendor: Dell Inc.
dmi.board.version: A11
dmi.chassis.asset.tag: 1234567890
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A11
dmi.modalias: dmi:bvnDellInc.:bvrA11:bd09/08/2009:svnDellInc.:pnStudioXPS1340:pvrA11:rvnDellInc.:rn0Y279R:rvrA11:cvnDellInc.:ct8:cvrA11:
dmi.product.name: Studio XPS 1340
dmi.product.version: A11
dmi.sys.vendor: Dell Inc.
[Bug 554172] Re: system services not starting at boot
I installed the proposed upstart binary, and since then booted my system several times: the bug didn't reappear. All services started well.
[Bug 554172] Re: system services not starting at boot
CUPS is still not starting after updating to the proposed upstart package.
[Bug 554172] Re: system services not starting at boot
Oops! My mistake! All services start, but since smb starts before cups, printers are still not available in samba! I thought at first that the new upstart package would solve that!
[Bug 554172] Re: system services not starting at boot
Serge, please file a separate bug against the samba package for your issue. This is not related to this bug preventing the startup of cups.
[Bug 554172] Re: system services not starting at boot
I am running a rack-mounted remote server that I was going to upgrade while the students are away. I cannot rely on ssh to start, or apache2 or mysql so I am somewhat stuffed chaps. Any news on this getting sorted or should I leave it with 7.04 server?
[Bug 554172] Re: system services not starting at boot
This bug was fixed in the package upstart - 0.6.6-2
---
upstart (0.6.6-2) maverick; urgency=low
  * Apply patch from trunk to use /dev/null when /dev/console is unavailable due to kernel bugs. This isn't a fix for those bugs, but it does work around it for now. LP: #554172.
 -- Scott James Remnant sc...@ubuntu.com Thu, 12 Aug 2010 09:52:07 -0400
** Changed in: upstart (Ubuntu Maverick)
   Status: Triaged => Fix Released
[Bug 554172] Re: system services not starting at boot
Robbie: Way ahead of you on that one. I put a workaround in Upstart together yesterday, and just uploaded it - now that we understand the bug, and the implications, it's safe to do so.
Upstart will now fall back to using /dev/null for jobs with "console output" -- "console owner" will still fail, because those are expressing a stronger desire for the console.
** Also affects: upstart (Ubuntu)
   Importance: Undecided
   Status: New
** Changed in: upstart (Ubuntu Lucid)
   Assignee: (unassigned) => Scott James Remnant (scott)
** Changed in: upstart (Ubuntu Maverick)
   Importance: Undecided => Medium
** Changed in: upstart (Ubuntu Lucid)
   Importance: Undecided => Medium
** Changed in: upstart (Ubuntu Maverick)
   Assignee: (unassigned) => Scott James Remnant (scott)
** Changed in: upstart (Ubuntu Lucid)
   Status: New => Triaged
** Changed in: upstart (Ubuntu Maverick)
   Status: New => Triaged
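A minimal sketch of the fallback described above, assuming nothing about Upstart's real implementation: the helper name open_console_or_null and its used_fallback flag are hypothetical, invented for illustration. It tries the requested console device first and opens /dev/null only if that attempt fails.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper, not Upstart's actual code.
 *
 * Try to open console_path for reading and writing without making it the
 * controlling terminal. If that fails for any reason, open /dev/null
 * instead. *used_fallback is set to true when the /dev/null path was taken.
 *
 * Returns a file descriptor, or -1 if even /dev/null could not be opened
 * (errno is then the one set by the /dev/null attempt). */
static int open_console_or_null(const char *console_path, bool *used_fallback)
{
    assert(console_path != NULL && used_fallback != NULL);

    int fd = open(console_path, O_RDWR | O_NOCTTY);

    *used_fallback = (fd < 0);
    if (*used_fallback)
        fd = open("/dev/null", O_RDWR);

    return fd;
}
```

A job that asks for "console owner" would not go through a helper like this, since per the message above it is still expected to fail when the console cannot be opened.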
[Bug 554172] Re: system services not starting at boot
> Though open(2) does indeed not document this error, it is a documented POSIX return and it has been possible that this could get returned on open for a TTY for a very long time.
No it isn't, the current and previous editions of POSIX don't document EIO as a return for open() - are you sure you're not reading the XSI STREAMS specification? :-) That being said, as I discussed above, open() has always apparently returned EIO in Linux for other reasons. We should probably make sure this is documented.
> Yes EIO is not a very intuitive return but actually they chose a different return code as it does indeed indicate something different than an EAGAIN might. EAGAIN generally meaning just do it again and EIO meaning this is stuck closing at the moment.
The problem is that EIO is *already* returned from open() to mean "omg, the filesystem/block device is on fire! stop! stop! stop!"
> It seems that this is predicated only on your dislike of EIO as a return.
No! You dangerously misunderstand. This is predicated on my discovery that open() can already return EIO for different errors that are catastrophic, and thus code that loops on EIO isn't possible. A different error would mean we could retry in userspace - right now the only option is to fail.
[Bug 554172] Re: system services not starting at boot
Andy: if we can change the returned error code so it's not EIO, and something more like EAGAIN that unequivocally tells userspace that it can loop, we can deal with that in userspace. That's just a one-line change. For example, we could patch libc to always loop on those open()s, which would fix everything at once.
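To illustrate the kind of userspace loop this would allow, here is a sketch that retries open() only while it fails with EAGAIN. It assumes the hypothetical kernel change discussed above; the kernel in this thread returns EIO, so this loop would not help as things stand. The name open_retry_eagain, the retry limit and the 10 ms pause are made-up illustration values, not anything from libc or Upstart.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/* Hypothetical helper: call open() up to max_tries times, retrying only
 * when it fails with EAGAIN. Any other error (ENOENT, EIO, ...) is treated
 * as non-transient and returned to the caller immediately.
 *
 * Returns the file descriptor on success, or -1 with errno set by the
 * last open() attempt. */
static int open_retry_eagain(const char *path, int flags, int max_tries)
{
    assert(path != NULL && max_tries > 0);

    const struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    for (int attempt = 1; ; attempt++) {
        int fd = open(path, flags);

        if (fd >= 0 || errno != EAGAIN || attempt >= max_tries)
            return fd;

        /* Transient failure: wait briefly, then try again. */
        nanosleep(&pause, NULL);
    }
}
```

Because only EAGAIN is retried, a catastrophic EIO from a failing disk would still be reported at once, which is the distinction the message above is asking the kernel to make possible.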
[Bug 554172] Re: system services not starting at boot
FYI, the fix for Lucid has been uploaded into -proposed: http://launchpadlibrarian.net/53562040/upstart_0.6.5-7_source.changes and is waiting for approval.
[Bug 554172] Re: system services not starting at boot
Accepted into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!
** Changed in: linux (Ubuntu Lucid)
   Status: Confirmed => Fix Committed
** Changed in: upstart (Ubuntu Lucid)
   Status: Triaged => Fix Committed
** Tags added: verification-needed
** Changed in: linux (Ubuntu Lucid)
   Status: Fix Committed => Confirmed
[Bug 554172] Re: system services not starting at boot
** Branch linked: lp:ubuntu/lucid-proposed/upstart
[Bug 554172] Re: system services not starting at boot
** Changed in: upstart (Ubuntu Lucid)
   Milestone: None => ubuntu-10.04.1
[Bug 554172] Re: system services not starting at boot
Scott, Could we consider carrying a short-term hack in Lucid/Maverick for upstart (but not put in the main upstart tree), and then remove it for 11.04...when the issue is supposedly fixed in the kernel?
[Bug 554172] Re: system services not starting at boot
> open(2) does not document EIO as a valid return from this function, and I'm not even sure this error is appropriate - where it's used elsewhere it nearly always refers to a filesystem error - there are few exceptions. If the intent is that the calling process should just try again, shouldn't it instead return EAGAIN?
Though open(2) does indeed not document this error, it is a documented POSIX return and it has been possible that this could get returned on open for a TTY for a very long time. Yes EIO is not a very intuitive return but actually they chose a different return code as it does indeed indicate something different than an EAGAIN might. EAGAIN generally meaning just do it again and EIO meaning this is stuck closing at the moment.
> Also please bear in mind that "should" implies that it should somehow have been anticipated that the kernel was going to change an interface and introduce an undocumented non-transient error code where none existed before? :-)
This interface has _not_ changed, an open on a TTY which has recently been closed has always had the possibility of returning EIO, the /dev/console device is a TTY and therefore could trigger this behaviour; you have been lucky up to now. Two things have changed. Firstly, the window in which it can be triggered has widened slightly in the kernel. Secondly, upstart recently stopped holding /dev/console open in the main thread (to avoid the REISUB death); holding it open mitigates this issue completely. (And we might consider this as a mitigation option.)
> Also, let's consider the other effects of this kernel change. For example, the following code from the initramfs that actually exec's init in the first place:
>     exec run-init ${rootmnt} ${init} "$@" <${rootmnt}/dev/console >${rootmnt}/dev/console 2>&1
> This opens /dev/console to be bound to init's file descriptors, if the console has recently been closed, these shell redirects can now fail with EIO.
> That means it's not just init that has to be fixed, it's every single possible shell out there, including the shells inside things like busybox?
Actually the race can only be triggered by parallel execution, so for the init process up to this point we are likely protected by being singly threaded. If the thread has recently closed the console it will have paid the cost of closing it before continuing and we are not affected.
> This is why the kernel can't just push its own laziness down to userspace like this.
We commonly hand off unfortunate semantics to userspace and let that handle things. EINTR is a classic example.
> Another point to consider (I discussed this with a few people here at LinuxCon): open() is supposed to be an inherently blocking system call, just like connect(), creat(), etc. If the kernel hasn't finished hanging up the tty from last time, it's *okay* for the subsequent open() to block for a while while it hangs up the tty and reinitializes it. The app will be expecting that. If the app calls open() with the O_NONBLOCK flag, which it accepts today already, then it's a non-blocking open - and in that case it would be acceptable for the kernel to fail the open with the EAGAIN or EWOULDBLOCK error - *NOT* EIO.
While that is a reasonable position to take, /dev/console is an implicitly non-blocking device, so in your case because you are using /dev/console you are getting non-blocking semantics whether you expected them or not. Also O_NONBLOCK actually does not say that the open should fail with EWOULDBLOCK if it cannot be completed. It means open the device without waiting for it to be 'connected'; it should result in a successful open.
> (not EIO because it turns out that that error code is already returned in some cases to indicate filesystem corruption or disk error, neither of which are transient and acceptable to loop on)
It seems that this is predicated only on your dislike of EIO as a return. Yes it is an unexpected one, but we commonly use error codes to mean different things from different types of device. EIO is defined as IO Error, and generally means the IO you wanted to do was not possible. It is not a big twist to use it to say open failed because IO is not possible. Nor for it to mean something completely different on a file and on a TTY. If it returned an ESLOWCLOSEINPROGRESS or indeed EWOULDBLOCK it would still be the same semantics, and I suspect you would still not be happy.
Overall I can understand these semantics are not ideal, but they are the current semantics, they are not new semantics either. Even with the coming upstream changes (they appear to be merged now), the window is not gone, just reduced, and EIO is still a possibility with some TTYs, some of which can be consoles. I have been doing some research to see if I can find a basis for this selection of return code and indeed this behaviour; so far I have not found one. But even if upstream were to concur this is not the correct behaviour and change it, we are unlikely to have
Re: [Bug 554172] Re: system services not starting at boot
If I can reproduce the bug when the quiet kernel option is set (this disables console output to speed up boot), does this analysis still hold water? Same for commenting out output to /dev/console in init files. There may be an issue according to the analysis, but I am not sure whether it is the root cause of the symptoms in this bug. E.g., in all combinations (quiet/not quiet, commented/not commented) the bug occurs (some services fail to start sometimes) on two giada n10U devices (dual atom 330 nvidia ion), as well as an asus eee 1201n (also dual atom nvidia ion).
[Bug 554172] Re: system services not starting at boot
@Phil -- setting quiet would likely make this more likely, as it simply prevents visible printing of the output; it does not reduce the contention on the console_sem, and is likely to increase it. Commenting out the open of /dev/console from upstart's configuration files, however, should have an effect, and indeed is reported to in this bug by a number of people. It is quite possible there are two issues here with similar symptoms.
[Bug 554172] Re: system services not starting at boot
Similar symptoms for me on fully upgraded Ubuntu 10.04 on Dell Inspiron laptop dual-booting with Win7 using grub2. A triple-workaround that worked for others here (but not me) is listed below.
    sudo -s
    for file in /etc/init/*.conf; do sed -i 's/^console output/\#console output/' "$file"; done
    sed -i 's/start on filesystem and net-device-up IFACE=lo/start on filesystem and started rsyslog and net-device-up IFACE=lo/' /etc/init/rc-sysinit.conf
    sed -i 's/GRUB_CMDLINE_LINUX=/GRUB_CMDLINE_LINUX=init='\''\/sbin\/init --verbose'\''/' /etc/default/grub
    update-grub
My problem may be related to a failed hibernate resume due to laptop battery draining below hibernate threshold just as I was correcting a permissions misconfiguration on my /var or /usr directory that was preventing proper OS operation. I think others have mentioned the possibility of a stale file lock problem, but can't remember where or how to correct it.
[Bug 554172] Re: system services not starting at boot
Having talked to upstream and clarified the plans with the BTM, it seems that they are intending on closing some of the races as they are deemed unhelpful. Overall however some slow TTY devices will indeed still legitimately return EIO when a slow close is in operation. That is expected behaviour.
Looking at upstart it does appear that a change there has exposed us to this issue. Until recently upstart used to hold /dev/console open in its own name. That was stopped (quite reasonably) to avoid another issue, from the changelog:
    0.6.5 2010-02-04 "Our last, best hope for victory"
    [...]
    * No longer holds /dev/console open, so the SAK SysRq key will not kill Upstart. (Bug: #486005)
    [...]
If we look at upstart itself, it seems to be using a plain open which likely should be more robust in any case:
    system_setup_console (ConsoleType type,
    [...]
        switch (type) {
        case CONSOLE_OUTPUT:
        case CONSOLE_OWNER:
            /* Ordinary console input and output */
            fd = open (CONSOLE, O_RDWR | O_NOCTTY);
            if (fd < 0)
                nih_return_system_error (-1);
    [...]
[Bug 554172] Re: system services not starting at boot
More robust, how? If the job is configured to require its standard three file descriptors to the system console, and Upstart is unable to open the system console, it is unable to satisfy the requirements of the job configuration so will terminate the attempt to start the job. Read through the entire job_process_spawn() function and you'll see that the code is already safe from EINTR due to the signal disposition, and all other permissible error returns from open() are non-transient.
open(2) does not document EIO as a valid return from this function, and I'm not even sure this error is appropriate - where it's used elsewhere it nearly always refers to a filesystem error - there are few exceptions. If the intent is that the calling process should just try again, shouldn't it instead return EAGAIN?
Also please bear in mind that "should" implies that it should somehow have been anticipated that the kernel was going to change an interface and introduce an undocumented non-transient error code where none existed before? :-)
[Bug 554172] Re: system services not starting at boot
Also, let's consider the other effects of this kernel change. For example, the following code from the initramfs that actually exec's init in the first place:
    exec run-init ${rootmnt} ${init} "$@" <${rootmnt}/dev/console >${rootmnt}/dev/console 2>&1
This opens /dev/console to be bound to init's file descriptors, if the console has recently been closed, these shell redirects can now fail with EIO.
That means it's not just init that has to be fixed, it's every single possible shell out there, including the shells inside things like busybox?
This is why the kernel can't just push its own laziness down to userspace like this.
[Bug 554172] Re: system services not starting at boot
Another point to consider (I discussed this with a few people here at LinuxCon): open() is supposed to be an inherently blocking system call, just like connect(), creat(), etc. If the kernel hasn't finished hanging up the tty from last time, it's *okay* for the subsequent open() to block for a while while it hangs up the tty and reinitializes it. The app will be expecting that. If the app calls open() with the O_NONBLOCK flag, which it accepts today already, then it's a non-blocking open - and in that case it would be acceptable for the kernel to fail the open with the EAGAIN or EWOULDBLOCK error - *NOT* EIO. (not EIO because it turns out that that error code is already returned in some cases to indicate filesystem corruption or disk error, neither of which are transient and acceptable to loop on)
[Bug 554172] Re: system services not starting at boot
First some background. Analysis above indicates that upstart jobs are failing because opens of /dev/console are failing, and that this should not be possible: ... open(/dev/console) is _not_supposed_to_fail_. As previously indicated /dev/console is a virtual device which represents the currently active console. However, there is no guarentee that this device will successfully open, no guarentee that there is a system console device. This device will only open successfully if there is a real active system console defined, if there is no system console at open time then the open will fail with errno set to ENODEV. There is no guarentee that it will ever become openable. Also while /dev/console is a virtual device representing the current active console, it is implemented as a direct open of the real console device. Opens of /dev/console are redirected to opens of the actual active console device at the time the open occurs. The open is therefore to the real underlying device, the returned file descriptor has all the semantics of the real tty device. I have managed to trivially reproduce open failures to /dev/console, returning errno of EIO, by running two open/close loops on a tty device in parallel, in my case /dev/tty10: #include stdio.h #include fcntl.h #include errno.h main(int argc, char *argv[]) { int fd; while (1) { fd = open(argv[1], O_RDWR); if (fd 0) { printf(fd%d errno%d\n, fd, errno); fflush(stdout); } close(fd); } } Looking at the open code, the suspected source of the error (EIO during open) is the code fragment below. That the console is in the process of being closed when the open is occuring: static int tty_reopen(struct tty_struct *tty) { struct tty_driver *driver = tty-driver; if (test_bit(TTY_CLOSING, tty-flags)) return -EIO; [...] Nominally open/close handling is protected by tty_mutex, this prevents parallel opens and closes from racing with each other. 
However, once we close a device for the last time (i.e. all sharers have closed the device) a real shutdown of the device occurs. For tty devices this may involve an extended handshake at the hardware level. During this close process it is not safe to initiate a reopen, but we also do not wish to block all tty opens. Thus the kernel only holds the tty_mutex long enough to mark the device as in the process of closing (it sets TTY_CLOSING in the device flags) and releases tty_mutex before executing the potentially extended close handling.

In the single-threaded case the close/reopen race is avoided, as tty shutdown processing is executed in the context of the closer; the device close has therefore progressed sufficiently far to prevent a subsequent open from seeing this partially closed state and triggering the EIO return. It should be noted that open/close processing is also covered by the BKL. However, this is not proof against parallel execution: should the close handling sleep (which can occur should there be any mutexes in the path), the BKL is dropped and reacquired (as it is a preemptable lock). For the common case the console is a VT device, which needs to take console_sem during device shutdown, which opens the race with another parallel thread.

This EIO return has been pointed out to the tty maintainers, but in short this is deemed to be correct behaviour, though it is not well documented. The window during which this was possible was reduced significantly by splitting close processing, but the window is unlikely to be removed, as the behaviour has meaning. To quote from the git log:

    Fun but it's actually not a bug and the fix is wrong in itself as
    the port may be closing but not yet being destructed, in which case
    it seems to do the wrong thing. Opening a tty that is closing (and
    could be closing for long periods) is supposed to return -EIO.
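The reproduction described earlier in this comment (two open/close loops run in parallel against the same tty) can also be driven from a single program. The following is only a sketch under stated assumptions: it uses fork() to get the two parallel loops, bounds the number of iterations so that it terminates, and counts only EIO failures. The function names open_close_loop() and race_open_close() are hypothetical, and whether EIO is actually observed depends on the kernel, the device, and timing.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Child body: open and close 'path' 'iterations' times, then exit with
 * status 1 if any open failed with EIO and 0 otherwise. Never returns.
 */
static void open_close_loop(const char *path, long iterations)
{
    int saw_eio = 0;
    long i;

    for (i = 0; i < iterations; i++) {
        int fd = open(path, O_RDWR);

        if (fd < 0) {
            if (errno == EIO)
                saw_eio = 1;
            continue;
        }
        close(fd);
    }
    _exit(saw_eio);
}

/*
 * Run two open/close loops in parallel on 'path'.
 * Returns how many of the two children (0..2) saw EIO from open(),
 * or -1 if fork() or waitpid() failed.
 */
int race_open_close(const char *path, long iterations)
{
    pid_t pids[2];
    int hits = 0;
    int i;

    for (i = 0; i < 2; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            /* Reap the first child if the second fork failed. */
            if (i == 1)
                waitpid(pids[0], NULL, 0);
            return -1;
        }
        if (pids[i] == 0)
            open_close_loop(path, iterations);
    }

    for (i = 0; i < 2; i++) {
        int status;

        if (waitpid(pids[i], &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
            hits++;
    }
    return hits;
}
```

For instance, race_open_close("/dev/tty10", 1000000) would mirror the test above on an otherwise unused virtual terminal (it needs permission to open that device); a device such as /dev/null should never report EIO.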
In short, the EIO behaviour looks to be expected behaviour on a real TTY device. Also, /dev/console cannot be expected to always open, but must be expected to have the same semantics as the underlying active console device.

Looking forward, there is much activity in the upstream kernel in the area of removing the BKL, which does currently affect this area. There are currently patches proposed for (but not merged to) v2.6.36 which replace the BKL with a tty-specific mutex. It appears that this will avoid the open/close race by rendering all opens and closes serialised. It is not yet clear that this change in semantics was planned, as the code to handle overlapping open/closes has not been removed. We need to understand if this change in semantics was intended before we could consider changing the semantics for Maverick or Lucid. For Lucid we should also consider the difficulty in testing any change of this
[Bug 554172] Re: system services not starting at boot
** Changed in: linux (Ubuntu Lucid)
   Assignee: Canonical Kernel Team (canonical-kernel-team) => Andy Whitcroft (apw)

** Changed in: linux (Ubuntu Maverick)
   Assignee: Canonical Kernel Team (canonical-kernel-team) => Andy Whitcroft (apw)
[Bug 554172] Re: system services not starting at boot
Many thanks to Andy Whitcroft for his truly excellent analysis above in comment #245.

REPRODUCING THE BUG:

It looks to me as though what is required to reproduce this bug is a fast multi-core machine with a slow video card. It just so happens that is exactly what my machine is like.

When I build a machine, I always buy a high-end gamer-style motherboard and put a close-to-top-of-the-line CPU in it. My experience has been that motherboards are a reliability problem area, so I spend up on the motherboard. I tend to prefer motherboards and CPUs that have been out for a little while, so the bugs have been worked out of them. However, I do not like the noise and power consumption of high-end video cards, so I buy quiet (preferably silent) video cards. Such cards are inevitably slow.

Many server machines also have fast CPUs and slow video. Most servers spend almost all of their lives with their video never being looked at by a human being, so fast video performance is unimportant in that market. So server computer manufacturers put in cheap slow video systems. Meanwhile, the CPU performance is critical, so server manufacturers put in big fast multi-core CPUs. This explains the prominent presence of the server guys in the comments on this bug. They have the kind of machines that show the bug.

FLOW CONTROL PROBLEM:

One problem not mentioned by Andy is the possibility of flow control happening. For example, suppose the console was something really slow such as an actual teletype, joined to the computer via modems. The word "teletype" is where "tty" came from. Of course, few of you young people know what a real teletype looks, sounds or smells like, but there are some of us who remember that they only went at 10 characters per second. Now suppose that the telephone line connecting the modems has got signal quality problems, maybe it has noise or crosstalk. So there are hangups and redialling going on as the modems struggle with the telephone line.

Then do a few restarts and generate a lot of console traffic. The poor old console could end up hours behind. Linux should cope with all that and keep right on working properly.

If there is spooling for the console, there will be short hesitations in the flow of data as the data is written to disk. Then there has to be a flow-control-asserted signal back to whatever is writing the data, to say, "Hey, wait up, my buffer is nearly full." Then the writer has to wait until flow control is deasserted and writing may resume. Flow control should always be present between any two asynchronous processes transferring data.

If there is no spooling for the console, the short hesitations can become quite long waits, as the console labours to catch up.

I do not know whether there is a spooling option for console traffic. Perhaps someone more knowledgeable might comment. However, the problem will always be there, of the writer to the console possibly getting to be faster than the console device can take the data. CPUs keep on getting faster. Spooling disks or video cards cannot necessarily keep up. So there will always be the necessity for flow control.

Linux in general, and upstart in particular, must cope satisfactorily with this problem. Otherwise, we are just headed for yet more nasty bugs like this one.

CONCLUSION:

The fact is, the console is not always available for writing. So any software writing to the console has to cope with that. That insight sounds like progress to me. I still think the problem is within upstart. I look forward to the next contribution from Scott James Remnant.
[Bug 554172] Re: system services not starting at boot
My twopennies. I'm running 10.04 server with the following kernel:

    2.6.32-21-generic-pae #32-Ubuntu SMP

The machine is an old Toshiba laptop with 250 MB of RAM and a Pentium III processor (~600 MHz). It has worked flawlessly as a print server for some months, without the recommended fixes.
[Bug 554172] Re: system services not starting at boot
@John Edwards

A question about your machines (real and virtual) that are not affected by this bug: how many processor cores or virtual processor cores do they have?
[Bug 554172] Re: system services not starting at boot
@Robbie Williamson: Just upgraded to #39, and I was lucky this time:

    m...@bluebird:~$ runlevel
    N 2
    m...@bluebird:~$ uname -a
    Linux Bluebird 2.6.32-24-generic #39-Ubuntu SMP Wed Jul 28 05:14:15 UTC 2010 x86_64 GNU/Linux

First boot was OK, while #38 always failed.
[Bug 554172] Re: system services not starting at boot
#39 broken here after 3 boots:

    mdo...@doris:~$ runlevel
    unknown
    mdo...@doris:~$ uname -a
    Linux doris 2.6.32-24-generic #39-Ubuntu SMP Wed Jul 28 05:14:15 UTC 2010 x86_64 GNU/Linux

I also have a suspend-to-RAM failure since 2.6.32-23, bug #602049.

Vaio SZ6 notebook, Core 2 Duo + Intel graphics + Intel X25-M SSD
[Bug 554172] Re: system services not starting at boot
Hey folks,

I've done some investigation based on the hypothesis that this bug is caused by open("/dev/console") returning the EIO error. This is supported by comments including this log message from Upstart, and by evidence that removing console output from Upstart jobs appears to correct the problem.

First, an important clarification. There is some suggestion that Upstart should wait for this device to be ready; that is nonsense. /dev/console is a virtual device supplied by the kernel that represents an active system console, whatever that may be - or the bit bucket if there isn't one. It's always available, and trivial operations on it such as open() and close() should always succeed. Waiting for /dev/console makes as little sense as waiting for /dev/null.

open("/dev/console") is _not_supposed_to_fail_.

So I've read through the kernel source code and I have found a pattern which *would* cause opening /dev/console to fail with EIO, and there is also a good explanation of why this only started appearing in Ubuntu 10.04.

Opening /dev/console for the first time allocates memory within the kernel, and future opens take a reference count to this allocated memory. Closing /dev/console reduces this reference count, and should it hit zero, it frees the memory.

The trouble seems to be that the kernel doesn't free the memory within the tty mutex; instead it marks the allocated tty information as TTY_CLOSING, releases the mutex, then frees the memory later. The open() code checks for this flag, and bails out with EIO when present.

This is a clear SMP bug within the kernel. A race condition exists where, if you open /dev/console just after the last file descriptor is closed from a process running on another core, that process gets a reference to the *being freed* console information rather than referencing it and re-using it.
As to why this has only relatively recently appeared: previously the kernel seems to have done all of this under the BKL (Big Kernel Lock), and the last commits to this code were attempts to remove the BKL. This may be a resulting bug of reducing locking and increasing pre-emptiveness.

Also, an Upstart bug may have been hiding the problem. Upstart gets passed /dev/console, or opens it on initialisation, so it can set up the console with sane parameters; however, it failed to close it again and kept the console device open at all times. This was a bug, and meant that the SysRq-K SAK key killed init and caused a kernel panic. With that bug fixed, though, it once again became possible for the console device to be released from memory (init always had a reference before), which exposed this bug underneath.

Reassigning to the kernel team to make the tty code SMP safe.
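The open/close pattern described in this comment can be illustrated with a small userspace model. This is only a toy sketch, not kernel code: struct model_tty and the model_* functions are hypothetical names, the real tty_mutex is not modelled, and the kernel frees the tty and allocates a fresh one where this model simply clears the flag. It shows two things from the description above: an open that lands between the two halves of the final close gets -EIO, and while some process (as init used to) keeps its own descriptor open, the reference count never reaches zero and the window never opens.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of the tty open/close bookkeeping.
 * All names here are illustrative and are not kernel APIs. */
struct model_tty {
    int  refcount;  /* number of open file descriptors */
    bool closing;   /* stands in for the TTY_CLOSING flag */
};

/* Open: fails with -EIO while a final close is still in progress,
 * otherwise takes another reference. */
int model_open(struct model_tty *t)
{
    if (t->closing)
        return -EIO;
    t->refcount++;
    return 0;
}

/* First half of close (done under the mutex in the description above):
 * drop a reference and, if it was the last one, mark the tty as
 * closing. Returns true if the caller must still run
 * model_close_finish(). */
bool model_close_begin(struct model_tty *t)
{
    if (--t->refcount > 0)
        return false;
    t->closing = true;
    return true;
}

/* Second half of close (done after the mutex has been released):
 * the potentially lengthy teardown. Here it only clears the flag,
 * which stands in for the next open getting a fresh tty. */
void model_close_finish(struct model_tty *t)
{
    t->closing = false;
}
```

In this model, calling model_open() after model_close_begin() has returned true but before model_close_finish() has run returns -EIO, which corresponds to an open on another core racing with the last close.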
[Bug 554172] Re: system services not starting at boot
** Also affects: Ubuntu Maverick
   Importance: High
   Status: Confirmed

** Changed in: Ubuntu Maverick
   Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: Ubuntu Lucid
   Assignee: Canonical Foundations Team (canonical-foundations) => Robbie Williamson (robbie.w)

** Changed in: Ubuntu Lucid
   Assignee: Robbie Williamson (robbie.w) => Canonical Kernel Team (canonical-kernel-team)

** Package changed: Ubuntu Lucid => linux (Ubuntu Lucid)
[Bug 554172] Re: system services not starting at boot
** Tags added: kernel-core kernel-needs-review
[Bug 554172] Re: system services not starting at boot
The kernel patch that introduces this regression is f278a2f7bbc2239f479eaf63d0b3ae573b1d746c, which even notes in the commit log:

    Due to tty release routines run in a workqueue now, error like the
    following will be reported while booting:

    INIT open /dev/console Input/output error
[Bug 554172] Re: system services not starting at boot
It should be noted that the commentary in #238 is from the fix which eliminated that error, not from the change that introduced it. This fix is also already applied to both Lucid and Maverick kernels.

    commit f278a2f7bbc2239f479eaf63d0b3ae573b1d746c
    Author: Dave Young hidave.darks...@gmail.com
    Date:   Sun Sep 27 16:00:42 2009 +

        tty: Fix regressions caused by commit b50989dc

        The following commit made console open fails while booting:

        commit b50989dc444599c8b21edc23536fc305f4e9b7d5
        Author: Alan Cox a...@linux.intel.com
        Date:   Sat Sep 19 13:13:22 2009 -0700

            tty: make the kref destructor occur asynchronously

        Due to tty release routines run in a workqueue now, error like the
        following will be reported while booting:

        [...]

        Fix it as per the following Alan's suggestion:

        [...]
[Bug 554172] Re: system services not starting at boot
akaname,

I've only seen it on one machine, and that had two Pentium 3 CPUs (separate chips).

Of the machines which have not seen this problem, the virtual machines are all single processor and the servers all had multiple core CPUs (Pentium D, AMD X2, Core2 Duo and Quad).

There were a few desktop machines upgraded to Ubuntu 10.04 that had single core CPUs: Pentium 4, AMD Athlon, and Intel Atom. The desktops were only upgraded a few weeks ago and are not used by me day-to-day, but have not reported problems so far. I'll try to grab one and power cycle it a bit to recreate the problem.

I suspect Scott James Remnant is correct and this is an SMP problem. Most machines built over the past few years have had multi-core CPUs.
Re: [Bug 554172] Re: system services not starting at boot
I've installed Ubuntu on 20 machines for individual desktop use. Most have been on AMD processors, and most of those have been dual core, plus 1 quad core. The single core machines have rarely been used to print from. I've seen the issue on both dual core and quad core. I hope this helps.

On 08/05/2010 11:53 AM, akaname wrote:
> @John Edwards
> A Question to your machines (real and virtual) that are not affected
> by this bug: How many processor cores or virtual processors cores do
> they have?
Re: [Bug 554172] Re: system services not starting at boot
Scott,

> I've done some investigation based on the hypothesis that this bug is
> caused by open(/dev/console) returning the EIO error, this is
> supported by comments including this log message from Upstart and by
> evidence that removing console output from Upstart jobs appears to
> correct the problem.

Not sure if this is relevant, but the workaround of commenting out the /dev/console statements may _seem_ to be a fix, but it is not: I have seen the bug reproduced with this workaround applied, albeit less frequently. It does remove the console write error from the log, but some services may still not run. No obvious signs of failure does not mean there has been no failure; I have found machines that seem OK, where a failure is simply not obvious because they run a small number of services.

From memory, there is also an option to turn off the console. I tried each workaround individually with multiple reboots until failure, including the disabling of ureadahead. The only workaround with which I have not seen the bug is disabling ureadahead, but this may mean nothing given the intermittent nature of the bug.
[Bug 554172] Re: system services not starting at boot
Thanks Robbie.

This problem shows up more often on faster systems. Every quad Phenom/nvidia box I have put together does it. All quad Opteron servers (NO GUI) do it regardless of video card. Sometimes on the servers bind9 will fail. ddclient also fails, vboxdrv also fails to start on boot, and sometimes the login screen fails to show up.

Some slower Intel boxes with Intel video cards do it. On some systems with a GeForce FX5200 you sometimes get a black screen and login fails to load, while other times you get a console login but X fails to load. On some systems with integrated Intel 845G there is the same problem: sometimes neither X nor the console login starts, while other times login starts at the console.

List of services known not to start because of this problem on various machines:

    cups
    X
    bind9
    vboxdrv
    console login
    ddclient
    nmbd
    just about any other 3rd party service

This problem did NOT show up on any boxes I set up during the Karmic release.

It does not matter whether the runlevel command shows N 2 or (unknown): I have had services fail to start regardless of the output of the runlevel command. Usually it shows N 2.