On Wed, 2011-03-30 at 07:52 -0500, Serge E. Hallyn wrote:
> Quoting Scott Kitterman ([email protected]):
> > There was a lot of discussion around improving the server boot experience
> > before the UDS-M. A number of people expressed interest in seeing more
> > useful diagnostic information during boot. Others expressed concerns with
> > boot reliability on the more complex hardware typically found in servers.
> >
> > How are we doing on this? Personally, I can't remember the last time I
> > rebooted a server and it wasn't via SSH and the hardware I use is the sort
> > there were problems with. Are these still issues for the Ubuntu Server
> > community?
> >
> > Scott K
>
> I think right now these issues are overshadowed by the fact that a
> great deal of server software is not yet upstartified. I think that
> needs to be addressed for O.

I wonder if we need to address all of them. There are hundreds of
daemons that will always work perfectly fine in /etc/init.d as a
sysvinit script.

$ apt-file search /etc/init.d | wc -l
1179

If I narrow it down to main, that drops to 220, and 50 or so of those
are already symlinks to upstart-job. So realistically, I'd say there
are 150 - 170 left to convert in main, and probably about 1000 in
universe.

Rather than focus on upstartifying everything, the focus should
probably be on getting the key infrastructure pieces working well in
upstart (kerberos, ssh, ldap, nfs, etc), and then on improving the
sysvinit compatibility layer so that Ubuntu continues to shine when
something uses a sysvinit job.

I think one issue with server boot is that it's been left to the event
model without many fences. James Hunt's visualization tool shows arrows
going *everywhere*:

http://upstart.at/2011/03/25/visualisation-of-jobs-and-events-in-ubuntu-natty/
http://upstart.at/wp-content/uploads/2011/03/initctl2dot.png

But if you look at it, things get *much* more orderly around the
runlevel event. This is a fence. We can reasonably say that the system,
upon emitting the runlevel 2 event, has crossed into a zone where it is
ready for network services to start.

The problem is, it's not true. This event is emitted as soon as lo is
up by rc-sysinit:

start on filesystem and net-device-up IFACE=lo

Some services handle this quite well, some do not. Right now, the only
other fences are flawed:

start on net-device-up IFACE!=lo

Which means at least one real interface is configured. This event will
never come on a machine that has no network. It has the benefit,
though, that it is emitted every time a network appears, so for laptops
bouncing from no network, to wifi, and back, this is a great event to
use to make sure something is up whenever there is a real network. On a
server, though, this just means that one of the possibly many
interfaces is up, and so it probably shouldn't be used.
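
For illustration, a job keyed on that last event might look roughly
like the sketch below. The job name, file path and log tag are
hypothetical and do not come from any Ubuntu package, and the sketch
assumes that net-device-up passes IFACE into the job's environment, as
the IFACE!=lo match implies.

```
# /etc/init/example-net-hook.conf (hypothetical job, illustration only)
description "log each time a non-loopback interface comes up"

# Fires for every interface that comes up, except lo.
start on net-device-up IFACE!=lo

# One instance per interface, so two interfaces coming up close
# together are tracked separately.
instance $IFACE

# Short-lived job: run the script and exit instead of staying resident.
task

script
    # IFACE is assumed to come from the event's environment.
    logger -t example-net-hook "interface ${IFACE} is up"
end script
```

Note that nothing here waits for the remaining interfaces, which is
exactly why this event is a poor fence on a multi-homed server.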

Or

start on started networking

Which means 'ifup -a' has returned, which means that all static, auto
interfaces are configured. It also means we're missing dhcp interfaces.

We should change rc-sysinit to start on started networking. This
carries with it one problem, which is that if a static network
interface needs a sysvinit service to finish coming up, it will lock
the boot up. So we would have to review all scripts in
/etc/network/if-pre-up.d and /etc/network/if-up.d and make sure they
don't rely on sysvinit services. Likewise, we'd have to get this done
quickly so users can review any custom scripts they have before the
next LTS. As a secondary measure, running these scripts should time out
so the boot can continue if this deadlock is encountered.

This condition, waiting for 'ifup -a' to finish, held until the move to
the all-upstart boot. The deadlock was avoided by having services that
were needed before networking was available specify a low sequence
number in runlevel S. These services are quite few, and can be easily
identified and converted to upstart jobs that start at the right time.

If I look on a hardy system at /etc/rcS.d, with netbase installed,
I see very little between loopback and networking:

lrwxrwxrwx 1 root root 18 Mar 30 10:45 S08loopback -> ../init.d/loopback
lrwxrwxrwx 1 root root 20 Nov 30 17:46 S11hwclock.sh -> ../init.d/hwclock.sh
lrwxrwxrwx 1 root root 26 Nov 30 17:46 S11mountdevsubfs.sh -> ../init.d/mountdevsubfs.sh
lrwxrwxrwx 1 root root 16 Nov 30 17:46 S17procps -> ../init.d/procps
lrwxrwxrwx 1 root root 22 Nov 30 17:46 S20checkroot.sh -> ../init.d/checkroot.sh
lrwxrwxrwx 1 root root 17 Nov 30 17:46 S22mtab.sh -> ../init.d/mtab.sh
lrwxrwxrwx 1 root root 20 Nov 30 17:46 S30checkfs.sh -> ../init.d/checkfs.sh
lrwxrwxrwx 1 root root 21 Nov 30 17:46 S35mountall.sh -> ../init.d/mountall.sh
lrwxrwxrwx 1 root root 31 Nov 30 17:46 S36mountall-bootclean.sh -> ../init.d/mountall-bootclean.sh
lrwxrwxrwx 1 root root 26 Nov 30 17:46 S37mountoverflowtmp -> ../init.d/mountoverflowtmp
lrwxrwxrwx 1 root root 20 Mar 30 10:46 S40networking -> ../init.d/networking

In fact, IMO, none of these would qualify for this condition, and are
likely just in this order for other reasons.

--
ubuntu-server mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
