Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/20 James Hunt <688...@bugs.launchpad.net>:
> 3) Modify all upstart configs for services which are slow to stop such
> that they "stop on unmount-filesystem", rather than "stop on runlevel [016]".

- What about single user mode? I guess when switching to runlevel 1 we
  want to stop services like mysql?
- How do you decide if a service is "slow to stop"? IMHO that highly
  depends on the given hardware, local configuration and the amount of
  data you are dealing with.

A general approach would be preferable.

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to mysql-5.1 in ubuntu.
https://bugs.launchpad.net/bugs/688541

Title:
  race condition on shutdown (leads to corrupted fs)

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/16 Clint Byrum <cl...@fewbar.com>:
> /etc/init.d/sendsigs has this code:
>
>     # Upstart jobs have their own stop on clauses that sends
>     # SIGTERM/SIGKILL just like this, so if they're still running,
>     # they're supposed to be
>     for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
>         OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
>     done
>
> It uses this to determine which pids not to kill because, presumably,
> upstart should be managing them. However, this code is flawed. killall5
> will kill the children of all of these if they are multi process
> daemons or scripts running things.

This observation is correct. On the other hand, isn't this exactly what
the sendsigs script is for: cleaning up any remaining, stray processes
which have not been stopped by their corresponding sysv init script or
upstart job (or which have been started, e.g., by the user)?
But I guess you are right: we should first stop all upstart jobs, give
them time to finish stopping, and then let sendsigs clean up anything
remaining afterwards.

> However, this technique can actually be used to determine if there are
> still jobs that are supposed to be stopped, but haven't finished
> stopping yet. Since they should be listed as
> stop/(pre-stop|post-stop|killed), we can determine exactly which pids
> we expect to go away.
>
> Since upstart has its own idea of how long to wait before it kills
> these, we should actually wait indefinitely.
>
> I'm attaching a debdiff that solves the race as far as I can tell,
> though I think it needs a good long look, since it could mean shutdowns
> hang for a long time waiting (I'm especially curious if the
> pre-stop/post-stop's are subject to kill timeout)

This code is still racy, AFAICS. What about upstart jobs which are not
stopped by "stop on runlevel [016]"? They could receive their stop
signal at a point when your loop has already been run.
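Clint's idea of using `initctl list` to find jobs that are still on their way down could look roughly like the filter below. This is a minimal sketch, assuming `initctl list` prints one "job goal/state, process PID" line per job and treating only the stop/pre-stop, stop/post-stop and stop/killed states named above as "still stopping"; `stopping_upstart_pids` and `wait_for_stopping_jobs` are hypothetical helpers, not part of sendsigs or of the attached debdiff.

```shell
#!/bin/sh
# Hypothetical helper (not part of sendsigs): read `initctl list` output on
# stdin and print the PID of every job whose goal is "stop" but which is
# still in one of the states pre-stop, post-stop or killed.
stopping_upstart_pids() {
    awk '$0 ~ / stop\/(pre-stop|post-stop|killed), process [0-9]+$/ { print $NF }'
}

# Hypothetical wait loop: poll once a second until no job is left in those
# states. There is deliberately no timeout ("wait indefinitely"); upstart's
# own kill timeout is assumed to bound how long the loop runs.
wait_for_stopping_jobs() {
    while [ -n "$(initctl list | stopping_upstart_pids)" ]; do
        sleep 1
    done
}
```

On a real system the filter would be fed directly, e.g. `initctl list | stopping_upstart_pids`. As Michael points out, a job whose stop is only triggered after this loop has returned is still missed.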
If you don't want to change existing jobs, we probably have to pick up
Ante's suggestion and do the following in sendsigs:

1.) Run a for loop to wait for *all* running upstart jobs to stop.
    Upstart jobs which need to keep running past sendsigs (e.g. plymouth)
    need to signal that, using a mechanism similar to the killall5
    sendsigs.d omit interface. I'd give upstart jobs at least 60 seconds
    to stop, so that big databases etc. have enough time to shut down
    cleanly.
2.) Run a for loop and send SIGTERM to all remaining processes, but do
    *not* add upstart pids to $OMITPIDS.
3.) Send a final SIGKILL if any processes are left.

Regarding 1.), it would be nice to have a native C implementation in
upstart, instead of running initctl, grep and sleep manually.
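The first step of this plan (wait, with a 60 second limit, for every upstart job that has not opted out) could be sketched in shell as below. This is only an illustration under stated assumptions: the `initctl list` line format "job goal/state, process PID" is assumed, and `UPSTART_OMIT_JOBS`, `running_upstart_pids` and `wait_for_upstart_jobs` are hypothetical names; the variable merely stands in for an omit interface like the one described above and does not exist in upstart.

```shell
#!/bin/sh
# Sketch of the first step only. Hypothetical names throughout:
# UPSTART_OMIT_JOBS (space-separated job names that may keep running,
# e.g. plymouth) stands in for a real omit interface, which upstart lacks.

# Print the main PID of every upstart job that still has a process,
# skipping the jobs listed in $UPSTART_OMIT_JOBS.
running_upstart_pids() {
    initctl list | awk -v omit=" ${UPSTART_OMIT_JOBS:-} " '
        / process [0-9]+$/ && index(omit, " " $1 " ") == 0 { print $NF }'
}

# Wait up to $1 seconds (default 60) for those jobs to exit.
# Returns 0 once none are left, 1 if the timeout was reached first.
wait_for_upstart_jobs() {
    timeout=${1:-60}
    waited=0
    while [ -n "$(running_upstart_pids)" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Whichever way it returns, the second and third steps would be a SIGTERM pass and a final SIGKILL pass with killall5, as sendsigs already does, just without the upstart pids in $OMITPIDS.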
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/14 Clint Byrum <cl...@fewbar.com>:
> I do think the appropriate fix is to have umountfs emit an
> 'unmounting-filesystems' event and anything that does a 'start on
> local-filesystems' or 'start on filesystem' should also 'stop on
> unmounting-filesystems',

What do you do about services which have "start on runlevel [2345]" and
whose binary is in /usr? There are quite a few examples here: acpid, atd,
cron, irqbalance, etc., which all have:

    start on runlevel [2345]
    stop on runlevel [!2345]

Either those jobs are buggy for not specifying the "start on
(local-)filesystems" dependency, or your criterion is not sufficient.

IMHO the major problem here is that there is a mix-up between the
dependencies that need to be satisfied to be able to run a job and when
(in which runlevels) to start a job.

Michael
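Jobs like the ones Michael lists, which start on a runlevel without any filesystem event in their start condition, can be found with a rough scan of the job directory. This is a heuristic sketch only: it assumes job files live in /etc/init/*.conf, flags files with a "start on runlevel" line, and merely checks whether the word "filesystem" appears anywhere in the file; `jobs_without_fs_dependency` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical audit helper: list upstart jobs (in directory $1, default
# /etc/init) that have a "start on runlevel ..." line but never mention
# "filesystem" (so neither local-filesystems nor filesystem).
# Heuristic only: whole files are grepped, comments included.
jobs_without_fs_dependency() {
    dir=${1:-/etc/init}
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        if grep -q '^[[:space:]]*start on runlevel' "$conf" &&
           ! grep -q 'filesystem' "$conf"; then
            basename "$conf" .conf
        fi
    done
}
```

Run without an argument, it scans /etc/init; the jobs it prints are candidates for the question above, not confirmed bugs.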
[Bug 688541] [NEW] race condition on shutdown (leads to corrupted fs)
Public bug reported:

I'm using mysql-server-5.1 on a 10.04 LTS installation. The mysql db is
around 27GB and lives on a separate partition mounted as /var/lib/mysql.

On shutdown I get the following error message:

    Checking for running unattended-upgrades:
     * Asking all remaining processes to terminate...            [ OK ]
     * All processes ended within 1 seconds                      [ OK ]
     * Deconfiguring network interfaces...                       [ OK ]
     * Deactivating swap...                                      [ OK ]
     * Unmounting local filesystems...
    umount2: Device or resource busy
    umount: /var/lib/mysql: device is busy.
            (In some cases useful info about processes that use
             the device is found by lsof(8) or fuser(1))
    umount2: Device or resource busy
    umount2: Device or resource busy
    umount: /tmp: device is busy.
            (In some cases useful info about processes that use
             the device is found by lsof(8) or fuser(1))
    umount2: Device or resource busy                             [fail]
    mount: / is busy
     * Will now restart
    [ 3369.429751] Restarting system.

On the next reboot the file system is corrupt and needs to be fsck-ed.

I think the problem is that mysql uses an upstart job
(/etc/init/mysql.conf) which has "stop on runlevel [016]". The rc.conf
job is also triggered on runlevels 0 and 6, so the two basically run at
the same time. When /etc/rc0.d/S20sendsigs is run, it deliberately
neither waits for nor kills any upstart jobs. As my mysqld process takes
some time to shut down, S40umountfs and S60umountroot are run before
mysqld has quit, so the filesystems are not properly unmounted. It is
even possible that mysqld is forcefully killed by halt in S90halt if it
hasn't stopped by then.

This is a serious issue, as it can (and will) lead to data loss. Other
upstart jobs, like rsyslog.conf, use the same "stop on runlevel [016]"
stanza, so they are probably affected too.
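The failing step in the log is umount finding /var/lib/mysql still busy. A minimal sketch of a guard that waits, with a timeout, until no process is using a mount point is shown below, assuming psmisc's `fuser` is installed (`fuser -m -s` exits 0 while at least one process has files open on that filesystem). `wait_until_unused` is a hypothetical name, and the sketch only illustrates the race; it is not a proposed fix.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds (default 60) until no process
# has files open on the filesystem mounted at $1. `fuser -m -s` exits 0
# when at least one process is using it; any non-zero exit (including a
# missing fuser or another error) ends the loop as if the mount were free.
wait_until_unused() {
    mnt=$1
    timeout=${2:-60}
    waited=0
    while fuser -m -s "$mnt" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$mnt still busy after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Called before unmounting, e.g. `wait_until_unused /var/lib/mysql 60`, this would only hide the symptom with a bounded wait; it does not make sendsigs wait for the mysql upstart job itself.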
ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: mysql-server-5.1 5.1.49-1ubuntu8.1
Uname: Linux 2.6.32-5-686 i686
NonfreeKernelModules: michael_mic arc4 ecb lib80211_crypt_tkip aes_i586 aes_generic lib80211_crypt_ccmp sco bnep rfcomm l2cap binfmt_misc acpi_cpufreq ppdev lp cpufreq_userspace cpufreq_stats vboxnetadp cpufreq_powersave vboxnetflt cpufreq_conservative vboxdrv fuse pcmcia snd_intel8x0m snd_intel8x0 snd_ac97_codec btusb bluetooth rfkill ac97_bus yenta_socket ipw2200 snd_pcm 8139too firewire_ohci snd_seq 8139cp firewire_core sg uhci_hcd snd_timer rsrc_nonstatic libipw snd_seq_device pcmcia_core crc_itu_t parport_pc smsc_ircc2 ehci_hcd mii joydev lib80211 sr_mod parport i2c_i801 irda snd usbcore wbsd soundcore shpchp mmc_core pcspkr rng_core cdrom psmouse container crc_ccitt snd_page_alloc serio_raw pci_hotplug ac battery nls_base processor evdev ppp_generic slhc loop autofs4 ext4 mbcache jbd2 crc16 dm_mod sd_mod crc_t10dif radeon ttm ata_generic drm_kms_helper ata_piix drm i2c_algo_bit libata video thermal i2c_core scsi_mod output thermal_sys button
Architecture: i386
Date: Fri Dec 10 13:41:52 2010
ProcEnviron:
 PATH=(custom, no user)
 LANG=de_DE.utf8
 SHELL=/bin/bash
SourcePackage: mysql-5.1

** Affects: mysql-5.1 (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: apport-bug i386 maverick
Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
2010/12/10 Ante Karamatić <iv...@grad.hr>:
> Suggestion: make umountfs wait for all upstart jobs to finish.

Doesn't that conflict, though, with what is written in
/etc/init.d/sendsigs:

    # Upstart jobs have their own stop on clauses that sends
    # SIGTERM/SIGKILL just like this, so if they're still running,
    # they're supposed to be
    for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
    done

or

    # did an upstart job start since we last polled initctl? check
    # again on each loop and add any new jobs (e.g., plymouth) to
    # the list. If we did miss one starting up, this beats waiting
    # 10 seconds before shutting down.
    for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
    done
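The quoted sendsigs loop turns every upstart job that still has a process into a "-o PID" argument for killall5, whether the job is running normally or is already being stopped. The same extraction, wrapped in a function so it can be tried on sample `initctl list` output, might look like this; `upstart_omit_args` is a hypothetical name, and the sample lines in the note below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the sendsigs loop quoted above: read
# `initctl list` output on stdin and print the accumulated killall5
# omit arguments ("-o PID -o PID ...").
upstart_omit_args() {
    omit=""
    # Same sed expression as sendsigs: keep only the text after "process ".
    for pid in $(sed -n -e "/process [0-9]/s/.*process //p"); do
        omit="${omit:+$omit }-o $pid"
    done
    printf '%s\n' "$omit"
}
```

Fed the lines "cron start/running, process 42", "mysql stop/killed, process 1234" and "tty1 stop/waiting", it prints `-o 42 -o 1234`: a mysqld that is still in stop/killed is omitted from killall5 just like a running job, which is the situation described in the bug report.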