Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-20 Thread Michael Biebl
2010/12/20 James Hunt 688...@bugs.launchpad.net:

 3) Modify all upstart configs for services which are slow to stop such
    that they stop on unmount-filesystem, rather than stop on runlevel [016].

- What about single user mode? I guess when switching to runlevel 1 we
want to stop services like mysql?
- How do you decide whether a service is 'slow to stop'? IMHO that
depends heavily on the given hardware, the local configuration and the
amount of data you are dealing with. A general approach would be
preferable.

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to mysql-5.1 in ubuntu.
https://bugs.launchpad.net/bugs/688541

Title:
  race condition on shutdown (leads to corrupted fs)

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-16 Thread Michael Biebl
2010/12/16 Clint Byrum cl...@fewbar.com:

 /etc/init.d/sendsigs has this code:


        # Upstart jobs have their own stop on clauses that sends
        # SIGTERM/SIGKILL just like this, so if they're still running,
        # they're supposed to be
        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
        done


 It uses this to determine which pids not to kill because, presumably, upstart 
 should be managing them.

 However, this code is flawed. killall5 will kill the children of all of
 these if they are multi-process daemons or scripts running things.
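To make that point concrete, here is a minimal sketch, assuming a "pid ppid" table as printed by "ps -eo pid=,ppid=". The helper name unprotected_children is hypothetical and not part of sendsigs. It lists direct children of omitted pids that are not omitted themselves, i.e. processes killall5 would still signal even though their upstart parent is spared. Only one level of the process tree is checked.

```shell
#!/bin/sh
# Hypothetical helper (not part of sendsigs): print the direct children of
# the omitted pids that are not themselves omitted, i.e. processes that
# killall5 would still signal. Only one level of the tree is checked.
# Reads "pid ppid" pairs on stdin; $1 is a space-separated list of the
# omitted pids.
unprotected_children() {
    awk -v omit="$1" '
        BEGIN {
            n = split(omit, list, " ")
            for (i = 1; i <= n; i++)
                omitted[list[i]] = 1
        }
        # First column is the pid, second column its parent pid.
        ($2 in omitted) && !($1 in omitted) { print $1 }
    '
}

# Possible use on a live system:
#   upstart_pids=$(initctl list | sed -n -e "/process [0-9]/s/.*process //p")
#   ps -eo pid=,ppid= | unprotected_children "$(echo $upstart_pids)"
```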

This observation is correct. On the other hand, isn't this exactly
what the sendsigs script is for: to clean up any remaining stray
processes that have not been stopped by their corresponding sysv init
script or upstart job (or that were, for example, started by the user)?

But I guess you are right, we should first stop all upstart jobs, give
them time to finish stopping, and then let sendsigs clean up anything
remaining afterwards.

 However, this technique can actually be used to determine if there are
 still jobs that are supposed to be stopped, but haven't finished
 stopping yet. Since they should be listed as stop/(pre-stop|post-stop|killed),
 we can determine exactly which pids we expect to go away.
 Since upstart has its own idea of how long to wait before it kills
 these, we should actually wait indefinitely.
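Here is a rough sketch of the technique described above. It is not the attached debdiff, whose contents are not shown here. The names stopping_pids and wait_for_stopping_jobs are hypothetical. The parsing assumes the initctl list format in which the main process appears as ", process NNN" on the job line, and any other processes (e.g. a running pre-stop script) appear on indented lines below it.

```shell
#!/bin/sh
# Hypothetical sketch, not the debdiff. stopping_pids reads `initctl list`
# output on stdin and prints every pid belonging to a job whose state is
# stop/pre-stop, stop/post-stop or stop/killed, including pids listed on
# the indented continuation lines under that job.
stopping_pids() {
    awk '
        # A non-indented line starts a new job entry.
        /^[^ \t]/ { stopping = ($0 ~ / stop\/(pre-stop|post-stop|killed)/) }
        # Take the number after "process" on the job line or its
        # continuation lines ("process " is 8 characters long).
        stopping && match($0, /process [0-9]+/) {
            print substr($0, RSTART + 8, RLENGTH - 8)
        }
    '
}

# Wait until no upstart job is left in one of those stopping states.
# There is deliberately no timeout here, relying on upstart to kill slow
# jobs itself (whether that also covers pre-stop/post-stop is the open
# question raised in the next paragraph).
wait_for_stopping_jobs() {
    while [ -n "$(initctl list | stopping_pids)" ]; do
        sleep 1
    done
}
```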

 I'm attaching a debdiff that solves the race as far as I can tell,
 though I think it needs a good long look, since it could mean shutdowns
 hang for a long time waiting (I'm especially curious whether the
 pre-stop/post-stop scripts are subject to the kill timeout).

This code is still racy, AFAICS. What about upstart jobs that are
not stopped by "stop on runlevel [016]"? They could receive their stop
signal after your loop has already run.

If you don't want to change existing jobs, we probably have to pick up
Ante's suggestion, and do the following in sendsigs:

1) Run a loop that waits for *all* running upstart jobs to stop.
Upstart jobs that need to keep running past sendsigs (e.g. plymouth)
need to signal that using a mechanism similar to the sendsigs.omit.d
interface used for killall5. I'd give upstart jobs at least 60 seconds
to stop, so that big databases etc. have enough time to shut down cleanly.
2) Run a loop that sends SIGTERM to all remaining processes, but does
*not* add upstart pids to $OMITPIDS.
3) Send a final SIGKILL if any processes are left.


Regarding 1), it would be nice to have a native C implementation in
upstart instead of running initctl, grep and sleep manually.
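The three steps could look roughly like this in sendsigs. This is a sketch under stated assumptions. KEEP_JOBS is a hypothetical stand-in for the proposed opt-out interface. TERM_GRACE is an arbitrary pause between the two signals. The loop uses initctl and awk rather than a native upstart implementation.

```shell
#!/bin/sh
# Sketch of steps 1-3 for sendsigs. Assumptions (none exist as-is):
#  - KEEP_JOBS is a hypothetical stand-in for the proposed "keep running
#    past sendsigs" interface; plymouth is the example from the text.
#  - UPSTART_TIMEOUT is the suggested 60 s; TERM_GRACE is an arbitrary
#    pause between SIGTERM and SIGKILL.
#  - OMITPIDS ("-o PID ...") is built elsewhere in sendsigs as before,
#    but without the upstart pids.
KEEP_JOBS="plymouth"
UPSTART_TIMEOUT=60
TERM_GRACE=5

# Read `initctl list` output on stdin and print pids of upstart jobs.
# With argument "kept", print only jobs named in KEEP_JOBS; otherwise
# print the pids of all other jobs.
upstart_job_pids() {
    awk -v keep="$KEEP_JOBS" -v mode="${1:-others}" '
        BEGIN {
            n = split(keep, k, " ")
            for (i = 1; i <= n; i++)
                skip[k[i]] = 1
        }
        # A non-indented line starts a new job; the first field is its name.
        /^[^ \t]/ { kept = ($1 in skip) }
        ((mode == "kept") == kept) && match($0, /process [0-9]+/) {
            print substr($0, RSTART + 8, RLENGTH - 8)
        }
    '
}

shutdown_kill_sequence() {
    # 1) Wait up to UPSTART_TIMEOUT seconds for all other upstart jobs.
    waited=0
    while [ -n "$(initctl list | upstart_job_pids)" ] &&
          [ "$waited" -lt "$UPSTART_TIMEOUT" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Protect only the jobs that asked to survive sendsigs.
    for pid in $(initctl list | upstart_job_pids kept); do
        OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
    done

    # 2) SIGTERM everything else. OMITPIDS is unquoted on purpose so that
    #    each "-o PID" pair becomes separate arguments.
    killall5 -15 $OMITPIDS
    sleep "$TERM_GRACE"

    # 3) Final SIGKILL for whatever is still left.
    killall5 -9 $OMITPIDS
}
```

Because step 1 waits for every job, not only those already stopping, a job whose stop arrives late is still covered. The cost is waiting the full timeout if some job never stops on its own.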




Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-13 Thread Michael Biebl
2010/12/14 Clint Byrum cl...@fewbar.com:

 I do think the appropriate fix is to have umountfs emit an
 'unmounting-filesystems' event and anything that does a 'start on
 local-filesystems' or 'start on filesystem' should also 'stop on
 unmounting-filesystems',
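For illustration, the umountfs side of this proposal might look like the sketch below. The function name announce_unmounting is hypothetical, and the event does not exist in stock upstart. The sketch also assumes that initctl emit, when run without --no-wait, blocks until the jobs stopped by the event have finished stopping. That assumption should be checked.

```shell
#!/bin/sh
# Sketch of the umountfs side of the proposal. The event name
# "unmounting-filesystems" comes from the proposal; stock upstart does
# not emit it. A job opting in would add to its /etc/init/<job>.conf:
#     stop on unmounting-filesystems
# Assumption: without --no-wait, initctl emit does not return until the
# jobs that stop on the event have finished stopping.
announce_unmounting() {
    # Only try this when upstart's initctl is available.
    if command -v initctl >/dev/null 2>&1; then
        initctl emit unmounting-filesystems
    fi
}

# Called from umountfs before any filesystem is unmounted:
#   announce_unmounting
```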

What about services that have
start on runlevel [2345] and whose binary is in /usr?

There are quite a few examples here, e.g. acpid, atd, cron and irqbalance,
which all have:

start on runlevel [2345]
stop on runlevel [!2345]

Either those jobs are buggy in not specifying the start on
(local-)filesystems dependency, or your criterion is not sufficient.

IMHO the major problem here is that there is a mix-up between the
dependencies that need to be satisfied for a job to be able to run and
when (in which runlevels) to start it.

Michael




[Bug 688541] [NEW] race condition on shutdown (leads to corrupted fs)

2010-12-10 Thread Michael Biebl
Public bug reported:

I'm using mysql-server-5.1 on a 10.04 LTS installation.
The mysql db is around 27GB and on a separate partition mounted as 
/var/lib/mysql.

On shutdown I get the following error message:

Checking for running unattended-upgrades:
 * Asking all remaining processes to terminate...       [ OK ]
 * All processes ended within 1 seconds                 [ OK ]
 * Deconfiguring network interfaces...                  [ OK ]
 * Deactivating swap...                                 [ OK ]
 * Unmounting local filesystems...
umount2: Device or resource busy
umount: /var/lib/mysql: device is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
umount2: Device or resource busy
umount: /tmp: device is busy.
(In some cases useful info about processes that use
 the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
                                                        [fail]
mount: / is busy
 * Will now restart
[ 3369.429751] Restarting system.
mount: / is busy
 * Will now restart
[ 3369.429751] Restarting system.


On the next reboot the file system is corrupt and needs to be fsck-ed.

I think the problem is that mysql uses an upstart job (/etc/init/mysql.conf)
which has
stop on runlevel [016]

The rc.conf job is also triggered on runlevels 0 and 6, so they basically
run at the same time.

When /etc/rc0.d/S20sendsigs is run, it deliberately does not wait for or kill
any upstart jobs.

As my mysqld process takes some time to shutdown, S40umountfs and
S60umountroot are run before the mysqld has quit.

This leads to the fs not being properly unmounted. It is even possible
that mysqld is forcefully killed by halt in S90halt if it hasn't stopped
by then.

This is a serious issue, as it can (and will) lead to data loss.

Other upstart jobs, like rsyslog.conf, use the same stop on runlevel
[016] stanza, so they are probably affected too.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: mysql-server-5.1 5.1.49-1ubuntu8.1
Uname: Linux 2.6.32-5-686 i686
NonfreeKernelModules: michael_mic arc4 ecb lib80211_crypt_tkip aes_i586 
aes_generic lib80211_crypt_ccmp sco bnep rfcomm l2cap binfmt_misc acpi_cpufreq 
ppdev lp cpufreq_userspace cpufreq_stats vboxnetadp cpufreq_powersave 
vboxnetflt cpufreq_conservative vboxdrv fuse pcmcia snd_intel8x0m snd_intel8x0 
snd_ac97_codec btusb bluetooth rfkill ac97_bus yenta_socket ipw2200 snd_pcm 
8139too firewire_ohci snd_seq 8139cp firewire_core sg uhci_hcd snd_timer 
rsrc_nonstatic libipw snd_seq_device pcmcia_core crc_itu_t parport_pc 
smsc_ircc2 ehci_hcd mii joydev lib80211 sr_mod parport i2c_i801 irda snd 
usbcore wbsd soundcore shpchp mmc_core pcspkr rng_core cdrom psmouse container 
crc_ccitt snd_page_alloc serio_raw pci_hotplug ac battery nls_base processor 
evdev ppp_generic slhc loop autofs4 ext4 mbcache jbd2 crc16 dm_mod sd_mod 
crc_t10dif radeon ttm ata_generic drm_kms_helper ata_piix drm i2c_algo_bit 
libata video thermal i2c_core scsi_mod output thermal_sys button
Architecture: i386
Date: Fri Dec 10 13:41:52 2010
ProcEnviron:
 PATH=(custom, no user)
 LANG=de_DE.utf8
 SHELL=/bin/bash
SourcePackage: mysql-5.1

** Affects: mysql-5.1 (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: apport-bug i386 maverick



[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-10 Thread Michael Biebl




Re: [Bug 688541] Re: race condition on shutdown (leads to corrupted fs)

2010-12-10 Thread Michael Biebl
2010/12/10 Ante Karamatić iv...@grad.hr:
 Suggestion: make umountfs wait for all upstart jobs to finish.

Doesn't that conflict, though, with what is written in
/etc/init.d/sendsigs:

        # Upstart jobs have their own stop on clauses that sends
        # SIGTERM/SIGKILL just like this, so if they're still running,
        # they're supposed to be
        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
        done

or

        # did an upstart job start since we last polled initctl? check
        # again on each loop and add any new jobs (e.g., plymouth) to
        # the list.  If we did miss one starting up, this beats waiting
        # 10 seconds before shutting down.
        for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process //p"); do
                OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid"
        done

