Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-03-22 Thread Khalid Aziz
On 03/16/2016 12:18 PM, Jan Stattegger-Sievers wrote:
> Hi,
> 
> For the second time we ran into this problem and again lost data on
> several servers. The first time, we assumed that a firmware update we
> had run before the reboot was the cause.
> 
> This time there was no firmware update, and, now knowing this is not an
> isolated issue, we looked further into it. It turns out that on every
> kexec reboot you will lose data if your servers are doing a significant
> amount of work shortly before the reboot, since no umount is done
> before the new kernel kicks in.
> 
> This definitely needs a fix!
> 
> Since the proposed fix changes the behavior of reboot and shutdown -r, I
> am cc'ing the systemd-sysv maintainers.
> 
> Regards,
> Jan
> 

I have managed to reproduce this problem on a test system. I am working
on a fix.

--
Khalid



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-03-20 Thread Jan Stattegger-Sievers
Hi,

For the second time we ran into this problem and again lost data on
several servers. The first time, we assumed that a firmware update we
had run before the reboot was the cause.

This time there was no firmware update, and, now knowing this is not an
isolated issue, we looked further into it. It turns out that on every
kexec reboot you will lose data if your servers are doing a significant
amount of work shortly before the reboot, since no umount is done
before the new kernel kicks in.

This definitely needs a fix!

Since the proposed fix changes the behavior of reboot and shutdown -r, I
am cc'ing the systemd-sysv maintainers.

Regards,
Jan



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-03-04 Thread Daniel Baumann

On 2016-03-03 21:48, Khalid Aziz wrote:
> Is this issue happening for you on jessie or sid? Your original bug
> report said Debian 8.0, so I have been focusing on jessie.

It happens on both, but I'm primarily testing sid.

Regards,
Daniel



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-03-03 Thread Khalid Aziz
On 03/03/2016 12:10 PM, Daniel Baumann wrote:
> On 2016-03-03 01:33, Khalid Aziz and Shuah Khan wrote:
>> I have not been able to reproduce this bug and that has been the
>> limiting factor in being able to fix it.
> 
> I can reliably reproduce it on an unmodified, standard/default sid minimal
> install with / on RAID1. I'll check tomorrow whether I can also reproduce it
> without mdadm (I used to have the problem without mdadm too, but last
> checked a few weeks ago, so I'm rechecking to confirm).
> 

Hi Daniel,

Is this issue happening for you on jessie or sid? Your original bug
report said Debian 8.0, so I have been focusing on jessie.

--
Khalid



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-03-03 Thread Daniel Baumann

On 2016-03-03 01:33, Khalid Aziz and Shuah Khan wrote:
> I have not been able to reproduce this bug and that has been the
> limiting factor in being able to fix it.

I can reliably reproduce it on an unmodified, standard/default sid minimal
install with / on RAID1. I'll check tomorrow whether I can also reproduce it
without mdadm (I used to have the problem without mdadm too, but last
checked a few weeks ago, so I'm rechecking to confirm).


Regards,
Daniel



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-03-02 Thread Khalid Aziz and Shuah Khan
On 02/26/2016 01:13 PM, Daniel Baumann wrote:
> Hi,
> 
> Thanks for maintaining kexec-tools. However, kexec-tools reliably and
> reproducibly trashes my root filesystem *on* *every* *reboot*.
> 
> Can this finally be fixed, please?
> 
> Regards,
> Daniel
> 

Hi Daniel,

I have not been able to reproduce this bug and that has been the
limiting factor in being able to fix it. I have tried this on 3
different machines and kexec reboot always unmounts the root filesystem
correctly before reboot. I am doing a fresh install on another machine
and will spend some time trying to reproduce it. I appreciate your
patience with me.

Thanks,
Khalid



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2016-02-26 Thread Daniel Baumann
Hi,

Thanks for maintaining kexec-tools. However, kexec-tools reliably and
reproducibly trashes my root filesystem *on* *every* *reboot*.

Can this finally be fixed, please?

Regards,
Daniel



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2015-06-25 Thread Thomas Stangner
Hi,

some additional information/thoughts:

systemd's built-in kexec.target is indeed the way its developers intend
kexec reboots to be done. See man systemctl:

kexec
 Shut down and reboot the system via kexec. This is mostly equivalent to
start kexec.target --irreversible, but also prints a wall message to all
users. If combined with --force, shutdown of all running services is
skipped, however all processes are killed and all file systems are
unmounted or mounted read-only, immediately followed by the reboot.


Thus I suggest using the intended way to do kexec reboots on systemd
enabled installations.

This also means breaking the old SysV behavior: when the current target
is kexec, IMHO you should _always_ load a new kernel image, regardless
of the LOAD_KEXEC value in /etc/default/kexec.

As a first step, the SysV LSB loader script could be modified (with a
slightly modified patch from my first report):

--- debian/kexec-load.init.d.org 2014-11-24 05:02:10.0 +0100
+++ debian/kexec-load.init.d 2015-06-25 17:16:34.152465942 +0200
@@ -102,12 +102,14 @@
     ;;
   stop)
     # If running systemd, we want kexec reboot only if current
-    # command is reboot
+    # target is kexec (determined via systemd-kexec.service)
     if [ -d /run/systemd/system ]; then
-        systemctl list-jobs systemd-reboot.service | grep -q systemd-reboot.service
+        systemctl list-jobs systemd-kexec.service | grep -q systemd-kexec.service
         if [ $? -ne 0 ]; then
             exit 0
         fi
+        # Override LOAD_KEXEC option, because kexec target always implies kexec reboot
+        LOAD_KEXEC=true
     fi
     do_stop
     ;;
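The stop-branch decision this patch proposes can be sketched as a small POSIX sh function. This is a hedged sketch, not the packaged script: the function name `kexec_load_stop_action` and its three result strings are hypothetical, the systemd check takes the marker directory as an argument so it can be exercised without systemd, and the job list is read from stdin (on a live system it would come from `systemctl list-jobs`).

```shell
# Hypothetical sketch of the proposed kexec-load "stop" decision.
# $1:    directory whose existence means systemd is running
#        (the init scripts use /run/systemd/system)
# stdin: output of `systemctl list-jobs` (only read under systemd)
# Prints one of: load-forced, skip, load-if-enabled
kexec_load_stop_action() {
    if [ -d "$1" ]; then
        # Under systemd, a queued systemd-kexec.service job means the
        # user asked for a kexec reboot ('systemctl kexec').
        if grep -q 'systemd-kexec\.service'; then
            # The kexec target always implies a kexec reboot, so the
            # patch overrides LOAD_KEXEC before loading the image.
            echo load-forced
        else
            # Ordinary reboot or poweroff: leave the kernel unloaded.
            echo skip
        fi
    else
        # SysV init: old behaviour, do_stop honours LOAD_KEXEC.
        echo load-if-enabled
    fi
}
```

On a running system this would be called as `systemctl list-jobs | kexec_load_stop_action /run/systemd/system`; only `load-forced` and `load-if-enabled` would go on to do_stop, and only `load-forced` sets LOAD_KEXEC=true first.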


The SysV LSB kexec script also has to be modified. Under systemd, kexec is
invoked by systemd itself (once everything has been properly terminated and
unmounted or remounted read-only), so the call should be skipped in the LSB
script when running systemd. If it is not skipped, you will end up with race
conditions and possible data corruption (as explained earlier).

--- debian/kexec.init.d.org 2014-07-25 19:03:25.0 +0200
+++ debian/kexec.init.d 2015-06-25 17:21:02.405402935 +0200
@@ -36,7 +36,10 @@
     exit 3
     ;;
   stop)
-    do_stop
+    # systemd has its own kexec service (which calls the kexec binary), so skip this when running with systemd
+    if [ ! -d /run/systemd/system ]; then
+        do_stop
+    fi
     ;;
   *)
     echo "Usage: $0 start|stop" >&2



Both modifications should make kexec work reliably on systemd-enabled
installations. kexec reboots will then have to be triggered with
'systemctl kexec' (or 'systemctl start kexec.target'); other reboot
methods will result in cold reboots, and the value of LOAD_KEXEC in
/etc/default/kexec is ignored.
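Under this proposal, the command used to request a kexec reboot depends on the init system. A minimal dry-run sketch, assuming a hypothetical helper `kexec_reboot_cmd` that only prints the command rather than running it:

```shell
# Hypothetical helper: print (not run) the command that requests a kexec
# reboot under the proposed behaviour.
# $1: systemd marker directory (normally /run/systemd/system)
kexec_reboot_cmd() {
    if [ -d "$1" ]; then
        # systemd: with the patches, only the kexec target loads the image,
        # and systemd performs the kexec after stopping services and
        # unmounting or remounting filesystems read-only.
        echo 'systemctl kexec'
    else
        # SysV: a plain reboot still goes through /etc/init.d/kexec when
        # LOAD_KEXEC is enabled in /etc/default/kexec.
        echo 'reboot'
    fi
}
```

On a live system the printed command could be executed with `sh -c "$(kexec_reboot_cmd /run/systemd/system)"`.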

The old behavior will not change when running SysV as init system.


As a second step, I would suggest creating a proper systemd unit file to
do the kernel image loading, instead of relying on systemd-sysv to
autogenerate it from the SysV LSB init script.


Cheers,
Thomas


-- 
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2015-06-19 Thread Thomas Stangner
Hi Khalid,

first of all, thanks for looking into the issue.

I am by no means a systemd expert, so if I get anything wrong, please
correct me :) I do think, however, that I know enough to be certain that
there is a problem with the package when using systemd as the init system.


> Have you seen any evidence of corruption on a filesystem?

Yes, I can reproduce unclean (un-)mounts on my testing machine (Xeon
E3-1245, SSD) when using the original kexec-tools package on every
reboot with kexec enabled:

[...]
[Sat Jun 20 00:05:32 2015] Adding 4194300k swap on /dev/mapper/vg-swap.  Priority:-1 extents:1 across:4194300k FS
[Sat Jun 20 00:05:32 2015] EXT4-fs (dm-4): 1 orphan inode deleted
[Sat Jun 20 00:05:32 2015] EXT4-fs (dm-4): recovery complete
[Sat Jun 20 00:05:32 2015] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[Sat Jun 20 00:05:32 2015] EXT4-fs (dm-3): 5 orphan inodes deleted
[Sat Jun 20 00:05:32 2015] EXT4-fs (dm-3): recovery complete
[...]

Whereas I cannot reproduce this when using my patched package:

[...]
[Sat Jun 20 00:07:39 2015] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: usrjquota=aquota.user,jqfmt=vfsv1
[Sat Jun 20 00:07:39 2015] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[Sat Jun 20 00:07:39 2015] Adding 4194300k swap on /dev/mapper/vg-swap.  Priority:-1 extents:1 across:4194300k FS
[Sat Jun 20 00:07:39 2015] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[...]

Reboot times are roughly the same, but only with the current official
Jessie package do I get unclean mounts, and I get them every time. This
suggests that the filesystems have not been properly unmounted before
kexec switched to the new kernel image.
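The symptom shown in the logs above (orphan inodes deleted and journal recovery on the next mount) can be checked for mechanically. A minimal sketch, assuming the ext4 messages have exactly the form quoted above; the function name `unclean_ext4_mounts` is hypothetical, and other filesystems report recovery differently:

```shell
# Hypothetical helper: read kernel log text on stdin (e.g. from `dmesg`)
# and print each ext4 device that needed orphan cleanup or journal
# recovery, i.e. one that was most likely not cleanly unmounted.
# Output is sorted and de-duplicated.
unclean_ext4_mounts() {
    # Match lines such as "EXT4-fs (dm-4): recovery complete" or
    # "EXT4-fs (dm-3): 5 orphan inodes deleted", then keep the device name.
    grep -E 'EXT4-fs \([^)]*\): ([0-9]+ orphan inodes? deleted|recovery complete)' |
        sed -E 's/.*EXT4-fs \(([^)]*)\).*/\1/' |
        sort -u
}
```

Run as `dmesg | unclean_ext4_mounts` right after a kexec reboot; for the first log excerpt above it would report dm-3 and dm-4, and for the second one nothing.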


> /etc/init.d/kexec script header says:
>
> # X-Stop-After: umountroot
>
> which means stop target for /etc/init.d/kexec will not be called until
> root fs has been unmounted, at which point all other filesystems must
> have already been unmounted.

When using SysV init, this is indeed true. However, systemd seems to
mask the umountroot/umountfs LSB init scripts, which means that they are
not considered for dependencies at all. Thus the dependency in the kexec
LSB init script is rendered useless:


lrwxrwxrwx 1 root root 9 May 26 08:07 /lib/systemd/system/umountroot.service -> /dev/null
lrwxrwxrwx 1 root root 9 May 26 08:07 /lib/systemd/system/umountfs.service -> /dev/null
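A unit is masked when its unit file is a symbolic link to /dev/null, as in the listing above. A minimal sketch of that check; the function name `unit_is_masked` is hypothetical:

```shell
# Hypothetical helper: succeed if the unit file at $1 is masked, i.e. is
# a symbolic link pointing at /dev/null (systemd's masking convention).
unit_is_masked() {
    [ -L "$1" ] && [ "$(readlink "$1")" = /dev/null ]
}
```

For example, `for u in umountroot umountfs; do unit_is_masked "/lib/systemd/system/$u.service" && echo "$u.service is masked"; done` would report both units on the system shown above.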


This can also be seen in the unit files generated by
systemd-sysv-generator:

/run/systemd/generator.late/kexec-load.service:

# Automatically generated by systemd-sysv-generator

[Unit]
SourcePath=/etc/init.d/kexec-load
Description=LSB: Load kernel image with kexec
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
After=local-fs.target remote-fs.target kexec.service
Conflicts=shutdown.target

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SysVStartPriority=2
ExecStart=/etc/init.d/kexec-load start
ExecStop=/etc/init.d/kexec-load stop



/run/systemd/generator.late/kexec.service:

# Automatically generated by systemd-sysv-generator

[Unit]
SourcePath=/etc/init.d/kexec
Description=LSB: Execute the kexec -e command to reboot system
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
Conflicts=shutdown.target

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SysVStartPriority=1
ExecStart=/etc/init.d/kexec start
ExecStop=/etc/init.d/kexec stop


So basically kexec.service depends on kexec-load.service being stopped
before itself, which is okay, because the kernel image needs to be
loaded before doing the actual kexec reboot.
However, there are no other dependencies whatsoever on the filesystems or
other services, which means that kexec.service will race with other
services being shut down at the same time.
If you are lucky, the other services have been stopped before kexec gets
executed; otherwise you are not (and the observed inconsistencies may
occur) :(
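The missing ordering can be checked on the generated unit files shown above: kexec.service has no After= line at all. A minimal sketch that lists the explicit After= dependencies of a single unit file, assuming the plain `Key=value` layout of those generated files; it ignores drop-ins and implicit dependencies, so on a live system `systemctl show -p After kexec.service` is the authoritative view. The function name `unit_after_deps` is hypothetical:

```shell
# Hypothetical helper: print the units named on After= lines of the unit
# file given as $1, one per line. Only explicit After= lines in that one
# file are considered.
unit_after_deps() {
    sed -n 's/^After=//p' "$1" | tr ' ' '\n' | sed '/^$/d'
}
```

Applied to /run/systemd/generator.late/kexec-load.service it would print local-fs.target, remote-fs.target and kexec.service; applied to kexec.service it prints nothing.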


> Rebooting the system with reboot calls the target reboot.

That is correct.

> If one invokes the target kexec directly, their intent is to kexec immediately
> (same as executing kexec -e which does not do an orderly shutdown).
> Orderly shutdown is meant to happen when reboot is called, not when
> kernel is kexec'd explicitly by the user. I do not agree with this change.

This, however, is not. I think you are confusing kexec.service (the
autogenerated unit for your LSB kexec script) with systemd's built-in
kexec.target (which is used when issuing 'systemctl kexec'); the
corresponding unit file can be found here:


/lib/systemd/system/kexec.target:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free 

Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2015-06-19 Thread Khalid Aziz
I have a freshly installed jessie system (which is running systemd, not 
SysV init) and I am having trouble reproducing most of these issues. Let 
me address these one at a time below.


On 05/19/2015 07:15 AM, Thomas Stangner wrote:
> Package: kexec-tools
> Version: 1:2.0.7-5
> Severity: grave
> Tags: patch
>
> Rebooting via kexec is currently broken when using the default init system of
> Jessie (systemd) and may result in filesystem corruption or other unwanted
> effects of improperly shut down services. This bug does not apply when using the
> SysV init system.
>
> Reboots (e.g. via 'reboot', 'systemctl kexec' or 'systemctl reboot') will cause
> the LSB SysV scripts of kexec-tools (/etc/init.d/kexec-load and
> /etc/init.d/kexec) to be stopped BEFORE reaching the systemd kexec target (and
> other systemd targets), which means that other LSB init scripts and/or systemd
> services will be in a race condition with the LSB kexec script and the
> filesystems may not be properly unmounted before /etc/init.d/kexec calls
> /sbin/kexec -e.
> As an effect of this, after rebooting via kexec, one may often observe orphaned
> inodes or other filesystem inconsistencies, especially on a busy machine with
> heavy IO.



/etc/init.d/kexec script header says:

# X-Stop-After: umountroot

which means stop target for /etc/init.d/kexec will not be called until 
root fs has been unmounted, at which point all other filesystems must 
have already been unmounted.


Have you seen any evidence of corruption on a filesystem?


> There also seems to be a logical error in the /etc/init.d/kexec-load LSB script,
> because the kexec image will only get loaded when the systemd target is reboot;
> when the target is kexec, the image won't be loaded and thus a normal reboot
> will occur.


Rebooting the system with reboot calls the target reboot. If one 
invokes the target kexec directly, their intent is to kexec immediately 
(same as executing kexec -e which does not do an orderly shutdown). 
Orderly shutdown is meant to happen when reboot is called, not when 
kernel is kexec'd explicitly by the user. I do not agree with this change.




> The following patch for the source package should fix this and make rebooting
> via 'systemctl kexec' possible (when /etc/default/kexec has LOAD_KEXEC
> enabled); 'reboot' or 'systemctl reboot' will result in normal reboots.
>
>
> --- debian/kexec.init.d.org 2014-07-25 19:03:25.0 +0200
> +++ debian/kexec.init.d 2015-05-19 14:26:40.680256999 +0200
> @@ -36,7 +36,10 @@
>      exit 3
>      ;;
>    stop)
> -    do_stop
> +    # Only execute when running with SysV; systemd has its own kexec target
> +    if [ ! -d /run/systemd/system ] ; then
> +        do_stop
> +    fi


This causes systems running systemd to do a normal full reboot, not a 
kexec reboot. I applied this change to a freshly installed jessie system 
running systemd and confirmed lack of kexec reboot with this change.



>      ;;
>    *)
>      echo "Usage: $0 start|stop" >&2
> --- debian/kexec-load.init.d.org 2014-11-24 05:02:10.0 +0100
> +++ debian/kexec-load.init.d 2015-05-19 14:27:43.431537728 +0200
> @@ -102,9 +102,9 @@
>      ;;
>    stop)
>      # If running systemd, we want kexec reboot only if current
> -    # command is reboot
> +    # target is kexec
>      if [ -d /run/systemd/system ]; then
> -        systemctl list-jobs systemd-reboot.service | grep -q systemd-reboot.service
> +        systemctl list-jobs systemd-kexec.service | grep -q systemd-kexec.service
>          if [ $? -ne 0 ]; then
>              exit 0
>          fi


This change also bypasses kexec reboot and does a normal reboot. Again 
confirmed on a freshly installed jessie system.
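Whether a given patch ends in a kexec or a normal reboot comes down to whether a kernel image is actually loaded when the system goes down. On Linux, /sys/kernel/kexec_loaded reads 1 while an image is loaded and 0 otherwise. A minimal sketch of that check; the function name `kexec_image_loaded` is hypothetical, and the path is a parameter only so the check can be exercised against a test file:

```shell
# Hypothetical helper: succeed if a kexec kernel image is currently loaded.
# $1 (optional): file to read, defaulting to /sys/kernel/kexec_loaded,
# which contains "1" while an image is loaded and "0" otherwise.
kexec_image_loaded() {
    _kexec_loaded_file=${1:-/sys/kernel/kexec_loaded}
    [ -r "$_kexec_loaded_file" ] && [ "$(cat "$_kexec_loaded_file")" = 1 ]
}
```

Running `kexec_image_loaded && echo loaded` after the load step has been triggered shows whether an image is in place; if it still reads 0 when the system goes down, a normal reboot is to be expected.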


--
Khalid




> -- System Information:
> Debian Release: 8.0
>    APT prefers stable-updates
>    APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
> Foreign Architectures: i386
>
> Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: systemd (via /run/systemd/system)
>
> Versions of packages kexec-tools depends on:
> ii  debconf [debconf-2.0]  1.5.56
> ii  libc6  2.19-18
>
> kexec-tools recommends no packages.
>
> kexec-tools suggests no packages.
>
> -- debconf information:
> * kexec-tools/load_kexec: true
>    kexec-tools/use_grub_config: false







Bug#785714: kexec-tools is broken when using systemd, danger of filesystem corruption

2015-05-19 Thread Thomas Stangner
Package: kexec-tools
Version: 1:2.0.7-5
Severity: grave
Tags: patch

Rebooting via kexec is currently broken when using the default init system of 
Jessie (systemd) and may result in filesystem corruption or other unwanted 
effects of improperly shut down services. This bug does not apply when using the 
SysV init system.

Reboots (e.g. via 'reboot', 'systemctl kexec' or 'systemctl reboot') will cause 
the LSB SysV scripts of kexec-tools (/etc/init.d/kexec-load and 
/etc/init.d/kexec) to be stopped BEFORE reaching the systemd kexec target (and 
other systemd targets), which means that other LSB init scripts and/or systemd 
services will be in a race condition with the LSB kexec script and the 
filesystems may not be properly unmounted before /etc/init.d/kexec calls 
/sbin/kexec -e.
As an effect of this, after rebooting via kexec, one may often observe orphaned 
inodes or other filesystem inconsistencies, especially on a busy machine with 
heavy IO.

There also seems to be a logical error in the /etc/init.d/kexec-load LSB script, 
because the kexec image will only get loaded when the systemd target is reboot; 
when the target is kexec, the image won't be loaded and thus a normal reboot 
will occur.

The following patch for the source package should fix this and make rebooting 
via 'systemctl kexec' possible (when /etc/default/kexec has LOAD_KEXEC 
enabled); 'reboot' or 'systemctl reboot' will result in normal reboots.


--- debian/kexec.init.d.org 2014-07-25 19:03:25.0 +0200
+++ debian/kexec.init.d 2015-05-19 14:26:40.680256999 +0200
@@ -36,7 +36,10 @@
     exit 3
     ;;
   stop)
-    do_stop
+    # Only execute when running with SysV; systemd has its own kexec target
+    if [ ! -d /run/systemd/system ] ; then
+        do_stop
+    fi
     ;;
   *)
     echo "Usage: $0 start|stop" >&2
--- debian/kexec-load.init.d.org 2014-11-24 05:02:10.0 +0100
+++ debian/kexec-load.init.d 2015-05-19 14:27:43.431537728 +0200
@@ -102,9 +102,9 @@
     ;;
   stop)
     # If running systemd, we want kexec reboot only if current
-    # command is reboot
+    # target is kexec
     if [ -d /run/systemd/system ]; then
-        systemctl list-jobs systemd-reboot.service | grep -q systemd-reboot.service
+        systemctl list-jobs systemd-kexec.service | grep -q systemd-kexec.service
         if [ $? -ne 0 ]; then
             exit 0
         fi


-- System Information:
Debian Release: 8.0
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages kexec-tools depends on:
ii  debconf [debconf-2.0]  1.5.56
ii  libc6  2.19-18

kexec-tools recommends no packages.

kexec-tools suggests no packages.

-- debconf information:
* kexec-tools/load_kexec: true
  kexec-tools/use_grub_config: false

