The shutdown now also hangs during a resync, so the behavior is
inconsistent. I have updated the subject and the description to reflect
this.
** Summary changed:
- Reboot hangs once RAID1 resynced
+ Shutdown hangs in md kworker after "Reached target Shutdown."
** Description changed:
I'm booting a fully patched 16.04 from an Intel Rapid Storage Technology
enterprise RAID1 volume (ThinkServer TS140 with two SATA
ST1000NM0033-9ZM drives, ext4 root partition, no LVM, UEFI mode).
- If the RAID volume is recovering or resyncing for whatever reason, then
- `sudo systemctl reboot` and `sudo systemctl poweroff` work fine (I had
- to `sudo systemctl --now disable lvm2-lvmetad lvm2-lvmpolld
- lvm2-monitor` in order to consistently get that). However, once the
- recovery/resync is complete and clean, the reboot and poweroff commands
- above hang forever after "Reached target Shutdown.". Note that issuing
- `sudo swapoff -a` beforehand (suggested in the bug #1464917) does not
- help.
+ If the RAID volume is recovering or resyncing for whatever reason, then `sudo
systemctl reboot` and `sudo systemctl poweroff` work fine (I had to `sudo
systemctl --now disable lvm2-lvmetad lvm2-lvmpolld lvm2-monitor` in order to
consistently get that). However, once the recovery/resync is complete and
clean, the reboot and poweroff commands above hang forever after "Reached
target Shutdown.". Note that issuing `sudo swapoff -a` beforehand (suggested in
the bug #1464917) does not help.
+ [EDIT]Actually, the shutdown also hangs from time to time during a resync.
But I've never seen it succeed once the resync is complete.[/EDIT]
Then, if the server has been forcibly restarted with the power button,
the Intel Matrix Storage Manager indicates a "Normal" status for the
RAID1 volume, but Ubuntu then resyncs the volume anyway:
[1.223649] md: bind
[1.228426] md: bind
[1.230030] md: bind
[1.230738] md: bind
[1.232985] usbcore: registered new interface driver usbhid
[1.233494] usbhid: USB HID core driver
[1.234022] md: raid1 personality registered for level 1
[1.234876] md/raid1:md126: not clean -- starting background reconstruction
[1.234956] input: CHESEN USB Keyboard as /devices/pci:00/:00:14.0/usb3/3-10/3-10:1.0/0003:0A81:0101.0001/input/input5
[1.236273] md/raid1:md126: active with 2 out of 2 mirrors
[1.236797] md126: detected capacity change from 0 to 1000202043392
[1.246271] md: md126 switched to read-write mode.
[1.246834] md: resync of RAID array md126
[1.247325] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[1.247503] md126: p1 p2 p3 p4
[1.248269] md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for resync.
[1.248774] md: using 128k window, over a total of 976759940k.
Note that the pain of "resync upon every (re)boot" cannot even be
mitigated with bitmaps, because mdadm does not support them for IMSM
containers:
$ sudo mdadm --grow --bitmap=internal /dev/md126
mdadm: Cannot add bitmaps to sub-arrays yet
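Whether an array currently has a bitmap can also be checked without mdadm, by
looking for a `bitmap:` line in its `/proc/mdstat` stanza. This is only a
minimal sketch: the function name `md_has_bitmap` is made up for illustration,
and it assumes the usual mdstat layout (a `mdN : ...` header line, indented
detail lines, then a blank line).

```shell
# Hypothetical helper: succeed (exit 0) if the named array has a bitmap
# line in its /proc/mdstat stanza, fail (exit 1) otherwise.
# Usage: md_has_bitmap md126 [path-to-mdstat]
md_has_bitmap() {
    array="$1"
    mdstat="${2:-/proc/mdstat}"
    awk -v dev="$array" '
        # Header line of the stanza we care about, e.g. "md126 : active raid1 ..."
        $1 == dev && $2 == ":"      { in_array = 1; next }
        # A blank line ends the stanza.
        in_array && NF == 0         { in_array = 0 }
        # Bitmap detail line inside the stanza.
        in_array && $1 == "bitmap:" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$mdstat"
}
```

For example, `md_has_bitmap md126 || echo "md126 has no bitmap"`.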
I also get this in syslog during boot when the individual drives are
detected, but this seems to be harmless:
May 30 17:26:07 wssrv1 systemd-udevd[608]: Process '/sbin/mdadm --incremental /dev/sdb --offroot' failed with exit code 1.
May 30 17:26:07 wssrv1 systemd-udevd[608]: Process '/lib/udev/hdparm' failed with exit code 1.
May 30 17:26:07 wssrv1 systemd-udevd[606]: Process '/sbin/mdadm --incremental /dev/sda --offroot' failed with exit code 1.
May 30 17:26:07 wssrv1 systemd-udevd[606]: Process '/lib/udev/hdparm' failed with exit code 1.
During a resync, `sudo sh -c 'echo idle >
/sys/block/md126/md/sync_action'` stops it as expected, but it restarts
immediately even though nothing seems to have triggered it:
May 30 18:17:02 wssrv1 kernel: [ 3106.826710] md: md126: resync interrupted.
May 30 18:17:02 wssrv1 kernel: [ 3106.836320] md: checkpointing resync of md126.
May 30 18:17:02 wssrv1 kernel: [ 3106.836623] md: resync of RAID array md126
May 30 18:17:02 wssrv1 kernel: [ 3106.836625] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
May 30 18:17:02 wssrv1 kernel: [ 3106.836626] md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for resync.
May 30 18:17:02 wssrv1 kernel: [ 3106.836627] md: using 128k window, over a total of 976759940k.
May 30 18:17:02 wssrv1 kernel: [ 3106.836628] md: resuming resync of md126 from checkpoint.
May 30 18:17:02 wssrv1 mdadm[982]: RebuildStarted event detected on md device /dev/md/Volume0
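To see from userspace that the resync really comes back after writing `idle`,
the state can be read from sysfs before and after. This is a minimal sketch,
assuming the array's md directory exposes the standard `sync_action` and
`sync_completed` files; the function name `md_sync_state` is made up for
illustration.

```shell
# Hypothetical helper: print the current sync action and progress of an
# md array on one line, e.g. "resync 1024 / 2048" or "idle none".
# Usage: md_sync_state [sysfs-md-dir]   (defaults to md126 from this report)
md_sync_state() {
    md_dir="${1:-/sys/block/md126/md}"
    # sync_action holds one word (resync, recover, idle, check, repair, ...)
    action=$(cat "$md_dir/sync_action")
    # sync_completed holds "done / total" in sectors, or "none" when idle
    completed=$(cat "$md_dir/sync_completed")
    printf '%s %s\n' "$action" "$completed"
}
```

Running `md_sync_state` right before the `echo idle` and again a second later
shows whether the action has gone back to `resync`.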
- I attach screenshots of the hanging shutdown log after a `sudo sh -c
- 'echo 8 > /proc/sys/kernel/printk'`. The second screenshot shows that
- the kernel has deadlocked in md_write_start(). Note that `sudo systemctl
- start debug-shell` is unusable on this machine at this point because
- Ctrl+Alt+F9 brings tty9 without any keyboard.
+