Re: [systemd-devel] Erroneous detection of degraded array
> Does
> systemctl list-dependencies sys-devices-virtual-block-md0.device
> report anything interesting? I get
>
> sys-devices-virtual-block-md0.device
> ● └─mdmonitor.service

Nothing interesting, the same output as you have above.

> Could you try a run with systemd.log_level=debug on the kernel command line and
> upload the journal again. We can only hope that it will not skew timings enough,
> but it may prove my hypothesis.

I've uploaded the full debug logs to:
https://gist.github.com/Kryai/8273322c8a61347e2300e476c70b4d05

In around 20 reboots, the error appeared only twice. With debug enabled it is certainly rarer, but it does still occur, so your guess was correct: debug logging does affect the exhibition of the race condition.

Reminder of key things in the log:

# cat /etc/systemd/system/mdadm-last-resort@.timer
[Unit]
Description=Timer to wait for more drives before activating degraded array.
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Timer]
OnActiveSec=30

# cat /etc/systemd/system/share.mount
[Unit]
Description=Mount /share RAID partition explicitly
Before=nfs-server.service

[Mount]
What=/dev/disk/by-uuid/2b9114be-3d5a-41d7-8d4b-e5047d223129
Where=/share
Type=ext4
Options=defaults
TimeoutSec=120

[Install]
WantedBy=multi-user.target

Again, if any more information is needed, please let me know and I'll provide it.

Many thanks,
Luke Pyzowski
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
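[Editor's note: for anyone reproducing the OnActiveSec change above, a drop-in override keeps the packaged unit intact across updates. A sketch, assuming a standard systemd drop-in layout; the 300-second value is illustrative, not a recommendation from the thread:]

```ini
# /etc/systemd/system/mdadm-last-resort@.timer.d/override.conf
# Drop-in override: the empty assignment clears the packaged value,
# the second line sets a longer last-resort window.
[Timer]
OnActiveSec=
OnActiveSec=300
```

Apply with `systemctl daemon-reload`; `systemctl cat mdadm-last-resort@.timer` shows the merged result.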
Re: [systemd-devel] Erroneous detection of degraded array
I set mdadm-last-resort@.timer to OnActiveSec=300 and TimeoutSec=320. This gives me enough time to log in to the system. During that time I can view the RAID and everything appears proper, yet 300 seconds later the mdadm-last-resort@.timer expires with an error on /dev/md0. Perhaps systemd is working normally, but then the question is why the RAID is not being assembled properly, which is what triggers /usr/lib/systemd/system/mdadm-last-resort@.service.

From your suggestion, should I next move to full udev debug logs? If this is a race condition, as it appears it might be, can I delay the assembly of the RAID by the kernel? Should that delay be early in the systemd startup process or elsewhere?

Thanks again,
Luke Pyzowski
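[Editor's note: for the "full udev debug logs" step mentioned above, a sketch of the usual kernel command line additions, assuming a GRUB-based CentOS 7 setup (these are standard systemd/systemd-udevd kernel parameters):]

```ini
# Appended to the kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub).
# udev.log_priority= is the systemd-udevd parameter on systemd 219-era releases;
# log_buf_len= enlarges the kernel ring buffer so early messages are not lost.
systemd.log_level=debug udev.log_priority=debug log_buf_len=8M
```

On a running system the equivalent is `udevadm control --log-priority=debug`, though that misses the early-boot window where this race occurs.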
[systemd-devel] Erroneous detection of degraded array
Hello,

I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly (around 50% of the time) systemd will unmount my RAID device, thinking it is degraded, after the mdadm-last-resort@.timer expires; however, the device is working normally by all accounts, and I can immediately mount it manually upon boot completion. In the logs below, /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but this problem can still randomly occur.

systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting Activate md array even though degraded...
systemd[1]: Stopped target Local File Systems.
systemd[1]: Stopping Local File Systems.
systemd[1]: Unmounting /share...
systemd[1]: Stopped (with error) /dev/md0.
systemd[1]: Started Activate md array even though degraded.
systemd[1]: Unmounted /share.

When the system boots normally, the following is in the logs:

systemd[1]: Started Timer to wait for more drives before activating degraded array..
systemd[1]: Starting Timer to wait for more drives before activating degraded array..
...
systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
systemd[1]: Stopping Timer to wait for more drives before activating degraded array..

The above occurs within the same second according to the timestamps, and the timer ends prior to mounting any local filesystems; it properly detects that the RAID is valid and everything continues normally. The other RAID device, a RAID1 of 2 disks containing swap and /, has never exhibited this failure.

My question is: under what conditions does systemd detect the RAID6 as degraded? It seems to be a race condition somewhere, but I am not sure what configuration, if any, should be modified. If needed I can provide more verbose logs; just let me know if they might be useful.
Many thanks,
Luke Pyzowski
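[Editor's note: the "degraded" condition the last-resort path is meant to catch is visible in /proc/mdstat as an active-device count lower than the configured count, the [n/m] field on the array's status line. A minimal, illustrative sketch of that check (not part of the thread; the sample mdstat text is abbreviated):]

```python
import re

# Abbreviated /proc/mdstat excerpts for a 24-drive RAID6 like the one in the thread.
HEALTHY = """\
md0 : active raid6 sdy[23] sdx[22] sda[0]
      23437770752 blocks super 1.2 level 6, 512k chunk [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
"""

DEGRADED = """\
md0 : active raid6 sdy[23] sdx[22] sda[0]
      23437770752 blocks super 1.2 level 6, 512k chunk [24/23] [UUUUUUUUUUUU_UUUUUUUUUUU]
"""

def is_degraded(mdstat_text: str, array: str = "md0") -> bool:
    """True if the array's [configured/active] counters show missing members."""
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(array + " :"):
            # The [n/m] counters appear on the status line(s) that follow.
            for status in lines[i + 1:]:
                m = re.search(r"\[(\d+)/(\d+)\]", status)
                if m:
                    configured, active = map(int, m.groups())
                    return active < configured
            break
    return False

print(is_degraded(HEALTHY))   # → False
print(is_degraded(DEGRADED))  # → True
```

`mdadm --detail /dev/md0` reports the same information as "State : clean" versus "clean, degraded", which is a quick way to confirm what the kernel thinks at the moment the timer fires.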