Re: [systemd-devel] Erroneous detection of degraded array
On 26.01.2017 21:02, Luke Pyzowski wrote:
> Hello,
> I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly
> (around 50% of the time) systemd will unmount my RAID device, thinking it is
> degraded, after the mdadm-last-resort@.timer expires, even though the device
> is working normally by all accounts and I can mount it manually immediately
> after boot completes. In the logs below, /share is the RAID device. I can
> increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from
> 30 to 60 seconds, but the problem can still occur at random.
>
> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting Activate md array even though degraded...
> systemd[1]: Stopped target Local File Systems.
> systemd[1]: Stopping Local File Systems.
> systemd[1]: Unmounting /share...
> systemd[1]: Stopped (with error) /dev/md0.
> systemd[1]: Started Activate md array even though degraded.
> systemd[1]: Unmounted /share.
>
> When the system boots normally, the following is in the logs:
>
> systemd[1]: Started Timer to wait for more drives before activating degraded array..
> systemd[1]: Starting Timer to wait for more drives before activating degraded array..
> ...
> systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
> systemd[1]: Stopping Timer to wait for more drives before activating degraded array..
>
> The above occurs within the same second according to the timestamps, and the
> timer ends before any local filesystems are mounted; the array is properly
> detected as valid and everything continues normally. The other RAID device -
> a RAID1 of 2 disks containing swap and / - has never exhibited this failure.
>
> My question is: under what conditions does systemd detect the RAID6 as
> degraded? It seems to be a race condition somewhere, but I am not sure what
> configuration, if any, should be modified. If needed I can provide more
> verbose logs; just let me know if they might be useful.

It is not directly related to systemd. When a block device that is part of an MD array is detected by the kernel, a udev rule queries the array to see whether it is complete. If it is, the rule starts the array (subject to the general rules for which arrays are auto-started); if it is not, the udev rule starts a timer that will later assemble the degraded array. See udev-md-raid-assembly.rules in the mdadm sources:

ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"

So it looks like the events for some array members either got lost or were delivered late.

Note that there was a discussion on the openSUSE list where arrays would not be auto-assembled on boot, even though triggering a device change *after* the initial boot ran these rules correctly. That situation was triggered by adding an extra disk to the system (i.e. booting with 3 disks worked, with 4 disks it did not). I could not find any hints even after enabling full udev and systemd debug logging. The logs are available if anyone wants to look into it.
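For readers who don't have the mdadm sources handy, the timer and service that rule pulls in look roughly like the sketch below. This is an approximation from memory, not the exact files shipped by any particular mdadm version or distribution; check /usr/lib/systemd/system/ on your own machine for the real contents.

    # mdadm-last-resort@.timer (approximate contents)
    [Unit]
    Description=Timer to wait for more drives before activating degraded array.
    DefaultDependencies=no
    Conflicts=sys-devices-virtual-block-%i.device

    [Timer]
    OnActiveSec=30

    # mdadm-last-resort@.service (approximate contents)
    [Unit]
    Description=Activate md array even though degraded
    DefaultDependencies=no
    Conflicts=sys-devices-virtual-block-%i.device

    [Service]
    Type=oneshot
    ExecStart=/usr/sbin/mdadm --run /dev/%i

In other words, once the udev rule pulls the timer in for an incomplete array, the timer waits (30 seconds by default) and then starts the service, which runs "mdadm --run" to activate the array in degraded mode; the Conflicts= lines tie the timer and service to the assembled device unit, so starting one stops the other.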
Re: [systemd-devel] Erroneous detection of degraded array
On 26.01.2017 at 19:02, Luke Pyzowski wrote:
> I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly
> (around 50% of the time) systemd will unmount my RAID device, thinking it is
> degraded, after the mdadm-last-resort@.timer expires, even though the device
> is working normally by all accounts and I can mount it manually immediately
> after boot completes. In the logs below, /share is the RAID device. I can
> increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from
> 30 to 60 seconds, but the problem can still occur at random.
>
> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting Activate md array even though degraded...
> systemd[1]: Stopped target Local File Systems.
> systemd[1]: Stopping Local File Systems.
> systemd[1]: Unmounting /share...
> systemd[1]: Stopped (with error) /dev/md0.
> systemd[1]: Started Activate md array even though degraded.
> systemd[1]: Unmounted /share.

That also happens randomly in my Fedora 24 testing VM with a RAID10, and you can be sure that in a virtual machine the drives neither disappear nor take long to appear.
[systemd-devel] Erroneous detection of degraded array
Hello,

I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly (around 50% of the time) systemd will unmount my RAID device, thinking it is degraded, after the mdadm-last-resort@.timer expires, even though the device is working normally by all accounts and I can mount it manually immediately after boot completes. In the logs below, /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but the problem can still occur at random.

systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting Activate md array even though degraded...
systemd[1]: Stopped target Local File Systems.
systemd[1]: Stopping Local File Systems.
systemd[1]: Unmounting /share...
systemd[1]: Stopped (with error) /dev/md0.
systemd[1]: Started Activate md array even though degraded.
systemd[1]: Unmounted /share.

When the system boots normally, the following is in the logs:

systemd[1]: Started Timer to wait for more drives before activating degraded array..
systemd[1]: Starting Timer to wait for more drives before activating degraded array..
...
systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
systemd[1]: Stopping Timer to wait for more drives before activating degraded array..

The above occurs within the same second according to the timestamps, and the timer ends before any local filesystems are mounted; the array is properly detected as valid and everything continues normally. The other RAID device - a RAID1 of 2 disks containing swap and / - has never exhibited this failure.

My question is: under what conditions does systemd detect the RAID6 as degraded? It seems to be a race condition somewhere, but I am not sure what configuration, if any, should be modified. If needed I can provide more verbose logs; just let me know if they might be useful.

Many thanks,
Luke Pyzowski
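A side note on the timer tweak mentioned above: rather than editing the packaged /usr/lib/systemd/system/mdadm-last-resort@.timer in place, the same experiment can be done with a drop-in override so the shipped file stays untouched. A minimal sketch, with the 60-second value used purely as an example:

    # /etc/systemd/system/mdadm-last-resort@.timer.d/override.conf
    [Timer]
    OnActiveSec=
    OnActiveSec=60

followed by "systemctl daemon-reload". The empty OnActiveSec= line clears the value from the packaged unit before the new one is set.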
[systemd-devel] how to add test sysfs nodes to sys.tar.xz?
Hi,

Re: bug 4833 and pull request 4837:
https://github.com/systemd/systemd/issues/4833
https://github.com/systemd/systemd/pull/4837

I was thinking about how to add some test cases for whitespace, and keszybz just merged these tests:
https://github.com/systemd/systemd/pull/5158
which test for proper whitespace->underscore replacement using $env{}.

However, I'd also like to add tests for whitespace replacement using an actual device $attr{}, which I think means the test/sys.tar.xz file needs to be updated to add device nodes (maybe an NVMe device) that include whitespace in their model and/or serial strings - is that how new test sysfs device nodes are added? Updating the entire binary seems like a big change just for a few device node files.

Thanks!
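To make that concrete, the kind of fake device I have in mind would be something like the layout below - entirely made up, just to illustrate attributes containing whitespace; the real path and values would be whatever fits the existing test tree:

    sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme0/model    containing e.g. "Fake NVMe  Model 1"
    sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme0/serial   containing e.g. "SN 1234 5678"

so that a hypothetical test rule such as

    KERNEL=="nvme*", SYMLINK+="test/$attr{model}_$attr{serial}"

would exercise the whitespace->underscore replacement on $attr{} substitution.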
Re: [systemd-devel] how to add test sysfs nodes to sys.tar.xz?
Thanks for working on the tests.

On Thu, Jan 26, 2017 at 09:21:41AM -0500, Dan Streetman wrote:
> However, I'd like to also add tests for whitespace replacement using
> actual device $attr{}, which I think means the test/sys.tar.xz file
> needs to be updated to add device (maybe a NVMe device) nodes that
> include whitespace in its model and/or serial strings - is that how
> new test sysfs device nodes are added? Updating the entire binary
> seems like a big change just for a few device node files.

It's only 162k. It's not perfect that we have to update it every time we add tests, but it's not too terrible.

If you're feeling ambitious, you might want to convert that tarball into a script which generates the nodes. After all, it's just a bunch of directories with symlinks and a few simple text files. Then it would be a normal text file and git would be able to track changes to it. This would be a much nicer solution in the long run.

Z.
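Something along these lines, for example - just a sketch of the idea, with made-up device paths and attribute values, not something tested against the actual test suite:

    #!/bin/sh
    # Sketch: generate a minimal fake sysfs tree instead of shipping test/sys.tar.xz.
    # All device paths and attribute contents here are examples only.
    set -e
    root="${1:-sys}"

    # A fake NVMe controller with whitespace in its attributes.
    dev="$root/devices/pci0000:00/0000:00:1f.0/nvme/nvme0"
    mkdir -p "$dev"
    printf 'Fake NVMe  Model 1\n' > "$dev/model"
    printf 'SN 1234 5678\n' > "$dev/serial"

    # class/ entries are just symlinks back into devices/.
    mkdir -p "$root/class/nvme"
    ln -sfn "../../devices/pci0000:00/0000:00:1f.0/nvme/nvme0" "$root/class/nvme/nvme0"

The script itself would then live in git as plain text, and adding a new test device becomes a reviewable diff instead of a binary blob update.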