Re: [systemd-devel] Erroneous detection of degraded array

2017-01-26 Thread Andrei Borzenkov
On 26.01.2017 21:02, Luke Pyzowski wrote:
> Hello,
> I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly 
> (around 50% of the time) systemd will unmount my RAID device, thinking it is 
> degraded, after the mdadm-last-resort@.timer expires; however, the device is 
> working normally by all accounts, and I can mount it manually immediately 
> upon boot completion. In the logs below, /share is the RAID device. I can 
> increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 
> 30 to 60 seconds, but the problem can still occur at random.
> 
> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting Activate md array even though degraded...
> systemd[1]: Stopped target Local File Systems.
> systemd[1]: Stopping Local File Systems.
> systemd[1]: Unmounting /share...
> systemd[1]: Stopped (with error) /dev/md0.
> systemd[1]: Started Activate md array even though degraded.
> systemd[1]: Unmounted /share.
> 
> When the system boots normally, the following is in the logs:
> systemd[1]: Started Timer to wait for more drives before activating degraded 
> array..
> systemd[1]: Starting Timer to wait for more drives before activating degraded 
> array..
> ...
> systemd[1]: Stopped Timer to wait for more drives before activating degraded 
> array..
> systemd[1]: Stopping Timer to wait for more drives before activating degraded 
> array..
> 
> The above occurs within the same second according to the timestamps, and the 
> timer ends before any local filesystems are mounted; the RAID is properly 
> detected as valid and everything continues normally. The other RAID device, 
> a RAID1 of 2 disks containing swap and /, has never exhibited this failure.
> 
> My question is: under what conditions does systemd detect the RAID6 as 
> degraded? It seems to be a race condition somewhere, but I am not sure what 
> configuration, if any, should be modified. If needed I can provide more 
> verbose logs; just let me know if they might be useful.
> 

It is not directly related to systemd. When a block device that is part of
an MD array is detected by the kernel, a udev rule queries the array to see
whether it is complete. If it is, the rule starts the array (subject to the
general rules governing which arrays are auto-started); if not, the udev
rule starts a timer to assemble the degraded array as a last resort.

See udev-md-raid-assembly.rules in the mdadm sources (the rule is a single
line, wrapped here for readability):

ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*",
ENV{MD_FOREIGN}=="no",
ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
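
For reference, the last-resort timer and service units shipped with mdadm
look roughly like this (reconstructed from memory of mdadm at the time;
exact paths, descriptions, and values may differ by version):

  # /usr/lib/systemd/system/mdadm-last-resort@.timer
  [Unit]
  Description=Timer to wait for more drives before activating degraded array.
  DefaultDependencies=no
  Conflicts=sys-devices-virtual-block-%i.device

  [Timer]
  OnActiveSec=30

  # /usr/lib/systemd/system/mdadm-last-resort@.service
  [Unit]
  Description=Activate md array even though degraded
  DefaultDependencies=no
  Conflicts=sys-devices-virtual-block-%i.device

  [Service]
  Type=oneshot
  ExecStart=/sbin/mdadm --run /dev/%i

Note the Conflicts= lines: if the timer or service races against the device
unit of the fully assembled array, systemd resolves the conflict by stopping
one side, which would be consistent with the "Stopped (with error) /dev/md0"
line in the log above.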

So it looks like the events for some array members were either lost or
delivered late.
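
One way to check that on the affected machine is to watch the uevent stream
for block devices while the array assembles, e.g.:

  # watch kernel and udev events for block devices (run during assembly)
  udevadm monitor --kernel --udev --subsystem-match=block

  # after boot, inspect what udev recorded for one array member
  # (/dev/sda is just a placeholder for an actual member device)
  udevadm info --name=/dev/sda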

Note that there was a discussion on the openSUSE list about arrays that
would not be auto-assembled at boot, even though triggering a device change
event *after* the initial boot would run these rules correctly. That
situation was triggered by adding an extra disk to the system (i.e., booting
with 3 disks worked; with 4 disks it did not). I could not find any hints
even after enabling full udev and systemd debug logging. The logs are
available if anyone wants to try it.


Re: [systemd-devel] Erroneous detection of degraded array

2017-01-26 Thread Reindl Harald



On 26.01.2017 at 19:02, Luke Pyzowski wrote:

I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly
(around 50% of the time) systemd will unmount my RAID device, thinking it is
degraded, after the mdadm-last-resort@.timer expires; however, the device is
working normally by all accounts, and I can mount it manually immediately
upon boot completion. In the logs below, /share is the RAID device. I can
increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from
30 to 60 seconds, but the problem can still occur at random.

systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting Activate md array even though degraded...
systemd[1]: Stopped target Local File Systems.
systemd[1]: Stopping Local File Systems.
systemd[1]: Unmounting /share...
systemd[1]: Stopped (with error) /dev/md0.
systemd[1]: Started Activate md array even though degraded.
systemd[1]: Unmounted /share.


That also happens randomly in my Fedora 24 testing VM with a RAID10, and
you can be sure that in a virtual machine, drives don't disappear or take
a long time to appear.



[systemd-devel] Erroneous detection of degraded array

2017-01-26 Thread Luke Pyzowski
Hello,
I have a large RAID6 device with 24 local drives on CentOS 7.3. Randomly
(around 50% of the time) systemd will unmount my RAID device, thinking it is
degraded, after the mdadm-last-resort@.timer expires; however, the device is
working normally by all accounts, and I can mount it manually immediately
upon boot completion. In the logs below, /share is the RAID device. I can
increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from
30 to 60 seconds, but the problem can still occur at random.
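
For what it's worth, a change like that can be made to survive package
updates with a drop-in override instead of editing the unit under /usr/lib.
A minimal sketch (the 120-second value is purely illustrative):

  # /etc/systemd/system/mdadm-last-resort@.timer.d/override.conf
  [Timer]
  OnActiveSec=
  OnActiveSec=120

followed by a "systemctl daemon-reload".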

systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
systemd[1]: Starting Activate md array even though degraded...
systemd[1]: Stopped target Local File Systems.
systemd[1]: Stopping Local File Systems.
systemd[1]: Unmounting /share...
systemd[1]: Stopped (with error) /dev/md0.
systemd[1]: Started Activate md array even though degraded.
systemd[1]: Unmounted /share.

When the system boots normally, the following is in the logs:
systemd[1]: Started Timer to wait for more drives before activating degraded 
array..
systemd[1]: Starting Timer to wait for more drives before activating degraded 
array..
...
systemd[1]: Stopped Timer to wait for more drives before activating degraded 
array..
systemd[1]: Stopping Timer to wait for more drives before activating degraded 
array..

The above occurs within the same second according to the timestamps, and the
timer ends before any local filesystems are mounted; the RAID is properly
detected as valid and everything continues normally. The other RAID device,
a RAID1 of 2 disks containing swap and /, has never exhibited this failure.

My question is: under what conditions does systemd detect the RAID6 as
degraded? It seems to be a race condition somewhere, but I am not sure what
configuration, if any, should be modified. If needed I can provide more
verbose logs; just let me know if they might be useful.
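
For reference, logs of that sort could be collected along these lines (unit
and device names assumed; md0 stands for the array in question):

  # last-resort units for the current boot
  journalctl -b -u mdadm-last-resort@md0.timer -u mdadm-last-resort@md0.service

  # for the next boot, enable full debug output via the kernel command line:
  #   systemd.log_level=debug udev.log_priority=debug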

Many thanks,
Luke Pyzowski


[systemd-devel] how to add test sysfs nodes to sys.tar.xz?

2017-01-26 Thread Dan Streetman
Hi,

Re: bug 4833 and pull request 4837,
https://github.com/systemd/systemd/issues/4833
https://github.com/systemd/systemd/pull/4837

I was thinking of how to add some test cases for whitespace, and
keszybz just merged these tests:
https://github.com/systemd/systemd/pull/5158

which test for proper whitespace->underscore replacement using $env{}.
However, I'd like to also add tests for whitespace replacement using an
actual device $attr{}, which I think means the test/sys.tar.xz file needs
to be updated to add device nodes (maybe an NVMe device) whose model and/or
serial strings include whitespace. Is that how new test sysfs device nodes
are added? Updating the entire binary seems like a big change just for a
few device node files.
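
For concreteness, the sort of rule such a test might exercise (hypothetical;
the kernel name and attribute contents are invented):

  # whitespace in $attr{model} should come out as underscores in the link name
  SUBSYSTEM=="block", KERNEL=="nvme0n1", SYMLINK+="test/$attr{model}"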

Thanks!


Re: [systemd-devel] how to add test sysfs nodes to sys.tar.xz?

2017-01-26 Thread Zbigniew Jędrzejewski-Szmek
Thanks for working on the tests.

On Thu, Jan 26, 2017 at 09:21:41AM -0500, Dan Streetman wrote:
> However, I'd like to also add tests for whitespace replacement using an
> actual device $attr{}, which I think means the test/sys.tar.xz file needs
> to be updated to add device nodes (maybe an NVMe device) whose model and/or
> serial strings include whitespace. Is that how new test sysfs device nodes
> are added? Updating the entire binary seems like a big change just for a
> few device node files.
It's only 162k. It's not perfect that we have to update it every time
we add tests, but it's not too terrible.

If you're feeling ambitious, you might want to convert that tarball to a
script which generates the nodes. After all, it's just a bunch of
directories with symlinks and a few simple text files. Then the test data
will be a normal text file, and git will be able to track changes to it.
This would be a much nicer solution in the long run.
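
A minimal sketch of what such a generator could look like, in Python; every
path, attribute name, and value below is invented for illustration:

  #!/usr/bin/env python3
  """Generate a fake sysfs tree for udev tests instead of shipping a tarball."""
  import os

  def write_attr(devpath, name, value):
      """Create the device directory and write one sysfs attribute file."""
      os.makedirs(devpath, exist_ok=True)
      with open(os.path.join(devpath, name), "w") as f:
          f.write(value + "\n")

  base = "test/sys/devices/pci0000:00/0000:00:1f.2/nvme/nvme0"
  # whitespace in the attributes on purpose, to exercise the replacement code
  write_attr(base, "model", "Fake NVMe Model  With Spaces")
  write_attr(base, "serial", "SN 1234 5678")

  # class symlink pointing back at the device, as in real sysfs
  os.makedirs("test/sys/class/nvme", exist_ok=True)
  link = "test/sys/class/nvme/nvme0"
  if not os.path.lexists(link):
      os.symlink(os.path.relpath(base, "test/sys/class/nvme"), link)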

Z.