Re: [systemd-devel] Errorneous detection of degraded array

2017-01-27 Thread Luke Pyzowski
I've modified a number of settings to try to resolve this, so far without success.
I've created an explicit mount file for the RAID array: 
/etc/systemd/system/share.mount
Inside there I've experimented with TimeoutSec=

In /etc/systemd/system/mdadm-last-resort@.timer I've worked with
OnActiveSec=

I've also tried (without an explicit mount file) adding 
x-systemd.device-timeout to the /etc/fstab entry for the mount. 

Here are a few more system logs showing perhaps more detail. I've edited them to 
show only the relevant details; full pastebin of the logs: http://pastebin.com/sL8nKt7j
These logs were generated with TimeoutSec=120 in 
/etc/systemd/system/share.mount (the description of the mount in the logs is 
"Mount /share RAID partition explicitly") and with OnActiveSec=30 in 
/etc/systemd/system/mdadm-last-resort@.timer.
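
For orientation, a mount unit matching the settings described above might look
roughly like the sketch below. The Description= and TimeoutSec= values are the
ones quoted in this message; the What=, Where= and Type= lines are assumptions
based on /dev/md0 being the ext4 /share partition, and the real unit may differ
(for example, it may reference the file system by UUID). Note that, per
systemd.mount(5), TimeoutSec= in a mount unit bounds the mount command itself,
not the wait for the backing device to appear.

```ini
# /etc/systemd/system/share.mount (sketch; What=, Where=, Type= are assumed)
[Unit]
Description=Mount /share RAID partition explicitly

[Mount]
What=/dev/md0
Where=/share
Type=ext4
TimeoutSec=120
```
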
From blkid:
/dev/md0: UUID="2b9114be-3d5a-41d7-8d4b-e5047d223129" TYPE="ext4"
/dev/md0 is the /share partition.

From /etc/mdadm.conf:
ARRAY /dev/md/0  metadata=1.2 UUID=97566d2f:ae7a169b:966f5840:3e8267f9 name=lnxnfs01:0

Boot begins at Jan 27 11:33:10
+4 seconds from boot:
Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959.
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdc operational as raid disk 2
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdx operational as raid disk 23
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdu operational as raid disk 20
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdt operational as raid disk 19
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdo operational as raid disk 14
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdn operational as raid disk 13
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdd operational as raid disk 3
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdv operational as raid disk 21
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sda operational as raid disk 0
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdf operational as raid disk 5
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdm operational as raid disk 12
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sde operational as raid disk 4
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdp operational as raid disk 15
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdi operational as raid disk 8
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdl operational as raid disk 11
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdk operational as raid disk 10
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sds operational as raid disk 18
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdb operational as raid disk 1
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdj operational as raid disk 9
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdg operational as raid disk 6
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdr operational as raid disk 17
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdh operational as raid disk 7
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdq operational as raid disk 16
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdw operational as raid disk 22
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: allocated 25534kB
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: raid level 6 active with 24 out of 24 devices, algorithm 2
Jan 27 11:33:14 lnxnfs01 kernel: RAID conf printout:
Jan 27 11:33:14 lnxnfs01 kernel:  --- level:6 rd:24 wd:24
Jan 27 11:33:14 lnxnfs01 kernel:  disk 0, o:1, dev:sda
Jan 27 11:33:14 lnxnfs01 kernel:  disk 1, o:1, dev:sdb
Jan 27 11:33:14 lnxnfs01 kernel:  disk 2, o:1, dev:sdc
Jan 27 11:33:14 lnxnfs01 kernel:  disk 3, o:1, dev:sdd
Jan 27 11:33:14 lnxnfs01 kernel:  disk 4, o:1, dev:sde
Jan 27 11:33:14 lnxnfs01 kernel:  disk 5, o:1, dev:sdf
Jan 27 11:33:14 lnxnfs01 kernel:  disk 6, o:1, dev:sdg
Jan 27 11:33:14 lnxnfs01 kernel:  disk 7, o:1, dev:sdh
Jan 27 11:33:14 lnxnfs01 kernel:  disk 8, o:1, dev:sdi
Jan 27 11:33:14 lnxnfs01 kernel:  disk 9, o:1, dev:sdj
Jan 27 11:33:14 lnxnfs01 kernel:  disk 10, o:1, dev:sdk
Jan 27 11:33:14 lnxnfs01 kernel:  disk 11, o:1, dev:sdl
Jan 27 11:33:14 lnxnfs01 kernel:  disk 12, o:1, dev:sdm
Jan 27 11:33:14 lnxnfs01 kernel:  disk 13, o:1, dev:sdn
Jan 27 11:33:14 lnxnfs01 kernel:  disk 14, o:1, dev:sdo
Jan 27 11:33:14 lnxnfs01 kernel:  disk 15, o:1, dev:sdp
Jan 27 11:33:14 lnxnfs01 kernel:  disk 16, o:1, dev:sdq
Jan 27 11:33:14 lnxnfs01 kernel:  disk 17, o:1, dev:sdr
Jan 27 11:33:14 lnxnfs01 kernel:  disk 18, o:1, dev:sds
Jan 27 11:33:14 lnxnfs01 kernel:  disk 19, o:1, dev:sdt
Jan 27 11:33:14 lnxnfs01 kernel:  disk 20, o:1, dev:sdu
Jan 27 11:33:14 lnxnfs01 kernel:  disk 21, o:1, dev:sdv
Jan 27 11:33:14 lnxnfs01 kernel:  disk 22, o:1, dev:sdw
Jan 27 11:33:14 lnxnfs01 kernel:  disk 23, o:1, dev:sdx
Jan 27 11:33:14 lnxnfs01 kernel: md0: detected capacity change from 0 to 45062020923392
Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device 

Re: [systemd-devel] Errorneous detection of degraded array

2017-01-27 Thread Martin Wilck
> 26.01.2017 21:02, Luke Pyzowski wrote:
> > Hello,
> > I have a large RAID6 device with 24 local drives on CentOS7.3.
> > Randomly (around 50% of the time) systemd will unmount my RAID
> > device thinking it is degraded after the mdadm-last-resort@.timer
> > expires, however the device is working normally by all accounts,
> > and I can immediately mount it manually upon boot completion. In
> > the logs below /share is the RAID device. I can increase the timer
> > in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60
> > seconds, but this problem can randomly still occur.

It seems to me that you rather need to decrease the timeout value, or
(more reasonably) increase x-systemd.device-timeout for the /share
mount point.
Unfortunately, your log excerpt contains no time stamps, but I suppose
you're facing a race where the device times out before the "last
resort" timer starts it (and before the last devices appear).
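
As an illustration of the second suggestion, an /etc/fstab entry with a longer
device timeout could look like the line below. The UUID and file system type
are taken from the blkid output quoted earlier in this thread; the mount
options other than x-systemd.device-timeout, the 120s value and the trailing
dump/pass fields are assumptions chosen only for the example. The intent is
that the wait for the device is longer than the last-resort timer's delay.

```
# /etc/fstab (sketch; options, timeout value and dump/pass fields are assumed)
UUID=2b9114be-3d5a-41d7-8d4b-e5047d223129  /share  ext4  defaults,x-systemd.device-timeout=120s  0  2
```
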

Martin

-- 
Dr. Martin Wilck, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
