Hello folks,

This is my first adventure with SL after many years of using CentOS. I'm using SL-6 on a large-ish VM server, and have been quite happy with it.

I am experiencing a weird problem at bootup with large RAID-6 arrays. After Googling around (a lot), I find that others are having the same issue on CentOS/RHEL/Ubuntu/whatever. In my case it's Scientific Linux-6, which should behave the same way as CentOS-6; I had the same problem with the RHEL-6 evaluation version. I'm posting this question to the CentOS mailing list as well.

For some reason, each time I boot the server a random number of RAID arrays will come up with the hot-spare missing. This occurs with hot-spare components only, never with the active components. Once in a while I'm lucky enough to have all components come up correctly when the system boots. Which hot spares fail to be configured is completely random.

I have twelve 2 TB drives, each divided into four primary partitions, configured as eight partitionable MD arrays. All drives are partitioned exactly the same way. Each RAID-6 array consists of 5 components (partitions) plus a hot spare. The small RAID-1 host OS array never has a problem with its hot spare.

The predominant theory via Google is that there's a race condition at boot time between full enumeration of all disk partitions and mdadm assembling the arrays.

Does anyone know of a way to make mdadm delay assembly until all partitions have been enumerated? Even inserting a wait of a few seconds would probably do the trick. My knowledge of the internals of the boot process isn't good enough to know where to look.
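One thing I'm considering as a stopgap (untested on this box, and where exactly to hook it into the EL6 boot sequence is the part I'm unsure of) is waiting for udev to finish creating the partition device nodes and then rescanning for any arrays that came up degraded:

```shell
# Untested sketch -- run late in boot (e.g. from /etc/rc.d/rc.local):
# block until all queued udev events have been processed, so every
# partition device node exists before mdadm looks for members
/sbin/udevadm settle --timeout=30
# then retry assembly of anything listed in mdadm.conf that was
# missed or left incomplete during the initial boot-time assembly
/sbin/mdadm --assemble --scan
```

I don't know whether a post-boot rescan can still attach a spare to an array that's already running, though, which is part of why I'm asking here.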

I tried issuing 'mdadm -A -s /dev/md/md_dXX' after booting, but all it does is complain: "No suitable drives found for /dev....."
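For now I can put the missing spares back by hand after each boot; roughly like this (device and partition names below are examples, not my actual layout):

```shell
# Example only: array and partition names are placeholders.
# See which slot the array thinks is missing:
/sbin/mdadm --detail /dev/md/md_d10
# Confirm the partition still carries the expected superblock/UUID:
/sbin/mdadm --examine /dev/sdc3
# Re-attach it to the array as the hot spare:
/sbin/mdadm /dev/md/md_d10 --add /dev/sdc3
```

That works, but doing it manually for a random subset of eight arrays after every reboot obviously isn't a real fix.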

Here is the mdadm.conf file:
-------------------------------------

MAILADDR root
PROGRAM /root/bin/record_md_events.sh

DEVICE partitions
##DEVICE /dev/sd*    <<---- this didn't help.
AUTO +imsm +1.x -all

## Host OS root arrays:
ARRAY /dev/md0
   metadata=1.0 num-devices=2 spares=1
   UUID=75941adb:33e8fa6a:095a70fd:6fe72c69
ARRAY /dev/md1
   metadata=1.1 num-devices=2 spares=1
   UUID=7a96d82d:bd6480a2:7433f1c2:947b84e9
ARRAY /dev/md2
   metadata=1.1 num-devices=2 spares=1
   UUID=ffc6070d:e57a675e:a1624e53:b88479d0

## Partitionable arrays on LSI controller:
ARRAY /dev/md/md_d10
   metadata=1.2 num-devices=5 spares=1
   UUID=135f0072:90551266:5d9a126a:011e3471
ARRAY /dev/md/md_d11
   metadata=1.2 num-devices=5 spares=1
   UUID=59e05755:5b3ec51e:e3002cfd:f0720c38
ARRAY /dev/md/md_d12
   metadata=1.2 num-devices=5 spares=1
   UUID=7916eb13:cd5063ba:a1404cd7:3b65a438
ARRAY /dev/md/md_d13
   metadata=1.2 num-devices=5 spares=1
   UUID=9a767e04:e4e56a9d:c369d25c:9d333760

## Partitionable arrays on Tempo controllers:
ARRAY /dev/md/md_d20
   metadata=1.2 num-devices=5 spares=1
   UUID=1d5a3c32:eb9374ac:eff41754:f8a176c1
ARRAY /dev/md/md_d21
   metadata=1.2 num-devices=5 spares=1
   UUID=38ffe8c9:f3922db9:60bb1522:80fea016
ARRAY /dev/md/md_d22
   metadata=1.2 num-devices=5 spares=1
   UUID=ebb4ea67:b31b2105:498d81af:9b4f45d3
ARRAY /dev/md/md_d23
   metadata=1.2 num-devices=5 spares=1
   UUID=da07407f:deeb8906:7a70ae82:6b1d8c4a

-------------------------------------

Your suggestions are most welcome ... thanks.

Chuck
