You're welcome, Dmitrijs.

Now that this system is finally behaving itself (for the first time in
the better part of a year!), I can look at this properly functioning
configuration and compare it with the previously broken one.  It is
becoming clearer what happened.

(Note:  all of the following are things I have deduced by reading a vast
amount of material from different sources [and giving myself many
headaches in the process], and some of it may not be an entirely
accurate description of what's really happening.)

The superblock contains a field called name.  It's not like /dev/md1 or
/dev/md/1 or /dev/md1p1 or anything like that.  On my system, it's more
like 5.  As it happened, I had two very different arrays (different RAID
levels, sizes, etc.) that both had that name.  When you run Disk Utility
and select a RAID array, this is the Name displayed in the right pane;
if it's empty, the pane shows Name:  -
but on my system two arrays showed Name:  5.  I didn't choose this name;
I think mdadm assigned it because each array happened to be assembled as
/dev/md5 at the time it was created, and the two arrays were created at
different times (of course these device names change arbitrarily
whenever you boot).

But of course it's more complicated than that.  That's just part of the
name;  the superblock actually contains a 'fully qualified name' that is
of the form hostname:name and Disk Utility only displays the last part
of it.  The hostname part is just the hostname at the time the array is
created.
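As an illustration, the fully qualified name can be seen with mdadm's
--examine option (the device name and hostname below are made up, but
the Name line is where this field lives):

```shell
# Inspect a member device's superblock (requires root).
# The Name line holds the fully qualified hostname:name.
mdadm --examine /dev/sdb1 | grep Name
#            Name : oldhost:5
```

Disk Utility appears to display only the part after the colon, i.e. just
"5".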

My system has a long history.  A year ago, its hostname was different,
and one of the arrays was created then.  After the system became
unstable (when I upgraded to Oneiric and gained a particularly buggy
version of mdadm) I stopped using it and backed all the data off.

When Precise became available, I did a fresh install onto a non-RAID
partition and left all the existing RAID partitions in place for testing
purposes.  Because I was no longer going to use this system as a file
server, but as a test machine, I gave it a different hostname
(precisetest).  A bit later I added another array and mdadm assigned it
the name 5, presumably because it was sitting at /dev/md5 at the time.
I did not even notice this at first.  Of course, the fully qualified
names stored in the superblocks were actually different, having
different hostnames on the front, so even though Disk Utility showed the
name 5 on both arrays, they really had different full names.

Although RAID was really messed up on this system, that was only a
problem at boot time.  After booting, I could go into Disk Utility and
manually start all affected arrays.  Once this was done, the system
worked great, until the next reboot.  RAID was working;  I could access
files on any array.  I came to the (perhaps incorrect?) conclusion that
these two arrays having the same name was not a problem.  After all, I
could look at mdadm.conf and see that the arrays really had different
names (the fully qualified names are shown there).
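For example, the two arrays could appear in mdadm.conf something like
this (an illustrative fragment; the UUIDs are invented): distinct fully
qualified names, but the same short name after the colon.

```shell
# /etc/mdadm/mdadm.conf (illustrative fragment; UUIDs are made up)
ARRAY /dev/md/5    metadata=1.2 name=oldhost:5      UUID=11111111:22222222:33333333:44444444
ARRAY /dev/md/5_0  metadata=1.2 name=precisetest:5  UUID=55555555:66666666:77777777:88888888
```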

Now I am thinking that it is not sufficient that the fully qualified
names be unique.  I think the part of the name after the : has to be
unique too, otherwise problems happen at boot, at least on Ubuntu
Precise.  But I don't think mdadm upstream intended it to be that way.

So:  some part of the boot process is getting hung up on these
(apparently) duplicate names, because it is looking at just the short
names instead of the fully qualified names.  (In the udev scripts
perhaps?)  If that code looked at hostname:name instead of just name,
perhaps this problem would disappear.
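If duplicate short names really are the trigger, then perhaps a
workaround (untested by me, so take it as a sketch) would be to give one
of the arrays a distinct name using mdadm's documented --update=name
assembly option (member devices below are hypothetical):

```shell
# Stop the array, then reassemble it, writing a new name into the
# superblock.  --update=name requires version-1 metadata.
mdadm --stop /dev/md5
mdadm --assemble /dev/md5 --name=data5 --update=name /dev/sdc1 /dev/sdd1
```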

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1036366

Title:
  software RAID arrays fail to start on boot

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1036366/+subscriptions
