Re: [gentoo-user] Oh no! My raid5 is not assembled, and how I fixed it.

2012-06-09 Thread Walter Dnes
  Are you using sys-fs/mdadm-3.2.4 or sys-fs/mdadm-3.2.5?  If so see
http://www.gossamer-threads.com/lists/gentoo/dev/255107 (Gentoo Dev
list) and bug https://bugs.gentoo.org/show_bug.cgi?id=416081
-- 
Walter Dnes waltd...@waltdnes.org



Re: [gentoo-user] Oh no! My raid5 is not assembled, and how I fixed it.

2012-06-09 Thread Volker Armin Hemmann
On Saturday, 9 June 2012, 10:21:51, Walter Dnes wrote:
   Are you using sys-fs/mdadm-3.2.4 or sys-fs/mdadm-3.2.5?  If so see
 http://www.gossamer-threads.com/lists/gentoo/dev/255107 (Gentoo Dev
 list) and bug https://bugs.gentoo.org/show_bug.cgi?id=416081

Yeah, I've seen similar for a while. My autoassembled raid5 setups still work, 
autoassembled raid1 too. But the stuff that is supposed to be assembled by mdadm 
has to be done by hand.
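
(By hand meaning, roughly, something like the following; the md device and 
member partitions here are just examples, not my actual layout:

  # clear the half-assembled state, then assemble explicitly
  mdadm --stop /dev/md0
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
)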

So much for 'user space assembling is so much better and more robust'. As shown: 
it isn't.

Pisses me off every time I reboot. Which is, luckily, not THAT often.

-- 
#163933



Re: [gentoo-user] Oh no! My raid5 is not assembled, and how I fixed it.

2012-06-09 Thread Paul Hartman
On Sat, Jun 9, 2012 at 9:21 AM, Walter Dnes waltd...@waltdnes.org wrote:
  Are you using sys-fs/mdadm-3.2.4 or sys-fs/mdadm-3.2.5?  If so see
 http://www.gossamer-threads.com/lists/gentoo/dev/255107 (Gentoo Dev
 list) and bug https://bugs.gentoo.org/show_bug.cgi?id=416081

Yes, 3.2.5. So, that was the culprit!

I just did a portage sync and now it wants to downgrade back to
3.2.3-r1. I had bad luck with the timing of my quarterly reboot, I
guess. :) Thanks.
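
(Side note, and only a sketch: to make sure portage never pulls the
broken versions back in later, a local mask along these lines should do
it, assuming the usual /etc/portage layout:

  # /etc/portage/package.mask
  # keep the mdadm versions with the broken auto-assembly away
  >=sys-fs/mdadm-3.2.4
)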



[gentoo-user] Oh no! My raid5 is not assembled, and how I fixed it.

2012-06-08 Thread Paul Hartman
I rebooted to upgrade to kernel 3.4.1. I accidentally had the
combination of uvesafb, nouveau kms and nvidia-drivers enabled, which
caused my system to go blank after rebooting. I was not able to SSH
into the machine, so I did the magic-sysrq REISUB to reboot into my
previous kernel. When it booted into the previous kernel (3.3.5), I
saw a whole bunch of I/O error messages scrolling by, for every disk
in my RAID array. I had never seen these errors before. I hoped it
was just some module confusion because I was booting a different
kernel. I was able to boot into my root filesystem, but the raid did
not assemble. After blacklisting nouveau and rebooting into 3.4.1,
none of the I/O errors mentioned above appeared, but mdraid failed
with this message:

 * Starting up RAID devices ...
 * mdadm main: failed to get exclusive lock on mapfile
mdadm: /dev/md2 is already in use.
mdadm: /dev/md1 is already in use.
 [ !! ]

Oh no! Heart beating quickly... terabytes of data... Google finds
nothing useful with these messages.

My mdadm.conf has not changed at all, and no physical disks have been
added or removed in over a year. I have, of course, updated hundreds
of packages since my last reboot, including mdadm.

/proc/mdstat shows that it's not detecting all of the member
disks/partitions:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
[raid4] [multipath] [faulty]
md1 : inactive sdb1[0](S)
  1048575868 blocks super 1.1

md2 : inactive sdf2[5](S)
  904938415 blocks super 1.1

unused devices: <none>


Those arrays normally include all of the disks sdb through sdf,
partitions 1 and 2 from each disk.

My mdadm.conf has always had only two ARRAY lines (for /dev/md1 and
/dev/md2) with the UUIDs of the arrays. Previously the member disks
were always automatically detected and assembled when I booted and
started mdadm. Running mdadm --query --examine on the partitions
showed that they still contained valid raid information. So I felt
confident in trying to reassemble the arrays.
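
For anyone wanting to check the same thing, this is roughly the command
I ran on each member (sdb1 here is just one example partition):

  # the Array UUID in the --examine output should match the UUID in mdadm.conf
  mdadm --query --examine /dev/sdb1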

To fix, I did:

/etc/init.d/mdraid stop

to stop the arrays (I could have also run mdadm -Ss, which is what the
stop script does).

Then I edited mdadm.conf and added a device line:

DEVICE /dev/sd[bcdef][12]
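
For context, the whole mdadm.conf is now basically just that DEVICE
line plus the two ARRAY lines that were already there; something like
this, with the UUIDs below being placeholders rather than my real ones:

  # tell mdadm exactly which partitions to scan
  DEVICE /dev/sd[bcdef][12]
  # the two arrays, identified by UUID as before
  ARRAY /dev/md1 UUID=00000000:00000000:00000000:00000001
  ARRAY /dev/md2 UUID=00000000:00000000:00000000:00000002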

So now I am telling it specifically where to look. I then restarted mdraid:

/etc/init.d/mdraid start

Et voilà! My raid was back and functioning. I don't know whether this
is the result of a change in kernel or mdadm behavior, or simply the
result of my REISUB leaving the raid in a strange state.
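
(If this ever bites me again, another thing worth knowing, just as a
sketch: the ARRAY lines can be regenerated from the currently running
arrays with

  mdadm --detail --scan

which prints one ARRAY line per assembled array, ready to paste into
mdadm.conf.)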