Thanks! Was:[Re: strange RAID5 problem]
Thanks to Neil, Luca, and CaT, who were all a big help.

-- 
With our best regards,

Maurice W. Hilarius        Telephone: 01-780-456-9771
Hard Data Ltd.             FAX:       01-780-456-9772
11060 - 166 Avenue         email:[EMAIL PROTECTED]
Edmonton, AB, Canada       http://www.harddata.com/
T5X 1Y3

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange RAID5 problem
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>
> But, I get this error message:
> mdadm: hot add failed for /dev/sdw1: No such device
>
> What? We just made the partition on sdw a moment ago in fdisk. It IS there!

I don't believe you, prove it (/proc/partitions)

> So, we look around a bit:
> # cat /proc/mdstat
> md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11] sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2] sdr1[1]
>       5860631040 blocks
>
> Yup, that looks correct, missing sdw1[6]

no, it does not, it is 'inactive'

> [EMAIL PROTECTED] ~]# cat /proc/mdstat
> Personalities : [raid1] [raid5]
> ...
> md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11] sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2] sdr1[1]
>       5860631040 blocks
> ...
>
> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
> mdadm: hot add failed for /dev/sdw1: No such device
>
> OK, let's mount the degraded RAID and try to copy the files to
> somewhere else, so we can make it from scratch:
>
> [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock

it is still inactive, no wonder you cannot access it.
try running the array, or really stop it before assembling.

L.

-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
 /\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
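The point about the array being 'inactive' can be checked mechanically. Below is a minimal sketch, not anything posted in the thread: the helper name inactive_arrays is made up for illustration, and it assumes the usual "mdN : inactive ..." layout of /proc/mdstat shown in the quoted output. The mdadm commands in the comments (--run, --stop) are the standard options for the two alternatives suggested above.

```shell
# Hypothetical helper: print the name of every md array that
# mdstat-formatted text (read from stdin) reports as inactive.
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# On a live system, feed it the real file if it is readable.
if [ -r /proc/mdstat ]; then
    inactive_arrays < /proc/mdstat
fi

# For an array it lists (md3 in this thread), the two suggestions map to:
#   mdadm --run /dev/md3     # try to start the array as it is
#   mdadm --stop /dev/md3    # stop it fully before assembling again
```

Reading from stdin means the helper can also be tried on a saved copy of the mdstat output without touching the array.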
Re: strange RAID5 problem
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
> [EMAIL PROTECTED] ~]# mdadm --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
> mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted

Have you tried zeroing the superblock with

    mdadm --misc --zero-superblock /dev/sdw1

and then adding it in?

> [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock

Wow, that looks messy. Ummm. About the only thing I can think of is
failing /dev/sdw1 and removing it (I know it says it's not there but...)

Also, I'm not the biggest expert on RAID around here. ;)

-- 
To the extent that we overreact, we proffer the terrorists the
greatest tribute.
- High Court Judge Michael Kirby
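The zero-and-re-add suggestion could be sketched as follows. This is a dry run under stated assumptions: the helper name readd_member_cmds is hypothetical, and it only prints the commands, because zeroing a superblock is destructive and should be reviewed first. The --examine step is an addition not mentioned in the thread, included so the stale superblock can be inspected before it is wiped.

```shell
# Hypothetical dry-run helper: print, but do not run, the commands for
# inspecting a member's superblock, zeroing it, and re-adding the member.
readd_member_cmds() {
    array=$1
    member=$2
    echo "mdadm --examine $member"
    echo "mdadm --zero-superblock $member"
    echo "mdadm $array --add $member"
}

# Device names taken from the thread.
readd_member_cmds /dev/md3 /dev/sdw1
```

Note that adding a member only works once the array itself is running, which was the underlying problem here.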
Re: strange RAID5 problem
Luca Berra wrote:
> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>>
>> But, I get this error message:
>> mdadm: hot add failed for /dev/sdw1: No such device
>>
>> What? We just made the partition on sdw a moment ago in fdisk. It IS there!
>
> I don't believe you, prove it (/proc/partitions)

I understand. Here we go then. Devices in question bracketed with **:

[EMAIL PROTECTED] ~]# cat /proc/partitions
major minor     #blocks  name

   3     0    117220824  hda
   3     1       104391  hda1
   3     2      2008125  hda2
   3     3    115105725  hda3
   3    64    117220824  hdb
   3    65       104391  hdb1
   3    66      2008125  hdb2
   3    67    115105725  hdb3
   8     0    390711384  sda
   8     1    390708801  sda1
   8    16    390711384  sdb
   8    17    390708801  sdb1
   8    32    390711384  sdc
   8    33    390708801  sdc1
   8    48    390711384  sdd
   8    49    390708801  sdd1
   8    64    390711384  sde
   8    65    390708801  sde1
   8    80    390711384  sdf
   8    81    390708801  sdf1
   8    96    390711384  sdg
   8    97    390708801  sdg1
   8   112    390711384  sdh
   8   113    390708801  sdh1
   8   128    390711384  sdi
   8   129    390708801  sdi1
   8   144    390711384  sdj
   8   145    390708801  sdj1
   8   160    390711384  sdk
   8   161    390708801  sdk1
   8   176    390711384  sdl
   8   177    390708801  sdl1
   8   192    390711384  sdm
   8   193    390708801  sdm1
   8   208    390711384  sdn
   8   209    390708801  sdn1
   8   224    390711384  sdo
   8   225    390708801  sdo1
   8   240    390711384  sdp
   8   241    390708801  sdp1
  65     0    390711384  sdq
  65     1    390708801  sdq1
  65    16    390711384  sdr
  65    17    390708801  sdr1
  65    32    390711384  sds
  65    33    390708801  sds1
  65    48    390711384  sdt
  65    49    390708801  sdt1
  65    64    390711384  sdu
  65    65    390708801  sdu1
  65    80    390711384  sdv
  65    81    390708801  sdv1
**
  65    96    390711384  sdw
  65    97    390708801  sdw1
**
  65   112    390711384  sdx
  65   113    390708801  sdx1
  65   128    390711384  sdy
  65   129    390708801  sdy1
  65   144    390711384  sdz
  65   145    390708801  sdz1
  65   160    390711384  sdaa
  65   161    390708801  sdaa1
  65   176    390711384  sdab
  65   177    390708801  sdab1
  65   192    390711384  sdac
  65   193    390708801  sdac1
  65   208    390711384  sdad
  65   209    390708801  sdad1
  65   224    390711384  sdae
  65   225    390708801  sdae1
  65   240    390711384  sdaf
  65   241    390708801  sdaf1
**
   9     0       104320  md0
**
   9     2   5860631040  md2
   9     1    115105600  md1

-- 
Regards,
Maurice
Re: strange RAID5 problem
On Tue, May 09, 2006 at 10:16:25AM -0600, Maurice Hilarius wrote:
> Luca Berra wrote:
>> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>>> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>>>
>>> But, I get this error message:
>>> mdadm: hot add failed for /dev/sdw1: No such device
>>>
>>> What? We just made the partition on sdw a moment ago in fdisk. It IS there!
>>
>> I don't believe you, prove it (/proc/partitions)
>
> I understand. Here we go then. Devices in question bracketed with **:

ok, now i do.
is the /dev/sdw1 device file correctly created?
you could try straceing mdadm to see what happens

what about the other suggestion? trying to stop the array and restart
it, since it is marked as inactive.

L.

-- 
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
 /\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
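Both checks suggested here are quick to run. A minimal sketch follows; the helper name check_node and the trace file path /tmp/mdadm.trace are assumptions for illustration. The strace line is left as a comment because it needs root and the real array.

```shell
# Hypothetical helper: report whether a path exists as a block device
# node. A partition can show up in /proc/partitions while its /dev
# entry is missing, which is the case being ruled out here.
check_node() {
    if [ -b "$1" ]; then
        echo "$1: block device node present"
    else
        echo "$1: missing or not a block device"
    fi
}

check_node /dev/sdw1

# To see which system call returns the error, trace the failing command:
#   strace -f -o /tmp/mdadm.trace mdadm /dev/md3 -a /dev/sdw1
```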
Re: strange RAID5 problem
Luca Berra wrote:
> ..
>>> I don't believe you, prove it (/proc/partitions)
>>
>> I understand. Here we go then. Devices in question bracketed with **:
>
> ok, now i do.
> is the /dev/sdw1 device file correctly created?
> you could try straceing mdadm to see what happens
>
> what about the other suggestion? trying to stop the array and restart
> it, since it is marked as inactive.
>
> L.

Here is what we ended up doing that fixed it.
Thanks to Neil for the --force suggestion; however, even with that, ALL
parameters were needed on the mdadm -C or it still refused.
We used EVMS to rebuild, as that is what originally created the RAID.

mdadm -C /dev/md3 --chunk=256 --level=5 --parity=ls --raid-devices=16 --force \
    /dev/evms/.nodes/sdq1 /dev/evms/.nodes/sdr1 /dev/evms/.nodes/sds1 \
    /dev/evms/.nodes/sdt1 /dev/evms/.nodes/sdu1 /dev/evms/.nodes/sdv1 \
    missing \
    /dev/evms/.nodes/sdx1 /dev/evms/.nodes/sdy1 /dev/evms/.nodes/sdz1 \
    /dev/evms/.nodes/sdaa1 /dev/evms/.nodes/sdab1 /dev/evms/.nodes/sdac1 \
    /dev/evms/.nodes/sdad1 /dev/evms/.nodes/sdae1 /dev/evms/.nodes/sdaf1

Notice we are assembling a device with a missing member, and the devices
are in order per:
mdadm -D /dev/md3

This was the *only* way that it would come up.
It was mountable, and the data seems intact.

We started the rebuild with no errors by simply adding the device as I
mentioned before with -a.

Then sped it up via:
echo 10 > /proc/sys/dev/raid/speed_limit_min

Because frankly we have the resources to do so and need it going as fast
as possible.

-- 
Regards,
Maurice
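The position of "missing" in the create command has to match the slot of the absent member. A rough way to find that slot from the mdstat listing is sketched below. The helper name missing_slots is hypothetical, and the sketch assumes the bracketed numbers in the mdstat line are the slot positions, which holds for the listing quoted in this thread (sdw1 would have been [6]) but is worth confirming against mdadm -D or mdadm --examine output.

```shell
# Hypothetical helper: given the expected number of raid devices ($1)
# and an mdstat array line on stdin, print each slot number in
# 0 .. n-1 that has no member listed.
missing_slots() {
    n=$1
    # Pull out the bracketed indices, e.g. "[15]", then keep the digits.
    present=$(grep -o '\[[0-9][0-9]*]' | grep -o '[0-9][0-9]*')
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$present" | grep -qx "$i" || echo "$i"
        i=$((i + 1))
    done
}

# Small made-up example: four slots, slot 2 absent.
echo 'md9 : inactive sda1[0] sdb1[1] sdd1[3]' | missing_slots 4   # prints 2
```

Run against the md3 line from this thread with 16 devices, the same logic points at slot 6, the seventh position in the device list.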
Re: strange RAID5 problem
On Monday May 8, [EMAIL PROTECTED] wrote:
> Good evening.
>
> I am having a bit of a problem with a largish RAID5 set.
> Now it is looking more and more like I am about to lose all the data on
> it, so I am asking (begging?) to see if anyone can help me sort this out.

Very thorough description, but you omitted the 'dmesg' output
corresponding to:

> [EMAIL PROTECTED] ~]# mdadm --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
> mdadm: failed to RUN_ARRAY /dev/md3: Invalid argument

Also, you don't seem to have tried '--force' with '--assemble'. It
might help.

NeilBrown
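Capturing the kernel messages right after a failed assemble could look roughly like this. The helper name md_log_lines is made up, and the filter is deliberately coarse: it assumes md and RAID personality messages carry an "md:" or "raidN:" style prefix, so it may also pick up unrelated lines containing those strings.

```shell
# Hypothetical helper: keep only kernel log lines (read from stdin)
# that look like they came from the md driver or a RAID personality.
md_log_lines() {
    grep -E '(md|raid[0-9]*):'
}

# Intended use on the affected machine, right after the failing command:
#   mdadm --assemble --force /dev/md3 <member devices...>
#   dmesg | md_log_lines | tail -n 20
```

Posting that filtered tail along with the mdadm error gives the list the kernel's own reason for refusing to run the array.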