Re: strange RAID5 problem
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>
> But, I get this error message:
> mdadm: hot add failed for /dev/sdw1: No such device
>
> What? We just made the partition on sdw a moment ago in fdisk. It IS there!

I don't believe you, prove it (/proc/partitions).

> So, we look around a bit:
> # cat /proc/mdstat
> md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11] sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2] sdr1[1]
>       5860631040 blocks
>
> Yup, that looks correct, missing sdw1[6]

No, it does not: it is 'inactive'.

> [EMAIL PROTECTED] ~]# cat /proc/mdstat
> Personalities : [raid1] [raid5]
> ...
> md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11] sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2] sdr1[1]
>       5860631040 blocks
> ...
> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
> mdadm: hot add failed for /dev/sdw1: No such device
>
> OK, let's mount the degraded RAID and try to copy the files to somewhere
> else, so we can make it from scratch:
>
> [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock

It is still inactive, no wonder you cannot access it. Try running the array, or really stop it before assembling.

L.

--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
 /\
 \ /  ASCII RIBBON CAMPAIGN
  X   AGAINST HTML MAIL
 / \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
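Luca's two points (the array is 'inactive', and slot 6 is the one without a member) can both be read mechanically from the mdstat text. A minimal sketch in Python, assuming the line layout shown in the quoted output; `parse_mdstat` and `missing_slots` are hypothetical helper names, not part of mdadm:

```python
import re

def parse_mdstat(text):
    """Return {array: {"state": str, "slots": {index: device}}} from /proc/mdstat text."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like "md3 : inactive sdq1[0] sdaf1[15] ..."
        m = re.match(r"^(md\d+)\s*:\s*(\S+)\s+(.*)$", line)
        if not m:
            continue
        name, state, rest = m.groups()
        # Each member is written as device[slot]
        slots = {int(i): dev for dev, i in re.findall(r"(\w+)\[(\d+)\]", rest)}
        arrays[name] = {"state": state, "slots": slots}
    return arrays

def missing_slots(slots):
    """Slot numbers absent between 0 and the highest slot seen."""
    return [i for i in range(max(slots) + 1) if i not in slots]

# Sample text copied from the message above.
sample = (
    "Personalities : [raid1] [raid5]\n"
    "md3 : inactive sdq1[0] sdaf1[15] sdae1[14] sdad1[13] sdac1[12] sdab1[11] "
    "sdaa1[10] sdz1[9] sdy1[8] sdx1[7] sdv1[5] sdu1[4] sdt1[3] sds1[2] sdr1[1]\n"
    "      5860631040 blocks\n"
)
info = parse_mdstat(sample)["md3"]
print(info["state"], missing_slots(info["slots"]))
```

A missing last slot would not be caught this way, since the sketch only looks for holes below the highest slot it sees.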
Re: strange RAID5 problem
On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
> [EMAIL PROTECTED] ~]# mdadm --assemble /dev/md3 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1
> mdadm: superblock on /dev/sdw1 doesn't match others - assembly aborted

Have you tried zeroing the superblock with
mdadm --misc --zero-superblock /dev/sdw1
and then adding it in?

> [EMAIL PROTECTED] ~]# mount /dev/md3 /all/boxw16/
> /dev/md3: Invalid argument
> mount: /dev/md3: can't read superblock

Wow, that looks messy. Ummm, about the only thing I can think of is failing /dev/sdw1 and removing it (I know it says it's not there but...)

Also, I'm not the biggest expert on raid around here. ;)

--
To the extent that we overreact, we proffer the terrorists the greatest tribute.
- High Court Judge Michael Kirby
Re: slackware -current softraid5 boot problem - additional info
On Tuesday, 9 May 2006 07:50, Luca Berra wrote:
> you don't give a lot of information about your setup,

You're sure right here, I was a bit off track yesterday from tinkering till night - info below.

> in any case it could be something like udev and the /dev/sdd device node
> not being available at boot?

Ok: Slackware-current with kernel 2.6.14.6, *no* udev, plain old hotplug.

I had to put the raid start script in a reasonable place myself (not preconfigured in Slack), so I have yet to figure out whether it sees /etc/mdadm.conf when the script is called. (If the presence of mdadm.conf is totally uninteresting, let me know, I just started on raid.)

The other disks are seen fine, and since they are all the same type on the same controller there's no reason why this one would not be seen. (Unless for some reason mdadm talks to the *last* disk first and then stops - else it should rather complain about sda.)

* mdadm -E info *

# mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : db7e5b65:e35c69dc:7c267a5a:e676c929
  Creation Time : Mon May 8 00:05:16 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
     Array Size : 732595392 (698.66 GiB 750.18 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Tue May 9 00:43:46 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 61f0ffd6 - correct
         Events : 0.24796

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd

* mdstat *

Once I started the array manually (which works fine), mdstat looks like:

# cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      732563712 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d--(+)@ s-:+ a- C+++() UL+ P+++ L+++ E-- W++ N o? K- w--(---) !O M+ V- PS++(+) PE(-) Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ b++(+++) DI+++ D G++ e* h++ r%* y?
------END GEEK CODE BLOCK------

http://www.stop1984.com
http://www.againsttcpa.com
Re: slackware -current softraid5 boot problem - additional info
Something fishy here.

Dexter Filmore wrote:
> # mdadm -E /dev/sdd

Device /dev/sdd.

> # cat /proc/mdstat
> Personalities : [raid5]
> md0 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
>       732563712 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

Components that are all the first partition.

Are you using the whole disk, or the first partition? It appears that, to some extent, you are using both.

Perhaps some confusion on that point between your boot scripts and your manual run explains things?

-Mike
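Mike's observation (superblock examined on the whole disk, array running on first partitions) is easy to spot by comparing names. A rough sketch in Python, assuming plain sdX/hdX naming as used in this thread; `is_partition`, `parent_disk` and `whole_disk_conflicts` are made-up names for illustration:

```python
import re

def is_partition(dev):
    """True for names like 'sdd1' or '/dev/sdd1', False for whole disks like 'sdd'."""
    name = dev.rsplit("/", 1)[-1]
    return re.fullmatch(r"[sh]d[a-z]+\d+", name) is not None

def parent_disk(dev):
    """Strip a trailing partition number: 'sdd1' -> 'sdd'."""
    name = dev.rsplit("/", 1)[-1]
    return re.sub(r"\d+$", "", name)

def whole_disk_conflicts(examined, members):
    """Whole-disk names in `examined` whose partitions also appear in `members`."""
    member_parents = {parent_disk(m) for m in members if is_partition(m)}
    names = (e.rsplit("/", 1)[-1] for e in examined)
    return sorted(n for n in names if not is_partition(n) and n in member_parents)

# Device examined with mdadm -E versus the members listed in /proc/mdstat above.
print(whole_disk_conflicts(["/dev/sdd"], ["sda1", "sdd1", "sdc1", "sdb1"]))
```

Any name it reports is a disk referenced both as a whole device and through one of its partitions, which is the mix-up suspected here.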
Re: strange RAID5 problem
Luca Berra wrote:
> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>>
>> But, I get this error message:
>> mdadm: hot add failed for /dev/sdw1: No such device
>>
>> What? We just made the partition on sdw a moment ago in fdisk. It IS there!
>
> I don't believe you, prove it (/proc/partitions)

I understand. Here we go then. Devices in question bracketed with **:

[EMAIL PROTECTED] ~]# cat /proc/partitions
major minor  #blocks  name

   3     0  117220824 hda
   3     1     104391 hda1
   3     2    2008125 hda2
   3     3  115105725 hda3
   3    64  117220824 hdb
   3    65     104391 hdb1
   3    66    2008125 hdb2
   3    67  115105725 hdb3
   8     0  390711384 sda
   8     1  390708801 sda1
   8    16  390711384 sdb
   8    17  390708801 sdb1
   8    32  390711384 sdc
   8    33  390708801 sdc1
   8    48  390711384 sdd
   8    49  390708801 sdd1
   8    64  390711384 sde
   8    65  390708801 sde1
   8    80  390711384 sdf
   8    81  390708801 sdf1
   8    96  390711384 sdg
   8    97  390708801 sdg1
   8   112  390711384 sdh
   8   113  390708801 sdh1
   8   128  390711384 sdi
   8   129  390708801 sdi1
   8   144  390711384 sdj
   8   145  390708801 sdj1
   8   160  390711384 sdk
   8   161  390708801 sdk1
   8   176  390711384 sdl
   8   177  390708801 sdl1
   8   192  390711384 sdm
   8   193  390708801 sdm1
   8   208  390711384 sdn
   8   209  390708801 sdn1
   8   224  390711384 sdo
   8   225  390708801 sdo1
   8   240  390711384 sdp
   8   241  390708801 sdp1
  65     0  390711384 sdq
  65     1  390708801 sdq1
  65    16  390711384 sdr
  65    17  390708801 sdr1
  65    32  390711384 sds
  65    33  390708801 sds1
  65    48  390711384 sdt
  65    49  390708801 sdt1
  65    64  390711384 sdu
  65    65  390708801 sdu1
  65    80  390711384 sdv
  65    81  390708801 sdv1
**
  65    96  390711384 sdw
  65    97  390708801 sdw1
**
  65   112  390711384 sdx
  65   113  390708801 sdx1
  65   128  390711384 sdy
  65   129  390708801 sdy1
  65   144  390711384 sdz
  65   145  390708801 sdz1
  65   160  390711384 sdaa
  65   161  390708801 sdaa1
  65   176  390711384 sdab
  65   177  390708801 sdab1
  65   192  390711384 sdac
  65   193  390708801 sdac1
  65   208  390711384 sdad
  65   209  390708801 sdad1
  65   224  390711384 sdae
  65   225  390708801 sdae1
  65   240  390711384 sdaf
  65   241  390708801 sdaf1
**
   9     0     104320 md0
**
   9     2 5860631040 md2
   9     1  115105600 md1

--
Regards,
Maurice
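For anyone who wants to do the "prove it" step without eyeballing a 70-line listing, here is a small sketch in Python that looks a name up in /proc/partitions text. It assumes the usual four-column layout (major, minor, #blocks, name); `parse_partitions` is a hypothetical helper and the sample rows are copied from the listing above:

```python
def parse_partitions(text):
    """Map device name -> (major, minor, blocks) from /proc/partitions text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have three numeric columns followed by the device name;
        # the header row and marker lines such as "**" are skipped.
        if len(fields) == 4 and all(f.isdigit() for f in fields[:3]):
            major, minor, blocks, name = fields
            table[name] = (int(major), int(minor), int(blocks))
    return table

sample = """major minor  #blocks  name

  65    80  390711384 sdv
  65    81  390708801 sdv1
**
  65    96  390711384 sdw
  65    97  390708801 sdw1
**
  65   112  390711384 sdx
"""
table = parse_partitions(sample)
print("sdw1" in table, table.get("sdw1"))
```

This only shows that the kernel knows the partition. Whether the /dev/sdw1 node exists and points at those numbers is a separate question.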
Re: softraid5 boot problem - partly my fault, solved
Mystery solved: had to probe another module.

Wait, wait, I can defend myself :)

What led me to believe the controller was autoprobed during boot is that mdadm complained about *sdd*, but not about sd[abc], hence I assumed [abc] were all fine. Plus, I didn't have to probe the module manually after boot was completed (it appears that at that point some other module inserted it as a dependency).

So - is that how mdadm (or the kernel?) handles raid? Is the last disk checked first by design?

Dex
Re: strange RAID5 problem
On Tue, May 09, 2006 at 10:16:25AM -0600, Maurice Hilarius wrote:
> Luca Berra wrote:
>> On Mon, May 08, 2006 at 11:30:52PM -0600, Maurice Hilarius wrote:
>>> [EMAIL PROTECTED] ~]# mdadm /dev/md3 -a /dev/sdw1
>>>
>>> But, I get this error message:
>>> mdadm: hot add failed for /dev/sdw1: No such device
>>>
>>> What? We just made the partition on sdw a moment ago in fdisk. It IS there!
>>
>> I don't believe you, prove it (/proc/partitions)
>
> I understand. Here we go then. Devices in question bracketed with **:

OK, now I do.

Is the /dev/sdw1 device file correctly created? You could try stracing mdadm to see what happens.

What about the other suggestion? Trying to stop the array and restart it, since it is marked as inactive.

L.

--
Luca Berra -- [EMAIL PROTECTED]
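The device-file question can be checked directly. A minimal sketch in Python, assuming a Unix-like system; `node_status` is a hypothetical helper, and the expected numbers (65, 97) are the ones /proc/partitions reported for sdw1 earlier in the thread:

```python
import os
import stat

def node_status(path, expected=None):
    """Describe a device node: 'missing', 'not a block device', 'wrong numbers ...' or 'ok'."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISBLK(st.st_mode):
        return "not a block device"
    # st_rdev carries the major/minor pair the node points at.
    found = (os.major(st.st_rdev), os.minor(st.st_rdev))
    if expected is not None and found != expected:
        return f"wrong numbers {found}"
    return "ok"

# On the affected box one would compare against the /proc/partitions entry:
print(node_status("/dev/sdw1", expected=(65, 97)))
```

On a machine without that disk this simply prints "missing". It says nothing about why mdadm itself fails, which is where strace would come in.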
Re: strange RAID5 problem
Luca Berra wrote:
> ..
>>> I don't believe you, prove it (/proc/partitions)
>>
>> I understand. Here we go then. Devices in question bracketed with **:
>
> ok, now i do.
> is the /dev/sdw1 device file correctly created?
> you could try straceing mdadm to see what happens
>
> what about the other suggestion? trying to stop the array and restart
> it, since it is marked as inactive.
> L.

Here is what we ended up doing that fixed it. Thanks to Neil for the --force; however, even with that, ALL parameters were needed on the mdadm -C or it still refused. We used the EVMS device nodes to rebuild, as EVMS is what originally created the RAID.

mdadm -C /dev/md3 --chunk=256 --level=5 --parity=ls --raid-devices=16 --force /dev/evms/.nodes/sdq1 /dev/evms/.nodes/sdr1 /dev/evms/.nodes/sds1 /dev/evms/.nodes/sdt1 /dev/evms/.nodes/sdu1 /dev/evms/.nodes/sdv1 missing /dev/evms/.nodes/sdx1 /dev/evms/.nodes/sdy1 /dev/evms/.nodes/sdz1 /dev/evms/.nodes/sdaa1 /dev/evms/.nodes/sdab1 /dev/evms/.nodes/sdac1 /dev/evms/.nodes/sdad1 /dev/evms/.nodes/sdae1 /dev/evms/.nodes/sdaf1

Notice we are assembling a device with a missing member, and the devices are in order per:

mdadm -D /dev/md3

This was the *only* way that it would come up. It was mountable, and the data seems intact. We started the rebuild with no errors by simply adding the device with -a, as I mentioned before.

Then we sped it up via:

echo 10 > /proc/sys/dev/raid/speed_limit_min

Because frankly we have the resources to do so and need it going as fast as possible.

--
Regards,
Maurice
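Since slot order mattered so much in that command, here is a sketch in Python of how the member list could be built from a slot map, with the word "missing" placed wherever a slot has no device. `create_args` is a hypothetical helper and the parameter values are simply the ones quoted above. It only prints the command, because re-creating an array with the wrong order, chunk size or layout can destroy the data:

```python
def create_args(md_dev, slots, n_devices, node_dir, chunk, level, parity):
    """Build an 'mdadm -C' argument list with members in slot order."""
    members = [
        f"{node_dir}/{slots[i]}" if i in slots else "missing"
        for i in range(n_devices)
    ]
    return [
        "mdadm", "-C", md_dev,
        f"--chunk={chunk}", f"--level={level}", f"--parity={parity}",
        f"--raid-devices={n_devices}", "--force",
        *members,
    ]

# Slot map as reported earlier in the thread; slot 6 (sdw1) is absent.
slots = {
    0: "sdq1", 1: "sdr1", 2: "sds1", 3: "sdt1", 4: "sdu1", 5: "sdv1",
    7: "sdx1", 8: "sdy1", 9: "sdz1", 10: "sdaa1", 11: "sdab1",
    12: "sdac1", 13: "sdad1", 14: "sdae1", 15: "sdaf1",
}
args = create_args("/dev/md3", slots, 16, "/dev/evms/.nodes", 256, 5, "ls")
print(" ".join(args))
```

The printed line should match the command in the message, which makes it a cheap cross-check before anyone types sixteen device paths by hand.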