Re: mdadm error when trying to replace a failed drive in RAID5 array
On Jan 20, 2008, at 1:21 PM, Steve Fairbairn wrote:

> So the device I was trying to add was about 22 blocks too small. Taking
> Neil's suggestion and looking at /proc/partitions showed this up
> incredibly quickly.

Always leave a little space at the end; it makes sure you don't run into that particular problem when you replace disks, and the end of the disk is often significantly slower anyway.

From before the write-intent bitmap stuff I have (or had) a habit of creating separate raids on relatively small partitions, joined together by LVM. I'd just pick a fixed size (on 500GB disks I'd use 90GB per partition, for example), create however many partitions would fit like that, and leave the end for scratch space / experiments / whatever.

 - ask

-- 
http://develooper.com/ - http://askask.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
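Ask's fixed-size scheme is easy to sanity-check with shell arithmetic. A minimal sketch using the sizes from the message (500GB disks, 90GB per partition); it just shows how many partitions fit and how much slack is left at the slow end of the disk:

```shell
# Sizes taken from the message above; adjust for your disks.
disk_gb=500
part_gb=90

count=$(( disk_gb / part_gb ))            # whole partitions that fit
slack=$(( disk_gb - count * part_gb ))    # left over at the end of the disk

echo "${count} partitions of ${part_gb}GB each, ${slack}GB left over"
# prints: 5 partitions of 90GB each, 50GB left over
```

The leftover space doubles as the safety margin when a replacement disk's usable size comes up a few blocks short.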
RE: mdadm error when trying to replace a failed drive in RAID5 array
> -----Original Message-----
> From: Neil Brown [mailto:[EMAIL PROTECTED]
> Sent: 20 January 2008 20:37
>
> > md: hdd1 has invalid sb, not importing!
> > md: md_import_device returned -22
>
> In 2.6.18, the only thing that can return this message
> without other more explanatory messages are:
>
> 2/ If the device appears to be too small.
>
> Maybe it is the latter, though that seems unlikely.

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --verbose --add /dev/hdd1
mdadm: added /dev/hdd1

HUGE thanks to Neil, and one white gold plated donkey award to me.

OK. When I created /dev/md1 after creating /dev/md0, I was using a mishmash of disks I had lying around. As this selection of disks used differing block sizes, I chose to create the raid partitions from the first block to a set size (+250G). When I reinstalled the disk for going into /dev/md0, I partitioned the disk the same way (+500G), which it turns out isn't how I created the partitions when I created that array. So the device I was trying to add was about 22 blocks too small. Taking Neil's suggestion and looking at /proc/partitions showed this up incredibly quickly.

My sincere apologies for wasting all your time on a stupid error, and again many many thanks for the solution...

md0 : active raid5 hdd1[5] sdd1[4] sdc1[2] sdb1[1] sda1[0]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUU_U]
      [>]  recovery =  0.9% (4430220/488383936) finish=1110.8min speed=7259K/sec

Steve.

No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.516 / Virus Database: 269.19.7/1233 - Release Date: 19/01/2008 18:37
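A plausible reason a mere ~22 blocks matters: with 0.90 metadata, the superblock is written 64 KiB back from the end of the device, rounded down to a 64 KiB boundary (sizes below are in the 1 KiB blocks /proc/partitions reports). A minimal sketch of that offset arithmetic, assuming that layout; the helper name is made up:

```shell
# Hypothetical helper: where a v0.90 md superblock would sit on a device
# of the given size (in 1 KiB blocks, as shown by /proc/partitions).
# Assumes the 0.90 layout: last 64 KiB-aligned slot, 64 KiB from the end.
sb_offset_kib() {
    echo $(( ($1 / 64) * 64 - 64 ))
}

# Using the Used Dev Size from this thread's array (488383936 blocks):
sb_offset_kib 488383936
# prints: 488383872
```

A replacement partition even a handful of blocks short of the original therefore can't hold the superblock where the array expects it, and the kernel rejects the import.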
Re: mdadm error when trying to replace a failed drive in RAID5 array
On Sat Jan 19, 2008 at 11:08:43PM -, Steve Fairbairn wrote:
>
> Hi All,
>
> I have a Software RAID 5 device configured, but one of the drives
> failed. I removed the drive with the following command...
>
> mdadm /dev/md0 --remove /dev/hdc1
>
> Now, when I try to insert the replacement drive back in, I get the
> following...
>
> [EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
> mdadm: add new device failed for /dev/hdc1 as 5: Invalid argument
>
> [EMAIL PROTECTED] mdadm-2.6.4]# dmesg | tail
> ...
> md: hdc1 has invalid sb, not importing!
> md: md_import_device returned -22
> md: hdc1 has invalid sb, not importing!
> md: md_import_device returned -22
>
I've had the same error message trying to add a drive into an array myself - in my case I'm almost certain it's because the drive is slightly smaller than the others in the array (the array's currently growing, so I haven't delved any further yet). Have you checked the actual partition sizes? Particularly if it's a different type of drive, as drives from different manufacturers can vary by quite a large amount.

Cheers,
    Robin
-- 
     ___        
    ( ' }       |   Robin Hill        <[EMAIL PROTECTED]>  |
   / / )        |   Little Jim says ....                   |
  // !!         |      "He fallen in de water !!"          |
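Robin's size check can be scripted directly against /proc/partitions, where column 3 is the size in 1 KiB blocks and column 4 the device name. A hedged sketch; the helper name is an invention, and on a live system you'd compare e.g. the failed slot's replacement against a healthy member:

```shell
# Hypothetical helper: print the size (1 KiB blocks) of a named partition
# as listed in /proc/partitions (or any file in the same format).
part_blocks() {    # usage: part_blocks <name> [partitions-file]
    awk -v p="$1" '$4 == p { print $3 }' "${2:-/proc/partitions}"
}

# On a live system, something like:
#   part_blocks hdd1
#   part_blocks sda1
# and compare the two numbers before running 'mdadm --add'.
```

If the replacement reports even slightly fewer blocks than the existing members, the --add will fail exactly as described in the thread.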
RE: mdadm error when trying to replace a failed drive in RAID5 array
Thanks for the response Bill. Neil has responded to me a few times, but I'm more than happy to try and keep it on this list instead, as it feels like I'm badgering Neil, which really isn't fair...

Since my initial email, I got to the point of believing it was down to the superblock, and that --zero-superblock wasn't working, so a good few hours and a dd if=/dev/zero of=/dev/hdc later, I tried adding it again, to the same result. As it happens, I did the --zero-superblock, then tried to insert it again and then examined (mdadm -E) again, and the block was 'still there' - what really happened was that the act of trying to add it writes the superblock. So --zero-superblock is working fine for me, but it's still refusing to add the device.

The only other thing I've tried is moving the replacement drive to /dev/hdd instead (secondary slave), with a small old HD I had lying around as hdc.

[EMAIL PROTECTED] ~]# mdadm -E /dev/hdd1
mdadm: No md superblock detected on /dev/hdd1.
[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdd1
mdadm: add new device failed for /dev/hdd1 as 5: Invalid argument
[EMAIL PROTECTED] ~]# dmesg | tail
...
md: hdd1 has invalid sb, not importing!
md: md_import_device returned -22
[EMAIL PROTECTED] ~]# mdadm -E /dev/hdd1
/dev/hdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 382c157a:405e0640:c30f9e9e:888a5e63
  Creation Time : Wed Jan  9 18:57:53 2008
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jan 20 13:02:00 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 198f8fb4 - correct
         Events : 0.348270

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5      22       65       -1      spare   /dev/hdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       0        0        3      faulty removed
   4     4       8       49        4      active sync   /dev/sdd1

I have mentioned it to Neil, but didn't mention it here before: I am a C developer by trade, so I can easily delve into the mdadm source for extra debug if anyone thinks it could help. I could also delve into md in the kernel if really needed, but my knowledge of building kernels on Linux is some 4+ years out of date and forgotten, so if that's a yes, then some pointers on how to get the CentOS kernel config, and a choice of kernel from www.kernel.org or from the CentOS distro, would be invaluable.

I'm away for a few days from tomorrow and probably won't be able to do much, if anything, until I'm back on Thursday, so please be patient if I don't respond before then.

Many Thanks,

Steve.
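A rough sanity bound from the -E output above: Used Dev Size is 488383936 KiB, and with 0.90 metadata some space at the tail of the partition is reserved for the superblock. The 128 KiB headroom figure below is an assumption about the upper bound of that reserve, not an exact number from the md sources:

```shell
# Back-of-envelope minimum size for a replacement partition, given the
# array's Used Dev Size (from 'mdadm -E'). The 128 KiB headroom for the
# v0.90 superblock is an assumed rough upper bound, not an exact figure.
used_kib=488383936
min_kib=$(( used_kib + 128 ))

echo "replacement partition should show at least ${min_kib} blocks in /proc/partitions"
# prints: replacement partition should show at least 488384064 blocks in /proc/partitions
```

Comparing that figure against the replacement's line in /proc/partitions would have flagged the mismatch before any --add attempt.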
mdadm error when trying to replace a failed drive in RAID5 array
Hi All,

Firstly, I must express my thanks to Neil Brown for being willing to respond to the direct email I sent him, as I couldn't for the life of me find any forums on mdadm or this list...

I have a Software RAID 5 device configured, but one of the drives failed. I removed the drive with the following command...

mdadm /dev/md0 --remove /dev/hdc1

[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 hdk1[5] hdi1[3] hdh1[2] hdg1[1] hde1[0]
      976590848 blocks level 5, 64k chunk, algorithm 2 [5/4] [_]
      [>]  recovery = 22.1% (54175872/244147712) finish=3615.3min speed=872K/sec
md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
      1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUU_U]
unused devices: <none>

Please ignore /dev/md1 for now at least. Now my array (/dev/md0) shows the following...

[EMAIL PROTECTED] ~]# mdadm -QD /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Jan  9 18:57:53 2008
     Raid Level : raid5
     Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Jan  4 04:28:03 2005
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 382c157a:405e0640:c30f9e9e:888a5e63
         Events : 0.337650

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       0        0        3      removed
       4       8       49        4      active sync   /dev/sdd1

Now, when I try to insert the replacement drive back in, I get the following...

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
mdadm: add new device failed for /dev/hdc1 as 5: Invalid argument

It seems to be that mdadm is trying to add the device as number 5 instead of replacing number 3, but I have no idea why, or how to make it replace number 3.
Neil has explained to me already that the drive should be added as 5, and then switched to 3 after the rebuild is complete. Neil also asked me if dmesg showed up anything when I tried adding the drive...

[EMAIL PROTECTED] mdadm-2.6.4]# dmesg | tail
...
md: hdc1 has invalid sb, not importing!
md: md_import_device returned -22
md: hdc1 has invalid sb, not importing!
md: md_import_device returned -22

I have updated mdadm to the latest version I can find...

[EMAIL PROTECTED] ~]# mdadm --version
mdadm - v2.6.4 - 19th October 2007

I still get the same error. I'm hoping someone will have some suggestion as to how to sort this out. Backing up nearly 2TB of data isn't really a viable option for me, so I'm quite desperate to get the redundancy back.

My Linux distribution is a relatively new installation from CentOS 5.1 ISOs. The kernel version is...

[EMAIL PROTECTED] ~]# uname -a
Linux space.homenet.com 2.6.18-53.1.4.el5 #1 SMP Fri Nov 30 00:45:55 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

Many Thanks,

Steve.
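As an aside, the finish= figure md prints during recovery is just the remaining blocks divided by the current speed. Plugging in the numbers from the md0 recovery line elsewhere in the thread (4430220 of 488383936 blocks done, 7259K/sec) reproduces its estimate to within rounding:

```shell
# Recovery ETA arithmetic, using the md0 mdstat numbers from the thread.
# This is a sketch of how the finish= estimate is derived, not md's code.
done_kib=4430220
total_kib=488383936
speed_kib_s=7259

eta_min=$(( (total_kib - done_kib) / speed_kib_s / 60 ))
echo "roughly ${eta_min} minutes to finish"
# prints: roughly 1111 minutes to finish   (mdstat showed finish=1110.8min)
```

The speed sample fluctuates from moment to moment, which is why the live finish= value drifts as the rebuild runs.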