Re: Two-disk RAID5?
On Wed, 26 Apr 2006, John Rowe wrote:

> I'm about to create a RAID1 file system and a strange thought occurs to
> me: if I create a two-disk RAID5 array then I can grow it later by the
> simple expedient of adding a third disk and hence doubling its size.

No. When one of the 2 drives in your RAID5 dies, and all you have for some
blocks is parity info, how will the missing data be reconstructed?

You could [I suspect] create a 2-disk RAID5 in degraded mode (3rd member
missing), but it'll obviously lack redundancy until you add a 3rd disk,
which won't add anything to your RAID5 storage capacity.

--
 Jon Lewis                  | I route
 Senior Network Engineer    | therefore you are
 Atlantic Net               |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key _________
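(For anyone wanting to try the degraded-creation approach mentioned above,
a rough sketch with mdadm might look like the following; the device names
are placeholders rather than anything from this thread, and this is an
illustration, not a tested recipe.)

  # Create a 3-device RAID5 with the third member deliberately absent
  # ("missing"), i.e. a degraded array holding data + parity on two disks.
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 missing

  # Later, add a real third disk: md rebuilds onto it and redundancy is
  # restored, but (as noted above) the array's capacity does not change.
  mdadm /dev/md0 --add /dev/sdd1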
Re: Two-disk RAID5?
> No. When one of the 2 drives in your RAID5 dies, and all you have for
> some blocks is parity info, how will the missing data be reconstructed?
> You could [I suspect] create a 2 disk RAID5 in degraded mode (3rd member
> missing), but it'll obviously lack redundancy until you add a 3rd disk,
> which won't add anything to your RAID5 storage capacity.

IMO if you have a 2-disk RAID5, the parity for each block is the same as
the data. There is a performance drop, as I suspect md isn't smart enough
to read data from both disks, but that's all. When one disk fails, the
(lone) parity block is quite enough to reconstruct from. With XOR parity,
you can always assume any number of additional disks full of zeroes; it
doesn't really change the algorithm.

(Maybe mdadm could/can change a RAID1 into a RAID5 by just changing the
superblocks, for the purpose of expanding onto more disks...)

- tuomas
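(A tiny illustration of the XOR point above, using nothing but shell
arithmetic on a single byte; this is a sketch for intuition, not md code.)

  data=$(( 0xA5 ))   # one byte from the single data disk
  parity=0           # XOR identity element

  # Fold in the data disk plus two imaginary disks full of zeroes:
  # the extra zero disks change nothing, so parity ends up equal to data.
  for block in "$data" 0 0; do
      parity=$(( parity ^ block ))
  done

  printf 'data=0x%02x parity=0x%02x\n' "$data" "$parity"   # both print 0xa5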
RE: Two-disk RAID5?
On Wed, 26 Apr 2006, Jansen, Frank wrote:

> It is not possible to flip a bit to change a set of disks from RAID 1 to
> RAID 5, as the physical layout is different.

As Tuomas pointed out though, a 2-disk RAID5 is kind of a special case
where all you have is data and parity, which is actually also just data.
Seems kind of like a RAID1 with extra overhead. I don't think I've ever
heard of a RAID5 implementation willing to handle <3 drives though.

I suspect I should have just kept out of this, and waited for someone like
Neil to answer authoritatively. So... Neil, what's the right answer to
Tuomas's 2-disk RAID5 question? :)

--
 Jon Lewis                  | I route
 Senior Network Engineer    | therefore you are
 Atlantic Net               |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key _________
RE: Two-disk RAID5?
On Wednesday April 26, [EMAIL PROTECTED] wrote:

> I suspect I should have just kept out of this, and waited for someone
> like Neil to answer authoritatively. So... Neil, what's the right answer
> to Tuomas's 2 disk RAID5 question? :)

... and a deep resounding voice from on high spoke, and in its infinite
wisdom it said "yeh, whatever".

The data layout on a 2-disk raid5 and a 2-disk raid1 is identical (if you
ignore chunksize issues (raid1 doesn't need one) and the superblock (which
isn't part of the data)). Each drive contains identical data(*).

Write throughput to the r5 would be a bit slower because data is always
copied in memory first, then written. Read throughput would be largely the
same if the r5 chunk size was fairly large, but much poorer for r5 if the
chunksize was small.

Converting a raid1 to a raid5 while offline would be quite straightforward
except for the chunksize issue. If the r1 size wasn't a multiple of the
chunksize you chose for r5, then you would lose the last fraction of a
chunk. So if you are planning to do this, set the size of your r1 to
something that is nice and round (e.g. a multiple of 128k).

Converting a raid1 to a raid5 while online is something I have been
thinking about, but it is not likely to happen any time soon.

I think that answers all the issues.

NeilBrown

(*) The term 'mirror' for raid1 has always bothered me, because a mirror
presents a reflected image, while raid1 copies the data without any
transformation. With a 2-drive raid5, one drive gets the original data,
and the other drive gets the data after it has been 'reflected' through an
XOR operation, so maybe a 2-drive raid5 is really a 'mirrored' pair.
Except that the data is still the same, as XOR with 0 produces no change.
So, if we made a tiny change to raid5 and got the xor operation to start
with 0xff in every byte, then the XOR would reflect each byte in a
reasonably meaningful way, and we might actually get a mirrored pair!!!
But I don't think that would provide any real value :-)
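(To put a number on the rounding issue Neil describes: the figures below
are made up for illustration, but the arithmetic is the point. Anything
past the last whole chunk is lost when the raid1 is reinterpreted as
raid5.)

  R1_KB=1000000      # hypothetical raid1 component size, in KiB
  CHUNK_KB=128       # chunk size chosen for the new raid5

  USABLE_KB=$(( R1_KB / CHUNK_KB * CHUNK_KB ))   # round down to whole chunks
  LOST_KB=$(( R1_KB - USABLE_KB ))

  echo "usable after conversion: ${USABLE_KB} KiB (losing ${LOST_KB} KiB)"
  # -> usable after conversion: 999936 KiB (losing 64 KiB)

Sizing the raid1 to a multiple of 128 KiB up front (for example with
mdadm's --size= option) makes the loss zero.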
Trying to start dirty, degraded RAID6 array
The short version: I have a 12-disk RAID6 array that has lost a device and
now whenever I try to start it with:

  mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1

I get:

  mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

And in dmesg:

  md: bind<sdk1>
  md: bind<sdi1>
  md: bind<sdj1>
  md: bind<sde1>
  md: bind<sdf1>
  md: bind<sdg1>
  md: bind<sdb1>
  md: bind<sdd1>
  md: bind<sda1>
  md: bind<sdc1>
  md: bind<sdl1>
  md: md0: raid array is not clean -- starting background reconstruction
  raid6: device sdl1 operational as raid disk 0
  raid6: device sdc1 operational as raid disk 11
  raid6: device sda1 operational as raid disk 10
  raid6: device sdd1 operational as raid disk 9
  raid6: device sdb1 operational as raid disk 8
  raid6: device sdg1 operational as raid disk 6
  raid6: device sdf1 operational as raid disk 5
  raid6: device sde1 operational as raid disk 4
  raid6: device sdj1 operational as raid disk 3
  raid6: device sdi1 operational as raid disk 2
  raid6: device sdk1 operational as raid disk 1
  raid6: cannot start dirty degraded array for md0
  RAID6 conf printout:
   --- rd:12 wd:11 fd:1
   disk 0, o:1, dev:sdl1
   disk 1, o:1, dev:sdk1
   disk 2, o:1, dev:sdi1
   disk 3, o:1, dev:sdj1
   disk 4, o:1, dev:sde1
   disk 5, o:1, dev:sdf1
   disk 6, o:1, dev:sdg1
   disk 8, o:1, dev:sdb1
   disk 9, o:1, dev:sdd1
   disk 10, o:1, dev:sda1
   disk 11, o:1, dev:sdc1
  raid6: failed to run raid set md0
  md: pers->run() failed ...

I'm 99% sure the data is ok and I'd like to know how to force the array
online.

Longer version: A couple of days ago I started having troubles with my
fileserver mysteriously hanging during boot (I was messing with trying to
get Xen running at the time, so lots of reboots were involved). I finally
nailed it down to the autostarting of the RAID array. After several hours
of pulling CPUs, SATA cards, and RAM (not to mention some scary problems
with memtest86+ that turned out to be because USB Legacy was enabled), I
finally managed to figure out that one of my drives would simply stop
transferring data after about the first gig (tested with dd, monitoring
with iostat). About 30 seconds after the drive stops, the rest of the
machine also hangs. Interestingly, there are no error messages anywhere I
could find indicating the drive was having problems. Even its SMART test
(smartctl -t long) says it's ok. This made the problem substantially more
difficult to figure out.

I then tried to start the array without the broken disk and had the
problem mentioned in the short version above - the array wouldn't start,
presumably because its rebuild had been started and (uncleanly) stopped
about a dozen times since it last succeeded.

I finally managed to get the array online by starting it with all the
disks, then immediately knocking the one I knew to be bad offline with
'mdadm /dev/md0 -f /dev/sdh1' before it hit the point where it would hang.
After that the rebuild completed without error (I didn't touch the machine
at all while it was rebuilding).

However, a few hours after the rebuild completed, a power failure killed
the machine again and now I can't start the array, as outlined in the
short version above. I must admit I find it a bit weird that the array is
dirty and degraded after it had successfully completed a rebuild.

Unfortunately the original failed drive (/dev/sdh) is no longer available,
so I can't do my original trick again. I'm pretty sure - based on the
rebuild completing previously - that the data will be fine if I can just
get the array back online. Is there some sort of --really-force switch to
mdadm?
Can the array be brought back online *without* triggering a rebuild, so I
can get as much data as possible off and then start from scratch again?

CS

Here is the 'mdadm --examine /dev/sdX' output for each of the remaining
drives, if it is helpful:

/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
  Creation Time : Wed Feb  1 01:09:11 2006
     Raid Level : raid6
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
   Raid Devices : 12
  Total Devices : 11
Preferred Minor : 0
    Update Time : Wed Apr 26 22:30:01 2006
          State : active
 Active Devices : 11
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 1685ebfc - correct
         Events : 0.11176511

      Number   Major   Minor   RaidDevice State
this    10       8       1        10      active sync   /dev/sda1

   0     0       8     177         0      active sync   /dev/sdl1
   1     1       8     161         1      active sync   /dev/sdk1
   2     2       8     129         2      active sync   /dev/sdi1
   3     3       8     145         3      active sync   /dev/sdj1
   4     4       8      65         4      active sync   /dev/sde1
   5     5       8      81         5      active sync   /dev/sdf1
   6     6       8      97         6      active sync   /dev/sdg1
   7     7       0       0         7      faulty removed
Re: Trying to start dirty, degraded RAID6 array
On Thursday April 27, [EMAIL PROTECTED] wrote:

> The short version: I have a 12-disk RAID6 array that has lost a device
> and now whenever I try to start it with:
>   mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1
> I get:
>   mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
> ...
>   raid6: cannot start dirty degraded array for md0

The '-f' is meant to make this work. However it seems there is a bug.
Could you please test this patch? It isn't exactly the right fix, but it
definitely won't hurt.

Thanks,
NeilBrown

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./super0.c |    1 +
 1 file changed, 1 insertion(+)

diff ./super0.c~current~ ./super0.c
--- ./super0.c~current~	2006-03-28 17:10:51.0 +1100
+++ ./super0.c	2006-04-27 10:03:40.0 +1000
@@ -372,6 +372,7 @@ static int update_super0(struct mdinfo *
 		if (sb->level == 5 || sb->level == 4 || sb->level == 6)
 			/* need to force clean */
 			sb->state |= (1 << MD_SB_CLEAN);
+		rv = 1;
 	}
 	if (strcmp(update, "assemble")==0) {
 		int d = info->disk.number;
Re: linear writes to raid5
On Thursday April 20, [EMAIL PROTECTED] wrote:

> Neil Brown wrote:
> > What is the rationale for your position?
>
> My rationale was that if the md layer receives *write* requests not
> smaller than a full stripe size, it is able to omit reading data to
> update, and can just calculate new parity from the new data. Hence,
> combining a dozen small write requests coming from a filesystem to form
> a single request >= full stripe size should dramatically increase
> performance.

That makes sense. However in both cases (above and below raid5), the
device receiving the requests is in a better position to know what size is
a good size than the client sending the requests.

That is exactly what the 'plugging' concept is for. When a request
arrives, the device is 'plugged' so that it won't process new requests,
and the request plus any following requests are queued. At some point the
queue is unplugged and the device should be able to collect related
requests to make large requests of an appropriate size and alignment for
the device.

The current suggestion is that plugging isn't quite working right for
raid5. That is certainly possible.

> E.g., when I use dd in O_DIRECT mode (oflag=direct) and experiment with
> different block sizes, write performance increases a lot when bs becomes
> a full stripe size. Of course it decreases again when bs is increased a
> bit further (as md starts reading again, to construct parity blocks).

Yes. O_DIRECT is essentially saying "I know what I am doing and I want to
bypass all the smarts and go straight to the device". O_DIRECT requests
should certainly be sized and aligned to match the device. For
non-O_DIRECT it shouldn't matter so much.

NeilBrown
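(A sketch of the kind of experiment described above; the chunk size and
disk count are placeholders, and note that this writes directly to the
array device, destroying whatever is on it.)

  CHUNK_KB=64        # assumed raid5 chunk size
  DATA_DISKS=3       # e.g. a 4-disk raid5 has 3 data chunks per stripe
  STRIPE_KB=$(( CHUNK_KB * DATA_DISKS ))

  # Full-stripe O_DIRECT writes: md can compute parity from the new data
  # alone, so no read-modify-write is needed.
  dd if=/dev/zero of=/dev/md0 bs=${STRIPE_KB}k count=1000 oflag=direct

  # Writes smaller than a stripe: md must read old data/parity back in to
  # update the parity block, and throughput drops accordingly.
  dd if=/dev/zero of=/dev/md0 bs=4k count=1000 oflag=direct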
Re: Trying to start dirty, degraded RAID6 array
Neil Brown wrote:

> The '-f' is meant to make this work. However it seems there is a bug.
> Could you please test this patch? It isn't exactly the right fix, but it
> definitely won't hurt.
>
> Thanks,

Neil, I'll give this a go when I get home tonight.

Is there any way to start an array without kicking off a rebuild?

CS
Re: Trying to start dirty, degraded RAID6 array
On Thursday April 27, [EMAIL PROTECTED] wrote:

> Neil Brown wrote:
> > The '-f' is meant to make this work. However it seems there is a bug.
> > Could you please test this patch? It isn't exactly the right fix, but
> > it definitely won't hurt.
> >
> > Thanks,
>
> Neil, I'll give this a go when I get home tonight.
>
> Is there any way to start an array without kicking off a rebuild?

  echo 1 > /sys/module/md_mod/parameters/start_ro

If you do this, then arrays will be read-only when they are started, and
so will not do a rebuild. The first write request to the array (e.g. if
you mount a filesystem) will cause a switch to read/write and any required
rebuild will start.

echo 0 will revert the effect.

This requires a reasonably recent kernel.

NeilBrown
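(Tying this back to the earlier question about copying data off without
triggering a rebuild, a recovery session might look roughly like the
sketch below. It assumes a kernel with the start_ro parameter; the mount
point /mnt/recovery is a placeholder, and whether the forced assembly
succeeds still depends on the '-f' fix discussed above.)

  # Make newly started arrays come up read-only, so no rebuild begins.
  echo 1 > /sys/module/md_mod/parameters/start_ro

  # Force-assemble the dirty, degraded array from the surviving members.
  mdadm --assemble --force /dev/md0 /dev/sd[abcdefgijkl]1

  # Mount read-only and copy the data off.  The first write to the array
  # would switch it to read-write and start the rebuild, so avoid writes.
  mount -o ro /dev/md0 /mnt/recovery

  # When finished, restore the default behaviour.
  echo 0 > /sys/module/md_mod/parameters/start_ro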