Re: Trying to start dirty, degraded RAID6 array
Neil Brown wrote: The '-f' is meant to make this work. However it seems there is a bug. Could you please test this patch? It isn't exactly the right fix, but it definitely won't hurt.

Thanks, Neil, I'll give this a go when I get home tonight. Is there any way to start an array without kicking off a rebuild ?

CS
- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Trying to start dirty, degraded RAID6 array
The short version: I have a 12-disk RAID6 array that has lost a device and now whenever I try to start it with:

mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1

I get:

mdadm: failed to RUN_ARRAY /dev/md0: Input/output error

And in dmesg:

md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: bind
md: md0: raid array is not clean -- starting background reconstruction
raid6: device sdl1 operational as raid disk 0
raid6: device sdc1 operational as raid disk 11
raid6: device sda1 operational as raid disk 10
raid6: device sdd1 operational as raid disk 9
raid6: device sdb1 operational as raid disk 8
raid6: device sdg1 operational as raid disk 6
raid6: device sdf1 operational as raid disk 5
raid6: device sde1 operational as raid disk 4
raid6: device sdj1 operational as raid disk 3
raid6: device sdi1 operational as raid disk 2
raid6: device sdk1 operational as raid disk 1
raid6: cannot start dirty degraded array for md0
RAID6 conf printout:
 --- rd:12 wd:11 fd:1
 disk 0, o:1, dev:sdl1
 disk 1, o:1, dev:sdk1
 disk 2, o:1, dev:sdi1
 disk 3, o:1, dev:sdj1
 disk 4, o:1, dev:sde1
 disk 5, o:1, dev:sdf1
 disk 6, o:1, dev:sdg1
 disk 8, o:1, dev:sdb1
 disk 9, o:1, dev:sdd1
 disk 10, o:1, dev:sda1
 disk 11, o:1, dev:sdc1
raid6: failed to run raid set md0
md: pers->run() failed ...

I'm 99% sure the data is ok and I'd like to know how to force the array online.

Longer version: A couple of days ago I started having troubles with my fileserver mysteriously hanging during boot (I was messing with trying to get Xen running at the time, so lots of reboots were involved). I finally nailed it down to the autostarting of the RAID array. After several hours of pulling CPUs, SATA cards and RAM (not to mention some scary problems with memtest86+ that turned out to be because "USB Legacy" was enabled) I finally managed to figure out that one of my drives would simply stop transferring data after about the first gig (tested with dd, monitoring with iostat).
About 30 seconds after the drive "stops", the rest of the machine also hangs. Interestingly, there are no error messages anywhere I could find indicating the drive was having a problem. Even its SMART test (smartctl -t long) says it's ok. This made the problem substantially more difficult to figure out.

I then tried to start the array without the broken disk and had the problem mentioned in the short version above - the array wouldn't start, presumably because its rebuild had been started and (uncleanly) stopped about a dozen times since it last succeeded. I finally managed to get the array online by starting it with all the disks, then immediately knocking the one I knew to be bad offline with 'mdadm /dev/md0 -f /dev/sdh1' before it hit the point where it would hang. After that the rebuild completed without error (I didn't touch the machine at all while it was rebuilding).

However, a few hours after the rebuild completed, a power failure killed the machine again and now I can't start the array, as outlined in the "short version" above. I must admit I find it a bit weird that the array is "dirty and degraded" after it had successfully completed a rebuild. Unfortunately the original failed drive (/dev/sdh) is no longer available, so I can't do my original trick again. I'm pretty sure - based on the rebuild completing previously - that the data will be fine if I can just get the array back online. Is there some sort of --really-force switch to mdadm ? Can the array be brought back online *without* triggering a rebuild, so I can get as much data as possible off and then start from scratch again ?
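For what it's worth, one approach worth sketching (untested against this exact kernel, so treat it as a starting point rather than a recipe): the "cannot start dirty degraded array" refusal is a deliberate safety check in the md driver, and recent kernels expose a `start_dirty_degraded` parameter (described in Documentation/md.txt) that overrides it. Combined with throttling or freezing the resync, something like:

```shell
# Allow a dirty, degraded array to be started.  If md is built into the
# kernel, pass md_mod.start_dirty_degraded=1 on the boot command line
# instead of using modprobe.
modprobe md_mod start_dirty_degraded=1

# Retry the forced assembly (device list from the post above):
mdadm -Af /dev/md0 /dev/sd[abcdefgijkl]1

# Throttle the resync to almost nothing while copying data off
# (value in KB/sec, illustrative):
sysctl -w dev.raid.speed_limit_max=100

# Or mark the array read-only, which prevents any resync writes at all:
mdadm --readonly /dev/md0
```

The read-only route is probably the safest way to satisfy the "get as much data as possible off" goal, since nothing is written to the components until the array is switched back to read-write.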
CS

Here is the 'mdadm --examine /dev/sdX' output for each of the remaining drives, if it is helpful:

/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 78ddbb47:e4dfcf9e:5f24461a:19104298
  Creation Time : Wed Feb 1 01:09:11 2006
     Raid Level : raid6
    Device Size : 244195904 (232.88 GiB 250.06 GB)
     Array Size : 2441959040 (2328.83 GiB 2500.57 GB)
   Raid Devices : 12
  Total Devices : 11
Preferred Minor : 0
    Update Time : Wed Apr 26 22:30:01 2006
          State : active
 Active Devices : 11
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 1685ebfc - correct
         Events : 0.11176511

      Number   Major   Minor   RaidDevice State
this    10       8        1       10      active sync   /dev/sda1
   0     0       8      177        0      active sync   /dev/sdl1
   1     1       8      161        1      active sync   /dev/sdk1
   2     2       8      129        2      active sync   /dev/sdi1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       0        0        7      faulty removed
   8     8       8       17        8
Re: Recommendations for supported 4-port SATA PCI card ?
Brad Campbell wrote: I've been running 3 together in one box for about 18 months, and four in another for a year now... the on board BIOS will only pick up 8 drives, but they work just fine under Linux and recognise all connected drives.

What distro and kernel ? I tried this about 2 - 3 months ago and had problems whenever more than two cards were in my system, even if only a single drive was installed. I posted about it here (search for "multiple promise sata150 tx4 cards" back in January). The symptoms were ata timeouts:

ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }

This would happen consistently with three controllers in, never with two. I tried every possible combination of controllers and drives I could think of, to eliminate any potential of broken hardware being the cause. I should probably give it another go, given there have been a couple of minor kernel versions since then, but I'm surprised to hear you've had it working for so long - no-one was able to give me a solution to my problem at the time (I ended up getting a pair of two-channel SATA cards) and I assumed it was a driver bug of some description. Promise, of course, were useless, saying more than a single controller was an unsupported configuration.

I have 11 drives in one box (on promise.. 2 on the on-board VIA and 1 on PATA) and 15 drives in another (all across 4 promise cards).. all on SATA150-TX4 cards.. Performance sucks.. but then when you put 15 drives on a single PCI 33Mhz bus what do you expect ? Great for streaming media though.. (and cheap, and very reliable). The only media errors I get are on the VIA controller.
The promise controllers have not had a single media error since they were installed.

I have no complaints about the performance (relatively speaking, of course), but I've got the cards in a machine with multiple PCI-X busses, so it's not really bottlenecked there.

CS
Re: Problems with multiple Promise SATA150 TX4 cards
Erik Slagter wrote: On Tue, 2006-01-24 at 17:40 +, David Greaves wrote: sounds like a spinup time on marginal power to me.

No, it's a limitation of the Promise BIOS on the cards, it will only detect a maximum of 8 drives. I had a quick convo with tech support from Promise over this and they told me they don't support more than one card in a machine in any case. (Which is odd given they advertise the ability to RAID-5 across 2 cards!)

I'd really consider the PSU. I had all sorts of weird problems with my promise SATA150 TX2plus until I replaced the PSU. Apparently it doesn't suffice to supply _enough_ power.

I've since set up the machine so only the motherboard and boot drive are powered from the system PSU and the 12 SATA drives are powered from a separate PSU. Since the machine has previously been running with 8 drives fine on just the system PSU, I feel confident saying power supply has nothing to do with my problems.

CS
Re: Problems with multiple Promise SATA150 TX4 cards
Christopher Smith wrote: Brad Campbell wrote: Can you send an lspci -vv please? I did have some strange problems with the BIOS setting up weird timing modes on some of the cards. This did not present a reliability problem for me, just performance however.

I attached lspci output to my original post. I have also included it on the end of this one (with a slight difference regarding which slots the cards were in, but that makes no difference to the problem).

Oops, forgot to attach lspci output - that's what you get for posting just before bed :).

[EMAIL PROTECTED] ~]# lspci -vv
00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01)
        Subsystem: Intel Corporation E7501 Memory Controller Hub
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 0
        Capabilities: [40] Vendor Specific Information
00:02.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface B PCI-to-PCI Bridge (rev 01) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 64
        Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
        I/O behind bridge: a000-bfff
        Memory behind bridge: fc40-fc6f
        Prefetchable memory behind bridge: ff30-ff5f
        Secondary status: 66Mhz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
00:03.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface C PCI-to-PCI Bridge (rev 01) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 64
        Bus: primary=00, secondary=04, subordinate=06, sec-latency=0
        I/O behind bridge: c000-cfff
        Memory behind bridge: fc70-fc9f
        Prefetchable memory behind bridge: ff60-ff8f
        Secondary status: 66Mhz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02) (prog-if 00 [UHCI])
        Subsystem: Gateway 2000: Unknown device 891f
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 209
        Region 4: I/O ports at ec00 [size=32]
00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02) (prog-if 00 [UHCI])
        Subsystem: Gateway 2000: Unknown device 891f
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Latency: 0
        Interrupt: pin B routed to IRQ 217
        Region 4: I/O ports at e880 [size=32]
00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02) (prog-if 00 [UHCI])
        Subsystem: Gateway 2000: Unknown device 891f
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Latency: 0
        Interrupt: pin C routed to IRQ 169
        Region 4: I/O ports at e800 [size=32]
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
        Latency: 0
        Bus: primary=00, secondary=07, subordinate=07, sec-latency=32
        I/O behind bridge: d000-dfff
        Memory behind bridge: fca0-feaf
        Prefetchable memory behind bridge: ff90-ff9f
        Secondary status: 66Mhz- FastB2B+ ParErr- DEVSEL=medium >TAbort-
        BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02)
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Latency: 0
00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
        Subsystem: Gateway 2000: Unknown device 891f
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status:
Re: Problems with multiple Promise SATA150 TX4 cards
Brad Campbell wrote: I have 3 cards with 12 drives in one box, and 4 cards with 15 drives in another. They work just dandy. They are not the fastest machines in the world, and the PCI bus sometimes groans under the strain, but it's reliable and error-free.

Are these the same cards I have ?

Can you send an lspci -vv please? I did have some strange problems with the BIOS setting up weird timing modes on some of the cards. This did not present a reliability problem for me, just performance however.

I attached lspci output to my original post. I have also included it on the end of this one (with a slight difference regarding which slots the cards were in, but that makes no difference to the problem).

My 1st quick and dirty test would be to boot with a UP kernel. (Only because that is all I have also) And to try a vanilla kernel.org kernel rather than the Redhat one. (I have one machine on 2.6.10 and one on 2.6.15-git11. Both are solid)

I have tried the latest Fedora Core 4 kernel, both SMP and UP. I have also tried their 2.6.11 UP kernel (2.6.11-1.1369_FC4). All exhibit the problem, although it appears that the 2.6.11 kernel takes slightly longer for it to appear (maybe 10 vs 5 seconds). I have not tested with a vanilla kernel. I'll try to do it tomorrow (although I suspect it won't help).

bklaptop:~>ssh storage1 uname -a
Linux storage1 2.6.15-git11 #1 Sun Jan 15 22:25:19 GST 2006 i686 GNU/Linux
bklaptop:~>ssh srv uname -a
Linux srv 2.6.10 #4 Mon Feb 14 23:10:38 GST 2005 i686 GNU/Linux

Are you using the cards in standard PCI 33Mhz Slots? I recall an issue a while ago where someone had a big problem with the cards in 66Mhz Slots.

The cards were all in PCI-X slots ranging from 64/66 to 64/133. I tried placing one of the cards in the only regular 32/33 PCI slot my motherboard has and it does not help (this is the configuration where the attached lspci was taken).
Another test I'd like you to try if you would, is place one or two drives on each controller, so you only have 3 in the system.. and then try to reproduce the error.

This configuration also produces the error. Something else I tried was some crappy dual-port SIL-based SATA card with two of the Promise TX4s, and that worked without a problem. While I'm waiting to find out what this is, I might buy another one and use the two of them temporarily so I can build my RAID array, at least.

Thanks for your help.

CS
Re: Problems with multiple Promise SATA150 TX4 cards
Mark Hahn wrote: I have shuffled the cards, cables and physical drives around to determine that this is not a problem with any of them individually - no matter the combination, it only ever happens to drives that are at sd[abcd] (ie: if I rejig the hardware so the drive at /dev/sdh, which was working fine, is on a different cable and controller, but appears at /dev/sdb, it will produce the errors).

did you test the case where all disks had power, but only 8 were plugged into controllers?

no individual card, cable or drive was responsible. The errors _only_ occur with three cards in the system, _only_ with whichever drives are attached to the "first" controller (ie: sd[abcd]) and _regardless_ of other system activity.

the "first" card would correspond to position on the PCI bus (slot), so perhaps that card is getting iffy power. but did you actually move around which power cables are supplying which disks?

Power supply was also one of my suspicions, so I tried powering up half a dozen of the drives off another ATX power supply I had and the remainder off the system PSU. The same problems occurred, which I think rules out the possibility of insufficient power (I did try with all drives powered, but only two cards installed and that worked fine - but I haven't moved around the power plugs of individual drives). I'll try again tonight with all the drives powered off their own PSU, just to be sure.

CS
Problems with multiple Promise SATA150 TX4 cards
This is probably not entirely the right list for this query, but I figure there's enough people here who have experience in the right places and have probably tried this sort of thing themselves. Apologies to anyone whose time I waste. Or, at the very least, I'm sure someone here can point me in the right direction :).

I currently have a machine with two Promise SATA150 TX4 cards (PDC20318, http://www.promise.com/product/product_detail_eng.asp?segment=undefined&product_id=98#) and a 4-drive RAID5 array attached to each. Since I was nearly out of space, I decided it was time to add another 4 drives and expand again. So, I bought another TX4 card and another 4 SATA drives and plonked them in the machine, thinking it would be as easy as the last time I did it (going from 4 to 8 drives).

The first problem is that the Promise cards' onboard BIOS(es) only recognise (or, at least, list) 8 of the 12 drives in the machine at boot. However, once Linux has booted it detects all three cards and all twelve drives, so this is a relatively insignificant issue.

The second (major) problem is whenever I try to access drives attached to the "first" controller (ie: /dev/sd[abcd]), I get ATA timeout errors like these:

ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata3: status=0x51 { DriveReady SeekComplete Error }
ata3: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x0c { DriveStatusError }

This _only_ happens when accessing drives attached to the "first" controller. I can have 8 simultaneous 'dd if=/dev/sdX of=/dev/null' happening to the other 8 drives for minutes at a time without a problem, but as soon as I fire up a dd to /dev/sd[abcd], I get the errors listed above within seconds. Additionally, a dd to only /dev/sd[abcd] with no other system activity also produces the errors - again within seconds.
I have shuffled the cards, cables and physical drives around to determine that this is not a problem with any of them individually - no matter the combination, it only ever happens to drives that are at sd[abcd] (ie: if I rejig the hardware so the drive at /dev/sdh, which was working fine, is on a different cable and controller, but appears at /dev/sdb, it will produce the errors). Removing cards so there are only one or two in the system results in no errors. Similarly, I tried shuffling the hardware around to verify that no individual card, cable or drive was responsible. The errors _only_ occur with three cards in the system, _only_ with whichever drives are attached to the "first" controller (ie: sd[abcd]) and _regardless_ of other system activity.

I'm trying to locate where the problem is so I can suggest to the right people that they fix it :). Any suggestions people might have that could possibly work around this (PCI timings ?) or further contacts will be gratefully accepted and tried.

Cheers, CS

PS: Here are the outputs of 'dmesg' and 'lspci -vvv' for my system:

dmesg:
[EMAIL PROTECTED] ~]# dmesg
Linux version 2.6.14-1.1656_FC4smp ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Thu Jan 5 22:24:06 EST 2006
BIOS-provided physical RAM map:
 BIOS-e820: - 0009d800 (usable)
 BIOS-e820: 0009d800 - 0009f800 (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 7fff (usable)
 BIOS-e820: 7fff - 7000 (ACPI data)
 BIOS-e820: 7000 - 8000 (ACPI NVS)
 BIOS-e820: fec0 - fed0 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 524272
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 294896 pages, LIFO batch:31
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP (v000 ACPIAM) @ 0x000f62f0
ACPI: RSDT (v001 A M I OEMRSDT 0x01000412 MSFT 0x0097) @ 0x7fff
ACPI: FADT (v001 A M I OEMFACP 0x01000412 MSFT 0x0097) @ 0x7fff0200
ACPI: MADT (v001 A M I OEMAPIC 0x01000412 MSFT 0x0097) @ 0x7fff0300
ACPI: OEMB (v001 A M I OEMBIOS 0x01000412 MSFT 0x0097) @ 0x7040
ACPI: DSDT (v001 0AAYB 0AAYB007 0x0007 MSFT 0x010d) @ 0x
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x04] l
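On the PCI-timings idea above: one cheap experiment (purely illustrative - the register value is a guess, not a known-good setting) would be to read and raise the latency timers on the Promise cards with setpci, since 105a is Promise's PCI vendor ID:

```shell
# List the Promise controllers (vendor ID 105a) by bus address:
lspci -n -d 105a:

# Read the current latency timer on each of them:
setpci -v -d 105a: latency_timer

# Bump the latency timer; 0xb0 is an arbitrary "generous" value chosen
# only for the experiment:
setpci -v -d 105a: latency_timer=b0
```

If the errors stop (or change) with a larger latency timer, that would at least point at PCI arbitration rather than the driver.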
Re: Raid sync observations
Sebastian Kuzminsky wrote: I just created a RAID array (4-disk RAID-6). When "mdadm -C" returned, /proc/mdstat showed it syncing the new array at about 17 MB/s. "vmstat 1" showed hardly any blocks in or out, and an almost completely idle cpu.

This isn't really relevant to your questions but... Why would you use RAID6 and not RAID10 with four disks ?

CS
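To make the comparison concrete: over four disks, RAID10 gives the same usable capacity as RAID6 (two disks' worth) but with much cheaper writes and rebuilds. Creating one looks something like this (device names are illustrative):

```shell
# Hypothetical four-disk RAID10; same usable space as a four-disk RAID6
# but without the double-parity write penalty.
mdadm -C /dev/md0 -l 10 -n 4 /dev/sd[bcde]1
```

The trade-off is failure tolerance: RAID6 survives any two disk failures, while RAID10 only survives a second failure if it lands in the other mirror pair.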
Re: split RAID1 during backups?
Jeff Breidenbach wrote: Hi all, [...] So - I'm thinking of the following backup scenario. First, remount /dev/md0 readonly just to be safe. Then mount the two component partitions (sdc1, sdd1) readonly. Tell the webserver to work from one component partition, and tell the backup process to work from the other component partition. Once the backup is complete, point the webserver back at /dev/md0, unmount the component partitions, then switch read-write mode back on.

Isn't this just the sort of scenario LVM snapshots are meant for ? It might not help with the duration aspect, but it will mean your services aren't down/non-redundant for the entire time it takes to backup.

Everything on this system seems bottlenecked by disk I/O. That includes the rate web pages are served as well as the backup process described above. While I'm always hungry for performance tips, faster backups are the current focus. For those interested in gory details such as drive types, NCQ settings, kernel version and whatnot, I dumped a copy of dmesg output here: http://www.jab.org/dmesg

I think this might be one of those situations where SCSI really does offer a significant performance advantage, although if you're actually filling up that 500G, it'll be quite a bit more expensive. See if you can get hold of a reasonably sized array using SCSI drives and do some comparative benchmarking. You might also want to experiment with different filesystems, although that may not be feasible...

CS
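The LVM-snapshot approach mentioned above looks roughly like this. This sketch assumes /dev/md0 is an LVM physical volume with a volume group "vg0" and the data in a logical volume "web" - all of those names are made up for illustration, since the post doesn't say whether LVM is in use at all:

```shell
# Take a copy-on-write snapshot of the live volume; 10G here is just a
# guess at how much the data will change during the backup window.
lvcreate --snapshot --size 10G --name web-snap /dev/vg0/web

# Mount the snapshot read-only and back it up while the webserver keeps
# serving read-write from the real volume:
mount -o ro /dev/vg0/web-snap /mnt/backup
# ... run the backup against /mnt/backup ...

# Tear the snapshot down when done:
umount /mnt/backup
lvremove -f /dev/vg0/web-snap
```

The array stays fully redundant the whole time, which is the main win over splitting the mirror.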
Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)
Daniel Pittman wrote: Christopher Smith <[EMAIL PROTECTED]> writes: [...] The components are 12x400GB drives attached to a 3ware 9500s-12 controller. They are configured as "single disks" on the controller, ie: no hardware RAID is involved.

A quick question for you, because I have a client looking at 3ware RAID hardware at the moment: Why are you running this as software RAID, rather than using the hardware on the 3ware card?

Because after doing some preliminary benchmarks, I've found Linux's software RAID to be significantly faster than 3ware's hardware RAID (at the sacrifice of higher CPU usage, but since the machine has a fairly fast CPU and doesn't do anything else, that's a sacrifice I'm happy to make). I have some iozone and bonnie++ results, but they're at work and I'm at home - I'll post them tomorrow.

CS
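For anyone wanting to run the same kind of comparison themselves, typical invocations look something like the following - the mount point and sizes are illustrative, not the parameters behind the numbers mentioned above:

```shell
# bonnie++: -d test directory, -s file size in MB (use at least 2x RAM
# so the page cache can't hide the disks), -u user to run as when root.
bonnie++ -d /mnt/md0 -s 4096 -u nobody

# iozone: -a automatic mode over record/file sizes, -g maximum file
# size, -f location of the scratch file (must be on the array under test).
iozone -a -g 4g -f /mnt/md0/iozone.tmp
```

Running both against the 3ware hardware RAID and then against the same disks exported as "single disks" under md gives a like-for-like comparison.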
RAID resync stalled at 99.7% ?
In doing some benchmarking, I've found a curious problem - after creating an array the resync has stalled at 99.7%:

[EMAIL PROTECTED] ~]# cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdi1[7] sdh1[6] sdg1[5] sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      4963200 blocks level 6, 32k chunk, algorithm 2 [12/12] []
      [===>.]  resync = 99.7% (496320/496349) finish=0.0min speed=628K/sec
unused devices: <none>
[EMAIL PROTECTED] ~]#

It's been sitting like this for some time now, and since the resync up until this point progressed at about 15M/sec, I can't see any reason to think it will suddenly finish. mdadm -S /dev/md0 simply hangs. This problem is reproducible as well - if I reboot the machine the resync will complete successfully, then if I delete it and try to create another array, exactly the same thing will happen. It's also not a problem with, for example, bad sectors on one of the components, as creating a larger array stalls right near the end as well (the exact percentage varies, but it's always around the 99% part). Does anyone have any ideas ?

Some relevant info:

The command used to create the array was:

mdadm -C /dev/md0 -l6 -n12 -c 32 -z 496349 /dev/sd[b-m]1

It's a Fedora Core 4 box:

[EMAIL PROTECTED] ~]# uname -a
Linux justinstalled.syd.nighthawkrad.net 2.6.12-1.1398_FC4smp #1 SMP Fri Jul 15 01:30:13 EDT 2005 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] ~]#

The components are 12x400GB drives attached to a 3ware 9500s-12 controller. They are configured as "single disks" on the controller, ie: no hardware RAID is involved.

Regards, Chris Smith
Re: Migrating from non-RAID to RAID-1
Shaun Jackman wrote: I have a single non-RAID SATA drive with Debian (Sarge) installed and data on it. I also have a duplicate blank drive. I would like to migrate from my non-RAID system to a RAID-1 (mirrored) system for redundancy, with each disk being an exact duplicate of the other. Is it possible to do this without having to wipe the first drive clean? I'd appreciate a pointer to a HOWTO or recipe if it answers this specific question.

This should get you started: http://xtronics.com/reference/SATA-RAID-debian-for-2.6.html
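The usual non-destructive recipe boils down to building a degraded mirror on the blank disk first. A rough sketch, with illustrative device names and filesystem - back everything up before attempting anything like this, since a slip of a device name here destroys data:

```shell
# 1. Create a degraded RAID1 using only the blank disk; "missing"
#    reserves the second slot for the original disk later.
mdadm -C /dev/md0 -l 1 -n 2 /dev/sdb1 missing

# 2. Make a filesystem on the array and copy the running system across:
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt/new
cp -ax / /mnt/new

# 3. Point the bootloader and /etc/fstab at /dev/md0 and reboot onto
#    the array; then add the original disk so the mirror syncs up:
mdadm /dev/md0 -a /dev/sda1
```

Once the resync finishes, both disks carry identical copies of the data, which is exactly the "each disk an exact duplicate" setup asked about.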