Raid-1 on top of multipath
I'm attempting to do host-based mirroring with one LUN on each of two EMC CX storage units, each with two service processors. Connection is via Emulex LP9802, using the lpfc driver and sg. The two LUNs (with two possible paths each) present fine as /dev/sd[a-d].

I have tried using both md-multipath and dm-multipath on separate occasions, and made an md-raid1 device on top of them. Both work when all paths are alive. Both work great when one path to a disk dies. Neither works when both paths to a disk die. When dm-multipath is used, md-raid1 does not deduce (or is not informed) that the entire multipath device is dead, and continues to hang I/O on the raid1 device trying to access the sick dm device. When md-raid1 is run on top of md-multipath, I get a race. I'm going to focus on the md-raid1 on md-multipath implementation, as I feel it's more on-topic for this group.

sda/sdb share a FC cable, and can access the same LUN through two service processors. The same goes for sdc/sdd.

$ mdadm --create /dev/md0 --level=multipath -n2 /dev/sda /dev/sdb
mdadm: array /dev/md0 started.
$ mdadm --create /dev/md1 --level=multipath -n2 /dev/sdc /dev/sdd
mdadm: array /dev/md1 started.
$ mdadm --create /dev/md16 --level=raid1 -n2 /dev/md0 /dev/md1
mdadm: array /dev/md16 started.

$ cat /proc/mdstat
Personalities : [multipath] [raid1]
md16 : active raid1 md1[1] md0[0]
      52428672 blocks [2/2] [UU]
      [] resync = 3.3% (1756736/52428672) finish=6.7min speed=125481K/sec

md0 : active multipath sda[0] sdb[1]
      52428736 blocks [2/2] [UU]

md1 : active multipath sdd[0] sdc[1]
      52428736 blocks [2/2] [UU]

unused devices: <none>

Failing one of the service processor paths results in a [U_] for the multipath device, and business goes on as usual. It has to be added back in by hand when the path is restored, which is expected. Failing both of the paths (taking the FC link down) at once results in a crazy race:

Apr 20 12:59:21 vmprog kernel: lpfc :02:04.0: 0:0203 Nodev timeout on WWPN 50:6:1:69:30:20:83:45 NPort x7a00ef Data: x8 x7 x0
Apr 20 12:59:21 vmprog kernel: lpfc :02:04.0: 0:0203 Nodev timeout on WWPN 50:6:1:61:30:20:83:45 NPort x7a01ef Data: x8 x7 x0
Apr 20 12:59:26 vmprog kernel: rport-0:0-2: blocked FC remote port time out: removing target and saving binding
Apr 20 12:59:26 vmprog kernel: rport-0:0-3: blocked FC remote port time out: removing target and saving binding
Apr 20 12:59:26 vmprog kernel: 0:0:1:0: SCSI error: return code = 0x1
Apr 20 12:59:26 vmprog kernel: end_request: I/O error, dev sdb, sector 10998152
Apr 20 12:59:26 vmprog kernel: end_request: I/O error, dev sdb, sector 10998160
Apr 20 12:59:26 vmprog kernel: multipath: IO failure on sdb, disabling IO path.
Apr 20 12:59:26 vmprog kernel:     Operation continuing on 1 IO paths.
Apr 20 12:59:26 vmprog kernel: multipath: sdb: rescheduling sector 10998168
Apr 20 12:59:26 vmprog kernel: 0:0:1:0: SCSI error: return code = 0x1
Apr 20 12:59:26 vmprog kernel: end_request: I/O error, dev sdb, sector 104857344
Apr 20 12:59:26 vmprog kernel: multipath: sdb: rescheduling sector 104857352
Apr 20 12:59:26 vmprog kernel: MULTIPATH conf printout:
Apr 20 12:59:26 vmprog kernel: --- wd:1 rd:2
Apr 20 12:59:26 vmprog kernel: disk0, o:0, dev:sdb
Apr 20 12:59:26 vmprog kernel: disk1, o:1, dev:sda
Apr 20 12:59:26 vmprog kernel: MULTIPATH conf printout:
Apr 20 12:59:26 vmprog kernel: --- wd:1 rd:2
Apr 20 12:59:26 vmprog kernel: disk1, o:1, dev:sda
Apr 20 12:59:26 vmprog kernel: multipath: sdb: redirecting sector 10998152 to another IO path
Apr 20 12:59:26 vmprog kernel: 0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 10998168
Apr 20 12:59:26 vmprog kernel: multipath: sdb: redirecting sector 104857344 to another IO path
Apr 20 12:59:26 vmprog kernel: 0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 104857352
Apr 20 12:59:26 vmprog kernel: multipath: sda: redirecting sector 10998152 to another IO path
Apr 20 12:59:26 vmprog kernel: multipath: sda: redirecting sector 104857344 to another IO path
Apr 20 12:59:26 vmprog kernel: 0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 104857352
Apr 20 12:59:26 vmprog kernel: 0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 10998168
Apr 20 12:59:26 vmprog kernel: multipath: sda: redirecting sector 104857344 to another IO path
Apr 20 12:59:26 vmprog kernel: 0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left
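As an aside on the dm-multipath variant hanging I/O: whether a multipath device with no live paths queues I/O forever or fails it back to the layer above is configurable. A hedged sketch of the relevant multipath.conf knob -- assuming a multipath-tools version that supports no_path_retry; the section placement here is illustrative, not a tested EMC CX config:

    defaults {
        # return I/O errors to the caller (md-raid1 in this setup) as soon
        # as the last path to a LUN is gone, instead of queueing forever
        no_path_retry    fail
    }

With queueing disabled, md-raid1 should see real errors from the dead leg and be able to fail that mirror rather than hanging.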
Re: Raid on USB flash disk
Hi, I just tried to set up a one-device raid on a USB flash drive. Creating it, setting up ext3 and filling it with data was no problem. But when I tried to work with it afterwards, the metadevice was unresponsive. I tried both linear and raid0 levels, but that made no difference. To my uneducated eye it looks like something is deadlocking when md tries to read from the device. I'm using kernel 2.6.18 (gentoo) on a VIA EPIA CN1 mainboard with a 2GB USB flash drive (extreme). Please ask if I should provide more information like dmesg or lspci.

The main reason I'm trying this weird setup is that the USB drive is always enumerated last by my kernel, and I want to boot from it. That means every time I add or remove a disk I have to edit grub.conf and fstab. Very inconvenient. So my idea was to create a single-device md on it and leave it to the autodetection to find the device. Then I never have to edit /etc/fstab again for a simple hardware change, and I'm independent of any enumeration changes in future kernel releases. But unfortunately it doesn't work :-( Any help appreciated. --Arne

I believe you could use udev persistent names (e.g. /dev/disk/by-id) to refer to your USB partitions, instead of /dev/sd?, etc.
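If you go the udev route, the fstab entry can name the stick by identity instead of by enumeration order. A rough sketch -- the by-id name and label below are made up; check ls -l /dev/disk/by-id/ on your system, or set a filesystem label with e2label:

    # /etc/fstab: refer to the USB stick by stable identity, not sd letter
    /dev/disk/by-id/usb-Vendor_Extreme_2GB-part1  /boot  ext3  noatime  0 2
    # or, by ext3 label:
    LABEL=usbboot  /boot  ext3  noatime  0 2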
Move superblock on partition resize?
I am trying to grow a raid5 volume in place. I would like to expand the partition boundaries, then grow the raid5 into the newly expanded partitions. I was wondering if there is a way to move the superblock from the end of the old partition to the end of the new partition. I've tried

dd if=/dev/sdX1 of=/dev/sdX1 bs=512 count=256 skip=(sizeOfOldPartitionInBlocks - 256) seek=(sizeOfNewPartitionInBlocks - 256)

unsuccessfully. Copying the last 128KB (256 blocks) of the old partition to a file before the partition table modification, then placing that data at the tail of the new partition, also got me nowhere. I can drop one drive at a time from the group, change the partition table, then hot-add it, but a resync times 7 drives is a lot of juggling. Any ideas? Thanks, Rob
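One likely reason the tail-copy misses is that the 0.90 superblock isn't simply in the last 128KB: it sits 64KB below the end of the device after the size has been rounded down to a 64KB boundary, so the target offset has to be recomputed from the new partition size. A hedged sketch, assuming 0.90 metadata (device and temp-file names are placeholders; mdadm --examine afterwards should confirm the superblock landed where md expects it):

    # member partition size in 512-byte sectors
    OLD=$(blockdev --getsz /dev/sdX1)
    # 0.90 superblock offset: round down to a 128-sector (64KB) boundary,
    # then step back one 64KB block
    dd if=/dev/sdX1 of=/tmp/md-sb bs=512 count=128 skip=$(( (OLD & ~127) - 128 ))

    # ... enlarge the partition here, then recompute for the new size ...
    NEW=$(blockdev --getsz /dev/sdX1)
    dd if=/tmp/md-sb of=/dev/sdX1 bs=512 count=128 seek=$(( (NEW & ~127) - 128 ))
    mdadm --examine /dev/sdX1    # sanity check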
Re: disappointed with 3ware 9550sx
i want to say up front that i have several 3ware 7504 and 7508 cards which i am completely satisfied with. i use them as JBOD, and they make stellar PATA controllers (not RAID controllers). they're not perfect (they're slow), but they've been rock solid for years. not so the 9550sx.

i've been a software raid devotee for years now. i've never wanted to trust my data to hw raid, because i can't look under the covers and see what it's doing, and i'm at the mercy of the vendor when it comes to recovery situations. so why did i even consider hw raid? NVRAM. i wanted the write performance of NVRAM. i debated between areca and 3ware, but given the areca driver wasn't in the kernel (it is now), the lack of smartmontools support for areca, and my experiences with the 7504/7508, i figured i'd stick with what i know.

sure, i am impressed with the hw raid i/o rates on the 9550sx, especially with the NVRAM. but i am unimpressed with several failures which have occurred, which the evidence suggests are 3ware's fault (or at worst would not have resulted in problems with sw raid).

my configuration has 7 disks:
- 3x 400GB WDC WD4000YR-01PLB0, firmware 01.06A01
- 4x 250GB WDC WD2500YD-01NVB1, firmware 10.02E01

those disks and firmwares are on the 3ware drive compatibility list:
http://www.3ware.com/products/pdf/Drive_compatibility_list_9550SX_9590SE_2006_09.pdf

note that the compatibility list has a column NCQ, which i read as an indication of whether the drive supports NCQ or not. as supporting evidence for this i refer to footnote number 4, which is specifically used on some drives which MUST NOT have NCQ enabled. i had NCQ enabled on all 7 drives. perhaps this is the source of some of my troubles; i'll grant 3ware that.

initially i had the firmware from the 9.3.0.4 release on the 9550sx (3.04.00.005); it was the most recent at the time i installed the system (and the appropriate driver in the kernel -- i think i was using 2.6.16.x at the time).

my first disappointment came when i tried to create a 3-way raid1 on the 3x 400GB disks. it doesn't support it at all. i had become so accustomed to using a 3-way raid1 with software raid it didn't even occur to me to find out up front if the 3ware could support this. apparently this is so revolutionary an idea that 3ware support was completely baffled when i opened a ticket regarding it: "why would you want that? it will fail over to a spare disk automatically." still lured by the NVRAM i gave in and went with a 2-way mirror plus a spare. (i prefer the 3-way mirror so i'm never without a redundant copy and don't have to rush to the colo with a replacement when a disk fails.) the 4x 250GB were turned into a raid-10. install went fine, testing went fine, system was put into production.

second disappointment: within a couple weeks the 9550sx decided it didn't like one of the 400GB disks and knocked it out of the array. here's what the driver had to say about it:

Sep 6 23:47:30 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=0.
Sep 6 23:47:31 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=0.
Sep 6 23:48:46 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.
Sep 7 00:02:12 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x003B): Rebuild paused:unit=0.
Sep 7 00:02:27 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.
Sep 7 09:32:19 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x0005): Rebuild completed:unit=0.

the 9550sx could still communicate with the disk -- the SMART log had no indications of error.
i converted the drive to JBOD and read and overwrote the entire surface without a problem. i ended up just converting the drive to the spare disk... but remained worried about why it could have been knocked out of the array. maybe this is a WD bug, maybe it's a 3ware bug, who knows.

third disappointment: for a large data copy i inserted a disk into the remaining spare slot on the 3ware. now, i'm familiar with the 750[48], where i run everything as JBOD and never let 3ware raid touch it. when i inserted this 8th disk i found i had to ask tw_cli to create a JBOD. the disappointment comes here: it zeroed the MBR! fortunately the disk had a single full-sized partition and i could recreate the partition table, but there's no sane reason to zero the MBR just because i asked for the disk to be treated as JBOD (and don't tell me it'll reduce customer support cases because people might reuse a bad partition table from a previously raided disk -- i think it'll create even more problems than that explanation might solve).

fourth disappointment: heavy write traffic on one unit can affect other units even though they have separate spindles. my educated guess is the 3ware does not share its cache fairly and the write traffic starves everything else. i described this in a post here
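For what it's worth, smartmontools can usually talk to drives behind the 9000-series controllers, so the per-port SMART state can be checked even while 3ware owns the disks. A sketch, assuming the card shows up as /dev/twa0 and the suspect drive is on port 0:

    # SMART attributes and error log for the drive on 3ware port 0
    smartctl -a -d 3ware,0 /dev/twa0
    # run a long self-test on that same port
    smartctl -t long -d 3ware,0 /dev/twa0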
Re: future hardware
I have been using an older 64-bit system, socket 754, for a while now. It has the old 33MHz PCI bus. I have two low-cost (no HW RAID) PCI SATA I cards, each with 4 ports, to give me an eight-disk RAID 6. I also have a Gig NIC on the PCI bus, and Gig switches with clients connecting at Gig speed. As many know, you get a peak transfer rate of 133 MB/s (1064 Mb/s) from that PCI bus: http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect The transfer rate is not bad across the network, but my bottleneck is the PCI bus.

I have been shopping around for a new motherboard and PCI-Express cards. I have been using mdadm for a long time and would like to stay with it. I am having trouble finding an eight-port PCI-Express card that does not have all the fancy HW RAID which jacks up the cost. I am now considering a motherboard with eight SATA II ports onboard: GIGABYTE GA-M59SLI-S5, Socket AM2, NVIDIA nForce 590 SLI MCP, ATX. What are other users of mdadm using with PCI-Express cards? What's the most cost-effective solution?

I agree that SATA drives on PCI-E cards are as much bang-for-buck as is available right now. On the newer platforms, each PCI-E slot, the onboard RAID controller(s), and the 32-bit PCI bus all have discrete paths to the chip. Play with the thing to see how many disks you can put on a controller without a slowdown. Don't assume the controller isn't oversold on bandwidth (I was only able to use three out of four CK804 ports on a GA-K8NE without saturating it; two out of four ports on a PCI Sil3114). Combining the bandwidth of the onboard RAID controller, two SATA slots, and one PCI controller card, sustained reads reach 450MB/s (across 7 disks, RAID-0) with an $80 board and three $20 controller cards.
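A crude but effective way to find a controller's saturation point (a sketch; the device names are placeholders, the raw reads are harmless but will slow anything else using those disks, and aggregate throughput is easiest to watch with dstat or vmstat in another terminal):

    # read 4GB from each disk in parallel; each dd reports its own MB/s
    for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
        dd if=$d of=/dev/null bs=1M count=4096 &
    done
    wait

Add one disk at a time to the loop and note where the per-disk numbers start dropping; that is the controller (or bus) ceiling.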
Re: avoiding the initial resync on --create
On Mon, 2006-10-09 at 15:49 +0200, Erik Mouw wrote: There is no way to figure out what exactly is correct data and what is not. It might work right after creation and during the initial install, but after the next reboot there is no way to figure out what blocks to believe.

You don't really need to. After a clean install, the operating system has no business reading any block it didn't write to during the install, unless you are just reading disk blocks for the fun of it. And any program that depends on data that hasn't first been written to disk is just wrong and stupid anyway.

I suppose a partial-stripe write would read back the junk data on the other disks, xor it with your write, and update the parity block. If you benchmark the disk, you're going to be reading blocks you didn't necessarily write, which could kick out consistency errors. A whole-array consistency check would puke on the out-of-whack parity data.
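For reference, mdadm can skip the initial resync -- a hedged sketch, with the obvious caveat that on raid5 the parity is then not trustworthy until a repair pass has run, for exactly the partial-stripe-write and consistency-check reasons above:

    # create without the initial resync (member devices are placeholders)
    mdadm --create /dev/md0 --level=5 -n4 --assume-clean /dev/sd[abcd]1
    # later, rewrite parity so degraded-mode reads and checks can be trusted:
    echo repair > /sys/block/md0/md/sync_action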
Re: ATA cables and drives
I'm looking for new harddrives. This is my experience so far.

SATA cables:
I have zero good experiences with any SATA cables. They've all been crap so far.

3.5" ATA harddrives buyable where I live (all drives are 7200rpm, for some reason):

Hitachi DeskStar               500 GB / 16 MB / 8.5 ms / SATA or PATA
Maxtor DiamondMax 11           500 GB / 16 MB / 8.5 ms / SATA or PATA
Maxtor MaXLine Pro             500 GB / 16 MB / 8.5 ms / SATA or PATA
Seagate Barracuda 7200.10      500 GB / 16 MB /   ?    / SATA or PATA
Seagate Barracuda 7200.10      750 GB / 16 MB /   ?    / SATA or PATA
Seagate Barracuda 7200.9       500 GB / 16 MB / 11 ms  / SATA or PATA
Seagate Barracuda 7200.9       500 GB /  8 MB / 11 ms  / SATA or PATA
Seagate Barracuda ES           500 GB / 16 MB / 8.5 ms / SATA
Seagate Barracuda ES           750 GB / 16 MB / 8.5 ms / SATA
Seagate ESATA                  500 GB / 16 MB /   ?    / SATA (external)
Seagate NL35.2 ST3500641NS     500 GB / 16 MB / 8 ms / ? / SATA
Seagate NL35.2 ST3500841NS     500 GB /  8 MB / 8 ms / ? / SATA
Western Digital SE16 WD5000KS  500 GB / 16 MB / 8.9 ms / SATA
Western Digital RE2 WD5000YS   500 GB / 16 MB / 8.7 ms / SATA

I've tried Maxtor and IBM (now Hitachi) harddrives. Both makes have failed on me, but most of the time due to horrible packaging. I don't care a split second whether one kind is marginally faster than the other, so all the reviews on AnandTech etc. are utterly useless to me. There's an infinite number of more effective ways to get better performance than to buy a slightly faster harddrive.

I DO care about quality, namely:
* How often the drives have catastrophic failures,
* How they handle heat (dissipation acceptance - how hot before it fails?),
* How big the spare area is,
* How often they have single-sector failures,
* How long the manufacturer warranty lasts,
* How easy the manufacturer is to work with wrt. warranty.

I haven't been able to figure out the spare area size, heat properties, etc. for any drives. Thus my only criterion so far has been manufacturer warranty: how much bitching do I get when I tell them my drive doesn't work. My main experience is with Maxtor. Maxtor has been nothing less than superb wrt. warranty! Download an ISO with a diag tool, burn the CD, boot the CD, type the fault code it prints into Maxtor's site, and a day or two later you've got a new drive in the mail and packaging to ship the old one back in. If something odd happens, call them up and they're extremely helpful. Unfortunately, I lack thorough experience with the other brands.

Questions:
A.) Does anyone have experience with returning Hitachi, Seagate or WD drives to the manufacturer? Do they have manufacturer warranty at all? How much/little trouble did you have with Hitachi, Seagate or WD?
B.) Can anyone *prove* (to a reasonable degree) that drives from manufacturer H, M, S or WD are of better quality? Has anyone seen a review that heat/shock/stress tests drives?
C.) Do good SATA cables exist? E.g. cables that lock on to the drives, or backplanes which lock the entire disk in place?

Thanks for reading, and thanks in advance for answers (if any) :-).

I've experienced consumer-class drive failures with Seagate, Western Digital, and Maxtor. WD is the only one I've had catastrophically fail on me -- the rest developed bad sectors slowly enough to catch. I have yet to have a Hitachi drive fail.
WD got a bad name with me when they sold drives under the same model number while switching to different drives (assembled in Thailand) that performed considerably slower than the originals (from Malaysia). Warranty experiences with Western Digital, Maxtor, and Seagate have been pleasant and quick (although WD and Maxtor sent me refurbished disks). I've never been refused a drive replacement. Maxtor required me to generate a code with their proprietary tool before the website would even talk about a replacement; a way around this was to select "Drive doesn't spin up" as the symptom :-P

Consumer-level drives are made primarily with volume and profit margin in mind. If reliability is a chief concern, most server-class drives are designed with longer service life and higher reliability in mind. If price is paramount, as you said, go for the most inexpensive disk with the longest warranty.
Re: access array from knoppix
Maybe valid, but not helping with my problem, since the problem is/was that /dev/md0 didn't exist at all. mdadm -C won't create device nodes. But I figured out the workaround meanwhile, so it doesn't matter anymore. (In case someone wants to know: mknod in /lib/udev/devices does it on a hard disk install; I guess it could work in /dev on knoppix too, haven't tried yet.)

Do you want to be able to boot using knoppix all the time, or are you looking for a one-shot solution? I feel like 'mknod' with major 9 is a trivial thing to have to do.
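In other words, something along these lines from the knoppix shell should be enough to get going -- a sketch; adjust the minor numbers to however many arrays you have:

    # md devices are block major 9; the minor is the array number
    mknod /dev/md0 b 9 0
    mknod /dev/md1 b 9 1
    mdadm --assemble --scan     # or name the member devices explicitly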
Re: scrub was Re: RAID5 Problem - $1000 reward for help
On Sunday, 17 September 2006 13:36, you wrote:

On 9/17/06, Ask Bjørn Hansen [EMAIL PROTECTED] wrote: It's recommended to use a script to scrub the raid device regularly, to detect sleeping bad blocks early.

What's the best way to do that? dd the full md device to /dev/null?

echo check > /sys/block/md?/md/sync_action

Distros may have cron scripts to do this right. And you need a fairly recent kernel.

Does this test stress the discs a lot, like a resync? How long does it take? Can I use it on a mounted array?

I'd like to add to this question -- does the 'check' action on a RAID5 array verify the accuracy of parity data, blindly read back all data, or only verify readability of data blocks?
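A minimal sketch of such a cron job, assuming a kernel recent enough to have the sync_action interface (the path and schedule are placeholders; distros that ship their own script do roughly the same thing):

    #!/bin/sh
    # e.g. /etc/cron.monthly/md-scrub: read-check every md array.
    # Progress appears in /proc/mdstat; inconsistencies found are counted
    # in /sys/block/mdX/md/mismatch_cnt once the check finishes.
    for md in /sys/block/md*/md; do
        echo check > "$md/sync_action"
    done

The check runs through md's normal resync machinery, so it is throttled by the usual speed_limit_min/max settings and can run with the array mounted.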
Re: Interesting RAID checking observations
Just to follow up my speed observations last month on a 6x SATA / 3x PCIe / AMD64 system: as of 2.6.18 final, RAID-10 checking is running at a reasonable ~156 MB/s (which I presume means 312 MB/s of reads), and raid5 is better than the 23 MB/s I complained about earlier, but still a bit sluggish...

md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [=...] resync = 6.2% (21564928/343831040) finish=86.0min speed=62429K/sec

I'm not sure why the raid5 check can't run at 250 MB/s (300 MB/s disk speed). The processor is idle and can do a lot more than that:

raid5: automatically using best checksumming function: generic_sse
   generic_sse: 6769.000 MB/sec
raid5: using function: generic_sse (6769.000 MB/sec)

But anyway, it's better, so thank you! I haven't rebooted the celeron I hung for the duration of a RAID-1 check, so I haven't checked that with 2.6.18 yet.

Check the I/O performance on the box. I think the speed indicator comes out of the calculations used to determine how fast a failing drive would be rebuilt, were you doing a rebuild instead of a check. I like using the dstat tool to get that info at-a-glance (http://dag.wieers.com/home-made/dstat/).
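To see where the ceiling comes from, it helps to watch the per-disk rates and md's own throttles side by side. A sketch (device names assumed):

    # live per-disk throughput while the check runs
    dstat -d -D sda,sdb,sdc,sdd,sde,sdf
    # md's global resync/check throttles, in KB/s
    cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
    # force a higher minimum even when other I/O is competing
    echo 100000 > /proc/sys/dev/raid/speed_limit_min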
Re: access *existing* array from knoppix
On Tuesday, 12 September 2006 16:08, Justin Piszcz wrote: /dev/MAKEDEV /dev/md0 -- also make sure the SW raid modules etc. are loaded if necessary.

Won't work; MAKEDEV doesn't know how to create [/dev/]md0.

mknod /dev/md0 b 9 0 perhaps?
Re: Raid5 reads and cpu
On Monday August 28, [EMAIL PROTECTED] wrote: This might be a dumb question, but what causes md to use a large amount of cpu resources when reading a large amount of data from a raid1 array?

I assume you meant raid5 there. md/raid5 shouldn't use that much CPU when reading. It does use more than raid0, as it reads data into the stripe cache and then copies the data from the stripe cache into the read buffer. But I wouldn't expect that to come anywhere near 50%. Are you really seeing 'raid5d' using 50% of CPU in 'top' or similar?

NeilBrown

Sorry for the long response time -- email got lost.

top - 16:45:21 up 10 days, 17:41, 2 users, load average: 0.58, 0.17, 0.05
Tasks: 113 total, 2 running, 111 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.7% us, 87.7% sy, 6.3% ni, 0.0% id, 0.0% wa, 0.0% hi, 4.3% si
Mem:  2061564k total, 2044784k used, 16780k free, 1193384k buffers
Swap: 4257016k total, 552k used, 4256464k free, 24348k cached

  PID USER  PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
  945 root  10 -5    0   0   0 S 44.2  0.0 7:27.73 md11_raid5

Examples are on a 2.4GHz AMD64, 2GB, 2.6.15.1 (I realize there are md enhancements in later versions; I had some other unrelated issues and rolled back to one I've run on for several months). A given 7-disk raid0 array can read 450MB/s (using cat to /dev/null) and use virtually no CPU resources (although cat and kswapd use quite a bit [60%] munching on the data). A raid5 array on the same drive set pulls in 250MB/s, but md uses roughly 50% of the CPU (the other 50% is spent dealing with the data, saturating the processor). A consistency check on the raid5 array uses roughly 3% of the cpu; it is otherwise ~97% idle.

md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6] sdb2[0]
      248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
      [==..] resync = 72.2% (29976960/41495808) finish=3.7min speed=51460K/sec

(~350MB/s aggregate throughput, 50MB/s on each device)

Just a friendly question as to why CPU utilization is significantly different between a check and a real-world read on raid5? I feel like if there were vm overhead getting the data into userland, the slowdown would be present in raid0 as well. I assume parity calculations aren't done on a read of the array, which leaves me at my question.

Thanks, Rob
Re: Raid5 reads and cpu
Rob Bray wrote:

This might be a dumb question, but what causes md to use a large amount of cpu resources when reading a large amount of data from a raid1 array? Examples are on a 2.4GHz AMD64, 2GB, 2.6.15.1 (I realize there are md enhancements in later versions; I had some other unrelated issues and rolled back to one I've run on for several months). A given 7-disk raid0 array can read 450MB/s (using cat to /dev/null) and use virtually no CPU resources (although cat and kswapd use quite a bit [60%] munching on the data). A raid5 array on the same drive set pulls in 250MB/s, but md uses roughly 50% of the CPU (the other 50% is spent dealing with the data, saturating the processor). A consistency check on the raid5 array uses roughly 3% of the cpu; it is otherwise ~97% idle.

md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6] sdb2[0]
      248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
      [==..] resync = 72.2% (29976960/41495808) finish=3.7min speed=51460K/sec

(~350MB/s aggregate throughput, 50MB/s on each device)

Just a friendly question as to why CPU utilization is significantly different between a check and a real-world read on raid5? I feel like if there were vm overhead getting the data into userland, the slowdown would be present in raid0 as well. I assume parity calculations aren't done on a read of the array, which leaves me at my question.

What are your stripe and cache sizes?

-- bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6] sdb2[0]
      248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]

stripe_cache_size = 256. I've tried increasing it, with the same result.
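For reference, a sketch of how the stripe cache is usually sized up and re-measured (the numbers and device name are just examples; the memory cost is roughly PAGE_SIZE * stripe_cache_size * number of member devices):

    # current size, in stripes (pages per member device)
    cat /sys/block/md11/md/stripe_cache_size
    # try a larger cache, then repeat the sequential-read test
    echo 4096 > /sys/block/md11/md/stripe_cache_size
    dd if=/dev/md11 of=/dev/null bs=1M count=8192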
Raid5 reads and cpu
This might be a dumb question, but what causes md to use a large amount of cpu resources when reading a large amount of data from a raid1 array? Examples are on a 2.4GHz AMD64, 2GB, 2.6.15.1 (I realize there are md enhancements in later versions; I had some other unrelated issues and rolled back to one I've run on for several months). A given 7-disk raid0 array can read 450MB/s (using cat to /dev/null) and use virtually no CPU resources (although cat and kswapd use quite a bit [60%] munching on the data). A raid5 array on the same drive set pulls in 250MB/s, but md uses roughly 50% of the CPU (the other 50% is spent dealing with the data, saturating the processor). A consistency check on the raid5 array uses roughly 3% of the cpu; it is otherwise ~97% idle.

md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6] sdb2[0]
      248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
      [==..] resync = 72.2% (29976960/41495808) finish=3.7min speed=51460K/sec

(~350MB/s aggregate throughput, 50MB/s on each device)

Just a friendly question as to why CPU utilization is significantly different between a check and a real-world read on raid5? I feel like if there were vm overhead getting the data into userland, the slowdown would be present in raid0 as well. I assume parity calculations aren't done on a read of the array, which leaves me at my question.

Thanks, Rob
Re: md: only binds to one mirror after reboot
hello, after reboot, md only binds to one mirror (/dev/hdb1):

raid1: raid set md0 active with 1 out of 2 mirrors

After adding /dev/hda1 manually with 'mdadm --add /dev/md0 /dev/hda1', the raid seems to work well:

isp:/var/log# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hda1[0] hdb1[1]
      154191744 blocks [2/2] [UU]

Any idea what I did wrong? Tia.

Some system information:

isp:/var/log# uname -r
2.6.17.8
isp:/var/log# mdadm -V
mdadm - v1.9.0 - 04 February 2005

isp:/var/log# fdisk -l

Disk /dev/hda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot   Start     End      Blocks   Id  System
/dev/hda1   *        1   19196  154191838+  fd  Linux raid autodetect
/dev/hda2        19197   19457    2096482+   5  Extended
/dev/hda5        19197   19388    1542208+  82  Linux swap / Solaris

Disk /dev/hdb: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot   Start     End      Blocks   Id  System
/dev/hdb1   *        1   19196  154191838+  fd  Linux raid autodetect
/dev/hdb2        19197   19457    2096482+   5  Extended
/dev/hdb5        19197   19388    1542208+  82  Linux swap / Solaris

Disk /dev/md0: 157.8 GB, 157892345856 bytes
2 heads, 4 sectors/track, 38547936 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

isp:/var/log# grep md messages
Aug 23 22:16:07 isp kernel: Kernel command line: root=/dev/md0 md=0,/dev/hda1,/dev/hdb1 ro
Aug 23 22:16:07 isp kernel: md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
Aug 23 22:16:07 isp kernel: md: bitmap version 4.39
Aug 23 22:16:07 isp kernel: md: raid1 personality registered for level 1
Aug 23 22:16:07 isp kernel: md: md0 stopped.
Aug 23 22:16:07 isp kernel: md: bind<hdb1>
Aug 23 22:16:07 isp kernel: raid1: raid set md0 active with 1 out of 2 mirrors
Aug 23 22:16:07 isp kernel: EXT3 FS on md0, internal journal
Aug 23 22:45:55 isp kernel: Kernel command line: root=/dev/md0 md=0,/dev/hda1,/dev/hdb1 ro
Aug 23 22:45:55 isp kernel: md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
Aug 23 22:45:55 isp kernel: md: bitmap version 4.39
Aug 23 22:45:55 isp kernel: md: raid1 personality registered for level 1
Aug 23 22:45:55 isp kernel: md: md0 stopped.
Aug 23 22:45:55 isp kernel: md: bind<hdb1>
Aug 23 22:45:55 isp kernel: raid1: raid set md0 active with 1 out of 2 mirrors
Aug 23 22:45:55 isp kernel: EXT3 FS on md0, internal journal

(manually added /dev/hda1)
Aug 24 07:12:30 isp kernel: md: bind<hda1>
Aug 24 07:12:30 isp kernel: md: syncing RAID array md0
Aug 24 07:12:30 isp kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Aug 24 07:12:30 isp kernel: md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for reconstruction.
Aug 24 07:12:30 isp kernel: md: using 128k window, over a total of 154191744 blocks.
Aug 24 08:28:05 isp kernel: md: md0: sync done.

Kind Regards
Andreas Pelzner

Do you have the autodetect kernel messages from booting?

Rob
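One way to see why autodetect left hda1 out is to compare the two superblocks directly -- a sketch; if the event counters, UUIDs or states differ, hda1's superblock is stale (e.g. from an unclean shutdown) and would have been rejected at assembly time:

    mdadm --examine /dev/hda1 /dev/hdb1 | egrep 'UUID|Events|State|Update Time'

Assembling from an mdadm.conf ARRAY line keyed on the UUID reported by --examine, instead of the kernel's md= autodetect, also tends to be more forgiving about this sort of thing; the conf file location varies by distro (/etc/mdadm.conf or /etc/mdadm/mdadm.conf).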