Raid-1 on top of multipath

2007-04-20 Thread Rob Bray
I'm attempting to do host-based mirroring with one LUN on each of two EMC
CX storage units, each with two service processors. Connection is via
Emulex LP9802, using the lpfc driver, and sg.

The two LUNs (with two possible paths each) present fine as /dev/sd[a-d].
I have tried both md-multipath and dm-multipath on separate occasions, and
made an md-raid1 device on top of them. Both work when all paths are alive.
Both work great when one path to a disk dies. Neither works when both paths
to a disk die.

When dm-multipath is used, md-raid1 does not deduce (or is not informed)
that the entire multipath device is dead, and it continues to hang I/O on
the raid1 device while trying to access the sick dm device.
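(The dm-multipath hang described above is usually its queue-if-no-path behavior. A minimal, illustrative multipath.conf fragment -- not this host's actual configuration -- that makes the dm device return errors once the last path is gone, so an md-raid1 on top can fail that leg:)

defaults {
        # fail I/O when no paths remain instead of queueing it forever,
        # so the md-raid1 layer above sees the error and kicks this mirror
        no_path_retry   fail
}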

When md-raid1 is run on top of md-multipath, I get a race. I'm going to
focus on the md-raid1 on md-multipath implementation, as I feel it's more
on-topic for this group.

sda/sdb share a FC cable, and can access the same LUN through two service
processors. The same goes for sdc/sdd.


$ mdadm --create /dev/md0 --level=multipath -n2 /dev/sda /dev/sdb
mdadm: array /dev/md0 started.

$ mdadm --create /dev/md1 --level=multipath -n2 /dev/sdc /dev/sdd
mdadm: array /dev/md1 started.

$ mdadm --create /dev/md16 --level=raid1 -n2 /dev/md0 /dev/md1
mdadm: array /dev/md16 started.

$ cat /proc/mdstat
Personalities : [multipath] [raid1]
md16 : active raid1 md1[1] md0[0]
  52428672 blocks [2/2] [UU]
  [>...................]  resync =  3.3% (1756736/52428672)
finish=6.7min speed=125481K/sec

md0 : active multipath sda[0] sdb[1]
  52428736 blocks [2/2] [UU]

md1 : active multipath sdd[0] sdc[1]
  52428736 blocks [2/2] [UU]

unused devices: <none>


Failing one of the service processor paths results in a [U_] for the
multipath device, and business goes on as usual. It has to be added back in
by hand when the path is restored, which is expected.
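(The manual re-add is something along these lines, using the device names from the example above; here sdb stands in for whichever path came back:)

$ mdadm /dev/md0 --add /dev/sdb     # re-insert the restored path into the multipath set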

Failing both of the paths (taking the FC link down) at once results in a
crazy race:

Apr 20 12:59:21 vmprog kernel: lpfc 0000:02:04.0: 0:0203 Nodev timeout on
WWPN 50:6:1:69:30:20:83:45 NPort x7a00ef Data: x8 x7 x0
Apr 20 12:59:21 vmprog kernel: lpfc 0000:02:04.0: 0:0203 Nodev timeout on
WWPN 50:6:1:61:30:20:83:45 NPort x7a01ef Data: x8 x7 x0
Apr 20 12:59:26 vmprog kernel:  rport-0:0-2: blocked FC remote port time
out: removing target and saving binding
Apr 20 12:59:26 vmprog kernel:  rport-0:0-3: blocked FC remote port time
out: removing target and saving binding
Apr 20 12:59:26 vmprog kernel:  0:0:1:0: SCSI error: return code = 0x1
Apr 20 12:59:26 vmprog kernel: end_request: I/O error, dev sdb, sector
10998152
Apr 20 12:59:26 vmprog kernel: end_request: I/O error, dev sdb, sector
10998160
Apr 20 12:59:26 vmprog kernel: multipath: IO failure on sdb, disabling IO
path.
Apr 20 12:59:26 vmprog kernel:  Operation continuing on 1 IO paths.
Apr 20 12:59:26 vmprog kernel: multipath: sdb: rescheduling sector 10998168
Apr 20 12:59:26 vmprog kernel:  0:0:1:0: SCSI error: return code = 0x1
Apr 20 12:59:26 vmprog kernel: end_request: I/O error, dev sdb, sector
104857344
Apr 20 12:59:26 vmprog kernel: multipath: sdb: rescheduling sector 104857352
Apr 20 12:59:26 vmprog kernel: MULTIPATH conf printout:
Apr 20 12:59:26 vmprog kernel:  --- wd:1 rd:2
Apr 20 12:59:26 vmprog kernel:  disk0, o:0, dev:sdb
Apr 20 12:59:26 vmprog kernel:  disk1, o:1, dev:sda
Apr 20 12:59:26 vmprog kernel: MULTIPATH conf printout:
Apr 20 12:59:26 vmprog kernel:  --- wd:1 rd:2
Apr 20 12:59:26 vmprog kernel:  disk1, o:1, dev:sda
Apr 20 12:59:26 vmprog kernel: multipath: sdb: redirecting sector 10998152
to another IO path
Apr 20 12:59:26 vmprog kernel:  0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 10998168
Apr 20 12:59:26 vmprog kernel: multipath: sdb: redirecting sector
104857344 to another IO path
Apr 20 12:59:26 vmprog kernel:  0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 104857352
Apr 20 12:59:26 vmprog kernel: multipath: sda: redirecting sector 10998152
to another IO path
Apr 20 12:59:26 vmprog kernel: multipath: sda: redirecting sector
104857344 to another IO path
Apr 20 12:59:26 vmprog kernel:  0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 104857352
Apr 20 12:59:26 vmprog kernel:  0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left and IO error.
Apr 20 12:59:26 vmprog kernel: multipath: sda: rescheduling sector 10998168
Apr 20 12:59:26 vmprog kernel: multipath: sda: redirecting sector
104857344 to another IO path
Apr 20 12:59:26 vmprog kernel:  0:0:0:0: rejecting I/O to dead device
Apr 20 12:59:26 vmprog kernel: multipath: only one IO path left 

Re: Raid on USB flash disk

2007-03-05 Thread Rob Bray
 Hi,

 I just tried to set up a one-device raid on a USB flash drive.
 Creating, setting up ext3 and filling with data was no problem.
 But when I tried to work with it afterwards the metadevice was
 unresponsive. I tried both linear and raid0 levels, but that
 made no difference.
 To my uneducated eye it looks like something is deadlocking when
 md tries to read from the device.

 I'm using kernel 2.6.18 (gentoo) on a VIA EPIA CN1 mainboard
 with a 2GB USB flash drive (extreme). Please ask if I should provide
 more information like dmesg or lspci.

 The main reason why I'm trying this weird setup is that the USB
 drive is always enumerated last in my kernel, and I want to boot
 from it. That means every time I add a disk or remove one I have
 to edit grub.conf and fstab. Very inconvenient. So my idea was
 to create a single device md on it and leave it to the autodetection
 to find the device. So I never have to edit /etc/fstab again for
 a simple hardware change and I'm independent of any enumeration
 changes in future kernel releases.

 But unfortunately it doesn't work :-(

 Any help appreciated.

 --Arne


I believe you could use udev persistent names (e.g. /dev/disk/by-id) to
refer to your USB partitions, instead of /dev/sd?, etc.
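(A sketch of what that can look like; the by-id name below is invented, so check the actual symlinks on the box first:)

$ ls -l /dev/disk/by-id/     # udev-created persistent symlinks
# then point fstab (or grub) at the persistent name instead of /dev/sdX, e.g.:
# /dev/disk/by-id/usb-Vendor_Flash_0123456789-part1  /mnt/usb  ext3  defaults  0 0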



Move superblock on partition resize?

2007-02-07 Thread Rob Bray
I am trying to grow a raid5 volume in-place. I would like to expand the
partition boundaries, then grow raid5 into the newly-expanded partitions.
I was wondering if there is a way to move the superblock from the end of
the old partition to the end of the new partition. I've tried dd
if=/dev/sdX1 of=/dev/sdX1 bs=512 count=256
skip=(sizeOfOldPartitionInBlocks - 256) seek=(sizeOfNewPartitionInBlocks -
256) unsuccessfully. Copying the last 128KB (256 blocks) of the old
partition to a file before the table modification, then writing that data
to the tail of the new partition, also gets me nowhere. I can drop one
drive at a time from the group, change the partition table, then hot-add
it, but a resync times 7 drives is a lot of juggling. Any ideas?

Thanks,
Rob
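(For what it's worth: with 0.90 metadata the superblock sits at a fixed offset from the end of the device -- the device size is rounded down to a 64KB boundary, and the superblock occupies the last 64KB below that point. A sketch of the arithmetic, with a placeholder device name and sizes in 512-byte sectors; verify with mdadm -E before trusting it:)

# sizes in 512-byte sectors; OLD must be the partition size *before* the resize
OLD=104856255                          # example value, recorded beforehand
NEW=$(blockdev --getsz /dev/sdX1)      # size after growing the partition
# 0.90 superblock offset: round down to a 128-sector (64KB) boundary, minus 128
OLD_OFF=$(( (OLD & ~127) - 128 ))
NEW_OFF=$(( (NEW & ~127) - 128 ))
# copy the 64KB superblock area from the old location to the new one
dd if=/dev/sdX1 of=/dev/sdX1 bs=512 count=128 skip=$OLD_OFF seek=$NEW_OFF
mdadm -E /dev/sdX1                     # sanity check: superblock found at the new offset?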



Re: disappointed with 3ware 9550sx

2006-12-14 Thread Rob Bray
 i want to say up front that i have several 3ware 7504 and 7508 cards
 which i am completely satisfied with.  i use them as JBOD, and they make
 stellar PATA controllers (not RAID controllers).  they're not perfect
 (they're slow), but they've been rock solid for years.

 not so the 9550sx.

 i've been a software raid devotee for years now.  i've never wanted to
 trust my data to hw raid, because i can't look under the covers and see
 what it's doing, and i'm at the mercy of the vendor when it comes to
 recovery situations.  so why did i even consider hw raid?  NVRAM.  i
 wanted the write performance of NVRAM.

 i debated between areca and 3ware, but given the areca driver wasn't in
 the kernel (it is now), the lack of smartmontools support for areca, and
 my experiences with the 7504/7508 i figured i'd stick with what i know.

 sure i am impressed with the hw raid i/o rates on the 9550sx, especially
 with the NVRAM.  but i am unimpressed with several failures which have
 occured which evidence suggests are 3ware's fault (or at worst would
 not have resulted in problems with sw raid).

 my configuration has 7 disks:

 - 3x400GB WDC WD4000YR-01PLB0 firmware 01.06A01
 - 4x250GB WDC WD2500YD-01NVB1 firmware 10.02E01

 those disks and firmwares are on the 3ware drive compatibility list:
 http://www.3ware.com/products/pdf/Drive_compatibility_list_9550SX_9590SE_2006_09.pdf

 note that the compatibility list has a column NCQ, which i read as an
 indication the drive supports NCQ or not.  as supporting evidence for this
 i refer to footnote number 4, which is specifically used on some drives
 which MUST NOT have NCQ enabled.

 i had NCQ enabled on all 7 drives.  perhaps this is the source of some of
 my troubles, i'll grant 3ware that.

 initially i had the firmware from the 9.3.0.4 release on the 9550sx
 (3.04.00.005) it was the most recent at the time i installed the system.
 (and the appropriate driver in the kernel -- i think i was using 2.6.16.x
 at the time.)

 my first disappointment came when i tried to create a 3-way raid1 on the
 3x400 disks.  it doesn't support it at all.  i had become so accustomed to
 using a 3-way raid1 with software raid it didn't even occur to me to find
 out up front if the 3ware could support this.  apparently this is so
 revolutionary an idea 3ware support was completely baffled when i opened a
 ticket regarding it.  why would you want that?  it will fail over to a
 spare disk automatically.

 still lured by the NVRAM i gave in and went with a 2-way mirror plus a
 spare.  (i prefer the 3-way mirror so i'm never without a redundant copy
 and don't have to rush to the colo with a replacement when a disk fails.)

 the 4x250GB were turned into a raid-10.

 install went fine, testing went fine, system was put into production.


 second disappointment:  within a couple weeks the 9550sx decided it
 didn't like one of the 400GB disks and knocked it out of the array.
 here's what the driver had to say about it:

 Sep  6 23:47:30 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive
 timeout detected:port=0.
 Sep  6 23:47:31 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded
 unit:unit=0, port=0.
 Sep  6 23:48:46 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild
 started:unit=0.
 Sep  7 00:02:12 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x003B): Rebuild
 paused:unit=0.
 Sep  7 00:02:27 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild
 started:unit=0.
 Sep  7 09:32:19 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x0005): Rebuild
 completed:unit=0.

 the 9550sx could still communicate with the disk -- the SMART log
 had no indications of error.  i converted the drive to JBOD and read and
 overwrote the entire surface without a problem.  i ended up just
 converting the drive to the spare disk... but remained worried about
 why it could have been knocked out of the array.

 maybe this is a WD bug, maybe it's a 3ware bug, who knows.
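(As an aside, smartmontools can query disks behind the 9550SX through the controller; a hedged example, assuming the card appears as /dev/twa0 and the disk sits on port 0:)

smartctl -a -d 3ware,0 /dev/twa0    # full SMART report for the drive on controller port 0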


 third disappointment:  for a large data copy i inserted a disk into the
 remaining spare slot on the 3ware.  now i'm familiar with 750[48] where
 i run everything as JBOD and never let 3ware raid touch it.  when i
 inserted this 8th disk i found i had to ask tw_cli to create a JBOD.
 the disappointment comes here:  it zeroed the MBR!  fortunately the disk
 had a single full-sized partition and i could recreate the partition
 table, but there's no sane reason to zero the MBR just because i asked
 for the disk to be treated as JBOD (and don't tell me it'll reduce
 customer support cases because people might reuse a bad partition table
 from a previously raid disk -- i think it'll create even more problems
 than that explanation might solve).


 fourth disappointment:  heavy write traffic on one unit can affect
 other units even though they have separate spindles.  my educated
 guess is the 3ware does not share its cache fairly and the write
 traffic starves everything else.  i described this in a post here
 

Re: future hardware

2006-10-31 Thread Rob Bray
 I have been using an older 64bit system, socket 754 for a while now.  It
 has
 the old PCI bus 33Mhz.  I have two low cost (no HW RAID) PCI SATA I cards
 each with 4 ports to give me an eight disk RAID 6.  I also have a Gig NIC,
 on the PCI bus.  I have Gig switches with clients connecting to it at Gig
 speed.

 As many know you get a peak transfer rate of 133 MB/s or 1064Mb/s from
 that
 PCI bus http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect

 The transfer rate is not bad across the network but my bottleneck is the
 PCI bus.  I have been shopping around for new MB and PCI-express cards.  I
 have been using mdadm for a long time and would like to stay with it.  I
 am
 having trouble finding an eight port PCI-express card that does not have
 all
 the fancy HW RAID which jacks up the cost.  I am now considering using a
 MB
 with eight SATA II slots onboard.  GIGABYTE GA-M59SLI-S5 Socket AM2 NVIDIA
 nForce 590 SLI MCP ATX.

 What are other users of mdadm using with the PCI-express cards, most cost
 effective solution?



I agree that SATA drives on PCI-E cards are as much bang-for-buck as is
available right now. On the newer platforms, each PCI-E slot, the onboard
RAID controller(s), and the 32-bit PCI bus all have discrete paths to the
chip.

Play with the thing to see how many disks you can put on a controller
without a slowdown. Don't assume the controller has bandwidth to spare; it
may well be oversold (I was only able to use three of the four CK804 ports
on a GA-K8NE without saturating it, and two of the four ports on a PCI
Sil3114).
Combining the bandwidth of the onboard RAID controller, two SATA slots,
and one PCI controller card, sustained reads reach 450MB/s (across 7
disks, RAID-0) with an $80 board, and three $20 controller cards.
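A quick way to probe for that kind of oversubscription, as a rough sketch with example device names: read several disks on the same controller at once and watch whether the per-disk rate drops as readers are added.

# 1GB sequential read from each disk in parallel; if per-disk throughput
# falls as disks are added, the controller (or the bus behind it) is the limit
for d in sda sdb sdc sdd; do
    dd if=/dev/$d of=/dev/null bs=1M count=1024 &
done
wait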



Re: avoiding the initial resync on --create

2006-10-09 Thread Rob Bray
 On Mon, 2006-10-09 at 15:49 +0200, Erik Mouw wrote:

 There is no way to figure out what exactly is correct data and what is
 not. It might work right after creation and during the initial install,
 but after the next reboot there is no way to figure out what blocks to
 believe.

 You don't really need to.  After a clean install, the operating system
 has no business reading any block it didn't write to during the install
 unless you are just reading disk blocks for the fun of it.  And any
 program that depends on data that hasn't first been written to disk is
 just wrong and stupid anyway.

I suppose a partial-stripe write would read back junk data on the other
disks, xor with your write, and update the parity block.

If you benchmark the disk, you're going to be reading blocks you didn't
necessarily write, which could kick out consistency errors.

A whole-array consistency check would puke on the out-of-whack parity data.
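(To make the partial-stripe case concrete, here is the read-modify-write arithmetic as a toy shell sketch with made-up byte values for a three-disk stripe; the update trusts the old parity, so parity that was never initialized stays wrong:)

# data blocks d0, d1 and parity p; p *should* equal d0 ^ d1
d0=0xA5; d1=0x0F
p=0x77                           # pretend parity was never initialized (junk)
d1_new=0x3C
# partial-stripe write of d1: new parity = old parity ^ old data ^ new data
p_new=$(( p ^ d1 ^ d1_new ))
# correct parity would be d0 ^ d1_new; the RMW result carries the junk forward
printf 'rmw parity=0x%02X  correct parity=0x%02X\n' "$p_new" "$(( d0 ^ d1_new ))"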



Re: ATA cables and drives

2006-09-21 Thread Rob Bray
 I'm looking for new harddrives.

 This is my experience so far.


 SATA cables:
 ============

 I have zero good experiences with any SATA cables.
 They've all been crap so far.


 3.5" ATA harddrives buyable where I live:
 =========================================

 (All drives are 7200rpm, for some reason.)

 Hitachi DeskStar              500 GB / 16 MB /  8.5 ms / SATA or PATA
 Maxtor DiamondMax 11          500 GB / 16 MB /  8.5 ms / SATA or PATA
 Maxtor MaXLine Pro            500 GB / 16 MB /  8.5 ms / SATA or PATA
 Seagate Barracuda 7200.10     500 GB / 16 MB /   ?  ms / SATA or PATA
 Seagate Barracuda 7200.10     750 GB / 16 MB /   ?  ms / SATA or PATA
 Seagate Barracuda 7200.9      500 GB / 16 MB /  11  ms / SATA or PATA
 Seagate Barracuda 7200.9      500 GB /  8 MB /  11  ms / SATA or PATA
 Seagate Barracuda ES          500 GB / 16 MB /  8.5 ms / SATA
 Seagate Barracuda ES          750 GB / 16 MB /  8.5 ms / SATA
 Seagate ESATA                 500 GB / 16 MB /   ?  ms / SATA (external)
 Seagate NL35.2 ST3500641NS    500 GB / 16 MB /   8  ms / ? / SATA
 Seagate NL35.2 ST3500841NS    500 GB /  8 MB /   8  ms / ? / SATA
 Western Digital SE16 WD5000KS 500 GB / 16 MB /  8.9 ms / SATA
 Western Digital RE2 WD5000YS  500 GB / 16 MB /  8.7 ms / SATA

 I've tried Maxtor and IBM (now Hitachi) harddrives.
 Both makes have failed on me, but most of the time due to horrible
 packaging.

 I don't care a split-second whether one kind is marginally faster than
 the other, so all the reviews on AnandTech etc. are utterly useless to
 me.  There's an infinite number of more effective ways to get better
 performance than to buy a slightly faster harddrive.

 I DO care about quality, namely:
 * How often the drives has catastrophic failure,
 * How they handle heat (dissipation & acceptance - how hot before it
 fails?),
 * How big the spare area is,
 * How often they have single-sector failures,
 * How long the manufacturer warranty lasts,
 * How easy the manufacturer is to work with wrt. warranty.

 I haven't been able to figure the spare area size, heat properties,
 etc. for any drives.
 Thus my only criteria so far has been manufacturer warranty: How much
 bitching do I get when I tell them my drive doesn't work.

 My main experience is with Maxtor.
 Maxtor has been none less than superb wrt. warranty!
 Download an ISO with a diag tool, burn the CD, boot the CD, type in
 the fault code it prints on Maxtor's site, and a day or two later
 you've got a new drive in the mail and packaging to ship the old one
 back in.  If something odd happens, call them up and they're extremely
 helpful.

 Unfortunately, I lack thorough experience with the other brands.


 Questions:
 ===

 A.) Does anyone have experience with returning Hitachi, Seagate or WD
 drives to the manufacturer?
 Do they have manufacturer warranty at all?
 How much/little trouble did you have with Hitachi, Seagate or WD?

 B.) Can anyone *prove* (to a reasonable degree) that drives from
 manufacturer H, M, S or WD is of better quality?
 Has anyone seen a review that heat/shock/stress test drives?

 C.) Do good SATA cables exist?
 Eg. cables that lock on to the drives, or backplanes which lock
 the entire disk in place?


 Thanks for reading, and thanks in advance for answers (if any) :-).


I've experienced consumer-class drive failures with Seagate, Western
Digital, and Maxtor. WD is the only one I've had catastrophically fail on
me -- the rest developed bad sectors slowly enough to catch. I have yet to
have a Hitachi drive fail. WD got a bad name with me when they sold drives
under the same model number while changing to different drives (assembled
in Thailand) that performed considerably slower than the originals (from
Malaysia).

Warranty experiences with Western Digital, Maxtor, and Seagate have been
pleasant and quick (although WD and Maxtor sent me refurbished disks).
I've never been refused a drive replacement. Maxtor required me to
generate a code with their proprietary tool before the website would even
talk about a replacement; a way around this was to select "Drive doesn't
spin up" as the symptom :-P

Consumer-level drives are made primarily with volume and profit margin in
mind. If reliability is a chief concern, most server-class drives are
designed with longer service life and higher reliability in mind. If price
is paramount, as you said, go for the most inexpensive disk with the
longest warranty.



Re: access array from knoppix

2006-09-21 Thread Rob Bray
 Maybe valid but not helping with my problem since the problem is/was,
 that /dev/md0 didn't exist at all. mdadm -C won't create device nodes.

 But I figured the workaround meanwhile, so it doesn't matter anymore.
 (In case someone wanna know: mknod in /lib/udev/devices does it on a hard
 disk
 install, I guess could work in /dev on knoppix, too, haven't tried yet.)


Do you want to be able to boot using Knoppix all the time, or are you
looking for a one-shot solution? I feel like 'mknod' with major 9 is a
trivial thing to have to do.
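For a one-shot from a live CD, something along these lines usually does it (the array device and member partitions are examples):

mknod /dev/md0 b 9 0                             # md devices are block major 9, minor = array number
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1    # assemble from the superblocks, no autodetect needed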


Re: scrub was Re: RAID5 Problem - $1000 reward for help

2006-09-21 Thread Rob Bray
 Am Sonntag, 17. September 2006 13:36 schrieben Sie:
 On 9/17/06, Ask Bjørn Hansen [EMAIL PROTECTED] wrote:
   It's recommended to use a script to scrub the raid device regularly,
   to detect sleeping bad blocks early.
 
  What's the best way to do that?  dd the full md device to /dev/null?

 echo check > /sys/block/md?/md/sync_action

 Distros may have cron scripts to do this right.

 And you need a fairly recent kernel.

 Does this test stress the discs a lot, like a resync?
 How long does it take?
 Can I use it on a mounted array?


I'd like to add to this question -- does the 'check' action on a RAID5 array
verify that the parity matches the data, blindly read back all blocks, or
only verify readability of the data blocks?
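(For reference, a sketch of kicking it off and reading the result back; the array name is an example. As far as I know, 'check' reads everything and counts parity/data mismatches without rewriting anything, while 'repair' also corrects them.)

echo check > /sys/block/md0/md/sync_action    # background scrub; the array can stay mounted
cat /sys/block/md0/md/mismatch_cnt            # non-zero afterwards means inconsistencies were found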



Re: Interesting RAID checking observations

2006-09-21 Thread Rob Bray
 Just to follow up my speed observations last month on a 6x SATA - 3x
 PCIe - AMD64 system, as of 2.6.18 final, RAID-10 checking is running
 at a reasonable ~156 MB/s (which I presume means 312 MB/s of reads),
 and raid5 is better than the 23 MB/s I complained about earlier, but
 still a bit sluggish...

 md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
   1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
   [=>..................]  resync =  6.2% (21564928/343831040)
 finish=86.0min speed=62429K/sec

 I'm not sure why the raid5 check can't run at 250 MB/s (300 MB/s disk
 speed).  The processor is idle and can do a lot more than that:

 raid5: automatically using best checksumming function: generic_sse
generic_sse:  6769.000 MB/sec
 raid5: using function: generic_sse (6769.000 MB/sec)


 But anyway, it's better, so thank you!  I haven't rebooted the celeron
 I hung for the duration of a RAID-1 check, so I haven't checked that with
 2.6.18 yet.


Check the I/O performance on the box. I think the speed indicator comes
out of calculations to determine how fast a failing drive would be
rebuilt, were you doing a rebuild instead of a check. I like using the
dstat tool to get that info at-a-glance
(http://dag.wieers.com/home-made/dstat/).
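(A couple of things worth watching while a check runs; the device names are examples, and the sysctl pair below is what md uses to throttle resync/check speed:)

dstat -d -D sda,sdb,md5 5                   # per-device throughput, sampled every 5 seconds
cat /proc/sys/dev/raid/speed_limit_min \
    /proc/sys/dev/raid/speed_limit_max      # check/resync is throttled between these (KB/s)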



Re: access *existing* array from knoppix

2006-09-13 Thread Rob Bray
 Am Dienstag, 12. September 2006 16:08 schrieb Justin Piszcz:
 /dev/MAKEDEV /dev/md0

 also make sure the SW raid modules etc are loaded if necessary.

 Won't work, MAKEDEV doesn't know how to create [/dev/]md0.

mknod /dev/md0 b 9 0
perhaps?



Re: Raid5 reads and cpu

2006-09-07 Thread Rob Bray
 On Monday August 28, [EMAIL PROTECTED] wrote:
 This might be a dumb question, but what causes md to use a large amount
 of
 cpu resources when reading a large amount of data from a raid1 array?

 I assume you meant raid5 there.

 md/raid5 shouldn't use that much CPU when reading.
 It does use more than raid0 as it reads data in the stripe-cache and
 then copies the data from the stripe cache into the read-buffer.  But
 I wouldn't expect that to come anywhere near 50%.

 Are you really seeing 'raid5d' using 50% of CPU in 'top' or similar?

 NeilBrown

Sorry for the long response time -- email got lost.

top - 16:45:21 up 10 days, 17:41,  2 users,  load average: 0.58, 0.17, 0.05
Tasks: 113 total,   2 running, 111 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.7% us, 87.7% sy,  6.3% ni,  0.0% id,  0.0% wa,  0.0% hi,  4.3% si
Mem:   2061564k total,  2044784k used,    16780k free,  1193384k buffers
Swap:  4257016k total,  552k used,  4256464k free,24348k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  945 root      10  -5     0    0    0 S 44.2  0.0   7:27.73 md11_raid5




 Examples are on a 2.4GHz AMD64, 2GB, 2.6.15.1 (I realize there are md
 enhancements to later versions; I had some other unrelated issues and
 rolled back to one I've run on for several months).

 A given 7-disk raid0 array can read 450MB/s (using cat > /dev/null) and use
 virtually no CPU resources. (Although cat and kswapd use quite a bit
 [60%]
 munching on the data)

 A raid5 array on the same drive set pulls in at 250MB/s, but md uses
 roughly 50% of the CPU (the other 50% is spent dealing with the data,
 saturating the processor).

 A consistency check on the raid5 array uses roughly 3% of the cpu. It is
 otherwise ~97% idle.
 md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6]
 sdb2[0]
   248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
   [==============>.....]  resync = 72.2% (29976960/41495808)
 finish=3.7min speed=51460K/sec
 (~350MB/s aggregate throughput, 50MB/s on each device)

 Just a friendly question as to why CPU utilization is significantly
 different between a check and a real-world read on raid5? I feel like if
 there was vm overhead getting the data into userland, the slowdown would
 be present in raid0 as well. I assume parity calculations aren't done on
 a
 read of the array, which leaves me at my question.

 Thanks,
 Rob


Re: Raid5 reads and cpu

2006-09-07 Thread Rob Bray
 Rob Bray wrote:

This might be a dumb question, but what causes md to use a large amount
 of
cpu resources when reading a large amount of data from a raid1 array?
Examples are on a 2.4GHz AMD64, 2GB, 2.6.15.1 (I realize there are md
enhancements to later versions; I had some other unrelated issues and
rolled back to one I've run on for several months).

A given 7-disk raid0 array can read 450MB/s (using cat > /dev/null) and use
virtually no CPU resources. (Although cat and kswapd use quite a bit
 [60%]
munching on the data)

A raid5 array on the same drive set pulls in at 250MB/s, but md uses
roughly 50% of the CPU (the other 50% is spent dealing with the data,
saturating the processor).

A consistency check on the raid5 array uses roughly 3% of the cpu. It is
otherwise ~97% idle.
md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6]
 sdb2[0]
  248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
  [==============>.....]  resync = 72.2% (29976960/41495808)
finish=3.7min speed=51460K/sec
(~350MB/s aggregate throughput, 50MB/s on each device)

Just a friendly question as to why CPU utilization is significantly
different between a check and a real-world read on raid5? I feel like if
there was vm overhead getting the data into userland, the slowdown would
be present in raid0 as well. I assume parity calculations aren't done on
 a
read of the array, which leaves me at my question.


 What are your stripe and cache sizes?

 --
 bill davidsen [EMAIL PROTECTED]
   CTO TMR Associates, Inc
   Doing interesting things with small computers since 1979



md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6] sdb2[0]
  248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]

stripe_cache_size = 256; I've tried increasing it with the same result.
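(For reference, the knob lives in sysfs; the value is the number of cached stripes, each holding one page per member device. A sketch using the array name from above:)

cat /sys/block/md11/md/stripe_cache_size           # default is 256
echo 4096 > /sys/block/md11/md/stripe_cache_size   # then re-test the sequential read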




Raid5 reads and cpu

2006-08-28 Thread Rob Bray
This might be a dumb question, but what causes md to use a large amount of
cpu resources when reading a large amount of data from a raid1 array?
Examples are on a 2.4GHz AMD64, 2GB, 2.6.15.1 (I realize there are md
enhancements to later versions; I had some other unrelated issues and
rolled back to one I've run on for several months).

A given 7-disk raid0 array can read 450MB/s (using cat > /dev/null) and use
virtually no CPU resources. (Although cat and kswapd use quite a bit [60%]
munching on the data)

A raid5 array on the same drive set pulls in at 250MB/s, but md uses
roughly 50% of the CPU (the other 50% is spent dealing with the data,
saturating the processor).

A consistency check on the raid5 array uses roughly 3% of the cpu. It is
otherwise ~97% idle.
md11 : active raid5 sdi2[5] sdh2[4] sdf2[3] sde2[2] sdd2[1] sdc2[6] sdb2[0]
  248974848 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
  [==============>.....]  resync = 72.2% (29976960/41495808)
finish=3.7min speed=51460K/sec
(~350MB/s aggregate throughput, 50MB/s on each device)

Just a friendly question as to why CPU utilization is significantly
different between a check and a real-world read on raid5? I feel like if
there was vm overhead getting the data into userland, the slowdown would
be present in raid0 as well. I assume parity calculations aren't done on a
read of the array, which leaves me at my question.

Thanks,
Rob


Re: md: only binds to one mirror after reboot

2006-08-24 Thread Rob Bray
 hello,

 after reboot, md only binds to one mirror (/dev/hdb1).
 raid1: raid set md0 active with 1 out of 2 mirrors

 After adding /dev/hda1 manually 'mdadm --add /dev/md0 /dev/hda1', the
 raid seems to work well:

 isp:/var/log# cat /proc/mdstat
 Personalities : [raid1]
 md0 : active raid1 hda1[0] hdb1[1]
   154191744 blocks [2/2] [UU]

 Any idea what I did wrong?

 Tia

 some system information:

 isp:/var/log# uname -r
 2.6.17.8
 isp:/var/log# mdadm -V
 mdadm - v1.9.0 - 04 February 2005

 isp:/var/log# fdisk -l

 Disk /dev/hda: 160.0 GB, 160041885696 bytes
 255 heads, 63 sectors/track, 19457 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
 /dev/hda1   *            1       19196   154191838+  fd  Linux raid autodetect
 /dev/hda2            19197       19457     2096482+   5  Extended
 /dev/hda5            19197       19388     1542208+  82  Linux swap / Solaris

 Disk /dev/hdb: 160.0 GB, 160041885696 bytes
 255 heads, 63 sectors/track, 19457 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start         End      Blocks   Id  System
 /dev/hdb1   *            1       19196   154191838+  fd  Linux raid autodetect
 /dev/hdb2            19197       19457     2096482+   5  Extended
 /dev/hdb5            19197       19388     1542208+  82  Linux swap / Solaris

 Disk /dev/md0: 157.8 GB, 157892345856 bytes
 2 heads, 4 sectors/track, 38547936 cylinders
 Units = cylinders of 8 * 512 = 4096 bytes
 isp:/var/log# grep md messages
 Aug 23 22:16:07 isp kernel: Kernel command line: root=/dev/md0
 md=0,/dev/hda1,/dev/hdb1 ro
 Aug 23 22:16:07 isp kernel: md: md driver 0.90.3 MAX_MD_DEVS=256,
 MD_SB_DISKS=27
 Aug 23 22:16:07 isp kernel: md: bitmap version 4.39
 Aug 23 22:16:07 isp kernel: md: raid1 personality registered for level 1
 Aug 23 22:16:07 isp kernel: md: md0 stopped.
 Aug 23 22:16:07 isp kernel: md: bind<hdb1>
 Aug 23 22:16:07 isp kernel: raid1: raid set md0 active with 1 out of 2
 mirrors
 Aug 23 22:16:07 isp kernel: EXT3 FS on md0, internal journal
 Aug 23 22:45:55 isp kernel: Kernel command line: root=/dev/md0
 md=0,/dev/hda1,/dev/hdb1 ro
 Aug 23 22:45:55 isp kernel: md: md driver 0.90.3 MAX_MD_DEVS=256,
 MD_SB_DISKS=27
 Aug 23 22:45:55 isp kernel: md: bitmap version 4.39
 Aug 23 22:45:55 isp kernel: md: raid1 personality registered for level 1
 Aug 23 22:45:55 isp kernel: md: md0 stopped.
 Aug 23 22:45:55 isp kernel: md: bind<hdb1>
 Aug 23 22:45:55 isp kernel: raid1: raid set md0 active with 1 out of 2
 mirrors
 Aug 23 22:45:55 isp kernel: EXT3 FS on md0, internal journal

 manually added /dev/hda1/

 Aug 24 07:12:30 isp kernel: md: bind<hda1>
 Aug 24 07:12:30 isp kernel: md: syncing RAID array md0
 Aug 24 07:12:30 isp kernel: md: minimum _guaranteed_ reconstruction
 speed: 1000 KB/sec/disc.
 Aug 24 07:12:30 isp kernel: md: using maximum available idle IO
 bandwidth (but not more than 200000 KB/sec) for reconstruction.
 Aug 24 07:12:30 isp kernel: md: using 128k window, over a total of
 154191744 blocks.
 Aug 24 08:28:05 isp kernel: md: md0: sync done.

 Kind Regards
 Andreas Pelzner

Do you have the autodetect kernel messages from booting?

Rob
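(Something along these lines pulls them out of the boot log; exact wording varies a little between kernel versions, and autodetection only considers partitions of type 0xfd:)

dmesg | grep -iE 'md:|autodetect'    # which partitions the md autodetect pass looked at and why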
