Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap
On Friday October 19, [EMAIL PROTECTED] wrote:
> On 10/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > On Friday October 19, [EMAIL PROTECTED] wrote:
> > > I'm using a stock 2.6.19.7 that I then backported various MD fixes to
> > > from 2.6.20 - 2.6.23... this kernel has worked great until I attempted
> > > v1.0 sb w/ bitmap=internal using mdadm 2.6.x.  But would you like me to
> > > try a stock 2.6.22 or 2.6.23 kernel?
> >
> > Yes please.  I'm suspecting the code in write_sb_page where it tests
> > if the bitmap overlaps the data or metadata.  The only way I can see
> > you getting the exact error that you do get is for that test to fail.
> > That test was introduced in 2.6.22.  Did you backport that?  Any
> > chance it got mucked up a bit?
>
> I believe you're referring to commit f0d76d70bc77b9b11256a3a23e98e80878be1578.
> That change actually made it into 2.6.23 AFAIK; but yes, I actually did
> backport that fix (which depended on ab6085c795a71b6a21afe7469d30a365338add7a).
> If I back out f0d76d70bc77b9b11256a3a23e98e80878be1578 I can create a raid1
> w/ v1.0 sb and an internal bitmap.  But clearly that is just because I
> removed the negative checks that you introduced ;)
>
> For me this begs the question: what else would
> f0d76d70bc77b9b11256a3a23e98e80878be1578 depend on that I missed?  I
> included 505fa2c4a2f125a70951926dfb22b9cf273994f1 and
> ab6085c795a71b6a21afe7469d30a365338add7a too.  *shrug*...

This is all very odd...  I definitely tested this last week and couldn't
reproduce the problem.  This week I can reproduce it easily.  And given the
nature of the bug, I cannot see how it ever worked.

Anyway, here is a fix that works for me.

NeilBrown

Fix an unsigned compare to allow creation of bitmaps with v1.0 metadata.

As page->index is unsigned, this all becomes an unsigned comparison, which
almost always returns an error.
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2007-10-22 16:47:52.000000000 +1000
+++ ./drivers/md/bitmap.c	2007-10-22 16:50:10.000000000 +1000
@@ -274,7 +274,7 @@ static int write_sb_page(struct bitmap *
 		if (bitmap->offset < 0) {
 			/* DATA  BITMAP METADATA  */
 			if (bitmap->offset
-			    + page->index * (PAGE_SIZE/512)
+			    + (long)(page->index * (PAGE_SIZE/512))
 			    + size/512 > 0)
 				/* bitmap runs in to metadata */
 				return -EINVAL;
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 000 of 2] md: Fixes for md in 2.6.23
It appears that a couple of bugs slipped in to md for 2.6.23.  These two
patches fix them and are appropriate for 2.6.23.y as well as 2.6.24-rcX.

Thanks,
NeilBrown

 [PATCH 001 of 2] md: Fix an unsigned compare to allow creation of bitmaps with v1.0 metadata.
 [PATCH 002 of 2] md: raid5: fix clearing of biofill operations
[PATCH 001 of 2] md: Fix an unsigned compare to allow creation of bitmaps with v1.0 metadata.
As page->index is unsigned, this all becomes an unsigned comparison, which
almost always returns an error.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Cc: Stable <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2007-10-22 16:55:48.000000000 +1000
+++ ./drivers/md/bitmap.c	2007-10-22 16:55:52.000000000 +1000
@@ -274,7 +274,7 @@ static int write_sb_page(struct bitmap *
 		if (bitmap->offset < 0) {
 			/* DATA  BITMAP METADATA  */
 			if (bitmap->offset
-			    + page->index * (PAGE_SIZE/512)
+			    + (long)(page->index * (PAGE_SIZE/512))
 			    + size/512 > 0)
 				/* bitmap runs in to metadata */
 				return -EINVAL;
[PATCH 002 of 2] md: raid5: fix clearing of biofill operations
From: Dan Williams <[EMAIL PROTECTED]>

ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits.  Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.

Move the clearing of these bits to handle_stripe5(), under the lock.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Tested-by: Joël Bertrand <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Cc: Stable <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/raid5.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-10-22 16:55:49.000000000 +1000
+++ ./drivers/md/raid5.c	2007-10-22 16:57:41.000000000 +1000
@@ -665,7 +665,12 @@ static unsigned long get_stripe_work(str
 		ack++;
 
 	sh->ops.count -= ack;
-	BUG_ON(sh->ops.count < 0);
+	if (unlikely(sh->ops.count < 0)) {
+		printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+			"ops.complete: %#lx\n", pending, sh->ops.pending,
+			sh->ops.ack, sh->ops.complete);
+		BUG();
+	}
 
 	return pending;
 }
@@ -842,8 +847,7 @@ static void ops_complete_biofill(void *s
 			}
 		}
 	}
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
 
 	return_io(return_bi);
@@ -3130,6 +3134,13 @@ static void handle_stripe5(struct stripe
 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
+	/* clean-up completed biofill operations */
+	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+	}
+
 	rcu_read_lock();
 	for (i=disks; i--; ) {
 		mdk_rdev_t *rdev;
Re: very degraded RAID5, or increasing capacity by adding discs
On Tue, Oct 09, 2007 at 01:48:50PM +0400, Michael Tokarev wrote:

> There still is - at least for ext[23].  Even offline resizers can't do
> resizes from any to any size; extfs developers recommend to recreate the
> filesystem anyway if the size changes significantly.  I'm too lazy to find
> a reference now; it has been mentioned here on linux-raid at least this
> year.
>
> It's sorta like fat (yea, that ms-dog filesystem) - when you resize it
> from, say, 501Mb to 999Mb, everything is ok, but if you want to go from
> 501Mb to 1Gb+1, you have to recreate almost all data structures because
> the sizes of all internal fields change - and here it's much safer to
> just re-create it from scratch than to try to modify it in place.  Sure
> it's much better for extfs, but the point is still the same.

I'll just mention that I once resized a multi-Tera ext3 filesystem and it
took 8 hours+; a comparable XFS online resize lasted all of 10 seconds!
Fwd: issues rebuilding raid array.
Greetings happy mdadm users.

I have a little problem that after many hours of searching around I couldn't
seem to solve.  I have upgraded my motherboard and kernel (bad practice I
know, but the ICH9R controller needs 2.6.2*+) at the same time.

The array was built using 2.6.18-7.  Now I'm using 2.6.21-2.

I'm trying to re-assemble the raid array with the following command and this
is the error I get:

mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md1
mdadm: no RAID superblock on /dev/sdc
mdadm: /dev/sdc has no superblock - assembly aborted

So I figure, oh look, the disk sdc has gone cactus, I'll just remove it from
the list.  One of the advantages of mdadm.

mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
mdadm: added /dev/sde to /dev/md1 as 1
mdadm: no uptodate device for slot 2 of /dev/md1
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: added /dev/sdg to /dev/md1 as 4
mdadm: added /dev/sdf to /dev/md1 as 5
mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
mdadm: added /dev/sdd to /dev/md1 as 0
mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
I found this really difficult to understand considering that I can get the
output of mdadm -E /dev/sdb (other disks included to overload you with
information):

mdadm -E /dev/sd[b-h]
/dev/sdb:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
     Raid Level : raid5
    Device Size : 312571136 (298.09 GiB 320.07 GB)
     Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1
    Update Time : Tue Oct 16 20:03:13 2007
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 80d47486 - correct
         Events : 0.623738
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16       -1      spare   /dev/sdb
   0     0       8       80        0      active sync   /dev/sdf
   1     1       8      128        1      active sync   /dev/.static/dev/sdi
   2     2       8      144        2      active sync   /dev/.static/dev/sdj
   3     3       8       16        3      active sync   /dev/sdb
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       96        5      active sync   /dev/sdg
mdadm: No md superblock detected on /dev/sdc.
/dev/sdd:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
     Raid Level : raid5
    Device Size : 312571136 (298.09 GiB 320.07 GB)
     Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1
    Update Time : Tue Oct 16 20:03:13 2007
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 80d474a8 - correct
         Events : 0.623738
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       80        0      active sync   /dev/sdf
   0     0       8       80        0      active sync   /dev/sdf
   1     1       8      128        1      active sync   /dev/.static/dev/sdi
   2     2       8      144        2      active sync   /dev/.static/dev/sdj
   3     3       8       16        3      active sync   /dev/sdb
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       96        5      active sync   /dev/sdg

/dev/sde:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
     Raid Level : raid5
    Device Size : 312571136 (298.09 GiB 320.07 GB)
     Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1
    Update Time : Tue Oct 16 20:03:13 2007
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 80d474da - correct
         Events : 0.623738
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      128        1      active sync   /dev/.static/dev/sdi
   0     0       8       80        0      active sync   /dev/sdf
   1     1       8      128        1      active sync   /dev/.static/dev/sdi
   2     2       8      144        2
Re: Fwd: issues rebuilding raid array.
On Mon Oct 22, 2007 at 09:46:08PM +1000, Sam Redfern wrote:

> Greetings happy mdadm users.
>
> I have a little problem that after many hours of searching around I
> couldn't seem to solve.  I have upgraded my motherboard and kernel (bad
> practice I know but the ICH9R controller needs 2.6.2*+) at the same time.
>
> The array was built using 2.6.18-7.  Now I'm using 2.6.21-2.
>
> I'm trying to recreate the raid array with the following command and this
> is the error I get:
>
> mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> mdadm: looking for devices for /dev/md1
> mdadm: no RAID superblock on /dev/sdc
> mdadm: /dev/sdc has no superblock - assembly aborted

You're trying to assemble the array from 6 disks here and one looks to be
dodgy.  That's okay so far.

> So I figure, oh look the disk sdc has gone cactus, I'll just remove it
> from the list.  One of the advantages of mdadm.
>
> mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
> mdadm: looking for devices for /dev/md1
> mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
> mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
> mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
> mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
> mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
> mdadm: added /dev/sde to /dev/md1 as 1
> mdadm: no uptodate device for slot 2 of /dev/md1
> mdadm: no uptodate device for slot 3 of /dev/md1
> mdadm: added /dev/sdg to /dev/md1 as 4
> mdadm: added /dev/sdf to /dev/md1 as 5
> mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
> mdadm: added /dev/sdd to /dev/md1 as 0
> mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.

Now you're trying to assemble with 5 disks and getting 4 out of 6 in the
array, plus one at slot -1 (i.e. a spare).
> I found this really difficult to understand considering that I can get
> the output of mdadm -E /dev/sdb (other disks included to overload you
> with information):
>
> mdadm -E /dev/sd[b-h]
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
>   Creation Time : Fri Oct  5 09:18:25 2007
>      Raid Level : raid5
>     Device Size : 312571136 (298.09 GiB 320.07 GB)
>      Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
>    Raid Devices : 6
>   Total Devices : 6
> Preferred Minor : 1
>     Update Time : Tue Oct 16 20:03:13 2007
>           State : clean
>  Active Devices : 6
> Working Devices : 6
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 80d47486 - correct
>          Events : 0.623738
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     6       8       16       -1      spare   /dev/sdb
>    0     0       8       80        0      active sync   /dev/sdf
>    1     1       8      128        1      active sync   /dev/.static/dev/sdi
>    2     2       8      144        2      active sync   /dev/.static/dev/sdj
>    3     3       8       16        3      active sync   /dev/sdb
>    4     4       8       64        4      active sync   /dev/sde
>    5     5       8       96        5      active sync   /dev/sdg

And here we see that the array has 6 active devices and a spare.  You
currently have 4 working active devices, a failed active device and the
spare.  What's happened to the other device?

You can't get the array working with 4 out of 6 devices, so you'll need to
either find the other active device (and rebuild onto the spare) or get the
failed disk working again.

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <[EMAIL PROTECTED]> |
   / / )      | Little Jim says ....                        |
  // !!       |      "He fallen in de water !!"             |
flaky controller or disk error?
Hi,

[using kernel 2.6.23 and mdadm 2.6.3+20070929]

I have a rather flaky sata controller with which I am trying to resync a
raid5 array.  It usually starts failing after 40% of the resync is done.
Short of changing the controller (which I will do later this week), is there
a way to have mdadm resume the resync where it left off at reboot time?

Here is the error I am seeing in the syslog.  Can this actually be a disk
error?

Oct 18 11:54:34 sylla kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
Oct 18 11:54:34 sylla kernel: ata1.00: irq_stat 0x0040, PHY RDY changed
Oct 18 11:54:34 sylla kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Oct 18 11:54:34 sylla kernel:          res 40/00:00:19:26:33/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
Oct 18 11:54:35 sylla kernel: ata1: soft resetting port
Oct 18 11:54:40 sylla kernel: ata1: failed to reset engine (errno=-95)
Oct 18 11:54:40 sylla kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Oct 18 11:54:45 sylla kernel: ata1: softreset failed (device not ready)
Oct 18 11:54:45 sylla kernel: ata1: hard resetting port
Oct 18 11:54:46 sylla kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 18 11:54:46 sylla kernel: ata1.00: configured for UDMA/133
Oct 18 11:54:46 sylla kernel: ata1: EH complete
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Thanks,
Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap
On 10/22/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> On Friday October 19, [EMAIL PROTECTED] wrote:
> > On 10/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > > On Friday October 19, [EMAIL PROTECTED] wrote:
> > > > I'm using a stock 2.6.19.7 that I then backported various MD fixes
> > > > to from 2.6.20 - 2.6.23... this kernel has worked great until I
> > > > attempted v1.0 sb w/ bitmap=internal using mdadm 2.6.x.  But would
> > > > you like me to try a stock 2.6.22 or 2.6.23 kernel?
> > >
> > > Yes please.  I'm suspecting the code in write_sb_page where it tests
> > > if the bitmap overlaps the data or metadata.  The only way I can see
> > > you getting the exact error that you do get is for that test to fail.
> > > That test was introduced in 2.6.22.  Did you backport that?  Any
> > > chance it got mucked up a bit?
> >
> > I believe you're referring to commit
> > f0d76d70bc77b9b11256a3a23e98e80878be1578.  That change actually made it
> > into 2.6.23 AFAIK; but yes, I actually did backport that fix (which
> > depended on ab6085c795a71b6a21afe7469d30a365338add7a).  If I back out
> > f0d76d70bc77b9b11256a3a23e98e80878be1578 I can create a raid1 w/ v1.0
> > sb and an internal bitmap.  But clearly that is just because I removed
> > the negative checks that you introduced ;)
> >
> > For me this begs the question: what else would
> > f0d76d70bc77b9b11256a3a23e98e80878be1578 depend on that I missed?  I
> > included 505fa2c4a2f125a70951926dfb22b9cf273994f1 and
> > ab6085c795a71b6a21afe7469d30a365338add7a too.  *shrug*...
>
> This is all very odd...  I definitely tested this last week and couldn't
> reproduce the problem.  This week I can reproduce it easily.  And given
> the nature of the bug, I cannot see how it ever worked.
>
> Anyway, here is a fix that works for me.

Hey Neil,

Your fix works for me too.
However, I'm wondering why you held back on fixing the same issue in the
"bitmap runs in to data" comparison that follows:

--- ./drivers/md/bitmap.c	2007-10-19 19:11:58.000000000 -0400
+++ ./drivers/md/bitmap.c	2007-10-22 09:53:41.000000000 -0400
@@ -286,7 +286,7 @@
 			/* METADATA BITMAP DATA */
 			if (rdev->sb_offset*2 + bitmap->offset
-			    + page->index*(PAGE_SIZE/512) + size/512
+			    + (long)(page->index*(PAGE_SIZE/512)) + size/512
 			    > rdev->data_offset)
 				/* bitmap runs in to data */
 				return -EINVAL;

Thanks,
Mike
Re: flaky controller or disk error?
On Mon, 22 Oct 2007, Louis-David Mitterrand wrote:

> Hi,
>
> [using kernel 2.6.23 and mdadm 2.6.3+20070929]
>
> I have a rather flaky sata controller with which I am trying to resync a
> raid5 array.  It usually starts failing after 40% of the resync is done.
> Short of changing the controller (which I will do later this week), is
> there a way to have mdadm resume the resync where it left off at reboot
> time?
>
> Here is the error I am seeing in the syslog.  Can this actually be a
> disk error?
>
> Oct 18 11:54:34 sylla kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
> Oct 18 11:54:34 sylla kernel: ata1.00: irq_stat 0x0040, PHY RDY changed
> Oct 18 11:54:34 sylla kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
> Oct 18 11:54:34 sylla kernel:          res 40/00:00:19:26:33/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
> Oct 18 11:54:35 sylla kernel: ata1: soft resetting port
> Oct 18 11:54:40 sylla kernel: ata1: failed to reset engine (errno=-95)
> Oct 18 11:54:40 sylla kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
> Oct 18 11:54:45 sylla kernel: ata1: softreset failed (device not ready)
> Oct 18 11:54:45 sylla kernel: ata1: hard resetting port
> Oct 18 11:54:46 sylla kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Oct 18 11:54:46 sylla kernel: ata1.00: configured for UDMA/133
> Oct 18 11:54:46 sylla kernel: ata1: EH complete
> Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
> Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write Protect is off
> Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
> Thanks,

I've seen something similar; it turned out to be a bad disk.
I've also seen it when the cable was loose.

Justin.
Re: slow raid5 performance
Does anyone have any insights here?  How do I interpret the seemingly
competing system / iowait numbers... is my system both CPU and PCI bus
bound?

----- Original Message ----
From: nefilim
To: linux-raid@vger.kernel.org
Sent: Thursday, October 18, 2007 4:45:20 PM
Subject: slow raid5 performance

Hi

Pretty new to software raid, I have the following setup in a file server:

/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Oct 10 11:05:46 2007
     Raid Level : raid5
     Array Size : 976767872 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Thu Oct 18 15:02:16 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           UUID : 9dcbd480:c5ca0550:ca45cdab:f7c9f29d
         Events : 0.9

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1

3 x 500GB WD RE2 hard drives
AMD Athlon XP 2400 (2.0GHz), 1GB RAM
/dev/sd[ab] are connected to Sil 3112 controller on PCI bus
/dev/sd[cde] are connected to Sil 3114 controller on PCI bus

Transferring large media files from /dev/sdb to /dev/md0 I see the
following with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00   55.56   40.40    0.00    3.03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             261.62        31.09         0.00         30          0
sdc             148.48         0.15        16.40          0         16
sdd             102.02         0.41        16.14          0         15
sde             113.13         0.29        16.18          0         16
md0            8263.64         0.00        32.28          0         31

which is pretty much what I see with hdparm etc.  32MB/s seems pretty slow
for drives that can easily do 50MB/s each.  Read performance is better,
around 85MB/s (although I expected somewhat higher).  So it doesn't seem
that the PCI bus is the limiting factor here (127MB/s theoretical
throughput... 100MB/s real world?) quite yet...  I see a lot of time being
spent in the kernel... and a significant iowait time.
The CPU is pretty old, but where exactly is the bottleneck?  Any thoughts,
insights or recommendations welcome!

Cheers
Peter

--
View this message in context: http://www.nabble.com/slow-raid5-performance-tf4650085.html#a13284909
Sent from the linux-raid mailing list archive at Nabble.com.
Re: Fwd: issues rebuilding raid array.
----- Message from [EMAIL PROTECTED] ---------
    Date: Mon, 22 Oct 2007 21:46:08 +1000
    From: Sam Redfern <[EMAIL PROTECTED]>
Reply-To: Sam Redfern <[EMAIL PROTECTED]>
 Subject: Fwd: issues rebuilding raid array.
      To: linux-raid@vger.kernel.org

> The array was built using 2.6.18-7.  Now I'm using 2.6.21-2.
>
> I'm trying to recreate the raid array with the following command and this
> is the error I get:
>
> mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> mdadm: looking for devices for /dev/md1
> mdadm: no RAID superblock on /dev/sdc
> mdadm: /dev/sdc has no superblock - assembly aborted
>
> So I figure, oh look the disk sdc has gone cactus, I'll just remove it
> from the list.  One of the advantages of mdadm.
>
> mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
> mdadm: looking for devices for /dev/md1
> mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
> mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
> mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
> mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
> mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
> mdadm: added /dev/sde to /dev/md1 as 1
> mdadm: no uptodate device for slot 2 of /dev/md1
> mdadm: no uptodate device for slot 3 of /dev/md1
> mdadm: added /dev/sdg to /dev/md1 as 4
> mdadm: added /dev/sdf to /dev/md1 as 5
> mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
> mdadm: added /dev/sdd to /dev/md1 as 0
> mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
> I found this really difficult to understand considering that I can get
> the output of mdadm -E /dev/sdb (other disks included to overload you
> with information):
>
> mdadm -E /dev/sd[b-h]
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
>   Creation Time : Fri Oct  5 09:18:25 2007
>      Raid Level : raid5
>     Device Size : 312571136 (298.09 GiB 320.07 GB)
>      Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
>    Raid Devices : 6
>   Total Devices : 6
> Preferred Minor : 1
>     Update Time : Tue Oct 16 20:03:13 2007
>           State : clean
>  Active Devices : 6
> Working Devices : 6
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 80d47486 - correct
>          Events : 0.623738
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>       Number   Major   Minor   RaidDevice State
> this     6       8       16       -1      spare   /dev/sdb
>    0     0       8       80        0      active sync   /dev/sdf
>    1     1       8      128        1      active sync   /dev/.static/dev/sdi
>    2     2       8      144        2      active sync   /dev/.static/dev/sdj
>    3     3       8       16        3      active sync   /dev/sdb
>    4     4       8       64        4      active sync   /dev/sde
>    5     5       8       96        5      active sync   /dev/sdg
>
> If anyone could offer a solution I'd be forever grateful.  Also, to prove
> that supporting open source isn't all free labour, you can choose 1 of 2
> Nintendo DS games, the new Radiohead album or a Cree flashlight. :)

----- End message from [EMAIL PROTECTED] -----

Hey, this looks similar to what I recently had.
(http://www.mail-archive.com/linux-raid@vger.kernel.org/msg09306.html)
In my case a RAID5 reshape was interrupted and the new devices were also
marked spare with slot -1.

Apply the attached patch to mdadm-2.6.3, build, then do:

mdadm -S /dev/md1
./mdadm -Av /dev/md1 --update=this /dev/sd[b-g]

That should update the slot on /dev/sdb.  Then:

mdadm -S /dev/md1
./mdadm -Av /dev/md1 /dev/sd[bcdefg]

should bring back your array in degraded mode.

If it works, send your gifts to Neil Brown <[EMAIL PROTECTED]>; he wrote the
patch! :)

Good luck!
#    _  __          _ __            http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _    [EMAIL PROTECTED] \n +491776461165        #
#  /    / _ `/ _ `/ / / // /  ' \   Amiga (68k/PPC): AOS/NetBSD/Linux         #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux        #
#           /___/                   x86: FreeBSD/Linux/Solaris/Win2k ARM9: EPOC EV6 #
#   cakebox.homeunix.net - all the machine one needs..                        #

diff --git a/Grow.c b/Grow.c
index 825747e..8ad1537 100644
--- a/Grow.c
+++ b/Grow.c
@@ -978,5 +978,5 @@ int Grow_restart(struct supertype *st, struct mdinfo *info, int *fdlist, int cnt
 		/* And we are done! */
 		return 0;
 	}
-	return 1;
+	return 0;
 }
diff --git a/mdadm.c b/mdadm.c
index 40fdccf..7e7e803 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -584,6 +584,8 @@ int main(int argc, char *argv[])
 				exit(2);
 			}
 			update = optarg;
+			if (strcmp(update, "this")==0)
Re: slow raid5 performance
----- Original Message ----
From: Peter Grandi <[EMAIL PROTECTED]>

Thank you for your insightful response Peter (Yahoo spam filter hid it from
me until now).

> Most 500GB drives can do 60-80MB/s on the outer tracks (30-40MB/s on the
> inner ones), and 3 together can easily swamp the PCI bus.  While you see
> the write rates of two disks, the OS is really writing to all three disks
> at the same time, and it will do read-modify-write unless the writes are
> exactly stripe aligned.  When RMW happens, write speed is lower than
> writing to a single disk.

I can understand that if a RMW happens it will effectively lower the write
throughput substantially, but I'm not entirely sure why this would happen
while writing new content; I don't know enough about RAID internals.  Would
this be the case the majority of the time?

> The system time is because the Linux page cache etc. is CPU bound (never
> mind RAID5 XOR computation, which is not that big).  The IO wait is
> because IO is taking place.
>
> http://www.sabi.co.uk/blog/anno05-4th.html#051114
>
> Almost all kernel developers of note have been hired by wealthy
> corporations who sell to people buying large servers.  The typical
> systems that these developers have, and also target, are high-end 2-4
> CPU workstations and servers, with CPUs many times faster than your PC,
> and on those systems the CPU overhead of the page cache at speeds like
> yours is less than 5%.  My impression is that something that takes less
> than 5% on a developer's system does not get looked at, even if it takes
> 50% on your system.  The Linux kernel was very efficient when most
> developers were using old cheap PCs themselves.  "Scratch your itch"
> rules.

This is a rather unfortunate situation; it seems that some of the roots are
forgotten, especially in a case like this where one would think running a
file server on a modest CPU should be enough.
I was waiting for Phenom and AM2+ motherboards to become available before
relegating this X2 4600+ to file server duty; guess I'll need to stay with
the slow performance for a few more months.

> Anyhow, try to bypass the page cache with 'O_DIRECT' or test with 'dd
> oflag=direct' and similar for an alternative code path.

I'll give this a try, thanks.

> Misaligned writes and page cache CPU time most likely.

What influence would adding more harddrives to this RAID have?  I know in
terms of a Netapp filer they always talk about spindle count for
performance.
Re: slow raid5 performance
Thanks Justin, good to hear about some real world experience. - Original Message From: Justin Piszcz [EMAIL PROTECTED] To: Peter [EMAIL PROTECTED] Cc: linux-raid@vger.kernel.org Sent: Monday, October 22, 2007 9:58:16 AM Subject: Re: slow raid5 performance With SW RAID 5 on the PCI bus you are not going to see faster than 38-42 MiB/s. Especially with only three drives it may be slower than that. Forget / stop using the PCI bus and expect high transfer rates. For writes = 38-42 MiB/s sw raid5. For reads = you will get close to 120-122 MiB/s sw raid5. This is from a lot of testing going up to 400GB x 10 drives using PCI cards on a regular PCI bus. Then I went PCI-e and used faster disks to get 0.5gigabytes/sec SW raid5. Justin. On Mon, 22 Oct 2007, Peter wrote: Does anyone have any insights here? How do I interpret the seemingly competing system iowait numbers... is my system both CPU and PCI bus bound? - Original Message From: nefilim To: linux-raid@vger.kernel.org Sent: Thursday, October 18, 2007 4:45:20 PM Subject: slow raid5 performance Hi Pretty new to software raid, I have the following setup in a file server: /dev/md0: Version : 00.90.03 Creation Time : Wed Oct 10 11:05:46 2007 Raid Level : raid5 Array Size : 976767872 (931.52 GiB 1000.21 GB) Used Dev Size : 488383936 (465.76 GiB 500.11 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Thu Oct 18 15:02:16 2007 State : active Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K UUID : 9dcbd480:c5ca0550:ca45cdab:f7c9f29d Events : 0.9 Number Major Minor RaidDevice State 0 8 330 active sync /dev/sdc1 1 8 491 active sync /dev/sdd1 2 8 652 active sync /dev/sde1 3 x 500GB WD RE2 hard drives AMD Athlon XP 2400 (2.0Ghz), 1GB RAM /dev/sd[ab] are connected to Sil 3112 controller on PCI bus /dev/sd[cde] are connected to Sil 3114 controller on PCI bus Transferring large media files from /dev/sdb to 
/dev/md0 I see the following with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00   55.56   40.40    0.00    3.03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             261.62        31.09         0.00         30          0
sdc             148.48         0.15        16.40          0         16
sdd             102.02         0.41        16.14          0         15
sde             113.13         0.29        16.18          0         16
md0            8263.64         0.00        32.28          0         31

which is pretty much what I see with hdparm etc. 32MB/s seems pretty slow for drives that can easily do 50MB/s each. Read performance is better, around 85MB/s (although I expected somewhat higher). So it doesn't seem that the PCI bus is the limiting factor here (127MB/s theoretical throughput... 100MB/s real world?) quite yet. I see a lot of time being spent in the kernel, and significant iowait time. The CPU is pretty old, but where exactly is the bottleneck? Any thoughts, insights or recommendations welcome!

Cheers,
Peter

--
View this message in context: http://www.nabble.com/slow-raid5-performance-tf4650085.html#a13284909
Sent from the linux-raid mailing list archive at Nabble.com.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: slow raid5 performance
Peter wrote:

> Thanks Justin, good to hear about some real world experience.

Hi Peter,

I recently built a 3 drive RAID5 using the onboard SATA controllers on an MCP55 based board and get around 115MB/s write and 141MB/s read. A fourth drive was added some time later, and after growing the array and filesystem (XFS), I saw 160MB/s write and 178MB/s read, with the array 60% full.

Regards,
Richard
Re: slow raid5 performance
On Tue, 23 Oct 2007, Richard Scobie wrote:

> Peter wrote:
>> Thanks Justin, good to hear about some real world experience.
>
> Hi Peter,
>
> I recently built a 3 drive RAID5 using the onboard SATA controllers on
> an MCP55 based board and get around 115MB/s write and 141MB/s read.
> A fourth drive was added some time later and after growing the array
> and filesystem (XFS), saw 160MB/s write and 178MB/s read, with the
> array 60% full.
>
> Regards,
> Richard

Yes, your chipset must be PCI-e based and not PCI.

Justin.
Re: Time to deprecate old RAID formats?
[ I was going to reply to this earlier, but the Red Sox and good weather got in the way this weekend. ;-]

>>>>> "Michael" == Michael Tokarev [EMAIL PROTECTED] writes:

Michael> I'm doing a sysadmin work for about 15 or 20 years.

Welcome to the club! It's a fun career, always something new to learn.

>> If you are going to mirror an existing filesystem, then by definition
>> you have a second disk or partition available for the purpose. So you
>> would merely setup the new RAID1, in degraded mode, using the new
>> partition as the base. Then you copy the data over to the new RAID1
>> device, change your boot setup, and reboot.

Michael> And you have to copy the data twice as a result, instead of
Michael> copying it only once to the second disk.

So? Why is this such a big deal? As I see it, there are two separate ways to setup a RAID1 mirror on an OS.

1. The mirror is built ahead of time and you install onto the mirror. And twice as much data gets written, half to each disk. *grin*

2. You are encapsulating an existing OS install and you need to do a reboot from the un-mirrored OS to the mirrored setup. So yes, you do have to copy the data from the original to the mirror, reboot, then resync back onto the original disk which has been added into the RAID set.

Neither case is really that big a deal. And with the RAID superblock at the front of the disk, you don't have to worry about mixing up which disk is which. It's not fun when you boot one disk, thinking it's the RAID disk, but end up booting the original disk. As Doug says, and I agree strongly, you DO NOT want to have the possibility of confusion and data loss, especially on bootup.

Michael> And there are different point of views, and different settings
Michael> etc.
Michael> For example, I once dealt with a linux user who was
Michael> unable to use his disk partition, because his system (it was
Michael> RedHat if I remember correctly) recognized some LVM volume on
Michael> his disk (it was previously used with Windows) and tried to
Michael> automatically activate it, thus making it busy. What I'm
Michael> talking about here is that any automatic activation of
Michael> anything should be done with extreme care, using smart logic
Michael> in the startup scripts if at all.

Ah... but you can also de-activate LVM partitions as well if you like.

Michael> The Doug's example - in my opinion anyway - shows wrong tools
Michael> or bad logic in the startup sequence, not a general flaw in
Michael> superblock location.

I don't agree completely. I think the superblock location is a key issue, because if you have a superblock location which moves depending on the filesystem or LVM you use to look at the partition (or full disk), then you need to be even more careful about how to poke at things. This is really true when you use the full disk for the mirror, because then you don't have the partition table to base some initial guesstimates on.

Since there is an explicit Linux RAID partition type, as well as an explicit Linux filesystem type (the filesystem is then decoded from the first Nk of the partition), you have a modicum of safety. If ext3 has the superblock in the first 4k of the disk, but you've setup the disk to use RAID1 with the LVM superblock at the end of the disk, you now need to be careful about how the disk is detected and then mounted. To the ext3 detection logic, it looks like an ext3 filesystem; to LVM, it looks like a RAID partition. Which is correct? Which is wrong? How do you tell programmatically?

That's why I think all superblocks should be in the SAME location on the disk and/or partitions if used. It keeps down problems like this.
Michael> Another example is ext[234]fs - it does not touch first 512
Michael> bytes of the device, so if there was an msdos filesystem
Michael> there before, it will be recognized as such by many tools,
Michael> and an attempt to mount it automatically will lead to at
Michael> least scary output and nothing mounted, or in fsck doing
Michael> fatal things to it in worst scenario. Sure thing the first
Michael> 512 bytes should be just cleared.. but that's another topic.

I would argue that ext[234] should be clearing those 512 bytes. Why aren't they cleared?

Michael> Speaking of cases where it was really helpful to have an
Michael> ability to mount individual raid components directly without
Michael> the raid level - most of them was due to one or another
Michael> operator errors, usually together with bugs and/or omissions
Michael> in software. I don't remember exact scenarios anymore (last
Michael> time it was more than 2 years ago). Most of the time it was
Michael> one or another sort of system recovery.

In this case, you're only talking about RAID1 mirrors; no other RAID configuration fits this scenario. And while this might look to be helpful, I would strongly argue that it's not, because it's a special case of the RAID code and can lead to all kinds of bugs and problems if it's not
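[ For readers following along: the encapsulation path discussed above (case 2) is usually driven with commands along these lines. This is a hypothetical sketch - /dev/sda1 as the existing system, /dev/sdb1 as the new empty partition, and ext3 are all placeholder choices, and boot loader setup is elided: ]

```shell
# Build the mirror degraded, with only the new partition in it;
# "missing" reserves the slot for the original disk.
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1

# Put a filesystem on the mirror and copy the existing system over.
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt
cp -ax / /mnt

# ...update the boot loader and fstab, then reboot onto /dev/md0...

# Finally fold the original disk into the array; md resyncs onto it,
# which is the second copy of the data Michael is objecting to.
mdadm --add /dev/md0 /dev/sda1
```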
Re: slow raid5 performance
On Mon, 22 Oct 2007 15:33:09 -0400 (EDT), Justin Piszcz [EMAIL PROTECTED] said:

[ ... speed difference between PCI and PCIe RAID HAs ... ]

>> I recently built a 3 drive RAID5 using the onboard SATA controllers
>> on an MCP55 based board and get around 115MB/s write and 141MB/s read.
>> A fourth drive was added some time later and after growing the array
>> and filesystem (XFS), saw 160MB/s write and 178MB/s read, with the
>> array 60% full.

jpiszcz> Yes, your chipset must be PCI-e based and not PCI.

Broadly speaking yes (the MCP55 is a PCIe chipset), but it is more complicated than that. The south bridge chipset host adapters often have a rather faster link to memory and the CPU interconnect than the PCI or PCIe buses can provide, even when they are externally ''PCI''. Also, when the RAID HA is not in-chipset, it matters a fair bit how many lanes the PCIe slot it is plugged into provides (or whether it is 64-bit, 66MHz PCI-X) -- most PCIe RAID HAs can use 4 or 8 lanes (or the equivalent for PCI-X).
Re: Time to deprecate old RAID formats?
John Stoffel wrote:

>>>>>> "Michael" == Michael Tokarev [EMAIL PROTECTED] writes:
>
>>> If you are going to mirror an existing filesystem, then by definition
>>> you have a second disk or partition available for the purpose. So you
>>> would merely setup the new RAID1, in degraded mode, using the new
>>> partition as the base. Then you copy the data over to the new RAID1
>>> device, change your boot setup, and reboot.
>
> Michael> And you have to copy the data twice as a result, instead of
> Michael> copying it only once to the second disk.
>
> So? Why is this such a big deal? As I see it, there are two separate
> ways to setup a RAID1 setup, on an OS. [..]

That was just a tiny nitpick, so to say, about a particular way to convert an existing system into raid1 - not something which is done every day anyway. Still, doubling the time to copy your terabyte-sized drive is something to consider.

[]

> Michael> automatically activate it, thus making it busy. What I'm
> Michael> talking about here is that any automatic activation of
> Michael> anything should be done with extreme care, using smart logic
> Michael> in the startup scripts if at all.
>
> Ah... but you can also de-activate LVM partitions as well if you like.

Yes, especially being a newbie user who has just installed linux on his PC, only to see that he can't use his disk.. ;) That was a real situation - I helped someone who had never heard of LVM and had done little of anything with filesystems/disks before.

> Michael> The Doug's example - in my opinion anyway - shows wrong tools
> Michael> or bad logic in the startup sequence, not a general flaw in
> Michael> superblock location.
>
> I don't agree completely. I think the superblock location is a key
> issue, because if you have a superblock location which moves depending
> on the filesystem or LVM you use to look at the partition (or full
> disk) then you need to be even more careful about how to poke at
> things.

Superblock location does not depend on the filesystem.
Raid exports only the inside space, excluding superblocks, to the next level (filesystem or whatever else).

> This is really true when you use the full disk for the mirror, because
> then you don't have the partition table to base some initial
> guesstimates on. Since there is an explicit Linux RAID partition type,
> as well as an explicit linux filesystem (filesystem is then decoded
> from the first Nk of the partition), you have a modicum of safety.

Speaking of whole disks - first, don't do that (for reasons suitable for another topic), and second, using the whole disk or partitions makes no real difference whatsoever to the topic being discussed.

There's just no need for the guesswork, except for the first install (to automatically recognize existing devices, and to use them after confirmation), and maybe for rescue systems, which again is a different topic. In any case, for a tool that does guesswork (like libvolume-id, to create /dev/ symlinks), it's as easy to look at the end of the device as at the beginning or any other fixed place - since the tool has to know the superblock format, it knows the superblock location as well. Maybe manual guesswork, based on a hexdump of the first several kilobytes of data, is a bit more difficult when the superblock is located at the end. But if one has to analyze a hexdump, he doesn't care about raid anymore.

> If ext3 has the superblock in the first 4k of the disk, but you've
> setup the disk to use RAID1 with the LVM superblock at the end of the
> disk, you now need to be careful about how the disk is detected and
> then mounted.

See above. For tools, it's trivial to distinguish a component of a raid volume from the volume itself, by looking for the superblock at whatever location. Including stuff like mkfs, which - like mdadm does - may warn one about previous filesystem/volume information on the device in question.
> Michael> Speaking of cases where it was really helpful to have an
> Michael> ability to mount individual raid components directly without
> Michael> the raid level - most of them was due to one or another
> Michael> operator errors, usually together with bugs and/or omissions
> Michael> in software. I don't remember exact scenarios anymore (last
> Michael> time it was more than 2 years ago). Most of the time it was
> Michael> one or another sort of system recovery.
>
> In this case, you're only talking about RAID1 mirrors, no other RAID
> configuration fits this scenario. And while this might look to be

Definitely. However, linear can - to some extent - be used partially too, though with much less usefulness. Still, raid1 is a much more common setup than anything else - IMHO anyway. It's the cheapest and most reliable option for an average user: it's cheaper to get 2 large drives than, say, 3 somewhat smaller drives. Yes, raid1 wastes 1/2 the space, compared with, say, raid5 on top of 3 drives (only 1/3 wasted), but 3 smallish drives still cost more than 2 larger ones.

> helpful, I would strongly argue that it's not, because it's a special
mdadm devices building in the wrong order
Hello,

I am having a rather urgent and annoying problem and I would appreciate some input from anyone who has come across this. I have not been able to find a solution as of yet. My issue deals with nested raid using mdadm, and it seems that upon a reboot mdadm is attempting to assemble the larger array before the smaller component array is created, and thus it is failing. I have a degraded raid 5 array md1 which is composed of hda1 and md0. Upon a reboot, mdadm attempts to build md1 before md0 is built. It fails, so md1 is not built and I need to assemble it manually. Is there a solution for this?

Thank you for your time,
Marc

p.s. I would like to also take a second to express my gratitude to Neil Brown for the mdadm utility. I have found it very useful and it has made working with raid in linux very enjoyable and straight-forward!
Re: mdadm devices building in the wrong order
On Monday October 22, [EMAIL PROTECTED] wrote:

> Hello,
>
> I am having a rather urgent and annoying problem and I would appreciate
> some input from anyone who has come across this. I have not been able
> to find a solution as of yet. My issue deals with nested raid using
> mdadm, and it seems that upon a reboot mdadm is attempting to assemble
> the larger array before the smaller component array is created, and
> thus it is failing. I have a degraded raid 5 array md1 which is
> composed of hda1 and md0. Upon a reboot, mdadm attempts to build md1
> before md0 is built. It fails, so md1 is not built and I need to
> assemble it manually. Is there a solution for this?

What order are the arrays listed in, in mdadm.conf? If md1 comes first, put it last. Otherwise, I cannot think what might be happening. Maybe if you include some kernel logs, that might help.

> Thank you for your time,
> Marc
>
> p.s. I would like to also take a second to express my gratitude to Neil
> Brown for the mdadm utility. I have found it very useful and it has
> made working with raid in linux very enjoyable and straight-forward!

Thanks :-)

NeilBrown
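[ Neil's suggestion amounts to ordering the ARRAY lines in mdadm.conf so the inner array is assembled first. A hypothetical sketch matching Marc's layout - the UUIDs are placeholders: ]

```
DEVICE partitions

# Inner array first, so /dev/md0 exists before anything built on it.
ARRAY /dev/md0 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd

# Outer degraded raid5 of hda1 + md0, assembled second.
ARRAY /dev/md1 UUID=eeeeeeee:ffffffff:11111111:22222222
```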
Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap
On Monday October 22, [EMAIL PROTECTED] wrote:

> Hey Neil,
>
> Your fix works for me too. However, I'm wondering why you held back on
> fixing the same issue in the "bitmap runs in to data" comparison that
> follows:

It isn't really needed here. In this case bitmap->offset is positive, so all the numbers are positive, and it doesn't matter whether the comparison is signed or not. Thanks for mentioning it though.

NeilBrown

> --- ./drivers/md/bitmap.c	2007-10-19 19:11:58.0 -0400
> +++ ./drivers/md/bitmap.c	2007-10-22 09:53:41.0 -0400
> @@ -286,7 +286,7 @@
>  			/* METADATA BITMAP DATA */
>  			if (rdev->sb_offset*2 + bitmap->offset
> -			    + page->index*(PAGE_SIZE/512) + size/512
> +			    + (long)(page->index*(PAGE_SIZE/512)) + size/512
>  			    > rdev->data_offset)
>  				/* bitmap runs in to data */
>  				return -EINVAL;
>
> Thanks,
> Mike