Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-22 Thread Neil Brown
On Friday October 19, [EMAIL PROTECTED] wrote:
 On 10/19/07, Neil Brown [EMAIL PROTECTED] wrote:
  On Friday October 19, [EMAIL PROTECTED] wrote:
 
   I'm using a stock 2.6.19.7 that I then backported various MD fixes to
   from 2.6.20 - 2.6.23...  this kernel has worked great until I
   attempted v1.0 sb w/ bitmap=internal using mdadm 2.6.x.
  
   But would you like me to try a stock 2.6.22 or 2.6.23 kernel?
 
  Yes please.
  I'm suspecting the code in write_sb_page where it tests if the bitmap
  overlaps the data or metadata.  The only way I can see you getting the
  exact error that you do get is for that test to fail.
  That test was introduced in 2.6.22.  Did you backport that?  Any
  chance it got mucked up a bit?
 
 I believe you're referring to commit
 f0d76d70bc77b9b11256a3a23e98e80878be1578.  That change actually made
 it into 2.6.23 AFAIK; but yes I actually did backport that fix (which
 depended on ab6085c795a71b6a21afe7469d30a365338add7a).
 
 If I back-out f0d76d70bc77b9b11256a3a23e98e80878be1578 I can create a
 raid1 w/ v1.0 sb and an internal bitmap.  But clearly that is just
 because I removed the negative checks that you introduced ;)
 
 For me this begs the question: what else would
 f0d76d70bc77b9b11256a3a23e98e80878be1578 depend on that I missed?  I
 included 505fa2c4a2f125a70951926dfb22b9cf273994f1 and
   ab6085c795a71b6a21afe7469d30a365338add7a too.
 
 *shrug*...
 

This is all very odd...
I definitely tested this last week and couldn't reproduce the
problem.  This week I can reproduce it easily.  And given the nature
of the bug, I cannot see how it ever worked.

Anyway, here is a fix that works for me.

NeilBrown


Fix an unsigned compare to allow creation of bitmaps with v1.0 metadata.

As page->index is unsigned, this all becomes an unsigned comparison, which
almost always returns an error.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c   2007-10-22 16:47:52.0 +1000
+++ ./drivers/md/bitmap.c   2007-10-22 16:50:10.0 +1000
@@ -274,7 +274,7 @@ static int write_sb_page(struct bitmap *
 	if (bitmap->offset < 0) {
 		/* DATA  BITMAP METADATA  */
 		if (bitmap->offset
-		    + page->index * (PAGE_SIZE/512)
+		    + (long)(page->index * (PAGE_SIZE/512))
 		    + size/512 > 0)
 			/* bitmap runs in to metadata */
 			return -EINVAL;
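
For anyone puzzled by the one-character nature of the fix: page->index is an
unsigned pgoff_t, so without the cast the negative bitmap->offset is converted
to unsigned long and the sum can never test as negative.  A minimal standalone
sketch of that promotion rule (the numbers are made up; only C's usual
arithmetic conversions are the point):

/* Sketch only: mimics the overlap test above with made-up values. */
#include <stdio.h>

int main(void)
{
	long offset = -1024;          /* like a negative bitmap->offset (sectors) */
	unsigned long index = 1;      /* like page->index: pgoff_t is unsigned    */
	long sectors_per_page = 8;    /* PAGE_SIZE/512 with 4 KiB pages           */
	long size = 4096;             /* one page worth of bitmap                 */

	/* mixed arithmetic: offset is converted to unsigned long and wraps,
	 * so the "bitmap runs in to metadata" branch is taken almost always */
	if (offset + index * sectors_per_page + size/512 > 0)
		printf("mixed compare: spurious overlap -> -EINVAL\n");

	/* signed arithmetic, as after the one-line fix above */
	if (offset + (long)(index * sectors_per_page) + size/512 > 0)
		printf("signed compare: overlap\n");
	else
		printf("signed compare: no overlap, creation succeeds\n");
	return 0;
}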


[PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-10-22 Thread NeilBrown

It appears that a couple of bugs slipped in to md for 2.6.23.
These two patches fix them and are appropriate for 2.6.23.y as well
as 2.6.24-rcX

Thanks,
NeilBrown

 [PATCH 001 of 2] md: Fix an unsigned compare to allow creation of bitmaps with 
v1.0 metadata.
 [PATCH 002 of 2] md: raid5: fix clearing of biofill operations


[PATCH 001 of 2] md: Fix an unsigned compare to allow creation of bitmaps with v1.0 metadata.

2007-10-22 Thread NeilBrown

As page->index is unsigned, this all becomes an unsigned comparison, which
almost always returns an error.

Signed-off-by: Neil Brown [EMAIL PROTECTED]
Cc: Stable [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c   2007-10-22 16:55:48.0 +1000
+++ ./drivers/md/bitmap.c   2007-10-22 16:55:52.0 +1000
@@ -274,7 +274,7 @@ static int write_sb_page(struct bitmap *
 	if (bitmap->offset < 0) {
 		/* DATA  BITMAP METADATA  */
 		if (bitmap->offset
-		    + page->index * (PAGE_SIZE/512)
+		    + (long)(page->index * (PAGE_SIZE/512))
 		    + size/512 > 0)
 			/* bitmap runs in to metadata */
 			return -EINVAL;


[PATCH 002 of 2] md: raid5: fix clearing of biofill operations

2007-10-22 Thread NeilBrown

From: Dan Williams [EMAIL PROTECTED]

ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits.  Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.

Move the clearing of these bits to handle_stripe5(), under the lock.

Signed-off-by: Dan Williams [EMAIL PROTECTED]
Tested-by: Joël Bertrand [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]
Cc: Stable [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c2007-10-22 16:55:49.0 +1000
+++ ./drivers/md/raid5.c2007-10-22 16:57:41.0 +1000
@@ -665,7 +665,12 @@ static unsigned long get_stripe_work(str
 		ack++;
 
 	sh->ops.count -= ack;
-	BUG_ON(sh->ops.count < 0);
+	if (unlikely(sh->ops.count < 0)) {
+		printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+			"ops.complete: %#lx\n", pending, sh->ops.pending,
+			sh->ops.ack, sh->ops.complete);
+		BUG();
+	}
 
 	return pending;
 }
@@ -842,8 +847,7 @@ static void ops_complete_biofill(void *s
}
}
}
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
 
return_io(return_bi);
 
@@ -3130,6 +3134,13 @@ static void handle_stripe5(struct stripe
 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
+	/* clean-up completed biofill operations */
+	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+	}
+
 	rcu_read_lock();
 	for (i=disks; i--; ) {
 		mdk_rdev_t *rdev;
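
To see the locking discipline in isolation, here is a small userspace sketch
(a pthreads analogy, not the raid5 code itself) of the pattern the patch
enforces: the completion path only sets the complete bit outside the lock,
and pending/ack/complete are only ever cleared together while the lock is
held, so a reader under the lock always sees a consistent snapshot.

#include <pthread.h>
#include <stdio.h>

#define OP_BIOFILL 0x1UL

struct stripe_state {
	pthread_mutex_t lock;
	unsigned long pending, ack, complete;
};

/* runs outside the lock, like ops_complete_biofill(): only sets "complete" */
static void ops_complete(struct stripe_state *s)
{
	__sync_fetch_and_or(&s->complete, OP_BIOFILL);	/* atomic, set_bit-like */
}

/* runs with the lock held, like the new code in handle_stripe5() */
static void handle_stripe(struct stripe_state *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->complete & OP_BIOFILL) {
		s->pending  &= ~OP_BIOFILL;
		s->ack      &= ~OP_BIOFILL;
		s->complete &= ~OP_BIOFILL;
	}
	pthread_mutex_unlock(&s->lock);
}

int main(void)
{
	struct stripe_state s = { PTHREAD_MUTEX_INITIALIZER, OP_BIOFILL, OP_BIOFILL, 0 };

	ops_complete(&s);	/* completion arrives asynchronously            */
	handle_stripe(&s);	/* cleanup happens in one place, under the lock */
	printf("pending=%lx ack=%lx complete=%lx\n", s.pending, s.ack, s.complete);
	return 0;
}

(Build with gcc -pthread.)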


Re: very degraded RAID5, or increasing capacity by adding discs

2007-10-22 Thread Louis-David Mitterrand
On Tue, Oct 09, 2007 at 01:48:50PM +0400, Michael Tokarev wrote:
 
 There still is - at least for ext[23].  Even offline resizers
 can't do resizes from any to any size, extfs developers recommend
 to recreate filesystem anyway if size changes significantly.
 I'm too lazy to find a reference now, it has been mentioned here
 on linux-raid at least this year.  It's sorta like fat (yea, that
 ms-dog filesystem) - when you resize it from, say, 501Mb to 999Mb,
 everything is ok, but if you want to go from 501Mb to 1Gb+1, you
 have to recreate almost all data structures because sizes of
 all internal fields changes - and here it's much safer to just
 re-create it from scratch than trying to modify it in place.
 Sure it's much better for extfs, but the point is still the same.

I'll just mention that I once resized a multi-terabyte ext3 filesystem and 
it took 8+ hours; a comparable XFS online resize lasted all of 10 
seconds! 


Fwd: issues rebuilding raid array.

2007-10-22 Thread Sam Redfern
Greetings happy mdadm users.

I have a little problem that after many hours of searching around I
couldn't seem to solve.

I have upgraded my motherboard and kernel (bad practice I know but the
ICH9R controller needs  2.6.2*+) at the same time.

The array was built using 2.6.18-7. Now I'm using 2.6.21-2.

I'm trying to recreate the raid array with the following command and
this is the error I get:

mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
/dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md1
mdadm: no RAID superblock on /dev/sdc
mdadm: /dev/sdc has no superblock - assembly aborted

So I figure, oh look the disk sdc has gone cactus, I'll just remove it
from the list. One of the advantages of mdadm.

mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
mdadm: added /dev/sde to /dev/md1 as 1
mdadm: no uptodate device for slot 2 of /dev/md1
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: added /dev/sdg to /dev/md1 as 4
mdadm: added /dev/sdf to /dev/md1 as 5
mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
mdadm: added /dev/sdd to /dev/md1 as 0
mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.

I found this really difficult to understand, considering that I can
get the output of mdadm -E /dev/sdb (other disks included to overload
you with information)

mdadm -E /dev/sd[b-h]

/dev/sdb:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
 Raid Level : raid5
Device Size : 312571136 (298.09 GiB 320.07 GB)
 Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1

Update Time : Tue Oct 16 20:03:13 2007
  State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
   Checksum : 80d47486 - correct
 Events : 0.623738

 Layout : left-symmetric
 Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16       -1      spare   /dev/sdb

   0     0       8       80        0      active sync   /dev/sdf
   1     1       8      128        1      active sync   /dev/.static/dev/sdi
   2     2       8      144        2      active sync   /dev/.static/dev/sdj
   3     3       8       16        3      active sync   /dev/sdb
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       96        5      active sync   /dev/sdg
mdadm: No md superblock detected on /dev/sdc.
/dev/sdd:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
 Raid Level : raid5
Device Size : 312571136 (298.09 GiB 320.07 GB)
 Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1

Update Time : Tue Oct 16 20:03:13 2007
  State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
   Checksum : 80d474a8 - correct
 Events : 0.623738

 Layout : left-symmetric
 Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       80        0      active sync   /dev/sdf

   0     0       8       80        0      active sync   /dev/sdf
   1     1       8      128        1      active sync   /dev/.static/dev/sdi
   2     2       8      144        2      active sync   /dev/.static/dev/sdj
   3     3       8       16        3      active sync   /dev/sdb
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       96        5      active sync   /dev/sdg
/dev/sde:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
 Raid Level : raid5
Device Size : 312571136 (298.09 GiB 320.07 GB)
 Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1

Update Time : Tue Oct 16 20:03:13 2007
  State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
   Checksum : 80d474da - correct
 Events : 0.623738

 Layout : left-symmetric
 Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      128        1      active sync   /dev/.static/dev/sdi

   0     0       8       80        0      active sync   /dev/sdf
   1     1       8      128        1      active sync   /dev/.static/dev/sdi
   2     2       8      144        2      

Re: Fwd: issues rebuilding raid array.

2007-10-22 Thread Robin Hill
On Mon Oct 22, 2007 at 09:46:08PM +1000, Sam Redfern wrote:

 Greetings happy mdadm users.
 
 I have a little problem that after many hours of searching around I
 couldn't seem to solve.
 
 I have upgraded my motherboard and kernel (bad practice I know but the
 ICH9R controller needs  2.6.2*+) at the same time.
 
 The array was build using 2.6.18-7 Now i'm using  2.6.21-2
 
 I'm trying to recreate the raid array with the following command and
 this is the error I get:
 
 mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
 /dev/sdf /dev/sdg
 mdadm: looking for devices for /dev/md1
 mdadm: no RAID superblock on /dev/sdc
 mdadm: /dev/sdc has no superblock - assembly aborted
 
You're trying to assemble the array from 6 disks here and one looks to
be dodgy.  That's okay so far.

 So I figure, oh look the disk sdc has gone cactus, I'll just remove it
 from the list. One of the advantages of mdadm.
 
 mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
 mdadm: looking for devices for /dev/md1
 mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
 mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
 mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
 mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
 mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
 mdadm: added /dev/sde to /dev/md1 as 1
 mdadm: no uptodate device for slot 2 of /dev/md1
 mdadm: no uptodate device for slot 3 of /dev/md1
 mdadm: added /dev/sdg to /dev/md1 as 4
 mdadm: added /dev/sdf to /dev/md1 as 5
 mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
 mdadm: added /dev/sdd to /dev/md1 as 0
 mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.
 
Now you're trying to assemble with 5 disks and getting 4 out of 6 in the
array, and one at slot -1 (i.e. a spare).

 If found this really difficult to understand considering that I can
 get the output of mdamd -E /dev/sdb (other disks included to overload
 you with information)
 
 mdadm -E /dev/sd[b-h]
 
 /dev/sdb:
   Magic : a92b4efc
 Version : 00.90.00
UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
   Creation Time : Fri Oct  5 09:18:25 2007
  Raid Level : raid5
 Device Size : 312571136 (298.09 GiB 320.07 GB)
  Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
Raid Devices : 6
   Total Devices : 6
 Preferred Minor : 1
 
 Update Time : Tue Oct 16 20:03:13 2007
   State : clean
  Active Devices : 6
 Working Devices : 6
  Failed Devices : 0
   Spare Devices : 0
Checksum : 80d47486 - correct
  Events : 0.623738
 
  Layout : left-symmetric
  Chunk Size : 64K
 
   Number   Major   Minor   RaidDevice State
 this 6   8   16   -1  spare   /dev/sdb
 
0 0   8   800  active sync   /dev/sdf
1 1   8  1281  active sync   /dev/.static/dev/sdi
2 2   8  1442  active sync   /dev/.static/dev/sdj
3 3   8   163  active sync   /dev/sdb
4 4   8   644  active sync   /dev/sde
5 5   8   965  active sync   /dev/sdg

And here we see that the array has 6 active devices and a spare.  You
currently have 4 working active devices, a failed active device and the
spare.  What's happened to the other device?  You can't get the array
working with 4 out of 6 devices so you'll need to either find the other
active device (and rebuild onto the spare) or get the failed disk
working again.

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




flaky controller or disk error?

2007-10-22 Thread Louis-David Mitterrand
Hi,

[using kernel 2.6.23 and mdadm 2.6.3+20070929]

I have a rather flaky sata controller with which I am trying to resync a raid5
array. It usually starts failing after 40% of the resync is done. Short of
changing the controller (which I will do later this week), is there a way to
have mdadm resume the resync where it left off at reboot time?

Here is the error I am seeing in the syslog. Can this actually be a disk 
error?

Oct 18 11:54:34 sylla kernel: ata1.00: exception Emask 0x10 SAct 0x0 
SErr 0x1 action 0x2 frozen
Oct 18 11:54:34 sylla kernel: ata1.00: irq_stat 0x0040, PHY RDY 
changed
Oct 18 11:54:34 sylla kernel: ata1.00: cmd 
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 
Oct 18 11:54:34 sylla kernel: res 40/00:00:19:26:33/00:00:3a:00:00/40 
Emask 0x10 (ATA bus error)
Oct 18 11:54:35 sylla kernel: ata1: soft resetting port
Oct 18 11:54:40 sylla kernel: ata1: failed to reset engine 
(errno=-95)4ata1: port is slow to respond, please be patient (Status 0xd0)
Oct 18 11:54:45 sylla kernel: ata1: softreset failed (device not ready)
Oct 18 11:54:45 sylla kernel: ata1: hard resetting port
Oct 18 11:54:46 sylla kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 
SControl 300)
Oct 18 11:54:46 sylla kernel: ata1.00: configured for UDMA/133
Oct 18 11:54:46 sylla kernel: ata1: EH complete
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] 976773168 512-byte 
hardware sectors (500108 MB)
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write cache: enabled, 
read cache: enabled, doesn't support DPO or FUA


Thanks,


Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-22 Thread Mike Snitzer
On 10/22/07, Neil Brown [EMAIL PROTECTED] wrote:
 On Friday October 19, [EMAIL PROTECTED] wrote:
  On 10/19/07, Neil Brown [EMAIL PROTECTED] wrote:
   On Friday October 19, [EMAIL PROTECTED] wrote:
 
I'm using a stock 2.6.19.7 that I then backported various MD fixes to
from 2.6.20 - 2.6.23...  this kernel has worked great until I
attempted v1.0 sb w/ bitmap=internal using mdadm 2.6.x.
   
But would you like me to try a stock 2.6.22 or 2.6.23 kernel?
  
   Yes please.
   I'm suspecting the code in write_sb_page where it tests if the bitmap
   overlaps the data or metadata.  The only way I can see you getting the
   exact error that you do get is for that test to fail.
   That test was introduced in 2.6.22.  Did you backport that?  Any
   chance it got mucked up a bit?
 
  I believe you're referring to commit
  f0d76d70bc77b9b11256a3a23e98e80878be1578.  That change actually made
  it into 2.6.23 AFAIK; but yes I actually did backport that fix (which
  depended on ab6085c795a71b6a21afe7469d30a365338add7a).
 
  If I back-out f0d76d70bc77b9b11256a3a23e98e80878be1578 I can create a
  raid1 w/ v1.0 sb and an internal bitmap.  But clearly that is just
  because I removed the negative checks that you introduced ;)
 
  For me this begs the question: what else would
  f0d76d70bc77b9b11256a3a23e98e80878be1578 depend on that I missed?  I
  included 505fa2c4a2f125a70951926dfb22b9cf273994f1 and
ab6085c795a71b6a21afe7469d30a365338add7a too.
 
  *shrug*...
 

 This is all very odd...
 I definitely tested this last week and couldn't reproduce the
 problem.  This week I can reproduce it easily.  And given the nature
 of the bug, I cannot see how it ever worked.

 Anyway, here is a fix that works for me.

Hey Neil,

Your fix works for me too.  However, I'm wondering why you held back
on fixing the same issue in the "bitmap runs in to data" comparison
that follows:

--- ./drivers/md/bitmap.c 2007-10-19 19:11:58.0 -0400
+++ ./drivers/md/bitmap.c 2007-10-22 09:53:41.0 -0400
@@ -286,7 +286,7 @@
 		/* METADATA BITMAP DATA */
 		if (rdev->sb_offset*2
 		    + bitmap->offset
-		    + page->index*(PAGE_SIZE/512) + size/512
+		    + (long)(page->index*(PAGE_SIZE/512)) + size/512
 		    > rdev->data_offset)
 			/* bitmap runs in to data */
 			return -EINVAL;

Thanks,
Mike


Re: flaky controller or disk error?

2007-10-22 Thread Justin Piszcz



On Mon, 22 Oct 2007, Louis-David Mitterrand wrote:


Hi,

[using kernel 2.6.23 and mdadm 2.6.3+20070929]

I have a rather flaky sata controller with which I am trying to resync a raid5
array. It usually starts failing after 40% of the resync is done. Short of
changing the controller (which I will do later this week), is there a way to
have mdmadm resume the resync where it left at reboot time?

Here is the error I am seeing in the syslog. Can this actually be a disk
error?

Oct 18 11:54:34 sylla kernel: ata1.00: exception Emask 0x10 SAct 0x0 
SErr 0x1 action 0x2 frozen
Oct 18 11:54:34 sylla kernel: ata1.00: irq_stat 0x0040, PHY RDY 
changed
Oct 18 11:54:34 sylla kernel: ata1.00: cmd 
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Oct 18 11:54:34 sylla kernel: res 40/00:00:19:26:33/00:00:3a:00:00/40 
Emask 0x10 (ATA bus error)
Oct 18 11:54:35 sylla kernel: ata1: soft resetting port
Oct 18 11:54:40 sylla kernel: ata1: failed to reset engine 
(errno=-95)4ata1: port is slow to respond, please be patient (Status 0xd0)
Oct 18 11:54:45 sylla kernel: ata1: softreset failed (device not ready)
Oct 18 11:54:45 sylla kernel: ata1: hard resetting port
Oct 18 11:54:46 sylla kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 
SControl 300)
Oct 18 11:54:46 sylla kernel: ata1.00: configured for UDMA/133
Oct 18 11:54:46 sylla kernel: ata1: EH complete
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] 976773168 512-byte 
hardware sectors (500108 MB)
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write Protect is off
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 18 11:54:46 sylla kernel: sd 0:0:0:0: [sda] Write cache: enabled, 
read cache: enabled, doesn't support DPO or FUA


Thanks,



I've seen something similar; it turned out to be a bad disk.

I've also seen it when the cable was loose.

Justin.


Re: slow raid5 performance

2007-10-22 Thread Peter
Does anyone have any insights here? How do I interpret the seemingly competing 
system & iowait numbers... is my system both CPU and PCI bus bound? 

- Original Message 
From: nefilim
To: linux-raid@vger.kernel.org
Sent: Thursday, October 18, 2007 4:45:20 PM
Subject: slow raid5 performance



Hi

Pretty new to software raid, I have the following setup in a file server:

/dev/md0:
Version : 00.90.03
  Creation Time : Wed Oct 10 11:05:46 2007
 Raid Level : raid5
 Array Size : 976767872 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Thu Oct 18 15:02:16 2007
  State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 64K

   UUID : 9dcbd480:c5ca0550:ca45cdab:f7c9f29d
 Events : 0.9

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1

3 x 500GB WD RE2 hard drives
AMD Athlon XP 2400 (2.0Ghz), 1GB RAM
/dev/sd[ab] are connected to Sil 3112 controller on PCI bus
/dev/sd[cde] are connected to Sil 3114 controller on PCI bus

Transferring large media files from /dev/sdb to /dev/md0 I see the following
with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   1.010.00   55.56   40.400.003.03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             261.62        31.09         0.00         30          0
sdc             148.48         0.15        16.40          0         16
sdd             102.02         0.41        16.14          0         15
sde             113.13         0.29        16.18          0         16
md0            8263.64         0.00        32.28          0         31
 
which is pretty much what I see with hdparm etc. 32MB/s seems pretty slow
for drives that can easily do 50MB/s each. Read performance is better around
85MB/s (although I expected somewhat higher). So it doesn't seem that PCI
bus is limiting factor here (127MB/s theoretical throughput.. 100MB/s real
world?) quite yet... I see a lot of time being spent in the kernel.. and a
significant iowait time. The CPU is pretty old but where exactly is the
bottleneck? 

Any thoughts, insights or recommendations welcome!

Cheers
Peter
-- 
View this message in context:
 http://www.nabble.com/slow-raid5-performance-tf4650085.html#a13284909
Sent from the linux-raid mailing list archive at Nabble.com.






Re: Fwd: issues rebuilding raid array.

2007-10-22 Thread Nagilum

- Message from [EMAIL PROTECTED] -
Date: Mon, 22 Oct 2007 21:46:08 +1000
From: Sam Redfern [EMAIL PROTECTED]
Reply-To: Sam Redfern [EMAIL PROTECTED]
 Subject: Fwd: issues rebuilding raid array.
  To: linux-raid@vger.kernel.org



The array was build using 2.6.18-7 Now i'm using  2.6.21-2

I'm trying to recreate the raid array with the following command and
this is the error I get:

mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdc /dev/sdd /dev/sde
/dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md1
mdadm: no RAID superblock on /dev/sdc
mdadm: /dev/sdc has no superblock - assembly aborted

So I figure, oh look the disk sdc has gone cactus, I'll just remove it
from the list. One of the advantages of mdadm.

mca4:~# mdadm -Av /dev/md1 /dev/sdb /dev/sdd /dev/sde /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md1
mdadm: /dev/sdb is identified as a member of /dev/md1, slot -1.
mdadm: /dev/sdd is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sde is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md1, slot 5.
mdadm: /dev/sdg is identified as a member of /dev/md1, slot 4.
mdadm: added /dev/sde to /dev/md1 as 1
mdadm: no uptodate device for slot 2 of /dev/md1
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: added /dev/sdg to /dev/md1 as 4
mdadm: added /dev/sdf to /dev/md1 as 5
mdadm: failed to add /dev/sdb to /dev/md1: Invalid argument
mdadm: added /dev/sdd to /dev/md1 as 0
mdadm: /dev/md1 assembled from 4 drives - not enough to start the array.

If found this really difficult to understand considering that I can
get the output of mdamd -E /dev/sdb (other disks included to overload
you with information)

mdadm -E /dev/sd[b-h]

/dev/sdb:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 4e3b82e1:f5604e19:a9c9775f:49745adf
  Creation Time : Fri Oct  5 09:18:25 2007
 Raid Level : raid5
Device Size : 312571136 (298.09 GiB 320.07 GB)
 Array Size : 1562855680 (1490.46 GiB 1600.36 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1

Update Time : Tue Oct 16 20:03:13 2007
  State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
   Checksum : 80d47486 - correct
 Events : 0.623738

 Layout : left-symmetric
 Chunk Size : 64K

  Number   Major   Minor   RaidDevice State
this 6   8   16   -1  spare   /dev/sdb

   0 0   8   800  active sync   /dev/sdf
   1 1   8  1281  active sync   /dev/.static/dev/sdi
   2 2   8  1442  active sync   /dev/.static/dev/sdj
   3 3   8   163  active sync   /dev/sdb
   4 4   8   644  active sync   /dev/sde
   5 5   8   965  active sync   /dev/sdg




If anyone could offer a solution I'd be forever grateful.  Also, to
prove that supporting open source isn't all free labour, I'll send you
your choice of 1 of 2 Nintendo DS games, the new Radiohead album or
a Cree flashlight. :)



- End message from [EMAIL PROTECTED] -

Hey, this looks similar to what I recently had.  
(http://www.mail-archive.com/linux-raid@vger.kernel.org/msg09306.html)
In my case a RAID5 reshape was interrupted and the new devices were  
also marked spare with slot -1.

Apply the attached patch to mdadm-2.6.3, build then do:
 mdadm -S /dev/md1
 ./mdadm -Av /dev/md1 --update=this /dev/sd[b-g]
That should update the slot on /dev/sdb. Then:
 mdadm -S /dev/md1
 ./mdadm -Av /dev/md1 /dev/sd[bcdefg]

should bring back your array in degraded mode.
If it works send your gifts to Neil Brown [EMAIL PROTECTED], he wrote  
the patch! :)

Good luck!



#_  __  _ __ http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__  _(_) /_  _  [EMAIL PROTECTED] \n +491776461165 #
#  // _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#   /___/ x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #




cakebox.homeunix.net - all the machine one needs..

diff --git a/Grow.c b/Grow.c
index 825747e..8ad1537 100644
--- a/Grow.c
+++ b/Grow.c
@@ -978,5 +978,5 @@ int Grow_restart(struct supertype *st, struct mdinfo *info, int *fdlist, int cnt
/* And we are done! */
return 0;
}
-   return 1;
+   return 0;
 }
diff --git a/mdadm.c b/mdadm.c
index 40fdccf..7e7e803 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -584,6 +584,8 @@ int main(int argc, char *argv[])
exit(2);
}
update = optarg;
+   if (strcmp(update, "this")==0) 

Re: slow raid5 performance

2007-10-22 Thread Peter

- Original Message 
From: Peter Grandi [EMAIL PROTECTED]

Thank you for your insightful response Peter (Yahoo spam filter hid it from me 
until now). 

 Most 500GB drives can do 60-80MB/s on the outer tracks
 (30-40MB/s on the inner ones), and 3 together can easily swamp
 the PCI bus. While you see the write rates of two disks, the OS
 is really writing to all three disks at the same time, and it
 will do read-modify-write unless the writes are exactly stripe
 aligned. When RMW happens write speed is lower than writing to a
 single disk.

I can understand that if a RMW happens it will effectively lower the write 
throughput substantially, but I'm not entirely sure why this would happen 
while writing new content; I don't know enough about RAID internals. Would 
this be the case the majority of the time?
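
To make the stripe-alignment point concrete for the array described earlier
(3 drives, 64 KiB chunks, so one full stripe holds 128 KiB of data), here is
the bare arithmetic; the write offset and length below are made up:

/* Alignment arithmetic only -- real md/raid5 batching is far more involved. */
#include <stdio.h>

int main(void)
{
	const unsigned long chunk  = 64 * 1024;            /* Chunk Size (mdadm -D)  */
	const unsigned int  disks  = 3;                    /* Raid Devices           */
	const unsigned long stripe = chunk * (disks - 1);  /* data per full stripe   */

	/* a write can skip read-modify-write only if it covers whole stripes */
	unsigned long offset = 0, length = 1024 * 1024;    /* e.g. 1 MiB at offset 0 */
	int full = (offset % stripe == 0) && (length % stripe == 0);

	printf("full stripe = %lu KiB of data; this write needs %s\n",
	       stripe / 1024,
	       full ? "no RMW (full-stripe write)" : "read-modify-write");
	return 0;
}

In practice the page cache and md's stripe cache batch long sequential streams
into mostly full-stripe writes; it is the small or misaligned writes that pay
the read-modify-write penalty.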



 The system time is because the Linux page cache etc. is CPU
 bound (never mind RAID5 XOR computation, which is not that
 big). The IO wait is because IO is taking place.

  http://www.sabi.co.uk/blog/anno05-4th.html#051114

 Almost all kernel developers of note have been hired by wealthy
 corporations who sell to people buying large servers. Then the
 typical system that these developers may have and also target
 are high ends 2-4 CPU workstations and servers, with CPUs many
 times faster than your PC, and on those system the CPU overhead
 of the page cache at speeds like yours less than 5%.

 My impression is that something that takes less than 5% on a
 developers's  system does not get looked at, even if it takes 50%
 on your system. The Linux kernel was very efficient when most
 developers were using old cheap PCs themselves. scratch your
 itch rules.

This is a rather unfortunate situation, it seems that some of the roots are 
forgotten, especially in a case like this where one would think running a file 
server on a modest CPU should be enough. I was waiting for Phenom and AM2+ 
motherboards to become available before relegating this X2 4600+ to file server 
duty, guess I'll need to stay with the slow performance for a few more months. 

 Anyhow, try to bypass the page cache with 'O_DIRECT' or test
 with 'dd oflag=direct' and similar for an alterative code path.

I'll give this a try, thanks.

 Misaligned writes and page cache CPU time most likely.

What influence would adding more harddrives to this RAID have? I know in terms 
of a Netapp filer they  always talk about spindle count for performance. 





Re: slow raid5 performance

2007-10-22 Thread Peter
Thanks Justin, good to hear about some real world experience. 

- Original Message 
From: Justin Piszcz [EMAIL PROTECTED]
To: Peter [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Monday, October 22, 2007 9:58:16 AM
Subject: Re: slow raid5 performance


With SW RAID 5 on the PCI bus you are not going to see faster than 38-42 
MiB/s.  Especially with only three drives it may be slower than that. 
Forget / stop using the PCI bus and expecting high transfer rates.

For writes = 38-42 MiB/s sw raid5.
For reads = you will get close to 120-122 MiB/s sw raid5.

This is from a lot of testing going up to 400GB x 10 drives using PCI 
cards on a regular PCI bus.

Then I went PCI-e and used faster disks to get 0.5gigabytes/sec SW
 raid5.

Justin.
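
For what it's worth, 38-42 MiB/s is roughly what a single shared PCI bus
predicts for this workload.  A back-of-the-envelope sketch (assumptions:
~100 MB/s usable on 32-bit/33 MHz PCI, the source disk on the same bus, and a
3-drive RAID5 so data plus parity is 1.5x the payload):

/* Back-of-the-envelope only; real numbers depend on chipset and workload. */
#include <stdio.h>

int main(void)
{
	const double bus_mb_s     = 100.0;  /* usable share of 133 MB/s PCI     */
	const int    data_disks   = 2;      /* 3-drive RAID5: 2 data + 1 parity */
	const double write_factor = (data_disks + 1.0) / data_disks;  /* 1.5x   */

	/* payload W satisfies: W (read from source) + 1.5*W (array writes) <= bus */
	double w = bus_mb_s / (1.0 + write_factor);
	printf("expected array write rate: ~%.0f MB/s\n", w);  /* about 40 MB/s */
	return 0;
}

That is consistent with the iostat snapshot quoted below: ~31 MB/s read from
sdb plus ~16 MB/s written to each of the three array members.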

On Mon, 22 Oct 2007, Peter wrote:

 Does anyone have any insights here? How do I interpret the seemingly
 competing system  iowait numbers... is my system both CPU and PCI bus
 bound?

 - Original Message 
 From: nefilim
 To: linux-raid@vger.kernel.org
 Sent: Thursday, October 18, 2007 4:45:20 PM
 Subject: slow raid5 performance



 Hi

 Pretty new to software raid, I have the following setup in a file
 server:

 /dev/md0:
Version : 00.90.03
  Creation Time : Wed Oct 10 11:05:46 2007
 Raid Level : raid5
 Array Size : 976767872 (931.52 GiB 1000.21 GB)
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 3
  Total Devices : 3
 Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Thu Oct 18 15:02:16 2007
  State : active
 Active Devices : 3
 Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 64K

   UUID : 9dcbd480:c5ca0550:ca45cdab:f7c9f29d
 Events : 0.9

Number   Major   Minor   RaidDevice State
   0   8   330  active sync   /dev/sdc1
   1   8   491  active sync   /dev/sdd1
   2   8   652  active sync   /dev/sde1

 3 x 500GB WD RE2 hard drives
 AMD Athlon XP 2400 (2.0Ghz), 1GB RAM
 /dev/sd[ab] are connected to Sil 3112 controller on PCI bus
 /dev/sd[cde] are connected to Sil 3114 controller on PCI bus

 Transferring large media files from /dev/sdb to /dev/md0 I see the
 following
 with iostat:

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   1.010.00   55.56   40.400.003.03

 Device:tpsMB_read/sMB_wrtn/sMB_read  
  MB_wrtn
 sda   0.00 0.00 0.00  0
  0
 sdb 261.6231.09 0.00 30
  0
 sdc 148.48 0.1516.40  0
 16
 sdd 102.02 0.4116.14  0
 15
 sde 113.13 0.2916.18  0
 16
 md08263.64 0.0032.28  0
 31

 which is pretty much what I see with hdparm etc. 32MB/s seems pretty
 slow
 for drives that can easily do 50MB/s each. Read performance is better
 around
 85MB/s (although I expected somewhat higher). So it doesn't seem that
 PCI
 bus is limiting factor here (127MB/s theoretical throughput.. 100MB/s
 real
 world?) quite yet... I see a lot of time being spent in the kernel..
 and a
 significant iowait time. The CPU is pretty old but where exactly is
 the
 bottleneck?

 Any thoughts, insights or recommendations welcome!

 Cheers
 Peter
 -- 
 View this message in context:
 http://www.nabble.com/slow-raid5-performance-tf4650085.html#a13284909
 Sent from the linux-raid mailing list archive at Nabble.com.



Re: slow raid5 performance

2007-10-22 Thread Richard Scobie

Peter wrote:
Thanks Justin, good to hear about some real world experience. 


Hi Peter,

I recently built a 3 drive RAID5 using the onboard SATA controllers on 
an MCP55 based board and get around 115MB/s write and 141MB/s read.


A fourth drive was added some time later and after growing the array and 
filesystem (XFS), saw 160MB/s write and 178MB/s read, with the array 60% 
full.


Regards,

Richard


Re: slow raid5 performance

2007-10-22 Thread Justin Piszcz



On Tue, 23 Oct 2007, Richard Scobie wrote:


Peter wrote:
Thanks Justin, good to hear about some real world experience. 


Hi Peter,

I recently built a 3 drive RAID5 using the onboard SATA controllers on an 
MCP55 based board and get around 115MB/s write and 141MB/s read.


A fourth drive was added some time later and after growing the array and 
filesystem (XFS), saw 160MB/s write and 178MB/s read, with the array 60% 
full.


Regards,

Richard



Yes, your chipset must be PCI-e based and not PCI.

Justin.


Re: Time to deprecate old RAID formats?

2007-10-22 Thread John Stoffel

[ I was going to reply to this earlier, but the Red Sox and good
weather got in the way this weekend.  ;-]

 Michael == Michael Tokarev [EMAIL PROTECTED] writes:

Michael I'm doing a sysadmin work for about 15 or 20 years.

Welcome to the club!  It's a fun career, always something new to
learn. 

 If you are going to mirror an existing filesystem, then by definition
 you have a second disk or partition available for the purpose.  So you
 would merely setup the new RAID1, in degraded mode, using the new
 partition as the base.  Then you copy the data over to the new RAID1
 device, change your boot setup, and reboot.

Michael And you have to copy the data twice as a result, instead of
Michael copying it only once to the second disk.

So?  Why is this such a big deal?  As I see it, there are two separate
ways to set up a RAID1 setup, on an OS.

1.  The mirror is built ahead of time and you install onto the
mirror.  And twice as much data gets written, half to each disk.
*grin* 

2.  You are encapsulating an existing OS install and you need to do a
reboot from the un-mirrored OS to the mirrored setup.  So yes, you
do have to copy the data from the orig to the mirror, reboot, then
resync back onto the original disk whish has been added into the the
RAID set.  

Neither case is really that big a deal.  And with the RAID super block
at the front of the disk, you don't have to worry about mixing up
which disk is which.  It's not fun when you boot one disk, thinking
it's the RAID disk, but end up booting the original disk.  

 As Doug says, and I agree strongly, you DO NOT want to have the
 possibility of confusion and data loss, especially on bootup.  And

Michael There are different point of views, and different settings
Michael etc.  For example, I once dealt with a linux user who was
Michael unable to use his disk partition, because his system (it was
Michael RedHat if I remember correctly) recognized some LVM volume on
Michael his disk (it was previously used with Windows) and tried to
Michael automatically activate it, thus making it busy.  What I'm
Michael talking about here is that any automatic activation of
Michael anything should be done with extreme care, using smart logic
Michael in the startup scripts if at all.

Ah... but you can also de-activate LVM partitions as well if you like.  

Michael The Doug's example - in my opinion anyway - shows wrong tools
Michael or bad logic in the startup sequence, not a general flaw in
Michael superblock location.

I don't agree completely.  I think the superblock location is a key
issue, because if you have a superblock location which moves depending
on the filesystem or LVM you use to look at the partition (or full disk)
then you need to be even more careful about how to poke at things.

This is really true when you use the full disk for the mirror, because
then you don't have the partition table to base some initial
guestimates on.  Since there is an explicit Linux RAID partition type,
as well as an explicit linux filesystem (filesystem is then decoded
from the first Nk of the partition), you have a modicum of safety.

If ext3 has the superblock in the first 4k of the disk, but you've
setup the disk to use RAID1 with the LVM superblock at the end of the
disk, you now need to be careful about how the disk is detected and
then mounted.

To the ext3 detection logic, it looks like an ext3 filesystem, to LVM,
it looks like a RAID partition.  Which is correct?  Which is wrong?
How do you tell programmatically?  

That's why I think all superblocks should be in the SAME
location on the disk and/or partitions if used.  It keeps down
problems like this.  

Michael Another example is ext[234]fs - it does not touch first 512
Michael bytes of the device, so if there was an msdos filesystem
Michael there before, it will be recognized as such by many tools,
Michael and an attempt to mount it automatically will lead to at
Michael least scary output and nothing mounted, or in fsck doing
Michael fatal things to it in worst scenario.  Sure thing the first
Michael 512 bytes should be just cleared.. but that's another topic.

I would argue that ext[234] should be clearing those 512 bytes.  Why
aren't they cleared?  

Michael Speaking of cases where it was really helpful to have an
Michael ability to mount individual raid components directly without
Michael the raid level - most of them was due to one or another
Michael operator errors, usually together with bugs and/or omissions
Michael in software.  I don't remember exact scenarious anymore (last
Michael time it was more than 2 years ago).  Most of the time it was
Michael one or another sort of system recovery.

In this case, you're only talking about RAID1 mirrors, no other RAID
configuration fits this scenario.  And while this might look to be
helpful, I would strongly argue that it's not, because it's a special
case of the RAID code and can lead to all kinds of bugs and problems
if it's not 

Re: slow raid5 performance

2007-10-22 Thread Peter Grandi
 On Mon, 22 Oct 2007 15:33:09 -0400 (EDT), Justin Piszcz
 [EMAIL PROTECTED] said:

[ ... speed difference between PCI and PCIe RAID HAs ... ]

 I recently built a 3 drive RAID5 using the onboard SATA
 controllers on an MCP55 based board and get around 115MB/s
 write and 141MB/s read.  A fourth drive was added some time
 later and after growing the array and filesystem (XFS), saw
 160MB/s write and 178MB/s read, with the array 60% full.

jpiszcz Yes, your chipset must be PCI-e based and not PCI.

Broadly speaking yes (the MCP55 is a PCIe chipset), but it is
more complicated than that. The south bridge chipset host
adapters often have a rather faster link to memory and the CPU
interconnect than the PCI or PCIe buses can provide, even when
they are externally ''PCI''.

Also, when the RAID HA is not in-chipset it also matters a fair
bit how many lanes the PCIe slot (or whether it is PCI-X 64 bit
and 66MHz) it is plugged in has -- most PCIe RAID HAs can use 4
or 8 lanes (or equivalent for PCI-X).



Re: Time to deprecate old RAID formats?

2007-10-22 Thread Michael Tokarev
John Stoffel wrote:

 Michael == Michael Tokarev [EMAIL PROTECTED] writes:

 If you are going to mirror an existing filesystem, then by definition
 you have a second disk or partition available for the purpose.  So you
 would merely setup the new RAID1, in degraded mode, using the new
 partition as the base.  Then you copy the data over to the new RAID1
 device, change your boot setup, and reboot.
 
 Michael And you have to copy the data twice as a result, instead of
 Michael copying it only once to the second disk.
 
 So?  Why is this such a big deal?  As I see it, there are two seperate
 ways to setup a RAID1 setup, on an OS.
[..]
that was just a tiny nitpick, so to say, about a particular way to
convert existing system into raid1 - not something which's done every
day anyway.  Still, double the time for copying your terabyte-sized
drive is something to consider.

[]
 Michael automatically activate it, thus making it busy.  What I'm
 Michael talking about here is that any automatic activation of
 Michael anything should be done with extreme care, using smart logic
 Michael in the startup scripts if at all.
 
 Ah... but you can also de-active LVM partitions as well if you like.  

Yes, esp. being a newbie user who first installed linux on his PC just
to see that he can't use his disk.. ;)  That was a real situation - I
helped someone who had never heard of LVM and did little of anything
with filesystems/disks before.

 Michael The Doug's example - in my opinion anyway - shows wrong tools
 Michael or bad logic in the startup sequence, not a general flaw in
 Michael superblock location.
 
 I don't agree completely.  I think the superblock location is a key
 issue, because if you have a superblock location which moves depending
 the filesystem or LVM you use to look at the partition (or full disk)
 then you need to be even more careful about how to poke at things.

Superblock location does not depend on the filesystem.  Raid exports
the inside space only, excluding superblocks, to the next level
(filesystem or else).

 This is really true when you use the full disk for the mirror, because
 then you don't have the partition table to base some initial
 guestimates on.  Since there is an explicit Linux RAID partition type,
 as well as an explicit linux filesystem (filesystem is then decoded
 from the first Nk of the partition), you have a modicum of safety.

Speaking of whole disks - first, don't do that (for reasons suitable
for another topic), and second, using the whole disk or partitions
makes no real difference whatsoever to the topic being discussed.

There's just no need for the guesswork, except for the first install
(to automatically recognize existing devices, and to use them after
confirmation), and maybe for rescue systems, which again is a different
topic.

In any case, for a tool that does guesswork (like libvolume-id, to
create /dev/ symlinks), it's as easy to look at the end of the device
as at the beginning or at any other fixed place - since the tool has
to know the superblock format, it knows the superblock location as well.
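
As a concrete illustration, here is a minimal sketch of the kind of check such
a tool performs for the old 0.90 format (assumptions: the 0.90 layout with the
superblock in the last 64 KiB-aligned 64 KiB of the device and the 0xa92b4efc
magic in its first word, read on a little-endian host; error handling trimmed):

/* Sketch only: probe a block device for a v0.90 md superblock at the fixed
 * location near the end of the device that the 0.90 format uses. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKGETSIZE64 */

int main(int argc, char **argv)
{
	uint64_t bytes, sectors, sb_offset;
	uint32_t magic = 0;
	int fd;

	if (argc != 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || ioctl(fd, BLKGETSIZE64, &bytes) != 0)
		return 1;

	sectors = bytes / 512;
	/* 0.90 rule: round the size down to 64 KiB, step back one 64 KiB block */
	sb_offset = ((sectors & ~127ULL) - 128) * 512;

	if (pread(fd, &magic, sizeof(magic), (off_t)sb_offset) == sizeof(magic)
	    && magic == 0xa92b4efc)
		printf("0.90 md superblock magic at byte %llu\n",
		       (unsigned long long)sb_offset);
	else
		printf("no 0.90 md magic at byte %llu\n",
		       (unsigned long long)sb_offset);
	close(fd);
	return 0;
}

The same idea covers the v1.x formats, just with different fixed offsets (near
the end for 1.0, at the start for 1.1, 4 KiB in for 1.2) - whoever knows the
superblock format knows where to look.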

Maybe manual guesswork, based on hexdump of first several kilobytes
of data, is a bit more difficult in case where superblock is located
at the end.  But if one has to analyze hexdump, he doesn't care about
raid anymore.

 If ext3 has the superblock in the first 4k of the disk, but you've
 setup the disk to use RAID1 with the LVM superblock at the end of the
 disk, you now need to be careful about how the disk is detected and
 then mounted.

See above.  For tools, it's trivial to distinguish a component of a
raid volume from the volume itself, by looking for superblock at whatever
location.  Including stuff like mkfs, which - like mdadm does - may
warn one about previous filesystem/volume information on the device
in question.

 Michael Speaking of cases where it was really helpful to have an
 Michael ability to mount individual raid components directly without
 Michael the raid level - most of them was due to one or another
 Michael operator errors, usually together with bugs and/or omissions
 Michael in software.  I don't remember exact scenarious anymore (last
 Michael time it was more than 2 years ago).  Most of the time it was
 Michael one or another sort of system recovery.
 
 In this case, you're only talking about RAID1 mirrors, no other RAID
 configuration fits this scenario.  And while this might look to be

Definitely.  However, linear - to some extent - can be used partially.
But sure with much less usefulness.

However, raid1 is a much more common setup than anything else - IMHO anyway.
It's the cheapest and the most reliable thing for an average user anyway -
it's cheaper to get 2 large drives than, say, 3 somewhat smaller drives.
Yes, raid1 has 1/2 the space wasted, compared with, say, raid5 on top of 3
drives (only 1/3 wasted), but still 3 smallish drives cost more than
2 larger drives.

 helpful, I would strongly argue that it's not, because it's a special
 

mdadm devices building in the wrong order

2007-10-22 Thread marc
Hello,

I am having a rather urgent and annoying problem and I would appreciate
some input from anyone who has come across this.  I have not been able to
find a solution as of yet.  My issue deals with nested raid using mdadm,
and it seems that upon a reboot mdadm is attempting to assemble the larger
array before the smaller component array is created, and thus it is
failing.

I have a degraded raid 5 array md1 which is composed of hda1 and md0.  Upon
a reboot, mdadm attempts to build md1 before md0 is built. It fails, so
md1 is not built and I need to assemble it manually.

Is there a solution for this?

Thank you for your time,

Marc

p.s. I would like to also take a second to express my gratitude to Neil
Brown  for the mdadm utility.  I have found it very useful and it has made
working with raid in linux very enjoyable and straight-forward!


Re: mdadm devices building in the wrong order

2007-10-22 Thread Neil Brown
On Monday October 22, [EMAIL PROTECTED] wrote:
 Hello,
 
 I am having a rather urgent and annoying problem and I would appreciate
 some input from anyone who has come across this.  I have not been able to
 find a solution as of yet.  My issue deals with nested raid using mdadm,
 and it seems that upon a reboot mdadm is attempting to assemble the larger
 array before the smaller component array is created, and thus it is
 failing.
 
 I have a degraded raid 5 array md1 whch is composed of hda1 and md0.  Upon
 a reboot, mdadm attempts to build md1 before md0 is built. It fails, so
 md1 is not build and I need to assemble it manually.
 
 Is there a solution for this?

What order are the arrays listed in in mdadm.conf?
If md1 comes first, put it last.

Otherwise, I cannot think what might be happening.  Maybe if you
include some kernel logs that might help.


 
 Thanks you for your time,
 
 Marc
 
 p.s. I would like to also take a second to express my gratitude to Neil
 Brown  for the mdadm utility.  I have found it very useful and it has made
 working with raid in linux very enjoyable and straight-forward!

Thanks :-)

NeilBrown


Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-22 Thread Neil Brown
On Monday October 22, [EMAIL PROTECTED] wrote:
 
 Hey Neil,
 
 Your fix works for me too.  However, I'm wondering why you held back
 on fixing the same issue in the bitmap runs into data comparison
 that follows:

It isn't really needed here.  In this case bitmap->offset is positive,
so all the numbers are positive, so it doesn't matter if the
comparison is signed or not.

Thanks for mentioning it though.

NeilBrown


 
 --- ./drivers/md/bitmap.c 2007-10-19 19:11:58.0 -0400
 +++ ./drivers/md/bitmap.c 2007-10-22 09:53:41.0 -0400
 @@ -286,7 +286,7 @@
  		/* METADATA BITMAP DATA */
  		if (rdev->sb_offset*2
  		    + bitmap->offset
  -		    + page->index*(PAGE_SIZE/512) + size/512
  +		    + (long)(page->index*(PAGE_SIZE/512)) + size/512
  		    > rdev->data_offset)
  			/* bitmap runs in to data */
  			return -EINVAL;
 
 Thanks,
 Mike