BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Peter Rabbitson

Hello,

It seems that mdadm/md do not perform proper sanity checks before adding a 
component to a degraded array. If the size of the new component is just right, 
the superblock information will overlap with the data area. This will happen 
without any error indications in the syslog or otherwise.


I came up with a reproducible scenario, which I am attaching to this email 
along with the entire test script. I have not tested it for other raid 
levels, or other types of superblocks, but I suspect the same problem will 
occur for many other configurations.
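
The missing check is essentially arithmetic: a replacement device must hold 
the superblock/data offset plus the per-component data area. A rough sketch of 
the minimum size, using the same sysfs attributes the attached script reads 
(md9 is the test array, so the exact paths are only illustrative):

  DATA_OFFSET=$(cat /sys/block/md9/md/rd1/offset)    # sectors reserved in front of the data area
  COMP_KB=$(cat /sys/block/md9/md/component_size)    # KiB of data held per component
  MIN_BYTES=$(( DATA_OFFSET * 512 + COMP_KB * 1024 ))
  echo "a replacement component needs at least $MIN_BYTES bytes"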


I am willing to test patches; in any case, the attached script is non-intrusive 
enough to be executed anywhere.


The output of the script follows below.

Peter

==
==
==

[EMAIL PROTECTED]:/media/space/testmd# ./md_overlap_test
Creating component 1 (1056768 bytes)... done.
Creating component 2 (1056768 bytes)... done.
Creating component 3 (1056768 bytes)... done.


===
Creating 3 disk raid5 array with v1.1 superblock
mdadm: array /dev/md9 started.
Waiting for resync to finish... done.

md9 : active raid5 loop3[3] loop2[1] loop1[0]
  2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Initial checksum of raw raid5 device: 4df1921524a3b717a956fceaed0ae691  /dev/md9


===
Failing first component
mdadm: set /dev/loop1 faulty in /dev/md9
mdadm: hot removed /dev/loop1

md9 : active raid5 loop3[3] loop2[1]
  2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/2] [_UU]

Checksum of raw raid5 device after failing component: 4df1921524a3b717a956fceaed0ae691  /dev/md9



===
Re-creating block device with size 1048576 bytes, so both the superblock and data start at the same spot

Adding back to array
mdadm: added /dev/loop1
Waiting for resync to finish... done.

md9 : active raid5 loop1[4] loop3[3] loop2[1]
  2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Checksum of raw raid5 device after adding back smaller component: bb854f77ad222d224fcdd8c8f96b51f0  /dev/md9



===
Attempting recovery
Waiting for recovery to finish... done.
Performing check
Waiting for check to finish... done.

Current value of mismatch_cnt: 0

Checksum of raw raid5 device after repair/check: 146f5c37305c42cda64538782c8c3794  /dev/md9

[EMAIL PROTECTED]:/media/space/testmd#
#!/bin/bash

echo "Please read the script first, and comment the exit line at the top."
echo "This script will require about 3MB of free space, it will free (and use)"
echo "loop devices 1 2 and 3, and will use the md device number specified in MD_DEV."
exit 0

MD_DEV=md9    # make sure this is not an array you use
COMP_NUM=3
COMP_SIZE=$((1 * 1024 * 1024 + 8192))   # 1MiB comp sizes with room for 8k (16 sect) of metadata

mdadm -S /dev/$MD_DEV > /dev/null 2>&1

DEVS=""
for i in $(seq $COMP_NUM); do
    echo -n "Creating component $i ($COMP_SIZE bytes)... "
    losetup -d /dev/loop${i} > /dev/null 2>&1

    set -e
    PCMD="print \"\x${i}${i}\" x $COMP_SIZE"   # fill entire image with the component number (0xiii...)
    perl -e "$PCMD" > dummy${i}.img
    losetup /dev/loop${i} dummy${i}.img
    DEVS="$DEVS /dev/loop${i}"
    set +e
    echo "done."
done

echo
echo
echo "==="
echo "Creating $COMP_NUM disk raid5 array with v1.1 superblock"
# superblock at beginning of blockdev guarantees that it will overlap with real data, not with parity
mdadm -C /dev/$MD_DEV -l 5 -n $COMP_NUM -e 1.1 $DEVS

echo -n "Waiting for resync to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n .
    sleep 1
done
echo " done."
echo
grep -A1 $MD_DEV /proc/mdstat

echo
echo -n "Initial checksum of raw raid5 device: "
md5sum /dev/$MD_DEV

echo
echo
echo "==="
echo "Failing first component"
mdadm -f /dev/$MD_DEV /dev/loop1
mdadm -r /dev/$MD_DEV /dev/loop1

echo
grep -A1 $MD_DEV /proc/mdstat

echo
echo -n "Checksum of raw raid5 device after failing component: "
md5sum /dev/$MD_DEV

echo
echo
echo "==="
NEWSIZE=$(( $COMP_SIZE - $(cat /sys/block/$MD_DEV/md/rd1/offset) * 512 ))
echo "Re-creating block device with size $NEWSIZE bytes, so both the superblock and data start at the same spot"
losetup -d /dev/loop1 > /dev/null 2>&1
PCMD="print \"\x11\" x $NEWSIZE"
perl -e "$PCMD" > dummy1.img
losetup /dev/loop1 dummy1.img

echo "Adding back to array"
mdadm -a /dev/$MD_DEV /dev/loop1

echo -n "Waiting for resync to finish..."
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ] ; do
    echo -n .
Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Peter Rabbitson

Neil Brown wrote:

On Monday January 28, [EMAIL PROTECTED] wrote:

Hello,

It seems that mdadm/md do not perform proper sanity checks before adding a 
component to a degraded array. If the size of the new component is just right, 
the superblock information will overlap with the data area. This will happen 
without any error indications in the syslog or otherwise.


I thought I fixed that... What versions of Linux kernel and mdadm are
you using for your tests?



Linux is 2.6.23.14 with everything md related compiled in (no modules)
mdadm - v2.6.4 - 19th October 2007 (latest in debian/sid)

Peter
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote:
 Hello,
 
 It seems that mdadm/md do not perform proper sanity checks before adding a 
 component to a degraded array. If the size of the new component is just 
 right, 
 the superblock information will overlap with the data area. This will happen 
 without any error indications in the syslog or otherwise.

I thought I fixed that... What versions of Linux kernel and mdadm are
you using for your tests?

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Unable to eradicate previous version of device information, even with zero-superblock and dd

2008-01-28 Thread Moshe Yudkowsky
I've been trying to bring up a RAID10 device, and I'm having some 
difficulty with automatically-created device names.


mdadm version 2.5.6, Debian Etch.

With metadata=1.2 in my config file,

mdadm --create /dev/md/all --auto=p7 -n 4 --level=10 /dev/sd*2

This does seem to create a RAID array. I see that my /dev/md/ directory 
is populated with all1 through all7.


On reboot, however, I notice that there's suddenly a /dev/md127 
device. Confused, I attempted to start over many times, but I can't seem 
to create a non-"all" array and I can't seem to create a simple 
/dev/md/0 array.


Steps:

To eradicate all prior traces of md configuration, I issue these commands:

mdadm --stop /dev/md/all

which stops.

mdadm --zero-superblock  /dev/sd[each drive]2


I went further (after some trouble) and issued

dd if=/dev/zero of=/dev/sd[each drive]2 count=2M

I then issue:

rm /dev/md* /dev/md/*

The ARRAY information is commented out of the config file (mdadm.conf).

On reboot, I see that the devices /dev/md/all, /dev/md/all1, etc. have 
reappeared, along with /dev/md127, /dev/md_127, and /dev/md_d127.


This is very, very puzzling.

Well, I thought I could work around this. I issued

mdadm --create /dev/md/all

with the same parameters as above. I can use cfdisk or fdisk (either 
one) to create two partitions, /dev/md/all1 and /dev/md/all2.


However,

mkfs.reiserfs /dev/md/all1

claims that /dev/md/all1 has no such device or address.

ls -l /dev/md/all gives

brw-rw---- 1 root disk 254, 8129 (date) /dev/md/all1

QUESTIONS:

1. If I create a device called /dev/md/all, should I expect that mdadm 
will create a device called /dev/md/127, and that mdadm --detail --scan 
will report it as /dev/md127 or something similar?


2. How can I completely eradicate all traces of previous work, given 
that zero-superblock and dd on the drives that make up the array doesn't 
seem to erase previous information?
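
(For what it's worth, here is the check I can run to see whether any 
superblock actually survived the wipe - the members being the /dev/sd*2 
partitions above:

  for d in /dev/sd[abcd]2; do echo == $d; mdadm --examine $d; done

If --examine reports nothing for every member, the leftover names must be 
coming from somewhere other than the disks.)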




--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 If you're going to shoot, shoot! Don't talk!
   -- Eli Wallach, The Good, the Bad, and the Ugly
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Use new sb type

2008-01-28 Thread Jan Engelhardt

This makes 1.0 the default sb type for new arrays.

Signed-off-by: Jan Engelhardt [EMAIL PROTECTED]

---
 Create.c |6 --
 super0.c |4 +---
 super1.c |2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

Index: mdadm-2.6.4/Create.c
===
--- mdadm-2.6.4.orig/Create.c
+++ mdadm-2.6.4/Create.c
@@ -241,12 +241,6 @@ int Create(struct supertype *st, char *m
 			fprintf(stderr, Name ": internal error - no default metadata style\n");
 			exit(2);
 		}
-		if (st->ss->major != 0 ||
-		    st->minor_version != 90)
-			fprintf(stderr, Name ": Defaulting to version"
-				" %d.%d metadata\n",
-				st->ss->major,
-				st->minor_version);
 	}
 	freesize = st->ss->avail_size(st, ldsize >> 9);
 	if (freesize == 0) {
Index: mdadm-2.6.4/super0.c
===
--- mdadm-2.6.4.orig/super0.c
+++ mdadm-2.6.4/super0.c
@@ -820,9 +820,7 @@ static struct supertype *match_metadata_
 	st->minor_version = 90;
 	st->max_devs = MD_SB_DISKS;
 	if (strcmp(arg, "0") == 0 ||
-	    strcmp(arg, "0.90") == 0 ||
-	    strcmp(arg, "default") == 0
-		)
+	    strcmp(arg, "0.90") == 0)
 		return st;
 
 	st->minor_version = 9; /* flag for 'byte-swapped' */
Index: mdadm-2.6.4/super1.c
===
--- mdadm-2.6.4.orig/super1.c
+++ mdadm-2.6.4/super1.c
@@ -1143,7 +1143,7 @@ static struct supertype *match_metadata_
 
 	st->ss = &super1;
 	st->max_devs = 384;
-	if (strcmp(arg, "1.0") == 0) {
+	if (strcmp(arg, "1.0") == 0 || strcmp(arg, "default") == 0) {
 		st->minor_version = 0;
 		return st;
 	}
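
In other words, an explicit -e still behaves as before; only the implicit
default changes. A sketch (device names are placeholders):

  mdadm -C /dev/md0 -l 1 -n 2 -e 0.90 /dev/sda1 /dev/sdb1   # unchanged, just no longer the default
  mdadm -C /dev/md0 -l 1 -n 2 /dev/sda1 /dev/sdb1           # with this patch: v1.0 superblock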

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: striping of a 4 drive raid10

2008-01-28 Thread Bill Davidsen

Keld Jørn Simonsen wrote:

On Mon, Jan 28, 2008 at 07:13:30AM +1100, Neil Brown wrote:
  

On Sunday January 27, [EMAIL PROTECTED] wrote:


Hi

I have tried to make a striping raid out of my new 4 x 1 TB
SATA-2 disks. I tried raid10,f2 in several ways:

1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0
of md0+md1

2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2
of md0+md1

3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of 
md0 =md1 =128 KB,  md2 = raid0 of md0+md1 chunksize = 256 KB


4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize
of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB

5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1
  

Try
  6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1



That I already tried (and I wrongly stated that I used f4 instead of
f2). On two occasions I had a throughput of about 300 MB/s, but since then I could
not reproduce the behaviour. Are there errors in this area that have been
corrected in newer kernels?


  

Also try raid10,o2 with a largeish chunksize (256KB is probably big
enough).



I tried that too, but my mdadm did not allow me to use the o flag.

My kernel is 2.6.12 and mdadm is v1.12.0 - 14 June 2005.
Can I upgrade mdadm alone to a newer version, and if so, which one is
recommended?
  


I doubt that updating mdadm is going to help; the kernel is old and 
lacks a number of improvements made over the last few years. I don't think you 
will see any major improvement without a kernel upgrade.
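
For reference, Neil's option 6 as a single array would be created roughly like 
this with a current mdadm (the chunk size here is just an example):

  mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=256 \
        --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1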


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 




-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use new sb type

2008-01-28 Thread David Greaves
Jan Engelhardt wrote:
 This makes 1.0 the default sb type for new arrays.
 

IIRC there was a discussion a while back on renaming the mdadm options (google "Time
to deprecate old RAID formats?") and the superblocks, to emphasise the location
and data structure. Would it be good to introduce the new names at the same time
as changing the default format/on-disk location?

David

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use new sb type

2008-01-28 Thread David Greaves
Peter Rabbitson wrote:
 David Greaves wrote:
 Jan Engelhardt wrote:
 This makes 1.0 the default sb type for new arrays.


 IIRC there was a discussion a while back on renaming mdadm options
 (google Time
 to  deprecate old RAID formats?) and the superblocks to emphasise the
 location
 and data structure. Would it be good to introduce the new names at the
 same time
 as changing the default format/on-disk-location?

 David
 
 Also wasn't the concession to make 1.1 default instead of 1.0 ?
 
IIRC Doug Ledford did some digging wrt lilo + grub and found that 1.1 and 1.2
wouldn't work with them. I'd have to review the thread though...

David
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use new sb type

2008-01-28 Thread Peter Rabbitson

David Greaves wrote:

Jan Engelhardt wrote:

This makes 1.0 the default sb type for new arrays.



IIRC there was a discussion a while back on renaming mdadm options (google Time
to  deprecate old RAID formats?) and the superblocks to emphasise the location
and data structure. Would it be good to introduce the new names at the same time
as changing the default format/on-disk-location?

David


Also wasn't the concession to make 1.1 default instead of 1.0 ?

Peter
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use new sb type

2008-01-28 Thread Jan Engelhardt

On Jan 28 2008 18:19, David Greaves wrote:
Jan Engelhardt wrote:
 This makes 1.0 the default sb type for new arrays.
 

IIRC there was a discussion a while back on renaming mdadm options
(google Time to deprecate old RAID formats?) and the superblocks
to emphasise the location and data structure. Would it be good to
introduce the new names at the same time as changing the default
format/on-disk-location?

The -e 1.0/1.1/1.2 is sufficient for me; I would not need
--metadata 1 --metadata-layout XXX.

So renaming the options should definitely be a separate patch.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: striping of a 4 drive raid10

2008-01-28 Thread Keld Jørn Simonsen
On Mon, Jan 28, 2008 at 01:32:48PM -0500, Bill Davidsen wrote:
 Neil Brown wrote:
 On Sunday January 27, [EMAIL PROTECTED] wrote:
   
 Hi
 
 I have tried to make a striping raid out of my new 4 x 1 TB
 SATA-2 disks. I tried raid10,f2 in several ways:
 
 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0
 of md0+md1
 
 2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2
 of md0+md1
 
 3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize 
 of md0 =md1 =128 KB,  md2 = raid0 of md0+md1 chunksize = 256 KB
 
 4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize
 of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB
 
 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1
 
 
 Try
   6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1
 
 Also try raid10,o2 with a largeish chunksize (256KB is probably big
 enough).
   
 
 Looking at the issues raised, there might be some benefit from having 
 the mirror chunks on the slower inner tracks of a raid10, and to read 
 from the outer tracks if the drives with the data on the outer tracks 
 are idle. This would appear to offer a transfer rate benefit overall.

Hmm, how do I do this? I think this is normal behaviour of a raid10,f2.
Is that so?

So you mean I should rather use f2 than o2? Or should I configure the f2
in some way?

My hdparm -t gives:

/dev/sda5:
 Timing buffered beginning disk reads:   82 MB in  1.00 seconds = 81.686 MB/sec
 Timing buffered ending disk reads:   42 MB in  1.03 seconds = 40.625 MB/sec
 Average seek time 13.714 msec, min=4.641, max=23.921
 Average track-to-track time 28.151 msec, min=26.729, max=28.730

So, yes, there is a reason to use the faster outer tracks - and to have the 
faster access time that f2 gives. How does o2 behave here? Does it read
and seek across the whole disk?


As to your other comments in another mail, I could of course install
a newer kernel and mdadm, but then I would lose the support of my
supported and paid-for system. And Neil said that there have been no
performance fixes for f2 since the kernel I use (2.6.12).
I thought that o2 support was included since 2.6.10 - but apparently not
so. 

Best regards
keld
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-28 Thread Tim Southerwood

Subtitle: Patch to mainline yet?

Hi

I don't see evidence of Neil's patch in 2.6.24, so I applied it by hand
on my server.

Was that the correct thing to do, or did this issue get fixed in a 
different way that I wouldn't have spotted? I had a look at the git logs 
but it was not obvious - please pardon my ignorance, I'm not familiar 
enough with the code.


Many thanks,

Tim

Tim Southerwood wrote:

Carlos Carvalho wrote:

Tim Southerwood ([EMAIL PROTECTED]) wrote on 23 January 2008 13:37:
 Sorry if this breaks threaded mail readers, I only just subscribed 
to the list so don't have the original post to reply to.

 
 I believe I'm having the same problem.
 
 Regarding XFS on a raid5 md array:
 
 Kernels 2.6.22-14 (Ubuntu Gutsy generic and server builds) *and* 
 2.6.24-rc8 (pure build from virgin sources) compiled for amd64 arch.


This has been corrected already; install Neil's patches. It worked for
several people under high stress, including us.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hi

I just coerced the patch into 2.6.23.14, reset 
/sys/block/md1/md/stripe_cache_size to the default (256) and rebooted.


I can confirm that after 2 hours of heavy bashing[1] the system has not 
hung. Looks good - many thanks. But I will run with a stripe_cache_size 
of 4096 in practice, as it improves write speed on my configuration by about 
2.5 times.
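
For anyone else wanting to do the same, it is just the sysfs attribute 
mentioned above (not persistent across reboots):

  echo 4096 > /sys/block/md1/md/stripe_cache_size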


Cheers

Tim



[1] Rsync of >50GB to the raid plus xfs_fsr + dd of 11GB of /dev/zero to the same 
filesystem.

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unable to eradicate previous version of device information, even with zero-superblock and dd

2008-01-28 Thread Moshe Yudkowsky



QUESTIONS:

1. If I create a device called /dev/md/all, should I expect that mdadm 
will create a device called /dev/md/127, and that mdadm --detail --scan 
will report it as /dev/md127 or something similar?


That's still happening. However:

2. How can I completely eradicate all traces of previous work, given 
that zero-superblock and dd on the drives that make up the array doesn't 
seem to erase previous information?


Answer:

In order for the md arrays to be started on a reboot, update-initramfs 
places information about the current configuration into the boot 
configuration.


In order to eradicate everything, stop all arrays, comment out any ARRAY 
lines in mdadm.conf, remove all md device files, and then issue


update-initramfs

This cleans out the information that's hidden inside the /boot area. On 
the next reboot, no extraneous md files are present. It's then possible 
to issue an mdadm --create /dev/md/all that will create the appropriate 
md devices automatically with proper major and minor device numbers.
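
Condensed, the sequence that worked here looks roughly like this (Debian 
paths; adjust the member list to your own drives):

  mdadm --stop /dev/md/all
  mdadm --zero-superblock /dev/sd[abcd]2
  # comment out the ARRAY lines in /etc/mdadm/mdadm.conf, remove stale /dev/md* nodes, then:
  update-initramfs -u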


To get the md device started correctly at init time, I seem to require 
the use of update-initramfs. I will investigate further when I've got 
some time...



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 The odds are good, but the goods are odd.
 -- Alaskan women, on the high ratio of men to women in Alaska
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


problem with spare, active device, clean degraded, reshape RAID5, anybody can help

2008-01-28 Thread Andreas-Sokov
Hello linux-raid.

i have DEBIAN.

raid01:/# mdadm -V
mdadm - v2.6.4 - 19th October 2007

raid01:/# mdadm -D /dev/md1
/dev/md1:
Version : 00.91.03
  Creation Time : Tue Nov 13 18:42:36 2007
 Raid Level : raid5
 Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Sun Jan 27 00:24:44 2008
  State : clean, degraded
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

 Layout : left-symmetric
 Chunk Size : 64K

  Delta Devices : 1, (4-5)

   UUID : 4fbdc8df:07b952cf:7cc6faa0:04676ba5
 Events : 0.683478

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       0        0        4      removed

       5       8       16        -      spare   /dev/sdb


Does anybody know what I need to do so that /dev/sdb becomes an active device?




**
@raid01:/# mdadm -E /dev/sdb
/dev/sdb:
  Magic : a92b4efc
Version : 00.91.00
   UUID : 4fbdc8df:07b952cf:7cc6faa0:04676ba5
  Creation Time : Tue Nov 13 18:42:36 2007
 Raid Level : raid5
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
 Array Size : 1953545984 (1863.05 GiB 2000.43 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1

  Reshape pos'n : 194537472 (185.53 GiB 199.21 GB)
  Delta Devices : 1 (4-5)

Update Time : Tue Jan 29 02:05:52 2008
  State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 1
   Checksum : 450cd41b - correct
 Events : 0.683482

 Layout : left-symmetric
 Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       16        5      spare   /dev/sdb

   0     0       8       32        0      active sync   /dev/sdc
   1     1       8       48        1      active sync   /dev/sdd
   2     2       8       64        2      active sync   /dev/sde
   3     3       8       80        3      active sync   /dev/sdf
   4     4       0        0        4      faulty removed
   5     5       8       16        5      spare   /dev/sdb


   

-- 
Best regards
 Andreas
 mailto:[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote:
 Hello,
 
 It seems that mdadm/md do not perform proper sanity checks before adding a 
 component to a degraded array. If the size of the new component is just 
 right, 
 the superblock information will overlap with the data area. This will happen 
 without any error indications in the syslog or otherwise.
 
 I came up with a reproducible scenario which I am attaching to this email 
 alongside with the entire test script. I have not tested it for other raid 
 levels, or other types of superblocks, but I suspect the same problem will 
 occur for many other configurations.
 
 I am willing to test patches, however the attached script is non-intrusive 
 enough to be executed anywhere.

Thanks for the report and the test script.

This patch for mdadm should fix this problem. I hate the fact that
we sometimes use K and sometimes use sectors for
sizes/offsets... groan.

I'll probably get a test in the kernel as well to guard against this.

Thanks,
NeilBrown


### Diffstat output
 ./Manage.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/Manage.c ./Manage.c
--- .prev/Manage.c	2008-01-29 11:15:54.000000000 +1100
+++ ./Manage.c	2008-01-29 11:16:15.000000000 +1100
@@ -337,7 +337,7 @@ int Manage_subdevs(char *devname, int fd
 
 			/* Make sure device is large enough */
 			if (tst->ss->avail_size(tst, ldsize/512) <
-			    array.size) {
+			    array.size*2) {
 				fprintf(stderr, Name ": %s not large enough to join array\n",
 					dv->devname);
 				return 1;
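
To make the unit mismatch concrete with the numbers from the test script
(approximate, since avail_size() also reserves a little space of its own):

  new device = 1048576 bytes = 2048 sectors, minus the 16-sector data offset ~= 2032 sectors usable
  array.size = 1024 (KiB of data per component) = 2048 sectors actually needed
  old check: 2032 < 1024    -> passes, the undersized device is accepted
  new check: 2032 < 1024*2  -> fails, the device is rejected
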
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use new sb type

2008-01-28 Thread Tim Southerwood
David Greaves wrote:
 Peter Rabbitson wrote:
 David Greaves wrote:
 Jan Engelhardt wrote:
 This makes 1.0 the default sb type for new arrays.

 IIRC there was a discussion a while back on renaming mdadm options
 (google Time
 to  deprecate old RAID formats?) and the superblocks to emphasise the
 location
 and data structure. Would it be good to introduce the new names at the
 same time
 as changing the default format/on-disk-location?

 David
 Also wasn't the concession to make 1.1 default instead of 1.0 ?

 IIRC Doug Ledford did some digging wrt lilo + grub and found that 1.1 and 1.2
 wouldn't work with them. I'd have to review the thread though...
 
 David
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

For what it's worth, that was my finding too. -e 0.9 and 1.0 are fine with
GRUB, but 1.1 and 1.2 won't work under the filesystem that contains
/boot, at least with GRUB 1.x (I haven't used LILO for some time, nor
have I tried the development GRUB 2).

The reason IIRC boils down to the fact that GRUB 1 isn't MD aware, and
the only reason one can get away with using it on a RAID 1 setup at
all is that the constituent devices present the same data as the
composite MD device, from the start.

Putting an MD SB at/near the beginning of the device breaks this case
and GRUB 1 doesn't know how to deal with it.
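
One way to see the difference on a member device (v1.x superblocks only, since
0.90 lives at the end and reports no data offset):

  mdadm --examine /dev/sda1 | grep -i offset

A 1.0 superblock should show a data offset of 0 sectors, so the filesystem
still starts at sector 0 and GRUB 1 is none the wiser; 1.1/1.2 show a non-zero
offset, and the raw member no longer looks like a plain filesystem.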

Cheers
Tim
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


In this partition scheme, grub does not find md information?

2008-01-28 Thread Moshe Yudkowsky
I'm finding a problem that isn't covered by the usual FAQs and online 
recipes.


Attempted setup: RAID 10 array with 4 disks.

Because Debian doesn't include RAID10 in its installation disks, I 
created a Debian installation on the first partition of sda, in 
/dev/sda1. Eventually I'll probably convert it to swap, but in the 
meantime that 4G has  a complete 2.6.18 install (Debian stable).


I created a RAID 10 array of four partitions, /dev/md/all, out of 
/dev/sd[abcd]2.


Using fdisk/cfdisk, I created the partition /dev/md/all1 (500 MB) for 
/boot, and the partition /dev/md/all2 with all remaining space as one 
large partition (about 850 GB). That larger partition contains /, /usr, 
/home, etc., each as a separate LVM volume. I copied the usr, var, etc. (but 
not proc or sys, of course) files over to the raid array, mounted that 
array, did a chroot to its root, and started grub.


I admit that I'm no grub expert, but it's clear that grub cannot find 
any of the information in /dev/md/all1. For example,


grub> find /boot/grub/this_is_raid

can't find a file that exists only on the raid array. Grub only searches 
/dev/sda1, not /dev/md/all1.


Perhaps I'm mistaken, but I thought it was possible to boot from 
/dev/md/all1.


I've tried other attacks but without success. For example, also while in 
chroot,


grub-install /dev/md/all2 does not work. (Nor does it work with the 
--root=/boot option.)


I also tried modifications to menu.lst, adding root=/dev/md/all1 to the 
kernel command, but RAID array's version of menu.lst is never detected.


What I do see is

grub> find /boot/grub/stage1
 (hd0,0)

which indicates (as far as I can tell) that it's found the information 
written on /dev/sda1 and nothing in /dev/md/all1.


Am I trying to do something that's basically impossible?

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: In this partition scheme, grub does not find md information?

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote:
 
 Perhaps I'm mistaken but I though it was possible to do boot from 
 /dev/md/all1.

It is my understanding that grub cannot boot from RAID.
You can boot from raid1 by the expedient of booting from one of the
halves.
A common approach is to make a small raid1 which contains /boot and
boot from that.  Then use the rest of your devices for raid10 or raid5
or whatever.
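
As a sketch of that layout (partitioning, metadata version and device names
are assumptions - the /boot mirror wants a superblock format the boot loader
can see through, i.e. 0.90 or 1.0):

  mdadm --create /dev/md0 --level=1 --metadata=0.90 --raid-devices=4 /dev/sd[abcd]1   # small /boot mirror
  mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=4 /dev/sd[abcd]2      # everything else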
 
 Am I trying to do something that's basically impossible?

I believe so.

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: write-intent bitmaps

2008-01-28 Thread Russell Coker
On Tuesday 29 January 2008 05:15, Bill Davidsen [EMAIL PROTECTED] wrote:
 You may have missed the "much higher" part of the previous paragraph.
 And given the reliability of modern drives, unless you have a LOT of
 them you may be looking at years of degraded performance to save a few
 hours of slow performance after a power fail or similar. In other words,
 it's not as black and white as it seems.

What is the pathological case?  1/2 or 1/3 write performance?

For serious write performance on a RAID you want an NVRAM write-back cache for 
RAID-5 stripes, and the NVRAM cache removes the need for write-intent 
bitmaps.  AFAIK Linux software RAID doesn't support such things, so 
putting filesystem journals and the write-intent bitmap blocks on NVRAM 
devices is the best that you could achieve.
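
For what it's worth, md can already keep the bitmap itself off the array - a 
sketch (the bitmap file must live on a filesystem that is not on the array 
being bitmapped; the path is made up):

  mdadm --grow /dev/md0 --bitmap=/nvram/md0.bitmap   # external bitmap file
  mdadm --grow /dev/md0 --bitmap=internal            # or keep it next to the superblock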

It seems that if you want the best performance for small synchronous writes 
(e.g. a mail server - which may be the most pessimal application for 
write-intent bitmaps) then hardware RAID is the only option.

Are there plans for supporting a NVRAM write-back cache with Linux software 
RAID?

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html