RAID1 robust read and read/write correct patch

2005-02-23 Thread J. David Beutel
I'd like to try this patch
http://marc.theaimsgroup.com/?l=linux-raid&m=110704868115609&w=2 with
EVMS BBR.

Has anyone tried it on 2.6.10 (with FC2 1.9 and EVMS patches)?  Has 
anyone tried the rewrite part at all?  I don't know md or the kernel or 
this patch, but the following lines of the patch seem suspicious to me.  
Should it set R1BIO_ReadRewrite instead?  That bit gets tested later, 
whereas it seems like R1BIO_ReadRetry is never tested and 
R1BIO_ReadRewrite is never set.

+#ifdef DO_ADD_READ_WRITE_CORRECT
+   else        /* tell next time we're here that we're a retry */
+       set_bit(R1BIO_ReadRetry, &r1_bio->state);
+#endif /* DO_ADD_READ_WRITE_CORRECT */
Cheers,
11011011


Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread Peter T. Breuer
J. David Beutel [EMAIL PROTECTED] wrote:
> I'd like to try this patch
> http://marc.theaimsgroup.com/?l=linux-raid&m=110704868115609&w=2 with
> EVMS BBR.
>
> Has anyone tried it on 2.6.10 (with FC2 1.9 and EVMS patches)?  Has
> anyone tried the rewrite part at all?  I don't know md or the kernel or
> this patch, but the following lines of the patch seem suspicious to me.
> Should it set R1BIO_ReadRewrite instead?  That bit gets tested later,
> whereas it seems like R1BIO_ReadRetry is never tested and
> R1BIO_ReadRewrite is never set.
>
> +#ifdef DO_ADD_READ_WRITE_CORRECT
> +   else        /* tell next time we're here that we're a retry */
> +       set_bit(R1BIO_ReadRetry, &r1_bio->state);
> +#endif /* DO_ADD_READ_WRITE_CORRECT */

Quite possibly - I never tested the rewrite part of the patch, just
wrote it to indicate how it should go and stuck it in to encourage
others to go on from there.  It's disabled by default.  You almost
certainly don't want to enable it unless you are a developer (or a
guinea pig :).

As I recall I set a new flag (don't remember offhand what it is called)
to help the code understand that it is dealing with a retry, which in
turn helps the patch be more modular and not introduce extra program
state.

Let me look ...

Yes, the only flag added for the write correction stuff was this in raid1.h:

+#ifdef DO_ADD_READ_WRITE_CORRECT
+#define R1BIO_ReadRewrite  7
+#endif /* DO_ADD_READ_WRITE_CORRECT */

So hmmm. Let me look at what the hidden (disabled, whatever) part
of the patch does. It's a patch entirely to raid1.c (plus one bit
definition in raid1.h).

The first part of the patch is the couple of lines you noted above,
which simply mark the master bio to say that we have had a failure
on read here that would normally cause a disk expulsion, and we haven't
done it. Those lines are harmless in themselves (out of context). They
are in raid1_end_read_request() and the (mirrored) request in question
has just failed to read on its target device.

The second part of the patch is lower down in the same function and
will not be activated in the situation that the previous hunk was
activated. The hunk says:

#ifdef DO_ADD_READ_WRITE_CORRECT
    if (uptodate && test_bit(R1BIO_ReadRewrite, &r1_bio->state)) {
        /* Success at last - rewrite failed reads */
        set_bit(R1BIO_IsSync, &r1_bio->state);
        reschedule_retry(r1_bio);
    } else
#endif /* DO_ADD_READ_WRITE_CORRECT */


(the uptodate test guarantees that we're not in the same invocation as
before if we get here, because the previous hunk was active only in the
!uptodate case). I believe therefore that a previous failed
read retried on another target and we are now in the retry, which has
succeeded. I therefore believe that we should be testing the same bit
as we set before. The R1BIO_ReadRetry bit is not tested anywhere else -
I  believe it should be tested now. That will tell us that this is a
retry and we want to try and start a write based on our successful
read, hoping that the write will mend whatever has gone wrong on the
disk.

So yes, the patch in the second hunk should read


#ifdef DO_ADD_READ_WRITE_CORRECT
    if (uptodate && test_bit(R1BIO_ReadRetry, &r1_bio->state)) {
        /* Success at last - rewrite failed reads */
        set_bit(R1BIO_IsSync, &r1_bio->state);
        reschedule_retry(r1_bio);
    } else
#endif /* DO_ADD_READ_WRITE_CORRECT */

I clearly wasn't concentrating on the name of the bit. I was more
worried about the mechanism that might trigger a rewrite attempt
(sufficiently worried not to test it myself!). The idea is that
we launch reschedule_retry() which should put the request on a queue to
be dealt with by the raid1d daemon thread.

This thread normally handles resyncs, but it'll do its read-then-write
trick given half a chance. By setting the IsSync bit we make it think
that the read has already been done (well, it has!), and that it's time
to send out a write based on it to all mirrors. I expect that the
completed read will have signalled all the interested kernel buffers
that the data is available. I am unsure if I need to increment a
reference counter on those buffers to stop them vanishing while we are
doing a write from them in raid1d.

That appears to be the whole patch!
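
As a toy illustration of that flow -- a userspace model only, not the
kernel code, with bit values and helper names invented for the example --
the intended control path is something like:

#include <stdio.h>

/* Made-up stand-ins for the r1_bio state bits discussed above. */
enum { R1BIO_ReadRetry = 6, R1BIO_IsSync = 3 };

struct r1bio_model { unsigned long state; int attempts; };

static void set_bit_m(int nr, unsigned long *s)  { *s |= 1UL << nr; }
static int  test_bit_m(int nr, unsigned long *s) { return (*s >> nr) & 1; }

/* Model of raid1_end_read_request(): called once per read attempt. */
static void end_read(struct r1bio_model *r1, int uptodate)
{
    if (!uptodate) {
        /* First hunk: remember that this master bio is now a retry. */
        set_bit_m(R1BIO_ReadRetry, &r1->state);
        printf("attempt %d failed, retrying on another mirror\n", ++r1->attempts);
        return;
    }
    /* Second hunk (with the corrected bit name): a successful retry
     * should schedule a rewrite of the failed copy via the raid1d path. */
    if (test_bit_m(R1BIO_ReadRetry, &r1->state)) {
        set_bit_m(R1BIO_IsSync, &r1->state);
        printf("attempt %d succeeded after a failure, scheduling rewrite\n",
               ++r1->attempts);
    } else {
        printf("attempt %d succeeded normally\n", ++r1->attempts);
    }
}

int main(void)
{
    struct r1bio_model r1 = { 0, 0 };
    end_read(&r1, 0);   /* first attempt fails       */
    end_read(&r1, 1);   /* retry succeeds -> rewrite */
    return 0;
}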

Peter



[PATCH] Erroneous sizeof use in raid1

2005-02-23 Thread Alexander Nyberg
This isn't a real bug as the smallest slab-size is 32 bytes 
but please apply for consistency.

Found by the Coverity tool.

Signed-off-by: Alexander Nyberg [EMAIL PROTECTED]

===== drivers/md/raid1.c 1.105 vs edited =====
--- 1.105/drivers/md/raid1.c2005-01-08 06:44:10 +01:00
+++ edited/drivers/md/raid1.c   2005-02-23 13:23:21 +01:00
@@ -1346,7 +1346,7 @@ static int raid1_reshape(mddev_t *mddev,
        if (conf->mirrors[d].rdev)
return -EBUSY;
 
-   newpoolinfo = kmalloc(sizeof(newpoolinfo), GFP_KERNEL);
+   newpoolinfo = kmalloc(sizeof(*newpoolinfo), GFP_KERNEL);
if (!newpoolinfo)
return -ENOMEM;
        newpoolinfo->mddev = mddev;
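
As a standalone illustration of why sizeof(newpoolinfo) and
sizeof(*newpoolinfo) differ (the struct below is a stand-in for the
example, not md's real pool_info):

#include <stdio.h>
#include <stdlib.h>

struct pool_info {          /* stand-in structure for the example */
    void *mddev;
    int   raid_disks;
    char  name[32];
};

int main(void)
{
    struct pool_info *p;

    /* sizeof(p) is the size of the pointer itself (4 or 8 bytes)... */
    printf("sizeof(p)  = %zu\n", sizeof(p));
    /* ...while sizeof(*p) is the size of the pointed-to structure. */
    printf("sizeof(*p) = %zu\n", sizeof(*p));

    /* The buggy form under-allocates; the fixed form does not. */
    p = malloc(sizeof(*p));
    if (!p)
        return 1;
    free(p);
    return 0;
}

The buggy form allocates only pointer-sized storage; as noted above, it
happened not to bite only because the smallest slab object is 32 bytes.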




Re: Raid5 with 2 bad drives

2005-02-23 Thread Matthias Julius
Mike Hardy [EMAIL PROTECTED] writes:

> I posted a raid5 parity calculator implemented in perl a while back (a
> couple weeks?) that is capable of taking your disk geometry, the RAID
> LBA you're interested in, and finding the disk sector it belongs to.
>
> I honestly don't remember if it can go the other way, but I'm not sure
> why it couldn't? It's possible that bad blocks may simply be in the
> parity chunk of the stripe too. Once you've got the RAID LBA you can
> use the methods in the BadBlockHowto to find the file

That helps.  Although there is a typo in line 139.  See diff below.

When I know the RAID LBA, how do I find out which LV it belongs to and
which sector it is within that LV?  But I guess I'd better ask that on an
LVM list.

Matthias

--- raid5calc.orig  2005-02-23 08:26:43.721332354 -0500
+++ raid5calc   2005-02-23 08:30:12.673100526 -0500
@@ -136,7 +136,7 @@
 # Testing only -
 # Check to see if the result I got is the same as what is in the block
 open (DEVICE, "<" . $component{device})
-|| die "Unable to open device " . $compoent{device} . ": " . $! . "\n";
+|| die "Unable to open device " . $component{device} . ": " . $! . "\n";
 seek(DEVICE, $device_offset, 0)
 || die "Unable to seek to " . $device_offset . " device " .
$xor_devices{$i} . ": " . $! . "\n";
 read(DEVICE, $data, ($sectors_per_chunk * 512))
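
For reference, here is a rough C sketch of the forward mapping (array
sector to component disk and on-disk sector) for md's default
left-symmetric RAID5 layout -- the same calculation the perl script does.
The disk count, chunk size and example sector are made up, and the sketch
ignores any per-device data offset, so check it against your own array
before trusting the output:

#include <stdio.h>

/* Map a RAID5 array sector to (component disk, sector on that disk)
 * for the left-symmetric layout (md's default).  Sectors are 512 bytes. */
static void raid5_map(unsigned long long r_sector,
                      unsigned int raid_disks,
                      unsigned int chunk_sectors,
                      unsigned int *dd_idx,           /* data disk index   */
                      unsigned int *pd_idx,           /* parity disk index */
                      unsigned long long *new_sector) /* sector on disk    */
{
    unsigned int data_disks = raid_disks - 1;
    unsigned long long chunk = r_sector / chunk_sectors;
    unsigned int chunk_offset = r_sector % chunk_sectors;
    unsigned long long stripe = chunk / data_disks;
    unsigned int idx = chunk % data_disks;

    *pd_idx = data_disks - stripe % raid_disks;
    *dd_idx = (*pd_idx + 1 + idx) % raid_disks;
    *new_sector = stripe * chunk_sectors + chunk_offset;
}

int main(void)
{
    /* Example: 5-disk array, 64 KiB chunks (128 sectors), array sector 1000000. */
    unsigned int dd, pd;
    unsigned long long s;

    raid5_map(1000000ULL, 5, 128, &dd, &pd, &s);
    printf("data disk %u, parity disk %u, device sector %llu\n", dd, pd, s);
    return 0;
}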



Re: *terrible* direct-write performance with raid5

2005-02-23 Thread Michael Tokarev
dean gaudet wrote:
> On Tue, 22 Feb 2005, Michael Tokarev wrote:
>
>> When debugging some other problem, I noticed that
>> direct-io (O_DIRECT) write speed on a software raid5
>> is terribly slow.  Here's a small table just to show
>> the idea (not the numbers by themselves, as they vary from system
>> to system, but how they relate to each other).  I measured
>> plain single-drive performance (sdX below), performance
>> of a raid5 array composed from 5 sdX drives, and ext3
>> filesystem (the file on the filesystem was pre-created
>> during tests).  Speed measurements performed with 8Kbyte
>> buffer aka write(fd, buf, 8192*1024), units are MB/sec.
>
> with O_DIRECT you told the kernel it couldn't cache anything... you're
> managing the cache.  you should either be writing 64KiB or you should
> change your chunksize to 8KiB (if it goes that low).
The picture does not change at all when changing the raid chunk size.
With an 8kb chunk the speed is exactly the same as with a 64kb or 256kb
chunk.  Yes, increasing the write buffer size helps a lot.  Here's the
write performance in MB/sec for direct writes into an md device (a raid5
array built from 5 drives), depending on the write buffer size (in kb):
  buffer   md raid5   sdX
    (kb)      speed   speed
       1        0.2      14
       2        0.4      26
       4        0.9      41
       8        1.7      44
      16        3.9      44
      32       72.6      44
      64       84.6      ..
     128       97.1
     256       53.7
     512       64.1
    1024       74.5
I've no idea why there's a drop in speed after a 128kb blocksize, but the
more important point is the huge drop between 32 and 16 kb blocksize.
The numbers are almost exactly the same with several chunksizes --
256kb, 64kb (default), 8kb and 4kb.
(Note raid5 performs faster than a single drive; that is to be expected,
as it is possible to write to several drives in parallel.)
The numbers also do not depend much on seeking -- obviously the speed
with seeking is worse than the sequential-write figures above, but not
much worse (about 10..20%, not 20 times as with 72 vs 4 MB/sec).
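
For anyone who wants to reproduce this kind of measurement, here is a
minimal sketch of a direct-write throughput test (the device path, total
size and alignment are assumptions, O_DIRECT needs an aligned buffer and
a block-multiple buffer size, and it overwrites whatever device it is
pointed at):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* WARNING: this overwrites the target device. */
    const char *dev = argc > 1 ? argv[1] : "/dev/md0";    /* hypothetical */
    size_t bufsz    = argc > 2 ? atoi(argv[2]) * 1024 : 32 * 1024;
    size_t total    = 256UL * 1024 * 1024;                /* write 256 MB */
    size_t done = 0;
    struct timeval t0, t1;
    void *buf;
    int fd;

    /* O_DIRECT requires a suitably aligned buffer (and size). */
    if (posix_memalign(&buf, 4096, bufsz)) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 0, bufsz);

    fd = open(dev, O_WRONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    gettimeofday(&t0, NULL);
    while (done < total) {
        if (write(fd, buf, bufsz) != (ssize_t)bufsz) {
            perror("write");
            return 1;
        }
        done += bufsz;
    }
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%zu kb buffer: %.1f MB/sec\n", bufsz / 1024,
           (double)total / secs / 1e6);
    close(fd);
    free(buf);
    return 0;
}
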
/mjt


Re: *terrible* direct-write performance with raid5

2005-02-23 Thread Peter T. Breuer
Michael Tokarev [EMAIL PROTECTED] wrote:
> (note raid5 performs faster than a single drive; that is to be expected,
> as it is possible to write to several drives in parallel)

Each raid5 write must include at least ONE write to a target.  I think
you're saying that the writes go to different targets from time to time
and that when the targets are the bottlenecks then you get faster than
normal response.

Hmm. That's actually quite difficult to calculate, because if say you
have three raid disks, then every time you write to the array you write
to two of those three (forget the read, which will come via readahead and
buffers).  Supposing that's no slower than one write to one disk, how
could you get any speed INCREASE?

Well, only by writing to a different two out of the three each time, or
near each time. If you first write to AB, then to BC, then to CA, and
repeat, then you have written 3 times but only kept each disk busy 2/3
of the time, so I suppose there is some opportunity for pipelining. Can
anyone see where?

   A B C  A B C  ...
   B C A  B C A  ...
   1 2 3  1 2 3

Maybe like this:


   A1 A3  A1 A3  ...
   B1 B2  B1 B2  ...
   C2 C3  C2 C3  ...

Yes. That seems to preserve local order and go 50% faster.
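
A quick sanity check of that 50% figure, under the idealised assumption
that each stripe write costs one unit of time on exactly two of the three
disks and the disks run independently:

#include <stdio.h>

int main(void)
{
    int disks = 3;             /* A, B, C                       */
    int writes_per_stripe = 2; /* data chunk + parity chunk     */
    int stripes = 3;           /* the AB, BC, CA rotation above */

    /* Serialised: each stripe write gets a time unit to itself. */
    double serial_time = stripes;

    /* Pipelined lower bound: spread the disk operations evenly over
     * all disks, as in the A1/B1, B2/C2, A3/C3 schedule above. */
    double pipelined_time = (double)(stripes * writes_per_stripe) / disks;

    printf("speedup bound: %.2fx\n", serial_time / pipelined_time);  /* 1.50x */
    return 0;
}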

Peter



Re: [OT] best tape backup system?

2005-02-23 Thread Michael Tokarev
Guy wrote:
> I know a thing or 2 about SCSI.  I know I had it correct.  1 config was all
> wide LVD (2940U2W).  My card has a LVD and a SE port on the same logical
> SCSI bus.
I was surprised once when I noticed that such a "logical SCSI bus" really
isn't logical per se.  I mean, if I plug ANY device into the SE port, all
devices, including the ones attached to the LVD port, switch to SE mode.
Another channel - yes, sure, but not another port (connector) on the
same channel.  Well, I can't say for all vendors and all cards, but the
ones we have here (mostly adaptec and ncr/sym) work this way.  For example,
the Adaptec AHA-3940U2x controller: it's a 2-channel card with 3 ports, two
68pin LVD/SE (one per channel) and one older scsi-like-ide connector
attached to the 1st channel -- any device attached to this last connector
forces the whole 1st channel to go into SE mode, and no LVD-only device
works (in fact, the channel does not work at all in this case with any
LVD-only device attached).
>   Another config was wide LVD disks and a narrow SE tape.  Another,
> my disks support SE, I had all SE wide and narrow.  Correct terminators in
> each case.  I also tried a 2940UW, no LVD, all SE.  All configs worked if I
> only used the disks or the tape drives, but failed if I used disks and tape
> at the same time.
Well, this is weird.  We have numerous configurations with mixed tapes and
disks (and other stuff) and have had not a single problem so far; everything
just works (except for that obvious LVD vs SE issue), including the 2940UW
and other controllers.
/mjt


RE: [OT] best tape backup system?

2005-02-23 Thread Guy
We must be in 2 different worlds!!!

I can have wide LVD and narrow SE on the same card (2940U2W), and wide
Ultra and narrow SE on the same card (2940UW).  That is why the card is so
good, IMO.  Just not with Linux.  The OS that must not be named supports
the above.  :(  In fact, I have a PC with a 2940UW with 2 disks (UW), 2 CDs,
1 DDS3 and a Zip drive -- all happy with the OS-not-to-be-named 98.  Maybe I
should try Knoppix, and see if that system likes Linux.

However, as I said before, I did not swap out every component.  I used the
same disks every time, but did swap the data cables, tape drives, SCSI cards
and terminators.  Those disks still work today in U2W (LVD-80) mode.

My last attempt to mix SCSI disks and tapes was over 1 year ago, using RH9.
On previous attempts I would have used RH7.  I don't recall ever using RH8.

Guy

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Tokarev
Sent: Wednesday, February 23, 2005 1:29 PM
To: linux-raid@vger.kernel.org
Subject: Re: [OT] best tape backup system?



RAID1 robust read and read/write correct and EVMS-BBR

2005-02-23 Thread Nagpure, Dinesh
Hi,

I noticed the discussion about robust read on the RAID list and a similar
one on the EVMS list, so I am sending this mail to both lists.  Latent media
faults which prevent data from being read from portions of a disk have
always been a concern for us.  Such faults go undetected until the affected
block is read.  RAID 1 depends on error-free mirrors for proper operation,
and undiscovered bad blocks give only a false illusion of duplexity when in
reality the array should be considered degraded.  Over the long run all the
mirrors might develop latent media faults, and none can be replaced with a
new disk.  It is also a disaster if the same block goes bad on all the
mirrors in a RAID 1 volume.  With this concern we developed what we call a
disk scrubber.  The approach is to proactively seek out bad spots on the
disk and, when one is discovered, read the correct data from the other
mirror and use it to repair the disk by way of a write.  SCSI disks
automatically repair bad spots on write by internally remapping them to
spare sectors (being SCSI-centric might be one limitation of this solution).
The implementation comprises a thread that looks for bad spots by way of a
slow, repeated, continuous scan through all disks.  The RAID error
management was extended to attempt a repair on a read error from a RAID 1
array, to permit fixing of user-discovered bad spots as well as those found
by the scrubber.  The work is based on lk2.4.26 as of now.
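
(A crude userspace sketch of the idea, for illustration only: scan one
member of a two-way mirror and, on a read error, fetch the block from the
other member and write it back so the drive can remap the sector.  The
device names and block size are placeholders, and the actual implementation
described above is an in-kernel thread coordinated with the RAID layer, not
userspace code.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK (64 * 1024)   /* scan granularity, arbitrary */

int main(void)
{
    /* Placeholder device names: two members of the same RAID1 set. */
    int a = open("/dev/sdb1", O_RDWR);
    int b = open("/dev/sdc1", O_RDONLY);
    char *buf = malloc(BLK);
    off_t off = 0;

    if (a < 0 || b < 0 || !buf) {
        perror("setup");
        return 1;
    }

    for (;;) {
        ssize_t n = pread(a, buf, BLK, off);
        if (n == 0)
            break;                      /* end of device */
        if (n < 0) {                    /* latent media fault found */
            fprintf(stderr, "read error at %lld: %s\n",
                    (long long)off, strerror(errno));
            /* Fetch good data from the other mirror... */
            if (pread(b, buf, BLK, off) != BLK) {
                fprintf(stderr, "both mirrors bad at %lld!\n", (long long)off);
                return 1;
            }
            /* ...and rewrite the bad spot so the drive remaps it. */
            if (pwrite(a, buf, BLK, off) != BLK)
                fprintf(stderr, "rewrite failed at %lld\n", (long long)off);
        }
        off += BLK;
        /* A real scrubber would throttle itself here to stay slow. */
    }
    free(buf);
    return 0;
}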

I can go back and put together a patch over the weekend if anyone is
interested in using it. 

-dinesh
[EMAIL PROTECTED]


Re: RAID1 robust read and read/write correct and EVMS-BBR

2005-02-23 Thread J. David Beutel
Nagpure, Dinesh wrote, on 2005-Feb-23 9:55 AM:
> I can go back and put together a patch over the weekend if anyone is
> interested in using it.
>

Yes, please, I'm very interested in using it.
Cheers,
11011011


Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread J. David Beutel
Peter T. Breuer wrote, on 2005-Feb-23 1:50 AM:
> Quite possibly - I never tested the rewrite part of the patch, just
> wrote it to indicate how it should go and stuck it in to encourage
> others to go on from there.  It's disabled by default.  You almost
> certainly don't want to enable it unless you are a developer (or a
> guinea pig :).
>

Thanks for taking a look at it!  Unfortunately, I'm not a kernel 
developer.  I haven't even been using C for the last 8 years.  But I'd 
really like to have that rewrite functionality, and I can dedicate my 
system as a guinea pig for at least a little while, if there's a way I 
can test it in a finite amount of time and build some confidence in it 
before I start to really use that system.

I'd like to start with an md unit test suite.  Is there one?  I don't 
know if the architecture would allow for this, but naively I'm thinking 
that the test suite would use a mock disk driver (e.g., in memory only) 
to simulate various kinds of hardware failures and confirm that md 
responds as expected to both the layer above (the kernel?) and below 
(the disk driver?).  Unit tests are also good for simulating unlikely 
and hard to reproduce race conditions, although stress tests are better 
at discovering new ones.   But, should the test suite play the role of 
the kernel by calling md functions directly in a user-space sandbox 
(mock kernel, threads, etc)?  Or, should it play the role of a user 
process by calling the real kernel to test the real md (broadening the 
scope of the test)?  I'd appreciate opinions or advice from kernel or md 
developers.

Also, does anyone have advice on how I should do system and stress tests 
on this?

Cheers,
11011011


RE: RAID1 robust read and read/write correct and EVMS-BBR

2005-02-23 Thread Guy
This is very good!  But most of my disk space is RAID5.  Any chance you have
similar plans for RAID5?

Thanks,
Guy

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Nagpure, Dinesh
Sent: Wednesday, February 23, 2005 2:56 PM
To: '[EMAIL PROTECTED]'
Cc: 'linux-raid@vger.kernel.org'
Subject: RAID1 robust read and read/write correct and EVMS-BBR



Re: RAID1 robust read and read/write correct patch

2005-02-23 Thread Peter T. Breuer
J. David Beutel [EMAIL PROTECTED] wrote:
> Peter T. Breuer wrote, on 2005-Feb-23 1:50 AM:
>
>> Quite possibly - I never tested the rewrite part of the patch, just
>> wrote it to indicate how it should go and stuck it in to encourage
>> others to go on from there.  It's disabled by default.  You almost
>> certainly don't want to enable it unless you are a developer (or a
>> guinea pig :).
>
> Thanks for taking a look at it!  Unfortunately, I'm not a kernel
> developer.  I haven't even been using C for the last 8 years.  But I'd
> really like to have that rewrite functionality, and I can dedicate my
> system as a guinea pig for at least a little while, if there's a way I
> can test it in a finite amount of time and build some confidence in it
> before I start to really use that system.

I'd say that there's about a 50/50 chance that it will work as it is
without crashing the kernel. But it's impossible to say until somebody
tries it, unless more people offer their kibitzing thought-experiments
first!

I can run the 2.4 UML kernel safely for tests, but it's not as good as
running a real kernel because you don't get an OOPS when things go bad.
You just don't have to suffer the psych pain of rebooting! So you can
do more debugging, although the debugging is not as good. And you still
have to set up the test again each time.

I'm particularly unclear in the present patch if end_io is run on the
original read request after it's been retried and used as the first
half of a read-write resync pair. I simply can't see from the code, and
running is the only way of finding out. There are also possible race
conditions against the resync thread proper under some conditions, but
that won't be a problem in testing.

> I'd like to start with an md unit test suite.  Is there one?  I don't

!! I always simply do

   dd if=/dev/zero of=/tmp/core0 bs=4k count=1k
   dd if=/dev/zero of=/tmp/core1 bs=4k count=1k
   losetup /dev/loop0 /tmp/core0
   losetup /dev/loop1 /tmp/core1
   mdadm -C -l 1 -n 2 -x 0 --force /dev/md0 /dev/loop[01]

or something very like that.

> know if the architecture would allow for this, but naively I'm thinking
> that the test suite would use a mock disk driver (e.g., in memory only)
> to simulate various kinds of hardware failures and confirm that md

Uh, one can do that via the devicemapper (dmsetup?) but I've never
bothered - it's much simpler to add a line or two to the code along the
lines of "if (bio->b_sector == 1024) return -1;" in order to simulate an
error. 

One could add ioctls to make that configurable, but by then we're in dm
territory.

> responds as expected to both the layer above (the kernel?) and below
> (the disk driver?).  Unit tests are also good for simulating unlikely
> and hard to reproduce race conditions, although stress tests are better

Well, if you could make a dm-based testrig, yes please!

> at discovering new ones.   But, should the test suite play the role of
> the kernel by calling md functions directly in a user-space sandbox

Never mind that for now. The actual user space reads or writes
can be in a makefile. The difficulty is engineering the devices
to have the intended failures.

> (mock kernel, threads, etc)?  Or, should it play the role of a user
> process by calling the real kernel to test the real md (broadening the

No - nothing like that. The testsuite will be run under a kernel. It's
not your business to know if it's a real kernel or a uml kernel or some
other kind of sandbox.

> scope of the test)?  I'd appreciate opinions or advice from kernel or md
> developers.
>
> Also, does anyone have advice on how I should do system and stress tests
> on this?

Well, setting up is the major problem. After that running the tests is
just standard scripting.

Peter



Possible bug in RAID 1 driver

2005-02-23 Thread Jesús Rojo Martínez
Hi,

  The Linux RAID driver crashes when reconstructing a disk.


  We have a RAID 1 device over two active SATA disks and one spare SATA
disk.

  I was testing the RAID and found that it occasionally crashes.
I attach to this email the sequence of commands I ran.

  When I swap out one disk of the RAID, the spare becomes active and its
data is reconstructed. I do this repeatedly and all is well.

  But sometimes it crashes and the spare disk isn't reconstructed.
/proc/mdstat reports that the speed is... 0 KB/s!!! And because of this
the system doesn't respond correctly (really the system responds, but it
doesn't write to the RAID, the RAID doesn't unmount, and when I halt the
system, it hangs waiting for the RAID).

  I use both the raidtools and mdadm tools, which is why I think the
problem is in the kernel driver, not in the tools or my sequence of
commands.

  I use a 2.4.29 kernel (on a Debian sarge GNU/Linux).

  I am not familiar with the source code for this, which is why I am
reporting the bug to you.


  Please acknowledge receipt of this email.

  Thanks a lot.


  Regards,

-- 

--- Jesús Rojo Martínez. ---


P.D.: Sorry about my poor english.
centralmad:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[1] sdb1[0] sdc1[2]
  999872 blocks [2/2] [UU]
  
unused devices: <none>
centralmad:~# DEVICE=/dev/sdd1 ; raidsetfaulty /dev/md1 $DEVICE ; raidhotremove 
/dev/md1 $DEVICE ; cat /proc/mdstat;
   
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdb1[0] sdc1[2]
  999872 blocks [2/1] [U_]
  
unused devices: <none>
centralmad:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdb1[0] sdc1[2]
  999872 blocks [2/1] [U_]
  [==..]  recovery = 14.4% (145088/999872) finish=0.1min 
speed=72544K/sec
unused devices: <none>
centralmad:~# cat /proc/mdstat; raidhotadd /dev/md1 $DEVICE ; cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdb1[0] sdc1[1]
  999872 blocks [2/2] [UU]
  
unused devices: <none>
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[2] sdb1[0] sdc1[1]
  999872 blocks [2/2] [UU]
  
unused devices: <none>
centralmad:~# DEVICE=/dev/sdb1 ; raidsetfaulty /dev/md1 $DEVICE ; raidhotremove 
/dev/md1 $DEVICE ; cat /proc/mdstat; 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[2] sdc1[1]
  999872 blocks [2/1] [_U]
  
unused devices: <none>
centralmad:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[2] sdc1[1]
  999872 blocks [2/1] [_U]
  [==..]  recovery = 32.8% (328848/999872) finish=0.2min 
speed=54808K/sec
unused devices: <none>
centralmad:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[2] sdc1[1]
  999872 blocks [2/1] [_U]
  [===.]  recovery = 96.4% (964864/999872) finish=0.0min 
speed=56756K/sec
unused devices: <none>
centralmad:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[0] sdc1[1]
  999872 blocks [2/2] [UU]
  
unused devices: <none>
centralmad:~# cat /proc/mdstat; raidhotadd /dev/md1 $DEVICE ; cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdd1[0] sdc1[1]
  999872 blocks [2/2] [UU]
  
unused devices: <none>
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdb1[2] sdd1[0] sdc1[1]
  999872 blocks [2/2] [UU]
  
unused devices: <none>
centralmad:~# DEVICE=/dev/sdc1 ; raidsetfaulty /dev/md1 $DEVICE ; raidhotremove 
/dev/md1 $DEVICE ; cat /proc/mdstat; 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdb1[2] sdd1[0]
  999872 blocks [2/1] [U_]
  []  recovery =  0.0% (0/999872) finish=166.6min 
speed=0K/sec
unused devices: <none>
centralmad:~# cat /proc/mdstat; raidhotadd /dev/md1 $DEVICE ; cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdb1[2] sdd1[0]
  999872 blocks [2/1] [U_]
  []  recovery =  0.0% (0/999872) finish=166.6min 
speed=0K/sec
unused devices: <none>
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdc1[3] sdb1[2] sdd1[0]
  999872 blocks [2/1] [U_]
  []  recovery =  0.0% (0/999872) finish=166.6min 
speed=0K/sec
unused devices: <none>
centralmad:~# cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdc1[3] sdb1[2] sdd1[0]
  999872 blocks [2/1] [U_]
  []  recovery =  0.0% (0/999872) finish=1166.5min 
speed=0K/sec
unused devices: <none>
centralmad:~#


Re: RAID1 robust read and read/write correct and EVMS-BBR

2005-02-23 Thread bernd

> I can go back and put together a patch over the weekend if anyone is
> interested in using it.
>
> -dinesh
> [EMAIL PROTECTED]

Oh yes, please make this patch. We are very very interested in it!

We are waiting for the day when the same block on all mirrors has
read problems.  OK, we've now been waiting for about 15 years, because the
HP-UX mirror strategy is the same.  Quite a long time without disaster,
but it will happen (up to today Murphy has been right in every case but
one :-)).  If anything happens to a disk, one must be warned as soon as
possible.

Thanks
B. Rieke ([EMAIL PROTECTED])