[PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread NeilBrown
This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and 24-rc.
It would be great if it cold get into 23.13 and 24.final.
Thanks.
NeilBrown

### Comments for Changeset

We currently do not wait for the block from the missing device
to be computed from parity before copying data to the new stripe
layout.

The change in the raid6 code is not techincally needed as we
don't delay data block recovery in the same way for raid6 yet.
But making the change now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc

Cc: [EMAIL PROTECTED]
Cc: Dan Williams [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c2008-01-04 09:42:05.0 +1100
+++ ./drivers/md/raid5.c2008-01-04 09:42:27.0 +1100
@@ -2865,7 +2865,7 @@ static void handle_stripe5(struct stripe
md_done_sync(conf-mddev, STRIPE_SECTORS, 1);
}
 
-   if (s.expanding  s.locked == 0)
+   if (s.expanding  s.locked == 0  s.req_compute == 0)
handle_stripe_expansion(conf, sh, NULL);
 
if (sh-ops.count)
@@ -3067,7 +3067,7 @@ static void handle_stripe6(struct stripe
md_done_sync(conf-mddev, STRIPE_SECTORS, 1);
}
 
-   if (s.expanding  s.locked == 0)
+   if (s.expanding  s.locked == 0  s.req_compute == 0)
handle_stripe_expansion(conf, sh, r6s);
 
spin_unlock(sh-lock);
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Dan Williams
On Thu, 2008-01-03 at 15:46 -0700, NeilBrown wrote:
 This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and 24-rc.
 It would be great if it cold get into 23.13 and 24.final.
 Thanks.
 NeilBrown
 
 ### Comments for Changeset
 
 We currently do not wait for the block from the missing device
 to be computed from parity before copying data to the new stripe
 layout.
 
 The change in the raid6 code is not techincally needed as we
 don't delay data block recovery in the same way for raid6 yet.
 But making the change now is safer long-term.
 
 This bug exists in 2.6.23 and 2.6.24-rc
 
 Cc: [EMAIL PROTECTED]
 Cc: Dan Williams [EMAIL PROTECTED]
 Signed-off-by: Neil Brown [EMAIL PROTECTED]
 
Acked-by: Dan Williams [EMAIL PROTECTED]



-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Dan Williams
On Thu, 2008-01-03 at 16:00 -0700, Williams, Dan J wrote:
 On Thu, 2008-01-03 at 15:46 -0700, NeilBrown wrote:
  This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and
 24-rc.
  It would be great if it cold get into 23.13 and 24.final.
  Thanks.
  NeilBrown
 
  ### Comments for Changeset
 
  We currently do not wait for the block from the missing device
  to be computed from parity before copying data to the new stripe
  layout.
 
  The change in the raid6 code is not techincally needed as we
  don't delay data block recovery in the same way for raid6 yet.
  But making the change now is safer long-term.
 
  This bug exists in 2.6.23 and 2.6.24-rc
 
  Cc: [EMAIL PROTECTED]
  Cc: Dan Williams [EMAIL PROTECTED]
  Signed-off-by: Neil Brown [EMAIL PROTECTED]
 
 Acked-by: Dan Williams [EMAIL PROTECTED]
 

On closer look the safer test is:

!test_bit(STRIPE_OP_COMPUTE_BLK, sh-ops.pending).

The 'req_compute' field only indicates that a 'compute_block' operation
was requested during this pass through handle_stripe so that we can
issue a linked chain of asynchronous operations.

---

From: Neil Brown [EMAIL PROTECTED]

md: Fix data corruption when a degraded raid5 array is reshaped.

We currently do not wait for the block from the missing device
to be computed from parity before copying data to the new stripe
layout.

The change in the raid6 code is not techincally needed as we
don't delay data block recovery in the same way for raid6 yet.
But making the change now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc

Cc: [EMAIL PROTECTED]
Signed-off-by: Dan Williams [EMAIL PROTECTED]
---

 drivers/md/raid5.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a5aad8c..e8c8157 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2865,7 +2865,8 @@ static void handle_stripe5(struct stripe_head *sh)
md_done_sync(conf-mddev, STRIPE_SECTORS, 1);
}
 
-   if (s.expanding  s.locked == 0)
+   if (s.expanding  s.locked == 0 
+   !test_bit(STRIPE_OP_COMPUTE_BLK, sh-ops.pending))
handle_stripe_expansion(conf, sh, NULL);
 
if (sh-ops.count)
@@ -3067,7 +3068,8 @@ static void handle_stripe6(struct stripe_head *sh, struct 
page *tmp_page)
md_done_sync(conf-mddev, STRIPE_SECTORS, 1);
}
 
-   if (s.expanding  s.locked == 0)
+   if (s.expanding  s.locked == 0 
+   !test_bit(STRIPE_OP_COMPUTE_BLK, sh-ops.pending))
handle_stripe_expansion(conf, sh, r6s);
 
spin_unlock(sh-lock);

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Neil Brown
On Thursday January 3, [EMAIL PROTECTED] wrote:
 
 On closer look the safer test is:
 
   !test_bit(STRIPE_OP_COMPUTE_BLK, sh-ops.pending).
 
 The 'req_compute' field only indicates that a 'compute_block' operation
 was requested during this pass through handle_stripe so that we can
 issue a linked chain of asynchronous operations.
 
 ---
 
 From: Neil Brown [EMAIL PROTECTED]

Technically that should probably be
  From: Dan Williams [EMAIL PROTECTED]

now, and then I add
  Acked-by: NeilBrown [EMAIL PROTECTED]

because I completely agree with your improvement.

We should keep an eye out for then Andrew commits this and make sure
the right patch goes in...

Thanks,
NeilBrown

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html