Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock

2008-01-29 Thread K.Tanaka
Hi,

Also, md raid10 seems to have the same problem.
I will test raid10 with this patch applied as well.

Sorry for the late response. I had trouble reproducing the problem,
but it turns out that the 2.6.24 kernel needs the latest (possibly testing)
version of systemtap-0.6.1-1 to run systemtap for the fault injection tool.

I've reproduced the stall on both raid1 and raid10 using 2.6.24.
I've also tested the patch applied to 2.6.24 and confirmed that
it fixes the stall in both cases.

K.Tanaka wrote:
 Hi,
 
 Thank you for the patch.
 I have applied the patch to 2.6.23.14 and it works well.
 
 - With plain 2.6.23.14, the problem is reproduced.
 - With 2.6.23.14 plus this patch, raid1 works well so far.
   The fault injection script continues to run, and it doesn't deadlock.
   I will keep it running for a while.
 
 Also, md raid10 seems to have the same problem.
 I will test raid10 with this patch applied as well.
 
 
 Neil Brown wrote:
 On Tuesday January 15, [EMAIL PROTECTED] wrote:
 This message describes the details of an md RAID1 issue found by
 testing md RAID1 using the SCSI fault injection framework.

 Abstract:
 Both the error handler for md RAID1 and write requests to the md RAID1
 use the raid1d kernel thread. The nr_pending counter can cause a race
 condition in raid1d, resulting in a raid1d deadlock.
 Thanks for finding and reporting this.

 I believe the following patch should fix the deadlock.

 If you are able to repeat your test and confirm this I would
 appreciate it.

 Thanks,
 NeilBrown



 Fix deadlock in md/raid1 when handling a read error.

 When handling a read error, we freeze the array to stop any other
 IO while attempting to over-write with correct data.


-- 
Kenichi TANAKA | Open Source Software Platform Development Division
               | Computers Software Operations Unit, NEC Corporation
               | [EMAIL PROTECTED]



Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock

2008-01-24 Thread K.Tanaka
Hi,

Thank you for the patch.
I have applied the patch to 2.6.23.14 and it works well.

- With plain 2.6.23.14, the problem is reproduced.
- With 2.6.23.14 plus this patch, raid1 works well so far.
  The fault injection script continues to run, and it doesn't deadlock.
  I will keep it running for a while.

Also, md raid10 seems to have the same problem.
I will test raid10 with this patch applied as well.


Neil Brown wrote:
 On Tuesday January 15, [EMAIL PROTECTED] wrote:
 This message describes the details of an md RAID1 issue found by
 testing md RAID1 using the SCSI fault injection framework.

 Abstract:
 Both the error handler for md RAID1 and write requests to the md RAID1
 use the raid1d kernel thread. The nr_pending counter can cause a race
 condition in raid1d, resulting in a raid1d deadlock.
 
 Thanks for finding and reporting this.
 
 I believe the following patch should fix the deadlock.
 
 If you are able to repeat your test and confirm this I would
 appreciate it.
 
 Thanks,
 NeilBrown
 
 
 
 Fix deadlock in md/raid1 when handling a read error.
 
 When handling a read error, we freeze the array to stop any other
 IO while attempting to over-write with correct data.
 
 This is done in the raid1d thread and must wait for all submitted IO
 to complete (except for requests that failed and are sitting in the
 retry queue - these are counted in ->nr_queued and will stay there during
 a freeze).
 
 However write requests need attention from raid1d as bitmap updates
 might be required.  This can cause a deadlock as raid1 is waiting for
 requests to finish that themselves need attention from raid1d.
 
 So we create a new function 'flush_pending_writes' to give that attention,
 and call it in freeze_array to be sure that we aren't waiting on raid1d.
 
 Thanks to K.Tanaka [EMAIL PROTECTED] for finding and reporting
 this problem.
 
 Cc: K.Tanaka [EMAIL PROTECTED]
 Signed-off-by: Neil Brown [EMAIL PROTECTED]
 
-- 
Kenichi TANAKA | Open Source Software Platform Development Division
               | Computers Software Operations Unit, NEC Corporation
               | [EMAIL PROTECTED]


Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock

2008-01-23 Thread Neil Brown
On Tuesday January 15, [EMAIL PROTECTED] wrote:
 
 This message describes the details of an md RAID1 issue found by
 testing md RAID1 using the SCSI fault injection framework.
 
 Abstract:
 Both the error handler for md RAID1 and write requests to the md RAID1
 use the raid1d kernel thread. The nr_pending counter can cause a race
 condition in raid1d, resulting in a raid1d deadlock.

Thanks for finding and reporting this.

I believe the following patch should fix the deadlock.

If you are able to repeat your test and confirm this I would
appreciate it.

Thanks,
NeilBrown



Fix deadlock in md/raid1 when handling a read error.

When handling a read error, we freeze the array to stop any other
IO while attempting to over-write with correct data.

This is done in the raid1d thread and must wait for all submitted IO
to complete (except for requests that failed and are sitting in the
retry queue - these are counted in ->nr_queued and will stay there during
a freeze).

However write requests need attention from raid1d as bitmap updates
might be required.  This can cause a deadlock as raid1 is waiting for
requests to finish that themselves need attention from raid1d.

So we create a new function 'flush_pending_writes' to give that attention,
and call it in freeze_array to be sure that we aren't waiting on raid1d.

Thanks to K.Tanaka [EMAIL PROTECTED] for finding and reporting
this problem.

Cc: K.Tanaka [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid1.c |   66 ++-
 1 file changed, 45 insertions(+), 21 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2008-01-18 11:19:09.000000000 +1100
+++ ./drivers/md/raid1.c	2008-01-24 14:21:55.000000000 +1100
@@ -592,6 +592,37 @@ static int raid1_congested(void *data, i
 }
 
 
+static int flush_pending_writes(conf_t *conf)
+{
+	/* Any writes that have been queued but are awaiting
+	 * bitmap updates get flushed here.
+	 * We return 1 if any requests were actually submitted.
+	 */
+	int rv = 0;
+
+	spin_lock_irq(&conf->device_lock);
+
+	if (conf->pending_bio_list.head) {
+		struct bio *bio;
+		bio = bio_list_get(&conf->pending_bio_list);
+		blk_remove_plug(conf->mddev->queue);
+		spin_unlock_irq(&conf->device_lock);
+		/* flush any pending bitmap writes to
+		 * disk before proceeding w/ I/O */
+		bitmap_unplug(conf->mddev->bitmap);
+
+		while (bio) { /* submit pending writes */
+			struct bio *next = bio->bi_next;
+			bio->bi_next = NULL;
+			generic_make_request(bio);
+			bio = next;
+		}
+		rv = 1;
+	} else
+		spin_unlock_irq(&conf->device_lock);
+	return rv;
+}
+
 /* Barriers
  * Sometimes we need to suspend IO while we do something else,
  * either some resync/recovery, or reconfigure the array.
@@ -678,10 +709,14 @@ static void freeze_array(conf_t *conf)
 	spin_lock_irq(&conf->resync_lock);
 	conf->barrier++;
 	conf->nr_waiting++;
+	spin_unlock_irq(&conf->resync_lock);
+
+	spin_lock_irq(&conf->resync_lock);
 	wait_event_lock_irq(conf->wait_barrier,
 			    conf->barrier+conf->nr_pending == conf->nr_queued+2,
 			    conf->resync_lock,
-			    raid1_unplug(conf->mddev->queue));
+			    ({ flush_pending_writes(conf);
+			       raid1_unplug(conf->mddev->queue); }));
 	spin_unlock_irq(&conf->resync_lock);
 }
 static void unfreeze_array(conf_t *conf)
@@ -907,6 +942,9 @@ static int make_request(struct request_q
 	blk_plug_device(mddev->queue);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 
+	/* In case raid1d snuck into freeze_array */
+	wake_up(&conf->wait_barrier);
+
 	if (do_sync)
 		md_wakeup_thread(mddev->thread);
 #if 0
@@ -1473,28 +1511,14 @@ static void raid1d(mddev_t *mddev)

 	for (;;) {
 		char b[BDEVNAME_SIZE];
-		spin_lock_irqsave(&conf->device_lock, flags);
-
-		if (conf->pending_bio_list.head) {
-			bio = bio_list_get(&conf->pending_bio_list);
-			blk_remove_plug(mddev->queue);
-			spin_unlock_irqrestore(&conf->device_lock, flags);
-			/* flush any pending bitmap writes to disk before proceeding w/ I/O */
-			bitmap_unplug(mddev->bitmap);
-
-			while (bio) { /* submit pending writes */
-				struct bio *next = bio->bi_next;
-				bio->bi_next = NULL;
-				generic_make_request(bio);
-				bio = next;
-

[BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock

2008-01-14 Thread K.Tanaka

This message describes the details of an md RAID1 issue found by
testing md RAID1 using the SCSI fault injection framework.

Abstract:
Both the error handler for md RAID1 and write requests to the md RAID1
use the raid1d kernel thread. The nr_pending counter can cause a race
condition in raid1d, resulting in a raid1d deadlock.

Details:
 error handling                            write operation
 ----------------------------------------  ----------------------------------------
 A-1. Issue a read request.

 A-2. SCSI error detected.
                                           B-1. make_request() for raid1 starts.
 A-3. raid1_end_read_request() is called
      in the interrupt context. It
      detects the read error and wakes     B-2. make_request() calls wait_barrier()
      up the raid1d kernel thread.              to increment the nr_pending counter.

 A-4. raid1d wakes up.

 A-5. raid1d calls freeze_array() and
      waits for nr_pending to be
      decremented. That means it stops     B-3. make_request() wakes up the raid1d
      IO and waits for everything to            kernel thread to send the write
      go quiet.                                 request to the lower layer.

                                           B-4. raid1d wakes up (it was already
                                                woken up by A-3).

      ( the process stalls here because A-5 never completes )

 A-6. raid1d calls fix_read_error() to     B-5. raid1d calls generic_make_request()
      handle the read error.                    for the write request.

                                           B-6. raid1_end_write_request() is called
                                                in the interrupt context when the
                                                write access completes, and the
                                                nr_pending counter is decremented.
The deadlock mechanism:
If raid1d, woken up by the read error (A-4), enters freeze_array() right
after make_request() for a write request has incremented the nr_pending
counter (B-2), raid1d stalls waiting for nr_pending to be decremented (A-5).
On the other hand, the nr_pending count incremented by make_request() for
the write request will never be decremented, because it can only be
decremented after raid1d issues generic_make_request() (B-5, B-6), and
raid1d is now stopped.
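
Below is a minimal user-space sketch of this circular wait. It is a
hypothetical illustration only, not the md/raid1 code: the pthread
mutex/condvar, the toy nr_pending and queued_writes counters, and the
raid1d()/make_request() functions are just labels borrowed from the
scenario above. The "raid1d" thread waits for nr_pending to reach zero,
but the only event that would decrement it is a write submission that
this same thread is responsible for, so it waits forever:

/*
 * Toy user-space model of the raid1d deadlock (hypothetical sketch,
 * NOT kernel code).  Build with:  cc -pthread toy_deadlock.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wait_barrier = PTHREAD_COND_INITIALIZER;
static int nr_pending;      /* in-flight requests (bumped at B-2) */
static int queued_writes;   /* writes parked for "raid1d" to submit (B-3) */

static void *make_request(void *arg)       /* models B-2 and B-3 */
{
	(void)arg;
	pthread_mutex_lock(&lock);
	nr_pending++;                      /* wait_barrier() bumped the count */
	queued_writes++;                   /* write parked, waiting for raid1d */
	pthread_mutex_unlock(&lock);
	/* nr_pending would only drop after raid1d submits the write and the
	 * completion handler runs (B-5, B-6) -- but raid1d is stuck below. */
	return NULL;
}

static void *raid1d(void *arg)             /* models A-4 and A-5 */
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (nr_pending != 0)            /* "freeze": wait for IO to drain */
		/* Without something like flush_pending_writes() here, the
		 * queued write is never submitted, so this never returns. */
		pthread_cond_wait(&wait_barrier, &lock);
	pthread_mutex_unlock(&lock);
	printf("raid1d: array frozen (never printed)\n");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&b, NULL, make_request, NULL);
	pthread_join(b, NULL);             /* B-2 happens before A-5, as in the race */
	pthread_create(&a, NULL, raid1d, NULL);

	sleep(2);
	pthread_mutex_lock(&lock);
	printf("main: raid1d still waiting, nr_pending=%d, queued_writes=%d\n",
	       nr_pending, queued_writes);
	pthread_mutex_unlock(&lock);
	return 0;                          /* thread 'a' is left deadlocked on purpose */
}

The actual patch breaks this cycle by making freeze_array() flush the
queued writes itself (via the new flush_pending_writes()) while it waits,
instead of relying on the blocked raid1d loop.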

This problem can easily be reproduced with the new fault injection
framework, using the "no response from the SCSI device" simulation.
However, it could also occur, with low probability, whenever the raid1
error handler contends with a write operation.

I will report the other problems after I clean up and post the code for
the SCSI fault injection framework.

-- 
Kenichi TANAKA | Open Source Software Platform Development Division
               | Computers Software Operations Unit, NEC Corporation
               | [EMAIL PROTECTED]

