Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock
Hi,

Sorry for the late response. I had trouble reproducing the problem, but it turns out that the 2.6.24 kernel needs the latest (possibly testing) version of systemtap-0.6.1-1 to run SystemTap for the fault injection tool.

I've reproduced the stall on both raid1 and raid10 using 2.6.24. I've also tested the patch applied to 2.6.24 and confirmed that it fixes the stall problem in both cases.

K.Tanaka wrote:
> Hi,
> Thank you for the patch. I have applied the patch to 2.6.23.14 and it works well.
> - In the case of 2.6.23.14, the problem is reproduced.
> - In the case of 2.6.23.14 with this patch, raid1 works well so far. The fault
>   injection script continues to run, and it doesn't deadlock. I will keep it
>   running for a while.
> Also, md raid10 seems to have the same problem. I will test raid10 with this
> patch applied as well.
--
Kenichi TANAKA | Open Source Software Platform Development Division
| Computers Software Operations Unit, NEC Corporation | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock
Hi,

Thank you for the patch. I have applied the patch to 2.6.23.14 and it works well.
- In the case of 2.6.23.14, the problem is reproduced.
- In the case of 2.6.23.14 with this patch, raid1 works well so far. The fault
  injection script continues to run, and it doesn't deadlock. I will keep it
  running for a while.

Also, md raid10 seems to have the same problem. I will test raid10 with this patch applied as well.

Neil Brown wrote:
> On Tuesday January 15, [EMAIL PROTECTED] wrote:
> > This message describes the details of an md RAID1 issue found by testing md
> > RAID1 using the SCSI fault injection framework.
> > Abstract: Both the error handler for md RAID1 and write access requests to
> > md RAID1 use the raid1d kernel thread. The nr_pending counter could cause a
> > race condition in raid1d, resulting in a raid1d deadlock.
>
> Thanks for finding and reporting this. I believe the following patch should
> fix the deadlock. If you are able to repeat your test and confirm this, I
> would appreciate it.
>
> Thanks,
> NeilBrown
>
> Fix deadlock in md/raid1 when handling a read error.
>
> When handling a read error, we freeze the array to stop any other IO while
> attempting to over-write with correct data. This is done in the raid1d thread
> and must wait for all submitted IO to complete (except for requests that
> failed and are sitting in the retry queue - these are counted in ->nr_queued
> and will stay there during a freeze). However, write requests need attention
> from raid1d as bitmap updates might be required. This can cause a deadlock,
> as raid1 is waiting for requests to finish that themselves need attention
> from raid1d.
>
> So we create a new function 'flush_pending_writes' to give that attention,
> and call it in freeze_array to be sure that we aren't waiting on raid1d.
>
> Thanks to K.Tanaka [EMAIL PROTECTED] for finding and reporting this problem.
> Cc: K.Tanaka [EMAIL PROTECTED]
> Signed-off-by: Neil Brown [EMAIL PROTECTED]
Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock
On Tuesday January 15, [EMAIL PROTECTED] wrote:
> This message describes the details of an md RAID1 issue found by testing md
> RAID1 using the SCSI fault injection framework.
> Abstract: Both the error handler for md RAID1 and write access requests to
> md RAID1 use the raid1d kernel thread. The nr_pending counter could cause a
> race condition in raid1d, resulting in a raid1d deadlock.

Thanks for finding and reporting this. I believe the following patch should fix the deadlock. If you are able to repeat your test and confirm this, I would appreciate it.

Thanks,
NeilBrown

Fix deadlock in md/raid1 when handling a read error.

When handling a read error, we freeze the array to stop any other IO while attempting to over-write with correct data. This is done in the raid1d thread and must wait for all submitted IO to complete (except for requests that failed and are sitting in the retry queue - these are counted in ->nr_queued and will stay there during a freeze). However, write requests need attention from raid1d as bitmap updates might be required. This can cause a deadlock, as raid1 is waiting for requests to finish that themselves need attention from raid1d.

So we create a new function 'flush_pending_writes' to give that attention, and call it in freeze_array to be sure that we aren't waiting on raid1d.

Thanks to K.Tanaka [EMAIL PROTECTED] for finding and reporting this problem.

### Diffstat output
 ./drivers/md/raid1.c |   66 ++++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 45 insertions(+), 21 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2008-01-18 11:19:09.0 +1100
+++ ./drivers/md/raid1.c	2008-01-24 14:21:55.0 +1100
@@ -592,6 +592,37 @@ static int raid1_congested(void *data, i
 }
 
+static int flush_pending_writes(conf_t *conf)
+{
+	/* Any writes that have been queued but are awaiting
+	 * bitmap updates get flushed here.
+	 * We return 1 if any requests were actually submitted.
+	 */
+	int rv = 0;
+
+	spin_lock_irq(&conf->device_lock);
+
+	if (conf->pending_bio_list.head) {
+		struct bio *bio;
+		bio = bio_list_get(&conf->pending_bio_list);
+		blk_remove_plug(conf->mddev->queue);
+		spin_unlock_irq(&conf->device_lock);
+		/* flush any pending bitmap writes to
+		 * disk before proceeding w/ I/O */
+		bitmap_unplug(conf->mddev->bitmap);
+
+		while (bio) { /* submit pending writes */
+			struct bio *next = bio->bi_next;
+			bio->bi_next = NULL;
+			generic_make_request(bio);
+			bio = next;
+		}
+		rv = 1;
+	} else
+		spin_unlock_irq(&conf->device_lock);
+	return rv;
+}
+
 /* Barriers
  * Sometimes we need to suspend IO while we do something else,
  * either some resync/recovery, or reconfigure the array.
@@ -678,10 +709,14 @@ static void freeze_array(conf_t *conf)
 	spin_lock_irq(&conf->resync_lock);
 	conf->barrier++;
 	conf->nr_waiting++;
+	spin_unlock_irq(&conf->resync_lock);
+
+	spin_lock_irq(&conf->resync_lock);
 	wait_event_lock_irq(conf->wait_barrier,
 			    conf->barrier+conf->nr_pending == conf->nr_queued+2,
 			    conf->resync_lock,
-			    raid1_unplug(conf->mddev->queue));
+			    ({ flush_pending_writes(conf);
+			       raid1_unplug(conf->mddev->queue); }));
 	spin_unlock_irq(&conf->resync_lock);
 }
 static void unfreeze_array(conf_t *conf)
@@ -907,6 +942,9 @@ static int make_request(struct request_q
 		blk_plug_device(mddev->queue);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 
+	/* In case raid1d snuck into freeze_array */
+	wake_up(&conf->wait_barrier);
+
 	if (do_sync)
 		md_wakeup_thread(mddev->thread);
 #if 0
@@ -1473,28 +1511,14 @@ static void raid1d(mddev_t *mddev)
 	for (;;) {
 		char b[BDEVNAME_SIZE];
-		spin_lock_irqsave(&conf->device_lock, flags);
-
-		if (conf->pending_bio_list.head) {
-			bio = bio_list_get(&conf->pending_bio_list);
-			blk_remove_plug(mddev->queue);
-			spin_unlock_irqrestore(&conf->device_lock, flags);
-			/* flush any pending bitmap writes to disk before proceeding w/ I/O */
-			bitmap_unplug(mddev->bitmap);
-
-			while (bio) { /* submit pending writes */
-				struct bio *next = bio->bi_next;
-				bio->bi_next = NULL;
-				generic_make_request(bio);
-				bio = next;
-
[BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock
This message describes the details of an md RAID1 issue found by testing md RAID1 using the SCSI fault injection framework.

Abstract:
Both the error handler for md RAID1 and write access requests to md RAID1 use the raid1d kernel thread. The nr_pending counter could cause a race condition in raid1d, resulting in a raid1d deadlock.

Details:

Error handling path:
  A-1. Issue a read request.
  A-2. A SCSI error is detected.
  A-3. raid1_end_read_request() is called in the interrupt context. It detects
       the read error and wakes up the raid1d kernel thread.
  A-4. raid1d wakes up.
  A-5. raid1d calls freeze_array() and waits for nr_pending to be decremented;
       that is, it stops IO and waits for everything to go quiet.
       ( the process stalls here because A-5 never ends )
  A-6. raid1d calls fix_read_error() to handle the read error.

Concurrent write operation path:
  B-1. make_request() for raid1 starts.
  B-2. make_request() calls wait_barrier() to increment the nr_pending counter.
  B-3. make_request() wakes up the raid1d kernel thread to send the write
       request to the lower layer.
  B-4. raid1d wakes up (it has already been woken up by A-3).
  B-5. raid1d calls generic_make_request() for the write request.
  B-6. raid1_end_write_request() is called in the interrupt context when the
       write access completes, and the nr_pending counter is decremented.

The deadlock mechanism:
If raid1d, woken up after the read error is detected (A-4), enters freeze_array() right after make_request() for a write request has incremented the nr_pending counter (B-2), then raid1d stalls waiting for nr_pending to be decremented (A-5). Meanwhile, the nr_pending counter incremented by make_request() for the write request will never be decremented, because it can only drop after raid1d issues generic_make_request() (B-5, B-6), and raid1d is now stopped.

This problem could easily be reproduced by using the new fault injection framework to simulate no response from the SCSI device. However, it could also occur whenever the raid1 error handler contends with a write operation in normal use, although with low probability.
I will report the other problems after I clean up and post the code for the SCSI fault injection framework.