Re: Checksums wrong on one disk of mirror
David wrote:

> [snip]
> mdadm is version 1.12. Looking at the most recently available version
> this seems incredibly out of date, but it seems to be the default
> installed in Ubuntu. Even Debian stable seems to have 1.9. I can file
> a bug with them for an update if necessary.

It's already on its way: update to the coming Debian release, Etch (due
to become stable in December 2006, if I remember correctly). In Etch the
mdadm version is v2.5.3 (7 August 2006).

Henrik Holst
Re: Re[2]: RAID1 submirror failure causes reboot?
On Mon, Nov 13 2006, Neil Brown wrote:
> On Friday November 10, [EMAIL PROTECTED] wrote:
>> Hello Neil,
>>
>>  [87398.531579] blk: request botched
>>
>> NB That looks bad.  Possibly some bug in the IDE controller or elsewhere
>> NB in the block layer.  Jens: What might cause that?
>>
>> --snip--
>>
>> NB That doesn't look like raid was involved.  If it was you would expect
>> NB to see raid1_end_write_request or raid1_end_read_request in that
>> NB trace.
>>
>> So that might be the hard or soft part of the IDE layer failing the
>> system, or a PCI problem for example?
>
> What I think is happening here (and Jens: if you could tell me how
> impossible this is, that would be good) is this:
>
> Some error handling somewhere in the low-level ide driver is getting
> confused and somehow one of the sector counts in the 'struct request'
> is getting set wrongly.  blk_recalc_rq_sectors notices this and says
> "blk: request botched".  It tries to auto-correct by increasing
> rq->nr_sectors to be consistent with the other counts.  I'm *guessing*
> this is the wrong thing to do, and that it has the side effect that
> bi_end_io gets called on the bio twice.  The second time, the bio has
> been freed and reused, so the wrong bi_end_io is called and it does
> the wrong thing.

It doesn't sound at all unreasonable.  It's most likely either a bug in
the ide driver, or a bad bio being passed to the block layer (and later
on to the request and driver).  By bad I mean one that isn't entirely
consistent, which could be a bug in eg md.

The "request botched" error is usually a sign of something being
severely screwed up.  As you mention further down, get slab and page
debugging enabled to potentially catch this earlier.  It could be a sign
of a freed bio or request with corrupt contents.

> This sounds a bit far-fetched, but it is the only explanation I can
> come up with for the observed back trace, which is:
>
> [87403.706012]  [<c0103871>] error_code+0x39/0x40
> [87403.710794]  [<c0180e0a>] mpage_end_io_read+0x5e/0x72
> [87403.716154]  [<c0164af9>] bio_endio+0x56/0x7b
> [87403.720798]  [<c0256778>] __end_that_request_first+0x1e0/0x301
> [87403.726985]  [<c02568a4>] end_that_request_first+0xb/0xd
> [87403.732699]  [<c02bd73c>] __ide_end_request+0x54/0xe1
> [87403.738214]  [<c02bd807>] ide_end_request+0x3e/0x5c
> [87403.743382]  [<c02c35df>] task_error+0x5b/0x97
> [87403.748113]  [<c02c36fa>] task_in_intr+0x6e/0xa2
> [87403.753120]  [<c02bf19e>] ide_intr+0xaf/0x12c
> [87403.757815]  [<c013e5a7>] handle_IRQ_event+0x23/0x57
> [87403.763135]  [<c013e66f>] __do_IRQ+0x94/0xfd
> [87403.767802]  [<c0105192>] do_IRQ+0x32/0x68
> [87403.772278]  [<c010372e>] common_interrupt+0x1a/0x20
>
> i.e. bio_endio goes straight to mpage_end_io_read despite the fact
> that the filesystem is mounted over md/raid1.

What is the workload?  Is io to the real device mixed with io that came
through md as well?

> Is the kernel compiled with CONFIG_DEBUG_SLAB=y and
> CONFIG_DEBUG_PAGEALLOC=y ??  They might help trigger the error earlier
> and so make the problem more obvious.

Agree, that would be a good plan to enable.

Other questions: are you seeing timeouts at any point?  The ide timeout
code has some request/bio resetting code which might be worrisome.

> NeilBrown

-- 
Jens Axboe
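To make the suspected sequence above concrete, here is a tiny stand-alone
C model (plain user-space C, not kernel code; the handler names only echo
the trace) of why a second, stale completion on a bio whose slab slot has
already been reused would land in mpage_end_io_read rather than in a
raid1 handler:

#include <stdio.h>

struct toy_bio {
	void (*bi_end_io)(const char *who);
};

static void raid1_end_read_request(const char *who) { printf("%s: raid1 handler\n", who); }
static void mpage_end_io_read(const char *who)      { printf("%s: mpage handler\n", who); }

int main(void)
{
	/* one slab slot, first owned by a bio submitted through raid1 */
	struct toy_bio slot = { raid1_end_read_request };

	slot.bi_end_io("first completion");	/* the legitimate end_io call */

	/* the bio is freed and the slot immediately reused for a plain
	 * filesystem read, which installs its own completion handler */
	slot.bi_end_io = mpage_end_io_read;

	/* a botched request now completes the old bio a second time and
	 * runs whatever handler the new owner installed */
	slot.bi_end_io("stale second completion");
	return 0;
}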
Re: Re[2]: RAID1 submirror failure causes reboot?
On Monday November 13, [EMAIL PROTECTED] wrote:
> It doesn't sound at all unreasonable.  It's most likely either a bug in
> the ide driver, or a bad bio being passed to the block layer (and later
> on to the request and driver).  By bad I mean one that isn't entirely
> consistent, which could be a bug in eg md.

I just noticed (while tracking raid6 problems...) that bio_clone calls
bio_phys_segments and bio_hw_segments (why does it do both?).  This calls
blk_recount_segments, which does calculations based on ->bi_bdev.  Only
immediately after calling bio_clone, raid1 changes bi_bdev, thus creating
a potential inconsistency in the bio.

Would this sort of inconsistency cause this problem?

> Agree, that would be a good plan to enable.
>
> Other questions: are you seeing timeouts at any point?  The ide timeout
> code has some request/bio resetting code which might be worrisome.

Jim could probably answer this with more authority, but there aren't
obvious timeouts in the logs he posted.  A representative sample is:

[87338.675891] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87338.685143] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87338.694791] ide: failed opcode was: unknown
[87343.557424] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87343.566388] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87343.576105] ide: failed opcode was: unknown
[87348.472226] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87348.481170] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87348.490843] ide: failed opcode was: unknown
[87353.387028] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87353.395735] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87353.405500] ide: failed opcode was: unknown
[87353.461342] ide1: reset: success

Thanks,
NeilBrown
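For reference, the sequence in question looks roughly like this against
the 2.6-era bio API (a sketch only, not the raid1 code itself; the
BIO_SEG_VALID line is the invalidation that the raid5 aligned-read patch
later in this thread adds, and raid_bio/rdev are placeholder names):

	struct bio *clone = bio_clone(raid_bio, GFP_NOIO);
	if (!clone)
		return 0;
	/* bio_clone() has just recounted phys/hw segments against
	 * raid_bio->bi_bdev ... */

	clone->bi_bdev = rdev->bdev;			/* ... but we then retarget the clone,     */
	clone->bi_flags &= ~(1 << BIO_SEG_VALID);	/* so invalidate the cached counts and let */
	clone->bi_sector += rdev->data_offset;		/* blk_recount_segments() redo them later  */
	generic_make_request(clone);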
[PATCH 001 of 4] md: Fix innocuous bug in raid6 stripe_to_pdidx
stripe_to_pdidx finds the index of the parity disk for a given stripe.
It assumes raid5 in that it uses "disks-1" to determine the number of
data disks.

This is incorrect for raid6, but fortunately the two usages cancel each
other out.  The only way that 'data_disks' affects the calculation of
pd_idx in raid5_compute_sector is when it is divided into the sector
number.  But as that sector number is calculated by multiplying in the
wrong value of 'data_disks', the division produces the right value.

So it is innocuous but needs to be fixed.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-11-14 10:05:00.000000000 +1100
+++ ./drivers/md/raid5.c	2006-11-14 10:33:41.000000000 +1100
@@ -1355,8 +1355,10 @@ static int stripe_to_pdidx(sector_t stri
 	int pd_idx, dd_idx;
 	int chunk_offset = sector_div(stripe, sectors_per_chunk);
 
-	raid5_compute_sector(stripe*(disks-1)*sectors_per_chunk
-			     + chunk_offset, disks, disks-1, &dd_idx, &pd_idx, conf);
+	raid5_compute_sector(stripe * (disks - conf->max_degraded)
+			     *sectors_per_chunk + chunk_offset,
+			     disks, disks - conf->max_degraded,
+			     &dd_idx, &pd_idx, conf);
 	return pd_idx;
 }
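To see the cancellation concretely, here is a small stand-alone
illustration (plain C with made-up numbers, not the raid5 code itself):
whichever value of 'data_disks' is multiplied in, dividing the resulting
sector back down by the same value recovers the same stripe number, and
that stripe number is what pd_idx is derived from.

#include <stdio.h>

int main(void)
{
	unsigned long stripe = 12345;		/* made-up stripe number        */
	unsigned sectors_per_chunk = 128;
	unsigned chunk_offset = 37;		/* anything < sectors_per_chunk */

	/* data_disks = 3 plays the raid6 value, 4 the (wrong) raid5 value */
	for (unsigned data_disks = 3; data_disks <= 4; data_disks++) {
		/* what stripe_to_pdidx hands to raid5_compute_sector */
		unsigned long sector = stripe * data_disks * sectors_per_chunk
				       + chunk_offset;

		/* the only use raid5_compute_sector makes of data_disks on
		 * the way to pd_idx: divide it back out again */
		unsigned long chunk     = sector / sectors_per_chunk;
		unsigned long recovered = chunk / data_disks;

		printf("data_disks=%u: recovered stripe = %lu\n",
		       data_disks, recovered);
	}
	return 0;
}

Both iterations print the same stripe (12345), which is why the wrong
multiplier was harmless in practice.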
[PATCH 003 of 4] md: Misc fixes for aligned-read handling.
1/ When aligned requests fail (read error) they need to be retried via
   the normal method (stripe cache).  As we cannot be sure that we can
   process a single read in one go (we may not be able to allocate all
   the stripes needed) we store a bio-being-retried and a list of
   bios-that-still-need-to-be-retried.  When we find a bio that needs to
   be retried, we should add it to the list, not to the single-bio slot.

2/ The cloned bio is being used after free (to test BIO_UPTODATE).

3/ We forgot to add rdev->data_offset when submitting a bio for an
   aligned read.

4/ bio_clone calls blk_recount_segments, and then we change bi_bdev, so
   we need to invalidate the segment counts.

5/ We were never incrementing 'scnt' when resubmitting failed aligned
   requests.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-11-14 10:34:17.000000000 +1100
+++ ./drivers/md/raid5.c	2006-11-14 10:34:33.000000000 +1100
@@ -2658,8 +2658,8 @@ static void add_bio_to_retry(struct bio
 
 	spin_lock_irqsave(&conf->device_lock, flags);
 
-	bi->bi_next = conf->retry_read_aligned;
-	conf->retry_read_aligned = bi;
+	bi->bi_next = conf->retry_read_aligned_list;
+	conf->retry_read_aligned_list = bi;
 
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 	md_wakeup_thread(conf->mddev->thread);
@@ -2698,6 +2698,7 @@ static int raid5_align_endio(struct bio
 	struct bio* raid_bi = bi->bi_private;
 	mddev_t *mddev;
 	raid5_conf_t *conf;
+	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 
 	if (bi->bi_size)
 		return 1;
@@ -2706,7 +2707,7 @@ static int raid5_align_endio(struct bio
 	mddev = raid_bi->bi_bdev->bd_disk->queue->queuedata;
 	conf = mddev_to_conf(mddev);
 
-	if (!error && test_bit(BIO_UPTODATE, &bi->bi_flags)) {
+	if (!error && uptodate) {
 		bio_endio(raid_bi, bytes, 0);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
 			wake_up(&conf->wait_for_stripe);
@@ -2759,9 +2760,11 @@ static int chunk_aligned_read(request_qu
 	rcu_read_lock();
 	rdev = rcu_dereference(conf->disks[dd_idx].rdev);
 	if (rdev && test_bit(In_sync, &rdev->flags)) {
-		align_bi->bi_bdev = rdev->bdev;
 		atomic_inc(&rdev->nr_pending);
 		rcu_read_unlock();
+		align_bi->bi_bdev = rdev->bdev;
+		align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
+		align_bi->bi_sector += rdev->data_offset;
 
 		spin_lock_irq(&conf->device_lock);
 		wait_event_lock_irq(conf->wait_for_stripe,
@@ -3151,7 +3154,8 @@ static int retry_aligned_read(raid5_con
 					 conf);
 	last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);
 
-	for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+	for (; logical_sector < last_sector;
+	     logical_sector += STRIPE_SECTORS, scnt++) {
 
 		if (scnt < raid_bio->bi_hw_segments)
 			/* already done this stripe */
[PATCH 004 of 4] md: Fix a couple more bugs in raid5/6 aligned reads
1/ We don't drop our reference on the rdev when the read completes.
   This means we need to record the rdev so that it is still available
   in the end_io routine.  Fortunately bi_next in the original bio is
   unused at this point, so we can stuff it in there.

2/ We leak a cloned bio if the target rdev is not usable.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-11-14 11:00:51.000000000 +1100
+++ ./drivers/md/raid5.c	2006-11-14 11:06:44.000000000 +1100
@@ -2699,6 +2699,7 @@ static int raid5_align_endio(struct bio
 	mddev_t *mddev;
 	raid5_conf_t *conf;
 	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
+	mdk_rdev_t *rdev;
 
 	if (bi->bi_size)
 		return 1;
@@ -2706,6 +2707,10 @@ static int raid5_align_endio(struct bio
 	mddev = raid_bi->bi_bdev->bd_disk->queue->queuedata;
 	conf = mddev_to_conf(mddev);
+	rdev = (void*)raid_bi->bi_next;
+	raid_bi->bi_next = NULL;
+
+	rdev_dec_pending(rdev, conf->mddev);
 
 	if (!error && uptodate) {
 		bio_endio(raid_bi, bytes, 0);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
@@ -2762,6 +2767,7 @@ static int chunk_aligned_read(request_qu
 	if (rdev && test_bit(In_sync, &rdev->flags)) {
 		atomic_inc(&rdev->nr_pending);
 		rcu_read_unlock();
+		raid_bio->bi_next = (void*)rdev;
 		align_bi->bi_bdev = rdev->bdev;
 		align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
 		align_bi->bi_sector += rdev->data_offset;
@@ -2777,6 +2783,7 @@ static int chunk_aligned_read(request_qu
 		return 1;
 	} else {
 		rcu_read_unlock();
+		bio_put(align_bi);
 		return 0;
 	}
 }
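The bi_next trick in point 1 is the common pattern of carrying context
across an asynchronous completion in a field that is known to be unused
while the request is in flight.  A tiny stand-alone illustration (plain
C, not kernel code; all names are stand-ins):

#include <stdio.h>

struct toy_rdev { int nr_pending; };
struct toy_bio  { void *bi_next; };	/* unused while the aligned read is in flight */

static void submit_aligned_read(struct toy_bio *bio, struct toy_rdev *rdev)
{
	rdev->nr_pending++;		/* take a reference on the member disk       */
	bio->bi_next = rdev;		/* stash it where the completion can find it */
}

static void align_endio(struct toy_bio *bio)
{
	struct toy_rdev *rdev = bio->bi_next;

	bio->bi_next = NULL;		/* restore the field before the bio is reused   */
	rdev->nr_pending--;		/* the "rdev_dec_pending" step that was missing */
}

int main(void)
{
	struct toy_rdev rdev = { 0 };
	struct toy_bio bio = { NULL };

	submit_aligned_read(&bio, &rdev);
	align_endio(&bio);
	printf("nr_pending back to %d\n", rdev.nr_pending);
	return 0;
}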
Re: bio too big device dm-XX (256 > 255) on 2.6.17
Hello,

this is getting more and more annoying.  Somewhere in the stack
reiserfs -> dm -> md -> hd[bd] lies the problem that's causing
"bio too big device dm-10 (256 > 255)" errors, which cause i/o
failures.  It works as expected on reiserfs -> dm -> sda and on
ext3 -> dm -> md -> hd[bd].

Debian Etch, 2.6.17-2.

On Thu, 9 Nov 2006 00:52:27 +0100
Jure Pečar [EMAIL PROTECTED] wrote:

> Hello,
>
> Recently I upgraded my home server.  I moved EVMS volumes from 3ware hw
> mirrors (lvm2) to a md raid1 (lvm2) and am now getting lots of these
> errors, which result in Input/output errors when trying to read files
> from those volumes.  It's kind of ugly because I have all the data, I
> just cannot read it ...
>
> Google comes up with mails from 2003 mentioning such problems.  Are
> there any known problems of this kind in recent kernels, or am I
> hitting something new here?

-- 
Jure Pečar
http://jure.pecar.org/
Re: bio too big device dm-XX (256 > 255) on 2.6.17
On Tuesday November 14, [EMAIL PROTECTED] wrote:
> Hello,
>
> this is getting more and more annoying.  Somewhere in the stack
> reiserfs -> dm -> md -> hd[bd] lies the problem that's causing
> "bio too big device dm-10 (256 > 255)" errors, which cause i/o
> failures.  It works as expected on reiserfs -> dm -> sda and on
> ext3 -> dm -> md -> hd[bd].
>
> Debian Etch, 2.6.17-2.

Please try this patch (I can give more detailed instructions if needed).

NeilBrown

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/dm-table.c |    3 +++
 1 file changed, 3 insertions(+)

diff .prev/drivers/md/dm-table.c ./drivers/md/dm-table.c
--- .prev/drivers/md/dm-table.c	2006-11-14 15:23:08.000000000 +1100
+++ ./drivers/md/dm-table.c	2006-11-14 15:23:57.000000000 +1100
@@ -99,6 +99,9 @@ static void combine_restrictions_low(str
 	lhs->max_segment_size =
 		min_not_zero(lhs->max_segment_size, rhs->max_segment_size);
 
+	lhs->max_hw_sectors =
+		min_not_zero(lhs->max_hw_sectors, rhs->max_hw_sectors);
+
 	lhs->seg_boundary_mask =
 		min_not_zero(lhs->seg_boundary_mask, rhs->seg_boundary_mask);
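For context: combine_restrictions_low() folds each set of queue limits
into the table's limits with min_not_zero(), i.e. take the stricter
(smaller) value while treating zero as "not set"; max_hw_sectors was
simply missing from that list.  A stand-alone sketch of the combining
rule (plain C, illustrative numbers, with min_not_zero re-implemented
locally rather than taken from the kernel headers):

#include <stdio.h>

/* Same idea as the kernel's min_not_zero(): zero means "no limit
 * recorded yet", otherwise take the stricter (smaller) value. */
static unsigned int min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

int main(void)
{
	/* Illustrative numbers only: the combined table starts with no
	 * max_hw_sectors recorded, and one underlying device can only
	 * take 255-sector requests. */
	unsigned int lhs_max_hw_sectors = 0;
	unsigned int rhs_max_hw_sectors = 255;

	lhs_max_hw_sectors = min_not_zero(lhs_max_hw_sectors, rhs_max_hw_sectors);

	/* The dm device now advertises 255, so upper layers stop handing
	 * it the 256-sector bios that triggered "bio too big". */
	printf("combined max_hw_sectors = %u\n", lhs_max_hw_sectors);
	return 0;
}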
Re: Re[2]: RAID1 submirror failure causes reboot?
On Tue, Nov 14 2006, Neil Brown wrote:
> On Monday November 13, [EMAIL PROTECTED] wrote:
>> It doesn't sound at all unreasonable.  It's most likely either a bug in
>> the ide driver, or a bad bio being passed to the block layer (and later
>> on to the request and driver).  By bad I mean one that isn't entirely
>> consistent, which could be a bug in eg md.
>
> I just noticed (while tracking raid6 problems...) that bio_clone calls
> bio_phys_segments and bio_hw_segments (why does it do both?).  This
> calls blk_recount_segments, which does calculations based on ->bi_bdev.
> Only immediately after calling bio_clone, raid1 changes bi_bdev, thus
> creating a potential inconsistency in the bio.
>
> Would this sort of inconsistency cause this problem?

raid1 should change it first, you are right.  But it should not matter,
as the real device should have restrictions that are at least equal to
those of the md device.  So it may be a bit more conservative, but I
don't think there's a problem there.

>> Agree, that would be a good plan to enable.
>>
>> Other questions: are you seeing timeouts at any point?  The ide timeout
>> code has some request/bio resetting code which might be worrisome.
>
> Jim could probably answer this with more authority, but there aren't
> obvious timeouts in the logs he posted.  A representative sample is:
>
> [87338.675891] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> [87338.685143] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
> [87338.694791] ide: failed opcode was: unknown
> [87343.557424] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> [87343.566388] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
> [87343.576105] ide: failed opcode was: unknown
> [87348.472226] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> [87348.481170] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
> [87348.490843] ide: failed opcode was: unknown
> [87353.387028] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> [87353.395735] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
> [87353.405500] ide: failed opcode was: unknown
> [87353.461342] ide1: reset: success

Then let's wait for Jim to repeat his testing with all the debugging
options enabled; that should make us a little wiser.

-- 
Jens Axboe