Re: Checksums wrong on one disk of mirror

2006-11-13 Thread Henrik Holst
David wrote:

<snip>

 mdadm is version 1.12.  Looking at the most recently available version
 this seems incredibly out of date, but seems to be the default installed
 in Ubuntu.  Even Debian stable seems to have 1.9.  I can bug this with
 them for an update if necessary.

It's already on its way. An update is coming with the next Debian release,
Etch (due to become stable in December 2006, if I remember correctly). In
Etch the mdadm version is v2.5.3 (7 August 2006).

Henrik Holst



Re: Re[2]: RAID1 submirror failure causes reboot?

2006-11-13 Thread Jens Axboe
On Mon, Nov 13 2006, Neil Brown wrote:
 On Friday November 10, [EMAIL PROTECTED] wrote:
  Hello Neil,
  
   [87398.531579] blk: request botched
  NB  
  
  NB That looks bad.  Possibly some bug in the IDE controller or elsewhere
  NB in the block layer.  Jens: What might cause that?
  
  NB --snip--
  
  NB That doesn't look like raid was involved.  If it was you would expect
  NB to see raid1_end_write_request or raid1_end_read_request in that
  NB trace.
  So that might be the hardware or the software side of the IDE layer
  failing the system, or a PCI problem, for example?
 
 What I think is happening here (and Jens: if you could tell me how
 impossible this is, that would be good) is this:
 
 Some error handling somewhere in the low-level ide driver is getting
 confused and somehow one of the sector counts in the 'struct request' is
 getting set wrongly.  blk_recalc_rq_sectors notices this and says
 "blk: request botched".  It tries to auto-correct by increasing
 rq->nr_sectors to be consistent with the other counts.
 I'm *guessing* this is the wrong thing to do, and that it has the
 side-effect that bi_end_io is getting called on the bio twice.
 The second time the bio has been freed and reused and the wrong
 bi_end_io is called and it does the wrong thing.

It doesn't sound at all unreasonable. It's most likely either a bug in
the ide driver, or a bad bio being passed to the block layer (and
later on to the request and driver). By bad I mean one that isn't
entirely consistent, which could be a bug in e.g. md. The "request
botched" error is usually a sign of something being severely screwed up.
As you mention further down, get slab and page debugging enabled to
potentially catch this earlier. It could be a sign of a freed bio or
request with corrupt contents.

 This sounds a bit far-fetched, but it is the only explanation I can
 come up with for the observed back trace which is:
 
 [87403.706012]  [c0103871] error_code+0x39/0x40
 [87403.710794]  [c0180e0a] mpage_end_io_read+0x5e/0x72
 [87403.716154]  [c0164af9] bio_endio+0x56/0x7b
 [87403.720798]  [c0256778] __end_that_request_first+0x1e0/0x301
 [87403.726985]  [c02568a4] end_that_request_first+0xb/0xd
 [87403.732699]  [c02bd73c] __ide_end_request+0x54/0xe1
 [87403.738214]  [c02bd807] ide_end_request+0x3e/0x5c
 [87403.743382]  [c02c35df] task_error+0x5b/0x97
 [87403.748113]  [c02c36fa] task_in_intr+0x6e/0xa2
 [87403.753120]  [c02bf19e] ide_intr+0xaf/0x12c
 [87403.757815]  [c013e5a7] handle_IRQ_event+0x23/0x57
 [87403.763135]  [c013e66f] __do_IRQ+0x94/0xfd
 [87403.767802]  [c0105192] do_IRQ+0x32/0x68
 [87403.772278]  [c010372e] common_interrupt+0x1a/0x20
 
 i.e. bio_endio goes straight to mpage_end_io despite the fact that the
 filesystem is mounted over md/raid1.

What is the workload? Is io to the real device mixed with io that came
through md as well?

 Is the kernel compiled with CONFIG_DEBUG_SLAB=y and
 CONFIG_DEBUG_PAGEALLOC=y ??
 They might help trigger the error earlier and so make the problem more
 obvious.

Agree, that would be a good plan to enable. Other questions: are you
seeing timeouts at any point? The ide timeout code has some request/bio
resetting code which might be worrisome.
 
 NeilBrown

-- 
Jens Axboe



Re: Re[2]: RAID1 submirror failure causes reboot?

2006-11-13 Thread Neil Brown
On Monday November 13, [EMAIL PROTECTED] wrote:
 
 It doesn't sound at all unreasonable. It's most likely either a bug in
 the ide driver, or a bad bio being passed to the block layer (and
 later on to the request and driver). By bad I mean one that isn't
 entirely consistent, which could be a bug in eg md. 

I just noticed (while tracking raid6 problems...) that bio_clone calls
bio_phys_segments and bio_hw_segments (why does it do both?).
This calls blk_recount_segments which does calculations based on
 ->bi_bdev.
Only, immediately after calling bio_clone, raid1 changes bi_bdev, thus
creating a potential inconsistency in the bio.  Would this sort of
inconsistency cause this problem?
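
A minimal sketch of the kind of adjustment being discussed, assuming the
2.6.19-era bio fields (bi_bdev, bi_sector, bi_flags, BIO_SEG_VALID) and an
invented helper name; the fix actually adopted for raid5's aligned reads
appears in the patches later in this digest:

/* Sketch only, not a tested patch: after bio_clone() the clone's cached
 * segment counts were computed against the old bi_bdev.  If the clone is
 * then retargeted at a member device, clear BIO_SEG_VALID so that
 * blk_recount_segments() is run again against the new queue. */
static void retarget_clone(struct bio *clone, mdk_rdev_t *rdev)
{
	clone->bi_bdev = rdev->bdev;
	clone->bi_sector += rdev->data_offset;
	clone->bi_flags &= ~(1 << BIO_SEG_VALID);
}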

 
 Agree, that would be a good plan to enable. Other questions: are you
 seeing timeouts at any point? The ide timeout code has some request/bio
 resetting code which might be worrisome.

Jim could probably answer this with more authority, but there aren't
obvious timeouts from the logs he posted.  A representative sample is:
[87338.675891] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
DataRequest Error }
[87338.685143] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
LBAsect=176315718, sector=176315711
[87338.694791] ide: failed opcode was: unknown
[87343.557424] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
DataRequest Error }
[87343.566388] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
LBAsect=176315718, sector=176315711
[87343.576105] ide: failed opcode was: unknown
[87348.472226] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
DataRequest Error }
[87348.481170] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
LBAsect=176315718, sector=176315711
[87348.490843] ide: failed opcode was: unknown
[87353.387028] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
DataRequest Error }
[87353.395735] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
LBAsect=176315718, sector=176315711
[87353.405500] ide: failed opcode was: unknown
[87353.461342] ide1: reset: success

Thanks,
NeilBrown


[PATCH 001 of 4] md: Fix innocuous bug in raid6 stripe_to_pdidx

2006-11-13 Thread NeilBrown

stripe_to_pdidx finds the index of the parity disk for a given
stripe.
It assumes raid5 in that it uses disks-1 to determine the number
of data disks.

This is incorrect for raid6 but fortunately the two usages cancel each
other out.  The only way that 'data_disks' affects the calculation of
pd_idx in raid5_compute_sector is when it is divided into the
sector number.  But as that sector number is calculated by multiplying
in the wrong value of 'data_disks' the division produces the right
value.

So it is innocuous but needs to be fixed.
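
A worked example of that cancellation, with invented numbers and an invented
test program, following the reasoning above (it assumes raid5_compute_sector
recovers the stripe by dividing the chunk number by the data_disks argument
it is given):

/* Illustration only: the multiply and the later divide use the same
 * (wrong) data_disks value, so they cancel.  disks = 6 (raid6), so the
 * true number of data disks is 4, not 5; sectors_per_chunk = 128. */
#include <stdio.h>

int main(void)
{
	unsigned long stripe = 10, chunk_offset = 3, spc = 128;

	/* old code: multiply and divide by (disks - 1) = 5 */
	unsigned long sec_old = stripe * 5 * spc + chunk_offset;
	unsigned long stripe_old = (sec_old / spc) / 5;		/* = 10 */

	/* fixed code: multiply and divide by (disks - max_degraded) = 4 */
	unsigned long sec_new = stripe * 4 * spc + chunk_offset;
	unsigned long stripe_new = (sec_new / spc) / 4;		/* = 10 */

	/* either way the recovered stripe is 10, so pd_idx is unchanged */
	printf("%lu %lu\n", stripe_old, stripe_new);
	return 0;
}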

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-11-14 10:05:00.0 +1100
+++ ./drivers/md/raid5.c	2006-11-14 10:33:41.0 +1100
@@ -1355,8 +1355,10 @@ static int stripe_to_pdidx(sector_t stri
int pd_idx, dd_idx;
int chunk_offset = sector_div(stripe, sectors_per_chunk);
 
-	raid5_compute_sector(stripe*(disks-1)*sectors_per_chunk
-			     + chunk_offset, disks, disks-1, &dd_idx, &pd_idx, conf);
+	raid5_compute_sector(stripe * (disks - conf->max_degraded)
+			     *sectors_per_chunk + chunk_offset,
+			     disks, disks - conf->max_degraded,
+			     &dd_idx, &pd_idx, conf);
return pd_idx;
 }
 


[PATCH 003 of 4] md: Misc fixes for aligned-read handling.

2006-11-13 Thread NeilBrown

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bios-that-still-need-to-be-retried.
   When we find a bio that needs to be retried, we should add it to
   the list, not to the single bio (a sketch of the consumer side of
   this follows after these notes)...

2/ The cloned bio is being used-after-free (to test BIO_UPTODATE).

3/ We forgot to add rdev-data_offset when submitting
   a bio for aligned-read
4/ bio_clone calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.
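
A rough sketch of the consumer side implied by 1/, with field names taken
from the description above and the first hunk below, an invented helper
name, and the caller assumed to hold conf->device_lock; the real code may
differ in detail:

/* Sketch only: pick the next aligned read to retry.  A bio we were already
 * part-way through (retry_read_aligned) takes priority over the list of
 * not-yet-started retries (retry_read_aligned_list). */
static struct bio *next_retry(raid5_conf_t *conf)
{
	struct bio *bi = conf->retry_read_aligned;

	if (bi) {
		conf->retry_read_aligned = NULL;
		return bi;
	}
	bi = conf->retry_read_aligned_list;
	if (bi) {
		conf->retry_read_aligned_list = bi->bi_next;
		bi->bi_next = NULL;
	}
	return bi;
}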

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-11-14 10:34:17.0 +1100
+++ ./drivers/md/raid5.c	2006-11-14 10:34:33.0 +1100
@@ -2658,8 +2658,8 @@ static void add_bio_to_retry(struct bio 
 
 	spin_lock_irqsave(&conf->device_lock, flags);
 
-	bi->bi_next = conf->retry_read_aligned;
-	conf->retry_read_aligned = bi;
+	bi->bi_next = conf->retry_read_aligned_list;
+	conf->retry_read_aligned_list = bi;
 
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 	md_wakeup_thread(conf->mddev->thread);
@@ -2698,6 +2698,7 @@ static int raid5_align_endio(struct bio 
 	struct bio* raid_bi  = bi->bi_private;
 	mddev_t *mddev;
 	raid5_conf_t *conf;
+	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 
 	if (bi->bi_size)
return 1;
@@ -2706,7 +2707,7 @@ static int raid5_align_endio(struct bio 
 	mddev = raid_bi->bi_bdev->bd_disk->queue->queuedata;
 	conf = mddev_to_conf(mddev);
 
-	if (!error && test_bit(BIO_UPTODATE, &bi->bi_flags)) {
+	if (!error && uptodate) {
 		bio_endio(raid_bi, bytes, 0);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
 			wake_up(&conf->wait_for_stripe);
@@ -2759,9 +2760,11 @@ static int chunk_aligned_read(request_qu
rcu_read_lock();
 	rdev = rcu_dereference(conf->disks[dd_idx].rdev);
 	if (rdev && test_bit(In_sync, &rdev->flags)) {
-		align_bi->bi_bdev =  rdev->bdev;
 		atomic_inc(&rdev->nr_pending);
 		rcu_read_unlock();
+		align_bi->bi_bdev =  rdev->bdev;
+		align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
+		align_bi->bi_sector += rdev->data_offset;
 
 		spin_lock_irq(&conf->device_lock);
 		wait_event_lock_irq(&conf->wait_for_stripe,
@@ -3151,7 +3154,8 @@ static int  retry_aligned_read(raid5_con
conf);
 	last_sector = raid_bio->bi_sector + (raid_bio->bi_size>>9);
 
-	for (; logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+	for (; logical_sector < last_sector;
+	     logical_sector += STRIPE_SECTORS, scnt++) {
 
 		if (scnt < raid_bio->bi_hw_segments)
 			/* already done this stripe */


[PATCH 004 of 4] md: Fix a couple more bugs in raid5/6 aligned reads

2006-11-13 Thread NeilBrown

1/ We never drop our reference to the rdev (rdev_dec_pending) when
   the read completes.  This means we need to record the rdev so it
   is still available in the end_io routine.  Fortunately bi_next in
   the original bio is unused at this point so we can stuff it in
   there.

2/ We leak the cloned bio if the target rdev is not usable.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-11-14 11:00:51.0 +1100
+++ ./drivers/md/raid5.c	2006-11-14 11:06:44.0 +1100
@@ -2699,6 +2699,7 @@ static int raid5_align_endio(struct bio 
mddev_t *mddev;
raid5_conf_t *conf;
 	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
+	mdk_rdev_t *rdev;
 
 	if (bi->bi_size)
return 1;
@@ -2706,6 +2707,10 @@ static int raid5_align_endio(struct bio 
 
 	mddev = raid_bi->bi_bdev->bd_disk->queue->queuedata;
 	conf = mddev_to_conf(mddev);
+	rdev = (void*)raid_bi->bi_next;
+	raid_bi->bi_next = NULL;
+
+	rdev_dec_pending(rdev, conf->mddev);
 
 	if (!error && uptodate) {
bio_endio(raid_bi, bytes, 0);
@@ -2762,6 +2767,7 @@ static int chunk_aligned_read(request_qu
 	if (rdev && test_bit(In_sync, &rdev->flags)) {
 		atomic_inc(&rdev->nr_pending);
 		rcu_read_unlock();
+		raid_bio->bi_next = (void*)rdev;
 		align_bi->bi_bdev =  rdev->bdev;
 		align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);
 		align_bi->bi_sector += rdev->data_offset;
@@ -2777,6 +2783,7 @@ static int chunk_aligned_read(request_qu
return 1;
} else {
rcu_read_unlock();
+   bio_put(align_bi);
return 0;
}
 }


Re: bio too big device dm-XX (256 > 255) on 2.6.17

2006-11-13 Thread Jure Pečar

Hello,

this is getting more and more annoying.

Somewhere in the stack reiserfs->dm->md->hd[bd] lies the problem that's causing
bio too big device dm-10 (256 > 255) errors, which cause i/o failures.

It works as expected on reiserfs->dm->sda and on ext3->dm->md->hd[bd].

Debian Etch, 2.6.17-2.



On Thu, 9 Nov 2006 00:52:27 +0100
Jure Pečar [EMAIL PROTECTED] wrote:

 
 Hello,
 
 Recently I upgraded my home server. I moved EVMS volumes from 3ware hw
 mirrors (lvm2) to a md raid1 (lvm2) and am now getting lots of these
 errors, which result in Input/output errors trying to read files from
 those volumes. It's kind of ugly because I have all the data, I just
 cannot read it ...
 
 Google comes up with mails from 2003 mentioning such problems. Are there
 any known such problems in recent kernels or am I hitting something new
 here?


-- 

Jure Pečar
http://jure.pecar.org/


Re: bio too big device dm-XX (256 > 255) on 2.6.17

2006-11-13 Thread Neil Brown
On Tuesday November 14, [EMAIL PROTECTED] wrote:
 
 Hello,
 
 this is getting more and more annoying.
 
 Somewhere in the stack reiserfs->dm->md->hd[bd] lies the problem that's causing
 bio too big device dm-10 (256 > 255) errors, which cause i/o failures.
 
 It works as expected on reiserfs->dm->sda and on ext3->dm->md->hd[bd].
 
 Debian Etch, 2.6.17-2.
 

Please try this patch (I can give more detailed instructions if needed).

NeilBrown

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/dm-table.c |3 +++
 1 file changed, 3 insertions(+)

diff .prev/drivers/md/dm-table.c ./drivers/md/dm-table.c
--- .prev/drivers/md/dm-table.c 2006-11-14 15:23:08.0 +1100
+++ ./drivers/md/dm-table.c 2006-11-14 15:23:57.0 +1100
@@ -99,6 +99,9 @@ static void combine_restrictions_low(str
 	lhs->max_segment_size =
 		min_not_zero(lhs->max_segment_size, rhs->max_segment_size);
 
+	lhs->max_hw_sectors =
+		min_not_zero(lhs->max_hw_sectors, rhs->max_hw_sectors);
+
 	lhs->seg_boundary_mask =
 		min_not_zero(lhs->seg_boundary_mask, rhs->seg_boundary_mask);
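
For readers following along: min_not_zero() combines two queue limits while
treating zero as no limit set.  A rough illustration of that behaviour, with
an invented name rather than the kernel's own macro:

/* Illustration only: merge two limits where 0 means "unlimited". */
static inline unsigned int combine_limit(unsigned int l, unsigned int r)
{
	if (l == 0)
		return r;
	if (r == 0)
		return l;
	return l < r ? l : r;
}

With max_hw_sectors combined this way, a dm table stacked on an md array
whose limit is 255 sectors should advertise 255 itself, instead of accepting
a 256-sector bio and presumably triggering the bio too big device dm-10
(256 > 255) message further down the stack.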
 


Re: Re[2]: RAID1 submirror failure causes reboot?

2006-11-13 Thread Jens Axboe
On Tue, Nov 14 2006, Neil Brown wrote:
 On Monday November 13, [EMAIL PROTECTED] wrote:
  
  It doesn't sound at all unreasonable. It's most likely either a bug in
  the ide driver, or a bad bio being passed to the block layer (and
  later on to the request and driver). By bad I mean one that isn't
  entirely consistent, which could be a bug in eg md. 
 
 I just noticed (while tracking raid6 problems...) that bio_clone calls
 bio_phys_segments and bio_hw_segments (why does it do both?).
 This calls blk_recount_segments which does calculations based on
  ->bi_bdev.
 Only immediately after calling bio_clone, raid1 changes bi_bdev, thus
 creating potential inconsistency in the bio.  Would this sort of
 inconsistency cause this problem?

raid1 should change it first, you are right. But it should not matter,
as the real device should have restrictions that are at least equal to
the md device. So it may be a bit more conservative, but I don't think
there's a real bug there.

  Agree, that would be a good plan to enable. Other questions: are you
  seeing timeouts at any point? The ide timeout code has some request/bio
  resetting code which might be worrisome.
 
 Jim could probably answer this with more authority, but there aren't
 obvious timeouts from the logs he posted.  A representative sample is:
 [87338.675891] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
 DataRequest Error }
 [87338.685143] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
 LBAsect=176315718, sector=176315711
 [87338.694791] ide: failed opcode was: unknown
 [87343.557424] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
 DataRequest Error }
 [87343.566388] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
 LBAsect=176315718, sector=176315711
 [87343.576105] ide: failed opcode was: unknown
 [87348.472226] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
 DataRequest Error }
 [87348.481170] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
 LBAsect=176315718, sector=176315711
 [87348.490843] ide: failed opcode was: unknown
 [87353.387028] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete 
 DataRequest Error }
 [87353.395735] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, 
 LBAsect=176315718, sector=176315711
 [87353.405500] ide: failed opcode was: unknown
 [87353.461342] ide1: reset: success

Then let's wait for Jim to repeat his testing with all the debugging
options enabled; that should make us a little wiser.

-- 
Jens Axboe
