Eyal Lebedinsky wrote:
> I CC'ed linux-ide to see if they think the reported error was really innocent:
>
> Question: does this error report suggest that a disk could be corrupted?
>
> This SATA disk is part of an md raid and no error was reported by md.
>
> [937567.332751] ata3.00: exception Em
I CC'ed linux-ide to see if they think the reported error was really innocent:
Question: does this error report suggest that a disk could be corrupted?
This SATA disk is part of an md raid and no error was reported by md.
[937567.332751] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4190002 acti
Ok, so hearing all the excitement I ran a check on a multi-disk
RAID-1. One of the RAID-1 disks failed out, maybe by coincidence
but presumably due to the check. (I also have another disk in
the array deliberately removed as a backup mechanism.) And
of course there is a big mismatch count.
Questi
On Saturday February 24, [EMAIL PROTECTED] wrote:
> But is this not a good opportunity to repair the bad stripe for a very
> low cost (no complete resync required)?
In this case, 'md' knew nothing about an error. The SCSI layer
detected something and thought it had fixed it itself. Nothing for m
Justin Piszcz wrote:
On Sat, 24 Feb 2007, Michael Tokarev wrote:
Jason Rainforest wrote:
I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5,
multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200
+).
I then ordered a resync. The mismatch_cnt returned to 0 at
On Sun, 25 Feb 2007, Christian Pernegger wrote:
Sorry to hijack the thread a little but I just noticed that the
mismatch_cnt for my mirror is at 256.
I'd always thought the monthly check done by the mdadm Debian package
does repair as well - apparently it doesn't.
So I guess I should run repair but I'm wondering ...
- is it safe / bugfree con
On Sat, Feb 24, 2007 at 11:23:55AM +1100, Eyal Lebedinsky wrote:
[...]
>
> fsck (ext3 with logging) found no errors but I may have bad data
> somewhere.
I've written a program for fast MD5/SHA256 summing which may be useful
for tracking this kind of silent corruption. See
http://www.fr
On Sat, 24 Feb 2007, Michael Tokarev wrote:
Jason Rainforest wrote:
I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5,
multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200
+).
I then ordered a resync. The mismatch_cnt returned to 0 at the start of
As poin
Jason Rainforest wrote:
> I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5,
> multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200
> +).
>
> I then ordered a resync. The mismatch_cnt returned to 0 at the start of
As pointed out later it was repair, not resync.
Ahh, perhaps Neil can fix that? ;)
cat /sys/block/md0/md/sync_action will tell you what it is really doing.
On Sat, 24 Feb 2007, Jason Rainforest wrote:
Yes, I meant repair, sorry. I checked my bash history and I did indeed
order a repair (echo repair >/sys/block/md0/md/sync_action). I think
Yes, I meant repair, sorry. I checked my bash history and I did indeed
order a repair (echo repair >/sys/block/md0/md/sync_action). I think I
called it a resync because that's what /proc/mdstat told me it was
doing.
On Sat, 2007-02-24 at 04:50 -0500, Justin Piszcz wrote:
> A resync? You're suppos
A resync? You're supposed to run a 'repair' are you not?
Justin.
On Sat, 24 Feb 2007, Jason Rainforest wrote:
I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5,
multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200
+).
I then ordered a resync. The mismatch_c
I tried doing a check, found a mismatch_cnt of 8 (7*250Gb SW RAID5,
multiple controllers on Linux 2.6.19.2, SMP x86-64 on Athlon64 X2 4200
+).
I then ordered a resync. The mismatch_cnt returned to 0 at the start of
the resync, but around the same time that it went up to 8 with the
check, it went u
Of course you could just run repair but then you would never know that
mismatch_cnt was > 0.
Justin.
On Sat, 24 Feb 2007, Justin Piszcz wrote:
Perhaps,
The way it works (I believe) is as follows:
1. echo check > sync_action
2. If mismatch_cnt > 0 then run:
3. echo repair > sync_action
4. Re-run #1
5. Check to make sure it is back to 0.
Justin.
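The steps above could be scripted roughly as follows. This is only a sketch, not something posted in the thread: the function names, the MD_DIR and POLL_SECONDS variables, and the 60-second polling default are all made up for illustration, and it assumes that sync_action reads back "idle" once a check or repair has finished.

```shell
#!/bin/sh
# Hypothetical helpers for the check/repair cycle described above.
# MD_DIR is an assumed variable; it defaults to md0's sysfs directory.
MD_DIR="${MD_DIR:-/sys/block/md0/md}"

# Print the current mismatch count for the array.
mismatch_count() {
    cat "$MD_DIR/mismatch_cnt"
}

# Poll sync_action until it reports "idle" (assumed to mean "finished").
wait_idle() {
    while [ "$(cat "$MD_DIR/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# Start an action ("check" or "repair") and wait for it to complete.
run_action() {
    echo "$1" > "$MD_DIR/sync_action"
    wait_idle
}

# Steps 1-5: check, repair only if mismatches were found, then re-check.
check_and_repair() {
    run_action check
    if [ "$(mismatch_count)" -gt 0 ]; then
        echo "mismatch_cnt is $(mismatch_count); running repair"
        run_action repair
        run_action check
    fi
    echo "final mismatch_cnt: $(mismatch_count)"
}
```

After sourcing the file, running check_and_repair as root would walk through the whole cycle for md0; setting MD_DIR=/sys/block/md1/md first would point it at another array. Running the check first, rather than going straight to repair, keeps the non-zero count visible before it is overwritten.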
On Sat, 24 Feb 2007, Eyal Lebedinsky wrote:
I did a resync since, which ended up with the same mismatch_cnt of 184.
I noticed that the count *was* reset to zero when the resync started,
but ended up with 184 (same as after the check).
I thought that the resync just calculates fresh parity and does not
bother checking if it is different. So
But is this not a good opportunity to repair the bad stripe for a very
low cost (no complete resync required)?
At time of error we actually know which disk failed and can re-write
it, something we do not know at resync time, so I assume we always
write to the parity disk.
Justin Piszcz wrote:
> S
Should the raid have noticed the error, checked the offending
stripe and taken appropriate action? The messages from that error
are below.
I don't think so, that is why we need to run check every once in a while
and check the mismatch_cnt file for each md raid device.
Run repair then re-run c
I run a 'check' weekly, and yesterday it came up with a non-zero
mismatch count (184). There were no earlier RAID errors logged
and the count was zero after the run a week ago.
Now, the interesting part is that there was one i/o error logged
during the check *last week*, however the raid did not s