Re: [opensuse] software raid missing a drive??

Carlos E. R. Tue, 06 Mar 2007 18:31:59 -0800



On Tuesday 2007-03-06 at 16:26 +0100, Leen de Braal wrote:

> >> > Look at the logs... it's the only way. It could be a glitch. Sometimes
> >> > there is a temporary problem, a disk is removed, and it awaits
> >> > manual intervention. It will automatically activate a spare if
> >> > one is available, though.
> >> >
> >>
> >> Found:
> >>
> >> Mar  5 00:17:14 linux kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> >> Mar  5 00:17:14 linux kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=273480054, high=16, low=5044598, sector=273480053
> >> Mar  5 00:17:14 linux kernel: ide: failed opcode was: unknown
> >> Mar  5 00:17:14 linux kernel: end_request: I/O error, dev hda, sector 273480053
> >> Mar  5 00:17:14 linux kernel: raid1: Disk failure on hda3, disabling device.
> >> Mar  5 00:17:14 linux kernel:   Operation continuing on 1 devices
> >> Mar  5 00:17:14 linux kernel: raid1: hda3: rescheduling sector 271343408
> >> Mar  5 00:17:14 linux kernel: RAID1 conf printout:
> >> Mar  5 00:17:14 linux kernel:  --- wd:1 rd:2
> >> Mar  5 00:17:14 linux kernel:  disk 0, wo:1, o:0, dev:hda3
> >> Mar  5 00:17:14 linux kernel:  disk 1, wo:0, o:1, dev:hdb3
> >> Mar  5 00:17:14 linux kernel: RAID1 conf printout:
> >> Mar  5 00:17:14 linux kernel:  --- wd:1 rd:2
> >> Mar  5 00:17:14 linux kernel:  disk 1, wo:0, o:1, dev:hdb3
> >> Mar  5 00:17:14 linux kernel: raid1: hdb3: redirecting sector 271343408 to another mirror
> >
> > Is the above telling me that hda3 was removed from the mirror because
> > of a single bad sector?

Yes...

> > That seems extremely aggressive.

Quite so.

> I agree.
> 
> >
> > I know there is some LKML discussion of needing to have MD
> > automatically detect the above and simply rewrite the failed sector
> > with data from the good mirrored sector.
> >
> > During the write, /dev/hda should remap the failed sector and continue
> > running fine (i.e., all disk sector remapping for failures happens on
> > writes, AIUI).

Yes, that should work. The disk firmware remaps bad sectors when writing. 

Alternatively, the software could remap a sector, but it would do so at a
layer above the mirror (at the ext3 level, for example), which means on
both disks. But that is not automatic either, AFAIK.
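The idea discussed above (MD rewriting a sector that failed to read on one mirror with the data from the other mirror, and the drive firmware remapping that sector during the write) can be sketched as a toy Python model. This is only an illustration under stated assumptions, not how the md driver is implemented: `ToyDisk`, `ReadError` and `mirror_read` are hypothetical names, and the firmware remap is simulated simply by dropping the sector from the disk's set of bad sectors when it is written.

```python
from dataclasses import dataclass, field


class ReadError(Exception):
    """Raised when a toy disk cannot read a sector (hypothetical model)."""


@dataclass
class ToyDisk:
    # Hypothetical model of one mirror member: stored data plus a set of
    # sectors that currently fail to read.
    name: str
    data: dict = field(default_factory=dict)
    bad: set = field(default_factory=set)
    reallocated: int = 0

    def read(self, sector):
        if sector in self.bad:
            raise ReadError(f"{self.name}: unreadable sector {sector}")
        return self.data.get(sector, b"\0")

    def write(self, sector, value):
        # Simulated firmware behaviour: writing a bad sector remaps it to a
        # spare, so it becomes readable again and the counter goes up.
        if sector in self.bad:
            self.bad.discard(sector)
            self.reallocated += 1
        self.data[sector] = value


def mirror_read(disks, sector):
    """Try the mirrors in order; when one fails to read, rewrite the sector
    on it with the good copy instead of failing the whole disk."""
    failed = []
    for disk in disks:
        try:
            value = disk.read(sector)
        except ReadError:
            failed.append(disk)
            continue
        for bad_disk in failed:
            bad_disk.write(sector, value)  # repair from the good mirror
        return value
    raise ReadError(f"sector {sector} unreadable on all mirrors")


hda = ToyDisk("hda3", data={42: b"x"}, bad={42})
hdb = ToyDisk("hdb3", data={42: b"x"})
print(mirror_read([hda, hdb], 42), hda.reallocated)  # b'x' 1
```

In this model hda3 stays in the array and only its counter changes, which is the behaviour the thread is wishing for, as opposed to the log above where the whole partition was disabled.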

> > If a disk is currently failed after a single sector read error, I can
> > see why the kernel developers are looking into alternate ways to
> > handle the situation.

Seems so.

> It is running OK now; as far as I can see, everything is in sync.
> For me it means that I will have to pay more attention to monitoring
> this kind of error. I will look into mdadm, as I have seen that it has
> parameters that can make it do this and report to me by mail or
> something like that.

You can set it to email you, even to page or phone you, I think.
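mdadm's monitor mode (for example `mdadm --monitor --scan --mail=...`, or a MAILADDR line in mdadm.conf) is the usual way to get those alerts. As a rough illustration of what such a check looks for, here is a minimal Python sketch that scans /proc/mdstat text for degraded arrays. It assumes the common layout in which a failed member is tagged `(F)` and the status field, such as `[2/1] [_U]`, shows `_` for a missing member; `degraded_arrays` is a hypothetical helper and the sample text is made up, not taken from this system.

```python
import re

# Array line, e.g. "md0 : active raid1 hdb3[1] hda3[0](F)"
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(\w+)\s+(.*)$")
# Member status field, e.g. "[2/1] [_U]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: reason} for arrays that look degraded (a sketch)."""
    problems = {}
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            # Members tagged "(F)" have been marked faulty by md.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", m.group(4))
            if failed:
                problems[current] = "failed: " + ", ".join(failed)
            continue
        s = STATUS_RE.search(line)
        if current and s:
            total, working, flags = int(s.group(1)), int(s.group(2)), s.group(3)
            if working < total or "_" in flags:
                # Keep the more specific "failed: ..." reason if already set.
                problems.setdefault(current, f"{working}/{total} working [{flags}]")
            current = None
    return problems


# Made-up sample resembling the situation in this thread.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 hdb3[1] hda3[0](F)
      136721920 blocks [2/1] [_U]

md1 : active raid1 hdb1[1] hda1[0]
      1052160 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # {'md0': 'failed: hda3'}
```

On a live system the same function could be fed the contents of /proc/mdstat from a cron job; mdadm's own monitor mode is still the more complete option, since it also reports events such as a spare being activated.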

Also, you can find the error in the SMART log of that HD using smartctl.
It should be possible to deduce whether the sector was remapped by
looking at the Reallocated_Sector_Ct attribute.
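To check that attribute from a script, one option is to parse the attribute table printed by `smartctl -A /dev/hda`. The minimal Python sketch below assumes the usual ATA table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and takes the leading digits of RAW_VALUE as the count. The function name `smart_raw_value` and the sample rows are made up for illustration, and raw-value formats can vary between drive vendors.

```python
import re


def smart_raw_value(attributes_text, name="Reallocated_Sector_Ct"):
    """Return the RAW_VALUE of a SMART attribute as an int, or None.

    Assumes rows shaped like the ATA table printed by `smartctl -A`:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in attributes_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have >= 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            # Some raw values carry extra text, e.g. "40 (Min/Max 18/55)";
            # keep only the leading number of the tenth column.
            m = re.match(r"\d+", fields[9])
            return int(m.group()) if m else None
    return None


# Made-up sample rows in the `smartctl -A` layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
194 Temperature_Celsius     0x0022   040   055   000    Old_age   Always       -       40 (Min/Max 18/55)
"""

print(smart_raw_value(SAMPLE))  # 1
```

Recording this number periodically and comparing runs would show whether the drive has been remapping sectors since the read error.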

-- 
Cheers,
       Carlos E. R.

