Re: [opensuse] Error's on raid disk

Carlos E. R. Tue, 08 May 2007 05:33:12 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The Tuesday 2007-05-08 at 13:09 +0200, Wilfred van Velzen wrote:

> > Uau, that's a large disk. Or busy. Usually, it's about two hours or so.
> 
> It's big: 750GB
> 
> Not too busy, but busy enough during work hours, because the 30% hasn't
> moved yet...

Probably busy enough that the test doesn't progress (much). If the test is 
well designed, the normal disk activity has priority. Probably you just 
have to wait longer and see.
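
If you want to watch the progress without reading the whole report each 
time, something like this sketch may help. It assumes the capabilities 
report of "smartctl -c" contains a line of the form "NN% of test 
remaining" while a self-test runs; the function name selftest_remaining 
is made up for the example.

```shell
#!/bin/sh
# Sketch: pull the "NN% of test remaining" figure out of a smartctl report.
# It reads the report on stdin, so it also works on saved output.
selftest_remaining() {
    grep -oE '[0-9]+% of test remaining' | grep -oE '^[0-9]+'
}

# Typical use (needs root; /dev/sdb as in this thread):
#   smartctl -c /dev/sdb | selftest_remaining
```

If no self-test is running, the function prints nothing and returns a 
non-zero status.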

There is a trick, although you may not like it. If the raid is in software, 
you can deactivate one of the hard disks (simulate a failure). The other 
disk(s) take over the load, the failed one goes idle, and the test can 
happily progress on that one. However, if the other disk goes down in the 
interval... ouch :-(
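
A rough sketch of that trick, assuming the raid is Linux md managed with 
mdadm (the thread does not say so). The names /dev/md0, /dev/sdb1 and 
/dev/sdb are placeholders, and the helper functions are made up for the 
example. By default it only prints the commands; nothing is executed 
unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: assumes Linux md software raid managed with mdadm.
# MD, PART and DISK are placeholder names; adjust to the real devices.
MD=${MD:-/dev/md0}
PART=${PART:-/dev/sdb1}
DISK=${DISK:-/dev/sdb}

# Print each command by default; set RUN=1 to really execute it (as root).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

idle_member_and_test() {
    run mdadm "$MD" --fail "$PART"      # mark the member faulty: md stops using it
    run mdadm "$MD" --remove "$PART"    # detach it from the array
    run smartctl -t long "$DISK"        # start the long self-test on the idle disk
}

restore_member() {
    run mdadm "$MD" --add "$PART"       # put it back; md rebuilds the mirror
}
```

Keep in mind that putting the disk back normally triggers a resync, which 
on a disk this size is a long and heavy load in itself, and that the array 
has no redundancy until it finishes.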

> > Not the system log, but the smart log that resides in the disk; you 
> > can dig it out with "smartctl -a device".
> 
> Yes, that was what I meant. I checked with:
> 
> smartctl -l error /dev/sdb
> 
> and:
> 
> smartctl -l selftest /dev/sdb
> 
> But that shows the same output as the -a option...

Ah...

I expected something like this (I see it with -a):

SMART Error Log Version: 1
ATA Error Count: 251 (device log contains only the most recent five errors)
...
Error 251 occurred at disk power-on lifetime: 3734 hours (155 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 e8 f6 83 f0  Error: ICRC, ABRT at LBA = 0x0083f6e8 = 8648424

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 f0 f9 f5 83 f0 00      00:04:45.606  READ DMA EXT
  25 00 f0 f9 f5 83 f0 00      00:04:44.706  READ DMA EXT
  10 00 3f 00 00 00 f0 00      00:04:44.705  RECALIBRATE [OBS-4]
  25 00 f0 f9 f5 83 f0 00      00:04:44.421  READ DMA EXT
  25 00 f0 f9 f5 83 f0 00      00:04:44.248  READ DMA EXT

I think these logs depend on the disk manufacturer.
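
To make the register line in that excerpt less cryptic, here is a small 
sketch of how the LBA and the error flags can be recomputed by hand. The 
function names lba28 and er_flags are mine, and only four of the bits of 
the error register are listed, so take it as an illustration rather than 
a full decoder.

```shell
#!/bin/sh
# Register bytes are passed as two hex digits each, without the 0x prefix.

# 28-bit LBA from SN CL CH DH: the low nibble of DH holds bits 27..24,
# then come CH, CL and SN.
lba28() {
    echo $(( ((0x$4 & 0x0f) << 24) | (0x$3 << 16) | (0x$2 << 8) | 0x$1 ))
}

# Name some of the bits set in the error register (ER).
# Only a subset of the ATA error bits is listed here.
er_flags() {
    v=$(( 0x$1 ))
    out=""
    for pair in 80:ICRC 40:UNC 10:IDNF 04:ABRT; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $(( v & 0x$mask )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

lba28 e8 f6 83 f0    # prints 8648424
er_flags 84          # prints ICRC ABRT
```

Both results agree with what smartctl printed above. ICRC is an interface 
CRC error, which as far as I know points more often at the cable or the 
connection than at the disk surface.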

> > Right. If I interpret it correctly, your sda has four sectors remapped. It
> 
> sdb!

Right, sdb, I got confused.

> I'll advise the one who controls the money to order a spare one in
> advance, so we can replace it if necessary. It's one of the disks in a
> raid 1 configuration, so it shouldn't be an immediate problem if one
> disk fails...

In the case of a production server that you consider important enough to 
have a raid, it should always be important to have a spare disk at hand, 
errors or not ;-)

Also, you know that you can have an "active spare" inside the raid. If 
one of the disks has a problem, the raid will immediately activate the 
spare and switch over. The disadvantage is, obviously, that the spare is 
powered up, although idle. In those cases, I would have a spare outside, 
too - maybe I'm too paranoid ;-)
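
For the spare inside the raid, a minimal sketch, again assuming Linux md 
and mdadm; /dev/md0 and /dev/sdc1 are placeholder names and count_spares 
is a helper made up for the example. As far as I know, adding a device to 
an array that already has all its members turns it into a spare, and 
/proc/mdstat marks spares with "(S)".

```shell
#!/bin/sh
# Sketch, assuming Linux md raid and mdadm; device names are placeholders.
# Adding a device to an array that already has all its members makes it a spare:
#   mdadm /dev/md0 --add /dev/sdc1

# Count the spares listed for one array; /proc/mdstat marks them with "(S)".
# Usage: count_spares md0 < /proc/mdstat
count_spares() {
    grep "^$1 :" | grep -o '(S)' | wc -l | tr -d ' '
}
```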

> > > This isn't something that can be fixed on short notice ;), so I hope
> > > you will see this message!
> >
> > Yep, I noticed, because you sent also a CC to me: in those cases Pine
> > shows a yellow mark :-)
> 
> I will keep doing this, then... ;)

No problem. Just remember that some people here do not like those at all - 
I really don't mind, my filters work nicely ;-)

- -- 
Cheers,
       Carlos E. R.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFGQG3+tTMYHG2NR9URAopmAJwPH+9oifhx6UZdRmWYdBcM7UA3+gCeKaYn
wHv5e9D4vePAc5Kw8eyTKPU=
=lHY7
-----END PGP SIGNATURE-----

