Understanding my SMART errors

2009-06-10 Thread Celejar
Hi,

In the last couple of days, I've begun to see both kernel errors and
SMART warnings about my laptop's two and a half year old hard drive.

An excerpt of a current 'dmesg | grep hda' (these errors occurred upon
resuming from suspend to disk):

[34074.459505] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34074.459685] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34074.459886] hda: possibly failed opcode: 0x25
[34079.744751] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.744931] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.745135] hda: possibly failed opcode: 0x25
[34079.750086] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.750263] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.750466] hda: possibly failed opcode: 0x25
[34079.789002] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.789192] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.789411] hda: possibly failed opcode: 0x25
[34079.794851] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.795043] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.795261] hda: possibly failed opcode: 0x25

I ran the short and long SMART self-tests, and they seem clean:

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%             5880  -
# 2  Short offline       Completed without error        00%             5879  -
# 3  Short offline       Completed without error        00%             1435  -

[#1 and #2 are the ones I ran yesterday, IIUC.]

I've attached the output of '# smartctl -a /dev/hda' to this mail.

Here's an excerpt of syslog ('grep smartd /var/log/syslog', with a bunch
of 'Temperature_Celsius changed' lines removed, since I think they're
normal):

Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 100 to 99
Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
Jun  9 15:12:29 lizzie smartd[3474]: Sending warning via mail to r...@localhost ...
Jun  9 15:12:29 lizzie smartd[3474]: Warning via mail to r...@localhost: successful
Jun  9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28
Jun  9 20:42:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 99 to 100
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 2 Throughput_Performance changed from 100 to 105
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 151 to 152
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 100 to 126
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34
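
To see how quickly the ATA error count is growing, the smartd lines can be tallied with a short script. This is only a sketch: it assumes each syslog entry sits on a single line (wrapped entries must be joined first), and the names ATA_COUNT and ata_error_steps are made up for illustration.

```python
import re

# Matches smartd entries of the form
#   "... Device: /dev/hda, ATA error count increased from 12 to 17"
ATA_COUNT = re.compile(
    r"Device: (?P<dev>[^,\s]+), ATA error count increased "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)

def ata_error_steps(lines):
    """Return a (device, old, new) tuple for every reported increase."""
    steps = []
    for line in lines:
        m = ATA_COUNT.search(line)
        if m:
            steps.append((m.group("dev"), int(m.group("old")), int(m.group("new"))))
    return steps

# The three relevant entries from the syslog excerpt, one per line.
sample = [
    "Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17",
    "Jun  9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28",
    "Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34",
]

steps = ata_error_steps(sample)
for dev, old, new in steps:
    print(f"{dev}: {old} -> {new} (+{new - old})")

# Growth across the excerpt: last "new" value minus first "old" value.
total = steps[-1][2] - steps[0][1]
print(f"total increase in this excerpt: {total}")
```

On the excerpt above this reports increases of 5, 11 and 6, i.e. 22 new errors in roughly a day.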

So far, the only actual problem that I've noticed is a (single) failure to
resume from disk yesterday, with some message (I neglected to save it)
about a checksum failure, which I believe was accompanied by some
kernel errors similar to the ones that I've reproduced above.

Is this drive going?  What further tests / diagnostics can I do?  [Yes,
I have backups, and I'm going to redouble my attention to keeping them
current and making sure that they're comprehensive.]

Celejar
--
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator



smart-info
Description: Binary data


Re: Understanding my SMART errors

2009-06-10 Thread Johannes Wiedersich
Celejar wrote:
 Jun  9 15:12:29 lizzie smartd[3474]: Sending warning via mail to r...@localhost ...
 Jun  9 15:12:29 lizzie smartd[3474]: Warning via mail to r...@localhost: successful

What does this mail say?

Cheers,
Johannes


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: Understanding my SMART errors

2009-06-10 Thread Boyd Stephen Smith Jr.
In 20090610143552.fd11cd1a.cele...@gmail.com, Celejar wrote:
An excerpt of a current 'dmesg | grep hda' (these errors occurred upon
resuming from suspend to disk):

[34074.459505] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34074.459685] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34074.459886] hda: possibly failed opcode: 0x25
[34079.744751] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.744931] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.745135] hda: possibly failed opcode: 0x25
[34079.750086] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.750263] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.750466] hda: possibly failed opcode: 0x25
[34079.789002] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.789192] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.789411] hda: possibly failed opcode: 0x25
[34079.794851] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.795043] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.795261] hda: possibly failed opcode: 0x25

Could be cabling or some other component between the kernel and the HD, but 
most likely this is the sign of a failing drive.
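
For reference, the status and error bytes in those kernel lines are bit fields, and the names in braces are simply the bits that are set. Below is a minimal sketch of the decoding. Only the names that actually appear in the log (DriveReady, SeekComplete, Error, DriveStatusError, BadCRC) are confirmed by this thread; the remaining names and bit positions are an assumption based on the conventional ATA register layout and the old Linux IDE driver's naming, and the helper decode is hypothetical.

```python
# (bit mask, name) pairs, listed in the order the kernel message prints them.
# Entries not seen in the log above are assumptions, not taken from this thread.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x04, "DriveStatusError"),   # command aborted
    (0x80, "BadCRC"),             # interface CRC error during a DMA transfer
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]

def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]

print("status=0x51:", " ".join(decode(0x51, STATUS_BITS)))
print("error=0x84: ", " ".join(decode(0x84, ERROR_BITS)))
```

For 0x51 and 0x84 this yields the same names as the log. A CRC error is detected on the transfer between drive and controller, which is why cabling or the interface cannot be ruled out from these messages alone. Opcode 0x25 is READ DMA EXT in the ATA command set, so the failed commands were reads.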

 Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
 Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34

Is this drive going?

Most likely, yes.  Although, it might not completely fail for quite a while.  
It may even be fixable through manufacturer-specific means.
-- 
Boyd Stephen Smith Jr.   ,= ,-_-. =.
b...@iguanasuicide.net  ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/\_/





Re: Understanding my SMART errors

2009-06-10 Thread Aniruddha
On Wed, Jun 10, 2009 at 8:50 PM, Boyd Stephen Smith Jr.
b...@iguanasuicide.net wrote:
 In 20090610143552.fd11cd1a.cele...@gmail.com, Celejar wrote:

Is this drive going?

 Most likely, yes.  Although, it might not completely fail for quite a while.
 It may even be fixable through manufacturer-specific means.


Most probably your drive is failing. Do you already have a backup? If not,
now is the best time to make one ; ) For more info on those errors:
http://www.captain.at/howto-linux-driveready-seekcomplete-error-drivestatuserror.php





Re: Understanding my SMART errors

2009-06-10 Thread Celejar
On Wed, 10 Jun 2009 20:45:14 +0200
Johannes Wiedersich johan...@physik.blm.tu-muenchen.de wrote:

 Celejar wrote:
  Jun  9 15:12:29 lizzie smartd[3474]: Sending warning via mail to r...@localhost ...
  Jun  9 15:12:29 lizzie smartd[3474]: Warning via mail to r...@localhost: successful
 
 What does this mail say?

Nothing useful, which is why I didn't bother reproducing it
originally.  I have so far received two emails; the first:

 This email was generated by the smartd daemon running on:
 
    host name: lizzie
   DNS domain: localdomain
   NIS domain: (none)
 
 The following warning/error was logged by the smartd daemon:
 
 Device: /dev/hda, ATA error count increased from 0 to 4
 
 For details see host's SYSLOG (default: /var/log/syslog).
 
 You can also use the smartctl utility for further investigation.
 No additional email messages about this problem will be sent.

I received another one mentioning an ATA error count increase from 12 to
17.

Celejar





Re: Understanding my SMART errors

2009-06-10 Thread Celejar
On Wed, 10 Jun 2009 13:50:29 -0500
Boyd Stephen Smith Jr. b...@iguanasuicide.net wrote:

...

 Could be cabling or some other component between the kernel and the HD, but 
 most likely this is the sign of a failing drive.
 
  Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
  Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34
 
 Is this drive going?
 
 Most likely, yes.  Although, it might not completely fail for quite a while.  
 It may even be fixable through manufacturer-specific means.

Thanks; I guess I'll start keeping my eyes open for a replacement.

Celejar





Re: Understanding my SMART errors

2009-06-10 Thread Celejar
On Wed, 10 Jun 2009 21:26:05 +0200
Aniruddha mailingdotl...@gmail.com wrote:

 On Wed, Jun 10, 2009 at 8:50 PM, Boyd Stephen Smith Jr.
 b...@iguanasuicide.net wrote:
  In 20090610143552.fd11cd1a.cele...@gmail.com, Celejar wrote:
 
 Is this drive going?
 
  Most likely, yes.  Although, it might not completely fail for quite a while.
  It may even be fixable through manufacturer-specific means.
 
 
 Most probably your drive is failing. Do you already have a backup? If not,
 now is the best time to make one ; ) For more info on those errors:

As I mentioned in my OP, I do have them, and I'll certainly be extra
vigilant about them now.

 http://www.captain.at/howto-linux-driveready-seekcomplete-error-drivestatuserror.php

Thanks; I'll have to take a look.

Celejar

