Understanding my SMART errors
Hi,

In the last couple of days, I've begun to see both kernel errors and SMART warnings about my laptop's two-and-a-half-year-old hard drive.

An excerpt of a current 'dmesg | grep hda' (these errors occurred upon resuming from suspend to disk):

    [34074.459505] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    [34074.459685] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
    [34074.459886] hda: possibly failed opcode: 0x25
    [34079.744751] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    [34079.744931] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
    [34079.745135] hda: possibly failed opcode: 0x25
    [34079.750086] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    [34079.750263] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
    [34079.750466] hda: possibly failed opcode: 0x25
    [34079.789002] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    [34079.789192] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
    [34079.789411] hda: possibly failed opcode: 0x25
    [34079.794851] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    [34079.795043] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
    [34079.795261] hda: possibly failed opcode: 0x25

I ran the short and long SMART self-tests, and they seem clean:

    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error        00%             5880  -
    # 2  Short offline       Completed without error        00%             5879  -
    # 3  Short offline       Completed without error        00%             1435  -

[#1 and #2 are the ones I ran yesterday, IIUC.]

I've attached the output of '# smartctl -a /dev/hda' to this mail.
Here's an excerpt of syslog ('grep smartd /var/log/syslog', with a bunch of 'Temperature_Celsius changed' lines removed, since I think they're normal):

    Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 100 to 99
    Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
    Jun 9 15:12:29 lizzie smartd[3474]: Sending warning via mail to r...@localhost ...
    Jun 9 15:12:29 lizzie smartd[3474]: Warning via mail to r...@localhost: successful
    Jun 9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28
    Jun 9 20:42:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 99 to 100
    Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 2 Throughput_Performance changed from 100 to 105
    Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 151 to 152
    Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 100 to 126
    Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34

So far, the only actual problem that I've noticed is a (single) failure to resume from disk yesterday, with some message (I neglected to save it) about a checksum failure, which I believe was accompanied by some kernel errors similar to the ones that I've reproduced above.

Is this drive going? What further tests / diagnostics can I do?

[Yes, I have backups, and I'm going to redouble my attention to keeping them current and making sure that they're comprehensive.]

Celejar
-- 
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator

[Attachment: smart-info]
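To see how quickly the ATA error count is climbing, the smartd lines in the syslog excerpt can be reduced to a list of jumps. The following is only a minimal sketch: the function name `ata_error_jumps` and the `sample` list are made up for illustration and are not part of smartmontools, and the regular expression assumes the message wording is exactly as quoted in this thread.

```python
import re

# Matches the smartd message wording quoted in this thread (an assumption:
# other smartd versions may phrase it differently).
ATA_ERR = re.compile(
    r"Device: (\S+), ATA error count increased from (\d+) to (\d+)"
)


def ata_error_jumps(lines):
    """Return (device, old_count, new_count) for each matching syslog line."""
    jumps = []
    for line in lines:
        match = ATA_ERR.search(line)
        if match:
            jumps.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return jumps


# Hypothetical input: a few lines copied from the syslog excerpt above.
sample = [
    "Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17",
    "Jun 9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28",
    "Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 151 to 152",
    "Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34",
]

for device, old, new in ata_error_jumps(sample):
    print(f"{device}: {old} -> {new} (+{new - old})")
```

On a real system the lines would come from /var/log/syslog rather than a hard-coded list; attribute-change lines such as Spin_Up_Time are deliberately ignored here.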
Re: Understanding my SMART errors
Celejar wrote:
> Jun 9 15:12:29 lizzie smartd[3474]: Sending warning via mail to r...@localhost ...
> Jun 9 15:12:29 lizzie smartd[3474]: Warning via mail to r...@localhost: successful

What does this mail say?

Cheers,
Johannes

-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Re: Understanding my SMART errors
In 20090610143552.fd11cd1a.cele...@gmail.com, Celejar wrote:
> An excerpt of a current 'dmesg | grep hda' (these errors occurred upon
> resuming from suspend to disk):
>
> [34074.459505] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> [34074.459685] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> [34074.459886] hda: possibly failed opcode: 0x25
> [34079.744751] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> [34079.744931] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> [34079.745135] hda: possibly failed opcode: 0x25
> [34079.750086] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> [34079.750263] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> [34079.750466] hda: possibly failed opcode: 0x25
> [34079.789002] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> [34079.789192] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> [34079.789411] hda: possibly failed opcode: 0x25
> [34079.794851] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> [34079.795043] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> [34079.795261] hda: possibly failed opcode: 0x25

Could be cabling or some other component between the kernel and the HD, but most likely this is the sign of a failing drive.

> Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
> Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34
>
> Is this drive going?

Most likely, yes. Although, it might not completely fail for quite a while. It may even be fixable through manufacturer-specific means.

-- 
Boyd Stephen Smith Jr.                   ,= ,-_-. =.
b...@iguanasuicide.net                   ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy         `-'(. .)`-'
http://iguanasuicide.net/                    \_/
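For readers who want to check the kernel's decoding of status=0x51 and error=0x84 by hand, here is a minimal sketch in Python. The bit names are my recollection of what the legacy Linux IDE driver prints and should be treated as an assumption, not as the driver's actual source; the function names are made up for illustration. As far as I know, the 0x80 error bit is reported as BadCRC only when the abort bit (0x04) is also set, and in that case it refers to an interface CRC error on the DMA transfer. Opcode 0x25 is READ DMA EXT in the ATA command set.

```python
# Status register bits, highest first. The names mimic the legacy Linux IDE
# driver's log output (an assumption reconstructed from the messages above).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


def decode_error(error):
    """Return the names of all bits set in an ATA error byte."""
    names = []
    aborted = bool(error & 0x04)
    if aborted:
        names.append("DriveStatusError")
    if error & 0x80:
        # With the abort bit set this is an interface CRC error;
        # without it, older drives used the bit for a bad sector.
        names.append("BadCRC" if aborted else "BadSector")
    if error & 0x40:
        names.append("UncorrectableError")
    if error & 0x10:
        names.append("SectorIdNotFound")
    if error & 0x02:
        names.append("TrackZeroNotFound")
    if error & 0x01:
        names.append("AddrMarkNotFound")
    return names


print(decode_status(0x51))  # values taken from the dmesg excerpt
print(decode_error(0x84))
```

Running this on the two values from the log gives the same names the kernel printed, which at least confirms that the log is reporting a CRC error on a DMA read and not an uncorrectable media error.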
Re: Understanding my SMART errors
On Wed, Jun 10, 2009 at 8:50 PM, Boyd Stephen Smith Jr. b...@iguanasuicide.net wrote:
> In 20090610143552.fd11cd1a.cele...@gmail.com, Celejar wrote:
>> Is this drive going?
>
> Most likely, yes. Although, it might not completely fail for quite a
> while. It may even be fixable through manufacturer-specific means.

Most probably your drive is failing. Do you already have a backup? If not, now is the best time to make one ; )

For more info on those errors:
http://www.captain.at/howto-linux-driveready-seekcomplete-error-drivestatuserror.php
Re: Understanding my SMART errors
On Wed, 10 Jun 2009 20:45:14 +0200 Johannes Wiedersich johan...@physik.blm.tu-muenchen.de wrote:
> Celejar wrote:
>> Jun 9 15:12:29 lizzie smartd[3474]: Sending warning via mail to r...@localhost ...
>> Jun 9 15:12:29 lizzie smartd[3474]: Warning via mail to r...@localhost: successful
>
> What does this mail say?

Nothing useful, which is why I didn't bother reproducing it originally. I have so far received two emails; the first:

    This email was generated by the smartd daemon running on:

      host name: lizzie
      DNS domain: localdomain
      NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/hda, ATA error count increased from 0 to 4

    For details see host's SYSLOG (default: /var/log/syslog).

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.

I received another one mentioning an ATA error count increase from 12 to 17.

Celejar
Re: Understanding my SMART errors
On Wed, 10 Jun 2009 13:50:29 -0500 Boyd Stephen Smith Jr. b...@iguanasuicide.net wrote:

...

> Could be cabling or some other component between the kernel and the HD,
> but most likely this is the sign of a failing drive.
>
>> Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
>> Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34
>>
>> Is this drive going?
>
> Most likely, yes. Although, it might not completely fail for quite a
> while. It may even be fixable through manufacturer-specific means.

Thanks; I guess I'll start keeping my eyes open for a replacement.

Celejar
Re: Understanding my SMART errors
On Wed, 10 Jun 2009 21:26:05 +0200 Aniruddha mailingdotl...@gmail.com wrote:
> On Wed, Jun 10, 2009 at 8:50 PM, Boyd Stephen Smith Jr. b...@iguanasuicide.net wrote:
>> In 20090610143552.fd11cd1a.cele...@gmail.com, Celejar wrote:
>>> Is this drive going?
>>
>> Most likely, yes. Although, it might not completely fail for quite a
>> while. It may even be fixable through manufacturer-specific means.
>
> Most probably your drive is failing. Do you already have a backup? If
> not, now is the best time to make one ; ) For more info on those errors:

As I mentioned in my OP, I do have them, and I'll certainly be extra vigilant about them now.

> http://www.captain.at/howto-linux-driveready-seekcomplete-error-drivestatuserror.php

Thanks; I'll have to take a look.

Celejar