HDD problems that do not follow SMART results
Hi,

I'm recurrently getting freezes because of HDD problems. During these freezes, that generally last until I shut down the computer, I get such messages:

==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax Plus 9 family
Device Model:     Maxtor 6Y160M0
Serial Number:    Y44NQSTE
Firmware Version: YAR51HW0
User Capacity:    163,928,604,672 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Aug 28 16:09:09 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:

Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.30] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.35] ata6: SError: { UnrecovData Handshk }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.38] ata6.00: failed command: WRITE DMA EXT
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.44] ata6.00: cmd 35/00:80:00:4f:f5/00:01:12:00:00/e0 tag 0 dma 196608 out
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.46] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.49] ata6.00: status: { DRDY }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.56] ata6: hard resetting link
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.476042] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.597999] ata6.00: configured for UDMA/133
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.598003] ata6.00: device reported invalid CHS sector 0
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.598008] ata6: EH complete
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965242] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965247] ata6: SError: { UnrecovData Handshk }
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965251] ata6.00: failed command: WRITE DMA EXT
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965257] ata6.00: cmd 35/00:80:00:4f:f5/00:01:12:00:00/e0 tag 0 dma 196608 out
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965258] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965261] ata6.00: status: { DRDY }
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965269] ata6: hard resetting link
Aug 28 10:22:10 merciadriluca-station kernel: [ 2191.440043] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546566] ata6.00: configured for UDMA/133
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546571] ata6.00: device reported invalid CHS sector 0
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546578] ata6: EH complete
==

After restarting, I got messages such as

==
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816031] ata4: SError: { UnrecovData Handshk }
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816035] ata4.00: failed command: WRITE DMA
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816042] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816045] ata4.00: status: { DRDY }
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816053] ata4: hard resetting link
Aug 28 11:01:35 merciadriluca-station kernel: [ 234.292041] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 11:01:35 merciadriluca-station kernel: [ 234.411821] ata4.00: configured for UDMA/133
Aug 28 11:01:35 merciadriluca-station kernel: [ 234.411826] ata4.00: device reported invalid CHS sector 0
Aug 28 11:01:35 merciadriluca-station kernel: [ 234.411831] ata4: EH complete
Aug 28 11:02:14 merciadriluca-station kernel: [ 272.780026] ata4: limiting SATA link speed to 1.5 Gbps
Aug 28 11:02:14 merciadriluca-station kernel: [ 272.780030] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:02:14 merciadriluca-station kernel: [ 272.780034] ata4: SError: { UnrecovData Handshk }
Aug 28 11:02:14 merciadriluca-station
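Since the messages above involve two different ports (ata6 before the restart, ata4 after), it may help to tally the exceptions per port before drawing conclusions. The following is only a rough sketch: the function name and the regular expression are illustrative assumptions, and the sample lines are the four exception lines copied from the log above.

```python
import re
from collections import Counter

# Matches libata exception lines such as
#   "... ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen"
# The pattern is an assumption based on the log lines quoted in this thread.
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask ")


def count_exceptions(lines):
    """Return a Counter mapping each ATA port (e.g. 'ata6') to its exception count."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Exception lines copied from the syslog excerpt above, plus one non-matching line.
sample = [
    "Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.30] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen",
    "Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.56] ata6: hard resetting link",
    "Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965242] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen",
    "Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen",
    "Aug 28 11:02:14 merciadriluca-station kernel: [ 272.780030] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen",
]

for port, count in sorted(count_exceptions(sample).items()):
    print(port, count)
```

To run it against a real log, something like `count_exceptions(open("/var/log/syslog"))` should work, assuming the file is readable by the current user.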
Re: HDD problems that do not follow SMART results
On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:

> I'm recurrently getting freezes because of HDD problems. During these freezes, that generally last until I shut down the computer, I get such messages:
>
> ==
> smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family: Maxtor DiamondMax Plus 9 family
> Device Model: Maxtor 6Y160M0

(...)

Do you hear any clicking sound coming from the hard disk? Anyway, if my memory serves me well, that hard disk model must be at least 8 years old...

> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.30] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.35] ata6: SError: { UnrecovData Handshk }
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.38] ata6.00: failed command: WRITE DMA EXT

(...)

> After restarting, I got messages such as
>
> ==
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816031] ata4: SError: { UnrecovData Handshk }
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816035] ata4.00: failed command: WRITE DMA
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out

(...)

> and also
>
> ==
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572574] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572578] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572582] Descriptor sense data with sense descriptors (in hex):
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572584] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572592] 00 00 00 00
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572596] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572600] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 00 05 83 00 00 03 90 00
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572608] end_request: I/O error, dev sdc, sector 361216
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572613] Buffer I/O error on device sdc5, logical block 43136
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572615] lost page write due to I/O error on sdc5

(...)

> It looks like the HDD associated with sdc is encountering some issues. And more specifically, /dev/sdc5 partition. But is sdc linked to ata4 or ata6? Do these two problems (before and after restarting) are the same ones or not?

Yes, it seems there are two hard disks affected. Run:

    dmesg | grep -i 'ata[0-6]'

> After running several short and long tests with S.M.A.R.T. on each of my 3 HDDs, I got these results:
>
> 1) HDD associated with /dev/sda looks in some pre-failure state:

(...)

> SMART Error Log Version: 1
> Warning: ATA error count 454 inconsistent with error log pointer 5

I would run the manufacturer's test disk here, but this drive looks a bit tired. You can keep monitoring the values tagged as pre-fail and replace the hard disk as soon as they start increasing quickly.

> 2) HDD associated with /dev/sdb verifies

(...)

> (this is the one that looks the healthiest, actually).

Agreed.

> 3) The HDD associated with /dev/sdc, which should be in some way broken (being given the messages that I wrote above from /var/log/syslog), does not look so through SMART:

(...)

Oh my... also consider running the manufacturer's SMART test utility for this one... and make a full backup _now_.

> What can I deduce from this? It looks like /dev/sdc is broken but SMART tells /dev/sda would have more chance being on the verge to broke than /dev/sdc.

I can deduce that Maxtor hard disks are very old and deserve retirement, even though they are still up and (somehow) running.

> Note that I tried exchanging SATA cables, to no avail.

In your case there are logged sector errors and I/O errors, and this is dangerous.

Greetings,

--
Camaleón

--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/k1iuri$ik9$2...@ger.gmane.org
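On the question of whether sdc sits behind ata4 or ata6: besides searching dmesg, on kernels where the resolved sysfs path of a block device contains an ataN component, the port can be read from that path. The sketch below is an assumption-laden illustration: the helper names are made up, older kernels may not expose ataN in the path at all, and the example path is hypothetical rather than taken from this system.

```python
import os
import re


def ata_port_from_sysfs_path(path):
    """Return the 'ataN' component of a resolved sysfs path, or None if absent."""
    match = re.search(r"/(ata\d+)/", path)
    return match.group(1) if match else None


def ata_port_for_disk(disk):
    """Resolve /sys/block/<disk> (Linux only) and look for an ataN component."""
    return ata_port_from_sysfs_path(os.path.realpath(f"/sys/block/{disk}"))


# Hypothetical resolved path, for illustration only; it does not describe
# the machine discussed in this thread.
example = "/sys/devices/pci0000:00/0000:00:1f.2/ata9/host8/target8:0:0/8:0:0:0/block/sdz"
print(ata_port_from_sysfs_path(example))
```

On the affected machine, `ata_port_for_disk("sdc")` would be the call to try; if it returns None, the dmesg search remains the fallback.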
Re: HDD problems that do not follow SMART results
Camaleón wrote:

> On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:
>
>> Model Family: Maxtor DiamondMax Plus 9 family
>> Device Model: Maxtor 6Y160M0
>
> Do you hear any clicking sound coming from the hard disk? Anyway, if my memory serves me well, that hard disk model has to be at least 8 or more years...

Good memory. I just replaced a Model 6Y080P0 of that family with a SSD830. I can't find when I installed that disc. Must be about 8 years ago. And never anything wrong per smartctl.

Hugo
Re: HDD problems that do not follow SMART results
Thanks for both answers.

I did indeed remove the very old one and installed a brand new HDD in its place. This way, I disconnected the drive that SMART flagged as sick, and put in a new one which now contains /home/*.

This looks perfect; I just had to modify /etc/fstab accordingly (changing one UUID value for /home) and use some screwdrivers.

Thanks again for the help. I hope everything will be fine now. I'm just surprised that SMART actually detected a faulty HDD which was not causing any trouble, while it said nothing about a drive that was totally faulty!

--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
--
If it's too good to be true, then it probably is.
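For anyone repeating the /etc/fstab step, here is a minimal sketch of swapping the UUID for one mount point. The function name is hypothetical, the UUIDs in the sample are made up, and it assumes the entry uses the `UUID=` form; the new UUID would normally come from a tool such as `blkid`. It only transforms text and does not write to /etc/fstab.

```python
def replace_uuid_for_mount(fstab_text, mount_point, new_uuid):
    """Return fstab_text with the UUID= device of the given mount point replaced."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields are: device, mount point, type, options, dump, pass.
        # Comment and blank lines are left untouched.
        if (
            len(fields) >= 2
            and not fields[0].startswith("#")
            and fields[1] == mount_point
            and fields[0].startswith("UUID=")
        ):
            line = line.replace(fields[0], f"UUID={new_uuid}", 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample content; these UUIDs are placeholders, not real values.
sample_fstab = (
    "# <file system> <mount point> <type> <options> <dump> <pass>\n"
    "UUID=aaaaaaaa-0000-0000-0000-000000000001 /     ext4 errors=remount-ro 0 1\n"
    "UUID=aaaaaaaa-0000-0000-0000-000000000002 /home ext4 defaults          0 2\n"
)

updated = replace_uuid_for_mount(
    sample_fstab, "/home", "bbbbbbbb-0000-0000-0000-000000000003"
)
print(updated, end="")
```

Keeping a backup copy of the original file before writing any change back is a sensible precaution, since a wrong entry can stop /home from mounting at boot.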