HDD problems that do not follow SMART results

2012-08-28 Thread Merciadri Luca

Hi,

I keep getting freezes because of HDD problems. During these freezes,
which generally last until I shut down the computer, I get messages
such as:

==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax Plus 9 family
Device Model:     Maxtor 6Y160M0
Serial Number:    Y44NQSTE
Firmware Version: YAR51HW0
User Capacity:    163,928,604,672 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Aug 28 16:09:09 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[...]

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.30] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.35] ata6: SError: { UnrecovData Handshk }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.38] ata6.00: failed command: WRITE DMA EXT
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.44] ata6.00: cmd 35/00:80:00:4f:f5/00:01:12:00:00/e0 tag 0 dma 196608 out
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.46]  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.49] ata6.00: status: { DRDY }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.56] ata6: hard resetting link
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.476042] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.597999] ata6.00: configured for UDMA/133
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.598003] ata6.00: device reported invalid CHS sector 0
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.598008] ata6: EH complete
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965242] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965247] ata6: SError: { UnrecovData Handshk }
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965251] ata6.00: failed command: WRITE DMA EXT
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965257] ata6.00: cmd 35/00:80:00:4f:f5/00:01:12:00:00/e0 tag 0 dma 196608 out
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965258]  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965261] ata6.00: status: { DRDY }
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965269] ata6: hard resetting link
Aug 28 10:22:10 merciadriluca-station kernel: [ 2191.440043] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546566] ata6.00: configured for UDMA/133
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546571] ata6.00: device reported invalid CHS sector 0
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546578] ata6: EH complete
==
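
A note for readers decoding the log above: the SErr value is a bitmask, and the names the kernel prints in braces are its set bits. Below is a minimal Python sketch, assuming the bit positions of the SATA SError register and the short names used by the Linux libata driver. Only a subset of bits is listed, and the helper name decode_serr is made up for illustration.

```python
# Subset of SATA SError bits, with the short names libata prints.
# Assumption: bit positions follow the SATA SError register layout.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
}


def decode_serr(value):
    """Return the names of the known bits set in an SError value."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


print(decode_serr(0x400100))  # ['UnrecovData', 'Handshk']
```

Both decoded bits describe trouble on the link between controller and drive, which fits the "ATA bus error" wording in the same log. This may be one reason why such failures do not necessarily show up in the drive's own SMART attributes.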

After restarting, I got messages such as

==
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816031] ata4: SError: { UnrecovData Handshk }
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816035] ata4.00: failed command: WRITE DMA
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816042]  res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816045] ata4.00: status: { DRDY }
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816053] ata4: hard resetting link
Aug 28 11:01:35 merciadriluca-station kernel: [  234.292041] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 11:01:35 merciadriluca-station kernel: [  234.411821] ata4.00: configured for UDMA/133
Aug 28 11:01:35 merciadriluca-station kernel: [  234.411826] ata4.00: device reported invalid CHS sector 0
Aug 28 11:01:35 merciadriluca-station kernel: [  234.411831] ata4: EH complete
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780026] ata4: limiting SATA link speed to 1.5 Gbps
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780030] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780034] ata4: SError: { UnrecovData Handshk }
Aug 28 11:02:14 merciadriluca-station 

Re: HDD problems that do not follow SMART results

2012-08-28 Thread Camaleón
On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:

 I'm recurrently getting freezes because of HDD problems. During these
 freezes, that generally last until I shut down the computer, I get such
 messages:
 
 ==
 smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
 Copyright (C) 2002-10 by Bruce Allen,
 http://smartmontools.sourceforge.net
 
 === START OF INFORMATION SECTION ===
 Model Family: Maxtor DiamondMax Plus 9 family 
 Device Model: Maxtor 6Y160M0

(...)

Do you hear any clicking sound coming from the hard disk?

Anyway, if my memory serves me well, that hard disk model must be at
least 8 years old...


 Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.30] ata6.00: 
 exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen 
 Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.35] ata6: SError: { 
 UnrecovData Handshk } 
 Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.38] ata6.00: failed 
 command: WRITE DMA EXT 

(...)


 After restarting, I got messages such as
 
 ==
 Aug 28 11:01:35 merciadriluca-station kernel: [  233.816026] ata4.00: 
 exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen 
 Aug 28 11:01:35 merciadriluca-station kernel: [  233.816031] ata4: SError: { 
 UnrecovData Handshk } 
 Aug 28 11:01:35 merciadriluca-station kernel: [  233.816035] ata4.00: failed 
 command: WRITE DMA 
 Aug 28 11:01:35 merciadriluca-station kernel: [  233.816040] ata4.00: cmd 
 ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out 

(...)

 and also
 
 ==
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572574] sd 3:0:0:0: 
 [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572578] sd 3:0:0:0: 
 [sdc] Sense Key : Aborted Command [current] [descriptor] 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572582] Descriptor sense 
 data with sense descriptors (in hex): 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572584] 72 0b 00 
 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572592] 00 00 00 
 00 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572596] sd 3:0:0:0: 
 [sdc] Add. Sense: No additional sense information 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572600] sd 3:0:0:0: 
 [sdc] CDB: Write(10): 2a 00 00 05 83 00 00 03 90 00 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572608] end_request: I/O 
 error, dev sdc, sector 361216 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572613] Buffer I/O error 
 on device sdc5, logical block 43136 
 Aug 28 11:04:49 merciadriluca-station kernel: [  427.572615] lost page write 
 due to I/O error on sdc5 

(...)

 It looks like the HDD associated with sdc is encountering some issues.

And more specifically, /dev/sdc5 partition.
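
The quoted CDB can be checked against the reported sector. Here is a small Python sketch, assuming the standard 10-byte SCSI WRITE(10) layout (opcode 0x2a, a big-endian 4-byte LBA in bytes 2-5, and a 2-byte transfer length in bytes 7-8). The helper name decode_write10 is hypothetical.

```python
def decode_write10(cdb_hex):
    """Decode LBA and transfer length from a WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # first sector of the write
    length = int.from_bytes(cdb[7:9], "big")  # number of sectors to write
    return lba, length


lba, length = decode_write10("2a 00 00 05 83 00 00 03 90 00")
print(lba, length)  # 361216 912
```

The decoded LBA matches the "sector 361216" in the end_request line, so the failing request appears to be the 912-sector write starting there.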

 But is sdc linked to ata4 or ata6? Are these two problems (before and
 after restarting) the same or not?

Yes, it seems there are two hard disks affected. Run:

dmesg | grep -i 'ata[0-6]'
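
As a complement to grepping dmesg, one can look at where /sys/block/sdX points. This is a hedged Python sketch: it assumes a kernel whose sysfs device path contains an ataN component, which older kernels may not provide (they may only show hostN, in which case dmesg remains the way to go). The helper name ata_port_from_path is made up.

```python
import glob
import os
import re


def ata_port_from_path(path):
    """Return the 'ataN' component of a resolved sysfs path, or None."""
    match = re.search(r"/(ata\d+)/", path)
    return match.group(1) if match else None


# Print the ATA port (if visible in sysfs) for every sd* disk on this machine.
for block in sorted(glob.glob("/sys/block/sd*")):
    real = os.path.realpath(block)
    print(os.path.basename(block), ata_port_from_path(real) or "unknown")
```

In the quoted log, sdc sits at SCSI address 3:0:0:0, i.e. host 3. Since libata numbers its ports from 1 while SCSI hosts start at 0, host 3 often corresponds to ata4, although that offset is not guaranteed on every system.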

 After running several short and long tests with S.M.A.R.T. on each of my
 3 HDDs, I got these results:
 
 1) HDD associated with /dev/sda looks in some pre-failure state:

(...)

 SMART Error Log Version: 1
 Warning: ATA error count 454 inconsistent with error log pointer 5

I would run the manufacturer's diagnostic utility here, but this disk looks
a bit tired. Keep monitoring the attributes tagged as pre-fail and replace
the hard disk as soon as they start degrading quickly.
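
Monitoring the pre-fail attributes can be scripted. Below is a rough Python sketch, assuming the usual column layout of `smartctl -A` output (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function name and the margin parameter are illustrative choices, not part of smartmontools.

```python
def prefail_at_risk(smartctl_output, margin=10):
    """List (name, value, thresh) for Pre-fail attributes near their threshold."""
    at_risk = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh, kind = fields[1], fields[3], fields[5], fields[6]
        if kind != "Pre-fail" or not (value.isdigit() and thresh.isdigit()):
            continue
        # Normalized values count down towards the threshold as health degrades.
        if int(value) - int(thresh) <= margin:
            at_risk.append((name, int(value), int(thresh)))
    return at_risk
```

The function expects the text printed by `smartctl -A /dev/sda` (for example saved to a file by a cron job). The smartd daemon shipped with smartmontools can do this kind of periodic monitoring natively, so a script like this is mainly useful for ad hoc checks.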

 2) HDD associated with /dev/sdb verifies

(...)

 (this is the one that looks the healthiest, actually).

Agreed.
 
 3) The HDD associated with /dev/sdc, which should be broken in some way
 (given the messages from /var/log/syslog that I quoted above), does
 not look so according to SMART:

(...)

Oh my... consider also running the manufacturer's SMART test utility on this
one... and make a full backup _now_.

 What can I deduce from this? It looks like /dev/sdc is broken, but SMART
 suggests /dev/sda is more likely to be on the verge of failing than
 /dev/sdc.

I can deduce that these Maxtor hard disks are very old and deserve
retirement, even though they are still up and (somehow) running.

 Note that I tried exchanging SATA cables, to no avail.

In your case the logs show sector and I/O errors, and this
is dangerous.

Greetings,

-- 
Camaleón


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/k1iuri$ik9$2...@ger.gmane.org



Re: HDD problems that do not follow SMART results

2012-08-28 Thread hvw59601

Camaleón wrote:

On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:


I'm recurrently getting freezes because of HDD problems. During these
freezes, that generally last until I shut down the computer, I get such
messages:

==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,
http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax Plus 9 family 
Device Model: Maxtor 6Y160M0


(...)

Do you hear any clicking sound coming from the hard disk?

Anyway, if my memory serves me well, that hard disk model must be at
least 8 years old...




Good memory. I just replaced a Model 6Y080P0 of that family with an
SSD830. I can't find when I installed that disk; it must have been about
8 years ago. And smartctl never reported anything wrong with it.


Hugo




Archive: http://lists.debian.org/k1jdpf$vai$1...@ger.gmane.org



Re: HDD problems that do not follow SMART results

2012-08-28 Thread Merciadri Luca

Thanks for both answers. I did indeed remove the very old one and
installed a brand-new HDD in its place. This way, I disconnected the one
that SMART flagged as sick, and put in a new one which now contains
/home/*

This works perfectly. I just had to modify /etc/fstab accordingly
(changing one UUID value for /home) and use some screwdrivers.
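
The /etc/fstab change mentioned here amounts to swapping the UUID on the /home line. Here is a cautious Python sketch that only transforms text and leaves reading and writing the real file to the reader. The function name is hypothetical, and the new UUID would normally be taken from the output of `blkid` for the new partition.

```python
def replace_mount_uuid(fstab_text, mount_point, new_uuid):
    """Return fstab text with the UUID= source of mount_point replaced."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not line.lstrip().startswith("#")
        if is_entry and fields[1] == mount_point and fields[0].startswith("UUID="):
            # Replace only the first occurrence, which is the device field.
            line = line.replace(fields[0], "UUID=" + new_uuid, 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Keeping a backup copy of /etc/fstab before writing the result back is advisable, since a wrong entry for /home can interrupt the next boot.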

Thanks again for the help. I hope everything will be fine now. I'm just
surprised that SMART actually flagged an HDD that was not causing any
trouble, while it said nothing about a drive that was totally
faulty!

--
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
--

If it's too good to be true, then it probably is.


Archive: http://lists.debian.org/87fw76znph.fsf@merciadriluca-station.MERCIADRILUCA