Re: Disk heads won't park [pat II]

2014-04-29 Thread Nuno Magalhães
Reviving this thread since I tried turning the machine on again (and
maybe another thread will bump this one).

And, again (well, I wasn't expecting it to go away), as soon as the
machine starts - right after POST, even before GRUB - the drive starts
making reading noise (like when an antivirus is scanning or the
system is thrashing). The only way it stops is with hdparm -y (no
wonder there).

This is a Seagate Barracuda ST31000528AS drive with a CC49 firmware
upgrade. Here are a few other commands I tried:

~# hdparm -Z /dev/sdd
/dev/sdd:
 disabling Seagate auto powersaving mode
 HDIO_DRIVE_CMD(seagatepwrsave) failed: Input/output error

~# hdparm -B /dev/sdd
APM_level  = not supported

The message "Incorrect metadata area header checksum on /dev/sdd1 at
offset 4096" shows up in dmesg and on LVM operations. Here's some
SMART fun:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   113   099   006    -    50643073
  3 Spin_Up_Time            PO----   095   095   000    -    0
  4 Start_Stop_Count        -O--CK   099   099   020    -    1274
  5 Reallocated_Sector_Ct   PO--CK   047   047   036    -    2181
  7 Seek_Error_Rate         POSR--   075   060   030    -    35143152
  9 Power_On_Hours          -O--CK   079   079   000    -    18750
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    638
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   099   000    -    1
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   071   051   045    -    29 (Min/Max 22/29)
194 Temperature_Celsius     -O---K   029   049   000    -    29 (0 11 0 0)
195 Hardware_ECC_Recovered  -O-RC-   026   018   000    -    50643073
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    178838143258596
241 Total_LBAs_Written      ------   100   253   000    -    1457922426
242 Total_LBAs_Read         ------   100   253   000    -    1552877542
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

I've also run a few self-tests, but they show as aborted even when I
let them run for hours:

=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Self-test routine in progress 90%        18752            -
# 2  Short offline       Aborted by host               90%        18752            -
# 3  Short offline       Aborted by host               90%        18751            -
# 4  Short offline       Aborted by host               90%        18751            -
# 5  Short offline       Aborted by host               90%        18751            -
# 6  Extended offline    Aborted by host               90%        18750            -
# 7  Extended offline    Completed without error       00%        18746            -
# 8  Extended offline    Aborted by host               90%        18742            -
# 9  Extended offline    Interrupted (host reset)      90%        18742            -
#10  Short offline       Interrupted (host reset)      00%        18741            -
#11  Short offline       Completed without error       00%        18653            -

# smartctl -A /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       50652210
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1283
  5 Reallocated_Sector_Ct   0x0033   047   047   036    Pre-fail  Always       -       2181
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35273246
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18754
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count   0x0032   100   

Re: Disk heads won't park [pat II]

2014-04-29 Thread Stan Hoeppner
On 4/29/2014 1:20 PM, Nuno Magalhães wrote:
> Reviving this thread since I tried turning the machine on again (and
> maybe another thread will bump this one).
>
> And, again (well, I wasn't expecting it to go away), as soon as the
> machine starts - right after POST, even before GRUB - the drive starts
> making reading noise (like when an antivirus is scanning or the
> system is thrashing). The only way it stops is with hdparm -y (no
> wonder there).

The drive isn't failing, but has failed.  Replace it.

Mechanical drive platters have hard coded track markers.  These are
created by a low level format at the factory on today's drives.  Those
with experience going back to MFM/RLL days may recall performing low
level formats due to stepper motor issues.  This was done by entering
g=c800:5 in a debugger, which loaded and executed the controller's
firmware format utility.

Drive firmware reads the low level track markers in order to properly
position the read/write head on user data tracks.  The noise you are
hearing is the head seeking across the platter trying to locate the
track markers and it is unsuccessful.

The likely cause is a worn out actuator return spring.  This is a rather
common failure mode.  The cost of the return spring is about USD 0.0025,
about a quarter of a cent.  If the spring tension deviates too far from
spec the head will no longer align to a track marker.  While Q.C. is
typically good on spring manufacturing, they are made from large spools
of drawn steel wire.  A slight imperfection in a few feet of a 10,000
foot spool will yield a few dozen springs that may not last long, in
your case about 2.1 years.

Worn out spindle bearings can also cause head locating problems, but in
this case you'll usually notice a vibration in the PC case and likely a
hum accompanying it.  Q.C. on the bearing assemblies is much higher than
on return springs.  The spindle assembly is the most expensive part in a
disk drive due to the manufacturing tolerance and spin balancing required.
You usually don't see bearing wear issues until 5+ years of power on
duty.  And of course bearing life is inversely related to spindle
speed.  I.e. 5K drive bearings should normally last longer than 15K
drive bearings.

No software tool can identify the cause of your problem.  However, SMART
has been telling you for some time that the drive was experiencing seek
errors.  Seek errors indicate a head positioning problem.  A head
positioning problem normally indicates a worn return spring, bearings,
or possibly a problem with the voice coil or its drive circuit, though
the latter is rare.
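
As a rough illustration of reading those attribute tables: smartctl only
fills in WHEN_FAILED once an attribute's normalized VALUE has dropped to its
THRESH or below, so an empty column does not by itself mean the attribute is
healthy.  Below is a minimal sketch, assuming saved smartctl -A output with
the usual whitespace-separated columns; the function name smart_margin is
made up for this example.  It prints how far each Pre-fail attribute still
sits above its threshold.

```shell
#!/bin/sh
# smart_margin (hypothetical helper): reads `smartctl -A` output on stdin and
# prints "ATTRIBUTE_NAME margin" for every Pre-fail attribute, where
# margin = VALUE - THRESH. Assumes the standard column order:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_margin() {
    awk '
    # VALUE and THRESH are zero-padded decimals such as 047; strip the
    # padding so that no awk implementation mistakes them for octal.
    function dec(s) { sub(/^0+/, "", s); return s + 0 }
    $1 ~ /^[0-9]+$/ && $7 == "Pre-fail" {
        printf "%s %d\n", $2, dec($4) - dec($6)
    }'
}

# Example (the device name is an assumption):
#   smartctl -A /dev/sdd | smart_margin
```

With the figures posted in this thread, Reallocated_Sector_Ct would come out
at 11 (047 against a threshold of 036) and Seek_Error_Rate at 45 (075 against
030).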

Cheers,

Stan



> This is a Seagate Barracuda ST31000528AS drive with a CC49 firmware
> upgrade. Here are a few other commands I tried:
...
>   7 Seek_Error_Rate         POSR--   075   060   030    -    35143152
...
> I do assume it is failing, but I'd like to know why and which values
> are really tell-tale (for instance the WHEN_FAILED column above is
> empty, so I can't really draw any conclusions).
>
> This is a recently installed, headless system with almost nothing installed.
>
> Thanks,
> Nuno
 
 


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/535ffbc4.1060...@hardwarefreak.com



Re: Disk heads won't park [pat II]

2014-04-29 Thread Nuno Magalhães
On Tue, Apr 29, 2014 at 8:21 PM, Stan Hoeppner s...@hardwarefreak.com wrote:

> The drive isn't failing, but has failed.  Replace it.

I already have another one on the way. I was going to buy Samsung but
then learnt their drive division was bought by Seagate (which also
bought Maxtor, the brand of the oldest drive on my desktop). I settled
for a Toshiba DT01ACA100 which would've been here already if the store
hadn't handed me a DT01ABA100 instead (rpm difference). Maybe I just
got unlucky with this Seagate, we'll see.

I assume the return spring repair would be both infeasible and way
beyond USD 0.0025 or the cost of the new drive. :) Alas, such is the
market. The weird thing is, to an extent, the drive kinda works/ed (I
guess only one platter is damaged but I won't play expert). It'll make
a lovely paper-weight, though.

> No software tool can identify the cause of your problem.  However, SMART
> has been telling you for some time that the drive was experiencing seek
> errors.

That's a subtle hint for me to set up smartd for the other drives (it
was planned... in the to-do stack).

Thank you for your very thorough explanation.

Cheers,
Nuno


Archive: https://lists.debian.org/cadqa9ub2z3yykmcb6owxmyw7vpevp1n8x_kgh6nxap2xxgw...@mail.gmail.com



Re: Disk heads won't park [pat II]

2014-04-29 Thread Stan Hoeppner
On 4/29/2014 6:13 PM, Nuno Magalhães wrote:
> On Tue, Apr 29, 2014 at 8:21 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
>
>> The drive isn't failing, but has failed.  Replace it.
>
> I already have another one on the way. I was going to buy Samsung but
> then learnt their drive division was bought by Seagate (which also
> bought Maxtor, the brand of the oldest drive on my desktop). I settled
> for a Toshiba DT01ACA100 which would've been here already if the store
> hadn't handed me a DT01ABA100 instead (rpm difference). Maybe I just
> got unlucky with this Seagate, we'll see.
>
> I assume the return spring repair would be both infeasible and way
> beyond USD 0.0025 or the cost of the new drive. :) Alas, such is the
> market. The weird thing is, to an extent, the drive kinda works/ed (I
> guess only one platter is damaged but I won't play expert). It'll make
> a lovely paper-weight, though.

It's not feasible to effect repairs to the moving parts of a modern hard
disk drive.  To do so would require breaking the air seal, and doing
that will introduce dust particles into the platter cavity.  Screw the
cover back on and fire it up, and in no time flat the dust particles
will scour the platter surfaces, as they get bounced around at 5900 to
15,000 RPM, and if one hits a read/write head it will damage it.

Some of the professional data recovery services have clean room
facilities and trained personnel (or used to anyway) who can effect such
repairs in herculean efforts to recover data, but the drives are never
returned to service.  Last I heard such services start around $10,000
USD with no guarantee of data recovery from the failed drive.

>> No software tool can identify the cause of your problem.  However, SMART
>> has been telling you for some time that the drive was experiencing seek
>> errors.
>
> That's a subtle hint for me to set up smartd for the other drives (it
> was planned... in the to-do stack).

Don't bother.  Cron 'smartctl -A /dev/[device]' the first of every month
and have it mail the output to you.  Look at the raw values for seek and
read error rates, reallocated sectors, etc.  Once these pass zero and
keep climbing it's time to get a replacement on the way.  At that point
the drive may have a year left, maybe a month, or maybe it will die
tomorrow.  No way to know.  Don't wait until a drive is completely dead
before replacing it.  That mentality is for kitchen appliances, not
something that won't spit your data back out after it goes ker-thunk. ;)
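
A minimal sketch of that monthly check, assuming a Debian-style cron and a
working local mail setup.  The function name smart_watch, the device, and the
mail recipient are all placeholders, not anything taken from this thread.  The
filter keeps only the attributes named above and prints their raw values; the
list is easy to extend.

```shell
#!/bin/sh
# smart_watch (hypothetical helper): reads `smartctl -A` output on stdin and
# prints "ATTRIBUTE_NAME RAW_VALUE" for the attributes named above.
# Assumes RAW_VALUE is the 10th whitespace-separated column.
smart_watch() {
    awk '$1 ~ /^[0-9]+$/ &&
         $2 ~ /^(Raw_Read_Error_Rate|Seek_Error_Rate|Reallocated_Sector_Ct)$/ {
        print $2, $10
    }'
}

# Example crontab entry (06:00 on the 1st of each month); cron mails any
# output of the job to the address in MAILTO. Device and address are
# assumptions:
#   MAILTO=root
#   0 6 1 * * /usr/sbin/smartctl -A /dev/sda
```

Keeping each month's mail makes it possible to compare raw values from one
report to the next, which is the trend the advice above is about.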

> Thank you for your very thorough explanation.

You're welcome.

Stan


Archive: https://lists.debian.org/536059f3.9070...@hardwarefreak.com