Re: Disk heads won't park [part II]
Reviving this thread since I tried turning the machine on again (and maybe another thread will bump this one). And, again (well, I wasn't expecting it to go away), as soon as the machine starts - right after POST, even before GRUB - the drive starts making reading noise (like when an antivirus is scanning or the system is thrashing). The only way it stops is with hdparm -y (no wonder there).

This is a Seagate Barracuda ST31000528AS drive with a CC49 firmware upgrade. Here are a few other commands I tried:

~# hdparm -Z /dev/sdd
/dev/sdd:
 disabling Seagate auto powersaving mode
 HDIO_DRIVE_CMD(seagatepwrsave) failed: Input/output error
~# hdparm -B /dev/sdd
 APM_level = not supported

The message "Incorrect metadata area header checksum on /dev/sdd1 at offset 4096" shows up in dmesg and during LVM operations.

Here's some SMART fun:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   113   099   006    -    50643073
  3 Spin_Up_Time            PO----   095   095   000    -    0
  4 Start_Stop_Count        -O--CK   099   099   020    -    1274
  5 Reallocated_Sector_Ct   PO--CK   047   047   036    -    2181
  7 Seek_Error_Rate         POSR--   075   060   030    -    35143152
  9 Power_On_Hours          -O--CK   079   079   000    -    18750
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    638
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   099   000    -    1
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   071   051   045    -    29 (Min/Max 22/29)
194 Temperature_Celsius     -O---K   029   049   000    -    29 (0 11 0 0)
195 Hardware_ECC_Recovered  -O-RC-   026   018   000    -    50643073
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    178838143258596
241 Total_LBAs_Written      ------   100   253   000    -    1457922426
242 Total_LBAs_Read         ------   100   253   000    -    1552877542
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

I've also run a few self-tests, but they show as Aborted even when I let them run for hours:

=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Self-test routine in progress  90%        18752            -
# 2  Short offline       Aborted by host                90%        18752            -
# 3  Short offline       Aborted by host                90%        18751            -
# 4  Short offline       Aborted by host                90%        18751            -
# 5  Short offline       Aborted by host                90%        18751            -
# 6  Extended offline    Aborted by host                90%        18750            -
# 7  Extended offline    Completed without error        00%        18746            -
# 8  Extended offline    Aborted by host                90%        18742            -
# 9  Extended offline    Interrupted (host reset)       90%        18742            -
#10  Short offline       Interrupted (host reset)       00%        18741            -
#11  Short offline       Completed without error        00%        18653            -

# smartctl -A /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       50652210
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1283
  5 Reallocated_Sector_Ct   0x0033   047   047   036    Pre-fail  Always       -       2181
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35273246
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18754
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100
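The two attribute listings above print the same FLAG information in two notations: letters such as POSR-- in the first, hex words such as 0x000f in the second. The Python sketch below converts the hex form into the letter form. It assumes the ATA attribute-flag bit order (bit 0 = P through bit 5 = K, matching the legend under the first listing), and the names FLAG_BITS and brief_flags are made up for illustration. As a sanity check, it reproduces the flag pairs that appear in both listings.

```python
# Assumed bit order of the ATA SMART attribute flag word, in the order the
# compact FLAGS column prints them (P first, K last). Names are illustrative.
FLAG_BITS = [
    (0x01, "P"),  # prefailure warning
    (0x02, "O"),  # updated online
    (0x04, "S"),  # speed/performance
    (0x08, "R"),  # error rate
    (0x10, "C"),  # event count
    (0x20, "K"),  # auto-keep
]


def brief_flags(flag_word: int) -> str:
    """Render a hex FLAG value (e.g. 0x000f) as the six-letter compact form."""
    return "".join(letter if flag_word & bit else "-" for bit, letter in FLAG_BITS)


if __name__ == "__main__":
    # Hex flags copied from the `smartctl -A /dev/sdd` listing in this message.
    for name, word in [
        ("Raw_Read_Error_Rate", 0x000F),
        ("Spin_Up_Time", 0x0003),
        ("Start_Stop_Count", 0x0032),
        ("Reallocated_Sector_Ct", 0x0033),
        ("Spin_Retry_Count", 0x0013),
    ]:
        print(f"{name:24} {word:#06x} {brief_flags(word)}")
```

For example, 0x0033 is 0x01 + 0x02 + 0x10 + 0x20, which gives PO--CK, the same string shown for Reallocated_Sector_Ct in the first listing.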
Re: Disk heads won't park [part II]
On 4/29/2014 1:20 PM, Nuno Magalhães wrote:
> Reviving this thread since I tried turning the machine on again (and maybe another thread will bump this one). And, again (well, I wasn't expecting it to go away), as soon as the machine starts - right after POST, even before GRUB - the drive starts making reading noise (like when an antivirus is scanning or the system is thrashing). The only way it stops is with hdparm -y (no wonder there).

The drive isn't failing, but has failed. Replace it.

Mechanical drive platters have hard-coded track markers. On today's drives these are created by a low-level format at the factory. Those with experience going back to MFM/RLL days may recall performing low-level formats due to stepper motor issues. This was done by entering g=c800:5 in a debugger, which loaded and executed the controller's firmware format utility.

Drive firmware reads the low-level track markers in order to properly position the read/write head on user data tracks. The noise you are hearing is the head seeking across the platter, trying and failing to locate the track markers.

The likely cause is a worn-out actuator return spring. This is a rather common failure mode. The cost of the return spring is about USD 0.0025, about a quarter of a cent. If the spring tension deviates too far from spec, the head will no longer align to a track marker. While Q.C. is typically good on spring manufacturing, the springs are made from large spools of drawn steel wire. A slight imperfection in a few feet of a 10,000-foot spool will yield a few dozen springs that may not last long, in your case about 2.1 years.

Worn spindle bearings can also cause head locating problems, but in that case you'll usually notice a vibration in the PC case, likely accompanied by a hum. Q.C. on the bearing assemblies is much higher than on return springs. The spindle assembly is the most expensive part in a disk drive due to the manufacturing tolerance and spin balancing required.
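The "about 2.1 years" figure appears to come from the Power_On_Hours raw value (18750) in the first SMART listing. A quick Python check of that arithmetic, assuming 24 * 365.25 hours per year and counting only powered-on time (the drive's calendar age may well be greater). The function name is hypothetical.

```python
# Assumption: an average year of 24 * 365.25 = 8766 hours. This converts
# powered-on time only; it says nothing about the drive's calendar age.
HOURS_PER_YEAR = 24 * 365.25


def power_on_years(power_on_hours: int) -> float:
    """Convert a SMART Power_On_Hours raw value into years of running time."""
    return power_on_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # Power_On_Hours raw value from the first attribute listing in the thread.
    print(f"{power_on_years(18750):.1f} years")  # prints: 2.1 years
```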
You usually don't see bearing wear issues until 5+ years of power-on duty. And of course bearing life is inversely proportional to spindle speed, i.e. 5K RPM drive bearings should normally last longer than 15K RPM drive bearings.

No software tool can identify the cause of your problem. However, SMART has been telling you for some time that the drive was experiencing seek errors. Seek errors indicate a head positioning problem. A head positioning problem normally indicates a worn return spring, worn bearings, or possibly a problem with the voice coil or its drive circuit, though the latter is rare.

Cheers,
Stan

> This is a Seagate Barracuda ST31000528AS drive with a CC49 firmware upgrade. Here are a few other commands I tried:
> ...
>   7 Seek_Error_Rate         POSR--   075   060   030    -    35143152
> ...
> I do assume it is failing, but I'd like to know why and which values are really tell-tale (for instance, the WHEN_FAILED column above is empty, so I can't really draw any conclusions). This is a recently installed, headless system with almost nothing installed.
>
> Thanks,
> Nuno

-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/535ffbc4.1060...@hardwarefreak.com
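On the empty WHEN_FAILED column: as the smartctl manual page describes it, that column only fills in once an attribute's normalized VALUE has dropped to or below its THRESH (FAILING_NOW), or once its WORST value has (In_the_past). Raw counts never trip it directly. The Python sketch below mirrors that rule; the function name is hypothetical, and it assumes a threshold of 000 means the attribute can never trip. It shows why the column stayed empty: Reallocated_Sector_Ct sits at 047 against a threshold of 036, so it still has 11 points of headroom despite 2181 reallocated sectors.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Approximate smartctl's WHEN_FAILED column for one attribute.

    Assumption: a threshold of 0 means the attribute never trips.
    """
    if thresh == 0:
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


if __name__ == "__main__":
    # (name, VALUE, WORST, THRESH) taken from the listings in this thread.
    rows = [
        ("Raw_Read_Error_Rate", 113, 99, 6),
        ("Reallocated_Sector_Ct", 47, 47, 36),
        ("Seek_Error_Rate", 75, 60, 30),
        ("Spin_Retry_Count", 100, 100, 97),
    ]
    for name, value, worst, thresh in rows:
        state = when_failed(value, worst, thresh)
        print(f"{name:24} {state:12} headroom={value - thresh}")
```

In other words, the normalized values can keep sliding toward their thresholds for a long time while the raw counts already look alarming, which is why the raw values are worth watching too.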
Re: Disk heads won't park [part II]
On Tue, Apr 29, 2014 at 8:21 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
> The drive isn't failing, but has failed. Replace it.

I already have another one on the way. I was going to buy Samsung but then learnt their drive division was bought by Seagate (which also bought Maxtor, the brand of the oldest drive on my desktop). I settled for a Toshiba DT01ACA100, which would've been here already if the store hadn't handed me a DT01ABA100 instead (RPM difference). Maybe I just got unlucky with this Seagate; we'll see.

I assume the return spring repair would be both infeasible and way beyond USD 0.0025, or even beyond the cost of the new drive. :) Alas, such is the market. The weird thing is that, to an extent, the drive kind of works (or worked); I guess only one platter is damaged, but I won't play expert. It'll make a lovely paper-weight, though.

> No software tool can identify the cause of your problem. However, SMART has been telling you for some time that the drive was experiencing seek errors.

That's a subtle hint for me to set up smartd for the other drives (it was planned... in the to-do stack). Thank you for your very thorough explanation.

Cheers,
Nuno

-- 
Archive: https://lists.debian.org/cadqa9ub2z3yykmcb6owxmyw7vpevp1n8x_kgh6nxap2xxgw...@mail.gmail.com
Re: Disk heads won't park [part II]
On 4/29/2014 6:13 PM, Nuno Magalhães wrote:
> On Tue, Apr 29, 2014 at 8:21 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
>> The drive isn't failing, but has failed. Replace it.
>
> I already have another one on the way. I was going to buy Samsung but then learnt their drive division was bought by Seagate (which also bought Maxtor, the brand of the oldest drive on my desktop). I settled for a Toshiba DT01ACA100, which would've been here already if the store hadn't handed me a DT01ABA100 instead (RPM difference). Maybe I just got unlucky with this Seagate; we'll see.
>
> I assume the return spring repair would be both infeasible and way beyond USD 0.0025, or even beyond the cost of the new drive. :) Alas, such is the market. The weird thing is that, to an extent, the drive kind of works (or worked); I guess only one platter is damaged, but I won't play expert. It'll make a lovely paper-weight, though.

It's not feasible to repair the moving parts of a modern hard disk drive. Doing so would require breaking the air seal, which introduces dust particles into the platter cavity. Screw the cover back on and fire it up, and in no time flat the dust particles will scour the platter surfaces as they get bounced around at 5,900 to 15,000 RPM, and if one hits a read/write head it will damage it. Some of the professional data recovery services have clean room facilities and trained personnel (or used to, anyway) who can perform such repairs as a herculean effort to recover data, but those drives are never returned to service. Last I heard, such services start at around USD 10,000, with no guarantee of recovering data from the failed drive.

>> No software tool can identify the cause of your problem. However, SMART has been telling you for some time that the drive was experiencing seek errors.
>
> That's a subtle hint for me to set up smartd for the other drives (it was planned... in the to-do stack).

Don't bother. Run 'smartctl -A /dev/[device]' from cron on the first of every month and have it mail the output to you.
Look at the raw values for seek and read error rates, reallocated sectors, etc. Once these pass zero and keep climbing, it's time to get a replacement on the way. At that point the drive may have a year left, maybe a month, or maybe it will die tomorrow. There's no way to know. Don't wait until a drive is completely dead before replacing it. That mentality is for kitchen appliances, not for something that won't spit your data back out after it goes ker-thunk. ;)

> Thank you for your very thorough explanation.

You're welcome.

Stan

-- 
Archive: https://lists.debian.org/536059f3.9070...@hardwarefreak.com
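The monthly "are the raw values climbing?" check can be partly scripted. The Python sketch below is one hypothetical way to do it: raw_values() pulls the first number of RAW_VALUE out of `smartctl -A` text, and climbing() reports which watched attributes are above zero and have grown since the previous snapshot. The function names, the WATCHED tuple and the regular expression are illustrative assumptions, not part of smartmontools. Raw-value encodings are also vendor specific, so treat any hit as a prompt to look closer rather than a verdict.

```python
from __future__ import annotations

import re

# Illustrative choice: the attributes Stan suggests watching.
WATCHED = ("Raw_Read_Error_Rate", "Seek_Error_Rate", "Reallocated_Sector_Ct")

# One attribute row of `smartctl -A` output: ID, name, hex FLAG, VALUE, WORST,
# THRESH, TYPE, UPDATED, WHEN_FAILED, then the first integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(smartctl_output: str) -> dict[str, int]:
    """Map attribute name -> first integer of its RAW_VALUE column."""
    values = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line)
        if match:  # header and banner lines simply don't match
            values[match.group(2)] = int(match.group(3))
    return values


def climbing(previous: dict[str, int], current: dict[str, int]) -> list[str]:
    """Watched attributes whose raw value is above zero and has grown."""
    return [
        name
        for name in WATCHED
        if current.get(name, 0) > 0 and current.get(name, 0) > previous.get(name, 0)
    ]


if __name__ == "__main__":
    # Previous snapshot: raw values from the first (compact) listing in the thread.
    previous = {
        "Raw_Read_Error_Rate": 50643073,
        "Seek_Error_Rate": 35143152,
        "Reallocated_Sector_Ct": 2181,
    }
    # Current snapshot: rows from the later `smartctl -A /dev/sdd` listing.
    current = raw_values(
        "  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       50652210\n"
        "  5 Reallocated_Sector_Ct   0x0033   047   047   036    Pre-fail  Always       -       2181\n"
        "  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35273246\n"
    )
    for name in climbing(previous, current):
        print(f"{name}: {previous.get(name, 0)} -> {current[name]}")
```

With the two snapshots from this thread it flags Raw_Read_Error_Rate and Seek_Error_Rate, while Reallocated_Sector_Ct (2181 both times) is nonzero but flat. Run from cron as suggested above, whatever the job prints is normally mailed to the crontab's owner (or to MAILTO), provided local mail delivery works.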