Re: scrub implies failing drive - smartctl blissfully unaware

2014-12-01 Thread Phillip Susi
On 11/25/2014 6:13 PM, Chris Murphy wrote: The drive will only issue a read error when its ECC absolutely cannot recover the data, hard fail. A few years ago companies including Western Digital started shipping large cheap drives, think of the

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-28 Thread Patrik Lundquist
On 25 November 2014 at 22:34, Phillip Susi ps...@ubuntu.com wrote: On 11/19/2014 7:05 PM, Chris Murphy wrote: I'm not a hard drive engineer, so I can't argue either point. But consumer drives clearly do behave this way. On Linux, the kernel's default 30 second command timer eventually

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-28 Thread Patrik Lundquist
On 25 November 2014 at 23:14, Phillip Susi ps...@ubuntu.com wrote: On 11/19/2014 6:59 PM, Duncan wrote: The paper specifically mentioned that it wasn't necessarily the more expensive devices that were the best, either, but the ones that fared best did tend to have longer device-ready times.

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-25 Thread Phillip Susi
On 11/19/2014 7:05 PM, Chris Murphy wrote: I'm not a hard drive engineer, so I can't argue either point. But consumer drives clearly do behave this way. On Linux, the kernel's default 30 second command timer eventually results in what look like

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-25 Thread Phillip Susi
On 11/19/2014 6:59 PM, Duncan wrote: It's not physical spinup, but electronic device-ready. It happens on SSDs too and they don't have anything to spinup. If you have an SSD that isn't handling IO within 5 seconds or so of power on, it is badly

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-25 Thread Chris Murphy
On Tue, Nov 25, 2014 at 2:34 PM, Phillip Susi ps...@ubuntu.com wrote: I have seen plenty of error logs of people with drives that do properly give up and return an error instead of timing out so I get the feeling that most drives are properly behaved. Is there a particular make/model of

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-25 Thread Rich Freeman
On Tue, Nov 25, 2014 at 6:13 PM, Chris Murphy li...@colorremedies.com wrote: A few years ago companies including Western Digital started shipping large cheap drives, think of the green drives. These had very high TLER (Time Limited Error Recovery) settings, a.k.a. SCT ERC. Later they

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-22 Thread Phillip Susi
On 11/21/2014 04:12 PM, Robert White wrote: Here's a bug from 2005 of someone having a problem with the ACPI IDE support... That is not ACPI emulation. ACPI is not used to access the disk, but rather it has hooks that give it a chance to diddle

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Ian Armstrong
On Fri, 21 Nov 2014 09:05:32 +0200, Brendan Hide wrote: On 2014/11/21 06:58, Zygo Blaxell wrote: I also notice you are not running regular SMART self-tests (e.g. by smartctl -t long) and the last (and first, and only!) self-test the drive ran was ~12000 hours ago. That means most of your

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Phillip Susi
On 11/20/2014 5:45 PM, Robert White wrote: Nice attempt at saving face, but wrong as _always_. The CONFIG_PATA_ACPI option has been in the kernel since 2008 and lots of people have used it. If you search for ACPI ide you'll find people

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Phillip Susi
On 11/20/2014 6:08 PM, Robert White wrote: Well you should have _actually_ trimmed your response down to not pressing send. _Many_ motherboards have complete RAID support at levels 0, 1, 10, and 5. A few have RAID6. Some of them even

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Chris Murphy
On Fri, Nov 21, 2014 at 5:55 AM, Ian Armstrong bt...@iarmst.co.uk wrote: In my situation what I've found is that if I scrub and let it fix the errors then a second pass immediately after will show no errors. If I then leave it a few days and try again there will be errors, even in old files which

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Zygo Blaxell
On Fri, Nov 21, 2014 at 09:05:32AM +0200, Brendan Hide wrote: On 2014/11/21 06:58, Zygo Blaxell wrote: You have one reallocated sector, so the drive has lost some data at some time in the last 49000(!) hours. Normally reallocations happen during writes so the data that was lost was data you

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Chris Murphy
On Fri, Nov 21, 2014 at 10:42 AM, Zygo Blaxell zblax...@furryterror.org wrote: I run 'smartctl -t long' from cron overnight (or whenever the drives are most idle). You can also set up smartd.conf to launch the self tests; however, the syntax for test scheduling is byzantine compared to cron

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Robert White
On 11/21/2014 07:11 AM, Phillip Susi wrote: On 11/20/2014 5:45 PM, Robert White wrote: If you search for ACPI ide you'll find people complaining in 2008-2010 about windows error messages indicating the device is present in their system but no OS driver is available. Nope... not finding it.

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Robert White
On 11/21/2014 01:12 PM, Robert White wrote: (wrong links included in post...) Dangit... those two links were bad... wrong clipboard... /sigh... I'll just stand on the pasted text from the driver. 8-)

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Zygo Blaxell
On Fri, Nov 21, 2014 at 11:06:19AM -0700, Chris Murphy wrote: On Fri, Nov 21, 2014 at 10:42 AM, Zygo Blaxell zblax...@furryterror.org wrote: I run 'smartctl -t long' from cron overnight (or whenever the drives are most idle). You can also set up smartd.conf to launch the self tests;

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-21 Thread Ian Armstrong
On Fri, 21 Nov 2014 10:45:21 -0700 Chris Murphy wrote: On Fri, Nov 21, 2014 at 5:55 AM, Ian Armstrong bt...@iarmst.co.uk wrote: In my situation what I've found is that if I scrub and let it fix the errors then a second pass immediately after will show no errors. If I then leave it a few

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Phillip Susi
On 11/19/2014 5:25 PM, Robert White wrote: The controller, the thing that sets the ready bit and sends the interrupt is distinct from the driver, the thing that polls the ready bit when the interrupt is sent. At the bus level there are fixed

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Phillip Susi
On 11/19/2014 5:33 PM, Robert White wrote: That would be fake raid, not hardware raid. The LSI MegaRaid controller people would _love_ to hear more about your insight into how their battery-backed multi-drive RAID controller is fake. You should

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Robert White
On 11/20/2014 12:26 PM, Phillip Susi wrote: Yes, ACPI 4.0 added this mess. I have yet to see a single system that actually implements it. I can't believe they even bothered adding this driver to the kernel. Is there anyone in the world who has ever used it? If no motherboard vendor has

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Robert White
On 11/20/2014 12:34 PM, Phillip Susi wrote: On 11/19/2014 5:33 PM, Robert White wrote: That would be fake raid, not hardware raid. The LSI MegaRaid controller people would _love_ to hear more about your insight into how their battery-backed

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Zygo Blaxell
On Tue, Nov 18, 2014 at 09:29:54AM +0200, Brendan Hide wrote: Hey, guys. See further below extracted output from a daily scrub showing csum errors on sdb, part of a raid1 btrfs. Looking back, it has been getting errors like this for a few days now. The disk is patently unreliable but

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-20 Thread Brendan Hide
On 2014/11/21 06:58, Zygo Blaxell wrote: You have one reallocated sector, so the drive has lost some data at some time in the last 49000(!) hours. Normally reallocations happen during writes so the data that was lost was data you were in the process of overwriting anyway; however, the

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Phillip Susi
On 11/18/2014 9:40 PM, Chris Murphy wrote: It’s well known on linux-raid@ that consumer drives have well over 30 second deep recoveries when they lack SCT command support. The WDC and Seagate “green” drives are over 2 minutes apparently. This

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Phillip Susi
On 11/18/2014 9:46 PM, Duncan wrote: I'm not sure about normal operation, but certainly, many drives take longer than 30 seconds to stabilize after power-on, and I routinely see resets during this time. As far as I have seen, typical drive spin up

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Robert White
On 11/19/2014 08:07 AM, Phillip Susi wrote: On 11/18/2014 9:46 PM, Duncan wrote: I'm not sure about normal operation, but certainly, many drives take longer than 30 seconds to stabilize after power-on, and I routinely see resets during this time. As far as I have seen, typical drive spin up

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Phillip Susi
On 11/19/2014 4:05 PM, Robert White wrote: It's cheaper, and less error prone, and less likely to generate customer returns if the generic controller chips just send init, wait a fixed delay, then request a status compared to trying to

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Robert White
Shame you already know everything? On 11/19/2014 01:47 PM, Phillip Susi wrote: On 11/19/2014 4:05 PM, Robert White wrote: One of the reasons that the whole industry has started favoring point-to-point (SATA, SAS) or physical intercessor chaining point-to-point (eSATA) buses is to remove a

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Robert White
P.S. On 11/19/2014 01:47 PM, Phillip Susi wrote: Another common cause is having a dedicated hardware RAID controller (Dell likes to put LSI MegaRaid controllers in their boxes for example), many motherboards have hardware RAID support available through the BIOS, etc, leaving that feature

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Duncan
Phillip Susi posted on Wed, 19 Nov 2014 11:07:43 -0500 as excerpted: On 11/18/2014 9:46 PM, Duncan wrote: I'm not sure about normal operation, but certainly, many drives take longer than 30 seconds to stabilize after power-on, and I routinely

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Chris Murphy
On Wed, Nov 19, 2014 at 8:11 AM, Phillip Susi ps...@ubuntu.com wrote: On 11/18/2014 9:40 PM, Chris Murphy wrote: It’s well known on linux-raid@ that consumer drives have well over 30 second deep recoveries when they lack SCT command support. The

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Duncan
Robert White posted on Wed, 19 Nov 2014 13:05:13 -0800 as excerpted: One of the reasons that the whole industry has started favoring point-to-point (SATA, SAS) or physical intercessor chaining point-to-point (eSATA) buses is to remove a lot of those wait-and-see delays. That said, you

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-19 Thread Robert White
On 11/19/2014 04:25 PM, Duncan wrote: Most often, however, it's at resume, not original startup, which is understandable as state at resume doesn't match state at suspend/hibernate. The irritating thing, as previously discussed, is when one device takes long enough to come back that mdraid or

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Austin S Hemmelgarn
On 2014-11-18 02:29, Brendan Hide wrote: Hey, guys. See further below extracted output from a daily scrub showing csum errors on sdb, part of a raid1 btrfs. Looking back, it has been getting errors like this for a few days now. The disk is patently unreliable but smartctl's output implies there

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Brendan Hide
On 2014/11/18 09:36, Roman Mamedov wrote: On Tue, 18 Nov 2014 09:29:54 +0200 Brendan Hide bren...@swiftspirit.co.za wrote: Hey, guys. See further below extracted output from a daily scrub showing csum errors on sdb, part of a raid1 btrfs. Looking back, it has been getting errors like this for

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Brendan Hide
On 2014/11/18 14:08, Austin S Hemmelgarn wrote: [snip] there are some parts of the drive that aren't covered by SMART attributes on most disks, most notably the on-drive cache. There really isn't a way to disable the read cache on the drive, but you can disable write-caching. It's an old and

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Duncan
Brendan Hide posted on Tue, 18 Nov 2014 15:24:48 +0200 as excerpted: In this case, yup, it's directly to the motherboard chipset's built-in ports. This is a very old desktop, and the other 3 disks don't have any issues. I'm checking out the alternative pointed out by Austin. SATA-relevant

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Phillip Susi
On 11/18/2014 7:08 AM, Austin S Hemmelgarn wrote: In addition to the storage controller being a possibility as mentioned in another reply, there are some parts of the drive that aren't covered by SMART attributes on most disks, most notably the

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Phillip Susi
On 11/18/2014 10:35 AM, Marc MERLIN wrote: Try running hdrecover on your drive, it'll scan all your blocks and try to rewrite the ones that are failing, if any: http://hdrecover.sourceforge.net/ He doesn't have blocks that are failing; he has

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Marc MERLIN
On Tue, Nov 18, 2014 at 11:04:00AM -0500, Phillip Susi wrote: On 11/18/2014 10:35 AM, Marc MERLIN wrote: Try running hdrecover on your drive, it'll scan all your blocks and try to rewrite the ones that are failing, if any:

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Phillip Susi
On 11/18/2014 11:11 AM, Marc MERLIN wrote: That seems to be the case, but hdrecover will rule that part out at least. It's already ruled out: if the read failed that is what the error message would have said rather than a bad checksum.

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Chris Murphy
On Nov 18, 2014, at 8:35 AM, Marc MERLIN m...@merlins.org wrote: On Tue, Nov 18, 2014 at 09:29:54AM +0200, Brendan Hide wrote: Hey, guys. See further below extracted output from a daily scrub showing csum errors on sdb, part of a raid1 btrfs. Looking back, it has been getting errors like

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Phillip Susi
On 11/18/2014 1:57 PM, Chris Murphy wrote: So a.) use smartctl -l scterc to change the value below 30 seconds (300 deciseconds) with 70 deciseconds being reasonable. If the drive doesn’t support SCT commands, then b.) change the linux scsi

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Chris Murphy
On Nov 18, 2014, at 1:58 PM, Phillip Susi ps...@ubuntu.com wrote: On 11/18/2014 1:57 PM, Chris Murphy wrote: So a.) use smartctl -l scterc to change the value below 30 seconds (300 deciseconds) with 70 deciseconds being reasonable. If the
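
[Editor's sketch, not part of the thread] The unit arithmetic in option a.) above is easy to get wrong, since smartctl takes the SCT ERC timeout in deciseconds while the kernel command timer is quoted in seconds. The Python below is a minimal illustration that only builds the argument list; it does not run anything. The function names and the `/dev/sdX` placeholder are hypothetical. The `scterc,READ,WRITE` argument form is how smartmontools documents the option, as far as I know, but check `man smartctl` for your version.

```python
import shlex

KERNEL_TIMER_S = 30  # default Linux command timer quoted in the thread


def seconds_to_deciseconds(seconds):
    """Convert seconds to the deciseconds unit used by SCT ERC."""
    return round(seconds * 10)


def scterc_command(device, read_s, write_s):
    """Build (but do not run) a smartctl argument list setting SCT ERC.

    Refuses timeouts that are not below the kernel command timer,
    since the point is for the drive to give up before the kernel does.
    """
    for value in (read_s, write_s):
        if not 0 < value < KERNEL_TIMER_S:
            raise ValueError("ERC timeout must be > 0 and below the kernel timer")
    read_ds = seconds_to_deciseconds(read_s)
    write_ds = seconds_to_deciseconds(write_s)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


# 7 seconds is the "70 deciseconds" suggested in the post above.
print(shlex.join(scterc_command("/dev/sdX", 7, 7)))
```

The range check encodes only the constraint stated in the post (the drive's limit should sit below the 30 second kernel timer); whether a given drive accepts the setting is a separate question that this sketch does not address.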

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-18 Thread Duncan
Phillip Susi posted on Tue, 18 Nov 2014 15:58:18 -0500 as excerpted: Are there really any that take longer than 30 seconds? That's enough time for thousands of retries. If it can't be read after a dozen tries, it ain't never gonna work. It seems absurd that a drive would keep trying for so

Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-17 Thread Roman Mamedov
On Tue, 18 Nov 2014 09:29:54 +0200 Brendan Hide bren...@swiftspirit.co.za wrote: Hey, guys. See further below extracted output from a daily scrub showing csum errors on sdb, part of a raid1 btrfs. Looking back, it has been getting errors like this for a few days now. The disk is