Re: Btrfs on a failing drive

2014-11-19 Thread Phillip Susi
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Again, please stop taking this conversation private; keep the mailing
list on the Cc.

On 11/19/2014 11:37 AM, Fennec Fox wrote:
> Well, I've used SpinRite and it's found a few sectors, and they
> never move, so obviously the drive's firmware isn't dealing with bad
> blocks on the drive. Anyways, I've got a new drive on order, but
> what can I do to prevent the drive from killing any more data?

The drive will only remap bad blocks when you try to write to them, so
if you haven't written to them then it is no surprise that they aren't
going anywhere.

If the drive is actually returning bad data rather than failing the
read outright, then the only thing you can do is to have btrfs
duplicate all data so if the checksum on one copy is bad it can try
the other.
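On a single device that means the DUP profile. A minimal sketch, assuming the filesystem is mounted at a placeholder path such as /mnt and that your kernel and btrfs-progs accept DUP for data (not just metadata) on a single device; older versions may refuse it, so check first. The function name is made up for illustration, and setting DRY_RUN to any non-empty value only prints the commands. Keep in mind DUP halves usable space and does nothing for you if the whole drive dies.

```shell
# Hypothetical helper: convert a mounted btrfs filesystem to the DUP profile
# (two copies of every block on the same device), then scrub it so btrfs
# re-verifies every checksum and rewrites a bad copy from the good one.
# Set DRY_RUN to any non-empty value to print the commands instead.
dup_and_scrub() {
    mnt=$1    # placeholder mount point, e.g. /mnt
    ${DRY_RUN:+echo} btrfs balance start -dconvert=dup -mconvert=dup "$mnt" || return 1
    # -B keeps scrub in the foreground and prints statistics when done
    ${DRY_RUN:+echo} btrfs scrub start -B "$mnt"
    # Data and Metadata should now both be listed as DUP
    ${DRY_RUN:+echo} btrfs filesystem df "$mnt"
}
```

Run it as root, e.g. dup_and_scrub /mnt, after a DRY_RUN=1 pass to eyeball the commands.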

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (MingW32)

iQEcBAEBAgAGBQJUbN8VAAoJEI5FoCIzSKrwGjkIAKxXbBcMaItyBe08yC/bipUH
2crWLj5MKej1sn1HEo1WqgJM1hCEZuHCBa8I6ZIECcZmzs4rvKhzU4WWIQ7J/tMN
8OYUzdsWboxbKHY5hrNEVsi8QcUTbz7HT3doaaYDhI7qERu1Ib/4FH+m5yFYEIu8
tx5+N2PzyXctDlNnjY/pcFg+I2+QyA5Rb9X+fLpvVoZCEW7TTMhejfKSQpMEfzHW
JsYyKwDpQO6cGIWi19P7pgHc2bsCzShPtFo9UQJh5TtuxjsqP01ju1UfQBX0+Y25
B2LDAjyGE71pY68tBuS7EC9XSB9Iks5yEJotmwYTv3/L7bgDeAGPrj5cFOKG9Tc=
=8JoK
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html


Re: Btrfs on a failing drive

2014-11-18 Thread Phillip Susi

Please get in the habit of using your mail client's reply-to-all
button instead of reply; there is no need for us to take this
conversation private.

On 11/17/2014 10:15 PM, Fennec Fox wrote:
> <snip big smartctl output>
> I know the drive is dying and needs replacing, but I need to keep
> this drive around for some time longer, as I can't run from a 32
> GB USB drive; it's far too slow.

If it were just a few bad sectors, then you could deal with that by
writing to them, which would force the drive to reallocate them from
the spare pool. I'd suggest you dd /dev/zero all over the drive so
everything is written to, then check the SMART stats again. If there
were no write errors, and the SMART stats show zero pending sectors,
then everything has been reallocated and you should be OK to reformat
the drive and use it.
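The zero-fill and re-check could look like the following. A sketch only: the device name is a placeholder, the function name is hypothetical, and running it for real DESTROYS everything on the drive, so triple-check the target first (lsblk helps). With DRY_RUN set to a non-empty value it only prints the commands.

```shell
# Hypothetical helper. DESTRUCTIVE: overwrites every sector of $1.
# Set DRY_RUN to any non-empty value to only print the commands.
zero_fill_and_recheck() {
    dev=$1    # placeholder, e.g. /dev/sdX; check it with lsblk first!
    # Writing every sector lets the firmware remap the bad ones from its
    # spare pool. GNU dd stops with "No space left on device" when it hits
    # the end of the disk; that is expected. "Input/output error" is not.
    ${DRY_RUN:+echo} dd if=/dev/zero of="$dev" bs=1M conv=fsync
    # Re-read the SMART attributes: Current_Pending_Sector should now be 0.
    ${DRY_RUN:+echo} smartctl -A "$dev"
}
```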

As I said before though, the errors you posted from dmesg don't
indicate that the drive failed to read sectors, but rather that it
returned incorrect data, and this is *NEVER* supposed to happen.

I'd suggest running a few passes of badblocks over the drive, testing
writing different patterns and verifying that they read back
correctly.  If it can't do that, then there's nothing for it but to
junk the drive.

badblocks -b 4096 -c 256 -s -t 00 /dev/sda

That will read the drive and verify that it is full of zeros. If that
passes, use write mode to write different patterns to the disk and
verify that they read back correctly (-w erases everything on the drive):

badblocks -b 4096 -c 256 -s -w /dev/sda
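Chained together, with the write test only running if the read-only pass came back clean, it might look like this. It is a hypothetical wrapper, not something badblocks ships; it uses badblocks' -o option to save each pass's bad-block list so an empty list can be checked before the destructive -w pass. The device name is a placeholder.

```shell
# Hypothetical wrapper: read-only zero check first, destructive -w test only
# if that pass found nothing. Set DRY_RUN to a non-empty value to print the
# badblocks commands instead of running them.
check_then_write_test() {
    dev=$1                        # placeholder, e.g. /dev/sdX
    dir=$(mktemp -d) || return 1  # holds the bad-block lists written by -o
    # Pass 1: read-only; every block is expected to contain pattern 0.
    ${DRY_RUN:+echo} badblocks -b 4096 -c 256 -s -t 0 -o "$dir/read.bad" "$dev" || return 1
    if [ -s "$dir/read.bad" ]; then
        echo "read-only pass failed, bad blocks listed in $dir/read.bad" >&2
        return 1
    fi
    # Pass 2: DESTROYS all data. By default -w writes 0xaa, 0x55, 0xff and
    # 0x00 over the whole device, reading each pattern back to compare.
    ${DRY_RUN:+echo} badblocks -b 4096 -c 256 -s -w -o "$dir/write.bad" "$dev"
    echo "bad-block lists are in $dir" >&2
}
```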




Re: Btrfs on a failing drive

2014-11-18 Thread Duncan
Phillip Susi posted on Tue, 18 Nov 2014 10:36:14 -0500 as excerpted:

> As I said before though, the errors you posted from dmesg don't indicate
> that the drive failed to read sectors, but rather that it returned
> incorrect data, and this is *NEVER* supposed to happen.
>
> I'd suggest running a few passes of badblocks over the drive, testing
> writing different patterns and verifying that they read back correctly.

+1 for badblocks! =:^)

Tho, a hint if you decide to test multiple drives, as I did some years
ago. Doing multiple passes (I'd suggest at least two) on a full drive
can take QUITE some time (days), due to the sheer volume of data to be
written to the drive, read back to verify, then written as a new
pattern and read back again.
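To put a number on "days": -w with its default four patterns (0xaa, 0x55, 0xff, 0x00) writes and reads back the whole surface once per pattern, i.e. eight full-surface passes per run. A back-of-the-envelope calculator, where capacity and sustained throughput are guesses you have to supply for your own drive (the function name is made up):

```shell
# Back-of-the-envelope runtime for badblocks: capacity in GB, sustained
# throughput in MB/s, and number of full-surface passes (8 for one -w run:
# four patterns, each written once and read back once). Integer hours.
estimate_hours() {
    cap_gb=$1 mbps=$2 passes=$3
    echo $(( cap_gb * 1000 * passes / mbps / 3600 ))
}
```

For example, estimate_hours 2000 120 8 prints 37, so two full -w runs on an assumed 2 TB drive averaging 120 MB/s come to roughly 74 hours, about three days. Throughput drops toward the inner tracks, so pick a conservative average.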

But unlike IDE, the bottleneck on at least spinning rust SATA (well,
unless you go heavy port-multiplier) tends to be the platters themselves,
not the buses (they're point-to-point nowadays) or the controllers.
Generally you can process four or more drives in parallel without slowing
down the individual runs significantly at all. Thus, while it takes
days to test a single drive, it normally takes about the same time to
process four drives in parallel! So if you have 4+ devices to
badblocks-test, definitely set up four (perhaps more, depending on
hardware layout) instances of badblocks running at once, one to each of
the devices. Cut your time for all four from, say, 8 days done serially
to only two days done in parallel! =:^)
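A sketch of the parallel approach in shell, assuming the devices to test are passed as arguments (the names in the usage below are placeholders) and that every one of them may be wiped. Each run logs to its own file, since four -s progress meters on one terminal would be unreadable; the function name and the <dev>.log / <dev>.bad file names are made up for the example.

```shell
# Hypothetical helper: one badblocks write-mode test per device, all running
# at once, then wait for every one to finish. DESTROYS data on every device
# given. Set DRY_RUN to a non-empty value to only print the commands.
parallel_badblocks() {
    for dev in "$@"; do
        name=$(basename "$dev")
        if [ -n "${DRY_RUN:-}" ]; then
            echo "badblocks -b 4096 -c 256 -s -w -o $name.bad $dev"
        else
            # Own log per device, so the -s progress output doesn't interleave
            badblocks -b 4096 -c 256 -s -w -o "$name.bad" "$dev" > "$name.log" 2>&1 &
        fi
    done
    wait    # returns once all background badblocks runs have exited
}
```

Usage as root: parallel_badblocks /dev/sdb /dev/sdc /dev/sdd /dev/sde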

Of course good SSDs tend to be both many times faster and several times
smaller in capacity, so a badblocks run on them should be MUCH faster,
perhaps a couple hours vs a couple days. But it's much less
parallelizable without slowing all of them down, since SSDs tend to
saturate the bus or come close to it. (That's the reason fast SATA-based
SSDs all tend to rate similarly speed-wise: the SATA bus is the
bottleneck, and the PCI-E bus isn't /that/ far behind! Tho the PCI-E bus
can be /enough/ faster to give direct PCI-E-interface SSDs a definite
speed boost over top-of-the-line SATA-interface devices... for those
who can afford their accordingly higher prices.)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Btrfs on a failing drive

2014-11-17 Thread Phillip Susi

On 11/17/2014 05:55 PM, Fennec Fox wrote:
> Well, I am an Arch Linux user and machine owner using a failing
> drive. It's still reliable enough for me, but btrfs seems not to mark
> bad blocks as unusable and continues to try to write to them.
> /bbs.archlinux.org/viewtopic.php?pid=1476540#p1476540 This forum
> post has a few more details regarding the problem. I really need a
> bit of help. Thank you.

If indeed writes are failing, then the drive is only suitable for a
doorstop. Drives remap bad sectors to a spare pool on write, so if it
is now failing writes, it has already exhausted its spare pool and you
should have replaced it long ago. Have a look at its SMART stats; they
will probably confirm the drive is fubar.
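For the SMART check, the attributes that matter here are the remapping counters in the smartctl -A table. A small filter, assuming the usual smartctl attribute-table layout where RAW_VALUE is the tenth column (the function name is hypothetical):

```shell
# Hypothetical filter: read `smartctl -A` output on stdin and print the
# attributes that track sector remapping with their RAW_VALUE, which is the
# tenth whitespace-separated column of the attribute table.
remap_stats() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}
```

Usage: smartctl -A /dev/sdX | remap_stats. A non-zero Reallocated_Sector_Ct means the spare pool is in use; Current_Pending_Sector counts sectors the drive couldn't read and is waiting for a write to either clear or remap.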


> [   83.050733] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
> [   83.052317] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246

That's not saying writes are failing; it is saying that your data has
been silently corrupted, which means the drive is the worst kind of
broken and should be thrown in a fire at once.
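To see which files that corruption hit, the inode numbers in those messages can be mapped back to paths with btrfs inspect-internal inode-resolve. A sketch, assuming the filesystem is mounted at a placeholder /mnt and the affected files live in the subvolume mounted there; the two shell functions are hypothetical helpers. The message format varies between kernel versions (as far as I know newer ones add "root N" before "ino"), so the pattern tolerates text between "csum failed" and "ino".

```shell
# Hypothetical helper: read kernel log text on stdin and print each inode
# number mentioned in a btrfs "csum failed" message once, in numeric order.
csum_failed_inodes() {
    sed -n 's/.*csum failed.* ino \([0-9][0-9]*\) .*/\1/p' | sort -un
}

# Hypothetical helper: map those inode numbers to file names.
# $1 is a placeholder mount point of the subvolume holding the files.
resolve_csum_failures() {
    for ino in $(csum_failed_inodes); do
        btrfs inspect-internal inode-resolve "$ino" "$1"
    done
}
```

Usage as root: dmesg | resolve_csum_failures /mnt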




Re: Btrfs on a failing drive

2014-11-17 Thread Chris Murphy

On Nov 17, 2014, at 3:55 PM, Fennec Fox <fennect...@gmail.com> wrote:

> Well, I am an Arch Linux user and machine owner using a failing drive.
> It's still reliable enough for me, but btrfs seems not to mark bad
> blocks as unusable and continues to try to write to them.

It’s supposed to try to write to them. If there is an actual persistent write
failure, it’s the job of the firmware to reassign the affected LBA to a reserve
physical sector. If it can’t do this, the drive is no longer operating normally;
it should return a write error, and ideally Btrfs would refuse to use the drive
at all. I don’t know if that device rejection code exists yet. It hasn’t been
the job of the filesystem to keep track of bad physical sectors since ancient
times.
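Btrfs does keep per-device error counters, which help tell failed writes (write_io_errs) apart from checksum mismatches (corruption_errs), the silent corruption discussed elsewhere in this thread. A sketch of a filter over btrfs device stats output; the function name is hypothetical, the counter names are the ones btrfs-progs prints.

```shell
# Hypothetical filter: read `btrfs device stats <mountpoint>` output on stdin,
# print every counter that is not zero, and exit 1 if there was any.
nonzero_dev_stats() {
    awk 'NF >= 2 && $2 != 0 { print; bad = 1 } END { exit bad }'
}
```

Usage: btrfs device stats /mnt | nonzero_dev_stats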


Chris Murphy