Re: Frequent btrfs corruption on a USB flash drive

2016-07-08 Thread Henk Slager
>> Device is GOOD
>>
>> I also created a big file with dd using /dev/urandom with the same size
>> as my flash drive, copied it once and read it three times. The SHA-1
>> checksum is always the same and matches the original one on the hard disk.
>>
>> So after much testing I feel I can conclude that my USB flash drive is
>> not fake and it is not defective.
>>
> For what it's worth, there's multiple other things that could cause similar
> issues.  I've had a number of cases where bad USB hubs or poorly designed
> (or just buggy or failing) USB controllers caused similar data corruption,
> the most recent one being an issue with both a bad USB 2.0 hub (which did
> not properly implement the USB standard, counterfeit USB devices come in all
> types) and a malfunctioning USB 3.0 controller (which did not properly
> account for things that didn't properly implement the standard and had no
> recovery code to handle this in the drivers).  I ended up in most cases
> checking the ports using other USB devices (at least a keyboard, a mouse,
> and a USB serial adapter).

Like Austin, I also want to note that there might be USB-related
issues that only pop up after some time and not in tests.

For example, this weekend I connected a 2.5-inch 500G drive with its
Y-cable to an H87M-Pro board that is fed by an 80+ Gold PSU, despite
the many 'bad sectors' I remembered it having 2 years ago in a btrfs
raid1 setup. This 500G disk has worked well for almost 2 years connected
to a 7-inch eeepc4G, XFS formatted. But with the H87M-Pro I just now saw
that it dropped off the USB bus every now and then, causing trouble for
Btrfs.

For connecting hard disks to phones, I once bought an externally powered
hub, and I put that between the board and the 500G disk. That made
it all stable: no disconnects, and Btrfs works fine as expected. I had
similar issues on another PC with a Sandisk Extreme 64G USB3 stick,
but that was likely a protocol issue.

So maybe try using the stick with your use case in another hardware
setup; hopefully it will then stay stable for longer than the few days
it does now.
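
A quick way to look for the kind of intermittent drop-offs described
above is to count disconnect events in the kernel log. This is only a
sketch: the function name is made up for illustration, and it assumes
the kernel's usual "USB disconnect, device number N" wording.

```sh
# Count USB disconnect events in a kernel log read from stdin.
# Assumption: the kernel logs lines containing "USB disconnect".
count_usb_disconnects() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the status at 0.
    grep -c 'USB disconnect' || true
}
```

Typical use would be `dmesg | count_usb_disconnects` after the stick has
been in use for a few days. The count covers every USB device, so a
non-zero result without any manual unplugging is a hint to look at
cabling, power, or the controller before suspecting Btrfs.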
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Frequent btrfs corruption on a USB flash drive

2016-07-08 Thread Austin S. Hemmelgarn

On 2016-07-08 12:10, Francesco Turco wrote:
> On 2016-07-07 19:57, Chris Murphy wrote:
>> Use F3 to test flash:
>> http://oss.digirati.com.br/f3/
>
> I tested my USB flash drive with F3 as you suggested, and there's no
> indication it is a fake device.

>> Read more, and also includes a much faster alternative for GNOME:
>> https://blogs.gnome.org/hughsie/2015/01/28/detecting-fake-flash/
>
> I also tested my flash drive with gnome-multi-writer-probe, and it says
> it is not fake:
>
> # gnome-multi-writer-probe

Re: Frequent btrfs corruption on a USB flash drive

2016-07-08 Thread Francesco Turco
On 2016-07-07 19:57, Chris Murphy wrote:
> Use F3 to test flash:
> http://oss.digirati.com.br/f3/

I tested my USB flash drive with F3 as you suggested, and there's no
indication it is a fake device.

---

# f3probe --destructive /dev/sdb
F3 probe 6.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.

WARNING: Probing normally takes from a few seconds to 15 minutes, but
 it can take longer. Please be patient.

Good news: The device `/dev/sdb' is the real thing

Device geometry:
 *Usable* size: 57.69 GB (120979456 blocks)
Announced size: 57.69 GB (120979456 blocks)
Module: 64.00 GB (2^36 Bytes)
Approximate cache size: 0.00 Byte (0 blocks), need-reset=no
   Physical block size: 512.00 Byte (2^9 Bytes)

Probe time: 2'23"

--

$ f3read /run/media/fturco/a7d8a7b1-e0c2-4fbb-879f-e17046989f3c
  SECTORS  ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2097152/0/  0/  0
Validating file 2.h2w ... 2097152/0/  0/  0
Validating file 3.h2w ... 2097152/0/  0/  0
Validating file 4.h2w ... 2097152/0/  0/  0
Validating file 5.h2w ... 2097152/0/  0/  0
Validating file 6.h2w ... 2097152/0/  0/  0
Validating file 7.h2w ... 2097152/0/  0/  0
Validating file 8.h2w ... 2097152/0/  0/  0
Validating file 9.h2w ... 2097152/0/  0/  0
Validating file 10.h2w ... 2097152/0/  0/  0
Validating file 11.h2w ... 2097152/0/  0/  0
Validating file 12.h2w ... 2097152/0/  0/  0
Validating file 13.h2w ... 2097152/0/  0/  0
Validating file 14.h2w ... 2097152/0/  0/  0
Validating file 15.h2w ... 2097152/0/  0/  0
Validating file 16.h2w ... 2097152/0/  0/  0
Validating file 17.h2w ... 2097152/0/  0/  0
Validating file 18.h2w ... 2097152/0/  0/  0
Validating file 19.h2w ... 2097152/0/  0/  0
Validating file 20.h2w ... 2097152/0/  0/  0
Validating file 21.h2w ... 2097152/0/  0/  0
Validating file 22.h2w ... 2097152/0/  0/  0
Validating file 23.h2w ... 2097152/0/  0/  0
Validating file 24.h2w ... 2097152/0/  0/  0
Validating file 25.h2w ... 2097152/0/  0/  0
Validating file 26.h2w ... 2097152/0/  0/  0
Validating file 27.h2w ... 2097152/0/  0/  0
Validating file 28.h2w ... 2097152/0/  0/  0
Validating file 29.h2w ... 2097152/0/  0/  0
Validating file 30.h2w ... 2097152/0/  0/  0
Validating file 31.h2w ... 2097152/0/  0/  0
Validating file 32.h2w ... 2097152/0/  0/  0
Validating file 33.h2w ... 2097152/0/  0/  0
Validating file 34.h2w ... 2097152/0/  0/  0
Validating file 35.h2w ... 2097152/0/  0/  0
Validating file 36.h2w ... 2097152/0/  0/  0
Validating file 37.h2w ... 2097152/0/  0/  0
Validating file 38.h2w ... 2097152/0/  0/  0
Validating file 39.h2w ... 2097152/0/  0/  0
Validating file 40.h2w ... 2097152/0/  0/  0
Validating file 41.h2w ... 2097152/0/  0/  0
Validating file 42.h2w ... 2097152/0/  0/  0
Validating file 43.h2w ... 2097152/0/  0/  0
Validating file 44.h2w ... 2097152/0/  0/  0
Validating file 45.h2w ... 2097152/0/  0/  0
Validating file 46.h2w ... 2097152/0/  0/  0
Validating file 47.h2w ... 2097152/0/  0/  0
Validating file 48.h2w ... 2097152/0/  0/  0
Validating file 49.h2w ... 2097152/0/  0/  0
Validating file 50.h2w ... 2097152/0/  0/  0
Validating file 51.h2w ... 2097152/0/  0/  0
Validating file 52.h2w ... 2097152/0/  0/  0
Validating file 53.h2w ... 2097152/0/  0/  0
Validating file 54.h2w ... 2097152/0/  0/  0
Validating file 55.h2w ... 2097152/0/  0/  0
Validating file 56.h2w ... 1364266/0/  0/  0

  Data OK: 55.65 GB (116707626 sectors)
Data LOST: 0.00 Byte (0 sectors)
   Corrupted: 0.00 Byte (0 sectors)
Slightly changed: 0.00 Byte (0 sectors)
 Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 34.73 MB/s
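
For scripted runs, the summary at the end of f3read can be reduced to a
single number. The helper below is a hypothetical sketch, not part of
F3: it assumes the "Data LOST: ... (N sectors)" line format shown in the
output above and prints N.

```sh
# Print the sector count from the "Data LOST:" line of f3read output
# read from stdin. Exits with status 2 if no such line is found.
f3read_lost_sectors() {
    awk '/Data LOST:/ {
             gsub(/[()]/, "")      # "(0 sectors)" -> "0 sectors"
             print $(NF - 1)       # the number just before "sectors"
             found = 1
         }
         END { if (!found) exit 2 }'
}
```

Something like `f3read /path/to/mountpoint | f3read_lost_sectors` would
then print 0 for the run shown above; the mount point here is a
placeholder.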



> Read more, and also includes a much faster alternative for GNOME:
> https://blogs.gnome.org/hughsie/2015/01/28/detecting-fake-flash/

I also tested my flash drive with gnome-multi-writer-probe, and it says
it is not fake:

# gnome-multi-writer-probe /dev/sdb
Device is GOOD

I also created a big 

Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Chris Murphy
On Thu, Jul 7, 2016 at 4:38 PM, Andrew E. Mileski  wrote:
> On 2016-07-07 17:13, Francesco Turco wrote:
>>
>>
>> On 2016-07-07 23:11, Andrew E. Mileski wrote:
>>>
>>> How large is this USB flash device?
>>
>>
>> 64 GB.
>>
>
> I don't know if there is an official recommended minimum size for btrfs, but
> I would expect 64 GB to be okay.

In my similar case, it was a 16GiB stick, but the Btrfs on LUKS
partition was maybe 4GiB. Again I used -M and ran into zero problems
in ~6 months of almost daily usage. But not rsync. I was using it for
encrypted /home.




-- 
Chris Murphy


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Andrew E. Mileski

On 2016-07-07 17:13, Francesco Turco wrote:
> On 2016-07-07 23:11, Andrew E. Mileski wrote:
>> How large is this USB flash device?
>
> 64 GB.



I don't know if there is an official recommended minimum size for btrfs, but I 
would expect 64 GB to be okay.


I've personally set my minimum recommendation for btrfs at 120 GB based on my 
experience with failures in various flash devices from 4 to 30 GB.


If you want to experiment, I have a theory that formatting single volumes with
"-m single" can avoid a potential controller race in one specific situation,
plus it helps to reduce the metadata overhead on smaller devices.
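
For reference, the two formatting variants mentioned in this thread can
be written out as follows. The helper only prints the command instead of
running it, since mkfs.btrfs destroys existing data. The function name
and the device path in the usage line are placeholders; `-m single` and
`--mixed` (the long form of `-M`) are the mkfs.btrfs options being
discussed.

```sh
# Print (do not run) a mkfs.btrfs command line for a small device.
# $1: block device (placeholder), $2: "single" or "mixed"
mkfs_btrfs_cmd() {
    dev=$1
    case $2 in
        # keep a single copy of metadata (no duplication)
        single) echo "mkfs.btrfs -m single $dev" ;;
        # mixed block groups: data and metadata share chunks
        mixed)  echo "mkfs.btrfs --mixed $dev" ;;
        *)      echo "usage: mkfs_btrfs_cmd DEVICE single|mixed" >&2
                return 1 ;;
    esac
}
```

For example, `mkfs_btrfs_cmd /dev/mapper/stick single` prints the
command to review before running it by hand as root.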


Lastly, the last two USB issues I investigated were both fixed by replacing the 
cables.  Something to try if it is a cabled device.


~~
Andrew E. Mileski


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Andrew E. Mileski

On 2016-07-07 09:49, Francesco Turco wrote:
> I have a USB flash drive with an encrypted Btrfs filesystem where I
> store daily backups. My problem is that this btrfs filesystem gets
> corrupted very often, after a few days of usage. Usually I just reformat
> it and move along, but this time I'd like to understand the root cause
> of the problem and fix it.


How large is this USB flash device?

I've had issues with btrfs and small devices, where a 1 GB data chunk is 
relatively large.


~~
Andrew E. Mileski


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Francesco Turco

On 2016-07-07 23:11, Andrew E. Mileski wrote:
> How large is this USB flash device?

64 GB.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Francesco Turco
On 2016-07-07 20:25, Chris Murphy wrote:
> On Thu, Jul 7, 2016 at 8:55 AM, Francesco Turco  wrote:
>> Perhaps I
>> should try to rule out a hardware problem by filling my USB flash drive
>> with a large random file and then checking if its SHA-1 checksum
>> corresponds to the original copy on the hard disk. But first I probably
>> should back up the current Btrfs filesystem with the dd command. Can I
>> proceed?
> 
> https://btrfs.wiki.kernel.org/index.php/Gotchas

Thank you for the link, I didn't know that using LVM snapshots or
mounting dd copies can create problems! That could explain the reason
for some of the problems I had in the past.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Chris Murphy
On Thu, Jul 7, 2016 at 8:55 AM, Francesco Turco  wrote:

> I'm not sure. Commands don't fail explicitly when I use ext4, but I
> agree with you that I may get corruption silently nonetheless.

Use XFS v5 format which is the default in xfsprogs 3.2.3 and later. It
at least checksums metadata.

> Perhaps I
> should try to rule out a hardware problem by filling my USB flash drive
> with a large random file and then checking if its SHA-1 checksum
> corresponds to the original copy on the hard disk. But first I probably
> should back up the current Btrfs filesystem with the dd command. Can I
> proceed?

https://btrfs.wiki.kernel.org/index.php/Gotchas


>> Just to clarify, you're using BTRFS on top of disk encryption (LUKS? Or
>> is it just raw encryption, or even something completely different?), on
>> a USB flash drive (not a USB to SATA adapter with an SSD or HDD in it),
>> correct?
>
> I'm using a btrfs filesystem on a GUID partition encrypted with LUKS.
> It's a Kingston USB flash drive connected directly to my desktop machine
> via USB. It's definitely not an SSD or an HDD, and I'm not using any
> adapter.

First definitely check to make sure it's not fake. It's a well known
brand and there's a lot of incentive to make fake Kingston devices. I
have a Kingston DTR500 and have used it in the same use case you have,
Btrfs on LUKS, for maybe 6 months with no corruptions. In my case I
formatted with -M (mixed bg), and it was with kernels older than 4.x,
but otherwise sounds the same. Granted, individual units of the same
model can have big differences let alone between models. But if it's a
Btrfs bug, it might be a regression.

I wonder if this might be a use case for one of the integrity check
mount options? It slows things down a lot but the extra checking might
help pinpoint at least the moment something bad is happening.


-- 
Chris Murphy


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Chris Murphy
On Thu, Jul 7, 2016 at 7:49 AM, Francesco Turco  wrote:

> $ btrfs filesystem show
> /run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3
> $

Try it with sudo. I think it's a bug that 'btrfs fi show' returns
silently for non-root. It should produce an error that root privileges
are needed, or it should work for unprivileged users.




> Btrfs-check reports many errors. I attached the output to this e-mail
> message.
>
> Output from dmesg:
>
> $ dmesg | tail
> [18756.159963] BTRFS error (device dm-4): bad tree block start
> 6592115285688248773 35323904

The problem happened before this, so I think we need the entire dmesg.
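
One way to capture the whole log and find where the trouble starts,
sketched under the assumption that the kernel prefixes its messages with
"BTRFS error", "BTRFS warning", or "BTRFS critical" as in the excerpt
above (the function name is invented for this example):

```sh
# Print the line number and text of the first BTRFS error, warning, or
# critical message in a kernel log read from stdin.
first_btrfs_event() {
    awk '/BTRFS (error|warning|critical)/ { print NR ": " $0; exit }'
}
```

After saving the full log with `dmesg > dmesg.txt`, running
`first_btrfs_event < dmesg.txt` gives the line to start reading from;
the lines just before it (USB resets, I/O errors, device-mapper
messages) are usually the interesting part.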



> I checked this USB flash drive with badblocks in non-destructive
> read-write mode. No errors.

Use F3 to test flash:
http://oss.digirati.com.br/f3/

Some distros have it in their repo; Fedora does. It's a bit
unintuitive: what you need to do is use the write binary to write the
test files to the stick (this is destructive) and then use the read
binary to read back the written files.

Read more here; the post also includes a much faster alternative for GNOME:
https://blogs.gnome.org/hughsie/2015/01/28/detecting-fake-flash/




-- 
Chris Murphy


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Austin S. Hemmelgarn

On 2016-07-07 10:55, Francesco Turco wrote:
> On 2016-07-07 16:27, Austin S. Hemmelgarn wrote:
>> This seems odd, are you trying to access anything over NFS or some other
>> network filesystem protocol here?  If not, then I believe you've found a
>> bug, because I'm pretty certain we shouldn't be returning -ESTALE for
>> anything.
>
> No, I don't use NFS or any other network filesystem.

OK, I'm going to try and check the kernel code to figure out if there's
any other case we might return that in.  I'm pretty certain that there's
nowhere BTRFS should return that though, which means you've either hit a
bug or have some other hardware issue (Given past experience, I think
it's more likely that you've hit a bug).



>> The question here is: Do you get any data corruption when using ext4?
>> Quite often when there's a hardware issue, you won't see _any_
>> indication of it other than corrupted files when using something like
>> ext4 or XFS, but it will show up almost immediately with BTRFS because
>> we validate checksums on almost everything.  There have been at least a
>> couple of times I've found disk issues while converting from ext4 to
>> BTRFS that I didn't know existed before, and then going back was able to
>> reliably reproduce using other tools.
>>
>> Also, FWIW, badblocks is not necessarily a reliable test method for
>> flash drives, they often handle serialized reads like badblocks does
>> very well even when failing.
>
> I'm not sure. Commands don't fail explicitly when I use ext4, but I
> agree with you that I may get corruption silently nonetheless. Perhaps I
> should try to rule out a hardware problem by filling my USB flash drive
> with a large random file and then checking if its SHA-1 checksum
> corresponds to the original copy on the hard disk. But first I probably
> should back up the current Btrfs filesystem with the dd command. Can I
> proceed?

Yeah, I would suggest backing up the filesystem. Be careful, though, that
you don't have both copies of the filesystem visible to the system at the
same time once you've finished creating the backup copy, as there are
potential issues if you have both visible while trying to mount the FS.


As far as checking the drive, I'd do essentially what you had said, with 
two extra parts:
1. Calculate the checksum of the data on the drive multiple times and 
make sure that it matches each time as well as matching the original 
file (if it doesn't match the original file, but each calculation from 
the drive matches, then the issue is something in the write path only).
2. Do so multiple times so you can be sure to cover _every_ block.  Most 
flash drives have a pool of spare blocks that are used for wear 
leveling, and if the issue is in one of those, this is the only way to 
find it.
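
The two steps above could be scripted roughly like this. It is a sketch,
not a tested procedure from this thread: the function name is invented,
it assumes GNU coreutils' sha1sum, and it assumes you make sure each
read really comes from the drive (for example by unmounting and
remounting, or dropping the page cache as root, between rounds),
otherwise the kernel may serve the data from RAM.

```sh
# verify_copy ORIGINAL COPY [ROUNDS]
# Hash COPY ROUNDS times (default 3) and compare against ORIGINAL.
# Exit status is 0 only if every read matches the original.
verify_copy() {
    original=$1
    copy=$2
    rounds=${3:-3}
    want=$(sha1sum < "$original" | cut -d' ' -f1)
    first=''
    stable=yes
    matched=yes
    i=1
    while [ "$i" -le "$rounds" ]; do
        # NOTE: this does not bypass the page cache by itself.
        got=$(sha1sum < "$copy" | cut -d' ' -f1)
        if [ -z "$first" ]; then first=$got; fi
        if [ "$got" != "$first" ]; then stable=no; fi
        if [ "$got" != "$want" ]; then matched=no; fi
        i=$((i + 1))
    done
    if [ "$matched" = yes ]; then
        echo "ok: every read matches the original"
    elif [ "$stable" = yes ]; then
        echo "write path suspect: reads agree with each other, not with the original"
    else
        echo "unstable reads: the copy hashes differently between rounds"
    fi
    [ "$matched" = yes ]
}
```

To cover the spare blocks as in step 2, the whole cycle (write a fresh
random file, then verify it) would be repeated several times, e.g.
`verify_copy ~/random.bin /path/to/stick/random.bin 3` after each copy;
both paths are placeholders.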


You might also try doing some testing with FIO or iozone, those tend to 
exercise a wider variety of things than stuff like badblocks or dd. 
Also, since you'll have a backup copy of the FS, you might consider 
running a destructive test with badblocks (it works a bit more reliably 
on flash devices this way, just make sure to run it multiple times too), 
both with and without the -B option (-B affects how things are buffered, 
if you see errors with it enabled but none without it, then you probably 
have some bad RAM).



>> Just to clarify, you're using BTRFS on top of disk encryption (LUKS? Or
>> is it just raw encryption, or even something completely different?), on
>> a USB flash drive (not a USB to SATA adapter with an SSD or HDD in it),
>> correct?
>
> I'm using a btrfs filesystem on a GUID partition encrypted with LUKS.
> It's a Kingston USB flash drive connected directly to my desktop machine
> via USB. It's definitely not an SSD or an HDD, and I'm not using any
> adapter.

OK, that both simplifies things, and makes them a bit more complicated.
If it had been an SSD or HDD connected through an adapter, the preferred
method of checking would be to pull it out and put it directly in the
system to verify the drive.  However, since it's a regular flash drive,
if it is the drive, it will probably be significantly less expensive to
replace.




Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Francesco Turco
On 2016-07-07 16:27, Austin S. Hemmelgarn wrote:
> This seems odd, are you trying to access anything over NFS or some other
> network filesystem protocol here?  If not, then I believe you've found a
> bug, because I'm pretty certain we shouldn't be returning -ESTALE for
> anything.

No, I don't use NFS or any other network filesystem.

> The question here is: Do you get any data corruption when using ext4?
> Quite often when there's a hardware issue, you won't see _any_
> indication of it other than corrupted files when using something like
> ext4 or XFS, but it will show up almost immediately with BTRFS because
> we validate checksums on almost everything.  There have been at least a
> couple of times I've found disk issues while converting from ext4 to
> BTRFS that I didn't know existed before, and then going back was able to
> reliably reproduce using other tools.
> 
> Also, FWIW, badblocks is not necessarily a reliable test method for
> flash drives, they often handle serialized reads like badblocks does
> very well even when failing.

I'm not sure. Commands don't fail explicitly when I use ext4, but I
agree with you that I may get corruption silently nonetheless. Perhaps I
should try to rule out a hardware problem by filling my USB flash drive
with a large random file and then checking if its SHA-1 checksum
corresponds to the original copy on the hard disk. But first I probably
should back up the current Btrfs filesystem with the dd command. Can I
proceed?

> Just to clarify, you're using BTRFS on top of disk encryption (LUKS? Or
> is it just raw encryption, or even something completely different?), on
> a USB flash drive (not a USB to SATA adapter with an SSD or HDD in it),
> correct?

I'm using a btrfs filesystem on a GUID partition encrypted with LUKS.
It's a Kingston USB flash drive connected directly to my desktop machine
via USB. It's definitely not an SSD or an HDD, and I'm not using any
adapter.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34


Re: Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Austin S. Hemmelgarn

On 2016-07-07 09:49, Francesco Turco wrote:
> I have a USB flash drive with an encrypted Btrfs filesystem where I
> store daily backups. My problem is that this btrfs filesystem gets
> corrupted very often, after a few days of usage. Usually I just reformat
> it and move along, but this time I'd like to understand the root cause
> of the problem and fix it.
>
> I can mount the partition without problems, but then when using commands
> such as rsync or even humble ls I get the following error message:
>
> $ rsync /home/fturco/Buffer/E-book/ /run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Buffer/E-book/ --recursive
> rsync: readlink_stat("/run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Riviste") failed: Stale file handle (116)
> rsync: readlink_stat("/run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Backup") failed: Stale file handle (116)
> rsync: readdir("/run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Calibre (TMSU)"): Input/output error (5)

This seems odd, are you trying to access anything over NFS or some other
network filesystem protocol here?  If not, then I believe you've found a
bug, because I'm pretty certain we shouldn't be returning -ESTALE for
anything.


> The previous command gets stuck and I had to manually stop it.
>
> The following command doesn't return any output, but its exit code is 1
> (failure):
>
> $ btrfs filesystem show /run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3
> $

Something is definitely wrong here.  Unless Parabola has seriously
modified btrfs-progs, this should be spitting out info about the devices
and filesystem usage.  This may be a result of the errors seen by check,
but I doubt that.


> Btrfs-check reports many errors. I attached the output to this e-mail
> message.

Looking at this, I see a couple of things I know it should fix correctly
(the 'errors 2001' stuff is fixable, and I'm pretty certain that the
'errors 200' thing is too, and I think it will fix the bytenr mismatch
stuff mostly safely), but there's enough I'm not sure about that I can't
in good conscience recommend that you run check with --repair, as it may
make things worse.  Hopefully someone who actually understands what the
other things actually mean can provide more help on that.


> Output from dmesg:
>
> $ dmesg | tail
> [18756.159963] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
> [18756.160828] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
> [18756.161821] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
> [18756.163047] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
> [18756.163921] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
> [18756.164806] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
> [18756.165673] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
> [18756.166548] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
> [18757.950603] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
> [18757.951492] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
>
> I checked this USB flash drive with badblocks in non-destructive
> read-write mode. No errors.
>
> If I format this partition as Ext4 instead of Btrfs I can use it without
> problems, but my goal is to use Btrfs on all devices.

The question here is: Do you get any data corruption when using ext4?
Quite often when there's a hardware issue, you won't see _any_
indication of it other than corrupted files when using something like
ext4 or XFS, but it will show up almost immediately with BTRFS because
we validate checksums on almost everything.  There have been at least a
couple of times I've found disk issues while converting from ext4 to
BTRFS that I didn't know existed before, and then going back was able to
reliably reproduce using other tools.


Also, FWIW, badblocks is not necessarily a reliable test method for 
flash drives, they often handle serialized reads like badblocks does 
very well even when failing.


Just to clarify, you're using BTRFS on top of disk encryption (LUKS? Or 
is it just raw encryption, or even something completely different?), on 
a USB flash drive (not a USB to SATA adapter with an SSD or HDD in it), 
correct?


> My GNU/Linux distribution is Parabola GNU/Linux-libre.
> Kernel version is: 4.6.3.
> Btrfs-progs version is: 4.6
>
> Please tell me if you need other details. Thanks.





Frequent btrfs corruption on a USB flash drive

2016-07-07 Thread Francesco Turco
I have a USB flash drive with an encrypted Btrfs filesystem where I
store daily backups. My problem is that this btrfs filesystem gets
corrupted very often, after a few days of usage. Usually I just reformat
it and move along, but this time I'd like to understand the root cause
of the problem and fix it.

I can mount the partition without problems, but then when using commands
such as rsync or even humble ls I get the following error message:

$ rsync /home/fturco/Buffer/E-book/ /run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Buffer/E-book/ --recursive
rsync: readlink_stat("/run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Riviste") failed: Stale file handle (116)
rsync: readlink_stat("/run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Backup") failed: Stale file handle (116)
rsync: readdir("/run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3/Calibre (TMSU)"): Input/output error (5)

The previous command gets stuck and I had to manually stop it.

The following command doesn't return any output, but its exit code is 1
(failure):

$ btrfs filesystem show /run/media/fturco/5283147c-b7b4-448f-97b0-b235344a56a3
$

Btrfs-check reports many errors. I attached the output to this e-mail
message.

Output from dmesg:

$ dmesg | tail
[18756.159963] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
[18756.160828] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
[18756.161821] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
[18756.163047] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
[18756.163921] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
[18756.164806] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
[18756.165673] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
[18756.166548] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904
[18757.950603] BTRFS error (device dm-4): bad tree block start 6592115285688248773 35323904
[18757.951492] BTRFS error (device dm-4): bad tree block start 8533404122473270145 35323904

I checked this USB flash drive with badblocks in non-destructive
read-write mode. No errors.

If I format this partition as Ext4 instead of Btrfs I can use it without
problems, but my goal is to use Btrfs on all devices.

My GNU/Linux distribution is Parabola GNU/Linux-libre.
Kernel version is: 4.6.3.
Btrfs-progs version is: 4.6

Please tell me if you need other details. Thanks.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34
# btrfs check --readonly /dev/mapper/luks-08e23ed4-a2a1-41f0-a5f6-794ff0647ada
Checking filesystem on /dev/mapper/luks-08e23ed4-a2a1-41f0-a5f6-794ff0647ada
UUID: 5283147c-b7b4-448f-97b0-b235344a56a3
checking extents
checksum verify failed on 35274752 found E042416D wanted 4CD1CFA0
checksum verify failed on 35274752 found E042416D wanted 4CD1CFA0
checksum verify failed on 35274752 found E8B38F1B wanted B3F4F728
checksum verify failed on 35274752 found E042416D wanted 4CD1CFA0
bytenr mismatch, want=35274752, have=6970279768983377651
checksum verify failed on 35291136 found 6B9667D1 wanted CDED2E29
checksum verify failed on 35291136 found 6B9667D1 wanted CDED2E29
checksum verify failed on 35291136 found 607F5103 wanted F21126A3
checksum verify failed on 35291136 found 6B9667D1 wanted CDED2E29
bytenr mismatch, want=35291136, have=16962852950865328208
checksum verify failed on 35307520 found 088ACE59 wanted 22164173
checksum verify failed on 35307520 found 088ACE59 wanted 22164173
checksum verify failed on 35307520 found F59BACEE wanted E647A1CD
checksum verify failed on 35307520 found 088ACE59 wanted 22164173
bytenr mismatch, want=35307520, have=16013504349018505369
checksum verify failed on 35323904 found CA154283 wanted 10E9FA6B
checksum verify failed on 35323904 found CA154283 wanted 10E9FA6B
checksum verify failed on 35323904 found 4DA7B234 wanted 794014C7
checksum verify failed on 35323904 found 4DA7B234 wanted 794014C7
bytenr mismatch, want=35323904, have=8533404122473270145
parent transid verify failed on 35340288 wanted 44 found 37
parent transid verify failed on 35340288 wanted 44 found 37
parent transid verify failed on 35340288 wanted 44 found 37
parent transid verify failed on 35340288 wanted 44 found 37
Ignoring transid failure
leaf parent key incorrect 35340288
bad block 35340288
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
parent transid verify failed on 35340288 wanted 44 found 37
Ignoring transid failure
parent transid verify failed on 35340288 wanted 44 found 37
Ignoring transid failure
parent transid verify failed on 35340288 wanted 44 found 37
Ignoring transid failure
parent transid verify failed on 35340288 wanted 44 found 37
Ignoring transid failure
checksum verify failed on 35274752 found E042416D wanted 4CD1CFA0
checksum