Re: Recovering BTRFS from bcache failure.

2015-04-09 Thread Dan Merillat
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:
 Bcache failures are nasty, because they leave a mix of old and new
 data on the disk.  In this case, there was very little dirty data, but
 of course the tree roots were dirty and out-of-sync.

 fileserver:/usr/src/btrfs-progs# ./btrfs --version
 Btrfs v3.18.2

 kernel version 3.18

 [  572.573566] BTRFS info (device bcache0): enabling auto recovery
 [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
 [  574.266055] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.276952] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277008] BTRFS: failed to read tree root on bcache0
 [  574.277187] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277356] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277398] BTRFS: failed to read tree root on bcache0
 [  574.285955] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 613694
 [  574.298741] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 610499
 [  574.298804] BTRFS: failed to read tree root on bcache0
 [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
 [  575.111495] BTRFS (device bcache0): parent transid verify failed on
 7567954464768 wanted 613688 found 613685
 [  575.111559] BTRFS: failed to read tree root on bcache0
 [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
 [  575.131803] BTRFS (device bcache0): parent transid verify failed on
 7567954214912 wanted 613687 found 613680
 [  575.131866] BTRFS: failed to read tree root on bcache0
 [  575.180101] BTRFS: open_ctree failed

 All the btrfs tools throw up their hands with similar errors:
 fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super


 fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
 enabling repair mode
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Couldn't open file system

 Annoyingly:
 # ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Open ctree failed
 create failed (Success)

 So I can't even send an image for people to look at.

CCing some more people on this one. While this filesystem isn't
important, I'd like to know that restoring from backup isn't the only
option for BTRFS corruption.  All of the tools simply throw up their
hands and bail when confronted with this filesystem, even btrfs-image.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
It's a known bug with bcache when discard is enabled: it was discarding
sections containing data it wanted.  After a reboot, bcache refused to
accept the cache data, and of course the cache was dirty because I'm frankly
too stupid to breathe sometimes.

So yes, it's a bcache issue, but that's unresolvable.  I'm trying to
rescue the btrfs data that it trashed.
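The thread does not identify which bcache bug this is, so the following is only a sketch for checking the setting blamed here. bcache documents a per-cache-device `discard` attribute in sysfs (issue a discard/TRIM on each bucket before reuse, off by default). The functions assume it lives at /sys/block/<cdev>/bcache/discard and reads back as 1 or 0; `SYSFS` is a hypothetical override added only so the functions can be tried against a fake directory.

```shell
# Sketch: inspect and disable bcache's discard/TRIM behaviour on a cache device.
# Assumes the attribute is <sysfs>/block/<cdev>/bcache/discard and reads 1 (on)
# or 0 (off). SYSFS is a hypothetical override (default /sys) for testing.

# Succeeds if discard is currently enabled on cache device $1 (e.g. sdc).
bcache_discard_enabled() {
  [ "$(cat "${SYSFS:-/sys}/block/$1/bcache/discard")" = 1 ]
}

# Turns discard off on cache device $1 (needs root on a real system).
bcache_disable_discard() {
  echo 0 > "${SYSFS:-/sys}/block/$1/bcache/discard"
}
```

Usage, with `sdc` standing in for whatever the cache device actually is: `bcache_discard_enabled sdc && bcache_disable_discard sdc`. Whether turning discard off avoids the bug in question is not established in this thread.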


On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote:
 Hello,

 I had some luck in the past with btrfs restore using the -r option. I don't
 recall how I determined the roots... Maybe I tried random numbers? I was
 able to recover nearly all of my data from a bcache-related crash from over
 a year ago.

 What kind of bcache failure did you see? I've been doing some testing
 recently and ran into 2 bcache failures. With both of these failures, I had
 a 'bad btree header at bucket' error message (which is entirely different
 from the crash I had over a year back). I'm currently trying a different SSD
 to see if that alleviates the issue. The error makes me think that it's a
 bcache-specific issue that's unrelated to btrfs or possibly (in my case) an
 issue with the previous SSD.

 Did you encounter this same error?

 With my 2 most recent crashes, I didn't try to recover very hard (or even
 try 'btrfs recover' at all) as I've been taking daily backups. I did try
 btrfsck, and not only would it fail, it would segfault.

 -Cameron
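Cameron does not remember how he picked the roots, so here is one way the search could be scripted. This is a sketch, not a documented procedure. `btrfs restore -t <bytenr>` points restore at an alternative tree root and `-D` makes it a dry run that only lists what would be recovered. Candidate bytenr values are commonly taken from `btrfs-find-root` output. Check `btrfs restore --help` on your version. The function assumes restore exits non-zero when it cannot use a given root. `BTRFS` is a hypothetical override so the loop can be tested without a real device.

```shell
# Sketch: dry-run btrfs restore against several candidate tree roots.
# Usage: try_tree_roots <device> <dest-dir> <bytenr>...
# Candidates are supplied by hand (for example, copied from btrfs-find-root
# output). -D only lists files, so nothing is written to <dest-dir>.
# BTRFS is a hypothetical override for the btrfs binary (useful for testing).
try_tree_roots() {
  dev=$1
  dest=$2
  shift 2
  for bytenr in "$@"; do
    # Assumes a non-zero exit status means restore could not use this root.
    if ${BTRFS:-btrfs} restore -D -t "$bytenr" "$dev" "$dest" >/dev/null 2>&1; then
      echo "ok $bytenr"
    else
      echo "fail $bytenr"
    fi
  done
}
```

Usage: `try_tree_roots /dev/bcache0 /mnt/rescue <bytenr> <bytenr> ...`, where /mnt/rescue is a hypothetical scratch directory. A root reported as "ok" could then be tried without `-D`. A zero exit status does not guarantee that much data is reachable from that root.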


 On 04/08/2015 11:07 AM, Dan Merillat wrote:

 Any ideas on where to start with this?  I did flush the cache out to
 disk before I made changes to the bcache configuration, so there
 shouldn't be anything completely missing, just some bits of stale
 metadata.  If I can get the tools to take the closest match and run
 with it, it would probably recover nearly everything.

 At worst, is there a way to scan the metadata blocks and rebuild from
 found extent-trees?





Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
Sorry, I pressed send before I finished my thoughts.

btrfs restore gets nowhere with any options.  btrfs-recover says the
superblocks are fine, and chunk recover does nothing after a few hours
of reading.

Everything else bails out with the errors I listed above.

On Wed, Apr 8, 2015 at 2:36 PM, Dan Merillat dan.meril...@gmail.com wrote:
 It's a known bug with bcache when discard is enabled: it was discarding
 sections containing data it wanted.  After a reboot, bcache refused to
 accept the cache data, and of course the cache was dirty because I'm frankly
 too stupid to breathe sometimes.

 So yes, it's a bcache issue, but that's unresolvable.  I'm trying to
 rescue the btrfs data that it trashed.



Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Kai Krakow
 on the first post. Not sure what you tried until now (except
btrfs restore).

If you don't have time, this is all you can try now. If your data is 
valuable - well, you have to wait for the experts here. But general opinion 
here is: If you don't have backups, your data is not valuable by definition, 
especially if using an immature fs, or an even more experimental setup like
bcache. ;-)

PS: btrfs-zero-log is usually my personal first resort in case of problems
not fixable within a few tries. I have nightly backups to restore from and
to compare data against after zero-log. That's why I started my answer with it.

-- 
Replies to list only preferred.



Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
Any ideas on where to start with this?  I did flush the cache out to
disk before I made changes to the bcache configuration, so there
shouldn't be anything completely missing, just some bits of stale
metadata.  If I can get the tools to take the closest match and run
with it, it would probably recover nearly everything.

At worst, is there a way to scan the metadata blocks and rebuild from
found extent-trees?
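The flush described above could be confirmed before any reconfiguration. A minimal sketch: the bcache documentation describes a `state` attribute for backing devices (readable at /sys/block/bcache*/bcache/state) with the values "no cache", "clean", "dirty" and "inconsistent". The last value means the backing device was run while dirty data was cached and the cache set was unavailable. The polling loop and the `SYSFS` override are illustrative assumptions, not part of bcache.

```shell
# Sketch: check that a bcache backing device holds no dirty cached data.
# Reads <sysfs>/block/<dev>/bcache/state; documented values are
# "no cache", "clean", "dirty" and "inconsistent".
# SYSFS is a hypothetical override (default /sys) for testing.
bcache_state() {
  cat "${SYSFS:-/sys}/block/$1/bcache/state"
}

# Succeeds only when there is no dirty data for device $1 (e.g. bcache0).
bcache_is_clean() {
  case "$(bcache_state "$1")" in
    clean|"no cache") return 0 ;;
    *) return 1 ;;
  esac
}

# Polls once per second until $1 is clean, giving up after $2 tries (default 60).
wait_for_clean() {
  dev=$1
  tries=${2:-60}
  while [ "$tries" -gt 0 ]; do
    bcache_is_clean "$dev" && return 0
    sleep 1
    tries=$((tries - 1))
  done
  return 1
}
```

Usage: `wait_for_clean bcache0 600 && echo "no dirty data, safe to reconfigure"`.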






Recovering BTRFS from bcache failure.

2015-04-07 Thread Dan Merillat
Bcache failures are nasty, because they leave a mix of old and new
data on the disk.  In this case, there was very little dirty data, but
of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on
7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on
7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on
7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on
7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed
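Each "parent transid verify failed" message names a tree block by its byte address. It also gives the generation (transid) the parent pointer expects ("wanted") and the generation actually found in the block ("found"). The short sketch below deduplicates these messages and tags each one as older or newer than expected. Reading "older" as a stale copy left on the backing device is an interpretation consistent with the description above. The tools do not report it. The kernel lines above were wrapped by the mailer, but on the live system each message is a single line.

```shell
# Sketch: summarize "parent transid verify failed" messages read on stdin.
# Prints one line per distinct (block, wanted, found) triple, tagged
# "older" when the found generation is lower than the wanted one and
# "newer" otherwise. Expects each message on a single line.
summarize_transid() {
  awk '
    /parent transid verify failed on/ {
      block = ""; wanted = ""; found = ""
      for (i = 1; i < NF; i++) {
        if ($i == "on")     block  = $(i + 1)
        if ($i == "wanted") wanted = $(i + 1)
        if ($i == "found")  found  = $(i + 1)
      }
      key = block " " wanted " " found
      if (key in seen) next
      seen[key] = 1
      age = (found + 0 < wanted + 0) ? "older" : "newer"
      printf "%s wanted=%s found=%s %s\n", block, wanted, found, age
    }
  '
}
```

Usage: `dmesg | summarize_transid`, or `btrfs restore -l /dev/bcache0 2>&1 | summarize_transid` for the userspace tools.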

All the btrfs tools throw up their hands with similar errors:
fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super


fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:
# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.


Suggestion on reducing short kernel hangs from my btrfs filesystems: bcache?

2014-11-14 Thread Marc MERLIN
I have a server which runs zoneminder (video recording which is CPU and
disk IO intensive) while also doing a bunch of I/O over serial ports.

I have a dual-core
Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz
(4 virtual CPUs in /proc/cpuinfo)

It's pretty clear that when zoneminder is doing more work, my programs
that talk to serial ports start failing due to delays on the kernel side
and desynchronization, causing serial port protocol errors (I'm using
USB serial adapters, and use 12 of them).
I'm pretty sure it's because of delays in the kernel more than in user
space, but I can't prove that easily.

I have a preempt kernel, kernel 3.16.3:
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y
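This listing shows CONFIG_PREEMPT=y with CONFIG_PREEMPT_VOLUNTARY unset. To confirm which preemption model a kernel was actually built with, for example before and after trying voluntary preemption, here is a small sketch that reads kernel config text on stdin. Where that text comes from depends on the system. /proc/config.gz exists only with CONFIG_IKCONFIG_PROC, and /boot/config-<version> depends on the distribution.

```shell
# Sketch: report the preemption model from kernel config text on stdin.
# Only the three classic options are checked. "# CONFIG_... is not set"
# lines never match, because with -F= their first field is the whole comment.
preempt_model() {
  awk -F= '
    $2 == "y" && $1 == "CONFIG_PREEMPT"           { model = "preempt" }
    $2 == "y" && $1 == "CONFIG_PREEMPT_VOLUNTARY" { model = "voluntary" }
    $2 == "y" && $1 == "CONFIG_PREEMPT_NONE"      { model = "none" }
    END { print (model == "" ? "unknown" : model) }
  '
}
```

Usage: `zcat /proc/config.gz | preempt_model` or `preempt_model < /boot/config-$(uname -r)`.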

From what I can tell, things did get worse after I upgraded from ext4 to
btrfs (not counting times when I resync the software raid5 underneath
or run a btrfs scrub).

I may try CONFIG_PREEMPT_VOLUNTARY to see if it works better, but I'm thinking
putting an SSD in front of that mdadm RAID5 array will help by relieving
the IO load and hopefully giving the CPU more time to handle serial
port requests.
I'm actually not sure if my issue is btrfs interrupting serial port
connections due to PREEMPT, or if serial port connections aren't being
serviced quickly enough because the kernel is busy with btrfs and PREEMPT
hasn't kicked in yet.

From reading the list, bcache may work with btrfs, but before I try
that, I'm curious whether there are other or better ways to use an SSD to
reduce the impact btrfs has on my server.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: btrfs on bcache

2014-08-20 Thread raphead

Hi,
has this issue been resolved?
I would like to use the bcache + btrfs combo.
Thanks


Re: btrfs on bcache

2014-08-04 Thread Fábio Pfeifer
After completely losing my filesystem twice because of this bug, I gave
up using btrfs on top of bcache (also writeback). In my case, I used to
have some subvolumes and some snapshots of these subvolumes, but not many
of them. The btrfs mantra "backup, backup and backup" saved me.

Best regards,

Fábio Pfeifer
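Both this report and Larkin's involve bcache in writeback mode. Here is a hedged sketch for checking which mode a bcache device is actually using. The `cache_mode` attribute under /sys/block/bcache*/bcache lists the modes (writethrough, writeback, writearound, none) with the active one in square brackets. `SYSFS` is a hypothetical override for testing against a fake directory.

```shell
# Sketch: print the active cache mode of bcache device $1 (e.g. bcache0).
# Assumes the sysfs file looks like "writethrough [writeback] writearound none".
# SYSFS is a hypothetical override (default /sys) for testing.
bcache_cache_mode() {
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "${SYSFS:-/sys}/block/$1/bcache/cache_mode"
}
```

As root, writing one of the listed names back to the same file switches modes, e.g. `echo writethrough > /sys/block/bcache0/bcache/cache_mode`.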

2014-07-30 20:01 GMT-03:00 Larkin Lowrey llow...@nuclearwinter.com:
 I've been running two backup servers, with 25T and 20T of data, using
 btrfs on bcache (writeback) for about 7 months. I periodically run btrfs
 scrubs and backup verifies (SHA1 hashes) and have never had a corruption
 issue.

 My use of btrfs is simple, though, with no subvolumes and no btrfs level
 raid. My bcache backing devices are LVM volumes that span multiple md
 raid6 arrays. So, either the bug has been fixed or my configuration is
 not susceptible.

 I'm running kernel 3.15.5-200.fc20.x86_64.

 --Larkin



btrfs on bcache

2014-07-31 Thread dptrash
Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does
this bug still exist?

Kernel 3.14
B (backing): 2x HDD 1 TB
C (cache): 1x SSD 256 GB

# make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
# mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1

I still see no "incomplete page write" messages in dmesg | grep btrfs, and
the checksums of some manually reviewed files are okay.

Does anyone have more experience with this?

Thanks,

- dp


Re: btrfs on bcache

2014-07-31 Thread Duncan
dptrash posted on Thu, 31 Jul 2014 17:35:44 +0200 as excerpted:

 Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018,
 does this bug still exist?
 
 Kernel 3.14
 B (backing): 2x HDD 1 TB
 C (cache): 1x SSD 256 GB
 
 # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
 # mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1
 
 I still see no "incomplete page write" messages in dmesg | grep btrfs,
 and the checksums of some manually reviewed files are okay.
 
 Does anyone have more experience with this?

See the reply (not mine) to your earlier post of the question:

http://permalink.gmane.org/gmane.linux.kernel.bcache.devel/2602

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



btrfs on bcache

2014-07-30 Thread dptrash
Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does 
this bug still exists?

Kernel 3.14
B: 2x HDD 1 TB
C: 1x SSD 256 GB

# make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
# mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1

I still have no incomplete page write messages in dmesg | grep btrfs and 
the checksums of some manually reviewed files are okay.

Does anyone have more experience with this?

Thanks,

- dp


Re: btrfs on bcache

2014-07-30 Thread Larkin Lowrey
I've been running two backup servers, with 25T and 20T of data, using
btrfs on bcache (writeback) for about 7 months. I periodically run btrfs
scrubs and backup verifies (SHA1 hashes) and have never had a corruption
issue.

My use of btrfs is simple, though, with no subvolumes and no btrfs level
raid. My bcache backing devices are LVM volumes that span multiple md
raid6 arrays. So, either the bug has been fixed or my configuration is
not susceptible.

I'm running kernel 3.15.5-200.fc20.x86_64.

--Larkin

On 7/30/2014 5:04 PM, dptr...@arcor.de wrote:
 Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does
 this bug still exist?

 Kernel 3.14
 B: 2x HDD 1 TB
 C: 1x SSD 256 GB

 # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
 # mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1

 I still have no incomplete page write messages in dmesg | grep btrfs and 
 the checksums of some manually reviewed files are okay.

 Does anyone have more experience with this?

 Thanks,

 - dp
 --
 To unsubscribe from this list: send the line unsubscribe linux-bcache in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: btrfs on bcache

2014-05-01 Thread Austin S Hemmelgarn
On 2014-04-30 14:16, Felix Homann wrote:
 Hi,
 a couple of months ago there was some discussion about issues
 when using btrfs on bcache:
 
 http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018
 
 From looking at the mailing list archives I cannot tell whether or not
 this issue has been resolved in current kernels from either bcache's
 or btrfs' side.
 
 Can anyone tell me what's the current state of this issue? Should it
 be safe to use btrfs on bcache by now?

In practice, I don't think anyone who frequents the list knows.
I do know that there are a number of people (myself included) who avoid
bcache in general because of having issues with seemingly random kernel
OOPSes when it is linked in (either as a module or compiled in), even
when it isn't being used.  My advice would be to just test it with some
non-essential data (maybe set up a virtual machine?).


btrfs on bcache

2014-04-30 Thread Felix Homann
Hi,
a couple of months ago there was some discussion about issues
when using btrfs on bcache:

http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018

From looking at the mailing list archives I cannot tell whether or not
this issue has been resolved in current kernels from either bcache's
or btrfs' side.

Can anyone tell me what's the current state of this issue? Should it
be safe to use btrfs on bcache by now?

Thanks and kind regards,
Felix


Re: btrfs on bcache

2014-01-08 Thread Chris Mason
On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote:
 On Fri, Dec 20, 2013 at 03:46:30PM +, Chris Mason wrote:
  On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
   Hello,
   
   I put the WARN_ON(1); after the printk lines (incomplete page read
   and incomplete page write) in extent_io.c.
   
   here some call traces:
   
   [   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
   [   19.509500] [ cut here ]
   [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
   end_bio_extent_readpage+0x788/0xc20 [btrfs]()
   [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
   iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
   ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
   evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
   i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
   snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
   processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
   usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
   ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
   scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
   [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
   W  O 3.12.5-1-ARCH #1
   [   19.509580] Hardware name: System manufacturer System Product
   Name/P5WDG2 WS Pro, BIOS 090503/06/2008
   [   19.509581]  0009 880231a63cb0 814ee37b
   
   [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
   
   [   19.509587]  8802320cc9c0  880233b0e000
   880231a63cf8
   [   19.509590] Call Trace:
   [   19.509596]  [814ee37b] dump_stack+0x54/0x8d
   [   19.509601]  [81062bcd] warn_slowpath_common+0x7d/0xa0
   [   19.509603]  [81062caa] warn_slowpath_null+0x1a/0x20
   [   19.509614]  [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]
  
  This should mean that bcache is either failing to read some blocks
  properly or is fiddling with the bv_len/bv_offset fields.
  
  Could someone from bcache comment?
 
 Oh man, I found this and then threw up my hands in despair.
 
 Bcache isn't doing anything with the bv_len/bv_offset fields; it may clone the
 biovec so it can retry a bio on error, if the biovecs weren't all whole pages,
 otherwise it just passes the biovec down with the next bio to the underlying
 cache/backing device.
 
 What btrfs appears to be doing though - I couldn't believe that code actually
 _worked_, Jens please jump in here but AFAIK bv_len/bv_offset are in practice
 undefined after a bio's completed, they might have been updated if the driver
 was using blk_update_request but for many drivers that just process the entire
 bio all at once they just won't touch those fields - and that includes anything
 that clones the bio (md/dm).
 
 This is probably relevant to immutable biovecs here...
 
 -
 
 Ok, I looked again at the relevant btrfs code, I guess I can see how this printk
 isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check
 for here? And why is it using bv_offset and bv_len further down in
 end_bio_extent_readpage()?

After the IO is done, we're recording the specific logical byte range
that covered the IO.  In practice it's always the full page, we can
switch to just trusting PAGE_CACHE_SIZE.

-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs on bcache

2014-01-08 Thread Kent Overstreet
On Wed, Jan 08, 2014 at 07:35:32PM +, Chris Mason wrote:
 On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote:
  Ok, I looked again at the relevant btrfs code, I guess I can see how this printk
  isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check
  for here? And why is it using bv_offset and bv_len further down in
  end_bio_extent_readpage()?
 
 After the IO is done, we're recording the specific logical byte range
 that covered the IO.  In practice it's always the full page, we can
 switch to just trusting PAGE_CACHE_SIZE.

Yeah, the code already assumes it was doing PAGE_CACHE_SIZE reads; what
you're effectively checking is that the driver did the bvec all at once,
and that it didn't process half a bvec, update it, then process the rest
- which is a completely fine thing to do.

So for now - yeah, the correct thing to do is to just ignore
bv_offset/bv_len and go by PAGE_CACHE_SIZE. But - after immutable
biovecs is in, _then_ you'll be able to depend on bv_offset/bv_len
remaining unchanged (and you can get rid of your dependency on
PAGE_CACHE_SIZE bvecs).
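A minimal Python sketch of the difference Kent and Chris describe, assuming 4 KiB pages and using hypothetical helper names (this is not btrfs code): a byte range computed from the bvec fields after completion covers only what the bvec still describes, while one computed from the page start plus the page size covers the whole page that was actually read.

```python
PAGE_CACHE_SIZE = 4096  # assumption: 4 KiB pages, as on the x86_64 boxes in this thread


def range_from_bvec(page_start, bv_offset, bv_len):
    """Inclusive byte range taken from the bvec fields. After completion these
    fields may have been advanced by the driver, so this can under-report."""
    start = page_start + bv_offset
    return start, start + bv_len - 1


def range_from_page(page_start):
    """Inclusive byte range assuming the I/O always covered one full page."""
    return page_start, page_start + PAGE_CACHE_SIZE - 1


# A page at byte 8192 whose bvec was left at offset 1536 / length 2560:
print(range_from_bvec(8192, 1536, 2560))  # first 1536 bytes of the page are missing
print(range_from_page(8192))
```

Going by the page size alone is only safe while every bvec is known to be a full page, which is the assumption Kent says immutable biovecs would let btrfs drop.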


Re: btrfs on bcache

2014-01-06 Thread Kent Overstreet
On Fri, Dec 20, 2013 at 03:46:30PM +, Chris Mason wrote:
 On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
  Hello,
  
  I put the WARN_ON(1); after the printk lines (incomplete page read
  and incomplete page write) in extent_io.c.
  
  here some call traces:
  
  [   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
  [   19.509500] [ cut here ]
  [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
  end_bio_extent_readpage+0x788/0xc20 [btrfs]()
  [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
  iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
  ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
  evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
  i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
  snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
  processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
  usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
  ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
  scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
  [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
  W  O 3.12.5-1-ARCH #1
  [   19.509580] Hardware name: System manufacturer System Product
  Name/P5WDG2 WS Pro, BIOS 090503/06/2008
  [   19.509581]  0009 880231a63cb0 814ee37b
  
  [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
  
  [   19.509587]  8802320cc9c0  880233b0e000
  880231a63cf8
  [   19.509590] Call Trace:
  [   19.509596]  [814ee37b] dump_stack+0x54/0x8d
  [   19.509601]  [81062bcd] warn_slowpath_common+0x7d/0xa0
  [   19.509603]  [81062caa] warn_slowpath_null+0x1a/0x20
  [   19.509614]  [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]
 
 This should mean that bcache is either failing to read some blocks
 properly or is fiddling with the bv_len/bv_offset fields.
 
 Could someone from bcache comment?

Oh man, I found this and then threw up my hands in despair.

Bcache isn't doing anything with the bv_len/bv_offset fields; it may clone the
biovec so it can retry a bio on error, if the biovecs weren't all whole pages,
otherwise it just passes the biovec down with the next bio to the underlying
cache/backing device.

What btrfs appears to be doing though - I couldn't believe that code actually
_worked_, Jens please jump in here but AFAIK bv_len/bv_offset are in practice
undefined after a bio's completed, they might have been updated if the driver
was using blk_update_request but for many drivers that just process the entire
bio all at once they just won't touch those fields - and that includes anything
that clones the bio (md/dm).
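To make Kent's point concrete, here is a small Python model of a single bvec. It assumes (this is our reading of the description above, not a transcription of kernel code) that a driver completing only part of a segment shrinks it in place by advancing bv_offset and reducing bv_len, while a completion covering the remainder just moves on without touching the fields. Under that assumption a page finished as 1536 + 2560 bytes is left looking exactly like the offset 1536 / length 2560 pairs in the logs, while a driver that finishes the page in one go leaves it at 0 / 4096. BioVec and complete() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BioVec:
    """Hypothetical userspace stand-in for one kernel bio_vec segment."""
    bv_offset: int
    bv_len: int


def complete(bv, nbytes):
    """Record that nbytes of this segment finished.

    Assumption: a partial completion advances the segment in place; a
    completion covering the rest leaves the fields as they are.
    Returns True once the segment is fully done.
    """
    if nbytes < bv.bv_len:
        bv.bv_offset += nbytes
        bv.bv_len -= nbytes
        return False
    return True


whole = BioVec(0, 4096)
complete(whole, 4096)           # one-shot completion: fields untouched

split = BioVec(0, 4096)
complete(split, 1536)           # first 3 x 512-byte blocks done
complete(split, 2560)           # remainder done, fields not reset
print(whole, split)
```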

This is probably relevant to immutable biovecs here...

-

Ok, I looked again at the relevant btrfs code, I guess I can see how this printk
isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check
for here? And why is it using bv_offset and bv_len further down in
end_bio_extent_readpage()?


Re: btrfs on bcache

2013-12-24 Thread Fábio Pfeifer
(resent in text only)
Some more information about this issue.

I installed my system last November (Arch x86_64), with kernel 3.11.
At that time I didn't see any csum errors or
incomplete page read errors. Some time later these errors started to
show up. I don't know exactly whether it was in the
3.11 to 3.12 upgrade or somewhere in the 3.12 cycle. I've been using
bcache in writeback mode from the beginning.

I did some more testing:
  - tried bcache in writethrough, writearound and none modes;
  - tried Linux kernel 3.13-rc5.

The errors didn't go away (maybe because my filesystem was already
corrupted). I didn't have time to test with kernel 3.11 again.

But lately the errors increased, and it started to make my system
unstable, and then unusable.
I had to reformat everything and recover my backups.

I don't have my / and /home on btrfs over bcache anymore, but I can
run some tests on a spare HD and SSD I have here. I'll report back
after Christmas.

thanks,

Fabio

2013/12/20 Chris Mason c...@fb.com:
 On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
 Hello,

 I put the WARN_ON(1); after the printk lines (incomplete page read
 and incomplete page write) in extent_io.c.

 here some call traces:

 [   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
 [   19.509500] [ cut here ]
 [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
 [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
 iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
 ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
 evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
 i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
 snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
 processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
 usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
 ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
 scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
 [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
 W  O 3.12.5-1-ARCH #1
 [   19.509580] Hardware name: System manufacturer System Product
 Name/P5WDG2 WS Pro, BIOS 090503/06/2008
 [   19.509581]  0009 880231a63cb0 814ee37b
 
 [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
 
 [   19.509587]  8802320cc9c0  880233b0e000
 880231a63cf8
 [   19.509590] Call Trace:
 [   19.509596]  [814ee37b] dump_stack+0x54/0x8d
 [   19.509601]  [81062bcd] warn_slowpath_common+0x7d/0xa0
 [   19.509603]  [81062caa] warn_slowpath_null+0x1a/0x20
 [   19.509614]  [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]

 This should mean that bcache is either failing to read some blocks
 properly or is fiddling with the bv_len/bv_offset fields.

 Could someone from bcache comment?

 -chris



Re: btrfs on bcache

2013-12-20 Thread eb
On Thu, Dec 19, 2013 at 8:59 PM, Chris Mason c...@fb.com wrote:
 On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
 Btrfs shouldn't be setting the offset on the bios.  Are you able to add
 a WARN_ON to the message that prints this so we can see the stack trace?

If you send me a patch (my experience hacking on the kernel is
exactly 0), I'll try to see if I can compile a custom kernel and get
it running.

 Could you please cc the bcache and btrfs list together?

Done.

I did some more testing - I copied an image of a 128GB drive over the
network (via netcat) onto the bcache/btrfs system and verified the
results twice using sha1sum. They're both identical on the source
system (which is *not* using bcache) and the bcache/btrfs setup. I've
gotten a lot of the incomplete write errors and a few csum errors in
dmesg, but apparently they haven't done any harm?

Not sure how remarkable this is, as these kinds of things are supposed
to bypass the cache anyway, but I assume they still have to go through
the subsystem.


Re: btrfs on bcache

2013-12-20 Thread Fábio Pfeifer
] ? kthread_create_on_node+0x120/0x120
[   25.592360] ---[ end trace bbc8d0d088375447 ]---

thanks,

Fabio Pfeifer

2013/12/19 Chris Mason c...@fb.com:
 On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
 I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

 /dev/sdb3 - cache0 (80 GB Intel SSD)
 /dev/sdc1 - backing device (2 TB WD HDD)

 sdb3+sdc1 = /dev/bcache0

 On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
 as / and /home. What's been bothering me are the following entries in
 my kernel log:

 [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
 [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

 The offset/length values are always either 1536/2560 or 3072/1024,
 they sum up nicely to 4K. There are 607 of those in there as I am
 writing this, the machine has been up 18 hours and been under no
 particular I/O strain (it's a desktop).

 Btrfs shouldn't be setting the offset on the bios.  Are you able to add
 a WARN_ON to the message that prints this so we can see the stack trace?

 Could you please cc the bcache and btrfs list together?

 -chris



Re: btrfs on bcache

2013-12-20 Thread Chris Mason
On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
 Hello,
 
 I put the WARN_ON(1); after the printk lines (incomplete page read
 and incomplete page write) in extent_io.c.
 
 here some call traces:
 
 [   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
 [   19.509500] [ cut here ]
 [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
 [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
 iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
 ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
 evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
 i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
 snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
 processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
 usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
 ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
 scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
 [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
 W  O 3.12.5-1-ARCH #1
 [   19.509580] Hardware name: System manufacturer System Product
 Name/P5WDG2 WS Pro, BIOS 090503/06/2008
 [   19.509581]  0009 880231a63cb0 814ee37b
 
 [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
 
 [   19.509587]  8802320cc9c0  880233b0e000
 880231a63cf8
 [   19.509590] Call Trace:
 [   19.509596]  [814ee37b] dump_stack+0x54/0x8d
 [   19.509601]  [81062bcd] warn_slowpath_common+0x7d/0xa0
 [   19.509603]  [81062caa] warn_slowpath_null+0x1a/0x20
 [   19.509614]  [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs]

This should mean that bcache is either failing to read some blocks
properly or is fiddling with the bv_len/bv_offset fields.

Could someone from bcache comment?

-chris



Re: btrfs on bcache

2013-12-20 Thread Henry de Valence
On Thu, Dec 19, 2013 at 2:04 PM, Fábio Pfeifer fmpfei...@gmail.com wrote:
 Any update on this?

 I have exactly the same issue here. Kernel 3.12.5-1-ARCH, backing
 device 500 GB IDE, cache 24 GB SSD = /dev/bcache0
 On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of
 messages in dmesg:

I also have this issue.

Also, this afternoon I experienced data corruption on my btrfs device
(checksum errors), which might or might not be related. I don't really
know how to determine the cause, but if anyone has suggestions they'd
be appreciated.

Cheers,
Henry de Valence


Re: btrfs on bcache

2013-12-19 Thread Fábio Pfeifer
Any update on this?

I have exactly the same issue here. Kernel 3.12.5-1-ARCH, backing
device 500 GB IDE, cache 24 GB SSD = /dev/bcache0
On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of
messages in dmesg:

(...)
[   22.282469] BTRFS info (device bcache0): csum failed ino 56193 off 212992 csum 519977505 expected csum 3166125439
[   22.282656] incomplete page read in btrfs with offset 1024 and length 3072
[   23.370872] incomplete page read in btrfs with offset 1024 and length 3072
[   23.370890] BTRFS info (device bcache0): csum failed ino 57765 off 106496 csum 3553846164 expected csum 1299185721
[   23.505238] incomplete page read in btrfs with offset 2560 and length 1536
[   23.505256] BTRFS info (device bcache0): csum failed ino 75922 off 172032 csum 1883678196 expected csum 1337496676
[   23.508535] incomplete page read in btrfs with offset 2560 and length 1536
[   23.508547] BTRFS info (device bcache0): csum failed ino 74368 off 237568 csum 2863587994 expected csum 2693116460
[   25.683059] incomplete page read in btrfs with offset 2560 and length 1536
[   25.683078] BTRFS info (device bcache0): csum failed ino 123709 off 57344 csum 1528117893 expected csum 2239543273
[   25.684339] incomplete page read in btrfs with offset 1024 and length 3072
[   26.622384] incomplete page read in btrfs with offset 1024 and length 3072
[   26.906718] incomplete page read in btrfs with offset 2560 and length 1536
[   27.823247] incomplete page read in btrfs with offset 1024 and length 3072
[   27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed
[   27.823271] BTRFS info (device bcache0): csum failed ino 34587 off 16384 csum 1180114025 expected csum 474262911
[   28.490066] incomplete page read in btrfs with offset 2560 and length 1536
[   28.490085] BTRFS info (device bcache0): csum failed ino 65817 off 327680 csum 3065880108 expected csum 2663659117
[   29.413824] incomplete page read in btrfs with offset 1024 and length 3072
[   41.913857] incomplete page read in btrfs with offset 2560 and length 1536
[   55.761753] incomplete page read in btrfs with offset 1024 and length 3072
[   55.761771] BTRFS info (device bcache0): csum failed ino 72835 off 81920 csum 1511792656 expected csum 3733709121
[   69.636498] incomplete page read in btrfs with offset 2560 and length 1536
(...)
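One way to put numbers on a log like the one above, rather than reading it by eye, is a short Python sketch that tallies the two message types from dmesg output. The regular expressions are written against the message text quoted in this thread and are an assumption about its exact wording on other kernel versions; summarize() is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Patterns based on the kernel messages quoted in this thread.
INCOMPLETE = re.compile(
    r"incomplete page (read|write) in btrfs with offset (\d+) and length (\d+)")
CSUM_FAILED = re.compile(r"csum failed ino (\d+)")


def summarize(lines):
    """Count (kind, offset, length) triples and csum failures in log lines."""
    pairs = Counter()
    csum_failures = 0
    for line in lines:
        m = INCOMPLETE.search(line)
        if m:
            pairs[(m.group(1), int(m.group(2)), int(m.group(3)))] += 1
        elif CSUM_FAILED.search(line):
            csum_failures += 1
    return pairs, csum_failures
```

Saved as, say, tally.py, it could be fed with something like `dmesg | python3 -c "import sys, tally; print(tally.summarize(sys.stdin))"`. Note that it only counts messages; it cannot tell whether a given csum failure was caused by the incomplete reads.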

Should I be worried?

thanks,

Fabio Pfeifer

2013/12/18 eb e...@gmx.ch:
 I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

 /dev/sdb3 - cache0 (80 GB Intel SSD)
 /dev/sdc1 - backing device (2 TB WD HDD)

 sdb3+sdc1 = /dev/bcache0

 On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
 as / and /home. What's been bothering me are the following entries in
 my kernel log:

 [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
 [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

 The offset/length values are always either 1536/2560 or 3072/1024,
 they sum up nicely to 4K. There are 607 of those in there as I am
 writing this, the machine has been up 18 hours and been under no
 particular I/O strain (it's a desktop).

 Trying to fix this, I detached the cache (still using /dev/bcache0,
 but without /dev/sdb3 attached), causing these errors to disappear. As
 soon as I re-attached /dev/sdb3 they started again, so I am fairly
 sure it's an unfavorable interaction between bcache and btrfs.

 Is this something I should be worried about (they're only emitted with
 KERN_INFO?) or just an alignment problem? The underlying HDD is using
 4K-Sectors, while the block_size of bcache seems to be 512, could that
 be the issue here?

 I've also encountered incomplete reads and a few csum errors, but I
 have not been able to trigger these regularly. I have a feeling that
 the error is more likely to be on the bcache end (I've mailed to that
 list as well), however any insight into the matter would be much
 appreciated.

 Thanks,

 - eb


Re: btrfs on bcache

2013-12-19 Thread Fábio Pfeifer
Forgot to mention: bcache is in writeback mode

2013/12/19 Fábio Pfeifer fmpfei...@gmail.com:
 Any update on this?

 I have exactly the same issue here. Kernel 3.12.5-1-ARCH, backing
 device 500 GB IDE, cache 24 GB SSD = /dev/bcache0
 On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of
 messages in dmesg:

 (...)
 [   22.282469] BTRFS info (device bcache0): csum failed ino 56193 off 212992 csum 519977505 expected csum 3166125439
 [   22.282656] incomplete page read in btrfs with offset 1024 and length 3072
 [   23.370872] incomplete page read in btrfs with offset 1024 and length 3072
 [   23.370890] BTRFS info (device bcache0): csum failed ino 57765 off 106496 csum 3553846164 expected csum 1299185721
 [   23.505238] incomplete page read in btrfs with offset 2560 and length 1536
 [   23.505256] BTRFS info (device bcache0): csum failed ino 75922 off 172032 csum 1883678196 expected csum 1337496676
 [   23.508535] incomplete page read in btrfs with offset 2560 and length 1536
 [   23.508547] BTRFS info (device bcache0): csum failed ino 74368 off 237568 csum 2863587994 expected csum 2693116460
 [   25.683059] incomplete page read in btrfs with offset 2560 and length 1536
 [   25.683078] BTRFS info (device bcache0): csum failed ino 123709 off 57344 csum 1528117893 expected csum 2239543273
 [   25.684339] incomplete page read in btrfs with offset 1024 and length 3072
 [   26.622384] incomplete page read in btrfs with offset 1024 and length 3072
 [   26.906718] incomplete page read in btrfs with offset 2560 and length 1536
 [   27.823247] incomplete page read in btrfs with offset 1024 and length 3072
 [   27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed
 [   27.823271] BTRFS info (device bcache0): csum failed ino 34587 off 16384 csum 1180114025 expected csum 474262911
 [   28.490066] incomplete page read in btrfs with offset 2560 and length 1536
 [   28.490085] BTRFS info (device bcache0): csum failed ino 65817 off 327680 csum 3065880108 expected csum 2663659117
 [   29.413824] incomplete page read in btrfs with offset 1024 and length 3072
 [   41.913857] incomplete page read in btrfs with offset 2560 and length 1536
 [   55.761753] incomplete page read in btrfs with offset 1024 and length 3072
 [   55.761771] BTRFS info (device bcache0): csum failed ino 72835 off 81920 csum 1511792656 expected csum 3733709121
 [   69.636498] incomplete page read in btrfs with offset 2560 and length 1536
 (...)

 Should I be worried?

 thanks,

 Fabio Pfeifer

 2013/12/18 eb e...@gmx.ch:
 I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

 /dev/sdb3 - cache0 (80 GB Intel SSD)
 /dev/sdc1 - backing device (2 TB WD HDD)

 sdb3+sdc1 = /dev/bcache0

 On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
 as / and /home. What's been bothering me are the following entries in
 my kernel log:

 [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
 [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

 The offset/length values are always either 1536/2560 or 3072/1024,
 they sum up nicely to 4K. There are 607 of those in there as I am
 writing this, the machine has been up 18 hours and been under no
 particular I/O strain (it's a desktop).

 Trying to fix this, I detached the cache (still using /dev/bcache0,
 but without /dev/sdb3 attached), causing these errors to disappear. As
 soon as I re-attached /dev/sdb3 they started again, so I am fairly
 sure it's an unfavorable interaction between bcache and btrfs.

 Is this something I should be worried about (they're only emitted with
 KERN_INFO?) or just an alignment problem? The underlying HDD is using
 4K-Sectors, while the block_size of bcache seems to be 512, could that
 be the issue here?

 I've also encountered incomplete reads and a few csum errors, but I
 have not been able to trigger these regularly. I have a feeling that
 the error is more likely to be on the bcache end (I've mailed to that
 list as well), however any insight into the matter would be much
 appreciated.

 Thanks,

 - eb


Re: btrfs on bcache

2013-12-19 Thread Chris Mason
On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
 I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:
 
 /dev/sdb3 - cache0 (80 GB Intel SSD)
 /dev/sdc1 - backing device (2 TB WD HDD)
 
 sdb3+sdc1 = /dev/bcache0
 
 On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
 as / and /home. What's been bothering me are the following entries in
 my kernel log:
 
 [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
 [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024
 
 The offset/length values are always either 1536/2560 or 3072/1024,
 they sum up nicely to 4K. There are 607 of those in there as I am
 writing this, the machine has been up 18 hours and been under no
 particular I/O strain (it's a desktop).

Btrfs shouldn't be setting the offset on the bios.  Are you able to add
a WARN_ON to the message that prints this so we can see the stack trace?
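Chris's remark implies the message fires when a completed bvec no longer describes a whole page. A hypothetical Python model of that kind of check, assuming 4 KiB pages: it ignores an untouched full-page bvec, labels one whose offset plus length still ends exactly at the page end (the pattern in eb's log) as "incomplete" to mirror the log text, and uses "mismatch", our own name, for anything else. It is not the btrfs source.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def check_bvec(bv_offset, bv_len, page_size=PAGE_SIZE):
    """Return None for an untouched full-page bvec, otherwise a label.

    'incomplete': the bvec still ends exactly at the page end, i.e. the
                  front of the page was consumed (the case in the logs).
    'mismatch':   offset + length does not add up to one page.
    """
    if bv_offset == 0 and bv_len == page_size:
        return None
    if bv_offset + bv_len == page_size:
        return "incomplete"
    return "mismatch"


for off, ln in [(0, 4096), (1536, 2560), (3072, 1024), (0, 2048)]:
    print(off, ln, check_bvec(off, ln))
```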

Could you please cc the bcache and btrfs list together?

-chris



btrfs on bcache

2013-12-18 Thread eb
I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

/dev/sdb3 - cache0 (80 GB Intel SSD)
/dev/sdc1 - backing device (2 TB WD HDD)

sdb3+sdc1 = /dev/bcache0

On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
as / and /home. What's been bothering me are the following entries in
my kernel log:

[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

The offset/length values are always either 1536/2560 or 3072/1024,
they sum up nicely to 4K. There are 607 of those in there as I am
writing this, the machine has been up 18 hours and been under no
particular I/O strain (it's a desktop).

Trying to fix this, I detached the cache (still using /dev/bcache0,
but without /dev/sdb3 attached), causing these errors to disappear. As
soon as I re-attached /dev/sdb3 they started again, so I am fairly
sure it's an unfavorable interaction between bcache and btrfs.

Is this something I should be worried about (they're only emitted with
KERN_INFO?) or just an alignment problem? The underlying HDD is using
4K-Sectors, while the block_size of bcache seems to be 512, could that
be the issue here?
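A quick Python check of the alignment hunch, using only the numbers reported in this thread: every offset/length pair sums to one 4 KiB page and splits it on a 512-byte boundary, while none of the split points falls on a 4096-byte boundary. That is consistent with something below btrfs completing the page in 512-byte units, though the arithmetic alone does not show which layer makes the split. split_in_units() is a hypothetical helper.

```python
PAGE_SIZE = 4096     # page size on these x86_64 systems
BCACHE_BLOCK = 512   # bcache block_size reported by eb
HDD_SECTOR = 4096    # physical sector size eb reports for the backing HDD

# (offset, length) pairs from the logs in this thread
OBSERVED = [(1536, 2560), (3072, 1024), (2560, 1536), (1024, 3072)]


def split_in_units(offset, length, unit):
    """Return (units before the split, units after) if both parts are whole
    multiples of `unit`, else None."""
    if offset % unit or length % unit:
        return None
    return offset // unit, length // unit


for off, ln in OBSERVED:
    print(off, ln, off + ln,
          split_in_units(off, ln, BCACHE_BLOCK),
          split_in_units(off, ln, HDD_SECTOR))
```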

I've also encountered incomplete reads and a few csum errors, but I
have not been able to trigger these regularly. I have a feeling that
the error is more likely to be on the bcache end (I've mailed to that
list as well), however any insight into the matter would be much
appreciated.

Thanks,

- eb