Re: Recovering BTRFS from bcache failure.
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:

Bcache failures are nasty, because they leave a mix of old and new data on the disk. In this case, there was very little dirty data, but of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[ 572.573566] BTRFS info (device bcache0): enabling auto recovery
[ 572.573619] BTRFS info (device bcache0): disk space caching is enabled
[ 574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[ 574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[ 574.277008] BTRFS: failed to read tree root on bcache0
[ 574.277187] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[ 574.277356] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[ 574.277398] BTRFS: failed to read tree root on bcache0
[ 574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[ 574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
[ 574.298804] BTRFS: failed to read tree root on bcache0
[ 575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[ 575.111495] BTRFS (device bcache0): parent transid verify failed on 7567954464768 wanted 613688 found 613685
[ 575.111559] BTRFS: failed to read tree root on bcache0
[ 575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[ 575.131803] BTRFS (device bcache0): parent transid verify failed on 7567954214912 wanted 613687 found 613680
[ 575.131866] BTRFS: failed to read tree root on bcache0
[ 575.180101] BTRFS: open_ctree failed

All the btrfs tools throw up their hands with similar errors:

fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super

fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0 --init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:

# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.

CCing some more people on this one. While this filesystem isn't important, I'd like to know that restoring from backup isn't the only option for BTRFS corruption. All of the tools simply throw up their hands and bail when confronted with this filesystem, even btrfs-image.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
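The "parent transid verify failed on BYTENR wanted W found F" lines are the core of this report: a parent node records the generation (transid) it expects the child tree block at BYTENR to carry, and the block actually read from disk carries a different one. A found value below wanted means the block on disk is stale; a value above wanted means it is newer than the parent expects. The Python sketch below (the names TransidMismatch and parse_mismatches are hypothetical, not part of btrfs-progs) pulls those triples out of a kernel or btrfs-progs log and reports how far apart the generations are. In the log above, the wanted generation drops by one on each attempt (613690, 613689, 613688, 613687), which appears consistent with the kernel stepping back through older backup roots after "enabling auto recovery"; that is an interpretation, not something the log states outright.

```python
import re
from typing import List, NamedTuple

# Matches e.g. "parent transid verify failed on 7567956930560 wanted 613690 found 613681"
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


class TransidMismatch(NamedTuple):
    bytenr: int  # logical address of the tree block
    wanted: int  # generation the parent pointer expects
    found: int   # generation stored in the block that was actually read

    @property
    def delta(self) -> int:
        """Negative: the block on disk is older (stale); positive: it is newer."""
        return self.found - self.wanted


def parse_mismatches(log_text: str) -> List[TransidMismatch]:
    """Return each distinct (bytenr, wanted, found) triple, in first-seen order."""
    unique = {}
    for match in TRANSID_RE.finditer(log_text):
        record = TransidMismatch(*(int(g) for g in match.groups()))
        unique.setdefault(record, None)  # dicts preserve insertion order
    return list(unique)


if __name__ == "__main__":
    sample = """\
[ 574.266055] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[ 574.276952] BTRFS (device bcache0): parent transid verify failed on 7567956930560 wanted 613690 found 613681
[ 574.285955] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 613694
[ 574.298741] BTRFS (device bcache0): parent transid verify failed on 7567965720576 wanted 613689 found 610499
"""
    for rec in parse_mismatches(sample):
        print(rec.bytenr, rec.wanted, rec.found, rec.delta)
```

On that sample the duplicate line collapses, leaving three mismatches with deltas of -9, 5 and -3190: the block at 7567956930560 is nine generations stale, while 7567965720576 turns up once newer and once much older than expected, which fits the "mix of old and new data" described above.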
Re: Recovering BTRFS from bcache failure.
It's a known bug with bcache when discard is enabled: it was discarding sections containing data it still wanted. After a reboot, bcache refused to accept the cache data, and of course it was dirty because I'm frankly too stupid to breathe sometimes. So yes, it's a bcache issue, but that's unresolvable. I'm trying to rescue the btrfs data that it trashed.

On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote:
Hello, I had some luck in the past with btrfs restore using the -r option. I don't recall how I determined the roots... Maybe I tried random numbers? I was able to recover nearly all of my data from a bcache-related crash over a year ago. What kind of bcache failure did you see? I've been doing some testing recently and ran into 2 bcache failures. With both of these failures, I had a 'bad btree header at bucket' error message (which is entirely different from the crash I had over a year back). I'm currently trying a different SSD to see if that alleviates the issue. The error makes me think that it's a bcache-specific issue that's unrelated to btrfs, or possibly (in my case) an issue with the previous SSD. Did you encounter this same error? With my 2 most recent crashes, I didn't try to recover very hard (or even try 'btrfs recover' at all), as I've been taking daily backups. I did try btrfsck, and not only would it fail, it would segfault. -Cameron

On 04/08/2015 11:07 AM, Dan Merillat wrote:
Any ideas on where to start with this? I did flush the cache out to disk before I made changes to the bcache configuration, so there shouldn't be anything completely missing, just some bits of stale metadata. If I can get the tools to take the closest match and run with it, it would probably recover nearly everything. At worst, is there a way to scan the metadata blocks and rebuild from found extent-trees?
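Cameron's -r approach and Dan's wish to "take the closest match and run with it" both amount to trying candidate roots one at a time and seeing which one yields the most files. A minimal Python sketch, assuming you already have a list of candidate tree-root byte numbers (for example from btrfs-find-root, whose output format varies between versions, so it is not parsed here). It builds one dry-run "btrfs restore -t <bytenr> -D -v -i <device> <dest>" per candidate. The -t (tree root location), -D (dry run), -v (verbose) and -i (ignore errors) options appear in the btrfs-restore manual, but check "btrfs restore --help" on your btrfs-progs version before relying on them. The helper names restore_attempts and run_attempts, and the "more output lines means a better candidate" heuristic, are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def restore_attempts(device: str, candidate_bytenrs: Iterable[int], dest: str) -> List[List[str]]:
    """Build one dry-run 'btrfs restore' command per candidate tree-root bytenr."""
    return [
        # -t: read the root tree from this byte number
        # -D: dry run, nothing is written to dest
        # -v: print each path that would be restored
        # -i: keep going past errors
        ["btrfs", "restore", "-t", str(bytenr), "-D", "-v", "-i", device, dest]
        for bytenr in candidate_bytenrs
    ]


def run_attempts(
    commands: List[List[str]],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[Tuple[List[str], int, int]]:
    """Run each command; return (command, exit status, number of stdout lines)."""
    results = []
    for cmd in commands:
        proc = runner(cmd, capture_output=True, text=True)
        listed = proc.stdout.count("\n") if proc.stdout else 0
        results.append((cmd, proc.returncode, listed))
    return results
```

Usage, as root and with the candidate list filled in by hand: "for cmd, rc, n in run_attempts(restore_attempts('/dev/bcache0', candidates, '/mnt/rescue')): print(rc, n, ' '.join(cmd))". The runner parameter only exists so the loop can be exercised without a real device; the candidate that lists the most paths is the one worth a real, non-dry-run restore to a separate disk.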
Re: Recovering BTRFS from bcache failure.
Sorry I pressed send before I finished my thoughts. btrfs restore gets nowhere with any options. btrfs-recover says the superblocks are fine, and chunk recover does nothing after a few hours of reading. Everything else bails out with the errors I listed above.

On Wed, Apr 8, 2015 at 2:36 PM, Dan Merillat dan.meril...@gmail.com wrote: It's a known bug with bcache and enabling discard, it was discarding sections containing data it wanted.
Re: Recovering BTRFS from bcache failure.
on the first post. Not sure what you have tried so far (besides btrfs restore). If you don't have time, this is all you can try now. If your data is valuable, well, you have to wait for the experts here. But the general opinion here is: if you don't have backups, your data is not valuable by definition, especially when using an immature fs, or an even more experimental setup like bcache. ;-)

PS: btrfs-zero-log is usually my personal first resort for problems not fixable within a few tries. I have nightly backups to restore from and to compare data against after zero-log. That's why I started my answer with it.

-- Replies to list only preferred.
Re: Recovering BTRFS from bcache failure.
Any ideas on where to start with this? I did flush the cache out to disk before I made changes to the bcache configuration, so there shouldn't be anything completely missing, just some bits of stale metadata. If I can get the tools to take the closest match and run with it, it would probably recover nearly everything. At worst, is there a way to scan the metadata blocks and rebuild from found extent-trees?
Suggestion on reducing short kernel hangs from my btrfs filesystems: bcache?
I have a server which runs zoneminder (video recording, which is CPU- and disk-IO-intensive) while also doing a bunch of I/O over serial ports. I have a dual-core Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz (4 virtual CPUs in /proc/cpuinfo).

It's pretty clear that when zoneminder is doing more work, my programs that talk to serial ports start failing due to delays on the kernel side and desynchronization, causing serial port protocol errors (I'm using USB serial adapters, 12 of them). I'm pretty sure it's because of delays in the kernel rather than in user space, but I can't prove that easily.

I have a preempt kernel, 3.16.3:
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y

From what I can tell, things did get worse after I switched from ext4 to btrfs (not counting times when I resync the software RAID5 underneath or run a btrfs scrub). I may try to see if voluntary preemption (CONFIG_PREEMPT_VOLUNTARY) works better, but I'm thinking that putting an SSD in front of that mdadm RAID5 array will help by relieving the IO load and hopefully giving the CPU more time to handle serial port requests.

I'm actually not sure whether my issue is btrfs interrupting serial port connections due to PREEMPT, or serial port connections not being serviced quickly enough because the kernel is busy with btrfs and PREEMPT hasn't kicked in yet.

From reading the list, bcache may work with btrfs, but before I try that, I was curious whether there are other or better ways to use an SSD to reduce btrfs's impact on my server?

Thanks,
Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
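Marc says he can't easily prove the delays are on the kernel side. One rough way to gather evidence is to measure how late a process wakes up from a short sleep while zoneminder is busy versus idle. The standard tool for this is cyclictest from the rt-tests suite; the Python sketch below (measure_wakeup_latency is a hypothetical name) is only a crude stand-in, since the interpreter adds its own jitter, but large spikes that line up with zoneminder or btrfs activity would suggest scheduling latency rather than a bug in the serial-port programs.

```python
import statistics
import time


def measure_wakeup_latency(interval_s: float = 0.001, samples: int = 1000) -> dict:
    """Sleep for interval_s repeatedly and record how late each wakeup is, in microseconds."""
    overshoots_us = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is timer/scheduling delay
        overshoots_us.append(max(0.0, (elapsed - interval_s) * 1e6))
    ordered = sorted(overshoots_us)
    return {
        "median_us": statistics.median(ordered),
        "p99_us": ordered[int(0.99 * (len(ordered) - 1))],
        "max_us": ordered[-1],
    }


if __name__ == "__main__":
    print(measure_wakeup_latency())
```

Running it once with zoneminder idle and once under load, and comparing max_us and p99_us, gives a cheap before/after number to attach to a PREEMPT versus PREEMPT_VOLUNTARY comparison or to an SSD-cache experiment.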
Re: btrfs on bcache
Hi, has this issue been resolved? I would like to use the bcache + btrfs combo. Thanks
Re: btrfs on bcache
After completely losing my filesystem twice because of this bug, I gave up using btrfs on top of bcache (also writeback). In my case, I used to have some subvolumes and some snapshots of these subvolumes, but not many of them. The btrfs mantra "backup, backup and backup" saved me.

Best regards,
Fábio Pfeifer

2014-07-30 20:01 GMT-03:00 Larkin Lowrey llow...@nuclearwinter.com: I've been running two backup servers, with 25T and 20T of data, using btrfs on bcache (writeback) for about 7 months.
btrfs on bcache
Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does this bug still exist?

Kernel 3.14
B (backing): 2x HDD 1 TB
C (cache): 1x SSD 256 GB

# make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
# mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1

So far I have no "incomplete page write" messages in dmesg | grep btrfs, and the checksums of some manually reviewed files are okay. Who has more experience with this?

Thanks, - dp
Re: btrfs on bcache
dptrash posted on Thu, 31 Jul 2014 17:35:44 +0200 as excerpted: Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does this bug still exist?

See the reply (not mine) to your earlier post of the question: http://permalink.gmane.org/gmane.linux.kernel.bcache.devel/2602

--
Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: btrfs on bcache
I've been running two backup servers, with 25T and 20T of data, using btrfs on bcache (writeback) for about 7 months. I periodically run btrfs scrubs and backup verifications (SHA1 hashes) and have never had a corruption issue. My use of btrfs is simple, though, with no subvolumes and no btrfs-level RAID. My bcache backing devices are LVM volumes that span multiple md RAID6 arrays. So either the bug has been fixed or my configuration is not susceptible. I'm running kernel 3.15.5-200.fc20.x86_64. --Larkin

On 7/30/2014 5:04 PM, dptr...@arcor.de wrote: Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does this bug still exist?
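Larkin doesn't describe how his SHA1 backup verification works. A btrfs scrub checks data against btrfs's own checksums, so it cannot catch a file that was written "consistently" but with the wrong contents; an independent hash manifest taken from a known-good copy can. A minimal Python sketch of that idea, assuming a simple relative-path to SHA1 manifest (build_manifest and verify_manifest are hypothetical names, not Larkin's tooling):

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA1 of a file, read in chunks so large backups don't need to fit in RAM."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each regular file under root (as a relative path) to its SHA1."""
    return {
        str(p.relative_to(root)): sha1_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose SHA1 no longer matches."""
    mismatched = []
    for rel_path, expected in manifest.items():
        candidate = root / rel_path
        if not candidate.is_file() or sha1_of(candidate) != expected:
            mismatched.append(rel_path)
    return mismatched
```

The manifest would be built on the source (or right after a backup completes) and verified later against the copy on btrfs-on-bcache; an empty list from verify_manifest is the "never had a corruption issue" result.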
Re: btrfs on bcache
On 2014-04-30 14:16, Felix Homann wrote: [...] Should it be safe to use btrfs on bcache by now? In all practicality, I don't think anyone who frequents the list knows. I do know that there are a number of people (myself included) who avoid bcache in general because of having issues with seemingly random kernel OOPSes when it is linked in (either as a module or compiled in), even when it isn't being used. My advice would be to just test it with some non-essential data (maybe set up a virtual machine?).
btrfs on bcache
Hi, a couple of months ago there was some discussion about issues when using btrfs on bcache (http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018). From looking at the mailing list archives I cannot tell whether or not this issue has been resolved in current kernels from either bcache's or btrfs' side. Can anyone tell me what the current state of this issue is? Should it be safe to use btrfs on bcache by now? Thanks and kind regards, Felix
Re: btrfs on bcache
On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote: On Fri, Dec 20, 2013 at 03:46:30PM +, Chris Mason wrote: [...] This should mean that bcache is either failing to read some blocks properly or is fiddling with the bv_len/bv_offset fields. Could someone from bcache comment? Oh man, I found this and then threw up my hands in despair. [...] Ok, I looked again at the relevant btrfs code, I guess I can see how this printk isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check for here? And why is it using bv_offset and bv_len further down in end_bio_extent_readpage()? After the IO is done, we're recording the specific logical byte range that covered the IO. In practice it's always the full page; we can switch to just trusting PAGE_CACHE_SIZE. -chris
Re: btrfs on bcache
On Wed, Jan 08, 2014 at 07:35:32PM +, Chris Mason wrote: On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote: Ok, I looked again at the relevant btrfs code, I guess I can see how this printk isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check for here? And why is it using bv_offset and bv_len further down in end_bio_extent_readpage()? After the IO is done, we're recording the specific logical byte range that covered the IO. In practice it's always the full page; we can switch to just trusting PAGE_CACHE_SIZE. Yeah, the code already assumes it was doing PAGE_CACHE_SIZE reads; what you're effectively checking is that the driver did the bvec all at once, and that it didn't process half a bvec, update it, then process the rest - which is a completely fine thing to do. So for now - yeah, the correct thing to do is to just ignore bv_offset/bv_len and go by PAGE_CACHE_SIZE. But - after immutable biovecs is in, _then_ you'll be able to depend on bv_offset/bv_len remaining unchanged (and you can get rid of your dependency on PAGE_CACHE_SIZE bvecs).
Re: btrfs on bcache
On Fri, Dec 20, 2013 at 03:46:30PM +, Chris Mason wrote: On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote: Hello, I put the WARN_ON(1); after the printk lines (incomplete page read and incomplete page write) in extent_io.c. here some call traces: [ 19.509497] incomplete page read in btrfs with offset 2560 and length 1536 [...] This should mean that bcache is either failing to read some blocks properly or is fiddling with the bv_len/bv_offset fields. Could someone from bcache comment? Oh man, I found this and then threw up my hands in despair.
Bcache isn't doing anything with the bv_len/bv_offset fields; it may clone the biovec so it can retry a bio on error if the biovecs weren't all whole pages; otherwise it just passes the biovec down with the next bio to the underlying cache/backing device. What btrfs appears to be doing though - I couldn't believe that code actually _worked_, Jens please jump in here, but AFAIK bv_len/bv_offset are in practice undefined after a bio has completed. They might have been updated if the driver was using blk_update_request, but many drivers that just process the entire bio all at once won't touch those fields - and that includes anything that clones the bio (md/dm). This is probably relevant to immutable biovecs here... - Ok, I looked again at the relevant btrfs code, I guess I can see how this printk isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check for here? And why is it using bv_offset and bv_len further down in end_bio_extent_readpage()?
Re: btrfs on bcache
(resent in text-only format) Some more information about this issue. I installed my system last November (Arch x86_64), with kernel 3.11. At that time I didn't see any csum errors or incomplete page read errors. Some time later these errors started to show up. I don't know exactly whether it was in the 3.11 - 3.12 upgrade or somewhere in the 3.12 cycle. I've been using bcache in writeback mode from the beginning. I did some more testing: - tried bcache in writethrough, writearound and none modes; - tried Linux kernel 3.13-rc5. The errors didn't go away (maybe because my filesystem is already corrupted). I didn't have time to test with kernel 3.11 again. But lately the errors increased, and they started to make my system unstable, and then unusable. I had to reformat everything and restore my backups. I don't have my / and /home on btrfs over bcache anymore, but I can run some tests on a spare HD and SSD I have here. I'll report back after Christmas. thanks, Fabio 2013/12/20 Chris Mason c...@fb.com: On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote: Hello, I put the WARN_ON(1); after the printk lines (incomplete page read and incomplete page write) in extent_io.c. here some call traces: [ 19.509497] incomplete page read in btrfs with offset 2560 and length 1536 [...] This should mean that bcache is either failing to read some blocks properly or is fiddling with the bv_len/bv_offset fields. Could someone from bcache comment? -chris
Re: btrfs on bcache
On Thu, Dec 19, 2013 at 8:59 PM, Chris Mason c...@fb.com wrote: On Wed, 2013-12-18 at 18:17 +0100, eb wrote: Btrfs shouldn't be setting the offset on the bios. Are you able to add a WARN_ON to the message that prints this so we can see the stack trace? If you send me a patch - my experience hacking on the kernel is exactly 0 - I'll try to see if I can compile a custom kernel and get it running. Could you please cc the bcache and btrfs list together? Done. I did some more testing - I copied an image of a 128GB drive over the network (via netcat) onto the bcache/btrfs system and verified the results twice using sha1sum. The hashes are identical on the source system (which is *not* using bcache) and on the bcache/btrfs setup. I've gotten a lot of the incomplete write errors and a few csum errors in dmesg, but apparently they haven't done any harm? Not sure how remarkable this is, as large sequential writes like this are supposed to bypass the cache anyway, but I assume they still have to go through the subsystem.
Re: btrfs on bcache
] ? kthread_create_on_node+0x120/0x120 [ 25.592360] ---[ end trace bbc8d0d088375447 ]--- thanks, Fabio Pfeifer 2013/12/19 Chris Mason c...@fb.com: On Wed, 2013-12-18 at 18:17 +0100, eb wrote: I've recently setup a system (Kernel 3.12.5-1-ARCH) [...] Btrfs shouldn't be setting the offset on the bios. Are you able to add a WARN_ON to the message that prints this so we can see the stack trace? Could you please cc the bcache and btrfs list together? -chris
Re: btrfs on bcache
On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote: Hello, I put the WARN_ON(1); after the printk lines (incomplete page read and incomplete page write) in extent_io.c. here some call traces: [ 19.509497] incomplete page read in btrfs with offset 2560 and length 1536 [ 19.509500] [ cut here ] [ 19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441 end_bio_extent_readpage+0x788/0xc20 [btrfs]() [ 19.509530] Modules linked in: cdc_acm fuse iTCO_wdt iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1 ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache [ 19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P W O 3.12.5-1-ARCH #1 [ 19.509580] Hardware name: System manufacturer System Product Name/P5WDG2 WS Pro, BIOS 090503/06/2008 [ 19.509581] 0009 880231a63cb0 814ee37b [ 19.509585] 880231a63ce8 81062bcd ea00085eaec0 [ 19.509587] 8802320cc9c0 880233b0e000 880231a63cf8 [ 19.509590] Call Trace: [ 19.509596] [814ee37b] dump_stack+0x54/0x8d [ 19.509601] [81062bcd] warn_slowpath_common+0x7d/0xa0 [ 19.509603] [81062caa] warn_slowpath_null+0x1a/0x20 [ 19.509614] [a00b7ba8] end_bio_extent_readpage+0x788/0xc20 [btrfs] This should mean that bcache is either failing to read some blocks properly or is fiddling with the bv_len/bv_offset fields. Could someone from bcache comment? 
-chris
Re: btrfs on bcache
On Thu, Dec 19, 2013 at 2:04 PM, Fábio Pfeifer fmpfei...@gmail.com wrote: Any update on this? I have here exactly the same issue. Kernel 3.12.5-1-ARCH, backing device 500 GB IDE, cache 24 GB SSD = /dev/bcache0 On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of messages in dmesg: I also have this issue. Also, this afternoon I experienced data corruption on my btrfs device (checksum errors), which might or might not be related. I don't really know how to determine the cause, but if anyone has suggestions they'd be appreciated. Cheers, Henry de Valence
Re: btrfs on bcache
Any update on this? I have exactly the same issue here. Kernel 3.12.5-1-ARCH, backing device 500 GB IDE, cache 24 GB SSD = /dev/bcache0. On /dev/bcache0 I also have 2 subvolumes, / and /home. I get lots of messages in dmesg: (...) [ 22.282469] BTRFS info (device bcache0): csum failed ino 56193 off 212992 csum 519977505 expected csum 3166125439 [ 22.282656] incomplete page read in btrfs with offset 1024 and length 3072 [ 23.370872] incomplete page read in btrfs with offset 1024 and length 3072 [ 23.370890] BTRFS info (device bcache0): csum failed ino 57765 off 106496 csum 3553846164 expected csum 1299185721 [ 23.505238] incomplete page read in btrfs with offset 2560 and length 1536 [ 23.505256] BTRFS info (device bcache0): csum failed ino 75922 off 172032 csum 1883678196 expected csum 1337496676 [ 23.508535] incomplete page read in btrfs with offset 2560 and length 1536 [ 23.508547] BTRFS info (device bcache0): csum failed ino 74368 off 237568 csum 2863587994 expected csum 2693116460 [ 25.683059] incomplete page read in btrfs with offset 2560 and length 1536 [ 25.683078] BTRFS info (device bcache0): csum failed ino 123709 off 57344 csum 1528117893 expected csum 2239543273 [ 25.684339] incomplete page read in btrfs with offset 1024 and length 3072 [ 26.622384] incomplete page read in btrfs with offset 1024 and length 3072 [ 26.906718] incomplete page read in btrfs with offset 2560 and length 1536 [ 27.823247] incomplete page read in btrfs with offset 1024 and length 3072 [ 27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed [ 27.823271] BTRFS info (device bcache0): csum failed ino 34587 off 16384 csum 1180114025 expected csum 474262911 [ 28.490066] incomplete page read in btrfs with offset 2560 and length 1536 [ 28.490085] BTRFS info (device bcache0): csum failed ino 65817 off 327680 csum 3065880108 expected csum 2663659117 [ 29.413824] incomplete page read in btrfs with offset 1024 and length 3072 [ 41.913857] incomplete page read in btrfs with offset 2560 and length 1536 [ 55.761753] incomplete page read in btrfs with offset 1024 and length 3072 [ 55.761771] BTRFS info (device bcache0): csum failed ino 72835 off 81920 csum 1511792656 expected csum 3733709121 [ 69.636498] incomplete page read in btrfs with offset 2560 and length 1536 (...) Should I be worried? thanks, Fabio Pfeifer 2013/12/18 eb e...@gmx.ch: I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as follows: [...]
Re: btrfs on bcache
Forgot to mention: bcache is in writeback mode 2013/12/19 Fábio Pfeifer fmpfei...@gmail.com: Any update on this? I have here exactly the same issue. Kernel 3.12.5-1-ARCH, backing device 500 GB IDE, cache 24 GB SSD = /dev/bcache0 [...]
Re: btrfs on bcache
On Wed, 2013-12-18 at 18:17 +0100, eb wrote: I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as follows: /dev/sdb3 - cache0 (80 GB Intel SSD) /dev/sdc1 - backing device (2 TB WD HDD) sdb3+sdc1 = /dev/bcache0 On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted as / and /home. What's been bothering me are the following entries in my kernel log: [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560 [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024 The offset/length values are always either 1536/2560 or 3072/1024, they sum up nicely to 4K. There are 607 of those in there as I am writing this, the machine has been up 18 hours and been under no particular I/O strain (it's a desktop). Btrfs shouldn't be setting the offset on the bios. Are you able to add a WARN_ON to the message that prints this so we can see the stack trace? Could you please cc the bcache and btrfs list together? -chris
btrfs on bcache
I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as follows: /dev/sdb3 - cache0 (80 GB Intel SSD); /dev/sdc1 - backing device (2 TB WD HDD); sdb3+sdc1 = /dev/bcache0. On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted as / and /home. What's been bothering me are the following entries in my kernel log: [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560 [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024 The offset/length values are always either 1536/2560 or 3072/1024; they sum up nicely to 4K. There are 607 of those in there as I am writing this; the machine has been up 18 hours and has been under no particular I/O strain (it's a desktop). Trying to fix this, I detached the cache (still using /dev/bcache0, but without /dev/sdb3 attached), which made these errors disappear. As soon as I re-attached /dev/sdb3 they started again, so I am fairly sure it's an unfavorable interaction between bcache and btrfs. Is this something I should be worried about (they're only emitted with KERN_INFO?) or just an alignment problem? The underlying HDD uses 4K sectors, while the block_size of bcache seems to be 512; could that be the issue here? I've also encountered incomplete reads and a few csum errors, but I have not been able to trigger these regularly. I have a feeling that the error is more likely to be on the bcache end (I've mailed that list as well); however, any insight into the matter would be much appreciated. Thanks, - eb