btrfs-zero-log fs_image to developers
Hello! As suggested on this page, https://btrfs.wiki.kernel.org/index.php/Btrfs-zero-log, I am sending the output of

# btrfs-image -c 9 -t 8 /dev/mapper/root /tmp/fs_image

to the developers. File link: https://onedrive.live.com/redir?resid=31ECDF5D805029B7!403759authkey=!AMWoAxWRnpVkJY4ithint=file%2c

Additional information:

[lm@mothership ~]$ uname -a
Linux mothership 3.19.2-1-ARCH #1 SMP PREEMPT Wed Mar 18 16:21:02 CET 2015 x86_64 GNU/Linux
[lm@mothership ~]$ btrfs --version
btrfs-progs v3.19
[lm@mothership ~]$ btrfs fi show
ERROR: could not open /dev/sdb3
ERROR: could not open /dev/sdb4
btrfs-progs v3.19
[lm@mothership ~]$ btrfs fi df /
Data, single: total=18.01GiB, used=16.25GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=448.55MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=160.00MiB, used=0.00B

P.S. After a hard reset, the root partition did not mount, failing with a "BTRFS: open_ctree failed" error.

P.P.S. Partition /dev/sdb3 has been fixed by: # btrfs-zero-log /dev/sdb3

--
Best regards,
Mikhail Lemhzhin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
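For anyone filing a similar report, the system details above can be gathered in one go with a small helper along these lines (a sketch: the output path /tmp/btrfs-report.txt is an arbitrary choice, and the script only assumes coreutils, checking for btrfs-progs rather than assuming it is installed):

```shell
#!/bin/sh
# Collect the basic details a btrfs bug report usually needs.
# /tmp/btrfs-report.txt is a hypothetical path chosen for this sketch.
{
    uname -a
    if command -v btrfs >/dev/null 2>&1; then
        btrfs --version
    else
        echo "btrfs-progs not installed"
    fi
} | tee /tmp/btrfs-report.txt
```

Attaching the resulting file alongside the btrfs-image dump saves a round trip with the developers.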
Re: [PATCH v2] btrfs-progs: Doc: Add warning and note on btrfs-convert.
-------- Original Message --------
Subject: Re: [PATCH v2] btrfs-progs: Doc: Add warning and note on btrfs-convert.
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年04月02日 23:19

On Thu, Mar 26, 2015 at 10:19:24AM +0800, Qu Wenruo wrote:

+WARNING: To ensure *btrfs-convert* be able to rollback btrfs, one should never
+execute *btrfs filesystem defragment* or *btrfs balance* command on the
+converted btrfs.

So it looks like a fundamental problem, not a lack of implementation. The original filesystem has some correspondence between physical blocks (a 1:1 match in ext) and btrfs blocks (where the mapping is not 1:1, though at the beginning physical matches logical). Once we balance data, the chunks get moved and the original physical offset is lost. We'd have to remember that somewhere and restore it upon rollback.

I don't see now why defrag is harmful to rollback. The defragmented data are written to the ext free space, i.e. where all new modifications get written. The old data are pinned by the ext2_saved subvolume and can be restored. Or not? Oh, I forgot ext*_image is read-only, so defrag should be OK.

I'll remove defrag from the warning.

BTW, although we use a 1:1 physical bytenr, and if the extent is moved we lose its physical bytenr, we still have its offset in ext*_image, and its logical file offset is the same as its original physical bytenr. So, why not use the file offset as the physical bytenr to do the rollback? That should allow btrfs-convert to roll back even after a balance.

Thanks,
Qu
[PATCH v3] btrfs-progs: Doc: Add warning and note on btrfs-convert.
Although btrfs-convert can roll back a converted btrfs, it still has some limitations on when rollback is possible. Add a warning on the limitations.

Also add a note for users who decide to go on with btrfs and don't need the rollback ability.

Reported-by: Vytautas D vyt...@gmail.com
Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
v2: Add reporter tags
v3: Remove 'btrfs fi defrag' from commands that may disable rollback ability.
    Add a little explanation to make the warning more reasonable.
---
 Documentation/btrfs-convert.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/btrfs-convert.txt b/Documentation/btrfs-convert.txt
index 8b3f05b..b57af73 100644
--- a/Documentation/btrfs-convert.txt
+++ b/Documentation/btrfs-convert.txt
@@ -15,6 +15,16 @@ DESCRIPTION
 and the original filesystem image is accessible as from separate subvolume
 named 'ext2_saved' as file image.
 
+WARNING: If one hopes to roll back to ext2/3/4, he or she should not execute
+*btrfs balance* command on the converted btrfs, since it will change the
+extent layout and make *btrfs-convert* unable to rollback.
+
+NOTE: If one is satisfied with the converted btrfs, and no longer wants to
+roll back to ext2/3/4, it is highly recommended to remove the 'ext2_saved'
+subvolume and execute *btrfs filesystem defragment* and *btrfs balance*
+command on the converted btrfs.
+
 OPTIONS
 ---
 -d|--no-datasum::
-- 
2.3.5
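For readers who take the NOTE's advice, the sequence might look like the sketch below. The mount point is a placeholder of my choosing, and the guard makes the script a harmless no-op when no converted filesystem is present; the three btrfs subcommands are the ones named in the patch text.

```shell
#!/bin/sh
# Cleanup after deciding to keep a converted btrfs (sketch only).
# MNT is a hypothetical mount point for the converted filesystem.
MNT="${MNT:-/mnt/converted}"
if [ -d "$MNT/ext2_saved" ]; then
    btrfs subvolume delete "$MNT/ext2_saved"   # drop the rollback image
    btrfs filesystem defragment -r "$MNT"      # defragment all files
    btrfs balance start "$MNT"                 # rewrite data into native chunks
else
    echo "no ext2_saved subvolume under $MNT; nothing to do"
fi
```

Once ext2_saved is deleted, rollback is permanently impossible, which is exactly why the patch orders the note the way it does.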
Re: [PATCH 2/2] btrfs-progs: convert-test: Add test for converting ext* with regular file extent.
-------- Original Message --------
Subject: Re: [PATCH 2/2] btrfs-progs: convert-test: Add test for converting ext* with regular file extent.
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年04月02日 23:45

On Thu, Apr 02, 2015 at 10:21:36AM +0800, Qu Wenruo wrote:

Before the previous patch, btrfs-convert would make fsck complain if there was any regular file extent in the newly converted btrfs. Add a test case for it.

Please separate the changes that update generic code and the test itself.

OK, I'll update it soon.

+script_dir=$(dirname $(realpath $0))
+top=$(realpath $script_dir/../)

Please use upper case names.

Some fsck-tests use lower case names, I'll update them too.

Thanks,
Qu

+TEST_DEV=${TEST_DEV:-}
+TEST_MNT=${TEST_MNT:-$top/tests/mnt}
+RESULT=$top/tests/convert-tests-results.txt

RESULTS

+IMAGE=$script_dir/test.img
-_fail()
-{
-	echo "$*" | tee -a convert-tests-results.txt
-	exit 1
-}
+source $top/tests/common
+export top
+export RESULT
+# For comprehensive convert test which needs write something into ext*
+export TEST_MNT
+export LANG
+
+rm -f $RESULT
+mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
+
+# tests rely on btrfs-convert
+check_prereq btrfs-convert
+check_prereq btrfs
-rm -f convert-tests-results.txt
-test(){
+convert_test(){
 	echo "    [TEST]   $1"
 	nodesize=$2
 	shift 2
-	echo "creating ext image with: $*" >> convert-tests-results.txt
+	echo "creating ext image with: $*" >> $RESULT
 	# 256MB is the smallest acceptable btrfs image.
-	rm -f $here/test.img >> convert-tests-results.txt 2>&1 \
+	rm -f $IMAGE >> $RESULT 2>&1 \
 		|| _fail "could not remove test image file"
-	truncate -s 256M $here/test.img >> convert-tests-results.txt 2>&1 \
+	truncate -s 256M $IMAGE >> $RESULT 2>&1 \
 		|| _fail "could not create test image file"
-	$* -F $here/test.img >> convert-tests-results.txt 2>&1 \
+	$* -F $IMAGE >> $RESULT 2>&1 \
 		|| _fail "filesystem create failed"
-	$here/btrfs-convert -N $nodesize $here/test.img \
-		>> convert-tests-results.txt 2>&1 \
+
+	# write a file with regular file extent
+	$SUDO_HELPER mount $IMAGE $TEST_MNT
+	$SUDO_HELPER dd if=/dev/zero bs=$nodesize count=4 of=$TEST_MNT/test \
+		1>/dev/null 2>&1
+	$SUDO_HELPER umount $TEST_MNT
+
+	# do convert test
+	$top/btrfs-convert -N $nodesize $script_dir/test.img \

$IMAGE instead of $script_dir/test.img

+		>> $RESULT 2>&1 \
 		|| _fail "btrfs-convert failed"
-	$here/btrfs check $here/test.img >> convert-tests-results.txt 2>&1 \

same here

+	$top/btrfs check $script_dir/test.img >> $RESULT 2>&1 \

and here

 		|| _fail "btrfs check detected errors"

Thanks.
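Stripped of the xfstests-style helpers, the flow under review boils down to a few steps that can be exercised directly on a file image, with no root needed once the mount/dd step is skipped. This is a standalone sketch, not the test itself; it checks for the tools instead of assuming they are installed.

```shell
#!/bin/sh
# Sketch of the convert test flow on a plain file image (no mount step).
IMG="$(mktemp)"
truncate -s 256M "$IMG"    # 256MB is the smallest acceptable btrfs image
if command -v mkfs.ext4 >/dev/null 2>&1 && command -v btrfs-convert >/dev/null 2>&1; then
    mkfs.ext4 -F "$IMG" >/dev/null 2>&1 || exit 1
    # convert in place, then verify the result is a consistent btrfs
    btrfs-convert "$IMG" && btrfs check "$IMG"
else
    echo "skipping: mkfs.ext4 or btrfs-convert not installed"
fi
rm -f "$IMG"
```

Writing a regular file extent first, as the patch does, requires the loopback mount and hence root; everything else works on the bare image.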
Re: Is this normal? Should I use scrub?
On Thu, Apr 02, 2015 at 09:58:39AM +0000, Andy Smith wrote:

Hi Hugo, thanks for your help.

Makes a change from you answering my questions. :)

On Wed, Apr 01, 2015 at 03:42:02PM +0000, Hugo Mills wrote:
On Wed, Apr 01, 2015 at 03:11:14PM +0000, Andy Smith wrote:

Should I run a scrub as well?

Yes. The output you've had so far will be just the pieces that the FS has tried to read and where, as a result, it's been able to detect the out-of-date data. A scrub will check and fix everything.

Thanks, things seem to be fine now. :)

What's the difference between verify and csum here?

verify would be where the internal consistency checks for metadata failed. That might be, for example, where it's detected that a tree node has a newer transaction ID (effectively a monotonic timestamp) than its parent. This should never happen, so the parent is probably out of date. If there's another copy of the metadata that doesn't have the same problem, it can be used to repair the obviously-wrong copy.

csum is where the checksum validation failed -- this would be, for example, where some data was modified on one copy and left unchanged on the older copy, but the metadata for both copies was updated. In that case, the data on the out-of-date drive wouldn't match the checksum and needs to be updated from the good copy.

Hugo.
scrub status for 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
	scrub device /dev/sdh (id 2) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 14642 seconds
		total bytes scrubbed: 383.42GiB with 0 errors
	scrub device /dev/sdg (id 3) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 14504 seconds
		total bytes scrubbed: 382.62GiB with 0 errors
	scrub device /dev/sdf (id 4) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 14436 seconds
		total bytes scrubbed: 383.00GiB with 0 errors
	scrub device /dev/sdk (id 5) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 21156 seconds
		total bytes scrubbed: 1.13TiB with 14530 errors
		error details: verify=10909 csum=3621
		corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
	scrub device /dev/sdj (id 6) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 5693 seconds
		total bytes scrubbed: 119.42GiB with 0 errors
	scrub device /dev/sde (id 7) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 5282 seconds
		total bytes scrubbed: 114.45GiB with 0 errors

Cheers,
Andy

-- 
Hugo Mills             | Debugging is like hitting yourself in the head with a
hugo@... carfax.org.uk | hammer: it feels so good when you find the bug, and
http://carfax.org.uk/  | you're allowed to stop debugging.
PGP: 65E74AC0          |                                         PotatoEngineer
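Hugo's csum case can be illustrated without btrfs at all: record a checksum at write time, let one copy silently diverge, and the stored sum identifies which copy to repair. This is only a toy sketch of the idea (btrfs does this per block with crc32c, not per file with sha256, and the repair source is the mirror copy that still matches):

```shell
#!/bin/sh
# Toy model of scrub's csum repair: two copies of the same data,
# one of which is silently modified after the checksum was recorded.
good="$(mktemp)"; bad="$(mktemp)"
printf 'hello world\n' > "$good"
cp "$good" "$bad"
sum="$(sha256sum < "$good" | cut -d' ' -f1)"   # checksum recorded at write time
printf 'HELLO world\n' > "$bad"                # one copy diverges undetected
for f in "$good" "$bad"; do
    if [ "$(sha256sum < "$f" | cut -d' ' -f1)" = "$sum" ]; then
        echo "copy $f: csum ok"
    else
        echo "copy $f: csum mismatch, repairing from good copy"
        cp "$good" "$f"
    fi
done
rm -f "$good" "$bad"
```

The verify case has no such userspace analogue, since it depends on transaction IDs inside the metadata trees themselves.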
Btrfs hangs 3.19-10
I've gotten it several times after rebooting or an unclean shutdown. This is a very strange bug: if I reboot and mount the filesystem from a live CD, everything is okay, and after rebooting back into the system, everything mounts successfully and works fine. I tried to find any previous reports of this issue and found nothing.

[ 240.100043] INFO: task mount:485 blocked for more than 120 seconds.
[ 240.100156]       Not tainted 3.19.0-10-generic #10-Ubuntu
[ 240.100244] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.100359] mount D 88009e907818 0 485 1 0x0004
[ 240.100364]  88009e907818 8804473ac4b0 00014200 88009e907fd8
[ 240.100368]  00014200 88044ae3f5c0 8804473ac4b0 88009e907828
[ 240.100371]  88045d073c70 88045d073cd8 88045d073cf0 88009e907858
[ 240.100374] Call Trace:
[ 240.100386]  [817c38e9] schedule+0x29/0x70
[ 240.100427]  [c050a7e5] btrfs_tree_lock+0x55/0x1f0 [btrfs]
[ 240.100432]  [810b6200] ? wait_woken+0x90/0x90
[ 240.100444]  [c04acc69] btrfs_search_slot+0x709/0xa60 [btrfs]
[ 240.100457]  [c04ae8ad] btrfs_insert_empty_items+0x7d/0xd0 [btrfs]
[ 240.100469]  [c04a78aa] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[ 240.100487]  [c050aab8] btrfs_insert_orphan_item+0x58/0x80 [btrfs]
[ 240.100509]  [c050c04e] insert_orphan_item+0x5e/0x90 [btrfs]
[ 240.100529]  [c0510d11] replay_one_buffer+0x351/0x370 [btrfs]
[ 240.100547]  [c050b991] walk_up_log_tree+0xd1/0x240 [btrfs]
[ 240.100558]  [c04a78aa] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[ 240.100577]  [c050bb9b] walk_log_tree+0x9b/0x1a0 [btrfs]
[ 240.100596]  [c0513214] btrfs_recover_log_trees+0x1d4/0x470 [btrfs]
[ 240.100614]  [c05109c0] ? replay_one_extent+0x6b0/0x6b0 [btrfs]
[ 240.100630]  [c04ceb23] open_ctree+0x1813/0x2090 [btrfs]
[ 240.100642]  [c04a4b00] btrfs_mount+0x850/0x920 [btrfs]
[ 240.100649]  [811f7208] mount_fs+0x38/0x1c0
[ 240.100653]  [8119b5e5] ? __alloc_percpu+0x15/0x20
[ 240.100658]  [812138fb] vfs_kern_mount+0x6b/0x120
[ 240.100662]  [81216754] do_mount+0x204/0xb20
[ 240.100665]  [8121738b] SyS_mount+0x8b/0xd0
[ 240.100669]  [817c824d] system_call_fastpath+0x16/0x1b
[ 240.100676] INFO: task btrfs-transacti:506 blocked for more than 120 seconds.
[ 240.100780]       Not tainted 3.19.0-10-generic #10-Ubuntu
[ 240.100869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.100982] btrfs-transacti D 8804474abdc8 0 506 2 0x
[ 240.100986]  8804474abdc8 88009ef83ae0 00014200 8804474abfd8
[ 240.100989]  00014200 81c1a580 88009ef83ae0 8804474abdd8
[ 240.100992]  880448695000 88045d396000 88045d33afc0 8804474abe00
[ 240.100995] Call Trace:
[ 240.100998]  [817c38e9] schedule+0x29/0x70
[ 240.101018]  [c04d14c5] btrfs_commit_transaction+0x375/0xa40 [btrfs]
[ 240.101021]  [810b6200] ? wait_woken+0x90/0x90
[ 240.101037]  [c04ccf5d] transaction_kthread+0x1dd/0x250 [btrfs]
[ 240.101052]  [c04ccd80] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs]
[ 240.101057]  [81094679] kthread+0xc9/0xe0
[ 240.101061]  [810945b0] ? kthread_create_on_node+0x1c0/0x1c0
[ 240.101064]  [817c8198] ret_from_fork+0x58/0x90
[ 240.101067]  [810945b0] ? kthread_create_on_node+0x1c0/0x1c0

-- 
Have a nice day,
Timofey.
Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents
On Wed, Apr 01, 2015 at 03:01:07PM -0400, Jeff Mahoney wrote:
On 4/1/15 2:44 PM, Brian Foster wrote:
On Mon, Mar 30, 2015 at 03:11:06PM -0400, Jeff Mahoney wrote:

This test tests four conditions where discard can potentially fail to discard unused extents completely. We test, with -o discard and with fstrim, scenarios of removing many relatively small files and removing several large files. The important part of the two scenarios is that the large files must be large enough to span a block group alone. It's possible for an entire block group to be emptied and dropped without an opportunity to discard individual extents, as would happen with smaller files. The test confirms the discards have occurred by using a sparse file mounted via loopback to punch holes and then checking how many blocks are still allocated within the file.

Signed-off-by: Jeff Mahoney je...@suse.com
---

The code looks mostly OK to me; a few notes below. Those aside, this is a longish test. It takes me about 8 minutes to run on my typical low-end VM.

My test hardware is a 16 core / 16 GB RAM machine using a commodity SSD. It ran pretty quickly.

Yeah, as Dave mentioned, I test on anything from beefier bare-metal hardware to fairly resource-constrained VMs. We certainly have longer-running tests, but sometimes I exclude them when I'm just trying to do regression tests on ongoing development work, etc.

I suppose I should start by explaining that I wrote the test to be btrfs-specific and then realized that the only thing that was /actually/ btrfs-specific was the btrfs filesystem sync call. I ran it on XFS to ensure it worked as expected, but didn't have any reason to try to adapt it to work in any other environment.

Is the 1GB block group magic value mutable in any way, or is it a hardcoded thing (for btrfs I presume)? It would be nice if we could shrink that a bit. If not, perhaps there are some other ways to reduce the runtime...
It's not hardcoded for btrfs, but it is by far the most common block group size. I'd prefer to test what people are using.

Ok...

- Is there any reason a single discard or trim test instance must be all large or small files? In other words, is there something that this wouldn't catch if the 10GB were 50% filled with large files and 50% with small files? That would allow us to trim the maximum on the range of small file creation and only have two invocations instead of four.

Only to draw attention to the obvious failure cases, which are probably specific to btrfs. If a file spans an entire block group and is removed, it skips the individual discards and depends on the block group removal to discard the entire thing (this wasn't happening). If there are lots of small files, it hits different paths, and I wanted to make it clear which one each mode of the test was targeting. Otherwise, whoever hits the failure is going to end up having to do it manually, which defeats the purpose of having an automated test case, IMO.

So it seems to me this is somewhat a mix of a functional test, a stress test, and a targeted regression test. IIUC, the regression could probably be reproduced and tested for using a smaller block group and thus require significantly less time. The functional test requires testing that the various discard mechanisms work appropriately (e.g., with a mix of files), but still shouldn't require writing so much data. A stress test certainly requires a larger fs and writing a bit more data. That could be managed in a variety of ways: you could throw the whole thing under btrfs as is, split it up into separate tests that are grouped such that it's easier to exclude the longer-running test, use fs-specific values as discussed above, use a percentage of SCRATCH_DEV and let the tester determine the time based on the devices under test, etc. The only thing I really care about is the length of time running the test on slower storage.
The idea of scaling the file sizes seems a reasonable enough workaround to me, assuming we can get that 8 minutes down to a couple of minutes or so.

- If the 1GB thing is in fact a btrfs thing, could we make the core test a bit more size-agnostic (e.g., perhaps pass the file count/size values as parameters) and then scale the parameters up exclusively for btrfs? For example, set defaults of fssize=1G, largefile=100MB, smallfile=[512b-5MB] or something of that nature and override them to the 10GB, 1GB, 32k-... values for btrfs? That way we don't need to write as much data for filesystems where it might not be necessary.

If someone wants to weigh in on what sane defaults for other filesystems might be, sure. The order-of-magnitude numbers I threw out above seem reasonable to me, at least for xfs. We can always tweak it later.

 tests/generic/326 | 164
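The stat-based hole check the test relies on is easy to demonstrate on any filesystem that supports sparse files. In the sketch below the paths are ordinary temporary files rather than the xfstests scratch device, but the mechanism is the same one the test uses through its loopback image: count allocated blocks before and after, and a successful discard shows up as the count dropping back toward zero.

```shell
#!/bin/sh
# Show allocated-block accounting on a sparse file, the same mechanism
# the test uses (via loopback) to prove discards reached the image.
f="$(mktemp)"
truncate -s 1M "$f"                  # sparse: 1MiB apparent size, no blocks
before="$(stat -c %b "$f")"          # allocated 512-byte blocks
dd if=/dev/zero of="$f" bs=4096 count=32 conv=notrunc status=none
sync
after="$(stat -c %b "$f")"
echo "allocated blocks: before=$before after=$after"
rm -f "$f"
```

In the test proper, fstrim or -o discard punches the holes back out of the loopback file, so the post-trim `stat -c %b` reading is what actually verifies the fix.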
Re: WARNING at fs/btrfs/super.c:260 __btrfs_abort_transaction (error -17)
On 01/04/2015 15:13, Chris Mason wrote: On Tue, Mar 24, 2015 at 6:23 PM, Sophie just4pleis...@gmail.com wrote: On 24/03/15 17:34, Chris Mason wrote: On Tue, Mar 24, 2015 at 9:43 AM, Sophie Dexter just4pleis...@gmail.com wrote: On 20/03/2015 15:19, Sophie Dexter wrote: I'm given to understand that this is the right place to report a btrfs problem, I apologise if not :-( I have been using my router as a simple NFS NAS for around 2 years with an ext3 formatted 2 TB Western Digital 2.5 USB Passport disk. I have been slowly moving to BTRFS and thought it about time to convert this disk too but unfortunately BTRFS is unreliable on my router :-(. It doesn't take long for an error to happen causing a 'ro' remount. However the disk is unreadable after the remount, both for NFS and locally. Rebooting the router seems to be the only way to access the disk again. I also have a 1 GB swap partition on the disk although swap doesn't appear to be a factor as the problem occurs whether or not swap is enabled (this report is without swap). I used my laptop to convert the fs to btrfs, not my router. My laptop has Fedora 21 with 3.18 kernel and tools. No problems are found when I use my laptop to check and scrub the disk (i.e. with the disk connected directly to my laptop). You have great timing, there are two reports of a very similar abort with 4.0-rc5, but your report makes it clear these are not a regression from 4.0-rc4. Are you able to run btrfsck on this filesystem? I'd like to check for metadata inconsistencies. 
-chris

Hi Chris,

Haha, great timing is the secret of good comedy lol. OpenWrt has only very recently signed off the 3.18 kernel as the default kernel for my router; I was using a build with 3.14 when I converted my disk and saw the same problem :!:

I may have posted something I haven't repeated here in the OpenWrt ticket I opened: https://dev.openwrt.org/ticket/19216

I previously checked and scrubbed the disk when the problem first occurred, and happily no problems were found then. Although, I had to use another computer because btrfs check doesn't complete on my router; the process is killed due to lack of memory (btrfs invoked oom-killer) :-( Should I start another topic for this or just accept that that problem is due to a lack of memory?

I have just run btrfs check again using (yet another) laptop and I think everything is still OK:

# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: ----
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 930516788539 bytes used err is 0
total csum bytes: 1234353920
total tree bytes: 1458515968
total fs tree bytes: 54571008
total extent tree bytes: 66936832
btree space waste bytes: 73372568
file data blocks allocated: 1264250781696
 referenced 1264250781696
Btrfs v3.14.1
# uname -a
Linux ##-- 3.16.0-31-generic #43-Ubuntu SMP Tue Mar 10 17:37:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Sophie, can you please grab the latest btrfs-progs from git:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

And try with that btrfsck? The other image that is reproducing this has an error in the free space cache, so I'd like to confirm whether you're hitting the same problem.

-chris

Hi Chris,

I compiled the tools on my Fedora laptop this lunchtime (and noticed a few omissions on the btrfs wiki source repositories page, so have requested a wiki account so that I can update that page).
I don't have the problematic disk with me at the moment but will btrfsck it again when I get home. Is it helpful to test my disk with different systems? I could build the latest tools on my Ubuntu laptop, and I see that OpenWRT updated to the 3.19.1 btrfs-progs a few days ago too. I will try them on my router anyway to see if btrfsck now completes or if the oom-killer still steps in. I don't know what version of the tools is on my Raspbian Raspberry Pi, but if they're not the new ones I could try to (cross?) compile those too, although that might take me a little while as I haven't compiled anything for my Raspberry Pi before.

Happy Easter :-)

Sophie x
[PATCH RESEND] fstests: test for btrfs send after complex directory hierarchy changes
Test a very complex scenario for a btrfs incremental send operation where a large directory hierarchy had many subtrees moved between parent directories, preserving the names of some directories and inverting the parent-child relationship between some directories (a child in the parent snapshot became a parent, in the send snapshot, of the directory that is its parent in the parent snapshot).

This test made the incremental send fail with -ENOMEM because it entered an infinite loop when building path strings that are used as operands of the rename operations issued in the send stream. This issue was fixed by the following linux kernel btrfs patch:

  Btrfs: incremental send, don't delay directory renames unnecessarily

Signed-off-by: Filipe Manana fdman...@suse.com
---
Rebased against latest master branch, which implied changing the test number. No changes otherwise.

 tests/btrfs/087     | 199
 tests/btrfs/087.out |   2 +
 tests/btrfs/group   |   1 +
 3 files changed, 202 insertions(+)
 create mode 100644 tests/btrfs/087
 create mode 100644 tests/btrfs/087.out

diff --git a/tests/btrfs/087 b/tests/btrfs/087
new file mode 100644
index 000..b8ee3e1
--- /dev/null
+++ b/tests/btrfs/087
@@ -0,0 +1,199 @@
+#! /bin/bash
+# FS QA Test No. btrfs/087
+#
+# Test a very complex scenario for a btrfs incremental send operation where a
+# large directory hierarchy had many subtrees moved between parent directories,
+# preserving the names of some directories and inverting the parent-child
+# relationship between some directories (a child in the parent snapshot became
+# a parent, in the send snapshot, of the directory that is its parent in the
+# parent snapshot).
+#
+# This test made the incremental send fail with -ENOMEM because it entered an
+# infinite loop when building path strings that are used as operands of the
+# rename operations issued in the send stream.
+# This issue was fixed by the following linux kernel btrfs patch:
+#
+#   Btrfs: incremental send, don't delay directory renames unnecessarily
+#
+#---
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -fr $send_files_dir
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fssum
+_need_to_be_root
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+
+mkdir $SCRATCH_MNT/data
+mkdir $SCRATCH_MNT/data/n1
+mkdir $SCRATCH_MNT/data/n1/n2
+mkdir $SCRATCH_MNT/data/n4
+mkdir $SCRATCH_MNT/data/n1/n2/p1
+mkdir $SCRATCH_MNT/data/n1/n2/p1/p2
+mkdir $SCRATCH_MNT/data/t6
+mkdir $SCRATCH_MNT/data/t7
+mkdir -p $SCRATCH_MNT/data/t5/t7
+mkdir $SCRATCH_MNT/data/t2
+mkdir $SCRATCH_MNT/data/t4
+mkdir -p $SCRATCH_MNT/data/t1/t3
+mkdir $SCRATCH_MNT/data/p1
+mv $SCRATCH_MNT/data/t1 $SCRATCH_MNT/data/p1
+mkdir -p $SCRATCH_MNT/data/p1/p2
+mv $SCRATCH_MNT/data/t4 $SCRATCH_MNT/data/p1/p2/t1
+mv $SCRATCH_MNT/data/t5 $SCRATCH_MNT/data/n4/t5
+mv $SCRATCH_MNT/data/n1/n2/p1/p2 $SCRATCH_MNT/data/n4/t5/p2
+mv $SCRATCH_MNT/data/t7 $SCRATCH_MNT/data/n4/t5/p2/t7
+mv $SCRATCH_MNT/data/t2 $SCRATCH_MNT/data/n4/t1
+mv $SCRATCH_MNT/data/p1 $SCRATCH_MNT/data/n4/t5/p2/p1
+mv $SCRATCH_MNT/data/n1/n2 $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2
+mv $SCRATCH_MNT/data/n4/t5/p2/p1/p2/t1 $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/t1
+mv $SCRATCH_MNT/data/n4/t5/t7 $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/t1/t7
+mv $SCRATCH_MNT/data/n4/t5/p2/p1/t1/t3 $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/t1/t3
+mv $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/p1 \
+	$SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/t1/t7/p1
+mv $SCRATCH_MNT/data/t6 $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/t1/t3/t5
+mv $SCRATCH_MNT/data/n4/t5/p2/p1/t1 $SCRATCH_MNT/data/n4/t5/p2/p1/p2/n2/t1/t3/t1
+mv $SCRATCH_MNT/data/n1
[PATCH v2] fstests: regression test for btrfs file range cloning
Test btrfs file range cloning with the same file as a source and destination. This tests a specific scenario where the extent layout of the file confused the clone ioctl implementation, making it return -EEXIST to userspace. This issue was fixed by the following linux kernel patch:

  Btrfs: fix range cloning when same inode used as source and destination

Signed-off-by: Filipe Manana fdman...@suse.com
---
V2: Rebased against latest master, which implied changing the test's number, and added steps to test the case where different source and destination files are used, just to verify it produces exactly the same result as the case where the same file is used as source and destination.

 tests/btrfs/088     | 120
 tests/btrfs/088.out |  20 +
 tests/btrfs/group   |   1 +
 3 files changed, 141 insertions(+)
 create mode 100755 tests/btrfs/088
 create mode 100644 tests/btrfs/088.out

diff --git a/tests/btrfs/088 b/tests/btrfs/088
new file mode 100755
index 000..ac0a459
--- /dev/null
+++ b/tests/btrfs/088
@@ -0,0 +1,120 @@
+#! /bin/bash
+# FS QA Test No. btrfs/088
+#
+# Test btrfs file range cloning with the same file as a source and destination.
+#
+# This tests a specific scenario where the extent layout of the file confused
+# the clone ioctl implementation making it return -EEXIST to userspace.
+# This issue was fixed by the following linux kernel patch:
+#
+#   Btrfs: fix range cloning when same inode used as source and destination
+#
+#---
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+_need_to_be_root
+
+rm -f $seqres.full
+
+# Create a file with an extent layout that confused the btrfs clone ioctl
+# implementation. The first extent item that is cloned by the second call
+# to the cloner program will have only a trailing part of it referenced by
+# a new extent item, since the source offset starts in the middle of that
+# extent. This confused the clone ioctl because after inserting this new
+# extent item it would immediately after process it again thinking it
+# corresponded to an extent that existed before - this made it attempt to
+# insert a duplicated extent item pointing to the same extent again, which
+# made it return an -EEXIST error to userspace and turn the filesystem to
+# readonly mode (since the current transaction got aborted).
+test_clone()
+{
+	local bs=$1
+
+	$XFS_IO_PROG -f -c "pwrite -S 0xaa $((2 * $bs)) $((2 * $bs))" \
+		$SCRATCH_MNT/foo | _filter_xfs_io
+
+	$CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 $SCRATCH_MNT/foo \
+		$SCRATCH_MNT/foo
+	$CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 $SCRATCH_MNT/foo \
+		$SCRATCH_MNT/foo
+
+	echo "File digest after clone operations using same file as source and destination"
+	md5sum $SCRATCH_MNT/foo | _filter_scratch
+
+	# Test cloning using different source and destination files for the
+	# same exact data - it must produce the exact same result as the case
+	# before.
+	$XFS_IO_PROG -f -c "pwrite -S 0xaa $((2 * $bs)) $((2 * $bs))" \
+		$SCRATCH_MNT/a | _filter_xfs_io
+	cp $SCRATCH_MNT/a $SCRATCH_MNT/b
+
+	$CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 $SCRATCH_MNT/a \
+		$SCRATCH_MNT/b
+
+	cp $SCRATCH_MNT/b $SCRATCH_MNT/foo2
+	$CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 $SCRATCH_MNT/b \
+		$SCRATCH_MNT/foo2
+
+	echo "File digest after clone operations using different files as source and destination"
+	md5sum $SCRATCH_MNT/foo2 | _filter_scratch
+
+}
+
+# Make sure the test passes offsets and lengths to the btrfs clone ioctl that
+#
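Outside btrfs, the digest-comparison half of this test can be mimicked with `cp --reflink=auto`, which clones on filesystems that support it and falls back to a plain copy elsewhere. This sketch cannot express the same-file, offset-shifted clones the cloner program performs; it only illustrates the verification idea of comparing digests after a clone:

```shell
#!/bin/sh
# Verify a (possibly reflinked) copy is byte-identical via digests.
src="$(mktemp)"; dst="$(mktemp)"
dd if=/dev/urandom of="$src" bs=4096 count=4 status=none
cp --reflink=auto "$src" "$dst"      # clone where supported, copy otherwise
a="$(md5sum < "$src" | cut -d' ' -f1)"
b="$(md5sum < "$dst" | cut -d' ' -f1)"
if [ "$a" = "$b" ]; then
    echo "digests match"
else
    echo "digest mismatch" >&2
    exit 1
fi
rm -f "$src" "$dst"
```

The test itself goes further by comparing the same-file and different-file clone results against each other, which is what catches the -EEXIST regression.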
Re: Btrfs hangs 3.19-10
So cool, thanks Hugo :) 2015-04-02 14:46 GMT+03:00 Hugo Mills h...@carfax.org.uk: On Thu, Apr 02, 2015 at 02:38:24PM +0300, Timofey Titovets wrote: I've get it several times, after rebooting or unclean shutdown system. This is very strange bug, because if i reboot, and mount it from live cd, all that okay, and after reboot in system, system successful mount all and working good. i did try to found any previous issues on it, and found nothing. Try 4.0-rc6, which should have the fix for the problem in it. This was introduced in the stable series, and is now fixed in mainline. It should also be fixed in the next stable release, I believe. Hugo. [ 240.100043] INFO: task mount:485 blocked for more than 120 seconds. [ 240.100156] Not tainted 3.19.0-10-generic #10-Ubuntu [ 240.100244] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 240.100359] mount D 88009e907818 0 485 1 0x0004 [ 240.100364] 88009e907818 8804473ac4b0 00014200 88009e907fd8 [ 240.100368] 00014200 88044ae3f5c0 8804473ac4b0 88009e907828 [ 240.100371] 88045d073c70 88045d073cd8 88045d073cf0 88009e907858 [ 240.100374] Call Trace: [ 240.100386] [817c38e9] schedule+0x29/0x70 [ 240.100427] [c050a7e5] btrfs_tree_lock+0x55/0x1f0 [btrfs] [ 240.100432] [810b6200] ? wait_woken+0x90/0x90 [ 240.100444] [c04acc69] btrfs_search_slot+0x709/0xa60 [btrfs] [ 240.100457] [c04ae8ad] btrfs_insert_empty_items+0x7d/0xd0 [btrfs] [ 240.100469] [c04a78aa] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 240.100487] [c050aab8] btrfs_insert_orphan_item+0x58/0x80 [btrfs] [ 240.100509] [c050c04e] insert_orphan_item+0x5e/0x90 [btrfs] [ 240.100529] [c0510d11] replay_one_buffer+0x351/0x370 [btrfs] [ 240.100547] [c050b991] walk_up_log_tree+0xd1/0x240 [btrfs] [ 240.100558] [c04a78aa] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 240.100577] [c050bb9b] walk_log_tree+0x9b/0x1a0 [btrfs] [ 240.100596] [c0513214] btrfs_recover_log_trees+0x1d4/0x470 [btrfs] [ 240.100614] [c05109c0] ? 
replay_one_extent+0x6b0/0x6b0 [btrfs] [ 240.100630] [c04ceb23] open_ctree+0x1813/0x2090 [btrfs] [ 240.100642] [c04a4b00] btrfs_mount+0x850/0x920 [btrfs] [ 240.100649] [811f7208] mount_fs+0x38/0x1c0 [ 240.100653] [8119b5e5] ? __alloc_percpu+0x15/0x20 [ 240.100658] [812138fb] vfs_kern_mount+0x6b/0x120 [ 240.100662] [81216754] do_mount+0x204/0xb20 [ 240.100665] [8121738b] SyS_mount+0x8b/0xd0 [ 240.100669] [817c824d] system_call_fastpath+0x16/0x1b [ 240.100676] INFO: task btrfs-transacti:506 blocked for more than 120 seconds. [ 240.100780] Not tainted 3.19.0-10-generic #10-Ubuntu [ 240.100869] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 240.100982] btrfs-transacti D 8804474abdc8 0 506 2 0x [ 240.100986] 8804474abdc8 88009ef83ae0 00014200 8804474abfd8 [ 240.100989] 00014200 81c1a580 88009ef83ae0 8804474abdd8 [ 240.100992] 880448695000 88045d396000 88045d33afc0 8804474abe00 [ 240.100995] Call Trace: [ 240.100998] [817c38e9] schedule+0x29/0x70 [ 240.101018] [c04d14c5] btrfs_commit_transaction+0x375/0xa40 [btrfs] [ 240.101021] [810b6200] ? wait_woken+0x90/0x90 [ 240.101037] [c04ccf5d] transaction_kthread+0x1dd/0x250 [btrfs] [ 240.101052] [c04ccd80] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs] [ 240.101057] [81094679] kthread+0xc9/0xe0 [ 240.101061] [810945b0] ? kthread_create_on_node+0x1c0/0x1c0 [ 240.101064] [817c8198] ret_from_fork+0x58/0x90 [ 240.101067] [810945b0] ? kthread_create_on_node+0x1c0/0x1c0 -- Hugo Mills | Great oxymorons of the world, no. 10: hugo@... carfax.org.uk | Business Ethics http://carfax.org.uk/ | PGP: 65E74AC0 | -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is this normal? Should I use scrub?
Hi Hugo,

Thanks for your help.

On Wed, Apr 01, 2015 at 03:42:02PM +0000, Hugo Mills wrote:
> On Wed, Apr 01, 2015 at 03:11:14PM +0000, Andy Smith wrote:
> > Should I run a scrub as well?
>
> Yes. The output you've had so far will be just the pieces that the FS has
> tried to read, and where, as a result, it's been able to detect the
> out-of-date data. A scrub will check and fix everything.

Thanks, things seem to be fine now. :)

What's the difference between verify and csum here?

scrub status for 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
	scrub device /dev/sdh (id 2) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 14642 seconds
		total bytes scrubbed: 383.42GiB with 0 errors
	scrub device /dev/sdg (id 3) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 14504 seconds
		total bytes scrubbed: 382.62GiB with 0 errors
	scrub device /dev/sdf (id 4) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 14436 seconds
		total bytes scrubbed: 383.00GiB with 0 errors
	scrub device /dev/sdk (id 5) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 21156 seconds
		total bytes scrubbed: 1.13TiB with 14530 errors
		error details: verify=10909 csum=3621
		corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
	scrub device /dev/sdj (id 6) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 5693 seconds
		total bytes scrubbed: 119.42GiB with 0 errors
	scrub device /dev/sde (id 7) history
		scrub started at Wed Apr 1 20:05:58 2015 and finished after 5282 seconds
		total bytes scrubbed: 114.45GiB with 0 errors

Cheers,
Andy
Re: Btrfs hangs 3.19-10
On Thu, Apr 02, 2015 at 02:38:24PM +0300, Timofey Titovets wrote: I've get it several times, after rebooting or unclean shutdown system. This is very strange bug, because if i reboot, and mount it from live cd, all that okay, and after reboot in system, system successful mount all and working good. i did try to found any previous issues on it, and found nothing. Try 4.0-rc6, which should have the fix for the problem in it. This was introduced in the stable series, and is now fixed in mainline. It should also be fixed in the next stable release, I believe. Hugo. [ 240.100043] INFO: task mount:485 blocked for more than 120 seconds. [ 240.100156] Not tainted 3.19.0-10-generic #10-Ubuntu [ 240.100244] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 240.100359] mount D 88009e907818 0 485 1 0x0004 [ 240.100364] 88009e907818 8804473ac4b0 00014200 88009e907fd8 [ 240.100368] 00014200 88044ae3f5c0 8804473ac4b0 88009e907828 [ 240.100371] 88045d073c70 88045d073cd8 88045d073cf0 88009e907858 [ 240.100374] Call Trace: [ 240.100386] [817c38e9] schedule+0x29/0x70 [ 240.100427] [c050a7e5] btrfs_tree_lock+0x55/0x1f0 [btrfs] [ 240.100432] [810b6200] ? wait_woken+0x90/0x90 [ 240.100444] [c04acc69] btrfs_search_slot+0x709/0xa60 [btrfs] [ 240.100457] [c04ae8ad] btrfs_insert_empty_items+0x7d/0xd0 [btrfs] [ 240.100469] [c04a78aa] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 240.100487] [c050aab8] btrfs_insert_orphan_item+0x58/0x80 [btrfs] [ 240.100509] [c050c04e] insert_orphan_item+0x5e/0x90 [btrfs] [ 240.100529] [c0510d11] replay_one_buffer+0x351/0x370 [btrfs] [ 240.100547] [c050b991] walk_up_log_tree+0xd1/0x240 [btrfs] [ 240.100558] [c04a78aa] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [ 240.100577] [c050bb9b] walk_log_tree+0x9b/0x1a0 [btrfs] [ 240.100596] [c0513214] btrfs_recover_log_trees+0x1d4/0x470 [btrfs] [ 240.100614] [c05109c0] ? 
replay_one_extent+0x6b0/0x6b0 [btrfs] [ 240.100630] [c04ceb23] open_ctree+0x1813/0x2090 [btrfs] [ 240.100642] [c04a4b00] btrfs_mount+0x850/0x920 [btrfs] [ 240.100649] [811f7208] mount_fs+0x38/0x1c0 [ 240.100653] [8119b5e5] ? __alloc_percpu+0x15/0x20 [ 240.100658] [812138fb] vfs_kern_mount+0x6b/0x120 [ 240.100662] [81216754] do_mount+0x204/0xb20 [ 240.100665] [8121738b] SyS_mount+0x8b/0xd0 [ 240.100669] [817c824d] system_call_fastpath+0x16/0x1b [ 240.100676] INFO: task btrfs-transacti:506 blocked for more than 120 seconds. [ 240.100780] Not tainted 3.19.0-10-generic #10-Ubuntu [ 240.100869] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 240.100982] btrfs-transacti D 8804474abdc8 0 506 2 0x [ 240.100986] 8804474abdc8 88009ef83ae0 00014200 8804474abfd8 [ 240.100989] 00014200 81c1a580 88009ef83ae0 8804474abdd8 [ 240.100992] 880448695000 88045d396000 88045d33afc0 8804474abe00 [ 240.100995] Call Trace: [ 240.100998] [817c38e9] schedule+0x29/0x70 [ 240.101018] [c04d14c5] btrfs_commit_transaction+0x375/0xa40 [btrfs] [ 240.101021] [810b6200] ? wait_woken+0x90/0x90 [ 240.101037] [c04ccf5d] transaction_kthread+0x1dd/0x250 [btrfs] [ 240.101052] [c04ccd80] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs] [ 240.101057] [81094679] kthread+0xc9/0xe0 [ 240.101061] [810945b0] ? kthread_create_on_node+0x1c0/0x1c0 [ 240.101064] [817c8198] ret_from_fork+0x58/0x90 [ 240.101067] [810945b0] ? kthread_create_on_node+0x1c0/0x1c0 -- Hugo Mills | Great oxymorons of the world, no. 10: hugo@... carfax.org.uk | Business Ethics http://carfax.org.uk/ | PGP: 65E74AC0 | signature.asc Description: Digital signature
Re: [PATCH] Btrfs: RENAME_EXCHANGE semantic for renameat2()
On Thu, Apr 2, 2015 at 4:56 AM, Davide Italiano dccitali...@gmail.com wrote: Signed-off-by: Davide Italiano dccitali...@gmail.com Hi, only skimmed through it, a few small comments below. I haven't surely tested it as well (I assume you ran all xfstests from the generic group). Thanks. --- fs/btrfs/inode.c | 190 ++- 1 file changed, 189 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d2e732d..49b0867 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8890,6 +8890,190 @@ static int btrfs_getattr(struct vfsmount *mnt, return 0; } +static int btrfs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry) +{ + struct btrfs_trans_handle *trans; + struct btrfs_root *root = BTRFS_I(old_dir)-root; + struct btrfs_root *dest = BTRFS_I(new_dir)-root; + struct inode *new_inode = new_dentry-d_inode; + struct inode *old_inode = old_dentry-d_inode; + struct timespec ctime = CURRENT_TIME; + u64 old_ino = btrfs_ino(old_inode); + u64 new_ino = btrfs_ino(new_inode); + u64 old_idx = 0; + u64 new_idx = 0; + u64 root_objectid; + int ret; + + /* we only allow rename subvolume link between subvolumes */ + if (old_ino != BTRFS_FIRST_FREE_OBJECTID root != dest) + return -EXDEV; + + /* close the racy window with snapshot create/destroy ioctl */ + if (old_ino == BTRFS_FIRST_FREE_OBJECTID) + down_read(root-fs_info-subvol_sem); + if (new_ino == BTRFS_FIRST_FREE_OBJECTID) + down_read(dest-fs_info-subvol_sem); + + /* +* We want to reserve the absolute worst case amount of items. So if +* both inodes are subvols and we need to unlink them then that would +* require 4 item modifications, but if they are both normal inodes it +* would require 5 item modifications, so we'll assume their normal +* inodes. So 5 * 2 is 10, plus 2 for the new links, so 12 total items +* should cover the worst case number of items we'll modify. 
+*/ + trans = btrfs_start_transaction(root, 12); + if (IS_ERR(trans)) { +ret = PTR_ERR(trans); +goto out_notrans; +} + + /* +* We need to find a free sequence number both in the source and +* in the destination directory for the exchange. +*/ + ret = btrfs_set_inode_index(new_dir, old_idx); + if (ret) + goto out_fail; + ret = btrfs_set_inode_index(old_dir, new_idx); + if (ret) + goto out_fail; + + BTRFS_I(old_inode)-dir_index = 0ULL; + BTRFS_I(new_inode)-dir_index = 0ULL; + + /* Reference for the source. */ + if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) { + /* force full log commit if subvolume involved. */ + btrfs_set_log_full_commit(root-fs_info, trans); + } else { + ret = btrfs_insert_inode_ref(trans, dest, +new_dentry-d_name.name, +new_dentry-d_name.len, +old_ino, +btrfs_ino(new_dir), old_idx); + if (ret) + goto out_fail; + btrfs_pin_log_trans(root); + } + + /* And now for the dest. */ + if (unlikely(new_ino == BTRFS_FIRST_FREE_OBJECTID)) { + /* force full log commit if subvolume involved. */ + btrfs_set_log_full_commit(dest-fs_info, trans); + } else { + ret = btrfs_insert_inode_ref(trans, root, +old_dentry-d_name.name, +old_dentry-d_name.len, +new_ino, +btrfs_ino(old_dir), new_idx); + if (ret) + goto out_fail; + btrfs_pin_log_trans(dest); + } + + /* +* Update i-node version and ctime/mtime. +*/ + inode_inc_iversion(old_dir); + inode_inc_iversion(new_dir); + inode_inc_iversion(old_inode); + inode_inc_iversion(new_inode); + old_dir-i_ctime = old_dir-i_mtime = ctime; + new_dir-i_ctime = new_dir-i_mtime = ctime; + old_inode-i_ctime = ctime; + new_inode-i_ctime = ctime; + + if (old_dentry-d_parent != new_dentry-d_parent) { + btrfs_record_unlink_dir(trans, old_dir, old_inode, 1); + btrfs_record_unlink_dir(trans, new_dir, new_inode, 1); + } + + /* src is a subvolume */ +
Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents
On Mon, 30 Mar 2015, Jeff Mahoney wrote:

Date: Mon, 30 Mar 2015 15:11:06 -0400
From: Jeff Mahoney je...@suse.com
To: linux-btrfs linux-btrfs@vger.kernel.org, fste...@vger.kernel.org
Subject: [PATCH] xfstests: generic: test for discard properly discarding unused extents

> This test tests four conditions where discard can potentially fail to
> discard unused extents completely.
>
> We test, with -o discard and with fstrim, scenarios of removing many
> relatively small files and removing several large files. The important
> part of the two scenarios is that the large files must be large enough
> to span a block group alone. It's possible for an entire block group to
> be emptied and dropped without an opportunity to discard individual
> extents as would happen with smaller files.
>
> The test confirms the discards have occurred by using a sparse file
> mounted via loopback to punch holes and then checking how many blocks
> are still allocated within the file.
>
> Signed-off-by: Jeff Mahoney je...@suse.com
> ---
>  tests/generic/326     | 164 ++
>  tests/generic/326.out |   5 ++
>  tests/generic/group   |   1 +
>  3 files changed, 170 insertions(+)
>  create mode 100644 tests/generic/326
>  create mode 100644 tests/generic/326.out
>
> diff --git a/tests/generic/326 b/tests/generic/326
> new file mode 100644
> index 000..923a27f
> --- /dev/null
> +++ b/tests/generic/326
> @@ -0,0 +1,164 @@
> +#! /bin/bash
> +# FSQA Test No. 326
> +#
> +# This test uses a loopback mount with PUNCH_HOLE support to test
> +# whether discard operations are working as expected.
> +#
> +# It tests both -odiscard and fstrim.
> +#
> +# Copyright (C) 2015 SUSE. All Rights Reserved.
> +# Author: Jeff Mahoney je...@suse.com
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +loopdev=
> +tmpdir=
> +_cleanup()
> +{
> +	[ -n "$tmpdir" ] && umount $tmpdir
> +	[ -n "$loopdev" ] && losetup -d $loopdev
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +_need_to_be_root
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_fstrim
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >> $seqres.full
> +_require_fs_space $SCRATCH_MNT $(( 10 * 1024 * 1024 ))
> +_scratch_mount
> +
> +test_discard()
> +{
> +	discard=$1
> +	files=$2
> +
> +	tmpfile=$SCRATCH_MNT/testfs.img.$$
> +	tmpdir=$SCRATCH_MNT/testdir.$$
> +	mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"
> +
> +	# Create a sparse file to host the file system
> +	dd if=/dev/zero of=$tmpfile bs=1M count=1 seek=10240 >> $seqres.full \
> +		|| _fail "!!! failed to create fs image file"

You can just use truncate here.

> +	opts=""
> +	if [ "$discard" = "discard" ]; then
> +		opts="-o discard"
> +	fi
> +	losetup -f $tmpfile
> +	loopdev=$(losetup -j $tmpfile | awk -F: '{print $1}')

you can just do

	loopdev=$(losetup --show -f $tmpfile)

> +	_mkfs_dev $loopdev >> $seqres.full
> +	$MOUNT_PROG $opts $loopdev $tmpdir \
> +		|| _fail "!!! failed to loopback mount"
> +
> +	if [ "$files" = "large" ]; then
> +		# Create files larger than 1GB so each one occupies
> +		# more than one block group

Why does it need to be that big? Can't you make the btrfs block groups smaller?

> +		for n in $(seq 1 8); do
> +			dd if=/dev/zero of=$tmpdir/file$n bs=1M count=1200 \
> +				>> $seqres.full
> +		done
> +	else
> +		# Create up to 40k files sized 32k-1GB.
> +		mkdir -p $tmpdir/testdir
> +		for ((i = 1; i <= 4; i++)); do
> +			SIZE=$(( $((1024*1024*1024)) / $(( $RANDOM + 1 )) ))
> +			$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" \
> +				$tmpdir/testdir/${prefix}_$i > /dev/null 2>&1
> +			if [ $? -ne 0 ]; then
> +				echo "Failed creating file ${prefix}_$i" \
> +					>> $seqres.full
> +				break
> +			fi
> +		done
> +	fi
> +
> +	sync
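The verification trick the commit message describes — counting how many blocks are still allocated in the sparse backing file — can be seen in miniature with `stat`. The sketch below is illustrative only (plain temp files, no loop device, no discard involved): a fully written file and a same-sized sparse file report very different allocated-block counts, which is exactly the signal the test reads after discards punch holes in the image.

```shell
# Compare allocated blocks (stat %b, 512-byte units) of a written file
# against a sparse file of the same apparent size. In the real test, hole
# punches triggered by discard requests are what bring the count down.
f_written=$(mktemp)
f_sparse=$(mktemp)
dd if=/dev/zero of="$f_written" bs=1M count=1 2>/dev/null
truncate -s 1M "$f_sparse"    # same size, but no data blocks allocated
sync                          # flush delayed allocation before asking stat
echo "written: $(stat -c %b "$f_written") blocks"
echo "sparse:  $(stat -c %b "$f_sparse") blocks"
rm -f "$f_written" "$f_sparse"
```

The sparse file's count should be far smaller (often zero); if the test's fstrim or `-o discard` path worked, the image file trends toward the sparse case.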
Re: [PATCH v2] btrfs-progs: Doc: Add warning and note on btrfs-convert.
David Sterba posted on Thu, 02 Apr 2015 17:19:31 +0200 as excerpted:

> On Thu, Mar 26, 2015 at 10:19:24AM +0800, Qu Wenruo wrote:
> > WARNING: To ensure *btrfs-convert* be able to rollback btrfs, one
> > should never execute *btrfs filesystem defragment* or *btrfs balance*
> > command on the converted btrfs.
>
> I don't see now why defrag is harmful to rollback. The defragmented data
> are written to the ext free space, i.e. where all new modifications get
> written. The old data are pinned by the ext2_saved subvolume and can be
> restored. Or not?

Is defrag ever going to be snapshot-aware-enabled again? If not, then I don't see that (snapshot-unaware) defrag can affect ext2_saved either. But with snapshot-aware defrag, AFAIK defrag would affect ext2_saved, unless of course it was special-cased...

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH v3] btrfs-progs: Doc: Add warning and note on btrfs-convert.
Qu Wenruo posted on Fri, 03 Apr 2015 09:21:15 +0800 as excerpted:

> +WARNING: If one hopes to rollback to ext2/3/4, he or she should not execute
> +*btrfs balance* command on converted btrfs.
> +Since it will change the extent layout and make *btrfs-convert* unable to
> +rollback.

Because "since" is subordinating, and the sentence doesn't contain what it is subordinated to (as that's explained in the previous sentence), "Since it will..." introduces a sentence fragment.

Possible correction 1, combine the sentences: either simply eliminate the period following the previous sentence, or convert it to a comma, either way combining the sentences into one.

Possible correction 2, eliminate the subordinator: eliminate "Since" and begin the second sentence with "A balance..." (replacing "it").

Possible correction 3, reorder and edit, something like this:

+WARNING: Do not execute a *btrfs balance* on a converted btrfs until you
+are sure you will not rollback, as a balance changes the extent layout
+and prevents btrfs-convert from successfully rolling back to ext2/3/4.

-- 
Duncan - List replies preferred. No HTML msgs.
Re: Btrfs hangs 3.19-10
Roman Mamedov posted on Thu, 02 Apr 2015 19:13:12 +0500 as excerpted:

> Yeah, I believe I just hit that in 3.14.37; system unbootable (locks up
> at "Scanning for Btrfs filesystems"), resulting in many hours of
> downtime as it was a remote system w/o IPMI. Fine after a reboot to
> 3.14.34.
>
> Too bad that even staying on the stable series kernel can back-stab you
> like that from time to time.

Well, btrfs itself isn't really stable yet...

Stable series should be stable at least to the extent that whatever you're using in them is, but with btrfs itself not yet entirely stable...

If your goal is full stability, something that has been demonstrated mature, stable, and problem-free for some period (say six months if you're edgy, a year if you're moderate, at least two years if you're conservative) is still a far better choice. Given existing problems even now, even if 4.0 btrfs proves entirely stable, btrfs won't have that demonstrated history until the end of whatever length of period you consider appropriate, because that couldn't start until 4.0's release, which is still in the future.

-- 
Duncan - List replies preferred. No HTML msgs.
SSD mode on HDD
Hey,

I figured out that, for some reason, on both Ubuntu and Debian, SSD mode seems to be turned on even on HDDs (hard disk drives, i.e. those with rotating platters). I only noticed it now, but it's been like this for at least a year or more and I haven't really had any problems with it.

Is it OK to have SSD mode enabled on HDDs?
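For context on how that can happen: the kernel auto-enables the `ssd` mount option when the block device's queue reports itself as non-rotational in sysfs, so a queue that wrongly reports 0 (common behind some RAID controllers or virtualization layers) yields SSD mode on spinning disks. The helper below is a sketch of that check — the sysfs attribute path is real, but the function and the scratch attribute file used here are illustrative, so the example is self-contained:

```shell
# Sketch: mimic the rotational check that the ssd-mode heuristic relies on.
# A real query would read /sys/block/<dev>/queue/rotational; here we point
# the helper at a scratch file instead of a live device.
is_rotational() {
	# $1 - path to a "rotational" attribute: 1 = spinning disk, 0 = SSD
	[ "$(cat "$1")" = "1" ]
}

attr=$(mktemp)
echo 1 > "$attr"   # pretend the device reports itself as rotational
if is_rotational "$attr"; then
	echo "rotational: ssd mode would not be auto-enabled"
else
	echo "non-rotational: ssd mode would be auto-enabled"
fi
rm -f "$attr"
```

On a live system you would check `cat /sys/block/sda/queue/rotational` and the mount options in `/proc/self/mounts` to see whether `ssd` was picked up.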
Please add 9c4f61f01d269815bb7c37be3ede59c5587747c6 to stable
Hi stable friends,

Can you please backport this one to 3.19.y? It fixes a bug introduced by commit 381cf6587f8a8a8e981bc0c18859b51dc756, which was tagged for stable 3.14+.

The symptom of the bug is deadlocks during log replay after a crash. The patch wasn't intentionally fixing the deadlock, which is why we missed it when tagging fixes. Please put this commit everywhere you've cherry-picked 381cf6587f8a8a8e981bc0c18859b51dc756.

commit 9c4f61f01d269815bb7c37be3ede59c5587747c6
Author: David Sterba dste...@suse.cz
Date:   Fri Jan 2 19:12:57 2015 +0100

    btrfs: simplify insert_orphan_item

    We can search and add the orphan item in one go,
    btrfs_insert_orphan_item will find out if the item already exists.

    Signed-off-by: David Sterba dste...@suse.cz

-chris
[PATCH] fstests: test for btrfs transaction abortion on device with discard support
Test that btrfs' transaction abortion does not corrupt a filesystem mounted with -o discard nor allows a subsequent fstrim to corrupt the filesystem (regardless of being mounted with or without -o discard). This issue was fixed by the following linux kernel patch: Btrfs: fix fs corruption on transaction abort if device supports discard (commit 678886bdc6378c1cbd5072da2c5a3035000214e3) Without the corresponding btrfs fix the fs becomes unmountable and fails like this: $ ./check btrfs/089 FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian3 3.19.0-btrfs-next-7+ MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 btrfs/089 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/089.out.bad) --- tests/btrfs/089.out 2015-04-02 16:46:28.022498841 +0100 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/089.out.bad 2015-04-02 16:48:05.406195409 +0100 @@ -1,2 +1,8 @@ QA output created by 089 -File content after transaction abort + remount: hello +mount: wrong fs type, bad option, bad superblock on /dev/sdc, + missing codepage or helper program, or other error + In some cases useful info is found in syslog - try + dmesg | tail or so + ... (Run 'diff -u tests/btrfs/089.out /home/fdmanana/git/hub/xfstests/results//btrfs/089.out.bad' to see the entire diff) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//btrfs/089.full) Ran: btrfs/089 Failures: btrfs/089 Failed 1 of 1 tests $ cat /home/fdmanana/git/hub/xfstests/results//btrfs/089.full Performing full device TRIM (100.00GiB) ... 
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** Check tree block failed, want=29573120, have=0 Check tree block failed, want=29573120, have=0 Check tree block failed, want=29573120, have=0 Check tree block failed, want=29573120, have=0 Check tree block failed, want=29573120, have=0 read block failed check_tree_block Couldn't read tree root Couldn't open file system (...) Signed-off-by: Filipe Manana fdman...@suse.com --- tests/btrfs/089 | 129 tests/btrfs/089.out | 2 + tests/btrfs/group | 1 + 3 files changed, 132 insertions(+) create mode 100755 tests/btrfs/089 create mode 100644 tests/btrfs/089.out diff --git a/tests/btrfs/089 b/tests/btrfs/089 new file mode 100755 index 000..032a8aa --- /dev/null +++ b/tests/btrfs/089 @@ -0,0 +1,129 @@ +#! /bin/bash +# FS QA Test No. btrfs/089 +# +# Test that btrfs' transaction abortion does not corrupt a filesystem mounted +# with -o discard nor allows a subsequent fstrim to corrupt the filesystem +# (regardless of being mounted with or without -o discard). +# +# This issue was fixed by the following linux kernel patch: +# +#Btrfs: fix fs corruption on transaction abort if device supports discard +#(commit 678886bdc6378c1cbd5072da2c5a3035000214e3) +# +#--- +# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana fdman...@suse.com +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_fail_make_request
+_need_to_be_root
+
+allow_fail_make_request()
+{
+	echo 100 > $DEBUGFS_MNT/fail_make_request/probability
+	echo 999 > $DEBUGFS_MNT/fail_make_request/times
+	echo 0 > /sys/kernel/debug/fail_make_request/verbose
+}
+
+disallow_fail_make_request()
+{
+	echo 0 > $DEBUGFS_MNT/fail_make_request/probability
+	echo 0 > $DEBUGFS_MNT/fail_make_request/times
+}
+
+SCRATCH_BDEV=`_short_dev $SCRATCH_DEV`
+
+start_fail_scratch_dev()
+{
+	echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail
+}
+
+stop_fail_scratch_dev()
+{
+	echo 0
Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes
On Wed, Apr 01, 2015 at 08:49:54PM -0700, Omar Sandoval wrote:
> Hm, yeah, that's unfortunate, thanks for pointing that out. It looks
> like we can get the subvolume ID reliably:
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 05fef19..a74ddb3 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1024,6 +1024,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>  	struct btrfs_root *root = info->tree_root;
>  	char *compress_type;
>
> +	seq_printf(seq, ",subvolid=%llu",
> +		   BTRFS_I(d_inode(dentry))->root->root_key.objectid);

Yes, subvolid is reliable, but not very friendly from the users' POV. I'd like to see subvol=/path there. Possibly we can have both.
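With a change along those lines applied, the new option would simply appear in each btrfs mount's option string in /proc/self/mounts. The tiny helper below is purely an illustrative parsing sketch — the function name and the sample option string are made up; only the `subvolid=` option name comes from the patch:

```shell
# Extract the subvolid from a btrfs mount-options string such as the one
# btrfs_show_options would emit with the proposed hunk applied.
subvolid_of() {
	printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^subvolid=//p'
}

opts="rw,relatime,space_cache,subvolid=257,subvol=/home"   # sample string
subvolid_of "$opts"   # prints: 257
```

A script watching mounts could use something like this to map each mount point back to a subvolume without calling the btrfs ioctls.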
Re: [PATCH v2] btrfs-progs: Doc: Add warning and note on btrfs-convert.
On Thu, Mar 26, 2015 at 10:19:24AM +0800, Qu Wenruo wrote:
> +WARNING: To ensure *btrfs-convert* be able to rollback btrfs, one should never
> +execute *btrfs filesystem defragment* or *btrfs balance* command on the
> +converted btrfs.

So it looks like a fundamental problem, not a lack of implementation. The original filesystem has some correspondence between physical blocks (a 1:1 match in ext) and btrfs blocks (where the mapping is not 1:1, though at the beginning physical matches logical). Once we balance data, the chunks get moved and the original physical offset is lost. We'd have to remember that somewhere and restore it upon rollback.

I don't see now why defrag is harmful to rollback. The defragmented data are written to the ext free space, i.e. where all new modifications get written. The old data are pinned by the ext2_saved subvolume and can be restored. Or not?
Re: [PATCH] btrfs-progs doc: emphasis that only mounted device works for btrfs device stats
On Wed, Apr 01, 2015 at 03:52:57AM -0400, Chen Hanxiao wrote:
> We provide the format path|device on the command line, but btrfs device
> stats doesn't work if the device is not mounted.

Although that's right, the dev stats would be easy to implement in userspace. Applied, as it reflects the current state.
Re: [PATCH 1/2] btrfs-progs: convert: Make ext*_image file obey datacsum setting.
On Thu, Apr 02, 2015 at 10:21:35AM +0800, Qu Wenruo wrote:
> Before this patch, the ext*_image file always had the NODATACSUM inode
> flag set. However, btrfs-convert sets normal files with the DATACSUM
> flag by default, and generates checksums for regular file extents.
>
> Now, a regular file extent is shared by a btrfs file inode with DATACSUM
> and ext*_image with NODATACSUM, and it has checksums in the csum tree.
> This causes btrfsck to complain about odd checksums, since ext*_image is
> set NODATACSUM but has checksums generated from the regular file extent.
>
> This patch makes convert completely obey the datacsum setting, meaning
> btrfs-convert will generate csums for every file extent by default.
>
> Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com

Applied, thanks.
Re: [PATCH 2/2] btrfs-progs: convert-test: Add test for converting ext* with regular file extent.
On Thu, Apr 02, 2015 at 10:21:36AM +0800, Qu Wenruo wrote:
> Before the previous patch, btrfs-convert would make fsck complain if
> there was any regular file extent in the newly converted btrfs.
>
> Add a test case for it.

Please separate the changes that update generic code and the test itself.

> +script_dir=$(dirname $(realpath $0))
> +top=$(realpath $script_dir/../)

Please use upper case names.

> +TEST_DEV=${TEST_DEV:-}
> +TEST_MNT=${TEST_MNT:-$top/tests/mnt}
> +RESULT=$top/tests/convert-tests-results.txt

RESULTS

> +IMAGE=$script_dir/test.img
>
> -_fail()
> -{
> -	echo "$*" | tee -a convert-tests-results.txt
> -	exit 1
> -}
> +source $top/tests/common
> +export top
> +export RESULT
> +# For comprehensive convert test which needs to write something into ext*
> +export TEST_MNT
> +export LANG
> +
> +rm -f $RESULT
> +mkdir -p $TEST_MNT || _fail "unable to create mount point on $TEST_MNT"
> +
> +# test relies on btrfs-convert
> +check_prereq btrfs-convert
> +check_prereq btrfs
>
> -rm -f convert-tests-results.txt
> -test(){
> +convert_test(){
> 	echo "    [TEST]   $1"
> 	nodesize=$2
> 	shift 2
> -	echo "creating ext image with: $*" >> convert-tests-results.txt
> +	echo "creating ext image with: $*" >> $RESULT
> 	# 256MB is the smallest acceptable btrfs image.
> -	rm -f $here/test.img >> convert-tests-results.txt 2>&1 \
> +	rm -f $IMAGE >> $RESULT 2>&1 \
> 		|| _fail "could not remove test image file"
> -	truncate -s 256M $here/test.img >> convert-tests-results.txt 2>&1 \
> +	truncate -s 256M $IMAGE >> $RESULT 2>&1 \
> 		|| _fail "could not create test image file"
> -	$* -F $here/test.img >> convert-tests-results.txt 2>&1 \
> +	$* -F $IMAGE >> $RESULT 2>&1 \
> 		|| _fail "filesystem create failed"
> -	$here/btrfs-convert -N $nodesize $here/test.img \
> -		>> convert-tests-results.txt 2>&1 \
> +
> +	# write a file with regular file extent
> +	$SUDO_HELPER mount $IMAGE $TEST_MNT
> +	$SUDO_HELPER dd if=/dev/zero bs=$nodesize count=4 of=$TEST_MNT/test \
> +		1>/dev/null 2>&1
> +	$SUDO_HELPER umount $TEST_MNT
> +
> +	# do convert test
> +	$top/btrfs-convert -N $nodesize $script_dir/test.img \

$IMAGE instead of $script_dir/test.img

> +		>> $RESULT 2>&1 \
> 		|| _fail "btrfs-convert failed"
> -	$here/btrfs check $here/test.img >> convert-tests-results.txt 2>&1 \

same here

> +	$top/btrfs check $script_dir/test.img >> $RESULT 2>&1 \

and here

> 		|| _fail "btrfs check detected errors"

Thanks.
Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 4/2/15 8:33 AM, Lukáš Czerner wrote:
On Mon, 30 Mar 2015, Jeff Mahoney wrote:

Date: Mon, 30 Mar 2015 15:11:06 -0400
From: Jeff Mahoney je...@suse.com
To: linux-btrfs linux-btrfs@vger.kernel.org, fste...@vger.kernel.org
Subject: [PATCH] xfstests: generic: test for discard properly discarding unused extents

This test covers four conditions where discard can potentially fail to discard unused extents completely. We test, with -o discard and with fstrim, scenarios of removing many relatively small files and removing several large files. The important part of the two scenarios is that the large files must each be large enough to span a block group alone. It's possible for an entire block group to be emptied and dropped without an opportunity to discard individual extents, as would happen with smaller files.

The test confirms the discards have occurred by using a sparse file mounted via loopback to punch holes, and then checking how many blocks are still allocated within the file.

Signed-off-by: Jeff Mahoney je...@suse.com
---
 tests/generic/326     | 164 ++
 tests/generic/326.out |   5 ++
 tests/generic/group   |   1 +
 3 files changed, 170 insertions(+)
 create mode 100644 tests/generic/326
 create mode 100644 tests/generic/326.out

diff --git a/tests/generic/326 b/tests/generic/326
new file mode 100644
index 000..923a27f
--- /dev/null
+++ b/tests/generic/326
@@ -0,0 +1,164 @@
+#! /bin/bash
+# FSQA Test No. 326
+#
+# This test uses a loopback mount with PUNCH_HOLE support to test
+# whether discard operations are working as expected.
+#
+# It tests both -o discard and fstrim.
+#
+# Copyright (C) 2015 SUSE. All Rights Reserved.
+# Author: Jeff Mahoney je...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#

+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+loopdev=
+tmpdir=
+_cleanup()
+{
+	[ -n "$tmpdir" ] && umount "$tmpdir"
+	[ -n "$loopdev" ] && losetup -d "$loopdev"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_fstrim
+
+rm -f $seqres.full
+
+_scratch_mkfs > $seqres.full
+_require_fs_space $SCRATCH_MNT $(( 10 * 1024 * 1024 ))
+_scratch_mount
+
+test_discard()
+{
+	discard=$1
+	files=$2
+
+	tmpfile=$SCRATCH_MNT/testfs.img.$$
+	tmpdir=$SCRATCH_MNT/testdir.$$
+	mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"
+
+	# Create a sparse file to host the file system
+	dd if=/dev/zero of=$tmpfile bs=1M count=1 seek=10240 >> $seqres.full \
+		|| _fail "!!! failed to create fs image file"

You can just use truncate here.

Yep.

+	opts=""
+	if [ "$discard" = "discard" ]; then
+		opts="-o discard"
+	fi
+	losetup -f $tmpfile
+	loopdev=$(losetup -j $tmpfile | awk -F: '{print $1}')

you can just do loopdev=$(losetup --show -f $tmpfile)

Thanks, that's a good tip!

+	_mkfs_dev $loopdev >> $seqres.full
+	$MOUNT_PROG $opts $loopdev $tmpdir \
+		|| _fail "!!! failed to loopback mount"
+
+	if [ "$files" = "large" ]; then
+		# Create files larger than 1GB so each one occupies
+		# more than one block group

Why does it need to be that big? Can't you make btrfs block groups smaller?
Not in the way you're expecting. Block group sizes are hardcoded to a certain size within the kernel based on what the chunk will be used for (Data = 1GB, Metadata = 1GB, or 256MB for fs size < 50GB, System = 32MB). The only modifier is that a chunk will not comprise more than 10% of the total capacity of the file system, so chunks are scaled smaller on file systems smaller than 10GB.

+	if [ "$discard" = "trim" ]; then
+		$FSTRIM_PROG $tmpdir
+	fi
+
+	$UMOUNT_PROG $tmpdir
+	rmdir $tmpdir
+	tmpdir=
+
+	# Sync the backing file system to ensure the hole punches have
+	# happened and we can trust the result.
+	if [ $FSTYP = "btrfs" ]; then
+		_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
+	fi
+
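Jeff's sizing rules can be condensed into a small helper. This is just one reading of the paragraph above, not kernel code — the function name and byte-level details are invented; the constants (1GB/256MB/32MB, the 50GB threshold, the 10% cap) come straight from the description:

```python
def block_group_size(kind, fs_size):
    """Approximate btrfs chunk size, per the rules described above."""
    GB = 1 << 30
    MB = 1 << 20
    if kind == "data":
        size = 1 * GB
    elif kind == "metadata":
        size = 1 * GB if fs_size >= 50 * GB else 256 * MB
    else:  # "system"
        size = 32 * MB
    # A chunk will not comprise more than 10% of total capacity, so
    # chunks get scaled smaller on file systems smaller than 10GB.
    return min(size, fs_size // 10)

print(block_group_size("data", 100 << 30))  # 1GiB data chunk on a 100GiB fs
print(block_group_size("data", 5 << 30))    # capped to 512MiB on a 5GiB fs
```

This is why the test has to create files larger than 1GB: that is the only way to guarantee a file spans a whole block group on a reasonably sized scratch device.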
Re: Btrfs hangs 3.19-10
On Thu, 2 Apr 2015 11:46:08 +0000, Hugo Mills h...@carfax.org.uk wrote:
On Thu, Apr 02, 2015 at 02:38:24PM +0300, Timofey Titovets wrote:

I've gotten it several times after rebooting or an unclean shutdown. This is a very strange bug, because if I reboot and mount it from a live CD, everything is okay, and after rebooting into the system, the system successfully mounts everything and works fine. I tried to find any previous reports of this issue, and found nothing.

Try 4.0-rc6, which should have the fix for the problem in it. This was introduced in the stable series, and is now fixed in mainline. It should also be fixed in the next stable release, I believe.

Yeah, I believe I just hit that in 3.14.37: system unbootable (locks up at "Scanning for Btrfs filesystems"), resulting in many hours of downtime as it was a remote system without IPMI. Fine after a reboot to 3.14.34. Too bad that even staying on the stable series kernel can back-stab you like that from time to time.

--
With respect,
Roman
[PATCH v2] Btrfs: fix range cloning when same inode used as source and destination
While searching for extents to clone we might find one where we only use a part of it, coming from its tail. If our destination inode is the same as the source inode, we end up removing the tail part of the extent item and inserting after it a new one that points to the same extent, with an adjusted key file offset and data offset. After this we search for the next extent item in the fs/subvol tree with a key that has an offset incremented by one. But this second search leaves us at the new extent item we inserted previously, and since that extent item has a non-zero data offset, it can make us call btrfs_drop_extents with an empty range (start == end), which causes the following warning:

[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
(...)
[23978.557266] Call Trace:
[23978.557978] [81425fd9] dump_stack+0x4c/0x65
[23978.559191] [81045390] warn_slowpath_common+0xa1/0xbb
[23978.560699] [a047f0ea] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.562389] [8104544d] warn_slowpath_null+0x1a/0x1c
[23978.563613] [a047f0ea] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.565103] [810e3a18] ? time_hardirqs_off+0x15/0x28
[23978.566294] [81079ff8] ? trace_hardirqs_off+0xd/0xf
[23978.567438] [a047f73d] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
[23978.568702] [8107c03f] ? trace_hardirqs_on+0xd/0xf
[23978.569763] [811441c0] ? cache_alloc+0x69/0x2eb
[23978.570817] [81142269] ? virt_to_head_page+0x9/0x36
[23978.571872] [81143c15] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
[23978.573466] [811420d5] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
[23978.574962] [a0480d07] btrfs_drop_extents+0x66/0x7f [btrfs]
[23978.576179] [a049aa35] btrfs_clone+0x516/0xaf5 [btrfs]
[23978.577311] [a04983dc] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.578520] [a049b2a2] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.580282] [a049d9ae] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---

Then we attempt to insert a new extent item with a key that already exists, which makes btrfs_insert_empty_item return -EEXIST, resulting in abortion of the current transaction:

[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
(...)
[23978.622589] Call Trace:
[23978.623181] [81425fd9] dump_stack+0x4c/0x65
[23978.624359] [81045390] warn_slowpath_common+0xa1/0xbb
[23978.625573] [a044ab6c] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.626971] [810453f0] warn_slowpath_fmt+0x46/0x48
[23978.628003] [8108a6c8] ? vprintk_default+0x1d/0x1f
[23978.629138] [a044ab6c] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.630528] [a049ad1b] btrfs_clone+0x7fc/0xaf5 [btrfs]
[23978.631635] [a04983dc] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.632886] [a049b2a2] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.634119] [a049d9ae] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---

This is wrong because we should not process the extent item that we just inserted previously, and should instead process the extent item that follows it in the tree. For example, for the test case I wrote for fstests:

	bs=$((64 * 1024))
	mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
	mount /dev/sdc /mnt

	xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo

	$CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
	$CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo

The second clone call fails with -EEXIST, because when we process the first extent item (offset 262144), we drop part of it (counting from the end) and then insert a new extent item with a key greater than the key we found. The next time we search the tree we search for a key with offset 262144 + 1, which leaves us at the new extent item we have just inserted, but we think it refers to an extent that we need to clone.
Fix this by ensuring the next search key uses an offset corresponding to the offset of the key we found previously plus the data length of the corresponding extent item. This ensures we skip new extent items that we inserted, and it works for the case of implicit holes too (the NO_HOLES feature).

A test case for fstests follows soon.

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: Fixed a warning about a potentially uninitialized variable. David got this warning with gcc 4.5.1, but I didn't with gcc 4.9.2.

 fs/btrfs/ioctl.c | 7 +-
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 09a566a..029f4da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3254,6 +3254,7 @@ process_slot:
 		u64 datao = 0, datal = 0;
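The effect of the fixed search key can be sketched with a toy sorted-key search (a deliberate simplification of the fs/subvol tree search, for illustration only; the first offset and data length mirror the fstests example above, and the inserted item's offset is invented):

```python
import bisect

def next_search_offset(found_offset, datal, fixed=True):
    # The buggy code advanced the next search key by one; the fix
    # advances it by the data length of the extent item just processed.
    return found_offset + datal if fixed else found_offset + 1

# File offsets of extent items in the tree: the item we processed
# (offset 262144, datal 131072), the item we inserted while processing
# it (327680, invented), and the item that truly follows (393216).
offsets = [262144, 327680, 393216]

def find_next(key):
    i = bisect.bisect_left(offsets, key)
    return offsets[i] if i < len(offsets) else None

print(find_next(next_search_offset(262144, 131072, fixed=False)))  # 327680: the item we just inserted
print(find_next(next_search_offset(262144, 131072, fixed=True)))   # 393216: the item that actually follows
```

Advancing by `offset + datal` also naturally steps over implicit holes, which is why the fix works with the NO_HOLES feature as well.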
Re: [PATCH] Btrfs: incremental send, clear name from cache after orphanization
On Sun, Mar 08, 2015 at 05:48:25PM +0000, Filipe Manana wrote:

If a directory's reference ends up being orphanized, because the inode currently being processed has a new path that matches that directory's path, make sure we evict the name of the directory from the name cache. This is because there might be descendent inodes (either directories or regular files) that will be orphanized later too, and therefore the orphan name of the ancestor must be used; otherwise we issue rename operations with a wrong path in the send stream.

Reproducer:

	$ mkfs.btrfs -f /dev/sdb
	$ mount /dev/sdb /mnt
	$ mkdir -p /mnt/data/n1/n2/p1/p2
	$ mkdir /mnt/data/n4
	$ mkdir -p /mnt/data/p1/p2
	$ btrfs subvolume snapshot -r /mnt /mnt/snap1

	$ mv /mnt/data/p1/p2 /mnt/data
	$ mv /mnt/data/n1/n2/p1/p2 /mnt/data/p1
	$ mv /mnt/data/p2 /mnt/data/n1/n2/p1
	$ mv /mnt/data/n1/n2 /mnt/data/p1
	$ mv /mnt/data/p1 /mnt/data/n4
	$ mv /mnt/data/n4/p1/n2/p1 /mnt/data
	$ btrfs subvolume snapshot -r /mnt /mnt/snap2

	$ btrfs send /mnt/snap1 -f /tmp/1.send
	$ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send

	$ mkfs.btrfs -f /dev/sdc
	$ mount /dev/sdc /mnt2
	$ btrfs receive /mnt2 -f /tmp/1.send
	$ btrfs receive /mnt2 -f /tmp/2.send
	ERROR: rename data/p1/p2 -> data/n4/p1/p2 failed. No such file or directory

Directories data/p1 (inode 263) and data/p1/p2 (inode 264) in the parent snapshot are both orphanized during the incremental send, and as soon as data/p1 is orphanized, we must make sure that when orphanizing data/p1/p2 we use a source path of o263-6-o/p2 for the rename operation instead of the old path data/p1/p2 (the one before the orphanization of inode 263).

A test case for xfstests follows soon.

Reported-by: Robbie Ko robbi...@synology.com
Signed-off-by: Filipe Manana fdman...@suse.com

Tested-by: David Sterba dste...@suse.cz
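The required path substitution can be sketched as a tiny helper (a simplification of send's name cache, for illustration only; the function and the orphan-name map are invented, with the orphan name taken from the example above):

```python
def source_path(path, orphanized):
    """Rewrite a parent-snapshot path so the deepest orphanized
    ancestor directory is referred to by its orphan name."""
    parts = path.split("/")
    # Walk proper prefixes longest-first so the deepest ancestor wins.
    for i in range(len(parts) - 1, 0, -1):
        prefix = "/".join(parts[:i])
        if prefix in orphanized:
            return "/".join([orphanized[prefix]] + parts[i:])
    return path

# data/p1 (inode 263) was orphanized, so the rename source for its child
# data/p1/p2 must use the orphan name, not the stale pre-orphanization path.
orphanized = {"data/p1": "o263-6-o"}
print(source_path("data/p1/p2", orphanized))  # o263-6-o/p2
```

Evicting the stale name from the cache is what forces send to rebuild the path this way for every descendant processed after the orphanization.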
Re: [PATCH v2] Btrfs: add debugfs file to test transaction aborts
On Wed, Apr 1, 2015 at 12:11 AM, Filipe Manana fdman...@suse.com wrote:

At the moment we cannot reliably and deterministically test that the transaction abortion code works as expected. For example, in the past [1] we had an issue where that code returned the pinned extents to the free space caches, allowing fstrim to perform a discard against the physical locations of the extents, and worse, when the fs was mounted with the option -o discard it explicitly did a discard on each pinned extent. This resulted in filesystem corruption, leaving the fs unmountable.

This patch adds a debugfs file named abort_transaction, which has a default value of an empty string, can only be written by someone with root privileges, and, when a string is written to it, makes sure all subsequent transaction commits fail at the very end (right before writing the new superblock) if that string matches the label of the filesystem. This way we can, for example, write a deterministic fstest for commit [1] which looks like:

	_require_btrfs_debugfs()
	{
		if [ -d /sys/kernel/debug/btrfs ]; then
			BTRFS_DEBUG_FS=/sys/kernel/debug/btrfs
		elif [ -d /debug/btrfs ]; then
			BTRFS_DEBUG_FS=/debug
		else
			_notrun "btrfs debugfs not available"
		fi
		if [ ! -z $1 ]; then
			if [ ! -e $BTRFS_DEBUG_FS/$1 ]; then
				_notrun "btrfs debugfs path $1 not available"
			fi
		fi
	}

	_supported_fs btrfs
	_supported_os Linux
	_require_scratch
	_require_btrfs_debugfs abort_transaction
	_need_to_be_root

	rm -f $seqres.full

	# We will abort a btrfs transaction later, which always produces a
	# warning in dmesg. We do not want the test to fail because of this.
	_disable_dmesg_check

	fslabel="btrfs_fstest_$seq"

	_scratch_mkfs -L $fslabel >> $seqres.full 2>&1
	_scratch_mount "-o discard"
	_require_batched_discard $SCRATCH_MNT

	# Create a file and commit the current transaction.
	echo -n "hello" > $SCRATCH_MNT/foo
	sync

	# Now update the file, which forces a COW operation of the fs root,
	# adding the old root location to the pinned extents list.
	echo -n "world" >> $SCRATCH_MNT/foo

	# Now abort the current transaction, unmount the fs, mount it again and
	# verify we can open the file and read its content (which should match
	# what it had when the last transaction committed successfully). Btrfs
	# used to issue a discard operation on the extents in the pinned extents
	# list, resulting in corruption of metadata and data, and used too to
	# return the pinned extents to the free space caches, allowing future
	# fstrim operations to perform a discard operation against the pinned
	# extents.
	echo -n $fslabel > $BTRFS_DEBUG_FS/abort_transaction
	sync
	echo > $BTRFS_DEBUG_FS/abort_transaction

	$FSTRIM_PROG $SCRATCH_MNT

	_scratch_unmount
	_scratch_mount

	echo "File content after transaction abort + remount: $(cat $SCRATCH_MNT/foo)"

The test's expected output is:

	File content after transaction abort + remount: hello

With patch [1] reverted, the test fails with:

	btrfs/088 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/088.out.bad)
	    --- tests/btrfs/088.out	2015-03-31 19:31:17.558436298 +0100
	    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/088.out.bad	2015-03-31 19:58:12.741403640 +0100
	    @@ -1,2 +1,8 @@
	     QA output created by 088
	    -File content after transaction abort + remount: hello
	    +mount: wrong fs type, bad option, bad superblock on /dev/sdc,
	    +       missing codepage or helper program, or other error
	    +       In some cases useful info is found in syslog - try
	    +       dmesg | tail or so
	    ...
	    (Run 'diff -u tests/btrfs/088.out /home/fdmanana/git/hub/xfstests/results//btrfs/088.out.bad' to see the entire diff)
	_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//btrfs/088.full)

	$ cat /home/fdmanana/git/hub/xfstests/results//btrfs/088.full
	(...)
	_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
	*** fsck.btrfs output ***
	Check tree block failed, want=29573120, have=0
	Check tree block failed, want=29573120, have=0
	Check tree block failed, want=29573120, have=0
	Check tree block failed, want=29573120, have=0
	Check tree block failed, want=29573120, have=0
	read block failed check_tree_block
	Couldn't read tree root
	Couldn't open file system
	*** end fsck.btrfs output

With this feature we can also get an fstest for the issue fixed by the patch that fixes log tree corruption when the fs is mounted with -o discard [2].

[1] commit 678886bdc637 ("Btrfs: fix fs corruption on transaction abort if device supports discard")
[2] Btrfs: fix log tree corruption when fs mounted with
Re: [PATCH v2] Btrfs: fix range cloning when same inode used as source and destination
On Thu, 02 Apr 2015 18:25:11 +0100, Filipe Manana wrote:
V2: Fixed a warning about potentially uninitialized variable. David got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc however.

I was *just* about to post this warning, since I saw it only a minute ago! I assume you mean:

fs/btrfs/ioctl.c: In function 'btrfs_clone':
fs/btrfs/ioctl.c:3531:14: warning: 'next_key_min_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
   key.offset = next_key_min_offset;
              ^

..and this is with 4.9.2 here. Anyway.. thanks for being faster :)

Holger
Re: [PATCH RESEND] fstests: test for btrfs send after complex directory hierarchy changes
On Thu, Apr 02, 2015 at 11:48:46AM +0100, Filipe Manana wrote:

Test a very complex scenario for a btrfs incremental send operation where a large directory hierarchy had many subtrees moved between parent directories, preserving the names of some directories and inverting the parent-child relationship between some directories (a child in the parent snapshot became, in the send snapshot, a parent of the directory that is its parent in the parent snapshot).

This test made the incremental send fail with -ENOMEM because it entered an infinite loop when building path strings that are used as operands of the rename operations issued in the send stream. This issue was fixed by the following linux kernel btrfs patch:

	Btrfs: incremental send, don't delay directory renames unnecessarily

Signed-off-by: Filipe Manana fdman...@suse.com

Reviewed-by: David Sterba dste...@suse.cz

Tested with current master (4.0-rc6) where it fails. The out of memory failure is fixed by the mentioned kernel patch, though to fully pass the test it's also needed to include

	Btrfs: incremental send, clear name from cache after orphanization
	https://patchwork.kernel.org/patch/5996271/
Re: [PATCH v2] Btrfs: incremental send, don't delay directory renames unnecessarily
On Sat, Mar 28, 2015 at 12:59:46AM +0000, Filipe Manana wrote:

Even though we delay the rename of directories when they become descendents of other directories that were also renamed in the send root, to prevent infinite path build loops, we were doing it in cases where this was not needed and was actually harmful, resulting in infinite path build loops as we ended up with a circular dependency of delayed directory renames.

[...]

Reported-by: Robbie Ko robbi...@synology.com
Signed-off-by: Filipe Manana fdman...@suse.com

Tested-by: David Sterba dste...@suse.cz
Re: SSD mode on HDD
On Thu, Apr 02, 2015 at 05:49:04PM +0200, Petr Bena wrote:

Hey, I figured out that for some reason on both Ubuntu and Debian, SSD mode seems to be turned on even on HDDs (hard disk drives, i.e. those with rotating platters). I only noticed it now, but it's been like this for at least a year or more and I haven't really had any problems with it. Is it OK to have SSD mode enabled on HDDs?

Yeah, it's safe. It will slightly change the behaviour of the allocator and trigger some actions earlier (that are otherwise offloaded to other threads). There's one case where it might make a difference: metadata defragmentation is skipped if SSD mode is enabled. I'm going to send a patch that will make the behaviour consistent.
Re: [PATCH v2] Btrfs: fix range cloning when same inode used as source and destination
On Thu, Apr 2, 2015 at 6:31 PM, Holger Hoffstätte holger.hoffstae...@googlemail.com wrote:
On Thu, 02 Apr 2015 18:25:11 +0100, Filipe Manana wrote:
V2: Fixed a warning about potentially uninitialized variable. David got this warning on a 4.5.1 gcc, but I didn't on a 4.9.2 gcc however.

I was *just* about to post this warning, since I saw it only a minute ago! I assume you mean:

fs/btrfs/ioctl.c: In function 'btrfs_clone':
fs/btrfs/ioctl.c:3531:14: warning: 'next_key_min_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
   key.offset = next_key_min_offset;
              ^

..and this is with 4.9.2 here. Anyway.. thanks for being faster :)

Holger

Yes, that was it. Thanks.

--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."