Re: [PATCH v12.1 01/15] btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize
On 2016/07/12 1:41, David Sterba wrote: > On Mon, Jul 11, 2016 at 11:05:29AM +0800, Qu Wenruo wrote: >> From: Wang Xiaoguang>> >> Extract cow_file_range() new parameters for both in-band dedupe and >> subpage sector size patchset. >> >> This should make conflict of both patchset to minimal, and reduce the >> effort needed to rebase them. > > Great, thanks. I did a test merge, there are still conflicts but they > seem to be resolvable more easily, picking the changes from the right > patchset. I haven't tested it so there's more work, but the point was to > get the conflict surface down. > > There's another candidate, btrfs_set_extent_delalloc as it adds a > parameter, can you please send a similar patch for that? He's going to attend LinuxCon Japan (Jul 13 - Jul 15). So I guess he will reply several days later (next week?). Thanks, Satoru > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid1 has failing disks, but smart is clear
Hi Andrei, Thanks for the info, sorry about the improper terminology. In better news, I discovered that one disk wasn't getting recognized by the OS at certain outputs of one of my mini-sas to sata cables, so I got a new one, and the disk works fine on that. So I'm that's at least one bad cable, I still have to check the other 3, though. -- Corey On 07/08/2016 10:40 PM, Andrei Borzenkov wrote: 07.07.2016 09:40, Corey Coughlin пишет: Hi Tomasz, Thanks for the response! I should clear some things up, though. On 07/06/2016 03:59 PM, Tomasz Kusmierz wrote: On 6 Jul 2016, at 23:14, Corey Coughlinwrote: Hi all, Hoping you all can help, have a strange problem, think I know what's going on, but could use some verification. I set up a raid1 type btrfs filesystem on an Ubuntu 16.04 system, here's what it looks like: btrfs fi show Label: none uuid: 597ee185-36ac-4b68-8961-d4adc13f95d4 Total devices 10 FS bytes used 3.42TiB devid1 size 1.82TiB used 1.18TiB path /dev/sdd devid2 size 698.64GiB used 47.00GiB path /dev/sdk devid3 size 931.51GiB used 280.03GiB path /dev/sdm devid4 size 931.51GiB used 280.00GiB path /dev/sdl devid5 size 1.82TiB used 1.17TiB path /dev/sdi devid6 size 1.82TiB used 823.03GiB path /dev/sdj devid7 size 698.64GiB used 47.00GiB path /dev/sdg devid8 size 1.82TiB used 1.18TiB path /dev/sda devid9 size 1.82TiB used 1.18TiB path /dev/sdb devid 10 size 1.36TiB used 745.03GiB path /dev/sdh Now when I say that the drives mount points change, I'm not saying they change when I reboot. They change while the system is running. For instance, here's the fi show after I ran a "check --repair" run this afternoon: btrfs fi show Label: none uuid: 597ee185-36ac-4b68-8961-d4adc13f95d4 Total devices 10 FS bytes used 3.42TiB devid1 size 1.82TiB used 1.18TiB path /dev/sdd devid2 size 698.64GiB used 47.00GiB path /dev/sdk devid3 size 931.51GiB used 280.03GiB path /dev/sdm devid4 size 931.51GiB used 280.00GiB path /dev/sdl devid5 size 1.82TiB used 1.17TiB path /dev/sdi devid6 size 1.82TiB used 823.03GiB path /dev/sds devid7 size 698.64GiB used 47.00GiB path /dev/sdg devid8 size 1.82TiB used 1.18TiB path /dev/sda devid9 size 1.82TiB used 1.18TiB path /dev/sdb devid 10 size 1.36TiB used 745.03GiB path /dev/sdh Notice that /dev/sdj in the previous run changed to /dev/sds. There was no reboot, the mount just changed. I don't know why that is happening, but it seems like the majority of the errors are on that drive. But given that I've fixed the start/stop issue on that disk, it probably isn't a WD Green issue. It's not "mount point", it is just device names. Do not make it sound more confusing than it already is :) This implies that disks drop off and reappear. Do you have "dmesg" or log (/var/log/syslog or /var/log/messages or journalctl) for the same period of time? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs: add test for an incremental send after moving directories around
On Sat, Jul 02, 2016 at 01:32:08PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana> > Test that an incremental send operation works after doing radical changes > in the directory hierarchy that involve switching the inode that directory > entries point to. > > This test exercises scenarios used to fail in btrfs and are fixed by the > following patches for the linux kernel: > > "Btrfs: send, fix failure to move directories with the same name around" > "Btrfs: incremental send, fix invalid paths for rename operations" > > Signed-off-by: Filipe Manana > --- > tests/btrfs/124 | 261 > > tests/btrfs/124.out | 2 + > tests/btrfs/group | 1 + > 3 files changed, 264 insertions(+) > create mode 100755 tests/btrfs/124 > create mode 100644 tests/btrfs/124.out > > diff --git a/tests/btrfs/124 b/tests/btrfs/124 > new file mode 100755 > index 000..38635a3 > --- /dev/null > +++ b/tests/btrfs/124 > @@ -0,0 +1,261 @@ > +#! /bin/bash > +# FS QA Test No. btrfs/124 > +# > +# Test that an incremental send operation works after doing radical changes > +# in the directory hierarchy that involve switching the inode that directory > +# entries point to. > +# > +#--- > +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved. > +# Author: Filipe Manana > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ cd / to make it consistent with other _cleanup :) > + rm -fr $send_files_dir > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# real QA test starts here > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch > +_require_fssum > + > +send_files_dir=$TEST_DIR/btrfs-test-$seq These three tests all take use of $TEST_DIR, but don't _require_test, I think we need it in these tests. > + > +rm -f $seqres.full > +rm -fr $send_files_dir > +mkdir $send_files_dir > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > + > +# case 1 > +mkdir -p $SCRATCH_MNT/case_1/d/p1 > +mkdir $SCRATCH_MNT/case_1/p1 > + > +# case 2 > +mkdir -p $SCRATCH_MNT/case_2/a > +mkdir $SCRATCH_MNT/case_2/d > +mkdir $SCRATCH_MNT/case_2/e > +mkdir $SCRATCH_MNT/case_2/f > +mkdir $SCRATCH_MNT/case_2/ance > +mkdir $SCRATCH_MNT/case_2/d/ance > +mkdir $SCRATCH_MNT/case_2/a/c > +mv $SCRATCH_MNT/case_2/e $SCRATCH_MNT/case_2/d/ance > +mv $SCRATCH_MNT/case_2/f $SCRATCH_MNT/case_2/d/ance > +mv $SCRATCH_MNT/case_2/ance $SCRATCH_MNT/case_2/d/ance > + > +# case 3 > +mkdir -p $SCRATCH_MNT/case_3/d > +mkdir $SCRATCH_MNT/case_3/a > +mkdir $SCRATCH_MNT/case_3/waiting_dir > +mkdir -p $SCRATCH_MNT/case_3/pre/ance > +mkdir $SCRATCH_MNT/case_3/d/ance > +mkdir $SCRATCH_MNT/case_3/a/c > +mv $SCRATCH_MNT/case_3/waiting_dir $SCRATCH_MNT/case_3/d/ance > + > +# case 4 > +mkdir -p $SCRATCH_MNT/case_4/tmp > +mkdir $SCRATCH_MNT/case_4/below_ance > +mkdir -p $SCRATCH_MNT/case_4/pre/wait_dir > +mkdir $SCRATCH_MNT/case_4/desc > +mkdir $SCRATCH_MNT/case_4/ance > +mv $SCRATCH_MNT/case_4/below_ance $SCRATCH_MNT/case_4/ance > +mkdir $SCRATCH_MNT/case_4/other_dir > + > +# Filesystem looks like: > +# > +# . (ino > 256) > +# |--- case_1/ (ino > 257) > +# | | d/ (ino > 258) > +# | | |--- p1/ (ino > 259) > +# | | > +# | | p1/ (ino > 260) > +# | > +# |--- case_2/ (ino > 261) > +# | | a/ (ino > 262) > +# | | | c/ (ino > 268) > +# | | > +# | | d/ (ino > 263) >
[PATCH] Btrfs: change BUG_ON()'s to ASSERT()'s in backref_cache_cleanup()
Since it is just an in-memory building of the backrefs of several btree blocks, nothing is fatal other than memory leaks, so this changes BUG_ON()'s to ASSERT()'s. Signed-off-by: Liu Bo--- fs/btrfs/relocation.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 0477dca..9732919 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -235,12 +235,12 @@ static void backref_cache_cleanup(struct backref_cache *cache) cache->last_trans = 0; for (i = 0; i < BTRFS_MAX_LEVEL; i++) - BUG_ON(!list_empty(>pending[i])); - BUG_ON(!list_empty(>changed)); - BUG_ON(!list_empty(>detached)); - BUG_ON(!RB_EMPTY_ROOT(>rb_root)); - BUG_ON(cache->nr_nodes); - BUG_ON(cache->nr_edges); + ASSERT(list_empty(>pending[i])); + ASSERT(list_empty(>changed)); + ASSERT(list_empty(>detached)); + ASSERT(RB_EMPTY_ROOT(>rb_root)); + ASSERT(!cache->nr_nodes); + ASSERT(!cache->nr_edges); } static struct backref_node *alloc_backref_node(struct backref_cache *cache) -- 2.5.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
FIDEDUPERANGE with src_length == 0
Hey, Darrick, generic/182 is failing on Btrfs for me with the following output: --- tests/generic/182.out 2016-07-07 19:51:54.0 -0700 +++ /tmp/fixxfstests/xfstests/results//generic/182.out.bad 2016-07-11 17:28:28.230039216 -0700 @@ -1,12 +1,10 @@ QA output created by 182 Create the original files -dedupe: Extents did not match. f4820540fc0ac02750739896fe028d56 TEST_DIR/test-182/file1 69ad53078a16243d98e21d9f8704a071 TEST_DIR/test-182/file2 69ad53078a16243d98e21d9f8704a071 TEST_DIR/test-182/file2.chk Compare against check files Make the original file almost dedup-able -dedupe: Extents did not match. f4820540fc0ac02750739896fe028d56 TEST_DIR/test-182/file1 158d4e3578b94b89cbb44493a2110fb9 TEST_DIR/test-182/file2 158d4e3578b94b89cbb44493a2110fb9 TEST_DIR/test-182/file2.chk It looks like that test is checking that a dedupe with length == 0 is treated as a dedupe to EOF, but Btrfs doesn't do that [1]. As far as I can tell, it never did, but maybe I'm just confused. What was the behavior when you introduced that test? That seems like a reasonable thing to do, but I wanted to clear this up before changing/fixing Btrfs. Thanks. 1: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/ioctl.c?h=v4.7-rc7#n3122 -- Omar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix panic in balance due to EIO
During build_backref_tree(), if we fail to read a btree node, we can eventually run into BUG_ON(cache->nr_nodes) that we put in backref_cache_cleanup(), meaning we have at least one memory leak. This frees the backref_node that we allocate at the very beginning of build_backref_tree(). Signed-off-by: Liu Bo--- fs/btrfs/relocation.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 0477dca..f00267a 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1135,6 +1135,8 @@ out: btrfs_free_path(path1); btrfs_free_path(path2); if (err) { + int orig_free = 0; + while (!list_empty()) { lower = list_entry(useless.next, struct backref_node, list); @@ -1171,8 +1173,13 @@ out: lower = list_entry(useless.next, struct backref_node, list); list_del_init(>list); + if (lower == node) + orig_free = 1; free_backref_node(cache, lower); } + + if (!orig_free) + free_backref_node(cache, node); return ERR_PTR(err); } ASSERT(!node || !node->detached); -- 2.5.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure
On Mon, Jul 11, 2016 at 06:54:02PM -0400, Chris Mason wrote: > On Mon, Jul 11, 2016 at 03:48:38PM -0700, Liu Bo wrote: > > On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote: > > > > > > > > > On 07/11/2016 01:39 PM, Liu Bo wrote: > > > > eb->io_pages is set in read_extent_buffer_pages(). > > > > > > > > In case of readpage failure, for pages that have been added to bio, > > > > it calls bio_endio and later readpage_io_failed_hook() does the work. > > > > > > > > When this eb's page (couldn't be the 1st page) fails to add itself to > > > > bio > > > > due to failure in merge_bio(), it cannot decrease eb->io_pages via > > > > bio_endio, > > > > and ends up with a memory leak eventually. > > > > > > > > This lets __do_readpage propagate errors to callers and adds the > > > > 'atomic_dec(>io_pages)'. > > > > > > Thanks for looking at this Liu, how is it currently being tested? > > > > I have a btrfs disk image which was corrupted by btrfs-corrupt-block > > tool, in that image, the chunk tree's content has been removed while the > > chunk node can be read from read successfully, so we'd get -EIO when > > trying to read tree root's node since __btrfs_map_block() would fail to > > find the right item in chunk mapping_tree. Thus, we can test our error > > handling path in read_extent_buffer_pages(). > > Fantastic. Can you please make this an xfstest, maybe along with a dm-flakey? > as the second phase? Sure, this depends on a btrfs-corrupt-block patch, which I've not sent out, I'll try to work out a xfstests case :) Btw, I'm also planning to add this into our fuzz images of btrfs-progs. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure
On Mon, Jul 11, 2016 at 03:48:38PM -0700, Liu Bo wrote: On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote: On 07/11/2016 01:39 PM, Liu Bo wrote: > eb->io_pages is set in read_extent_buffer_pages(). > > In case of readpage failure, for pages that have been added to bio, > it calls bio_endio and later readpage_io_failed_hook() does the work. > > When this eb's page (couldn't be the 1st page) fails to add itself to bio > due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio, > and ends up with a memory leak eventually. > > This lets __do_readpage propagate errors to callers and adds the > 'atomic_dec(>io_pages)'. Thanks for looking at this Liu, how is it currently being tested? I have a btrfs disk image which was corrupted by btrfs-corrupt-block tool, in that image, the chunk tree's content has been removed while the chunk node can be read from read successfully, so we'd get -EIO when trying to read tree root's node since __btrfs_map_block() would fail to find the right item in chunk mapping_tree. Thus, we can test our error handling path in read_extent_buffer_pages(). Fantastic. Can you please make this an xfstest, maybe along with a dm-flakey? as the second phase? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure
On Mon, Jul 11, 2016 at 02:27:39PM -0400, Chris Mason wrote: > > > On 07/11/2016 01:39 PM, Liu Bo wrote: > > eb->io_pages is set in read_extent_buffer_pages(). > > > > In case of readpage failure, for pages that have been added to bio, > > it calls bio_endio and later readpage_io_failed_hook() does the work. > > > > When this eb's page (couldn't be the 1st page) fails to add itself to bio > > due to failure in merge_bio(), it cannot decrease eb->io_pages via > > bio_endio, > > and ends up with a memory leak eventually. > > > > This lets __do_readpage propagate errors to callers and adds the > > 'atomic_dec(>io_pages)'. > > Thanks for looking at this Liu, how is it currently being tested? I have a btrfs disk image which was corrupted by btrfs-corrupt-block tool, in that image, the chunk tree's content has been removed while the chunk node can be read from read successfully, so we'd get -EIO when trying to read tree root's node since __btrfs_map_block() would fail to find the right item in chunk mapping_tree. Thus, we can test our error handling path in read_extent_buffer_pages(). Thanks, -liubo > > -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RESEND v2] fstests: btrfs/079: Fix wrong value passed to available space check.
From: Qu WenruoWrong value is passed to _require_fs_space, which should be in unit of kilobyte(1024), but passed in unit of gigabyte(1024^3). Fix it. Reviewed-by: David Sterba Signed-off-by: Qu Wenruo --- This is from January of last year, looks like it fell through the cracks. tests/btrfs/079 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/btrfs/079 b/tests/btrfs/079 index 6aee3a373f91..ed4eb727ee2b 100755 --- a/tests/btrfs/079 +++ b/tests/btrfs/079 @@ -72,7 +72,7 @@ rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount -_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024)) +_require_fs_space $SCRATCH_MNT $(($filesize / 1024)) $XFS_IO_PROG -f -c "falloc 0 $filesize" $testfile dd_work() { -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 64-btrfs.rules and degraded boot
On Fri, Jul 8, 2016 at 6:24 AM, Austin S. Hemmelgarnwrote: > To clarify, I'm not trying to argue against adding support, I'm arguing > against it being mandatory. By "D-Bus support" I did not mean to indicate mandating it, just that it would be one possible way to get some very basic state change messages to user space tools so we're doing the least amount of wheel reinvention as possible. > A filesystem which requires specific system > services to be running just for regular maintenance tasks is not a well > designed filesystem. To be entirely honest, I'm not all that happy about > the functional dependency on udev to have device discovery, but there's no > point in me arguing about that... Well everything else that came before it is effectively deprecated, so there's no going back. The way forward would be to get udev more granular state information about a Btrfs volume than 0 and 1. > > Just thinking aloud, but why not do a daemon that does the actual > monitoring, and then provide an interface (at least a UNIX domain socket, > and optionally a D-Bus endpoint) that other tools can use to query > filesystem status. LVM already has a similar setup for monitoring DM-RAID > volumes, snapshots, and thin storage pools, although it's designed as an > event driven tool that does something when specific things happen (for > example, possibly auto-extending snapshots when they start to get full). That would be consistent with mdadm --monitor and dmeventd, but it is yet another wheel reinvention at the lower level, which then also necessitates higher level things to adapt to that interface. It would be neat if there could be some unification and consistency. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: fix unexpected balance crash due to BUG_ON
On 07/11/2016 01:21 PM, Liu Bo wrote: Mounting a btrfs can resume previous balance operations asynchronously. An user got a crash when one drive has some corrupt sectors. Since balance can cancel itself in case of any error, we can gracefully return errors to upper layers and let balance do the cancel job. Reported-by: sashSigned-off-by: Liu Bo v2 Looks good, thanks Liu. Signed-off-by: Chris Mason -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unable to mount degraded RAID5
On Mon, Jul 11, 2016 at 11:17 AM, Tomáš Hrdinawrote: > sudo btrfs-debug-tree /dev/sdc > It has 20 lines. Don't know, what you use for bigger files. > > > sudo btrfs-debug-tree -b 6062434418688 /dev/sdc > http://sebsauvage.net/paste/?fc156dee1d1deb3b#YpG/TA0H3I313jMuC4pgsdj++TcuDaFwWIBeuuOXfCA= > > sudo btrfs-debug-tree -b 6062497202176 /dev/sdc > http://sebsauvage.net/paste/?86621abec9c239bd#kwTpZ7BZLcLw71yCfr3jHKZT08zsaXK3RgdFo7MFoFc= > > > sudo btrfs-debug-tree -b 6062470332416 /dev/sdc > http://sebsauvage.net/paste/?4ff40fa0b6b201c9#nFk7pT9MLj2w9egUJlgXdkmCkWyp1vSG0kADfq3J7eA= None of these have anything useful in them, there's no tree root there. > > It got some results, but I don't know, what to look for. > > > sudo btrfs-map-logical -l 7008899547136 /dev/sdc > parent transid verify failed on 7008807157760 wanted 70175 found 70133 > parent transid verify failed on 7008807157760 wanted 70175 found 70133 > checksum verify failed on 7008807157760 found F192848C wanted 1571393A > checksum verify failed on 7008807157760 found F192848C wanted 1571393A > bytenr mismatch, want=7008807157760, have=65536 > mirror 1 logical 7008899547136 physical 735226609664 device /dev/sdb > mirror 2 logical 7008899547136 physical 3166748524544 device /dev/sdc > > > Also I don't know, what to do with this. How to compute new csum. Right, that's pretty tricky to do manually. > > For me, it would be ok to give up and just start fresh. OK in that case do that. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] Btrfs: fix eb memory leak due to readpage failure
On 07/11/2016 01:39 PM, Liu Bo wrote: eb->io_pages is set in read_extent_buffer_pages(). In case of readpage failure, for pages that have been added to bio, it calls bio_endio and later readpage_io_failed_hook() does the work. When this eb's page (couldn't be the 1st page) fails to add itself to bio due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio, and ends up with a memory leak eventually. This lets __do_readpage propagate errors to callers and adds the 'atomic_dec(>io_pages)'. Thanks for looking at this Liu, how is it currently being tested? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] Btrfs: fix eb memory leak due to readpage failure
eb->io_pages is set in read_extent_buffer_pages(). In case of readpage failure, for pages that have been added to bio, it calls bio_endio and later readpage_io_failed_hook() does the work. When this eb's page (couldn't be the 1st page) fails to add itself to bio due to failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio, and ends up with a memory leak eventually. This lets __do_readpage propagate errors to callers and adds the 'atomic_dec(>io_pages)'. Signed-off-by: Liu Bo--- v2: - Move 'dec io_pages' to the caller so that we're consistent with write_one_eb() v3: - Bail out once we fail to read a page and do the cleanup work for eb->io_pages fs/btrfs/extent_io.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ac1a696..7303e5a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2878,6 +2878,7 @@ __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset, * into the tree that are removed when the IO is done (by the end_io * handlers) * XXX JDM: This needs looking at to ensure proper page locking + * return 0 on success, otherwise return error */ static int __do_readpage(struct extent_io_tree *tree, struct page *page, @@ -2899,7 +2900,7 @@ static int __do_readpage(struct extent_io_tree *tree, sector_t sector; struct extent_map *em; struct block_device *bdev; - int ret; + int ret = 0; int nr = 0; size_t pg_offset = 0; size_t iosize; @@ -3080,6 +3081,7 @@ static int __do_readpage(struct extent_io_tree *tree, } else { SetPageError(page); unlock_extent(tree, cur, cur + iosize - 1); + goto out; } cur = cur + iosize; pg_offset += iosize; @@ -3090,7 +3092,7 @@ out: SetPageUptodate(page); unlock_page(page); } - return 0; + return ret; } static inline void __do_contiguous_readpages(struct extent_io_tree *tree, @@ -5230,14 +5232,31 @@ int read_extent_buffer_pages(struct extent_io_tree *tree, atomic_set(>io_pages, num_reads); for (i = start_i; i < num_pages; i++) { page = eb->pages[i]; + if (!PageUptodate(page)) { + if (ret) { + atomic_dec(>io_pages); + unlock_page(page); + continue; + } + ClearPageError(page); err = __extent_read_full_page(tree, page, get_extent, , mirror_num, _flags, READ | REQ_META); - if (err) + if (err) { ret = err; + /* +* We use in above __extent_read_full_page, +* so we ensure that if it returns error, the +* current page fails to add itself to bio and +* it's been unlocked. +* +* We must dec io_pages by ourselves. +*/ + atomic_dec(>io_pages); + } } else { unlock_page(page); } -- 2.5.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)
On 07/11/2016 11:16 AM, David Sterba wrote: On Mon, Jul 11, 2016 at 11:00:55AM -0400, Chris Mason wrote: So, the real bug is that we're letting some delalloc stat hang around after the truncate, probably related to IO in progress. We do already account for delalloc in what we return to stat, but there's a corner case involving truncate where we screw it up. So the original testcase: a) some "tool" creates sparse file b) that tool does not sync explicitly and exits .. c) tar is called immediately after that to archive the sparse file d) tar considers [2] the file is completely sparse (because st_blocks is zero) and archives no data. Here comes data loss. will not happen. The application would basically have to mimick the provided reproducer script and do the truncate/write loop and be lucky enough to let tar hit the short race window. Looking harder there is a race window that can trigger this without the truncate loop: 1) application calls write(), we make the pages delalloc (in-ram st_blocks goes up) 2) VM calls write_cache_pages, we go find a contiguous delalloc range 3) We call cow_file_range on the locked range of pages 4) cow_file_range clears the delalloc bits (in-ram st_blocks goes down) < - race begins here -> 5) The io is started 6) The IO completes and extents are inserted into the metadata 7) the on disk/in-ram st_blocks goes up < race ends here > This makes a ton more sense than leaking delalloc bits. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unable to mount degraded RAID5
sudo btrfs-debug-tree /dev/sdc It has 20 lines. Don't know, what you use for bigger files. sudo btrfs-debug-tree -b 6062434418688 /dev/sdc http://sebsauvage.net/paste/?fc156dee1d1deb3b#YpG/TA0H3I313jMuC4pgsdj++TcuDaFwWIBeuuOXfCA= sudo btrfs-debug-tree -b 6062497202176 /dev/sdc http://sebsauvage.net/paste/?86621abec9c239bd#kwTpZ7BZLcLw71yCfr3jHKZT08zsaXK3RgdFo7MFoFc= sudo btrfs-debug-tree -b 6062470332416 /dev/sdc http://sebsauvage.net/paste/?4ff40fa0b6b201c9#nFk7pT9MLj2w9egUJlgXdkmCkWyp1vSG0kADfq3J7eA= It got some results, but I don't know, what to look for. sudo btrfs-map-logical -l 7008899547136 /dev/sdc parent transid verify failed on 7008807157760 wanted 70175 found 70133 parent transid verify failed on 7008807157760 wanted 70175 found 70133 checksum verify failed on 7008807157760 found F192848C wanted 1571393A checksum verify failed on 7008807157760 found F192848C wanted 1571393A bytenr mismatch, want=7008807157760, have=65536 mirror 1 logical 7008899547136 physical 735226609664 device /dev/sdb mirror 2 logical 7008899547136 physical 3166748524544 device /dev/sdc Also I don't know, what to do with this. How to compute new csum. For me, it would be ok to give up and just start fresh. Thank you Tomas *From:* Chris Murphy *Sent:* Sunday, July 10, 2016 10:08PM *To:* Tomáš Hrdina *Cc:* Chris Murphy, Btrfs Btrfs *Subject:* Re: Unable to mount degraded RAID5 btrfs-debug-tree -b --- Tato zpráva byla zkontrolována na viry programem Avast Antivirus. https://www.avast.com/antivirus -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: fix unexpected balance crash due to BUG_ON
Mounting a btrfs can resume previous balance operations asynchronously. An user got a crash when one drive has some corrupt sectors. Since balance can cancel itself in case of any error, we can gracefully return errors to upper layers and let balance do the cancel job. Reported-by: sashSigned-off-by: Liu Bo --- v2: - Initialize path with NULL. - Show more information when we bail out. fs/btrfs/volumes.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 589f128..348a183 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -3421,7 +3421,7 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info) u64 size_to_free; u64 chunk_type; struct btrfs_chunk *chunk; - struct btrfs_path *path; + struct btrfs_path *path = NULL; struct btrfs_key key; struct btrfs_key found_key; struct btrfs_trans_handle *trans; @@ -3455,13 +3455,35 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info) ret = btrfs_shrink_device(device, old_size - size_to_free); if (ret == -ENOSPC) break; - BUG_ON(ret); + if (ret) { + /* btrfs_shrink_device never returns ret > 0 */ + WARN_ON(ret > 0); + goto error; + } trans = btrfs_start_transaction(dev_root, 0); - BUG_ON(IS_ERR(trans)); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + btrfs_info(fs_info, +"%s:%d fails on btrfs_start_transaction() right after shrinking devivce %s (original size is %llu new size is %llu", + __func__, __LINE__, + rcu_str_deref(device->name), old_size, + old_size - size_to_free); + goto error; + } ret = btrfs_grow_device(trans, device, old_size); - BUG_ON(ret); + if (ret) { + btrfs_end_transaction(trans, dev_root); + /* btrfs_grow_device never returns ret > 0 */ + WARN_ON(ret > 0); + btrfs_info(fs_info, +"%s:%d fails on btrfs_grow_device() right after shrinking devivce %s (original size is %llu new size is %llu", + __func__, __LINE__, + rcu_str_deref(device->name), old_size, + old_size - size_to_free); + goto error; + } btrfs_end_transaction(trans, dev_root); } -- 2.5.5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: add option to run balance as daemon
On Mon, Jul 11, 2016 at 07:17:28AM -0400, Austin S. Hemmelgarn wrote: > On 2016-07-11 03:26, Tomasz Torcz wrote: > > On Tue, Jun 21, 2016 at 11:16:59AM -0400, Austin S. Hemmelgarn wrote: > > > Currently, balance operations are run synchronously in the foreground. > > > This is nice for interactive management, but is kind of crappy when you > > > start looking at automation and similar things. > > > > It can be done with simplest systemd unit file: > > btrfs-balance@.service: > > --- > > [Unit] > > Description=btrfs balance for %I > > > > [Service] > > ExecStart=/usr/bin/btrfs balance start %I > > ExecStop=/usr/bin/btrfs balance cancel %I > > --- > > > > It automates quite nicely and needs no additional code. > > > It's also entirely dependent on a couple of things: > 1. You're running systemd (not everyone is, I'm certainly not). So instead of using widespread, tested code, you re-implement parts of it. BTW, your patch for daemonizing does only 5 steps out of 15 described in man 7 daemon. > 2. You're only dealing with the local system. > > The type of situation I'm thinking of is dealing with non-local systems. > For example, running something like this: > ssh user@remotehost btrfs balance start --background / > Keeping the SSH connection open for the duration of the balance has issues > for some people (may close without keep-alive set, uses network bandwidth > with keep-alive set, many people who are hosted have bandwidth quotas > still), and it's extremely useful to have the option to fire and forget. I don't get the local part. Right now, when using above unit you can ssh user@remotehost systemctl start btrfs-balance@- (or even systemctl -H user@remotehost start btrfs-balance@-) and balance for / runs in background on target host. With clean environment, logs being captured, locking against multiple startups and so on. Right now, without any additional code. -- Tomasz .. oo o. oo o. .o .o o. o. oo o. .. Torcz.. .o .o .o .o oo oo .o .. .. oo oo o.o.o. .o .. o. o. o. o. o. o. oo .. .. o. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v12.1 01/15] btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize
On Mon, Jul 11, 2016 at 11:05:29AM +0800, Qu Wenruo wrote: > From: Wang Xiaoguang> > Extract cow_file_range() new parameters for both in-band dedupe and > subpage sector size patchset. > > This should make conflict of both patchset to minimal, and reduce the > effort needed to rebase them. Great, thanks. I did a test merge, there are still conflicts but they seem to be resolvable more easily, picking the changes from the right patchset. I haven't tested it so there's more work, but the point was to get the conflict surface down. There's another candidate, btrfs_set_extent_delalloc as it adds a parameter, can you please send a similar patch for that? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs: fix free space calculation in dump_space_info()
Wang Xiaoguang posted on Mon, 11 Jul 2016 19:30:04 +0800 as excerpted: > In btrfs, btrfs_space_info's bytes_may_use is treated as fs used space, > as what we do in reserve_metadata_bytes() or > btrfs_alloc_data_chunk_ondemand(), so in dump_space_info(), when > calculating free space, we should also minus btrfs_space_info's > bytes_may_use. > > Signed-off-by: Wang Xiaoguang> --- > fs/btrfs/extent-tree.c | 4 ++-- I'm not a dev so won't evaluate the patch itself. However, on kernel related mailing lists it's standard practice to include a short, often one-line per revision, revision changelog. It helps reviewers keep track of what changed since the last time they looked at the patch. See pretty much any on-list v2+ patch as an example. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)
On Mon, Jul 11, 2016 at 11:00:55AM -0400, Chris Mason wrote: > So, the real bug is that we're letting some delalloc stat hang around > after the truncate, probably related to IO in progress. We do already > account for delalloc in what we return to stat, but there's a corner > case involving truncate where we screw it up. So the original testcase: > >> a) some "tool" creates sparse file > >> b) that tool does not sync explicitly and exits .. > >> c) tar is called immediately after that to archive the sparse file > >> d) tar considers [2] the file is completely sparse (because st_blocks > >> is > >>zero) and archives no data. Here comes data loss. will not happen. The application would basically have to mimick the provided reproducer script and do the truncate/write loop and be lucky enough to let tar hit the short race window. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)
On 07/11/2016 10:41 AM, David Sterba wrote: On Sat, Jul 02, 2016 at 09:18:07AM +0200, Pavel Raiskup wrote: There are optimizations in archivers (tar, rsync, ...) that rely on up2date st_blocks info. For example, in GNU tar there is optimization check [1] whether the 'st_size' reports more data than the 'st_blocks' can hold --> then tar considers that file is sparse (and does additional steps). It looks like btrfs doesn't show correct value in 'st_blocks' until the data are synced. ATM, there happens that: a) some "tool" creates sparse file b) that tool does not sync explicitly and exits .. c) tar is called immediately after that to archive the sparse file d) tar considers [2] the file is completely sparse (because st_blocks is zero) and archives no data. Here comes data loss. Because we fixed 'btrfs' to report non-zero 'st_blocks' when the file data is small and is in-lined (no real data blocks) -- I consider this is too bug in btrfs worth fixing. [1] http://git.savannah.gnu.org/cgit/paxutils.git/tree/lib/system.h?id=ec72abd9dd63bbff4534ec77e97b1a6cadfc3cf8#n392 [2] http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=ac065c57fdc1788a2769fb119ed0c8146e1b9dd6#n273 Tested on kernel: kernel-4.5.7-300.fc24.x86_64 Originally reported here, reproducer available there: https://bugzilla.redhat.com/show_bug.cgi?id=1352061 The reproducer works for me here. So far I found: * the btrfs implementation of stat.st_blocks (btrfs_getattr) includes the 'delayed allocated' bytes, so there is not a problem in principle (https://urldefense.proofpoint.com/v2/url?u=http-3A__lxr.free-2Delectrons.com_source_fs_btrfs_inode.c-23L9372=CwIBAg=5VD0RTtNlTh3ycd41b3MUw=9QPtTAxcitoznaWRKKHoEQ=Y07_PApD0zOC-eaM4Hq-oxAwNlktnY8bDo7LD2GbXRo=Is_ZqFiL7a4sVWAB1k1ZuAgbNMK-sZ1gcU7oLtyfKoY= ) * calling fsync on the sparsefile will produce the expected result * a short delay between ./binary and 'stat' will also produce correct result, 0.5 seconds worked for me -- so it IMO proves it's a race between writing and reporting the data * I'm not yet sure where the delay between write and synced 'inode->delalloc_bytes' comes from * I think that st_blocks accounting can be wrong anyway, if the file is mmap-ed and not msync-ed, I'm writing a reproducer for this case On my test box running current linux git, things work fine if I run the reproducer once. But if I leave it running in a loop long enough for writeback to kick in, I trigger it. The reproducer has a loop in there where it is adding delalloc writes and truncating them away. What should be happening is that we're leaving some delalloc bits set past EOF, which makes us skip bumping inode->delalloc_bytes during the new write. I can kind of confirm this by changing the reproducer to stat directly after the write call. Normally st_blocks is never zero. But if I leave it running in a loop for 30 seconds or so, I eventually get st_block zero directly after the write(). If I change the C program to unlink the file on exit, running the binary over and over again works every time. So, the real bug is that we're letting some delalloc stat hang around after the truncate, probably related to IO in progress. We do already account for delalloc in what we return to stat, but there's a corner case involving truncate where we screw it up. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: stat() on btrfs reports the st_blocks with delay (data loss in archivers)
On Sat, Jul 02, 2016 at 09:18:07AM +0200, Pavel Raiskup wrote: > There are optimizations in archivers (tar, rsync, ...) that rely on up2date > st_blocks info. For example, in GNU tar there is optimization check [1] > whether the 'st_size' reports more data than the 'st_blocks' can hold --> then > tar considers that file is sparse (and does additional steps). > > It looks like btrfs doesn't show correct value in 'st_blocks' until the data > are synced. ATM, there happens that: > > a) some "tool" creates sparse file > b) that tool does not sync explicitly and exits .. > c) tar is called immediately after that to archive the sparse file > d) tar considers [2] the file is completely sparse (because st_blocks is >zero) and archives no data. Here comes data loss. > > Because we fixed 'btrfs' to report non-zero 'st_blocks' when the file data is > small and is in-lined (no real data blocks) -- I consider this is too bug in > btrfs worth fixing. > > [1] > http://git.savannah.gnu.org/cgit/paxutils.git/tree/lib/system.h?id=ec72abd9dd63bbff4534ec77e97b1a6cadfc3cf8#n392 > [2] > http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=ac065c57fdc1788a2769fb119ed0c8146e1b9dd6#n273 > > Tested on kernel: > kernel-4.5.7-300.fc24.x86_64 > > Originally reported here, reproducer available there: > https://bugzilla.redhat.com/show_bug.cgi?id=1352061 The reproducer works for me here. So far I found: * the btrfs implementation of stat.st_blocks (btrfs_getattr) includes the 'delayed allocated' bytes, so there is not a problem in principle (http://lxr.free-electrons.com/source/fs/btrfs/inode.c#L9372) * calling fsync on the sparsefile will produce the expected result * a short delay between ./binary and 'stat' will also produce correct result, 0.5 seconds worked for me -- so it IMO proves it's a race between writing and reporting the data * I'm not yet sure where the delay between write and synced 'inode->delalloc_bytes' comes from * I think that st_blocks accounting can be wrong anyway, if the file is mmap-ed and not msync-ed, I'm writing a reproducer for this case -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs won't mount to /home
On 2016-07-11 22:56, Roman Mamedov wrote: On Mon, 11 Jul 2016 22:45:13 +0900 Tomasz Chmielewskiwrote: So, weird, isn't it? What's wrong there? Your systemd unmounts it immediately from /home, search the archives there's been a funny story like that recently. Yes, could be similar to this one: https://bugs.archlinux.org/task/44658 In my case, the entry did exist in /etc/fstab, but was different than at the time of booting the system. Tomasz Chmielewski https://wpkg.org -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] receive: strip root subvol path during process_clone
Hi David, On Fri, Jun 17, 2016 at 7:18 PM, David Sterbawrote: > thanks for the fix. The mail thread holds enough information to > reproduce the issue, I'll get to that on Monday. I'm not sure about the > use of strstr to match the subvolume name so I want to write the tests > first and craft some paths to see how it works. any update on this? I've been running this patch for a few weeks now with my hourly backups. I could also take a stab at writing a test case but I think I'd need some pointers for that. ;-) I took a look at the send/receive patches that Filipe sent earlier and afaict this isn't fixed in those. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs won't mount to /home
On Mon, 11 Jul 2016 22:45:13 +0900 Tomasz Chmielewskiwrote: > So, weird, isn't it? > > What's wrong there? Your systemd unmounts it immediately from /home, search the archives there's been a funny story like that recently. -- With respect, Roman pgpUH09kQqplk.pgp Description: OpenPGP digital signature
btrfs won't mount to /home
This is kind of strange (kernel 4.6.3, Ubuntu 16.04): # mount -t btrfs /dev/sda4 /home # echo $? 0 # dmesg -c [382148.588847] BTRFS info (device sda4): disk space caching is enabled [382148.588851] BTRFS: has skinny extents So it worked? # ls /home# df | grep home # mount | grep /home # lsof -n | grep /home All give no output. So, weird, isn't it? Now, let's try to mount to /home2: # mkdir /home2 # mount /dev/sda4 /home2 # mount | grep home /dev/sda4 on /home2 type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/) # dmesg -c [382190.199363] BTRFS info (device sda4): disk space caching is enabled [382190.199370] BTRFS: has skinny extents What's wrong there? Tomasz Chmielewski https://wpkg.org -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] btrfs: fix false ENOSPC for btrfs_fallocate()
hello, Please ignore this patch, though this patch is correct to me, and pass the fstests test. I have prepared a new common patch to fix this false ENOSPC bug. Currently I'm doing fstests test, and will sent them tomorrow, thanks. Regards, Xiaoguang Wang On 07/06/2016 06:37 PM, Wang Xiaoguang wrote: Below test scripts can reproduce this false ENOSPC: #!/bin/bash dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128 dev=$(losetup --show -f fs.img) mkfs.btrfs -f -M $dev mkdir /tmp/mntpoint mount /dev/loop0 /tmp/mntpoint cd mntpoint xfs_io -f -c "falloc 0 $((40*1024*1024))" testfile Above fallocate(2) operation will fail for ENOSPC reason, but indeed fs still has free space to satisfy this request. The reason is btrfs_fallocate() dose not decrease btrfs_space_info's bytes_may_use just in time, and it calls btrfs_free_reserved_data_space_noquota() in the end of btrfs_fallocate(), which is too late and have already added false unnecessary pressure to enospc system. See call graph: btrfs_fallocate() |-> btrfs_alloc_data_chunk_ondemand() It will add btrfs_space_info's bytes_may_use accordingly. |-> btrfs_prealloc_file_range() It will call btrfs_reserve_extent(), but note that alloc type is RESERVE_ALLOC_NO_ACCOUNT, so btrfs_update_reserved_bytes() will only increase btrfs_space_info's bytes_reserved accordingly, but will not decrease btrfs_space_info's bytes_may_use, then obviously we have overestimated real needed disk space, and it'll impact other processes who do write(2) or fallocate(2) operations, also can impact metadata reservation in mixed mode, and bytes_max_use will only be decreased in the end of btrfs_fallocate(). To fix this false ENOSPC, we need to decrease btrfs_space_info's bytes_may_use in btrfs_prealloc_file_range() in time, as what we do in cow_file_range(), See call graph in : cow_file_range() |-> extent_clear_unlock_delalloc() |-> clear_extent_bit() |-> btrfs_clear_bit_hook() |-> btrfs_free_reserved_data_space_noquota() This function will decrease bytes_may_use accordingly. So this patch choose to call btrfs_free_reserved_data_space() in __btrfs_prealloc_file_range() for both successful and failed path. Also this patch removes some old and useless comments. Signed-off-by: Wang Xiaoguang--- fs/btrfs/extent-tree.c | 1 - fs/btrfs/file.c| 23 --- fs/btrfs/inode-map.c | 3 +-- fs/btrfs/inode.c | 12 fs/btrfs/relocation.c | 10 +- 5 files changed, 34 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 82b912a..b0c86d2 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3490,7 +3490,6 @@ again: dcs = BTRFS_DC_SETUP; else if (ret == -ENOSPC) set_bit(BTRFS_TRANS_CACHE_ENOSPC, >transaction->flags); - btrfs_free_reserved_data_space(inode, 0, num_pages); out_put: iput(inode); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 2234e88..f872113 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2669,6 +2669,7 @@ static long btrfs_fallocate(struct file *file, int mode, alloc_start = round_down(offset, blocksize); alloc_end = round_up(offset + len, blocksize); + cur_offset = alloc_start; /* Make sure we aren't being give some crap mode */ if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) @@ -2761,7 +2762,6 @@ static long btrfs_fallocate(struct file *file, int mode, /* First, check if we exceed the qgroup limit */ INIT_LIST_HEAD(_list); - cur_offset = alloc_start; while (1) { em = btrfs_get_extent(inode, NULL, 0, cur_offset, alloc_end - cur_offset, 0); @@ -2788,6 +2788,14 @@ static long btrfs_fallocate(struct file *file, int mode, last_byte - cur_offset); if (ret < 0) break; + } else { + /* +* Do not need to reserve unwritten extent for this +* range, free reserved data space first, otherwise +* it'll result false ENOSPC error. +*/ + btrfs_free_reserved_data_space(inode, cur_offset, + last_byte - cur_offset); } free_extent_map(em); cur_offset = last_byte; @@ -2839,18 +2847,11 @@ out_unlock: unlock_extent_cached(_I(inode)->io_tree, alloc_start, locked_end, _state, GFP_KERNEL); out: - /* -* As we waited the extent range, the data_rsv_map must be empty -* in the range, as
[PATCH v2] btrfs: fix free space calculation in dump_space_info()
In btrfs, btrfs_space_info's bytes_may_use is treated as fs used space, as what we do in reserve_metadata_bytes() or btrfs_alloc_data_chunk_ondemand(), so in dump_space_info(), when calculating free space, we should also minus btrfs_space_info's bytes_may_use. Signed-off-by: Wang Xiaoguang--- fs/btrfs/extent-tree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 81310ff..f1cc8b8 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7694,8 +7694,8 @@ static void dump_space_info(struct btrfs_space_info *info, u64 bytes, printk(KERN_INFO "BTRFS: space_info %llu has %llu free, is %sfull\n", info->flags, info->total_bytes - info->bytes_used - info->bytes_pinned - - info->bytes_reserved - info->bytes_readonly, - (info->full) ? "" : "not "); + info->bytes_reserved - info->bytes_readonly - + info->bytes_may_use, (info->full) ? "" : "not "); printk(KERN_INFO "BTRFS: space_info total=%llu, used=%llu, pinned=%llu, " "reserved=%llu, may_use=%llu, readonly=%llu\n", info->total_bytes, info->bytes_used, info->bytes_pinned, -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] btrfs: fix free space calculation in dump_space_info()
hello, On 07/08/2016 10:28 PM, David Sterba wrote: On Wed, Jul 06, 2016 at 06:16:06PM +0800, Wang Xiaoguang wrote: hello, On 07/05/2016 01:10 AM, David Sterba wrote: On Wed, Jun 29, 2016 at 01:12:16PM +0800, Wang Xiaoguang wrote: Can you please describe in more detail what is this patch fixing? In original dump_space_info(), free space info is calculated by info->total_bytes - info->bytes_used - info->bytes_pinned - info->bytes_reserved - info->bytes_readonly, but I think free space info should also minus info->bytes_may_use :) Not really what I expected. The change is correct but the changelog should say something "the 'used space' formula is missing the bytes_may_used, that is used elsewhere eg. __reserve_metadata_bytes or space_info_add_old_bytes". That way I have something to verify during the review. Sorry, later I'll try to make my patches more cleaner, thanks. Regards, Xiaoguang Wang -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND PATCH] btrfs: Handle uninitialised inode eviction
On Mon, Jul 11, 2016 at 01:57:10PM +0300, Nikolay Borisov wrote: > > > On 07/11/2016 11:48 AM, David Sterba wrote: > > On Mon, Jul 11, 2016 at 09:43:09AM +0300, Nikolay Borisov wrote: > >> The code flow in btrfs_new_inode allows for btrfs_evict_inode to be > >> called with not fully initialised inode (e.g. ->root member not > >> being set). This can happen when btrfs_set_inode_index in > >> btrfs_new_inode fails, which in turn would call iput for the newly > >> allocated inode. This in turn leads to vfs calling into btrfs_evict_inode. > >> This leads to null pointer dereference. To handle this situation check > >> whether > >> the passed inode has root set and just free it in case it doesn't. > >> > >> Signed-off-by: Nikolay Borisov> >> Reviewed-by: Josef Bacik > >> --- > >> fs/btrfs/inode.c | 9 - > >> 1 file changed, 8 insertions(+), 1 deletion(-) > >> > >> Hello, > >> > >> I belive this is fixes the issue reported in > >> http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809 > > > > There's some time left before 4.7 release, so I'll send another pull > > request, including this patch. > > Now that I think about it, shouldn't this also be queued for stable as well? Yes. Marking patches with stable tags has been very inconsistent so we send patches to stable separately. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs: test that send does not issue invalid rmdir operations
From: Filipe MananaTest that an incremental send operation does not prematurely issues rmdir operations under a particular scenario (the rmdir operation is sent before the target directory is empty). This issue is fixed by the following patch for the linux kernel: "Btrfs: incremental send, fix premature rmdir operations" Signed-off-by: Filipe Manana --- tests/btrfs/126 | 123 tests/btrfs/126.out | 2 + tests/btrfs/group | 1 + 3 files changed, 126 insertions(+) create mode 100755 tests/btrfs/126 create mode 100644 tests/btrfs/126.out diff --git a/tests/btrfs/126 b/tests/btrfs/126 new file mode 100755 index 000..7d33549 --- /dev/null +++ b/tests/btrfs/126 @@ -0,0 +1,123 @@ +#! /bin/bash +# FS QA Test No. btrfs/126 +# +# Test that an incremental send operation does not prematurely issues rmdir +# operations under a particular scenario (the rmdir operation is sent before +# the target directory is empty). +# +#--- +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + rm -fr $send_files_dir + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_fssum + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +mkdir $SCRATCH_MNT/a +mkdir $SCRATCH_MNT/tmp +mkdir $SCRATCH_MNT/del +mv $SCRATCH_MNT/tmp $SCRATCH_MNT/del +mkdir $SCRATCH_MNT/a/c +mkdir $SCRATCH_MNT/del/x + +# Filesystem looks like: +# +# . (ino 256) +# |--- a/ (ino 257) +# ||--- c/ (ino 260) +# | +# |--- del/ (ino 259) +# |--- tmp/ (ino 258) +# |--- x/ (ino 261) +# +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 + +# When inode 260 was processed, rename operations for it and for inode 258 were +# issued (the rename for inode 260 must happen before the rename for inode 258). +# Then immediately after issuing the rename operation for inode 258, and before +# inode 261 was processed, the send stream issued a rmdir operation for inode +# 260, which would make the receiver fail with the error ENOTEMPTY because inode +# 261 was not yet renamed, it was still a child of inode 260 at that time. +# +mv $SCRATCH_MNT/a/c $SCRATCH_MNT +mv $SCRATCH_MNT/del/x $SCRATCH_MNT/a +mv $SCRATCH_MNT/del/tmp $SCRATCH_MNT/c +rmdir $SCRATCH_MNT/del + +# Filesystem now looks like: +# +# . (ino 256) +# |--- a/ (ino 257) +# ||--- x/ (ino 261) +# | +# |--- c/ (ino 260) +# |--- tmp/(ino 258) +# +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 + +run_check $FSSUM_PROG -A -f -w $send_files_dir/1.fssum $SCRATCH_MNT/mysnap1 +run_check $FSSUM_PROG -A -f -w $send_files_dir/2.fssum \ + -x $SCRATCH_MNT/mysnap2/mysnap1 $SCRATCH_MNT/mysnap2 + +_run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap +_run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ + -f $send_files_dir/2.snap + +# Now recreate the filesystem by receiving both send streams and verify we get +# the same content that the original filesystem had. +_scratch_unmount +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount +
[PATCH 2/3] btrfs: add test for incremental send after removing a directory
From: Filipe MananaTest that, under a particular scenario, an incremental send operation does not leak memory (which used to emit a warning in dmesg/syslog). This is a regression test for a btrfs kernel fix that has the title: "Btrfs: send, fix warning due to late freeing of orphan_dir_info structures". Signed-off-by: Filipe Manana --- tests/btrfs/125 | 127 tests/btrfs/125.out | 2 + tests/btrfs/group | 1 + 3 files changed, 130 insertions(+) create mode 100755 tests/btrfs/125 create mode 100644 tests/btrfs/125.out diff --git a/tests/btrfs/125 b/tests/btrfs/125 new file mode 100755 index 000..2ee39c7 --- /dev/null +++ b/tests/btrfs/125 @@ -0,0 +1,127 @@ +#! /bin/bash +# FS QA Test No. btrfs/125 +# +# Test that, under a particular scenario, an incremental send operation does +# not leak memory (which used to emit a warning in dmesg/syslog). +# +#--- +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + rm -fr $send_files_dir + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_fssum + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +mkdir $SCRATCH_MNT/a +mkdir $SCRATCH_MNT/tmp +mkdir $SCRATCH_MNT/del +mv $SCRATCH_MNT/tmp $SCRATCH_MNT/del +mkdir $SCRATCH_MNT/a/c +mkdir $SCRATCH_MNT/del/x +mkdir $SCRATCH_MNT/del/y + +# Filesystem looks like: +# +# . (ino 256) +# |--- a/ (ino 257) +# ||--- c/ (ino 260) +# | +# |--- del/ (ino 259) +# |--- tmp/ (ino 258) +# |--- x/ (ino 261) +# |--- y/ (ino 262) +# +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 + +# It used to be that when attempting to issue an rmdir operation for inode 259, +# the kernel allocated an orphan_dir_info structure so that later after doing +# the delayed rename operation for inode 258 (which happened once inode 260 was +# renamed) it would check if it could finally issue a rmdir instruction for +# inode 259. If it couldn't, it would not release the previously allocated +# orphan_dir_info structure immediately. Instead it would only release it once +# it finished the send stream and it would emit a warning in dmesg/syslog. +# +mv $SCRATCH_MNT/a/c $SCRATCH_MNT +mv $SCRATCH_MNT/del/x $SCRATCH_MNT/a +mv $SCRATCH_MNT/del/y $SCRATCH_MNT/a +mv $SCRATCH_MNT/del/tmp $SCRATCH_MNT/c +rmdir $SCRATCH_MNT/del + +# Filesystem now looks like: +# +# . (ino 256) +# |--- a/ (ino 257) +# ||--- x/ (ino 261) +# ||--- y/ (ino 262) +# | +# |--- c/ (ino 260) +# |--- tmp/(ino 258) +# +_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 + +run_check $FSSUM_PROG -A -f -w $send_files_dir/1.fssum $SCRATCH_MNT/mysnap1 +run_check $FSSUM_PROG -A -f -w $send_files_dir/2.fssum \ + -x $SCRATCH_MNT/mysnap2/mysnap1 $SCRATCH_MNT/mysnap2 + +_run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap +_run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ + -f $send_files_dir/2.snap +
Re: [PATCH] btrfs-progs: add option to run balance as daemon
On 2016-07-11 03:26, Tomasz Torcz wrote: On Tue, Jun 21, 2016 at 11:16:59AM -0400, Austin S. Hemmelgarn wrote: Currently, balance operations are run synchronously in the foreground. This is nice for interactive management, but is kind of crappy when you start looking at automation and similar things. It can be done with simplest systemd unit file: btrfs-balance@.service: --- [Unit] Description=btrfs balance for %I [Service] ExecStart=/usr/bin/btrfs balance start %I ExecStop=/usr/bin/btrfs balance cancel %I --- It automates quite nicely and needs no additional code. It's also entirely dependent on a couple of things: 1. You're running systemd (not everyone is, I'm certainly not). 2. You're only dealing with the local system. The type of situation I'm thinking of is dealing with non-local systems. For example, running something like this: ssh user@remotehost btrfs balance start --background / Keeping the SSH connection open for the duration of the balance has issues for some people (may close without keep-alive set, uses network bandwidth with keep-alive set, many people who are hosted have bandwidth quotas still), and it's extremely useful to have the option to fire and forget. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs: add test for an incremental send after moving directories around
From: Filipe MananaTest that an incremental send operation works after doing radical changes in the directory hierarchy that involve switching the inode that directory entries point to. This test exercises scenarios used to fail in btrfs and are fixed by the following patches for the linux kernel: "Btrfs: send, fix failure to move directories with the same name around" "Btrfs: incremental send, fix invalid paths for rename operations" Signed-off-by: Filipe Manana --- tests/btrfs/124 | 261 tests/btrfs/124.out | 2 + tests/btrfs/group | 1 + 3 files changed, 264 insertions(+) create mode 100755 tests/btrfs/124 create mode 100644 tests/btrfs/124.out diff --git a/tests/btrfs/124 b/tests/btrfs/124 new file mode 100755 index 000..38635a3 --- /dev/null +++ b/tests/btrfs/124 @@ -0,0 +1,261 @@ +#! /bin/bash +# FS QA Test No. btrfs/124 +# +# Test that an incremental send operation works after doing radical changes +# in the directory hierarchy that involve switching the inode that directory +# entries point to. +# +#--- +# Copyright (C) 2016 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + rm -fr $send_files_dir + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_fssum + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +# case 1 +mkdir -p $SCRATCH_MNT/case_1/d/p1 +mkdir $SCRATCH_MNT/case_1/p1 + +# case 2 +mkdir -p $SCRATCH_MNT/case_2/a +mkdir $SCRATCH_MNT/case_2/d +mkdir $SCRATCH_MNT/case_2/e +mkdir $SCRATCH_MNT/case_2/f +mkdir $SCRATCH_MNT/case_2/ance +mkdir $SCRATCH_MNT/case_2/d/ance +mkdir $SCRATCH_MNT/case_2/a/c +mv $SCRATCH_MNT/case_2/e $SCRATCH_MNT/case_2/d/ance +mv $SCRATCH_MNT/case_2/f $SCRATCH_MNT/case_2/d/ance +mv $SCRATCH_MNT/case_2/ance $SCRATCH_MNT/case_2/d/ance + +# case 3 +mkdir -p $SCRATCH_MNT/case_3/d +mkdir $SCRATCH_MNT/case_3/a +mkdir $SCRATCH_MNT/case_3/waiting_dir +mkdir -p $SCRATCH_MNT/case_3/pre/ance +mkdir $SCRATCH_MNT/case_3/d/ance +mkdir $SCRATCH_MNT/case_3/a/c +mv $SCRATCH_MNT/case_3/waiting_dir $SCRATCH_MNT/case_3/d/ance + +# case 4 +mkdir -p $SCRATCH_MNT/case_4/tmp +mkdir $SCRATCH_MNT/case_4/below_ance +mkdir -p $SCRATCH_MNT/case_4/pre/wait_dir +mkdir $SCRATCH_MNT/case_4/desc +mkdir $SCRATCH_MNT/case_4/ance +mv $SCRATCH_MNT/case_4/below_ance $SCRATCH_MNT/case_4/ance +mkdir $SCRATCH_MNT/case_4/other_dir + +# Filesystem looks like: +# +# . (ino 256) +# |--- case_1/ (ino 257) +# | | d/ (ino 258) +# | | |--- p1/ (ino 259) +# | | +# | | p1/ (ino 260) +# | +# |--- case_2/ (ino 261) +# | | a/ (ino 262) +# | | | c/ (ino 268) +# | | +# | | d/ (ino 263) +# | | ance/ (ino 267) +# | | e/ (ino 264) +# | | f/ (ino 265) +# | | ance/ (ino 266) +# | +# |--- case_3/ (ino 269) +# | | a/ (ino 271) +# | | | c/
[PATCH 5/7] Btrfs: send, fix warning due to late freeing of orphan_dir_info structures
From: Robbie KoUnder certain situations, when doing an incremental send, we can end up not freeing orphan_dir_info structures as soon as they are no longer needed. Instead we end up freeing them only after finishing the send stream, which causes a warning to be emitted: [282735.229200] [ cut here ] [282735.229968] WARNING: CPU: 9 PID: 10588 at fs/btrfs/send.c:6298 btrfs_ioctl_send+0xe2f/0xe51 [btrfs] [282735.231282] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [282735.237130] CPU: 9 PID: 10588 Comm: btrfs Tainted: GW 4.6.0-rc7-btrfs-next-31+ #1 [282735.239309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [282735.240160] 880224273ca8 8126b42c [282735.240160] 880224273ce8 81052b14 189a24273ac8 [282735.240160] 8802210c9800 0001 [282735.240160] Call Trace: [282735.240160] [] dump_stack+0x67/0x90 [282735.240160] [] __warn+0xc2/0xdd [282735.240160] [] warn_slowpath_null+0x1d/0x1f [282735.240160] [] btrfs_ioctl_send+0xe2f/0xe51 [btrfs] [282735.240160] [] btrfs_ioctl+0x14f/0x1f81 [btrfs] [282735.240160] [] ? arch_local_irq_save+0x9/0xc [282735.240160] [] vfs_ioctl+0x18/0x34 [282735.240160] [] do_vfs_ioctl+0x550/0x5be [282735.240160] [] ? __fget+0x6b/0x77 [282735.240160] [] ? __fget_light+0x62/0x71 [282735.240160] [] SyS_ioctl+0x57/0x79 [282735.240160] [] entry_SYSCALL_64_fastpath+0x18/0xa8 [282735.240160] [] ? time_hardirqs_off+0x9/0x14 [282735.240160] [] ? trace_hardirqs_off_caller+0x1f/0xaa [282735.256343] ---[ end trace a4539270c8056f93 ]--- Consider the following example: Parent snapshot: . (ino 256) |--- a/ (ino 257) ||--- c/ (ino 260) | |--- del/ (ino 259) |--- tmp/ (ino 258) |--- x/ (ino 261) |--- y/ (ino 262) Send snapshot: . (ino 256) |--- a/ (ino 257) ||--- x/ (ino 261) ||--- y/ (ino 262) | |--- c/ (ino 260) |--- tmp/(ino 258) 1) When processing inode 258, we end up delaying its rename operation because it has an ancestor (in the send snapshot) that has a higher inode number (inode 260) which was also renamed in the send snapshot, therefore we delay the rename of inode 258 so that it happens after inode 260 is renamed; 2) When processing inode 259, we end up delaying its deletion (rmdir operation) because it has a child inode (258) that has its rename operation delayed. At this point we allocate an orphan_dir_info structure and tag inode 258 so that we later attempt to see if we can delete (rmdir) inode 259 once inode 258 is renamed; 3) When we process inode 260, after renaming it we finally do the rename operation for inode 258. Once we issue the rename operation for inode 258 we notice that this inode was tagged so that we attempt to see if at this point we can delete (rmdir) inode 259. But at this point we can not still delete inode 259 because it has 2 children, inodes 261 and 262, that were not yet processed and therefore not yet moved (renamed) away from inode 259. We end up not freeing the orphan_dir_info structure allocated in step 2; 4) We process inodes 261 and 262, and once we move/rename inode 262 we issue the rmdir operation for inode 260; 5) We finish the send stream and notice that red black tree that contains orphan_dir_info structures is not empty, so we emit a warning and then free any orphan_dir_structures left. So fix this by freeing an orphan_dir_info structure once we try to apply a pending rename operation if we can not delete yet the tagged directory. A test case for fstests follows soon. Signed-off-by: Robbie Ko Signed-off-by: Filipe Manana [Modified changelog to be more detailed and easier to understand] Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 4 1
[PATCH 4/7] Btrfs: incremental send, fix premature rmdir operations
From: Robbie KoUnder certain situations, an incremental send operation can contain a rmdir operation that will make the receiving end fail when attempting to execute it, because the target directory is not yet empty. Consider the following example: Parent snapshot: . (ino 256) |--- a/ (ino 257) ||--- c/ (ino 260) | |--- del/ (ino 259) |--- tmp/ (ino 258) |--- x/ (ino 261) Send snapshot: . (ino 256) |--- a/ (ino 257) ||--- x/ (ino 261) | |--- c/ (ino 260) |--- tmp/(ino 258) 1) When processing inode 258, we delay its rename operation because inode 260 is its new parent in the send snapshot and it was not yet renamed (since 260 > 258, that is, beyond the current progress); 2) When processing inode 259, we realize we can not yet send an rmdir operation (against inode 259) because inode 258 was still not yet renamed/moved away from inode 259. Therefore we update data structures so that after inode 258 is renamed, we try again to see if we can finally send an rmdir operation for inode 259; 3) When we process inode 260, we send a rename operation for it followed by a rename operation for inode 258. Once we send the rename operation for inode 258 we then check if we can finally issue an rmdir for its previous parent, inode 259, by calling the can_rmdir() function with a value of sctx->cur_ino + 1 (260 + 1 = 261) for its "progress" argument. This makes can_rmdir() return true (value 1) because even though there's still a child inode of inode 259 that was not yet renamed/moved, which is inode 261, the given value of progress (261) is not lower then 261 (that is, not lower than the inode number of some child of inode 259). So we end up sending a rmdir operation for inode 259 before its child inode 261 is processed and renamed. So fix this by passing the correct progress value to the call to can_rmdir() from within apply_dir_move() (where we issue delayed rename operations), which should match stcx->cur_ino (the number of the inode currently being processed) and not sctx->cur_ino + 1. A test case for fstests follows soon. Signed-off-by: Robbie Ko Signed-off-by: Filipe Manana [Rewrote change log to be more detailed, clear and well formatted] Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 3d6a4dd..4115aba 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3236,7 +3236,7 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm) /* already deleted */ goto finish; } - ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino + 1); + ret = can_rmdir(sctx, rmdir_ino, odi->gen, sctx->cur_ino); if (ret < 0) goto out; if (!ret) -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations
From: Robbie KoDuring an incremental send, if we have delayed rename operations for inodes that were children of directories which were removed in the send snapshot, we can end up accessing incorrect items in a leaf or accessing beyond the last item of the leaf due to issuing utimes operations for the removed inodes. Consider the following example: Parent snapshot: . (ino 256) |--- a/ (ino 257) ||--- c/ (ino 262) | |--- b/ (ino 258) ||--- d/ (ino 263) | |--- del/ (ino 261) |--- x/ (ino 259) |--- y/ (ino 260) Send snapshot: . (ino 256) |--- a/ (ino 257) | |--- b/ (ino 258) | |--- c/ (ino 262) ||--- y/ (ino 260) | |--- d/ (ino 263) |--- x/ (ino 259) 1) When processing inodes 259 and 260, we end up delaying their rename operations because their parents, inodes 263 and 262 respectively, were not yet processed and therefore not yet renamed; 2) When processing inode 262, its rename operation is issued and right after the rename operation for inode 260 is issued. However right after issuing the rename operation for inode 260, at send.c:apply_dir_move(), we issue utimes operations for all current and past parents of inode 260. This means we try to send a utimes operation for its old parent, inode 261 (deleted in the send snapshot), which does not cause any immediate and deterministic failure, because when the target inode is not found in the send snapshot, the send.c:send_utimes() function ignores it and uses the leaf region pointed to by path->slots[0], which can be any unrelated item (belonging to other inode) or it can be a region outside the leaf boundaries, if the leaf is full and path->slots[0] matches the number of items in the leaf. So we end up either successfully sending a utimes operation, which is fine and irrelevant because the old parent (inode 261) will end up being deleted later, or we end up doing an invalid memory access tha crashes the kernel. So fix this by making apply_dir_move() issue utimes operations only for parents that still exist in the send snapshot. In a separate patch we will make send_utimes() return an error (-ENOENT) if the given inode does not exists in the send snapshot. Signed-off-by: Robbie Ko Signed-off-by: Filipe Manana [Rewrote change log to be more detailed and better organized] Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e552c33..8b65396 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3270,8 +3270,18 @@ finish: * and old parent(s). */ list_for_each_entry(cur, >update_refs, list) { - if (cur->dir == rmdir_ino) + /* +* The parent inode might have been deleted in the send snapshot +*/ + ret = get_inode_info(sctx->send_root, cur->dir, NULL, +NULL, NULL, NULL, NULL, NULL); + if (ret == -ENOENT) { + ret = 0; continue; + } + if (ret < 0) + goto out; + ret = send_utimes(sctx, cur->dir, cur->dir_gen); if (ret < 0) goto out; -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] Btrfs: send, avoid incorrect leaf accesses when sending utimes operations
From: Filipe MananaThe caller of send_utimes() is supposed to be sure that the inode number it passes to this function does actually exists in the send snapshot. However due to logic/algorithm bugs (such as the one fixed by the patch titled "Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations"), this might not be the case and when that happens it makes send_utimes() access use an unrelated leaf item as the target inode item or access beyond a leaf's boundaries (when the leaf is full and path->slots[0] matches the number of items in the leaf). So if the call to btrfs_search_slot() done by send_utimes() does not find the inode item, just make sure send_utimes() returns -ENOENT and does not silently accesses unrelated leaf items or does invalid leaf accesses, also allowing us to easialy and deterministically catch such algorithmic/logic bugs. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 8b65396..2db8dc8 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -2502,6 +2502,8 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); key.type = BTRFS_INODE_ITEM_KEY; key.offset = 0; ret = btrfs_search_slot(NULL, sctx->send_root, , path, 0, 0); + if (ret > 0) + ret = -ENOENT; if (ret < 0) goto out; -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] Btrfs: send, add missing error check for calls to path_loop()
From: Filipe MananaThe function path_loop() can return a negative integer, signaling an error, 0 if there's no path loop and 1 if there's a path loop. We were treating any non zero values as meaning that a path loop exists. Fix this by explicitly checking for errors and gracefully return them to user space. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 993e1ba..0dc05bb 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3200,6 +3200,8 @@ static int apply_dir_move(struct send_ctx *sctx, struct pending_dir_move *pm) sctx->send_progress = sctx->cur_ino + 1; ret = path_loop(sctx, name, pm->ino, pm->gen, ); + if (ret < 0) + goto out; if (ret) { LIST_HEAD(deleted_refs); ASSERT(ancestor > BTRFS_FIRST_FREE_OBJECTID); -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] Btrfs: incremental send, fix invalid paths for rename operations
From: Filipe MananaExample scenario: Parent snapshot: . (ino 277) | tmp/ (ino 278) | pre/ (ino 280) | | wait_dir/ (ino 281) | | desc/ (ino 282) | ance/ (ino 283) | | below_ance/ (ino 279) | | other_dir/(ino 284) Send snapshot: . (ino 277) | tmp/ (ino 278) | other_dir/ (ino 284) | below_ance/ (ino 279) || pre/(ino 280) | | wait_dir/(ino 281) | desc/ (ino 282) | ance/ (ino 283) While computing the send stream the following steps happen: 1) While processing inode 279 we end up delaying its rename operation because its new parent in the send snapshot, inode 284, was not yet processed and therefore not yet renamed; 2) Later when processing inode 280 we end up renaming it immediately to "ance/below_once/pre" and not delay its rename operation because its new parent (inode 279 in the send snapshot) has its rename operation delayed and inode 280 is not an encestor of inode 279 (its parent in the send snapshot) in the parent snapshot; 3) When processing inode 281 we end up delaying its rename operation because its new parent in the send snapshot, inode 284, was not yet processed and therefore not yet renamed; 4) When processing inode 282 we do not delay its rename operation because its parent in the send snapshot, inode 281, already has its own rename operation delayed and our current inode (282) is not an ancestor of inode 281 in the parent snapshot. Therefore inode 282 is renamed to "ance/below_ance/pre/wait_dir"; 5) When processing inode 283 we realize that we can rename it because one of its ancestors in the send snapshot, inode 281, has its rename operation delayed and inode 283 is not an ancestor of inode 281 in the parent snapshot. So a rename operation to rename inode 283 to "ance/below_ance/pre/wait_dir/desc/ance" is issued. This path is invalid due to a missing path building loop that was undetected by the incremental send implementation, as inode 283 ends up getting included twice in the path (once with its path in the parent snapshot). Therefore its rename operation must wait before the ancestor inode 284 is renamed. Fix this by not terminating the rename dependency checks when we find an ancestor, in the send snapshot, that has its rename operation delayed. So that we continue doing the same checks if the current inode is not an ancestor, in the parent snapshot, of an ancestor in the send snapshot we are processing in the loop. The problem and reproducer were reported by Robbie Ko, as part of a patch titled "Btrfs: incremental send, avoid ancestor rename to descendant". However the fix was unnecessarily complicated and can be addressed with much less code and effort. Reported-by: Robbie Ko Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 0dc05bb..3d6a4dd 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -3534,7 +3534,8 @@ static int wait_for_parent_move(struct send_ctx *sctx, ret = is_ancestor(sctx->parent_root, sctx->cur_ino, sctx->cur_inode_gen, ino, path_before); - break; + if (ret) + break; } fs_path_reset(path_before); -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] Btrfs: send, fix failure to move directories with the same name around
From: Robbie KoWhen doing an incremental send we can end up not moving directories that have the same name. This happens when the same parent directory has different child directories with the same name in the parent and send snapshots. For example, consider the following scenario: Parent snapshot: . (ino 256) | d/(ino 257) | |--- p1/ (ino 258) | | p1/ (ino 259) Send snapshot: .(ino 256) |--- d/ (ino 257) |--- p1/(ino 259) |--- p1/ (ino 258) The directory named "d" (inode 257) has in both snapshots an entry with the name "p1" but it refers to different inodes in both snapshots (inode 258 in the parent snapshot and inode 259 in the send snapshot). When attempting to move inode 258, the operation is delayed because its new parent, inode 259, was not yet moved/renamed (as the stream is currently processing inode 258). Then when processing inode 259, we also end up delaying its move/rename operation so that it happens after inode 258 is moved/renamed. This decision to delay the move/rename rename operation of inode 259 is due to the fact that the new parent inode (257) still has inode 258 as its child, which has the same name has inode 259. So we end up with inode 258 move/rename operation waiting for inode's 259 move/rename operation, which in turn it waiting for inode's 258 move/rename. This results in ending the send stream without issuing move/rename operations for inodes 258 and 259 and generating the following warnings in syslog/dmesg: [148402.979747] [ cut here ] [148402.980588] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6177 btrfs_ioctl_send+0xe03/0xe51 [btrfs] [148402.981928] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [148402.986999] CPU: 14 PID: 4117 Comm: btrfs Tainted: GW 4.6.0-rc7-btrfs-next-31+ #1 [148402.988136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [148402.988136] 88022139fca8 8126b42c [148402.988136] 88022139fce8 81052b14 18212139fac8 [148402.988136] 88022b0db400 0001 [148402.988136] Call Trace: [148402.988136] [] dump_stack+0x67/0x90 [148402.988136] [] __warn+0xc2/0xdd [148402.988136] [] warn_slowpath_null+0x1d/0x1f [148402.988136] [] btrfs_ioctl_send+0xe03/0xe51 [btrfs] [148402.988136] [] btrfs_ioctl+0x14f/0x1f81 [btrfs] [148402.988136] [] ? arch_local_irq_save+0x9/0xc [148402.988136] [] ? __lock_is_held+0x3c/0x57 [148402.988136] [] vfs_ioctl+0x18/0x34 [148402.988136] [] do_vfs_ioctl+0x550/0x5be [148402.988136] [] ? __fget+0x6b/0x77 [148402.988136] [] ? __fget_light+0x62/0x71 [148402.988136] [] SyS_ioctl+0x57/0x79 [148402.988136] [] entry_SYSCALL_64_fastpath+0x18/0xa8 [148402.988136] [] ? trace_hardirqs_off_caller+0x3f/0xaa [148403.011373] ---[ end trace a4539270c8056f8b ]--- [148403.012296] [ cut here ] [148403.013071] WARNING: CPU: 14 PID: 4117 at fs/btrfs/send.c:6194 btrfs_ioctl_send+0xe19/0xe51 [btrfs] [148403.014447] Modules linked in: btrfs crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis ppdev tpm parport_pc psmouse parport sg pcspkr i2c_piix4 i2c_core evdev processor serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [148403.019708] CPU: 14 PID: 4117 Comm: btrfs Tainted: GW 4.6.0-rc7-btrfs-next-31+ #1 [148403.020104] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014 [148403.020104] 88022139fca8 8126b42c [148403.020104] 88022139fce8 81052b14 18322139fac8 [148403.020104] 88022b0db400 0001 [148403.020104] Call Trace: [148403.020104] [] dump_stack+0x67/0x90 [148403.020104] [] __warn+0xc2/0xdd [148403.020104] [] warn_slowpath_null+0x1d/0x1f [148403.020104] [] btrfs_ioctl_send+0xe19/0xe51 [btrfs] [148403.020104] [] btrfs_ioctl+0x14f/0x1f81 [btrfs] [148403.020104] [] ? arch_local_irq_save+0x9/0xc [148403.020104] [] ? __lock_is_held+0x3c/0x57 [148403.020104] [] vfs_ioctl+0x18/0x34 [148403.020104] [] do_vfs_ioctl+0x550/0x5be [148403.020104] [] ? __fget+0x6b/0x77 [148403.020104] [] ? __fget_light+0x62/0x71 [148403.020104] [] SyS_ioctl+0x57/0x79 [148403.020104] [] entry_SYSCALL_64_fastpath+0x18/0xa8 [148403.020104] [] ?
Re: [RESEND PATCH] btrfs: Handle uninitialised inode eviction
On 07/11/2016 11:48 AM, David Sterba wrote: > On Mon, Jul 11, 2016 at 09:43:09AM +0300, Nikolay Borisov wrote: >> The code flow in btrfs_new_inode allows for btrfs_evict_inode to be >> called with not fully initialised inode (e.g. ->root member not >> being set). This can happen when btrfs_set_inode_index in >> btrfs_new_inode fails, which in turn would call iput for the newly >> allocated inode. This in turn leads to vfs calling into btrfs_evict_inode. >> This leads to null pointer dereference. To handle this situation check >> whether >> the passed inode has root set and just free it in case it doesn't. >> >> Signed-off-by: Nikolay Borisov>> Reviewed-by: Josef Bacik >> --- >> fs/btrfs/inode.c | 9 - >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> Hello, >> >> I belive this is fixes the issue reported in >> http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809 > > There's some time left before 4.7 release, so I'll send another pull > request, including this patch. Now that I think about it, shouldn't this also be queued for stable as well? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND PATCH] btrfs: Fix slab accounting flags
On Mon, Jul 11, 2016 at 12:12:21AM +0300, Nikolay Borisov wrote: > BTRFS is using a variety of slab caches to satisfy internal needs. > Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT, > meaning allocations from the caches are going to be accounted as > SReclaimable. At the same time btrfs is not registering any shrinkers > whatsoever, thus preventing memory from the slabs to be shrunk. This > means those caches are not in fact reclaimable. > > To fix this remove the SLAB_RECLAIM_ACCOUNT on all caches apart from the > inode cache, since this one is being freed by the generic VFS super_block > shrinker. Also set the transaction related caches as SLAB_TEMPORARY, > to better document the lifetime of the objects (it just translates > to SLAB_RECLAIM_ACCOUNT). > > Signed-off-by: Nikolay Borisov> Reviewed-by: David Sterba FYI, patch is queued for 4.8. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: A lot warnings in dmesg while running thunderbird
On Mon, Jul 11, 2016 at 11:28:01AM +0530, Chandan Rajendra wrote: On Friday, July 08, 2016 12:02:35 PM Chris Mason wrote: Can you please run the attached test program: gcc -o short-write short-write.c -lpthread ./short-write some-new-file-on-btrfs I want to see if you're triggering the same problem we've tried to fix, or something else. Hi Chris, I am able to reproduce the issue with the 'short-write' program. But before the call trace associated with btrfs_destroy_inode(), I see the following call trace ... [ cut here ] WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100 Modules linked in: [ ... ] I will continue to debug and find out the root cause. Thanks Chandan, I'm able to reproduce the same thing more easily by changing short-write to have: #define ROUNDS 4096 and by getting rid of the sync_file_range call. Still nailing down where the accounting is going wrong, so any help is appreciated. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can't use btrfs on USB-stick (write errors)
On Tue, Jun 21, 2016 at 11:54:57PM +0900, Tomasz Chmielewski wrote: > I've tried to use btrfs on USB-stick, but unfortunately it fails with > write errors. > > The below is for kernel 4.4; I've tried with 4.6.2, and it fails in a > similar way. > > > I'm not sure how to reliably reproduce it, but it seems to me it has > something to do with: > > - plenty of random writes > > - USB stick sometimes "very slow to reply" (may be USB-stick dependent) > > > The closest thing to reproduce pretty much within an hour was launching > ktorrent and downloading a couple of Linux isos. > > ext4 does not fail under similar load. USB stick is brand new, writing > data with dd is successful, reading data with dd is successful (whether > it's ext4, btrfs or raw partition); it only seems to fail with btrfs and > plenty of random writes. I think it's the write pattern that does not go well with the usb stick. >From the logs provided, it's an EIO during transaction commit, that's usually fatal so the filesystem switches to read-only. The dd write pattern (and consecutive writes) do not stress the flash translation layer or it's internal block allocation and reclaim. As you say it's not easy to reproduce, I'm more inclined to blame the usb stick. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: dedupe-inband enable/reconfigure: force option does not take argument
On Fri, Jul 08, 2016 at 03:11:42PM +0900, Satoru Takeuchi wrote: > --- > This patch can be applied to integration-20160704(2355a7e5dcdf122d1924) I will not apply this separately, this should be folded to the proper patchset. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: du: fix to skip not btrfs dir/file
On Wed, Jul 06, 2016 at 05:42:33PM +0200, Holger Hoffstätte wrote: > >At the absolute minimum, I think that these messages should go to > > stderr (like du does when it deosn't have permissions), and should go > > away with -q. They're still irritating, but at least you can get rid > > of them easily. > > If anything this should require a --verbose, not the other way > around. Maybe instead of breaking the output just indicate the > special status via "-- --" values, or default to 0.00? The '--' replacement values sound good to me (0.00 would be confusing). > Still, we're explicitly only interested in btrfs stuff and not > anything else, so printing non-information can only yield noise. I think the extra lines would happen mostly in scenario where directories are mountpoints and thus have a potentially different filesystem. If we print just a single line for a mountpoint (and not descend to it), I don't think it will create that much noise in the output. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs module does not load on sparc64
On Thu, Jul 07, 2016 at 05:29:35PM +0300, Anatoly Pugachev wrote: > Compiled linux kernel (git version 4.7.0-rc6+) using my own kernel > config file, enabling : > > CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y > CONFIG_BTRFS_DEBUG=y > CONFIG_BTRFS_ASSERT=y > > and now I can't load btrfs module: > > # modprobe btrfs > modprobe: ERROR: could not insert 'btrfs': Invalid argument > > > and in logs (and on console): > > [1897399.942697] Btrfs loaded, crc32c=crc32c-generic, debug=on, assert=on > [1897400.024645] BTRFS: selftest: sectorsize: 8192 nodesize: 8192 > [1897400.098089] BTRFS: selftest: Running btrfs free space cache tests > [1897400.175863] BTRFS: selftest: Running extent only tests > [1897400.241871] BTRFS: selftest: Running bitmap only tests > [1897400.307877] BTRFS: selftest: Running bitmap and extent tests > [1897400.380329] BTRFS: selftest: Running space stealing from bitmap to extent > [1897400.470517] BTRFS: selftest: Free space cache tests finished > [1897400.542875] BTRFS: selftest: Running extent buffer operation tests > [1897400.621710] BTRFS: selftest: Running btrfs_split_item tests > [1897400.692929] BTRFS: selftest: Running extent I/O tests > [1897400.757459] BTRFS: selftest: Running find delalloc tests > [1897401.082670] BTRFS: selftest: Running extent buffer bitmap tests > [1897401.161223] BTRFS: selftest: Setting straddling pages failed The sanity tests fail at some point with EINVAL, so the module will fail to load in turn. Looking at the test itself (__test_eb_bitmaps), it does no make any assumptions about page size etc. so this "should work". Taking powerpc with 64k page size for another reference where the tests work, I'm not sure what exactly could be wrong here. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND PATCH] btrfs: Handle uninitialised inode eviction
On Mon, Jul 11, 2016 at 09:43:09AM +0300, Nikolay Borisov wrote: > The code flow in btrfs_new_inode allows for btrfs_evict_inode to be > called with not fully initialised inode (e.g. ->root member not > being set). This can happen when btrfs_set_inode_index in > btrfs_new_inode fails, which in turn would call iput for the newly > allocated inode. This in turn leads to vfs calling into btrfs_evict_inode. > This leads to null pointer dereference. To handle this situation check whether > the passed inode has root set and just free it in case it doesn't. > > Signed-off-by: Nikolay Borisov> Reviewed-by: Josef Bacik > --- > fs/btrfs/inode.c | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > Hello, > > I belive this is fixes the issue reported in > http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809 There's some time left before 4.7 release, so I'll send another pull request, including this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: add option to run balance as daemon
On Tue, Jun 21, 2016 at 11:16:59AM -0400, Austin S. Hemmelgarn wrote: > Currently, balance operations are run synchronously in the foreground. > This is nice for interactive management, but is kind of crappy when you > start looking at automation and similar things. It can be done with simplest systemd unit file: btrfs-balance@.service: --- [Unit] Description=btrfs balance for %I [Service] ExecStart=/usr/bin/btrfs balance start %I ExecStop=/usr/bin/btrfs balance cancel %I --- It automates quite nicely and needs no additional code. -- Tomasz Torcz"Funeral in the morning, IDE hacking xmpp: zdzich...@chrome.plin the afternoon and evening." - Alan Cox -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND PATCH] btrfs: Handle uninitialised inode eviction
The code flow in btrfs_new_inode allows for btrfs_evict_inode to be called with not fully initialised inode (e.g. ->root member not being set). This can happen when btrfs_set_inode_index in btrfs_new_inode fails, which in turn would call iput for the newly allocated inode. This in turn leads to vfs calling into btrfs_evict_inode. This leads to null pointer dereference. To handle this situation check whether the passed inode has root set and just free it in case it doesn't. Signed-off-by: Nikolay BorisovReviewed-by: Josef Bacik --- fs/btrfs/inode.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) Hello, I belive this is fixes the issue reported in http://thread.gmane.org/gmane.comp.file-systems.btrfs/57809 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4421954720b8..b51723811d01 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5159,11 +5159,18 @@ void btrfs_evict_inode(struct inode *inode) struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_block_rsv *rsv, *global_rsv; int steal_from_global = 0; - u64 min_size = btrfs_calc_trunc_metadata_size(root, 1); + u64 min_size; int ret; trace_btrfs_inode_evict(inode); + if (!root) { + kmem_cache_free(btrfs_inode_cachep, BTRFS_I(inode)); + return; + } + + min_size = btrfs_calc_trunc_metadata_size(root, 1); + evict_inode_truncate_pages(inode); if (inode->i_nlink && -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html