Re: Status of RAID5/6
On Fri, Mar 30, 2018 at 06:14:52PM +0200, Goffredo Baroncelli wrote: > On 03/29/2018 11:50 PM, Zygo Blaxell wrote: > > On Wed, Mar 21, 2018 at 09:02:36PM +0100, Christoph Anton Mitterer wrote: > >> Hey. > >> > >> Some things would IMO be nice to get done/clarified (i.e. documented in > >> the Wiki and manpages) from users'/admin's POV: > [...] > > > > btrfs has no optimization like mdadm write-intent bitmaps; recovery > > is always a full-device operation. In theory btrfs could track > > modifications at the chunk level but this isn't even specified in the > > on-disk format, much less implemented. > > It could go even further; it would be sufficient to track which > *partial* stripes update will be performed before a commit, in one > of the btrfs logs. Then in case of a mount of an unclean filesystem, > a scrub on these stripes would be sufficient. A scrub cannot fix a raid56 write hole--the data is already lost. The damaged stripe updates must be replayed from the log. A scrub could fix raid1/raid10 partial updates but only if the filesystem can reliably track which blocks failed to be updated by the disconnected disks. It would be nice if scrub could be filtered the same way balance is, e.g. only certain block ranges, or only metadata blocks; however, this is not presently implemented. > BR > G.Baroncelli > > [...] > > > -- > gpg @keyserver.linux.it: Goffredo Baroncelli > Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: PGP signature
Re: Status of RAID5/6
On Fri, Mar 30, 2018 at 09:21:00AM +0200, Menion wrote: > Thanks for the detailed explanation. I think that a summary of this > should go in the btrfs raid56 wiki status page, because now it is > completely inconsistent and if a user comes there, ihe may get the > impression that the raid56 is just broken > Still I have the 1 bilion dollar question: from your word I understand > that even in RAID56 the metadata are spread on the devices in a coplex > way, but shall I assume that the array can survice to the sudden death > of one (two for raid6) HDD in the array? I wouldn't assume that. There is still the write hole, and while there is a small probability of having a write hole failure, it's a probability that applies on *every* write in degraded mode, and since disks can fail at any time, the array can enter degraded mode at any time. It's similar to lottery tickets--buy one ticket, you probably won't win, but if you buy millions of tickets, you'll claim the prize eventually. The "prize" in this case is a severely damaged, possibly unrecoverable filesystem. If the data is raid5 and the metadata is raid1, the filesystem can survive a single disk failure easily; however, some of the data may be lost if writes to the remaining disks are interrupted by a system crash or power failure and the write hole issue occurs. Note that the damage is not necessarily limited to recently written data--it's any random data that is merely located adjacent to written data on the filesystem. I wouldn't use raid6 until the write hole issue is resolved. There is no configuration where two disks can fail and metadata can still be updated reliably. Some users use the 'ssd_spread' mount option to reduce the probability of write hole failure, which happens to be helpful by accident on some array configurations, but it has a fairly high cost when the array is not degraded due to all the extra balancing required. > Bye signature.asc Description: PGP signature
Re: [PATCH] Btrfs: print error messages when failing to read trees
On Thu, Mar 29, 2018 at 06:11:45AM +0800, Liu Bo wrote: > When mount fails to read trees like fs tree, checksum tree, extent > tree, etc, there is not enough information about where went wrong. > > With this, messages like > > "BTRFS warning (device sdf): failed to read root (objectid=7): -5" > > would help us a bit. > > Signed-off-by: Liu BoReviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2.1 2/2] btrfs: Validate child tree block's level and first key
On Thu, Mar 29, 2018 at 09:08:11AM +0800, Qu Wenruo wrote: > We have several reports about node pointer points to incorrect child > tree blocks, which could have even wrong owner and level but still with > valid generation and checksum. > > Although btrfs check could handle it and print error message like: > leaf parent key incorrect 60670574592 > > Kernel doesn't have enough check on this type of corruption correctly. > At least add such check to read_tree_block() and btrfs_read_buffer(), > where we need two new parameters @level and @first_key to verify the > child tree block. > > The new @level check is mandatory and all call sites are already > modified to extract expected level from its call chain. > > While @first_key is optional, the following call sites are skipping such > check: > 1) Root node/leaf >As ROOT_ITEM doesn't contain the first key, skip @first_key check. > 2) Direct backref >Only parent bytenr and level is known and we need to resolve the key >all by ourselves, skip @first_key check. > > Another note of this verification is, it needs extra info from nodeptr > or ROOT_ITEM, so it can't fit into current tree-checker framework, which > is limited to node/leaf boundary. > > Signed-off-by: Qu Wenruo> --- > changelog: > v2: > Make @level check mandatory, suggesed by Jeff and Nikolay. > Change parameter order as @level is now mandatory, put it in front of > @first_key. > Change verify_parent_level() to verify_key_level() to avoid confusion > on the @level parameter. > Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging. > v2.1 > Rebased to misc-next branch, to use parent_level, it needs to revert > Nikolay's patch "btrfs: Remove unused parent_level var from > btrfs_realloc_node", > sorry Nikolay. Ah ok, I've reverted the patch, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: bail out on error during replay_dir_deletes
If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs to bail out, otherwise @ret would be forced to be 0 after 'break;' and the caller won't be aware of it. Signed-off-by: Liu Bo--- fs/btrfs/tree-log.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 4ee9431..11e2c26 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2356,8 +2356,10 @@ static noinline int replay_dir_deletes(struct btrfs_trans_handle *trans, nritems = btrfs_header_nritems(path->nodes[0]); if (path->slots[0] >= nritems) { ret = btrfs_next_leaf(root, path); - if (ret) + if (ret == 1) break; + else if (ret < 0) + goto out; } btrfs_item_key_to_cpu(path->nodes[0], _key, path->slots[0]); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix NULL pointer dereference in log_dir_items
0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is returned, path->nodes[0] could be NULL, log_dir_items lacks such a check for <0 and we may run into a null pointer dereference panic. Signed-off-by: Liu Bo--- fs/btrfs/tree-log.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 4344577..4ee9431 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3518,8 +3518,11 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans, * from this directory and from this transaction */ ret = btrfs_next_leaf(root, path); - if (ret == 1) { - last_offset = (u64)-1; + if (ret) { + if (ret == 1) + last_offset = (u64)-1; + else + err = ret; goto done; } btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: clean up resources during umount after trans is aborted
Currently if some fatal errors occur, like all IO get -EIO, resources would be cleaned up when a) transaction is being committed or b) BTRFS_FS_STATE_ERROR is set However, in some rare cases, resources may be left alone after transaction gets aborted and umount may run into some ASSERT(), e.g. ASSERT(list_empty(_group->dirty_list)); For case a), in btrfs_commit_transaciton(), there're several places at the beginning where we just call btrfs_end_transaction() without cleaning up resources. For case b), it is possible that the trans handle doesn't have any dirty stuff, then only trans hanlde is marked as aborted while BTRFS_FS_STATE_ERROR is not set, so resources remain in memory. This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that all resources won't stay in memory after umount. Signed-off-by: Liu Bo--- fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 21f34ad..643bd69 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3735,7 +3735,8 @@ void close_ctree(struct btrfs_fs_info *fs_info) btrfs_err(fs_info, "commit super ret %d", ret); } - if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state)) + if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state) || + test_bit(BTRFS_FS_STATE_TRANS_ABORTED, _info->fs_state)) btrfs_error_commit_super(fs_info); kthread_stop(fs_info->transaction_kthread); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] libbtrfsutil: misc fixes
On Thu, Mar 29, 2018 at 12:53:52AM -0700, Omar Sandoval wrote: > From: Omar Sandoval> This is a handful of minor fixes for libbtrfsutil that would be good to > get in before the release. Patch 1 fixes an issue reported by Tomohiro a > few weeks back. Patch 2 fixes a memory leak I noticed while writing > patch 1. Patch 3 makes us use the local mkfs.btrfs so that the tests can > run on the CI system. Patch 4 fixes an issue I noticed while testing > that fix. Patch 5 fixes a test failure caused by "btrfs-progs: mkfs: add > uuid and otime to ROOT_ITEM of, FS_TREE". > > Based on your devel branch. Please apply! Done, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] btrfs-progs: Enhance btrfs-image to handle missing device
On Fri, Mar 30, 2018 at 03:35:25PM +0800, Qu Wenruo wrote: > Can be fetched from github: > https://github.com/adam900710/btrfs-progs/tree/btrfs_image_fix > > Bug report: > https://github.com/kdave/btrfs-progs/issues/118 > > In short, the problem is caused by some old code (read_extent_data() > from ancient btrfs check code) and offset-by-one from btrfs-image. > Which makes btrfs-image can only read from the first stripe of RAID1. > > And if device of the first stripe is missing, btrfs-image will fail. > > Fix the problem and add test case for it. > > Qu Wenruo (3): > btrfs-progs: disk-io: Fix read_extent_data() error handler for missing > device > btrfs-progs: convert: Fix offset-by-one error in read_data_extent() > btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing > device 1-3 applied, thanks. I'll reorder the uuid tree patch after those patches so the test does not fail. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing device
On Fri, Mar 30, 2018 at 03:35:28PM +0800, Qu Wenruo wrote: > Signed-off-by: Qu WenruoApplied with the fixes below, thanks. > --- > tests/misc-tests/030-missing-device-image/test.sh | 57 > +++ > 1 file changed, 57 insertions(+) > create mode 100755 tests/misc-tests/030-missing-device-image/test.sh > > diff --git a/tests/misc-tests/030-missing-device-image/test.sh > b/tests/misc-tests/030-missing-device-image/test.sh > new file mode 100755 > index ..b8ae3a950cc9 > --- /dev/null > +++ b/tests/misc-tests/030-missing-device-image/test.sh > @@ -0,0 +1,57 @@ > +#!/bin/bash > +# Test that btrfs-image can dump image correctly for missing device (RAID1) > +# > +# At least for RAID1, btrfs-image should be able to handle one missing device > +# without any problem > + > +source "$TEST_TOP/common" > + > +check_prereq btrfs-image > +check_prereq mkfs.btrfs > +check_prereq btrfs > + > +setup_root_helper > +setup_loopdevs 2 > +prepare_loopdevs > +dev1=${loopdevs[1]} > +dev2=${loopdevs[2]} > + > +# $1:device number to remove (either 1 or 2) > +tmp=$(mktemp --tmpdir -d btrfs-progs-misc-test-XXX) Not necessary. > +test_missing() > +{ > + bad_num=$1 > + bad_dev=${loopdevs[$bad_num]} > + good_num=$((3 - $bad_num)) > + good_dev=${loopdevs[$good_num]} All of them should be declared as local > + > + run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -d raid1 -m raid1 \ > + "$dev1" "$dev2" > + > + # fill the fs with some data, we could create space cache > + run_check $SUDO_HELPER mount "$dev1" "$TEST_MNT" > + run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/a" bs=1M count=10 > + run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/b" bs=4k > count=1000 conv=sync > + run_check $SUDO_HELPER umount "$TEST_MNT" > + > + # make sure we have space cache > + run_check_stdout "$TOP/btrfs" inspect dump-tree -t root "$dev1" \ > + > "$tmp/output" > + if ! grep -q "EXTENT_DATA" "$tmp/output" ; then this can be glued to one line, making the tmp files unnecessry > + # normally above operation should create space cache. > + # if not, it may means we have migrated to v2 cache by default > + _not_run "unable to create v1 space cache" > + fi > + > + # now wipe the device > + run_check wipefs -fa "$bad_dev" > + > + # we don't care about the image but btrfs-image must not fail > + run_check "$TOP/btrfs-image" "$good_dev" /dev/null > +} > + > +# Test with either device missing, so we're ensured to hit missing device > +test_missing 1 > +test_missing 2 > +cleanup_loopdevs > +rm $tmp -rf -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: tests/mkfs: Test if mkfs.btrfs --rootdir can handle broken soft link well
On Thu, Mar 29, 2018 at 09:26:44AM +0800, Qu Wenruo wrote: > Signed-off-by: Qu WenruoTest fixed and applied, thanks. > --- > .../016-rootdir-bad-symbolic-link/test.sh | 26 > ++ > 1 file changed, 26 insertions(+) > create mode 100755 tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh > > diff --git a/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh > b/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh > new file mode 100755 > index ..d12efa629042 > --- /dev/null > +++ b/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh > @@ -0,0 +1,26 @@ > +#!/bin/bash > +# Regression test for mkfs.btrfs --rootdir with bad symbolic link > +# (points to non-existing location) > +# > +# Since mkfs.btrfs --rootdir will just create symbolic link other than > +# follow it, we shouldn't hit any problem > + > +source "$TEST_TOP/common" > + > +check_prereq mkfs.btrfs > + > +prepare_test_dev > + > +tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX) > + > +non_existing="/no/such/file" Adding a few $RANDOM will make the below almost impossible. > + > +if [ -f "$non_existing" ]; then > + # Some smartass don't want to this test case to run > + _not_run "Some one created $non_exist, which is not expect to exist" ^^ > +fi > + > +run_check ln -sf "$non_existing" "$tmp/foobar" > + > +run_check "$TOP/mkfs.btrfs" -f --rootdir "$tmp" "$TEST_DEV" > +run_check "$TOP/btrfs" check "$TEST_DEV" rm -rf -- "$tmp" -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] btrfs-progs: fi usage: cleanup unnecessary permission error check
On Thu, Mar 29, 2018 at 05:23:39PM +0900, Misono Tomohiro wrote: > Since BTRFS_IOC_FS_INFO does not require root privilege, there is no > need to check EPERM error. Well, this has been changed in 3.16, but there's still 3.2 longterm kernel. It does not hurt to have the check and verbose message so we may need to keep it for a few more years. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: Remove unused parameter
On Fri, Mar 30, 2018 at 08:49:55AM +0300, Nikolay Borisov wrote: > > > On 30.03.2018 05:56, Gu Jinxiang wrote: > > Parameter usagestr is not used, remove it. > > > > Signed-off-by: Gu Jinxiang> > Reviewed-by: Nikolay Borisov Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Question, will ls -l eventually be able to show subvolumes?
On Fri, Mar 30, 2018 at 10:42:10AM +0100, Pete wrote: > I've just notice work going on to make rmdir be able to delete > subvolumes. Is there an intent to allow ls -l to display directories as > subvolumes? That's entirely up to coreutils guys. Meow! -- ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"
On 03/30/2018 06:46 AM, Misono Tomohiro wrote: > On 2018/03/30 2:35, Goffredo Baroncelli wrote: >> Hi Misono, > > Hello, [...] >> My conclusion, is that your work is not consistent with the current >> implementation; so I am suggesting to create a new subcommand ("btrfs sub >> tree" ?) where you can use, without constraint, your api. > > I agree that the current output of "sub list" is confusing first of all and > it is not easy to keep consistent behavior between user and root. > > So, I think "sub list" (or new command) should be: > - (default) list the subvolumes under the specified path, including > subvolumes mounted below the specified path. > Any user can do this (with appropriate permission checks). > The path to be printed is the relative from the specified path. > - (-a option) list all the subvolmumes in the filesystem. Only root can do > this. > The path to be printed is the absolute path from the toplevel > subvolume. > > This would need some changes in progs code, but I think we can use the same > new ioctls I proposed[1] for the default behavior. > > I'd like to update "sub list" command eventually rather than adding new > command, even though this breaks the compatibility. > I want to know what other developer/user think. > > [1] https://www.spinics.net/lists/linux-btrfs/msg76003.html > >> >> Another small differences that I found is in the subcommand "btrfs sub >> show". This is not capable to show the snapshots of a subvolume; I think >> that, in "user mode", it should be reported that there is a "missing >> capabilities" problem than to show an empty list > > Yes, this is because that only the snapshots *under the specified path* are > searched. > This can be easily changed to search under the mount point, but this also > results in > slight change in print format of the path for root. I tried to keep as much > consistency in this version. When I perform a snapshot, I think the new snapshot more as a sister/brother than a child, so I put it at the same level instead of below the source subvolume. Moreover the path printed at the first line seems to be relative at the root of filesystem. > Thanks, > Tomohiro Misono > >> >> Below an example of what I am say: >> >> ghigo@venice:/tmp$ btrfs sub cre a >> ghigo@venice:/tmp$ btrfs sub snap a c >> >> ghigo@venice:/tmp$ # patched btrfs >> ghigo@venice:/tmp$ /home/ghigo/btrfs/btrfs-progs/btrfs sub show a >> tmp/a >> Name: a >> UUID: 0c19a7a4-a7f5-1849-9262-88ab3da504d0 >> Parent UUID:- >> Received UUID: - >> Creation time: 2018-03-28 22:48:09 +0200 >> Subvolume ID: 598 >> Generation: 696908 >> Gen at creation:696907 >> Parent ID: 257 >> Top level ID: 257 >> Flags: - >> Snapshot(s): >> >> >> ghigo@venice:/tmp$ # stock btrfs >> ghigo@venice:/tmp$ sudo /home/ghigo/btrfs/btrfs-progs/btrfs sub show a >> ^^ >> tmp/a >> Name: a >> UUID: 0c19a7a4-a7f5-1849-9262-88ab3da504d0 >> Parent UUID:- >> Received UUID: - >> Creation time: 2018-03-28 22:48:09 +0200 >> Subvolume ID: 598 >> Generation: 696908 >> Gen at creation:696907 >> Parent ID: 257 >> Top level ID: 257 >> Flags: - >> Snapshot(s): >> debian/tmp/c >> ^ >> >> >> BR >> G.Baroncelli >> >> >> - >> Test performed: >> >> ghigo@venice:/tmp$ sudo btrfs sub crea a >> ghigo@venice:/tmp$ sudo btrfs sub crea a/a1 >> ghigo@venice:/tmp$ sudo btrfs sub crea a/a1/a2 >> ghigo@venice:/tmp$ sudo btrfs sub crea b >> ghigo@venice:/tmp$ sudo btrfs sub crea b/b1 >> ghigo@venice:/tmp$ sudo chmod og-rwx b/. >> >> >> # stock btrfs progs >> ghigo@venice:/tmp$ sudo btrfs sub l . >> ID 257 gen 696886 top level 5 path debian >> ID 289 gen 587461 top level 257 path var/lib/machines >> ID 299 gen 693561 top level 5 path boot >> ID 582 gen 683965 top level 5 path i386 >> ID 592 gen 696884 top level 257 path tmp/a >> ID 593 gen 696885 top level 592 path tmp/a/a1 >> ID 594 gen 696885 top level 593 path tmp/a/a1/a2 >> ID 595 gen 696887 top level 257 path tmp/b >> ID 596 gen 696887 top level 595 path tmp/b/b1 >> >> # patched btrfs progs >> ghigo@venice:/tmp$ /home/ghigo/btrfs/btrfs-progs/btrfs sub lis . >> ID 592 gen 696884 top level 257 path a >> ID 593 gen 696885 top level 592 path a/a1 >> ID 594 gen 696885 top level 593 path a/a1/a2 >> ID 595 gen 0 top level 257 path b >> >> >> ghigo@venice:/tmp$ /home/ghigo/btrfs/btrfs-progs/btrfs sub show a >> tmp/a >> Name: a >>
Re: [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"
On 03/30/2018 06:46 AM, Misono Tomohiro wrote: > [Un]fortunately, the stock "btrfs sub list" has the capability to show all > the subvolumes, even the ones not mounted, so this can be entirely replaced > by your API. s/can be entirely/can't be entirely/ -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of RAID5/6
On 03/29/2018 11:50 PM, Zygo Blaxell wrote: > On Wed, Mar 21, 2018 at 09:02:36PM +0100, Christoph Anton Mitterer wrote: >> Hey. >> >> Some things would IMO be nice to get done/clarified (i.e. documented in >> the Wiki and manpages) from users'/admin's POV: [...] > >> - changing raid lvls? > > btrfs uses a brute-force RAID conversion algorithm which always works, but > takes zero short cuts. e.g. there is no speed optimization implemented > for cases like "convert 2-disk raid1 to 1-disk single" which can be > very fast in theory. The worst-case running time is the only running > time available in btrfs. [...] What it is reported by Zygo is an excellent source of information. However I have to point out that BTRFS has a little optimization: i.e. scrub/balance only works on the allocated chunk. So a partial filled filesystem requires less time than a nearly filled one > > btrfs has no optimization like mdadm write-intent bitmaps; recovery > is always a full-device operation. In theory btrfs could track > modifications at the chunk level but this isn't even specified in the > on-disk format, much less implemented. It could go even further; it would be sufficient to track which *partial* stripes update will be performed before a commit, in one of the btrfs logs. Then in case of a mount of an unclean filesystem, a scrub on these stripes would be sufficient. BR G.Baroncelli [...] -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Replace owner argument in add_pinned_bytes with a boolean
add_pinned_bytes reallyc ares whether the bytes being pinned are either data or metadata. To that effect is checks whether the 'owner' argument is less than BTRFS_FIRST_FREE_OBJECTID (256). This works because owner can really have 2 types of values: a) For metadata extents it holds the level at which the parent is in the btree. This amounts to owner having the values 0-7 b) In case of modifying data extentsi, owner is the inode number to which those extents belongs. Let's make this more explicit byt converting the owner parameter to a boolean value and either pass it directly when we know the type of extents we are working with (i.e. in btrfs_free_tree_block). In cases when the parent function can be called on both metadata/data extents perform the check in the caller. This hopefully makes the interface of add_pinned_bytes more intuitive. Signed-off-by: Nikolay Borisov--- As an added bonus (but not changelog noteworthy) we even get a slight reduction in size (likely due to de-inlining): add/remove: 0/0 grow/shrink: 2/2 up/down: 21/-65 (-44) Function old new delta btrfs_free_extent238 249 +11 btrfs_inc_extent_ref 197 207 +10 add_pinned_bytes 109 104 -5 btrfs_free_tree_block906 846 -60 Total: Before=89122, After=89078, chg -0.05% fs/btrfs/extent-tree.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 0fe196f19e66..1a663b820a7b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -756,12 +756,12 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info, } static void add_pinned_bytes(struct btrfs_fs_info *fs_info, s64 num_bytes, -u64 owner, u64 root_objectid) +bool metadata, u64 root_objectid) { struct btrfs_space_info *space_info; u64 flags; - if (owner < BTRFS_FIRST_FREE_OBJECTID) { + if (metadata) { if (root_objectid == BTRFS_CHUNK_TREE_OBJECTID) flags = BTRFS_BLOCK_GROUP_SYSTEM; else @@ -2212,8 +2212,10 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, _ref_mod, _ref_mod); } - if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0) - add_pinned_bytes(fs_info, -num_bytes, owner, root_objectid); + if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0) { + bool metadata = owner < BTRFS_FIRST_FREE_OBJECTID; + add_pinned_bytes(fs_info, -num_bytes, metadata, root_objectid); + } return ret; } @@ -7231,7 +7233,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, } out: if (pin) - add_pinned_bytes(fs_info, buf->len, btrfs_header_level(buf), + add_pinned_bytes(fs_info, buf->len, true, root->root_key.objectid); if (last_ref) { @@ -7285,8 +7287,10 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, _ref_mod, _ref_mod); } - if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0) - add_pinned_bytes(fs_info, num_bytes, owner, root_objectid); + if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0) { + bool metadata = owner < BTRFS_FIRST_FREE_OBJECTID; + add_pinned_bytes(fs_info, num_bytes, metadata, root_objectid); + } return ret; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Question, will ls -l eventually be able to show subvolumes?
I've just notice work going on to make rmdir be able to delete subvolumes. Is there an intent to allow ls -l to display directories as subvolumes? Pete -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] libbtrfsutil: fix test assumptions about top-level subvolume
On 2018/03/29 16:53, Omar Sandoval wrote: > From: Omar Sandoval> > Since "btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of, FS_TREE", > the top-level subvolume has a non-zero UUID, ctime, and otime. Fix the > subvolume_info() test to not check for zero. Sorry, I didn't notice this. I checked this works in devel branch: Reviewed-by: Tomohiro Misono > > Signed-off-by: Omar Sandoval > --- > libbtrfsutil/python/tests/test_subvolume.py | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/libbtrfsutil/python/tests/test_subvolume.py > b/libbtrfsutil/python/tests/test_subvolume.py > index a46d4a34..93396cba 100644 > --- a/libbtrfsutil/python/tests/test_subvolume.py > +++ b/libbtrfsutil/python/tests/test_subvolume.py > @@ -95,7 +95,8 @@ class TestSubvolume(BtrfsTestCase): > self.assertEqual(info.parent_id, 0) > self.assertEqual(info.dir_id, 0) > self.assertEqual(info.flags, 0) > -self.assertEqual(info.uuid, bytes(16)) > +self.assertIsInstance(info.uuid, bytes) > +self.assertEqual(len(info.uuid), 16) > self.assertEqual(info.parent_uuid, bytes(16)) > self.assertEqual(info.received_uuid, bytes(16)) > self.assertNotEqual(info.generation, 0) > @@ -103,8 +104,8 @@ class TestSubvolume(BtrfsTestCase): > self.assertEqual(info.otransid, 0) > self.assertEqual(info.stransid, 0) > self.assertEqual(info.rtransid, 0) > -self.assertEqual(info.ctime, 0) > -self.assertEqual(info.otime, 0) > +self.assertIsInstance(info.ctime, float) > +self.assertIsInstance(info.otime, float)> > self.assertEqual(info.stime, 0) > self.assertEqual(info.rtime, 0) > > @@ -117,6 +118,7 @@ class TestSubvolume(BtrfsTestCase): > self.assertEqual(info.dir_id, 256) > self.assertEqual(info.flags, 0) > self.assertIsInstance(info.uuid, bytes) > +self.assertEqual(len(info.uuid), 16) > self.assertEqual(info.parent_uuid, bytes(16)) > self.assertEqual(info.received_uuid, bytes(16)) > self.assertNotEqual(info.generation, 0) > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] btrfs-progs: Enhance btrfs-image to handle missing device
Can be fetched from github: https://github.com/adam900710/btrfs-progs/tree/btrfs_image_fix Bug report: https://github.com/kdave/btrfs-progs/issues/118 In short, the problem is caused by some old code (read_extent_data() from ancient btrfs check code) and offset-by-one from btrfs-image. Which makes btrfs-image can only read from the first stripe of RAID1. And if device of the first stripe is missing, btrfs-image will fail. Fix the problem and add test case for it. Qu Wenruo (3): btrfs-progs: disk-io: Fix read_extent_data() error handler for missing device btrfs-progs: convert: Fix offset-by-one error in read_data_extent() btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing device disk-io.c | 6 ++- image/main.c | 2 +- tests/misc-tests/030-missing-device-image/test.sh | 57 +++ 3 files changed, 62 insertions(+), 3 deletions(-) create mode 100755 tests/misc-tests/030-missing-device-image/test.sh -- 2.16.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs-progs: disk-io: Fix read_extent_data() error handler for missing device
When device is missing, read_extent_data() (function exported from old btrfs check code) has the following problems: 1) Modify @len parameter if device is missing If device returned in @multi is missing, @len can be larger than @max_len (originl length). This could confusing caller and underflow the read loop. 2) Still return 0 for missing device It only handles read error, missing device is not handled and 0 is returned. 3) Wrong check for device->fd In fact, 0 is also a valid fd. Although not possible under most case, it's still need fix. Fix them all. Fixes: 1bad2f2f2dfe ("Btrfs-progs: fsck: add an option to check data csums") Signed-off-by: Qu Wenruo--- disk-io.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/disk-io.c b/disk-io.c index 610963357675..310ab19cf099 100644 --- a/disk-io.c +++ b/disk-io.c @@ -395,10 +395,12 @@ int read_extent_data(struct btrfs_fs_info *fs_info, char *data, u64 logical, } device = multi->stripes[0].dev; - if (device->fd <= 0) - goto err; if (*len > max_len) *len = max_len; + if (device->fd < 0) { + ret = -EIO; + goto err; + } ret = pread64(device->fd, data, *len, multi->stripes[0].physical); if (ret != *len) -- 2.16.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing device
Signed-off-by: Qu Wenruo--- tests/misc-tests/030-missing-device-image/test.sh | 57 +++ 1 file changed, 57 insertions(+) create mode 100755 tests/misc-tests/030-missing-device-image/test.sh diff --git a/tests/misc-tests/030-missing-device-image/test.sh b/tests/misc-tests/030-missing-device-image/test.sh new file mode 100755 index ..b8ae3a950cc9 --- /dev/null +++ b/tests/misc-tests/030-missing-device-image/test.sh @@ -0,0 +1,57 @@ +#!/bin/bash +# Test that btrfs-image can dump image correctly for missing device (RAID1) +# +# At least for RAID1, btrfs-image should be able to handle one missing device +# without any problem + +source "$TEST_TOP/common" + +check_prereq btrfs-image +check_prereq mkfs.btrfs +check_prereq btrfs + +setup_root_helper +setup_loopdevs 2 +prepare_loopdevs +dev1=${loopdevs[1]} +dev2=${loopdevs[2]} + +# $1: device number to remove (either 1 or 2) +tmp=$(mktemp --tmpdir -d btrfs-progs-misc-test-XXX) +test_missing() +{ + bad_num=$1 + bad_dev=${loopdevs[$bad_num]} + good_num=$((3 - $bad_num)) + good_dev=${loopdevs[$good_num]} + + run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -d raid1 -m raid1 \ + "$dev1" "$dev2" + + # fill the fs with some data, we could create space cache + run_check $SUDO_HELPER mount "$dev1" "$TEST_MNT" + run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/a" bs=1M count=10 + run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/b" bs=4k count=1000 conv=sync + run_check $SUDO_HELPER umount "$TEST_MNT" + + # make sure we have space cache + run_check_stdout "$TOP/btrfs" inspect dump-tree -t root "$dev1" \ + > "$tmp/output" + if ! grep -q "EXTENT_DATA" "$tmp/output" ; then + # normally above operation should create space cache. + # if not, it may means we have migrated to v2 cache by default + _not_run "unable to create v1 space cache" + fi + + # now wipe the device + run_check wipefs -fa "$bad_dev" + + # we don't care about the image but btrfs-image must not fail + run_check "$TOP/btrfs-image" "$good_dev" /dev/null +} + +# Test with either device missing, so we're ensured to hit missing device +test_missing 1 +test_missing 2 +cleanup_loopdevs +rm $tmp -rf -- 2.16.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of RAID5/6
Thanks for the detailed explanation. I think that a summary of this should go in the btrfs raid56 wiki status page, because now it is completely inconsistent and if a user comes there, ihe may get the impression that the raid56 is just broken Still I have the 1 bilion dollar question: from your word I understand that even in RAID56 the metadata are spread on the devices in a coplex way, but shall I assume that the array can survice to the sudden death of one (two for raid6) HDD in the array? Bye -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/5] btrfs-progs: extent buffer related refactor and cleanup
On Fri, Mar 30, 2018 at 01:48:52PM +0800, Qu Wenruo wrote: >The patchset can be fetched from github: >https://github.com/adam900710/btrfs-progs/tree/eb_cleanup > >Just like kernel cleanup and refactors, this patchset will embed >btrfs_fs_info structure into extent_buffer. > >And fixes several possible NULL pointer dereference (although not >utilized in btrfs-progs yet). > >Changelog: >v2: > Embarrassingly, I forgot to install reiserfsprogs in my development > machine, so the 3rd patch lacks one call site in > convert/source-reiserfs.c. > >Qu Wenruo (5): > btrfs-progs: extent_io: Fix NULL pointer dereference in >free_extent_buffer_final() > btrfs-progs: extent_io: Init eb->lru to avoid NULL pointer dereference > btrfs-progs: extent_io: Refactor alloc_extent_buffer() to follow >kernel parameters > btrfs-progs: Unify btrfs_leaf_free_psace() parameter with kernel > btrfs-progs: print-tree: Remove btrfs_root parameter The patchset looks good to me. Reviewed-by: Lu Fengqi> > btrfs-corrupt-block.c | 2 +- > check/main.c | 2 +- > check/mode-lowmem.c | 2 +- > cmds-inspect-dump-tree.c | 31 ++ > convert/source-reiserfs.c | 3 +-- > ctree.c | 65 +-- > ctree.h | 3 ++- > disk-io.c | 3 +-- > extent-tree.c | 8 +++--- > extent_io.c | 17 - > extent_io.h | 3 ++- > print-tree.c | 20 --- > print-tree.h | 4 +-- > 13 files changed, 85 insertions(+), 78 deletions(-) > >-- >2.16.3 > >-- >To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- Thanks, Lu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/3] btrfs: cleanup btrfs_ioctl_snap_destroy() by using btrfs_delete_subvolume()
Use btrfs_delete_subvolume() in btrfs_ioctl_snap_destroy() too to cleanup the code. Call of d_delete() is still required since btrfs_delete_subvolume() does not call it (for rmdir(2), vfs layer later calls it). As a result, btrfs_unlink_subvol() and may_destroy_subvol() become static functions. No functional change happens. Signed-off-by: Tomohiro Misono--- fs/btrfs/ctree.h | 6 +-- fs/btrfs/inode.c | 6 +-- fs/btrfs/ioctl.c | 131 +-- 3 files changed, 6 insertions(+), 137 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2dbead106aab..6f23f0694ac3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3162,11 +3162,7 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans, int btrfs_add_link(struct btrfs_trans_handle *trans, struct btrfs_inode *parent_inode, struct btrfs_inode *inode, const char *name, int name_len, int add_backref, u64 index); -int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - struct inode *dir, u64 objectid, - const char *name, int name_len); -noinline int may_destroy_subvol(struct btrfs_root *root); +int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry); int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, int front); int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 84dbb9cafd6b..afc631f28088 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4252,7 +4252,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry) return ret; } -int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, +static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *dir, u64 objectid, const char *name, int name_len) @@ -4336,7 +4336,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, /* * helper to check if the subvolume references other subvolumes */ -noinline int may_destroy_subvol(struct btrfs_root *root) +static noinline int may_destroy_subvol(struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_path *path; @@ -4387,7 +4387,7 @@ noinline int may_destroy_subvol(struct btrfs_root *root) return ret; } -static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) +int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) { struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb); struct btrfs_root *root = BTRFS_I(dir)->root; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 11af9eb449ef..7559a1a82e6d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2266,12 +2266,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, struct btrfs_root *root = BTRFS_I(dir)->root; struct btrfs_root *dest = NULL; struct btrfs_ioctl_vol_args *vol_args; - struct btrfs_trans_handle *trans; - struct btrfs_block_rsv block_rsv; - u64 root_flags; - u64 qgroup_reserved; int namelen; - int ret; int err = 0; if (!S_ISDIR(dir->i_mode)) @@ -2355,133 +2350,11 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, } inode_lock(inode); - - /* -* Don't allow to delete a subvolume with send in progress. This is -* inside the i_mutex so the error handling that has to drop the bit -* again is not run concurrently. -*/ - spin_lock(>root_item_lock); - root_flags = btrfs_root_flags(>root_item); - if (dest->send_in_progress == 0) { - btrfs_set_root_flags(>root_item, - root_flags | BTRFS_ROOT_SUBVOL_DEAD); - spin_unlock(>root_item_lock); - } else { - spin_unlock(>root_item_lock); - btrfs_warn(fs_info, - "Attempt to delete subvolume %llu during send", - dest->root_key.objectid); - err = -EPERM; - goto out_unlock_inode; - } - - down_write(_info->subvol_sem); - - err = may_destroy_subvol(dest); - if (err) - goto out_up_write; - - btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP); - /* -* One for dir inode, two for dir entries, two for root -* ref/backref. -*/ - err = btrfs_subvolume_reserve_metadata(root, _rsv, - 5, _reserved, true); - if (err) - goto out_up_write; - - trans = btrfs_start_transaction(root, 0); - if (IS_ERR(trans)) { - err = PTR_ERR(trans); - goto
[PATCH v3 2/3] btrfs: Allow rmdir(2) to delete a subvolume
This patch changes the behavior of rmdir(2) to allow it to delete an empty subvolume by default, unless it is not a default subvolume and send is not in progress. New function btrfs_delete_subvolume() is almost equal to the second half of btrfs_ioctl_snap_destroy(). This function requires inode_lock for both @dir and inode of @dentry. For rmdir(2) it is already acquired in vfs layer before calling btrfs_rmdir(). Note that while a non-privileged user cannot delete a read-only subvolume by "btrfs subvolume delete" when user_subvol_rm_allowd mount option is enabled, rmdir(2) can delete an empty read-only subvolume. Tested-by: Goffredo BaroncelliSigned-off-by: Tomohiro Misono --- fs/btrfs/inode.c | 141 ++- 1 file changed, 140 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index db66fa4fede6..84dbb9cafd6b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4387,6 +4387,145 @@ noinline int may_destroy_subvol(struct btrfs_root *root) return ret; } +static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb); + struct btrfs_root *root = BTRFS_I(dir)->root; + struct inode *inode = d_inode(dentry); + struct btrfs_root *dest = BTRFS_I(inode)->root; + struct btrfs_trans_handle *trans; + struct btrfs_block_rsv block_rsv; + u64 root_flags; + u64 qgroup_reserved; + int ret; + int err; + + /* +* Don't allow to delete a subvolume with send in progress. This is +* inside the i_mutex so the error handling that has to drop the bit +* again is not run concurrently. +*/ + spin_lock(>root_item_lock); + root_flags = btrfs_root_flags(>root_item); + if (dest->send_in_progress == 0) { + btrfs_set_root_flags(>root_item, + root_flags | BTRFS_ROOT_SUBVOL_DEAD); + spin_unlock(>root_item_lock); + } else { + spin_unlock(>root_item_lock); + btrfs_warn(fs_info, + "Attempt to delete subvolume %llu during send", + dest->root_key.objectid); + err = -EPERM; + return err; + } + + down_write(_info->subvol_sem); + + err = may_destroy_subvol(dest); + if (err) + goto out_up_write; + + btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP); + /* +* One for dir inode, two for dir entries, two for root +* ref/backref. +*/ + err = btrfs_subvolume_reserve_metadata(root, _rsv, + 5, _reserved, true); + if (err) + goto out_up_write; + + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) { + err = PTR_ERR(trans); + goto out_release; + } + trans->block_rsv = _rsv; + trans->bytes_reserved = block_rsv.size; + + btrfs_record_snapshot_destroy(trans, BTRFS_I(dir)); + + ret = btrfs_unlink_subvol(trans, root, dir, + dest->root_key.objectid, + dentry->d_name.name, + dentry->d_name.len); + if (ret) { + err = ret; + btrfs_abort_transaction(trans, ret); + goto out_end_trans; + } + + btrfs_record_root_in_trans(trans, dest); + + memset(>root_item.drop_progress, 0, + sizeof(dest->root_item.drop_progress)); + dest->root_item.drop_level = 0; + btrfs_set_root_refs(>root_item, 0); + + if (!test_and_set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, >state)) { + ret = btrfs_insert_orphan_item(trans, + fs_info->tree_root, + dest->root_key.objectid); + if (ret) { + btrfs_abort_transaction(trans, ret); + err = ret; + goto out_end_trans; + } + } + + ret = btrfs_uuid_tree_rem(trans, fs_info, dest->root_item.uuid, + BTRFS_UUID_KEY_SUBVOL, + dest->root_key.objectid); + if (ret && ret != -ENOENT) { + btrfs_abort_transaction(trans, ret); + err = ret; + goto out_end_trans; + } + if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) { + ret = btrfs_uuid_tree_rem(trans, fs_info, + dest->root_item.received_uuid, + BTRFS_UUID_KEY_RECEIVED_SUBVOL, + dest->root_key.objectid); + if (ret && ret != -ENOENT) { +
[PATCH v3 1/3] btrfs: move may_destroy_subvol() from ioctl.c to inode.c
This is a preparation work to allow rmdir(2) to delete a subvolume. Signed-off-by: Tomohiro Misono--- fs/btrfs/ctree.h | 1 + fs/btrfs/inode.c | 54 ++ fs/btrfs/ioctl.c | 54 -- 3 files changed, 55 insertions(+), 54 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index da308774b8a4..2dbead106aab 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3166,6 +3166,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *dir, u64 objectid, const char *name, int name_len); +noinline int may_destroy_subvol(struct btrfs_root *root); int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, int front); int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f53470112670..db66fa4fede6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4333,6 +4333,60 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, return ret; } +/* + * helper to check if the subvolume references other subvolumes + */ +noinline int may_destroy_subvol(struct btrfs_root *root) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_path *path; + struct btrfs_dir_item *di; + struct btrfs_key key; + u64 dir_id; + int ret; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + /* Make sure this root isn't set as the default subvol */ + dir_id = btrfs_super_root_dir(fs_info->super_copy); + di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path, + dir_id, "default", 7, 0); + if (di && !IS_ERR(di)) { + btrfs_dir_item_key_to_cpu(path->nodes[0], di, ); + if (key.objectid == root->root_key.objectid) { + ret = -EPERM; + btrfs_err(fs_info, + "deleting default subvolume %llu is not allowed", + key.objectid); + goto out; + } + btrfs_release_path(path); + } + + key.objectid = root->root_key.objectid; + key.type = BTRFS_ROOT_REF_KEY; + key.offset = (u64)-1; + + ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0); + if (ret < 0) + goto out; + BUG_ON(ret == 0); + + ret = 0; + if (path->slots[0] > 0) { + path->slots[0]--; + btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); + if (key.objectid == root->root_key.objectid && + key.type == BTRFS_ROOT_REF_KEY) + ret = -ENOTEMPTY; + } +out: + btrfs_free_path(path); + return ret; +} + static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) { struct inode *inode = d_inode(dentry); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 111ee282b777..11af9eb449ef 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1843,60 +1843,6 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, return ret; } -/* - * helper to check if the subvolume references other subvolumes - */ -static noinline int may_destroy_subvol(struct btrfs_root *root) -{ - struct btrfs_fs_info *fs_info = root->fs_info; - struct btrfs_path *path; - struct btrfs_dir_item *di; - struct btrfs_key key; - u64 dir_id; - int ret; - - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - /* Make sure this root isn't set as the default subvol */ - dir_id = btrfs_super_root_dir(fs_info->super_copy); - di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path, - dir_id, "default", 7, 0); - if (di && !IS_ERR(di)) { - btrfs_dir_item_key_to_cpu(path->nodes[0], di, ); - if (key.objectid == root->root_key.objectid) { - ret = -EPERM; - btrfs_err(fs_info, - "deleting default subvolume %llu is not allowed", - key.objectid); - goto out; - } - btrfs_release_path(path); - } - - key.objectid = root->root_key.objectid; - key.type = BTRFS_ROOT_REF_KEY; - key.offset = (u64)-1; - - ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0); - if (ret < 0) - goto out; - BUG_ON(ret == 0); - - ret = 0; - if (path->slots[0] > 0) { - path->slots[0]--; - btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); - if
[PATCH v3 0/3] Allow rmdir(2) to delete a subvolume
changelog: v2 -> v3 ... use if-else block instead of two if blocks and add Tested-by tag in 2nd patch v1 -> v2 ... split the patch to hopefully make review easier Note: I will send a xfstest if this series is merged. 1st patch is a preparation work just moving the declaration of may_destroy_subvol(). 2nd patch is the main part. New function btrfs_delete_subvolume() is introduced and used in btrfs_rmdir() when a direcoty is an empty subvolume. The function is almost the copy of second half of btrfs_ioctl_snap_destroy(). The code path for "sub delete" is not changed yet. 3rd patch is a cleanup of btrfs_ioctl_snap_destroy() and uses brrfs_delete_subvolume() for "sub delete" too. Tomohiro Misono (3): btrfs: move may_destroy_subvol() from ioctl.c to inode.c btrfs: Allow rmdir(2) to delete a subvolume btrfs: cleanup btrfs_ioctl_snap_destroy() by using btrfs_delete_subvolume() fs/btrfs/ctree.h | 5 +- fs/btrfs/inode.c | 197 ++- fs/btrfs/ioctl.c | 185 +-- 3 files changed, 198 insertions(+), 189 deletions(-) -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html