Re: Status of RAID5/6

2018-03-30 Thread Zygo Blaxell
On Fri, Mar 30, 2018 at 06:14:52PM +0200, Goffredo Baroncelli wrote:
> On 03/29/2018 11:50 PM, Zygo Blaxell wrote:
> > On Wed, Mar 21, 2018 at 09:02:36PM +0100, Christoph Anton Mitterer wrote:
> >> Hey.
> >>
> >> Some things would IMO be nice to get done/clarified (i.e. documented in
> >> the Wiki and manpages) from users'/admin's  POV:
> [...]
> > 
> > btrfs has no optimization like mdadm write-intent bitmaps; recovery
> > is always a full-device operation.  In theory btrfs could track
> > modifications at the chunk level but this isn't even specified in the
> > on-disk format, much less implemented.
> 
> It could go even further; it would be sufficient to track which
> *partial* stripes update will be performed before a commit, in one
> of the btrfs logs. Then in case of a mount of an unclean filesystem,
> a scrub on these stripes would be sufficient.

A scrub cannot fix a raid56 write hole--the data is already lost.
The damaged stripe updates must be replayed from the log.

A scrub could fix raid1/raid10 partial updates but only if the filesystem
can reliably track which blocks failed to be updated by the disconnected
disks.

It would be nice if scrub could be filtered the same way balance is, e.g.
only certain block ranges, or only metadata blocks; however, this is not
presently implemented.

> BR
> G.Baroncelli
> 
> [...]
> 
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli 
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: PGP signature


Re: Status of RAID5/6

2018-03-30 Thread Zygo Blaxell
On Fri, Mar 30, 2018 at 09:21:00AM +0200, Menion wrote:
>  Thanks for the detailed explanation. I think that a summary of this
> should go in the btrfs raid56 wiki status page, because now it is
> completely inconsistent and if a user comes there, ihe may get the
> impression that the raid56 is just broken
> Still I have the 1 bilion dollar question: from your word I understand
> that even in RAID56 the metadata are spread on the devices in a coplex
> way, but shall I assume that the array can survice to the sudden death
> of one (two for raid6) HDD in the array?

I wouldn't assume that.  There is still the write hole, and while there
is a small probability of having a write hole failure, it's a probability
that applies on *every* write in degraded mode, and since disks can fail
at any time, the array can enter degraded mode at any time.

It's similar to lottery tickets--buy one ticket, you probably won't win,
but if you buy millions of tickets, you'll claim the prize eventually.
The "prize" in this case is a severely damaged, possibly unrecoverable
filesystem.

If the data is raid5 and the metadata is raid1, the filesystem can
survive a single disk failure easily; however, some of the data may be
lost if writes to the remaining disks are interrupted by a system crash
or power failure and the write hole issue occurs.  Note that the damage
is not necessarily limited to recently written data--it's any random
data that is merely located adjacent to written data on the filesystem.

I wouldn't use raid6 until the write hole issue is resolved.  There is
no configuration where two disks can fail and metadata can still be
updated reliably.

Some users use the 'ssd_spread' mount option to reduce the probability
of write hole failure, which happens to be helpful by accident on some
array configurations, but it has a fairly high cost when the array is
not degraded due to all the extra balancing required.



> Bye


signature.asc
Description: PGP signature


Re: [PATCH] Btrfs: print error messages when failing to read trees

2018-03-30 Thread David Sterba
On Thu, Mar 29, 2018 at 06:11:45AM +0800, Liu Bo wrote:
> When mount fails to read trees like fs tree, checksum tree, extent
> tree, etc, there is not enough information about where went wrong.
> 
> With this, messages like
> 
> "BTRFS warning (device sdf): failed to read root (objectid=7): -5"
> 
> would help us a bit.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2.1 2/2] btrfs: Validate child tree block's level and first key

2018-03-30 Thread David Sterba
On Thu, Mar 29, 2018 at 09:08:11AM +0800, Qu Wenruo wrote:
> We have several reports about node pointer points to incorrect child
> tree blocks, which could have even wrong owner and level but still with
> valid generation and checksum.
> 
> Although btrfs check could handle it and print error message like:
> leaf parent key incorrect 60670574592
> 
> Kernel doesn't have enough check on this type of corruption correctly.
> At least add such check to read_tree_block() and btrfs_read_buffer(),
> where we need two new parameters @level and @first_key to verify the
> child tree block.
> 
> The new @level check is mandatory and all call sites are already
> modified to extract expected level from its call chain.
> 
> While @first_key is optional, the following call sites are skipping such
> check:
> 1) Root node/leaf
>As ROOT_ITEM doesn't contain the first key, skip @first_key check.
> 2) Direct backref
>Only parent bytenr and level is known and we need to resolve the key
>all by ourselves, skip @first_key check.
> 
> Another note of this verification is, it needs extra info from nodeptr
> or ROOT_ITEM, so it can't fit into current tree-checker framework, which
> is limited to node/leaf boundary.
> 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Make @level check mandatory, suggesed by Jeff and Nikolay.
>   Change parameter order as @level is now mandatory, put it in front of
>   @first_key.
>   Change verify_parent_level() to verify_key_level() to avoid confusion
>   on the @level parameter.
>   Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
> v2.1
>   Rebased to misc-next branch, to use parent_level, it needs to revert
>   Nikolay's patch "btrfs: Remove unused parent_level var from 
> btrfs_realloc_node",
>   sorry Nikolay.

Ah ok, I've reverted the patch, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: bail out on error during replay_dir_deletes

2018-03-30 Thread Liu Bo
If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
to bail out, otherwise @ret would be forced to be 0 after 'break;' and
the caller won't be aware of it.

Signed-off-by: Liu Bo 
---
 fs/btrfs/tree-log.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4ee9431..11e2c26 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2356,8 +2356,10 @@ static noinline int replay_dir_deletes(struct 
btrfs_trans_handle *trans,
nritems = btrfs_header_nritems(path->nodes[0]);
if (path->slots[0] >= nritems) {
ret = btrfs_next_leaf(root, path);
-   if (ret)
+   if (ret == 1)
break;
+   else if (ret < 0)
+   goto out;
}
btrfs_item_key_to_cpu(path->nodes[0], _key,
  path->slots[0]);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix NULL pointer dereference in log_dir_items

2018-03-30 Thread Liu Bo
0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
returned, path->nodes[0] could be NULL, log_dir_items lacks such a
check for <0 and we may run into a null pointer dereference panic.

Signed-off-by: Liu Bo 
---
 fs/btrfs/tree-log.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4344577..4ee9431 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3518,8 +3518,11 @@ static noinline int log_dir_items(struct 
btrfs_trans_handle *trans,
 * from this directory and from this transaction
 */
ret = btrfs_next_leaf(root, path);
-   if (ret == 1) {
-   last_offset = (u64)-1;
+   if (ret) {
+   if (ret == 1)
+   last_offset = (u64)-1;
+   else
+   err = ret;
goto done;
}
btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: clean up resources during umount after trans is aborted

2018-03-30 Thread Liu Bo
Currently if some fatal errors occur, like all IO get -EIO, resources
would be cleaned up when
a) transaction is being committed or
b) BTRFS_FS_STATE_ERROR is set

However, in some rare cases, resources may be left alone after transaction
gets aborted and umount may run into some ASSERT(), e.g.
ASSERT(list_empty(_group->dirty_list));

For case a), in btrfs_commit_transaciton(), there're several places at the
beginning where we just call btrfs_end_transaction() without cleaning up
resources.  For case b), it is possible that the trans handle doesn't have
any dirty stuff, then only trans hanlde is marked as aborted while
BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.

This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
all resources won't stay in memory after umount.

Signed-off-by: Liu Bo 
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad..643bd69 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3735,7 +3735,8 @@ void close_ctree(struct btrfs_fs_info *fs_info)
btrfs_err(fs_info, "commit super ret %d", ret);
}
 
-   if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state))
+   if (test_bit(BTRFS_FS_STATE_ERROR, _info->fs_state) ||
+   test_bit(BTRFS_FS_STATE_TRANS_ABORTED, _info->fs_state))
btrfs_error_commit_super(fs_info);
 
kthread_stop(fs_info->transaction_kthread);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] libbtrfsutil: misc fixes

2018-03-30 Thread David Sterba
On Thu, Mar 29, 2018 at 12:53:52AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> This is a handful of minor fixes for libbtrfsutil that would be good to
> get in before the release. Patch 1 fixes an issue reported by Tomohiro a
> few weeks back. Patch 2 fixes a memory leak I noticed while writing
> patch 1. Patch 3 makes us use the local mkfs.btrfs so that the tests can
> run on the CI system. Patch 4 fixes an issue I noticed while testing
> that fix. Patch 5 fixes a test failure caused by "btrfs-progs: mkfs: add
> uuid and otime to ROOT_ITEM of, FS_TREE".
> 
> Based on your devel branch. Please apply!

Done, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] btrfs-progs: Enhance btrfs-image to handle missing device

2018-03-30 Thread David Sterba
On Fri, Mar 30, 2018 at 03:35:25PM +0800, Qu Wenruo wrote:
> Can be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/btrfs_image_fix
> 
> Bug report:
> https://github.com/kdave/btrfs-progs/issues/118
> 
> In short, the problem is caused by some old code (read_extent_data()
> from ancient btrfs check code) and offset-by-one from btrfs-image.
> Which makes btrfs-image can only read from the first stripe of RAID1.
> 
> And if device of the first stripe is missing, btrfs-image will fail.
> 
> Fix the problem and add test case for it.
> 
> Qu Wenruo (3):
>   btrfs-progs: disk-io: Fix read_extent_data() error handler for missing
> device
>   btrfs-progs: convert: Fix offset-by-one error in read_data_extent()
>   btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing
> device

1-3 applied, thanks. I'll reorder the uuid tree patch after those
patches so the test does not fail.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing device

2018-03-30 Thread David Sterba
On Fri, Mar 30, 2018 at 03:35:28PM +0800, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 

Applied with the fixes below, thanks.

> ---
>  tests/misc-tests/030-missing-device-image/test.sh | 57 
> +++
>  1 file changed, 57 insertions(+)
>  create mode 100755 tests/misc-tests/030-missing-device-image/test.sh
> 
> diff --git a/tests/misc-tests/030-missing-device-image/test.sh 
> b/tests/misc-tests/030-missing-device-image/test.sh
> new file mode 100755
> index ..b8ae3a950cc9
> --- /dev/null
> +++ b/tests/misc-tests/030-missing-device-image/test.sh
> @@ -0,0 +1,57 @@
> +#!/bin/bash
> +# Test that btrfs-image can dump image correctly for missing device (RAID1)
> +#
> +# At least for RAID1, btrfs-image should be able to handle one missing device
> +# without any problem
> +
> +source "$TEST_TOP/common"
> +
> +check_prereq btrfs-image
> +check_prereq mkfs.btrfs
> +check_prereq btrfs
> +
> +setup_root_helper
> +setup_loopdevs 2
> +prepare_loopdevs
> +dev1=${loopdevs[1]}
> +dev2=${loopdevs[2]}
> +
> +# $1:device number to remove (either 1 or 2)
> +tmp=$(mktemp --tmpdir -d btrfs-progs-misc-test-XXX)

Not necessary.

> +test_missing()
> +{
> + bad_num=$1
> + bad_dev=${loopdevs[$bad_num]}
> + good_num=$((3 - $bad_num))
> + good_dev=${loopdevs[$good_num]}

All of them should be declared as local

> +
> + run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -d raid1 -m raid1 \
> + "$dev1" "$dev2"
> + 
> + # fill the fs with some data, we could create space cache
> + run_check $SUDO_HELPER mount "$dev1" "$TEST_MNT"
> + run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/a" bs=1M count=10
> + run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/b" bs=4k 
> count=1000 conv=sync
> + run_check $SUDO_HELPER umount "$TEST_MNT"
> + 
> + # make sure we have space cache
> + run_check_stdout "$TOP/btrfs" inspect dump-tree -t root "$dev1" \
> + > "$tmp/output"
> + if ! grep -q "EXTENT_DATA" "$tmp/output" ; then

this can be glued to one line, making the tmp files unnecessry

> + # normally above operation should create space cache.
> + # if not, it may means we have migrated to v2 cache by default
> + _not_run "unable to create v1 space cache"
> + fi
> +
> + # now wipe the device
> + run_check wipefs -fa "$bad_dev"
> +
> + # we don't care about the image but btrfs-image must not fail
> + run_check "$TOP/btrfs-image" "$good_dev" /dev/null
> +}
> +
> +# Test with either device missing, so we're ensured to hit missing device
> +test_missing 1
> +test_missing 2
> +cleanup_loopdevs
> +rm $tmp -rf
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: tests/mkfs: Test if mkfs.btrfs --rootdir can handle broken soft link well

2018-03-30 Thread David Sterba
On Thu, Mar 29, 2018 at 09:26:44AM +0800, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 

Test fixed and applied, thanks.

> ---
>  .../016-rootdir-bad-symbolic-link/test.sh  | 26 
> ++
>  1 file changed, 26 insertions(+)
>  create mode 100755 tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh
> 
> diff --git a/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh 
> b/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh
> new file mode 100755
> index ..d12efa629042
> --- /dev/null
> +++ b/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh
> @@ -0,0 +1,26 @@
> +#!/bin/bash
> +# Regression test for mkfs.btrfs --rootdir with bad symbolic link
> +# (points to non-existing location)
> +#
> +# Since mkfs.btrfs --rootdir will just create symbolic link other than
> +# follow it, we shouldn't hit any problem
> +
> +source "$TEST_TOP/common"
> +
> +check_prereq mkfs.btrfs
> +
> +prepare_test_dev
> +
> +tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
> +
> +non_existing="/no/such/file"

Adding a few $RANDOM will make the below almost impossible.

> +
> +if [ -f "$non_existing" ]; then
> + # Some smartass don't want to this test case to run
> + _not_run "Some one created $non_exist, which is not expect to exist"
   ^^

> +fi
> +
> +run_check ln -sf "$non_existing" "$tmp/foobar"
> +
> +run_check "$TOP/mkfs.btrfs" -f --rootdir "$tmp" "$TEST_DEV"
> +run_check "$TOP/btrfs" check "$TEST_DEV"

rm -rf -- "$tmp"
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] btrfs-progs: fi usage: cleanup unnecessary permission error check

2018-03-30 Thread David Sterba
On Thu, Mar 29, 2018 at 05:23:39PM +0900, Misono Tomohiro wrote:
> Since BTRFS_IOC_FS_INFO does not require root privilege, there is no
> need to check EPERM error.

Well, this has been changed in 3.16, but there's still 3.2 longterm
kernel. It does not hurt to have the check and verbose message so we may
need to keep it for a few more years.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Remove unused parameter

2018-03-30 Thread David Sterba
On Fri, Mar 30, 2018 at 08:49:55AM +0300, Nikolay Borisov wrote:
> 
> 
> On 30.03.2018 05:56, Gu Jinxiang wrote:
> > Parameter usagestr is not used, remove it.
> > 
> > Signed-off-by: Gu Jinxiang 
> 
> Reviewed-by: Nikolay Borisov 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question, will ls -l eventually be able to show subvolumes?

2018-03-30 Thread Adam Borowski
On Fri, Mar 30, 2018 at 10:42:10AM +0100, Pete wrote:
> I've just notice work going on to make rmdir be able to delete
> subvolumes.  Is there an intent to allow ls -l to display directories as
> subvolumes?

That's entirely up to coreutils guys.


Meow!
-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"

2018-03-30 Thread Goffredo Baroncelli
On 03/30/2018 06:46 AM, Misono Tomohiro wrote:
> On 2018/03/30 2:35, Goffredo Baroncelli wrote:
>> Hi Misono,
> 
> Hello,
[...]
>> My conclusion, is that your work is not consistent with the current 
>> implementation; so I am suggesting to create a new subcommand ("btrfs sub 
>> tree" ?) where you can use, without constraint, your api.
> 
> I agree that the current output of "sub list" is confusing first of all and
> it is not easy to keep consistent behavior between user and root.
> 
> So, I think "sub list" (or new command) should be:
>   - (default) list the subvolumes under the specified path, including 
> subvolumes mounted below the specified path.
>   Any user can do this (with appropriate permission checks).
>   The path to be printed is the relative from the specified path.
>   - (-a option) list all the subvolmumes in the filesystem. Only root can do 
> this.
>   The path to be printed is the absolute path from the toplevel 
> subvolume.
> 
> This would need some changes in progs code, but I think we can use the same 
> new ioctls I proposed[1] for the default behavior.



> 
> I'd like to update "sub list" command eventually rather than adding new 
> command, even though this breaks the compatibility.
> I want to know what other developer/user think.
> 
> [1] https://www.spinics.net/lists/linux-btrfs/msg76003.html
> 
>>
>> Another small differences that I found is in the subcommand "btrfs sub 
>> show". This is not capable to show the snapshots of a subvolume; I think 
>> that, in "user mode", it should be reported that there is a "missing 
>> capabilities" problem than to show an empty list
> 
> Yes, this is because that only the snapshots *under the specified path* are 
> searched.
> This can be easily changed to search under the mount point, but this also 
> results in
> slight change in print format of the path for root. I tried to keep as much 
> consistency in this version.

When I perform a snapshot, I think the new snapshot more as a sister/brother 
than a child, so I put it at the same level instead of below the source 
subvolume. Moreover the path printed at the first line seems to be relative at 
the root of filesystem.

> Thanks,
> Tomohiro Misono
> 
>>
>> Below an example of what I am say:
>>
>> ghigo@venice:/tmp$ btrfs sub cre a
>> ghigo@venice:/tmp$ btrfs sub snap a c
>>
>> ghigo@venice:/tmp$ # patched btrfs
>> ghigo@venice:/tmp$ /home/ghigo/btrfs/btrfs-progs/btrfs sub show a
>> tmp/a
>>  Name:   a
>>  UUID:   0c19a7a4-a7f5-1849-9262-88ab3da504d0
>>  Parent UUID:-
>>  Received UUID:  -
>>  Creation time:  2018-03-28 22:48:09 +0200
>>  Subvolume ID:   598
>>  Generation: 696908
>>  Gen at creation:696907
>>  Parent ID:  257
>>  Top level ID:   257
>>  Flags:  -
>>  Snapshot(s):
>> 
>>
>> ghigo@venice:/tmp$ # stock btrfs
>> ghigo@venice:/tmp$ sudo /home/ghigo/btrfs/btrfs-progs/btrfs sub show a
>>   ^^
>> tmp/a
>>  Name:   a
>>  UUID:   0c19a7a4-a7f5-1849-9262-88ab3da504d0
>>  Parent UUID:-
>>  Received UUID:  -
>>  Creation time:  2018-03-28 22:48:09 +0200
>>  Subvolume ID:   598
>>  Generation: 696908
>>  Gen at creation:696907
>>  Parent ID:  257
>>  Top level ID:   257
>>  Flags:  -
>>  Snapshot(s):
>>  debian/tmp/c
>> ^
>>
>>
>> BR
>> G.Baroncelli
>>
>>
>> -
>> Test performed:
>>
>> ghigo@venice:/tmp$ sudo btrfs sub crea a
>> ghigo@venice:/tmp$ sudo btrfs sub crea a/a1
>> ghigo@venice:/tmp$ sudo btrfs sub crea a/a1/a2
>> ghigo@venice:/tmp$ sudo btrfs sub crea b
>> ghigo@venice:/tmp$ sudo btrfs sub crea b/b1
>> ghigo@venice:/tmp$ sudo chmod og-rwx b/.
>>
>>
>> # stock btrfs progs
>> ghigo@venice:/tmp$ sudo btrfs sub l .
>> ID 257 gen 696886 top level 5 path debian
>> ID 289 gen 587461 top level 257 path var/lib/machines
>> ID 299 gen 693561 top level 5 path boot
>> ID 582 gen 683965 top level 5 path i386
>> ID 592 gen 696884 top level 257 path tmp/a
>> ID 593 gen 696885 top level 592 path tmp/a/a1
>> ID 594 gen 696885 top level 593 path tmp/a/a1/a2
>> ID 595 gen 696887 top level 257 path tmp/b
>> ID 596 gen 696887 top level 595 path tmp/b/b1
>>
>> # patched btrfs progs
>> ghigo@venice:/tmp$ /home/ghigo/btrfs/btrfs-progs/btrfs sub lis .
>> ID 592 gen 696884 top level 257 path a
>> ID 593 gen 696885 top level 592 path a/a1
>> ID 594 gen 696885 top level 593 path a/a1/a2
>> ID 595 gen 0 top level 257 path b
>>
>>
>> ghigo@venice:/tmp$ /home/ghigo/btrfs/btrfs-progs/btrfs sub show a
>> tmp/a
>>  Name:   a
>>  

Re: [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"

2018-03-30 Thread Goffredo Baroncelli
On 03/30/2018 06:46 AM, Misono Tomohiro wrote:
> [Un]fortunately, the stock "btrfs sub list" has the capability to show all 
> the subvolumes, even the ones not mounted, so this can be entirely replaced 
> by your API.

s/can be entirely/can't be entirely/

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of RAID5/6

2018-03-30 Thread Goffredo Baroncelli
On 03/29/2018 11:50 PM, Zygo Blaxell wrote:
> On Wed, Mar 21, 2018 at 09:02:36PM +0100, Christoph Anton Mitterer wrote:
>> Hey.
>>
>> Some things would IMO be nice to get done/clarified (i.e. documented in
>> the Wiki and manpages) from users'/admin's  POV:
[...]
> 
>>   - changing raid lvls?
> 
> btrfs uses a brute-force RAID conversion algorithm which always works, but
> takes zero short cuts.  e.g. there is no speed optimization implemented
> for cases like "convert 2-disk raid1 to 1-disk single" which can be
> very fast in theory.  The worst-case running time is the only running
> time available in btrfs.

[...]
What it is reported by Zygo is an excellent source of information. However I 
have to point out that BTRFS has a little optimization: i.e. scrub/balance only 
works on the allocated chunk. So a partial filled filesystem requires less time 
than a nearly filled one

> 
> btrfs has no optimization like mdadm write-intent bitmaps; recovery
> is always a full-device operation.  In theory btrfs could track
> modifications at the chunk level but this isn't even specified in the
> on-disk format, much less implemented.

It could go even further; it would be sufficient to track which *partial* 
stripes update will be performed before a commit, in one of the btrfs logs. 
Then in case of a mount of an unclean filesystem, a scrub on these stripes 
would be sufficient.

BR
G.Baroncelli

[...]


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Replace owner argument in add_pinned_bytes with a boolean

2018-03-30 Thread Nikolay Borisov
add_pinned_bytes reallyc ares whether the bytes being pinned are either
data or metadata. To that effect is checks whether the 'owner' argument
is less than BTRFS_FIRST_FREE_OBJECTID (256). This works because
owner can really have 2 types of values:
 a) For metadata extents it holds the level at which the parent is in
 the btree. This amounts to owner having the values 0-7

 b) In case of modifying data extentsi, owner is the inode number
 to which those extents belongs.

Let's make this more explicit byt converting the owner parameter to a
boolean value and either pass it directly when we know the type of
extents we are working with (i.e. in btrfs_free_tree_block). In cases
when the parent function can be called on both metadata/data extents
perform the check in the caller. This hopefully makes the interface
of add_pinned_bytes more intuitive.

Signed-off-by: Nikolay Borisov 
---

As an added bonus (but not changelog noteworthy) we even get a slight 
reduction in size (likely due to de-inlining):

add/remove: 0/0 grow/shrink: 2/2 up/down: 21/-65 (-44)
Function old new   delta
btrfs_free_extent238 249 +11
btrfs_inc_extent_ref 197 207 +10
add_pinned_bytes 109 104  -5
btrfs_free_tree_block906 846 -60
Total: Before=89122, After=89078, chg -0.05%

 fs/btrfs/extent-tree.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0fe196f19e66..1a663b820a7b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -756,12 +756,12 @@ static struct btrfs_space_info *__find_space_info(struct 
btrfs_fs_info *info,
 }
 
 static void add_pinned_bytes(struct btrfs_fs_info *fs_info, s64 num_bytes,
-u64 owner, u64 root_objectid)
+bool metadata, u64 root_objectid)
 {
struct btrfs_space_info *space_info;
u64 flags;
 
-   if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+   if (metadata) {
if (root_objectid == BTRFS_CHUNK_TREE_OBJECTID)
flags = BTRFS_BLOCK_GROUP_SYSTEM;
else
@@ -2212,8 +2212,10 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle 
*trans,
 _ref_mod, _ref_mod);
}
 
-   if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0)
-   add_pinned_bytes(fs_info, -num_bytes, owner, root_objectid);
+   if (ret == 0 && old_ref_mod < 0 && new_ref_mod >= 0) {
+   bool metadata = owner < BTRFS_FIRST_FREE_OBJECTID;
+   add_pinned_bytes(fs_info, -num_bytes, metadata, root_objectid);
+   }
 
return ret;
 }
@@ -7231,7 +7233,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle 
*trans,
}
 out:
if (pin)
-   add_pinned_bytes(fs_info, buf->len, btrfs_header_level(buf),
+   add_pinned_bytes(fs_info, buf->len, true,
 root->root_key.objectid);
 
if (last_ref) {
@@ -7285,8 +7287,10 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans,
 _ref_mod, _ref_mod);
}
 
-   if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0)
-   add_pinned_bytes(fs_info, num_bytes, owner, root_objectid);
+   if (ret == 0 && old_ref_mod >= 0 && new_ref_mod < 0) {
+   bool metadata = owner < BTRFS_FIRST_FREE_OBJECTID;
+   add_pinned_bytes(fs_info, num_bytes, metadata, root_objectid);
+   }
 
return ret;
 }
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question, will ls -l eventually be able to show subvolumes?

2018-03-30 Thread Pete
I've just notice work going on to make rmdir be able to delete
subvolumes.  Is there an intent to allow ls -l to display directories as
subvolumes?

Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] libbtrfsutil: fix test assumptions about top-level subvolume

2018-03-30 Thread Misono Tomohiro
On 2018/03/29 16:53, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> Since "btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of, FS_TREE",
> the top-level subvolume has a non-zero UUID, ctime, and otime. Fix the
> subvolume_info() test to not check for zero.

Sorry, I didn't notice this.

I checked this works in devel branch:
Reviewed-by: Tomohiro Misono 

> 
> Signed-off-by: Omar Sandoval 
> ---
>  libbtrfsutil/python/tests/test_subvolume.py | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/libbtrfsutil/python/tests/test_subvolume.py 
> b/libbtrfsutil/python/tests/test_subvolume.py
> index a46d4a34..93396cba 100644
> --- a/libbtrfsutil/python/tests/test_subvolume.py
> +++ b/libbtrfsutil/python/tests/test_subvolume.py
> @@ -95,7 +95,8 @@ class TestSubvolume(BtrfsTestCase):
>  self.assertEqual(info.parent_id, 0)
>  self.assertEqual(info.dir_id, 0)
>  self.assertEqual(info.flags, 0)
> -self.assertEqual(info.uuid, bytes(16))
> +self.assertIsInstance(info.uuid, bytes)
> +self.assertEqual(len(info.uuid), 16)
>  self.assertEqual(info.parent_uuid, bytes(16))
>  self.assertEqual(info.received_uuid, bytes(16))
>  self.assertNotEqual(info.generation, 0)
> @@ -103,8 +104,8 @@ class TestSubvolume(BtrfsTestCase):
>  self.assertEqual(info.otransid, 0)
>  self.assertEqual(info.stransid, 0)
>  self.assertEqual(info.rtransid, 0)
> -self.assertEqual(info.ctime, 0)
> -self.assertEqual(info.otime, 0)
> +self.assertIsInstance(info.ctime, float)
> +self.assertIsInstance(info.otime, float)>  
> self.assertEqual(info.stime, 0)
>  self.assertEqual(info.rtime, 0)
>  
> @@ -117,6 +118,7 @@ class TestSubvolume(BtrfsTestCase):
>  self.assertEqual(info.dir_id, 256)
>  self.assertEqual(info.flags, 0)
>  self.assertIsInstance(info.uuid, bytes)
> +self.assertEqual(len(info.uuid), 16)
>  self.assertEqual(info.parent_uuid, bytes(16))
>  self.assertEqual(info.received_uuid, bytes(16))
>  self.assertNotEqual(info.generation, 0)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] btrfs-progs: Enhance btrfs-image to handle missing device

2018-03-30 Thread Qu Wenruo
Can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/btrfs_image_fix

Bug report:
https://github.com/kdave/btrfs-progs/issues/118

In short, the problem is caused by some old code (read_extent_data()
from ancient btrfs check code) and offset-by-one from btrfs-image.
Which makes btrfs-image can only read from the first stripe of RAID1.

And if device of the first stripe is missing, btrfs-image will fail.

Fix the problem and add test case for it.

Qu Wenruo (3):
  btrfs-progs: disk-io: Fix read_extent_data() error handler for missing
device
  btrfs-progs: convert: Fix offset-by-one error in read_data_extent()
  btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing
device

 disk-io.c |  6 ++-
 image/main.c  |  2 +-
 tests/misc-tests/030-missing-device-image/test.sh | 57 +++
 3 files changed, 62 insertions(+), 3 deletions(-)
 create mode 100755 tests/misc-tests/030-missing-device-image/test.sh

-- 
2.16.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs-progs: disk-io: Fix read_extent_data() error handler for missing device

2018-03-30 Thread Qu Wenruo
When device is missing, read_extent_data() (function exported from old
btrfs check code) has the following problems:

1) Modify @len parameter if device is missing
   If device returned in @multi is missing, @len can be larger than
   @max_len (originl length).

   This could confusing caller and underflow the read loop.

2) Still return 0 for missing device
   It only handles read error, missing device is not handled and 0 is
   returned.

3) Wrong check for device->fd
   In fact, 0 is also a valid fd.
   Although not possible under most case, it's still need fix.

Fix them all.

Fixes: 1bad2f2f2dfe ("Btrfs-progs: fsck: add an option to check data csums")
Signed-off-by: Qu Wenruo 
---
 disk-io.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 610963357675..310ab19cf099 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -395,10 +395,12 @@ int read_extent_data(struct btrfs_fs_info *fs_info, char 
*data, u64 logical,
}
device = multi->stripes[0].dev;
 
-   if (device->fd <= 0)
-   goto err;
if (*len > max_len)
*len = max_len;
+   if (device->fd < 0) {
+   ret = -EIO;
+   goto err;
+   }
 
ret = pread64(device->fd, data, *len, multi->stripes[0].physical);
if (ret != *len)
-- 
2.16.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: tests/misc: Test if btrfs-image can handle RAID1 missing device

2018-03-30 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 tests/misc-tests/030-missing-device-image/test.sh | 57 +++
 1 file changed, 57 insertions(+)
 create mode 100755 tests/misc-tests/030-missing-device-image/test.sh

diff --git a/tests/misc-tests/030-missing-device-image/test.sh 
b/tests/misc-tests/030-missing-device-image/test.sh
new file mode 100755
index ..b8ae3a950cc9
--- /dev/null
+++ b/tests/misc-tests/030-missing-device-image/test.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# Test that btrfs-image can dump image correctly for missing device (RAID1)
+#
+# At least for RAID1, btrfs-image should be able to handle one missing device
+# without any problem
+
+source "$TEST_TOP/common"
+
+check_prereq btrfs-image
+check_prereq mkfs.btrfs
+check_prereq btrfs
+
+setup_root_helper
+setup_loopdevs 2
+prepare_loopdevs
+dev1=${loopdevs[1]}
+dev2=${loopdevs[2]}
+
+# $1:  device number to remove (either 1 or 2)
+tmp=$(mktemp --tmpdir -d btrfs-progs-misc-test-XXX)
+test_missing()
+{
+   bad_num=$1
+   bad_dev=${loopdevs[$bad_num]}
+   good_num=$((3 - $bad_num))
+   good_dev=${loopdevs[$good_num]}
+
+   run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -d raid1 -m raid1 \
+   "$dev1" "$dev2"
+   
+   # fill the fs with some data, we could create space cache
+   run_check $SUDO_HELPER mount "$dev1" "$TEST_MNT"
+   run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/a" bs=1M count=10
+   run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/b" bs=4k 
count=1000 conv=sync
+   run_check $SUDO_HELPER umount "$TEST_MNT"
+   
+   # make sure we have space cache
+   run_check_stdout "$TOP/btrfs" inspect dump-tree -t root "$dev1" \
+   > "$tmp/output"
+   if ! grep -q "EXTENT_DATA" "$tmp/output" ; then
+   # normally above operation should create space cache.
+   # if not, it may means we have migrated to v2 cache by default
+   _not_run "unable to create v1 space cache"
+   fi
+
+   # now wipe the device
+   run_check wipefs -fa "$bad_dev"
+
+   # we don't care about the image but btrfs-image must not fail
+   run_check "$TOP/btrfs-image" "$good_dev" /dev/null
+}
+
+# Test with either device missing, so we're ensured to hit missing device
+test_missing 1
+test_missing 2
+cleanup_loopdevs
+rm $tmp -rf
-- 
2.16.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Status of RAID5/6

2018-03-30 Thread Menion
 Thanks for the detailed explanation. I think that a summary of this
should go in the btrfs raid56 wiki status page, because now it is
completely inconsistent and if a user comes there, ihe may get the
impression that the raid56 is just broken
Still I have the 1 bilion dollar question: from your word I understand
that even in RAID56 the metadata are spread on the devices in a coplex
way, but shall I assume that the array can survice to the sudden death
of one (two for raid6) HDD in the array?
Bye
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] btrfs-progs: extent buffer related refactor and cleanup

2018-03-30 Thread Lu Fengqi
On Fri, Mar 30, 2018 at 01:48:52PM +0800, Qu Wenruo wrote:
>The patchset can be fetched from github:
>https://github.com/adam900710/btrfs-progs/tree/eb_cleanup
>
>Just like kernel cleanup and refactors, this patchset will embed
>btrfs_fs_info structure into extent_buffer.
>
>And fixes several possible NULL pointer dereference (although not
>utilized in btrfs-progs yet).
>
>Changelog:
>v2:
>  Embarrassingly, I forgot to install reiserfsprogs in my development
>  machine, so the 3rd patch lacks one call site in
>  convert/source-reiserfs.c.
>
>Qu Wenruo (5):
>  btrfs-progs: extent_io: Fix NULL pointer dereference in
>free_extent_buffer_final()
>  btrfs-progs: extent_io: Init eb->lru to avoid NULL pointer dereference
>  btrfs-progs: extent_io: Refactor alloc_extent_buffer() to follow
>kernel parameters
>  btrfs-progs: Unify btrfs_leaf_free_psace() parameter with kernel
>  btrfs-progs: print-tree: Remove btrfs_root parameter

The patchset looks good to me.

Reviewed-by: Lu Fengqi 

>
> btrfs-corrupt-block.c |  2 +-
> check/main.c  |  2 +-
> check/mode-lowmem.c   |  2 +-
> cmds-inspect-dump-tree.c  | 31 ++
> convert/source-reiserfs.c |  3 +--
> ctree.c   | 65 +--
> ctree.h   |  3 ++-
> disk-io.c |  3 +--
> extent-tree.c |  8 +++---
> extent_io.c   | 17 -
> extent_io.h   |  3 ++-
> print-tree.c  | 20 ---
> print-tree.h  |  4 +--
> 13 files changed, 85 insertions(+), 78 deletions(-)
>
>-- 
>2.16.3
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

-- 
Thanks,
Lu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/3] btrfs: cleanup btrfs_ioctl_snap_destroy() by using btrfs_delete_subvolume()

2018-03-30 Thread Misono Tomohiro
Use btrfs_delete_subvolume() in btrfs_ioctl_snap_destroy() too to
cleanup the code. Call of d_delete() is still required since
btrfs_delete_subvolume() does not call it (for rmdir(2), vfs layer later
calls it).

As a result, btrfs_unlink_subvol() and may_destroy_subvol()
become static functions. No functional change happens.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ctree.h |   6 +--
 fs/btrfs/inode.c |   6 +--
 fs/btrfs/ioctl.c | 131 +--
 3 files changed, 6 insertions(+), 137 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2dbead106aab..6f23f0694ac3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3162,11 +3162,7 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
 int btrfs_add_link(struct btrfs_trans_handle *trans,
   struct btrfs_inode *parent_inode, struct btrfs_inode *inode,
   const char *name, int name_len, int add_backref, u64 index);
-int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root,
-   struct inode *dir, u64 objectid,
-   const char *name, int name_len);
-noinline int may_destroy_subvol(struct btrfs_root *root);
+int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry);
 int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 84dbb9cafd6b..afc631f28088 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4252,7 +4252,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry 
*dentry)
return ret;
 }
 
-int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
+static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len)
@@ -4336,7 +4336,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
 /*
  * helper to check if the subvolume references other subvolumes
  */
-noinline int may_destroy_subvol(struct btrfs_root *root)
+static noinline int may_destroy_subvol(struct btrfs_root *root)
 {
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_path *path;
@@ -4387,7 +4387,7 @@ noinline int may_destroy_subvol(struct btrfs_root *root)
return ret;
 }
 
-static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
+int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
struct btrfs_root *root = BTRFS_I(dir)->root;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 11af9eb449ef..7559a1a82e6d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2266,12 +2266,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file 
*file,
struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_root *dest = NULL;
struct btrfs_ioctl_vol_args *vol_args;
-   struct btrfs_trans_handle *trans;
-   struct btrfs_block_rsv block_rsv;
-   u64 root_flags;
-   u64 qgroup_reserved;
int namelen;
-   int ret;
int err = 0;
 
if (!S_ISDIR(dir->i_mode))
@@ -2355,133 +2350,11 @@ static noinline int btrfs_ioctl_snap_destroy(struct 
file *file,
}
 
inode_lock(inode);
-
-   /*
-* Don't allow to delete a subvolume with send in progress. This is
-* inside the i_mutex so the error handling that has to drop the bit
-* again is not run concurrently.
-*/
-   spin_lock(>root_item_lock);
-   root_flags = btrfs_root_flags(>root_item);
-   if (dest->send_in_progress == 0) {
-   btrfs_set_root_flags(>root_item,
-   root_flags | BTRFS_ROOT_SUBVOL_DEAD);
-   spin_unlock(>root_item_lock);
-   } else {
-   spin_unlock(>root_item_lock);
-   btrfs_warn(fs_info,
-  "Attempt to delete subvolume %llu during send",
-  dest->root_key.objectid);
-   err = -EPERM;
-   goto out_unlock_inode;
-   }
-
-   down_write(_info->subvol_sem);
-
-   err = may_destroy_subvol(dest);
-   if (err)
-   goto out_up_write;
-
-   btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP);
-   /*
-* One for dir inode, two for dir entries, two for root
-* ref/backref.
-*/
-   err = btrfs_subvolume_reserve_metadata(root, _rsv,
-  5, _reserved, true);
-   if (err)
-   goto out_up_write;
-
-   trans = btrfs_start_transaction(root, 0);
-   if (IS_ERR(trans)) {
-   err = PTR_ERR(trans);
-   goto 

[PATCH v3 2/3] btrfs: Allow rmdir(2) to delete a subvolume

2018-03-30 Thread Misono Tomohiro
This patch changes the behavior of rmdir(2) to allow it to delete
an empty subvolume by default, unless it is not a default subvolume
and send is not in progress.

New function btrfs_delete_subvolume() is almost equal to the second half
of btrfs_ioctl_snap_destroy(). This function requires inode_lock for both
@dir and inode of @dentry. For rmdir(2) it is already acquired in vfs
layer before calling btrfs_rmdir().

Note that while a non-privileged user cannot delete a read-only subvolume
by "btrfs subvolume delete" when user_subvol_rm_allowd mount option is
enabled, rmdir(2) can delete an empty read-only subvolume.

Tested-by: Goffredo Baroncelli 
Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/inode.c | 141 ++-
 1 file changed, 140 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db66fa4fede6..84dbb9cafd6b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4387,6 +4387,145 @@ noinline int may_destroy_subvol(struct btrfs_root *root)
return ret;
 }
 
+static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
+{
+   struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
+   struct btrfs_root *root = BTRFS_I(dir)->root;
+   struct inode *inode = d_inode(dentry);
+   struct btrfs_root *dest = BTRFS_I(inode)->root;
+   struct btrfs_trans_handle *trans;
+   struct btrfs_block_rsv block_rsv;
+   u64 root_flags;
+   u64 qgroup_reserved;
+   int ret;
+   int err;
+
+   /*
+* Don't allow to delete a subvolume with send in progress. This is
+* inside the i_mutex so the error handling that has to drop the bit
+* again is not run concurrently.
+*/
+   spin_lock(>root_item_lock);
+   root_flags = btrfs_root_flags(>root_item);
+   if (dest->send_in_progress == 0) {
+   btrfs_set_root_flags(>root_item,
+   root_flags | BTRFS_ROOT_SUBVOL_DEAD);
+   spin_unlock(>root_item_lock);
+   } else {
+   spin_unlock(>root_item_lock);
+   btrfs_warn(fs_info,
+  "Attempt to delete subvolume %llu during send",
+  dest->root_key.objectid);
+   err = -EPERM;
+   return err;
+   }
+
+   down_write(_info->subvol_sem);
+
+   err = may_destroy_subvol(dest);
+   if (err)
+   goto out_up_write;
+
+   btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP);
+   /*
+* One for dir inode, two for dir entries, two for root
+* ref/backref.
+*/
+   err = btrfs_subvolume_reserve_metadata(root, _rsv,
+  5, _reserved, true);
+   if (err)
+   goto out_up_write;
+
+   trans = btrfs_start_transaction(root, 0);
+   if (IS_ERR(trans)) {
+   err = PTR_ERR(trans);
+   goto out_release;
+   }
+   trans->block_rsv = _rsv;
+   trans->bytes_reserved = block_rsv.size;
+
+   btrfs_record_snapshot_destroy(trans, BTRFS_I(dir));
+
+   ret = btrfs_unlink_subvol(trans, root, dir,
+   dest->root_key.objectid,
+   dentry->d_name.name,
+   dentry->d_name.len);
+   if (ret) {
+   err = ret;
+   btrfs_abort_transaction(trans, ret);
+   goto out_end_trans;
+   }
+
+   btrfs_record_root_in_trans(trans, dest);
+
+   memset(>root_item.drop_progress, 0,
+   sizeof(dest->root_item.drop_progress));
+   dest->root_item.drop_level = 0;
+   btrfs_set_root_refs(>root_item, 0);
+
+   if (!test_and_set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, >state)) {
+   ret = btrfs_insert_orphan_item(trans,
+   fs_info->tree_root,
+   dest->root_key.objectid);
+   if (ret) {
+   btrfs_abort_transaction(trans, ret);
+   err = ret;
+   goto out_end_trans;
+   }
+   }
+
+   ret = btrfs_uuid_tree_rem(trans, fs_info, dest->root_item.uuid,
+ BTRFS_UUID_KEY_SUBVOL,
+ dest->root_key.objectid);
+   if (ret && ret != -ENOENT) {
+   btrfs_abort_transaction(trans, ret);
+   err = ret;
+   goto out_end_trans;
+   }
+   if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) {
+   ret = btrfs_uuid_tree_rem(trans, fs_info,
+ dest->root_item.received_uuid,
+ BTRFS_UUID_KEY_RECEIVED_SUBVOL,
+ dest->root_key.objectid);
+   if (ret && ret != -ENOENT) {
+ 

[PATCH v3 1/3] btrfs: move may_destroy_subvol() from ioctl.c to inode.c

2018-03-30 Thread Misono Tomohiro
This is a preparation work to allow rmdir(2) to delete a subvolume.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ctree.h |  1 +
 fs/btrfs/inode.c | 54 ++
 fs/btrfs/ioctl.c | 54 --
 3 files changed, 55 insertions(+), 54 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da308774b8a4..2dbead106aab 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3166,6 +3166,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
+noinline int may_destroy_subvol(struct btrfs_root *root);
 int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f53470112670..db66fa4fede6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4333,6 +4333,60 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
return ret;
 }
 
+/*
+ * helper to check if the subvolume references other subvolumes
+ */
+noinline int may_destroy_subvol(struct btrfs_root *root)
+{
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_path *path;
+   struct btrfs_dir_item *di;
+   struct btrfs_key key;
+   u64 dir_id;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   /* Make sure this root isn't set as the default subvol */
+   dir_id = btrfs_super_root_dir(fs_info->super_copy);
+   di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path,
+  dir_id, "default", 7, 0);
+   if (di && !IS_ERR(di)) {
+   btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
+   if (key.objectid == root->root_key.objectid) {
+   ret = -EPERM;
+   btrfs_err(fs_info,
+ "deleting default subvolume %llu is not 
allowed",
+ key.objectid);
+   goto out;
+   }
+   btrfs_release_path(path);
+   }
+
+   key.objectid = root->root_key.objectid;
+   key.type = BTRFS_ROOT_REF_KEY;
+   key.offset = (u64)-1;
+
+   ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0);
+   if (ret < 0)
+   goto out;
+   BUG_ON(ret == 0);
+
+   ret = 0;
+   if (path->slots[0] > 0) {
+   path->slots[0]--;
+   btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
+   if (key.objectid == root->root_key.objectid &&
+   key.type == BTRFS_ROOT_REF_KEY)
+   ret = -ENOTEMPTY;
+   }
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
 static int btrfs_rmdir(struct inode *dir, struct dentry *dentry)
 {
struct inode *inode = d_inode(dentry);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 111ee282b777..11af9eb449ef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1843,60 +1843,6 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
return ret;
 }
 
-/*
- * helper to check if the subvolume references other subvolumes
- */
-static noinline int may_destroy_subvol(struct btrfs_root *root)
-{
-   struct btrfs_fs_info *fs_info = root->fs_info;
-   struct btrfs_path *path;
-   struct btrfs_dir_item *di;
-   struct btrfs_key key;
-   u64 dir_id;
-   int ret;
-
-   path = btrfs_alloc_path();
-   if (!path)
-   return -ENOMEM;
-
-   /* Make sure this root isn't set as the default subvol */
-   dir_id = btrfs_super_root_dir(fs_info->super_copy);
-   di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path,
-  dir_id, "default", 7, 0);
-   if (di && !IS_ERR(di)) {
-   btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
-   if (key.objectid == root->root_key.objectid) {
-   ret = -EPERM;
-   btrfs_err(fs_info,
- "deleting default subvolume %llu is not 
allowed",
- key.objectid);
-   goto out;
-   }
-   btrfs_release_path(path);
-   }
-
-   key.objectid = root->root_key.objectid;
-   key.type = BTRFS_ROOT_REF_KEY;
-   key.offset = (u64)-1;
-
-   ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0);
-   if (ret < 0)
-   goto out;
-   BUG_ON(ret == 0);
-
-   ret = 0;
-   if (path->slots[0] > 0) {
-   path->slots[0]--;
-   btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
-   if 

[PATCH v3 0/3] Allow rmdir(2) to delete a subvolume

2018-03-30 Thread Misono Tomohiro
changelog:
  v2 -> v3 ... use if-else block instead of two if blocks and 
   add Tested-by tag in 2nd patch
  v1 -> v2 ... split the patch to hopefully make review easier

Note: I will send a xfstest if this series is merged.

1st patch is a preparation work just moving the declaration of
may_destroy_subvol().

2nd patch is the main part. New function btrfs_delete_subvolume() is
introduced and used in btrfs_rmdir() when a direcoty is an empty
subvolume. The function is almost the copy of second half of
btrfs_ioctl_snap_destroy().
The code path for "sub delete" is not changed yet.

3rd patch is a cleanup of btrfs_ioctl_snap_destroy() and uses 
brrfs_delete_subvolume() for "sub delete" too.

Tomohiro Misono (3):
  btrfs: move may_destroy_subvol() from ioctl.c to inode.c
  btrfs: Allow rmdir(2) to delete a subvolume
  btrfs: cleanup btrfs_ioctl_snap_destroy() by using
btrfs_delete_subvolume()

 fs/btrfs/ctree.h |   5 +-
 fs/btrfs/inode.c | 197 ++-
 fs/btrfs/ioctl.c | 185 +--
 3 files changed, 198 insertions(+), 189 deletions(-)

-- 
2.14.3


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html