Re: [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim metadata space

2016-10-06 Thread Wang Xiaoguang

Hi,

Any comments about these 3 patches, thanks.

Regards,
Xiaoguang Wang

On 09/22/2016 05:25 PM, Wang Xiaoguang wrote:

Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all
ordered extents, but I run into some enospc errors when doing large file
create and delete test, it's because shrink_delalloc() does not write
enough delalloc bytes and wait them finished:
 From: Miao Xie 
 Date: Mon, 4 Nov 2013 23:13:25 +0800
 Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered 
extents

 It is very likely that there are lots of ordered extents in the filesytem,
 if we wait for the completion of all of them when we want to reclaim some
 space for the metadata space reservation, we would be blocked for a long
 time. The performance would drop down suddenly for a long time.

But since Josef introduced "Btrfs: introduce ticketed enospc infrastructure",
shrink_delalloc() starts to be run asynchronously, then If we want to reclaim
metadata space, we can try harder, after all, false enospc error is not
acceptable.

Signed-off-by: Wang Xiaoguang 
---
  fs/btrfs/extent-tree.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 46c2a37..f7c420b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4721,7 +4721,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 
to_reclaim, u64 orig,
if (trans)
return;
if (wait_ordered)
-   btrfs_wait_ordered_roots(root->fs_info, items,
+   btrfs_wait_ordered_roots(root->fs_info, -1,
 0, (u64)-1);
return;
}
@@ -4775,6 +4775,14 @@ skip_async:
}
delalloc_bytes = percpu_counter_sum_positive(
&root->fs_info->delalloc_bytes);
+   if (loops == 2) {
+   /*
+* Try to write all current delalloc bytes and wait all
+* ordered extents to have a last try.
+*/
+   to_reclaim = delalloc_bytes;
+   items = -1;
+   }
}
  }
  




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix enospc in punch hole

2016-10-06 Thread robbieko
From: Robbie Ko 

when extent-tree level > BTRFS_MAX_LEVEL / 2,
__btrfs_drop_extents -> btrfs_duplicate_item ->
setup_leaf_for_split -> split_leaf
maybe enospc, because min_size is too small,
need use btrfs_calc_trans_metadata_size.

Signed-off-by: Robbie Ko 
---
 fs/btrfs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fea31a4..809ca85 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2322,7 +2322,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
u64 tail_len;
u64 orig_start = offset;
u64 cur_offset;
-   u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
+   u64 min_size = btrfs_calc_trans_metadata_size(root, 1);
u64 drop_end;
int ret = 0;
int err = 0;
@@ -2469,7 +2469,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t 
offset, loff_t len)
ret = -ENOMEM;
goto out_free;
}
-   rsv->size = btrfs_calc_trunc_metadata_size(root, 1);
+   rsv->size = btrfs_calc_trans_metadata_size(root, 1);
rsv->failfast = 1;
 
/*
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: image: fix compiler warning

2016-10-06 Thread Tsutomu Itoh
On 2016/10/07 1:00, David Sterba wrote:
> On Wed, Oct 05, 2016 at 05:07:48PM +0900, Tsutomu Itoh wrote:
>> In v4.8-rc1, gcc 5.3.1 gives following warning. Fixed it.
>>
>> [CC] btrfs-image.o
>> btrfs-image.c: In function 'flush_pending':
>> btrfs-image.c:708:17: warning: 'start' may be used uninitialized in this 
>> function [-Wmaybe-uninitialized]
>>   header->bytenr = cpu_to_le64(start);
>>  ^
>> btrfs-image.c:927:6: note: 'start' was declared here
>>   u64 start;
>>   ^
> 
> So the patch makes the compiler warning go away, but is the code
> correct? AFAICS, the warning points to the case where flush_pending is
> called with done=1 (from create_metadump) and there's zero
> md->pending_size . Are you sure this is an expected state and that the
> function can proceed with state = 0 ?
> 

I think that this is a case where some errors occurred before calling
flush_pending.
Therefore, create_metadump returns the error to the caller. (creating
the image fails, I think.)

Thanks,
Tsutomu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: relocation: Fix NULL pointer access and leaking qgroups in btrfs_recover_relocation

2016-10-06 Thread Qu Wenruo



At 10/07/2016 02:00 AM, Goldwyn Rodrigues wrote:

Hi Qu,

On 10/06/2016 03:03 AM, Qu Wenruo wrote:

We fixed balance qgroup leaking by introducing
qgroup_fix_relocated_data_extents(), but it only covers normal balance
routine well.

For case btrfs_recover_relocation() routine, since rc->stage or
rc->data_inode are not initialized, we either skip
qgroup_fix_relocated_data_extents() or just cause NULL pointer access.

Since we skip qgroup_fix_relocated_data_extents() in
btrfs_recover_relocation(), we will still leak qgroup numbers for that
routine.

In the fix, we won't use data_inode any longer, as at the timing of the
qgroup fix, noone will modify data_reloc tree any longer, so path
locking should be good enough.

And add check against rc->merge_reloc_tree, so we can detect if we are
in btrfs_recover_relocation() routine and continue qgroup fix.


While testing this patch, I realized the original problem of qgroup
values being incorrect after a balance, has returned... or probably was
not solved completely. I tried it with the shell script sent sometime
back. The script fails when I checkout fix df2c95f33e0a2 as well.

I remember it did not appear when I tested it last, so it is possible
that some other change has affected this.

Would recommend to hold off this until the balance issue is not
completely fixed.

For reference, the test script is:

#!/bin/bash -x

MNT="/mnt"
DEV="/dev/vdb"

mkfs.btrfs -f $DEV
mount -t btrfs $DEV $MNT

mkdir $MNT/snaps
echo "populate $MNT with some data"
#cp -a /usr/share/fonts $MNT/
cp -a /usr/ $MNT/ &
for i in `seq -w 0 8`; do
S="$MNT/snaps/snap$i"
echo "create and populate $S"
btrfs su snap $MNT $S;
cp -a /boot $S;
done;

#let the cp from above finish
wait

btrfs fi sync $MNT

btrfs quota enable $MNT
btrfs quota rescan -w $MNT
btrfs qg show $MNT

umount $MNT

mount -t btrfs $DEV $MNT


time btrfs balance start --full-balance $MNT

umount $MNT

btrfsck $DEV




Thanks for the report.

Yes, the script can reproduce the problem, almost 100% possibility.

But it also takes a long time to run.(Partly because of the slow 
find_parent_nodes function call).


I tried to simplify it by using inline extent to bumping the tree level 
and use small file extents to reduce IO.

But no good reproducer yet.

Did you remember which kernel version you tested the original fix?
Maybe it could provide some clue.

Thanks,
Qu





Reported-by: Goldwyn Rodrigues 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/relocation.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c0c13dc..f250187 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3946,8 +3946,7 @@ static int qgroup_fix_relocated_data_extents(struct 
btrfs_trans_handle *trans,
 struct reloc_control *rc)
 {
struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
-   struct inode *inode = rc->data_inode;
-   struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root;
+   struct btrfs_root *data_reloc_root;
struct btrfs_path *path;
struct btrfs_key key;
int ret = 0;
@@ -3956,18 +3955,28 @@ static int qgroup_fix_relocated_data_extents(struct 
btrfs_trans_handle *trans,
return 0;

/*
+* For normal balance routine, merge_reloc_tree flag is always cleared
+* before commit trans.
+* For recover relocation routine, merge_reloc_tree is always 1, we need
+* to continue anyway.
+*
 * Only for stage where we update data pointers the qgroup fix is
 * valid.
 * For MOVING_DATA stage, we will miss the timing of swapping tree
 * blocks, and won't fix it.
 */
-   if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found))
+   if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found) &&
+   rc->merge_reloc_tree == 0)
return 0;

+   data_reloc_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID);
+   if (IS_ERR(data_reloc_root))
+   return PTR_ERR(data_reloc_root);
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-   key.objectid = btrfs_ino(inode);
+   key.objectid = BTRFS_FIRST_FREE_OBJECTID + 1;
key.type = BTRFS_EXTENT_DATA_KEY;
key.offset = 0;

@@ -3975,13 +3984,10 @@ static int qgroup_fix_relocated_data_extents(struct 
btrfs_trans_handle *trans,
if (ret < 0)
goto out;

-   lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1);
while (1) {
struct btrfs_file_extent_item *fi;

btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-   if (key.objectid > btrfs_ino(inode))
-   break;
if (key.type != BTRFS_EXTENT_DATA_KEY)
goto next;
fi = btrfs_item_ptr(path->no

[PATCH] Btrfs: fix leak subvol subv_writers conter

2016-10-06 Thread robbieko
From: Robbie Ko 

In run_delalloc_nocow, maybe not release subv_writers conter,
will lead to create snapshot hang.

Signed-off-by: Robbie Ko 
---
 fs/btrfs/inode.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6811c4..9722554 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1386,11 +1386,17 @@ next_slot:
 * this ensure that csum for a given extent are
 * either valid or do not exist.
 */
-   if (csum_exist_in_range(root, disk_bytenr, num_bytes))
+   if (csum_exist_in_range(root, disk_bytenr, num_bytes)) {
+   if (!nolock)
+   btrfs_end_write_no_snapshoting(root);
goto out_check;
+   }
if (!btrfs_inc_nocow_writers(root->fs_info,
-disk_bytenr))
+disk_bytenr)) {
+   if (!nolock)
+   btrfs_end_write_no_snapshoting(root);
goto out_check;
+   }
nocow = 1;
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
extent_end = found_key.offset +
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is is possible to submit binary image as fstest test case?

2016-10-06 Thread Theodore Ts'o
On Thu, Oct 06, 2016 at 08:29:28AM -0400, Brian Foster wrote:
> Doesn't necessarily bother me one way or the other, but something we've
> done with XFS in such situations is introduce a DEBUG mode only sysfs
> tunable that delays certain infrastructure (log recovery in our case) to
> coordinate with test cases that try to reproduce such timing/racing
> problems.

The other approach the btrfs folks might consider is to have a
sufficiently powerful "debugfs" or "xfs_db" program that can generate
test images with the desired property.

I've found that the time I've invested in creating debugfs has repaid
itself a hundred times over, especially when maintaining a regression
test suite for e2fsck.

Cheers,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Making statfs return qgroup info (for container use case etc.)

2016-10-06 Thread Dave Chinner
On Thu, Oct 06, 2016 at 02:51:58PM +0100, Tim Small wrote:
> Hello,
> 
> I use btrfs for containers (e.g. full OS style containers using lxc/lxd
> etc. rather than application containers).  btrfs de-duplication and
> send/receive are very useful features for this use-case.
> 
> Each container runs in its own subvolume, and I use quotas to stop one
> container exhausting the disk space of the others.
> 
> df shows the total space + free space for the entire filesystem - which
> is confusing for users within the containers.  Worse - they don't have
> any way of knowing how much quota they actually have left (other than du
> -xs / vs. whatever I've told them).
> 
> It'd be really nice if running e.g. statfs("/home", ...) within a
> container could be made to return the qgroup usage + limit for the path
> (e.g. by passing in an option at mount time - such as qgroup level
> maybe?) , instead of the global filesystem data in f_bfree f_blocks etc.

XFS does this with directory tree quotas. It was implmented 10 years
ago or so, IIRC...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: relocation: Fix NULL pointer access and leaking qgroups in btrfs_recover_relocation

2016-10-06 Thread Goldwyn Rodrigues
Hi Qu,

On 10/06/2016 03:03 AM, Qu Wenruo wrote:
> We fixed balance qgroup leaking by introducing
> qgroup_fix_relocated_data_extents(), but it only covers normal balance
> routine well.
> 
> For case btrfs_recover_relocation() routine, since rc->stage or
> rc->data_inode are not initialized, we either skip
> qgroup_fix_relocated_data_extents() or just cause NULL pointer access.
> 
> Since we skip qgroup_fix_relocated_data_extents() in
> btrfs_recover_relocation(), we will still leak qgroup numbers for that
> routine.
> 
> In the fix, we won't use data_inode any longer, as at the timing of the
> qgroup fix, noone will modify data_reloc tree any longer, so path
> locking should be good enough.
> 
> And add check against rc->merge_reloc_tree, so we can detect if we are
> in btrfs_recover_relocation() routine and continue qgroup fix.

While testing this patch, I realized the original problem of qgroup
values being incorrect after a balance, has returned... or probably was
not solved completely. I tried it with the shell script sent sometime
back. The script fails when I checkout fix df2c95f33e0a2 as well.

I remember it did not appear when I tested it last, so it is possible
that some other change has affected this.

Would recommend to hold off this until the balance issue is not
completely fixed.

For reference, the test script is:

#!/bin/bash -x

MNT="/mnt"
DEV="/dev/vdb"

mkfs.btrfs -f $DEV
mount -t btrfs $DEV $MNT

mkdir $MNT/snaps
echo "populate $MNT with some data"
#cp -a /usr/share/fonts $MNT/
cp -a /usr/ $MNT/ &
for i in `seq -w 0 8`; do
S="$MNT/snaps/snap$i"
echo "create and populate $S"
btrfs su snap $MNT $S;
cp -a /boot $S;
done;

#let the cp from above finish
wait

btrfs fi sync $MNT

btrfs quota enable $MNT
btrfs quota rescan -w $MNT
btrfs qg show $MNT

umount $MNT

mount -t btrfs $DEV $MNT


time btrfs balance start --full-balance $MNT

umount $MNT

btrfsck $DEV



> 
> Reported-by: Goldwyn Rodrigues 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/relocation.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index c0c13dc..f250187 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -3946,8 +3946,7 @@ static int qgroup_fix_relocated_data_extents(struct 
> btrfs_trans_handle *trans,
>struct reloc_control *rc)
>  {
>   struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
> - struct inode *inode = rc->data_inode;
> - struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root;
> + struct btrfs_root *data_reloc_root;
>   struct btrfs_path *path;
>   struct btrfs_key key;
>   int ret = 0;
> @@ -3956,18 +3955,28 @@ static int qgroup_fix_relocated_data_extents(struct 
> btrfs_trans_handle *trans,
>   return 0;
>  
>   /*
> +  * For normal balance routine, merge_reloc_tree flag is always cleared
> +  * before commit trans.
> +  * For recover relocation routine, merge_reloc_tree is always 1, we need
> +  * to continue anyway.
> +  *
>* Only for stage where we update data pointers the qgroup fix is
>* valid.
>* For MOVING_DATA stage, we will miss the timing of swapping tree
>* blocks, and won't fix it.
>*/
> - if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found))
> + if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found) &&
> + rc->merge_reloc_tree == 0)
>   return 0;
>  
> + data_reloc_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID);
> + if (IS_ERR(data_reloc_root))
> + return PTR_ERR(data_reloc_root);
> +
>   path = btrfs_alloc_path();
>   if (!path)
>   return -ENOMEM;
> - key.objectid = btrfs_ino(inode);
> + key.objectid = BTRFS_FIRST_FREE_OBJECTID + 1;
>   key.type = BTRFS_EXTENT_DATA_KEY;
>   key.offset = 0;
>  
> @@ -3975,13 +3984,10 @@ static int qgroup_fix_relocated_data_extents(struct 
> btrfs_trans_handle *trans,
>   if (ret < 0)
>   goto out;
>  
> - lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1);
>   while (1) {
>   struct btrfs_file_extent_item *fi;
>  
>   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> - if (key.objectid > btrfs_ino(inode))
> - break;
>   if (key.type != BTRFS_EXTENT_DATA_KEY)
>   goto next;
>   fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
> @@ -4004,7 +4010,6 @@ next:
>   break;
>   }
>   }
> - unlock_extent(&BTRFS_I(inode)->io_tree, 0 , (u64)-1);
>  out:
>   btrfs_free_path(path);
>   return ret;
> 

-- 
Goldwyn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.or

Re: Problem with btrfs_create() getting EBUSY; cause is stale highest_objectid at mount - kernel.org bug 173001

2016-10-06 Thread Dave Olson
Dave Olson  wrote:

> Dave Olson  wrote:
> 
> > The full details are in
> > https://bugzilla.kernel.org/show_bug.cgi?id=173001
> > 
> > Basicly, logrotate has rotated a file to a new name, tries to open a new
> > file with the original name, and gets EBUSY.  The file is not created.

The cause is that the highest_objectid field in the btrfs_root structure
is set incorrectly (an old stale value) at filesystem mount in some
cases.   See the writeup in the bug for more details.

I'm guessing that some of the journal information is not flushed to disk
after btrfs_create(), even when the filesystem is mounted with
flushoncommit, so if a powercycle is done within a few seconds of the
file being created, there is a good chance that this problem will occur
on a subsequent reboot.

btrfs check (up through 4.7.3) does not seem to detect this stale
condition.

Dave Olson
ol...@cumulusnetworks.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: image: fix compiler warning

2016-10-06 Thread David Sterba
On Wed, Oct 05, 2016 at 05:07:48PM +0900, Tsutomu Itoh wrote:
> In v4.8-rc1, gcc 5.3.1 gives following warning. Fixed it.
> 
> [CC] btrfs-image.o
> btrfs-image.c: In function 'flush_pending':
> btrfs-image.c:708:17: warning: 'start' may be used uninitialized in this 
> function [-Wmaybe-uninitialized]
>   header->bytenr = cpu_to_le64(start);
>  ^
> btrfs-image.c:927:6: note: 'start' was declared here
>   u64 start;
>   ^

So the patch makes the compiler warning go away, but is the code
correct? AFAICS, the warning points to the case where flush_pending is
called with done=1 (from create_metadump) and there's zero
md->pending_size . Are you sure this is an expected state and that the
function can proceed with state = 0 ?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: subvol_uuid_search: Return error code on memory allocation failure

2016-10-06 Thread David Sterba
On Wed, Oct 05, 2016 at 06:33:12PM +0530, Prasanth K S R wrote:
> From: Prasanth K S R 
> 
> This commit fixes coverity defect CID 1328695.
> 
> Signed-off-by: Prasanth K S R 
> ---
>  send-utils.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/send-utils.c b/send-utils.c
> index a85fa08..6f80b6f 100644
> --- a/send-utils.c
> +++ b/send-utils.c
> @@ -486,6 +486,11 @@ struct subvol_info *subvol_uuid_search(struct 
> subvol_uuid_search *s,
>   info->path = strdup(path);
>   } else {
>   info->path = malloc(PATH_MAX);
> + if (!info->path) {
> + fprintf(stderr, "Memory allocation failed\n");

Please use the error() helper (does not need \n terminated string).

Also, would be grat to convert subvol_uuid_search to return ERR_PTR
style value (which means to conver all callers). Now we don't see why
exactly it fails and interpret it as nonexistence of the uuid.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix warning_trace compile error if backtrace is disabled

2016-10-06 Thread David Sterba
On Thu, Oct 06, 2016 at 04:47:19PM +0800, Qu Wenruo wrote:
> If we disable backtrace, btrfs-progs can't be compiled since we don't
> have warning_trace defined.
> 
> Fix by move it out of #ifndef BTRFS_DISABLE_BACKTRACE block.

But now this breaks with backtrace, eg:

kerncompat.h: In function ‘warning_trace’:
kerncompat.h:91:3: warning: implicit declaration of function ‘print_trace’ 
[-Wimplicit-function-declaration]
   print_trace();
   ^~~
kerncompat.h: At top level:
kerncompat.h:97:20: warning: conflicting types for ‘print_trace’
 static inline void print_trace(void)
^~~
kerncompat.h:97:20: error: static declaration of ‘print_trace’ follows 
non-static declaration
kerncompat.h:91:3: note: previous implicit declaration of ‘print_trace’ was here
   print_trace();

I've fixed it and will add a script to cycle through common build
configuration so we can automate that a bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Making statfs return qgroup info (for container use case etc.)

2016-10-06 Thread Tim Small
Hello,

I use btrfs for containers (e.g. full OS style containers using lxc/lxd
etc. rather than application containers).  btrfs de-duplication and
send/receive are very useful features for this use-case.

Each container runs in its own subvolume, and I use quotas to stop one
container exhausting the disk space of the others.

df shows the total space + free space for the entire filesystem - which
is confusing for users within the containers.  Worse - they don't have
any way of knowing how much quota they actually have left (other than du
-xs / vs. whatever I've told them).

It'd be really nice if running e.g. statfs("/home", ...) within a
container could be made to return the qgroup usage + limit for the path
(e.g. by passing in an option at mount time - such as qgroup level
maybe?) , instead of the global filesystem data in f_bfree f_blocks etc.

A useful tweak might be to only return the qgroup info within non-root
user namespaces, or perhaps just if the root UID != 0 (since the normal
use case for this is unprivileged containers), but I've no idea how
viable that idea is.

Yet another idea - (since btrfs always returns 0 for f_files - used for
inodes by other FS), then perhaps having the option of returning quota
information there would be useful (or maybe return quota referenced size
info in f_blocks and f_bfree, and exclusive size info in f_files and
f_free)?

Googling shows that this question has come up briefly on the #btrfs IRC
channel in the past.

Does sound like something that could be added to btrfs?

What I need to do currently only works with zfs, but I don't really want
to go there if I can avoid it...

ref:  search for btrfs in this web page:

https://www.stgraber.org/2016/03/26/lxd-2-0-resource-control-412/

... the shown working example is on zfs...

Tim.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] btrfs-progs: tests: make the ioctl-test actually useful

2016-10-06 Thread David Sterba
Enhance the test to verify ioctl uniqueness, compare the defined values
against the hardcoded expected values, and take care of the 32bit/64bit
compatibility workarounds.

Signed-off-by: David Sterba 
---
 Makefile.in  |  27 +-
 ioctl-test.c | 266 +--
 2 files changed, 263 insertions(+), 30 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 983b8b9a6193..48ff54ca60f2 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -387,9 +387,30 @@ quick-test: $(objects) $(libs) quick-test.o
@echo "[LD] $@"
$(Q)$(CC) $(CFLAGS) -o quick-test $(objects) $(libs) quick-test.o 
$(LDFLAGS) $(LIBS)
 
-ioctl-test: $(objects) $(libs) ioctl-test.o
-   @echo "[LD] $@"
-   $(Q)$(CC) $(CFLAGS) -o ioctl-test $(objects) $(libs) ioctl-test.o 
$(LDFLAGS) $(LIBS)
+ioctl-test-64.o: ioctl-test.c ioctl.h kerncompat.h ctree.h
+   @echo "[CC64]   $@"
+   $(Q)$(CC) $(CFLAGS) -m64 -c $< -o $@
+
+ioctl-test-32.o: ioctl-test.c ioctl.h kerncompat.h ctree.h
+   @echo "[CC32]   $@"
+   $(Q)$(CC) $(CFLAGS) -m32 -c $< -o $@
+
+ioctl-test-32: ioctl-test-32.o
+   @echo "[LD32]   $@"
+   $(Q)$(CC) $(CFLAGS) -m32 -o $@ $< $(LDFLAGS)
+   @echo "   ?[PAHOLE] $@"
+   -$(Q)pahole $@ > $@.pahole
+
+ioctl-test-64: ioctl-test-64.o
+   @echo "[LD64]   $@"
+   $(Q)$(CC) $(CFLAGS) -m64 -o $@ $< $(LDFLAGS)
+   @echo "   ?[PAHOLE] $@"
+   -$(Q)pahole $@ > $@.pahole
+
+test-ioctl: ioctl-test-32 ioctl-test-64
+   @echo "[TEST/ioctl]"
+   @$(Q)./ioctl-test-32 > ioctl-test-32.log
+   @$(Q)./ioctl-test-64 > ioctl-test-64.log
 
 send-test: $(objects) $(libs) send-test.o
@echo "[LD] $@"
diff --git a/ioctl-test.c b/ioctl-test.c
index 54fc013584e3..26b3c68df606 100644
--- a/ioctl-test.c
+++ b/ioctl-test.c
@@ -1,37 +1,249 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "kerncompat.h"
 #include 
 #include 
-#include "kerncompat.h"
+
 #include "ioctl.h"
+#include "ctree.h"
+
+#define LIST_32_COMPAT \
+   ONE(BTRFS_IOC_SET_RECEIVED_SUBVOL_32)
+
+#define LIST_64_COMPAT \
+   ONE(BTRFS_IOC_SEND_64)
+
+#define LIST_BASE  \
+   ONE(BTRFS_IOC_SNAP_CREATE)  \
+   ONE(BTRFS_IOC_DEFRAG)   \
+   ONE(BTRFS_IOC_RESIZE)   \
+   ONE(BTRFS_IOC_SCAN_DEV) \
+   ONE(BTRFS_IOC_TRANS_START)  \
+   ONE(BTRFS_IOC_TRANS_END)\
+   ONE(BTRFS_IOC_SYNC) \
+   ONE(BTRFS_IOC_CLONE)\
+   ONE(BTRFS_IOC_ADD_DEV)  \
+   ONE(BTRFS_IOC_RM_DEV)   \
+   ONE(BTRFS_IOC_BALANCE)  \
+   ONE(BTRFS_IOC_CLONE_RANGE)  \
+   ONE(BTRFS_IOC_SUBVOL_CREATE)\
+   ONE(BTRFS_IOC_SNAP_DESTROY) \
+   ONE(BTRFS_IOC_DEFRAG_RANGE) \
+   ONE(BTRFS_IOC_TREE_SEARCH)  \
+   ONE(BTRFS_IOC_TREE_SEARCH_V2)   \
+   ONE(BTRFS_IOC_INO_LOOKUP)   \
+   ONE(BTRFS_IOC_DEFAULT_SUBVOL)   \
+   ONE(BTRFS_IOC_SPACE_INFO)   \
+   ONE(BTRFS_IOC_START_SYNC)   \
+   ONE(BTRFS_IOC_WAIT_SYNC)\
+   ONE(BTRFS_IOC_SNAP_CREATE_V2)   \
+   ONE(BTRFS_IOC_SUBVOL_CREATE_V2) \
+   ONE(BTRFS_IOC_SUBVOL_GETFLAGS)  \
+   ONE(BTRFS_IOC_SUBVOL_SETFLAGS)  \
+   ONE(BTRFS_IOC_SCRUB)\
+   ONE(BTRFS_IOC_SCRUB_CANCEL) \
+   ONE(BTRFS_IOC_SCRUB_PROGRESS)   \
+   ONE(BTRFS_IOC_DEV_INFO) \
+   ONE(BTRFS_IOC_FS_INFO)  \
+   ONE(BTRFS_IOC_BALANCE_V2)   \
+   ONE(BTRFS_IOC_BALANCE_CTL)  \
+   ONE(BTRFS_IOC_BALANCE_PROGRESS) \
+   ONE(BTRFS_IOC_INO_PATHS)\
+   ONE(BTRFS_IOC_LOGICAL_INO)  \
+   ONE(BTRFS_IOC_SET_RECEIVED_SUBVOL)  \
+   ONE(BTRFS_IOC_SEND) \
+   ONE(BTRFS_IOC_DEVICES_READY)\
+   ONE(BTRFS_IOC_QUOTA_CTL)\
+   ONE(BTRFS_IOC_QGROUP_ASSIGN)\
+   ONE(BTRFS_IOC_QGROUP_CREATE)\

[PATCH 3/4] btrfs-progs: ioctl: add 64bit compat for SEND

2016-10-06 Thread David Sterba
The ioctl value of SEND will be different on 32bit userspace and 64bit
kernel due to different pointer type width, that unfortunatelly made it
into the structure definition.

To maintain backward compatibility, we must do it in the 64bit->32bit
way, because we don't have the kernel side workardound like
SET_RECEIVED_SUBVOL has.  Changing value of SEND would then break
existing users of the raw ioctl.

The compatibility structure and ioctl should not be used, exists for
documentation, and testing.

Signed-off-by: David Sterba 
---
 ioctl.h | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 6baa51ab1724..f7233da90637 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -635,8 +635,8 @@ struct btrfs_ioctl_send_args {
__u64 reserved[4];  /* in */
 } __attribute__((packed));
 /*
- * Size of structure depends on pointer width, was not caught.  Kernel handles
- * pointer width differences transparently
+ * Size of structure depends on pointer width, was not caught in the early
+ * days.  Kernel handles pointer width differences transparently.
  */
 BUILD_ASSERT(sizeof(__u64 *) == 8
 ? sizeof(struct btrfs_ioctl_send_args) == 72
@@ -644,6 +644,28 @@ BUILD_ASSERT(sizeof(__u64 *) == 8
? sizeof(struct btrfs_ioctl_send_args) == 68
: 0));
 
+/*
+ * Different pointer width leads to structure size change. Kernel should accept
+ * both ioctl values (derived from the structures) for backward compatibility.
+ * Size of this structure is same on 32bit and 64bit though.
+ *
+ * NOTE: do not use in your code, this is for testing only
+ */
+struct btrfs_ioctl_send_args_64 {
+   __s64 send_fd;  /* in */
+   __u64 clone_sources_count;  /* in */
+   union {
+   __u64 __user *clone_sources;/* in */
+   __u64 __clone_sources_alignment;
+   };
+   __u64 parent_root;  /* in */
+   __u64 flags;/* in */
+   __u64 reserved[4];  /* in */
+} __attribute__((packed));
+BUILD_ASSERT(sizeof(struct btrfs_ioctl_send_args_64) == 72);
+
+#define BTRFS_IOC_SEND_64_COMPAT_DEFINED 1
+
 /* Error codes as returned by the kernel */
 enum btrfs_err_code {
notused,
@@ -761,6 +783,11 @@ static inline char *btrfs_err_str(enum btrfs_err_code 
err_code)
struct btrfs_ioctl_received_subvol_args_32)
 #endif
 
+#ifdef BTRFS_IOC_SEND_64_COMPAT_DEFINED
+#define BTRFS_IOC_SEND_64 _IOW(BTRFS_IOCTL_MAGIC, 38, \
+   struct btrfs_ioctl_send_args_64)
+#endif
+
 #define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct 
btrfs_ioctl_send_args)
 #define BTRFS_IOC_DEVICES_READY _IOR(BTRFS_IOCTL_MAGIC, 39, \
 struct btrfs_ioctl_vol_args)
-- 
2.10.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] btrfs-progs: ioctl: pack structures

2016-10-06 Thread David Sterba
Add the packed attribute to several structures so we avoid any
accidental size changes due to compiler-dependent padding.

Expected alignment is 8 bytes, padded where needed.

Signed-off-by: David Sterba 
---
 ioctl.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index a7235c00c1a0..d0c06657f0c0 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -149,7 +149,7 @@ struct btrfs_ioctl_scrub_args {
struct btrfs_scrub_progress progress;   /* out */
/* pad to 1k */
__u64 unused[(1024-32-sizeof(struct btrfs_scrub_progress))/8];
-};
+} __attribute__((packed));
 BUILD_ASSERT(sizeof(struct btrfs_ioctl_scrub_args) == 1024);
 
 #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS   0
@@ -160,7 +160,7 @@ struct btrfs_ioctl_dev_replace_start_params {
 * above */
__u8 srcdev_name[BTRFS_DEVICE_PATH_NAME_MAX + 1];   /* in */
__u8 tgtdev_name[BTRFS_DEVICE_PATH_NAME_MAX + 1];   /* in */
-};
+} __attribute__((packed, aligned (8)));
 BUILD_ASSERT(sizeof(struct btrfs_ioctl_dev_replace_start_params) == 2072);
 
 #define BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED0
@@ -175,7 +175,7 @@ struct btrfs_ioctl_dev_replace_status_params {
__u64 time_stopped; /* out, seconds since 1-Jan-1970 */
__u64 num_write_errors; /* out */
__u64 num_uncorrectable_read_errors;/* out */
-};
+} __attribute__((packed));
 BUILD_ASSERT(sizeof(struct btrfs_ioctl_dev_replace_status_params) == 48);
 
 #define BTRFS_IOCTL_DEV_REPLACE_CMD_START  0
@@ -556,7 +556,7 @@ BUILD_ASSERT(sizeof(struct btrfs_ioctl_qgroup_create_args) 
== 16);
 struct btrfs_ioctl_timespec {
__u64 sec;
__u32 nsec;
-};
+} __attribute__((packed, aligned (8)));
 
 struct btrfs_ioctl_received_subvol_args {
charuuid[BTRFS_UUID_SIZE];  /* in */
@@ -601,7 +601,7 @@ struct btrfs_ioctl_send_args {
__u64 parent_root;  /* in */
__u64 flags;/* in */
__u64 reserved[4];  /* in */
-};
+} __attribute__((packed));
 /*
  * Size of structure depends on pointer width, was not caught.  Kernel handles
  * pointer width differences transparently
-- 
2.10.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] btrfs-progs: ioctl: add 32bit compat for SET_RECEIVED_SUBVOL

2016-10-06 Thread David Sterba
The ioctl value of SET_RECEIVED_SUBVOL will be different on 32bit
userspace and 64bit kernel. This is transparently handled on the kernel
side. We now define the same structure so we can verify the ioctl value.

The defined structure and ioctl should not be used. It is exists only
for testing purposes, documenting the mess and as a warning for the
future.

Signed-off-by: David Sterba 
---
 ioctl.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/ioctl.h b/ioctl.h
index d0c06657f0c0..6baa51ab1724 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -570,6 +570,38 @@ struct btrfs_ioctl_received_subvol_args {
 BUILD_ASSERT(sizeof(struct btrfs_ioctl_received_subvol_args) == 200);
 
 /*
+ * If we have a 32-bit userspace and 64-bit kernel, then the UAPI
+ * structures are incorrect, as the timespec structure from userspace
+ * is 4 bytes too small. We define these alternatives here for backward
+ * compatibility, the kernel understands both values.
+ */
+
+/*
+ * Structure size is different on 32bit and 64bit, has some padding if the
+ * structure is embedded. Packing makes sure the size is same on both, but will
+ * be misaligned on 64bit.
+ *
+ * NOTE: do not use in your code, this is for testing only
+ */
+struct btrfs_ioctl_timespec_32 {
+   __u64 sec;
+   __u32 nsec;
+} __attribute__ ((__packed__));
+
+struct btrfs_ioctl_received_subvol_args_32 {
+   charuuid[BTRFS_UUID_SIZE];  /* in */
+   __u64   stransid;   /* in */
+   __u64   rtransid;   /* out */
+   struct btrfs_ioctl_timespec_32 stime; /* in */
+   struct btrfs_ioctl_timespec_32 rtime; /* out */
+   __u64   flags;  /* in */
+   __u64   reserved[16];   /* in */
+} __attribute__ ((__packed__));
+BUILD_ASSERT(sizeof(struct btrfs_ioctl_received_subvol_args_32) == 192);
+
+#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32_COMPAT_DEFINED 1
+
+/*
  * Caller doesn't want file data in the send stream, even if the
  * search of clone sources doesn't find an extent. UPDATE_EXTENT
  * commands will be sent instead of WRITE commands.
@@ -723,6 +755,12 @@ static inline char *btrfs_err_str(enum btrfs_err_code 
err_code)
struct btrfs_ioctl_logical_ino_args)
 #define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \
struct btrfs_ioctl_received_subvol_args)
+
+#ifdef BTRFS_IOC_SET_RECEIVED_SUBVOL_32_COMPAT_DEFINED
+#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \
+   struct btrfs_ioctl_received_subvol_args_32)
+#endif
+
 #define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct 
btrfs_ioctl_send_args)
 #define BTRFS_IOC_DEVICES_READY _IOR(BTRFS_IOCTL_MAGIC, 39, \
 struct btrfs_ioctl_vol_args)
-- 
2.10.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] Btrfs-progs 4.8, fix build on 32bit, ioctl cleanups

2016-10-06 Thread David Sterba
So this series should fix the 32bit build problems, tested on gcc 6.2.1. I've
added test of checking the ioctl numbers: "make test-ioctl" that will build
32bit and 64bit versions (needs compiler for both).

The ioctl compatibility reasoning is in the changelogs. The goal is to keep
everything working (send, receive), so there's no visible change, but more
testing would be welcome of course.

The patchset is now in devel, on top of 4.8.

David Sterba (4):
  btrfs-progs: ioctl: pack structures
  btrfs-progs: ioctl: add 32bit compat for SET_RECEIVED_SUBVOL
  btrfs-progs: ioctl: add 64bit compat for SEND
  btrfs-progs: tests: make the ioctl-test actually useful

 Makefile.in  |  27 +-
 ioctl-test.c | 266 +--
 ioctl.h  |  79 --
 3 files changed, 335 insertions(+), 37 deletions(-)

-- 
2.10.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] exportfs: be careful to only return expected errors.

2016-10-06 Thread J. Bruce Fields
On Thu, Oct 06, 2016 at 05:39:24PM +1100, NeilBrown wrote:
> On Thu, Aug 04 2016, NeilBrown wrote:
> 
> >
> >
> > When nfsd calls fh_to_dentry, it expect ESTALE or ENOMEM as errors.
> > In particular it can be tempting to return ENOENT, but this is not
> > handled well by nfsd.
> >
> > Rather than requiring strict adherence to error code code filesystems,
> > treat all unexpected error codes the same as ESTALE.  This is safest.
> >
> > Signed-off-by: NeilBrown 
> > ---
> >
> > I didn't add a dprintk for unexpected error messages, partly
> > because dprintk isn't usable in exportfs.  I could have used pr_debug()
> > but I really didn't see much value.
> >
> > This has been tested together with the btrfs change, and it restores
> > correct functionality.
> >
> > Thanks,
> > NeilBrown
> 
> 
> Hi Bruce,
>  I notice that this patch isn't in -next, but the btrfs change which
>  motivated it is.
>  That means, unless there is some other work around somewhere, btrfs
>  might start having problems with nfs export.

Whoops, I wonder what happened.  Looking back  Oh, I think I set it
aside pending a response from Christoph, but I don't think that's really
necessary.

>  Can we get this patch into 4.9-rc??
> 
>  Or has someone fixed it a different way?

No, I'll just apply.

I need to send a pull request soon, I just have to make up my mind on
COPY first.

--b.

> Thanks,
> NeilBrown
> 
> 
> >
> >  fs/exportfs/expfs.c  | 10 ++
> >  include/linux/exportfs.h | 13 +++--
> >  2 files changed, 13 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> > index 207ba8d627ca..a4b531be9168 100644
> > --- a/fs/exportfs/expfs.c
> > +++ b/fs/exportfs/expfs.c
> > @@ -428,10 +428,10 @@ struct dentry *exportfs_decode_fh(struct vfsmount 
> > *mnt, struct fid *fid,
> > if (!nop || !nop->fh_to_dentry)
> > return ERR_PTR(-ESTALE);
> > result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
> > -   if (!result)
> > -   result = ERR_PTR(-ESTALE);
> > -   if (IS_ERR(result))
> > -   return result;
> > +   if (PTR_ERR(result) == -ENOMEM)
> > +   return ERR_CAST(result);
> > +   if (IS_ERR_OR_NULL(result))
> > +   return ERR_PTR(-ESTALE);
> >  
> > if (d_is_dir(result)) {
> > /*
> > @@ -541,6 +541,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, 
> > struct fid *fid,
> >  
> >   err_result:
> > dput(result);
> > +   if (err != -ENOMEM)
> > +   err = -ESTALE;
> > return ERR_PTR(err);
> >  }
> >  EXPORT_SYMBOL_GPL(exportfs_decode_fh);
> > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > index b03c0625fa6e..5ab958cdc50b 100644
> > --- a/include/linux/exportfs.h
> > +++ b/include/linux/exportfs.h
> > @@ -157,12 +157,13 @@ struct fid {
> >   *@fh_to_dentry is given a &struct super_block (@sb) and a file handle
> >   *fragment (@fh, @fh_len). It should return a &struct dentry which 
> > refers
> >   *to the same file that the file handle fragment refers to.  If it 
> > cannot,
> > - *it should return a %NULL pointer if the file was found but no 
> > acceptable
> > - *&dentries were available, or an %ERR_PTR error code indicating why it
> > - *couldn't be found (e.g. %ENOENT or %ENOMEM).  Any suitable dentry 
> > can be
> > - *returned including, if necessary, a new dentry created with 
> > d_alloc_root.
> > - *The caller can then find any other extant dentries by following the
> > - *d_alias links.
> > + *it should return a %NULL pointer if the file cannot be found, or an
> > + *%ERR_PTR error code of %ENOMEM if a memory allocation failure 
> > occurred.
> > + *Any other error code is treated like %NULL, and will cause an 
> > %ESTALE error
> > + *for callers of exportfs_decode_fh().
> > + *Any suitable dentry can be returned including, if necessary, a new 
> > dentry
> > + *created with d_alloc_root.  The caller can then find any other extant
> > + *dentries by following the d_alias links.
> >   *
> >   * fh_to_parent:
> >   *Same as @fh_to_dentry, except that it returns a pointer to the parent
> > -- 
> > 2.9.2


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is is possible to submit binary image as fstest test case?

2016-10-06 Thread Brian Foster
On Thu, Oct 06, 2016 at 04:12:56PM +0800, Qu Wenruo wrote:
> Hi,
> 
> Just as the title says, for some case(OK, btrfs again) we need to catch a
> file system in special timing.
> 
> In this specific case, we need to grab a btrfs image undergoing balancing,
> just before the balance finished.
> 
> Although we can use flakey to drop all write, we still don't have method to
> catch the timing of the that transaction.
> 
> 
> On the other hand, we can tweak our local kernel, adding msleep()/message
> and dump the disk during the sleep.
> And the image I dumped can easily trigger btrfs kernel and user-space bug.
> 
> So I'm wondering if I can just upload a zipped raw image as part of the test
> case?
> 

Doesn't necessarily bother me one way or the other, but something we've
done with XFS in such situations is introduce a DEBUG mode only sysfs
tunable that delays certain infrastructure (log recovery in our case) to
coordinate with test cases that try to reproduce such timing/racing
problems.

See test xfs/051 for an example..

Brian

> Thanks,
> Qu
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Fix warning_trace compile error if backtrace is disabled

2016-10-06 Thread Qu Wenruo
If we disable backtrace, btrfs-progs can't be compiled since we don't
have warning_trace defined.

Fix by move it out of #ifndef BTRFS_DISABLE_BACKTRACE block.

Signed-off-by: Qu Wenruo 
---
 kerncompat.h | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/kerncompat.h b/kerncompat.h
index ea04ef3..fba0bc1 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -72,6 +72,26 @@
 #define ___token_glue(a,b,c)   a ## b ## c
 #define BUILD_ASSERT(x)extern int 
__token_glue(compile_time_assert_,__LINE__,__COUNTER__)[1-2*!(x)] 
__attribute__((unused))
 
+static inline void warning_trace(const char *assertion, const char *filename,
+ const char *func, unsigned line, int val,
+ int trace)
+{
+   if (val)
+   return;
+   if (assertion)
+   fprintf(stderr,
+   "%s:%d: %s: Warning: assertion `%s` failed, value %d\n",
+   filename, line, func, assertion, val);
+   else
+   fprintf(stderr,
+   "%s:%d: %s: Warning: assertion failed, value %d.\n",
+   filename, line, func, val);
+#ifndef BTRFS_DISABLE_BACKTRACE
+   if (trace)
+   print_trace();
+#endif
+}
+
 #ifndef BTRFS_DISABLE_BACKTRACE
 #define MAX_BACKTRACE  16
 static inline void print_trace(void)
@@ -99,24 +119,6 @@ static inline void assert_trace(const char *assertion, 
const char *filename,
exit(1);
 }
 
-static inline void warning_trace(const char *assertion, const char *filename,
- const char *func, unsigned line, int val,
- int trace)
-{
-   if (val)
-   return;
-   if (assertion)
-   fprintf(stderr,
-   "%s:%d: %s: Warning: assertion `%s` failed, value %d\n",
-   filename, line, func, assertion, val);
-   else
-   fprintf(stderr,
-   "%s:%d: %s: Warning: assertion failed, value %d.\n",
-   filename, line, func, val);
-   if (trace)
-   print_trace();
-}
-
 #define BUG() assert_trace(NULL, __FILE__, __func__, __LINE__, 0)
 #else
 #define BUG() assert(0)
-- 
2.10.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Fix stack overflow for checking qgroup on tree reloc tree

2016-10-06 Thread Qu Wenruo
For tree reloc tree whose level is >= 2, the root node's parent will
point to itself.
In this case it will make btrfsck overflow its stack and cause segfault.

While for tree reloc tree, it doesn't affect qgroup and kernel can
handle it well.

So add tree reloc tree check for qgroup-verify.c and fix the bug.

Test case will follow soon after I make a minimal image for it.
Current xz ziped image is still over 10M for a 512M fs.

Signed-off-by: Qu Wenruo 
---
 qgroup-verify.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/qgroup-verify.c b/qgroup-verify.c
index df0e547..9a56f59 100644
--- a/qgroup-verify.c
+++ b/qgroup-verify.c
@@ -369,6 +369,11 @@ static int find_parent_roots(struct ulist *roots, u64 
parent)
if (ret < 0)
goto out;
}
+   } else if (ref->parent == ref->bytenr) {
+   /*
+* Special loop case for tree reloc tree
+*/
+   ref->root = BTRFS_TREE_RELOC_OBJECTID;
} else {
ret = find_parent_roots(roots, ref->parent);
if (ret < 0)
@@ -578,6 +583,8 @@ static u64 resolve_one_root(u64 bytenr)
 
if (ref->root)
return ref->root;
+   if (ref->parent == bytenr)
+   return BTRFS_TREE_RELOC_OBJECTID;
return resolve_one_root(ref->parent);
 }
 
@@ -748,6 +755,9 @@ static int add_refs_for_implied(struct btrfs_fs_info *info, 
u64 bytenr,
struct btrfs_root *root;
struct btrfs_key key;
 
+   /* Tree reloc tree doesn't contribute qgroup, skip it */
+   if (root_id == BTRFS_TREE_RELOC_OBJECTID)
+   return 0;
key.objectid = root_id;
key.type = BTRFS_ROOT_ITEM_KEY;
key.offset = (u64)-1;
-- 
2.10.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Is is possible to submit binary image as fstest test case?

2016-10-06 Thread Qu Wenruo

Hi,

Just as the title says, for some case(OK, btrfs again) we need to catch 
a file system in special timing.


In this specific case, we need to grab a btrfs image undergoing 
balancing, just before the balance finished.


Although we can use flakey to drop all write, we still don't have method 
to catch the timing of the that transaction.



On the other hand, we can tweak our local kernel, adding 
msleep()/message and dump the disk during the sleep.

And the image I dumped can easily trigger btrfs kernel and user-space bug.

So I'm wondering if I can just upload a zipped raw image as part of the 
test case?


Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: relocation: Fix NULL pointer access and leaking qgroups in btrfs_recover_relocation

2016-10-06 Thread Qu Wenruo
We fixed balance qgroup leaking by introducing
qgroup_fix_relocated_data_extents(), but it only covers normal balance
routine well.

For case btrfs_recover_relocation() routine, since rc->stage or
rc->data_inode are not initialized, we either skip
qgroup_fix_relocated_data_extents() or just cause NULL pointer access.

Since we skip qgroup_fix_relocated_data_extents() in
btrfs_recover_relocation(), we will still leak qgroup numbers for that
routine.

In the fix, we won't use data_inode any longer, as at the timing of the
qgroup fix, noone will modify data_reloc tree any longer, so path
locking should be good enough.

And add check against rc->merge_reloc_tree, so we can detect if we are
in btrfs_recover_relocation() routine and continue qgroup fix.

Reported-by: Goldwyn Rodrigues 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/relocation.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c0c13dc..f250187 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3946,8 +3946,7 @@ static int qgroup_fix_relocated_data_extents(struct 
btrfs_trans_handle *trans,
 struct reloc_control *rc)
 {
struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
-   struct inode *inode = rc->data_inode;
-   struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root;
+   struct btrfs_root *data_reloc_root;
struct btrfs_path *path;
struct btrfs_key key;
int ret = 0;
@@ -3956,18 +3955,28 @@ static int qgroup_fix_relocated_data_extents(struct 
btrfs_trans_handle *trans,
return 0;
 
/*
+* For normal balance routine, merge_reloc_tree flag is always cleared
+* before commit trans.
+* For recover relocation routine, merge_reloc_tree is always 1, we need
+* to continue anyway.
+*
 * Only for stage where we update data pointers the qgroup fix is
 * valid.
 * For MOVING_DATA stage, we will miss the timing of swapping tree
 * blocks, and won't fix it.
 */
-   if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found))
+   if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found) &&
+   rc->merge_reloc_tree == 0)
return 0;
 
+   data_reloc_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID);
+   if (IS_ERR(data_reloc_root))
+   return PTR_ERR(data_reloc_root);
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-   key.objectid = btrfs_ino(inode);
+   key.objectid = BTRFS_FIRST_FREE_OBJECTID + 1;
key.type = BTRFS_EXTENT_DATA_KEY;
key.offset = 0;
 
@@ -3975,13 +3984,10 @@ static int qgroup_fix_relocated_data_extents(struct 
btrfs_trans_handle *trans,
if (ret < 0)
goto out;
 
-   lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1);
while (1) {
struct btrfs_file_extent_item *fi;
 
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-   if (key.objectid > btrfs_ino(inode))
-   break;
if (key.type != BTRFS_EXTENT_DATA_KEY)
goto next;
fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
@@ -4004,7 +4010,6 @@ next:
break;
}
}
-   unlock_extent(&BTRFS_I(inode)->io_tree, 0 , (u64)-1);
 out:
btrfs_free_path(path);
return ret;
-- 
2.10.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: opencode chunk locking, remove helpers

2016-10-06 Thread David Sterba
On Wed, Oct 05, 2016 at 02:41:24PM -0400, Jeff Mahoney wrote:
> On 10/5/16 8:23 AM, David Sterba wrote:
> > The helpers are trivial and we don't use them consistently.
> 
> This one is going to conflict with the root->fs_info removal patchset in
> every chunk since the helper does nothing with the root.  Otherwise, the
> idea is sound.

No problem, I'll refresh and resend after the fsinfo/root series.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-06 Thread Stefan Priebe - Profihost AG
Thanks Wang,

i applied them both on top of vanilla v4.8 - i hope this is OK. Will
report back what happens.

Greets,
Stefan

Am 06.10.2016 um 05:04 schrieb Wang Xiaoguang:
> Hi,
> 
> On 09/29/2016 03:27 PM, Stefan Priebe - Profihost AG wrote:
>> Am 29.09.2016 um 09:13 schrieb Wang Xiaoguang:
> I found that compress sometime report ENOSPC error even in 4.8-rc8,
> currently
 I cannot confirm that as i do not have anough space to test this
 without
 compression ;-( But yes i've compression enabled.
>>> I might not get you, my poor english :)
>>> You mean that you only get ENOSPC error when compression is enabled?
>>>
>>> And when compression is not enabled, you do not get ENOSPC error?
>> I can't tell you. I cannot test with compression not enabled. I do not
>> have anough free space on this disk.
> I had just sent two patches to fix false enospc error for compression,
> please have a try, they fix false enospc error in my test environment.
> btrfs: fix false enospc for compression
> btrfs: improve inode's outstanding_extents computation
> 
> I apply these two patchs in linux upstream tree, the latest commit
> is 41844e36206be90cd4d962ea49b0abc3612a99d0.
> 
> Regards,
> Xiaoguang Wang
> 
>>
> I'm trying to fix it.
 That sounds good but do you also get the
 BTRFS: space_info 4 has 18446742286429913088 free, is not full

 kernel messages on umount? if not you might have found another problem.
>>> Yes, I seem similar messages, you can paste you whole dmesg info here.
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5790
>> btrfs_free_block_groups+0x346/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_common+0x86/0xc0
>> [] warn_slowpath_null+0x1a/0x20
>> [] btrfs_free_block_groups+0x346/0x430 [btrfs]
>> [] close_ctree+0x15d/0x330 [btrfs]
>> [] btrfs_put_super+0x19/0x20 [btrfs]
>> [] generic_shutdown_super+0x6f/0x100
>> [] kill_anon_super+0x12/0x20
>> [] btrfs_kill_super+0x16/0xa0 [btrfs]
>> [] deactivate_locked_super+0x43/0x70
>> [] deactivate_super+0x5c/0x60
>> [] cleanup_mnt+0x3f/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0x81/0xa0
>> [] exit_to_usermode_loop+0xb0/0xc0
>> [] syscall_return_slowpath+0xd4/0x130
>> [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace cee6ace13018e13e ]---
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5791
>> btrfs_free_block_groups+0x365/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G W O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_common+0x86/0xc0
>> [] warn_slowpath_null+0x1a/0x20
>> [] btrfs_free_block_groups+0x365/0x430 [btrfs]
>> [] close_ctree+0x15d/0x330 [btrfs]
>> [] btrfs_put_super+0x19/0x20 [btrfs]
>> [] generic_shutdown_super+0x6f/0x100
>> [] kill_anon_super+0x12/0x20
>> [] btrfs_kill_super+0x16/0xa0 [btrfs]
>> [] deactivate_locked_super+0x43/0x70
>> [] deactivate_super+0x5c/0x60
>> [] cleanup_mnt+0x3f/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0x81/0xa0
>> [] exit_to_usermode_loop+0xb0/0xc0
>> [] syscall_return_slowpath+0xd4/0x130
>> [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace cee6ace1

Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full

2016-10-06 Thread Stefan Priebe - Profihost AG
Thanks Wang,

i applied them both on top of vanilla v4.8 - i hope this is OK. Will
report back what happens.

Greets,
Stefan

Am 06.10.2016 um 05:04 schrieb Wang Xiaoguang:
> Hi,
> 
> On 09/29/2016 03:27 PM, Stefan Priebe - Profihost AG wrote:
>> Am 29.09.2016 um 09:13 schrieb Wang Xiaoguang:
> I found that compress sometime report ENOSPC error even in 4.8-rc8,
> currently
 I cannot confirm that as i do not have anough space to test this
 without
 compression ;-( But yes i've compression enabled.
>>> I might not get you, my poor english :)
>>> You mean that you only get ENOSPC error when compression is enabled?
>>>
>>> And when compression is not enabled, you do not get ENOSPC error?
>> I can't tell you. I cannot test with compression not enabled. I do not
>> have anough free space on this disk.
> I had just sent two patches to fix false enospc error for compression,
> please have a try, they fix false enospc error in my test environment.
> btrfs: fix false enospc for compression
> btrfs: improve inode's outstanding_extents computation
> 
> I apply these two patchs in linux upstream tree, the latest commit
> is 41844e36206be90cd4d962ea49b0abc3612a99d0.
> 
> Regards,
> Xiaoguang Wang
> 
>>
> I'm trying to fix it.
 That sounds good but do you also get the
 BTRFS: space_info 4 has 18446742286429913088 free, is not full

 kernel messages on umount? if not you might have found another problem.
>>> Yes, I seem similar messages, you can paste you whole dmesg info here.
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5790
>> btrfs_free_block_groups+0x346/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_common+0x86/0xc0
>> [] warn_slowpath_null+0x1a/0x20
>> [] btrfs_free_block_groups+0x346/0x430 [btrfs]
>> [] close_ctree+0x15d/0x330 [btrfs]
>> [] btrfs_put_super+0x19/0x20 [btrfs]
>> [] generic_shutdown_super+0x6f/0x100
>> [] kill_anon_super+0x12/0x20
>> [] btrfs_kill_super+0x16/0xa0 [btrfs]
>> [] deactivate_locked_super+0x43/0x70
>> [] deactivate_super+0x5c/0x60
>> [] cleanup_mnt+0x3f/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0x81/0xa0
>> [] exit_to_usermode_loop+0xb0/0xc0
>> [] syscall_return_slowpath+0xd4/0x130
>> [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace cee6ace13018e13e ]---
>> [ cut here ]
>> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5791
>> btrfs_free_block_groups+0x365/0x430 [btrfs]()
>> Modules linked in: netconsole xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm
>> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan
>> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop
>> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit
>> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd
>> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid
>> CPU: 2 PID: 5187 Comm: umount Tainted: G W O 4.4.22+63-ph #1
>> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
>>  880fda777d00 813b69c3 
>> c067a099 880fda777d38 810821c6 
>> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098
>> Call Trace:
>> [] dump_stack+0x63/0x90
>> [] warn_slowpath_common+0x86/0xc0
>> [] warn_slowpath_null+0x1a/0x20
>> [] btrfs_free_block_groups+0x365/0x430 [btrfs]
>> [] close_ctree+0x15d/0x330 [btrfs]
>> [] btrfs_put_super+0x19/0x20 [btrfs]
>> [] generic_shutdown_super+0x6f/0x100
>> [] kill_anon_super+0x12/0x20
>> [] btrfs_kill_super+0x16/0xa0 [btrfs]
>> [] deactivate_locked_super+0x43/0x70
>> [] deactivate_super+0x5c/0x60
>> [] cleanup_mnt+0x3f/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0x81/0xa0
>> [] exit_to_usermode_loop+0xb0/0xc0
>> [] syscall_return_slowpath+0xd4/0x130
>> [] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace cee6ace1

Re: [PATCH v3.1 2/3] btrfs: relocation: Fix leaking qgroups numbers on data extents

2016-10-06 Thread Qu Wenruo



At 10/05/2016 09:53 AM, Goldwyn Rodrigues wrote:



On 08/14/2016 09:36 PM, Qu Wenruo wrote:

This patch fixes a REGRESSION introduced in 4.2, caused by the big quota
rework.

When balancing data extents, qgroup will leak all its numbers for
relocated data extents.

The relocation is done in the following steps for data extents:
1) Create data reloc tree and inode
2) Copy all data extents to data reloc tree
   And commit transaction
3) Create tree reloc tree(special snapshot) for any related subvolumes
4) Replace file extent in tree reloc tree with new extents in data reloc
   tree
   And commit transaction
5) Merge tree reloc tree with original fs, by swapping tree blocks

For 1)~4), since tree reloc tree and data reloc tree doesn't count to
qgroup, everything is OK.

But for 5), the swapping of tree blocks will only info qgroup to track
metadata extents.

If metadata extents contain file extents, qgroup number for file extents
will get lost, leading to corrupted qgroup accounting.

The fix is, before commit transaction of step 5), manually info qgroup to
track all file extents in data reloc tree.
Since at commit transaction time, the tree swapping is done, and qgroup
will account these data extents correctly.

Cc: Mark Fasheh 
Reported-by: Mark Fasheh 
Reported-by: Filipe Manana 
Signed-off-by: Qu Wenruo 
Tested-by: Goldwyn Rodrigues 
---
 fs/btrfs/relocation.c | 109 +++---
 1 file changed, 103 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b26a5ae..27480ef 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -31,6 +31,7 @@
 #include "async-thread.h"
 #include "free-space-cache.h"
 #include "inode-map.h"
+#include "qgroup.h"

 /*
  * backref_node, mapping_node and tree_block start with this
@@ -3916,6 +3917,90 @@ int prepare_to_relocate(struct reloc_control *rc)
return 0;
 }

+/*
+ * Qgroup fixer for data chunk relocation.
+ * The data relocation is done in the following steps
+ * 1) Copy data extents into data reloc tree
+ * 2) Create tree reloc tree(special snapshot) for related subvolumes
+ * 3) Modify file extents in tree reloc tree
+ * 4) Merge tree reloc tree with original fs tree, by swapping tree blocks
+ *
+ * The problem is, data and tree reloc tree are not accounted to qgroup,
+ * and 4) will only info qgroup to track tree blocks change, not file extents
+ * in the tree blocks.
+ *
+ * The good news is, related data extents are all in data reloc tree, so we
+ * only need to info qgroup to track all file extents in data reloc tree
+ * before commit trans.
+ */
+static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans,
+struct reloc_control *rc)
+{
+   struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
+   struct inode *inode = rc->data_inode;
+   struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   int ret = 0;
+
+   if (!fs_info->quota_enabled)
+   return 0;
+
+   /*
+* Only for stage where we update data pointers the qgroup fix is
+* valid.
+* For MOVING_DATA stage, we will miss the timing of swapping tree
+* blocks, and won't fix it.
+*/
+   if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found))
+   return 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+   key.objectid = btrfs_ino(inode);
+   key.type = BTRFS_EXTENT_DATA_KEY;
+   key.offset = 0;
+
+   ret = btrfs_search_slot(NULL, data_reloc_root, &key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1);
+   while (1) {
+   struct btrfs_file_extent_item *fi;
+
+   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+   if (key.objectid > btrfs_ino(inode))
+   break;
+   if (key.type != BTRFS_EXTENT_DATA_KEY)
+   goto next;
+   fi = btrfs_item_ptr(path->nodes[0], path->slots[0],
+   struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(path->nodes[0], fi) !=
+   BTRFS_FILE_EXTENT_REG)
+   goto next;
+   ret = btrfs_qgroup_insert_dirty_extent(trans, fs_info,
+   btrfs_file_extent_disk_bytenr(path->nodes[0], fi),
+   btrfs_file_extent_disk_num_bytes(path->nodes[0], fi),
+   GFP_NOFS);
+   if (ret < 0)
+   break;
+next:
+   ret = btrfs_next_item(data_reloc_root, path);
+   if (ret < 0)
+   break;
+   if (ret > 0) {
+   ret = 0;
+   break;
+   }
+