[PATCH v2 0/3] Misc volume patch set part2

2018-08-09 Thread Anand Jain
The patch set (2/3 and 3/3) adds a helper function to deduce num_devices,
that is, the number of devices while the filesystem is mounted; this
excludes the seed devices but includes the replace target. When balancing
or deleting a device we need the num_devices that actually belong to the
FSID, without the replace target (if one exists): in both of these cases it
would be wrong to count the replace target when reallocating the block
groups.

This patch set as such does not change any logic already present in the
original code; it merely consolidates that logic, currently duplicated in
two places, into a single helper function. Since those two locations use
WARN_ON and BUG_ON respectively as the safety catch, the first patch cleans
them up to use ASSERT in both places. With the two sections of code now
identical, the second patch replaces them with a helper function.

Also, the check num_devices > 0 should really be num_devices > 1, or
alternatively the check should come after the num_devices--. But since
there have already been too many comments and added confusion, I shall
leave that part to be fixed sometime later, so that the cleanup here is
really a cleanup instead of changing the logic on a rough scale.

Last but not least, this patch went through a lot of discussion on whether
to use BUG or return -EINVAL for an accidental fall of num_devices below 1.
I wasn't expecting so much concern about BUG_ON versus -EINVAL, as the case
is trivial: fundamentally a filesystem cannot be in the mounted state with
num_devices < 1.


Anand Jain (3):
  btrfs: drop uuid_mutex in btrfs_free_extra_devids()
  btrfs: assert for num_devices below 0
  btrfs: add helper btrfs_num_devices() to deduce num_devices

 fs/btrfs/volumes.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

-- 
2.7.0



[PATCH 1/2] btrfs: assert for num_devices below 0

2018-08-09 Thread Anand Jain
In preparation for adding a helper function to deduce num_devices while a
replace is running, use ASSERT instead of BUG_ON and WARN_ON.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b6d9b6a6fba7..0062615a79be 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1877,7 +1877,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
num_devices = fs_devices->num_devices;
btrfs_dev_replace_read_lock(&fs_info->dev_replace);
if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
-   WARN_ON(num_devices < 1);
+   ASSERT(num_devices > 0);
num_devices--;
}
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
@@ -3752,7 +3752,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
num_devices = fs_info->fs_devices->num_devices;
btrfs_dev_replace_read_lock(&fs_info->dev_replace);
if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
-   BUG_ON(num_devices < 1);
+   ASSERT(num_devices > 0);
num_devices--;
}
btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
-- 
2.7.0



[PATCH v4 1/3] btrfs: drop uuid_mutex in btrfs_free_extra_devids()

2018-08-09 Thread Anand Jain
btrfs_free_extra_devids() is called only in the mount context; it traverses
fs_devices::devices and frees any orphan devices in the given %fs_devices.
As the search for orphan devices is limited to fs_devices::devices, we
don't need the global uuid_mutex.

There can't be any mount-point-based ioctl threads in this context, as the
mount thread has not yet returned. But there can be a btrfs-control-based
scan ioctl thread which calls device_list_add().

Here in the mount thread, fs_devices::opened is incremented well before
btrfs_free_extra_devids() is called, and in the scan context fs_devices
that are already opened can neither be freed nor allocated at
device_list_add().

But let's say you change the device path and call the scan again; the scan
would then update the new device path, and this operation could race
against the btrfs_free_extra_devids() thread, which might be in the process
of freeing the same device. So synchronize them by using the
device_list_mutex.

This scenario is a real corner case; in practice, scan and mount are
serialized by usage anyway, so unless the race is instrumented it is very
difficult to hit.

Signed-off-by: Anand Jain 
Reviewed-by: David Sterba 
---
v3->v4: As we traverse through the seed devices, fs_devices gets updated to
the child seed fs_devices, so make sure we use the same fs_devices
pointer for the mutex_unlock() as for the mutex_lock().
v2->v3: Update change log.
(Currently device_list_add() is very lean on its device_list_mutex 
usage,
a cleanup and fix is wip. Given the practicality of the above race
condition this patch is good to merge).
v1->v2: replace uuid_mutex with device_list_mutex instead of delete.
change log updated.

 fs/btrfs/volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4405e430da6..f277df935e7c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -934,8 +934,9 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
 {
struct btrfs_device *device, *next;
struct btrfs_device *latest_dev = NULL;
+   struct btrfs_fs_devices *parent_fs_devices = fs_devices;
 
-   mutex_lock(&uuid_mutex);
+   mutex_lock(&parent_fs_devices->device_list_mutex);
 again:
/* This is the initialized path, it is safe to release the devices. */
list_for_each_entry_safe(device, next, &fs_devices->devices, dev_list) {
@@ -989,8 +990,7 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
}
 
fs_devices->latest_bdev = latest_dev->bdev;
-
-   mutex_unlock(&uuid_mutex);
+   mutex_unlock(&parent_fs_devices->device_list_mutex);
 }
 
 static void free_device_rcu(struct rcu_head *head)
-- 
2.7.0



[PATCH v5 2/2] btrfs: add helper btrfs_num_devices() to deduce num_devices

2018-08-09 Thread Anand Jain
When a replace is running, fs_devices::num_devices also includes the
replace device; however, some operations like device delete and balance
need the actual num_devices without the replace device. The new function
btrfs_num_devices() provides just that.

And here is a scenario in which balance and replace items can coexist.
Consider a balance that is started and paused, then a replace is started,
followed by an unmount or power-cycle of the system. During the following
mount, open_ctree() first restarts the balance, so it must check for the
replace device, otherwise our num_devices calculation will be wrong.

Signed-off-by: Anand Jain 
---
v4->v5: use assert.
v3->v4: add comment and drop the inline (sorry, missed it before)
v2->v3: update changelog with the not-so-obvious balance and replace
coexistence scenario
v1->v2: add comments

 fs/btrfs/volumes.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0062615a79be..630f9ec158d0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1863,6 +1863,21 @@ void btrfs_assign_next_active_device(struct btrfs_device *device,
fs_info->fs_devices->latest_bdev = next_device->bdev;
 }
 
+/* Returns btrfs_fs_devices::num_devices excluding replace device if any */
+static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
+{
+   u64 num_devices = fs_info->fs_devices->num_devices;
+
+   btrfs_dev_replace_read_lock(&fs_info->dev_replace);
+   if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
+   ASSERT(num_devices > 0);
+   num_devices--;
+   }
+   btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+
+   return num_devices;
+}
+
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
u64 devid)
 {
@@ -1874,13 +1889,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 
mutex_lock(&uuid_mutex);
 
-   num_devices = fs_devices->num_devices;
-   btrfs_dev_replace_read_lock(&fs_info->dev_replace);
-   if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
-   ASSERT(num_devices > 0);
-   num_devices--;
-   }
-   btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+   num_devices = btrfs_num_devices(fs_info);
 
ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
if (ret)
@@ -3749,13 +3758,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
}
}
 
-   num_devices = fs_info->fs_devices->num_devices;
-   btrfs_dev_replace_read_lock(_info->dev_replace);
-   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
-   ASSERT(num_devices > 0);
-   num_devices--;
-   }
-   btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+   num_devices = btrfs_num_devices(fs_info);
+
allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
if (num_devices > 1)
allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
-- 
2.7.0



mount shows incorrect subvol when backed by bind mount

2018-08-09 Thread Chris Murphy
I've got another example of bind mounts resulting in confusing
(incorrect) information in the mount command with Btrfs. In this case,
it's Docker using bind mounts.

Full raw version (expires in 7 days)
https://paste.fedoraproject.org/paste/r8tr-3nuvoycwxf0bPUrmA/raw

Relevant portion:

mount shows:

/dev/mmcblk0p3 on /var/lib/docker/containers type btrfs
(rw,noatime,seclabel,compress-force=zstd,ssd,space_cache=v2,subvolid=265,subvol=/root/var/lib/docker/containers)
/dev/mmcblk0p3 on /var/lib/docker/btrfs type btrfs
(rw,noatime,seclabel,compress-force=zstd,ssd,space_cache=v2,subvolid=265,subvol=/root/var/lib/docker/btrfs)

And from the detailed fpaste you can see there is no such subvolume as
docker/btrfs or docker/containers, and subvolid=265 is actually the
rootfs.

Anyway, mortals will be confused by this behavior.

-- 
Chris Murphy


[PATCH] btrfs-progs: qgroup: Don't return 1 if qgroup is marked inconsistent during relationship assignment

2018-08-09 Thread Qu Wenruo
The BTRFS_IOC_QGROUP_ASSIGN ioctl can return >0 if the qgroup is marked
inconsistent after a successful relationship assignment/removal.

We leak that return value as the final return value of the btrfs command,
but according to the man page any return value other than 0 means failure.

Fix this by resetting the return value to 0 in the --no-rescan case.

Signed-off-by: Qu Wenruo 
---
 cmds-qgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index b928edc7c408..7234bdc17bb4 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -123,6 +123,7 @@ static int _cmd_qgroup_assign(int assign, int argc, char **argv,
error("quota rescan failed: %m");
} else {
warning("quotas may be inconsistent, rescan needed");
+   ret = 0;
}
}
close_file_or_dir(fd, dirstream);
-- 
2.18.0



[PATCH v2] fstests: btrfs: Add test for corrupted childless qgroup numbers

2018-08-09 Thread Qu Wenruo
This bug is exposed by populating a high-level qgroup, then making it
childless while it still carries old qgroup numbers, and finally doing a
rescan.

Normally rescan should zero out all qgroups' accounting numbers, but due
to a kernel bug that doesn't mark childless qgroups dirty, their on-disk
data is never updated; thus the old numbers remain and cause qgroup
corruption.

Fixed by the following kernel patch:
"btrfs: qgroup: Dirty all qgroups before rescan"

Reported-by: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Change the adjective for the offending group, from "orphan" to
  "childless"
---
 tests/btrfs/170 | 83 +
 tests/btrfs/170.out |  3 ++
 tests/btrfs/group   |  1 +
 3 files changed, 87 insertions(+)
 create mode 100755 tests/btrfs/170
 create mode 100644 tests/btrfs/170.out

diff --git a/tests/btrfs/170 b/tests/btrfs/170
new file mode 100755
index ..3a810e80562f
--- /dev/null
+++ b/tests/btrfs/170
@@ -0,0 +1,83 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 SUSE Linux Products GmbH.  All Rights Reserved.
+#
+# FS QA Test 170
+#
+# Test if btrfs can clear high level childless qgroup's accounting numbers
+# during rescan.
+#
+# Fixed by the following kernel patch:
+# "btrfs: qgroup: Dirty all qgroups before rescan"
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+
+# Populate the fs
+_run_btrfs_util_prog subvolume create "$SCRATCH_MNT/subvol"
+_pwrite_byte 0xcdcd 0 1M "$SCRATCH_MNT/subvol/file1" | _filter_xfs_io > /dev/null
+
+# Ensure that file reach disk, so it will also appear in snapshot
+sync
+_run_btrfs_util_prog subvolume snapshot "$SCRATCH_MNT/subvol" "$SCRATCH_MNT/snapshot"
+
+
+_run_btrfs_util_prog quota enable "$SCRATCH_MNT"
+_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
+
+# Create high level qgroup
+_run_btrfs_util_prog qgroup create 1/0 "$SCRATCH_MNT"
+
+# Don't use _run_btrfs_util_prog here, as it can return 1 to inform the user
+# that the qgroup is marked inconsistent. This is a bug in btrfs-progs, but
+# to make sure the test works, we just ignore the return value.
+$BTRFS_UTIL_PROG qgroup assign "$SCRATCH_MNT/snapshot" 1/0 "$SCRATCH_MNT"
+
+# Above assign will mark qgroup inconsistent due to the shared extents
+# between subvol/snapshot/high level qgroup, do rescan here
+_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
+
+# Now remove the qgroup relationship and make 1/0 childless
+# Due to the shared extent outside of 1/0, we will mark qgroup inconsistent
+# and keep the number of qgroup 1/0
+$BTRFS_UTIL_PROG qgroup remove "$SCRATCH_MNT/snapshot" 1/0 "$SCRATCH_MNT"
+
+# Above removal also marks qgroup inconsistent, rescan again
+_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
+
+# After the test, btrfs check will verify qgroup numbers to catch any
+# corruption.
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/170.out b/tests/btrfs/170.out
new file mode 100644
index ..9002199e48ed
--- /dev/null
+++ b/tests/btrfs/170.out
@@ -0,0 +1,3 @@
+QA output created by 170
+WARNING: quotas may be inconsistent, rescan needed
+WARNING: quotas may be inconsistent, rescan needed
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b616c73d09bf..339c977135c0 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -172,3 +172,4 @@
 167 auto quick replace volume
 168 auto quick send
 169 auto quick send
+170 auto quick qgroup
-- 
2.18.0



[PATCH v3] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Qu Wenruo
[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:
--
$ mkfs.btrfs -fq $DEV
$ mount $DEV /mnt

$ btrfs quota enable /mnt
$ btrfs qgroup create 1/0 /mnt
$ btrfs sub create /mnt/sub
$ btrfs qgroup assign 0/257 1/0 /mnt

$ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
$ btrfs sub snap /mnt/sub /mnt/snap
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid         rfer         excl max_rfer max_excl parent  child
--------         ----         ---- -------- -------- ------  -----
0/5          16.00KiB     16.00KiB     none     none ---     ---
0/257      1016.00KiB     16.00KiB     none     none 1/0     ---
0/258      1016.00KiB     16.00KiB     none     none ---     ---
1/0        1016.00KiB     16.00KiB     none     none ---     0/257

so far so good, but:

$ btrfs qgroup remove 0/257 1/0 /mnt
WARNING: quotas may be inconsistent, rescan needed
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre  /mnt
qgroupid         rfer         excl max_rfer max_excl parent  child
--------         ----         ---- -------- -------- ------  -----
0/5          16.00KiB     16.00KiB     none     none ---     ---
0/257      1016.00KiB     16.00KiB     none     none ---     ---
0/258      1016.00KiB     16.00KiB     none     none ---     ---
1/0        1016.00KiB     16.00KiB     none     none ---     ---
           ^^^^^^^^^^ not cleared
--

[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.

However we don't mark all qgroups dirty, but rely on rescan to mark
qgroups dirty.

If we have any high-level qgroup without children, it won't be marked
dirty during rescan, since we cannot reach that qgroup.

This causes the QGROUP_INFO items of childless qgroups to never be updated
in the quota tree, so their numbers stay the same in the "btrfs qgroup
show" output.

[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so that even
if we have childless qgroups, their QGROUP_INFO items still get updated
during rescan.

Reported-by: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
Reviewed-by: Misono Tomohiro 
Tested-by: Misono Tomohiro 
---
changelog:
v2:
  Fix some grammar errors in commit message.
v3:
  Fix the adjective used for qgroups without children, from "orphan" to
  "childless".
  Update tags.
---
 fs/btrfs/qgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48c1c3e7baf3..5a5372b33d96 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
qgroup->rfer_cmpr = 0;
qgroup->excl = 0;
qgroup->excl_cmpr = 0;
+   qgroup_dirty(fs_info, qgroup);
}
spin_unlock(&fs_info->qgroup_lock);
 }
-- 
2.18.0



List of known BTRFS Raid 5/6 Bugs?

2018-08-09 Thread erenthetitan
I am searching for more information regarding possible bugs related to
BTRFS RAID 5/6. All the sites I could find are incomplete, and the
information contradicts itself:

The Wiki Raid 5/6 Page (https://btrfs.wiki.kernel.org/index.php/RAID56)
warns of the write hole bug, stating that your data remains safe upon an
unclean shutdown (except data being written during the power loss,
obviously), unless it gets corrupted by further issues like bit rot, drive
failure, etc.

The Wiki Gotchas Page (https://btrfs.wiki.kernel.org/index.php/Gotchas)
warns of a possible uncorrectable "transid" mismatch, without stating which
versions are affected or what a transid mismatch means for your data. It
does not mention the write hole at all.

This Mail Archive post
(https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg55161.html)
states that scrubbing BTRFS RAID 5/6 will always repair data corruption,
but may corrupt your metadata while trying to do so, meaning you have to
scrub twice in a row to ensure data integrity.

The Bugzilla Entry (https://bugzilla.kernel.org/buglist.cgi?component=btrfs)
contains mostly unanswered bug reports, which may or may not still apply (2013 - 2018).

This Spinics Discussion 
(https://www.spinics.net/lists/linux-btrfs/msg76471.html)
states that the write hole can even damage old data, i.e. data that was not
accessed during the unclean shutdown, which is the opposite of what the
RAID 5/6 status page states!

This Spinics comment (https://www.spinics.net/lists/linux-btrfs/msg76412.html)
informs that hot-plugging a device will trigger the write hole, so accessed
data will be corrupted. If the earlier statement about old-data corruption
is true, random data could be permanently lost.
This is even more dangerous if you connect your devices via USB, as USB can
disconnect due to external influence, e.g. touching the cables, shaking,
and so on.

Lastly, this Superuser question 
(https://superuser.com/questions/1325245/btrfs-transid-failure#1344494)
assumes that the transid mismatch bug could render your system
unmountable. While it might be possible to restore your data using btrfs
restore, it is still unknown how the transid mismatch is even triggered,
meaning that your filesystem could fail at any time!

Do you know of any comprehensive and complete Bug list?

Do you know more about the stated Bugs?

Do you know further Bugs that are not addressed in any of these sites?



Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-09 Thread Qu Wenruo


On 8/10/18 1:48 AM, Tomasz Pala wrote:
> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
> 
>> 2) Different limitations on exclusive/shared bytes
>>Btrfs can set different limit on exclusive/shared bytes, further
>>complicating the problem.
>>
>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>It lacks all the shared trees (mentioned below), and in fact such
>>shared tree can be pretty large (especially for extent tree and csum
>>tree).
> 
> I'm not sure about the implications, but just to clarify some things:
> 
> when limiting somebody's data space we usually don't care about the
> underlying "savings" coming from any deduplicating technique - these are
> purely bonuses for system owner, so he could do larger resource overbooking.

In reality that's definitely not the case.

From what I see, most users care more about exclusively used space (excl)
than about the total space one subvolume refers to (rfer).

The most common case is a snapshot: the user only cares how much new space
can be written into the subvolume, not about the total subvolume size.

> 
> So - the limit set on any user should enforce maximum and absolute space
> he has allocated, including the shared stuff. I could even imagine that
> creating a snapshot might immediately "eat" the available quota. In a
> way, that quota returned matches (give or take) `du` reported usage,
> unless "do not account reflinks within a single qgroup" was easy to implement.

In fact, that's the case. In the current implementation, accounting per
extent is the easiest (if not the only) way to implement it.

> 
> I.e.: every shared segment should be accounted within quota (at least once).

Already accounted, at least for rfer.

> 
> And the numbers accounted should reflect the uncompressed sizes.

Not possible with the current extent-based solution.

> 
> 
> Moreover - if there would be per-subvolume RAID levels someday, the data
> should be accounted in relation to the "default" (filesystem) RAID level,
> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
> data, and twice the data in an opposite scenario (like "dup" profile on
> single-drive filesystem).

Not possible either with the current extent-based solution.

> 
> 
> In short: values representing quotas are user-oriented ("the numbers one
> bought"), not storage-oriented ("the numbers they actually occupy").

Well, if something is not possible or brings such a big performance impact,
there will be no argument about how it should work in the first place.

Thanks,
Qu





Re: [COMMAND HANGS] The command 'btrfs subvolume sync -s 2 xyz' can hang.

2018-08-09 Thread Jeff Mahoney
On 8/9/18 11:15 AM, Giuseppe Della Bianca wrote:
> Hi.
> 
> My system: 
> - Fedora 28 x86_64
> - kernel-4.17.7-200
> - btrfs-progs-4.15.1-1
> 
> The command 'btrfs subvolume sync -s 2 xyz' hangs in this case:
> 
> - Run command 'btrfs subvolume sync -s 2 xyz' .
> - After some time the kernel reports an error on the filesystem.
>   (error that existed before the command was launched.)
> - The filesystem goes in read-only mode.
> - The command hangs.

Can you provide the output of 'dmesg' and the contents of
/proc/<pid>/stack, where <pid> is the pid of the btrfs command process?

-Jeff
-- 
Jeff Mahoney
SUSE Labs






Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-09 Thread Tomasz Pala
On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:

> 2) Different limitations on exclusive/shared bytes
>Btrfs can set different limit on exclusive/shared bytes, further
>complicating the problem.
> 
> 3) Btrfs quota only accounts data/metadata used by the subvolume
>It lacks all the shared trees (mentioned below), and in fact such
>shared tree can be pretty large (especially for extent tree and csum
>tree).

I'm not sure about the implications, but just to clarify some things:

when limiting somebody's data space we usually don't care about the
underlying "savings" coming from any deduplicating technique - these are
purely bonuses for system owner, so he could do larger resource overbooking.

So - the limit set on any user should enforce maximum and absolute space
he has allocated, including the shared stuff. I could even imagine that
creating a snapshot might immediately "eat" the available quota. In a
way, that quota returned matches (give or take) `du` reported usage,
unless "do not account reflinks within a single qgroup" was easy to implement.

I.e.: every shared segment should be accounted within quota (at least once).

And the numbers accounted should reflect the uncompressed sizes.


Moreover - if there would be per-subvolume RAID levels someday, the data
should be accounted in relation to the "default" (filesystem) RAID level,
i.e. having a RAID0 subvolume on RAID1 fs should account half of the
data, and twice the data in an opposite scenario (like "dup" profile on
single-drive filesystem).


In short: values representing quotas are user-oriented ("the numbers one
bought"), not storage-oriented ("the numbers they actually occupy").

-- 
Tomasz Pala 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMAND HANGS] The command 'btrfs subvolume sync -s 2 xyz' can hang.

2018-08-09 Thread Giuseppe Della Bianca
Hi.

My system: 
- Fedora 28 x86_64
- kernel-4.17.7-200
- btrfs-progs-4.15.1-1

The command 'btrfs subvolume sync -s 2 xyz' hangs in this case:

- Run command 'btrfs subvolume sync -s 2 xyz' .
- After some time the kernel reports an error on the filesystem.
  (error that existed before the command was launched.)
- The filesystem goes in read-only mode.
- The command hangs.


Best regards.

gdb





Re: btrfs-convert missing in btrfs-tools v4.15.1

2018-08-09 Thread David Sterba
On Sat, Jul 28, 2018 at 05:34:49PM -0400, Nicholas D Steeves wrote:
> On 28 July 2018 at 16:50, jkexcel  wrote:
> > I'm an end user trying to use btrfs-convert but when I installed
> > btrfs-tools and its dependency btrfs-progs on kubuntu 18.04, the
> > installation was successful, and it shows that v4.15.1-1build1 was
> > installed.
> >
> > However, when using the command  # brtfs-convert  /dev/sda4 (with the
> > drive unmounted) the resulting error appears "command not found"
> > I also tried command "btrfs convert" in case this was folded into the
> > main tool, but this also failed.
> >
> > 1. Is btrfs-convert still available?
> >
> > 2. Where can I find it?
> >
> > 3. Has btrfs-convert been replaced? what is it's new name?
> >
> > 4. Is it safe to use a downgraded version of btrfs-tools ie: 4.14 or
> > older?
> 
> You can blame me for that.  In Debian several users had reported
> release-critical issues in btrfs-convert, so I submitted a patch to
> disable it for the forseable future, eg:
> 
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=864798
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=854489

The reports are against version 4.7.3, released a year before those bug
reports were filed. The fix for the reported bug landed in 4.10, half a
year before the reports.

> Also, please consider the official status "As of 4.0 kernels this feature
> is not often used or well tested anymore, and there have been some reports
> that the conversion doesn't work reliably. Feel free to try it out, but
> make sure you have backups" (
> https://btrfs.wiki.kernel.org/index.php/Conversion_from_Ext3 ).

Sorry that you take it as official status. The wiki is open to edits, and
such claims appear there from time to time. I've removed it; it had been
there since 2015, when it possibly was accurate, but it isn't anymore.

> I'm happy to hear it is still disabled in Ubuntu, where many more
> users would be affected.  IIRC OpenSUSE LEAP and SLED 15 reenabled it
> (it was previously disabled there), so maybe it needs specific kernel
> versions or patches to not trigger RC bugs, and/or very specific
> btrfs-progs versions, and/or very specific e2fslibs, and/or specific
> combinations?  While I very much look forward to the day when
> btrfs-convert can be relied on in Debian, I don't think we're there
> yet.

So if there's no way to update the package to a newer version, and nobody
backports fixes, then it's better not to ship the tool. There's nothing
close to 'specific version of X' beyond using more up-to-date versions.
Alternatively, there could have been a separate package with the convert
tool, carrying a warning about the known issues.

It's in my interest to ship all tools in distros, but there's only so much
the upstream community can do. If you're going to reconsider the status of
btrfs-convert in Debian, please let me know.


Re: [PATCH v2] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Qu Wenruo


On 8/9/18 5:17 PM, Filipe Manana wrote:
> On Thu, Aug 9, 2018 at 8:08 AM, Qu Wenruo  wrote:
>> [BUG]
>> In the following case, rescan won't zero out the number of qgroup 1/0:
>> --
>> $ mkfs.btrfs -fq $DEV
>> $ mount $DEV /mnt
>>
>> $ btrfs quota enable /mnt
>> $ btrfs qgroup create 1/0 /mnt
>> $ btrfs sub create /mnt/sub
>> $ btrfs qgroup assign 0/257 1/0 /mnt
>>
>> $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
>> $ btrfs sub snap /mnt/sub /mnt/snap
>> $ btrfs quota rescan -w /mnt
>> $ btrfs qgroup show -pcre /mnt
>> qgroupid         rfer         excl max_rfer max_excl parent  child
>> --------         ----         ---- -------- -------- ------  -----
>> 0/5          16.00KiB     16.00KiB     none     none ---     ---
>> 0/257      1016.00KiB     16.00KiB     none     none 1/0     ---
>> 0/258      1016.00KiB     16.00KiB     none     none ---     ---
>> 1/0        1016.00KiB     16.00KiB     none     none ---     0/257
>>
>> so far so good, but:
>>
>> $ btrfs qgroup remove 0/257 1/0 /mnt
>> WARNING: quotas may be inconsistent, rescan needed
>> $ btrfs quota rescan -w /mnt
>> $ btrfs qgroup show -pcre  /mnt
>> qgroupid         rfer         excl max_rfer max_excl parent  child
>> --------         ----         ---- -------- -------- ------  -----
>> 0/5          16.00KiB     16.00KiB     none     none ---     ---
>> 0/257      1016.00KiB     16.00KiB     none     none ---     ---
>> 0/258      1016.00KiB     16.00KiB     none     none ---     ---
>> 1/0        1016.00KiB     16.00KiB     none     none ---     ---
>>            ^^^^^^^^^^ not cleared
>> --
>>
>> [CAUSE]
>> Before rescan we call qgroup_rescan_zero_tracking() to zero out all
>> qgroups' accounting numbers.
>>
>> However we don't mark all qgroups dirty, but rely on rescan to mark
>> qgroups dirty.
>>
>> If we have any high level qgroup but without any child (orphan group),
> 
> That sentence is confusing. An orphan, by definition [1], is someone
> (or something in this case) without parents.
> But you mention a group without children, so that should be named
> "childless" or simply say "without children".

Makes sense, I'll correct the commit message in next version.

Thanks,
Qu

> So one part of the sentence is wrong, either what is in parenthesis or
> what comes before them.
> 
> [1] https://www.thefreedictionary.com/orphan
> 
>> it
>> won't be marked dirty during rescan, since we can not reach that qgroup.
>>
>> This will cause QGROUP_INFO items of orphan qgroups never get updated in
>> quota tree, thus their numbers will stay the same in "btrfs qgroup show"
>> output.
>>
>> [FIX]
>> Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even we
>> have orphan qgroups their QGROUP_INFO items will still get updated during
>> rescan.
>>
>> Reported-by: Misono Tomohiro 
>> Signed-off-by: Qu Wenruo 
>> ---
>> changelog:
>> v2:
>>   Fix some grammar errors in commit message.
>> ---
>>  fs/btrfs/qgroup.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 48c1c3e7baf3..5a5372b33d96 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
>> qgroup->rfer_cmpr = 0;
>> qgroup->excl = 0;
>> qgroup->excl_cmpr = 0;
>> +   qgroup_dirty(fs_info, qgroup);
>> }
>> spin_unlock(&fs_info->qgroup_lock);
>>  }
>> --
>> 2.18.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 





Re: [PATCH] fstests: btrfs: Add test for corrupted orphan qgroup numbers

2018-08-09 Thread Filipe Manana
On Thu, Aug 9, 2018 at 8:45 AM, Qu Wenruo  wrote:
> This bug is exposed by populating a high level qgroup, and then make it
> orphan (high level qgroup without child)

Same comment as in the kernel patch:

"That sentence is confusing. An orphan, by definition [1], is someone
(or something in this case) without parents.
But you mention a group without children, so that should be named
"childless" or simply say "without children".
So one part of the sentence is wrong, either what is in parenthesis or
what comes before them.

[1] https://www.thefreedictionary.com/orphan
"

> with old qgroup numbers, and
> finally do rescan.
>
> Normally rescan should zero out all qgroups' accounting number, but due
> to a kernel bug which won't mark orphan qgroups dirty, their on-disk
> data is not updated, thus old numbers remain and cause qgroup
> corruption.
>
> Fixed by the following kernel patch:
> "btrfs: qgroup: Dirty all qgroups before rescan"
>
> Reported-by: Misono Tomohiro 
> Signed-off-by: Qu Wenruo 
> ---
>  tests/btrfs/170 | 82 +
>  tests/btrfs/170.out |  3 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 86 insertions(+)
>  create mode 100755 tests/btrfs/170
>  create mode 100644 tests/btrfs/170.out
>
> diff --git a/tests/btrfs/170 b/tests/btrfs/170
> new file mode 100755
> index ..bcf8b5c0e4f3
> --- /dev/null
> +++ b/tests/btrfs/170
> @@ -0,0 +1,82 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2018 SUSE Linux Products GmbH.  All Rights Reserved.
> +#
> +# FS QA Test 170
> +#
> +# Test if btrfs can clear orphan (high level qgroup without child) qgroup's
> +# accounting numbers during rescan.
> +# Fixed by the following kernel patch:
> +# "btrfs: qgroup: Dirty all qgroups before rescan"
> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +
> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +
> +
> +# Populate the fs
> +_run_btrfs_util_prog subvolume create "$SCRATCH_MNT/subvol"
> +_pwrite_byte 0xcdcd 0 1M "$SCRATCH_MNT/subvol/file1" | _filter_xfs_io > /dev/null
> +
> +# Ensure that file reach disk, so it will also appear in snapshot

# Ensure that buffered file data is persisted, so we won't have an
empty file in the snapshot.
> +sync
> +_run_btrfs_util_prog subvolume snapshot "$SCRATCH_MNT/subvol" "$SCRATCH_MNT/snapshot"
> +
> +
> +_run_btrfs_util_prog quota enable "$SCRATCH_MNT"
> +_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
> +
> +# Create high level qgroup
> +_run_btrfs_util_prog qgroup create 1/0 "$SCRATCH_MNT"
> +
> +# Don't use _run_btrfs_util_prog here, as it can return 1 to info user
> +# that qgroup is marked inconsistent, this is a bug in btrfs-progs, but
> +# to ensure it will work, we just ignore the return value.

Comment should go away IMHO. The preferred way is to call
$BTRFS_UTIL_PROG and have failures noticed
through differences in the golden output. There's no point in
mentioning something that currently doesn't work
if it's not used here.

> +$BTRFS_UTIL_PROG qgroup assign "$SCRATCH_MNT/snapshot" 1/0 "$SCRATCH_MNT"
> +
> +# Above assign will mark qgroup inconsistent due to the shared extents

assign -> assignment

> +# between subvol/snapshot/high level qgroup, do rescan here
> +_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"

Use $BTRFS_UTIL_PROG directly instead, and adjust the golden output if needed.

> +
> +# Now remove the qgroup relationship and make 1/0 orphan
> +# Due to the shared extent outside of 1/0, we will mark qgroup inconsistent
> +# and keep the number of qgroup 1/0

Missing "." at the end of the sentences.

> +$BTRFS_UTIL_PROG qgroup remove "$SCRATCH_MNT/snapshot" 1/0 "$SCRATCH_MNT"
> +
> +# Above removal also marks qgroup inconsistent, rescan again
> +_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"

Use $BTRFS_UTIL_PROG directly instead, and adjust the golden output if needed.

Thanks.

> +
> +# After the test, btrfs check will verify qgroup numbers to catch any
> +# corruption.
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/170.out b/tests/btrfs/170.out
> new file mode 100644
> index ..9002199e48ed
> --- /dev/null
> +++ b/tests/btrfs/170.out
> @@ -0,0 +1,3 @@
> +QA output created by 170
> +WARNING: quotas may be inconsistent, rescan needed
> +WARNING: quotas may be inconsistent, rescan needed
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index b616c73d09bf..339c977135c0 100644
> --- 

Re: [PATCH v2] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Filipe Manana
On Thu, Aug 9, 2018 at 8:08 AM, Qu Wenruo  wrote:
> [BUG]
> In the following case, rescan won't zero out the number of qgroup 1/0:
> --
> $ mkfs.btrfs -fq $DEV
> $ mount $DEV /mnt
>
> $ btrfs quota enable /mnt
> $ btrfs qgroup create 1/0 /mnt
> $ btrfs sub create /mnt/sub
> $ btrfs qgroup assign 0/257 1/0 /mnt
>
> $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
> $ btrfs sub snap /mnt/sub /mnt/snap
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre /mnt
> qgroupid rfer       excl     max_rfer max_excl parent child
> -------- ---------- -------- -------- -------- ------ -----
> 0/5      16.00KiB   16.00KiB none     none     ---    ---
> 0/257    1016.00KiB 16.00KiB none     none     1/0    ---
> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
> 1/0      1016.00KiB 16.00KiB none     none     ---    0/257
>
> so far so good, but:
>
> $ btrfs qgroup remove 0/257 1/0 /mnt
> WARNING: quotas may be inconsistent, rescan needed
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre  /mnt
> qgroupid rfer       excl     max_rfer max_excl parent child
> -------- ---------- -------- -------- -------- ------ -----
> 0/5      16.00KiB   16.00KiB none     none     ---    ---
> 0/257    1016.00KiB 16.00KiB none     none     ---    ---
> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
> 1/0      1016.00KiB 16.00KiB none     none     ---    ---
>          ^^  not cleared
> --
>
> [CAUSE]
> Before rescan we call qgroup_rescan_zero_tracking() to zero out all
> qgroups' accounting numbers.
>
> However we don't mark all qgroups dirty, but rely on rescan to mark
> qgroups dirty.
>
> If we have any high level qgroup but without any child (orphan group),

That sentence is confusing. An orphan, by definition [1], is someone
(or something in this case) without parents.
But you mention a group without children, so that should be named
"childless" or simply say "without children".
So one part of the sentence is wrong, either what is in parenthesis or
what comes before them.

[1] https://www.thefreedictionary.com/orphan

> it
> won't be marked dirty during rescan, since we can not reach that qgroup.
>
> This will cause the QGROUP_INFO items of orphan qgroups to never get updated
> in the quota tree, thus their numbers will stay the same in "btrfs qgroup show"
> output.
>
> [FIX]
> Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if we
> have orphan qgroups, their QGROUP_INFO items will still get updated during
> rescan.
>
> Reported-by: Misono Tomohiro 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Fix some grammar errors in commit message.
> ---
>  fs/btrfs/qgroup.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 48c1c3e7baf3..5a5372b33d96 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
> qgroup->rfer_cmpr = 0;
> qgroup->excl = 0;
> qgroup->excl_cmpr = 0;
> +   qgroup_dirty(fs_info, qgroup);
> }
> spin_unlock(&fs_info->qgroup_lock);
>  }
> --
> 2.18.0
>



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


Re: [PATCH v2 00/20] btrfs-progs: Rework of "subvolume list/show" and relax the root privileges of them

2018-08-09 Thread Misono Tomohiro
On 2018/08/03 22:46, David Sterba wrote:
> On Wed, Jul 04, 2018 at 05:14:59PM +0900, Misono Tomohiro wrote:
>> Gentle ping, as this is related to the new ioctls merged in 4.18-rc1.
> 
> Due to me spending more time than expected on kernel, this patchset will
> be merged partially or the 4.18 will be delayed.

github version is rebased to 4.17.1:
  https://github.com/t-msn/btrfs-progs/tree/rework-sub-list

I will take a vacation from tomorrow and my response may be delayed.

Regards,
Misono



[PATCH] fstests: btrfs: Add test for corrupted orphan qgroup numbers

2018-08-09 Thread Qu Wenruo
This bug is exposed by populating a high level qgroup, and then make it
orphan (high level qgroup without child) with old qgroup numbers, and
finally do rescan.

Normally rescan should zero out all qgroups' accounting number, but due
to a kernel bug which won't mark orphan qgroups dirty, their on-disk
data is not updated, thus old numbers remain and cause qgroup
corruption.

Fixed by the following kernel patch:
"btrfs: qgroup: Dirty all qgroups before rescan"

Reported-by: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
---
 tests/btrfs/170 | 82 +
 tests/btrfs/170.out |  3 ++
 tests/btrfs/group   |  1 +
 3 files changed, 86 insertions(+)
 create mode 100755 tests/btrfs/170
 create mode 100644 tests/btrfs/170.out

diff --git a/tests/btrfs/170 b/tests/btrfs/170
new file mode 100755
index ..bcf8b5c0e4f3
--- /dev/null
+++ b/tests/btrfs/170
@@ -0,0 +1,82 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2018 SUSE Linux Products GmbH.  All Rights Reserved.
+#
+# FS QA Test 170
+#
+# Test if btrfs can clear orphan (high level qgroup without child) qgroup's
+# accounting numbers during rescan.
+# Fixed by the following kernel patch:
+# "btrfs: qgroup: Dirty all qgroups before rescan"
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+
+# Populate the fs
+_run_btrfs_util_prog subvolume create "$SCRATCH_MNT/subvol"
+_pwrite_byte 0xcdcd 0 1M "$SCRATCH_MNT/subvol/file1" | _filter_xfs_io > /dev/null
+
+# Ensure that file reach disk, so it will also appear in snapshot
+sync
+_run_btrfs_util_prog subvolume snapshot "$SCRATCH_MNT/subvol" "$SCRATCH_MNT/snapshot"
+
+
+_run_btrfs_util_prog quota enable "$SCRATCH_MNT"
+_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
+
+# Create high level qgroup
+_run_btrfs_util_prog qgroup create 1/0 "$SCRATCH_MNT"
+
+# Don't use _run_btrfs_util_prog here, as it can return 1 to info user
+# that qgroup is marked inconsistent, this is a bug in btrfs-progs, but
+# to ensure it will work, we just ignore the return value.
+$BTRFS_UTIL_PROG qgroup assign "$SCRATCH_MNT/snapshot" 1/0 "$SCRATCH_MNT"
+
+# Above assign will mark qgroup inconsistent due to the shared extents
+# between subvol/snapshot/high level qgroup, do rescan here
+_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
+
+# Now remove the qgroup relationship and make 1/0 orphan
+# Due to the shared extent outside of 1/0, we will mark qgroup inconsistent
+# and keep the number of qgroup 1/0
+$BTRFS_UTIL_PROG qgroup remove "$SCRATCH_MNT/snapshot" 1/0 "$SCRATCH_MNT"
+
+# Above removal also marks qgroup inconsistent, rescan again
+_run_btrfs_util_prog quota rescan -w "$SCRATCH_MNT"
+
+# After the test, btrfs check will verify qgroup numbers to catch any
+# corruption.
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/170.out b/tests/btrfs/170.out
new file mode 100644
index ..9002199e48ed
--- /dev/null
+++ b/tests/btrfs/170.out
@@ -0,0 +1,3 @@
+QA output created by 170
+WARNING: quotas may be inconsistent, rescan needed
+WARNING: quotas may be inconsistent, rescan needed
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b616c73d09bf..339c977135c0 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -172,3 +172,4 @@
 167 auto quick replace volume
 168 auto quick send
 169 auto quick send
+170 auto quick qgroup
-- 
2.18.0



Re: [PATCH v2] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Misono Tomohiro
On 2018/08/09 16:08, Qu Wenruo wrote:
> [BUG]
> In the following case, rescan won't zero out the number of qgroup 1/0:
> --
> $ mkfs.btrfs -fq $DEV
> $ mount $DEV /mnt
> 
> $ btrfs quota enable /mnt
> $ btrfs qgroup create 1/0 /mnt
> $ btrfs sub create /mnt/sub
> $ btrfs qgroup assign 0/257 1/0 /mnt
> 
> $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
> $ btrfs sub snap /mnt/sub /mnt/snap
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre /mnt
> qgroupid rfer       excl     max_rfer max_excl parent child
> -------- ---------- -------- -------- -------- ------ -----
> 0/5      16.00KiB   16.00KiB none     none     ---    ---
> 0/257    1016.00KiB 16.00KiB none     none     1/0    ---
> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
> 1/0      1016.00KiB 16.00KiB none     none     ---    0/257
> 
> so far so good, but:
> 
> $ btrfs qgroup remove 0/257 1/0 /mnt
> WARNING: quotas may be inconsistent, rescan needed
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre  /mnt
> qgroupid rfer       excl     max_rfer max_excl parent child
> -------- ---------- -------- -------- -------- ------ -----
> 0/5      16.00KiB   16.00KiB none     none     ---    ---
> 0/257    1016.00KiB 16.00KiB none     none     ---    ---
> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
> 1/0      1016.00KiB 16.00KiB none     none     ---    ---
>          ^^  not cleared
> --
> 
> [CAUSE]
> Before rescan we call qgroup_rescan_zero_tracking() to zero out all
> qgroups' accounting numbers.
> 
> However we don't mark all qgroups dirty, but rely on rescan to mark
> qgroups dirty.
> 
> If we have any high level qgroup but without any child (orphan group), it
> won't be marked dirty during rescan, since we can not reach that qgroup.
> 
> This will cause the QGROUP_INFO items of orphan qgroups to never get updated
> in the quota tree, thus their numbers will stay the same in "btrfs qgroup show"
> output.
> 
> [FIX]
> Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if we
> have orphan qgroups, their QGROUP_INFO items will still get updated during
> rescan.
> 
> Reported-by: Misono Tomohiro 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Fix some grammar errors in commit message.
> ---
>  fs/btrfs/qgroup.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 48c1c3e7baf3..5a5372b33d96 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
>   qgroup->rfer_cmpr = 0;
>   qgroup->excl = 0;
>   qgroup->excl_cmpr = 0;
> + qgroup_dirty(fs_info, qgroup);
>   }
> spin_unlock(&fs_info->qgroup_lock);
>  }
> 
Yes, this and previous patch
  [PATCH] btrfs: qgroup: Don't populating excl numbers for snapshot src if it belongs to other qgroups
resolves the problem I see.

So, this will reset all the qgroup items not rescanned in the first transaction
commit by btrfs_run_qgroups(), but I don't think it is a problem.

Tested-by/Reviewed-by: Misono Tomohiro 

Thanks,
Misono



Re: recover broken partition on external HDD

2018-08-09 Thread Marijn Stollenga
Thanks for the reply!

Indeed I should back up more properly, that was actually what I was in
the process of doing but yeah. I'll check out the pointers, and I
guess I'll just read the papers describing the whole btrfs system to
see how it works.
I would like to make an automatic scrubbing application that you can
just point to a block device where partitions are not even correct and
it tries to find as many files as possible.
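For a first pass at such a scanner, the fixed locations of the btrfs superblock copies are a useful starting point: mirrors live at 64KiB, 64MiB, and 256GiB from the start of the filesystem, each carrying the magic "_BHRfS_M" at byte offset 0x40. A minimal probe sketch (editorial, not from this thread; assumes GNU coreutils and a readable device or image path):

```shell
# Probe the three btrfs superblock mirror offsets on a device or image file.
# The 8-byte magic "_BHRfS_M" sits at byte 0x40 inside each superblock copy.
check_btrfs_super() {
    dev=$1
    for off in 65536 67108864 274877906944; do
        magic=$(dd if="$dev" bs=1 skip=$((off + 64)) count=8 2>/dev/null)
        if [ "$magic" = "_BHRfS_M" ]; then
            echo "superblock found at offset $off"
        else
            echo "no superblock at offset $off"
        fi
    done
}
```

If the filesystem did not start at the beginning of the device (e.g. it sat inside a partition), the probe has to be repeated at each candidate partition start; once a good copy is located, `btrfs rescue super-recover` can restore it over a damaged primary.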



On 7 August 2018 at 00:40, Duncan <1i5t5.dun...@cox.net> wrote:
> Marijn Stollenga posted on Sat, 04 Aug 2018 12:14:44 +0200 as excerpted:
>
>> Hello btrfs experts, I need your help trying to recover an external HDD.
>> I accidentally created a zfs partition on my external HD, which of
>> course screwed up the whole partition. I quickly unplugged it and, it being
>> a 1TB drive, I assume there is still data on it.
>
> Just a user and list regular here, not a dev, so my help will be somewhat
> limited, but as I've seen no other replies yet, perhaps it's better than
> nothing...
>
>> With some great help on IRC I searched for tags using grep and found
>> many positions:
>> https://paste.ee/p/xzL5x
>>
>> Now I would like to scan all these positions for their information and
>> somehow piece it together, I know there is supposed to be a superblock
>> around 256GB but I'm not sure where the partition started (the search
>> was run from a manually created partition starting at 1MB).
>
> There's a mention of the three superblock copies and their addresses in
> the problem FAQ (wrapped link due to posting-client required hoops I
> don't want to jump thru to post it properly ATM, unwrap manually):
>
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#What_if_I_don.
> 27t_have_wipefs_at_hand.3F
>
>> In general I would be happy if someone can point me to a library that
>> can do low level reading and piecing together of these pieces of meta
>> information and see what is left.
>
> There are multiple libraries in various states available, but being more
> a sysadmin than a dev I'd consume them as dependencies of whatever app I
> was installing that required them, so I've not followed the details.
> However, here's a bit of what I found just now with a quick look:
>
> The project ideas FAQ on the wiki has a (somewhat outdated) library entry
> (wrapped link...):
>
> https://btrfs.wiki.kernel.org/index.php/
> Project_ideas#Provide_a_library_covering_.27btrfs.27_functionality
>
> That provides further links to a couple python projects as well as a
> haskell lib.
>
> But I added the "somewhat dated" parenthetical due to libbtrfsutil by Omar
> Sandoval, which appeared in btrfs-progs 4.16.  So there's now an official
> library. =:^)  Tho not being a dev I've not the foggiest whether it'll
> provide the functionality you're after.
>
> I also see a rust lib mentioned on-list (Oct 2016).
>
> https://gitlab.wellbehavedsoftware.com/well-behaved-software/rust-btrfs
>
>> I know there is btrfs-check etc. but these need the superblock to be
>> known. Also on another messed up drive (I screwed up two btrfs drives in
>> the same way at the same time) I was able to find the third superblock,
>> but it seems they in the end pointed to other parts in the file system
>> in the beginning of the drive which were broken.
>
> OK, this may seem like rubbing salt in the wound ATM, but there's a
> reason they did that back in the day before modern disinfectants, it
> helped stop infection before it started.  Likewise, the following policy
> should help avoid the problem in the first place.
>
> A sysadmin's first rule of data value and backups is that the real value
> placed on data isn't defined by arbitrary claims, but rather by the
> number and quality of backups those who control that data find it
> worthwhile to make of it.  If it's worth a lot, there will be multiple
> backups, likely stored in multiple locations, some offsite in ordered to
> avoid loss in the event of fire/flood/bombing/etc.  Only data that's of
> trivial value, less than that of the time/trouble/resources necessary to
> do that backup, will have no backup at all.
>
> (Of course, age of backups is simply a sub-case of the above, since in
> that case the data in question is simply the data in the delta between
> the last backup and the current working state.  By definition, as soon as
> it is considered worth more than the time/trouble/resources necessary to
> update the backup, an updated or full new backup will be made.)
>
> (The second rule of backups is that it's not a backup until it has been
> tested to actually be usable under conditions similar to those in which
> the backup would actually be needed.  In many cases that'll mean booting
> to rescue media and ensuring they can access and restore the backup from
> there using only the resources available from that rescue media.  In
> other cases it'll mean booting directly to the backup and ensuring that
> normal operations can resume from there.  Etc.  And if it hasn't been
> tested yet, it's 

Re: [PATCH] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Qu Wenruo



On 8/9/18 3:05 PM, Nikolay Borisov wrote:
> 
> 
> On  9.08.2018 10:03, Qu Wenruo wrote:
>> [BUG]
>> In the following case, rescan won't zero out the number of qgroup 1/0:
>> --
>> $ mkfs.btrfs -fq $DEV
>> $ mount $DEV /mnt
>>
>> $ btrfs quota enable /mnt
>> $ btrfs qgroup create 1/0 /mnt
>> $ btrfs sub create /mnt/sub
>> $ btrfs qgroup assign 0/257 1/0 /mnt
>>
>> $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
>> $ btrfs sub snap /mnt/sub /mnt/snap
>> $ btrfs quota rescan -w /mnt
>> $ btrfs qgroup show -pcre /mnt
>> qgroupid rfer       excl     max_rfer max_excl parent child
>> -------- ---------- -------- -------- -------- ------ -----
>> 0/5      16.00KiB   16.00KiB none     none     ---    ---
>> 0/257    1016.00KiB 16.00KiB none     none     1/0    ---
>> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
>> 1/0      1016.00KiB 16.00KiB none     none     ---    0/257
>>
>> so far so good, but:
>>
>> $ btrfs qgroup remove 0/257 1/0 /mnt
>> WARNING: quotas may be inconsistent, rescan needed
>> $ btrfs quota rescan -w /mnt
>> $ btrfs qgroup show -pcre  /mnt
>> qgroupid rfer       excl     max_rfer max_excl parent child
>> -------- ---------- -------- -------- -------- ------ -----
>> 0/5      16.00KiB   16.00KiB none     none     ---    ---
>> 0/257    1016.00KiB 16.00KiB none     none     ---    ---
>> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
>> 1/0      1016.00KiB 16.00KiB none     none     ---    ---
>>          ^^  not cleared
>> --
>>
>> [CAUSE]
>> Before rescan we call qgroup_rescan_zero_tracking() to zero out all
>> qgroups' accounting numbers.
>>
>> However we don't mark all these qgroups dirty, but rely on rescan to
>> mark qgroup dirty.
>>
>> If we have high level qgroup but without any child (orphan group), it
>> won't be marked dirty during rescan, since we can not reach that qgroup.
>>
>> This will cause QGROUP_INFO item of orphan qgroups never updated in
>> quota tree, thus its number will stay the same in "btrfs qgroup show"
>> output.
>>
>> [FIX]
>> Just mark all qgroup dirty in qgroup_rescan_zero_tracking(), so even we
>> have orphan qgroups their QGROUP_INFO item will still get updated during
>> rescan.
>>
>> Reported-by: Misono Tomohiro 
>> Signed-off-by: Qu Wenruo 
> 
> DOesn't this warrant an xfstests as well ?

Just crafting. :)

Thanks,
Qu

>> ---
>>  fs/btrfs/qgroup.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 48c1c3e7baf3..5a5372b33d96 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
>>  qgroup->rfer_cmpr = 0;
>>  qgroup->excl = 0;
>>  qgroup->excl_cmpr = 0;
>> +qgroup_dirty(fs_info, qgroup);
>>  }
>>  spin_unlock(&fs_info->qgroup_lock);
>>  }
>>


[PATCH v2] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Qu Wenruo
[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:
--
$ mkfs.btrfs -fq $DEV
$ mount $DEV /mnt

$ btrfs quota enable /mnt
$ btrfs qgroup create 1/0 /mnt
$ btrfs sub create /mnt/sub
$ btrfs qgroup assign 0/257 1/0 /mnt

$ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
$ btrfs sub snap /mnt/sub /mnt/snap
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid rfer       excl     max_rfer max_excl parent child
-------- ---------- -------- -------- -------- ------ -----
0/5      16.00KiB   16.00KiB none     none     ---    ---
0/257    1016.00KiB 16.00KiB none     none     1/0    ---
0/258    1016.00KiB 16.00KiB none     none     ---    ---
1/0      1016.00KiB 16.00KiB none     none     ---    0/257

so far so good, but:

$ btrfs qgroup remove 0/257 1/0 /mnt
WARNING: quotas may be inconsistent, rescan needed
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre  /mnt
qgroupid rfer       excl     max_rfer max_excl parent child
-------- ---------- -------- -------- -------- ------ -----
0/5      16.00KiB   16.00KiB none     none     ---    ---
0/257    1016.00KiB 16.00KiB none     none     ---    ---
0/258    1016.00KiB 16.00KiB none     none     ---    ---
1/0      1016.00KiB 16.00KiB none     none     ---    ---
         ^^  not cleared
--

[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.

However we don't mark all qgroups dirty, but rely on rescan to mark
qgroups dirty.

If we have any high level qgroup but without any child (orphan group), it
won't be marked dirty during rescan, since we can not reach that qgroup.

This will cause the QGROUP_INFO items of orphan qgroups to never get updated
in the quota tree, thus their numbers will stay the same in "btrfs qgroup show"
output.

[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even if we
have orphan qgroups, their QGROUP_INFO items will still get updated during
rescan.

Reported-by: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Fix some grammar errors in commit message.
---
 fs/btrfs/qgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48c1c3e7baf3..5a5372b33d96 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
qgroup->rfer_cmpr = 0;
qgroup->excl = 0;
qgroup->excl_cmpr = 0;
+   qgroup_dirty(fs_info, qgroup);
}
spin_unlock(&fs_info->qgroup_lock);
 }
-- 
2.18.0

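As a quick way to spot groups affected by this bug in the field, one can list the childless ("orphan") high-level qgroups from `btrfs qgroup show -pcre` output; with an unpatched kernel these are exactly the groups whose numbers a rescan fails to reset. A small filter sketch (editorial, not part of the patch; it assumes the column layout shown in the commit message above, with the child column last):

```shell
# Print level > 0 qgroups whose child column is "---", i.e. childless
# high-level groups that a pre-fix rescan never marks dirty.
# Reads `btrfs qgroup show -pcre` output on stdin; skips the two header lines.
list_orphan_high_qgroups() {
    awk 'NR > 2 && $1 ~ /^[1-9][0-9]*\// && $NF == "---" { print $1 }'
}
```

Feeding it the post-removal table from the commit message prints `1/0`, the qgroup whose stale numbers survive the rescan.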


[PATCH v5] btrfs: qgroup: Remove qgroup items along with subvolume deletion

2018-08-09 Thread Misono Tomohiro
When qgroup is on, subvolume deletion does not remove qgroup items
of the subvolume (qgroup info, limit, relation) from quota tree and
they need to get removed manually by "btrfs qgroup destroy".

Since a level-0 qgroup cannot be used/inherited by any other subvolume,
let's remove them automatically when subvolume is deleted
(to be precise, when the subvolume root is dropped).

Signed-off-by: Misono Tomohiro 
---
v4 -> v5:
  Commit current transaction before calling btrfs_remove_qgroup() to
  keep qgroup consistency in all cases. This resolves the concern in
  the v4 patch and there should be no downside in this patch.

 fs/btrfs/extent-tree.c | 45 +
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9e7b237b9547..ed052105e741 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
struct btrfs_root_item *root_item = &root->root_item;
struct walk_control *wc;
struct btrfs_key key;
+   u64 objectid = root->root_key.objectid;
int err = 0;
int ret;
int level;
bool root_dropped = false;
 
-   btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
+   btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
 
path = btrfs_alloc_path();
if (!path) {
@@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_end_trans;
}
 
-   if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
+   if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
ret = btrfs_find_root(tree_root, &root->root_key, path,
  NULL, NULL);
if (ret < 0) {
@@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 *
 * The most common failure here is just -ENOENT.
 */
-   btrfs_del_orphan_item(trans, tree_root,
- root->root_key.objectid);
+   btrfs_del_orphan_item(trans, tree_root, objectid);
}
}
 
@@ -9056,6 +9056,43 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
btrfs_put_fs_root(root);
}
root_dropped = true;
+
+   if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
+   /*
+* Remove level-0 qgroup items since no other subvolume can
+* use them.
+*
+* First, commit current transaction in order to make sure
+* this subvolume's excl == rfer == 0. Otherwise removing
+* qgroup relation causes qgroup inconsistency if excl != rfer.
+*/
+   ret = btrfs_commit_transaction(trans);
+   if (ret)
+   goto out_free;
+
+   /* Start new transaction and remove qgroup items */
+   trans = btrfs_start_transaction(tree_root, 0);
+   if (IS_ERR(trans)) {
+   err = PTR_ERR(trans);
+   goto out_free;
+   }
+
+   ret = btrfs_remove_qgroup(trans, objectid);
+   if (ret == 1) {
+   /*
+* This means qgroup becomes inconsistent
+* (should not happen since we did transaction commit)
+*/
+   btrfs_warn(fs_info,
+   "qgroup inconsistency found, need qgroup rescan");
+   } else if (ret == -EINVAL || ret == -ENOENT) {
+   /* qgroup is already removed, just ignore this */
+   } else if (ret) {
+   btrfs_abort_transaction(trans, ret);
+   err = ret;
+   }
+   }
+
 out_end_trans:
btrfs_end_transaction_throttle(trans);
 out_free:
-- 
2.14.4



Re: [PATCH] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Nikolay Borisov



On  9.08.2018 10:03, Qu Wenruo wrote:
> [BUG]
> In the following case, rescan won't zero out the number of qgroup 1/0:
> --
> $ mkfs.btrfs -fq $DEV
> $ mount $DEV /mnt
> 
> $ btrfs quota enable /mnt
> $ btrfs qgroup create 1/0 /mnt
> $ btrfs sub create /mnt/sub
> $ btrfs qgroup assign 0/257 1/0 /mnt
> 
> $ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
> $ btrfs sub snap /mnt/sub /mnt/snap
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre /mnt
> qgroupid rfer       excl     max_rfer max_excl parent child
> -------- ---------- -------- -------- -------- ------ -----
> 0/5      16.00KiB   16.00KiB none     none     ---    ---
> 0/257    1016.00KiB 16.00KiB none     none     1/0    ---
> 0/258    1016.00KiB 16.00KiB none     none     ---    ---
> 1/0      1016.00KiB 16.00KiB none     none     ---    0/257
> 
> so far so good, but:
> 
> $ btrfs qgroup remove 0/257 1/0 /mnt
> WARNING: quotas may be inconsistent, rescan needed
> $ btrfs quota rescan -w /mnt
> $ btrfs qgroup show -pcre  /mnt
> qgroupid         rfer         excl     max_rfer     max_excl parent  child
> --------         ----         ----     --------     -------- ------  -----
> 0/5          16.00KiB     16.00KiB         none         none ---     ---
> 0/257      1016.00KiB     16.00KiB         none         none ---     ---
> 0/258      1016.00KiB     16.00KiB         none         none ---     ---
> 1/0        1016.00KiB     16.00KiB         none         none ---     ---
>            ^^^^^^^^^^   ^^^^^^^^  not cleared
> --
> 
> [CAUSE]
> Before rescan we call qgroup_rescan_zero_tracking() to zero out all
> qgroups' accounting numbers.
> 
> However, we don't mark all of these qgroups dirty; we rely on the
> rescan itself to mark them dirty.
>
> If we have a high-level qgroup without any child (an orphan qgroup),
> it won't be marked dirty during rescan, since we can't reach that
> qgroup.
>
> This causes the QGROUP_INFO items of orphan qgroups to never be
> updated in the quota tree, so their numbers stay the same in
> "btrfs qgroup show" output.
> 
> [FIX]
> Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even
> if we have orphan qgroups, their QGROUP_INFO items will still get
> updated during rescan.
> 
> Reported-by: Misono Tomohiro 
> Signed-off-by: Qu Wenruo 

Doesn't this warrant an xfstests case as well?
> ---
>  fs/btrfs/qgroup.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 48c1c3e7baf3..5a5372b33d96 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
>   qgroup->rfer_cmpr = 0;
>   qgroup->excl = 0;
>   qgroup->excl_cmpr = 0;
> + qgroup_dirty(fs_info, qgroup);
>   }
> 	spin_unlock(&fs_info->qgroup_lock);
>  }
> 


[PATCH] btrfs: qgroup: Dirty all qgroups before rescan

2018-08-09 Thread Qu Wenruo
[BUG]
In the following case, rescan won't zero out the number of qgroup 1/0:
--
$ mkfs.btrfs -fq $DEV
$ mount $DEV /mnt

$ btrfs quota enable /mnt
$ btrfs qgroup create 1/0 /mnt
$ btrfs sub create /mnt/sub
$ btrfs qgroup assign 0/257 1/0 /mnt

$ dd if=/dev/urandom of=/mnt/sub/file bs=1k count=1000
$ btrfs sub snap /mnt/sub /mnt/snap
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre /mnt
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257      1016.00KiB     16.00KiB         none         none 1/0     ---
0/258      1016.00KiB     16.00KiB         none         none ---     ---
1/0        1016.00KiB     16.00KiB         none         none ---     0/257

so far so good, but:

$ btrfs qgroup remove 0/257 1/0 /mnt
WARNING: quotas may be inconsistent, rescan needed
$ btrfs quota rescan -w /mnt
$ btrfs qgroup show -pcre  /mnt
qgroupid         rfer         excl     max_rfer     max_excl parent  child
--------         ----         ----     --------     -------- ------  -----
0/5          16.00KiB     16.00KiB         none         none ---     ---
0/257      1016.00KiB     16.00KiB         none         none ---     ---
0/258      1016.00KiB     16.00KiB         none         none ---     ---
1/0        1016.00KiB     16.00KiB         none         none ---     ---
           ^^^^^^^^^^   ^^^^^^^^  not cleared
--

[CAUSE]
Before rescan we call qgroup_rescan_zero_tracking() to zero out all
qgroups' accounting numbers.

However, we don't mark all of these qgroups dirty; we rely on the
rescan itself to mark them dirty.

If we have a high-level qgroup without any child (an orphan qgroup),
it won't be marked dirty during rescan, since we can't reach that
qgroup.

This causes the QGROUP_INFO items of orphan qgroups to never be
updated in the quota tree, so their numbers stay the same in
"btrfs qgroup show" output.

[FIX]
Just mark all qgroups dirty in qgroup_rescan_zero_tracking(), so even
if we have orphan qgroups, their QGROUP_INFO items will still get
updated during rescan.
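As an illustration of why the fix works, here is a toy model of the rescan
(plain Python with made-up names, not kernel code): rescan first zeroes every
qgroup, then walks upward from level-0 qgroups through parent relations, and
only qgroups visited by that walk get marked dirty and have their QGROUP_INFO
items rewritten. An orphan high-level qgroup is never visited, so its on-disk
item keeps the stale numbers; marking everything dirty in the zeroing step
closes the gap.

```python
# Toy model of qgroup rescan; all names are made up, this is not kernel code.

def rescan(qgroups, parents, level0_usage, dirty_all_on_zero):
    """qgroups: {id: rfer}; parents: {child: [parent ids]} (upward relations).
    Returns the set of qgroup ids whose items would be rewritten on disk."""
    dirty = set()
    for qg in qgroups:                         # qgroup_rescan_zero_tracking()
        qgroups[qg] = 0
        if dirty_all_on_zero:                  # the fix: qgroup_dirty() here
            dirty.add(qg)
    for leaf, usage in level0_usage.items():   # accounting walk
        stack = [leaf]
        while stack:
            qg = stack.pop()
            qgroups[qg] += usage
            dirty.add(qg)                      # only reachable qgroups get dirtied
            stack.extend(parents.get(qg, []))
    return dirty

# 1/0 used to be 0/257's parent, but the relation has been removed,
# so 1/0 is now an orphan carrying stale numbers.
stale = {"0/5": 16, "0/257": 1016, "1/0": 1016}
usage = {"0/5": 16, "0/257": 1016}

assert "1/0" not in rescan(dict(stale), {}, usage, dirty_all_on_zero=False)
assert "1/0" in rescan(dict(stale), {}, usage, dirty_all_on_zero=True)
```

Without the fix, 1/0 is zeroed in memory but never appears in the dirty set,
so its stale on-disk item survives; with the up-front dirtying, the zeroed
value is written out.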

Reported-by: Misono Tomohiro 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/qgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48c1c3e7baf3..5a5372b33d96 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2864,6 +2864,7 @@ qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info)
qgroup->rfer_cmpr = 0;
qgroup->excl = 0;
qgroup->excl_cmpr = 0;
+   qgroup_dirty(fs_info, qgroup);
}
	spin_unlock(&fs_info->qgroup_lock);
 }
-- 
2.18.0



Re: [PATCH v4] btrfs: qgroup: Remove qgroup items along with subvolume deletion

2018-08-09 Thread Misono Tomohiro
On 2018/08/09 15:14, Qu Wenruo wrote:
> 
> 
> On 8/9/18 2:05 PM, Misono Tomohiro wrote:
>> On 2018/08/09 14:47, Qu Wenruo wrote:
>>>
>>>
>>> On 8/9/18 12:12 PM, Misono Tomohiro wrote:
 When qgroup is on, subvolume deletion does not remove qgroup items
 of the subvolume (qgroup info, limit, relation) from quota tree and
 they need to get removed manually by "btrfs qgroup destroy".

 Since level 0 qgroup cannot be used/inherited by any other subvolume,
 let's remove them automatically when subvolume is deleted
 (to be precise, when the subvolume root is dropped).

 Note that qgroup becomes inconsistent in following case:
   1. qgroup relation exists
  2. and subvolume's excl != rfer
>>>
>>> That's a little strange.
>>>
>>> If a subvolume is completely dropped, its excl should be the same as its
>>> rfer, both 0, and removing its relationship should not mark the qgroup
>>> inconsistent.
>>>
>>> So the problem is the timing when btrfs_remove_qgroup() is called.
>>>
>>> Since qgroup accounting is only done at transaction commit time, and
>>> we're holding a trans handle, it's almost ensured this transaction
>>> can't be committed yet, thus the numbers are not updated (still not 0).
>>>
>>> So that's why qgroup is inconsistent.
>>>
>>> What about commit current transaction and then call btrfs_remove_qgroup()?
>>>
>>> (Sorry I didn't catch this problem last time I reviewed this patch)
>>
>> Well, I'm a little confused about the flow of transaction commit.
>> btrfs_drop_snapshot() is called from cleaner_kthread;
>> is it OK to commit a transaction in it?
> 
> I'm not completely clear on cleaner_kthread(), but from what I see in
> btrfs_drop_snapshot(), btrfs_end_transaction_throttle() itself could
> commit the current transaction.
> 
> So in theory we should be OK to finish all the original work of
> btrfs_drop_snapshot(), and then commit current transaction, and finally
> do the qgroup cleanup work.
> 
> But I could totally be wrong, and feel free to point what I'm missing.

Thank you very much for the explanation.
I changed the code to commit the transaction and it works,
so I hope the next version will solve all the problems.

Thanks,
Misono

> 
> Thanks,
> Qu
> 
>>
>>>
>>> Thanks,
>>> Qu
>>>
 In this case manual qgroup rescan is needed.

 Reviewed-by: Lu Fengqi 
 Reviewed-by: Qu Wenruo 
 Signed-off-by: Misono Tomohiro 
 ---
 Hi David,
 It turned out that this patch may cause qgroup inconsistency in the case
 described above and needs a manual rescan. Since the current code keeps
 qgroup items but does not break qgroup consistency when deleting a
 subvolume, I cannot clearly say which behavior is better for qgroup
 usability. Can I ask your opinion?

 v3 -> v4:
   Check return value of btrfs_remove_qgroup() and if it is 1,
   print message in syslog that fs needs qgroup rescan

  fs/btrfs/extent-tree.c | 22 ++
  1 file changed, 18 insertions(+), 4 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 9e7b237b9547..828d9e68047d 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
	struct btrfs_root_item *root_item = &root->root_item;
struct walk_control *wc;
struct btrfs_key key;
 +  u64 objectid = root->root_key.objectid;
int err = 0;
int ret;
int level;
bool root_dropped = false;
  
 -  btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
 +  btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
  
path = btrfs_alloc_path();
if (!path) {
 @@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_end_trans;
}
  
 -  if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
 +  if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
	ret = btrfs_find_root(tree_root, &root->root_key, path,
  NULL, NULL);
if (ret < 0) {
 @@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 *
 * The most common failure here is just -ENOENT.
 */
 -  btrfs_del_orphan_item(trans, tree_root,
 -root->root_key.objectid);
 +  btrfs_del_orphan_item(trans, tree_root, objectid);
}
}
  
 @@ -9056,6 +9056,20 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
btrfs_put_fs_root(root);
}
root_dropped = true;
 +
 +   /* Remove level-0 qgroup items since no other subvolume can use them */
 +  ret = btrfs_remove_qgroup(trans, objectid);
 +  if (ret == 1) {
 +  /* This means qgroup becomes inconsistent by removing items */
 +  btrfs_info(fs_info,

Re: [PATCH v4] btrfs: qgroup: Remove qgroup items along with subvolume deletion

2018-08-09 Thread Qu Wenruo


On 8/9/18 2:05 PM, Misono Tomohiro wrote:
> On 2018/08/09 14:47, Qu Wenruo wrote:
>>
>>
>> On 8/9/18 12:12 PM, Misono Tomohiro wrote:
>>> When qgroup is on, subvolume deletion does not remove qgroup items
>>> of the subvolume (qgroup info, limit, relation) from quota tree and
>>> they need to get removed manually by "btrfs qgroup destroy".
>>>
>>> Since level 0 qgroup cannot be used/inherited by any other subvolume,
>>> let's remove them automatically when subvolume is deleted
>>> (to be precise, when the subvolume root is dropped).
>>>
>>> Note that qgroup becomes inconsistent in following case:
>>>   1. qgroup relation exists
>>>   2. and subvolume's excl != rfer
>>
>> That's a little strange.
>>
>> If a subvolume is completely dropped, its excl should be the same as its
>> rfer, both 0, and removing its relationship should not mark the qgroup
>> inconsistent.
>>
>> So the problem is the timing when btrfs_remove_qgroup() is called.
>>
>> Since qgroup accounting is only done at transaction commit time, and
>> we're holding a trans handle, it's almost ensured this transaction
>> can't be committed yet, thus the numbers are not updated (still not 0).
>>
>> So that's why qgroup is inconsistent.
>>
>> What about commit current transaction and then call btrfs_remove_qgroup()?
>>
>> (Sorry I didn't catch this problem last time I reviewed this patch)
> 
> Well, I'm a little confused about the flow of transaction commit.
> btrfs_drop_snapshot() is called from cleaner_kthread;
> is it OK to commit a transaction in it?

I'm not completely clear on cleaner_kthread(), but from what I see in
btrfs_drop_snapshot(), btrfs_end_transaction_throttle() itself could
commit the current transaction.

So in theory we should be OK to finish all the original work of
btrfs_drop_snapshot(), and then commit current transaction, and finally
do the qgroup cleanup work.

But I could totally be wrong, and feel free to point what I'm missing.
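
The ordering issue under discussion can be sketched with a tiny model (plain
Python with hypothetical names, not the kernel code): qgroup numbers are only
updated from pending deltas at commit time, so removing the relation while the
trans handle is still held sees a nonzero excl and flags inconsistency,
whereas committing first brings excl to 0 and the removal stays clean.

```python
# Toy timeline of the remove-before-commit problem; hypothetical names only.

class Fs:
    def __init__(self):
        self.excl = {"0/257": 16}       # subvolume's stale exclusive bytes
        self.pending = {"0/257": -16}   # delta from dropping the subvolume
        self.relations = {"0/257": {"1/0"}}
        self.inconsistent = False

    def commit(self):
        # like transaction commit: fold pending deltas into the numbers
        for qg, delta in self.pending.items():
            self.excl[qg] += delta
        self.pending.clear()

    def remove_qgroup(self, qg):
        # like btrfs_remove_qgroup(): dropping a relation while the
        # numbers are still nonzero breaks consistency
        if self.relations.pop(qg, None) and self.excl[qg] != 0:
            self.inconsistent = True
        self.excl.pop(qg)

fs = Fs()
fs.remove_qgroup("0/257")               # remove before commit
assert fs.inconsistent                  # -> rescan needed

fs = Fs()
fs.commit()                             # commit first: excl == rfer == 0
fs.remove_qgroup("0/257")
assert not fs.inconsistent              # removal stays consistent
```

This is only a model of the timing argument, not of the real accounting; the
actual fix is the commit/restart-transaction sequence shown in the v5 hunk at
the top of this thread.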

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>> In this case manual qgroup rescan is needed.
>>>
>>> Reviewed-by: Lu Fengqi 
>>> Reviewed-by: Qu Wenruo 
>>> Signed-off-by: Misono Tomohiro 
>>> ---
>>> Hi David,
>>> It turned out that this patch may cause qgroup inconsistency in the case
>>> described above and needs a manual rescan. Since the current code keeps
>>> qgroup items but does not break qgroup consistency when deleting a
>>> subvolume, I cannot clearly say which behavior is better for qgroup
>>> usability. Can I ask your opinion?
>>>
>>> v3 -> v4:
>>>   Check return value of btrfs_remove_qgroup() and if it is 1,
>>>   print message in syslog that fs needs qgroup rescan
>>>
>>>  fs/btrfs/extent-tree.c | 22 ++
>>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 9e7b237b9547..828d9e68047d 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> 	struct btrfs_root_item *root_item = &root->root_item;
>>> struct walk_control *wc;
>>> struct btrfs_key key;
>>> +   u64 objectid = root->root_key.objectid;
>>> int err = 0;
>>> int ret;
>>> int level;
>>> bool root_dropped = false;
>>>  
>>> -   btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
>>> +   btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
>>>  
>>> path = btrfs_alloc_path();
>>> if (!path) {
>>> @@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> goto out_end_trans;
>>> }
>>>  
>>> -   if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
>>> +   if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
>>> 	ret = btrfs_find_root(tree_root, &root->root_key, path,
>>>   NULL, NULL);
>>> if (ret < 0) {
>>> @@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>>  *
>>>  * The most common failure here is just -ENOENT.
>>>  */
>>> -   btrfs_del_orphan_item(trans, tree_root,
>>> - root->root_key.objectid);
>>> +   btrfs_del_orphan_item(trans, tree_root, objectid);
>>> }
>>> }
>>>  
>>> @@ -9056,6 +9056,20 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> btrfs_put_fs_root(root);
>>> }
>>> root_dropped = true;
>>> +
>>> +/* Remove level-0 qgroup items since no other subvolume can use them */
>>> +   ret = btrfs_remove_qgroup(trans, objectid);
>>> +   if (ret == 1) {
>>> +   /* This means qgroup becomes inconsistent by removing items */
>>> +   btrfs_info(fs_info,
>>> +   "qgroup inconsistency found, need qgroup rescan");
>>> +   } else if (ret == -EINVAL || ret == -ENOENT) {
>>> +   /* qgroup is not enabled or already removed, just ignore this */
>>> +   } else if (ret) {
>>> +   btrfs_abort_transaction(trans, ret);
>>> +  

Re: [PATCH v4] btrfs: qgroup: Remove qgroup items along with subvolume deletion

2018-08-09 Thread Misono Tomohiro
On 2018/08/09 14:47, Qu Wenruo wrote:
> 
> 
> On 8/9/18 12:12 PM, Misono Tomohiro wrote:
>> When qgroup is on, subvolume deletion does not remove qgroup items
>> of the subvolume (qgroup info, limit, relation) from quota tree and
>> they need to get removed manually by "btrfs qgroup destroy".
>>
>> Since level 0 qgroup cannot be used/inherited by any other subvolume,
>> let's remove them automatically when subvolume is deleted
>> (to be precise, when the subvolume root is dropped).
>>
>> Note that qgroup becomes inconsistent in following case:
>>   1. qgroup relation exists
>>   2. and subvolume's excl != rfer
> 
> That's a little strange.
> 
> If a subvolume is completely dropped, its excl should be the same as its
> rfer, both 0, and removing its relationship should not mark the qgroup
> inconsistent.
> 
> So the problem is the timing when btrfs_remove_qgroup() is called.
> 
> Since qgroup accounting is only done at transaction commit time, and
> we're holding a trans handle, it's almost ensured this transaction
> can't be committed yet, thus the numbers are not updated (still not 0).
> 
> So that's why qgroup is inconsistent.
> 
> What about commit current transaction and then call btrfs_remove_qgroup()?
> 
> (Sorry I didn't catch this problem last time I reviewed this patch)

Well, I'm a little confused about the flow of transaction commit.
btrfs_drop_snapshot() is called from cleaner_kthread;
is it OK to commit a transaction in it?

> 
> Thanks,
> Qu
> 
>> In this case manual qgroup rescan is needed.
>>
>> Reviewed-by: Lu Fengqi 
>> Reviewed-by: Qu Wenruo 
>> Signed-off-by: Misono Tomohiro 
>> ---
>> Hi David,
>> It turned out that this patch may cause qgroup inconsistency in the case
>> described above and needs a manual rescan. Since the current code keeps
>> qgroup items but does not break qgroup consistency when deleting a
>> subvolume, I cannot clearly say which behavior is better for qgroup
>> usability. Can I ask your opinion?
>>
>> v3 -> v4:
>>   Check return value of btrfs_remove_qgroup() and if it is 1,
>>   print message in syslog that fs needs qgroup rescan
>>
>>  fs/btrfs/extent-tree.c | 22 ++
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 9e7b237b9547..828d9e68047d 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -8871,12 +8871,13 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>> 	struct btrfs_root_item *root_item = &root->root_item;
>>  struct walk_control *wc;
>>  struct btrfs_key key;
>> +u64 objectid = root->root_key.objectid;
>>  int err = 0;
>>  int ret;
>>  int level;
>>  bool root_dropped = false;
>>  
>> -btrfs_debug(fs_info, "Drop subvolume %llu", root->objectid);
>> +btrfs_debug(fs_info, "Drop subvolume %llu", objectid);
>>  
>>  path = btrfs_alloc_path();
>>  if (!path) {
>> @@ -9030,7 +9031,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>  goto out_end_trans;
>>  }
>>  
>> -if (root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID) {
>> +if (objectid != BTRFS_TREE_RELOC_OBJECTID) {
>> 	ret = btrfs_find_root(tree_root, &root->root_key, path,
>>NULL, NULL);
>>  if (ret < 0) {
>> @@ -9043,8 +9044,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>   *
>>   * The most common failure here is just -ENOENT.
>>   */
>> -btrfs_del_orphan_item(trans, tree_root,
>> -  root->root_key.objectid);
>> +btrfs_del_orphan_item(trans, tree_root, objectid);
>>  }
>>  }
>>  
>> @@ -9056,6 +9056,20 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>  btrfs_put_fs_root(root);
>>  }
>>  root_dropped = true;
>> +
>> + /* Remove level-0 qgroup items since no other subvolume can use them */
>> +ret = btrfs_remove_qgroup(trans, objectid);
>> +if (ret == 1) {
>> +/* This means qgroup becomes inconsistent by removing items */
>> +btrfs_info(fs_info,
>> +"qgroup inconsistency found, need qgroup rescan");
>> +} else if (ret == -EINVAL || ret == -ENOENT) {
>> +/* qgroup is not enabled or already removed, just ignore this */
>> +} else if (ret) {
>> +btrfs_abort_transaction(trans, ret);
>> +err = ret;
>> +}
>> +
>>  out_end_trans:
>>  btrfs_end_transaction_throttle(trans);
>>  out_free:
>>
> 
