Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Marat Khalili
If we were going to reserve something, it should be a high number, not a low one. Having 0 reserved makes some sense, but reserving other low numbers seems odd when they aren't already reserved. I did some experiments. Currently, assigning a higher-level qgroup to a lower-level qgroup is

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Duncan
Austin S. Hemmelgarn posted on Tue, 28 Mar 2017 07:44:56 -0400 as excerpted: > On 2017-03-27 21:49, Qu Wenruo wrote: >> The problem is how we should treat a subvolume. >> >> A btrfs subvolume sits midway between the directory and the (logical) volume >> used in traditional stacked solutions. >> >> While

Re: [PATCH] Btrfs: fix wrong failed mirror_num of read-repair on raid56

2017-03-28 Thread Liu Bo
On Mon, Mar 27, 2017 at 07:07:15PM +0200, David Sterba wrote: > On Fri, Mar 24, 2017 at 12:13:42PM -0700, Liu Bo wrote: > > In the raid56 scenario, after trying parity recovery, we didn't set > > mirror_num for the btrfs_bio with the failed mirror_num, hence > > end_bio_extent_readpage() will report a random
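
The point of the fix is which mirror number gets reported back to the read-repair path. A minimal user-space sketch of that idea (toy names, not the btrfs code): remember exactly which mirror failed so a later repair step can rewrite that specific copy.

#include <stdio.h>

#define NR_MIRRORS 2

/* mirror 0 is the corrupted one in this model */
static int read_mirror(int mirror, int *data)
{
        if (mirror == 0) {
                *data = -1;     /* garbage: csum check would fail */
                return -1;
        }
        *data = 42;             /* good copy */
        return 0;
}

int main(void)
{
        int data, failed_mirror = -1;

        for (int m = 0; m < NR_MIRRORS; m++) {
                if (read_mirror(m, &data) == 0) {
                        /* report the mirror that actually failed, not a
                         * random or stale value -- repair needs it */
                        if (failed_mirror >= 0)
                                printf("read ok from mirror %d, rewrite mirror %d\n",
                                       m, failed_mirror);
                        return 0;
                }
                failed_mirror = m;
        }
        fprintf(stderr, "all mirrors bad\n");
        return 1;
}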

Re: [PATCH] Btrfs: enable repair during read for raid56 profile

2017-03-28 Thread Liu Bo
On Mon, Mar 27, 2017 at 06:59:44PM +0200, David Sterba wrote: > On Fri, Mar 24, 2017 at 12:13:35PM -0700, Liu Bo wrote: > > Now that scrub can fix data errors with the help of parity for the raid56 > > profile, repair during read can do so as well. > > > > Although the mirror num in the raid56 scenario

[PATCH 2/2] btrfs: clear the RAID5/6 incompat flag once no longer needed

2017-03-28 Thread Adam Borowski
There's no known taint on a filesystem that was RAID5/6 once but has since been converted to something non-experimental. Signed-off-by: Adam Borowski --- fs/btrfs/disk-io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index
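
A minimal user-space model of the cleanup's logic (hypothetical names, not the actual patch): the incompat feature bit is dropped only after a scan confirms no block group still uses a RAID5/6 profile.

#include <stdbool.h>
#include <stdio.h>

enum profile { SINGLE, RAID1, RAID5, RAID6 };

#define FEAT_RAID56 (1u << 0)

static bool any_raid56(const enum profile *groups, int n)
{
        for (int i = 0; i < n; i++)
                if (groups[i] == RAID5 || groups[i] == RAID6)
                        return true;
        return false;
}

int main(void)
{
        unsigned int incompat_flags = FEAT_RAID56;
        enum profile groups[] = { SINGLE, RAID1 };  /* already converted */

        if (!any_raid56(groups, 2))
                incompat_flags &= ~FEAT_RAID56;     /* safe to drop the bit */
        printf("incompat flags: %#x\n", incompat_flags);
        return 0;
}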

[PATCH 1/2] btrfs: warn about RAID5/6 being experimental at mount time

2017-03-28 Thread Adam Borowski
Too many people come complaining about losing their data -- and indeed, there's no warning outside a wiki and the mailing-list tribal knowledge. Message severity chosen for consistency with XFS -- "alert" makes dmesg produce a nice red background, which should get the point across. Signed-off-by:

[PATCH v3 0/5] raid56: scrub related fixes

2017-03-28 Thread Qu Wenruo
This patchset can be fetched from my github repo: https://github.com/adam900710/linux.git raid56_fixes It's based on v4.11-rc2; the last two patches were modified according to advice from Liu Bo. The patchset fixes the following bugs: 1) False alert or wrong csum error number when scrubbing

[PATCH v3 2/5] btrfs: scrub: Fix RAID56 recovery race condition

2017-03-28 Thread Qu Wenruo
When scrubbing a RAID5 which has recoverable data corruption (only one data stripe is corrupted), sometimes scrub will report more csum errors than expected. Sometimes even an unrecoverable error is reported. The problem can be easily reproduced by the following steps: 1) Create a btrfs with

[PATCH v3 1/5] btrfs: scrub: Introduce full stripe lock for RAID56

2017-03-28 Thread Qu Wenruo
Unlike mirror-based profiles, RAID5/6 recovery needs to read out the whole full stripe, and without proper protection this can easily cause a race condition. Introduce two new functions for RAID5/6, lock_full_stripe() and unlock_full_stripe(), which store an rb_tree of mutexes for full stripes, so
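
A compilable pthreads sketch of the locking scheme described above. A flat table stands in for the kernel's rb_tree, and all names are illustrative, not the patch's:

#include <pthread.h>
#include <stdint.h>

#define MAX_STRIPES 16          /* demo assumes the table never fills up */

struct stripe_lock {
        uint64_t start;         /* full stripe start address */
        int refs;               /* lockers currently interested */
        pthread_mutex_t mutex;
};

static struct stripe_lock table[MAX_STRIPES];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* find the entry for this stripe, or claim a free slot;
 * called with table_lock held */
static struct stripe_lock *get_entry(uint64_t start)
{
        struct stripe_lock *free_slot = 0;

        for (int i = 0; i < MAX_STRIPES; i++) {
                if (table[i].refs && table[i].start == start)
                        return &table[i];
                if (!table[i].refs && !free_slot)
                        free_slot = &table[i];
        }
        free_slot->start = start;
        pthread_mutex_init(&free_slot->mutex, 0);
        return free_slot;
}

void lock_full_stripe(uint64_t start)
{
        pthread_mutex_lock(&table_lock);
        struct stripe_lock *e = get_entry(start);
        e->refs++;              /* pin the slot before dropping table_lock */
        pthread_mutex_unlock(&table_lock);
        pthread_mutex_lock(&e->mutex);  /* serialize work on this stripe */
}

void unlock_full_stripe(uint64_t start)
{
        pthread_mutex_lock(&table_lock);
        struct stripe_lock *e = get_entry(start);
        pthread_mutex_unlock(&e->mutex);
        e->refs--;              /* slot becomes reusable at zero */
        pthread_mutex_unlock(&table_lock);
}

int main(void)
{
        lock_full_stripe(65536);
        /* ... recovery or scrub work on this full stripe ... */
        unlock_full_stripe(65536);
        return 0;
}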

[PATCH v3 3/5] btrfs: scrub: Don't append on-disk pages for raid56 scrub

2017-03-28 Thread Qu Wenruo
In the following situation, scrub will calculate a wrong parity and overwrite the correct one: RAID5 full stripe: Before | Dev 1 | Dev 2 | Dev 3 | | Data stripe 1 | Data stripe 2 | Parity Stripe | --- 0 | 0x (Bad) |
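
A tiny self-contained demonstration of the failure mode (not the btrfs code): RAID5 parity is the XOR of the data stripes, so recomputing parity from a still-corrupted on-disk page bakes the corruption into the parity and destroys recoverability.

#include <assert.h>
#include <stdio.h>

int main(void)
{
        unsigned char d1_good = 0xaa, d2 = 0xcd;
        unsigned char parity = d1_good ^ d2;    /* correct parity on disk */
        unsigned char d1_bad = 0x00;            /* d1 got corrupted */

        /* wrong: recompute parity from the bad on-disk page */
        unsigned char bad_parity = d1_bad ^ d2;

        /* recovery via the bad parity now returns the corrupted data */
        assert((bad_parity ^ d2) == d1_bad);

        /* right: repair d1 from the old parity first, keep that parity */
        unsigned char d1_repaired = parity ^ d2;
        assert(d1_repaired == d1_good);
        printf("repaired: %#x\n", d1_repaired);
        return 0;
}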

[PATCH v3 5/5] btrfs: Prevent scrub recheck from racing with dev replace

2017-03-28 Thread Qu Wenruo
scrub_setup_recheck_block() calls btrfs_map_sblock() and then accesses bbio without the protection of bio_counter. This can lead to a use-after-free if racing with dev-replace cancel. Fix it by increasing bio_counter before calling btrfs_map_sblock() and decreasing the bio_counter when the corresponding

[PATCH v3 4/5] btrfs: Wait for in-flight bios before freeing target device for raid56

2017-03-28 Thread Qu Wenruo
When a raid56 dev replace is cancelled by a running scrub, we will free the target device without waiting for in-flight bios, causing the following NULL pointer dereference or general protection fault. BUG: unable to handle kernel NULL pointer dereference at 05e0 IP:
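
Patches 4/5 and 5/5 both revolve around the same bio_counter discipline. A minimal pthreads model of it (not kernel code): in-flight I/O holds a counter taken before the block is mapped, and the path that frees the target device waits for the counter to drain to zero.

#include <pthread.h>
#include <stdio.h>

static int bio_counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;

static void bio_counter_inc(void)
{
        pthread_mutex_lock(&lock);
        bio_counter++;
        pthread_mutex_unlock(&lock);
}

static void bio_counter_dec(void)
{
        pthread_mutex_lock(&lock);
        if (--bio_counter == 0)
                pthread_cond_signal(&drained);
        pthread_mutex_unlock(&lock);
}

static void *do_io(void *arg)
{
        (void)arg;
        /* the counter was taken before mapping/submission, so the
         * target device cannot be freed underneath this I/O */
        bio_counter_dec();      /* completion */
        return 0;
}

int main(void)
{
        pthread_t t;

        bio_counter_inc();              /* before mapping the block */
        pthread_create(&t, NULL, do_io, NULL);

        pthread_mutex_lock(&lock);      /* replace-cancel path */
        while (bio_counter > 0)
                pthread_cond_wait(&drained, &lock);
        pthread_mutex_unlock(&lock);
        puts("no in-flight bios: safe to free target device");
        pthread_join(t, NULL);
        return 0;
}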

Re: How to test in-band dedupe

2017-03-28 Thread Qu Wenruo
At 03/29/2017 01:52 AM, Jitendra wrote: Hi Qu/All, I am looking into in-memory in-band dedupe. I have cloned your git tree from the following URLs. Linux Tree: https://github.com/adam900710/linux.git wang_dedupe_latest btrfs-progs: https://g...@github.com:adam900710/btrfs-progs.git

Re: send snapshot from snapshot incremental

2017-03-28 Thread Hugo Mills
On Tue, Mar 28, 2017 at 06:40:25PM -0400, J. Hart wrote: > Don't be embarrassed. > I'm a native speaker and still have trouble with most explanations. :-) You should try writing them. ;) Hugo ("darkling"). > > On 03/28/2017 06:01 PM, Jakob Schürz wrote: > >Thanks for that

Re: send snapshot from snapshot incremental

2017-03-28 Thread J. Hart
Don't be embarrassed. I'm a native speaker and still have trouble with most explanations. :-) On 03/28/2017 06:01 PM, Jakob Schürz wrote: Thanks for that explanation. I'm sure I didn't understand the -c option... and my English is good enough for most of the Linux things I need to know

Re: send snapshot from snapshot incremental

2017-03-28 Thread Jakob Schürz
Thanks for that explanation. I'm sure I didn't understand the -c option... and my English is good enough for most of the Linux things I need to know... but not for this. :-( On 2017-03-26 at 22:07, Peter Grandi wrote: > [ ... ] >> BUT if I take a snapshot from the system, and

Re: [PATCH v2] Btrfs: fix unexpected file hole after disk errors

2017-03-28 Thread Liu Bo
On Tue, Mar 28, 2017 at 02:50:06PM +0200, David Sterba wrote: > On Mon, Mar 06, 2017 at 12:23:30PM -0800, Liu Bo wrote: > > Btrfs creates hole extents to cover any unwritten section right before > > doing buffer writes after commit 3ac0d7b96a26 ("btrfs: Change the expanding > > write sequence to

How to test in-band dedupe

2017-03-28 Thread Jitendra
Hi Qu/All, I am looking into in-memory in-band dedupe. I have cloned your git tree from the following URLs. Linux Tree: https://github.com/adam900710/linux.git wang_dedupe_latest btrfs-progs: https://g...@github.com:adam900710/btrfs-progs.git dedupe_20170316 Then run the following test

Re: [PATCH V2 4/4] btrfs: cleanup barrier_all_devices() to check dev stat flush error

2017-03-28 Thread David Sterba
On Tue, Mar 14, 2017 at 04:26:11PM +0800, Anand Jain wrote: > The objective of this patch is to cleanup barrier_all_devices() > so that the error checking is in a separate loop independent of > of the loop which submits and waits on the device flush requests. I think that getting completely rid
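
A toy model of the two-loop structure under discussion (threads stand in for async flush bios; none of this is the btrfs code): submit everything first, then wait and do all error checking in a separate pass.

#include <pthread.h>
#include <stdio.h>

#define NR_DEVS 3

static void *dev_flush(void *arg)
{
        long dev = (long)arg;
        /* pretend device 1 fails its cache flush */
        return (void *)(dev == 1 ? -5L /* -EIO */ : 0L);
}

int main(void)
{
        pthread_t t[NR_DEVS];
        int errors = 0;

        /* loop 1: submit all flushes, no waiting here */
        for (long i = 0; i < NR_DEVS; i++)
                pthread_create(&t[i], NULL, dev_flush, (void *)i);

        /* loop 2: wait and collect errors in one place */
        for (long i = 0; i < NR_DEVS; i++) {
                void *ret;
                pthread_join(t[i], &ret);
                if ((long)ret) {
                        fprintf(stderr, "dev %ld: flush error %ld\n",
                                i, (long)ret);
                        errors++;
                }
        }
        return errors ? 1 : 0;
}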

Re: Shrinking a device - performance?

2017-03-28 Thread Austin S. Hemmelgarn
On 2017-03-28 10:43, Peter Grandi wrote: This is going to be long because I am writing something detailed hoping pointlessly that someone in the future will find it by searching the list archives while doing research before setting up a new storage system, and they will be the kind of person

Re: [PATCH 2/4] btrfs: Communicate back ENOMEM when it occurs

2017-03-28 Thread David Sterba
On Mon, Mar 13, 2017 at 03:42:12PM +0800, Anand Jain wrote: > The only error on which a write dev flush (send) can fail is ENOMEM; > as it's not a device-specific error but rather a system-wide issue, we > should stop further iterations and propagate the -ENOMEM error to the
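
A minimal sketch of the error policy argued here (illustrative names only): a device-specific failure is recorded and the loop continues, while -ENOMEM is system-wide, so iteration stops and the error is propagated as-is.

#include <errno.h>
#include <stdio.h>

static int send_flush(int dev)
{
        if (dev == 2)
                return -ENOMEM;         /* simulated allocation failure */
        return 0;
}

static int flush_all(int ndevs)
{
        int dev_errors = 0;

        for (int dev = 0; dev < ndevs; dev++) {
                int ret = send_flush(dev);

                if (ret == -ENOMEM)
                        return ret;     /* pointless to try more devices */
                if (ret)
                        dev_errors++;   /* device-specific: keep going */
        }
        return dev_errors ? -EIO : 0;
}

int main(void)
{
        printf("flush_all: %d\n", flush_all(4));
        return 0;
}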

Re: Shrinking a device - performance?

2017-03-28 Thread Tomasz Kusmierz
I’ve glazed over on “Not only that …” … can you make a YouTube video of that? : > On 28 Mar 2017, at 16:06, Peter Grandi wrote: > >> I glazed over at “This is going to be long” … :) >>> [ ... ] > > Not only that, you also top-posted while quoting it pointlessly > in

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Austin S. Hemmelgarn
On 2017-03-28 09:53, Marat Khalili wrote: There are a couple of reasons I'm advocating the specific behavior I outlined: Some of your points are valid, but some break current behaviour and expectations or create technical difficulties. 1. It doesn't require any specific qgroup setup. By

Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
> I glazed over at “This is going to be long” … :) >> [ ... ] Not only that, you also top-posted while quoting it pointlessly in its entirety, to the whole mailing list. Well played :-). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to

Re: [PATCH 1/4] btrfs: REQ_PREFLUSH does not use btrfs_end_bio() completion callback

2017-03-28 Thread David Sterba
On Mon, Mar 13, 2017 at 03:42:11PM +0800, Anand Jain wrote: > REQ_PREFLUSH bio to flush dev cache uses btrfs_end_empty_barrier() > completion callback only, as of now, and there it accounts for dev > stat flush errors BTRFS_DEV_STAT_FLUSH_ERRS, so remove it from the > btrfs_end_bio(). Can you

Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
> [ ... ] slaps together a large storage system in the cheapest > and quickest way knowing that while it is mostly empty it will > seem very fast regardless and therefore to have awesome > performance, and then the "clever" sysadm disappears surrounded > by a halo of glory before the storage

Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
> [ ... ] reminded of all the cases where someone left me to > decatastrophize a storage system built on "optimistic" > assumptions. In particular when some "clever" sysadm with a "clever" (or dumb) manager slaps together a large storage system in the cheapest and quickest way knowing that while

Re: Shrinking a device - performance?

2017-03-28 Thread Tomasz Kusmierz
I glazed over at “This is going to be long” … :) > On 28 Mar 2017, at 15:43, Peter Grandi wrote: > > This is going to be long because I am writing something detailed > hoping pointlessly that someone in the future will find it by > searching the list archives while

Re: Shrinking a device - performance?

2017-03-28 Thread Peter Grandi
This is going to be long because I am writing something detailed hoping pointlessly that someone in the future will find it by searching the list archives while doing research before setting up a new storage system, and they will be the kind of person that tolerates reading messages longer than

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Marat Khalili
There are a couple of reasons I'm advocating the specific behavior I outlined: Some of your points are valid, but some break current behaviour and expectations or create technical difficulties. 1. It doesn't require any specific qgroup setup. By definition, you can be 100% certain that the

[PATCH] btrfs: sink GFP flags parameter to tree_mod_log_insert_move

2017-03-28 Thread David Sterba
All (1) callers pass the same value. Signed-off-by: David Sterba --- fs/btrfs/ctree.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 7dc8844037e0..d034d47c5470 100644 --- a/fs/btrfs/ctree.c +++

[PATCH] btrfs: sink GFP flags parameter to tree_mod_log_insert_root

2017-03-28 Thread David Sterba
All (1) callers pass the same value. Signed-off-by: David Sterba --- fs/btrfs/ctree.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index d034d47c5470..165e7ec12af7 100644 --- a/fs/btrfs/ctree.c +++
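
Both sink-the-parameter patches apply the same miniature refactor, modelled below in self-contained C (GFP_NOFS here is just a stand-in constant, and log_insert is a made-up name): when every caller passes an identical value, the parameter carries no information and can move into the callee.

#include <stdio.h>

/* before:
 *   static int log_insert(int slot, unsigned flags);
 *   ... log_insert(i, GFP_NOFS) at every call site ...
 */

#define GFP_NOFS 0x10u          /* stand-in value for the demo */

static int log_insert(int slot)
{
        unsigned flags = GFP_NOFS;      /* the single value, now local */

        printf("slot %d, flags %#x\n", slot, flags);
        return 0;
}

int main(void)
{
        return log_insert(1);   /* call sites lose the redundant argument */
}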

[PATCH] btrfs: track exclusive filesystem operation in flags

2017-03-28 Thread David Sterba
There are several operations, usually started from ioctls, that cannot run concurrently. The status is tracked in mutually_exclusive_operation_running as an atomic_t. We can easily track the status as one of the per-filesystem flag bits with the same synchronization guarantees. The conversion
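
A user-space sketch of the flag-bit scheme (C11 atomics model the kernel's bit operations; names are illustrative): an atomic test-and-set either claims the exclusive-operation bit or reports EBUSY.

#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>

#define FS_EXCL_OP (1ul << 0)

static atomic_ulong fs_flags;

static int try_start_exclusive_op(void)
{
        unsigned long old = atomic_fetch_or(&fs_flags, FS_EXCL_OP);

        return (old & FS_EXCL_OP) ? -EBUSY : 0;
}

static void end_exclusive_op(void)
{
        atomic_fetch_and(&fs_flags, ~FS_EXCL_OP);
}

int main(void)
{
        if (try_start_exclusive_op() == 0) {
                /* a second attempt must fail while the first runs */
                printf("nested attempt: %d\n", try_start_exclusive_op());
                end_exclusive_op();
        }
        return 0;
}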

Re: [PATCH v2] Btrfs: fix unexpected file hole after disk errors

2017-03-28 Thread David Sterba
On Mon, Mar 06, 2017 at 12:23:30PM -0800, Liu Bo wrote: > Btrfs creates hole extents to cover any unwritten section right before > doing buffer writes after commit 3ac0d7b96a26 ("btrfs: Change the expanding > write sequence to fix snapshot related bug."). > > However, that takes the start

[PATCH] btrfs: drop redundant parameters from btrfs_map_sblock

2017-03-28 Thread David Sterba
All callers pass 0 for mirror_num and 1 for need_raid_map. Signed-off-by: David Sterba --- fs/btrfs/scrub.c | 6 +++--- fs/btrfs/volumes.c | 6 ++ fs/btrfs/volumes.h | 3 +-- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/scrub.c

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Austin S. Hemmelgarn
On 2017-03-28 08:00, Marat Khalili wrote: The default should be to inherit the qgroup of the parent subvolume. This behaviour is only good for this particular use-case. In general case, qgroups of subvolume and snapshots should exist separately, and both can be included in some higher level

Re: [PATCH v5] qgroup: Retry after commit on getting EDQUOT

2017-03-28 Thread David Sterba
On Mon, Mar 27, 2017 at 01:13:46PM -0500, Goldwyn Rodrigues wrote: > > > On 03/27/2017 12:36 PM, David Sterba wrote: > > On Mon, Mar 27, 2017 at 12:29:57PM -0500, Goldwyn Rodrigues wrote: > >> From: Goldwyn Rodrigues > >> > >> We are facing the same problem with EDQUOT which
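
A toy model of the retry named in the subject (not the actual qgroup code): a reservation that fails with EDQUOT is retried once after a transaction commit, since the commit may release space that quota accounting can then reuse.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool committed;

static int reserve_quota(long bytes)
{
        /* pretend the commit frees enough space for the retry */
        return committed ? 0 : -EDQUOT;
}

static void commit_transaction(void)
{
        committed = true;
}

static int reserve_with_retry(long bytes)
{
        int ret = reserve_quota(bytes);

        if (ret == -EDQUOT) {
                commit_transaction();           /* flush delayed space */
                ret = reserve_quota(bytes);     /* one retry only */
        }
        return ret;
}

int main(void)
{
        printf("reserve: %d\n", reserve_with_retry(4096));
        return 0;
}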

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Marat Khalili
The default should be to inherit the qgroup of the parent subvolume. This behaviour is only good for this particular use-case. In general case, qgroups of subvolume and snapshots should exist separately, and both can be included in some higher level qgroup (after all, that's what qgroup

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Austin S. Hemmelgarn
On 2017-03-27 21:49, Qu Wenruo wrote: At 03/27/2017 08:01 PM, Austin S. Hemmelgarn wrote: On 2017-03-27 07:02, Moritz Sichert wrote: On 27.03.2017 at 05:46, Qu Wenruo wrote: At 03/27/2017 11:26 AM, Andrei Borzenkov wrote: 27.03.2017 03:39, Qu Wenruo writes: At 03/26/2017 06:03 AM,

Re: [PATCH v2 0/2] Cleanup for some hardcoded constants

2017-03-28 Thread David Sterba
On Thu, Mar 16, 2017 at 10:04:32AM -0600, ednadol...@gmail.com wrote: > From: Edmund Nadolski > > This series replaces several hard-coded values with descriptive > symbols. > > --- > v2: > + rename SEQ_NONE to SEQ_LAST and move definition to ctree.h > + clarify comment at

Re: Qgroups are not applied when snapshotting a subvol?

2017-03-28 Thread Austin S. Hemmelgarn
On 2017-03-27 15:32, Chris Murphy wrote: How about: if qgroups are enabled, then a non-root user is prevented from creating new subvolumes? Or is there a way for a new nested subvolume to be included in its parent's quota, rather than the new subvolume having a whole new quota limit? Tricky

Re: [PATCH] btrfs-progs: fix missing __error symbol in libbtrfs.so.0

2017-03-28 Thread David Sterba
On Mon, Mar 27, 2017 at 10:07:20PM +0100, sly...@gmail.com wrote: > From: Sergei Trofimovich > > The easiest way to reproduce the error is to try to build > btrfs-progs with > $ make LDFLAGS=-Wl,--no-undefined > > btrfs-list.o: In function `lookup_ino_path': >

Re: __link_block_group uses GFP_KERNEL

2017-03-28 Thread Denis Kirjanov
On 3/27/17, David Sterba wrote: > On Sat, Mar 25, 2017 at 09:48:28AM +0300, Denis Kirjanov wrote: >> On 3/25/17, Jeff Mahoney wrote: >> > On 3/24/17 5:02 AM, Denis Kirjanov wrote: >> >> Hi guys, >> >> >> >> Looks like the current code does a GFP_KERNEL allocation

Re: [PATCH v2 1/5] btrfs: scrub: Introduce full stripe lock for RAID56

2017-03-28 Thread Qu Wenruo
Thanks for the review first. At 03/28/2017 12:38 AM, David Sterba wrote: On Fri, Mar 24, 2017 at 10:00:23AM +0800, Qu Wenruo wrote: Unlike mirror-based profiles, RAID5/6 recovery needs to read out the whole full stripe, and without proper protection this can easily cause a race condition.