[PATCH 7/8] btrfs: be more explicit about allowed flush states

2018-11-21 Thread Josef Bacik
For FLUSH_LIMIT flushers we really can only allocate chunks and flush delayed inode items, everything else is problematic. I added a bunch of new states and it led to weirdness in the FLUSH_LIMIT case because I forgot about how it worked. So instead explicitly declare the states that are ok for

[PATCH 3/8] btrfs: don't use global rsv for chunk allocation

2018-11-21 Thread Josef Bacik
We've done this forever because of the voodoo around knowing how much space we have. However we have better ways of doing this now, and on normal file systems we'll easily have a global reserve of 512MiB, and since metadata chunks are usually 1GiB that means we'll allocate metadata chunks more

[PATCH 4/8] btrfs: add ALLOC_CHUNK_FORCE to the flushing code

2018-11-21 Thread Josef Bacik
With my change to no longer take into account the global reserve for metadata allocation chunks we have this side-effect for mixed block group fs'es where we are no longer allocating enough chunks for the data/metadata requirements. To deal with this add an ALLOC_CHUNK_FORCE step to the flushing

[PATCH 1/8] btrfs: check if free bgs for commit

2018-11-21 Thread Josef Bacik
may_commit_transaction will skip committing the transaction if we don't have enough pinned space or if we're trying to find space for a SYSTEM chunk. However if we have pending free block groups in this transaction we still want to commit as we may be able to allocate a chunk to make our

[PATCH 6/8] btrfs: loop in inode_rsv_refill

2018-11-21 Thread Josef Bacik
With severe fragmentation we can end up with our inode rsv size being huge during writeout, which would cause us to need to make very large metadata reservations. However we may not actually need that much once writeout is complete. So instead try to make our reservation, and if we couldn't make

[PATCH 2/8] btrfs: dump block_rsv whe dumping space info

2018-11-21 Thread Josef Bacik
For enospc_debug having the block rsvs is super helpful to see if we've done something wrong. Signed-off-by: Josef Bacik Reviewed-by: Omar Sandoval Reviewed-by: David Sterba --- fs/btrfs/extent-tree.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/fs/btrfs/extent-tree.c

[PATCH 8/8] btrfs: reserve extra space during evict()

2018-11-21 Thread Josef Bacik
We could generate a lot of delayed refs in evict but never have any left over space from our block rsv to make up for that fact. So reserve some extra space and give it to the transaction so it can be used to refill the delayed refs rsv every loop through the truncate path. Signed-off-by: Josef

[PATCH 5/8] btrfs: don't enospc all tickets on flush failure

2018-11-21 Thread Josef Bacik
With the introduction of the per-inode block_rsv it became possible to have really really large reservation requests made because of data fragmentation. Since the ticket stuff assumed that we'd always have relatively small reservation requests it just killed all tickets if we were unable to

[PATCH 0/8] Enospc cleanups and fixes

2018-11-21 Thread Josef Bacik
The delayed refs rsv patches exposed a bunch of issues in our enospc infrastructure that needed to be addressed. These aren't really one coherent group, but they are all around flushing and reservations. may_commit_transaction() needed to be updated a little bit, and we needed to add a new state

[PATCH] btrfs: only run delayed refs if we're committing

2018-11-21 Thread Josef Bacik
I noticed in a giant dbench run that we spent a lot of time on lock contention while running transaction commit. This is because dbench results in a lot of fsync()'s that do a btrfs_transaction_commit(), and they all run the delayed refs first thing, so they all contend with each other. This

[PATCH 5/6] btrfs: introduce delayed_refs_rsv

2018-11-21 Thread Josef Bacik
From: Josef Bacik Traditionally we've had voodoo in btrfs to account for the space that delayed refs may take up by having a global_block_rsv. This works most of the time, except when it doesn't. We've had issues reported and seen in production where sometimes the global reserve is exhausted

[PATCH 6/6] btrfs: fix truncate throttling

2018-11-21 Thread Josef Bacik
We have a bunch of magic to make sure we're throttling delayed refs when truncating a file. Now that we have a delayed refs rsv and a mechanism for refilling that reserve simply use that instead of all of this magic. Signed-off-by: Josef Bacik --- fs/btrfs/inode.c | 79

[PATCH 3/7] btrfs: handle delayed ref head accounting cleanup in abort

2018-11-21 Thread Josef Bacik
We weren't doing any of the accounting cleanup when we aborted transactions. Fix this by making cleanup_ref_head_accounting global and calling it from the abort code, this fixes the issue where our accounting was all wrong after the fs aborts. Signed-off-by: Josef Bacik --- fs/btrfs/ctree.h

[PATCH 2/7] btrfs: make btrfs_destroy_delayed_refs use btrfs_delete_ref_head

2018-11-21 Thread Josef Bacik
Instead of open coding this stuff use the helper instead. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f062fb0487cd..7d02748cf3f6 100644 ---

[PATCH 1/7] btrfs: make btrfs_destroy_delayed_refs use btrfs_delayed_ref_lock

2018-11-21 Thread Josef Bacik
We have this open coded in btrfs_destroy_delayed_refs, use the helper instead. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/disk-io.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index

[PATCH 0/7] Abort cleanup fixes

2018-11-21 Thread Josef Bacik
A new xfstests that really hammers on transaction aborts (generic/495 I think?) uncovered a lot of random issues. Some of these were introduced with the new delayed refs rsv patches, others were just exposed by them, such as the pending bg stuff. With these patches in place I stopped getting all

[PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper

2018-11-21 Thread Josef Bacik
From: Josef Bacik We were missing some quota cleanups in check_ref_cleanup, so break the ref head accounting cleanup into a helper and call that from both check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that we don't screw up accounting in the future for other things that we

[PATCH 3/6] btrfs: cleanup extent_op handling

2018-11-21 Thread Josef Bacik
From: Josef Bacik The cleanup_extent_op function actually would run the extent_op if it needed running, which made the name sort of a misnomer. Change it to run_and_cleanup_extent_op, and move the actual cleanup work to cleanup_extent_op so it can be used by check_ref_cleanup() in order to

[PATCH 1/6] btrfs: add btrfs_delete_ref_head helper

2018-11-21 Thread Josef Bacik
From: Josef Bacik We do this dance in cleanup_ref_head and check_ref_cleanup, unify it into a helper and cleanup the calling functions. Signed-off-by: Josef Bacik Reviewed-by: Omar Sandoval --- fs/btrfs/delayed-ref.c | 14 ++ fs/btrfs/delayed-ref.h | 3 ++-

[PATCH 0/6] Delayed refs rsv

2018-11-21 Thread Josef Bacik
This patchset changes how we do space reservations for delayed refs. We were hitting probably 20-40 enospc aborts per day in production while running delayed refs at transaction commit time. This means we ran out of space in the global reserve and couldn't easily get more space in

[PATCH 4/6] btrfs: only track ref_heads in delayed_ref_updates

2018-11-21 Thread Josef Bacik
From: Josef Bacik We use this number to figure out how many delayed refs to run, but __btrfs_run_delayed_refs really only checks every time we need a new delayed ref head, so we always run at least one ref head completely no matter what the number of items on it. Fix the accounting to only be

[PATCH 4/7] btrfs: call btrfs_create_pending_block_groups unconditionally

2018-11-21 Thread Josef Bacik
The first thing we do is loop through the list, this if (!list_empty()) btrfs_create_pending_block_groups(); thing is just wasted space. Reviewed-by: Nikolay Borisov Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 3 +-- fs/btrfs/transaction.c | 6 ++ 2 files changed, 3
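The reasoning in the snippet above can be illustrated with a small sketch: when a function's first action is to walk a list, calling it on an empty list is already a no-op, so an emptiness check in the caller adds nothing. The struct and function names below are hypothetical stand-ins, not the kernel's `btrfs_create_pending_block_groups()` or its list type.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pending block group on a singly linked list. */
struct pending_bg {
    struct pending_bg *next;
    int created;
};

/*
 * Hypothetical stand-in for the real helper: it starts by looping over
 * the list, so an empty list (NULL head) falls straight through.
 * Returns the number of entries processed.
 */
int create_pending_block_groups(struct pending_bg *head)
{
    int processed = 0;

    for (struct pending_bg *bg = head; bg != NULL; bg = bg->next) {
        bg->created = 1;   /* placeholder for the per-group work */
        processed++;
    }
    return processed;
}
```

Because the loop body never runs for an empty list, a caller can invoke `create_pending_block_groups()` unconditionally and get the same behavior as with a guard.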

[PATCH 6/7] btrfs: cleanup pending bgs on transaction abort

2018-11-21 Thread Josef Bacik
We may abort the transaction during a commit and not have a chance to run the pending bgs stuff, which will leave block groups on our list and cause us accounting issues and leaked memory. Fix this by running the pending bgs when we cleanup a transaction. Reviewed-by: Omar Sandoval

[PATCH 0/3] Delayed iput fixes

2018-11-21 Thread Josef Bacik
Here are some delayed iput fixes. Delayed iputs can hold reservations for a while and there's no real good way to make sure they were gone for good, which means we could early enospc when in reality if we had just waited for the iput we would have had plenty of space. So fix this up by making us

[PATCH 3/3] btrfs: replace cleaner_delayed_iput_mutex with a waitqueue

2018-11-21 Thread Josef Bacik
The throttle path doesn't take cleaner_delayed_iput_mutex, which means we could think we're done flushing iputs in the data space reservation path when we could have a throttler doing an iput. There's no real reason to serialize the delayed iput flushing, so instead of taking the

[RFC][PATCH v4 09/09] btrfs: use common file type conversion

2018-11-21 Thread Phillip Potter
Deduplicate the btrfs file type conversion implementation - file systems that use the same file types as defined by POSIX do not need to define their own versions and can use the common helper functions declared in fs_types.h and implemented in fs_types.c Acked-by: David Sterba Signed-off-by:

[PATCH 7/7] btrfs: wait on ordered extents on abort cleanup

2018-11-21 Thread Josef Bacik
If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted

[PATCH 5/7] btrfs: just delete pending bgs if we are aborted

2018-11-21 Thread Josef Bacik
We still need to do all of the accounting cleanup for pending block groups if we abort. So set the ret to trans->aborted so if we aborted the cleanup happens and everybody is happy. Reviewed-by: Omar Sandoval Signed-off-by: Josef Bacik --- fs/btrfs/extent-tree.c | 8 +++- 1 file changed,

[PATCH 1/3] btrfs: run delayed iputs before committing

2018-11-21 Thread Josef Bacik
Delayed iputs means we can have final iputs of deleted inodes in the queue, which could potentially generate a lot of pinned space that could be free'd. So before we decide to commit the transaction for ENOSPC reasons, run the delayed iputs so that any potential space is free'd up. If there is

[PATCH 2/3] btrfs: wakeup cleaner thread when adding delayed iput

2018-11-21 Thread Josef Bacik
The cleaner thread usually takes care of delayed iputs, with the exception of the btrfs_end_transaction_throttle path. The cleaner thread only gets woken up every 30 seconds, so instead wake it up to do its work so that we can free up that space as quickly as possible. Reviewed-by: Filipe

Re: [PATCH] Btrfs: fix race between enabling quotas and subvolume creation

2018-11-21 Thread David Sterba
On Mon, Nov 19, 2018 at 04:20:34PM +, fdman...@kernel.org wrote: > From: Filipe Manana > > We have a race between enabling quotas and subvolume creation that causes > subvolume creation to fail with -EINVAL, and the following diagram shows > how it happens: > > CPU 0

[PATCHv3] btrfs: Fix error handling in btrfs_cleanup_ordered_extents

2018-11-21 Thread Nikolay Borisov
Running btrfs/124 in a loop hung up on me sporadically with the following call trace: btrfs D0 5760 5324 0x Call Trace: ? __schedule+0x243/0x800 schedule+0x33/0x90 btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs] ?

Re: [PATCH 2/6] btrfs: add cleanup_ref_head_accounting helper

2018-11-21 Thread Qu Wenruo
On 2018/11/22 上午2:59, Josef Bacik wrote: > From: Josef Bacik > > We were missing some quota cleanups in check_ref_cleanup, so break the > ref head accounting cleanup into a helper and call that from both > check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that > we don't

[PATCH v12] Add cli and ioctl to forget scanned device(s)

2018-11-21 Thread Anand Jain
v12: Fixed coding style - leave space between " : ". v11: btrfs-progs: Bring the code into the else part of if(forget). Use strerror to print the error instead of ret. v10: Make btrfs-progs changes more readable. With an effort to keep the known bug [1] as it is.. [1] The

[PATCH] fstests: btrfs use forget if not reload

2018-11-21 Thread Anand Jain
[I will send this to the xfstests ML after the kernel and progs patches have been integrated]. btrfs reload was introduced to cleanup the device list inside the btrfs kernel module. The problem with the reload approach is that you can't run btrfs test cases 124, 125, 154 and 164 on the system with btrfs as

[PATCH v12] btrfs: introduce feature to forget a btrfs device

2018-11-21 Thread Anand Jain
Support for a new command 'btrfs dev forget [dev]' is proposed here to undo the effects of 'btrfs dev scan [dev]'. For this purpose this patch proposes to use ioctl #5 as it was empty. IOW(BTRFS_IOCTL_MAGIC, 5, ..) This patch adds new ioctl BTRFS_IOC_FORGET_DEV which can be sent from the

[PATCH v12] btrfs-progs: add cli to forget one or all scanned devices

2018-11-21 Thread Anand Jain
This patch adds cli btrfs device forget [dev] to remove the given device structure in the kernel if the device is unmounted. If no argument is given it shall remove all stale devices (devices which are not mounted) from the kernel. Signed-off-by: Anand Jain Reviewed-by: Nikolay Borisov --- v11->v12:

Re: [PATCH] btrfs: only run delayed refs if we're committing

2018-11-21 Thread Nikolay Borisov
On 21.11.18 г. 21:10 ч., Josef Bacik wrote: > I noticed in a giant dbench run that we spent a lot of time on lock > contention while running transaction commit. This is because dbench > results in a lot of fsync()'s that do a btrfs_transaction_commit(), and > they all run the delayed refs

[PATCH RESEND 1/2] btrfs-progs: fix kernel version parsing on some versions past 3.0

2018-11-21 Thread Adam Borowski
The code fails if the third section is missing (like "4.18") or is followed by anything but "." or "-". This happens for example if we're not exactly at a tag and CONFIG_LOCALVERSION_AUTO=n (which results in "4.18.5+"). Signed-off-by: Adam Borowski --- fsfeatures.c | 5 + 1 file changed, 1
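The failure cases described above (a missing third section such as "4.18", or a suffix such as "4.18.5+") can be handled by treating the patch level as optional and ignoring whatever follows the numeric part. The following is a minimal sketch under that assumption; `KERNEL_CODE` and `parse_kernel_release` are hypothetical names for illustration and are not the code from the patch or from fsfeatures.c.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical packing of a version triple into one comparable number. */
#define KERNEL_CODE(a, b, c) (((long)(a) << 16) + ((long)(b) << 8) + (long)(c))

/*
 * Parse "major.minor[.patch][suffix]" as found in a kernel release string.
 * The patch level is optional and any trailing suffix ("+", "-rc1", ...)
 * is ignored. Returns the packed code, or -1 if major.minor is missing.
 */
long parse_kernel_release(const char *release)
{
    char *end;
    unsigned long major, minor, patch = 0;

    if (!isdigit((unsigned char)release[0]))
        return -1;
    major = strtoul(release, &end, 10);

    /* A minor number is mandatory: expect ".<digit>". */
    if (end[0] != '.' || !isdigit((unsigned char)end[1]))
        return -1;
    minor = strtoul(end + 1, &end, 10);

    /* The third section is optional; only parse it when present. */
    if (end[0] == '.' && isdigit((unsigned char)end[1]))
        patch = strtoul(end + 1, &end, 10);

    if (major > 255 || minor > 255)
        return -1;
    if (patch > 255)
        patch = 255;    /* clamp so the packed fields cannot overlap */

    return KERNEL_CODE(major, minor, patch);
}
```

A caller could then compare the result against a threshold, for example `parse_kernel_release(rel) >= KERNEL_CODE(4, 19, 0)`, without caring whether the running kernel reports a local version suffix.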

[PATCH RESEND-v3 2/2] btrfs-progs: defrag: open files RO on new enough kernels

2018-11-21 Thread Adam Borowski
Defragging an executable conflicts both ways with it being run, resulting in ETXTBSY. This either makes defrag fail or prevents the program from being executed. Kernels 4.19-rc1 and later allow defragging files you could have possibly opened rw, even if the passed descriptor is ro (commit

Re: [PATCH] Btrfs: fix access to available allocation bits when starting balance

2018-11-21 Thread David Sterba
On Mon, Nov 19, 2018 at 09:48:12AM +, fdman...@kernel.org wrote: > From: Filipe Manana > > The available allocation bits members from struct btrfs_fs_info are > protected by a sequence lock, and when starting balance we access them > incorrectly in two different ways: > > 1) In the read