For FLUSH_LIMIT flushers we really can only allocate chunks and flush
delayed inode items; everything else is problematic. I added a bunch of
new states and it led to weirdness in the FLUSH_LIMIT case because I
forgot about how it worked. So instead explicitly declare the states
that are ok for
We've done this forever because of the voodoo around knowing how much
space we have. However we have better ways of doing this now, and on
normal file systems we'll easily have a global reserve of 512MiB, and
since metadata chunks are usually 1GiB that means we'll allocate
metadata chunks more
With my change to no longer take into account the global reserve for
metadata allocation chunks we have this side-effect for mixed block
group filesystems where we are no longer allocating enough chunks for the
data/metadata requirements. To deal with this add an ALLOC_CHUNK_FORCE
step to the flushing
may_commit_transaction will skip committing the transaction if we don't
have enough pinned space or if we're trying to find space for a SYSTEM
chunk. However if we have pending free block groups in this transaction
we still want to commit as we may be able to allocate a chunk to make
our
With severe fragmentation we can end up with our inode rsv size being
huge during writeout, which would cause us to need to make very large
metadata reservations. However we may not actually need that much once
writeout is complete. So instead try to make our reservation, and if we
couldn't make
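The retry idea can be sketched in plain C. This is an illustration of the general pattern only; `try_reserve`, `space_avail`, `reserve_with_fallback`, and the halving policy are stand-ins, not btrfs's actual reservation code:

```c
#include <assert.h>
#include <stdbool.h>

static long space_avail = 1024;   /* stand-in for free metadata space */

/* Pretend reservation: succeeds only if enough space remains. */
static bool try_reserve(long bytes)
{
	if (bytes > space_avail)
		return false;
	space_avail -= bytes;
	return true;
}

/*
 * Try the full (possibly huge) worst-case reservation first; on failure
 * fall back to progressively smaller amounts down to a floor, mirroring
 * the idea that the worst-case rsv is rarely needed once writeout
 * completes.  Returns the amount actually reserved, or 0 on failure.
 */
static long reserve_with_fallback(long want, long floor)
{
	for (long amt = want; amt >= floor; amt /= 2)
		if (try_reserve(amt))
			return amt;
	return 0;
}
```

With `space_avail` at 1024, asking for 4096 with a floor of 64 falls back twice and reserves 1024 rather than failing outright.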
For enospc_debug having the block rsvs is super helpful to see if we've
done something wrong.
Signed-off-by: Josef Bacik
Reviewed-by: Omar Sandoval
Reviewed-by: David Sterba
---
fs/btrfs/extent-tree.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/fs/btrfs/extent-tree.c
We could generate a lot of delayed refs in evict but never have any left
over space from our block rsv to make up for that fact. So reserve some
extra space and give it to the transaction so it can be used to refill
the delayed refs rsv every loop through the truncate path.
Signed-off-by: Josef
With the introduction of the per-inode block_rsv it became possible to
have really really large reservation requests made because of data
fragmentation. Since the ticket stuff assumed that we'd always have
relatively small reservation requests it just killed all tickets if we
were unable to
The delayed refs rsv patches exposed a bunch of issues in our enospc
infrastructure that needed to be addressed. These aren't really one coherent
group, but they are all around flushing and reservations.
may_commit_transaction() needed to be updated a little bit, and we needed to add
a new state
I noticed in a giant dbench run that we spent a lot of time on lock
contention while running transaction commit. This is because dbench
results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
they all run the delayed refs first thing, so they all contend with
each other. This
From: Josef Bacik
Traditionally we've had voodoo in btrfs to account for the space that
delayed refs may take up by having a global_block_rsv. This works most
of the time, except when it doesn't. We've had issues reported and seen
in production where sometimes the global reserve is exhausted
We have a bunch of magic to make sure we're throttling delayed refs when
truncating a file. Now that we have a delayed refs rsv and a mechanism
for refilling that reserve simply use that instead of all of this magic.
Signed-off-by: Josef Bacik
---
fs/btrfs/inode.c | 79
We weren't doing any of the accounting cleanup when we aborted
transactions. Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code. This fixes the issue where our
accounting was all wrong after the fs aborts.
Signed-off-by: Josef Bacik
---
fs/btrfs/ctree.h
Instead of open coding this stuff use the helper instead.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/disk-io.c | 7 +--
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f062fb0487cd..7d02748cf3f6 100644
---
We have this open coded in btrfs_destroy_delayed_refs, use the helper
instead.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/disk-io.c | 11 ++-
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index
A new xfstest that really hammers on transaction aborts (generic/495 I think?)
uncovered a lot of random issues. Some of these were introduced with the new
delayed refs rsv patches, others were just exposed by them, such as the pending
bg stuff. With these patches in place I stopped getting all
From: Josef Bacik
We were missing some quota cleanups in check_ref_cleanup, so break the
ref head accounting cleanup into a helper and call that from both
check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that
we don't screw up accounting in the future for other things that we
From: Josef Bacik
The cleanup_extent_op function actually would run the extent_op if it
needed running, which made the name sort of a misnomer. Change it to
run_and_cleanup_extent_op, and move the actual cleanup work to
cleanup_extent_op so it can be used by check_ref_cleanup() in order to
From: Josef Bacik
We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
into a helper and cleanup the calling functions.
Signed-off-by: Josef Bacik
Reviewed-by: Omar Sandoval
---
fs/btrfs/delayed-ref.c | 14 ++
fs/btrfs/delayed-ref.h | 3 ++-
This patchset changes how we do space reservations for delayed refs. We were
hitting probably 20-40 enospc aborts per day in production while running
delayed refs at transaction commit time. This means we ran out of space in the
global reserve and couldn't easily get more space in
From: Josef Bacik
We use this number to figure out how many delayed refs to run, but
__btrfs_run_delayed_refs really only checks every time we need a new
delayed ref head, so we always run at least one ref head completely no
matter what the number of items on it. Fix the accounting to only be
The first thing we do is loop through the list, so this

	if (!list_empty())
		btrfs_create_pending_block_groups();

check is just wasted space.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/extent-tree.c | 3 +--
fs/btrfs/transaction.c | 6 ++
2 files changed, 3
We may abort the transaction during a commit and not have a chance to
run the pending bgs stuff, which will leave block groups on our list and
cause us accounting issues and leaked memory. Fix this by running the
pending bgs when we cleanup a transaction.
Reviewed-by: Omar Sandoval
Here are some delayed iput fixes. Delayed iputs can hold reservations for a
while, and there's no real good way to make sure they are gone for good, which
means we could hit an early ENOSPC when in reality, if we had just waited for
the iput, we would have had plenty of space. So fix this up by making us
The throttle path doesn't take cleaner_delayed_iput_mutex, which means
we could think we're done flushing iputs in the data space reservation
path when we could have a throttler doing an iput. There's no real
reason to serialize the delayed iput flushing, so instead of taking the
Deduplicate the btrfs file type conversion implementation - file systems
that use the same file types as defined by POSIX do not need to define
their own versions and can use the common helper functions declared in
fs_types.h and implemented in fs_types.c.
Acked-by: David Sterba
Signed-off-by:
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening. Fix this
by making sure we wait on ordered extents if we have to do the aborted
We still need to do all of the accounting cleanup for pending block
groups if we abort. So set the ret to trans->aborted so if we aborted
the cleanup happens and everybody is happy.
Reviewed-by: Omar Sandoval
Signed-off-by: Josef Bacik
---
fs/btrfs/extent-tree.c | 8 +++-
1 file changed,
Delayed iputs mean we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be freed. So before we decide to commit the transaction for ENOSPC
reasons, run the delayed iputs so that any potential space is freed up.
If there is
The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path. The cleaner
thread only gets woken up every 30 seconds, so instead wake it up to do
its work so that we can free up that space as quickly as possible.
Reviewed-by: Filipe
On Mon, Nov 19, 2018 at 04:20:34PM +, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> We have a race between enabling quotas end subvolume creation that cause
> subvolume creation to fail with -EINVAL, and the following diagram shows
> how it happens:
>
> CPU 0
Running btrfs/124 in a loop hung up on me sporadically with the
following call trace:
btrfs D0 5760 5324 0x
Call Trace:
? __schedule+0x243/0x800
schedule+0x33/0x90
btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
?
On 2018/11/22 2:59 AM, Josef Bacik wrote:
> From: Josef Bacik
>
> We were missing some quota cleanups in check_ref_cleanup, so break the
> ref head accounting cleanup into a helper and call that from both
> check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that
> we don't
v12:
Fixed coding style - leave space between " : ".
v11:
btrfs-progs: Bring the code into the else part of if(forget).
Use strerror to print the error instead of ret.
v10:
Make btrfs-progs changes more readable.
With an effort to keep the known bug [1] as it is.
[1]
The
[I will send this to the xfstests ML after the kernel and progs patches
have been integrated].
btrfs reload was introduced to cleanup the device list inside the btrfs
kernel module.
The problem with the reload approach is that you can't run btrfs test
cases 124, 125, 154 and 164 on a system with btrfs as
Support for a new command 'btrfs dev forget [dev]' is proposed here
to undo the effects of 'btrfs dev scan [dev]'. For this purpose
this patch proposes to use ioctl #5 as it was empty.
IOW(BTRFS_IOCTL_MAGIC, 5, ..)
This patch adds new ioctl BTRFS_IOC_FORGET_DEV which can be sent from
the
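How such an ioctl number is put together can be sketched as follows. The struct layout and the `FORGET_DEV_SKETCH` name are illustrative stand-ins, not copied from linux/btrfs.h; only the magic 0x94 and slot 5 come from the description above:

```c
#include <assert.h>
#include <sys/ioctl.h>

/* Stand-in for struct btrfs_ioctl_vol_args; the real definition lives
 * in linux/btrfs.h and this layout is only illustrative. */
struct vol_args_sketch {
	long long fd;
	char name[4088];
};

/* ioctl #5 on the btrfs control device, formed exactly as the patch
 * describes: _IOW(BTRFS_IOCTL_MAGIC, 5, ...). */
#define BTRFS_IOCTL_MAGIC 0x94
#define FORGET_DEV_SKETCH _IOW(BTRFS_IOCTL_MAGIC, 5, struct vol_args_sketch)
```

The `_IOW()` macro packs the direction (userspace writes to the kernel), the magic byte, the command slot, and the argument size into one number, which is why an unused slot like #5 can be claimed without colliding with existing btrfs ioctls.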
This patch adds the CLI command
  btrfs device forget [dev]
to remove the given device structure in the kernel if the device
is unmounted. If no argument is given, it shall remove all stale
devices (devices which are not mounted) from the kernel.
Signed-off-by: Anand Jain
Reviewed-by: Nikolay Borisov
---
v11->v12:
On 21.11.18 г. 21:10 ч., Josef Bacik wrote:
> I noticed in a giant dbench run that we spent a lot of time on lock
> contention while running transaction commit. This is because dbench
> results in a lot of fsync()'s that do a btrfs_transaction_commit(), and
> they all run the delayed refs
The code fails if the third section is missing (like "4.18") or is followed
by anything but "." or "-". This happens for example if we're not exactly
at a tag and CONFIG_LOCALVERSION_AUTO=n (which results in "4.18.5+").
Signed-off-by: Adam Borowski
---
fsfeatures.c | 5 +
1 file changed, 1
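A tolerant parse can be sketched like this. `parse_kver` is a hypothetical helper, not the fsfeatures.c code; the point is that sscanf fills what it can, so a missing third section defaults to 0 and a trailing "+" or "-rc" suffix is simply ignored:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "major.minor[.patch][suffix]", tolerating "4.18", "4.18.5",
 * "4.18.5+" and "4.18.5-rc1" alike.  Returns 0 on success, -1 if even
 * "major.minor" could not be read.
 */
static int parse_kver(const char *s, int *maj, int *min, int *patch)
{
	*patch = 0;	/* default when the third section is absent */
	return sscanf(s, "%d.%d.%d", maj, min, patch) >= 2 ? 0 : -1;
}
```

Because sscanf stops converting at the first character that doesn't fit, "4.18.5+" (the CONFIG_LOCALVERSION_AUTO=n case) parses as 4.18.5 instead of failing.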
Defragging an executable conflicts both ways with it being run, resulting in
ETXTBSY. This either makes the defrag fail or prevents the program from being
executed.
Kernels 4.19-rc1 and later allow defragging files you could have possibly
opened rw, even if the passed descriptor is ro (commit
On Mon, Nov 19, 2018 at 09:48:12AM +, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> The available allocation bits members from struct btrfs_fs_info are
> protected by a sequence lock, and when starting balance we access them
> incorrectly in two different ways:
>
> 1) In the read