Re: [PATCH 01/21] Btrfs: rework outstanding_extents
On Fri, Oct 13, 2017 at 04:55:58PM +0300, Nikolay Borisov wrote: > > > > > The outstanding_extents accounting is consistent with only the items needed > > to > > handle the outstanding extent items. However since changing the inode > > requires > > updating the inode item as well we have to keep this floating reservation > > for > > the inode item until we have 0 outstanding extents. The way we do this is > > with > > the BTRFS_INODE_DELALLOC_META_RESERVED flag. So if it isn't set we will > > allocate nr_exntents + 1 in btrfs_delalloc_reserve_metadata() and then set > > our > > bit. If we ever steal this reservation we make sure to clear the flag so we > > know we don't have to clean it up when outstanding_extents goes to 0. It's > > not > > super intuitive but needs to be done under the BTRFS_I(inode)->lock so this > > was > > the best place to put it. I suppose we could move the logic out of here > > and put > > it somewhere else to make it more clear. > > I think defining this logic in its own, discrete block of code would be > best w.r.t readibility. It's not super obvious. > I went to do this and realized that I rip all of this out when we switch to per-inode block rsvs, so I'm just going to leave this patch as is. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/21] Btrfs: rework outstanding_extents
just a few quick things for the changelog: On 09/29/2017 01:43 PM, Josef Bacik wrote: > Right now we do a lot of weird hoops around outstanding_extents in order > to keep the extent count consistent. This is because we logically > transfer the outstanding_extent count from the initial reservation > through the set_delalloc_bits. This makes it pretty difficult to get a > handle on how and when we need to mess with outstanding_extents. > > Fix this by revamping the rules of how we deal with outstanding_extents. > Now instead everybody that is holding on to a delalloc extent is > required to increase the outstanding extents count for itself. This > means we'll have something like this > > btrfs_dealloc_reserve_metadata- outstanding_extents = 1 s/dealloc/delalloc/ > btrfs_set_delalloc - outstanding_extents = 2 should be btrfs_set_extent_delalloc? > btrfs_release_delalloc_extents- outstanding_extents = 1 > > for an initial file write. Now take the append write where we extend an > existing delalloc range but still under the maximum extent size > > btrfs_delalloc_reserve_metadata - outstanding_extents = 2 > btrfs_set_delalloc btrfs_set_extent_delalloc? > btrfs_set_bit_hook- outstanding_extents = 3 > btrfs_merge_bit_hook - outstanding_extents = 2 should be btrfs_clear_bit_hook? (or btrfs_merge_extent_hook?) > btrfs_release_delalloc_extents- outstanding_extnets = 1 btrfs_delalloc_release_metadata? > > In order to make the ordered extent transition we of course must now > make ordered extents carry their own outstanding_extent reservation, so > for cow_file_range we end up with > > btrfs_add_ordered_extent - outstanding_extents = 2 > clear_extent_bit - outstanding_extents = 1 > btrfs_remove_ordered_extent - outstanding_extents = 0 > > This makes all manipulations of outstanding_extents much more explicit. > Every successful call to btrfs_reserve_delalloc_metadata _must_ now be ^ btrfs_delalloc_reserve_metadata? Thanks, Ed -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/21] Btrfs: rework outstanding_extents
> > The outstanding_extents accounting is consistent with only the items needed to > handle the outstanding extent items. However since changing the inode > requires > updating the inode item as well we have to keep this floating reservation for > the inode item until we have 0 outstanding extents. The way we do this is > with > the BTRFS_INODE_DELALLOC_META_RESERVED flag. So if it isn't set we will > allocate nr_exntents + 1 in btrfs_delalloc_reserve_metadata() and then set our > bit. If we ever steal this reservation we make sure to clear the flag so we > know we don't have to clean it up when outstanding_extents goes to 0. It's > not > super intuitive but needs to be done under the BTRFS_I(inode)->lock so this > was > the best place to put it. I suppose we could move the logic out of here and > put > it somewhere else to make it more clear. I think defining this logic in its own, discrete block of code would be best w.r.t readibility. It's not super obvious. I'm slowly going through your patchkit so expect more question but otherwise the delalloc stuff after this and patch 03 really start looking a lot more obvious ! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/21] Btrfs: rework outstanding_extents
On Fri, Oct 13, 2017 at 09:10:52AM -0400, Josef Bacik wrote: > On Fri, Oct 13, 2017 at 11:39:15AM +0300, Nikolay Borisov wrote: > > > > > > On 29.09.2017 22:43, Josef Bacik wrote: > > > > > > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode > > > *inode, > > > + int mod) > > > +{ > > > + ASSERT(spin_is_locked(>lock)); > > > + inode->outstanding_extents += mod; > > > + if (btrfs_is_free_space_inode(inode)) > > > + return; > > > +} > > > + > > > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode, > > > + int mod) > > > +{ > > > + ASSERT(spin_is_locked(>lock)); > > > > lockdep_assert_held(>lock); for both functions. I've spoken with > > Peterz and he said any other way of checking whether a lock is held, I > > quote, "must die" > > > > Blah I continually forget that's what we're supposed to use now. I try to keep such information on https://btrfs.wiki.kernel.org/index.php/Development_notes#BCP Just started the section with best practices, feel free to add more. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/21] Btrfs: rework outstanding_extents
On Fri, Oct 13, 2017 at 11:39:15AM +0300, Nikolay Borisov wrote: > > > On 29.09.2017 22:43, Josef Bacik wrote: > > > > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode, > > +int mod) > > +{ > > + ASSERT(spin_is_locked(>lock)); > > + inode->outstanding_extents += mod; > > + if (btrfs_is_free_space_inode(inode)) > > + return; > > +} > > + > > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode, > > + int mod) > > +{ > > + ASSERT(spin_is_locked(>lock)); > > lockdep_assert_held(>lock); for both functions. I've spoken with > Peterz and he said any other way of checking whether a lock is held, I > quote, "must die" > Blah I continually forget that's what we're supposed to use now. > > + inode->reserved_extents += mod; > > + if (btrfs_is_free_space_inode(inode)) > > + return; > > +} > > + > > static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 > > generation) > > { > > int ret = 0; > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > > index a7f68c304b4c..1262612fbf78 100644 > > --- a/fs/btrfs/ctree.h > > +++ b/fs/btrfs/ctree.h > > @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct > > btrfs_root *root, > > u64 *qgroup_reserved, bool use_global_rsv); > > void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info, > > struct btrfs_block_rsv *rsv); > > +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 > > num_bytes); > > + > > int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 > > num_bytes); > > void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 > > num_bytes); > > int btrfs_delalloc_reserve_space(struct inode *inode, > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > > index 1a6aced00a19..aa0f5c8953b0 100644 > > --- a/fs/btrfs/extent-tree.c > > +++ b/fs/btrfs/extent-tree.c > > @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct > > btrfs_fs_info *fs_info, > > } > > > > /** > > - * drop_outstanding_extent - drop an outstanding extent > > + * drop_over_reserved_extents - drop our extra extent reservations > > * @inode: the inode we're dropping the extent for > > - * @num_bytes: the number of bytes we're releasing. > > * > > - * This is called when we are freeing up an outstanding extent, either > > called > > - * after an error or after an extent is written. This will return the > > number of > > - * reserved extents that need to be freed. This must be called with > > - * BTRFS_I(inode)->lock held. > > + * We reserve extents we may use, but they may have been merged with other > > + * extents and we may not need the extra reservation. > > + * > > + * We also call this when we've completed io to an extent or had an error > > and > > + * cleared the outstanding extent, in either case we no longer need our > > + * reservation and can drop the excess. > > */ > > -static unsigned drop_outstanding_extent(struct btrfs_inode *inode, > > - u64 num_bytes) > > +static unsigned drop_over_reserved_extents(struct btrfs_inode *inode) > > { > > - unsigned drop_inode_space = 0; > > - unsigned dropped_extents = 0; > > - unsigned num_extents; > > + unsigned num_extents = 0; > > > > - num_extents = count_max_extents(num_bytes); > > - ASSERT(num_extents); > > - ASSERT(inode->outstanding_extents >= num_extents); > > - inode->outstanding_extents -= num_extents; > > + if (inode->reserved_extents > inode->outstanding_extents) { > > + num_extents = inode->reserved_extents - > > + inode->outstanding_extents; > > + btrfs_mod_reserved_extents(inode, -num_extents); > > + } > > > > if (inode->outstanding_extents == 0 && > > test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED, > >>runtime_flags)) > > - drop_inode_space = 1; > > - > > - /* > > -* If we have more or the same amount of outstanding extents than we > > have > > -* reserved then we need to leave the reserved extents count alone. > > -*/ > > - if (inode->outstanding_extents >= inode->reserved_extents) > > - return drop_inode_space; > > - > > - dropped_extents = inode->reserved_extents - inode->outstanding_extents; > > - inode->reserved_extents -= dropped_extents; > > - return dropped_extents + drop_inode_space; > > + num_extents++; > > + return num_extents; > > Something bugs me around the handling of this. In > btrfs_delalloc_reserve_metadata we do add the additional bytes necessary > for updating the inode. However outstanding_extents is modified with the > number of extent items necessary to cover the requires byte range but we > don't account the extra inode item as being an extent. Then why do we > have to actually increment
Re: [PATCH 01/21] Btrfs: rework outstanding_extents
On 29.09.2017 22:43, Josef Bacik wrote: > > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode, > + int mod) > +{ > + ASSERT(spin_is_locked(>lock)); > + inode->outstanding_extents += mod; > + if (btrfs_is_free_space_inode(inode)) > + return; > +} > + > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode, > + int mod) > +{ > + ASSERT(spin_is_locked(>lock)); lockdep_assert_held(>lock); for both functions. I've spoken with Peterz and he said any other way of checking whether a lock is held, I quote, "must die" > + inode->reserved_extents += mod; > + if (btrfs_is_free_space_inode(inode)) > + return; > +} > + > static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 > generation) > { > int ret = 0; > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index a7f68c304b4c..1262612fbf78 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root > *root, >u64 *qgroup_reserved, bool use_global_rsv); > void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info, > struct btrfs_block_rsv *rsv); > +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 > num_bytes); > + > int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 > num_bytes); > void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 > num_bytes); > int btrfs_delalloc_reserve_space(struct inode *inode, > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 1a6aced00a19..aa0f5c8953b0 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct > btrfs_fs_info *fs_info, > } > > /** > - * drop_outstanding_extent - drop an outstanding extent > + * drop_over_reserved_extents - drop our extra extent reservations > * @inode: the inode we're dropping the extent for > - * @num_bytes: the number of bytes we're releasing. > * > - * This is called when we are freeing up an outstanding extent, either called > - * after an error or after an extent is written. This will return the > number of > - * reserved extents that need to be freed. This must be called with > - * BTRFS_I(inode)->lock held. > + * We reserve extents we may use, but they may have been merged with other > + * extents and we may not need the extra reservation. > + * > + * We also call this when we've completed io to an extent or had an error and > + * cleared the outstanding extent, in either case we no longer need our > + * reservation and can drop the excess. > */ > -static unsigned drop_outstanding_extent(struct btrfs_inode *inode, > - u64 num_bytes) > +static unsigned drop_over_reserved_extents(struct btrfs_inode *inode) > { > - unsigned drop_inode_space = 0; > - unsigned dropped_extents = 0; > - unsigned num_extents; > + unsigned num_extents = 0; > > - num_extents = count_max_extents(num_bytes); > - ASSERT(num_extents); > - ASSERT(inode->outstanding_extents >= num_extents); > - inode->outstanding_extents -= num_extents; > + if (inode->reserved_extents > inode->outstanding_extents) { > + num_extents = inode->reserved_extents - > + inode->outstanding_extents; > + btrfs_mod_reserved_extents(inode, -num_extents); > + } > > if (inode->outstanding_extents == 0 && > test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED, > >runtime_flags)) > - drop_inode_space = 1; > - > - /* > - * If we have more or the same amount of outstanding extents than we > have > - * reserved then we need to leave the reserved extents count alone. > - */ > - if (inode->outstanding_extents >= inode->reserved_extents) > - return drop_inode_space; > - > - dropped_extents = inode->reserved_extents - inode->outstanding_extents; > - inode->reserved_extents -= dropped_extents; > - return dropped_extents + drop_inode_space; > + num_extents++; > + return num_extents; Something bugs me around the handling of this. In btrfs_delalloc_reserve_metadata we do add the additional bytes necessary for updating the inode. However outstanding_extents is modified with the number of extent items necessary to cover the requires byte range but we don't account the extra inode item as being an extent. Then why do we have to actually increment the num_extents here? Doesn't this lead to underflow? To illustrate: A write comes for 1 mb, we count this as 1 outstanding extent in delalloc_reserve_metadata: nr_extents = count_max_extents(num_bytes); inode->outstanding_extents += nr_extents; We do account the extra inode item but only
[PATCH 01/21] Btrfs: rework outstanding_extents
Right now we do a lot of weird hoops around outstanding_extents in order to keep the extent count consistent. This is because we logically transfer the outstanding_extent count from the initial reservation through the set_delalloc_bits. This makes it pretty difficult to get a handle on how and when we need to mess with outstanding_extents. Fix this by revamping the rules of how we deal with outstanding_extents. Now instead everybody that is holding on to a delalloc extent is required to increase the outstanding extents count for itself. This means we'll have something like this btrfs_dealloc_reserve_metadata - outstanding_extents = 1 btrfs_set_delalloc - outstanding_extents = 2 btrfs_release_delalloc_extents - outstanding_extents = 1 for an initial file write. Now take the append write where we extend an existing delalloc range but still under the maximum extent size btrfs_delalloc_reserve_metadata - outstanding_extents = 2 btrfs_set_delalloc btrfs_set_bit_hook - outstanding_extents = 3 btrfs_merge_bit_hook- outstanding_extents = 2 btrfs_release_delalloc_extents - outstanding_extnets = 1 In order to make the ordered extent transition we of course must now make ordered extents carry their own outstanding_extent reservation, so for cow_file_range we end up with btrfs_add_ordered_extent- outstanding_extents = 2 clear_extent_bit- outstanding_extents = 1 btrfs_remove_ordered_extent - outstanding_extents = 0 This makes all manipulations of outstanding_extents much more explicit. Every successful call to btrfs_reserve_delalloc_metadata _must_ now be combined with btrfs_release_delalloc_extents, even in the error case, as that is the only function that actually modifies the outstanding_extents counter. The drawback to this is now we are much more likely to have transient cases where outstanding_extents is much larger than it actually should be. This could happen before as we manipulated the delalloc bits, but now it happens basically at every write. This may put more pressure on the ENOSPC flushing code, but I think making this code simpler is worth the cost. I have another change coming to mitigate this side-effect somewhat. I also added trace points for the counter manipulation. These were used by a bpf script I wrote to help track down leak issues. Signed-off-by: Josef Bacik--- fs/btrfs/btrfs_inode.h | 18 ++ fs/btrfs/ctree.h | 2 + fs/btrfs/extent-tree.c | 139 --- fs/btrfs/file.c | 22 +++ fs/btrfs/inode-map.c | 3 +- fs/btrfs/inode.c | 114 +++ fs/btrfs/ioctl.c | 2 + fs/btrfs/ordered-data.c | 21 ++- fs/btrfs/relocation.c| 3 + fs/btrfs/tests/inode-tests.c | 18 ++ 10 files changed, 186 insertions(+), 156 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index eccadb5f62a5..a6a22ef41f91 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -267,6 +267,24 @@ static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode) return false; } +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode, +int mod) +{ + ASSERT(spin_is_locked(>lock)); + inode->outstanding_extents += mod; + if (btrfs_is_free_space_inode(inode)) + return; +} + +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode, + int mod) +{ + ASSERT(spin_is_locked(>lock)); + inode->reserved_extents += mod; + if (btrfs_is_free_space_inode(inode)) + return; +} + static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation) { int ret = 0; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a7f68c304b4c..1262612fbf78 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root, u64 *qgroup_reserved, bool use_global_rsv); void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info, struct btrfs_block_rsv *rsv); +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes); + int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes); void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes); int btrfs_delalloc_reserve_space(struct inode *inode, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1a6aced00a19..aa0f5c8953b0 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info, } /** - * drop_outstanding_extent - drop an outstanding extent + *