Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-19 Thread Josef Bacik
On Fri, Oct 13, 2017 at 04:55:58PM +0300, Nikolay Borisov wrote:
> 
> > 
> > The outstanding_extents accounting is consistent with only the items needed 
> > to
> > handle the outstanding extent items.  However since changing the inode 
> > requires
> > updating the inode item as well we have to keep this floating reservation 
> > for
> > the inode item until we have 0 outstanding extents.  The way we do this is 
> > with
> > the BTRFS_INODE_DELALLOC_META_RESERVED flag.  So if it isn't set we will
> > allocate nr_exntents + 1 in btrfs_delalloc_reserve_metadata() and then set 
> > our
> > bit.  If we ever steal this reservation we make sure to clear the flag so we
> > know we don't have to clean it up when outstanding_extents goes to 0.  It's 
> > not
> > super intuitive but needs to be done under the BTRFS_I(inode)->lock so this 
> > was
> > the best place to put it.  I suppose we could move the logic out of here 
> > and put
> > it somewhere else to make it more clear.
> 
> I think defining this logic in its own, discrete block of code would be
> best w.r.t readibility. It's not super obvious.
> 

I went to do this and realized that I rip all of this out when we switch to
per-inode block rsvs, so I'm just going to leave this patch as is.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-18 Thread Edmund Nadolski
just a few quick things for the changelog:

On 09/29/2017 01:43 PM, Josef Bacik wrote:
> Right now we do a lot of weird hoops around outstanding_extents in order
> to keep the extent count consistent.  This is because we logically
> transfer the outstanding_extent count from the initial reservation
> through the set_delalloc_bits.  This makes it pretty difficult to get a
> handle on how and when we need to mess with outstanding_extents.
> 
> Fix this by revamping the rules of how we deal with outstanding_extents.
> Now instead everybody that is holding on to a delalloc extent is
> required to increase the outstanding extents count for itself.  This
> means we'll have something like this
> 
> btrfs_dealloc_reserve_metadata- outstanding_extents = 1

s/dealloc/delalloc/


>  btrfs_set_delalloc   - outstanding_extents = 2

should be btrfs_set_extent_delalloc?


> btrfs_release_delalloc_extents- outstanding_extents = 1
> 
> for an initial file write.  Now take the append write where we extend an
> existing delalloc range but still under the maximum extent size
> 
> btrfs_delalloc_reserve_metadata - outstanding_extents = 2
>   btrfs_set_delalloc

btrfs_set_extent_delalloc?


> btrfs_set_bit_hook- outstanding_extents = 3
> btrfs_merge_bit_hook  - outstanding_extents = 2

should be btrfs_clear_bit_hook? (or btrfs_merge_extent_hook?)


> btrfs_release_delalloc_extents- outstanding_extnets = 1

btrfs_delalloc_release_metadata?


> 
> In order to make the ordered extent transition we of course must now
> make ordered extents carry their own outstanding_extent reservation, so
> for cow_file_range we end up with
> 
> btrfs_add_ordered_extent  - outstanding_extents = 2
> clear_extent_bit  - outstanding_extents = 1
> btrfs_remove_ordered_extent   - outstanding_extents = 0
> 
> This makes all manipulations of outstanding_extents much more explicit.
> Every successful call to btrfs_reserve_delalloc_metadata _must_ now be
   ^
btrfs_delalloc_reserve_metadata?


Thanks,
Ed

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread Nikolay Borisov

> 
> The outstanding_extents accounting is consistent with only the items needed to
> handle the outstanding extent items.  However since changing the inode 
> requires
> updating the inode item as well we have to keep this floating reservation for
> the inode item until we have 0 outstanding extents.  The way we do this is 
> with
> the BTRFS_INODE_DELALLOC_META_RESERVED flag.  So if it isn't set we will
> allocate nr_exntents + 1 in btrfs_delalloc_reserve_metadata() and then set our
> bit.  If we ever steal this reservation we make sure to clear the flag so we
> know we don't have to clean it up when outstanding_extents goes to 0.  It's 
> not
> super intuitive but needs to be done under the BTRFS_I(inode)->lock so this 
> was
> the best place to put it.  I suppose we could move the logic out of here and 
> put
> it somewhere else to make it more clear.

I think defining this logic in its own, discrete block of code would be
best w.r.t readibility. It's not super obvious.


I'm slowly going through your patchkit so expect more question but
otherwise the delalloc stuff after this and patch 03 really start
looking a lot more obvious !


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread David Sterba
On Fri, Oct 13, 2017 at 09:10:52AM -0400, Josef Bacik wrote:
> On Fri, Oct 13, 2017 at 11:39:15AM +0300, Nikolay Borisov wrote:
> > 
> > 
> > On 29.09.2017 22:43, Josef Bacik wrote:
> > >  
> > > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode 
> > > *inode,
> > > +  int mod)
> > > +{
> > > + ASSERT(spin_is_locked(>lock));
> > > + inode->outstanding_extents += mod;
> > > + if (btrfs_is_free_space_inode(inode))
> > > + return;
> > > +}
> > > +
> > > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> > > +   int mod)
> > > +{
> > > + ASSERT(spin_is_locked(>lock));
> > 
> > lockdep_assert_held(>lock); for both functions. I've spoken with
> > Peterz and he said any other way of checking whether a lock is held, I
> > quote, "must die"
> > 
> 
> Blah I continually forget that's what we're supposed to use now.

I try to keep such information on
https://btrfs.wiki.kernel.org/index.php/Development_notes#BCP

Just started the section with best practices, feel free to add more.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread Josef Bacik
On Fri, Oct 13, 2017 at 11:39:15AM +0300, Nikolay Borisov wrote:
> 
> 
> On 29.09.2017 22:43, Josef Bacik wrote:
> >  
> > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
> > +int mod)
> > +{
> > +   ASSERT(spin_is_locked(>lock));
> > +   inode->outstanding_extents += mod;
> > +   if (btrfs_is_free_space_inode(inode))
> > +   return;
> > +}
> > +
> > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> > + int mod)
> > +{
> > +   ASSERT(spin_is_locked(>lock));
> 
> lockdep_assert_held(>lock); for both functions. I've spoken with
> Peterz and he said any other way of checking whether a lock is held, I
> quote, "must die"
> 

Blah I continually forget that's what we're supposed to use now.

> > +   inode->reserved_extents += mod;
> > +   if (btrfs_is_free_space_inode(inode))
> > +   return;
> > +}
> > +
> >  static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 
> > generation)
> >  {
> > int ret = 0;
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index a7f68c304b4c..1262612fbf78 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct 
> > btrfs_root *root,
> >  u64 *qgroup_reserved, bool use_global_rsv);
> >  void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
> >   struct btrfs_block_rsv *rsv);
> > +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 
> > num_bytes);
> > +
> >  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 
> > num_bytes);
> >  void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 
> > num_bytes);
> >  int btrfs_delalloc_reserve_space(struct inode *inode,
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 1a6aced00a19..aa0f5c8953b0 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct 
> > btrfs_fs_info *fs_info,
> >  }
> >  
> >  /**
> > - * drop_outstanding_extent - drop an outstanding extent
> > + * drop_over_reserved_extents - drop our extra extent reservations
> >   * @inode: the inode we're dropping the extent for
> > - * @num_bytes: the number of bytes we're releasing.
> >   *
> > - * This is called when we are freeing up an outstanding extent, either 
> > called
> > - * after an error or after an extent is written.  This will return the 
> > number of
> > - * reserved extents that need to be freed.  This must be called with
> > - * BTRFS_I(inode)->lock held.
> > + * We reserve extents we may use, but they may have been merged with other
> > + * extents and we may not need the extra reservation.
> > + *
> > + * We also call this when we've completed io to an extent or had an error 
> > and
> > + * cleared the outstanding extent, in either case we no longer need our
> > + * reservation and can drop the excess.
> >   */
> > -static unsigned drop_outstanding_extent(struct btrfs_inode *inode,
> > -   u64 num_bytes)
> > +static unsigned drop_over_reserved_extents(struct btrfs_inode *inode)
> >  {
> > -   unsigned drop_inode_space = 0;
> > -   unsigned dropped_extents = 0;
> > -   unsigned num_extents;
> > +   unsigned num_extents = 0;
> >  
> > -   num_extents = count_max_extents(num_bytes);
> > -   ASSERT(num_extents);
> > -   ASSERT(inode->outstanding_extents >= num_extents);
> > -   inode->outstanding_extents -= num_extents;
> > +   if (inode->reserved_extents > inode->outstanding_extents) {
> > +   num_extents = inode->reserved_extents -
> > +   inode->outstanding_extents;
> > +   btrfs_mod_reserved_extents(inode, -num_extents);
> > +   }
> >  
> > if (inode->outstanding_extents == 0 &&
> > test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
> >>runtime_flags))
> > -   drop_inode_space = 1;
> > -
> > -   /*
> > -* If we have more or the same amount of outstanding extents than we 
> > have
> > -* reserved then we need to leave the reserved extents count alone.
> > -*/
> > -   if (inode->outstanding_extents >= inode->reserved_extents)
> > -   return drop_inode_space;
> > -
> > -   dropped_extents = inode->reserved_extents - inode->outstanding_extents;
> > -   inode->reserved_extents -= dropped_extents;
> > -   return dropped_extents + drop_inode_space;
> > +   num_extents++;
> > +   return num_extents;
> 
> Something bugs me around the handling of this. In
> btrfs_delalloc_reserve_metadata we do add the additional bytes necessary
> for updating the inode. However outstanding_extents is modified with the
> number of extent items necessary to cover the requires byte range but we
> don't account the extra inode item as being an extent. Then why do we
> have to actually increment 

Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread Nikolay Borisov


On 29.09.2017 22:43, Josef Bacik wrote:
>  
> +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
> +  int mod)
> +{
> + ASSERT(spin_is_locked(>lock));
> + inode->outstanding_extents += mod;
> + if (btrfs_is_free_space_inode(inode))
> + return;
> +}
> +
> +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> +   int mod)
> +{
> + ASSERT(spin_is_locked(>lock));

lockdep_assert_held(>lock); for both functions. I've spoken with
Peterz and he said any other way of checking whether a lock is held, I
quote, "must die"

> + inode->reserved_extents += mod;
> + if (btrfs_is_free_space_inode(inode))
> + return;
> +}
> +
>  static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 
> generation)
>  {
>   int ret = 0;
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a7f68c304b4c..1262612fbf78 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root 
> *root,
>u64 *qgroup_reserved, bool use_global_rsv);
>  void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
> struct btrfs_block_rsv *rsv);
> +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 
> num_bytes);
> +
>  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 
> num_bytes);
>  void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 
> num_bytes);
>  int btrfs_delalloc_reserve_space(struct inode *inode,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1a6aced00a19..aa0f5c8953b0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct 
> btrfs_fs_info *fs_info,
>  }
>  
>  /**
> - * drop_outstanding_extent - drop an outstanding extent
> + * drop_over_reserved_extents - drop our extra extent reservations
>   * @inode: the inode we're dropping the extent for
> - * @num_bytes: the number of bytes we're releasing.
>   *
> - * This is called when we are freeing up an outstanding extent, either called
> - * after an error or after an extent is written.  This will return the 
> number of
> - * reserved extents that need to be freed.  This must be called with
> - * BTRFS_I(inode)->lock held.
> + * We reserve extents we may use, but they may have been merged with other
> + * extents and we may not need the extra reservation.
> + *
> + * We also call this when we've completed io to an extent or had an error and
> + * cleared the outstanding extent, in either case we no longer need our
> + * reservation and can drop the excess.
>   */
> -static unsigned drop_outstanding_extent(struct btrfs_inode *inode,
> - u64 num_bytes)
> +static unsigned drop_over_reserved_extents(struct btrfs_inode *inode)
>  {
> - unsigned drop_inode_space = 0;
> - unsigned dropped_extents = 0;
> - unsigned num_extents;
> + unsigned num_extents = 0;
>  
> - num_extents = count_max_extents(num_bytes);
> - ASSERT(num_extents);
> - ASSERT(inode->outstanding_extents >= num_extents);
> - inode->outstanding_extents -= num_extents;
> + if (inode->reserved_extents > inode->outstanding_extents) {
> + num_extents = inode->reserved_extents -
> + inode->outstanding_extents;
> + btrfs_mod_reserved_extents(inode, -num_extents);
> + }
>  
>   if (inode->outstanding_extents == 0 &&
>   test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
>  >runtime_flags))
> - drop_inode_space = 1;
> -
> - /*
> -  * If we have more or the same amount of outstanding extents than we 
> have
> -  * reserved then we need to leave the reserved extents count alone.
> -  */
> - if (inode->outstanding_extents >= inode->reserved_extents)
> - return drop_inode_space;
> -
> - dropped_extents = inode->reserved_extents - inode->outstanding_extents;
> - inode->reserved_extents -= dropped_extents;
> - return dropped_extents + drop_inode_space;
> + num_extents++;
> + return num_extents;

Something bugs me around the handling of this. In
btrfs_delalloc_reserve_metadata we do add the additional bytes necessary
for updating the inode. However outstanding_extents is modified with the
number of extent items necessary to cover the requires byte range but we
don't account the extra inode item as being an extent. Then why do we
have to actually increment the num_extents here? Doesn't this lead to
underflow?

To illustrate: A write comes for 1 mb, we count this as 1 outstanding
extent in delalloc_reserve_metadata:

nr_extents = count_max_extents(num_bytes);
inode->outstanding_extents += nr_extents;

We do account the extra inode item but only 

[PATCH 01/21] Btrfs: rework outstanding_extents

2017-09-29 Thread Josef Bacik
Right now we do a lot of weird hoops around outstanding_extents in order
to keep the extent count consistent.  This is because we logically
transfer the outstanding_extent count from the initial reservation
through the set_delalloc_bits.  This makes it pretty difficult to get a
handle on how and when we need to mess with outstanding_extents.

Fix this by revamping the rules of how we deal with outstanding_extents.
Now instead everybody that is holding on to a delalloc extent is
required to increase the outstanding extents count for itself.  This
means we'll have something like this

btrfs_dealloc_reserve_metadata  - outstanding_extents = 1
 btrfs_set_delalloc - outstanding_extents = 2
btrfs_release_delalloc_extents  - outstanding_extents = 1

for an initial file write.  Now take the append write where we extend an
existing delalloc range but still under the maximum extent size

btrfs_delalloc_reserve_metadata - outstanding_extents = 2
  btrfs_set_delalloc
btrfs_set_bit_hook  - outstanding_extents = 3
btrfs_merge_bit_hook- outstanding_extents = 2
btrfs_release_delalloc_extents  - outstanding_extnets = 1

In order to make the ordered extent transition we of course must now
make ordered extents carry their own outstanding_extent reservation, so
for cow_file_range we end up with

btrfs_add_ordered_extent- outstanding_extents = 2
clear_extent_bit- outstanding_extents = 1
btrfs_remove_ordered_extent - outstanding_extents = 0

This makes all manipulations of outstanding_extents much more explicit.
Every successful call to btrfs_reserve_delalloc_metadata _must_ now be
combined with btrfs_release_delalloc_extents, even in the error case, as
that is the only function that actually modifies the
outstanding_extents counter.

The drawback to this is now we are much more likely to have transient
cases where outstanding_extents is much larger than it actually should
be.  This could happen before as we manipulated the delalloc bits, but
now it happens basically at every write.  This may put more pressure on
the ENOSPC flushing code, but I think making this code simpler is worth
the cost.  I have another change coming to mitigate this side-effect
somewhat.

I also added trace points for the counter manipulation.  These were used
by a bpf script I wrote to help track down leak issues.

Signed-off-by: Josef Bacik 
---
 fs/btrfs/btrfs_inode.h   |  18 ++
 fs/btrfs/ctree.h |   2 +
 fs/btrfs/extent-tree.c   | 139 ---
 fs/btrfs/file.c  |  22 +++
 fs/btrfs/inode-map.c |   3 +-
 fs/btrfs/inode.c | 114 +++
 fs/btrfs/ioctl.c |   2 +
 fs/btrfs/ordered-data.c  |  21 ++-
 fs/btrfs/relocation.c|   3 +
 fs/btrfs/tests/inode-tests.c |  18 ++
 10 files changed, 186 insertions(+), 156 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index eccadb5f62a5..a6a22ef41f91 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -267,6 +267,24 @@ static inline bool btrfs_is_free_space_inode(struct 
btrfs_inode *inode)
return false;
 }
 
+static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
+int mod)
+{
+   ASSERT(spin_is_locked(>lock));
+   inode->outstanding_extents += mod;
+   if (btrfs_is_free_space_inode(inode))
+   return;
+}
+
+static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
+ int mod)
+{
+   ASSERT(spin_is_locked(>lock));
+   inode->reserved_extents += mod;
+   if (btrfs_is_free_space_inode(inode))
+   return;
+}
+
 static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
 {
int ret = 0;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a7f68c304b4c..1262612fbf78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root 
*root,
 u64 *qgroup_reserved, bool use_global_rsv);
 void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
  struct btrfs_block_rsv *rsv);
+void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
+
 int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes);
 int btrfs_delalloc_reserve_space(struct inode *inode,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1a6aced00a19..aa0f5c8953b0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct 
btrfs_fs_info *fs_info,
 }
 
 /**
- * drop_outstanding_extent - drop an outstanding extent
+ *