Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-16 Thread Qu Wenruo


On 2018-04-17 01:27, David Sterba wrote:
> On Mon, Apr 16, 2018 at 04:53:10PM +0900, Misono Tomohiro wrote:
>> On 2018/04/12 22:13, David Sterba wrote:
>>> On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
 I didn't see this patch merged in your misc-next branch but only the
 remaining patches.

 However without this patch, btrfs qgroup reserved space will increase
 noticeably, as prealloc metadata reserved space is never freed until the
 inode reserved space is freed.

 This would cause a lot of qgroup related test cases to fail.

 Would you please merge this patch together with the rest of the qgroup patchset?
>>>
>>> For the record: patch 9 and 10 are now in misc next and will go to 4.17.
>>> I need to let them through the for-next and other testing, so it will be
>>> some of the post rc1 pull requests.
>>
>> I still saw qgroup related xfstests (btrfs/022 etc.) fail in the current
>> misc-next branch, which includes the 9th and 10th patches.
>>
>> I noticed the 5th patch in the tree (already in v4.17-rc1) is an older
>> version, and this seems to be the reason the tests still fail.
> 
> Qu, can you please have a look and send an incremental fixup? Handling of
> this patchset was not very good on my side, I should have asked for a
> fresh resend as I applied the changes from the mailing list and not from git.

No problem, and next time I'll double-check the misc-next branch for
such differences.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 





Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-16 Thread David Sterba
On Mon, Apr 16, 2018 at 04:53:10PM +0900, Misono Tomohiro wrote:
> On 2018/04/12 22:13, David Sterba wrote:
> > On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
> >> I didn't see this patch merged in your misc-next branch but only the
> >> remaining patches.
> >>
>> However without this patch, btrfs qgroup reserved space will increase
>> noticeably, as prealloc metadata reserved space is never freed until the
>> inode reserved space is freed.
>>
>> This would cause a lot of qgroup related test cases to fail.
>>
>> Would you please merge this patch together with the rest of the qgroup patchset?
> > 
> > For the record: patch 9 and 10 are now in misc next and will go to 4.17.
> > I need to let them through the for-next and other testing, so it will be
> > some of the post rc1 pull requests.
> 
> I still saw qgroup related xfstests (btrfs/022 etc.) fail in the current
> misc-next branch, which includes the 9th and 10th patches.
> 
> I noticed the 5th patch in the tree (already in v4.17-rc1) is an older
> version, and this seems to be the reason the tests still fail.

Qu, can you please have a look and send an incremental fixup? Handling of
this patchset was not very good on my side, I should have asked for a
fresh resend as I applied the changes from the mailing list and not from git.


Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-16 Thread Misono Tomohiro


On 2018/04/12 22:13, David Sterba wrote:
> On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
>> I didn't see this patch merged in your misc-next branch but only the
>> remaining patches.
>>
>> However without this patch, btrfs qgroup reserved space will increase
>> noticeably, as prealloc metadata reserved space is never freed until the
>> inode reserved space is freed.
>>
>> This would cause a lot of qgroup related test cases to fail.
>>
>> Would you please merge this patch together with the rest of the qgroup patchset?
> 
> For the record: patch 9 and 10 are now in misc next and will go to 4.17.
> I need to let them through the for-next and other testing, so it will be
> some of the post rc1 pull requests.

I still saw qgroup related xfstests (btrfs/022 etc.) fail in the current
misc-next branch, which includes the 9th and 10th patches.

I noticed the 5th patch in the tree (already in v4.17-rc1) is an older
version, and this seems to be the reason the tests still fail.



Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-12 Thread David Sterba
On Tue, Apr 03, 2018 at 03:30:34PM +0800, Qu Wenruo wrote:
> I didn't see this patch merged in your misc-next branch but only the
> remaining patches.
> 
> However without this patch, btrfs qgroup reserved space will increase
> noticeably, as prealloc metadata reserved space is never freed until the
> inode reserved space is freed.
> 
> This would cause a lot of qgroup related test cases to fail.
> 
> Would you please merge this patch together with the rest of the qgroup patchset?

For the record: patch 9 and 10 are now in misc next and will go to 4.17.
I need to let them through the for-next and other testing, so it will be
some of the post rc1 pull requests.


Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-12 Thread David Sterba
On Wed, Apr 11, 2018 at 05:03:15PM -0700, Omar Sandoval wrote:
> > 
> > On 2018-04-04 16:53, Nikolay Borisov wrote:
> > > On  3.04.2018 10:30, Qu Wenruo wrote:
> > >> I didn't see this patch merged in your misc-next branch but only the
> > >> remaining patches.
> > >>
> > >> However without this patch, btrfs qgroup reserved space will increase
> > >> noticeably, as prealloc metadata reserved space is never freed until
> > >> the inode reserved space is freed.
> > >>
> > >> This would cause a lot of qgroup related test cases to fail.
> > >>
> > >> Would you please merge this patch together with the rest of the qgroup patchset?
> > > 
> > > I'm seeing btrfs/022 and btrfs/099 fail as of today's misc-next. Both
> > > are qgroup related tests.
> > 
> > Exactly.
> > 
> > Wondering why this patch is left.
> I'm also seeing several qgroups xfstests failures on Linus' master
> branch. Dave?

Hm, dunno where the last patch got lost. The intention was to merge the
whole series, I'll add the patch to the 2nd pull.


Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-11 Thread Omar Sandoval
On Wed, Apr 04, 2018 at 08:17:22PM +0800, Qu Wenruo wrote:
> 
> 
> > On 2018-04-04 16:53, Nikolay Borisov wrote:
> > 
> > 
> > On  3.04.2018 10:30, Qu Wenruo wrote:
> >> Hi David,
> >>
> >> I didn't see this patch merged in your misc-next branch but only the
> >> remaining patches.
> >>
> >> However without this patch, btrfs qgroup reserved space will increase
> >> noticeably, as prealloc metadata reserved space is never freed until
> >> the inode reserved space is freed.
> >>
> >> This would cause a lot of qgroup related test cases to fail.
> >>
> >> Would you please merge this patch together with the rest of the qgroup patchset?
> > 
> > I'm seeing btrfs/022 and btrfs/099 fail as of today's misc-next. Both
> > are qgroup related tests.
> 
> Exactly.
> 
> Wondering why this patch is left.
> 
> Thanks,
> Qu

I'm also seeing several qgroups xfstests failures on Linus' master
branch. Dave?


Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-04-03 Thread Qu Wenruo
Hi David,

I didn't see this patch merged in your misc-next branch but only the
remaining patches.

However without this patch, btrfs qgroup reserved space will increase
noticeably, as prealloc metadata reserved space is never freed until the
inode reserved space is freed.

This would cause a lot of qgroup related test cases to fail.

Would you please merge this patch together with the rest of the qgroup patchset?

Thanks,
Qu

On 2017-12-22 14:18, Qu Wenruo wrote:
> Unlike the reservation calculation used in the inode rsv for metadata,
> qgroup doesn't really need to care about things like csum size or
> extent usage for whole tree COW.
> 
> Qgroup cares more about the net change of extent usage.
> That's to say, if we're going to insert one file extent, it will mostly
> find its place in an already CoWed tree block, leaving no change in
> extent usage. Or it causes a leaf split, resulting in one new net
> extent, increasing the qgroup number by nodesize.
> (Or, in an even rarer case, it increases the tree level, increasing the
> qgroup number by 2 * nodesize.)
> 
> So instead of the deliberately overkill calculation used for the extent
> allocator, which cares more about accuracy and avoiding errors, qgroup
> doesn't need that over-calculated reservation.
> 
> This patch will maintain 2 new members in the btrfs_block_rsv structure
> for qgroup, using a much smaller calculation for the qgroup rsv,
> reducing false EDQUOT.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h   | 18 +
>  fs/btrfs/extent-tree.c | 55 
> --
>  2 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0c58f92c2d44..97783ba91e00 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>   unsigned short full;
>   unsigned short type;
>   unsigned short failfast;
> +
> + /*
> +  * Qgroup equivalent for @size @reserved
> +  *
> +  * Unlike normal @size/@reserved for inode rsv,
> +  * qgroup doesn't care about things like csum size nor how many tree
> +  * blocks it will need to reserve.
> +  *
> +  * Qgroup cares more about *NET* change of extent usage.
> +  * So for ONE newly inserted file extent, in worst case it will cause
> +  * leaf split and level increase, nodesize for each file extent
> +  * is already way overkilled.
> +  *
> +  * In short, qgroup_size/reserved is the upper limit of possible needed
> +  * qgroup metadata reservation.
> +  */
> + u64 qgroup_rsv_size;
> + u64 qgroup_rsv_reserved;
>  };
>  
>  /*
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 986660f301de..9d579c7bcf7f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
> btrfs_fs_info *fs_info,
>  
>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>   struct btrfs_block_rsv *block_rsv,
> - struct btrfs_block_rsv *dest, u64 num_bytes)
> + struct btrfs_block_rsv *dest, u64 num_bytes,
> + u64 *qgroup_to_release_ret)
>  {
>   struct btrfs_space_info *space_info = block_rsv->space_info;
> + u64 qgroup_to_release = 0;
>   u64 ret;
>  
>   spin_lock(&block_rsv->lock);
> - if (num_bytes == (u64)-1)
> + if (num_bytes == (u64)-1) {
>   num_bytes = block_rsv->size;
> + qgroup_to_release = block_rsv->qgroup_rsv_size;
> + }
>   block_rsv->size -= num_bytes;
>   if (block_rsv->reserved >= block_rsv->size) {
>   num_bytes = block_rsv->reserved - block_rsv->size;
> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct 
> btrfs_fs_info *fs_info,
>   } else {
>   num_bytes = 0;
>   }
> + if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
> + qgroup_to_release = block_rsv->qgroup_rsv_reserved -
> + block_rsv->qgroup_rsv_size;
> + block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
> + } else
> + qgroup_to_release = 0;
>   spin_unlock(&block_rsv->lock);
>  
>   ret = num_bytes;
> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info 
> *fs_info,
>   space_info_add_old_bytes(fs_info, space_info,
>num_bytes);
>   }
> + if (qgroup_to_release_ret)
> + *qgroup_to_release_ret = qgroup_to_release;
>   return ret;
>  }
>  
> @@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
>   struct btrfs_root *root = inode->root;
>   struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
>   u64 num_bytes = 0;
> + u64 qgroup_num_bytes = 0;
>   int ret = -ENOSPC;
>  
>   spin_lock(&block_rsv->lock);
>   if 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-23 Thread Jeff Mahoney
On 2/22/18 6:34 PM, Qu Wenruo wrote:
> 
> 
> On 2018-02-23 06:44, Jeff Mahoney wrote:
>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>>> Unlike the reservation calculation used in the inode rsv for metadata,
>>> qgroup doesn't really need to care about things like csum size or
>>> extent usage for whole tree COW.
>>>
>>> Qgroup cares more about the net change of extent usage.
>>> That's to say, if we're going to insert one file extent, it will mostly
>>> find its place in an already CoWed tree block, leaving no change in
>>> extent usage. Or it causes a leaf split, resulting in one new net
>>> extent, increasing the qgroup number by nodesize.
>>> (Or, in an even rarer case, it increases the tree level, increasing the
>>> qgroup number by 2 * nodesize.)
>>>
>>> So instead of the deliberately overkill calculation used for the extent
>>> allocator, which cares more about accuracy and avoiding errors, qgroup
>>> doesn't need that over-calculated reservation.
>>>
>>> This patch will maintain 2 new members in the btrfs_block_rsv structure
>>> for qgroup, using a much smaller calculation for the qgroup rsv,
>>> reducing false EDQUOT.
>>
>>
>> I think this is the right idea but, rather than being the last step in a
>> qgroup rework, it should be the first.
> 
> That's right.
> 
> Although putting it as the 1st patch needs extra work to co-operate with
> the later type separation.
> 
>>  Don't qgroup reservation
>> lifetimes match the block reservation lifetimes?
> 
> Also correct, but...
> 
>>  We wouldn't want a
>> global reserve and we wouldn't track usage on a per-block basis, but
>> they should otherwise match.  We already have clear APIs surrounding the
>> use of block reservations, so it seems to me that it'd make sense to
>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>> your patchset makes a parallel reservation system with a different API.
>> That keeps the current state of qgroups as a bolt-on that always needs
>> to be tracked separately (and which developers need to ensure they get
>> right).
> 
> The problem is, block reservation is designed to ensure every CoW could
> be fulfilled without error.
> 
> That's to say, for case like CoW write with csum, we need to care about
> space reservation not only for EXTENT_DATA in fs tree, but also later
> EXTENT_ITEM for extent tree, and CSUM_ITEM for csum tree.
> 
> However, the extent tree and csum tree don't contribute to quota at all.
> If we follow such over-reservation, it would be much easier to hit
> false EDQUOT early.

I'm not suggesting a 1:1 mapping between block reservations and qgroup
reservations.  If that were possible, we wouldn't need separate
reservations at all.  What we can do is only use bytes from the qgroup
reservation when we allocate the leaf nodes belonging to the root we're
tracking.  Everywhere else we can migrate bytes normally between
reservations the same way we do for block reservations.  As we discussed
offline yesterday, I'll work up something along the lines of what I have
in mind and see if it works out.

-Jeff

> That's the main reason for a separate (and slightly simplified) block
> rsv tracking system.
> 
> And if there is better solution, I'm all ears.
> 
> Thanks,
> Qu
> 
>>
>> -Jeff
>>
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>  fs/btrfs/ctree.h   | 18 +
>>>  fs/btrfs/extent-tree.c | 55 
>>> --
>>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 0c58f92c2d44..97783ba91e00 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>> unsigned short full;
>>> unsigned short type;
>>> unsigned short failfast;
>>> +
>>> +   /*
>>> +* Qgroup equivalent for @size @reserved
>>> +*
>>> +* Unlike normal @size/@reserved for inode rsv,
>>> +* qgroup doesn't care about things like csum size nor how many tree
>>> +* blocks it will need to reserve.
>>> +*
>>> +* Qgroup cares more about *NET* change of extent usage.
>>> +* So for ONE newly inserted file extent, in worst case it will cause
>>> +* leaf split and level increase, nodesize for each file extent
>>> +* is already way overkilled.
>>> +*
>>> +* In short, qgroup_size/reserved is the upper limit of possible needed
>>> +* qgroup metadata reservation.
>>> +*/
>>> +   u64 qgroup_rsv_size;
>>> +   u64 qgroup_rsv_reserved;
>>>  };
>>>  
>>>  /*
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 986660f301de..9d579c7bcf7f 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
>>> btrfs_fs_info *fs_info,
>>>  
>>>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>> struct btrfs_block_rsv *block_rsv,
>>> -   struct btrfs_block_rsv *dest, u64 num_bytes)
>>> +   struct 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-23 Thread Qu Wenruo
[snip]
>>
>> We don't need to do such a check at the call site.
>>
>> Just do the calculation (which should be really simple, as simple as
>> nodesize * nr_items_to_add), and pass it to btrfs_qgroup_reserve_* APIs,
>> which would handle the quota enabled check.
>>
>>>
>>> be contained into the block rsv apis. This will ensure lifetime of
>>> blockrsv/qgroups is always consistent. I think this only applies to
>>> qgroup metadata reservations.
>>
>> Yep, and only applies to PREALLOC type metadata reservation.
>> For per-transaction rsv, it's handled by PERTRANS type.
> 
> And what about the btrfs_qgroup_reserve_meta()-type reservations, this
> function doesn't take a transaction, what kind of reservation is that o_O

We only have two functions to reserve metadata:
1) btrfs_qgroup_reserve_meta_pertrans
2) btrfs_qgroup_reserve_meta_prealloc

No 3rd option.
And each function name should explain itself.

For the pertrans rsv, we don't really need the @trans parameter in fact.
We only need to tell qgroup to reserve some meta space as the pertrans type.

And there is only one caller for pertrans, in transaction.c.

> 
> To sum it up we have  PERTRANS, PREALLOC and  btrfs_qgroup_reserve_meta
> and those are 3 distinct type of reservations, correct?

Only 2 in fact.

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>
>>>

 Thanks,
 Qu

>
> -Jeff
>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/ctree.h   | 18 +
>>  fs/btrfs/extent-tree.c | 55 
>> --
>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 0c58f92c2d44..97783ba91e00 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>  unsigned short full;
>>  unsigned short type;
>>  unsigned short failfast;
>> +
>> +/*
>> + * Qgroup equivalent for @size @reserved
>> + *
>> + * Unlike normal normal @size/@reserved for inode rsv,
>> + * qgroup doesn't care about things like csum size nor how many 
>> tree
>> + * blocks it will need to reserve.
>> + *
>> + * Qgroup cares more about *NET* change of extent usage.
>> + * So for ONE newly inserted file extent, in worst case it will cause
>> + * leaf split and level increase, nodesize for each file extent
>> + * is already way overkill.
>> + *
>> + * In short, qgroup_size/reserved is the upper limit of possible needed
>> + * qgroup metadata reservation.
>> + */
>> +u64 qgroup_rsv_size;
>> +u64 qgroup_rsv_reserved;
>>  };
>>  
>>  /*
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 986660f301de..9d579c7bcf7f 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
>> btrfs_fs_info *fs_info,
>>  
>>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>  struct btrfs_block_rsv *block_rsv,
>> -struct btrfs_block_rsv *dest, u64 
>> num_bytes)
>> +struct btrfs_block_rsv *dest, u64 
>> num_bytes,
>> +u64 *qgroup_to_release_ret)
>>  {
>>  struct btrfs_space_info *space_info = block_rsv->space_info;
>> +u64 qgroup_to_release = 0;
>>  u64 ret;
>>  
>>  spin_lock(&block_rsv->lock);
>> -if (num_bytes == (u64)-1)
>> +if (num_bytes == (u64)-1) {
>>  num_bytes = block_rsv->size;
>> +qgroup_to_release = block_rsv->qgroup_rsv_size;
>> +}
>>  block_rsv->size -= num_bytes;
>>  if (block_rsv->reserved >= block_rsv->size) {
>>  num_bytes = block_rsv->reserved - block_rsv->size;
>> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct 
>> btrfs_fs_info *fs_info,
>>  } else {
>>  num_bytes = 0;
>>  }
>> +if (block_rsv->qgroup_rsv_reserved >= 
>> block_rsv->qgroup_rsv_size) {
>> +qgroup_to_release = block_rsv->qgroup_rsv_reserved -
>> +block_rsv->qgroup_rsv_size;
>> +block_rsv->qgroup_rsv_reserved = 
>> block_rsv->qgroup_rsv_size;
>> +} else
>> +qgroup_to_release = 0;
>>  spin_unlock(&block_rsv->lock);
>>  
>>  ret = num_bytes;
>> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct 
>> 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-23 Thread Nikolay Borisov


On 23.02.2018 11:06, Qu Wenruo wrote:
> 
> 
>> On 2018-02-23 16:14, Nikolay Borisov wrote:
>>
>>
>> On 23.02.2018 01:34, Qu Wenruo wrote:
>>>
>>>
>>> On 2018-02-23 06:44, Jeff Mahoney wrote:
 On 12/22/17 1:18 AM, Qu Wenruo wrote:
> Unlike the reservation calculation used in the inode rsv for metadata,
> qgroup doesn't really need to care about things like csum size or
> extent usage for whole tree COW.
>
> Qgroup cares more about the net change of extent usage.
> That's to say, if we're going to insert one file extent, it will mostly
> find its place in an already CoWed tree block, leaving no change in
> extent usage. Or it causes a leaf split, resulting in one new net
> extent, increasing the qgroup number by nodesize.
> (Or, in an even rarer case, it increases the tree level, increasing the
> qgroup number by 2 * nodesize.)
>
> So instead of the deliberately overkill calculation used for the extent
> allocator, which cares more about accuracy and avoiding errors, qgroup
> doesn't need that over-calculated reservation.
>
> This patch will maintain 2 new members in the btrfs_block_rsv structure
> for qgroup, using a much smaller calculation for the qgroup rsv,
> reducing false EDQUOT.


 I think this is the right idea but, rather than being the last step in a
 qgroup rework, it should be the first.
>>>
>>> That's right.
>>>
>>> Although putting it as the 1st patch needs extra work to co-operate with
>>> the later type separation.
>>>
  Don't qgroup reservation
 lifetimes match the block reservation lifetimes?
>>>
>>> Also correct, but...
>>>
  We wouldn't want a
 global reserve and we wouldn't track usage on a per-block basis, but
 they should otherwise match.  We already have clear APIs surrounding the
 use of block reservations, so it seems to me that it'd make sense to
 extend those to cover the qgroup cases as well.  Otherwise, the rest of
 your patchset makes a parallel reservation system with a different API.
 That keeps the current state of qgroups as a bolt-on that always needs
 to be tracked separately (and which developers need to ensure they get
 right).
>>>
>>> The problem is, block reservation is designed to ensure every CoW could
>>> be fulfilled without error.
>>>
>>> That's to say, for case like CoW write with csum, we need to care about
>>> space reservation not only for EXTENT_DATA in fs tree, but also later
>>> EXTENT_ITEM for extent tree, and CSUM_ITEM for csum tree.
>>>
>>> However, the extent tree and csum tree don't contribute to quota at all.
>>> If we follow such over-reservation, it would be much easier to hit
>>> false EDQUOT early.
>>>
>>> That's the main reason for a separate (and slightly simplified) block
>>> rsv tracking system.
>>>
>>> And if there is better solution, I'm all ears.
>>
>> So looking at the code, the main hurdle seems to be the fact that the
>> btrfs_block_rsv_* APIs take a raw byte count, which is derived from
>> btrfs_calc_trans_metadata_size, which in turn is passed the number of
>> items we want.
>>
>> So what if we extend the block_rsv_* APIs to take an additional
>> 'quota_bytes' or somesuch argument which would represent the amount we
>> would like to be charged to the qgroups. This will force us to revisit
>> every call site of block_rsv API's and adjust the code accordingly so
>> that callers pass the correct number. This will lead to code such as:
>>
>> if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { blah }
> 
> We don't need to do such a check at the call site.
> 
> Just do the calculation (which should be really simple, as simple as
> nodesize * nr_items_to_add), and pass it to btrfs_qgroup_reserve_* APIs,
> which would handle the quota enabled check.
> 
>>
>> be contained into the block rsv apis. This will ensure lifetime of
>> blockrsv/qgroups is always consistent. I think this only applies to
>> qgroup metadata reservations.
> 
> Yep, and only applies to PREALLOC type metadata reservation.
> For per-transaction rsv, it's handled by PERTRANS type.

And what about the btrfs_qgroup_reserve_meta()-type reservations, this
function doesn't take a transaction, what kind of reservation is that o_O

To sum it up we have  PERTRANS, PREALLOC and  btrfs_qgroup_reserve_meta
and those are 3 distinct type of reservations, correct?
> 
> Thanks,
> Qu
> 
>>
>>
>>
>>
>>>
>>> Thanks,
>>> Qu
>>>

 -Jeff

> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h   | 18 +
>  fs/btrfs/extent-tree.c | 55 
> --
>  2 files changed, 62 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0c58f92c2d44..97783ba91e00 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>   unsigned short full;
>   unsigned short type;
>   unsigned short failfast;
> +
> + /*
> +  * 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-23 Thread Qu Wenruo


On 2018-02-23 16:14, Nikolay Borisov wrote:
> 
> 
> On 23.02.2018 01:34, Qu Wenruo wrote:
>>
>>
>> On 2018-02-23 06:44, Jeff Mahoney wrote:
>>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
 Unlike the reservation calculation used in the inode rsv for metadata,
 qgroup doesn't really need to care about things like csum size or
 extent usage for whole tree COW.

 Qgroup cares more about the net change of extent usage.
 That's to say, if we're going to insert one file extent, it will mostly
 find its place in an already CoWed tree block, leaving no change in
 extent usage. Or it causes a leaf split, resulting in one new net
 extent, increasing the qgroup number by nodesize.
 (Or, in an even rarer case, it increases the tree level, increasing the
 qgroup number by 2 * nodesize.)

 So instead of the deliberately overkill calculation used for the extent
 allocator, which cares more about accuracy and avoiding errors, qgroup
 doesn't need that over-calculated reservation.

 This patch will maintain 2 new members in the btrfs_block_rsv structure
 for qgroup, using a much smaller calculation for the qgroup rsv,
 reducing false EDQUOT.
>>>
>>>
>>> I think this is the right idea but, rather than being the last step in a
>>> qgroup rework, it should be the first.
>>
>> That's right.
>>
>> Although putting it as the 1st patch needs extra work to co-operate with
>> the later type separation.
>>
>>>  Don't qgroup reservation
>>> lifetimes match the block reservation lifetimes?
>>
>> Also correct, but...
>>
>>>  We wouldn't want a
>>> global reserve and we wouldn't track usage on a per-block basis, but
>>> they should otherwise match.  We already have clear APIs surrounding the
>>> use of block reservations, so it seems to me that it'd make sense to
>>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>>> your patchset makes a parallel reservation system with a different API.
>>> That keeps the current state of qgroups as a bolt-on that always needs
>>> to be tracked separately (and which developers need to ensure they get
>>> right).
>>
>> The problem is, block reservation is designed to ensure every CoW could
>> be fulfilled without error.
>>
>> That's to say, for case like CoW write with csum, we need to care about
>> space reservation not only for EXTENT_DATA in fs tree, but also later
>> EXTENT_ITEM for extent tree, and CSUM_ITEM for csum tree.
>>
>> However, the extent tree and csum tree don't contribute to quota at all.
>> If we follow such over-reservation, it would be much easier to hit
>> false EDQUOT early.
>>
>> That's the main reason for a separate (and slightly simplified) block
>> rsv tracking system.
>>
>> And if there is better solution, I'm all ears.
> 
> So looking at the code, the main hurdle seems to be the fact that the
> btrfs_block_rsv_* APIs take a raw byte count, which is derived from
> btrfs_calc_trans_metadata_size, which in turn is passed the number of
> items we want.
> 
> So what if we extend the block_rsv_* APIs to take an additional
> 'quota_bytes' or somesuch argument which would represent the amount we
> would like to be charged to the qgroups. This will force us to revisit
> every call site of block_rsv API's and adjust the code accordingly so
> that callers pass the correct number. This will lead to code such as:
> 
> if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { blah }

We don't need to do such a check at the call site.

Just do the calculation (which should be really simple, as simple as
nodesize * nr_items_to_add), and pass it to btrfs_qgroup_reserve_* APIs,
which would handle the quota enabled check.

> 
> be contained into the block rsv apis. This will ensure lifetime of
> blockrsv/qgroups is always consistent. I think this only applies to
> qgroup metadata reservations.

Yep, and only applies to PREALLOC type metadata reservation.
For per-transaction rsv, it's handled by PERTRANS type.

Thanks,
Qu

> 
> 
> 
> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> -Jeff
>>>
 Signed-off-by: Qu Wenruo 
 ---
  fs/btrfs/ctree.h   | 18 +
  fs/btrfs/extent-tree.c | 55 
 --
  2 files changed, 62 insertions(+), 11 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 0c58f92c2d44..97783ba91e00 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
unsigned short full;
unsigned short type;
unsigned short failfast;
 +
 +  /*
 +   * Qgroup equivalent for @size @reserved
 +   *
 +  * Unlike normal @size/@reserved for inode rsv,
 +   * qgroup doesn't care about things like csum size nor how many tree
 +   * blocks it will need to reserve.
 +   *
 +   * Qgroup cares more about *NET* change of extent usage.
 +   * So for ONE newly inserted file extent, in worst case it will cause
 +   * leaf split and level increase, nodesize for each 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-23 Thread Nikolay Borisov


On 23.02.2018 01:34, Qu Wenruo wrote:
> 
> 
>> On 2018-02-23 06:44, Jeff Mahoney wrote:
>> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>>> Unlike the reservation calculation used in the inode rsv for metadata,
>>> qgroup doesn't really need to care about things like csum size or
>>> extent usage for whole tree COW.
>>>
>>> Qgroup cares more about the net change of extent usage.
>>> That's to say, if we're going to insert one file extent, it will mostly
>>> find its place in an already CoWed tree block, leaving no change in
>>> extent usage. Or it causes a leaf split, resulting in one new net
>>> extent, increasing the qgroup number by nodesize.
>>> (Or, in an even rarer case, it increases the tree level, increasing the
>>> qgroup number by 2 * nodesize.)
>>>
>>> So instead of the deliberately overkill calculation used for the extent
>>> allocator, which cares more about accuracy and avoiding errors, qgroup
>>> doesn't need that over-calculated reservation.
>>>
>>> This patch will maintain 2 new members in the btrfs_block_rsv structure
>>> for qgroup, using a much smaller calculation for the qgroup rsv,
>>> reducing false EDQUOT.
>>
>>
>> I think this is the right idea but, rather than being the last step in a
>> qgroup rework, it should be the first.
> 
> That's right.
> 
> Although putting it as the 1st patch needs extra work to cooperate with
> the later type separation.
> 
>>  Don't qgroup reservation
>> lifetimes match the block reservation lifetimes?
> 
> Also correct, but...
> 
>>  We wouldn't want a
>> global reserve and we wouldn't track usage on a per-block basis, but
>> they should otherwise match.  We already have clear APIs surrounding the
>> use of block reservations, so it seems to me that it'd make sense to
>> extend those to cover the qgroup cases as well.  Otherwise, the rest of
>> your patchset makes a parallel reservation system with a different API.
>> That keeps the current state of qgroups as a bolt-on that always needs
>> to be tracked separately (and which developers need to ensure they get
>> right).
> 
> The problem is, block reservation is designed to ensure every CoW could
> be fulfilled without error.
> 
> That's to say, for case like CoW write with csum, we need to care about
> space reservation not only for EXTENT_DATA in fs tree, but also later
> EXTENT_ITEM for extent tree, and CSUM_ITEM for csum tree.
> 
> However, the extent tree and csum tree don't contribute to quota at all.
> If we follow such over-reservation, it would be much easier to hit
> false EDQUOT early.
> 
> That's the main reason for a separate (and slightly simplified) block
> rsv tracking system.
> 
> And if there is better solution, I'm all ears.

So looking at the code, the main hurdle seems to be the fact that the
btrfs_block_rsv_* API's take a raw byte count, which is derived from
btrfs_calc_trans_metadata_size, which in turn is passed the number of
items we want.

So what if we extend the block_rsv_* apis to take an additional
'quota_bytes' or somesuch argument which would represent the amount we
would like to be charged to the qgroups. This will force us to revisit
every call site of the block_rsv APIs and adjust the code accordingly so
that callers pass the correct number. This will lead to code such as:

if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) { blah }

be contained in the block rsv APIs. This will ensure the lifetime of
block rsv/qgroups is always consistent. I think this only applies to
qgroup metadata reservations.




> 
> Thanks,
> Qu
> 
>>
>> -Jeff
>>
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>  fs/btrfs/ctree.h   | 18 +
>>>  fs/btrfs/extent-tree.c | 55 
>>> --
>>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 0c58f92c2d44..97783ba91e00 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>> unsigned short full;
>>> unsigned short type;
>>> unsigned short failfast;
>>> +
>>> +   /*
>>> +* Qgroup equivalent for @size @reserved
>>> +*
>>> +* Unlike the normal @size/@reserved for inode rsv,
>>> +* qgroup doesn't care about things like csum size or how many tree
>>> +* blocks it will need to reserve.
>>> +*
>>> +* Qgroup cares more about the *NET* change of extent usage.
>>> +* So for ONE newly inserted file extent, the worst case is a
>>> +* leaf split and a level increase; nodesize for each file extent
>>> +* is already way overkill.
>>> +*
>>> +* In short, qgroup_rsv_size/reserved is the upper limit of the
>>> +* possibly needed qgroup metadata reservation.
>>> +*/
>>> +   u64 qgroup_rsv_size;
>>> +   u64 qgroup_rsv_reserved;
>>>  };
>>>  
>>>  /*
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 986660f301de..9d579c7bcf7f 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
>>> 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-22 Thread Qu Wenruo


On 2018年02月23日 06:44, Jeff Mahoney wrote:
> On 12/22/17 1:18 AM, Qu Wenruo wrote:
>> Unlike the reservation calculation used in the inode rsv for metadata,
>> qgroup doesn't really need to care about things like csum size or
>> extent usage for whole tree COW.
>>
>> Qgroup cares more about the net change of extent usage.
>> That's to say, if we're going to insert one file extent, it will mostly
>> find its place in an already CoWed tree block, leaving no change in
>> extent usage.
>> Or it causes a leaf split, resulting in one new net extent and
>> increasing the qgroup number by nodesize.
>> (Or, in the even rarer case, it increases the tree level, increasing
>> the qgroup number by 2 * nodesize.)
>>
>> So here, instead of using the overkill calculation of the extent
>> allocator, which cares more about accuracy and avoiding errors, qgroup
>> doesn't need that over-calculated reservation.
>>
>> This patch will maintain 2 new members in the btrfs_block_rsv structure
>> for qgroup, using a much smaller calculation for the qgroup rsv,
>> reducing false EDQUOT.
> 
> 
> I think this is the right idea but, rather than being the last step in a
> qgroup rework, it should be the first.

That's right.

Although putting it as the 1st patch needs extra work to cooperate with
the later type separation.

>  Don't qgroup reservation
> lifetimes match the block reservation lifetimes?

Also correct, but...

>  We wouldn't want a
> global reserve and we wouldn't track usage on a per-block basis, but
> they should otherwise match.  We already have clear APIs surrounding the
> use of block reservations, so it seems to me that it'd make sense to
> extend those to cover the qgroup cases as well.  Otherwise, the rest of
> your patchset makes a parallel reservation system with a different API.
> That keeps the current state of qgroups as a bolt-on that always needs
> to be tracked separately (and which developers need to ensure they get
> right).

The problem is, block reservation is designed to ensure every CoW could
be fulfilled without error.

That's to say, for case like CoW write with csum, we need to care about
space reservation not only for EXTENT_DATA in fs tree, but also later
EXTENT_ITEM for extent tree, and CSUM_ITEM for csum tree.

However, the extent tree and csum tree don't contribute to quota at all.
If we follow such over-reservation, it would be much easier to hit
false EDQUOT early.

That's the main reason for a separate (and slightly simplified) block
rsv tracking system.

And if there is better solution, I'm all ears.

Thanks,
Qu

> 
> -Jeff
> 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/ctree.h   | 18 +
>>  fs/btrfs/extent-tree.c | 55 
>> --
>>  2 files changed, 62 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 0c58f92c2d44..97783ba91e00 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>>  unsigned short full;
>>  unsigned short type;
>>  unsigned short failfast;
>> +
>> +/*
>> + * Qgroup equivalent for @size @reserved
>> + *
>> + * Unlike the normal @size/@reserved for inode rsv,
>> + * qgroup doesn't care about things like csum size or how many tree
>> + * blocks it will need to reserve.
>> + *
>> + * Qgroup cares more about the *NET* change of extent usage.
>> + * So for ONE newly inserted file extent, the worst case is a
>> + * leaf split and a level increase; nodesize for each file extent
>> + * is already way overkill.
>> + *
>> + * In short, qgroup_rsv_size/reserved is the upper limit of the
>> + * possibly needed qgroup metadata reservation.
>> + */
>> +u64 qgroup_rsv_size;
>> +u64 qgroup_rsv_reserved;
>>  };
>>  
>>  /*
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 986660f301de..9d579c7bcf7f 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
>> btrfs_fs_info *fs_info,
>>  
>>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>>  struct btrfs_block_rsv *block_rsv,
>> -struct btrfs_block_rsv *dest, u64 num_bytes)
>> +struct btrfs_block_rsv *dest, u64 num_bytes,
>> +u64 *qgroup_to_release_ret)
>>  {
>>  struct btrfs_space_info *space_info = block_rsv->space_info;
>> +u64 qgroup_to_release = 0;
>>  u64 ret;
>>  
>>  spin_lock(&block_rsv->lock);
>> -if (num_bytes == (u64)-1)
>> +if (num_bytes == (u64)-1) {
>>  num_bytes = block_rsv->size;
>> +qgroup_to_release = block_rsv->qgroup_rsv_size;
>> +}
>>  block_rsv->size -= num_bytes;
>>  if (block_rsv->reserved >= block_rsv->size) {
>>  num_bytes = block_rsv->reserved - block_rsv->size;
>> @@ -5581,6 +5585,12 @@ static u64 

Re: [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2018-02-22 Thread Jeff Mahoney
On 12/22/17 1:18 AM, Qu Wenruo wrote:
> Unlike the reservation calculation used in the inode rsv for metadata,
> qgroup doesn't really need to care about things like csum size or
> extent usage for whole tree COW.
> 
> Qgroup cares more about the net change of extent usage.
> That's to say, if we're going to insert one file extent, it will mostly
> find its place in an already CoWed tree block, leaving no change in
> extent usage.
> Or it causes a leaf split, resulting in one new net extent and
> increasing the qgroup number by nodesize.
> (Or, in the even rarer case, it increases the tree level, increasing
> the qgroup number by 2 * nodesize.)
> 
> So here, instead of using the overkill calculation of the extent
> allocator, which cares more about accuracy and avoiding errors, qgroup
> doesn't need that over-calculated reservation.
> 
> This patch will maintain 2 new members in the btrfs_block_rsv structure
> for qgroup, using a much smaller calculation for the qgroup rsv,
> reducing false EDQUOT.


I think this is the right idea but, rather than being the last step in a
qgroup rework, it should be the first.  Don't qgroup reservation
lifetimes match the block reservation lifetimes?  We wouldn't want a
global reserve and we wouldn't track usage on a per-block basis, but
they should otherwise match.  We already have clear APIs surrounding the
use of block reservations, so it seems to me that it'd make sense to
extend those to cover the qgroup cases as well.  Otherwise, the rest of
your patchset makes a parallel reservation system with a different API.
That keeps the current state of qgroups as a bolt-on that always needs
to be tracked separately (and which developers need to ensure they get
right).

-Jeff

> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/ctree.h   | 18 +
>  fs/btrfs/extent-tree.c | 55 
> --
>  2 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0c58f92c2d44..97783ba91e00 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -467,6 +467,24 @@ struct btrfs_block_rsv {
>   unsigned short full;
>   unsigned short type;
>   unsigned short failfast;
> +
> + /*
> +  * Qgroup equivalent for @size @reserved
> +  *
> +  * Unlike the normal @size/@reserved for inode rsv,
> +  * qgroup doesn't care about things like csum size or how many tree
> +  * blocks it will need to reserve.
> +  *
> +  * Qgroup cares more about the *NET* change of extent usage.
> +  * So for ONE newly inserted file extent, the worst case is a
> +  * leaf split and a level increase; nodesize for each file extent
> +  * is already way overkill.
> +  *
> +  * In short, qgroup_rsv_size/reserved is the upper limit of the
> +  * possibly needed qgroup metadata reservation.
> +  */
> + u64 qgroup_rsv_size;
> + u64 qgroup_rsv_reserved;
>  };
>  
>  /*
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 986660f301de..9d579c7bcf7f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
> btrfs_fs_info *fs_info,
>  
>  static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
>   struct btrfs_block_rsv *block_rsv,
> - struct btrfs_block_rsv *dest, u64 num_bytes)
> + struct btrfs_block_rsv *dest, u64 num_bytes,
> + u64 *qgroup_to_release_ret)
>  {
>   struct btrfs_space_info *space_info = block_rsv->space_info;
> + u64 qgroup_to_release = 0;
>   u64 ret;
>  
>   spin_lock(&block_rsv->lock);
> - if (num_bytes == (u64)-1)
> + if (num_bytes == (u64)-1) {
>   num_bytes = block_rsv->size;
> + qgroup_to_release = block_rsv->qgroup_rsv_size;
> + }
>   block_rsv->size -= num_bytes;
>   if (block_rsv->reserved >= block_rsv->size) {
>   num_bytes = block_rsv->reserved - block_rsv->size;
> @@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct 
> btrfs_fs_info *fs_info,
>   } else {
>   num_bytes = 0;
>   }
> + if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
> + qgroup_to_release = block_rsv->qgroup_rsv_reserved -
> + block_rsv->qgroup_rsv_size;
> + block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
> + } else
> + qgroup_to_release = 0;
> spin_unlock(&block_rsv->lock);
>  
>   ret = num_bytes;
> @@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info 
> *fs_info,
>   space_info_add_old_bytes(fs_info, space_info,
>num_bytes);
>   }
> + if (qgroup_to_release_ret)
> + *qgroup_to_release_ret = qgroup_to_release;
>   return ret;
>  }
>  
> @@ -5744,17 +5756,21 

[PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

2017-12-21 Thread Qu Wenruo
Unlike the reservation calculation used in the inode rsv for metadata,
qgroup doesn't really need to care about things like csum size or extent
usage for whole tree COW.

Qgroup cares more about the net change of extent usage.
That's to say, if we're going to insert one file extent, it will mostly
find its place in an already CoWed tree block, leaving no change in
extent usage.
Or it causes a leaf split, resulting in one new net extent and
increasing the qgroup number by nodesize.
(Or, in the even rarer case, it increases the tree level, increasing the
qgroup number by 2 * nodesize.)

So here, instead of using the overkill calculation of the extent
allocator, which cares more about accuracy and avoiding errors, qgroup
doesn't need that over-calculated reservation.

This patch will maintain 2 new members in the btrfs_block_rsv structure
for qgroup, using a much smaller calculation for the qgroup rsv,
reducing false EDQUOT.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h   | 18 +
 fs/btrfs/extent-tree.c | 55 --
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0c58f92c2d44..97783ba91e00 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -467,6 +467,24 @@ struct btrfs_block_rsv {
unsigned short full;
unsigned short type;
unsigned short failfast;
+
+   /*
+* Qgroup equivalent for @size @reserved
+*
+* Unlike the normal @size/@reserved for inode rsv,
+* qgroup doesn't care about things like csum size or how many tree
+* blocks it will need to reserve.
+*
+* Qgroup cares more about the *NET* change of extent usage.
+* So for ONE newly inserted file extent, the worst case is a
+* leaf split and a level increase; nodesize for each file extent
+* is already way overkill.
+*
+* In short, qgroup_rsv_size/reserved is the upper limit of the
+* possibly needed qgroup metadata reservation.
+*/
+   u64 qgroup_rsv_size;
+   u64 qgroup_rsv_reserved;
 };
 
 /*
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 986660f301de..9d579c7bcf7f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5565,14 +5565,18 @@ static void space_info_add_new_bytes(struct 
btrfs_fs_info *fs_info,
 
 static u64 block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *block_rsv,
-   struct btrfs_block_rsv *dest, u64 num_bytes)
+   struct btrfs_block_rsv *dest, u64 num_bytes,
+   u64 *qgroup_to_release_ret)
 {
struct btrfs_space_info *space_info = block_rsv->space_info;
+   u64 qgroup_to_release = 0;
u64 ret;
 
spin_lock(&block_rsv->lock);
-   if (num_bytes == (u64)-1)
+   if (num_bytes == (u64)-1) {
num_bytes = block_rsv->size;
+   qgroup_to_release = block_rsv->qgroup_rsv_size;
+   }
block_rsv->size -= num_bytes;
if (block_rsv->reserved >= block_rsv->size) {
num_bytes = block_rsv->reserved - block_rsv->size;
@@ -5581,6 +5585,12 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info 
*fs_info,
} else {
num_bytes = 0;
}
+   if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
+   qgroup_to_release = block_rsv->qgroup_rsv_reserved -
+   block_rsv->qgroup_rsv_size;
+   block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
+   } else
+   qgroup_to_release = 0;
spin_unlock(&block_rsv->lock);
 
ret = num_bytes;
@@ -5603,6 +5613,8 @@ static u64 block_rsv_release_bytes(struct btrfs_fs_info 
*fs_info,
space_info_add_old_bytes(fs_info, space_info,
 num_bytes);
}
+   if (qgroup_to_release_ret)
+   *qgroup_to_release_ret = qgroup_to_release;
return ret;
 }
 
@@ -5744,17 +5756,21 @@ int btrfs_inode_rsv_refill(struct btrfs_inode *inode,
struct btrfs_root *root = inode->root;
struct btrfs_block_rsv *block_rsv = >block_rsv;
u64 num_bytes = 0;
+   u64 qgroup_num_bytes = 0;
int ret = -ENOSPC;
 
spin_lock(&block_rsv->lock);
if (block_rsv->reserved < block_rsv->size)
num_bytes = block_rsv->size - block_rsv->reserved;
+   if (block_rsv->qgroup_rsv_reserved < block_rsv->qgroup_rsv_size)
+   qgroup_num_bytes = block_rsv->qgroup_rsv_size -
+  block_rsv->qgroup_rsv_reserved;
spin_unlock(&block_rsv->lock);
 
if (num_bytes == 0)
return 0;
 
-   ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
+   ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
if (ret)