Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-13 Thread Austin S. Hemmelgarn

On 2018-08-12 03:04, Andrei Borzenkov wrote:

12.08.2018 06:16, Chris Murphy wrote:

On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.dun...@cox.net> wrote:

Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:


But whether data is shared or exclusive seems potentially ephemeral, and
not something a sysadmin should even be able to anticipate let alone
individual users.


Define "user(s)".


The person who is saving their document on a network share, and
they've never heard of Btrfs.



Arguably, in the context of btrfs tool usage, "user" /is/ the admin,


I'm not talking about btrfs tools. I'm talking about rational,
predictable behavior of a shared folder.

If I try to drop a 1GiB file into my share and I'm denied, not enough
free space, and behind the scenes it's because of a quota limit, I
expect I can delete *any* file(s) adding up to 1GiB of free space
and then I'll be able to drop that file successfully without error.

But if I'm unwittingly deleting shared files, my quota usage won't go
down, and I still can't save my file. So now I somehow need a secret
incantation to discover only my exclusive files and delete enough of
them in order to save this 1GiB file. It's weird, it's unexpected, I
think it's a use-case failure. Maybe Btrfs quotas aren't meant to work
with Samba or NFS shares. *shrug*



That's how both NetApp and ZFS work as well. I doubt anyone can
seriously call NetApp "not meant to work with NFS or CIFS shares".

On NetApp, the space available to an NFS/CIFS user is the volume size minus
the space frozen in snapshots. If a file captured in a snapshot is deleted in
the active file system, it does not make a single byte available to the
external user. That's what surprises almost every first-time NetApp user.

On ZFS, snapshots are contained in the dataset and you limit total dataset
space consumption including all snapshots. Thus the end effect is the same -
deleting data that is itself captured in a snapshot does not make a single
byte available. ZFS additionally allows you to restrict the active file
system size (the "referenced" quota in ZFS) - this more closely matches your
expectation - deleting a file in the active file system decreases its
"referenced" size, allowing the user to write more data (as long as the user
does not exceed the total dataset quota). This is different from btrfs
"exclusive" and "shared". This should not be hard to implement in btrfs,
as "referenced" simply means all data in the current subvolume, be it
exclusive or shared.

IOW, ZFS allows placing restrictions both on how much data a user can use
and on how much data the user is additionally allowed to protect (snapshot).
Except user-created snapshots are kind of irrelevant here.  If we're 
talking about NFS/CIFS/SMB, there is no way for the user to create a 
snapshot (at least, not in-band), so provided the admin is sensible and 
only uses the referenced quota for limiting space usage by users, things 
behave no differently on ZFS than they do on ext4 or XFS using user quotas.


Note also that a lot of storage appliances that use ZFS as the 
underlying storage don't expose any way for the admin to use anything 
other than the referenced quota (and usually space reservations).  They 
do this because it makes the system behave as pretty much everyone 
intuitively expects, and it ensures that users don't have to go to an 
admin to remedy their free space issues.






"Regular users" as you use the term, that is the non-admins who just need
to know how close they are to running out of their allotted storage
resources, shouldn't really need to care about btrfs tool usage in the
first place, and btrfs commands in general, including btrfs quota related
commands, really aren't targeted at them, and aren't designed to report
the type of information they are likely to find useful.  Other tools will
be more appropriate.


I'm not talking about any btrfs commands or even the term quota for
regular users. I'm talking about saving a file, being denied, and how
does the user figure out how to free up space?



Users need to be educated, same as with NetApp and ZFS. There is no
magic; redirect-on-write filesystems work differently than traditional
ones, and users need to adapt.

Of course the devil is in the details, and the usability of btrfs quota is
far lower than NetApp's/ZFS's. There, space consumption information is a
first-class citizen integrated into the most basic tools, not something
bolted on later and mostly incomprehensible to the end user.
Except that this _CAN_ be made to work and behave just like classic 
quotas.  Your example of ZFS above proves it (referenced quotas behave 
just like classic VFS quotas).  Yes, we need to educate users regarding 
qgroups, but we need a _WORKING_ alternative so they can do things the 
way they always have, like most stuff that uses ZFS as part of a 
pre-built system (FreeNAS for example) does.



Anyway, it's a hypothetical scenario. While I have Samba running on a
Btrfs volume with various shares as subvolumes, I don't have quotas
enabled.


Given all the performance issues with quota reported on this list, it is
probably just as good for you.

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-12 Thread Andrei Borzenkov
12.08.2018 10:04, Andrei Borzenkov wrote:
> 
> On ZFS, snapshots are contained in the dataset and you limit total dataset
> space consumption including all snapshots. Thus the end effect is the same -
> deleting data that is itself captured in a snapshot does not make a single
> byte available. ZFS additionally allows you to restrict the active file
> system size (the "referenced" quota in ZFS) - this more closely matches your
> expectation - deleting a file in the active file system decreases its
> "referenced" size, allowing the user to write more data (as long as the user
> does not exceed the total dataset quota). This is different from btrfs
> "exclusive" and "shared". This should not be hard to implement in btrfs,
> as "referenced" simply means all data in the current subvolume, be it
> exclusive or shared.
> 

Oops, actually this is exactly what the "referenced" quota is. Limiting the
total of a subvolume + snapshots is more difficult, as there is no inherent
connection between the qgroups of the source and the snapshot, nor any
built-in way to include the snapshot's qgroup in some common total qgroup
when creating a snapshot.
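
There is, however, a manual approximation using a higher-level qgroup; a
hedged sketch (qgroup IDs and paths are hypothetical):

  # put the subvolume's own qgroup (say 0/256) under a parent qgroup
  btrfs qgroup create 1/100 /mnt
  btrfs qgroup assign 0/256 1/100 /mnt
  # at snapshot time, add the snapshot's qgroup (say 0/260) by hand...
  btrfs subvolume snapshot /mnt/vol /mnt/snap
  btrfs qgroup assign 0/260 1/100 /mnt
  # ...or, where btrfs-progs supports it, at snapshot creation:
  btrfs subvolume snapshot -i 1/100 /mnt/vol /mnt/snap2
  # limit the combined usage of the subvolume plus its snapshots
  btrfs qgroup limit 20G 1/100 /mnt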


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-12 Thread Andrei Borzenkov
12.08.2018 06:16, Chris Murphy wrote:
> On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:
>>
>>> But whether data is shared or exclusive seems potentially ephemeral, and
>>> not something a sysadmin should even be able to anticipate let alone
>>> individual users.
>>
>> Define "user(s)".
> 
> The person who is saving their document on a network share, and
> they've never heard of Btrfs.
> 
> 
>> Arguably, in the context of btrfs tool usage, "user" /is/ the admin,
> 
> I'm not talking about btrfs tools. I'm talking about rational,
> predictable behavior of a shared folder.
> 
> If I try to drop a 1GiB file into my share and I'm denied, not enough
> free space, and behind the scenes it's because of a quota limit, I
> expect I can delete *any* file(s) adding up to 1GiB of free space
> and then I'll be able to drop that file successfully without error.
> 
> But if I'm unwittingly deleting shared files, my quota usage won't go
> down, and I still can't save my file. So now I somehow need a secret
> incantation to discover only my exclusive files and delete enough of
> them in order to save this 1GiB file. It's weird, it's unexpected, I
> think it's a use-case failure. Maybe Btrfs quotas aren't meant to work
> with Samba or NFS shares. *shrug*
> 

That's how both NetApp and ZFS work as well. I doubt anyone can
seriously call NetApp "not meant to work with NFS or CIFS shares".

On NetApp, the space available to an NFS/CIFS user is the volume size minus
the space frozen in snapshots. If a file captured in a snapshot is deleted in
the active file system, it does not make a single byte available to the
external user. That's what surprises almost every first-time NetApp user.

On ZFS, snapshots are contained in the dataset and you limit total dataset
space consumption including all snapshots. Thus the end effect is the same -
deleting data that is itself captured in a snapshot does not make a single
byte available. ZFS additionally allows you to restrict the active file
system size (the "referenced" quota in ZFS) - this more closely matches your
expectation - deleting a file in the active file system decreases its
"referenced" size, allowing the user to write more data (as long as the user
does not exceed the total dataset quota). This is different from btrfs
"exclusive" and "shared". This should not be hard to implement in btrfs,
as "referenced" simply means all data in the current subvolume, be it
exclusive or shared.

IOW, ZFS allows placing restrictions both on how much data a user can use
and on how much data the user is additionally allowed to protect (snapshot).
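
For concreteness, a minimal sketch of the two ZFS limits being discussed
(pool and dataset names are hypothetical):

  # limit everything, snapshots included (the NetApp-like behaviour):
  zfs set quota=20G tank/home/alice
  # additionally cap only the live data, so deleting a file always frees
  # visible space (the "referenced" quota):
  zfs set refquota=10G tank/home/alice
  # inspect the limits and the current usage figures:
  zfs get quota,refquota,referenced,used tank/home/alice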

> 
> 
>>
>> "Regular users" as you use the term, that is the non-admins who just need
>> to know how close they are to running out of their allotted storage
>> resources, shouldn't really need to care about btrfs tool usage in the
>> first place, and btrfs commands in general, including btrfs quota related
>> commands, really aren't targeted at them, and aren't designed to report
>> the type of information they are likely to find useful.  Other tools will
>> be more appropriate.
> 
> I'm not talking about any btrfs commands or even the term quota for
> regular users. I'm talking about saving a file, being denied, and how
> does the user figure out how to free up space?
> 

Users need to be educated, same as with NetApp and ZFS. There is no
magic; redirect-on-write filesystems work differently than traditional
ones, and users need to adapt.

Of course the devil is in the details, and the usability of btrfs quota is
far lower than NetApp's/ZFS's. There, space consumption information is a
first-class citizen integrated into the most basic tools, not something
bolted on later and mostly incomprehensible to the end user.

> Anyway, it's a hypothetical scenario. While I have Samba running on a
> Btrfs volume with various shares as subvolumes, I don't have quotas
> enabled.
> 
> 
> 

Given all the performance issues with quota reported on this list, it is
probably just as good for you.


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-11 Thread Chris Murphy
On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:
>
>> But whether data is shared or exclusive seems potentially ephemeral, and
>> not something a sysadmin should even be able to anticipate let alone
>> individual users.
>
> Define "user(s)".

The person who is saving their document on a network share, and
they've never heard of Btrfs.


> Arguably, in the context of btrfs tool usage, "user" /is/ the admin,

I'm not talking about btrfs tools. I'm talking about rational,
predictable behavior of a shared folder.

If I try to drop a 1GiB file into my share and I'm denied, not enough
free space, and behind the scenes it's because of a quota limit, I
expect I can delete *any* file(s) adding up to 1GiB of free space
and then I'll be able to drop that file successfully without error.

But if I'm unwittingly deleting shared files, my quota usage won't go
down, and I still can't save my file. So now I somehow need a secret
incantation to discover only my exclusive files and delete enough of
them in order to save this 1GiB file. It's weird, it's unexpected, I
think it's a use-case failure. Maybe Btrfs quotas aren't meant to work
with Samba or NFS shares. *shrug*



>
> "Regular users" as you use the term, that is the non-admins who just need
> to know how close they are to running out of their allotted storage
> resources, shouldn't really need to care about btrfs tool usage in the
> first place, and btrfs commands in general, including btrfs quota related
> commands, really aren't targeted at them, and aren't designed to report
> the type of information they are likely to find useful.  Other tools will
> be more appropriate.

I'm not talking about any btrfs commands or even the term quota for
regular users. I'm talking about saving a file, being denied, and how
does the user figure out how to free up space?

Anyway, it's a hypothetical scenario. While I have Samba running on a
Btrfs volume with various shares as subvolumes, I don't have quotas
enabled.



-- 
Chris Murphy


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-11 Thread Andrei Borzenkov
10.08.2018 12:33, Tomasz Pala wrote:
> 
>> For 4 disks with 1T of free space each, if you're using RAID5 for data, then
>> you can write 3T of data.
>> But if you're also using RAID10 for metadata, and you're using the default
>> inline, we can use small files to fill the free space, resulting in 2T of
>> available space.
>>
>> So in this case how would you calculate the free space? 3T or 2T or
>> anything between them?
> 
> The answer is pretty simple: 3T. Rationale:
> - this is the space I can put in a single data stream,
> - people are aware that there is metadata overhead with any object;
>   after all, metadata are also data,
> - while filling the fs with small files, the free space available would
>   self-adjust after every single file put, so after uploading 1T of such
>   files df should report 1.5T free. There would be nothing weird(er
>   than now) in 1T of data actually having eaten 1.5T of storage.
> 
> No crystal ball calculations, just KISS; since one _can_ put a 3T file
> (non-sparse, incompressible, bulk-written) on a filesystem, the free space is
> 3T.
> 

As far as I can tell, that is exactly what "df" reports now. "btrfs fi
us" will tell you both the max (reported by "df") and the worst-case min.

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-11 Thread Andrei Borzenkov
10.08.2018 21:21, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:
> 
>>> I.e.: every shared segment should be accounted within quota (at least once).
>> I think what you mean to say here is that every shared extent should be 
>> accounted to quotas for every location it is reflinked from.  IOW, that 
>> if an extent is shared between two subvolumes each with its own quota, 
>> they should both have it accounted against their quota.
> 
> Yes.
> 

This is what "referenced" in the quota group report is, isn't it? What is
missing here?
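
The report in question is `btrfs qgroup show`; an illustrative listing
(numbers hypothetical):

  # btrfs qgroup show /mnt
  qgroupid         rfer         excl
  --------         ----         ----
  0/5          16.00KiB     16.00KiB
  0/257         1.00GiB     16.00KiB   <- rfer counts shared data too
  0/258         1.00GiB     16.00KiB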


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Andrei Borzenkov
10.08.2018 10:33, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:03:18 +0300, Andrei Borzenkov wrote:
> 
>>> So - the limit set on any user
>>
>> Does btrfs support per-user quota at all? I am aware only of per-subvolume 
>> quotas.
> 
> Well, this is a kind of deceptive word usage in "post-truth" times.
> 
> In this case both "user" and "quota" are not valid...
> - by "user" I meant the general word, not a unix user account; such a user
>   might possess some container running a full-blown guest OS,
> - by "quota" btrfs means - I guess, dataset-quotas?
> 
> 
> In fact: https://btrfs.wiki.kernel.org/index.php/Quota_support
> "Quota support in BTRFS is implemented at a subvolume level by the use of 
> quota groups or qgroup"
> 
> - what the hell is a "quota group" and how does it differ from a qgroup? According to 
> btrfs-quota(8):
> 
> "The quota groups (qgroups) are managed by the subcommand btrfs qgroup(8)"
> 
> - they are the same... just completely different from traditional "quotas".
> 
> 
> My suggestion would be to completely remove the standalone "quota" word
> from btrfs documentation - there is no "quota", just "subvolume quota"
> or "qgroup" supported.
> 

Well, a qgroup allows you to limit the amount of data that can be stored in
a subvolume (or under a quota group in general), so it behaves like a
traditional quota to me.
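
A minimal sketch of that traditional-quota-like usage (paths and sizes
hypothetical; by default the limit applies to referenced bytes, while -e
would limit exclusive bytes instead):

  btrfs quota enable /mnt
  btrfs qgroup limit 10G /mnt/home-alice
  btrfs qgroup show /mnt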


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Duncan
Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:

> But whether data is shared or exclusive seems potentially ephemeral, and
> not something a sysadmin should even be able to anticipate let alone
> individual users.

Define "user(s)".

Arguably, in the context of btrfs tool usage, "user" /is/ the admin, the 
one who cares that it's btrfs in the first place, who should have chosen 
btrfs based on the best-case match for the use-case, and who continues to 
maintain the system's btrfs filesystems using btrfs tools.

Arguably, in this context "user" is /not/ the other users on whose behalf 
the admin is caring for the system, who don't care /what/ is under the 
covers so long as it works, and to whom more appropriate-to-their-needs 
tools should be made available should they be found necessary or useful.

> Going back to the example, I'd expect to give the user a 2GiB quota,
> with 1GiB of initially provisioned data via snapshot, so right off the
> bat they are at 50% usage of their quota. If they were to modify every
> single provisioned file, they'd in effect go from 100% shared data to
> 100% exclusive data, but their quota usage would still be 50%. That's
> completely sane and easily understandable by a regular user. The idea
> that they'd start modifying shared files, and their quota usage climbs
> is weird to me. The state of files being shared or exclusive is not user
> domain terminology anyway.

It's user-domain terminology if the "user" is the admin, who will care 
about shared/exclusive usage in the context of how it affects the usage 
of available storage resources.

"Regular users" as you use the term, that is the non-admins who just need 
to know how close they are to running out of their allotted storage 
resources, shouldn't really need to care about btrfs tool usage in the 
first place, and btrfs commands in general, including btrfs quota related 
commands, really aren't targeted at them, and aren't designed to report 
the type of information they are likely to find useful.  Other tools will 
be more appropriate.

>> The most common case is, you do a snapshot, user would only care how
>> much new space can be written into the subvolume, other than the total
>> subvolume size.
> 
> I think that's expecting a lot of users.

Not really.  Remember, "users" in this context are admins, those to whom 
the duty of maintaining their btrfs falls, and the ones at whom btrfs * 
commands are normally targeted, since this is the btrfs tool designed to 
help them with that job.

And said "users" will (presumably) be concerned about shared/exclusive if 
they're using btrfs quotas, because they are trying to manage the 
filesystem's space utilization per subvolume well.

(FWIW, "presumably" is thrown in there because here I don't use 
subvolumes /or/ sub-filesystem-level quotas as personally, I prefer to 
manage that at the filesystem level, with multiple independent 
filesystems and the size of individual filesystems enforcing limits on 
how much the stuff stored in them can grow.)

> I also wonder if it expects a lot from services like samba and NFS who
> have to communicate all of this in some sane way to remote clients? My
> expectation is that a remote client shows Free Space on a quota'd system
> to be based on the unused amount of the quota. I also expect if I delete
> a 1GiB file, that my quota consumption goes down. But you're saying it
> would be unchanged if I delete a 1GiB shared file, and would only go
> down if I delete a 1GiB exclusive file. Do samba and NFS know about
> shared and exclusive files? If samba and NFS don't understand this, then
> how is a user supposed to understand it?

There's a reason btrfs quotas don't work with standard VFS level quotas.  
They're managing two different things, and I'd assume the btrfs quota 
information isn't typically what samba/NFS information exporting is 
designed to deal with in the first place.  Just because a screwdriver 
/can/ be used as a hammer doesn't make it the appropriate tool for the 
job.

> And now I'm sufficiently confused I'm ready for the weekend!

LOL!

(I had today/Friday off, arguably why I'm even taking the time to reply, 
but my second day off this "week" is next Tuesday, the last day of the 
schedule-week.  I had actually forgotten that this was the last day of 
the work-week for most, until I saw that, but then, LOL!)

> And we can't have quotas getting busted all of a sudden because the
> sysadmin decides to do -dconvert -mconvert raid1, without requiring the
> sysadmin to double everyone's quota before performing the operation.

Not every_one's_, every-subvolume's.  "Everyone's" quotas shouldn't be 
affected, because that's not what btrfs quotas manage.  There are other 
(non-btrfs) tools for that.

>>> In short: values representing quotas are user-oriented ("the numbers
>>> one bought"), not storage-oriented ("the numbers they actually
>>> occupy").

Btrfs quotas are storage-oriented, and if you're using them, at 

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Austin S. Hemmelgarn

On 2018-08-10 14:07, Chris Murphy wrote:

On Thu, Aug 9, 2018 at 5:35 PM, Qu Wenruo  wrote:



On 8/10/18 1:48 AM, Tomasz Pala wrote:

On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:


2) Different limitations on exclusive/shared bytes
Btrfs can set different limit on exclusive/shared bytes, further
complicating the problem.

3) Btrfs quota only accounts data/metadata used by the subvolume
It lacks all the shared trees (mentioned below), and in fact such
shared trees can be pretty large (especially the extent tree and csum
tree).


I'm not sure about the implications, but just to clarify some things:

when limiting somebody's data space we usually don't care about the
underlying "savings" coming from any deduplicating technique - these are
purely bonuses for the system owner, so he can do larger resource overbooking.


In reality that's definitely not the case.

From what I see, most users would care more about exclusively used space
(excl), rather than the total space one subvolume is referring to (rfer).


I'm confused.

So what happens in the following case with quotas enabled on Btrfs:

1. Provision a user with a directory, pre-populated with files, using
snapshot. Let's say it's 1GiB of files.
2. Set a quota for this user's directory, 1GiB.

The way I'm reading the description of Btrfs quotas, the 1GiB quota
applies to exclusively used space. So for starters, they have 1GiB of
shared data that does not affect their 1GiB quota at all.

3. User creates 500MiB worth of new files, this is exclusive usage.
They are still within their quota limit.
4. The shared data becomes obsolete for all but this one user, and is deleted.

Suddenly, 1GiB of shared data for this user is no longer shared data,
it instantly becomes exclusive data and their quota is busted. Now
consider scaling this to 12TiB of storage, with hundreds of users, and
dozens of abruptly busted quotas following this same scenario on a
weekly basis.

I *might* buy off on the idea that an overlay2 based initial
provisioning would not affect quotas. But whether data is shared or
exclusive seems potentially ephemeral, and not something a sysadmin
should even be able to anticipate let alone individual users.

Going back to the example, I'd expect to give the user a 2GiB quota,
with 1GiB of initially provisioned data via snapshot, so right off the
bat they are at 50% usage of their quota. If they were to modify every
single provisioned file, they'd in effect go from 100% shared data to
100% exclusive data, but their quota usage would still be 50%. That's
completely sane and easily understandable by a regular user. The idea
that they'd start modifying shared files, and their quota usage climbs
is weird to me. The state of files being shared or exclusive is not
user domain terminology anyway.
And it's important to note that this is the _only_ way this can sanely 
work for actually partitioning resources, which is the primary classical 
use case for quotas.


Being able to see how much data is shared and exclusive in a subvolume 
is nice, but quota groups are the wrong name for it because the current 
implementation does not work at all like quotas and can trivially result 
in both users escaping quotas (multiple ways), and in quotas being 
overreached by very large amounts for potentially indefinite periods of 
time because of actions of individuals who _don't_ own the data the 
quota is for.





The most common case is: you do a snapshot, and the user would only care how
much new space can be written into the subvolume, rather than the total
subvolume size.


I think that's expecting a lot of users.

I also wonder if it expects a lot from services like samba and NFS who
have to communicate all of this in some sane way to remote clients? My
expectation is that a remote client shows Free Space on a quota'd
system to be based on the unused amount of the quota. I also expect if
I delete a 1GiB file, that my quota consumption goes down. But you're
saying it would be unchanged if I delete a 1GiB shared file, and would
only go down if I delete a 1GiB exclusive file. Do samba and NFS know
about shared and exclusive files? If samba and NFS don't understand
this, then how is a user supposed to understand it?
It might be worth looking at how Samba and NFS work on top of ZFS on a 
platform like FreeNAS and trying to emulate that.


Behavior there is as follows:

* The total size of the 'disk' reported over SMB (shown on Windows only 
if you map the share as a drive) is equal to the quota for the 
underlying dataset.
* The reported space used on the 'disk' reported over SMB is based on 
physical space usage after compression, with a few caveats relating to 
deduplication:
- Data which is shared across multiple datasets is accounted 
against _all_ datasets that reference it.
- Data which is shared only within a given dataset is accounted 
only once.

* Free space is reported simply as the total size minus the used space.
* Usage reported by 
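
If one wanted to derive those reported numbers by hand, they come straight
from ZFS properties; a hedged sketch (dataset name hypothetical):

  # quota/refquota is the 'disk size' reported over SMB; 'used' (which is
  # accounted after compression) is the reported usage
  zfs get -Hp -o value refquota,used tank/share
  # the reported free space is then simply refquota minus used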

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Austin S. Hemmelgarn

On 2018-08-10 14:21, Tomasz Pala wrote:

On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:


I.e.: every shared segment should be accounted within quota (at least once).

I think what you mean to say here is that every shared extent should be
accounted to quotas for every location it is reflinked from.  IOW, that
if an extent is shared between two subvolumes each with its own quota,
they should both have it accounted against their quota.


Yes.


Moreover - if there would be per-subvolume RAID levels someday, the data
should be accounted in relation to the "default" (filesystem) RAID level,
i.e. having a RAID0 subvolume on RAID1 fs should account half of the
data, and twice the data in an opposite scenario (like "dup" profile on
single-drive filesystem).


This is irrelevant to your point here.  In fact, it goes against it,
you're arguing for quotas to report data like `du`, but all of
chunk-profile stuff is invisible to `du` (and everything else in
userspace that doesn't look through BTRFS ioctls).


My point is the user's point of view, not some system tool like du. Consider this:
1. user wants higher (than default) protection of some data,
2. user wants more storage space with less protection.

Ad. 1 - requesting better redundancy is similar to cp --reflink=never
- there are functional differences, but the cost is similar: trading
   space for security,

Ad. 2 - many would like to have .cache, .ccache, tmp or some build
system directory with faster writes and no redundancy at all. This
requires per-file/directory data profile attrs though.

Since we agreed that transparent data compression is the user's storage
bonus, gains from reduced redundancy should also benefit the user.
Do you actually know of any services that do this though?  I mean, 
Amazon S3 and similar services have the option of reduced redundancy 
(and other alternate storage tiers), but they charge 
per-unit-data-per-unit-time with no hard limit on how much space they 
use, and charge different rates for different storage tiers.  In 
comparison, what you appear to be talking about is something more 
similar to Dropbox or Google Drive, where you pay up front for a fixed 
amount of storage for a fixed amount of time and can't use more than 
that, and all the services I know of like that offer exactly one option 
for storage redundancy.


That aside, you seem to be overthinking this.  No sane provider is going 
to give their users the ability to create subvolumes themselves (there's 
too much opportunity for a tiny bug in your software to cost you a _lot_ 
of lost revenue, because creating subvolumes can let you escape qgroups). 
That means in turn that what you're trying to argue for is no 
different from the provider just selling units of storage for different 
redundancy levels separately, and charging different rates for each of 
them.  In fact, that approach is better, because it works independent of 
the underlying storage technology (it will work with hardware RAID, 
LVM2, MD, ZFS, and even distributed storage platforms like Ceph and 
Gluster), _and_ it lets them charge differently than the trivial case of 
N copies costing N times as much as one copy (which is not quite 
accurate in terms of actual management costs).


Now, if BTRFS were to have the ability to set profiles per-file, then 
this might be useful, albeit with the option to tune how it gets accounted.


Disclaimer: all the above statements in relation to conception and
understanding of quotas, not to be confused with qgroups.





Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Tomasz Pala
On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:

>> I.e.: every shared segment should be accounted within quota (at least once).
> I think what you mean to say here is that every shared extent should be 
> accounted to quotas for every location it is reflinked from.  IOW, that 
> if an extent is shared between two subvolumes each with its own quota, 
> they should both have it accounted against their quota.

Yes.

>> Moreover - if there would be per-subvolume RAID levels someday, the data
>> should be accounted in relation to the "default" (filesystem) RAID level,
>> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
>> data, and twice the data in an opposite scenario (like "dup" profile on
>> single-drive filesystem).
>
> This is irrelevant to your point here.  In fact, it goes against it, 
> you're arguing for quotas to report data like `du`, but all of 
> chunk-profile stuff is invisible to `du` (and everything else in 
> userspace that doesn't look through BTRFS ioctls).

My point is the user's point of view, not some system tool like du. Consider this:
1. user wants higher (than default) protection of some data,
2. user wants more storage space with less protection.

Ad. 1 - requesting better redundancy is similar to cp --reflink=never
- there are functional differences, but the cost is similar: trading
  space for security,

Ad. 2 - many would like to have .cache, .ccache, tmp or some build
system directory with faster writes and no redundancy at all. This
requires per-file/directory data profile attrs though.

Since we agreed that transparent data compression is the user's storage
bonus, gains from reduced redundancy should also benefit the user.


Disclaimer: all the above statements in relation to conception and
understanding of quotas, not to be confused with qgroups.

-- 
Tomasz Pala 


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Chris Murphy
On Thu, Aug 9, 2018 at 5:35 PM, Qu Wenruo  wrote:
>
>
> On 8/10/18 1:48 AM, Tomasz Pala wrote:
>> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
>>
>>> 2) Different limitations on exclusive/shared bytes
>>>Btrfs can set different limit on exclusive/shared bytes, further
>>>complicating the problem.
>>>
>>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>>It lacks all the shared trees (mentioned below), and in fact such
>>>shared trees can be pretty large (especially the extent tree and csum
>>>tree).
>>
>> I'm not sure about the implications, but just to clarify some things:
>>
>> when limiting somebody's data space we usually don't care about the
>> underlying "savings" coming from any deduplicating technique - these are
>> purely bonuses for the system owner, so he can do larger resource overbooking.
>
> In reality that's definitely not the case.
>
> From what I see, most users would care more about exclusively used space
> (excl), rather than the total space one subvolume is referring to (rfer).

I'm confused.

So what happens in the following case with quotas enabled on Btrfs:

1. Provision a user with a directory, pre-populated with files, using
snapshot. Let's say it's 1GiB of files.
2. Set a quota for this user's directory, 1GiB.

The way I'm reading the description of Btrfs quotas, the 1GiB quota
applies to exclusively used space. So for starters, they have 1GiB of
shared data that does not affect their 1GiB quota at all.

3. User creates 500MiB worth of new files, this is exclusive usage.
They are still within their quota limit.
4. The shared data becomes obsolete for all but this one user, and is deleted.

Suddenly, 1GiB of shared data for this user is no longer shared data,
it instantly becomes exclusive data and their quota is busted. Now
consider scaling this to 12TiB of storage, with hundreds of users, and
dozens of abruptly busted quotas following this same scenario on a
weekly basis.
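
A minimal reproduction of that scenario on a scratch filesystem (device,
paths and sizes hypothetical):

  btrfs quota enable /mnt
  btrfs subvolume create /mnt/template
  dd if=/dev/urandom of=/mnt/template/data bs=1M count=1024
  btrfs subvolume snapshot /mnt/template /mnt/user
  btrfs qgroup show /mnt   # the user's excl is tiny, the 1GiB is shared
  btrfs subvolume delete /mnt/template
  btrfs quota rescan -w /mnt
  btrfs qgroup show /mnt   # the user's excl jumps to ~1GiB - quota busted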

I *might* buy off on the idea that an overlay2 based initial
provisioning would not affect quotas. But whether data is shared or
exclusive seems potentially ephemeral, and not something a sysadmin
should even be able to anticipate let alone individual users.

Going back to the example, I'd expect to give the user a 2GiB quota,
with 1GiB of initially provisioned data via snapshot, so right off the
bat they are at 50% usage of their quota. If they were to modify every
single provisioned file, they'd in effect go from 100% shared data to
100% exclusive data, but their quota usage would still be 50%. That's
completely sane and easily understandable by a regular user. The idea
that they'd start modifying shared files, and their quota usage climbs
is weird to me. The state of files being shared or exclusive is not
user domain terminology anyway.


>
> The most common case is: you do a snapshot, and the user would only care how
> much new space can be written into the subvolume, rather than the total
> subvolume size.

I think that's expecting a lot of users.

I also wonder if it expects a lot from services like samba and NFS who
have to communicate all of this in some sane way to remote clients? My
expectation is that a remote client shows Free Space on a quota'd
system to be based on the unused amount of the quota. I also expect if
I delete a 1GiB file, that my quota consumption goes down. But you're
saying it would be unchanged if I delete a 1GiB shared file, and would
only go down if I delete a 1GiB exclusive file. Do samba and NFS know
about shared and exclusive files? If samba and NFS don't understand
this, then how is a user supposed to understand it?

And now I'm sufficiently confused I'm ready for the weekend!


>> And the numbers accounted should reflect the uncompressed sizes.
>
> No way for the current extent-based solution.

I'm less concerned about this. But since the extent item shows both
ram and disk byte values, why couldn't the quota and the space
reporting be predicated on the ram value which is always uncompressed?



>
>>
>>
>> Moreover - if there would be per-subvolume RAID levels someday, the data
>> should be accounted in relation to the "default" (filesystem) RAID level,
>> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
>> data, and twice the data in an opposite scenario (like "dup" profile on
>> single-drive filesystem).
>
> Not possible again for the current extent-based solution.

It's fine, I think it's unintuitive for DUP or raid1 profiles to cause
quota consumption to double. The underlying configuration of the array
is not the business of the user. They can only be expected to
understand file size. Underlying space consumed, whether compressed,
or duplicated, or compressed and duplicated, is out of scope for the
user. And we can't have quotas getting busted all of a sudden because
the sysadmin decides to do -dconvert -mconvert raid1, without
requiring the sysadmin to double everyone's quota before performing
the operation.





>
>>
>>
>> In 

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Austin S. Hemmelgarn

On 2018-08-09 13:48, Tomasz Pala wrote:

On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:


2) Different limitations on exclusive/shared bytes
Btrfs can set different limit on exclusive/shared bytes, further
complicating the problem.

3) Btrfs quota only accounts data/metadata used by the subvolume
It lacks all the shared trees (mentioned below), and in fact such
shared trees can be pretty large (especially the extent tree and csum
tree).


I'm not sure about the implications, but just to clarify some things:

when limiting somebody's data space we usually don't care about the
underlying "savings" coming from any deduplicating technique - these are
purely bonuses for the system owner, so he can do larger resource overbooking.

So - the limit set on any user should enforce the maximum and absolute space
he has allocated, including the shared stuff. I could even imagine that
creating a snapshot might immediately "eat" the available quota. In a
way, the quota returned matches (give or take) `du`-reported usage,
unless "do not account reflinks within a single qgroup" was easy to implement.

I.e.: every shared segment should be accounted within quota (at least once).
I think what you mean to say here is that every shared extent should be 
accounted to quotas for every location it is reflinked from.  IOW, that 
if an extent is shared between two subvolumes each with its own quota, 
they should both have it accounted against their quota.


And the numbers accounted should reflect the uncompressed sizes.
This is actually inconsistent with pretty much every other VFS level 
quota system in existence.  Even ZFS does its accounting _after_ 
compression.  At this point, it's actually expected by most sysadmins 
that things behave that way.



Moreover - if there would be per-subvolume RAID levels someday, the data
should be accounted in relation to the "default" (filesystem) RAID level,
i.e. having a RAID0 subvolume on RAID1 fs should account half of the
data, and twice the data in an opposite scenario (like "dup" profile on
single-drive filesystem).
This is irrelevant to your point here.  In fact, it goes against it, 
you're arguing for quotas to report data like `du`, but all of 
chunk-profile stuff is invisible to `du` (and everything else in 
userspace that doesn't look through BTRFS ioctls).



In short: values representing quotas are user-oriented ("the numbers one
bought"), not storage-oriented ("the numbers they actually occupy").


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Austin S. Hemmelgarn

On 2018-08-09 19:35, Qu Wenruo wrote:



On 8/10/18 1:48 AM, Tomasz Pala wrote:

On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:


2) Different limitations on exclusive/shared bytes
Btrfs can set different limit on exclusive/shared bytes, further
complicating the problem.

3) Btrfs quota only accounts data/metadata used by the subvolume
It lacks all the shared trees (mentioned below), and in fact such
shared trees can be pretty large (especially the extent tree and csum
tree).


I'm not sure about the implications, but just to clarify some things:

when limiting somebody's data space we usually don't care about the
underlying "savings" coming from any deduplicating technique - these are
purely bonuses for the system owner, so he can do larger resource overbooking.


In reality that's definitely not the case.

From what I see, most users would care more about exclusively used space
(excl), rather than the total space one subvolume is referring to (rfer).

The most common case is: you do a snapshot, and the user would only care how
much new space can be written into the subvolume, rather than the total
subvolume size.
I would really love to know exactly who these users are, because it 
sounds to me like you've heard from exactly zero people who are 
currently using conventional quotas to impose actual resource limits on 
other filesystems (instead of just using them for accounting, which is a 
valid use case but not what they were originally designed for).




So - the limit set on any user should enforce the maximum and absolute space
he has allocated, including the shared stuff. I could even imagine that
creating a snapshot might immediately "eat" the available quota. In a
way, the quota returned matches (give or take) `du`-reported usage,
unless "do not account reflinks within a single qgroup" was easy to implement.


In fact, that's the case. In the current implementation, accounting on
extents is the easiest (if not the only) way to implement this.



I.e.: every shared segment should be accounted within quota (at least once).


Already accounted, at least for rfer.



And the numbers accounted should reflect the uncompressed sizes.


No way for the current extent-based solution.

While this may be true, this would be a killer feature to have.





Moreover - if there would be per-subvolume RAID levels someday, the data
should be accounted in relation to the "default" (filesystem) RAID level,
i.e. having a RAID0 subvolume on RAID1 fs should account half of the
data, and twice the data in an opposite scenario (like "dup" profile on
single-drive filesystem).


Not possible again for the current extent-based solution.




In short: values representing quotas are user-oriented ("the numbers one
bought"), not storage-oriented ("the numbers they actually occupy").


Well, if something is not possible or brings such a big performance impact,
there is no argument about how it should work in the first place.

Thanks,
Qu





Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Tomasz Pala
On Fri, Aug 10, 2018 at 15:55:46 +0800, Qu Wenruo wrote:

>> The first thing about virtually every mechanism should be
>> discoverability and reliability. I expect my quota not to change without
>> my interaction. Never. How did you cope with this?
>> If not - how are you going to explain such weird behaviour to users?
> 
> Read the manual first.
> Not every feature is suitable for every use case.

I, the sysadm, must RTFM.
My users won't comprehend this and moreover - they won't even care.

> IIRC lvm thin is pretty much the same for the same case.

LVM doesn't pretend to be user-oriented; it is system scope.
LVM didn't name its thin provisioning "quotas".

> For 4 disks with 1T of free space each, if you're using RAID5 for data, then
> you can write 3T of data.
> But if you're also using RAID10 for metadata, and you're using the default
> inline, we can use small files to fill the free space, resulting in 2T of
> available space.
> 
> So in this case how would you calculate the free space? 3T or 2T or
> anything between them?

The answer is pretty simple: 3T. Rationale:
- this is the space I can put in a single data stream,
- people are aware that there is metadata overhead with any object;
  after all, metadata are also data,
- while filling the fs with small files, the free space available would
  self-adjust after every single file put, so after uploading 1T of such
  files df should report 1.5T free. There would be nothing weird(er
  than now) in 1T of data actually having eaten 1.5T of storage.

No crystal ball calculations, just KISS; since one _can_ put a 3T file
(non-sparse, incompressible, bulk-written) on a filesystem, the free space is
3T.

> Only you know what the heck you're going to do with those 4 disks
> with 1T of free space each.
> Btrfs can't look into your head and know what you're thinking.

It shouldn't. I expect raw data - there is 3TB of unallocated space for
the current data profile.

> That's the design from the very beginning of btrfs, yelling at me makes
> no sense at all.

Sorry if you perceive me as "yelling" - I honestly must put it down to my
non-native English. I just want to clarify some terminology and
perspective expectations. They are irrelevant to the underlying
technical solutions, but the literal *description* of the solution
you provide should match user expectations of that terminology.

> I have tried to explain what btrfs quota does and doesn't do; if it
> doesn't fit your use case, that's all.
> (Whether you have ever tried to understand is another problem)

I am (more than before) aware what btrfs quotas are not.

So, my only expectation (except for worldwide peace and other
unrealistic ones) would be to stop using "quotas", "subvolume quotas"
and "qgroups" interchangeably in btrfs context, as IMvHO these are not
plain, well-known "quotas".

-- 
Tomasz Pala 


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Qu Wenruo


On 8/10/18 3:17 PM, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:35:32 +0800, Qu Wenruo wrote:
> 
>>> when limiting somebody's data space we usually don't care about the
>>> underlying "savings" coming from any deduplicating technique - these are
>>> purely bonuses for the system owner, so he can do larger resource overbooking.
>>
>> In reality that's definitely not the case.
> 
> Definitely? How do you "sell" disk space when there is no upper bound?
> Every, and I mean _every_, resource quota out in the wild gives you a
> user perspective.
> You can assign CPU cores/time, RAM or network bandwidth with HARD limit.
> 
> Only after that you _can_ sometimes assign some best-effort
> outer, not guaranteed limits, like extra network bandwidth or grace
> periods with filesystem usage (disregarding technical details - in case
> of quota you move the hard limit beyond and apply a lower soft limit).
> 
> This is the primary quota usage. Quotas don't save system resources,
> quotas are valuables to "sell" (by the quotes I mean every possible
> allocation, including inter-organisation accounting).
> 
> Quotas are overbookable by design and, like I said before, the underlying
> savings mechanisms allow the sysadmin to increase the actual overbooking ratio.
> 
> If I run out of CPU, RAM, storage or network I simply need to expand
> that resource. I won't shrink quotas in such a case.
> Or apply some other resource-saving technique, like LVM with VDO,
> swapping, RAM deduplication etc.
> 
> If that is not the use case of btrfs quotas, then they should be renamed
> so as not to confuse users. Using incorrect terms for widely known things
> leads to user frustration at the least.
> 
>> From what I see, most users would care more about exclusively used space
>> (excl), rather than the total space one subvolume is referring to (rfer).
> 
> Consider this:
> 1. there is some "template" system-wide snapshot,
> 2. users X and Y have CoW copies of it - both see "0 bytes exclusive"?

Yep, although not zero, it's 16K.

> 3. sysadm removes "template" - what happens to X and Y quotas?

Still 16K, unless X or Y drops their copy.

> 4. user X removes his copy - what happens to Y quota?

Now Y owns all of the snapshot exclusively.

In fact, that's not the correct way to organize your qgroups.
In your case, you should use a higher-level qgroup (1/0) to contain the
original snapshot and user X's and Y's subvolumes.

In that case, the snapshot's data and X's and Y's newer data are all
exclusive to qgroup 1/0 (as long as you don't reflink to files outside
of subvolumes X, Y and the snapshot).

Then the exclusive number of qgroup 1/0 is your total usage, and as long
as you don't reflink out of X, Y or the snapshot source, your rfer is the
same as excl, both representing how many bytes are used by all three
subvolumes.

This is in the btrfs-quota(5) man page.
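
A sketch of that setup in commands (qgroup and subvolume IDs hypothetical):

  # 0/256 = template snapshot, 0/257 = X's subvolume, 0/258 = Y's subvolume
  btrfs qgroup create 1/0 /mnt
  btrfs qgroup assign 0/256 1/0 /mnt
  btrfs qgroup assign 0/257 1/0 /mnt
  btrfs qgroup assign 0/258 1/0 /mnt
  # excl of 1/0 now covers all data referenced only inside the group
  btrfs qgroup show /mnt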

> 
> The first thing about virtually every mechanism should be
> discoverability and reliability. I expect my quota not to change without
> my interaction. Never. How did you cope with this?
> If not - how are you going to explain such weird behaviour to users?

Read the manual first.
Not every feature is suitable for every use case.

IIRC lvm thin is pretty much the same for the same case.

> 
> Once again: the quota numbers *I* got must not be influenced by external
> operations or foreign users.
> 
>> The most common case is: you do a snapshot, and the user would only care how
>> much new space can be written into the subvolume, rather than the total
>> subvolume size.
> 
> If only that were the case... then exactly - I do care how much new
> data is _guaranteed_ to fit on my storage.
> 
> So please tell me, as I might get it wrong - what happens if source
> subvolume gets removed and the CoWed data is not shared anymore?

It's exclusive to the only owner.

> Is the quota recalculated? - this would be wrong, as no new data was
> written.

It's recalculated, and due to the owner change the number will change.
It's about extent ownership; as already stated, not all solutions suit
all use cases.

If you don't think an ownership change should change quota, then just don't
use btrfs quota (nor LVM thin, if I didn't miss something); it doesn't
fit your use case.

Your use case needs LVM snapshots (dm-snapshot), or follow the multi-level
qgroup setup above.

> Is the quota left intact? - this is wrong too, as this gives a false view
> of the exclusive space taken.
> 
> This is just another reincarnation of the famous "btrfs df" problem you
> couldn't comprehend for so long - when reporting "disk FREE" status I want
> to know the amount of data that is guaranteed to be writable in the current
> RAID profile, i.e. ignoring any possible savings from compression etc.

Because there are so many ways to use the unallocated space, it's just
impossible to give you a single number for how much space you can use.

For 4 disks with 1T of free space each, if you're using RAID5 for data, then
you can write 3T of data.
But if you're also using RAID10 for metadata, and you're using 

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Tomasz Pala
On Fri, Aug 10, 2018 at 07:03:18 +0300, Andrei Borzenkov wrote:

>> So - the limit set on any user
> 
> Does btrfs support per-user quota at all? I am aware only of per-subvolume 
> quotas.

Well, this is a kind of deceptive word usage in "post-truth" times.

In this case both "user" and "quota" are not valid...
- by "user" I meant the general word, not a unix user account; such a user
  might possess some container running a full-blown guest OS,
- by "quota" btrfs means - I guess, dataset-quotas?


In fact: https://btrfs.wiki.kernel.org/index.php/Quota_support
"Quota support in BTRFS is implemented at a subvolume level by the use of quota 
groups or qgroup"

- what the hell is a "quota group" and how does it differ from a qgroup? According to 
btrfs-quota(8):

"The quota groups (qgroups) are managed by the subcommand btrfs qgroup(8)"

- they are the same... just completely different from traditional "quotas".


My suggestion would be to completely remove the standalone "quota" word
from btrfs documentation - there is no "quota", just "subvolume quota"
or "qgroup" supported.

-- 
Tomasz Pala 


Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-10 Thread Tomasz Pala
On Fri, Aug 10, 2018 at 07:35:32 +0800, Qu Wenruo wrote:

>> when limiting somebody's data space we usually don't care about the
>> underlying "savings" coming from any deduplicating technique - these are
>> purely bonuses for system owner, so he could do larger resource overbooking.
> 
> In reality that's definitely not the case.

Definitely? How do you "sell" disk space when there is no upper bound?
Every, and I mean _every_, resource quota out in the wild gives you a
user perspective.
You can assign CPU cores/time, RAM or network bandwidth with HARD limit.

Only after that you _can_ sometimes assign some best-effort
outer, not guaranteed limits, like extra network bandwidth or grace
periods with filesystem usage (disregarding technical details - in case
of quota you move the hard limit beyond and apply a lower soft limit).

This is the primary quota usage. Quotas don't save system resources,
quotas are valuables to "sell" (by the quotes I mean every possible
allocation, including inter-organisation accounting).

Quotas are overbookable by design and, like I said before, the underlying
savings mechanisms allow the sysadmin to increase the actual overbooking ratio.

If I run out of CPU, RAM, storage or network I simply need to expand
that resource. I won't shrink quotas in such a case.
Or apply some other resource-saving technique, like LVM with VDO,
swapping, RAM deduplication etc.

If that is not the use case of btrfs quotas, then they should be renamed
so as not to confuse users. Using incorrect terms for widely known things
leads to user frustration at the least.

> From what I see, most users would care more about exclusively used space
> (excl), rather than the total space one subvolume is referring to (rfer).

Consider this:
1. there is some "template" system-wide snapshot,
2. users X and Y have CoW copies of it - both see "0 bytes exclusive"?
3. sysadm removes "template" - what happens to X and Y quotas?
4. user X removes his copy - what happens to Y quota?

The first thing about virtually every mechanism should be
discoverability and reliability. I expect my quota not to change without
my interaction. Never. How did you cope with this?
If not - how are you going to explain such weird behaviour to users?

Once again: the quota numbers *I* got must not be influenced by external
operations or foreign users.

> The most common case is: you do a snapshot, and the user would only care how
> much new space can be written into the subvolume, rather than the total
> subvolume size.

If only that were the case... then exactly - I do care how much new
data is _guaranteed_ to fit on my storage.

So please tell me, as I might get it wrong - what happens if the source
subvolume gets removed and the CoWed data is not shared anymore?
Is the quota recalculated? - this would be wrong, as no new data was
written.
Is the quota left intact? - this is wrong too, as this gives a false view of
the exclusive space taken.

This is just another reincarnation of the famous "btrfs df" problem you
couldn't comprehend for so long - when reporting "disk FREE" status I want
to know the amount of data that is guaranteed to be writable in the current
RAID profile, i.e. ignoring any possible savings from compression etc.


Please note: my assumptions are based on
https://btrfs.wiki.kernel.org/index.php/Quota_support

"File copy and file deletion may both affect limits since the unshared
limit of another qgroup can change if the original volume's files are
deleted and only one copy is remaining"

so if I write something invalid this might be the source of my mistake.


>> And the numbers accounted should reflect the uncompressed sizes.
> 
> No way for the current extent-based solution.

OK, since the data is provided by the user, its "compressibility"
might be considered his saving (we only provide transparency).
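
That the current accounting is post-compression is easy to demonstrate; a
hedged sketch (device, mount options and sizes hypothetical):

  mount -o compress=zstd /dev/sdX /mnt
  btrfs quota enable /mnt
  btrfs subvolume create /mnt/vol
  yes | head -c 512M > /mnt/vol/data   # highly compressible
  sync
  du -sh /mnt/vol          # reports the logical 512M
  btrfs qgroup show /mnt   # rfer shows the much smaller on-disk bytes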

>> Moreover - if there would be per-subvolume RAID levels someday, the data
>> should be accounted in relation to the "default" (filesystem) RAID level,
>> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
>> data, and twice the data in an opposite scenario (like "dup" profile on
>> single-drive filesystem).
> 
> Not possible again for the current extent-based solution.

Doesn't an extent have information about the devices it's cloned on? But OK,
this is not important until per-subvolume profiles are available.

>> In short: values representing quotas are user-oriented ("the numbers one
>> bought"), not storage-oriented ("the numbers they actually occupy").
> 
> Well, if something is not possible or brings such a big performance impact,
> there is no argument about how it should work in the first place.

Actually I think you did something overcomplicated (shared/exclusive)
which will only lead to user confusion (especially when one's data
becomes "exclusive" one day without any known reason), misnamed... and
not reflecting anything valuable - unless the problems with extent
fragmentation are already resolved somehow?

So IMHO current quotas are:
- not discoverable for 

Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-09 Thread Qu Wenruo


On 8/10/18 1:48 AM, Tomasz Pala wrote:
> On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:
> 
>> 2) Different limitations on exclusive/shared bytes
>>Btrfs can set different limit on exclusive/shared bytes, further
>>complicating the problem.
>>
>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>It lacks all the shared trees (mentioned below), and in fact such
>>shared trees can be pretty large (especially the extent tree and csum
>>tree).
> 
> I'm not sure about the implications, but just to clarify some things:
> 
> when limiting somebody's data space we usually don't care about the
> underlying "savings" coming from any deduplicating technique - these are
> purely bonuses for the system owner, so he can do larger resource overbooking.

In reality that's definitely not the case.

From what I see, most users would care more about exclusively used space
(excl), rather than the total space one subvolume is referring to (rfer).

The most common case is: you do a snapshot, and the user would only care how
much new space can be written into the subvolume, rather than the total
subvolume size.

> 
> So - the limit set on any user should enforce the maximum and absolute space
> he has allocated, including the shared stuff. I could even imagine that
> creating a snapshot might immediately "eat" the available quota. In a
> way, the quota returned matches (give or take) `du`-reported usage,
> unless "do not account reflinks within a single qgroup" was easy to implement.

In fact, that's the case. In the current implementation, accounting on
extents is the easiest (if not the only) way to implement this.

> 
> I.e.: every shared segment should be accounted within quota (at least once).

Already accounted, at least for rfer.

> 
> And the numbers accounted should reflect the uncompressed sizes.

No way for the current extent-based solution.

> 
> 
> Moreover - if there would be per-subvolume RAID levels someday, the data
> should be accounted in relation to the "default" (filesystem) RAID level,
> i.e. having a RAID0 subvolume on RAID1 fs should account half of the
> data, and twice the data in an opposite scenario (like "dup" profile on
> single-drive filesystem).

Not possible again for the current extent-based solution.

> 
> 
> In short: values representing quotas are user-oriented ("the numbers one
> bought"), not storage-oriented ("the numbers they actually occupy").

Well, if something is not possible or brings such a big performance impact,
there is no argument about how it should work in the first place.

Thanks,
Qu





Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota

2018-08-09 Thread Tomasz Pala
On Tue, Jul 31, 2018 at 22:32:07 +0800, Qu Wenruo wrote:

> 2) Different limitations on exclusive/shared bytes
>Btrfs can set different limit on exclusive/shared bytes, further
>complicating the problem.
> 
> 3) Btrfs quota only accounts data/metadata used by the subvolume
>It lacks all the shared trees (mentioned below), and in fact such
>shared trees can be pretty large (especially the extent tree and csum
>tree).

I'm not sure about the implications, but just to clarify some things:

when limiting somebody's data space we usually don't care about the
underlying "savings" coming from any deduplicating technique - these are
purely bonuses for the system owner, so he can do larger resource overbooking.

So - the limit set on any user should enforce the maximum and absolute space
he has allocated, including the shared stuff. I could even imagine that
creating a snapshot might immediately "eat" the available quota. In a
way, the quota returned matches (give or take) `du`-reported usage,
unless "do not account reflinks within a single qgroup" was easy to implement.

I.e.: every shared segment should be accounted within quota (at least once).

And the numbers accounted should reflect the uncompressed sizes.


Moreover - if there would be per-subvolume RAID levels someday, the data
should be accounted in relation to the "default" (filesystem) RAID level,
i.e. having a RAID0 subvolume on RAID1 fs should account half of the
data, and twice the data in an opposite scenario (like "dup" profile on
single-drive filesystem).


In short: values representing quotas are user-oriented ("the numbers one
bought"), not storage-oriented ("the numbers they actually occupy").

-- 
Tomasz Pala 